I’ve become fascinated with the idea of amassing personal data to conduct some basic self-analysis (a.k.a. self-PRISMing). We can learn a lot about our habits, and it seems inevitable that in the near future people will routinely gather all kinds of data on themselves (many tech companies already gather it on us, not to mention the NSA). Since I started experimenting with Python yesterday, it seemed like a good idea to abandon my never-ending university workload and write a small Python program to analyse my email behaviour.

Fast-forward a few hours and I’ve got a working program that can log in to any email account that supports IMAP, retrieve all of its messages and offer a perspective on one’s life that might not otherwise be obvious. It’s currently very basic, but it delivers some initial food for thought.
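For anyone curious about the IMAP side, here is a minimal sketch of the retrieval step using Python’s standard-library `imaplib` and `email` modules. It is not the code from the repository, just one way to do it: it assumes an IMAP server that accepts SSL connections, and the function names `fetch_headers` and `message_datetime` are my own. It only downloads headers, since the date and subject are all the analysis needs.

```python
import email
import imaplib
from datetime import datetime
from email.message import Message
from email.utils import parsedate_to_datetime
from typing import List, Optional


def fetch_headers(host: str, user: str, password: str,
                  mailbox: str = "INBOX") -> List[Message]:
    """Log in over IMAP/SSL and return the headers of every message in `mailbox`."""
    conn = imaplib.IMAP4_SSL(host)
    try:
        conn.login(user, password)
        # readonly=True opens the mailbox with EXAMINE, so nothing gets modified
        conn.select(mailbox, readonly=True)
        _, data = conn.search(None, "ALL")
        headers = []
        for num in data[0].split():
            # BODY.PEEK[HEADER] fetches only the header block without marking mail as read
            _, msg_data = conn.fetch(num, "(BODY.PEEK[HEADER])")
            headers.append(email.message_from_bytes(msg_data[0][1]))
        return headers
    finally:
        conn.logout()


def message_datetime(msg: Message) -> Optional[datetime]:
    """Return the parsed Date header, or None if it is missing or malformed."""
    value = msg.get("Date")
    if value is None:
        return None
    try:
        return parsedate_to_datetime(value)
    except (TypeError, ValueError):
        return None
```

Usage would look something like `fetch_headers("imap.example.com", "me@example.com", "secret")` with your own (here hypothetical) host and credentials. Sent mail usually lives in a separate folder whose name varies between providers, so the `mailbox` argument needs changing to analyse outgoing messages.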

As a side note, all of the code is available on GitHub: just type git clone https://github.com/jaijuneja/email-analytics.git into Terminal/Command Prompt, or download a ZIP from the repository’s GitHub page.

So, has this program provided me with enlightenment and newfound self-awareness? Not quite, but it has confirmed some of my suspicions. The first thing that is clear from my university email account is my non-existent (or at best nocturnal) sleep cycle. It appears that I regularly send mail between midnight and 6am, as do some of my peers, especially during term time. “I should probably go to bed” is not a thought I’m particularly receptive to…
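Spotting a nocturnal habit boils down to bucketing sent messages by hour of the day. The sketch below assumes you already have a list of `datetime` objects parsed from the Date headers of sent mail; `.hour` is then the local time recorded by the sending client, which for my own outgoing mail is exactly what I want. The helper names and the toy timestamps are made up for illustration.

```python
from collections import Counter
from datetime import datetime


def hour_histogram(timestamps):
    """Count messages per hour of the day, returning a list indexed 0-23."""
    counts = Counter(ts.hour for ts in timestamps)
    return [counts[h] for h in range(24)]


def night_fraction(timestamps, start=0, end=6):
    """Fraction of messages whose hour falls in [start, end), e.g. midnight to 6am."""
    hours = [ts.hour for ts in timestamps]
    if not hours:
        return 0.0
    return sum(start <= h < end for h in hours) / len(hours)


# Toy timestamps, not real data from my accounts
sent = [datetime(2013, 11, 5, 1, 15), datetime(2013, 11, 5, 17, 40),
        datetime(2013, 11, 6, 2, 5)]
print(hour_histogram(sent)[:3])  # → [0, 1, 1]
print(night_fraction(sent))
```

Plotting the 24 values of `hour_histogram` as a bar chart gives the kind of hour-of-day graph discussed here.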

However, this past term (October to December 2013) was perhaps the first in which I mostly maintained a regular human sleep cycle, and it actually shows in the graph!

Upon deeper inspection, the probability distribution of my sent mail over the hours of the day is essentially bimodal, with one peak in the late afternoon/evening and another around 1-2am.
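One rough way to check for bimodality is to normalise the 24 hourly counts into a probability distribution and look for local maxima. Because the day wraps around, hour 23 and hour 0 should be treated as neighbours. This is my own simple peak-finding sketch rather than a formal statistical test; the optional smoothing step (a circular moving average, assuming an odd window) helps stop noisy hours from showing up as spurious peaks. The counts are invented.

```python
def hour_distribution(counts):
    """Normalise 24 hourly counts into probabilities that sum to 1."""
    total = sum(counts)
    if total == 0:
        return [0.0] * len(counts)
    return [c / total for c in counts]


def smooth(probs, window=3):
    """Circular moving average; assumes an odd window centred on each hour."""
    n, half = len(probs), window // 2
    return [sum(probs[(h + k) % n] for k in range(-half, half + 1)) / window
            for h in range(n)]


def circular_peaks(probs):
    """Hours whose value is strictly higher than both (wrapped) neighbours."""
    n = len(probs)
    # probs[h - 1] wraps to the last hour when h == 0
    return [h for h in range(n)
            if probs[h] > probs[h - 1] and probs[h] > probs[(h + 1) % n]]


counts = [0] * 24
counts[0], counts[1], counts[2] = 2, 5, 2      # an invented cluster around 1am
counts[17], counts[18], counts[19] = 3, 8, 3   # an invented cluster in the evening
print(circular_peaks(hour_distribution(counts)))  # → [1, 18]
```

Two dominant peaks, one in the small hours and one in the evening, is the bimodal shape described above.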

What is also evident is the sharp drop in incoming mail during the summer. This is further emphasised in the plot below: you can see the termly cycles of email traffic, with distinct troughs during the holidays. Note that my received mail only goes back about 200 days, because Oxford’s MS Exchange server was giving me problems.
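Termly cycles show up once messages are grouped by month instead of by hour. Here is a small sketch, assuming a list of `datetime` objects for received mail; the optional `since` cutoff mimics a limited history like the roughly 200-day window above. The function name and data are illustrative only.

```python
from collections import Counter
from datetime import datetime, timedelta


def monthly_counts(timestamps, since=None):
    """Count messages per (year, month), skipping anything earlier than `since`.

    `timestamps` and `since` must be either all timezone-aware or all naive,
    otherwise Python refuses to compare them.
    """
    counts = Counter(
        (ts.year, ts.month) for ts in timestamps if since is None or ts >= since
    )
    return dict(sorted(counts.items()))


# Toy data: one August message, a busy October, and one very old message
received = [datetime(2013, 8, 2), datetime(2013, 10, 8), datetime(2013, 10, 9),
            datetime(2013, 10, 21), datetime(2012, 1, 5)]
cutoff = datetime(2013, 12, 31) - timedelta(days=200)
print(monthly_counts(received, since=cutoff))  # → {(2013, 8): 1, (2013, 10): 3}
```

One caveat: a month with no mail at all simply doesn’t appear in the result, so it is worth filling missing months with zero before plotting, otherwise the deepest holiday troughs disappear.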

Venturing into my Gmail account (below) produced some odd revelations. I noticed a massive surge in outgoing email around February 2009. It didn’t strike me as a particularly busy time, so I went back to check and found that my account had been hacked and was sending out masses of raunchy spam. Particular gems include “your partner will be amazed at your manhood” and “like a steel rod”.
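I spotted the February 2009 surge by eye, but it could also be flagged automatically from per-month counts. The sketch below is my own heuristic, not something the program does: it marks any month whose count exceeds a multiple of the median month. The median is used rather than the mean because a single spam flood would drag the mean upwards and partly hide itself. The `factor` of 5 is an arbitrary assumption, and the counts are invented.

```python
from statistics import median


def flag_surges(monthly, factor=5.0):
    """Return the months whose count exceeds `factor` times the median month.

    max(..., 1) stops a zero median from flagging every month with any mail.
    """
    if not monthly:
        return []
    typical = max(median(monthly.values()), 1)
    return [month for month, n in monthly.items() if n > factor * typical]


# Invented counts of sent mail, keyed by (year, month)
sent_per_month = {(2009, 1): 10, (2009, 2): 400, (2009, 3): 12, (2009, 4): 8}
print(flag_surges(sent_per_month))  # → [(2009, 2)]
```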

It also turns out that I sent myself almost 50 emails relating to my GCSE Geography coursework back in May 2007 – weird.

A look at the most common words in the subject field reveals that, ironically, my university account primarily serves my extracurricular interests (web design, Bang! Magazine, editing, etc.), while my personal account is used for work and internship applications. Gone are the days of using email as a means of casual social interaction.
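Counting subject-line words is a classic job for `collections.Counter`. In this sketch I assume raw Subject header strings as input; they are decoded first with `email.header`, because non-ASCII subjects usually arrive MIME-encoded. The stop-word list is a tiny placeholder of my own, and the example subjects are made up.

```python
import re
from collections import Counter
from email.header import decode_header, make_header

# A tiny illustrative stop-word list (my assumption); a real run needs a fuller one
STOP_WORDS = {"a", "an", "and", "the", "of", "to", "for", "in", "on", "re", "fw", "fwd"}


def decode_subject(raw):
    """Turn a MIME-encoded header such as '=?utf-8?q?caf=C3=A9?=' into plain text."""
    return str(make_header(decode_header(raw)))


def top_subject_words(subjects, n=10):
    """Most common lower-cased words across subject lines, minus stop words."""
    counts = Counter()
    for subject in subjects:
        words = re.findall(r"\w+", decode_subject(subject).lower())
        counts.update(w for w in words if w not in STOP_WORDS)
    return counts.most_common(n)


subjects = ["Re: Bang! Magazine layout", "Web design meeting", "Bang! Magazine deadline"]
print(top_subject_words(subjects, 2))  # → [('bang', 2), ('magazine', 2)]
```

Running this separately on each account’s subjects makes the contrast between them easy to see.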

Perhaps if I have a procrastination relapse, I’ll extend the program with more advanced features and a web frontend. We’ve only just scratched the surface of what can be extracted and inferred from our email messages. What they can tell us about our lifestyle, values, preferences and even insecurities is information that would no doubt have advertisers licking their lips. But perhaps less appreciated is what we ourselves can learn from this external, quantitative and largely objective perspective on our (cyber-)life.