MoodBot: A Long Term Mood Tracking Project

By Alex Beal

January 19, 2018

1 Introduction

In August of 2016 I started keeping close track of my mood. I used a technique called experience sampling that I had first heard about in Thinking, Fast and Slow. The essence of this technique is to capture data about an experience as the experience is happening rather than reflecting on it after the fact. The main difference between experience sampling and retrospection is that experience sampling does not rely on memory. If I want to answer the question “How was my day?” I don’t need to think about my day, recalling the good parts and the bad parts and weighing them, I only need to look at my day’s mood curve, and perhaps take the average of its points. This strategy has some important benefits over the retrospective method.

The first benefit is methodological: Experience sampling gives a more accurate picture of how I felt in the moment. I can hardly remember what I had for lunch yesterday, so why should I trust my ability to accurately reflect on my day as a whole? Even worse, these errors in memory aren’t random. Many are systematic biases. One example is duration neglect where the duration of an event is ignored when making a retrospective judgment. In other words, if I judge a painful event based purely off memory, I will rate 5 minutes of pain to be the same as 10 minutes of the same pain.

The second reason is that I’m a data nerd and, I find this type of navel-gazing fascinating. Being able to answer questions like “Was I more or less happy before I left my job?” is terribly exciting.

Finally, I can leverage the data when making decisions. There’s reason to believe that we’re bad at remembering what makes us happy, so if we’re trying to optimize for happiness, this is useful data to have.

Without further ado, I present to you over one and a half years of mood data, totaling over 2,400 samples:

(Top) Mood samples are aggregated per day and the mean value is plotted. Mood is on a scale of 1 to 5, where higher is better. (Bottom) The number of samples captured on that day.

2 Capturing the Data

The majority of this data was captured over SMS. I periodically received a text message asking me “On a scale 1-5, where 1 is negative and 5 is positive, how are you feeling?” and “What are you doing?”. I answered these questions by responding to the message. A common response would be something like “3 work, programming” if my mood was about average and I was programming at work. The idea was to capture my sense of well being in a general way (which I refer to as ‘mood’), and also any context that could be affecting it. Mood is of course more than one dimensional, but I didn’t want to burden myself with too many questions, so this seemed like a good place to start. A five point scale was used because I didn’t believe I could answer consistently on a larger scale, and since I wanted to capture this data over a long period of time, consistency is key. In the end, the five points map to (1) Very negative (2) Negative (3) Neutral (4) Positive and (5) Very positive. In practice, I found that it was easy to place my mood at one of these points.

What a typical MoodBot exchange would look like. Toward the end, I started to get lazy and skipped answering the second question to save time. My new sampling system addresses this. See Next Steps.

2.1 Sampling

Sampling followed a Bernoulli process. Every five minutes, the service flipped a weighted coin and sent a query if the coin was heads. I chose a Bernoulli process because individual events are independent. No matter how many samples have already occurred, the probability that the next period will trigger a sample is the same. This is desirable because it made sampling harder to game. Contrast this with a method that, for example, always samples 5 times a day. Once I receive all 5 samples, I know I will not be sampled for the rest of the day and this knowledge might influence my behavior. This is less of a problem for mood sampling, because it’s not clear I can or would want to game my mood data, but it’s something to keep in mind if you decide to set up your own experience sampling project.

2.2 Technical Details

The process that flipped a coin every 5 minutes and received responses was an AWS Lambda service. This is where the project got its name, as I called this service MoodBot. It used the Twilio API to send and receive text messages, and stored responses to S3 as text files. Lambda removed the need to leave a server running, saving quite a bit of money. In fact, the Lambda bill was essentially zero making the primary cost Twilio, which averaged around $5 per month.

3 Analysis

When I started this project I had a few questions in mind: First, what contexts affect my mood? Are there certain activities that tend to boost my mood or dampen my mood? If so, perhaps I could maximize the positive activities and minimize the negative.

Second, how are my moods distributed? Do I spend most of my time in a state of despair, or do I have a generally positive outlook? In some ways, this is a rather frightening question to ask. What if it turned out I had a generally dour demeanor? And what if baseline demeanor is a stable trait that can’t easily be changed? I might discover that I’m doomed to a depressive outlook forever. No matter, I’m not the type of person to turn away information, even when that news might be bad news.

Finally, I was also curious to see how major life events impacted my mood. For example, had I been keeping data at the time, would it have been possible to see a major difference in mood between when I was in college and when I entered industry? If so, perhaps this data could help me with decisions about my career and other significant changes. I would have a quantitative way of comparing different periods in my life. Further, it raised philosophical questions about the hedonic treadmill. Perhaps it would turn out that there is no difference and my baseline mood isn’t responsive circumstance.

3.1 Mood Distribution

I’ll start with the easy question. What is my average mood?

Mean 2.99 Std Dev 0.85 p25 3 p50 3 p75 4

Happily, this distribution looks good. I don’t appear too neurotic, whatever my friends may tell you. My average is about neutral (mean = 2.99) and I spend the majority of my time there, within 1 point of neutral (standard deviation = 0.85). The percentiles tell an even more positive story. I’m slightly biased toward happiness, with the middle 50% of the distribution lying between 3 and 4 (p25 = 3, p75=4). So although life knocks me around a bit, I spend most of my time neutral to happy, and am rarely completely depressed or completely overjoyed.

The histogram below tells a similar story. 1 and 5 are rare. I mostly occupy 2 through 4 with a bias toward 4. Unfortunately, it looks like at the extremes, I’m more biased to negative moods than positive moods. Indeed, very negative moods account for around 5% of samples.

Histogram of moods across all data. Y axis is normalized to 1.

Thankfully I do not have to confront the question of a perpetually negative mood. I live life on a mostly even keel. That said, the proportion of 1s is disappointing. At 5%, that’s around 18 days out of the year! Needless to say, I want those 18 days back. Thus it is still worth wondering if I can further improve my mood. Perhaps I can reduce the 1s a bit, and tip the scale in favor of a few more 5s. In Life Events I present evidence that suggests change is possible.

3.2 Cyclical Features

Rather than viewing the samples as a time series, an alternative is to look at it cyclically. I can use this view to show what the average day or week looks like. To see how the average day looks, I gather up all the samples taken during 8 AM for any day and average them. So a sample taken on January 1st at 8:15 AM will be averaged with a sample taken on January 3rd at 8:45 AM. Then I do the same for 9 AM and so on until I have a 24 hour time series. I’ve produced that graph below.

Samples are bucketed into hours (irrespective of day) and averaged. This is what an average day looks like.

So the average day is about what you’d expect. During working hours, I stay close to neutral with a slight bump in mood around 5 PM when I’d leave the office and come home. After dinner my mood dips as I begin to worry about the next day (or the previous one). It’s no surprise to me that the data shows that I’m most emotionally vulnerable in the late evening.

The average week also shows a reasonable pattern.

Samples are bucketed into days (irrespective of the date) and averaged. This is what an average week looks like.

The work week proceeds at a steady 3, until Saturday when there’s a noticeable weekend uptick. Sundays looks slightly less enjoyable, and this makes sense. Sunday is when I’d get a lot of chores and housework done, not to mention the pre-Monday dread.

3.3 Life Events

Were there any significant life events during the data collection period? Yes, I left my job at a big tech company to take sabbatical (which is still ongoing) . Look again at the time series and see if you can spot it:

In early September my mood exhibits a distinct shift upward, going below three much less often. That was around the time I announced my resignation, and November 3rd is when I had my last day. It’s interesting how much simply announcing my resignation affected my mood despite the fact that I didn’t stop working until 8 weeks later. My hypothesis is that just thinking about all the great stuff I was going to do on sabbatical was enough to give everything a positive glow. Further, I believe my stress level decreased knowing that I was on the way out.

Given this distinct shift, I repeated many of the previous analyses, but split the data on September 3rd. In the histogram and distribution summaries below, there’s a dramatic shift away from 1s and 2s and toward 3s and 4s. My average mood increases by nearly half a point and the 25th percentile moves from 2 to 3.

Further, the proportion of 1s is reduced from around 6% to 1%. Per year, that a reduction from 22 days to 4 days! I’ll count that as a success. 2s show an even more dramatic shift from above 20% to around 7%. Remember, each percent is around 3.5 days per year. Incredible!

Distribution summary data split between pre and post resignation periods. Pre-resignation Post-resignation Mean 2.95 3.32 Std Dev 0.85 0.70 p25 2 3 p50 3 3 p75 4 4

Histogram of moods before and after announcing my resignation.

In the cyclical data, I spend most of the day above a three, and don’t experience such a significant dip in the evening after work.

Mood on an average day, split between pre-resignation and post-resignation time periods.

In the weekly data, my baseline is again higher. Interestingly, I don’t experience as much of a weekend uptick, most likely because weekends and weekdays blend together.

Mood during an average week, split between pre-resignation and post-resignation.

So it turns out sabbatical has been great for my mood. Even I was surprised by the magnitude of the shift. I should point out that it wasn’t clear that this would be the case. There’s plenty of folk wisdom that contradicts this. The arguments go something like this: Work gives you something to dedicate yourself to, and through that it gives your life meaning. Without the structure it imposes and goals it sets, you’ll feel aimless and squander what little time you have. Needless to say, this hasn’t turned out to be the case with me. The key, I believe, is that I spend my time on intellectually satisfying projects, and that I’m using this time for personal and professional development. In other words, I’m not waking up at noon and playing World of Warcraft all day. One objection is that these bits of folk wisdom only apply to involuntary unemployment, and what I have is closer to vacation than being out of a job. There’s some truth to that, but I think there’s still a belief out there that unemployment, even when it’s voluntary, isn’t psychologically healthy.

More generally, this suggests that a meaningful change to baseline mood is possible, at least for the nearly 5 months since I announced my resignation. Perhaps, if I could stretch my sabbatical out to multiple years, I would start to see a return to pre-resignation mood levels. But for now I’ll take this to mean that circumstance matters quite a bit, even when my circumstance was already quite good to begin with. The bad news is that this is a rather drastic and impermanent change. I fully expect to be re-joining the workforce. Hopefully my next job will have a more positive effect on my mood. Or perhaps I could start taking more flexible part time and contract positions to minimize work as much as possible. Another thought relates to how simply announcing my resignation boosted my mood even though I worked for 8 more weeks. Perhaps I can reduce the impact of having a job by always having something to look forward to, like side projects I’m passionate about. Whatever happens, I’ll at least have the data to measure the impact of these experiments.

Finally, I want to mention that my previous job was not at all bad. By many measures, it was great. I was well compensated (I made enough to save up for sabbatical). I worked on highly visible and impactful projects that were deployed to hundreds of millions of users. There were never risks to my health or safety. My colleagues were all whip-smart and kind. Instead I think the correct takeaway is (1) sabbatical has just been very very very good and also (2) that all jobs, even the very good ones, aren’t without some measure of stress and tedium.

3.4 Labels

The context included along with the samples can answer two questions. First, how do I tend to spend my time? Second, how do different contexts affect my mood?

To answer these questions, I first needed to clean the context data. The free form nature of the responses (I could type anything I wanted into the SMS message) led to some inconsistent labeling. I first corrected any typos, for example, renaming all instances of “worj” to “work”. Because the label space was rather sparse, I then did my best to manually compress the space, mapping labels like “dinner” and “lunch” to a common label like “eating”. I realize this is not very scientific, and I could have inadvertently biased the data depending on how I set up the groupings, but for now I’ll press on. I have ideas about how to improve this going forward in Next Steps. I also discarded any samples that lacked context data. A lack of label was more likely due to laziness than a lack of no context. Again, this could lead biases and I’ve since made some changes to the way I collect this data, again outlined in section Next Steps.

I look first at the proportion of labeled samples containing a given label. This answers the question of how I spend my time.

The proportion of samples with context that contain a given label. Note the reports can contain more than one label, and some reports contain no labels (reports without labels are not counted toward the total). Only labels that occur in at least 20 samples are included.

Work is by far the place were I spent the bulk of my time, taking up nearly 45% of my waking hours. In light of this, it’s no surprise that a change to work could have such a drastic effect on my overall mood, as discussed in section Life Events. After a steep drop off, the next most common labels are leisure, eating, spending time on hobbies, and cooking. Overall, these labels aren’t too much of a surprise. I’m an introvert, so it makes sense that after work I spend much of my free time on solitary activities like reading and hobbies and less than 5% of my time socializing. It’s also nice to see that I actually have quite a bit of leisure time, even though it often doesn’t feel that way (at least pre-sabbatical it didn’t). Leisure, hobby, and reading add up to over 25% (although, there is probably some overlap in these labels, so I cannot strictly speaking add them up).

Next I look at the average mood associated with a given label. This shows how different labels affect my mood. For example, if the average sample with the label “socializing” is a 4, I can have some confidence that socializing increases my mood.

Samples are grouped by label. The average mood of each group is calculated and plotted. Because samples can contain multiple labels, samples can also be in multiple groups. This show how different labels are correlated with different moods.

Drinking takes the top spot, with an average mood slightly over 4. This is followed by outdoors (which is mostly composed of hiking), and coffee (which I used if I was currently drinking coffee or had drank it in the past hour). In fourth is socializing. All of these labels are significantly above 3, in the 3.5 to 4 range. On the other end of the spectrum, starting with the worst, is worrying, being in pain, being tired, and working. The first three are all in the 1.5 to 2 range, and then there’s a steep climb to work, which is slightly over 2.5.

As the ancient proverb says, “Avoid drugs not because they’re bad, but because they’re very good.” Substances make a good, perhaps too good, showing with alcohol and caffeine taking two of the three top spots. This isn’t too surprising. Both melt away anxiety and induce a mild euphoria. But this is disappointing, as they’re not activities I want to optimize for given the obvious health risks.

Thankfully, outdoors comes in second, which is something I can optimize for. My hypothesis is that it’s mostly the exercise component of being outdoors that boosts my mood. Having beautiful surroundings helps too, but anecdotally, I’ve noticed a similar boost after going to the gym. But if exercise is a mood booster, why does walking show up so low? My guess is that walking doesn’t raise my heart rate enough to trigger endorphins. Another thought is that much of the walking was either to or from work, which is a mood reducer.

Socializing comes in fourth. Being an introvert, this was surprising, but upon reflection, I think my aversion to socializing is myopic. I fret in the moments leading up to it, but once I’m actually doing it, it’s usually enjoyable. It’s not something that comes naturally to me, but the data suggests I should make more of an effort here.

On the other end of the spectrum is work, worrying, pain, and tiredness. No surprises here. As discussed in Life Events work exerts a negative effect on mood and this analysis further confirms that.

In the future, I’d like to run a regression on these variables and see if I can tease apart any of the effects further. For example, perhaps drinking ranks highly partly because it’s often combined with socializing and eating. And perhaps I can see how much of the outdoors effect is due to exercise.

3.5 Markov Process

How does my current mood affect my future mood? If I’m a 3 now, what am I likely to be in a hour? One way of answering this question is by modeling the data as a Markov process. This works by counting transitions between moods and deriving probabilities from those counts. If I’m a 3 now, and during the next sample I’m a 4, I add one to the 3 -> 4 transition count. I count all transitions in this way, and use those counts to calculate the odds of specific transitions. If 30% of transitions from the 3 state go to 4, then I can conclude that, if I’m currently a 3, there’s a 30% chance I’ll be a 4 during the next sample. Running all these calculations, I can produce a graph of the Markov process.

My mood as a Markov process. Each node is a mood, 1 through 5, and each edge represents a transition. The edges are weighted by probability of transition. Edges with less than 5% probability are elided.

There appear to be two properties that underlie this process. The first is that 3, the neutral mood, seems to exert gravity. The second is that nearly all states exhibit inertia.

3 is a gravity well in the sense that other states tend to flow into it. The most likely transition is always to move toward it. If I’m currently a 3, I will most likely remain a 3 on the next sample with 58% probability. If I’m a 2 or a 4, I will most likely move back to 3 on the next sample with 39% and 45% probability. If I’m a 1 or 5, I’ll most likely transition to 2 or 4, which will most likely send me back to 3. There’s a sense in which 3 is downhill of everything and as I move to the extremes, the further uphill I climb.

Moods exhibit inertia in the sense that remaining in the current state is usually the second most probable transition, behind moving toward 3. For example, if I’m currently a 2, there’s a good chance I’ll stay a 2. The most probable transition is 3 (39%), but remaining a 2 comes in a close second (36%). 4 also exhibits this pattern. I’m most likely to be pulled in by the gravity of 3 (45%), but the second most likely transition is to remain a 4 (44%). 1 again exhibits this. I most likely move toward 3 via 2 (37%), but I’m liable to stay a 1 with 32% probability. 5 is the only state that diverges from this pattern. Alas, joy is so fleeting that remaining a 5 is only the third, rather than second, most probable transition (12%) behind transitions to 4 (58%) and 3 (21%).

This graph also tells a story similar to the histograms, where at the extremes 1 has a higher count than 5, but in the middle 4 has a higher count than 3. Similarly, in the Markov process, 1 has more inertia than 5, but moving in one level, 4 has more inertia than 3. In other words, despair is more self perpetuating than joy, but mild happiness is more self perpetuating than mild sadness. I suppose that’s not a terrible deal.

Overall, I am quite pleased with the signal to noise ratio in this analysis. Since I wasn’t logging all transitions between moods, I was worried deriving transition probabilities wouldn’t be meaningful. After all, the probabilities in the graph are not technically transition probabilities, they’re probabilities of what the next sample will be. In the time between samples, it’s possible many transitions could have occurred. Despite that, the Markov model appears to pick up on something real. It paints a picture of a sort of psychological homeostasis–a system always working to return to its neutral state. This makes me wonder if it could be used as a diagnostic tool. Do people suffering from depression exhibit gravity wells in state 1? Do bipolar people exhibit wells at 1 and 5? Do the anhedonic exhibit particular strong gravity at state 3? I am not a psychologist, so I can only speculate, but the possibilities are tantalizing.

4 Next Steps

Going forward, I have several improvements in mind, some of which have already been implemented.

4.1.1 Reporter

First, I’ve improved the way I do data collection. I’ve started using Reporter, an iOS app, and have since retired my bespoke AWS and Twilio setup. The main reason for switching is Reporter drastically reduces the friction to filling out a survey. Rather than having to enter everything manually into a text message, it supports multiple choice dropdowns. In particular, it has a ‘Tokens’ feature which is something like a dropdown plus a text box. If the item I’m looking for isn’t already in the dropdown, I can enter it in the text box and the next time I’m prompted it’s added to the dropdown. This is perfect for sampling context. I rarely need to type a new label, and it keeps my labels consistent. Hopefully this will help with some of the issues mentioned in Labels.

If you plan on setting up your own experience sampling project, I can’t emphasize enough the importance of lowering sample friction. In my opinion, keyboard entry is a non-starter and is part of the reason samples dropped off in the later part of the sampling period (see the time series graph).

The reduction in sample friction has also allowed me to add many more questions to my survey. I now have questions about stress, energy level, exercise, meditation, and sleep quality. It also uses the sensors to automatically capture location, sound level, and some other things.

I will say that Reporter’s built in data analytics features are pretty weak. Plan on exporting the data and exploring it yourself, otherwise I’m not sure it will deliver much value.

It’s also worth mentioning that it’s not possible to set up Reporter for independent sampling as discussed in Sampling, but given how much easier it is to set up, I think the trade off is worth it. In practice, I don’t find that I’m tempted to game the system.

4.1.2 Sleep Cycle

I’ve also started using some automated tracking tools. In particular, Sleep Cycle claims to be able to analyze my sleep patterns by enabling the microphone while I sleep. I haven’t used it for long, so it remains to be seen how well it works, but it does seem to at least identify when I’m sleeping and when I wake up in the middle of the night.

Example of a single night’s sleep analysis.

4.1.3 iOS Health

iOS’s Health app has been keeping track of all my step data for several years now. I plan digging through this data soon. This also seems like a good place to enter things like weight, body fat, and exercise. All data appears to be exportable.

4.1.4 MyFitnessPal

MyFitnessPal is perhaps the most ambitious of the tools I’m using. I’m attempting to keep track of diet, but food logging is tedious to say the least. I think it would be neat to be able to dig up correlations between nutrition and mood, or sleep quality, or energy level, but staying motivated to use this app is a struggle.

5 Conclusion

I’ve spent quite a bit of time and effort on this project. From setting up and maintaining the MoodBot service, to writing custom tools to pull down and parse the data, to exploring and graphing the results, to producing the write up you see here. Given all that, I’m happy to report that I think it was worth it. Some results aren’t that interesting. For example, it’s not too surprising that pre-resignation I enjoyed my weekends. But other results were quite surprising, such as how big of a mood booster sabbatical has been. Further, I think a lot of the value lies in seeing how this data changes over time. I plan to continue tracking as much as I can and look forward to seeing this project evolve as I grow older.