2:35pm “Thanks for the cookie!”

2:37pm “Oh thank you so much for that cookie, it was delicious!”

2:46pm “I must thank you for the yummy cookie!”

For those of us in contact with Alzheimer’s patients, this series of responses is familiar and saddening. For data scientists, however, a window opens into the disease itself.

“How many minutes did it take this patient to forget they’d expressed their gratitude for the cookie?”

“Could we use the timing between similar statements as a diagnostic indicator of Alzheimer’s disease progression?”

“Over time, did responses become shorter or more similar?”

“Could a sudden decline in sentence length be an indication of Alzheimer’s disease in older adults?”

The data scientists bubble with questions! After the initial excitement, however, they become discouraged: strapping tape recorders to Alzheimer’s patients 24/7 is both inconvenient and expensive, and brings the burden of transcribing thousands of hours of audio recordings, furiously scribbling every spoken word into a coffee-stained notebook.

Then, in a spark of genius, the data scientists realize that email accounts across the globe are overflowing with this type of data – it’s brilliant! Why is it so brilliant? Email is simultaneously a record and a vehicle: a record of changes in human communication over time as Alzheimer’s disease develops, and a vehicle for early detection of the disease.

Understandably, our team of data scientists was thrilled to obtain emails written from an Alzheimer’s patient to her granddaughter over a period of four years, from before the onset of disease, to the end of her email career. Here’s a real example of our data:

From this example, one can see the parallels between spoken and written conversations with an Alzheimer’s patient, which indicates our data is top notch!

Before sharing what we’ve learned from the data, we’ll introduce a bit of terminology for ease of explanation. An email “set” is one or more emails sent in the same thread without an intervening response from the recipient. For instance, the previous example would be a “multi-email set”, as it contains more than one email in sequence.
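To make the definition concrete, here is a minimal Python sketch of how a thread could be split into email sets. It is an illustration only, not the code behind the analysis: the `Email` class, its field names and the `group_into_sets` function are all assumptions made for this example.

```python
from dataclasses import dataclass
from datetime import datetime


@dataclass
class Email:
    # Hypothetical field names, chosen for illustration only.
    sender: str
    sent_at: datetime
    body: str


def group_into_sets(thread, patient):
    """Split one thread into the patient's email sets.

    A set is a run of consecutive emails from `patient`; any email
    from someone else closes the current set.
    """
    sets, current = [], []
    for email in sorted(thread, key=lambda e: e.sent_at):
        if email.sender == patient:
            current.append(email)
        elif current:
            sets.append(current)
            current = []
    if current:
        sets.append(current)
    return sets
```

Under these assumptions, a thread in which the patient sends two emails, the granddaughter replies, and the patient sends one more would yield two sets: a multi-email set of two, followed by a single-email set.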

Now on to the reason you’ve read this far – our results.

The first and most obvious point of interest is the increase in repetition over time. As her disease progresses, she sends more emails in a row (i.e. email sets get larger).

Before the age of 80, multi-email sets were due to a) deliberately reformulating her response, and b) sending a series of recipes one after the other. After the age of 80, on the other hand, multi-email sets were entirely due to forgetfulness.

It’s easy to imagine that as more emails are sent in a row, the similarity between them increases. The Alzheimer’s patient seems to confirm this notion (Note: the lower the Fuzzy Match Score, the higher the similarity between a particular email and the previous email).

As the figure shows, if only one email is sent at a time, its similarity to the previous email is quite varied, suggesting that “email sent –> thought often resolved, and we move on” (see the left-most side of the figure). However, when more and more emails are sent in a row, the similarity between them increases, suggesting that “email sent –> thought often unresolved, and we do not move on” (see the right-most side of the figure).
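A score of this kind, where lower means more similar, can be sketched in a few lines of Python. The post does not say which fuzzy matching algorithm produced the Fuzzy Match Score, so the example below swaps in the standard library's `difflib.SequenceMatcher` as a stand-in. The function names and the 0 to 100 scale are assumptions.

```python
from difflib import SequenceMatcher


def fuzzy_distance(current, previous):
    """Return 0 for identical text and 100 for text with nothing in common.

    Assumption: a stand-in for the Fuzzy Match Score, built so that
    lower values mean higher similarity.
    """
    ratio = SequenceMatcher(None, current.lower(), previous.lower()).ratio()
    return 100.0 * (1.0 - ratio)


def scores_against_previous(bodies):
    """Score each email body against the one sent just before it."""
    return [fuzzy_distance(cur, prev) for prev, cur in zip(bodies, bodies[1:])]
```

With this stand-in, two near-identical thank-you notes would score close to 0, while an unrelated email would score much higher.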

We hear you pipe in, “But maybe the similarity is increased not only because the thought takes longer to resolve, but because emails are being fired off to the same recipient at a faster rate?” We hear you loud and clear, so we set out to answer, “What is the time difference between these forgetful emails (within multi-email sets), and how does this compare to the time difference between email sets?”

To answer the first question, the average time difference within a multi-email set was about 3 hours. This indicates that even over a period of hours, a patient may hold on to an unresolved thought. On the other hand, the time difference between email sets was about 20.5 days. A period of days is therefore enough time for a patient to shift gears.
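The two averages could be computed along the following lines. This is a sketch under stated assumptions rather than the original analysis: each set is represented as a time-ordered list of `datetime` values, and the gap between sets is taken from the last email of one set to the first email of the next, which the post does not specify.

```python
from datetime import datetime
from statistics import mean


def within_set_gaps(sets):
    """Gaps between consecutive emails inside each set."""
    return [b - a for s in sets for a, b in zip(s, s[1:])]


def between_set_gaps(sets):
    """Gaps from the last email of one set to the first of the next.

    Assumption: the post does not say how the gap between sets is measured.
    """
    return [nxt[0] - prev[-1] for prev, nxt in zip(sets, sets[1:])]


def mean_hours(gaps):
    """Average a non-empty list of timedeltas, expressed in hours."""
    return mean(g.total_seconds() for g in gaps) / 3600.0
```

Note that `mean_hours` raises `statistics.StatisticsError` on an empty list, which happens for `within_set_gaps` when no set contains more than one email.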

Our analysis of time differences doesn’t end there — we found another interesting trend: an increase in time between email sets as the patient ages.

This could be explained by a number of factors, including (a) she takes fewer opportunities to email as she ages, or (b) people email her less frequently as she ages. Considering the exponential increase in received emails for most of us in the 21st century, we can probably rule out (b). Why, then, is she spending less time sending emails? Does she (a) forget emails exist? (b) find them irritating? (c) have less energy for technology in general? Perhaps something to be explored in future studies (sorry to keep you in suspense!).

OK, so she responds less and less as the disease progresses, but do her emails also get shorter?

The data shown is broken up by location of the recipient, as email lengths change drastically when the recipient and sender are in different locations. With this bias accounted for, it seems that the sender’s emails do shorten over time, no matter where the recipient may be (at home in Vancouver, or abroad). What about sentence lengths: do they shorten over time as well?

They do.
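Both length measures are simple to compute. The sketch below shows one plausible way to do it. The tokenization rules (a regular expression for words, splitting sentences on `.`, `!` and `?`) are assumptions, since the post does not describe how words and sentences were counted.

```python
import re

# Assumption: a "word" is a run of letters, digits or apostrophes.
WORD = re.compile(r"[A-Za-z0-9'’]+")


def word_count(text):
    """Number of words in an email body."""
    return len(WORD.findall(text))


def mean_sentence_length(text):
    """Average words per sentence, splitting naively on ., ! and ?."""
    sentences = [s for s in re.split(r"[.!?]+", text) if word_count(s) > 0]
    if not sentences:
        return 0.0
    return sum(word_count(s) for s in sentences) / len(sentences)
```

A naive splitter like this treats abbreviations such as "i.e." as sentence breaks, so a real analysis would likely want a proper sentence tokenizer.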

So let’s recap: a decrease in (1) email events, (2) email word counts, and (3) sentence lengths. With all of this deterioration, one might expect the quality of the writing to deteriorate as well. Surprisingly, this does not seem to be the case. Here we have the relative abundance of unique words in each email over time:

And here we have various readability scores over time:

It appears that the written quality of these emails remains the same over time, perhaps indicating that although her memory is deteriorating, her ability to communicate in a structured and meaningful manner remains steady during early progression.
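One common way to measure the relative abundance of unique words is the ratio of distinct words to total words in each email. The sketch below assumes that reading. The post does not give the exact formula, and the function name and tokenization are illustrative.

```python
import re


def unique_word_ratio(text):
    """Distinct words divided by total words, ignoring case.

    Assumption: one plausible reading of "relative abundance of
    unique words"; returns 0.0 for an email with no words.
    """
    words = re.findall(r"[a-z0-9'’]+", text.lower())
    if not words:
        return 0.0
    return len(set(words)) / len(words)
```

One caveat with this measure is that short texts tend to score higher than long ones, so it is worth comparing emails of similar length when email length itself is changing over time.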

Our team of data scientists is nowhere near finished with this exploration of Alzheimer’s patient email data. With more emails from more patients, we can form a more complete picture of the disease, and advance scientific discovery in this field. If you have email data to contribute, please do so!

Thanks for your interest in our work. Any thoughts or suggestions are greatly appreciated and can be emailed to mas29@sfu.ca.

Stay tuned!

– Dave Coulthard, Frank Kelly and Maia Smith