Head over Heels: detecting Parkinson’s disease from accelerometer data

By Agata Budzillo, an alumna from the January 2016 session of Health Data Science who is now working at The Allen Institute for Brain Science in Seattle. In this post, she describes a model she developed to detect Parkinson’s disease from accelerometer data. This content originally appeared on Agata’s blog.

Parkinson’s disease (PD) is a chronic, neurodegenerative disorder that typically affects adults over the age of 60, and poses a significant burden to patients, their families and caretakers, and the healthcare system at large. PD is a disorder of brain structures essential for habitual control of action. It is characterized by the progressive loss of brain cells, or neurons, that produce a signaling molecule called dopamine. Remarkably, the nervous system is able to compensate for the loss of these neurons until only approximately 20% of them remain. Unfortunately, this means that the disease is often not diagnosed until late. The progressive loss of dopamine ultimately results in a complex spectrum of deficits, including muscle rigidity and jerkiness, impaired balance, difficulty initiating movements, involuntary muscle contractions (tremor), cognitive decline and emotional changes. These symptoms can significantly interfere with day-to-day activities.

Unfortunately there is no cure for PD.

Why? PD progression is complex at many levels: patients demonstrate heterogeneity in disease symptoms, progression over time, and their response to therapy. All of these sources of variability make it difficult to predict patient prognosis. In addition, current practice only captures a snapshot of behavioral data when patients visit the clinic, which impedes continuous monitoring of the disease.

Sage Bionetworks has embarked on a unique app-based study powered by Apple ResearchKit to monitor the performance of individuals diagnosed with Parkinson’s disease. The mPower app collects demographic and sensor-based data from enrolled PD and healthy individuals on a set of tasks designed to measure neuronal function including dexterity (hand tasks), gait (walk patterns), speech and spatial memory.

While at Insight, I collaborated with Sage Bionetworks to analyze phone accelerometer data collected during the gait task. I constructed an algorithm that, to some extent, is able to distinguish healthy individuals from those with PD based on the gait task they performed, as recorded by their smartphones.

What does accelerometer data look like? How do we analyze it?

For the gait task, the participants place their phone in their pocket, and walk up to 20 steps in a given direction.

Each resulting recording contained information including the time of measurement, tri-axial (X, Y, and Z direction) user acceleration and angular information. Each file was also associated with an anonymized code that links to a demographics table, so that the disease diagnosis (a boolean value, i.e. Yes/No) could be matched with the recording.

Gait task on the mPower app

mPower user distribution across Apple products

In order to limit variability among different sensors, I chose to focus on recordings collected on the iPhone 6 and iPhone 6 Plus, which included 65% of users.

Below are two examples of raw accelerometer recordings from non-PD (healthy) and PD individuals. The signal is highly periodic, with the vertical spikes corresponding to steps or “heel strikes”.

Example of “clean” accelerometer recordings

It’s difficult to see clear differences in these recordings by eye. Moreover, these are unusually clean examples. Many of the recordings from this dataset had less clear structure, like the example below.

Example of a more common yet messy accelerometer recording

Because of the nature of this raw data, feature engineering was the main component and challenge of this project. The first problem that I ran into was that the tri-axial recordings are in reference to the phone’s coordinate space. It’s unlikely that every individual in the study had positioned their phone in the same orientation in their pocket during the experiment. Therefore, it would not be valid to directly compare acceleration in a given axis across multiple recordings. To address this issue, I used the angular information contained in the phone record (which tells you about the 3D rotation of the phone in space) to transform and rotate the acceleration measurement in each axis at each point in time. This is known as a quaternion rotation.

Now the acceleration recordings are in world space, where X is horizontal (as in swaying your hips side to side), Y is vertical (as in bouncing up and down), and Z is depth (as in swinging your leg from front to back).
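The rotation into world space described above can be sketched as follows. This is a minimal illustration, not the code used in the project: the function name `rotate_to_world`, the (w, x, y, z) component order, and the convention that each attitude quaternion maps phone coordinates to world coordinates are all assumptions. If the recorded quaternion describes the opposite rotation, it would need to be conjugated first.

```python
import numpy as np


def rotate_to_world(accel, quats):
    """Rotate each acceleration sample by the matching attitude quaternion.

    accel: array of shape (N, 3), acceleration in the phone's frame.
    quats: array of shape (N, 4), quaternions as (w, x, y, z).
           Assumed (hypothetically) to map phone frame -> world frame.
    Returns an (N, 3) array of acceleration in the world frame.
    """
    accel = np.asarray(accel, dtype=float)
    quats = np.asarray(quats, dtype=float)

    # Normalize so that each quaternion represents a pure rotation.
    quats = quats / np.linalg.norm(quats, axis=-1, keepdims=True)

    w = quats[..., :1]   # scalar part, shape (N, 1)
    u = quats[..., 1:]   # vector part, shape (N, 3)

    # Expanded form of q * v * q^-1 for a unit quaternion:
    #   v' = v + 2w (u x v) + 2 u x (u x v)
    t = 2.0 * np.cross(u, accel)
    return accel + w * t + np.cross(u, t)


# Usage: a 90-degree rotation about the Z axis applied to a unit X vector.
half_angle = np.pi / 4
q = [[np.cos(half_angle), 0.0, 0.0, np.sin(half_angle)]]
print(rotate_to_world([[1.0, 0.0, 0.0]], q))
```

Applying the rotation sample by sample matters here because the phone can shift in the pocket during the walk, so the orientation is not constant across a recording.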

Engineering Features from Accelerometer Data

What kinds of statistics can we use to describe the accelerometer data? And what kinds of features in this data set might be particularly interesting to investigate given what we know about PD? One could imagine that PD individuals have more difficulty with walking due to impaired balance and initiation of movement. This kind of effect could possibly be detected in the duration of recording, time elapsed before the first step, the total number of steps taken, or perhaps the rate of walking. PD is also characterized by 4‑6 Hz involuntary muscle contractions. While these are more prevalent at rest, for some individuals, tremor persists during movement. Finally, PD affects muscle rigidity, and research demonstrates changes in the stereotypy/variability of motor control.
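As one illustration of how the 4-6 Hz tremor band mentioned above might be turned into a number, the sketch below estimates the fraction of signal power that falls in that band using Welch's method. This is not described in the post as the author's approach; the function name `band_power_fraction`, the 100 Hz sampling rate, and the synthetic signals are all assumptions made for the example.

```python
import numpy as np
from scipy.signal import welch


def band_power_fraction(signal, fs, low_hz=4.0, high_hz=6.0):
    """Fraction of total spectral power between low_hz and high_hz.

    signal: 1-D acceleration trace (one axis).
    fs: sampling rate in Hz (assumed value; check the real recordings).
    """
    signal = np.asarray(signal, dtype=float)
    # Use segments of up to 4 seconds for roughly 0.25 Hz resolution.
    nperseg = min(len(signal), int(4 * fs))
    freqs, psd = welch(signal, fs=fs, nperseg=nperseg)

    in_band = (freqs >= low_hz) & (freqs <= high_hz)
    total = psd.sum()
    return float(psd[in_band].sum() / total) if total > 0 else 0.0


# Usage with synthetic data: a 1.5 Hz "walking" rhythm plus a weaker
# 5 Hz component standing in for tremor.
fs = 100.0
t = np.arange(2000) / fs
walking = np.sin(2 * np.pi * 1.5 * t)
tremor = 0.5 * np.sin(2 * np.pi * 5.0 * t)
print(band_power_fraction(walking, fs))
print(band_power_fraction(walking + tremor, fs))
```

A relative measure like this is less sensitive to overall signal amplitude than raw band power, which may help when comparing recordings made under different carrying conditions.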

I calculated some global features, such as the duration of the recording, as well as its total power, a common measurement for time series data that reflects the amount of energy in the signal per unit time. I also wanted to describe the timing of steps taken during the recording in a quantitative fashion. First, I applied a Savitzky-Golay filter to smooth the signal. This is effectively a low-pass filter that does a nice job of preserving structure present in the original data by fitting polynomial functions across windows of time throughout the signal. I then found peaks in the signal by searching for local maxima, subject to a minimum peak size and a minimum interval between peaks. The trace below is a representative example of a raw acceleration signal in blue, overlaid with detected peaks in red.
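The smoothing and peak-finding steps described above can be sketched with SciPy's `savgol_filter` and `find_peaks`. This is a minimal illustration under stated assumptions, not the original implementation: the function name `step_features`, the window length, polynomial order, height and spacing thresholds, the 100 Hz sampling rate, and the use of mean squared amplitude as "total power" are all choices made for the example.

```python
import numpy as np
from scipy.signal import savgol_filter, find_peaks


def step_features(vertical_accel, fs, window_s=0.31, polyorder=3,
                  min_height=0.5, min_interval_s=0.3):
    """Smooth a vertical acceleration trace and summarize detected steps.

    All default parameter values are illustrative assumptions.
    """
    x = np.asarray(vertical_accel, dtype=float)

    # Savitzky-Golay smoothing: fit a low-order polynomial in a sliding
    # window. The bitwise OR forces an odd window length.
    window = int(round(window_s * fs)) | 1
    smoothed = savgol_filter(x, window_length=window, polyorder=polyorder)

    # Local maxima that are tall enough and far enough apart.
    peaks, _ = find_peaks(smoothed, height=min_height,
                          distance=max(1, int(min_interval_s * fs)))

    step_times = peaks / fs
    intervals = np.diff(step_times)
    nan = float("nan")
    return {
        "duration_s": len(x) / fs,
        # Mean squared amplitude, used here as a simple "total power".
        "total_power": float(np.mean(x ** 2)),
        "n_steps": int(len(peaks)),
        "time_to_first_step_s": float(step_times[0]) if len(peaks) else nan,
        "mean_step_interval_s": float(intervals.mean()) if len(intervals) else nan,
        "step_times_s": step_times,
    }


# Usage on a synthetic trace: a 2 Hz rhythm with a little noise.
fs = 100.0
t = np.arange(1000) / fs
rng = np.random.default_rng(0)
trace = np.sin(2 * np.pi * 2.0 * t) + 0.05 * rng.standard_normal(t.size)
features = step_features(trace, fs)
print(features["n_steps"], features["mean_step_interval_s"])
```

The minimum-interval constraint is what keeps a single heel strike with a jagged top from being counted as several steps, so it is worth tuning against plausible walking cadences.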