For our summer Strava Jam I used our new bulk access activity stream dataset to improve our current GAP (Grade Adjusted Pace) model. GAP is a running pace adjustment that corrects for the difficulty of running at different elevation gradients.

Our previous model was derived from a laboratory study [1] of 30 elite athletes which measured the metabolic energy cost of running at different inclines. As noted in our previous blog post on GAP, this model is known to mispredict running performance on downhill gradients. Crucially, the original paper acknowledges this error as well, noting that the model accurately predicted uphill race performance but was off on downhill race performance by as much as a factor of 3.

We think that a GAP model that accurately predicts real-world running performance is more helpful to athletes than one that predicts the metabolic energy cost of running on a treadmill. To test this hypothesis, we used a massive dataset of real-world running data to construct a new model based on equivalent heart rate: GAP should be the pace that a runner could achieve at the same heart rate while running on level ground. This model deviates from the goal of predicting energy expenditure, but we think it more accurately models the perceived effort of running. As of today, the new GAP model is enabled for all new runs on Strava.

Methodology

We started from a dataset of ~6 million runs from 240 thousand athletes, each with both significant elevation change and heart rate data, and collected all 60-second windows of stream data. Windows with large variances or unusual averages of heart rate, gradient, or speed were removed, so only a small fraction of windows were kept. The remaining windows exhibit smooth data in all measurements of concern, meaning that the runner was in a relatively steady state. We then calculated the mean gradient, speed, and efficiency (heart rate / speed) for each window. For each activity, window efficiencies were normalized by dividing by the median efficiency of that activity at a nearly flat gradient. This normalization reduces the variation in efficiency due to athlete fitness and other per-activity variables such as temperature.
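The windowing and normalization steps can be sketched roughly as follows. This is an illustration and not the production pipeline: the function names, the non-overlapping windows, the 1 Hz sampling, the steady-state threshold, and the 1% cutoff for "nearly flat" are all assumptions made for the example, and efficiency is taken as the ratio of the window means.

```python
from statistics import mean, median, pstdev

WINDOW = 60          # samples per window, assuming 1 Hz streams
FLAT_GRADE = 0.01    # assumed: |gradient| under 1% counts as "nearly flat"
MAX_HR_STDEV = 5.0   # assumed steady-state threshold, in beats per minute


def window_stats(heart_rate, speed, gradient):
    """Summarise one activity's streams as a list of steady-state windows."""
    windows = []
    for start in range(0, len(speed) - WINDOW + 1, WINDOW):
        hr = heart_rate[start:start + WINDOW]
        sp = speed[start:start + WINDOW]
        gr = gradient[start:start + WINDOW]
        # Drop windows that are not steady (only heart rate is checked here;
        # the text also filters on gradient and speed).
        if pstdev(hr) > MAX_HR_STDEV or min(sp) <= 0:
            continue
        windows.append({
            "gradient": mean(gr),
            "speed": mean(sp),
            "efficiency": mean(hr) / mean(sp),  # heart rate / speed
        })
    return windows


def normalize(windows):
    """Divide each efficiency by the activity's median near-flat efficiency."""
    flat = [w["efficiency"] for w in windows if abs(w["gradient"]) < FLAT_GRADE]
    if not flat:
        return []  # no flat reference in this activity, so nothing to compare
    baseline = median(flat)
    return [(w["gradient"], w["efficiency"] / baseline) for w in windows]
```

A normalized efficiency of 1.0 means the window cost the same heart rate per unit of speed as that athlete's flat running on that day; values above 1.0 mean the gradient made running more costly.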

A simple statistical model was fit to this data to establish a relationship between gradient and normalized running efficiency. The model uses a variable-width bucketing scheme that reflects the available quantity and variance of the data. Bounds of one standard deviation are also shown. The running model from Minetti 2002 [1] that Strava previously used is plotted for comparison.
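To make the bucketing idea concrete, here is a minimal sketch. The bucket edges are hypothetical and hand-picked (narrow near flat, wide at the extremes); the actual scheme was chosen from the quantity and variance of the data. The `adjusted_speed` helper is also an assumption: it follows from the definitions above, since if efficiency is heart rate / speed, then the flat-ground speed at the same heart rate is the observed speed multiplied by the normalized efficiency for that gradient.

```python
from bisect import bisect_right
from statistics import mean, pstdev

# Hypothetical bucket edges, gradient expressed as a fraction (0.05 = 5%).
EDGES = [-0.30, -0.15, -0.08, -0.04, -0.02, 0.02, 0.04, 0.08, 0.15, 0.30]


def bucket_index(gradient, edges=EDGES):
    """Index of the bucket containing gradient, or None if out of range."""
    if not edges[0] <= gradient < edges[-1]:
        return None
    return bisect_right(edges, gradient) - 1


def fit_buckets(samples, edges=EDGES):
    """samples: (gradient, normalized_efficiency) pairs.

    Returns {bucket_index: (mean, standard deviation)} for non-empty buckets.
    """
    grouped = {}
    for gradient, efficiency in samples:
        idx = bucket_index(gradient, edges)
        if idx is not None:
            grouped.setdefault(idx, []).append(efficiency)
    return {idx: (mean(v), pstdev(v)) for idx, v in grouped.items()}


def adjusted_speed(speed, gradient, model, edges=EDGES):
    """Equivalent flat-ground speed at the same heart rate.

    Falls back to the unadjusted speed when the bucket has no data
    (an assumption made for this sketch).
    """
    idx = bucket_index(gradient, edges)
    if idx is None or idx not in model:
        return speed
    factor, _stdev = model[idx]
    return speed * factor
```

For example, if windows at around a 5% gradient average a normalized efficiency of 1.5, a runner moving at 2.0 m/s up that hill would be credited with a grade-adjusted speed of about 3.0 m/s.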