From Fellrnr.com, Running tips

This graph looks at the best performance from runners who had both a positive and negative split result (12,425 finishes, which is 1.4% of the finishes and 1.6% of the runners). Of those runners, 52% did better with a negative split and 48% with a positive split, but notice the peak for the slightly negative splits.

The results of 26 marathons covering 876,703 results for 754,851 runners were analyzed to look at the differences between positive splits (running the second half slower) and negative splits (running the second half faster). Overall, 13% of finishes were negative splits, and the mean split was positive 8.25%. Faster runners have a narrower distribution of splits and a mean closer to even. Looking at the 72,259 runners who have more than one finish in the same city (192,585 finishes), only 10% of the best performances are with a negative split, though a higher proportion of the 2:00 to 2:30 and 4:00 to 5:00 finishes were negative. The subset of runners who have both a positive and negative split shows that 52% had their best performance with a negative split, and the most common split for a best performance is negative 0-1%. Not surprisingly, the biggest change in performance relates to extreme positive splits being much slower than more even splits, probably due to going out too fast. Overall, the data suggests that it's beneficial to have a split time that is close to even, with a slightly negative split possibly being optimal.

1 Method

The New York City and Chicago marathons use electronic timing that records the start, finish and halfway times for runners. These races have little elevation change; NYC has around 900 feet (300m) of ascent and descent, and Chicago is effectively flat. Data is publicly available for the New York marathon from 2000 to 2011 and the Chicago marathon from 1998 to 2012. (The results of the 2007 Chicago marathon were excluded because that year the race was unusually hot, and temperature has a big impact on marathon performance.) The results were grouped within race location using name and age so that multiple finishes by the same individual could be compared. This gives data for 26 races, covering 876,703 results for 754,851 runners.

2 Runners with multiple entries

We have 72,259 runners with multiple entries, with the distribution shown below.

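| Count of finishes | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9 | 10 | 11 | 12 |
|---|---|---|---|---|---|---|---|---|---|---|---|
| Number of runners finishing | 48,548 | 13,253 | 5,405 | 1,963 | 1,076 | 609 | 394 | 336 | 240 | 220 | 215 |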

3 Distribution of Splits

This section looks at the distribution of splits. Each finish has its split percentage calculated from the overall finish time and the halfway time. The distribution is analyzed for all runners, then for different groups of runners based on their finish time.
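The exact formula for the split percentage is not stated here, so the following is only a minimal sketch under one plausible assumption: the split is the difference between the second-half and first-half times, expressed as a percentage of the first-half time. The function name `split_percentage` and its parameters are hypothetical, chosen for illustration.

```python
def split_percentage(finish_seconds, halfway_seconds):
    """Return the split as a percentage of the first-half time.

    ASSUMPTION: the article does not give its formula. Here a positive value
    means the second half was slower (positive split) and a negative value
    means the second half was faster (negative split).
    """
    if halfway_seconds <= 0 or finish_seconds <= halfway_seconds:
        raise ValueError("need 0 < halfway time < finish time")
    second_half = finish_seconds - halfway_seconds
    return (second_half - halfway_seconds) / halfway_seconds * 100.0


# A 4:00:00 finish with a 1:55:00 first half: the second half took 2:05:00.
example = split_percentage(4 * 3600, 1 * 3600 + 55 * 60)
print(f"{example:.2f}%")
```

Under this assumed definition, a perfectly even race gives 0%, and the sign matches the convention used throughout this page.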

3.1 Distribution of All Splits

This section looks at the distribution of splits for all finishes. Overall, 87% of finishes were positive splits and 13% were negative splits. This ratio is similar when the splits are divided by finish time, with the exception of those finishing in over five hours, where the percentage of negative splits is somewhat lower. However, the distribution of splits varies with finish time, with the fastest runners having a quite narrow distribution, and the distribution widening with each successively slower group. The peak of the curve is also at a more positive split value for slower groups. This suggests that faster runners tend to run a more even pace than slower runners.

The distribution of all splits for all runners.

The distribution of all splits for runners finishing in under 2:30.

The distribution of all splits for runners finishing between 2:30 and 3:00.

The distribution of all splits for runners finishing between 3:00 and 4:00.

The distribution of all splits for runners finishing between 4:00 and 5:00.

The distribution of all splits for runners finishing in over 5:00.

Below is a table of statistics for the breakdown of splits, which gives another view of the data. Again you can see that the faster runners have a narrower spread of splits (standard deviation) and their average splits are closer to even.

| Finish Time | Negative Split | Positive Split | Count | Mean (%) | Interquartile Mean (%) | Median (%) | Standard Deviation (%) |
|---|---|---|---|---|---|---|---|
| 2:00 to 2:30 | 10% | 89% | 1,544 | 2.65 | 2.13 | 2.07 | 2.75 |
| 2:30 to 3:00 | 11% | 89% | 16,528 | 3.35 | 2.83 | 2.74 | 3.43 |
| 3:00 to 4:00 | 15% | 85% | 225,004 | 4.53 | 3.89 | 3.78 | 5.32 |
| 4:00 to 5:00 | 16% | 84% | 368,165 | 7.17 | 6.13 | 5.87 | 7.94 |
| over 5:00 | 7% | 93% | 263,936 | 13.25 | 11.66 | 11.04 | 11.56 |
| total | 13% | 87% | 875,177 | 8.25 | 6.49 | 6.13 | 9.3 |
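The interquartile mean is the least familiar column in this table, so here is a rough sketch of how the four summary statistics could be computed for a list of split percentages. It is an illustration only: the article does not say how its interquartile mean handles group sizes that are not divisible by four, or whether its standard deviation is the sample or population form. This sketch simply drops the lowest and highest quarter of values (rounded down) and uses the population standard deviation. The names `interquartile_mean` and `summarize` are hypothetical.

```python
import statistics


def interquartile_mean(values):
    """Mean of the middle half of the values.

    ASSUMPTION: the lowest and highest n // 4 values are discarded;
    other conventions weight the boundary values fractionally.
    """
    ordered = sorted(values)
    if not ordered:
        raise ValueError("need at least one value")
    cut = len(ordered) // 4
    return statistics.fmean(ordered[cut:len(ordered) - cut])


def summarize(split_pcts):
    """Return the four statistics used in the table for one group."""
    return {
        "mean": statistics.fmean(split_pcts),
        "interquartile_mean": interquartile_mean(split_pcts),
        "median": statistics.median(split_pcts),
        # ASSUMPTION: population standard deviation.
        "stdev": statistics.pstdev(split_pcts),
    }


# Made-up split percentages with one extreme positive split.
print(summarize([1, 2, 3, 4, 5, 6, 7, 100]))
```

The made-up example shows why the interquartile mean and median sit below the mean in every row: a few very large positive splits pull the plain mean upward.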

Below is an alternative breakdown of splits into positive, even and negative splits by finish time. For this table a split is considered even if it is within +/-0.5%.

| Finish Time | Negative Split (<-0.5%) | Even Split (-0.5% to 0.5%) | Positive Split (>0.5%) |
|---|---|---|---|
| 2:00 to 2:30 | 6% | 14% | 80% |
| 2:30 to 3:00 | 7% | 10% | 83% |
| 3:00 to 4:00 | 12% | 7% | 81% |
| 4:00 to 5:00 | 13% | 5% | 82% |
| over 5:00 | 6% | 2% | 92% |
| total | 11% | 5% | 85% |
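The three-way breakdown can be sketched as a small classifier over split percentages. The +/-0.5% band comes from the text above; the function names and the choice to treat exactly +/-0.5% as even are assumptions made for illustration.

```python
from collections import Counter


def classify_split(split_pct, even_band=0.5):
    """Label a split percentage as negative, even or positive.

    ASSUMPTION: values exactly on the band edge count as even.
    """
    if split_pct < -even_band:
        return "negative"
    if split_pct > even_band:
        return "positive"
    return "even"


def split_shares(split_pcts):
    """Return the percentage of finishes in each category."""
    if not split_pcts:
        raise ValueError("need at least one split")
    counts = Counter(classify_split(p) for p in split_pcts)
    total = len(split_pcts)
    return {label: 100.0 * counts[label] / total
            for label in ("negative", "even", "positive")}


# Four made-up finishes: one negative, one even, two positive.
print(split_shares([-2.0, 0.1, 3.0, 8.0]))
```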

3.2 Distribution of Best Splits

This section looks at the finishes of runners with more than one finish in a city. This allows us to look at which split percentage resulted in their best time. This is a much smaller subset of the results, including ~22% of the overall results (192,585 of 876,703). For this subset of finishes, the elite runners (sub 2:30) have a higher percentage of negative splits, but runners finishing between 2:30 and 5:00 have a lower percentage of negative splits when compared with all finishes shown above. The most significant difference, however, is that the best splits have a much narrower spread; that is, the best splits are generally closer to even, with fewer finishes in either the extreme negative or positive range.

The distribution of the best splits for all runners.

The distribution of splits for runners finishing in under 2:30.

The distribution of splits for runners finishing between 2:30 and 3:00.

The distribution of splits for runners finishing between 3:00 and 4:00.

The distribution of splits for runners finishing between 4:00 and 5:00.

The distribution of splits for runners finishing in over 5:00.

Here is the data in tabular form, showing a lower standard deviation than for the overall splits. While the splits are closer to even, the pattern is the same as in the overall results: the mean becomes more positive and the standard deviation increases as the runners become slower.

| Finish Time | Negative Split | Positive Split | Count | Mean (%) | Interquartile Mean (%) | Median (%) | Standard Deviation (%) |
|---|---|---|---|---|---|---|---|
| 2:00 to 2:30 | 18% | 82% | 130 | 1.67 | 1.48 | 1.42 | 1.94 |
| 2:30 to 3:00 | 8% | 92% | 1,255 | 2.74 | 2.46 | 2.42 | 2.33 |
| 3:00 to 4:00 | 9% | 91% | 17,440 | 4.19 | 3.76 | 3.7 | 3.74 |
| 4:00 to 5:00 | 12% | 88% | 30,889 | 6.21 | 5.36 | 5.17 | 6.04 |
| over 5:00 | 9% | 91% | 22,545 | 10.16 | 8.65 | 8.27 | 8.99 |
| total | 10% | 90% | 72,259 | 6.89 | 5.49 | 5.23 | 7.07 |

Below is the alternative breakdown of splits into positive, even and negative splits by finish time.

| Finish Time | Negative Split (<-0.5%) | Even Split (-0.5% to 0.5%) | Positive Split (>0.5%) |
|---|---|---|---|
| 2:00 to 2:30 | 15% | 16% | 69% |
| 2:30 to 3:00 | 4% | 10% | 87% |
| 3:00 to 4:00 | 6% | 6% | 88% |
| 4:00 to 5:00 | 9% | 5% | 86% |
| over 5:00 | 7% | 3% | 90% |
| total | 7% | 5% | 88% |

3.3 Distribution of Best Performance for Runners with Both Positive and Negative Split

This section looks at the best performance from runners who had both a positive and negative split result. This is a much smaller subset of the results, including just 12,425 finishes (1.4% of the finishes, 1.6% of the runners). Of those runners, 52% did better with a negative split and 48% with a positive split. If we look at the distribution by finish time, we see that, not surprisingly, the faster runners run close to even splits, with just a couple of percent variation. As the finish times get longer, the distribution of the split percentage flattens out.

Distribution of the splits for the best finish of runners who had both positive and negative splits.

The distribution of splits for runners finishing in under 2:30. The slight rise on the positive slope is an artifact of the low number of samples as few elites have run both a positive and negative split.

The distribution of splits for runners finishing between 2:30 and 3:00.

The distribution of splits for runners finishing between 3:00 and 4:00.

The distribution of splits for runners finishing between 4:00 and 5:00.

The distribution of splits for runners finishing in over 5:00.

Here's a tabular breakdown of the same data.

| Finish Time | Negative Split | Positive Split | Count | Mean (%) | Interquartile Mean (%) | Median (%) | Standard Deviation (%) |
|---|---|---|---|---|---|---|---|
| 2:00 to 2:30 | 69% | 31% | 35 | 0.02 | -0.29 | -0.36 | 1.62 |
| 2:30 to 3:00 | 49% | 51% | 190 | 0.69 | 0.32 | 0.01 | 1.8 |
| 3:00 to 4:00 | 44% | 56% | 3,182 | 1.71 | 1.03 | 0.74 | 3.65 |
| 4:00 to 5:00 | 53% | 47% | 6,135 | 1.64 | 0.48 | -0.12 | 5.11 |
| over 5:00 | 61% | 39% | 2,883 | 1.48 | -0.13 | -0.48 | 6.75 |
| total | 52% | 48% | 12,425 | 1.6 | 0.49 | -0.09 | 5.19 |

Below is the alternative breakdown of splits into positive, even and negative splits by finish time.

| Finish Time | Negative Split (<-0.5%) | Even Split (-0.5% to 0.5%) | Positive Split (>0.5%) |
|---|---|---|---|
| 2:00 to 2:30 | 37% | 34% | 29% |
| 2:30 to 3:00 | 21% | 36% | 43% |
| 3:00 to 4:00 | 30% | 19% | 52% |
| 4:00 to 5:00 | 41% | 15% | 44% |
| over 5:00 | 50% | 13% | 37% |
| total | 40% | 16% | 44% |

4 Performance Changes against Splits Percentage

Next we look at how the magnitude of the performance change is related to the split percentage. The charts below plot all results by the split percentage against the relative performance of the result. The relative performance is based on a comparison with the average for that runner in that city's race. The color on the chart represents the density of results, as do the contour lines.
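As a rough illustration of relative performance, the sketch below compares each of a runner's finish times in one city with that runner's mean finish time there. The exact formula is not given in the text, so expressing the difference as a percentage of the mean, with negative values meaning faster than average, is an assumption, as is the function name.

```python
from statistics import fmean


def relative_performance(finish_seconds):
    """Percentage difference of each finish from the runner's own mean.

    ASSUMPTION: negative means faster than the runner's average,
    positive means slower. Input is one runner's finishes in one city.
    """
    if not finish_seconds:
        raise ValueError("need at least one finish")
    mean_time = fmean(finish_seconds)
    return [(t - mean_time) / mean_time * 100.0 for t in finish_seconds]


# Two made-up finishes around a 10,000 second average.
print(relative_performance([11000, 9000]))
```

A runner with a single finish always scores 0 under this definition, which is why only runners with more than one finish in a city are informative here.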

4.1 All finishes

These graphs show all finishes, and as you would expect given the overall distribution of splits, there is a general bias towards positive splits. Further details are given under each image, but there are some general trends:

The negative split finishes have a similar distribution between being faster and slower than the runner's average time. The exception is runners finishing in over 5 hours, where the negative split appears to have slightly more benefit.

There is a strong bias towards the more positive splits being slower.

Faster runners tend to be more even in their splits.

The density of finishes by split percentage and relative performance change for all runners. Notice that the center of the hotspot (A) is just below and to the right of the mid lines, indicating that for most people their best time comes with a very slightly positive split. Also notice how the contour lines extend up and to the right (B), suggesting that more significant levels of positive split are slower for many people.

The density of finishes by split percentage and relative performance change for runners finishing in under 2:30. There are relatively few runners under 2:30, so it is hard to draw conclusions. Notice how tightly clustered the results are for these elite runners, indicating how close to even splits they usually run. The negative (left) side of the plot is quite symmetrical, suggesting that a negative split is as likely to be slower as it is faster. There are just a few data points that extend up and to the right, which suggest the more positive splits are slower, and with these elite athletes even a few percent more positive may be detrimental. Note that it is not uncommon for an elite runner to drop from the race rather than finish slower.

The density of finishes by split percentage and relative performance change for runners finishing between 2:30 and 3:00. Compared with the sub 2:30 runners, these runners have more points that extend up and to the right, suggesting the more positive splits are slower.

The density of finishes by split percentage and relative performance change for runners finishing between 3:00 and 4:00. Here you can see the plot become asymmetric, with an increase in points that are more positive and slower (A) counterbalanced by points that are slightly positive splits and faster (B).

The density of finishes by split percentage and relative performance change for runners finishing between 4:00 and 5:00. The pattern shown in the 3 to 4 hour range is repeated here, but rather more spread out.

The density of finishes by split percentage and relative performance change for runners finishing in over 5:00. For this group there is a slight indication that a negative split is more likely to be faster than slower. This group also has more extreme variations in performance and split percentage.

4.2 Comparing the best positive and negative finish

These graphs look at runners with both a positive and a negative split finish. If a runner has multiple positive or negative split finishes, then the best performance in each category is used. The runner's average performance from all finishes is used to determine which time group their results are placed in. By their nature, these graphs have an equal number of positive and negative split finishes.
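The selection described above can be sketched as follows. The data layout (a list of `(split_pct, finish_seconds)` tuples per runner) and the function name are hypothetical, and treating a split of exactly 0% as neither positive nor negative is an assumption, since the text does not say how such finishes are handled.

```python
def best_positive_and_negative(finishes):
    """Pick a runner's fastest negative-split and fastest positive-split finish.

    finishes: list of (split_pct, finish_seconds) for one runner in one city.
    Returns (best_negative, best_positive), or None if the runner lacks
    either kind of finish and so is excluded from these graphs.
    ASSUMPTION: a split of exactly 0% is ignored.
    """
    negative = [f for f in finishes if f[0] < 0]
    positive = [f for f in finishes if f[0] > 0]
    if not negative or not positive:
        return None
    fastest = lambda f: f[1]  # best performance = lowest finish time
    return min(negative, key=fastest), min(positive, key=fastest)


# A made-up runner with two negative and two positive split finishes.
runner = [(-1.0, 12000), (-3.0, 12500), (4.0, 11800), (9.0, 13000)]
print(best_positive_and_negative(runner))
```

Because each qualifying runner contributes exactly one finish of each kind, the resulting data set has equal numbers of positive and negative split finishes, as noted above.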

The density of finishes by split percentage and relative performance change for the runners' best positive and negative split finishes. Here we see some interesting asymmetry between the negative and the positive splits. The negative splits are generally smaller in magnitude (closer to even splits), but also have a greater magnitude of performance change.

Runners finishing in under 2:30. There is too little data here to be significant, and this image is shown for completeness only.

Runners finishing between 2:30 and 3:00. This grouping is fairly symmetric, with a very slight elongation towards negative/slower and positive/faster, suggesting a small bias towards a positive split finish being better. This is reflected in the percentages for this group, whose best runs were 21% negative, 36% even and 43% positive. The greatest performance changes are from ~-1% to +3%.

Runners finishing between 3:00 and 4:00. Again the negative splits produce the greatest variation in performance, with a small negative split (-1% to -2%) producing performance changes in the range +9% to -9%. This group did their best runs on 30% negative, 19% even, 52% positive. The greatest performance changes are from ~-1% to +5%.

Runners finishing between 4:00 and 5:00. This group of runners mirrors the overall population, with their best runs on 41% negative, 15% even, 44% positive. The greatest performance changes are from ~-3% to -1%.

Runners finishing in over 5:00. This group exhibits a bias towards better performance with negative splits, with a performance change of more than 5% being highly skewed towards negative splits. This group did their best runs on 50% negative, 13% even, 37% positive.

5 Average performance change between splits

These graphs show the average change in performance for the combinations of splits. For example, if a runner has two finishes, a 5% positive split and a 2% negative split and the negative split is 10% faster, then an entry is added to the bin for -2% (x axis), 5% (y axis) and a -10% value. Each combination of positive and negative split is then averaged. The graph has the average performance change as the color and the size indicates the number of entries. (Note average performance changes greater than 20% are capped and shown red.)
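The binning described above might look something like the sketch below. Several details are assumptions rather than the method actually used: the bin width of 1%, placing the finish with the lower split percentage on the x axis, and measuring the change of the x-axis finish relative to the y-axis finish. All names are hypothetical.

```python
from collections import defaultdict
from itertools import combinations
import math


def average_change_by_split_pair(runners, bin_width=1.0):
    """Average performance change for each (x split bin, y split bin).

    runners: iterable of per-runner lists of (split_pct, finish_seconds).
    Returns {(x_bin, y_bin): (average_change_pct, number_of_entries)}.
    ASSUMPTIONS: the lower split goes on the x axis, and the change is the
    x finish time relative to the y finish time (negative = x was faster).
    """
    totals = defaultdict(lambda: [0.0, 0])
    for finishes in runners:
        for a, b in combinations(finishes, 2):
            low, high = sorted((a, b))  # order the pair by split percentage
            change = (low[1] - high[1]) / high[1] * 100.0
            key = (math.floor(low[0] / bin_width) * bin_width,
                   math.floor(high[0] / bin_width) * bin_width)
            totals[key][0] += change
            totals[key][1] += 1
    return {key: (total / count, count)
            for key, (total, count) in totals.items()}


# The example from the text: a 5% positive split and a 2% negative split,
# where the negative split finish is 10% faster.
print(average_change_by_split_pair([[(5.0, 10000), (-2.0, 9000)]]))
```

In the chart, the averaged value would set the color of each bin and the entry count would set its size.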

5.1 All finishes

These charts map all combinations of finishes for all runners with more than one finish in a race.

Performance changes from all finishes. Here the greatest performance change is clearly associated with moving from a highly positive split to a less positive or slightly negative split. This area is shown in red and marked "A". The colored bands marked "B" suggest that moving to a less positive split has a performance change that is broadly proportional to the reduction in positive split. The area marked "C" shows that performance can be improved even with increases in the amount of positive split, though the performance changes are generally smaller.

The performance changes for runners finishing in under 2:30. There is little data for this grouping, but like the overall graph, it seems that reducing the magnitude of the positive split is related to the better changes in performance.

The performance changes for runners finishing between 2:30 and 3:00. Compared with the sub 2:30 group, there are greater changes in performance across all splits. The greatest performance change is between the slower 10 to 20% splits and the faster 0 to 10% splits.

The performance changes for runners finishing between 3:00 and 4:00. Compared with the faster groups, there is a much wider spread of split percentages and performance changes. The central areas have a similar performance change to the faster groups, but the more extreme splits have more extreme performance changes.

The performance changes for runners finishing between 4:00 and 5:00. This group has a larger spread of splits than the 3:00 to 4:00 group, but the overlapping areas have broadly similar changes in performance.

The performance changes for runners finishing in over 5:00. This group is broadly similar to the 4:00 to 5:00 group.

5.2 Comparing the best positive and negative finish

Like previous sections, these graphs look at runners' best positive and negative split finishes. Because only positive to negative splits are compared, only those two quadrants are populated.

Performance changes from all finishes. The graph is reasonably symmetrical, with the main difference between the two quadrants coming in the large performance changes between the slower highly positive splits (25%+) and the faster negative splits. While highly positive splits (20%+) can be faster than a negative split, this appears to be less common and the relative performance improvement significantly smaller.

The performance changes for runners finishing in under 2:30. There is too little data to be meaningful, and this graph is included for completeness only.

The performance changes for runners finishing between 2:30 and 3:00. There is too little data to be meaningful, and this graph is included for completeness only.

The performance changes for runners finishing between 3:00 and 4:00. This graph suggests that when a moderate positive split (0 to 10%) is faster than a negative split, the performance change is greater than when the equivalent negative split is faster. Negative splits appear to provide similar improvements only over positive splits in the range 10 to 20%.

The performance changes for runners finishing between 4:00 and 5:00. This group has a similar level of performance changes between the quadrants, suggesting no consistent benefit of positive over negative splits, or vice versa.

The performance changes for runners finishing in over 5:00. This graph shows a stronger bias towards the negative splits having a greater performance benefit compared with positive splits.

6 Variations by Age and Gender

The distribution of splits does not vary much by age or gender.

Distribution of splits by age.

Distribution of splits by gender.

7 Sources of Error

There are a number of possible errors within the data.

Not all results included the halfway split, probably due to an error with the electronic timing mechanism. These results are ignored.

Some runners may not be running to achieve their best time, such as runners acting as pacers.

Runners were identified by using their age on race day to provide a year of birth. However, because the date of the race varies slightly, people with their birthday around this time may not get their results grouped together.

The name provided on the race results for a given runner may vary depending on the use of nicknames or due to data entry errors.
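The two identification problems above can be seen in a small sketch of a runner key. This is a guess at the kind of key involved, not the actual grouping code: the name normalization (lowercasing and collapsing whitespace) and the names `runner_key` and `group_finishes` are assumptions for illustration.

```python
from collections import defaultdict


def runner_key(name, age_on_race_day, race_year):
    """Build a grouping key from a normalized name and estimated birth year.

    ASSUMPTION: birth year is estimated as race year minus age on race day.
    """
    normalized = " ".join(name.lower().split())
    return (normalized, race_year - age_on_race_day)


def group_finishes(results):
    """Group (city, name, age, year, finish_seconds) rows by city and runner."""
    groups = defaultdict(list)
    for city, name, age, year, finish_seconds in results:
        groups[(city, runner_key(name, age, year))].append(finish_seconds)
    return dict(groups)


# Same hypothetical person in consecutive years: the keys match.
print(runner_key("Jane  Doe", 34, 2009) == runner_key("jane doe", 35, 2010))
# If the earlier race falls just before their birthday, the recorded age is
# one lower, the estimated birth year shifts, and the finishes are not grouped.
print(runner_key("Jane Doe", 33, 2009) == runner_key("Jane Doe", 35, 2010))
```

A nickname ("Jan Doe") would break the match in the same way, which is the second source of error listed above.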

8 Conclusions

Obviously it is not possible to draw causal conclusions from this type of data.

Negative splits are relatively unusual, and typically a runner will only be negative by 0-3%.

Most runners have their best performance with a slight positive split.

Faster runners have more even splits, probably because they are better at pacing.

Slower runners have a broader spread of splits than faster runners, and their mean split percentage is more positive.

Not surprisingly, a large positive split reflects a massive slowdown in the second half of the race and is associated with the poorest performances.

Overall, among runners who have run both, a negative split is slightly more likely (52%) to be the best performance than a positive split. The share is higher at both ends of the field: the negative split accounts for 69% of best performances for the fastest runners (sub-2:30) and 61% for those finishing in over 5:00.

Of runners who tried both positive and negative splits, a slightly negative split is the most common best performance.

9 Appendix - Table of Split Times

This table shows sample split percentages for various finish times.