Yesterday The Sunday Times (UK) ran an article on page 5 of the main section, written by Nicholas Hellen and Georgia Graham, headlined “40mph city cyclists defy speed limits” in the paper and “City cyclists turn roads into racetracks” on the website. It repeatedly refers to a Strava segment in London where the average speed of the fastest riders is 41 mph. Georgia contacted me last Thursday, and I spoke at length to Nicholas about Strava, how it works and, in particular, why you can’t trust the timings (and hence the speeds) of short segments. From the beginning of the conversation, though, it was very clear what their angle was going to be: they basically wanted to be able to quote me (or someone) saying that Strava encourages me to break the law in built-up areas by speeding (which, as I pointed out to him, isn’t actually breaking the law) and jumping red lights. It doesn’t, and I don’t. In this post I’ll do my best to explain why that 41 mph should really be more like 31 mph.

The key parts of the article (after the headline) are as follows:

Cyclists are racing around inner-city streets at speeds of up to 41mph

WHO does that cyclist hurtling past at 41mph think he is? Bradley Wiggins?

The record speed for one crowded section of London’s South Circular Road, which mostly has a 30mph speed limit, is 41mph. On a nearby road, nicknamed “Gunning it! On Armoury Way” by competitors, a rider has clocked a time of 33.3mph.

Two riders, identified as Tris M and George B, are recorded as averaging 41mph on a short section of the South Circular near Barnes. The only way of displacing them is by again breaking the speed limit.

The Sunday Times tested three routes in central London, each of them ridden more than 20,000 times by Strava users, to establish whether it was possible to match cyclists’ times without running red lights or breaching the Highway Code.

In each case, a motorbike, travelling at the 30mph speed limit, clocked slower times than those recorded by the cycling kings and queens, as well as cyclists much further down the leaderboard on each route.

Now that I’ve had the chance to read the article (unfortunately it is behind a paywall, but I’ve also posted the free section of it on my Facebook page), it is clear how this story came into being. The first quarter of the article, and the shock-tactic headline, focus on a London segment whose KOM averages 41 mph, with the implication that this encourages every other Strava user to break the speed limit and jump red lights in order to challenge that time. Two others and I are quoted saying how useful we find Strava for training, enjoyment and motivation, which goes some way to redressing the balance, but by then the damage has been done and the reader has already been misled.

To put things into perspective, Chris Hoy can reach a top speed of 48.5 mph (78 km/h), and that is with thighs the size of most people’s waists, in a climate-controlled velodrome built to maximise speed (kept hot and humid to reduce wind resistance). The idea that a commuter could come anywhere close to 40 mph over a few hundred metres should be enough to make people question the numbers.

The Strava segment in question is called “upper rich road” and its current leaderboard is:

Until a few moments ago there was another rider in 1st position with a time of 6 seconds and an average speed of 108.4 mph (174.4 km/h). We can disregard this rider’s data because it is so obviously wrong that Nicholas himself asked me how a rider could reach such a speed. I suppose we should be pleased that he didn’t use this speed in the article’s opening paragraph, but he does appear to have assumed that all the other data must be correct, despite having just seen for himself how wildly inaccurate it can be.
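The arithmetic behind figures like these is simply the segment’s length divided by the recorded time. A quick Python check, using the 290.68 m segment length quoted later in this post, reproduces the 108.4 mph figure for a 6 second effort; the function name is just my own, for illustration:

```python
METRES_PER_MILE = 1609.344  # exact, by definition
SEGMENT_LENGTH_M = 290.68   # the "upper rich road" segment


def seg_speed(time_s: float, segment_length_m: float = SEGMENT_LENGTH_M) -> tuple[float, float]:
    """Average speed (mph, km/h) if the whole segment length was covered in time_s."""
    metres_per_second = segment_length_m / time_s
    return metres_per_second * 3600 / METRES_PER_MILE, metres_per_second * 3.6


mph, kmh = seg_speed(6)
print(f"{mph:.1f} mph ({kmh:.1f} km/h)")  # 108.4 mph (174.4 km/h)
```

Run the other way, 41 mph over the same 290.68 m corresponds to a time of just under 16 seconds.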

Timings (and therefore average speeds) on segments can be inaccurate for three main reasons:

GPS devices playing catch-up – the most common cause of the fast times on this segment. Whether because of riding between tall buildings or trees, or just poor-quality recording equipment, the device will often lose track of where it is and then find it again, causing the recorded track to jump unfeasibly quickly to the new location.

Ride not actually covering the segment – this is most apparent with segments that form a loop. Say a segment covers 30 km but its start and finish points are only a few metres apart: in the past, riders who crossed those two points within a few seconds were awarded the segment, often with ludicrously fast average speeds. Strava have rectified this and now ensure that the ride covers at least 75% of the segment’s route. It isn’t foolproof, but they have to allow for genuine GPS drift so that the people genuinely riding the segment are awarded it. Unfortunately I believe historical data still stands on segment leaderboards. I would imagine this could be rectified by flagging and recreating the segment concerned.

Start and finish points – the start and finish of a rider’s effort on a segment are taken from points recorded by their GPS device rather than from the segment’s own start and finish points. This can result in different riders recording different times and speeds for the same segment even if they rode next to each other throughout. I have previously written an in-depth explanation of this, which you can read at your leisure, but basically the shorter (< 1 km) and straighter the segment, the more this can distort the leaderboard.
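The 75% coverage rule in the second point can be illustrated with a rough sketch. This is a hypothetical stand-in, not Strava’s actual matching code: it assumes GPS points have already been converted to flat x/y coordinates in metres, and it uses an arbitrary 20 m tolerance for GPS drift. It counts how many sampled points along the segment have a ride point nearby, and accepts the effort only if at least 75% do:

```python
import math

Point = tuple[float, float]  # (x, y) in metres on a local flat grid (assumed projection)


def covered_fraction(segment: list[Point], ride: list[Point], tolerance_m: float = 20.0) -> float:
    """Fraction of sampled segment points with at least one ride point within tolerance_m."""
    covered = sum(1 for s in segment if any(math.dist(s, r) <= tolerance_m for r in ride))
    return covered / len(segment)


def matches_segment(segment: list[Point], ride: list[Point],
                    min_fraction: float = 0.75, tolerance_m: float = 20.0) -> bool:
    """Hypothetical coverage rule: accept the effort only if enough of the segment was ridden."""
    return covered_fraction(segment, ride, tolerance_m) >= min_fraction


# A hypothetical loop segment: a circle of radius 160 m (about 1 km round), sampled
# every 10 m, whose finish stops about 15 m short of its start.
RADIUS_M = 160.0
angles = [i * 0.0625 for i in range(100)]
segment = [(RADIUS_M * math.cos(a), RADIUS_M * math.sin(a)) for a in angles]

# A rider who rides the whole loop, with GPS drift putting them 5 m outside the line.
full_ride = [((RADIUS_M + 5) * math.cos(a), (RADIUS_M + 5) * math.sin(a)) for a in angles]

# A rider who just rolls straight past the start/finish area.
shortcut_ride = [(160.0, -30.0), (160.0, -15.0), (160.0, 0.0), (160.0, 15.0)]

print(matches_segment(segment, full_ride))      # True
print(matches_segment(segment, shortcut_ride))  # False
```

Without the coverage check, the shortcut ride would pass through both the start and finish points within seconds and be credited with the whole loop.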

So here is the map of the segment:

Since this is a very short segment (only 290.68 m) and dead straight, let’s plug the segment ID into my Alternative Leaderboard and see what comes out:

First off, the rider second in the list, Tris M, shows up as NaN. This is because the segment isn’t “popular” as far as he is concerned. Maybe he has hidden it from his own list, or Strava have hidden it for other reasons. Chances are he doesn’t know it exists, and so would never have knowingly tried to get a time on it.

The table shows the following details:

Time Pos – this is the position shown in Strava’s own leaderboard. It is purely based on the time taken between the matched start and end points of the rider’s GPS trace.

Time – the number of seconds taken.

Speed Pos – the position based on the Actual Speed of the rider calculated using the Actual Distance they covered.

Actual Speed – the average speed the rider travelled over the Actual Distance they covered.

Seg Speed – this is the average speed shown by Strava. It is calculated using the Time and the distance of the original segment.

Actual Distance – due to the way Strava match a rider’s start and finish positions on a segment, this can differ greatly from the length of the segment. They are restricted to the points recorded by the rider’s GPS device. At present they are not interpolated to the points where the rider actually crossed the segment’s start and end.
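The two speed columns differ only in which distance is divided by the same Time. A minimal sketch of both calculations, and of re-ranking by Actual Speed (the Speed Pos column), follows. The `Effort` class and the three riders are hypothetical, invented for illustration rather than taken from the real leaderboard:

```python
from dataclasses import dataclass

METRES_PER_MILE = 1609.344
SEGMENT_LENGTH_M = 290.68  # the "upper rich road" segment


@dataclass
class Effort:
    rider: str
    time_s: float             # "Time": seconds between the matched GPS points
    actual_distance_m: float  # "Actual Distance": distance between those same points

    @property
    def seg_speed_mph(self) -> float:
        # "Seg Speed": assumes the rider covered the full segment length.
        return SEGMENT_LENGTH_M / self.time_s * 3600 / METRES_PER_MILE

    @property
    def actual_speed_mph(self) -> float:
        # "Actual Speed": uses the distance the rider actually covered.
        return self.actual_distance_m / self.time_s * 3600 / METRES_PER_MILE


def time_leaderboard(efforts: list[Effort]) -> list[Effort]:
    return sorted(efforts, key=lambda e: e.time_s)


def speed_leaderboard(efforts: list[Effort]) -> list[Effort]:
    return sorted(efforts, key=lambda e: e.actual_speed_mph, reverse=True)


# Hypothetical efforts, invented for illustration.
efforts = [
    Effort("Rider A", time_s=16.0, actual_distance_m=200.0),
    Effort("Rider B", time_s=20.0, actual_distance_m=290.0),
    Effort("Rider C", time_s=22.0, actual_distance_m=300.0),
]

for pos, e in enumerate(speed_leaderboard(efforts), start=1):
    print(pos, e.rider, f"seg {e.seg_speed_mph:.1f} mph", f"actual {e.actual_speed_mph:.1f} mph")
```

Rider A tops the time-based table with a Seg Speed of about 40.6 mph, but because only 200 m of the segment was matched, their Actual Speed is about 28.0 mph and they drop to last.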

It is clear from the list that, for those top 10 riders, the actual distance travelled varies massively (from 183.6 m right up to 293.7 m), with only a single rider covering the entire 290 m of the segment.

Numbers can be tricky for lots of people to visualise, so here are some pictures that illustrate it pretty well. First up, a rider whose data has been matched pretty well (the red line is the route of their ride and the blue line is the section of their ride matched to the segment; this rider went along this stretch of road in both directions on this ride):

The start and end points match those of the segment almost exactly, and his distance is recorded as 308.5 m, just a little over the length of the segment itself.

Now, a rider whose data matching isn’t quite so great:

This rider’s distance is considerably shorter. It looks like the GPS device is struggling and only locks on to a position midway along the segment.

And finally one that hasn’t really worked at all (but is still matched):

A very confused GPS device. This trace is actually from our KOM, George B. No wonder he got such a fast time, although I can’t explain how he covered 275.6 m in the process!

Ordering the Alternative Leaderboard by Speed Pos (so we don’t care how far they travelled, just how fast they were going) gives the following:

We know we can strike out George B because of his very poor GPS trace, but we can also strike out James S and Mark E for similar reasons. This leaves us with a new King of the Mountain: james b. Well done, James!
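Striking riders out by eye works for a single segment, but a crude automatic check is possible: work out the speed implied between each pair of consecutive GPS points and flag traces where it is physically implausible. This is a hypothetical sketch, not something Strava or the Alternative Leaderboard is known to do; it assumes points in flat x/y metres, and the 22 m/s limit (roughly 49 mph, just above Chris Hoy’s top speed) is an arbitrary choice:

```python
import math

Fix = tuple[float, float, float]  # (time_s, x_m, y_m) on a local flat grid (assumed)


def max_point_speed(fixes: list[Fix]) -> float:
    """Largest speed in m/s implied between consecutive GPS fixes."""
    fastest = 0.0
    for (t0, x0, y0), (t1, x1, y1) in zip(fixes, fixes[1:]):
        dt = t1 - t0
        if dt > 0:  # skip duplicate timestamps
            fastest = max(fastest, math.dist((x0, y0), (x1, y1)) / dt)
    return fastest


def looks_like_gps_jump(fixes: list[Fix], limit_ms: float = 22.0) -> bool:
    """Flag a trace if any step implies a speed above limit_ms (arbitrary threshold)."""
    return max_point_speed(fixes) > limit_ms


# A steady 10 m/s (about 22 mph) rider with a fix every second.
steady = [(float(t), 10.0 * t, 0.0) for t in range(10)]

# A device that loses lock, sits still for a few seconds, then jumps 120 m in one second.
jumpy = [(0.0, 0.0, 0.0), (1.0, 10.0, 0.0), (2.0, 10.0, 0.0), (3.0, 10.0, 0.0), (4.0, 130.0, 0.0)]

print(looks_like_gps_jump(steady))  # False
print(looks_like_gps_jump(jumpy))   # True
```

A check like this would only catch the most blatant jumps; a device that drifts gradually can still produce a plausible-looking but wrong trace.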

The actual KOM’s average speed? 31.9 mph (51.33 km/h)

Not such a shock headline now is it?

To see how much Strava’s segment matching has improved, I created a near-duplicate of this segment. Although this cleared a number of the spurious rides from the list, George B still sits in the KOM position. Putting the new segment into the Alternative Leaderboard, you only need to ignore George to once again get the true KOM: james b.

Don’t blame Strava

This post might well seem to be pointing the finger at Strava, but it certainly isn’t meant to. Strava can only do so much with the data it is given and, as you have seen from the images above, the data can often be terrible, yet people still want all their segments matched! Strava introduced the 75% matching rule, so hopefully that will remove a large number of the spurious rides from the leaderboards, but in my opinion they also need to interpolate those start and finish points and apply that retrospectively to all their data, or more articles of this type will inevitably appear. It would be a huge data processing task, and lots of people would lose their KOMs in the process, but I feel it is a necessary pain.
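The interpolation suggested above is straightforward in principle. Here is a minimal sketch, assuming each GPS fix has already been projected to a distance along the segment’s line (a hypothetical step not shown here) and that the rider only moves forward. As a simplified stand-in for the current matching, `snapped_effort` just takes the first and last recorded fixes that fall inside the segment:

```python
Fix = tuple[float, float]  # (time_s, distance along the segment line in metres)


def crossing_time(fixes: list[Fix], position_m: float) -> float:
    """Linearly interpolate the time at which the rider passed position_m."""
    for (t0, x0), (t1, x1) in zip(fixes, fixes[1:]):
        if x0 <= position_m <= x1 and x1 > x0:
            return t0 + (t1 - t0) * (position_m - x0) / (x1 - x0)
    raise ValueError(f"trace never crosses {position_m} m")


def interpolated_time(fixes: list[Fix], segment_length_m: float) -> float:
    """Time between crossing the segment's start line and its finish line."""
    return crossing_time(fixes, segment_length_m) - crossing_time(fixes, 0.0)


def snapped_effort(fixes: list[Fix], segment_length_m: float) -> tuple[float, float]:
    """Simplified snapping: (time, distance) between first and last fixes inside the segment."""
    inside = [(t, x) for t, x in fixes if 0.0 <= x <= segment_length_m]
    (t_start, x_start), (t_end, x_end) = inside[0], inside[-1]
    return t_end - t_start, x_end - x_start


# A hypothetical rider at a steady 10 m/s with a GPS fix every 5 seconds.
fixes = [(5.0 * i, -30.0 + 50.0 * i) for i in range(8)]  # -30 m ... 320 m

print(interpolated_time(fixes, 290.68))  # about 29.07 s, i.e. 290.68 m at 10 m/s
print(snapped_effort(fixes, 290.68))     # (25.0, 250.0)
```

With snapped points the Time is 25 s, so dividing the full 290.68 m segment by it gives about 26 mph, even though the rider held a steady 10 m/s (about 22.4 mph) throughout; interpolating the crossings recovers the true 29.07 s.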

As for whether certain segments should exist in the first place, I’ll leave that for another day. If people think they are dangerous, then flag them; that is what the button is for.

Update: Mr Hellen raised an interesting point with me on 12th Feb (after the article was published): whether Strava needs to make it more obvious to users that the times and speeds on its leaderboards (and hence the placings) can be subject to error for all the reasons I’ve described in this article. A new user coming to the system might well take these speeds at face value.

Conclusion

If you’ve made it this far, well done for enduring my logic- and evidence-based rant. Ever since discovering Ben Goldacre’s “Bad Science” column in the Guardian a few years back, I can’t bring myself to read or believe much “news” any more without the niggling, or sometimes blindingly obvious, doubt that the journalist involved either hasn’t done their research properly or is presenting the statistics to shock rather than educate. This article in The Sunday Times is no exception. If I hadn’t been interviewed for it, or if it had been hidden away in a supplement somewhere, I might have let it lie, but since Nicholas specifically chose to ignore a number of my points about the reliability of the data and go for the shock headline instead, I’ve had to make a point of putting this together.

I don’t doubt that some people try to improve their Strava placings on their commutes to and from work, and some of them probably jump red lights in doing so, but I’d hazard a guess that they would jump those red lights even if they weren’t recording their ride. If you stopped every rider who jumped a red light and asked whether they were recording their ride for Strava and had jumped the light specifically to improve their time on a segment, then, even if none of them lied, the percentage answering yes would be close to zero.

This particular segment is nothing remarkable, and regular Strava users will probably know that, unless their GPS devices record incorrectly in their favour, they will never make it into the top 10 of the leaderboard. Strava not being able to show a completely accurate leaderboard is down to the bad data it is given, but it has a damn good try. The moment this became a problem was when The Sunday Times (a rather large and influential newspaper, and home to the one and only David Walsh) decided to take that data at face value as the basis for its headline and article, in a way clearly designed to further aggravate the relationship between motorists and cyclists.