Last week I wrote some stories about cyclists involved in accidents while undertaking.

The stories generated a LOT of comment and debate – far more of it, and far more negative, than I was expecting.

I’ll address that in a moment, but first, here’s a useful exercise in how to get the Government’s STATS19 data and filter down to what you want.

What is STATS19?

We’ve come across STATS19 before. It is the police records of road traffic accidents in which at least one person was injured or killed.

Each accident has a unique Accident Index. The data is divided into three files for each year: accidents, vehicles and casualties. The accident index links the data across these three files.
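
As a rough illustration of how the Accident Index ties the three files together, here is a minimal sketch using made-up toy tables. The values are invented and the Day and Severity columns are placeholders rather than real STATS19 field names. Note that one accident can have several vehicles and several casualties, so the merged result has one row per vehicle and casualty combination.

```r
# Toy stand-ins for the three files; all values are invented
accidents_toy <- data.frame(Accident_Index = c("A1", "B2"),
                            Day = c("Mon", "Tue"))
vehicles_toy <- data.frame(Accident_Index = c("A1", "A1", "B2"),
                           Vehicle_Type = c(1, 9, 1))
casualties_toy <- data.frame(Accident_Index = c("A1", "B2", "B2"),
                             Severity = c(3, 3, 2))

# Link the tables on the shared key, one merge at a time
linked <- merge(accidents_toy, vehicles_toy, by = "Accident_Index")
linked <- merge(linked, casualties_toy, by = "Accident_Index")

print(linked)
```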

It is published year by year, so the first thing to do is to get the data for each year and combine it into one large data frame.

#clean data and build one big file
vehicles2015 <- read.csv("Vehicles_2015.csv")
vehicles2014 <- read.csv("Vehicles_2014.csv")
vehicles2013 <- read.csv("Vehicles_2013.csv")
vehicles2012 <- read.csv("Vehicles_2012.csv")
vehicles2011 <- read.csv("Vehicles_2011.csv")
vehicles2010 <- read.csv("Vehicles_2010.csv")

accidents2015 <- read.csv("Accidents_2015.csv")
accidents2014 <- read.csv("DfTRoadSafety_Accidents_2014.csv")
accidents2013 <- read.csv("DfTRoadSafety_Accidents_2013.csv")
accidents2012 <- read.csv("DfTRoadSafety_Accidents_2012.csv")
accidents2011 <- read.csv("DfTRoadSafety_Accidents_2011.csv")
accidents2010 <- read.csv("DfTRoadSafety_Accidents_2010.csv")

casualties2015 <- read.csv("Casualties_2015.csv")
casualties2014 <- read.csv("DfTRoadSafety_Casualties_2014.csv")
casualties2013 <- read.csv("DfTRoadSafety_Casualties_2013.csv")
casualties2012 <- read.csv("DfTRoadSafety_Casualties_2012.csv")
casualties2011 <- read.csv("DfTRoadSafety_Casualties_2011.csv")
casualties2010 <- read.csv("DfTRoadSafety_Casualties_2010.csv")

vehicles2013 <- as.data.frame(append(vehicles2013, list(Age_of_Driver = NA), after = 15))
vehicles2012 <- as.data.frame(append(vehicles2012, list(Age_of_Driver = NA), after = 15))
vehicles2011 <- as.data.frame(append(vehicles2011, list(Age_of_Driver = NA), after = 15))
vehicles2010 <- as.data.frame(append(vehicles2010, list(Age_of_Driver = NA), after = 15))

names(vehicles2013)[1] <- "Accident_Index"
names(vehicles2012)[1] <- "Accident_Index"
names(vehicles2011)[1] <- "Accident_Index"
names(vehicles2010)[1] <- "Accident_Index"

names(vehicles2010)[13] <- "Was_Vehicle_Left_Hand_Drive"
names(vehicles2011)[13] <- "Was_Vehicle_Left_Hand_Drive"
names(vehicles2012)[13] <- "Was_Vehicle_Left_Hand_Drive"
names(vehicles2013)[13] <- "Was_Vehicle_Left_Hand_Drive"
names(vehicles2014)[13] <- "Was_Vehicle_Left_Hand_Drive"
names(vehicles2015)[13] <- "Was_Vehicle_Left_Hand_Drive"

vehicles2015 <- vehicles2015[, 1:22]

vehicles <- rbind(vehicles2010, vehicles2011, vehicles2012, vehicles2013, vehicles2014, vehicles2015)

names(accidents2014)[1] <- "Accident_Index"

accidents <- rbind(accidents2010, accidents2011, accidents2012, accidents2013, accidents2014, accidents2015)

casualties2010 <- as.data.frame(append(casualties2010, list(Age_of_Casualty = NA), after = 5))
casualties2011 <- as.data.frame(append(casualties2011, list(Age_of_Casualty = NA), after = 5))
casualties2012 <- as.data.frame(append(casualties2012, list(Age_of_Casualty = NA), after = 5))
casualties2013 <- as.data.frame(append(casualties2013, list(Age_of_Casualty = NA), after = 5))

names(casualties2010)[1] <- "Accident_Index"
names(casualties2011)[1] <- "Accident_Index"
names(casualties2012)[1] <- "Accident_Index"
names(casualties2013)[1] <- "Accident_Index"
names(casualties2014)[1] <- "Accident_Index"

casualties2015 <- casualties2015[, 1:15]

casualties <- rbind(casualties2010, casualties2011, casualties2012, casualties2013, casualties2014, casualties2015)

full <- merge(x = accidents, y = vehicles, by = "Accident_Index")
full <- merge(x = full, y = casualties, by = "Accident_Index")

Above we are cleaning the data. The rbind function only works if the data frames have the same number of columns, identically labelled. The yearly files do not all have the same columns; additional fields were added in later years. So we add dummy columns where necessary and change the labels.

Take a look at the append function for adding in columns in between others.
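
Here is a small sketch of that trick on toy data, so you can see what append does before running it on the real files. The data frames and column names here are invented for illustration; only the append, as.data.frame and rbind calls mirror the code above.

```r
# Toy 'earlier year' file that is missing a column (invented data)
older <- data.frame(id = c("A1", "B2"), type = c(1, 9))
# Toy 'later year' file that has the extra column in the middle
newer <- data.frame(id = "C3", age = 34, type = 1)

# A data frame is a list of columns, so append can slot a new
# NA-filled column in after position 1; as.data.frame rebuilds the
# data frame and recycles the single NA down every row
older_fixed <- as.data.frame(append(older, list(age = NA), after = 1))

# Now the column names and order match, so rbind works
combined <- rbind(older_fixed, newer)

print(combined)
```

Calling rbind(older, newer) directly would stop with an error, because the two data frames have different numbers of columns.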

Cyclists only

The next step is to just get the cases where the vehicle was a bike. We filter ‘1’ for pedal cycle as per the accompanying instructions.

cyclists <- full[full$Vehicle_Type == 1, ]

Undertaking

The term in the data is ‘overtaking – nearside’, code 15, which is what is commonly called undertaking. It is a simple task to filter down for this code in the ‘vehicle manoeuvre’ column.

undertaking <- cyclists[cyclists$Vehicle_Manoeuvre == 15, ]

That is really all there is to getting the data we want.
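
One thing that may be worth checking at this point: because the full data frame is built by merging vehicles and casualties onto accidents, a single accident can appear on more than one row. A minimal sketch, using an invented toy data frame in place of the real undertaking one, of comparing the row count with the number of distinct accidents:

```r
# Toy stand-in for the 'undertaking' data frame; values are invented
toy_undertaking <- data.frame(
  Accident_Index = c("A1", "A1", "B2", "C3", "C3", "C3"),
  Vehicle_Type = 1,
  Vehicle_Manoeuvre = 15
)

# Rows in the filtered data
n_rows <- nrow(toy_undertaking)

# Distinct accidents, counted by their unique Accident_Index
n_accidents <- length(unique(toy_undertaking$Accident_Index))

print(c(rows = n_rows, accidents = n_accidents))
```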

To make it faster to write the stories, I exported the data using write.table and plotted it in a Google Fusion table to see where the accidents were.
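
For anyone who has not used write.table, a minimal sketch of that export step looks something like this. The toy data, the coordinate values and the temporary file are assumptions for illustration; in practice you would pass the undertaking data frame and a file name of your choosing.

```r
# Toy stand-in for the filtered data; values are invented
toy_export <- data.frame(
  Accident_Index = c("A1", "B2"),
  Longitude = c(-0.12, -2.24),
  Latitude = c(51.50, 53.48)
)

# Write a comma-separated file without the row-number column
out_file <- tempfile(fileext = ".csv")
write.table(toy_export, file = out_file, sep = ",", row.names = FALSE)

# Read it back to confirm the file looks as expected
check <- read.csv(out_file)
print(check)
```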

Some thoughts on the data

There were 2,823 accidents while cyclists were undertaking between 2010 and 2015.

More than half of them were in London.

According to the latest Department for Transport survey, 14.7 per cent of Londoners cycled at least once a month in 2014/15, exactly on a par with the England average.

So people in the capital are no more or less likely to cycle than average.

London makes up 13.7 per cent of the population but somehow 58 per cent of the accidents while undertaking.
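
To put a number on that gap, a quick back-of-the-envelope calculation using only the two percentages quoted above:

```r
# Figures quoted in the text
london_share_of_accidents <- 58     # per cent of undertaking accidents
london_share_of_population <- 13.7  # per cent of the population

# How many times over-represented London is, relative to its population
over_representation <- london_share_of_accidents / london_share_of_population

print(round(over_representation, 1))
```

In other words, London's share of these accidents is roughly four times its share of the population.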

The rest of the accidents, as you might expect, are in cities – including ones where cycling is popular such as Cambridge, York and Exeter.

Some thoughts on my stories

The publicly available yearly STATS19 data is neutral on ‘blame’ for accidents.

There is no way to tell from the data which party, if any, was at fault.

The Highway Code seems to allow ‘filtering’ by cyclists – passing through slow-moving or stationary vehicle traffic either on the left or the right.

Obviously, filtering on the left (nearside) leaves little room for manoeuvre for the cyclist – they can quickly run out of room between a car and the kerb if a driver makes a turn or a move to the left.

After the story was published I had a chat with a transport campaigner – he wants, among other things, to encourage more people to take to cycling on the road. He told me that cyclists often get frustrated with stories about cycling in the news. They often feel that it is portrayed as more dangerous than it actually is.

I am a cyclist myself – you can find me pedalling along Oldham Road in Manchester in the mornings and afternoons to and from work – but it wasn’t a point I had fully appreciated before. The data is accurate, but of course it’s just one small facet of accidents on the roads in Britain.

Picture © Copyright Albert Bridge and licensed for reuse under this Creative Commons Licence.