It has been a big summer for transportation open data. First, the TLC released a dataset with every taxi and green-cab trip taken since 2014 (a release I had advocated for in my TED talk). This is a great step forward and will likely save the TLC a lot of time filling requests for its data. I’d love to see other government agencies follow suit.

Another exciting development happened on the open data front, and this time it was not something the government did: it was FiveThirtyEight. The folks there ran a couple of pieces on Uber and taxis based on data they FOILed from the government. Usually, this is where newsrooms move on to the next project. Instead, FiveThirtyEight posted the raw FOILed data on GitHub for us all to analyze. That is big news, and I’d love to see more data journalists follow suit for two main reasons:

First, releasing your data along with your work amplifies your message. It allows others to follow up on what you did and add to it.

Second, and more importantly, reproducibility is a core part of the scientific method. If “data science” is truly a science, it should meet the basic tenets of the scientific method. Releasing data allows others to verify your work.

What happens when data journalists don’t release their data? You get silly back-and-forths like the one between the Wall Street Journal and Bratton over the Times Square Pedestrian Plaza. We, as citizens, have to sit on the sidelines because neither side has backed up its position with any data release. If the WSJ posted the data it used, we could cut through the rhetoric and see what is really going on.

With all that being said, I made a few quick observations, which I’ll put on the blog this week. In the battle between Uber and Lyft, Uber is clearly dominant, giving about seven times as many rides. But it seems that Lyft has made some serious inroads on the late-night scene.

Surprisingly, at 2 AM rides are split nearly evenly between the two.

In fact, a quick look at the hours in which each service provides the most rides shows incredibly different shapes: Uber peaks at the evening rush, while Lyft peaks at around midnight.
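For anyone who wants to try this kind of hourly comparison, here is a minimal sketch in Python of how it could be done. It is not the code behind the observations above. The function names are my own, and the timestamps are made-up placeholders rather than the FiveThirtyEight data; the real files would need their own parsing, and their column names and date formats are not assumed here.

```python
from collections import Counter
from datetime import datetime


def hourly_counts(pickups):
    """Count rides per hour of day (0-23) from ISO-style pickup timestamps."""
    counts = Counter(datetime.fromisoformat(ts).hour for ts in pickups)
    return [counts.get(hour, 0) for hour in range(24)]


def peak_hour(counts):
    """Return the hour of day with the most rides (earliest hour wins ties)."""
    return max(range(24), key=lambda hour: counts[hour])


def hourly_ratio(a_counts, b_counts):
    """Rides of service A per ride of service B, by hour; None where B has none."""
    return [a / b if b else None for a, b in zip(a_counts, b_counts)]


# Made-up placeholder timestamps for illustration only, NOT real trip records.
uber_sample = [
    "2014-07-01 17:05:00",
    "2014-07-01 17:40:00",
    "2014-07-01 18:10:00",
    "2014-07-02 02:15:00",
]
lyft_sample = [
    "2014-07-01 23:50:00",
    "2014-07-02 00:10:00",
    "2014-07-02 00:30:00",
    "2014-07-02 02:20:00",
]

uber = hourly_counts(uber_sample)
lyft = hourly_counts(lyft_sample)
ratio = hourly_ratio(uber, lyft)

print("Uber peak hour:", peak_hour(uber))
print("Lyft peak hour:", peak_hour(lyft))
print("Uber rides per Lyft ride at 2 AM:", ratio[2])
```

Plotting the two 24-element lists side by side is what would show the different shapes. A ratio near 1 at a given hour means the two services are roughly evenly split at that time.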

The drastically different shapes point to a very different clientele. I always assumed that Lyft’s pink mustaches might appeal more to a younger crowd, and the data seems to back that up. So if you are wondering what the difference is between driving for Uber and Lyft, it’s night and day.

This small insight was made possible by FiveThirtyEight’s release of FOILed data, and I can only imagine how many more data insights will flow if other journalists follow their lead. In New York City, FiveThirtyEight, The Upshot and WNYC seem to be setting the data-release trend. Will your favorite news agency be next?

---

Data available here.