Tour de Farce

Analyzing every ride that ever happened…and creating a race that never did.

In May of 2013, New York City unveiled its large-scale bike share program, Citi Bike, in an effort to reduce traffic and pollution, the typical scourges of a dense urban area. In the three years since, the program has expanded to over 500 stations and enjoys daily ridership in the tens of thousands. These days the ubiquitous bright blue bicycles are as distinguished an emblem for public transportation in this city as their counterparts aboveground and below.

A boon to commuters and tourists alike, the program has also brought joy to New York’s statistical community, and not because nerds like to ride bicycles. Rather, Citi Bike’s comprehensive documentation and publication of ride logs dating back to its inception have served as an open invitation to a new, neat, and relevant big data project.

Ride logs from January 1, 2015

Already, the ever-accumulating Citi Bike data has inspired some shrewd examinations of New Yorkers’ biking tendencies. Ben Wellington of I Quant NY turned subscribers’ gender and age information into a demographically colored map of Manhattan’s neighborhoods that resembles stained glass. On his personal website, Todd Schneider tackled some of the more basic trends: where and when people took Citi Bikes, how long and how fast they went.

We mention these efforts not just to promote them but to note that much of the analytical legwork has already been done, freeing us to stray from the beaten path as we inspect… well, the beaten paths. Let’s grab every Citi Bike ride from the last full year, 2015, and figure out the most common routes, where a route is defined by its pick-up and drop-off stations. In total there were nearly 10 million rides in 2015, covering over 150,000 unique routes, or about 60% of the roughly 250,000 possible start-and-end combinations (500 × 500) given the 500 or so active stations.
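For readers who want to try the route tally themselves, here is a minimal sketch in Python. It assumes each ride is a dictionary keyed by the column names in Citi Bike's published trip files ("start station name" and "end station name"); the sample rows and the helper name count_routes are hypothetical and stand in for the real logs.

```python
from collections import Counter

def count_routes(rides):
    """Count rides per (start station, end station) pair."""
    return Counter(
        (ride["start station name"], ride["end station name"]) for ride in rides
    )

# Hypothetical sample rows; real rows would be read from the published CSV files.
sample_rides = [
    {"start station name": "Central Park S & 6 Ave",
     "end station name": "Central Park S & 6 Ave"},
    {"start station name": "Central Park S & 6 Ave",
     "end station name": "Central Park S & 6 Ave"},
    {"start station name": "12 Ave & W 40 St",
     "end station name": "West St & Chambers St"},
]

route_counts = count_routes(sample_rides)
top_routes = route_counts.most_common(10)  # most popular routes first

# Share of possible routes actually ridden, using the article's round figures.
possible_routes = 500 * 500            # every start paired with every end
coverage = 150_000 / possible_routes   # 0.6, i.e. about 60%

print(top_routes[0], f"{coverage:.0%}")
```

Counting a round trip (same start and end) as its own route is what makes the total 500 × 500 rather than 500 × 499.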

Here are the ten most popular:

Some interesting patterns are immediately apparent. The top four routes all begin and end in the same place, which at first may seem like a misprint until you realize that many people take out bikes, ride around for a while, and then bring them right back. Their total displacement may have been zero, but they almost certainly didn’t just sit idly at the station. The frequency of these “net static” routes is likely boosted by Citi Bike’s popularity with tourists, who, using a transportation program for the first time in a new city, are inclined to put their bikes back exactly where they found them.

It’s also no coincidence that the top three nodes are all within a half mile of each other along Central Park South. Circling the park is the quintessential bike trip, so much so that the Citi Bike home page lists it first under its collection of “Popular Rides,” noting that “it’s a scenic, easy ride, and car-free on weekends.”

The problem with these types of trips is that they make comparison impossible. There’s no way to tell if a forty-minute session was initiated by some EPO-fueled cyclist who ripped the bike from its harness and circled the park five times, his eyes bloodshot with excessive hemoglobin, or, alternatively, if a nine-year-old girl wheeled the bike over to a local ice cream stand, sat in the shade happily spooning a sundae for a half hour as chocolate dribbled on her knee, and then returned her ride.

In fact, the only route in the ten listed above that covers significant distance is the one that starts on 12th Avenue and West 40th Street and finishes on West Street and Chambers Street. The route runs roughly three miles and was biked over four thousand times in 2015. Its popularity isn’t surprising. The Hudson River Greenway features a dedicated bike lane, provides a charming vista of the waterfront, offers access to various adjacent parks, and serves a real purpose in an area of the city with poor subway coverage. It’s not the kind of ride you’d want to rush through.

But let’s pretend you did. That is, let’s take our 4,315 Citi Bike rides from 12th Avenue and West 40th Street to West Street and Chambers Street, and see who covered the distance fastest. We’re pitting thousands of unknowing riders against one another in a fictional bike race that artificially simulates a simultaneous start: it’s our 2015 Tour de Farce!
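In code, the "simultaneous start" amounts to ignoring when each ride happened and ranking by elapsed time alone. The sketch below is one way to do that, not necessarily how the results here were produced. It assumes the trip files record duration in seconds under "tripduration" and that the two stations are spelled as shown; the sample rows and the race_results helper are hypothetical.

```python
START = "12 Ave & W 40 St"       # assumed spelling of the starting station
END = "West St & Chambers St"    # assumed spelling of the finishing station

def race_results(rides, start=START, end=END):
    """Keep rides on the chosen route and rank them fastest first."""
    on_route = [
        r for r in rides
        if r["start station name"] == start and r["end station name"] == end
    ]
    # Start times are ignored: every rider "leaves" at once.
    return sorted(on_route, key=lambda r: r["tripduration"])

# Hypothetical rides; tripduration is assumed to be in seconds.
sample_rides = [
    {"bikeid": 101, "start station name": START, "end station name": END,
     "tripduration": 1260},
    {"bikeid": 102, "start station name": START, "end station name": END,
     "tripduration": 905},
    {"bikeid": 103, "start station name": END, "end station name": START,
     "tripduration": 700},   # reverse direction, so not part of the race
    {"bikeid": 104, "start station name": START, "end station name": END,
     "tripduration": 1010},
]

podium = [r["bikeid"] for r in race_results(sample_rides)[:3]]
print(podium)
```

Direction matters here: a ride from Chambers Street up to 40th Street is a different route and is left out.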

Google provides a map of the route, although our racers weren’t bound to take any specific roads. Citi Bike only logs the starting and finishing docks and not the course taken between, so technically our riders could have followed any number of paths along the way.

However, the reasonable assumption that most if not all of our bikers streamed down the Hudson River Greenway further supports our choice of this route for a race. The road’s situation on the edge of the island makes for minimal cross-town traffic and tame intersections, giving our riders more opportunity to showcase their raw speed without falling prey to the luck of a stoplight.

One final consideration is the subscription status of our competitors. Riders are defined either as “Subscribers,” who have paid for a year-long Citi Bike membership, or “Customers,” who have purchased a day pass. In general, Subscribers are responsible for the vast majority of Citi Bike use, accounting for about 86% of all 2015 rides. But our peaceful West Side jaunt attracts many single-use riders, and as a result Customers account for over half of the rides on this route.

Still, we are going to exclude them. Why? Because the very thing that attracts Customers to our route also slows them down as they follow it. Namely, those taking the route for the experience are unlikely to speed through it, while Subscribers, less impressed by the familiar sights and more accustomed to the bike, don’t mind pushing the pace. As a result only nine of the route’s top hundred finishes — and none of the top ten — were achieved by Customers.
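The check behind that count is simple enough to sketch. The example below assumes a "usertype" field holding either "Subscriber" or "Customer", plus a duration in seconds; the sample data and both helper names are hypothetical.

```python
def customers_in_top(rides, n):
    """Count how many of the n fastest rides were taken by Customers."""
    fastest = sorted(rides, key=lambda r: r["tripduration"])[:n]
    return sum(1 for r in fastest if r["usertype"] == "Customer")

def subscribers_only(rides):
    """Drop day-pass Customers, keeping annual Subscribers."""
    return [r for r in rides if r["usertype"] == "Subscriber"]

# Hypothetical rides on the route; durations are assumed to be in seconds.
sample = [
    {"usertype": "Subscriber", "tripduration": 900},
    {"usertype": "Customer", "tripduration": 950},
    {"usertype": "Subscriber", "tripduration": 980},
    {"usertype": "Customer", "tripduration": 1500},
    {"usertype": "Customer", "tripduration": 1700},
]

print(customers_in_top(sample, 3))    # Customers among the three fastest
print(len(subscribers_only(sample)))  # field size after the exclusion
```

Running customers_in_top with n set to 100 and then 10 on the real route data is the comparison described above.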

For a fuller picture of the differences in speed between Subscribers and Customers, check out the pair of distributions below. Interestingly, they trace essentially the same shape, down to the ripples in their tails, just at slightly different places on the horizontal axis. Such similarity indicates that whatever it is that slows down Customers does so in a uniform fashion across all types of rides, costing both the fast and the slow a minute or two.

[The horizontal axis is scaleless to preserve the surprise of the race results]

By focusing only on Subscribers, we can sort finishers by age and gender, since the personal information they provided when signing up online is beamed from their member key into the dock every time they take out a bike. Thus, our race better approximates an actual competition, where there are outright winners but also divisions so that men and women, young and old, can measure their performance relative to their cohort. So, let’s create a Spring Chicken division (born after 1984), a Dinner Party division (born between 1965 and 1984), and a Back In My Day division (born before 1965), for both men and women.
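The division rules translate directly into a small function. This sketch assumes the trip files carry a "birth year" field and a numeric "gender" code in which 1 means male, 2 means female, and 0 means unknown; that encoding, the sample riders, and the helper names are assumptions for illustration.

```python
GENDER_LABELS = {1: "men", 2: "women"}  # assumed encoding; 0 (unknown) is skipped

def division(birth_year):
    """Map a birth year to one of the three age divisions."""
    if birth_year > 1984:
        return "Spring Chicken"
    if birth_year >= 1965:
        return "Dinner Party"
    return "Back In My Day"

def division_winners(rides):
    """Return the fastest ride for each (division, gender) pair."""
    winners = {}
    for ride in rides:
        gender = GENDER_LABELS.get(ride["gender"])
        if gender is None:
            continue  # no usable gender on record
        key = (division(ride["birth year"]), gender)
        best = winners.get(key)
        if best is None or ride["tripduration"] < best["tripduration"]:
            winners[key] = ride
    return winners

# Hypothetical Subscriber rides; durations are assumed to be in seconds.
sample = [
    {"birth year": 1990, "gender": 1, "tripduration": 880},
    {"birth year": 1988, "gender": 1, "tripduration": 860},
    {"birth year": 1970, "gender": 2, "tripduration": 940},
    {"birth year": 1958, "gender": 2, "tripduration": 1020},
    {"birth year": 1984, "gender": 0, "tripduration": 800},
]

for (name, gender), ride in sorted(division_winners(sample).items()):
    print(name, gender, ride["tripduration"])
```

Note the boundaries: a rider born in 1984 or 1965 lands in the Dinner Party division, since "between 1965 and 1984" is read as inclusive.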

And our winners are...