Taxi and Ridehailing Usage in Chicago

Data updates from the taxi and ridehailing trip records on the City of Chicago’s open data portal. The code to calculate monthly aggregates is on GitHub

There are notes throughout the page with relevant definitions and caveats, plus more info at the bottom of the page



Average time, distance, and fare

Both taxi and ridehailing datasets contain some records with either missing or obviously incorrect data, e.g. a $1,000 fare for a 1-mile trip. I’ve attempted to remove these bad records, and the graphs in this section are based only on trips with “clean” time, distance, and fare data. The specific logic used to filter trips is available here on GitHub
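As a rough illustration of what filtering for "clean" records can look like, here is a minimal Python sketch. It is not the logic from the GitHub repo: the `Trip` class, the `is_clean` function, and every threshold below are hypothetical placeholders chosen only to show the shape of such a filter.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Trip:
    # Hypothetical record layout; field names are assumptions.
    minutes: Optional[float]
    miles: Optional[float]
    fare: Optional[float]


def is_clean(trip, max_mph=80.0, max_fare_per_mile=50.0, max_fare=500.0):
    """Return True if time, distance, and fare are present and plausible.

    All thresholds are illustrative assumptions, not the page's actual rules.
    """
    values = (trip.minutes, trip.miles, trip.fare)
    # Reject records with missing or non-positive values.
    if any(v is None or v <= 0 for v in values):
        return False
    # Reject implausibly fast trips.
    mph = trip.miles / (trip.minutes / 60.0)
    if mph > max_mph:
        return False
    # Reject implausibly expensive trips, overall or per mile.
    if trip.fare > max_fare:
        return False
    if trip.fare / trip.miles > max_fare_per_mile:
        return False
    return True


trips = [
    Trip(minutes=15, miles=3.0, fare=12.50),   # plausible
    Trip(minutes=5, miles=1.0, fare=1000.00),  # $1,000 fare for a 1-mile trip
    Trip(minutes=None, miles=2.0, fare=10.0),  # missing time
]
clean_trips = [t for t in trips if is_clean(t)]
```

With these placeholder thresholds, only the first trip survives the filter.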

Total farebox calculations assume that the bad records have the same average fare as the clean records
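The assumption above amounts to multiplying the average clean fare by the total trip count. A minimal Python sketch, with a hypothetical function name and made-up numbers:

```python
def estimate_total_farebox(total_trips, clean_trips, clean_fare_sum):
    """Estimate total fares, assuming bad records share the clean average fare.

    total_trips: count of all trips, clean and bad
    clean_trips: count of trips that passed the data filters
    clean_fare_sum: sum of fares over the clean trips only
    """
    if clean_trips <= 0:
        raise ValueError("need at least one clean trip to compute an average")
    average_clean_fare = clean_fare_sum / clean_trips
    return average_clean_fare * total_trips


# Made-up numbers: 1,000 trips, 950 of them clean, $14,250 in clean fares.
estimate = estimate_total_farebox(1000, 950, 14250.0)
```

Here the average clean fare is $15.00, so the estimated total farebox is $15,000 even though only $14,250 was observed in clean records.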

Each individual ridehailing fare is rounded to the nearest multiple of $2.50 in the raw data, so averages and aggregates should be considered estimates
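To see why rounded fares only give estimates, here is a small Python sketch. The fares are made up, and rounding ties upward is an assumption, since the tie-breaking rule used in the raw data is not stated here.

```python
import math


def round_to_increment(fare, increment=2.50):
    """Round a fare to the nearest multiple of `increment`.

    Ties round up, which is an assumption for illustration only.
    """
    return math.floor(fare / increment + 0.5) * increment


# Made-up "true" fares, for illustration only.
true_fares = [7.10, 11.40, 23.80]
rounded_fares = [round_to_increment(f) for f in true_fares]

true_mean = sum(true_fares) / len(true_fares)
rounded_mean = sum(rounded_fares) / len(rounded_fares)

# Any single fare can be off by at most half the increment ($1.25).
max_error = max(abs(r - t) for r, t in zip(rounded_fares, true_fares))
```

In this example the rounded fares are $7.50, $12.50, and $25.00, so the average computed from published data (about $15.00) differs from the average of the underlying fares (about $14.10). Over many trips the individual errors tend to offset each other, but they do not vanish.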

Tipping behavior

Tip data is not available for taxi fares paid with cash, so taxi averages are based on credit card fares only

Ridehailing by shared status

Solo trips are when the rider did not authorize a shared trip. Shared trips are when the rider authorized a shared trip, and at some point during the trip they shared the car with another customer. Unmatched share requests are when the rider authorized a shared trip, but from the time they got into the car to when they got out, they never shared the car with another customer

As an example, if Alice, Bob, and Charlie each requested a shared ride, and a single driver serviced their requests in this order:

picked up A
picked up B
dropped off A
dropped off B
picked up C
dropped off C

that would count as 3 trips: 2 shared and 1 unmatched share request
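The Alice, Bob, and Charlie example can be traced in code. This Python sketch applies the definitions above to a sequence of pickup and dropoff events; the function name and event format are hypothetical, and this is not necessarily how the source data records shared status.

```python
def classify_share_requests(events):
    """Classify each rider who authorized a shared trip.

    events: ordered list of (action, rider) tuples, where action is
    "pickup" or "dropoff". Every rider is assumed to have requested a
    shared ride. A rider counts as "shared" if at any point they were in
    the car with another rider, and as an "unmatched share request" otherwise.
    """
    onboard = set()
    shared = set()
    riders = []
    for action, rider in events:
        if action == "pickup":
            riders.append(rider)
            onboard.add(rider)
            # Everyone in the car together is now on a shared trip.
            if len(onboard) > 1:
                shared.update(onboard)
        elif action == "dropoff":
            onboard.discard(rider)
        else:
            raise ValueError(f"unknown action: {action}")
    return {
        r: "shared" if r in shared else "unmatched share request"
        for r in riders
    }


events = [
    ("pickup", "A"), ("pickup", "B"),
    ("dropoff", "A"), ("dropoff", "B"),
    ("pickup", "C"), ("dropoff", "C"),
]
result = classify_share_requests(events)
```

A and B overlap in the car, so both are classified as shared. C rides alone from pickup to dropoff, so C is an unmatched share request.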

Pickups by geography

Each geography bucket is a group of community areas. Distances are measured from the center of each area to the center of the Loop. For example, the “within 2 miles of the Loop” bucket includes the Loop, Near North Side, Near South Side, and Near West Side. See here for a map of all of the community areas and bucket definitions
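A center-to-center distance check like the one described above might be sketched as follows in Python. Several things here are assumptions: the use of great-circle (haversine) distance, the function names, and the coordinates, which are rough placeholders rather than the centers used for the actual buckets.

```python
import math

EARTH_RADIUS_MILES = 3958.8


def haversine_miles(lat1, lon1, lat2, lon2):
    """Great-circle distance in miles between two latitude/longitude points."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlambda = math.radians(lon2 - lon1)
    a = (math.sin(dphi / 2) ** 2
         + math.cos(phi1) * math.cos(phi2) * math.sin(dlambda / 2) ** 2)
    return 2 * EARTH_RADIUS_MILES * math.asin(math.sqrt(a))


# Rough placeholder coordinates, for illustration only.
LOOP_CENTER = (41.879, -87.625)
AREA_CENTERS = {
    "Near North Side": (41.900, -87.634),
    "O'Hare": (41.978, -87.904),
}


def within_radius_of_loop(area, radius_miles=2.0):
    """Return True if the area's center is within radius_miles of the Loop's center."""
    lat, lon = AREA_CENTERS[area]
    return haversine_miles(lat, lon, *LOOP_CENTER) <= radius_miles


near_north_in_bucket = within_radius_of_loop("Near North Side")
ohare_in_bucket = within_radius_of_loop("O'Hare")
```

With these placeholder coordinates, Near North Side falls inside the 2-mile bucket and O'Hare falls well outside it.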

Near North Side to Lake View, weekdays 4:00 PM–8:00 PM

Loop to O’Hare, weekdays 3:00 PM–6:00 PM

Taxi vehicle-based metrics

Vehicle-based data is only available for taxis

Additional notes

Links to source datasets: taxi | ridehailing

The City of Chicago originally published the taxi dataset in 2016, paused it in 2017 due to data consistency issues, and resumed publishing in 2019. Even after the fix, taxi trips are likely undercounted between November 2014 and December 2015. See these posts for more info: 2016 release | 2019 update

The code to collect and process the raw data from the city’s website is available on GitHub

As of 2019, there are three licensed ridehailing apps in Chicago: Uber, Lyft, and Via. Note that the city refers to them collectively as “Transportation Network Providers”. The Chicago ridehailing dataset does not identify which company provided each trip

Taxi data updates monthly; ridehailing data updates quarterly

The city published an overview of some privacy-oriented measures it took when publishing both datasets

The graphs are built with Highcharts, and the underlying JSON API response is available here

Questions or issues: todd@toddwschneider.com

Some differences between the Chicago and New York datasets