Have you ever seen the taxi waiting area at an airport? I’ve often wondered whether it makes sense for a driver to wait in that line or to ditch the airport after dropping off a passenger and search elsewhere for their next trip. Now, there’s finally some data that can provide insight into that question. That’s the purpose of this post.





Where’s the data come from?

The City of Chicago published data on taxi cab trips for 2016 on Kaggle. Other cities have done the same, but this dataset had an added feature that made this analysis possible: anonymous taxi IDs. This allows you to track a taxi’s route progression over time. In other words, you can see what taxis do after they drop off passengers at the airport.

A side note on privacy: Chicago made the IDs anonymous, and the geographic information on the pickup and dropoff locations is only provided at the Community Area level (basically a neighborhood). No street-level addresses are included. But fortunately, one of the Community Areas is exclusively O’Hare Airport, which allows for this analysis to be performed.





Overview of the data

Taxi trips that start at O’Hare are unusual. In 2016 they accounted for about 5.6% of total trips, but about 17.5% of total fares and about 22.4% of total tips. So clearly these trips are valuable when viewed through this lens.
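As a rough illustration of how shares like these could be computed, here is a minimal sketch. The field names (`pickup_area`, `fare`, `tips`), the toy rows, and the use of 76 as O’Hare’s Community Area number are all assumptions for the example, not details taken from the dataset’s documentation.

```python
OHARE_AREA = 76  # assumption: O'Hare's Community Area number


def ohare_shares(trips):
    """Return the O'Hare-pickup share of trip count, fares, and tips."""
    from_ohare = [t for t in trips if t["pickup_area"] == OHARE_AREA]
    return {
        "trips": len(from_ohare) / len(trips),
        "fares": sum(t["fare"] for t in from_ohare) / sum(t["fare"] for t in trips),
        "tips": sum(t["tips"] for t in from_ohare) / sum(t["tips"] for t in trips),
    }


# Made-up toy rows, only to show the shape of the calculation.
toy_trips = [
    {"pickup_area": 76, "fare": 40.0, "tips": 8.0},
    {"pickup_area": 8, "fare": 10.0, "tips": 1.0},
    {"pickup_area": 32, "fare": 10.0, "tips": 1.0},
    {"pickup_area": 8, "fare": 20.0, "tips": 0.0},
]
shares = ohare_shares(toy_trips)
```

With the toy rows, one trip in four starts at the airport but it carries half of the fares, which is the same kind of imbalance the real numbers show.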

Seen another way, the histogram below shows the frequency of trip fares by their pickup location. Taxi trips that originate at O’Hare airport have much higher fares than trips that originate elsewhere.

But despite that fact, not all drivers wait at the airport after dropping off a passenger. In fact, about 47% of drivers leave the airport instead of waiting in line to pick up a new passenger. The chart below shows the average daily number of drivers that leave versus wait at the airport in search of their next trip.

The chart also shows us when people get dropped off at the airport. There are significantly more taxi trips to the airport during the week compared to the weekend. And the dropoffs are concentrated during the daytime hours, with a bump in the late afternoon during the work week.

Note that the chart reflects the data after removing trips with bad information. For instance, some trips to the airport show fares of $0, so those were removed. Others show a trip duration of 0 minutes; those were also removed. The list goes on. So the chart is best read as a floor on the daily frequencies. The real values are likely higher, though the distribution between different time periods is likely the same.
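A minimal sketch of that kind of filter, covering only the two checks named above ($0 fares and 0-minute trips). The field names `fare` and `trip_seconds` are hypothetical, and dropping rows with missing values is an extra assumption on my part.

```python
def is_valid_trip(trip):
    """Keep a trip only if it has a positive fare and a positive duration."""
    fare = trip.get("fare")
    seconds = trip.get("trip_seconds")
    # Assumption: rows with missing values are treated as bad data too.
    if fare is None or seconds is None:
        return False
    return fare > 0 and seconds > 0


# Made-up rows for illustration.
raw = [
    {"fare": 43.0, "trip_seconds": 1800},
    {"fare": 0.0, "trip_seconds": 1500},   # $0 fare: dropped
    {"fare": 12.5, "trip_seconds": 0},     # 0-minute trip: dropped
    {"fare": None, "trip_seconds": 600},   # missing fare: dropped
]
clean = [t for t in raw if is_valid_trip(t)]
```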





Wait times

So what’s the catch? Well, to get the average $43 fare for a ride that originates at O’Hare, the taxi driver has to wait in the queue. (According to the data, some drivers drop someone off and pick someone up within 15 minutes, apparently skipping the queue, but I’m assuming that’s not how it’s supposed to work.) And that wait can be quite long. On average, it’s about a two-hour wait for the next pickup after dropping a passenger off at the airport. The chart below is a histogram of the wait times.
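One way to derive a wait time from trip records is to sort each taxi’s trips by start time and measure the gap between an O’Hare dropoff and that same taxi’s next pickup, when that pickup is also at O’Hare. The sketch below assumes hypothetical field names (`taxi_id`, `start`, `end`, `pickup_area`, `dropoff_area`) and Community Area 76 for O’Hare; the rows are made up.

```python
from collections import defaultdict
from datetime import datetime

OHARE_AREA = 76  # assumption: O'Hare's Community Area number


def airport_wait_minutes(trips):
    """Gap in minutes between an O'Hare dropoff and the same taxi's next O'Hare pickup."""
    by_taxi = defaultdict(list)
    for t in trips:
        by_taxi[t["taxi_id"]].append(t)

    waits = []
    for taxi_trips in by_taxi.values():
        taxi_trips.sort(key=lambda t: t["start"])
        # Pair each trip with the one that follows it for the same taxi.
        for current, nxt in zip(taxi_trips, taxi_trips[1:]):
            if current["dropoff_area"] == OHARE_AREA and nxt["pickup_area"] == OHARE_AREA:
                waits.append((nxt["start"] - current["end"]).total_seconds() / 60)
    return waits


# Made-up rows: taxi "a" waits at the airport, taxi "b" leaves.
toy_trips = [
    {"taxi_id": "a", "start": datetime(2016, 3, 7, 8, 0), "end": datetime(2016, 3, 7, 8, 45),
     "pickup_area": 8, "dropoff_area": 76},
    {"taxi_id": "a", "start": datetime(2016, 3, 7, 10, 45), "end": datetime(2016, 3, 7, 11, 30),
     "pickup_area": 76, "dropoff_area": 8},
    {"taxi_id": "b", "start": datetime(2016, 3, 7, 9, 0), "end": datetime(2016, 3, 7, 9, 30),
     "pickup_area": 32, "dropoff_area": 76},
    {"taxi_id": "b", "start": datetime(2016, 3, 7, 10, 0), "end": datetime(2016, 3, 7, 10, 20),
     "pickup_area": 8, "dropoff_area": 8},
]
waits = airport_wait_minutes(toy_trips)
```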

However, the wait time varies depending on the time of the day and the day of the week. The chart below shows the average waiting time along with the +/- one standard deviation region shaded in light blue.
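A sketch of how an average and a one-standard-deviation band could be built per day of the week and hour. The input format, a list of `(dropoff_time, wait_minutes)` pairs, is an assumption, and the values are made up. A bucket with a single observation gets a spread of zero here, which is one reason sparse hours would show almost no shading.

```python
from collections import defaultdict
from datetime import datetime
from statistics import mean, stdev


def wait_profile(observations):
    """Map (weekday, hour) to (mean - sd, mean, mean + sd) of wait minutes."""
    buckets = defaultdict(list)
    for dropoff_time, wait in observations:
        buckets[(dropoff_time.weekday(), dropoff_time.hour)].append(wait)

    profile = {}
    for key, waits in buckets.items():
        avg = mean(waits)
        # stdev needs at least two points; treat a lone point as zero spread.
        spread = stdev(waits) if len(waits) > 1 else 0.0
        profile[key] = (avg - spread, avg, avg + spread)
    return profile


# Made-up observations; 2016-03-07 is a Monday (weekday 0).
toy_obs = [
    (datetime(2016, 3, 7, 9, 0), 100.0),
    (datetime(2016, 3, 7, 9, 15), 140.0),
    (datetime(2016, 3, 7, 9, 45), 120.0),
    (datetime(2016, 3, 7, 2, 0), 15.0),
]
profile = wait_profile(toy_obs)
```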

Recall from earlier that there aren’t many pickups late at night, after midnight, so there is very little shading during these time periods.





So what’s the best strategy?

Now we know that taxi trips from the airport to elsewhere in Chicago are valuable trips, but that they require a driver to wait for two hours on average. So, does it make sense to wait or leave the airport and look for fares elsewhere? We can use the data to figure out what the best course of action is on average.

First, we can split the drivers that drop off at the airport into two categories: (1) drivers who wait in the line, and (2) drivers who leave the airport and search elsewhere for fares.
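The split can be sketched as a small function that looks at a taxi’s next trip after an airport dropoff. The field names, Community Area 76 for O’Hare, and the choice to return `None` for sequences that can’t be classified are assumptions for this example; the five-hour cutoff comes from the “Additional points” section.

```python
from datetime import datetime, timedelta

OHARE_AREA = 76                  # assumption: O'Hare's Community Area number
MAX_WAIT = timedelta(hours=5)    # cutoff described under "Additional points"


def classify(dropoff_trip, next_trip):
    """Return 'wait', 'leave', or None when the sequence can't be classified."""
    if next_trip is None:
        return None  # no later trip on record
    if next_trip["pickup_area"] != OHARE_AREA:
        return "leave"
    gap = next_trip["start"] - dropoff_trip["end"]
    # An O'Hare pickup after more than the cutoff is left unclassified.
    return "wait" if gap <= MAX_WAIT else None


# Made-up trips for illustration.
dropoff = {"end": datetime(2016, 3, 7, 9, 0)}
waited = {"start": datetime(2016, 3, 7, 11, 0), "pickup_area": OHARE_AREA}
left = {"start": datetime(2016, 3, 7, 9, 30), "pickup_area": 8}
too_long = {"start": datetime(2016, 3, 7, 15, 0), "pickup_area": OHARE_AREA}
labels = [classify(dropoff, waited), classify(dropoff, left),
          classify(dropoff, too_long), classify(dropoff, None)]
```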

Then, for every driver that waits at O’Hare, we can figure out both (a) how long they wait in line, and (b) the duration of their next trip. We can take the sum of (a) and (b) to figure out the total time it takes for a driver to wait in line and complete their next trip. We can then take the average of these occurrences for each hour of each day of the week.

Next, we can use those average total times as the comparison window and figure out how much a driver who falls into group (2) earns during that same window. Again, we can do this for each hour of each day. The chart below lays out these two scenarios.

Finally, we can take the total fares earned during the time periods and come up with an hourly rate. This normalizes any differences in time durations, since we saw that wait times at the airport vary by time of day.
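To make the normalization concrete, here is a small worked example. The $43 fare and the two-hour wait are the averages quoted earlier in the post; the 40-minute trip duration and the leaver’s fares are made-up placeholders, so the resulting rates illustrate the arithmetic only and are not results from the data.

```python
def hourly_rate(total_dollars, total_minutes):
    """Convert earnings over a window of minutes into dollars per hour."""
    return total_dollars / (total_minutes / 60)


# Scenario 1: wait at the airport, then take the airport trip.
wait_minutes = 120        # average wait quoted in the post
trip_minutes = 40         # made-up trip duration
airport_fare = 43.0       # average O'Hare fare quoted in the post
window_minutes = wait_minutes + trip_minutes

# Scenario 2: leave and take shorter trips during the same window.
leaver_fares = [10.0, 8.0, 12.0]   # made-up fares

wait_rate = hourly_rate(airport_fare, window_minutes)
leave_rate = hourly_rate(sum(leaver_fares), window_minutes)
```

Both scenarios are divided by the same window length, so the comparison is between dollars earned over equal amounts of time.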

After crunching the numbers, it does make sense for a driver to wait. A driver will on average earn more in fares by waiting at the airport compared to leaving, especially during the work week. This result is statistically significant for most periods. The difference is generally not statistically significant in the early morning hours, when there aren’t many sample points, and on Thursday mornings, when it’s basically a wash between the two choices. The chart below plots the average hourly fares for each scenario.
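The post doesn’t say which significance test was used. As one possible approach, the sketch below computes Welch’s t statistic and its approximate degrees of freedom for two samples of hourly rates, using only the standard library. The sample values are made up, and turning the statistic into a p-value would need a t-distribution table or a statistics library.

```python
from math import sqrt
from statistics import mean, variance


def welch_t(a, b):
    """Return Welch's t statistic and Welch-Satterthwaite degrees of freedom."""
    va = variance(a) / len(a)   # squared standard error of sample a
    vb = variance(b) / len(b)   # squared standard error of sample b
    t = (mean(a) - mean(b)) / sqrt(va + vb)
    df = (va + vb) ** 2 / (va ** 2 / (len(a) - 1) + vb ** 2 / (len(b) - 1))
    return t, df


# Made-up hourly rates for one hour-of-week bucket.
wait_rates = [18.0, 20.0, 22.0, 19.0, 21.0]
leave_rates = [14.0, 15.0, 13.0, 16.0, 12.0]
t_stat, dof = welch_t(wait_rates, leave_rates)
```

Welch’s version is a natural fit here because the two groups can have very different sample sizes and variances, especially in the sparse overnight hours.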

Note that the hourly figures late at night spike. This is from a very small sample set where the waiting time is sometimes 15 minutes or less. We're talking about only a handful of rides during these windows, as opposed to hundreds or thousands of samples during the day.

Looking at tips, a driver also earns more by waiting than leaving, on average. The chart below plots the average tips for each scenario.

Overall, waiting at O'Hare earns the driver on average an extra $13.80 in fares and tips relative to leaving the airport. The long wait is more than made up for by the higher fare and tip compared with leaving the airport and taking several shorter trips.

Also keep in mind that waiting at the airport versus leaving has implications for expenses. Waiting at the airport doesn’t use fuel and doesn’t put wear and tear on the vehicle. It also gives the driver time to do other things. So waiting has some additional utility, and leaving has some additional costs, that aren’t captured in this analysis.





Additional points

A few words on some of the assumptions and other points.

First, the timing data for a trip's pickup and dropoff is only provided in 15-minute windows, which will introduce some error vs. the true underlying data.

Second, the City of Chicago imposes a $4 tax on rides that begin at O’Hare. In the data, the $4 tax is accounted for separately from the fare and tip, so it does not have an impact on the analysis as shown from the driver’s perspective.

Third, I had to pick a cutoff when determining how long a driver waited at the airport. I chose five hours. There could be instances where a driver waited longer. There could also be instances where a driver dropped someone off, went and had lunch, and then came back to wait at the airport, which would make the wait time appear longer than it really was. Ideally there would be data for when the driver was on or off duty, but that isn’t available so I had to draw the line somewhere.

Fourth, for drivers that left the airport, I excluded instances where there were no pickups in the relevant window. This might occur, for example, if the driver went off duty after the airport dropoff. There’s no way to know for sure without more information.

Finally, and related to the last point, I excluded sequences where a driver left the airport but did not have a ride that ended within 80% of the relevant window. This is to try to control for instances where a driver may have made one more trip and then gone off duty. The chart below tries to illustrate this point. This drastically reduced the number of available data points for drivers that left the airport.
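The wording of this last rule leaves some room for interpretation. The sketch below takes one reading of it: keep a sequence only if at least one of the driver’s trips ends in the final 20% of the window, meaning the driver was still working after 80% of the window had elapsed. That reading, the function name, and the sample times are all assumptions.

```python
from datetime import datetime, timedelta


def covers_window(window_start, window_length, trip_end_times, min_fraction=0.8):
    """True if some trip ends inside the window, at or after min_fraction of it."""
    threshold = window_start + window_length * min_fraction
    window_end = window_start + window_length
    return any(threshold <= end <= window_end for end in trip_end_times)


# Made-up example: a 160-minute window starting at 9:00, so the threshold is 11:08.
start = datetime(2016, 3, 7, 9, 0)
length = timedelta(minutes=160)

active_driver = [datetime(2016, 3, 7, 9, 40), datetime(2016, 3, 7, 10, 30),
                 datetime(2016, 3, 7, 11, 15)]
one_and_done = [datetime(2016, 3, 7, 9, 40)]

kept = covers_window(start, length, active_driver)
dropped = not covers_window(start, length, one_and_done)
```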