Our customers rely on Cityflo for their daily commute, so we work hard to keep our vehicle tracking system accurate and predictable. Showing an accurate arrival time is a big part of the experience our users expect from us.

The Problem

We use Google's Advanced Directions API to calculate travel time between two locations. Google is the default choice here because of the accuracy of their real-time traffic data. We used this API to calculate the ETA (Estimated Time of Arrival) separately for every vehicle at each of its pickup stops, calling the API individually every time the stop location was registered. Until a couple of months back, this API was free to use.

After a change in their pricing structure, it would have cost us roughly $960/day just to show the ETA to our users.

This assumes 150 buses, each with an average of 8 pickup stops tracked over a 20-minute window. That amounts to roughly 96,000 API calls a day. At $10 per 1,000 calls for the Google Directions API, the bill comes to about $960.
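The arithmetic behind that estimate can be sketched as follows. The figure of 80 calls per stop is not stated above; it is an assumption implied by the other numbers (96,000 calls across 150 buses and 8 stops), which would correspond to, say, one refresh every 15 seconds over the 20-minute window. The function name is hypothetical.

```python
def daily_directions_cost(buses, stops_per_bus, calls_per_stop, usd_per_1000_calls):
    """Return (total API calls, cost in USD) for per-bus, per-stop ETA lookups."""
    calls = buses * stops_per_bus * calls_per_stop
    cost = calls / 1000 * usd_per_1000_calls
    return calls, cost


# calls_per_stop=80 is an assumed value inferred from the stated totals.
calls, cost = daily_directions_cost(
    buses=150, stops_per_bus=8, calls_per_stop=80, usd_per_1000_calls=10.0
)
print(calls, cost)  # 96000 960.0
```

Note that the cost grows linearly with every factor, including the number of buses, which is what made this approach hard to scale.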

As a young startup, this was clearly not feasible for us. We could have brought the cost down marginally by refreshing the traffic data less often, but that would have come at the cost of arrival time accuracy. A sudden spike in traffic would have shown up late in our system, which was not an acceptable experience.

We did explore some open source alternatives such as OpenStreetMap, but they could not do what Google does. We decided to stick with the Google Maps platform and re-engineer our approach instead.

Our Solution

Unlike cabs, our buses follow fixed routes, and all stops in a region are covered by multiple buses in the fleet. We identified a pattern in the movement of our vehicles. This pattern enabled us to compute travel time between adjacent stops, regardless of which bus was travelling. It was our route design that essentially helped us decouple bus location updates from travel time computation, thereby eliminating redundant calculations.

If a bus travels directly from one stop to the next, we call those two stops adjacent.

For instance, assume a network of stops as depicted in the following picture.

A network of 6 stops in a region

Here, the pairs of adjacent stops would be (A, B), (B, C), (D, C), (C, F) and (E, F). By calculating travel time between adjacent stops, we are able to apply the same data to all vehicles. If a bus at Stop A is also headed to Stop F, its arrival time at Stop F can be calculated by adding up the travel times of (A, B), (B, C) and (C, F).

\[ t(A, F) = t(A, B) + t(B, C) + t(C, F) \]
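A minimal sketch of this aggregation is shown here. The routes, the travel times and the function names are all hypothetical, chosen only to be consistent with the six-stop network described above; in practice each pair's travel time would come from one Directions API lookup shared by every bus.

```python
def adjacent_pairs(routes):
    """Collect every ordered pair of consecutive stops across all routes."""
    pairs = set()
    for route in routes:
        pairs.update(zip(route, route[1:]))
    return pairs


def travel_time(route, pair_times_s):
    """Sum the travel times of consecutive stop pairs along a route."""
    return sum(pair_times_s[pair] for pair in zip(route, route[1:]))


# Hypothetical routes that produce the five adjacent pairs listed above.
routes = [["A", "B", "C", "F"], ["D", "C", "F"], ["E", "F"]]

# Hypothetical travel times in seconds, one entry per adjacent pair.
pair_times_s = {
    ("A", "B"): 240,
    ("B", "C"): 180,
    ("C", "F"): 300,
    ("D", "C"): 200,
    ("E", "F"): 150,
}

print(sorted(adjacent_pairs(routes)))
print(travel_time(["A", "B", "C", "F"], pair_times_s))  # 240 + 180 + 300 = 720
```

Because the dictionary is keyed by stop pair rather than by bus, adding another bus on an existing route adds no new entries to refresh.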

It goes without saying that there is no silver bullet.

The catch is this: when a bus is somewhere between Stop A and Stop B, how do we calculate its ETA at Stop F?

We decided to solve this problem using linear interpolation.

If we can identify the relative position of a bus between two stops, we can estimate its arrival time. For example, if a bus has covered half the distance between two stops, we assume it will need the remaining half of the travel time between those stops to reach the other end.
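The interpolation can be sketched as below. This is an illustration under stated assumptions, not our production code: the function and parameter names are hypothetical, distance along the segment is assumed to be known already, and speed is assumed to be uniform across the segment.

```python
def remaining_time_s(distance_covered_m, segment_length_m, segment_time_s):
    """Linearly interpolate the time left to reach the end of the current segment."""
    if segment_length_m <= 0:
        raise ValueError("segment_length_m must be positive")
    # Clamp to [0, 1] so GPS noise cannot produce a negative or oversized share.
    fraction = min(max(distance_covered_m / segment_length_m, 0.0), 1.0)
    return (1.0 - fraction) * segment_time_s


def eta_s(distance_covered_m, segment_length_m, segment_time_s, downstream_times_s):
    """Time left on the current segment plus all remaining adjacent-stop segments."""
    current = remaining_time_s(distance_covered_m, segment_length_m, segment_time_s)
    return current + sum(downstream_times_s)


# Hypothetical numbers: a bus 600 m into a 1200 m segment (A, B) that takes 240 s,
# followed by (B, C) at 180 s and (C, F) at 300 s.
print(eta_s(600, 1200, 240, [180, 300]))  # 120 + 180 + 300 = 600.0
```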

The Directions API provides a polyline - a set of location coordinates in a specific order - between adjacent stops, which helps us pinpoint the location of the bus. To improve accuracy while interpolating, we adjust these polylines to make the spacing of their points more uniform. For example, polylines are sparser on straight roads and denser on winding ones. So, we ensure a maximum of 20 meters between two consecutive coordinates.
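One way to enforce the 20-meter spacing and turn a bus position into a fraction of the segment is sketched below. Everything here is an assumption for illustration: the helper names, the haversine distance, the straight-line interpolation between coordinates (reasonable only over short gaps), and the naive nearest-point lookup, which ignores the harder cases discussed in the next paragraph. Points are `(latitude, longitude)` tuples in degrees, and the polyline is assumed to be already decoded and non-empty.

```python
import math

EARTH_RADIUS_M = 6_371_000


def haversine_m(p, q):
    """Great-circle distance in meters between two (lat, lon) points in degrees."""
    lat1, lon1 = map(math.radians, p)
    lat2, lon2 = map(math.radians, q)
    a = (math.sin((lat2 - lat1) / 2) ** 2
         + math.cos(lat1) * math.cos(lat2) * math.sin((lon2 - lon1) / 2) ** 2)
    return 2 * EARTH_RADIUS_M * math.asin(math.sqrt(a))


def densify(polyline, max_gap_m=20.0):
    """Insert evenly spaced points so consecutive points are at most max_gap_m apart."""
    if not polyline:
        return []
    dense = [polyline[0]]
    for start, end in zip(polyline, polyline[1:]):
        pieces = max(1, math.ceil(haversine_m(start, end) / max_gap_m))
        for i in range(1, pieces):
            t = i / pieces
            dense.append((start[0] + t * (end[0] - start[0]),
                          start[1] + t * (end[1] - start[1])))
        dense.append(end)
    return dense


def progress_fraction(polyline, bus_position):
    """Fraction of the polyline covered, using the polyline point nearest to the bus."""
    cumulative = [0.0]
    for start, end in zip(polyline, polyline[1:]):
        cumulative.append(cumulative[-1] + haversine_m(start, end))
    nearest = min(range(len(polyline)),
                  key=lambda i: haversine_m(polyline[i], bus_position))
    return cumulative[nearest] / cumulative[-1] if cumulative[-1] > 0 else 0.0


# Hypothetical straight segment of roughly 111 m with only its two end points.
sparse = [(19.000, 72.800), (19.001, 72.800)]
dense = densify(sparse)
print(len(dense))
print(progress_fraction(dense, (19.0005, 72.800)))
```

With denser points, the nearest-point lookup can be off by at most about half the maximum gap, which is why uniform spacing improves the interpolation.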

Identifying where the bus is on a polyline can be a complex problem if the stop network and bus movement are even slightly convoluted. For example, if a bus registers a location at a stop while travelling on the opposite side of the road, the system will fail. However, that story is for a separate blog post.

The Outcome

This approach made the number of API calls independent of the number of vehicles, which lets us scale up our fleet at no additional cost. The cost now depends only on the number of stops in the network. After multiple iterations of tuning and tailoring for our use case, we observed at least a 94% reduction in our Google Directions API cost.

What's Next?

We’ve been able to get to a capable and usable system, but there’s still more work to do. We can further optimise it by altering the refresh rate of different stop pairs by observing a trend in the travel time between them. If the travel time variation between two stops is low, we can decrease the refresh rate and vice versa.
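As a rough sketch of that idea, the refresh interval for a stop pair could be derived from how much its recent travel times vary. This is an illustration of something we have not built yet: the function name, the interval bounds and the variation threshold are all hypothetical values.

```python
from statistics import mean, pstdev


def refresh_interval_s(recent_times_s, min_interval_s=15, max_interval_s=300,
                       cv_for_min_interval=0.25):
    """Pick a refresh interval: stable pairs refresh rarely, volatile pairs often."""
    # With too little history, fall back to the most frequent refresh.
    if len(recent_times_s) < 2 or mean(recent_times_s) <= 0:
        return min_interval_s
    # Coefficient of variation: spread of travel times relative to their mean.
    cv = pstdev(recent_times_s) / mean(recent_times_s)
    volatility = min(cv / cv_for_min_interval, 1.0)
    return max_interval_s - volatility * (max_interval_s - min_interval_s)


print(refresh_interval_s([200, 200, 200]))  # no variation, so the longest interval
print(refresh_interval_s([100, 300]))       # high variation, so the shortest interval
```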

This was a very interesting problem for our product and engineering team to work on. This helped us maintain a high quality Cityflo experience while enabling constant innovation.