Enter Morgan Freeman

Our error checker analyzes agencies’ real-time feeds as the data comes in, identifying errors proactively instead of waiting for user reports. If Robot Morgan Freeman discovers an error, he automatically emails the agency, detailing what’s wrong. But how does he discover real-time errors in the first place?

To publish real-time data, Transit relies on two different agency feeds: their real-time API (usually in the “GTFS-RT” format) and their static GTFS. The API publishes real-time vehicle locations and predictions. We match that real-time data to the agency’s static GTFS, which contains timetables, route info, and so on. By matching the API to the GTFS, we know which vehicle corresponds to which transit line and scheduled departure. If the API data and the GTFS data don’t match up… we’re in trouble.
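To make the matching step concrete, here is a minimal Python sketch, assuming the static side is a GTFS `trips.txt` file (whose standard columns include `route_id`, `service_id`, and `trip_id`) and that the real-time side has already been decoded into simple records. The `VehicleUpdate` type, the function names, and the sample IDs are hypothetical illustrations, not Transit’s actual code. A lookup that returns `None` is the failure case: a vehicle with a known position but no known route.

```python
from __future__ import annotations

import csv
import io
from dataclasses import dataclass
from typing import Optional

# Made-up excerpt of a static GTFS trips.txt. route_id, service_id and
# trip_id are standard trips.txt columns; the values are hypothetical.
TRIPS_TXT = """route_id,service_id,trip_id
route_1,weekday,trip_0800
route_1,weekday,trip_0815
"""


@dataclass
class VehicleUpdate:
    """Hypothetical, simplified stand-in for one GTFS-RT vehicle position."""
    vehicle_id: str
    trip_id: str
    lat: float
    lon: float


def load_trip_index(trips_txt: str) -> dict[str, str]:
    """Map each static trip_id to the route_id it belongs to."""
    reader = csv.DictReader(io.StringIO(trips_txt))
    return {row["trip_id"]: row["route_id"] for row in reader}


def match_route(update: VehicleUpdate, trip_index: dict[str, str]) -> Optional[str]:
    """Return the route for a real-time update, or None if its trip_id is unknown.

    None means we know where the vehicle is, but not which route
    or scheduled trip it is serving.
    """
    return trip_index.get(update.trip_id)


if __name__ == "__main__":
    index = load_trip_index(TRIPS_TXT)
    print(match_route(VehicleUpdate("bus_42", "trip_0800", 40.757, -73.990), index))  # route_1
    print(match_route(VehicleUpdate("bus_43", "RT-8812", 40.757, -73.990), index))    # None
```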

To make sure there are no discrepancies, Morgan Freeman pings one random trip on every agency feed every few minutes. First, he checks that the API data is fresh (no older than 10 minutes). Then he checks that the API and GTFS data match up. If Morgan notices that something is wrong, he flags the API and checks it more thoroughly for errors: instead of pinging just one random trip, he starts pinging LOTS of trips on that feed. Usually, where there’s smoke, there’s fire.
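The spot-check routine can be sketched as a single check function. This is a hedged illustration, not Robot Morgan Freeman’s real implementation: `FeedSnapshot`, `check_feed`, and `ESCALATED_SAMPLE_SIZE` (a made-up number standing in for “LOTS of trips”) are hypothetical, and running the check every few minutes is left to whatever scheduler you use. Only the 10-minute freshness limit comes from the text. For simplicity, the sketch escalates only on trip-ID mismatches, since staleness is a property of the whole feed rather than of individual trips.

```python
from __future__ import annotations

import random
import time
from dataclasses import dataclass

MAX_FEED_AGE_S = 10 * 60     # "no older than 10 minutes", per the text
ESCALATED_SAMPLE_SIZE = 50   # hypothetical stand-in for "LOTS of trips"


@dataclass
class FeedSnapshot:
    """Hypothetical, simplified view of one agency feed at check time."""
    header_timestamp: float       # POSIX seconds, like GTFS-RT's FeedHeader.timestamp
    rt_trip_ids: list[str]        # trip_ids referenced by the real-time feed
    static_trip_ids: set[str]     # trip_ids defined in the static GTFS


def unmatched(trip_ids: list[str], static_trip_ids: set[str]) -> list[str]:
    """Real-time trip_ids with no counterpart in the static GTFS, sorted."""
    return sorted({t for t in trip_ids if t not in static_trip_ids})


def check_feed(feed: FeedSnapshot, now: float | None = None,
               rng: random.Random | None = None) -> list[str]:
    """Spot-check one feed; an empty list means no problems were found."""
    now = time.time() if now is None else now
    rng = rng or random.Random()
    if not feed.rt_trip_ids:
        return ["empty real-time feed"]
    # Step 1: is the real-time data fresh?
    if now - feed.header_timestamp > MAX_FEED_AGE_S:
        return ["stale real-time data"]
    # Step 2: does one random real-time trip match the static GTFS?
    if not unmatched([rng.choice(feed.rt_trip_ids)], feed.static_trip_ids):
        return []
    # Smoke, so look for fire: check many trips on the same feed.
    k = min(ESCALATED_SAMPLE_SIZE, len(feed.rt_trip_ids))
    bad = unmatched(rng.sample(feed.rt_trip_ids, k), feed.static_trip_ids)
    return [f"trip_id not in static GTFS: {t}" for t in bad]
```

Passing `now` and a seeded `random.Random` makes the check deterministic in tests; in production both can be left at their defaults.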

So what sort of errors does Morgan encounter? For one, there are mismatched trip-IDs. Trip-IDs are used to identify specific trips in the data feed. They differentiate between, say, a bus that’s scheduled to leave Port Authority at 08:00 and the one scheduled to leave at 08:15. Sometimes, the trip-IDs published by the real-time API are different from the ones in the GTFS. This creates problems for Transit: if we can’t match the GTFS-RT data to the static GTFS trip data, we’ll know where a specific vehicle is, but we’ll have no way of knowing what route it’s assigned to or what trip it’s supposed to be on. So when you look up that line, you won’t be able to get real-time data.

One common reason for mismatched trip-IDs? The agency is publishing real-time data from one vendor (the company that equipped its vehicles with GPS), while its GTFS is exported by another vendor (the company that made the agency’s scheduling software). These vendors don’t always play nicely together.

Mismatched trip-IDs are by far the most common errors we encounter, but other errors include empty API responses and bad URLs.

An empty API response is even worse than a mismatched trip-ID: it means there’s no real-time data to match at all! These errors can happen if an agency’s server goes down. Then there are bad URLs: when we try to download a transit file, the URL is inaccessible or corrupt, or the data it returns can’t be parsed.
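Empty responses and bad URLs can be told apart at download time. The sketch below, assuming Python’s standard `urllib` for fetching, sorts a download attempt into hypothetical labels: a malformed URL, an unreachable one, an empty body, or a body the parser rejects. `classify_feed`, `default_fetch`, and the labels are illustrative names. The `parse` argument is an assumption too: for a GTFS-RT feed it would be a protobuf decode of the `FeedMessage` type (for example via the `gtfs-realtime-bindings` package), and for a static GTFS file it could be a check that the zip archive opens.

```python
from __future__ import annotations

import urllib.request
from typing import Callable, Optional


def default_fetch(url: str, timeout: float = 30.0) -> bytes:
    """Download a feed body with the standard library."""
    with urllib.request.urlopen(url, timeout=timeout) as resp:
        return resp.read()


def classify_feed(url: str,
                  parse: Callable[[bytes], object],
                  fetch: Callable[[str], bytes] = default_fetch) -> Optional[str]:
    """Return None if the feed downloads and parses, else a short error label.

    The labels are hypothetical; they mirror the error types in the text.
    """
    try:
        body = fetch(url)
    except ValueError:
        # urllib raises ValueError for malformed URLs (e.g. no scheme).
        return "bad URL"
    except OSError:
        # URLError, HTTPError and timeouts are all OSError subclasses.
        return "URL inaccessible"
    if not body:
        # The server answered, but with nothing: no real-time data to match.
        return "empty response"
    try:
        parse(body)
    except Exception:
        # Any parser failure means the data itself is unusable.
        return "unparsable data"
    return None
```

Injecting `fetch` keeps the classification logic testable without a network: `classify_feed("x", parse=int, fetch=lambda u: b"")` returns `"empty response"`.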