Introduction

The question of service quality has been a central thread on this site more or less since its inception. It is not enough to have service on a street (or even in a subway or on a private right-of-way) if it shows up unpredictably, or cannot be used because it is overcrowded or short-turns before reaching many riders’ destinations.

For as long as I can remember, the TTC’s stock excuse for poor service was “traffic congestion” coupled with “it is impossible to provide good service with streetcars running in mixed traffic”. When detailed information about vehicle movements on the transit system became available, it was quickly evident that congestion was only one problem. Moreover, some bus routes on wide avenues exhibited service quality almost indistinguishable from that of streetcars tethered to rails on narrow streets.

After a period when Toronto supported more spending on transit to improve loading standards and hours of service, the city swung to the right, treating transit service as a waste of taxpayer dollars. Despite cutbacks that could throttle demand, transit riding continues to rise, and with it the problems of service quality. Much of the service improvement we do see is funded not by subsidies but by fare revenue, not to mention by overcrowding.

The TTC has focused much effort on the “soft” improvements (cleanliness, information systems and customer relations), but for the really important one, the service it actually provides to riders, the jury is still out. The situation is compounded by the budget constraints of the Ford/Stintz era: just getting by with trims around the edges, with no sense of a plan to make substantial improvements.

The time is overdue for a clear direction on improving transit service. The answer is not just to run more buses or build more subways, although service improvements are needed. We must also run the buses and streetcars we have more reliably.

The common thread through measurement schemes is that a transit system must be viewed from the passenger’s point of view. They are the people actually riding and telling their car-driving friends how good or bad transit is. In Toronto, at least, the riders are also substantially paying for the service.

How should we measure how the system is performing now and in the future?

For those who do not want to read to the end, no, I do not have a grab bag of solutions, a “right way” to do things. What we do need is a better understanding of how the system behaves at a detailed level — are there specific problems on individual routes that can be removed or at least lessened, and are there systematic problems with transit operations?

Some issues are external — there really is traffic congestion — but the question to answer is how we will deal with it. Will transit priority really take precedence at a possible cost to other road users? Some issues are internal — is there really enough service on the road, and could these vehicles be better managed? What improvements will riders accept with glee — service reliability — and which will they regard as “nice to haves” that don’t address the underlying problem that “my streetcar never shows up when I need one”.

Detailed reporting together with measurements that riders can understand are essential to maintain the transparency and credibility of a transit agency. One common element through this review of many systems and papers is that any measurements should be based on what the rider sees, not on management’s view and goals. The purpose should not be to trumpet how good Toronto’s transit is, but to find how to make it better.

Acknowledgements

Transit service quality has been a topic for others, notably Jarrett Walker and his blog Human Transit. For starters, his articles on on-time performance are worth a read.

Several transit systems cited in this article have their own approach to measuring and reporting on service quality. Of these, the most extensive is found in London, UK, in part because monitoring their private sector operators requires detailed metrics that people can understand and agree to.

A technical approach can be found in the Transportation Research Board’s Transit Capacity and Quality of Service Manual (2nd edition, 2003). Although the metrics proposed by TRB are more complex than most systems would be likely to implement, the underlying discussion makes several important points about aspects of service quality.

The American Public Transit Association Service Quality Handbook (revised 2011) builds on the TRB report. It goes into great detail about the many factors affecting a rider’s perception of transit service, but sidesteps actually defining metrics for those factors. Moreover, it spends a disproportionate amount of time on organizational, big-picture issues, and the managerial focus drifts a bit too far from day-to-day reality for my liking.

Additional papers of interest are listed at the end of the article.

What the TTC Does

Every month, the TTC Chief Executive Officer’s Report includes a “scoreboard” showing the behaviour of various transit operating factors relative to their targets. Among these are reliability measures for the rapid transit lines, streetcar and bus systems. A subset of this information is published in a daily report on the TTC’s website.

The TTC’s system target is that service should operate within ±3 minutes of the scheduled headway to be counted as “punctual”. For reasons best known to the TTC, there is an inconsistency between the CEO’s Report and the Daily Report. The CEO’s report sets the standard relative to scheduled times while the Daily Report claims that it is relative to scheduled headway.

These are not the same values. Service that is on time will necessarily also meet the scheduled headway target, but not the other way around. Riders don’t care that buses are on time, only that they arrive at the advertised spacing and, for infrequent routes, at the advertised time. Every bus on a route may be 20 minutes late, but provide the expected level of service. For such an important measure, the TTC should at least be consistent, and routes should be managed to the alleged target.
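To see why these are different yardsticks, consider a toy example: a route where every bus runs exactly 20 minutes late fails schedule adherence completely but passes headway adherence perfectly. The sketch below is only an illustration (the TTC’s actual calculation is unpublished, and all names here are mine), using the ±3 minute window as the one published parameter.

```python
# Toy comparison of schedule adherence vs headway adherence.
# Not the TTC's (unpublished) methodology; the ±3 minute window is
# the only parameter taken from the published standard.

SCHED = [0, 5, 10, 15, 20, 25]        # scheduled departures (minutes)
ACTUAL = [t + 20 for t in SCHED]      # every bus exactly 20 minutes late

def on_time(sched, actual, window=3.0):
    """Fraction of trips within ±window minutes of their scheduled time."""
    return sum(abs(a - s) <= window for s, a in zip(sched, actual)) / len(sched)

def headway_adherent(sched, actual, window=3.0):
    """Fraction of headways within ±window minutes of the scheduled headway."""
    sched_h = [b - a for a, b in zip(sched, sched[1:])]
    actual_h = [b - a for a, b in zip(actual, actual[1:])]
    return sum(abs(a - s) <= window for s, a in zip(sched_h, actual_h)) / len(sched_h)

print(on_time(SCHED, ACTUAL))           # 0.0: every trip counts as late
print(headway_adherent(SCHED, ACTUAL))  # 1.0: every headway is perfect
```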

For rapid transit operations, the target is to have 96% of trips within the target headway range. For streetcars and buses, the targets are 70% and 65% respectively. These values were not set by some clearly understood formula, but rather they are based on historical values of the indices. There has never been a discussion of what these targets mean in terms of service from the customers’ viewpoint, nor of the degree or type of change in operations needed to improve the values.

The underlying methodology of calculating these measures has not been published, but it is not hard to see that regardless of the numerical results, the values are a very crude way to measure system performance.

On the Subway

The subway target is 96% of trips within ±3 minutes of scheduled headway, but:

Values are all-day averages combining observations at multiple locations. The TTC’s calculation weights peak period service at 2/3 of the consolidated index even though the peak represents a minority of the total service hours and trips provided.

Riders do not experience average service, but specific levels of quality (or lack thereof) at specific times and places. The real question to be asked is “how often is a rider likely to encounter a problem in making their trip”, and this is not answered by a system-wide measure. If a route like the Yonge Subway is measured at several points, there is a good chance it is running “normally” at many of them even though a major delay may foul service in a specific segment. Good service at Wilson Station is of little use to someone trying to get through Bloor-Yonge, and conversely a delay at Bloor-Yonge may affect riders making a wide variety of origin/destination trips passing through that critical site.

Where frequent service is scheduled such as on the subway, a major disruption is required for headways to go beyond the 3-minute margin for an extended period. Even if there is a delay, only the first train carries a wide headway while those behind it follow as close as the signal system will allow and are counted as “punctual” trips.

There is no measure of service quantity. Trains are scheduled every 140 seconds at peak (25.71 trains/hour), but they are still “punctual” if they are 320 seconds apart (140+180). A service less than half of what is scheduled (11.25 trains/hour) would get a 100% rating from the TTC’s methodology. Service that runs at less than scheduled capacity, or which is overcrowded, cannot provide riders with the advertised headway. The “headway” seen by a rider is related to the length of time needed to get on a train, not for the first train to appear in a station.
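The arithmetic is easy to check with the figures quoted above:

```python
# Widest headway still counted as "punctual" under the ±3 minute margin.
scheduled = 140                    # seconds between trains at peak
widest_punctual = scheduled + 180  # 320 seconds still meets the target
print(3600 / scheduled)            # 25.71... trains/hour as scheduled
print(3600 / widest_punctual)      # 11.25 trains/hour, yet rated 100%
```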

On the Surface

Surface routes are quite another matter. The targets here are 70% for streetcars and 65% for buses: headways measured at several points along each route should be within the three-minute margin.

Data for all routes, locations and time periods are lumped together in one index. On a system-wide basis, 65-70% is not very impressive, but even this target can mask far worse service quality at specific locations and times.

The absence of forced vehicle spacing (as on the subway) plus wider headways means that surface routes can operate wildly outside of the headway targets.

Surface headways, especially outside the peak period, are long enough that ±3 minutes can be a relatively narrow band, but only for infrequent services. A 20 minute headway on the timetable may range from 17 to 23 minutes and yet be considered “punctual”. Running early is particularly bad for customers because they face a long wait if they miss a scheduled trip running “hot”.

With shorter headways, the six minute band makes bunching acceptable. For example, on a 5 minute headway, as long as vehicles stay within the range of 2 to 8 minutes they are “punctual”. (Alternating 9 and 1 minute headways would be outside of the target range.) If riders arrive at an even rate at a stop, four times as many will accumulate in an 8 minute gap as in a 2 minute gap, and the vehicle they board will be much more crowded. The service will meet the TTC’s standard, but it is the 8 minute headway most riders will see.
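Because waiting riders accumulate in proportion to each gap, the average headway a rider experiences is weighted by the square of the gaps. A small sketch of that calculation (my own illustration, not a TTC measure):

```python
# Rider-experienced headway when riders arrive uniformly at a stop.
# Each gap h collects riders in proportion to h, so the average headway
# riders see is sum(h^2) / sum(h) rather than the simple mean.
headways = [2, 8, 2, 8]                 # alternating, all "punctual"
simple_mean = sum(headways) / len(headways)
experienced = sum(h * h for h in headways) / sum(headways)
print(simple_mean, experienced)         # 5.0 vs 6.8 minutes
```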

As on the subway, there is no measure of service quantity. Instead of 12 vehicles/hour (a 5 minute headway), the service could be 7.5 vehicles/hour (8 minutes apart) and stay within the target. Alternatively, there may be 12 vehicles in the hour, but half of them may short-turn at a location that is of no benefit to many would-be riders. This leads to wider headways for many riders and uneven vehicle loads. Nothing in the service targets makes any allowance for crowding nor for its corollary, pass-ups, where riders cannot board the first vehicle that arrives.

We are all familiar with the problem of a full bus immediately followed by an empty one. From a rider’s point of view, bunched service consists of pairs of an overcrowded bus followed closely by an almost empty one regardless of what the printed timetable might say. That empty bus is providing little real service and yet the TTC will count it in the route’s hypothetical capacity.

Very short headways can be little bonuses (a second bus coming just after you think you have missed one), but most riders arrive at stops in the big gaps, not the little ones. Experienced riders will try to board the first vehicle that arrives even with a second one in sight. If it is short-turned, at least they can drop back to the following vehicle. Greater assurance of getting to their destination takes precedence over getting on a less-crowded vehicle.

If the target for “punctual” service is only 65%, this means that fully 1/3 of the headways provided to customers are outside of the six-minute standard window. On a system wide basis, riders can expect to encounter one of these events on almost every round trip, especially if they transfer and are exposed to irregular headways more than once.

For the statistically minded, if the “punctual” trips are evenly distributed, and the probability of a trip being punctual is 2/3, then for a trip involving two vehicles, the cumulative probability is only 4/9, under 50%, that both will be “punctual”. For a round trip involving a transfer each way, the probability is a scant 16/81 or just under 20%. In other words, there is an 80% chance that a round trip involving a transfer between two bus routes will include at least one off-target gap in service. For a system designed around transfer connections, this is an appalling situation.
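For anyone who wants to verify the arithmetic, it is a one-liner under the stated assumptions of independent legs and an even distribution of “punctual” trips:

```python
p = 2 / 3           # probability that any one leg gets a "punctual" headway
print(p ** 2)       # 0.444...: one-way trip with a transfer (4/9)
print(p ** 4)       # 0.197...: round trip with a transfer each way (16/81)
print(1 - p ** 4)   # 0.802...: chance of at least one off-target gap
```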

In practice, the actual distribution of “punctual” trips is not uniform. A notable issue, visible in many of the route analyses I have published here, is that headway adherence declines during evenings and weekends, and at the outer ends of routes where branching and/or short turns can leave wide gaps. The proportion of “non-punctual” service may be terrible at some times and locations, but this is masked in the service measures by all-day, all-system averaging.

Measuring for the customer or for the company?



Toronto badly needs a way to report on service that reflects daily rider experience, and customer service should focus on providing reliability and the advertised quality and quantity of service.

Since horse-and-carriage days, the simple measure of “did the buses run on time” provided all the information needed for many transit operations. This is an observation at a single point, possibly a dispatch site where trip departures are monitored.

Hitting scheduled time points at major stops en route could depend on an operation’s prevailing culture. This could be a goal shared by management and operators as an integral part of customer service and reliability, or it may be common practice that mid-route time points are neither monitored nor respected. Moreover, managing to headways requires that operators know where they are relative to nearby vehicles, but the information they are provided on vehicle consoles is relative to the schedule.

On routes with infrequent service (smaller transit systems and less important parts of the large ones), on time performance is the overwhelming requirement. Passengers plan their trips around the schedule, and on time performance is vital to system usability. Running early, a practice allowed by the TTC standard, is fatal because a rider may just miss their trip (particularly one calling for connections) and be faced with a long wait until the next bus. Think of the difference between a GO bus running every hour and a Finch bus running (nominally) every few minutes.

The ability to accurately track transit vehicle movements by GPS is a comparatively recent phenomenon, but this is only a tool, not a substitute for actually caring that service is reliable from a rider’s viewpoint. Indeed, the TTC implemented GPS not to manage service, but to drive a stop-announcement system mandated by a legal challenge about accessibility.

Despite a service target nominally based on headways, the TTC remains very much oriented to schedules because of the needs of operators (and by extension their vehicles) for crew changes. The goal of service management is to keep operators on time and, if possible, to maintain a decent headway over at least part of a route. This typically brings short turns and ragged service to the outer ends of lines.

This is not the same as a scheduled operation where, dependably, only half of the service runs through to the end of a route. Moreover, the extremities of a route may have substantial demands in their own right depending on residential, office or academic land uses and travel patterns. Operators may be more or less on time and even on “punctual” headways at some central point on a route, but the service actually provided further out may be much below what is advertised.

Is there already a “standard” way to do this?

There is no industry standard per se, although various attempts, some academic, some professional, have been made to create a framework.

A paper prepared for the TRB’s 2007 annual meeting [International Bus System Benchmarking: Performance Measurement Development, Challenges, and Lessons Learned; Randall, Condry and Trompet] observed that getting consistent data from the transit industry was quite challenging.

Service Quality: Very few common comparators were found across the member organizations in measuring service quality. While it was expected that more subjective indicators, in the areas of information, driver courtesy, comfort and cleanliness, etc., would vary, it was surprising to find little commonality in measurement of time-based performance. Measurement of time-based performance is heavily influenced by the method of service operation. Many of the larger cities have much of their bus service operated on a frequency or headway rather than timetabled basis. Thus, such standard indicators as percent of trips on time are not recorded. Technology was a second important difference, with three of the benchmarking bus organizations fully equipped with AVL systems, which provide much better data in both quantity and quality. Other indicators for measuring service to customers included lost kilometers – the most common data element recorded. But again, not all organizations record this data. Another common indicator, missed trips, is measured by only half of the bus organizations. [p7]

Looking through websites of various transit systems, it is not uncommon to find quality measures based on departure times at subway terminals. The underlying assumption is that if the trains left on time, they will stay more or less on time over their journey. This is a tenuous link to a rider’s viewpoint, especially if there are points of congestion en route, delays caused by breakdowns, or if crowding prevents riders from boarding the first arriving train.

Some systems measure time relative to the schedule, while others look at actual vs planned train spacing (headway) with an upper bound on acceptable deviations.

A related measure is the percentage of trips operated, although these tend to be reported on an all-day basis that smooths out problems with specific locations and time periods. If the signal system permits trains to run much closer together than the scheduled headway, then service could be bunched almost like a surface route.

New York MTA

New York reports various performance factors with current and historical data online. Measures available for New York City Transit at the route level include:

On time performance at terminals.

A measure of wait assessment (headway) defined as the proportion of trips where the actual interval between trains does not exceed the scheduled interval by more than 25% (see the sketch after this list).

For bus routes, a measure of trips completed relative to scheduled.
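As a sketch of how the wait assessment rule might be scored (the 25% threshold comes from the MTA definition above; the function name and sample data are mine):

```python
def wait_assessment(intervals, scheduled):
    """Fraction of actual intervals no more than 125% of the scheduled one."""
    return sum(i <= 1.25 * scheduled for i in intervals) / len(intervals)

# Four observed intervals against a 5 minute scheduled headway:
print(wait_assessment([4, 5, 9, 4], scheduled=5.0))  # 0.75: the 9 fails
```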

Chicago

The CTA has a page dedicated to Performance Metrics with a lot of historical data. (Example: September 2012)

Among measures of interest:

Number of rail system delays greater than 10 minutes. This is an absolute count that expresses major outages with a number riders can understand (delays per month) rather than average on time performance. The important issue is that delays happen, not that, on average, most trips are on time. Historical data series would be affected by major network changes (a new line, for example), but these are rare events.

Percentage of rail system that has slow orders. This reflects a system with a serious backlog of infrastructure maintenance. Slow orders delay riders and can cause bunching of trains if the scheduled service is close to the lower bound imposed by the signal system. (Constant physical spacing of trains at lower speeds means wider headways.)

“Big gaps”. Percentage of bus headways that are double the scheduled interval or over 15 minutes. This metric is not subdivided by route, location or time of day.

“Bunching”. Percentage of bus headways that are 60 seconds or less. (Both bus measures are sketched in code below.)
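A toy classifier for the two bus measures; the thresholds come straight from the CTA definitions above, while the function names and sample data are my own:

```python
def is_big_gap(headway_min, scheduled_min):
    """CTA "big gap": double the scheduled interval, or over 15 minutes."""
    return headway_min >= 2 * scheduled_min or headway_min > 15

def is_bunched(headway_min):
    """CTA "bunching": 60 seconds or less between buses."""
    return headway_min <= 1.0

observed = [1.0, 6.0, 22.0, 0.5, 12.0]   # headways in minutes
print([is_big_gap(h, scheduled_min=10.0) for h in observed])
# [False, False, True, False, False]
print([is_bunched(h) for h in observed])
# [True, False, False, True, False]
```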

Back in 2006, the CTA defined performance measures and included the following important observation:

For service periods with headways 10 minutes or less:

Customers expect to board service shortly after arriving at stop/station.

In these periods, reliability means HEADWAY CONSISTENCY.

For service periods with headways 10 minutes or more:

Customers rely on schedules to time their arrival at the stop or station to avoid long wait times.

In these periods, reliability means SCHEDULE ADHERENCE.

In other words, one metric will not do when measuring service. Most CTA customers traveled on services that ran every 10 minutes or less, and even more did when the filter is extended to 12 minutes. (This may reflect service levels in 2006 without the effects of recent budget-induced cutbacks.)

A graph of running time distributions on page 10 warms my heart because it is exactly the sort of analysis I have been publishing here. Without question, surface routes have big problems with running time reliability, although these vary by time of day. One issue for the CTA is on time departures from terminals, although these improved to over 80% by January 2006. This is also a problem in Toronto, where uneven headways begin with terminal departures that may lie inside the TTC’s target 6-minute band, but which actually result in bunching that can travel the entire length of a route.

Dealing with route-level problems requires a route-specific approach where the sources of delay and uneven running times can be analyzed in detail.

Boston

The MBTA publishes a monthly scorecard with past versions available online. Detailed breakdowns for rapid transit routes and for the bus and commuter rail systems can be viewed by scrolling down.

Subway on time performance is measured near the terminal stations, and trains must be within 1.5 times the scheduled headway. This is tighter than the TTC’s standard (it would yield a window 1’10” on either side of a 2’20” schedule at peak, or a 2’00” window either way on a 4’00” off peak service). However, there is no sense of whether the trains remained acceptably spaced as they travelled along their routes.

The number of trips operated acts as a stand-in for vehicle reliability and availability. As with many other metrics, this is an all-day figure and does not show whether all peak trains actually ran.

On the commuter rail system, a train is considered to be “on time” if it is not more than 5 minutes late at its destination.

Speed restrictions are measured as minutes of delay with no reference to the proportion of the system under slow orders as in Chicago.

Notable by its absence is any reference to surface operations beyond basic stats for the bus fleet and service.

Like Chicago, the MBTA distinguishes between the behaviour of riders on frequent and infrequent routes with the rapid transit lines measured relative to scheduled headways while commuter rail is measured relative to scheduled times.

San Francisco

San Francisco combines transit and traffic operations under a Municipal Transportation Agency. However, most of the “service standards” the agency reports relate to transit services.

Both schedule and headway adherence are reported with greater weight given to routes with more riders.

Like Toronto, San Francisco has target load factors for vehicles, but they also report on the number of peak runs that exceed these targets by 25%. In Toronto, such problems vanish by averaging all riding over all peak trips whether anyone is on them or not.

The “on time” standard in San Francisco is +1/-4 minutes. This type of uneven window is common on other systems where some degree of lateness is tolerated, but being early by more than a trivial amount is not acceptable.

On January 3, 2012, the SFMTA Board approved their 2013-18 Strategic Plan. The City has a Transit First policy and this is reflected in the priorities of the plan.

The most noticeable improvements from this plan will include a faster and more reliable transit system, better bicycle and walking conditions for all age groups, easier access to taxis, more vehicle and ridesharing options, smarter parking solutions and more convenient payment and information options.

This is not just a transit policy but a transportation policy, and that fundamentally changes the context in which transit quality discussions occur.

Goals of this plan include a reduction of bunching and gapping on the “rapid bus network” (bunching and gapping being defined as headways less than 2 minutes, or more than 5 minutes over the scheduled value).

Monthly reports on progress toward the goals are available online (November 2012). On time performance and scheduled departure from terminals are both reported, and these numbers are not very pretty. San Francisco is now using NextBus data to allow for automated collection of this information replacing on-street supervisor surveys used previously.

SF Muni also provides daily reports of service including vehicle and operator availability and details of major delays. Given the relative size of Toronto and San Francisco, and the intensity of transit operations here, such a report would be considerably longer for Toronto. The daily report does not include any line-level review of service quality.

However, detailed studies of some routes have been conducted under the Transit Effectiveness Project. These are micro-level reviews of problem areas along routes in which community involvement is essential for understanding local effects and gaining acceptance of changes. The intent is to fine-tune the operating environment of major routes so that travel times will be reduced and service quality improved. An overview of the program was presented to the SFMTA Board in November 2012.

Washington DC

The Washington Metropolitan Area Transit Authority (WMATA) publishes a summary page and monthly reports (November 2012) of various performance indicators. All operations are measured relative to schedule rather than on a headway basis. The window for “on time” performance is +2/-7 minutes, but even with this fairly generous definition, WMATA’s bus network barely gets above the 75% mark on an all-day basis. The rail network does better at around 90%.

I was amused to find an article on an advocacy group’s website (Greater Greater Washington) about the limitations of WMATA’s service quality measures and the fact that the generous window for “on time” could lead to badly bunched and gapped service just as it does in Toronto. The writer’s preference was for a shift to London-style reporting that looks at headways relative to scheduled values. This would measure service as riders care about it rather than from the management point of view.

London, UK



London is the granddaddy of transit systems (the Underground just celebrated its 150th birthday). They have been carrying huge numbers of riders around a large, complex city for a very long time. In recent years, much of the system’s operation has been contracted out to private companies, although famously the attempt to do this with the Tube was a complete failure. With many separate companies providing service, the ability to monitor and report on their performance is an essential part of system operations.

Standards developed in London have been extended throughout the UK, where comparable needs exist to monitor private bus operations. This is essential both for contract management and to establish a history of service provider quality and attention to improvement. The national target for bus operations is that 95% of trips should depart from time points (locations where service should appear at a specific time) within a band of +1/-5 minutes. The standards recognize that this may not be possible in all circumstances, but it is the target at which providers should aim.

This is further refined for terminals as:

Frequent routes (10 minute headway or better): In 95% of cases there will be 6 or more buses per hour, and no gap of greater than 15 minutes will occur.

On less frequent routes, 95% of the trips should depart within the +1/-5 minute window of the advertised time.

At midline timepoints, the rules are different:

Frequent routes are measured by the Transport for London yardstick of “excess wait time” (described below). The degree to which waits (i.e. headways) are longer than planned should not exceed 1¼ minutes.

On less frequent routes, 70% of the trips should depart within the +1/-5 minute window of advertised times.

Penalties are visited on companies that fail to meet the standards.

Excess Wait Time is calculated based on the difference between the expected wait (one half of the advertised headway, on average) and the actual wait time. If the schedule says a bus should appear every 6 minutes, then the expected wait is 3 minutes. A bus arriving in a 10 minute gap will contribute 4 minutes of excess wait time. Buses running close together are not counted and this may actually under-report the effect of the lateness. (That 10 minute gap could be followed by 3 buses, but most passengers will experience the longer-than-expected wait.)

One scheme proposed to “fix” this and penalize service for wide gaps is to use the square of the excess wait time. In this case, that 4 minutes would become 16.

The underlying math works like this. If riders arrive at a stop more or less uniformly, then the number of waiting riders is a function of the gap between buses: the longer the gap, the more riders. Their waiting time is itself a function of that gap. Squaring the wait gives more weight to long gaps than to short ones.
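For those who want the aggregate form: with uniformly arriving riders, the average wait over a set of headways h works out to Σh²/(2Σh), and excess wait time is that value minus the scheduled average wait. A minimal sketch of the calculation described above (my own illustration, not TfL’s production code):

```python
def average_wait(headways):
    """Expected wait for a uniformly arriving rider: sum(h^2) / (2 * sum(h))."""
    return sum(h * h for h in headways) / (2 * sum(headways))

scheduled = [6.0] * 5                   # buses advertised every 6 minutes
observed = [2.0, 10.0, 6.0, 2.0, 10.0]  # same number of buses, but bunched

print(average_wait(scheduled))          # 3.0 minutes, the expected wait
print(average_wait(observed))           # about 4.07 minutes
print(average_wait(observed) - average_wait(scheduled))  # about 1.07 of EWT
```

Squaring the excess in each gap, as in the proposed “fix”, would penalize the two 10 minute gaps in this example far more heavily than the linear version does.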

TfL reports on its bus operations in some detail with reports subdivided by Borough. (Scroll down below the list of boroughs for definition of the measures used.) Historical values for service quality and service operated are available for every route although these are summarized at a 4-week level rather than showing the range of daily fluctuations. A summary produced each quarter includes measures such as the ratio of average to scheduled wait times and percentage chances of having to wait 10, 20, 30 minutes or more for what should be a “frequent” service. These data are not subdivided by time or location, although it should be embarrassment enough to have a “frequent” route with less than 90% of the service matching that description.

For less-frequent services, schedule adherence takes priority because people expect service to arrive according to the advertised timetable. The treatment here is completely different from that for frequent services and measures include percentage chances that a trip is within the on-time window (+2/-5 minutes), the chance that a bus will be missing, the chance that a bus will be early, and the chance that it will be late. Very late buses (over 15 minutes) are treated as “early” for the next scheduled trip, and in some cases, a “late” trip indicates that a bus is missing. These measures are much more meaningful for less frequent services than the headway-based measures used on frequent lines.

One obvious, but unanswered, question is what happens when a line is sometimes “frequent” and other times “less frequent”. If this model were applied in Toronto, then there would have to be distinctions in the measurement and management regimes depending on the level of service. Moreover, routes with branches could be “frequent” on the common section, but “infrequent” elsewhere.

For the underground, TfL reports a wide variety of measures and breaks these down to individual lines. (Period 6, 2012) One important concept used here is the “journey time”, a value calculated by actually traveling on the system and measuring the time required for various standard trips. This will include station access time (can be affected by construction, congestion, out-of-service escalators, etc), platform time (can be affected by headways and by train capacity), and travel time (can be affected by slow orders or service problems en route).

“Reliability” of devices such as escalators includes those out of service for planned maintenance because this is the view riders have of the system. They really don’t care that an escalator or elevator is under scheduled maintenance, only that the station has become less accessible than expected. To put it another way, if we have to shut off an elevator for two months a year for regular maintenance, then it is not available anywhere near 100% of the time.

This brings me to another observation about how various systems report problems. In some cases, they are subdivided between “chargeable” (our fault) and “non-chargeable” (not our fault) events and only the former are reported. This may give an idea of how often service is interrupted for preventable reasons, but this gets tricky when scheduled maintenance isn’t counted.

Management wants to know what problems they might better control, but riders don’t care when they face a long walk up or down stairs. When accessibility is considered a right, the management decision to stretch out repairs by scheduling only one crew to work 40 hours a week could be seen as not making a “best effort” to keep the system accessible.

As on the surface network, the underground reports show the effect of monitoring contractor performance in the (now-abandoned) private sector arrangement. Delays caused by track, switching and signals are reported as these show the degree to which lack of maintenance can affect service quality.

Finally, there is a “Lost Customer Hours” measure which includes all events (except scheduled service outages for repairs) where service is delayed for more than 2 minutes regardless of the cause. The detailed breakdown of causes by line is interesting because there are wide differences from one underground line to another reflecting fleet and infrastructure conditions.

Fleet and Infrastructure Issues

I have omitted most references to fleet and infrastructure related measures in the survey above because the primary interest here is service quality. However, fleet and infrastructure have their effects including:

Trains that break down in service cause delays and gaps when they are removed from the line.

Trains that are not available for service cause actual capacity to be less than planned or advertised.

Track that is in poor condition requires slow orders that annoy passengers, cause backlogs of trains on busy sections of a route, and limit the minimum headway possible due to constraints of a fixed block signal system.

Signals and switches that fail frequently can cause significant service disruptions up to complete closing of sections of a route.

Fleet numbers tend to be reported on two common bases:

Mean mileage to failure, and

Availability for service.

The mean mileage to failure numbers vary somewhat from system to system, and are probably best read as a historical trend within each operation (or type of equipment) rather than as a comparison between systems. The reason is that operators have different rules about what constitutes a “failure” and might, for example, not count minor incidents such as a jammed door provided that the delay was short and the train remained in service. This comes back to the concept of a “chargeable” incident I mentioned earlier.

Availability for service means just what it says, but this requires more than the scheduled number of buses at each garage. If there is a probability that, say, five buses will fail in service on a garage’s routes, then there need to be spares available to replace those buses that have gone bad order. This is challenging on a system such as the Toronto streetcar network where, unless a route is currently shut down for track work, the working fleet is too small to provide for extras.

Spare vehicle pools need to be subdivided between those vehicles that are available, but not used unless a change-off is needed, and those that are in the shop for minor or major repairs. A high requirement for maintenance spares could indicate that a class of vehicles is not as reliable as it should be, or that there are “problem children” that rarely get out of the shop. Either way, the capital investment in equipment is not producing the service it should, and it may require a disproportionate amount of maintenance staff and cost to keep such vehicles on the road.

The Transportation Research Board’s Quality of Service Manual

This discussion refers to Part 3: Quality of Service in the manual.

The TRB observes that there is a lack of standardization within the transit industry, and proposes the adoption of a scheme of “Level of Service” (LOS) comparable to that used in highway planning. The A-to-F levels of service are well understood by highway engineers (and by at least some politicians) because they have a common foundation throughout the industry. A road is a road more or less anywhere, although one could argue that the standards by which the performance of a road is measured can be quite subjective depending on one’s overall goals.

The TRB distinguishes between “performance” — how well a service attains some goal — and “service quality” — how the service is perceived by a rider. Service measures represent the passenger’s point of view, the actual experience, and they “should be relatively easy to measure and interpret” [definitions, ch. 1].

Having proposed a standard way to express “quality”, the TRB promptly abandons prescription of industry-wide standards. LOS values depend on local factors: a city must make rational decisions about what each level and factor means. A headway variation acceptable under one city’s standards may be totally off the mark in another. However, “local options” can lead to problems both in industry comparisons and with localized values that award relatively high grades for performance matching political or budgetary constraints rather than a true goal of improved transit. City Councillors and transit managers do not like to get a report card full of “D”s, and there can be pressure to tweak any standard to “improve” reported performance.

This is obviously counterproductive. Better that a city says “yes, we know our system is only running at level C most of the time, but that’s what we chose to implement”. Such a statement rarely comes out of any politician’s mouth.

There is nothing wrong with local standards as long as they are recognized for what they are. The TTC has standards for its service design, and these have fluctuated depending on prevailing political winds over the past decades. It is easy to say “we meet our standards” when those standards can be adjusted to circumstances.

This discussion addresses only fixed route networks as that is most applicable to large urban systems like the TTC. Demand responsive systems have a separate proposed set of metrics, but they are beyond the scope of this article.

Service quality measures are divided into two main groups: (1) availability and (2) convenience and comfort.

The proposed system recognizes that a transit route has different components which need different quality metrics. These are transit stops, route segments/corridors and the network of which a route is part. A full list showing possible ways one might measure a transit system or service appears on page 3-4 of the document. These factors are subdivided into eight groups including availability, service monitoring, travel time, and capacity. I will not attempt to work through every one of them.

The report notes that those values which are of more interest for internal management of a transit system are more likely to be tracked than those of interest to passengers. This is partly caused by US government reporting requirements that focus on the management side of transit, and partly by the obvious self-interest of agency management groups. Moreover, the ability to track fine-grained service quality automatically is still not widely available in North American systems, and certainly didn’t exist when the data collection procedures of many systems were developed.

Availability comprises four key factors. Quoting from the TRB document:

Spatial availability: Where is service provided, and can one get to it?

Temporal availability: When is service provided?

Information availability: How does one use the service?

Capacity availability: Is passenger space available for the desired trip?

These are amazingly simple questions, but transit services organized around budgets may do poorly on some or all of these factors.

Comfort and convenience factors include:

How long is the walk? Can one walk safely along and across the streets leading to and from transit stops? Is there a functional and continuous accessible path to the stop, and is the stop ADA accessible?

Is the service reliable?

How long is the wait? Is shelter available at the stop while waiting?

Are there security concerns—walking, waiting, or riding?

How comfortable is the trip? Will one have to stand? Are there an adequate number of securement spaces? Are the vehicles and transit facilities clean?

How much will the trip cost?

How many transfers are required?

How long will the trip take in total? How long relative to other modes?

Service delivery factors include:

Reliability: how often service is provided when promised;

Customer service: the quality of direct contacts between passengers and agency staff and customers’ overall perception of service quality;

Comfort: passengers’ physical comfort as they wait for and use transit service; and

Goal accomplishment: how well an agency achieves its promised service improvement goals.

Note that service reliability comes in as its own point, and as part of an agency’s credibility in providing and improving service. Staff may be friendly, stations may be clean and buses may be only comfortably full. If the service is unreliable (even worse if it is demonstrably less reliable than an agency claims), then all the fine words about goals and communications go for nothing and may even be counterproductive.

I will leave to the dedicated reader a detailed review of this document, but will give a few examples of the use of letter grades for Level of Service metrics.

The report notes that consolidation of many measures into a compound index may simplify life for readers (not to mention managers and political overseers with limited attention spans), but in the process vital detail is lost.

“Although indexes are useful for developing an overall measure of service quality, the impact of changes in individual index components are hidden. A significant decline in one aspect of service quality, for example, could be offset by small gains in other aspects of service quality.” [pg 3-23]

I would go even further and stress that even individual metrics, if summed across routes and/or different operating periods, will mask problems. A way is needed to perform analysis at the detailed level, but to report it on a summary basis, possibly by saying “X percent of the detailed metrics fall below target, and here are the really poor performers”.

A simple example of LOS metrics applies to headways and their effect on perceived convenience of service.

LOS | Avg. Hdway (min) | Veh/h | Comments
A | <10 | >6 | Passengers do not need schedules
B | 10-14 | 5-6 | Frequent service, passengers consult schedules
C | 15-20 | 3-4 | Maximum desirable time to wait if bus/train missed
D | 21-30 | 2 | Service unattractive to choice riders
E | 31-60 | 1 | Service available during the hour
F | >60 | <1 | Service unattractive to all riders
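Expressed as a lookup (a hypothetical helper built from the table above; the grade boundaries follow the average headway column):

```python
def headway_los(avg_headway_min: float) -> str:
    """Map an average scheduled headway (minutes) to a TCQSM-style grade."""
    if avg_headway_min < 10:
        return "A"  # passengers do not need schedules
    if avg_headway_min <= 14:
        return "B"  # frequent service, passengers consult schedules
    if avg_headway_min <= 20:
        return "C"  # maximum desirable wait if a bus/train is missed
    if avg_headway_min <= 30:
        return "D"  # service unattractive to choice riders
    if avg_headway_min <= 60:
        return "E"  # service available during the hour
    return "F"      # service unattractive to all riders

print(headway_los(5), headway_los(20), headway_los(75))  # A C F
```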

Many TTC routes operate at LOS “A” on paper, but the actual service on the street may be quite different. Also, the level even on major routes may fall into the “B” and “C” ranges at off-peak periods especially late evenings and weekends.

The LOS metric for the planned service must be combined with other measures of actual service operated. [See schedule and headway adherence metrics on p 3-47 and 3-48.]

Schedule adherence is expressed as the probability that a rider will encounter an off-schedule vehicle, ranging up to level F. At this level, at least one transit vehicle will be late every day if a round trip involves four segments (one transfer connection each way). These probabilities flow directly from actual measurements of service operations, as I discussed earlier in the TTC example. If fewer than 75% of trips are on time, then the probability is very low that a rider will make four trips (there and back again with a transfer each way) without hitting a late vehicle. What riders see all the time is a dysfunctional route or system.

For headway adherence, a different scheme is used, but again it is driven by real data. In this case, the coefficient of variation of headways is calculated, and this can be directly related to the probability that a given headway will be at least 1.5 times the scheduled one. Once this probability exceeds 50%, most service is running in bunches. The actual headway experienced by riders is much worse than advertised, with attendant problems of uneven vehicle loading and the almost inevitable short turns driven by a desire to get vehicles back “on time”.
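As a sketch of how this could be computed from observed headways, assuming deviations are roughly normally distributed (my assumption for this illustration; the manual tabulates a similar relationship):

```python
from math import erf, sqrt
from statistics import mean, pstdev

def headway_adherence(headways, scheduled):
    """Return (coefficient of variation, P(headway >= 1.5 * scheduled))."""
    m = mean(headways)
    sd = pstdev(headways)
    cvh = sd / m
    z = (1.5 * scheduled - m) / sd        # normal approximation
    p_wide = 0.5 * (1 - erf(z / sqrt(2)))
    return cvh, p_wide

print(headway_adherence([2, 8, 2, 8, 5, 5], scheduled=5.0))
# roughly (0.49, 0.15): even this ragged service leaves about a 15%
# chance that any given headway is at least 1.5 times the scheduled one
```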

The schema proposed by TRB does not appear to have been implemented anywhere given both its complexity and the need for local agreement on definitions of service levels. However, the underlying discussion contains many useful guides and should prompt questions of any transit agency about just what it should be trying to measure and how it might achieve this task.

Other Papers of Interest

Critical Measures of Transit Service Quality in Various City Types

This paper reports on a 2005 study in Gyeonggi Province, Korea, the region surrounding the capital, Seoul, which contains many smaller cities with varying characteristics. The intent was to discover the types of factors that make transit attractive (or not) to people in these cities, and whether there was any major difference by type of city (population, industrialization, etc.). The sample size is fairly large overall (2,397) although it is smaller for each individual city.

Factors which generally rated highly were those related to service level and reliability, fares and the “friendly” factor (staff interactions, accessibility, courtesy of other riders). Notably, the factor “Reliable trains/buses that come on schedule” ranked high in importance (9 out of 10), but low in satisfaction (2 out of 10).

There is not much “news” here beyond learning that concerns about transit are similar on the other side of the world, but the methodology of determining what is both important and well or poorly done is useful in focusing improvement (and in keeping what is already good) where it will have the greatest effect.

Valuing Transit Service Quality Improvements

This 2011 paper from the Victoria (B.C.) Transport Policy Institute reviews the factors that affect transit’s attractiveness and how these might be applied. Of particular interest is the notion that making a trip more comfortable (less crowded, more convenient) can produce comparable improvement to reductions in travel time.

This is not surprising when one considers that transit riders assign a high penalty value to unpredictable events such as waiting for a bus or transferring between routes, and their perception of a journey is affected by how easily they can board and comfortably ride.

This has important implications for planning since time costs are a dominant factor in transport project evaluation. Conventional evaluation practices tend to ignore qualitative factors, assigning the same time value regardless of travel conditions, and so undervalue service improvements that increase comfort and convenience. Yet, a quality improvement that reduces travel time unit costs by 20% provides benefits equivalent to an operational improvement that increases travel speeds by 20%. [p. 2]

The paper uses a metric of “dollars” in the sense that any factor of transit service has a real or perceived cost (the cost of delay, the cost of congestion, etc.) and in some cases, riders may be willing to pay more to improve attributes of the service. This methodology fails, in my view, to recognize that riders do not directly bear the cost of whatever service they may use, and moreover, that the value of an improvement must be considered in the context of the ability to pay. The actual funder of much of transit (especially capital) is the general public through some form of taxation, not the individual who may or may not benefit. I, as a heavy user of transit, may benefit from and value better service, but someone who commutes by auto and regards transit as something for “other people” will not place the same value on service-related spending.

That said, there is much in the Victoria paper worth reading, including its encouragement of a wider view of transit’s attractiveness than one measured only by expenditures.

Understanding Bus Service Reliability: a practical framework using AVL/APC data

This 2006 paper, written by Laura Cham as part of her Masters program at MIT, reviews in great detail the analysis of route operations using data from automatic vehicle location and passenger counter systems. Boston’s “Silver Line”, a nominally BRT implementation, is reviewed in detail. Cham finds many of the same problems seen in the analyses I have published here, including variations in terminal departure punctuality as a major source of unreliable service.

This is a paper for those who want to read about analysis of real-world service data in great detail.