January 24, 2025

The MBTA’s 2024 Service Delivery Policy: Changing how Success is Measured

Introduction

On December 19, 2024, the MBTA adopted a new version of its Service Delivery Policy (SDP) in an effort to update how the MBTA measures the quality of its transit service against agency objectives. This new version of the SDP introduces several new standards and makes various changes to existing standards, all aimed at ensuring that the agency’s public-facing performance measures better reflect rider experiences.

OPMI was heavily involved in the development and calculation of these changes to SDP standards, which range from minor definition adjustments to major overhauls of performance measures, including some changes which are made possible by new data collection systems. We’re proud to share a roundup of the SDP metrics which have been updated or added in the 2024 version of the policy, with a focus on answering the following questions:

In this post, we use the Fall 2023 service period as a case study, calculating how Fall 2023 service would be evaluated under the old policy (the 2021 SDP) compared to how it would be evaluated under the new policy (the 2024 SDP). However, it should be noted that these new metrics, computations, and/or standards did not exist when planning for or operating service in Fall 2023. Hence, these calculations provide a useful baseline for judging future service, but should not be used to evaluate the quality of the effort put into planning or operations over this period. Note: this blog post will not discuss the metrics for which data collection is still in progress, since calculation methods are being finalized.

Metric | What's Changing?
Heavy Rail Reliability | Change standard to use Excess Trip Time (ETT)
Bus Reliability | Change standard to consider cancelled service
Bus Frequency | Change standard for Frequent Bus to 15 min. at all times
Platform Accessibility | Change how elevator closures with shuttling are evaluated
Heavy Rail Comfort | Add as new metric
Bus Stop Accessibility | Add as new metric (data collection still in progress)
Ferry Boat Accessibility | Add as new metric (data collection still in progress)
Ferry Dock Accessibility | Add as new metric
Span of Service | Change which portions of the first and last trip are evaluated
Green Line Reliability | Change standard for headways at Green Line trunk stops

Metric Comparison

Heavy Rail Reliability

One of the most important changes in the new policy is the introduction of the Excess Trip Time (ETT) metric, which overhauls how the MBTA measures service reliability. This new metric is being rolled out to heavy rail (Blue Line, Orange Line, and Red Line) first, but we are setting up the data processes necessary to evaluate other service modes using ETT as well. There are two major ways in which Excess Trip Time improves on our previous reliability metric and paints a more accurate picture of transit reliability: it benchmarks MBTA performance against an “ideal” level of service rather than against current schedules, and it captures full trip times rather than only evaluating passenger wait times. For more information on ETT, please visit the “Introducing Excess Trip Time” blog post.

Our previous reliability metric, On Time Performance (OTP), is the percentage of actual departures or headways that are on time relative to their scheduled departures or headways. Because service schedules regularly change in response to operational considerations (e.g. operator availability), OTP can diverge from rider expectations of service reliability: changing schedules can effectively “move the goalposts” of performance. Additionally, OTP is narrowly focused on how long passengers wait for transit vehicles to arrive prior to boarding, meaning that delays during in-vehicle travel time do not factor into its scoring of reliability.

MBTA Red Line performance over the past three to four years provides a strong illustration of the difference between ETT and our previous reliability metric, because during that time, the MBTA experienced both operator availability challenges and speed restrictions which led to schedule adjustments. The OTP score remains mostly flat during this time, because even though headways got longer in 2022 and 2023, they were evaluated against adjusted schedules, and longer travel times did not directly factor into OTP. The ETT-based reliability score, meanwhile, decreased sharply in response to the introduction of speed restrictions, before recovering in 2024 in response to the removal of speed restrictions and the improvement of operating headways (Figure 1).

A line chart showing the percentage of passenger trips passing the previous OTP metric versus the new ETT metric for the Red Line from December 2021 to November 2024. The reliability rate using the OTP metric remains relatively flat, ranging from 80% to 90%. In contrast, the reliability rate using the ETT metric decreases sharply following the introduction of speed restrictions, reaching a low of approximately 15% in April 2023 before recovering in 2024.
Figure 1 - OTP vs ETT: Red Line
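The contrast between the two metrics can be sketched in a few lines of Python. This is a simplified illustration, not OPMI's actual calculation: the function names, the benchmark trip time, and all of the sample numbers are assumptions made for the example, and the only value taken from this post is the 5-minute ETT threshold.

```python
ETT_THRESHOLD_MIN = 5.0  # the 5-minute threshold mentioned in this post


def otp_on_time(actual_headway_min, scheduled_headway_min):
    """Simplified OTP check: pass if the actual headway is no longer
    than the headway in the schedule currently in effect."""
    return actual_headway_min <= scheduled_headway_min


def ett_on_time(wait_min, ride_min, benchmark_trip_min,
                threshold_min=ETT_THRESHOLD_MIN):
    """Simplified ETT check: pass if the full trip (wait plus ride) exceeds
    a fixed 'ideal' benchmark by no more than the threshold."""
    excess = (wait_min + ride_min) - benchmark_trip_min
    return excess <= threshold_min


# Hypothetical benchmark: 2.5 min average wait plus a 20 min ride.
BENCHMARK_TRIP_MIN = 22.5

# Before: 5 min scheduled and actual headway, 20 min ride.
before_otp = otp_on_time(5, 5)
before_ett = ett_on_time(2.5, 20, BENCHMARK_TRIP_MIN)

# After: the schedule is loosened to 8 min headways and speed
# restrictions stretch the ride to 30 min.
after_otp = otp_on_time(8, 8)
after_ett = ett_on_time(4, 30, BENCHMARK_TRIP_MIN)

print(before_otp, before_ett)  # True True
print(after_otp, after_ett)    # True False
```

In the "after" case the trip still passes the schedule-based check because the schedule itself was adjusted, while the benchmark-based check fails because the rider's total trip is 11.5 minutes longer than the ideal.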

The Fall 2023 service period would receive significantly lower reliability scores for heavy rail using the new ETT methodology than it would receive using the previous OTP methodology, as seen in Figure 2. Speed restrictions and slower schedules remained widespread in Fall 2023, so the fact that ETT captures these sources of delay better matches the way heavy rail riders experienced service reliability during this time. An analysis of differences among the MBTA heavy rail routes shows that the lowest scores were on the Red Line, where speed restrictions were more prevalent than on the Blue or Orange Lines, and where passengers tend to take longer trips, making the 5-minute threshold more ambitious.

A grouped bar chart showing the reliability scores for Bus, Green Line, and Heavy Rail, categorized by time periods (weekday, Saturday, Sunday) during the Fall 2023 service period. Diamond symbols on each bar represent reliability scores under the old OTP methodology. Reliability scores for Heavy Rail across all time periods are significantly lower compared to the old OTP methodology.
Figure 2 - Reliability Scores

Bus Reliability

For Bus service, the 2024 SDP adjusts the existing On Time Performance measure to reflect the impacts of unplanned cancellations of service, often called “dropped trips”. This change represents a quality-of-life improvement to the metric, pending the completion of the additional work necessary to calculate Bus reliability using Excess Trip Time instead, a methodology which already factors dropped trips in.

Our previous measure of Bus OTP excluded cancelled service from the calculation. This stemmed from limitations in our data sources, which could not reliably distinguish cancelled service from sensor outages and other factors that could cause a bus that operated and carried passengers to be absent from our data set. Thanks to improvements in our data sources and processes, we can now accurately incorporate service cancellations into the Bus Reliability standard.  The new standard counts all Bus timepoints affected by these cancellations as failures when computing the overall Bus reliability scores (Figure 3).

A diagram comparing Bus Reliability Standards for an example bus stop, illustrating bus arrivals at Columbia Rd. at Quincy St. on August 26, 2024, in the inbound direction for Route 16. A canceled bus scheduled to arrive at 9:44 AM is excluded under the old standards but counted as a failure under the new standards, resulting in a lower reliability score of 50% under the new standards compared to 60% under the old standards.
Figure 3 - Comparison of Bus Reliability Standards
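The effect of counting cancelled service can be reproduced with a small sketch. The status labels and the list of timepoints below are hypothetical, chosen only so that the result matches the 60% and 50% scores in the Figure 3 example; the real calculation works from MBTA timepoint data rather than a hand-written list.

```python
def bus_reliability(timepoints, count_cancelled_as_failures):
    """Share of evaluated timepoints that were on time.

    timepoints: list of statuses, each 'on_time', 'late', or 'cancelled'.
    Old standard: cancelled timepoints are dropped from the calculation.
    New standard: cancelled timepoints are kept and counted as failures.
    """
    if count_cancelled_as_failures:
        evaluated = list(timepoints)
    else:
        evaluated = [s for s in timepoints if s != "cancelled"]
    if not evaluated:
        return None  # nothing to evaluate
    passed = sum(1 for s in evaluated if s == "on_time")
    return passed / len(evaluated)


# Hypothetical stop: five observed arrivals (three on time) and one cancellation.
example = ["on_time", "late", "on_time", "cancelled", "on_time", "late"]

old_score = bus_reliability(example, count_cancelled_as_failures=False)
new_score = bus_reliability(example, count_cancelled_as_failures=True)

print(f"old: {old_score:.0%}, new: {new_score:.0%}")  # old: 60%, new: 50%
```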

Under this new standard, the Fall 2023 service period would receive slightly lower scores on Bus Reliability than it would receive under the old policy, as seen in Figure 2 above. An analysis of route-by-route differences in reliability scores between the old and new policies showed that many of the largest differences are on Key Bus routes, since the MBTA tends to pull buses from more frequent routes in order to avoid missing service on less frequent routes, resulting in many higher-frequency routes having higher rates of trip cancellations.

Bus Frequency

The new policy adjusts the definitions and terminology used for the agency’s most frequent Bus routes to better align with the ongoing implementation of the Bus Network Redesign. The new policy does not alter the calculation methodology for the Bus Frequency standard, nor does it represent any proposed or actual change in service – it just simplifies the definition of what headways a given route needs to attain in order to pass the Frequency test.  

Under the previous policy, the frequency expectations for key bus routes were for:

The new policy evaluates the current key bus routes under a new frequent bus standard, which is a headway of 15 minutes or better at all times.

This streamlines the standard and makes it easier to interpret. The fact that the frequency standard is being tightened for weekend and off-peak times and loosened at peak times means that the Fall 2023 service period would have slightly higher weekday scores and significantly lower weekend scores than it would under the old policy, as seen in Figure 4.

Notably, these scores for how Fall 2023 service would be evaluated under the new policy are retroactively applying a standard that did not exist when Fall 2023 service was being planned. For example, many of the current Key Bus routes are scheduled to operate with 15- to 20-minute effective headways on weekends, in alignment with the previous frequency standard. As the MBTA increases the number of Frequent Bus routes and adds service to the current Key Bus routes to run 15-minute or better headways at all times, the weekend scores for the Frequent Bus category are expected to rise.  

A bar chart showing the frequency scores of frequent buses during weekdays, Saturdays, and Sundays during the Fall 2023 service period. Diamond symbols on each bar represent scores under the old methodology. The weekday score under the new methodology is slightly higher than under the old methodology, while weekend scores are significantly lower under the new methodology.
Figure 4 - Frequency Scores

Platform Accessibility

The new policy modifies the Platform Accessibility standard to better align performance scores with rider experiences, particularly for times when shuttle alternatives are provided as mitigation for elevator outages.

The Platform Accessibility calculation counts how many hours of service at MBTA platforms are inaccessible because of elevator outages, a number which is subtracted from the total duration of service to yield a percentage of platform hours that are accessible. Some elevator outages do not impact platform access because of redundant elevators that provide alternative accessible pathways to platforms, while for other elevators, one outage may prevent access to multiple platforms at once. However, under the previous policy, elevator outages during which accessible shuttle alternatives were provided as a mitigation measure were considered accessible platform-hours.

A diagram comparing platform accessibility standards for a hypothetical elevator under two policies: the new (2024) SDP policy, where all elevator outages are considered inaccessible, versus the old (2021) SDP policy, where elevator outages with alternative shuttle service are considered accessible, resulting in a lower accessibility score of 29% under the new policy compared to 71% under the old policy.
Figure 5 - Comparison of Platform Accessibility Standards

The new policy tightens and clarifies the Platform Accessibility calculation such that the existence of shuttle alternatives by itself does not have any bearing on how an elevator outage is counted (Figure 5). To reflect the fact that an accessible shuttle alternative often doesn’t provide the same quality of service that riders would get with a working elevator, all elevator outages are now considered to be inaccessible hours for the purposes of this calculation by default.  The new policy does outline some minor exceptions to this rule, mainly for situations when an elevator is proactively taken out of service in order to reconstruct it as part of an accessibility modernization project. However, these exceptional situations are excluded from the calculation entirely rather than counted as accessible hours.
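As a rough sketch of the difference, the snippet below scores a single hypothetical platform under both policies. The `Outage` class, its field names, and the hours are all invented for illustration, and the example ignores redundant elevators and outages that affect several platforms at once. How the 2021 policy treated modernization closures is not described in this post, so the sketch simply treats them as ordinary outages in that case, which is an assumption.

```python
from dataclasses import dataclass


@dataclass
class Outage:
    hours: float
    shuttle_provided: bool = False
    modernization_exempt: bool = False  # proactive closure for reconstruction


def platform_accessibility(service_hours, outages, policy):
    """Share of evaluated platform-hours that are accessible.

    policy: '2021' (shuttle-mitigated outages count as accessible) or
            '2024' (all outages count as inaccessible, except exempt
            closures, which are excluded from the calculation entirely).
    """
    inaccessible = 0.0
    excluded = 0.0
    for outage in outages:
        if policy == "2024" and outage.modernization_exempt:
            excluded += outage.hours  # dropped from numerator and denominator
        elif policy == "2021" and outage.shuttle_provided:
            continue  # treated as accessible platform-hours
        else:
            inaccessible += outage.hours
    evaluated = service_hours - excluded
    if evaluated <= 0:
        return None
    return (evaluated - inaccessible) / evaluated


# Hypothetical platform: 7 hours of service, a 2-hour outage without a
# shuttle and a 3-hour outage with an accessible shuttle alternative.
outages = [Outage(hours=2), Outage(hours=3, shuttle_provided=True)]

old_score = platform_accessibility(7, outages, policy="2021")
new_score = platform_accessibility(7, outages, policy="2024")

print(f"2021 policy: {old_score:.0%}, 2024 policy: {new_score:.0%}")
# 2021 policy: 71%, 2024 policy: 29%
```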

This change in standard makes a small difference in performance scores, with the Fall 2023 service period having a Platform Accessibility score that is about two percentage points lower using the new policy than it would have had under the old policy, as seen in Figure 6. Going forward, the new standard should ensure that real changes in platform access aren’t obscured in SDP annual reports simply because of mitigating shuttle service.

A bar chart showing rail platform accessibility scores using the new methodology across all time periods during the Fall 2023 service period. Diamond symbols on the bars represent scores under the old methodology, with minimal differences observed. The scores approach 100%.
Figure 6 - Platform Accessibility Score

Heavy Rail Comfort

The new policy adds a standard for Passenger Comfort on heavy rail services, taking advantage of data sources and processes that were not previously available.  Similar to the existing Passenger Comfort standard for Bus, the Heavy Rail Comfort standard evaluates how many minutes of passenger time are in non-crowded conditions, using maximum vehicle load standards defined in the SDP to define what’s considered unacceptably crowded.

However, the calculation of crowding is somewhat more approximate for the new Heavy Rail standard than it is for the existing bus standard. The Heavy Rail calculation uses passenger counts that are derived from faregate entries and exits for each station, whereas for Bus we use passenger counts from the Automated Passenger Counters (APCs) that are installed on individual vehicles.  The new heavy rail calculation therefore requires making assumptions about gate-to-platform walk times, and it evaluates average crowding across each train relative to the maximum capacity thresholds, not the crowding on individual cars.

The policy also defines the maximum capacity thresholds for heavy rail cars in a slightly different way than it does for buses. Both modes use separate thresholds for high-volume (peak) travel times and lower-volume (off-peak) times. However, where the Bus standard uses thresholds based on percentage ratios of passengers to the number of seats on a given vehicle, the Heavy Rail maximum vehicle capacity thresholds are set based on the seating capacity of each car plus a fixed amount of space per standing passenger, using information about the floor areas of Heavy Rail cars. This difference reflects the fact that on Heavy Rail cars, a smaller portion of vehicle floor area is dedicated to seating than on MBTA buses.
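The general shape of the Heavy Rail Comfort calculation described above might look like the following sketch. Every number here (seats, standing area, space per standee, train length, loads) is hypothetical, and the function names are ours; the actual thresholds are defined in the SDP and differ between peak and off-peak periods.

```python
import math


def max_load_per_car(seats, standing_area_sqft, sqft_per_standee):
    """Capacity threshold: seats plus the number of standees that fit
    at a fixed amount of floor space per standing passenger."""
    return seats + math.floor(standing_area_sqft / sqft_per_standee)


def comfort_score(segments, cars_per_train, car_capacity):
    """Share of passenger-minutes spent in non-crowded conditions.

    segments: iterable of (passengers_on_train, minutes) pairs.
    Crowding is judged on the average load across the whole train,
    not on individual cars.
    """
    train_capacity = cars_per_train * car_capacity
    total = 0.0
    comfortable = 0.0
    for passengers, minutes in segments:
        passenger_minutes = passengers * minutes
        total += passenger_minutes
        if passengers <= train_capacity:
            comfortable += passenger_minutes
    return comfortable / total if total else None


# Hypothetical car: 50 seats, 300 sq ft of standing area, 3 sq ft per standee.
car_capacity = max_load_per_car(50, 300, 3)  # 150 passengers per car

# Hypothetical six-car train observed over three segments of a trip.
segments = [(600, 3), (950, 2), (400, 5)]
score = comfort_score(segments, cars_per_train=6, car_capacity=car_capacity)

print(f"{score:.1%}")  # 66.7%
```

Here the middle segment (950 passengers against a train capacity of 900) is the only crowded one, so its 1,900 passenger-minutes out of 5,700 count against the score.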

During the Fall 2023 service period, a greater portion of passenger minutes were in crowded conditions on the Blue Line than on the Red or Orange Lines, resulting in lower Passenger Comfort scores for the Blue Line, as seen in Figure 7.

A bar chart showing passenger comfort scores using the new methodology for Heavy Rail, Blue Line, Orange Line, and Red Line across all time periods during the Fall 2023 service period. Scores for all modes approach 100%.
Figure 7 - Passenger Comfort Scores

Ferry Dock Accessibility

Another new SDP standard measures the accessibility of ferry docks. Docks are considered accessible if they allow accessible transition on/off the vessel via bridge plate or gangways level to the vessel, and if they are designed to mitigate excessive slopes caused by changing tides. OPMI calculates Ferry Dock Accessibility using both an unweighted score (the percent of docks that are accessible) and a ridership-weighted score (the percent of ferry riders using accessible docks, using average ridership data). For Fall 2023, the ridership-weighted score (30.5%) is lower than the unweighted score (56.3%) because several of the docks with the highest ridership, such as Rowes Wharf, are not accessible (see Figures 8 and 9).

One notable difference in how the new Ferry Dock Accessibility standard is calculated compared to the existing Station Accessibility standard is that instead of recording the accessibility status of whole ferry terminals (e.g. Long Wharf North), we evaluate the individual docks within terminals, since docks serving different lines at a single terminal can vary in their accessibility to passengers with mobility impairments.
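The gap between the unweighted and ridership-weighted scores comes down to simple arithmetic, sketched below. The docks and ridership figures are made up for illustration and are not MBTA data; the point is only that when the busiest docks are inaccessible, the weighted score falls below the unweighted one.

```python
def dock_accessibility(docks):
    """Return (unweighted, ridership_weighted) accessibility scores.

    docks: list of (name, is_accessible, average_riders) tuples,
    with one entry per dock rather than per terminal.
    """
    accessible_docks = sum(1 for _, ok, _ in docks if ok)
    unweighted = accessible_docks / len(docks)

    total_riders = sum(riders for _, _, riders in docks)
    accessible_riders = sum(riders for _, ok, riders in docks if ok)
    weighted = accessible_riders / total_riders
    return unweighted, weighted


# Hypothetical docks: half are accessible, but the busiest ones are not.
docks = [
    ("Dock A", True, 100),
    ("Dock B", True, 200),
    ("Dock C", False, 900),
    ("Dock D", False, 300),
]

unweighted, weighted = dock_accessibility(docks)
print(f"unweighted: {unweighted:.0%}, weighted: {weighted:.0%}")
# unweighted: 50%, weighted: 20%
```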

A bar chart showing the unweighted ferry dock accessibility score using the new methodology across all time periods during the Fall 2023 service period. The score is slightly above 50%.
Figure 8 - Unweighted Ferry Dock Accessibility Score
A bar chart showing the weighted ferry dock accessibility score using the new methodology across all time periods during the Fall 2023 service period. The score is approximately 30%.
Figure 9 - Weighted Ferry Dock Accessibility Score

Span of Service

The new policy’s changes to the definition of the Span of Service standard are intended primarily to make the standard easier to understand rather than to tighten or loosen service expectations, and they result in only small changes in performance scores for some route categories. Specifically, the new standard is intended to align the expected start- and end-of-service hours defined in the policy with typical rider expectations of when service starts and ends on a given day.

The previous SDP used an “inside bounds” methodology, evaluating each route based on whether its first trip of the day was scheduled to arrive in downtown Boston or the route terminal at or before the expected start time, and whether its last trip of the day was scheduled to depart downtown Boston or the route terminal at or after the expected end time. This meant, for example, that the 6:00 am expected start of service for the Orange Line represented the time when the first train of the day was expected to arrive in Downtown Crossing, rather than the time when it was expected to leave from its first stop (Forest Hills or Oak Grove). However, riders have a variety of destinations, not just downtown, and most riders want to know what the earliest time is that they can depart from their stop, not when they would arrive downtown or at the route terminal.

The new policy therefore switches to an “outside bounds” methodology, evaluating each route based on whether its first trip of the day is scheduled to depart its origin station at or before the expected start hour and whether its last trip of the day is scheduled to arrive at its destination at or after the expected end hour, with the expected start and end hours being revised accordingly (Figure 10). For example, the new expected start of service for the Orange Line is 5:30 am, which represents the time at which the Orange Line is expected to start picking up riders, rather than the time it’s expected to arrive downtown, which remains unchanged at 6:00 am.
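The two methodologies can be compared with a short sketch. The 6:00 am and 5:30 am expected start times come from the Orange Line example above; the expected end times and all of the trip times are hypothetical, and the helper names are ours. Times are expressed as minutes after midnight of the service day so that trips running past midnight compare correctly.

```python
def hm(hours, minutes=0):
    """Minutes after midnight of the service day (hours may exceed 24)."""
    return hours * 60 + minutes


def passes_inside_bounds(first_arrival, last_departure,
                         expected_start, expected_end):
    """Old method: first trip must ARRIVE downtown (or at the terminal) by
    the expected start; last trip must DEPART there at or after the end."""
    return first_arrival <= expected_start and last_departure >= expected_end


def passes_outside_bounds(first_departure, last_arrival,
                          expected_start, expected_end):
    """New method: first trip must DEPART its origin by the expected start;
    last trip must ARRIVE at its destination at or after the expected end."""
    return first_departure <= expected_start and last_arrival >= expected_end


# Hypothetical route: first trip leaves its origin at 5:35 and reaches
# downtown at 5:58; last trip leaves downtown at 24:35 and ends at 24:58.
old_result = passes_inside_bounds(
    first_arrival=hm(5, 58), last_departure=hm(24, 35),
    expected_start=hm(6, 0), expected_end=hm(24, 30))   # end time hypothetical

new_result = passes_outside_bounds(
    first_departure=hm(5, 35), last_arrival=hm(24, 58),
    expected_start=hm(5, 30), expected_end=hm(24, 45))  # end time hypothetical

print(old_result, new_result)  # True False
```

In this invented case the route passes the old test but narrowly misses the new one, because its 5:35 first departure falls just after the round-numbered 5:30 expected start. A first departure at 5:16 would pass both.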

A diagram comparing weekday span standards, using scheduled Fall 2023 weekday departures for Route 112 as an example and illustrating the differences in the definition of the Span of Service standard under the old "inside bounds" methodology and the new "outside bounds" methodology.
Figure 10 - Comparison of Span of Service Standards

This change in definition has no effect on Span performance scores for most MBTA modes, as seen in Figure 11. For bus service, some route categories have lower scores for Fall 2023 service under the new policy, but all of the differences in score are relatively small, and stem mostly from edge cases where the round-numbered expected start and end times in the new definition do not exactly match some routes’ scheduled start and end times.

A grouped bar chart showing the span of service scores for Bus, Ferry, Rapid Transit, and Regional Rail, categorized by time periods (weekday, Saturday, Sunday) using the new methodology during the Fall 2023 service period. Diamond symbols on each bar represent span of service scores under the old methodology. The scores across all modes and time periods are similar to those under the old methodology.
Figure 11 - Span of Service Scores

Green Line Reliability

Another minor change in the new policy concerns how we calculate On Time Performance for trunk stops on the Green Line (stations in the downtown subway portion of the Green Line which are served by multiple branches). This change is temporary, pending our assembling of the data processes necessary to evaluate Green Line reliability using Excess Trip Time instead.  

The previous SDP evaluated reliability at Green Line trunk stops against a 3-minute standard rather than against scheduled headways, as we do for the rest of Green Line service. Typical scheduled headways at trunk stops can be on the order of 90-100 seconds between trains, and actual headways between 90 seconds and 3 minutes were still deemed acceptably reliable service. However, this meant that at times of the day or week when the scheduled headways between Green Line trains at trunk stops are greater than 3 minutes, we evaluated whether each actual headway was less than 3 minutes instead of whether it was less than the scheduled headway.

Under the new policy, instead of evaluating all trunk-stop Green Line headways against a flat 3-minute standard, we evaluate these stops using either the 3-minute standard or the scheduled headway, whichever is greater. This results in the Green Line having slightly higher scores for its Fall 2023 service than it would have under the previous policy, as seen in Figure 2 above.
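In code, the change amounts to swapping a fixed threshold for the larger of two values. The sketch below is illustrative only: the function names and sample headways are hypothetical, and treating a headway exactly equal to the threshold as a pass is an assumption.

```python
TRUNK_STANDARD_MIN = 3.0  # flat 3-minute standard for trunk stops


def trunk_headway_ok_old(actual_min, scheduled_min):
    """Previous policy: compare every trunk-stop headway to 3 minutes,
    regardless of what the schedule called for."""
    return actual_min <= TRUNK_STANDARD_MIN


def trunk_headway_ok_new(actual_min, scheduled_min):
    """New policy: compare to 3 minutes or the scheduled headway,
    whichever is greater."""
    return actual_min <= max(TRUNK_STANDARD_MIN, scheduled_min)


# Hypothetical headways as (actual, scheduled) pairs in minutes.
peak = (2.5, 1.5)        # dense peak service
late_night = (4.5, 5.0)  # scheduled headway longer than 3 minutes

print(trunk_headway_ok_old(*peak), trunk_headway_ok_new(*peak))
# True True
print(trunk_headway_ok_old(*late_night), trunk_headway_ok_new(*late_night))
# False True
```

The only headways whose outcome changes are those at times when the schedule itself calls for more than 3 minutes between trains, which is why the new standard can only raise scores, never lower them.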

However, the method we used to calculate On Time Performance on the Green Line using this new standard does not perfectly reproduce how Green Line OTP has been traditionally calculated, owing to the recent retirement of the MBTA’s previous data collection systems for reliability. The preliminary results presented here are therefore approximate, and once the data processes necessary for evaluating the Green Line using Excess Trip Time are in place, SDP annual reports will use that standard instead of this interim revision of the traditional OTP standard.

Conclusion

Overall, the new Service Delivery Policy updates MBTA performance standards to paint a fuller and more accurate picture of how riders and residents experience service. Multi-year investments in improved data collection and processing are bearing fruit, especially in the reliability calculations, which now more accurately capture the experience of riders over the past few years. We in OPMI are excited to start applying the new SDP metrics to the service delivered in 2024 (check back in May for our next Annual Report), and we look forward to expanding and extending the new data and metrics to cover more modes.

The new policy also aligns the definition of good service with future plans for the agency. As the MBTA advances the Better Bus Program, procures new rail vehicles, prioritizes accessibility for all modes and stops, and plans for high-frequency, all-day regional rail service, the service definitions and standards in the SDP help measure our progress toward a more reliable, more accessible, and more useful transit system.