December 19, 2024

Introducing Excess Trip Time

The MBTA is beginning to use a new metric, Excess Trip Time (ETT), to measure subway performance. ETT is a passenger-weighted metric that measures the amount of time that a passenger spends waiting for and riding public transit in excess of a baseline. In other words, ETT tries to answer the question - what is the passenger experience on transit?

Introduction

In December 2024, the MBTA adopted an updated Service Delivery Policy (SDP) that includes new methods of calculating service reliability. For the T’s heavy rail service (the Blue Line, Orange Line, and Red Line), reliability is now measured using Excess Trip Time (ETT). This post will discuss what ETT is and the methodology behind the measure, how ETT compares to the formerly used On-time Performance metric (OTP), and how to use the new ETT dashboard on mbta.com/performance. For more information on the new SDP, please visit mbta.com/SDP to view the policy, and view our OPMI Data Blogpost on how service evaluation differs under the new policy (coming soon). For more information about other recent updates to the MBTA Performance Dashboards, please view our OPMI Data Blog posts on Ridership Dashboard Updates and Trip Times Post-Track Improvement Plan (coming soon).

What is Excess Trip Time?

ETT is a passenger-weighted metric that measures the amount of time that a passenger spends waiting for and riding public transit in excess of a baseline. In other words, ETT tries to answer the question - what is the passenger experience on transit? This is fundamentally different from the metric previously used to measure heavy rail reliability, OTP. OTP is based on the scheduled headway for service, and calculates the percent of passengers that wait less than the scheduled headway at stations. This enables the T to measure performance against the resources available in a given rating1; for heavy rail, OTP is typically above 85%. However, adherence to a schedule does not necessarily reflect a good customer experience. If a customer experiences, for example, a 15-minute headway during the morning peak and travels over multiple sections of track with speed restrictions in place, the trip may be precisely as scheduled but is still a poor experience for the customer. ETT seeks to better measure the passenger experience by comparing experienced wait and travel times to a baseline trip time. In simpler terms, ETT measures how long a rider's journey actually takes versus how long that rider's journey should take. While this concept is new for the MBTA, a version of ETT is already in use at other transit agencies, such as the MTA in New York City and WMATA in Washington, DC.

1A “rating” is essentially a season, and is the frequency at which service schedules are updated to reflect changing resources.

Methodology

A visual showing how actual trip time, wait time, and travel time compare to benchmark times for the excess trip time calculation.
Figure 1 - a visual representation of Excess Trip Time (click to enlarge)

The three main components of this metric are passenger loads, passenger wait times, and travel times for each Origin-Destination pair on the T’s heavy rail system. Actual passenger loads and wait times are derived from the Origin, Destination, and Transfer (ODX) model in use at the T. The model, developed by Korbato, infers a passenger’s origin, destination, and any associated transfers based on card validations throughout the day. The algorithm therefore places passengers on specific vehicles, allowing us to calculate headways associated with each passenger’s trip and the amount of time spent onboard the vehicles before alighting. Both of these components are calculated at the trip level.
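As a rough illustration of the last step (not the actual ODX schema - the field names, timestamps, and values below are invented), once a passenger is placed on a specific vehicle, the headway and onboard time for that trip can be derived from vehicle timestamps:

```python
from datetime import datetime

def trip_components(prev_departure, boarding_departure, alighting_arrival):
    """Headway experienced by one passenger and their time onboard.
    A model like ODX would supply the boarded vehicle and its predecessor;
    the timestamps used here are illustrative only."""
    headway = boarding_departure - prev_departure
    onboard = alighting_arrival - boarding_departure
    return headway, onboard

prev = datetime(2024, 3, 1, 8, 0)     # previous train leaves the origin stop
board = datetime(2024, 3, 1, 8, 9)    # passenger's train leaves the origin stop
alight = datetime(2024, 3, 1, 8, 24)  # passenger's train arrives at destination
headway, onboard = trip_components(prev, board, alight)
print(headway, onboard)  # 0:09:00 0:15:00
```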

Travel times, both actual and baseline, are calculated using data from the Lightweight Application for Measuring Performance (LAMP), which provides trip-level data on subway travel times. The actual travel times are calculated by service period for each day (e.g., AM Peak, PM Peak, Off-peak) and applied to any trip made during the respective service period. The baseline is calculated by finding the fastest monthly median travel time by segment since April 2021, with each possible origin/destination runtime equaling the sum of the segment runtimes. If the origin stop is not a terminal, the runtime is calculated from the vehicle's arrival at the origin stop to its arrival at the destination stop. If the origin stop is a terminal, the runtime is calculated from the vehicle's departure at the origin stop to its arrival at the destination stop. Because track conditions vary significantly across the system and segments receive upgrades and repairs on different timelines, this method ensures that the baseline travel time is theoretically attainable but also a stretch.
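A minimal sketch of that baseline construction, using invented stop names and runtimes rather than actual MBTA data:

```python
from statistics import median

# Hypothetical observed runtimes (in seconds) per segment per month.
# Stop names and values are invented for illustration.
monthly_runtimes = {
    ("Alewife", "Davis"): {"2021-04": [150, 160, 155], "2021-05": [140, 145, 150]},
    ("Davis", "Porter"): {"2021-04": [120, 125, 130], "2021-05": [118, 122, 126]},
}

def baseline_segment_runtime(by_month):
    """Fastest monthly median runtime observed for one segment."""
    return min(median(runs) for runs in by_month.values())

def baseline_od_runtime(path):
    """Baseline origin-destination runtime: the sum of segment baselines."""
    return sum(baseline_segment_runtime(monthly_runtimes[seg]) for seg in path)

path = [("Alewife", "Davis"), ("Davis", "Porter")]
print(baseline_od_runtime(path))  # 145 + 122 = 267 seconds
```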

The calculation is:

(Actual Wait Time + Actual Travel Time (includes Dwell))
- (Baseline Wait Time + Baseline Travel Time)
= Excess Trip Time
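Expressed as code, the per-passenger calculation above is simply (the numbers in the example are made up):

```python
def excess_trip_time(actual_wait, actual_travel, baseline_wait, baseline_travel):
    """Excess trip time for one passenger, in minutes.
    Actual travel time includes dwell time, per the formula above."""
    return (actual_wait + actual_travel) - (baseline_wait + baseline_travel)

# Illustrative values only: a 7.5-minute wait plus 22 minutes onboard,
# against a 5-minute baseline wait and an 18-minute baseline runtime.
print(excess_trip_time(7.5, 22.0, 5.0, 18.0))  # 6.5 minutes of excess time
```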

The actual wait time is the actual headway for each trip divided by 2. This uses the imperfect assumption that passengers arrive at a random cadence during the time between trains, so each passenger's expected wait time is half of the headway. Although imperfect, this method does allow us to penalize uneven headways, because fewer passengers arrive during the shorter headway. For example, imagine there are two trains scheduled 10 minutes apart. The first arrives after only 5 minutes and the second arrives 15 minutes later. If 100 passengers walked up to the platform during those 20 minutes, 25 of them wait an average of 2.5 minutes (62.5 passenger-weighted minutes), while the remaining 75 passengers wait an average of 7.5 minutes (562.5 passenger-weighted minutes). Combined, the 100 passengers spent 625 minutes waiting. The baseline would have been 5 * 100 = 500 passenger-weighted minutes.
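The worked example above can be checked with a short sketch, assuming passengers arrive uniformly over time:

```python
def passenger_wait_minutes(headways, arrival_rate):
    """Total passenger-weighted wait, assuming uniform passenger arrivals:
    arrival_rate * h passengers show up during a gap of h minutes, and
    each waits h / 2 minutes on average."""
    return sum((arrival_rate * h) * (h / 2) for h in headways)

rate = 5  # passengers per minute (100 passengers over 20 minutes)
actual = passenger_wait_minutes([5, 15], rate)     # 62.5 + 562.5 = 625.0
baseline = passenger_wait_minutes([10, 10], rate)  # 250.0 + 250.0 = 500.0
print(actual - baseline)  # 125.0 excess passenger-minutes from uneven headways
```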

The baseline wait time is calculated from the best schedules that could be run under the expected car count – it varies by line.

Neither the travel time baseline nor the wait time baseline is set in stone. Both may be adjusted as track conditions improve and rolling stock increases – we want to continuously strive for improvements to the customer experience as we look toward the future. Because we want to be able to incorporate incremental improvements in the short term, these baselines may be updated more frequently than the SDP Annual Report is released.

Using the Dashboard

The ETT Dashboard displays the percentage of riders who arrived within the given time thresholds compared to the baseline. The key performance indicator on the left side of the dashboard shows the 30-day performance of the line in the default "Monthly" view, or the 14-day performance if the "Daily" view is selected. The area chart shows the percentage of riders who arrived within 5 minutes of their expected trip time in the dark shading, and the percentage of riders who arrived within 20 minutes of their expected trip time in the light shading. The grey area is the percentage of riders whose trip time was more than 20 minutes longer than their expected trip time. In practice, many riders are in the heavy rail system for only a few stops (think a south side Commuter Rail rider who takes the Orange Line from Back Bay to North Station, or vice versa, as an example). Therefore, we consider 5 minutes to be the primary target as a measure of providing good service.
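To make the shading concrete, here is a sketch of how riders could be bucketed by their excess trip time (the per-rider values are invented, and treating the 20-minute band as cumulative is our reading of the chart):

```python
def ett_shares(excess_minutes):
    """Share of riders within 5 minutes and within 20 minutes of baseline."""
    n = len(excess_minutes)
    within_5 = sum(1 for e in excess_minutes if e <= 5) / n
    within_20 = sum(1 for e in excess_minutes if e <= 20) / n
    return within_5, within_20

# Hypothetical per-rider excess trip times, in minutes
riders = [1, 3, 4, 6, 12, 25, 2, 18, 30, 5]
w5, w20 = ett_shares(riders)
print(w5, w20)  # 0.5 0.8
```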

The circles below the area chart show the actual volume of riders on that line for the given month in the default "Monthly" view, or day in the "Daily" view. The larger circles represent months with high ridership, and the smaller circles represent months with lower ridership.

In the simplest terms, the color of the service line in the dashboard represents the riders who had a “good” customer experience, so the more color, the better.

A visual explanation of how to interpret the excess trip time dashboard chart, as described in the previous sentences.
Figure 2 - an example chart from the Excess Trip Time Dashboard with an explanation of how to use (click to enlarge)

This dashboard updates weekly, with the prior week's data added on Monday mornings. Due to the vast number of datapoints involved in the calculation, processing sometimes takes longer than anticipated, and historical data may be restated if underlying data is found to be incomplete. We aim to provide this data as quickly as possible, but due to the nature of the calculation and the processing of the data itself, there may be times when the data update is incomplete.

ETT in Practice

To illustrate how ETT measures reliability differently than OTP, let’s take a look at a few example service days. To start, we will go to Sunday, March 17, 2024. This was St. Patrick’s Day (or Evacuation Day, to those who celebrate), and was the day of the parade in South Boston. The parade usually brings large crowds and creates crowded conditions on the Red Line in particular. On that day, the Red Line posted an OTP score of 91%, above the daily target, which would seemingly indicate good reliability that day. However, on the same day, the Red Line had an ETT score of 35%. So while the Red Line was largely adhering to its schedule for that day, only about a third of passengers arrived at their destination within five minutes of how long their trip was expected to take.

Another illustrative example is November 13, 2023. On this day, all three heavy rail lines experienced on- or above-target reliability as measured by OTP (90% under the old SDP). The Red Line and Orange Line both had OTP scores of 92%, and the Blue Line had an OTP score of 90%. Evaluating that day using ETT paints a very different picture, however. Measuring how many passengers arrived within 5 minutes of their expected trip time shows the Red Line scoring only 15%, the Orange Line 59%, and the Blue Line 80%. Again, while the service delivered adhered closely to the schedule, the experience of a large proportion of passengers was unreliable compared to expectations.

ETT also better reflects rider experience on days with unplanned service disruptions. On the morning of January 16, 2024, for example, an electrical fire at Downtown Crossing disrupted rail service on the Orange and Red lines for a couple of hours. Shuttles were deployed to replace service during this period. For this service date using the ETT metric, 51% and 18% of riders reached their destinations within 5 minutes of the baseline on the Orange and Red lines, respectively, compared to 92% and 86% under the former OTP metric. This example reflects the increased granularity of the ETT calculation and the fact that travel time is now incorporated, which it was not under OTP. Figure 3 shows the fluctuations in ETT scores throughout the day on the Orange and Red lines. The unplanned service disruption, as may be evident, happened in the morning between about 9 and 11 AM.

A comparison of hourly changes in excess trip time for the Red and Orange lines on January 16, 2024
Figure 3 - Excess Trip Time changes through the day on January 16, 2024 (click to enlarge)

Conclusion

Over the past couple of years, the MBTA has undertaken major efforts to improve passenger experience. Most notably, the Track Improvement Program completed the major task of eliminating the backlog of speed restrictions across the system. Passenger experience is a big reason why the new SDP is moving to using ETT as a reliability measure, instead of OTP. Using ETT as part of the Performance Dashboards will allow riders to monitor passenger experience going forward. The T is striving for ETT scores to be consistently high from here on out. In the future, we anticipate switching to ETT as a reliability measure for light rail (Green Line, Mattapan Line) and bus service as well. Stay tuned to the Data Blog for future updates to how the MBTA measures performance.