June 26, 2019

Ridership on the Dashboard and the National Transit Database

As an essential measure of the performance of the MBTA, we report our best estimates of ridership each month both on the MBTA Back on Track Dashboard and to the National Transit Database. As we have discussed on the blog, the source data for ridership comes from different systems and is measured in different ways. There are also many riders and trips that we are unable to measure from our equipment, and whose travel we need to estimate. This post will discuss the methods we use to count riders and trips, and to estimate those we can't directly count. We will also discuss some of our future plans for improving these estimates and our reporting.

Recap: The Sources

We use different systems to collect the raw data depending on the technology available. The two main sources are Automated Passenger Counters (APCs), which are currently installed on most of the bus fleet, and the Automated Fare Collection (AFC) system, which counts CharlieCard taps and other payment methods on rail and bus services. APCs are also being installed on Commuter Rail coaches and on the MBTA's new Green, Red, and Orange Line vehicles, which are expected to come into service over the next few years.

For services where we do not have significant APC coverage, we use estimates based on data from the AFC system. The AFC system counts every interaction with a piece of fare equipment (for ridership purposes, these are faregates and fareboxes). We also conduct manual counts at various times and places to check against our automatically collected data, or in cases like Commuter Rail where we have limited automatic data.

Recap: The Measure

We report ridership as Unlinked Passenger Trips (UPT), which counts each boarding of each vehicle as one “unlinked” trip, even if it was part of a longer journey. While this gives additional credit to transfer trips, it is the industry standard and is required by the NTD, so we currently report ridership in this manner. We are investigating other measures of ridership and hope to be able to provide them along with UPT in the future.

How we estimate ridership from raw data

Bus: For our bus network, with a few small exceptions, we have enough APCs installed that we can use them to estimate ridership with minimal scaling and uncertainty. For each day type and route, we take the boardings counted on trips run by APC-equipped buses and scale them up to the total number of scheduled trips. We then scale the ridership back down to account for scheduled service that did not run.
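As a rough illustration, the scale-up and scale-down for one route and day type could be sketched in Python as follows. The function name, its parameters, and the example numbers are hypothetical, and the sketch assumes that trips without APCs carry about the same number of riders as trips with them; the actual calculation may differ.

```python
def estimate_bus_ridership(apc_boardings, apc_trips, scheduled_trips, operated_trips):
    """Estimate boardings for one route and day type (illustrative only).

    apc_boardings   -- boardings counted on APC-equipped trips
    apc_trips       -- number of trips run by APC-equipped buses
    scheduled_trips -- total trips in the schedule
    operated_trips  -- scheduled trips that actually ran
    """
    if apc_trips <= 0 or scheduled_trips <= 0:
        raise ValueError("need at least one APC trip and one scheduled trip")

    # Scale up: average boardings per counted trip, applied to every scheduled trip.
    boardings_per_trip = apc_boardings / apc_trips
    scaled_up = boardings_per_trip * scheduled_trips

    # Scale down: keep only the share of scheduled service that ran.
    service_delivered = operated_trips / scheduled_trips
    return scaled_up * service_delivered


# Hypothetical example: 70 of 100 scheduled trips had APCs, and 95 trips ran.
estimate = estimate_bus_ridership(
    apc_boardings=4200, apc_trips=70, scheduled_trips=100, operated_trips=95
)
print(round(estimate))
```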

Rapid Transit: We currently have very limited coverage of APCs on the Rapid Transit system and need to use the AFC data to estimate ridership. We start with the raw validations (taps, ticket insertions, or cash payments) at each AFC location. From here we apply three different factors in order to estimate total ridership from the validations. These factors are explained below:

Non-Interaction: Non-Interaction factors account for people who entered the MBTA system without interacting with fare equipment. These are most often children, employees, people actively evading the fare or people who entered when the fare equipment was not functioning. These factors are calculated based on a sample of manual observations of people entering faregates, conducted each year.
Station Splits: We usually assume that every validation at a faregate at a station leads to a person boarding the line that serves that station. At stations that serve multiple lines, we do not directly know which line someone who validated there then boarded. For example, someone validating at Government Center could then board either a Green Line or Blue Line vehicle without any further interaction with fare equipment. To estimate these data, we apply a factor called a “station split” to “split” the boardings at such stations between the lines that serve each station. These factors are currently based on past surveys of passengers, but at the conclusion of this fiscal year we will update them using ODX.
Behind-the-Gate: As noted above, we report ridership as unlinked passenger trips, meaning every boarding of each vehicle. For trips where passengers transferred lines without passing through a faregate or an APC, we cannot directly measure their second trip, so we need to estimate it with a factor. Currently, we do this using the answers from surveys of passengers. We ask passengers waiting for a train where they are going, and determine how many additional unlinked trips we can estimate for each boarding based on which line they boarded. For example, if our survey showed that there were 121 unlinked trips for 100 passengers surveyed, the “behind-the-gate” factor for that line would be 0.21, and we would multiply the count of boardings (after the other factors were applied) by 1.21 to estimate total unlinked trips. We are also updating this factor at the end of the fiscal year using the ODX algorithm.
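The behind-the-gate example above works out as in the following short Python sketch. The survey figures (121 unlinked trips for 100 passengers) come from the example; the function name and the boardings count are hypothetical and only there to show how the factor is applied.

```python
def behind_the_gate_factor(unlinked_trips, passengers_surveyed):
    """Additional unlinked trips per surveyed boarding (illustrative only)."""
    return unlinked_trips / passengers_surveyed - 1


# From the example: 121 unlinked trips reported by 100 surveyed passengers.
factor = behind_the_gate_factor(unlinked_trips=121, passengers_surveyed=100)

# Hypothetical count of boardings after the other two factors were applied.
boardings = 10_000
total_unlinked_trips = boardings * (1 + factor)

print(round(factor, 2))             # factor, rounded to two decimals
print(round(total_unlinked_trips))  # boardings multiplied by 1 + factor
```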

Putting it all together

The following chart shows an example of how we calculate final ridership from raw faregate interactions, with all three factors applied. These numbers are rounded to the nearest thousand.

A chart depicting average Red Line weekday ridership, with examples of how non-interaction, station splits, and behind-the-gate activity affect our ridership estimates.

First, we sum all the interactions at all faregates at stations with Red Line service. This will over-count the riders at stations that serve multiple lines. Second, we apply the “split factors” to the total interactions at stations that serve multiple lines (there is a different factor for each station-line combination) and assign the interactions that belong to other lines to those lines. This is represented by the -27 in the second column on the chart above. We then have a subtotal of 194,000 interactions that can be attributed to the Red Line.

Third, we apply the non-interaction factor to scale these taps to account for people who entered without interacting with the faregate. This brings our running total to 206,000.

Finally, we add the trips made by passengers who could transfer behind the gate to the Red Line from the other lines (Green and Orange). These are counted in a similar calculation that is conducted on the interactions recorded at gates on those lines. This adds an additional 36,000 unlinked trips to our total, giving us our final ridership estimate of 242,000 average weekday UPT on the Red Line.
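The arithmetic in this walkthrough can be retraced with a few lines of Python. Every input is one of the rounded figures quoted above, in thousands; the raw faregate total and the non-interaction multiplier are not stated in the text and are only implied by those rounded figures, so treat them as approximate.

```python
# All values are in thousands, rounded as in the walkthrough above.
split_adjustment = -27          # interactions assigned to other lines
after_splits = 194              # subtotal attributed to the Red Line
after_non_interaction = 206     # running total after the non-interaction factor
behind_the_gate_transfers = 36  # unlinked trips transferring in from Green and Orange

# Implied by the rounded figures, not stated directly in the text.
raw_interactions = after_splits - split_adjustment
implied_non_interaction_multiplier = after_non_interaction / after_splits

# Final average weekday unlinked passenger trips on the Red Line.
final_upt = after_non_interaction + behind_the_gate_transfers

print(raw_interactions, round(implied_non_interaction_multiplier, 3), final_upt)
```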

Green Line Surface

The Green Line is the most extensive and complex light rail system in the country, and this complexity presents myriad data challenges, as we have detailed on the blog. For ridership reporting, the surface-running portion of the Green Line presents some unique issues that we must account for. First, there is a high level of non-interaction on the Green Line due to the operational practice of allowing passengers with passes to board at the back door. While we believe the revenue loss from this is relatively low, it does mean we have a large non-interaction factor that we use for the Green Line. We continually monitor and improve this factor, and as the new Type 9 cars, equipped with APCs, come into service, we will be able to use these to better estimate non-interaction.

Second, the Green Line fareboxes are not hard-wired to the AFC central database. This means they must be manually “probed” to download their transaction data (cash payments into the fareboxes are collected through a different process). Because the AFC system was installed nearly 15 years ago, this is a much more difficult process than it might seem; data can only be probed in certain places in the train yard, and vehicles do not always have an operational reason to pass through those places (by contrast, fareboxes on buses are probed much more regularly since it is part of the nightly re-fueling process). In fact, a large portion of the data from surface AFC interactions is not downloaded to our database until weeks or sometimes months after the transactions occurred.

In order to account for this probing lag, we have developed a process to impute taps for which we do not have data yet, based on the amount of service we see that each vehicle has provided (measured by stations visited from our AVL system) and the number of taps per vehicle-stop visit that we have recorded in each month in the past.

This process consists of four steps. First, we evaluate how much AFC data is missing and likely to come in through a future probing. We conservatively estimate AFC data to be missing if a vehicle is seen to be in service on a particular date but did not record any AFC records. Next, we estimate what the missing data is likely to be, in terms of taps per vehicle-stop visit, based on the same month of the prior year (to account for seasonal ridership trends). We then look at the number of stop visits made by the vehicles with currently missing AFC data and multiply them by this rate. Finally, every month, as more probed data comes in, we replace the estimates with real data.
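A minimal Python sketch of these four steps is shown below. The data layout (dictionaries keyed by vehicle and date), the function name, the vehicle numbers, and all of the values are hypothetical; the sketch only illustrates the logic described above and is not the production process.

```python
def estimate_surface_taps(stop_visits, recorded_taps, prior_year_taps_per_visit):
    """Return taps per (vehicle, date), imputing values that are not yet probed.

    stop_visits               -- {(vehicle, date): stop visits seen in AVL}
    recorded_taps             -- {(vehicle, date): taps downloaded so far}
    prior_year_taps_per_visit -- rate from the same month of the prior year
    """
    taps = {}
    for key, visits in stop_visits.items():
        if visits <= 0:
            continue  # vehicle was not in service that day
        recorded = recorded_taps.get(key, 0)
        if recorded > 0:
            # Probed data is available: use the real count.
            taps[key] = recorded
        else:
            # Steps 1-3: in service but no AFC records, so impute from service.
            taps[key] = visits * prior_year_taps_per_visit
    return taps


# Hypothetical vehicle-days: one already probed, one still missing.
visits = {("3801", "2019-05-01"): 200, ("3802", "2019-05-01"): 150}
probed = {("3801", "2019-05-01"): 480}
first_pass = estimate_surface_taps(visits, probed, prior_year_taps_per_visit=2.5)

# Step 4: when the missing vehicle is probed later, rerun to replace the estimate.
probed[("3802", "2019-05-01")] = 410
second_pass = estimate_surface_taps(visits, probed, prior_year_taps_per_visit=2.5)

print(sum(first_pass.values()), sum(second_pass.values()))
```

Rerunning the whole calculation each month, rather than patching individual values, keeps the replacement step simple: any vehicle-day that now has probed data automatically stops being imputed.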

Ridership on the Dashboard

We put all of the above together into our ridership update six weeks after the end of each month. This is the earliest date we feel confident that we have enough Green Line surface data to estimate its ridership. After QA/QC, we combine the above calculated ridership with the ridership reporting we get from Commuter Rail, Ferry and the RIDE to display our average weekday ridership for each month.

We are working on more detailed and granular ridership tools which will allow users of the Dashboard to explore our ridership data in different ways as data quality and availability improve. Look for these in a future update to the Dashboard.