During Game Five of the 2017 World Series, home plate umpire Bill Miller became something of a household name by calling strikes on several pitches that appeared to be outside the zone. The bad calls often favored the Astros, with Dodgers batters repeatedly looking miffed in an eventual 13-12 loss.



I can’t update this montage of examples fast enough bill miller the home plate ump for game 5 is calling the worst strike zone this is crazy pic.twitter.com/Yor1XJSeaU — Ryan Depaulo: Degenerate Gambler (@horriblestats) October 30, 2017

Ignoring the merits of each call, we’ll raise a different question concerning Miller’s performance: Should players and the media have been surprised?

In this article, we look at how to identify a strike zone for each umpire. That way, whenever you’re tuned into a game, you can check back to see if and where your team might get the benefit of a borderline call. Next, we’ll describe how the observed between-umpire variability in strike zone size is unlikely to be due to chance. Finally, we’ll look at last year’s numbers to see how each ump looked in 2017.

Turns out, in calling more strikes against the Dodgers and Astros, Miller wasn’t doing anything unusual. Our findings suggest he boasts one of the widest strike zones among major league umpires.

Measuring umpire performance

One common way of assessing umpire performance is to take a binary outcome variable (was a taken pitch called a strike or a ball?) and compare it to a binary explanatory variable (was a taken pitch actually a strike or a ball?). While this and other types of accuracy measures are informative (as in this FanGraphs article), they also lose information. Because they ignore exactly where each pitch crossed the plate, a pitch right down the middle, for example, is treated the same as one an inch inside the corner of the plate.
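To make the information loss concrete, here is a minimal Python sketch of a binary accuracy measure. The zone boundaries and the four pitches are made-up illustrative values, not figures from this analysis, and the function names are hypothetical.

```python
# Assumed rule-book zone edges in feet (illustrative values only).
ZONE_LEFT, ZONE_RIGHT = -0.83, 0.83   # horizontal edges, plate centered at 0
ZONE_BOTTOM, ZONE_TOP = 1.5, 3.5      # vertical edges

def in_zone(px, pz):
    """Return True if the pitch location falls inside the assumed zone."""
    return ZONE_LEFT <= px <= ZONE_RIGHT and ZONE_BOTTOM <= pz <= ZONE_TOP

def accuracy(pitches):
    """Share of taken pitches where the call matches the assumed zone."""
    correct = sum(1 for px, pz, called_strike in pitches
                  if in_zone(px, pz) == called_strike)
    return correct / len(pitches)

# Made-up taken pitches: (horizontal ft, vertical ft, called strike?)
pitches = [
    (0.00, 2.5, True),    # down the middle, called a strike
    (0.80, 2.5, True),    # barely inside the corner, called a strike
    (0.90, 2.5, True),    # just off the corner, called a strike (a miss)
    (1.50, 2.5, False),   # well outside, called a ball
]
print(accuracy(pitches))  # 0.75
```

The first two pitches each count as one correct call, even though one is a far tougher decision than the other. That is the detail a location-based model keeps.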

Fortunately, MLB provides the exact location of each taken pitch as it crossed the plate. Given that the strike zone, at least as umpires call it, more closely resembles an oval than a rectangle, generalized additive models (for more on GAMs, see examples here, here, and here) are a recommended tool. GAMs are attractive for strike identification in that an analyst does not need to specify, a priori, the exact association between pitch location and strike likelihood and instead can let the data drive the most plausible relationship.

Our goal is to use GAMs to learn about each umpire. To start, we grabbed pitch-level data from Baseball Savant using the “baseballr” package in R, covering the 2008-2016 regular seasons. This was merged with umpire data (e.g., the umpire for each game) that was generously provided by Brian Mills. Next, we fit a GAM for each umpire to identify the likelihood of taken pitches being called a strike, and used each model to estimate the percent chance that a taken pitch is called a strike at each location around the plate. Finally, we compared each umpire’s estimated zone with one estimated on all umpires across the major leagues to roughly identify where each umpire has called either fewer or more strikes.
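The compare-to-league step can be sketched in a few lines of Python. This is not the GAM fit itself: as a stand-in, it uses a simple Gaussian-kernel average of nearby calls, and the pitches are simulated from a toy rule. The bandwidth, zone widths, and function names are all assumptions made for illustration.

```python
import math
import random

def smoothed_strike_rate(pitches, x0, z0, bandwidth=0.25):
    """Gaussian-kernel weighted called-strike rate near (x0, z0), in feet."""
    num = den = 0.0
    for px, pz, called_strike in pitches:
        dist_sq = (px - x0) ** 2 + (pz - z0) ** 2
        weight = math.exp(-dist_sq / (2 * bandwidth ** 2))
        num += weight * called_strike
        den += weight
    return num / den

def simulate_pitches(n, half_width, rng):
    """Made-up taken pitches: a strike is called whenever |x| < half_width
    and the height is between 1.5 and 3.5 feet (a toy rule, not real data)."""
    pitches = []
    for _ in range(n):
        px = rng.uniform(-2.0, 2.0)
        pz = rng.uniform(0.5, 4.5)
        called = 1 if abs(px) < half_width and 1.5 < pz < 3.5 else 0
        pitches.append((px, pz, called))
    return pitches

rng = random.Random(0)
league = simulate_pitches(5000, 0.83, rng)    # hypothetical league-wide zone
wide_ump = simulate_pitches(5000, 1.00, rng)  # hypothetical wide-zone ump

# Difference in called-strike rate at a spot just off the edge of the plate.
edge_x, edge_z = 0.9, 2.5
diff = (smoothed_strike_rate(wide_ump, edge_x, edge_z)
        - smoothed_strike_rate(league, edge_x, edge_z))
print(f"ump minus league at the edge: {diff:+.2f}")
```

Evaluating that difference over a grid of locations, separately for left- and right-handed batters, would give the kind of green-and-purple map described in this article.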

Miller being Miller

Let’s start with the aforementioned Bill Miller.

Here’s a chart of how Miller’s strike zone compares to the major league average. Green portions of the graph reflect locations where Miller calls more strikes than a league-average ump, while the part in purple corresponds to fewer strikes. The strike zone viewpoint is that of the catcher; that is, it reflects what he is looking out at, with the umpire behind him, and is faceted to reflect both right-handed (R) and left-handed (L) batters.

Across nearly the entire fringe of the strike zone, Miller calls more strikes, up to 27 percentage points higher (shown in the dark green shades of the graph) than an average major league ump. High pitches, low pitches, inside pitches, outside pitches, pitches to left-handed hitters, and pitches to right-handed hitters—Miller is almost always calling more strikes.

From 2009 to 2016, Miller called an estimated 1,100 more strikes—roughly four per game—than the average umpire would have. Indeed, it seems his wide strike zone in the 2017 World Series was nothing but consistent with his past.


How about other umps?

A wide strike zone for Miller is one thing, but how do other umps compare?

Here’s a chart with nine selected umpires, chosen both for the uniqueness of their strike zone shapes and because each called at least 3,000 pitches in every year of our sample.



In the top row, Gerry Davis and Greg Gibson stand out as having two of the tightest strike zones in the game (mostly purple, or fewer strikes), and we include Fieldin Culbreth (top right graph) as an ump whose numbers are somewhat close to the average. Gerry Davis’ inclusion as the tightest ump over the last decade coincides with the fact that he was also considered to have the major leagues’ smallest zone way back in 2007.

In the middle row, with Joe West, ball locations to the catcher’s right side lead to fewer strikes (in purple), with more strikes to the catcher’s left side (in green). Interestingly, CB Bucknor’s strike zone is almost a reflection (across the middle of the plate) of West’s. Meanwhile, with Jerry Meals (middle row, middle column), the strike zone varies based on batter handedness, with Meals more apt to call a strike on the outside corner than the inside corner.

In the bottom row, Doug Eddings and Miller stand out with the two biggest strike zones.

Although we couldn’t fit every ump on the chart above, we made a gif using each ump who called at least 3,000 pitches during each season between 2008 and 2016 (there are 41 umps shown, arranged in order from least-to-most pitcher friendly).

What would these charts look like if all umps were equivalent?

In fitting a separate GAM for each umpire, one potential worry is that we’re overfitting the data, which could yield exaggerated signals that may not reflect each umpire’s true propensity to call strikes. Although statistical testing can help in this regard, it’s perhaps just as pertinent to replicate the charts above, this time assuming that umpires were randomly assigned to each pitch.

Here’s a figure that shows what our modeling of umpire strike zones would look like if there were truly no differences between umps. For this chart, pitches were randomly assigned to one of the nine umpires above, such that the overall sample size reflected the actual number of pitches they called.



If ball and strike calls were truly random among umpires, we’d see very little of the signal we actually observe. In fact, among pitches on the border of the strike zone, the standard deviation between umpire strike call rates is about nine times what would be observed due to chance alone.
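The random-assignment check can be sketched in Python as a label shuffle. The three umpires, their border-pitch strike rates, and the sample sizes below are invented for illustration, so the resulting ratio will not match the figure quoted above.

```python
import random
from statistics import pstdev

def strike_rate_by_ump(labels, calls):
    """Called-strike rate for each umpire label."""
    totals, strikes = {}, {}
    for ump, call in zip(labels, calls):
        totals[ump] = totals.get(ump, 0) + 1
        strikes[ump] = strikes.get(ump, 0) + call
    return {ump: strikes[ump] / totals[ump] for ump in totals}

rng = random.Random(42)

# Made-up strike rates on border pitches for three hypothetical umps.
true_rates = {"ump_a": 0.35, "ump_b": 0.50, "ump_c": 0.65}
labels, calls = [], []
for ump, rate in true_rates.items():
    for _ in range(3000):
        labels.append(ump)
        calls.append(1 if rng.random() < rate else 0)

# Spread between umps using the real (simulated) assignments.
observed_sd = pstdev(list(strike_rate_by_ump(labels, calls).values()))

# Shuffle the labels: every ump keeps the same pitch count, but any
# real link between ump and call is broken.
shuffled = labels[:]
rng.shuffle(shuffled)
null_sd = pstdev(list(strike_rate_by_ump(shuffled, calls).values()))

print(f"observed SD: {observed_sd:.3f}, shuffled SD: {null_sd:.3f}")
```

Repeating the shuffle many times would give a full reference distribution for the spread expected by chance, rather than the single draw shown here.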

What’d the strike zone look like last year?

Between the 2016 and 2017 seasons, MLB’s pitch tracking software turned over from PITCHf/x to Trackman. As a result, it’s conceivable that a few related changes impacted the league’s overall strike zone. Alternatively, umps may have changed their behavior since the PITCHf/x era.

To take a more recent look at umpire strike zones, we looked within 2017 games called by each ump to see which ones tended to call more strikes and which tended to call more balls. (This is done by using each taken pitch in each game to determine the expected number of called strikes, which we compared to the actual number of called strikes.)
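In code, the expected-versus-actual comparison amounts to summing league-average strike probabilities over a game's taken pitches and subtracting that from the number of strikes actually called. A minimal Python sketch follows; the probabilities are made-up values rather than model output, and the function name is hypothetical.

```python
def strikes_above_expected(game_pitches):
    """game_pitches: list of (league_strike_prob, called_strike) pairs.

    Returns actual called strikes minus the expected count, where the
    expected count is the sum of the league-average probabilities.
    """
    expected = sum(prob for prob, _ in game_pitches)
    actual = sum(1 for _, called in game_pitches if called)
    return actual - expected

# Five made-up taken pitches from one hypothetical game.
game = [(0.95, True), (0.50, True), (0.10, False), (0.40, True), (0.05, False)]
print(round(strikes_above_expected(game), 2))  # 1.0
```

A positive value means the ump called more strikes than an average ump would be expected to on the same pitches.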

The following chart shows a boxplot of per-game deviations from the league average in strike calls for each ump who called at least 25 games in 2017. Umpires on the left of the chart (starting with Tom Woodring) mostly called games with fewer called strikes than expected, while umpires on the right (ending with Doug Eddings) generally called games with more called strikes than expected. Moving from Woodring to Eddings reflects a difference in medians of about 10 called strikes per game.
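The ordering behind a chart like this can be sketched in Python by taking each ump's median per-game deviation and sorting. The umpire names and deviations here are invented placeholders, chosen only so that the gap between the extremes is easy to read.

```python
from statistics import median

# Made-up per-game deviations (actual minus expected called strikes).
per_game = {
    "ump_a": [-6.0, -4.5, -5.0],
    "ump_b": [0.5, -1.0, 1.5],
    "ump_c": [4.0, 6.5, 5.0],
}

# Median deviation for each ump, then order from fewest to most strikes.
medians = {ump: median(devs) for ump, devs in per_game.items()}
ordered = sorted(medians, key=medians.get)

# Gap in median called strikes per game between the two extremes.
spread = medians[ordered[-1]] - medians[ordered[0]]
print(ordered, spread)  # ['ump_a', 'ump_b', 'ump_c'] 10.0
```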

It’s also worth noting that every ump had some games above and some below expectations in called strikes. And most umps are fairly accurate; exactly half of the 76 umps shown had an average game-level zone within one strike of the major league average. Two specific umps stood out based on a combination of both strike zone accuracy and consistency from game to game: Laz Diaz and Gerry Davis, whose boxplots are lightly highlighted above.

Davis’ move to being one of the most consistent umps is quite interesting. Recall that in using games prior to 2017, we earlier had found him to have one of the league’s least favorable strike zones for pitchers. As one possibility, this seems to highlight that umps can change how they call strikes over time. As recently as the 2014 season, Davis was the second-most stingy umpire by this same approach before climbing closer to the average in both 2015 (fourth-fewest strikes) and 2016 (eighth-fewest strikes).

Conclusion

This article is a snapshot of how statistical modeling can teach us a bit about umpire-level strike zones. In doing so, we identified significant between-umpire differences in strike zone sizes.

Two uses for this analysis stand out. First, pitchers and catchers would be well served to know, a priori, where each umpire tends to call strikes or balls (with the caveat that umps can change their zone at any given moment if they feel like it). Teams may well be doing some of this research already behind closed doors. Second, as fans, perhaps an understanding of similar strike zone maps or charts can better prepare us for games like Miller’s in last year’s World Series.