Baseball fans have long known, or at least suspected, that umpires call balls and strikes differently as the count changes. At 0-2, it seems that almost any taken pitch that is not right down the middle will be called a ball, while at 3-0 it feels like pitchers invariably get the benefit of the doubt. One of the earliest discoveries made possible by PITCHf/x data was the validation of this perception: Researchers confirmed that the effective size of the strike zone at 0-2 is only about two-thirds as large as in a 3-0 count.

One common explanation offered for this pattern is that umpires don’t want to decide the outcome of a plate appearance. Preferring to let the players play, this argument goes, umpires will only call “strike three” or “ball four” if there is no ambiguity about the call. As Etan Green observed at FiveThirtyEight: “Umpires call balls and strikes as if they don’t want to be noticed.” The data, however, do not support this theory. The called zone shrinks in all pitchers’ counts, even those with only one strike. Similarly, the impact of ball three on zone size is no greater than that of ball one or two. So umpires do not have any particular aversion to ringing up a batter, nor to granting him a free pass. Something else is going on.

A theory that fits the data better, first offered by John Walsh in 2010, is that umpires are putting their thumb on the scale for either the pitcher or the hitter, depending upon the situation. Suggesting that “major league umpires are a compassionate bunch of guys [who] can’t help pulling for the underdog,” Walsh argued that umpires act unconsciously to help whoever is at a disadvantage at any given count, giving pitchers a more generous zone in hitters’ counts and vice versa.

It’s as clear as day: These umpires are a bunch of softies. They see a pitcher struggling to put the ball over and they go all Gandhi on us, giving the pitcher an extra chunk of strike zone to work with when the count reaches 3-0. And when the batter becomes the underdog, when the count goes to 0-2? Why, the hearts of our merciful arbiters simply turn to mush: They can’t help pulling for the poor batter as he chokes up on the bat, hoping to make some kind of contact. Who knew the umps were such empathetic characters?

As awareness of the shifting size of the zone has spread, Walsh’s critique of umpires’ “vacillating, capricious, fickle strike zone” has become widely accepted. For most fans it is self-evidently wrong for the zone to morph over the course of an at-bat. After reading Walsh’s findings, Rob Neyer concluded “It shouldn't be happening. … The strike zone should be the strike zone, regardless of the count.”

But I’m here to offer a brief in defense of the men in blue. A careful review of all the relevant data reveals a valid—and much simpler—explanation for umpires’ shifting zone: They are just trying to make as many correct calls as possible. By adjusting their decision-making on calls to reflect the changing distribution of pitches they must call, umpires are actually reducing the number of mistakes they make. Major-league umpires are doing their job, not balancing scales.

Accurate in the Zone or Out of the Zone? That Is the Question.

To see how a shifting strike zone can be consistent with the goal of accuracy, it helps to think of umpires’ ball/strike decision-making in terms of the balance they strike between accuracy on pitches inside the strike zone vs. pitches outside of the zone, rather than simply the size of the strike zone. Wait, you say: Shouldn’t umpires make calls as accurately as possible on both types of pitches? In theory, yes. But in practice, umpires face a tradeoff here. This tradeoff is illustrated in the following table, which shows the correct call percentage for umpires at each count on pitches inside (IZCC%) and outside (OZCC%) the true strike zone, based on PITCHf/x data from 2010-2015. (Source: Mills)

TABLE 1: Umpire Accuracy Rates Inside and Outside the Zone

Count   IZCC%   OZCC%
0-0     0.87    0.85
0-1     0.74    0.93
0-2     0.63    0.97
1-0     0.86    0.85
1-1     0.77    0.91
1-2     0.66    0.96
2-0     0.89    0.82
2-1     0.79    0.89
2-2     0.70    0.94
3-0     0.93    0.77
3-1     0.85    0.86
3-2     0.72    0.93
ALL     0.84    0.89

On the first pitch of an at-bat, umpires make the correct call on 87 percent of IZ pitches (“strike”) and 85 percent of OZ pitches (“ball”). But the variation by count is substantial, and just eyeballing the numbers tells us that in counts where umpires maximize their accuracy on IZ pitches they tend to make more mistakes on OZ pitches, and vice versa. For example, at a 1-2 count umpires call 96 percent of OZ pitches right, but correctly call only 66 percent of IZ pitches. In fact, the negative correlation between the two accuracy rates is nearly perfect at -0.97. Umpires can pick their poison—reducing the risk of a mistake favoring the hitter or the chance of making a pro-pitcher call—but only at the price of a rise in the other error rate.

A certain number of called pitches are unambiguously inside or outside of the zone, and umpires nearly always get these right (regardless of count). However, there is a substantial proportion of pitches close to the edge of the strike zone that umpires will—inevitably—sometimes call incorrectly. What these negatively correlated accuracy rates tell us is that on borderline pitches umpires guess “strike” in hitters’ counts, guess “ball” in pitchers’ counts, and split the difference in neutral counts.

But why do umpires shift their error rates at all, unless it is to lend a helping hand to the underdog? The answer is, because they know (more or less) what’s coming. Umpires are changing their decision rule on close calls—“lean ball” vs. “lean strike”—based on the likelihood that the pitch will actually be a strike. With two strikes, they know the pitcher will usually throw outside the zone, and they know the hitter will typically swing at anything close—so guessing “ball” on any taken pitch is the percentage play. By adjusting their decision rule at each count to reflect their prior knowledge of the true distribution of pitches, they make better guesses and fewer mistakes. In short, umpires are Bayesian, not compassionate.

If this theory is right, umpires’ shifting accuracy rates should reflect the proportion of called pitches that are actually inside the strike zone (IZ%). If umpires are attempting to call pitches as accurately as possible, then IZCC% will be highest when called pitches are most likely to be in the zone, and OZCC% will be maximized when called pitches are overwhelmingly true balls. The more frequently that called pitches arrive outside the zone, the more an umpire can improve accuracy by guessing “ball” on borderline pitches (at 0-2, he would be right 91 percent of the time if he just signaled “ball” every time).

TABLE 2: Umpire Accuracy Rates and Pitch Distribution

Count   IZ%    IZCC%   OZCC%   Total CC%
0-0     0.44   0.87    0.85    0.86
0-1     0.23   0.74    0.93    0.89
0-2     0.09   0.63    0.97    0.94
1-0     0.39   0.86    0.85    0.85
1-1     0.25   0.77    0.91    0.88
1-2     0.12   0.66    0.96    0.92
2-0     0.42   0.89    0.82    0.85
2-1     0.27   0.79    0.89    0.86
2-2     0.15   0.70    0.94    0.91
3-0     0.58   0.93    0.77    0.87
3-1     0.36   0.85    0.86    0.86
3-2     0.20   0.72    0.93    0.89

In fact, that is exactly what happens. Table 2 again shows umpires’ accuracy rates in and out of the zone, and adds the proportion of called pitches in the zone (IZ%) and total correct call percentage (Total CC%) at each count. In pitchers’ counts an overwhelming proportion of called pitches arrive outside the strike zone (low IZ%), and in response umpires lift their accuracy rate on OZ pitches at the small cost of lower accuracy on the handful of true strikes. For example, at 1-2 fully 88 percent of called pitches are outside the zone, and umpires call 96 percent of OZ pitches correctly but are right on just 66 percent of IZ pitches. In hitters’ counts the process is reversed. At 3-0, the count with by far the highest proportion of true strikes, umpires post their highest IZCC% at 93 percent, but only a pedestrian 77 percent accuracy rate on OZ balls.
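The accuracy columns in Table 2 are tied together by simple arithmetic: total accuracy at each count is just the in-zone and out-of-zone rates weighted by the share of called pitches in each group. A short Python sketch, using a few rows transcribed from the table, confirms the identity:

```python
# For each count: (IZ%, IZCC%, OZCC%), transcribed from Table 2
rows = {
    "0-0": (0.44, 0.87, 0.85),
    "0-2": (0.09, 0.63, 0.97),
    "1-2": (0.12, 0.66, 0.96),
}

# Total CC% = IZ% * IZCC% + (1 - IZ%) * OZCC%
totals = {}
for count, (iz, izcc, ozcc) in rows.items():
    totals[count] = round(iz * izcc + (1 - iz) * ozcc, 2)

print(totals)  # {'0-0': 0.86, '0-2': 0.94, '1-2': 0.92}, matching Total CC%
```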

One sign that umps are changing their ball/strike decision rule in pursuit of accuracy is found in the total accuracy rates by count. Despite huge variations in IZCC% and OZCC% by count, total accuracy varies relatively little. Moreover, it is in the counts in which IZCC% and OZCC% diverge most radically—where umps are changing their zone the most—that total accuracy is highest. In these pitchers’ counts, so few called pitches are in the zone that umpires can achieve unusually high accuracy by guessing ball most of the time.

We can measure the magnitude of the accuracy gains by estimating what would happen if umpires consistently applied the same 87 percent IZCC% and 85 percent OZCC% rates that they achieve in the neutral 0-0 strike zone. For example, given the proportion of actual IZ (9 percent) and OZ (91 percent) pitches at an 0-2 count, the overall 0-2 accuracy rate would fall from its current 94 percent to about 85 percent if umpires called a neutral zone. That is, instead of calling just 6 percent of pitches incorrectly, umpires would more than double their error rate to 15 percent. Accuracy would also decline if a neutral zone were applied in other pitchers’ counts, though by smaller amounts. Interestingly, in hitters’ counts there is little difference between the current and projected accuracy rates, because the zone actually expands relatively little in hitters’ counts (more on this below).
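The 0-2 projection is just a weighted average recomputed with the neutral rates. A sketch of the arithmetic in Python, using the rounded rates from Tables 1 and 2:

```python
# At 0-2, only 9 percent of called pitches are true strikes (Table 2).
iz_share = 0.09

# Projected accuracy using the neutral 0-0 rates (87% IZ, 85% OZ)...
neutral = iz_share * 0.87 + (1 - iz_share) * 0.85
# ...versus umpires' actual 0-2 rates (63% IZ, 97% OZ)
actual = iz_share * 0.63 + (1 - iz_share) * 0.97

print(round(neutral, 2), round(actual, 2))  # 0.85 0.94
```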

Table 3 provides these neutral-zone accuracy projections for all counts. Notably, there is no count at which applying the 0-0 accuracy rates would produce a meaningfully higher (more than 1 percentage point) total accuracy rate. Overall, umpires’ shifting decision rules appear to lift their accuracy rate by almost 2 percentage points (87.5 percent vs. 85.8 percent), preventing thousands of additional wrong calls each season. Moreover, the accuracy gain is particularly large in several key high-leverage counts.

TABLE 3: Projected Accuracy Rates Using a Neutral Zone in All Counts

Count   Actual CC%   CC% if 0-0 zone   Difference
0-0     0.86         0.86              —
0-1     0.89         0.86              -0.03
0-2     0.94         0.85              -0.08
1-0     0.85         0.86              0.01
1-1     0.88         0.86              -0.02
1-2     0.92         0.85              -0.07
2-0     0.85         0.86              0.01
2-1     0.86         0.86              -0.01
2-2     0.91         0.86              -0.05
3-0     0.87         0.86              0.00
3-1     0.86         0.86              0.00
3-2     0.89         0.86              -0.03
ALL     0.88         0.86              -0.02

Admittedly, these are only approximations of the accuracy rates that would result from calling the neutral zone in other counts, because not all IZ pitches are equally likely to be called strikes (nor are all OZ pitches created equal). It’s possible umpires could achieve small improvements in accuracy in some counts by making a different tradeoff between the respective accuracy rates. But the general pattern is completely consistent with an effort to minimize errors: in counts where a true strike is more probable, umpires call more strikes, and when the probability of a true strike declines, umps are more likely to call a ball.

How does this work?

An interesting question is how exactly umpires achieve this result. Perhaps they simply prime themselves to “lean” one way or the other based on the count. In the same way a hitter with two strikes primes himself to swing at anything close (because there isn’t enough time to consciously ‘decide’ whether to swing), umpires may be predisposed to call ball or strike based on their knowledge of the count.

One difference, though, is that while hitters quite deliberately change their aggression level based on the count, it’s not clear that umpires consciously bias their calls. It seems unlikely that umpires actually admit to themselves, “I’m going to call this one a ball unless it’s right down Main Street.” My guess is the process is more subtle, and largely unconscious. I suspect one factor is that umpires are primed by the pre-pitch positioning of the catcher’s glove. They use the glove’s position as a marker, to help them determine the location of the arriving pitch vis-a-vis the edge of the zone. If the catcher sets up near the edge of the zone, as is usually the case in pitchers’ counts, the umpire notes this and gives the pitcher less latitude to miss the glove. If the catcher’s glove moves much at all—especially away from the center of the zone—the ump will call a ball, which effectively shrinks the zone. But when the catcher sets up closer to the center of the zone, as he usually will in hitters’ counts, the pitcher has more latitude: If the catcher’s glove then moves a bit in catching the pitch, the pitcher may still get a strike call because the glove began nearer the center of the zone.

This would be an important amendment to Mike Fast’s “catcher target theory” of catcher framing. Fast suggested that umpires use the catcher’s glove location as a reference point, calling strikes more often when the catcher does not have to move his glove very far and calling balls when he does. No doubt there is truth to this, but by itself this theory is not consistent with the way the zone shifts by count. In my view, the amount of movement of the catcher’s glove is only part of the story—the pre-pitch location of the glove also matters. If true, then influencing umpires’ perception of how far inside the zone the catcher initially set his target may be an important element of catcher framing skill.

I can’t prove that umpires use catchers’ gloves as a signpost in this way, to help them evaluate the proximity of a pitch to the strike zone (though I invite other researchers to explore the issue). But if it were your job to judge whether a ball moving 95 mph had passed through an invisible box in the air—within one second, in front of 40,000 people—wouldn’t you take advantage of the only visible, relatively fixed marker in sight?

Regardless of how the shifting zone is achieved, umpires must face plenty of pressure to do it, without anyone even needing to explicitly acknowledge it. A young umpire who called a consistent neutral zone would find himself frequently criticized by hitters, and perhaps the league, for calling too many strikes in pitchers’ counts. The opposite would happen in hitters’ counts. Importantly, both sets of criticisms would be entirely correct, and backed by video evidence. The “changing size of the zone” is an abstraction, while calling pitches outside the zone as strikes is very concrete, with real consequences. To survive, any umpire would have to adopt the shifting zone.

Are Umps Being Nice, or Doing Their Job?

So we have two theories that both seem to fit the data: compassion and accuracy. The analytic challenge is that the proportion of true strikes in each count generally tracks the relative advantage of hitters vs. pitchers. Given this overlap, can we determine which theory is right?

I think we can. Following Walsh and other analysts, we can define the size of the strike zone as encompassing any area in which umpires call a strike on 50 percent or more of taken pitches. If we take hitters’ wOBA after each count as a measure of the batter’s relative advantage/disadvantage, we see that the size of the zone does mirror how well hitters perform starting from that count (see Table 4). Indeed, the two factors have a correlation of 0.81. However, the size of the zone also closely matches the proportion of actual strikes seen in each count (IZ%), consistent with an effort by umpires to maximize accuracy. And the fit between zone size and strike percentage is even tighter, with a correlation of 0.95. (NOTE: I am using the size of the zone for right-handed hitters, but all reported correlations are nearly identical for left-handed hitters.)

TABLE 4: Strike Zone Size, Hitter wOBA, and IZ%

Source for wOBA data: Meyer; Source for strike zone size: Carruth

Count   Strike Zone Size (RHH)   wOBA    IZ%
0-0     3.49                     0.317   0.44
0-1     2.89                     0.267   0.23
0-2     2.39                     0.200   0.09
1-0     3.52                     0.364   0.39
1-1     3.07                     0.301   0.25
1-2     2.43                     0.228   0.12
2-0     3.63                     0.447   0.42
2-1     3.21                     0.360   0.27
2-2     2.88                     0.277   0.15
3-0     3.73                     0.638   0.58
3-1     3.36                     0.480   0.36
3-2     2.97                     0.389   0.20

This difference in correlations is not dispositive by itself. However, if we look deeper into these relationships, the case for compassionate umpires grows still weaker. It turns out that balls and strikes do not impact the size of the called zone equally: Each additional strike serves to shrink the zone considerably, while the marginal impact of a ball is rather small. The correlation of zone size with the number of balls in the count is just 0.40, compared to a -0.89 correlation with strikes. That is why the zone is generally not much larger in hitters’ counts than in a neutral count, while it shrinks considerably in pitchers’ counts.

While zone size is impacted much more by an extra strike than a called ball, the same is not true for wOBA. The effect of adding a ball or a strike varies considerably by count, but in general the impact is about equal. (In fact, wOBA correlates better with the number of balls than strikes in the count.) If compassion for the underdog were umpires’ driving motivation here, their zone should be just as sensitive to the number of balls as strikes in the count, but it isn’t. In contrast, IZ% has a very high correlation with the strike count (-0.89), but only a weak correlation with the ball count (0.40)—exactly the same pattern we find for zone size.

If you don’t find correlations persuasive, consider the classic hitters’ count of 3 balls and 1 strike. There’s no question that pitchers are the “underdog” here, with hitters posting a massive .480 wOBA—that’s the 2015 version of Bryce Harper. And yet, the zone at 3-1 is actually a bit smaller than on the first pitch (3.36 vs. 3.49 sq. ft.). Where is umpires’ compassion at this crucial moment of peril for our nation’s hurlers? The fact that slightly fewer 3-1 called pitches actually arrive in the strike zone (8 percentage points fewer than at 0-0) apparently matters more than who has the upper hand in the hitter-pitcher battle. And the zone shrinks considerably more at a full count—still a very good count for hitters (.389 wOBA)—because only one pitch in five is an actual strike.

Finally, as we consider all of this evidence we should not grant these two theories equal standing. Occam’s razor tells us to go with the simplest explanation for the observable facts. An umpire’s job is to call balls and strikes correctly, so it is reasonable for us to assume they are trying to make as few mistakes as possible. Making biased calls in favor of a temporary underdog, in contrast, is very much not part of the job description. As it happens, the statistical fit with zone size is stronger for accuracy than for compassion. But to paraphrase Johnnie Cochran, in the case against umpires for alleged underdog bias, even an equal fit means we must acquit.

Moreover, while the compassion theory is compatible with the facts of the shifting strike zone mystery, it doesn’t explain much else. If umpires have a strong unconscious preference for underdogs, why don’t they display any bias in favor of weak hitters over strong ones, or show favoritism to weak teams over strong ones? If they root for David over Goliath, then why do they increase the size of the zone as the hitting team makes more outs—exactly the opposite of a compassionate response? (Source: Roegele) The compassion story never rang entirely true, and fortunately we no longer need it to explain umpires’ behavior.

Should Umpires Change the Size of the Zone?

One important question remains: are umpires doing the right thing by changing the effective size of the zone by count? The answer provided by the rulebook is unambiguous—the definition of the strike zone makes no reference at all to pitch count. But as we’ve seen, “changing the size of the zone” is only one way to think about umpires’ behavior. Alternatively, we could ask whether umpires are right to adopt different decision rules by count for handling borderline pitches. The reality is that umpires won’t ever be able to call every pitch correctly. So when a pitch is a close call, should umpires always do the mental equivalent of a coin toss, or should they take advantage of the fact that in some counts, a strike (or ball) is much more likely than usual? Is the notion of a single, stable zone so sacred that it should trump our desire for accuracy?

We can think of this as a regulatory problem from the league’s vantage point. A fixed zone seems obviously fair, but it creates an arbitrage opportunity in some counts for one side to exploit umpires’ inevitable mistakes. At 0-2 the pitcher knows the hitter has to swing at anything close, even bad pitches, and he’s earned that advantage by getting the hitter to 0-2. But then a neutral zone gives the pitcher an extra, unfair advantage: By throwing a large majority of borderline pitches outside the zone, pitchers would benefit from more than their fair share of wrong calls. At 0-2, about 90 percent of umpire errors would be in the pitcher’s favor using a neutral zone. By shifting their decision rule on close pitches, umpires prevent either side from systematically exploiting their mistakes.
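That “about 90 percent” figure follows directly from numbers we have already seen: with only 9 percent of called 0-2 pitches being true strikes, nearly all neutral-zone mistakes would be phantom strikes on true balls. A quick Python sketch using the 0-0 accuracy rates:

```python
# At 0-2, 9 percent of called pitches are true strikes (Table 2).
iz_share = 0.09

# Errors under a neutral zone, using the 0-0 rates (87% IZ, 85% OZ):
helps_hitter = iz_share * (1 - 0.87)         # true strikes called balls
helps_pitcher = (1 - iz_share) * (1 - 0.85)  # true balls called strikes

share = helps_pitcher / (helps_pitcher + helps_hitter)
print(round(share, 2))  # about nine of every ten errors favor the pitcher
```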

Of course, there is a way to remove this arbitrage opportunity while also maintaining a fixed zone: use robot umpires. I don’t know whether that would be a good change or not. But based on our analysis here, we can see that the resulting changes could be substantial. Without the “counter-cyclical” adjustment umpires make at each count, the impact of the count will be even more pronounced than in today’s game. An 0-2 count will truly be a death sentence, as hitters are forced to swing even more frequently and pitchers take advantage of this by throwing even more toward the edges of the zone. With three balls we will see the reverse: pitchers forced to groove the ball even more than they do now, and hitters teeing off with greater ferocity. Knowledge of these enhanced count effects may also change behavior earlier in the count, with unknown effects. My own guess is that a constant zone across counts would mainly work to further suppress scoring, since the main change in size is a shrinking zone in pitchers’ counts. But it’s very hard to forecast all the consequences here, other than to say they could be profound.

For now though, we have human umpires. And the bottom line is that by shifting the effective size of the zone, these umpires are getting calls right much more often than if they called the same zone at all counts. Do we really want umpires to make an additional 6,000 wrong calls each season in the name of fealty to the one true strike zone? Personally, I would vote no. A changing zone is the lesser of these two evils.

But wherever one comes down on that question, we should at least agree to stop hurling charges of “bias” against the men in blue. Let’s acknowledge that they are actually using the tools available to them to call balls and strikes correctly as often as possible, just as we demand of them. Major-league umpires are not taking sides, they are taking care to get it right.

Guy Molyneux is a political pollster who dabbles in sports analytics.

SOURCES:

Baseball Savant is the source for all PITCHf/x data in this article on the distribution of called pitches and the accuracy of umpire calls by count, and the data was compiled and provided by Professor Brian Mills of the University of Florida. The author thanks Brian for generously sharing this data.

Rob Neyer, ESPN, http://espn.go.com/blog/sweetspot/post/_/id/3100/for-umps-strike-zone-depends-on-count

John Walsh, The Compassionate Umpire, The Hardball Times, April 7, 2010. http://www.hardballtimes.com/the-compassionate-umpire/

Dan Meyer, Dynamic Run Value of Throwing A Strike, The Hardball Times, May 6, 2015. http://www.hardballtimes.com/dynamic-run-value-of-throwing-a-strike-instead-of-a-ball/

Etan Green, Four Strikes and You’re Out, FiveThirtyEight, April 3, 2014. http://fivethirtyeight.com/features/four-strikes-and-youre-out/

Mike Fast, The Real Strike Zone, Baseball Prospectus, Feb. 16, 2011. https://legacy.baseballprospectus.com/article.php?articleid=12965

Matthew Carruth, The Size of the Strike Zone by Count, Fangraphs, Dec. 18, 2012. http://www.fangraphs.com/blogs/the-size-of-the-strike-zone-by-count/

Jon Roegele, The Living Strike Zone, Baseball Prospectus, July 24, 2013. https://legacy.baseballprospectus.com/article.php?articleid=21262