There's a lot of behavioral literature indicating that we tend to like people who we think belong to the same group as us, and behave favorably toward them, even though we're not aware of doing so. Another, unrelated body of research indicates that we're all prone to behaving better if we think someone's watching us; even a static photo of a pair of eyes is enough to get people to shape up. These two threads have been brought together in a rather unusual package: a detailed statistical analysis of an unexpected research topic, baseball umpires and the pitchers they sometimes torment.

Calling balls and strikes would seem to be one of the last bastions of the low-tech world; it's all up to the judgement of the lone umpire behind home plate, and there's no instant replay. But that impression would be badly wrong. In recent years, every stadium in the major leagues has been equipped with a QuesTec system that compares umpires' ball and strike calls to an objective, computer-validated standard. Deviate too far from what the system says you should be calling, and you'll automatically have your performance reviewed. This provides the ultimate "someone is watching you" experience for the umpire. As a control, the researchers behind the study took advantage of a five-year period in which the system was only installed in half the stadiums in baseball, creating a set of monitored and unmonitored games.

Those games weren't entirely unmonitored, though; they just weren't being tracked by the organization that managed the umpires. Major League Baseball itself has also gone rather high-tech, installing a system called PITCHf/x that uses two cameras to track each pitch and record where it crossed home plate, providing a separate objective measure of balls and strikes.

Where does the subconscious favoritism come in? Several studies have shown that sporting officials tend to exhibit subtle biases in favor of members of their own ethnic group. So an umpire who's white might be expected to favor a white pitcher, giving him more favorable calls when pitches are at the edge of the strike zone. This sort of bias might be expected to be subtle, but the research has the sort of statistical power that comes from large numbers: a record of over 3.5 million pitches and their outcomes. (Here, the authors turned to ESPN.com for a pitch-by-pitch record of each game to match up with their computer data.)

After eliminating things like foul balls, swinging strikes, and intentional balls, the authors still had a very impressive collection of data to work with: 1.9 million pitches in which the umpires made a decision. Then came the real drudge work. Using sources such as About.com and web searches, the authors pieced together the ethnic origins of all the major league players and umpires involved. And then they started crunching numbers. And what they found was a subtle bias that went away when the umpires thought someone was watching them.
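The filtering and matching steps described above can be sketched roughly as follows. This is a hypothetical illustration, not the authors' code: the `Pitch` record, its outcome labels, and the ethnicity strings are all assumptions made for the example, whereas the real data came from ESPN.com's pitch-by-pitch records plus the authors' own research into players' and umpires' backgrounds.

```python
from dataclasses import dataclass

# Hypothetical outcome labels; real play-by-play sources use their own codes.
# Only these outcomes leave the ball/strike decision to the umpire.
CALLED_OUTCOMES = {"called_strike", "ball"}


@dataclass(frozen=True)
class Pitch:
    outcome: str            # e.g. "called_strike", "ball", "foul", "swinging_strike", "intentional_ball"
    pitcher_ethnicity: str  # hypothetical category label
    umpire_ethnicity: str   # hypothetical category label
    monitored: bool         # True if thrown in a QuesTec-equipped park

    @property
    def ethnic_match(self) -> bool:
        # True when pitcher and plate umpire share the same ethnic category
        return self.pitcher_ethnicity == self.umpire_ethnicity


def called_pitches(pitches: list[Pitch]) -> list[Pitch]:
    """Keep only pitches whose ball/strike result was up to the umpire."""
    return [p for p in pitches if p.outcome in CALLED_OUTCOMES]


if __name__ == "__main__":
    sample = [
        Pitch("called_strike", "white", "white", False),
        Pitch("foul", "white", "black", True),
        Pitch("ball", "hispanic", "white", True),
        Pitch("intentional_ball", "black", "black", False),
        Pitch("swinging_strike", "asian", "white", False),
    ]
    kept = called_pitches(sample)
    print(len(kept), [p.ethnic_match for p in kept])  # 2 [True, False]
```

Foul balls, swinging strikes, and intentional balls all drop out because their labels aren't in `CALLED_OUTCOMES`; the `ethnic_match` flag is what the later comparisons would group on.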

Signs of bias

In its simplest form, when an umpire was from the same ethnic group as the pitcher, they were more likely to call a pitch a strike, at least at a ballpark that was not equipped with a QuesTec monitor. When the same analysis was performed on QuesTec games, the probability that a pitch would be called a strike when pitcher and umpire ethnicity matched dropped by a full percentage point, "more than offsetting the favoritism shown by umpires when QuesTec does not monitor them." This was specific to pitchers; running the same analysis with the catcher and the batter showed no statistically significant differences.
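The comparison in this paragraph is essentially a difference in differences: the matching advantage in unmonitored parks versus the matching advantage in QuesTec parks. A minimal sketch on raw called-strike rates, assuming each called pitch has been reduced to a hypothetical `(match, monitored, is_strike)` tuple, might look like this. The published estimates come from regressions with many controls, so this unadjusted version only illustrates the logic; the toy data is invented, and its effect sizes are deliberately exaggerated for readability.

```python
from collections import defaultdict


def strike_rates(calls):
    """calls: iterable of (match, monitored, is_strike) tuples.
    Returns {(match, monitored): share of called pitches ruled strikes}."""
    counts = defaultdict(lambda: [0, 0])  # key -> [strikes, total]
    for match, monitored, is_strike in calls:
        counts[(match, monitored)][0] += int(is_strike)
        counts[(match, monitored)][1] += 1
    return {key: strikes / total for key, (strikes, total) in counts.items()}


def matching_effect(rates, monitored):
    """Strike-rate gain from an ethnic match under one monitoring regime."""
    return rates[(True, monitored)] - rates[(False, monitored)]


def monitoring_interaction(rates):
    """Change in the matching gain when QuesTec is watching."""
    return matching_effect(rates, True) - matching_effect(rates, False)


if __name__ == "__main__":
    # Invented toy data: 10 called pitches per cell
    toy = (
        [(True, False, s) for s in [1] * 4 + [0] * 6]     # matched, unmonitored: 40%
        + [(False, False, s) for s in [1] * 3 + [0] * 7]  # unmatched, unmonitored: 30%
        + [(True, True, s) for s in [1] * 3 + [0] * 7]    # matched, monitored: 30%
        + [(False, True, s) for s in [1] * 3 + [0] * 7]   # unmatched, monitored: 30%
    )
    rates = strike_rates(toy)
    print(round(matching_effect(rates, False), 2))  # 0.1
    print(round(monitoring_interaction(rates), 2))  # -0.1
```

A negative interaction means the matching advantage shrinks under monitoring; the paper's finding that the drop "more than offset" the favoritism corresponds to the monitored matching effect itself turning negative.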

It wasn't just the presence of the automated system, though, as the authors found that any situation that would lead to heightened attention on the umpire changed the ball/strike calls. These included having more fans in the stands, and pitches that were more likely to be decisive (the pitcher had thrown three balls or two strikes, meaning the next pitch could end the at-bat). Most of the effects were very small, but the authors note that, in rare edge cases, they could add up. "One can construct specific examples where the estimated direct effect is fairly large," they write. "A black pitcher throwing a nonterminal pitch in the early innings of poorly attended games in a non-QuesTec ballpark gains over 6 percentage points by matching [the umpire's ethnicity] (41.4 versus 35.2 percent called strikes)."
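Because the figures here are rates, it helps to keep percentage points and relative percentages apart. Using only the two numbers in the quote above, a quick calculation shows that the 6-point gap amounts to roughly an 18 percent relative increase in called strikes. The helper name `gap` is just for illustration.

```python
def gap(matched_pct: float, unmatched_pct: float) -> tuple[float, float]:
    """Return (difference in percentage points, relative change in percent)."""
    points = matched_pct - unmatched_pct
    relative = 100 * points / unmatched_pct
    return points, relative


points, relative = gap(41.4, 35.2)
print(f"{points:.1f} percentage points, {relative:.1f}% relative")
# 6.2 percentage points, 17.6% relative
```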

There were some variations in the numbers. For example, minority pitchers tend to have fewer pitches called as strikes even by umpires from the same ethnic group, and this effect is actually enhanced by QuesTec monitoring. And white umpires called balls and strikes for minority pitchers about the same way regardless of whether they were monitored, but their called-strike rate for minority pitchers was about a full percentage point below the rate for white pitchers.

One interesting effect suggests either that this bias goes away with experience or that Major League Baseball was aware of it on some level: the 18 most senior umpires, who act as crew chiefs, showed no indication of bias.

Significant effects

Do these subtle biases add up? The authors make a compelling case that they do. One of the ways they do this is by analyzing the location of the pitches. They divide the area near home plate into three regions: probable strikes, probable balls, and an edge region between the two, where the call is likely to be largely based on the umpire's discretion. Normally, just under 20 percent of pitches are thrown to that edge region. But, when an umpire is not being monitored (and thus more likely to display a small bias), pitchers that are an ethnic match for the ump are five percent more likely to aim for this edge region.
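The three-way split of pitch locations could be implemented along the following lines. The article doesn't say how the authors drew their boundaries, so everything here is an assumption for illustration: the rectangular zone, its dimensions in feet (`HALF_WIDTH`, `BOTTOM`, `TOP`), and the width of the edge band (`EDGE_BAND`) are hypothetical values, and real strike zones vary with each batter's stance.

```python
# Hypothetical zone geometry in feet. x is horizontal distance from the center
# of the plate; z is height above the ground where the pitch crosses the plate.
HALF_WIDTH = 0.83       # assumed: half of the 17-inch plate plus roughly a ball's radius
BOTTOM, TOP = 1.5, 3.5  # assumed fixed vertical bounds
EDGE_BAND = 0.25        # assumed half-width of the discretionary edge region


def zone_region(x: float, z: float) -> str:
    """Classify a pitch location as 'strike', 'ball', or 'edge'."""
    # Positive values mean the pitch is outside the rectangle along that axis.
    outside_x = abs(x) - HALF_WIDTH
    outside_z = max(BOTTOM - z, z - TOP)
    # Negative inside the zone, positive outside; near zero means near a boundary.
    signed_dist = max(outside_x, outside_z)
    if abs(signed_dist) <= EDGE_BAND:
        return "edge"
    return "strike" if signed_dist < 0 else "ball"


def edge_share(locations) -> float:
    """Fraction of (x, z) locations that fall in the edge region."""
    locations = list(locations)
    return sum(zone_region(x, z) == "edge" for x, z in locations) / len(locations)


if __name__ == "__main__":
    pitches = [(0.0, 2.5), (0.8, 2.5), (2.0, 2.5), (0.0, 1.4)]
    print([zone_region(x, z) for x, z in pitches])  # ['strike', 'edge', 'ball', 'edge']
    print(edge_share(pitches))                      # 0.5
```

Computing `edge_share` separately for matched and unmatched pitcher/umpire pairs in unmonitored parks would then give the kind of contrast described above.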

The authors ascribe this to pitchers being aware of the bias and trying to give it more chances to come into play. Of course, this awareness may not be conscious, or it may come from noticing the ball/strike calls early in the game, which leads to a larger adjustment later on.

Perhaps more significantly, the authors also compare game statistics for matching pitcher/ump combinations in unmonitored ballparks. Here, the numbers are very consistent. For both white and minority pitchers, winning percentages went up by about five percent. Everything else went down: the number of hits and runs scored against them, the walks they gave up, and the number of home runs hit (so did strikeouts, although the effect was very small).

Given the venue, an economics journal, the authors put a gloss about contracts and performance incentives over the results, but it's probably the least interesting and least developed part of the paper. They also point out that Major League Baseball, whether inadvertently or not, had been doing things to eliminate the bias. Whether through training or incentives, the most experienced umpires showed no signs of bias. And, at the end of the study period, all parks were equipped with the QuesTec system.

American Economic Review, 2011. DOI: 10.1257/aer.101.4.1410