Editor’s Note: This piece was adapted from a presentation at SaberSeminar 2018.

The Nationals beat the Diamondbacks on May 13, in a game started by Jeremy Hellickson and saved by Sean Doolittle. During that game, Paul Goldschmidt had four plate appearances. He struck out twice, once looking on a strike that was below the zone. Home plate umpire Marty Foster called him out, and Goldschmidt turned around, said something to Foster that was inaudible on the broadcast, and walked away.

This is notable not for what happened, but for what didn’t. A month earlier, Anthony Rendon was ejected. He struck out looking on a strike below the zone and was punched out by (you guessed it) home plate umpire Marty Foster. Rendon tossed his bat, said nothing to Foster, and was ejected from the game for what Foster would later characterize as “throwing equipment.”

Goldschmidt and Rendon are somewhat comparable players in terms of their offensive output; Goldschmidt finished third in wRC+ in the NL in 2018, Rendon fourth. Rendon’s ejection, and Goldschmidt’s subsequent non-ejection, presented me with an opportunity to examine ejection data. I was curious whether umpires eject nonwhite and white players in proportion to their representation in baseball.

To answer this question, I analyzed 860 umpire ejections of players from 2008-2017 and found that, even when controlling for other factors, umpires eject nonwhite players disproportionately compared to those players’ representation in major league baseball. The following analysis examines position players compared with pitchers; player usage and roles; players ejected multiple times; cause of ejections; and individual umpires and umpire ethnicity.

Methods

This analysis uses umpire ejections of players (only players, and not managers or other ejections) from 2008-2017 via data provided by the Umpire Ejection Fantasy League. This data set constitutes 860 ejections of 482 unique players, since some players are ejected more than once.

To begin, I assigned players a nonwhite/white designation based on nation of origin and self-identity. I treated ethnicity as a mutually exclusive category: either a player was categorized as white or he wasn’t.

I used three major criteria for data. First, players from Latin American and Asian countries were classified as nonwhite. (See below for a caveat on this.) Second, I researched players from the United States or Canada using a combination of the Wikipedia listings for players, which generally specify if players are Black, of Asian or Latino descent, or are otherwise nonwhite, and spot-checking on Google. Third, I also looked at player interviews where they discussed their heritage, such as Johnny Damon discussing his mother being from Thailand. This methodology is similar to that used in a 2011 paper that examined racial/ethnic biases in umpires in calling strikes.

When possible, I tried to err on the side of self-identity, though a significant caveat to this is that I do not have access to a data set of players’ self-identities. Self-identity can also change depending on context. Specifically, players who are non-U.S.-born Latinos and who identify as white and Latino in their country of origin may identify (and be identified by umpires) as nonwhite when playing in the U.S. and Canada.

Francisco Cervelli, for instance, is Italian and Venezuelan, and was born in Venezuela. I classified him as “nonwhite” for the purposes of this analysis, though I do not know how he personally identifies. Christian Yelich, who has a grandparent from Japan (which I didn’t know when I did my analysis), was classified as “white.” There are a handful of players for whom I could not find information, and so I defaulted them to “white” so as not to ascribe a classification without evidence of nation of origin or self-identity. My hope in making this data set public is, in part, to fix any mistakes on my part.

Additionally, I focused mainly on comparing “white” and “nonwhite” player ejections, though I did also classify players as “Latino” and “non-Latino” when possible, with all “Latino” players being classified as “nonwhite” as well. Marcus Stroman’s mother is from Puerto Rico, and he was classified as “nonwhite” and “Latino.” The Ross brothers’ mother is of Korean descent, so they were classified as “nonwhite” and “non-Latino” for the purposes of this analysis.

Unlike an informal census of players, and other previous analyses, I did not classify players by skin tone or as being either Black or Latino. I did not treat those categories as mutually exclusive for a variety of reasons, the most important of which is that many players identify as both Black and Latino. I saw no purpose in distinguishing between Yunel Escobar, whom the 2014 census identifies as “Hispanic,” and Yasiel Puig, whom the 2014 census identifies as “Black.” Both players are Cuban, so they were identified as “nonwhite” and “Latino” for the purposes of my analysis.

I understand that any analysis like this is inherently fraught, and that racial and ethnic identity is complex. How people self-identify and how others identify them do not always match. I can’t say if and how umpires identify certain players as nonwhite — or even what they’re thinking about when ejecting players. That said, I’ve tried my best to respect players’ self-identity; any errors in classification are the result of honest mistakes. As such, I welcome corrections.

Results and Analysis

I examined this data year over year, and compared it with yearly league demographics found here. This report card comes with a few caveats, the most important of which is that it does not come with access to the underlying data used in the analysis, and shows only aggregated percentages for player ethnicity.


The years denoted by an asterisk are those in which the ejection of nonwhite players is not proportionate to their league representation, to a statistically significant degree. (I used a Chi Square test, which assumes random sampling and independent events – more on that later.) For 2008-11, 2013, and 2016, there were significantly more ejections of nonwhite players than their league representation would predict (p < .05), with at least a ‘small’ effect size, which was calculated using Cramer’s phi. The years denoted with a Y are not significant (p= .28 for 2012 and .07 for 2015), but have an effect size greater than .10.

In looking at the data overall, the percentage of ejections of nonwhite players exceeds nonwhite players’ league representation, meaning there were more ejections of nonwhite players than league representation would predict if ejections were equally likely to happen to nonwhite and white players. Similarly, ejections of white players fall short of what league representation would predict.

When looked at in totality, ejections of nonwhite players far exceed what league representation predicts (p<.01), with an effect size approaching medium (.277). So umpires disproportionately eject nonwhite players. Perhaps most tellingly, a narrow majority of player ejections in the past 10 years have been of nonwhite players (442) compared with ejections of white players (418).

Position Players vs. Pitchers

Complicating matters, the overall numbers from the league demographic report card fail to reflect a further major division within baseball demographics: pitchers are disproportionately white, compared to overall league demographics, and position players are disproportionately nonwhite.

There is no comprehensive or consistently conducted census of major league player ethnicity that makes available the exact pitcher vs. position player ethnic breakdown. Estimates are available for nonwhite pitcher representation, which range from 23–30 percent. Examination shows that 40.5 percent of pitcher ejections are of nonwhite pitchers, which is, again, significantly (p<.05) higher than what league representation would predict, even assuming the maximum estimate of league representation.

Between 42 and 49 percent of position players are nonwhite. Of ejected position players, 55 percent are nonwhite, which is significantly (p<.05) higher than what league representation would predict, even assuming the maximum estimate of league representation.

Therefore, even assuming that pitchers are maximally nonwhite at 30 percent and position players at 49 percent, umpires disproportionately eject both nonwhite pitchers and position players compared with their representation in the league.
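For readers who want to check the pitcher comparison, here is a minimal sketch of a two-category Chi Square goodness-of-fit test. It is not the original analysis code. It assumes the 40.5 percent figure applies to the 217 pitcher ejections reported later in this piece, which back-computes to roughly 88 ejections of nonwhite pitchers, and it tests that count against the 30 percent maximum estimate. The function name and the count of 88 are illustrative assumptions.

```python
import math


def goodness_of_fit_two_groups(observed_a, observed_b, expected_share_a):
    """Chi Square goodness-of-fit test for two groups (1 degree of freedom)."""
    n = observed_a + observed_b
    expected_a = n * expected_share_a
    expected_b = n - expected_a
    chi2 = ((observed_a - expected_a) ** 2 / expected_a
            + (observed_b - expected_b) ** 2 / expected_b)
    # With 1 degree of freedom the upper-tail probability has a closed form.
    p_value = math.erfc(math.sqrt(chi2 / 2))
    return chi2, p_value


# ASSUMPTION: 88 is back-computed from 40.5 percent of 217 pitcher ejections;
# it is not a count taken directly from the data set.
nonwhite_pitcher_ejections = 88
white_pitcher_ejections = 217 - nonwhite_pitcher_ejections

chi2, p_value = goodness_of_fit_two_groups(
    nonwhite_pitcher_ejections, white_pitcher_ejections, expected_share_a=0.30
)
print(f"chi2 = {chi2:.2f}, p = {p_value:.4f}")
```

Under those assumptions the p-value falls well below .05, in line with the result described above. Swapping in a lower representation estimate (such as 0.23) only widens the gap.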

What about opportunities to be ejected?

I considered the possibility that some players are ejected more often simply because they have more opportunity to be ejected, with opportunity for ejection being measured in plate appearances. As such, I examined position player plate appearances as “opportunities” for ejection.

For position players, I compared each player’s plate appearances per season, since a plate appearance could be considered a “chance” to be ejected. I used per season plate appearances rather than plate appearances to that point in the season as a proxy for a player’s role — if he’s an everyday player, a bench player, etc.

I considered position player usage data in two ways — one as numerical data, one as categorical. For the former, I compared the average PAs/season between white and nonwhite players using a set of t-Tests, and found no significant difference between them, with the exception of 2016. (t-Tests assume normal distribution of data and relatively low sample size. I used a flavor of t-Test that assumes unequal variance and sample size between samples.)
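The unequal-variance t-Test described here is commonly known as Welch’s t-test. The sketch below shows how the test statistic and its approximate degrees of freedom are computed. The plate appearance totals are hypothetical placeholders made up for illustration, not values from the ejection data set, and the function name is my own.

```python
import math
from statistics import mean, variance


def welch_t(sample_a, sample_b):
    """Return Welch's t statistic and Welch-Satterthwaite degrees of freedom."""
    n_a, n_b = len(sample_a), len(sample_b)
    # Squared standard error contributed by each sample (sample variance / n).
    se2_a = variance(sample_a) / n_a
    se2_b = variance(sample_b) / n_b
    t = (mean(sample_a) - mean(sample_b)) / math.sqrt(se2_a + se2_b)
    df = (se2_a + se2_b) ** 2 / (se2_a ** 2 / (n_a - 1) + se2_b ** 2 / (n_b - 1))
    return t, df


# HYPOTHETICAL season PA totals for ejected players, for illustration only.
nonwhite_pa = [612, 455, 380, 701, 298, 540]
white_pa = [590, 410, 350, 664, 120]

t_stat, df = welch_t(nonwhite_pa, white_pa)
print(f"t = {t_stat:.3f}, df = {df:.1f}")
```

The statistic is then compared against a t distribution with the computed degrees of freedom to get a p-value; a statistics package can handle that last step.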

I then considered position players as belonging to one of two categories: players with fewer than 400 plate appearances per season were deemed “bench” players and those with more PAs were “everyday players.” This was done to avoid the possibility of non-normal data and outliers, such as Rickie Weeks, who had 754 PAs in 2010 or Michael Saunders, who had 36 PAs in 2015. I compared these using a Chi Square test and again found no significant difference between the number of nonwhite and white bench players, and nonwhite and white everyday players, with the exception of 2016, when slightly fewer nonwhite bench players than would be expected were ejected, and slightly more white bench players. (A Chi Square test was used, even for these smaller samples, because all predicted values in each square of the 2×2 tables for these tests exceeded 10, which is the threshold between a Chi Square and Fisher Exact test.)

Actual Bench vs. Everyday Player Ejections in 2008

            Nonwhite    White
Bench       14          12
Everyday    33          14

Predicted Bench vs. Everyday Player Ejections in 2008

            Nonwhite    White
Bench       16.74       9.26
Everyday    30.26       16.74
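The predicted counts for 2008 can be reproduced from the actual counts alone: each predicted cell is its row total times its column total, divided by the grand total. A short sketch follows, with a function name chosen for illustration.

```python
def expected_counts(table):
    """Expected cell counts for a contingency table under independence."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    grand_total = sum(row_totals)
    return [[r * c / grand_total for c in col_totals] for r in row_totals]


# Rows: bench, everyday. Columns: nonwhite, white (actual 2008 ejections).
actual_2008 = [[14, 12], [33, 14]]

predicted_2008 = expected_counts(actual_2008)
rounded = [[round(value, 2) for value in row] for row in predicted_2008]
print(rounded)  # [[16.74, 9.26], [30.26, 16.74]]
```

The Chi Square statistic then sums the squared gap between each actual and predicted cell, divided by the predicted value.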

In aggregate, and for nine years out of the 10 I analyzed, nonwhite and white position players had relatively equal opportunities to be ejected, but more ejections of nonwhite players occurred than league representation would predict.

In assessing pitchers, I didn’t consider pitcher plate appearances, and have not yet analyzed innings pitched per season. There is clearly a spectrum of “opportunity” ranging from an NL starter to an AL reliever, and therefore a spectrum of ejection opportunities. Of the 217 pitcher ejections examined, 113 were of AL pitchers and 104 were of NL pitchers, with four AL teams (the Orioles, Cleveland, the Yankees, and the Blue Jays) having 10 or more pitcher ejections. Only the Dodgers have 10 or more for the NL during the period analyzed. Pitcher ejections aren’t necessarily affected by pitcher plate appearances, given that the AL had more of them, though about 10 percent of pitcher ejections occurred at the plate, with the remaining 90 percent on the mound. Future work will examine pitcher ejection opportunities as a combination of PAs and IP, and various pitching roles (starter vs. reliever, etc.).

But can we consider these independent events?

The statistical tests I’m using (Chi square with Cramer’s phi to calculate effect size) assume that each ejection isn’t caused by past ejections and isn’t causing future ejections. This isn’t necessarily the case in reality. Some players are repeatedly ejected, and there are players whose past ejections seem to cause additional ejections.

Of the 860 ejections I analyzed, 482 different players were ejected, meaning there were a lot of players ejected multiple times over that span. However, the majority of ejected players were ejected only one or two times, and the numbers drop fairly rapidly after that. So, this data set isn’t considering a handful of players ejected over and over and over again, but a variety of players, most of whom will be ejected less than once every few seasons.

However, there are players who are ejected repeatedly, and whose reputation for being ejected may influence an umpire’s willingness to toss them. Only four players have been ejected more than 10 times in the study period: Matt Kemp, Yunel Escobar, Bryce Harper, and Ian Kinsler.

I had been considering ejections as independent events; that is, getting tossed once does not necessarily raise a player’s chance of being tossed again. However, for those who are repeatedly ejected, it is possible that a history of ejections leads to more ejections. Having watched Harper get ejected, I’ve observed that he sometimes argues ball and strike calls, and thus is tossed for behaviors that another player with less of a reputation for arguing would likely not be tossed for. (Rendon, who tends to be fairly literally closed-mouthed, except for when he’s smiling, has been ejected twice, including April’s ejection; Goldschmidt has never been ejected.)

That said, Harper, Kemp, Kinsler, and Escobar’s total ejections constitute 48 of the 860 ejections I examined, or about six percent of total ejections. So I considered my data excluding them. The overall conclusion of disproportionately ejecting nonwhite players holds, with one year going from a significant difference between ejections and league demographic representation to a marginally significant one. (2011 goes from p<.05 to p≈.07.) The overall effect size drops from .277 to .27. However, when I examined position-specific ejections, their ejections did tend to show that certain positions were overrepresented in ejections as an effect of Kemp and Harper both playing center field.

Causes of ejection

I also considered causes of ejection in my analysis. What if, for instance, players were getting tossed because they were fighting, “throwing at,” or otherwise participating in behaviors that generally have less of a gray area than arguing balls and strikes? The UEFL data include causes of ejections — and the majority of ejections (525 out of 860) result from arguing ball and strike calls. Of these, 271 were of nonwhite players, with 254 of white players. So not only is it a disproportionate number, but the majority of ball and strike call-related ejections were of nonwhite players.

“Throwing at” constitutes the next most frequent cause of ejection, with 109 ejections; 108 of these were of pitchers, and the remaining one was of a catcher. Of these, 50 were of nonwhite players and 59 were of white players. Since pitchers are disproportionately white, 45 percent of “throwing at” ejections being of nonwhite pitchers represents an extreme disproportionality. Pitcher representation would predict that of 108 pitcher ejections for “throwing at,” 33 or so should be of nonwhite pitchers, assuming that 30 percent of pitchers are nonwhite. So 50 of these ejections exceeds that significantly (p<.05).

I did look at games in which ejections for “throwing at” and fighting or unsportsmanlike conduct occurred and found that they co-occurred only 14 times in the study period. That said, I have not yet looked at other in-game circumstances, such as hit by pitch, throwing behind, or bench-warnings, that may have led to these ejections. This remains an obvious area for future research. Umpires’ assessment of the intent of “throwing at” cannot be separated from these factors, and so this analysis is preliminary.

But what if they deserve it?

I chose not to look at video of ejections or to rate the degree of offense that led to an ejection. I made this decision for a number of reasons. First, there isn’t audio of either the umpire or the player at home plate that would allow me to hear what each said to the other. Earlier this year, Cubs catcher Willson Contreras reported that home plate umpire Greg Gibson said something to him that Contreras declined to repeat, and demanded Contreras thank him for a timeout, which didn’t lead to an ejection when he reacted, only because a bench coach intervened. So, if a player reacts like he was told something unrepeatable — or if an umpire tosses a seemingly docile player suddenly for saying something unrepeatable — video analysis will not fully reflect what transpired.

Perhaps more importantly, I didn’t feel comfortable or qualified to assess player or umpire behavior, and ascribe a value judgment to it. What constitutes aggression — whether tossing your bat down after being called out on strikes connotes disappointment or disrespect — is inherently subjective. I didn’t classify any ejections as warranted or unwarranted — the UEFL does for some data, but a lot is classified as “unknown,” which, especially in the absence of audio, it is.

Future work on ball and strike calls leading to ejections will examine whether the umpire or player was correct about whether a pitch was a strike, not only for the at-bat that led to a player’s ejection but also for previous at-bats that game, and for the game as a whole. Umpires are within their rights to eject players who argue balls and strikes. But when we consider how many ejections are a result of these disagreements, it is important to correctly assess umpires’ calls and improve the quality of those calls. It won’t eliminate player-umpire conflict entirely; some bad calls will remain, and some players will argue correct calls. But a better zone may help to reduce the conditions that lead to ejections.

Other avenues for future investigation could include whether the player was facing the umpire at the time of or immediately preceding his ejection, as it might indicate escalation or de-escalation on the player’s or umpire’s part, and whether the player’s manager or a member of the coaching staff entered the field of play in an attempt to intervene in the player-umpire interaction, as it could indicate an escalation in that interaction. However, there are limitations to this sort of analysis, particularly the judgment on the part of the researcher. It would therefore be advisable, if the goal is to determine the appropriateness of ejections so that umpires and league officials have actionable examples to use for further training, to assemble a diverse research team that could crosscheck one another’s assessments.

Umpires and Umpire Ethnicity

Another question that arose from my preliminary data analysis was whether this was a case of a few umpires being more likely to eject players, or whether there was a pattern based on individual umpires or umpire ethnicity. The 860 ejections examined came from 106 ejecting umpires. Umpiring crews work in (generally) three-game sets and are randomly assigned to various stadiums and series.

Examining the data, some umpires do eject more players than others, but it’s difficult to control for the number of games worked, and games in which there are brawls and mass ejections. Of the 106 umpires in the data set, only 33 have more than 10 ejections in the study period, with Bob Davidson and Marty Foster tied with an umpire-leading 22 ejections.

For the umpires with 10 or more ejections, there’s a range in the percentage of ejections that were of nonwhite players, from 17 percent (Hunter Wendelstedt) to 73 percent (Jeff Nelson). Of Davidson’s 22 ejections, 55 percent were of nonwhite players; of Foster’s, 59 percent. That said, the caveats above make it hard to discern any real patterns in this data, other than offering an opportunity to look at an individual umpire’s history when they do eject a player.

I also controlled for umpire ethnicity. Between 86 and 93 percent of umpires are white, and most umpires eject multiple players per season. Umpire ethnicity does not appear to be related to the ethnicity of players they eject — that is, nonwhite umpires do not eject, proportionately, more or fewer white or nonwhite players than their white peers (p=.23, so there was no significant difference between predicted and actual outcomes).

Actual Ejections of White & Nonwhite Players by Umpire Ethnicity

Umpire White or Nonwhite    Nonwhite Player Ejections    White Player Ejections
Nonwhite                    38                           46
White                       404                          372

Predicted Ejections of White & Nonwhite Players by Umpire Ethnicity

Umpire White or Nonwhite    Nonwhite Player Ejections    White Player Ejections
Nonwhite                    43.17                        40.83
White                       398.83                       377.17
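As a check on the umpire ethnicity comparison, the sketch below runs a Chi Square test of independence on the actual counts and reports the effect size as phi, the square root of the statistic divided by the total. It assumes no continuity correction is applied, since that is the version that matches the p-value reported above. The function name is illustrative and this is not the original analysis code.

```python
import math


def chi_square_2x2(table):
    """Chi Square test of independence for a 2x2 table, without continuity correction."""
    (a, b), (c, d) = table
    n = a + b + c + d
    row_totals = (a + b, c + d)
    col_totals = (a + c, b + d)
    chi2 = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            expected = row_totals[i] * col_totals[j] / n
            chi2 += (observed - expected) ** 2 / expected
    # A 2x2 table has 1 degree of freedom, so the tail probability has a closed form.
    p_value = math.erfc(math.sqrt(chi2 / 2))
    phi = math.sqrt(chi2 / n)  # effect size
    return chi2, p_value, phi


# Rows: nonwhite umpires, white umpires. Columns: nonwhite players, white players.
actual = [[38, 46], [404, 372]]

chi2, p_value, phi = chi_square_2x2(actual)
print(f"chi2 = {chi2:.2f}, p = {p_value:.2f}, phi = {phi:.3f}")
```

The p-value comes out at about .23, and phi is far below the .10 level used earlier in this piece to flag effect sizes.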

So about 90 percent of ejections were made by white umpires, proportionate to white umpires’ representation in the umpiring corps.

The specific circumstances of an ejection are, ultimately, determined by the umpires and their biases — implicit or explicit. Consideration of bias isn’t restricted to player ethnicity. Some umpires are known to be more biased toward pitchers; that is, they are likely to call a less hitter-friendly strike zone. Future analysis hopes to incorporate ratings of umpire accuracy in calling balls and strikes, and use that to consider the number of ball and strike-related ejections.

That said, there are not many nonwhite umpires, which is an indication that there are problems with the umpire pipeline itself, one that others have written about extensively.

In all of this, what we can’t know is if white players are not penalized for doing the same things nonwhite players do that lead to the latter’s ejections. Ejections are, inherently, outcomes; non-ejections aren’t. A player could disagree with an ump, as Goldschmidt did, and not be ejected, and without observers making particular note of that interaction, there’s no way to tell how frequently those types of disagreements occur.

Conclusions and Future Considerations

One question underlying this analysis is whether nonwhite players actually perform more “ejectable” actions than white players; who gets to say what constitutes “ejectable” or “non-ejectable” behavior is never without bias.

Umpires do have biases: toward pitchers, toward specific parts of the strike zone, likely toward (or against) specific players. And even the absence of bias from specific white or nonwhite umpires does not mean that there is an absence of bias with umpires overall. The low percentage of nonwhite umpires is, itself, a problem for baseball, particularly when there are often complex differences in cultural communication between umpires and the increasingly nonwhite population of players.

There is no recourse for players or teams on ball and strike calls, the calls that most frequently result in player ejections. As such, I plan to take a look at the fidelity of umpires to the actual strike zone for at-bats that result in ejections. In the cases of Rendon and Goldschmidt, it was clear Foster’s call was incorrect, but how often incorrect calls lead to ejections, and in which circumstances, is worthy of additional exploration. Additionally, I have started exploring whether ejections are more likely to result from specific team matchups through building network maps such as the one seen below. (Each line represents ejections between opponents, with the thickness of the line denoting the number of ejections.) I’m also examining whether crowd size has an effect on ejections.

Baseball is currently wrestling with its own moment in history. How a game steeped in traditions largely dictated by white players and officials responds to its increasingly nonwhite player population will likely determine its future relevance, especially with what MLB hopes will be an increasingly diverse audience. The game has changed and will continue to do so.

MLB should therefore investigate the biases of its umpires and work to address inequitable outcomes for nonwhite players (in this case, being ejected at higher rates than their white peers) in a systematic way, whether that’s through a revised appeals process, a more robust and transparent umpire evaluation system, or another means of ensuring that a strike is, in fact, a strike. As long as the rules of the game require umpires’ judgement, for both interpretation of player action and enforcement of the rules of play, we should interrogate the role of biases that underpin those judgements to ensure a fair game for everyone. Taking a systematic approach benefits all players regardless of ethnicity, as well as fans who just want to see their favorite player stay in the game.

Many thanks to Brian Mills and Laura Shir for their edits and encouragement, as well as to the FanGraphs community for its thoughtful comments on my preliminary analysis.