Well, fellow Sounders, that was an ugly one. This weekend we saw our boys fall to New York City in what was perhaps the sloppiest game I can remember watching on TV. They seriously might as well have been playing on the “all-weather fields” at Marymoor Park in Redmond. (If you don’t know what I’m talking about, consider yourself lucky to have never played rec league soccer in the Seattle suburbs in the ’90s.) Besides puddles, this game featured mediocre performances from some. But it also featured something worse than either of those things: a highly questionable decision from the referee, who called a PK on what looked like a clean tackle by our defender.

If you’re like me, you immediately pulled up the match details to remind yourself who this referee was, expecting to see a name like Salazar or Geiger (it was Sibiga). In fact, whenever a game-changing bad call hits us, it piques my suspicion of that referee’s intentions and qualifications. Lately, I’ve found myself wondering if some referees just enjoy making tons of game-changing calls, and whether any of them are as biased against certain teams as they seem in my subjective mind. It’s about time somebody got the data to test these hypotheses!

Welcome back to Sounder Data, the semi-regular series where I use data, statistics and visualizations to explore burning questions about the Sounders and MLS. This week I tackle a long-heated issue: referee biases and their tendencies to make game-changing calls.

Fortunately, I already had data on fouls, cards and PKs from this previous analysis I did on the dirtiest teams in MLS. The dataset includes boxscores for all 2,962 MLS games dating back to the start of the 2013 season (when they apparently started keeping track). All I had to do next was the simple task of digging up the referees for each of those games. Long story short, MLS keeps track of that too and I’m trained in the dark arts of web-scraping. With this, I can surely find a way to assess which refs empirically have a grudge against the Sounders and which ones have it out for other teams.

One thing I could do is just count the number of “game-changing calls” per game that the Sounders have endured from each ref, like the table below. Here, I’ll define “game-changing calls” as red cards, second yellows or PKs, and I’ll display calls per 10 games to make it more intelligible.

Game-changers

| Referee | Game-Changing Calls | Games | Calls per 10 Games |
|---|---|---|---|
| Nima Saghafi | 2 | 1 | 20 |
| Fotis Bazakos | 1 | 1 | 10 |
| Geoff Gamble | 1 | 1 | 10 |
| Jorge Gonzalez | 2 | 3 | 6.7 |
| Ricardo Salazar | 5 | 9 | 5.6 |
| David Gantar | 1 | 2 | 5 |
| Sorin Stoica | 1 | 2 | 5 |
| Allen Chapman | 3 | 7 | 4.3 |
| Chris Penso | 2 | 6 | 3.3 |
| Ted Unkel | 1 | 3 | 3.3 |
| Silviu Petrescu | 3 | 10 | 3 |
| Baldomero Toledo | 4 | 18 | 2.2 |
| Drew Fischer | 1 | 5 | 2 |
| Alan Kelly | 2 | 11 | 1.8 |
| Hilario Grajeda | 2 | 11 | 1.8 |
| Ismail Elfath | 2 | 11 | 1.8 |
| Armando Villarreal | 1 | 6 | 1.7 |
| Juan Guzman | 1 | 6 | 1.7 |
| Mark Geiger | 2 | 12 | 1.7 |
| Jair Marrufo | 1 | 14 | 0.7 |
| Edvin Jurisevic | 0 | 5 | 0 |
| Ioannis Stravrides | 0 | 1 | 0 |
| Jose Carlos Rivero | 0 | 3 | 0 |
| Kevin Stott | 0 | 11 | 0 |
| Robert Sibiga | 0 | 2 | 0 |
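For anyone who wants to reproduce the last column, here is a minimal Python sketch of the "calls per 10 games" arithmetic. The function name and the dictionary are illustrative assumptions; the (calls, games) pairs are copied from a few rows of the table above.

```python
def calls_per_10_games(calls: int, games: int) -> float:
    """Scale a count of game-changing calls (reds, second yellows, PKs) to a 10-game span."""
    if games <= 0:
        raise ValueError("games must be positive")
    return round(10 * calls / games, 1)

# (game-changing calls, games) against the Sounders, taken from the table above
sounders_rows = {
    "Ricardo Salazar": (5, 9),
    "Allen Chapman": (3, 7),
    "Baldomero Toledo": (4, 18),
    "Jair Marrufo": (1, 14),
}

rates = {ref: calls_per_10_games(c, g) for ref, (c, g) in sounders_rows.items()}

# Print the referees from highest to lowest rate
for ref, rate in sorted(rates.items(), key=lambda kv: -kv[1]):
    print(f"{ref:18s} {rate:4.1f}")
```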

But there are a number of problems with this. First is that some refs have only blessed us with their presence once or twice. For that reason, looking at the rate per game is clearly better than looking at the total number. But there are still some sample size issues to worry about and we should probably just ignore the first three rows of the table, among others. More important than sample size, though, is a problem called confounding. Some refs just tend to give out more cards and call more PKs than others. For example, just because Salazar is high on our list doesn’t mean he hates us. Maybe he does this to everyone! Furthermore, if I’m talking about expanding this analysis to every team in the league, there’s the problem that some teams foul more than others, as I showed in the aforementioned article. To put some real numbers to this, here’s the average number of game-changing calls each referee makes in a 10-game span:

I’ve also estimated statistical confidence intervals for each referee to show uncertainty based on variance and sample size. What’s clear is that a few refs (Ted Unkel and Sorin Stoica) are really eager to imprint themselves on games. Unkel makes an average of 4.7 game-changing calls in a 10-game span! That’s huge considering he’s refereed games for clean and dirty teams alike. This, I think, is notable in its own right, but not the entire purpose of this analysis…
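The article does not say how those intervals were computed, so the following is only one plausible approach, sketched under the assumption that a referee's count of game-changing calls behaves like a Poisson count. It builds an exact (Garwood-style) 95% interval by bisection using only the standard library. Because the league-wide counts are not listed here, the example uses Salazar's Sounders-only numbers (5 calls in 9 games); all function names are illustrative.

```python
import math

def poisson_cdf(k: int, mu: float) -> float:
    """P(X <= k) for X ~ Poisson(mu)."""
    if k < 0:
        return 0.0
    term = math.exp(-mu)
    total = term
    for i in range(1, k + 1):
        term *= mu / i
        total += term
    return min(total, 1.0)

def solve_mu(k: int, target: float) -> float:
    """Find mu with poisson_cdf(k, mu) == target by bisection (the CDF falls as mu grows)."""
    lo, hi = 0.0, k + 10.0 * math.sqrt(k + 1) + 10.0
    for _ in range(200):
        mid = (lo + hi) / 2
        if poisson_cdf(k, mid) > target:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2

def rate_ci_per_10(calls: int, games: int, level: float = 0.95):
    """Exact Poisson interval for the call rate, expressed per 10 games."""
    alpha = 1.0 - level
    # Lower bound: the mean at which seeing `calls` or more has probability alpha/2
    lower = 0.0 if calls == 0 else solve_mu(calls - 1, 1.0 - alpha / 2)
    # Upper bound: the mean at which seeing `calls` or fewer has probability alpha/2
    upper = solve_mu(calls, alpha / 2)
    scale = 10.0 / games
    return lower * scale, upper * scale

low, high = rate_ci_per_10(5, 9)
print(f"Salazar vs Seattle: 5.6 per 10 games, 95% interval {low:.1f} to {high:.1f}")
```

With only nine games the interval is very wide, which is the sample size problem in a nutshell.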

So if we want to know whether a particular referee is biased, we need to control for how card-happy he is. We also, as I alluded, need to do this on a team-by-team basis, since some teams love to commit fouls. To satisfy both analytical needs, I set up a regression analysis that accomplishes just that. I used negative binomial regression to estimate the number of game-changing calls that, empirically, one would expect to see for a given team and a given referee.

For example, if Atlanta (the team that now has the most game-changing calls per game against them) has a game refereed by Unkel, you would expect a huge number of calls just based on the history of that team and the history of that ref. DC United (infrequent reds, yellows and PKs conceded) refereed by Jair Marrufo (effectively bottom of the ref list) would be expected to have very few.
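To make the "expected calls for a team and a referee" idea concrete, here is a simplified stand-in, not the negative binomial regression used for this article. It assumes expected calls equal games times a team effect times a referee effect, which is a Poisson-style main-effects model with no overdispersion term, and fits the effects by alternately rescaling them until fitted totals match observed totals. The records are hypothetical toy numbers, and every name in the code is an assumption made for illustration.

```python
from collections import defaultdict

# Hypothetical toy records, NOT real MLS data:
# (team, referee, games together, game-changing calls against the team)
records = [
    ("Team A", "Ref X", 12, 7),
    ("Team A", "Ref Y", 8, 2),
    ("Team B", "Ref X", 5, 1),
    ("Team B", "Ref Y", 15, 3),
]

def fit_effects(records, iterations=200):
    """Fit calls ~ games * team_effect * ref_effect by alternating rescaling."""
    team_calls, ref_calls = defaultdict(float), defaultdict(float)
    for team, ref, _games, calls in records:
        team_calls[team] += calls
        ref_calls[ref] += calls
    team_eff = {team: 1.0 for team in team_calls}
    ref_eff = {ref: 1.0 for ref in ref_calls}
    for _ in range(iterations):
        # Rescale each team so its fitted total matches its observed total.
        exposure = defaultdict(float)
        for team, ref, games, _calls in records:
            exposure[team] += games * ref_eff[ref]
        for team in team_eff:
            team_eff[team] = team_calls[team] / exposure[team]
        # Then do the same for each referee.
        exposure = defaultdict(float)
        for team, ref, games, _calls in records:
            exposure[ref] += games * team_eff[team]
        for ref in ref_eff:
            ref_eff[ref] = ref_calls[ref] / exposure[ref]
    return team_eff, ref_eff

team_eff, ref_eff = fit_effects(records)

def expected_per_10(team, ref):
    """Expected game-changing calls per 10 games for one team/referee pairing."""
    return 10.0 * team_eff[team] * ref_eff[ref]

for team, ref, games, calls in records:
    print(f"{team} / {ref}: expected {expected_per_10(team, ref):.2f}, "
          f"observed {10 * calls / games:.2f} per 10 games")
```

A negative binomial model keeps the same "team history plus ref history" structure but allows extra game-to-game variability, which mainly affects how wide the resulting confidence intervals are.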

Cutting to the chase, I did this for all teams and all refs. I used the regression to compute the expected game-changing calls for each team/referee pairing, and then compared it to the observed game-changing calls for the same pairing. Here’s what it looks like for Seattle:

The x-axis here is the expected number of game-changing calls each ref would be predicted to call against us (per 10 games), based on both their history as a referee and our history as a team (since 2013). The y-axis is the number of game-changing calls each ref has actually made against us in all of our recorded meetings. Sample sizes are in parentheses. What’s important is that the diagonal line shows “equivalence”, or the cases where a referee did exactly what we would expect empirically. For example, Chris Penso would be expected to make about 3.28 game-changing calls against us in 10 games, and he has actually made exactly 3.3, which is really close. To put it simply, this means he doesn’t hate us.

But there are some points that are above the line. I’ve left out any ref with <4 Sounders games under his belt, but there’s still Salazar, Chapman and Petrescu up there. Ricardo Salazar is, of course, the highest. Even though Salazar gives out lots of game-changing calls (and hence has a high expected value on the x-axis), he tends to give the Sounders an excess of game-changers by the amount of nearly 2 per 10 games. At the other end, even though Petrescu doesn’t give out lots of game-changing calls (he’s low on the x-axis), he has had the tendency to dish out an extra one to us every 10 games. Also of note, many refs are below the line, which means they actually seem to go easy on us.

Last weekend’s nemesis, Robert Sibiga, doesn’t appear on the graph because that was only his third game. It’s worth noting though that the PK he awarded to Villa was the first game-changing call he’s made against us, putting him at 3.3 game-changers per 10 games, a bit higher than his expected value of 2.5.
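The "above or below the line" reading boils down to observed minus expected. A tiny sketch, using only the two pairings whose numbers are quoted in the text (the variable and function names are illustrative):

```python
# (expected, observed) game-changing calls per 10 games against Seattle, as quoted in the text
pairings = {
    "Chris Penso": (3.28, 3.3),
    "Robert Sibiga": (2.5, 3.3),
}

def excess_per_10(expected: float, observed: float) -> float:
    """Positive means above the equivalence line, negative means below it."""
    return observed - expected

excess = {ref: excess_per_10(e, o) for ref, (e, o) in pairings.items()}

for ref, gap in excess.items():
    side = "above" if gap > 0 else "below" if gap < 0 else "on"
    print(f"{ref}: {gap:+.2f} per 10 games ({side} the line)")
```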

Salazar, Sibiga and the lot might be above the line because they’re biased, but it also might be chance. We all know that some games just kind of randomly get out of hand and it’s not the ref’s fault. Fortunately, a statistical confidence interval is literally a representation of what might happen due to random chance, so I can just superimpose the regression’s confidence intervals around the expected game-changing calls, like this:

And what do we get? Not much actually. I didn’t superimpose everyone’s error bars just to avoid clutter, but you can see that even Salazar’s confidence interval includes some of the equivalence line. This means that his excess of game-changing calls against us could easily be bad luck rather than bias. Interestingly, the error bar for Alan Kelly (below the line) doesn’t include any of the equivalence line at all. His shortfall is statistically significant at the 95% level, so there’s a decent case that he’s biased in our favor!
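A rough way to check "bias or chance" by hand is sketched below. It is a simplified stand-in for the regression's confidence intervals and assumes calls are Poisson-distributed around the expected value. The inputs are Sibiga's numbers from the text: an expected 2.5 calls per 10 games, and 1 call observed in 3 games. The function name is illustrative.

```python
import math

def prob_at_least(observed: int, mean: float) -> float:
    """P(X >= observed) for X ~ Poisson(mean)."""
    term = math.exp(-mean)
    below = 0.0
    for k in range(observed):
        below += term          # add P(X = k)
        term *= mean / (k + 1)  # step to P(X = k + 1)
    return 1.0 - below

expected_per_10 = 2.5  # Sibiga's expected rate against Seattle, from the text
games = 3
observed_calls = 1

expected_calls = expected_per_10 * games / 10  # expected calls over these 3 games
p_value = prob_at_least(observed_calls, expected_calls)
print(f"Chance of {observed_calls}+ calls in {games} games with no bias: {p_value:.2f}")
```

Under these assumptions, a ref with no grudge at all would still make at least one game-changing call against us in roughly half of all three-game stretches, so one PK is nowhere near evidence of bias.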

But what about other teams? Here’s a graph of the whole league:

Each blue dot is a referee-team pairing. I highlighted the Sounders in green, just for reference to the graphs above. I also labeled the two craziest ones: Sorin Stoica vs Columbus, and Ted Unkel vs RSL. Somehow, Unkel has managed to exceed his expected value of 4.8 by an enormous margin. It’s not just a sample size thing either; that confidence interval doesn’t even come close to the equivalence line. The same goes for Stoica too, when he’s reffing a Columbus game. I’m comfortable saying that this is biased refereeing. We should all feel bad for our brethren in Ohio and Utah when they have to weather the storm of these two guys.

They aren’t the only two ref-team pairings that are statistically significant though. Ted Unkel goes off the charts when he’s up against Columbus too (poor Columbus), just a little less so than Stoica. Just to be thorough, I tallied up the number of times a referee is statistically significantly above the line, shown in the table below.

Lo and behold, our boy Sibiga tops the list! There are six teams in this league to whom Sibiga doles out significantly more game-changing calls than one would expect based on the other refs’ history with those teams, and his history with other teams. Those poor souls are DC United, LA Galaxy, NYCFC, Philadelphia Union, Real Salt Lake and Vancouver Whitecaps.

This guy seems to have a statistical tendency to develop grudges against teams. Let’s hope we didn’t just start down that path.

Bias

| Referee | Number of Teams He Appears to be Biased Against |
|---|---|
| Robert Sibiga | 6 |
| Hilario Grajeda | 5 |
| Allen Chapman | 4 |
| Jair Marrufo | 4 |
| Jorge Gonzalez | 4 |
| Kevin Stott | 4 |
| Armando Villarreal | 3 |
| David Gantar | 3 |
| Geoff Gamble | 3 |
| Jose Carlos Rivero | 3 |
| Matthew Foerster | 3 |
| Ted Unkel | 3 |

Limitations

I always try to point out the limitations of my own work. Here are the big ones: