With advanced hockey statistics, as with almost anything, this rule generally applies: the more an explanation is needed for something, the fewer accessible explanations there are out there.

In keeping with this rule, there may be more blog rundowns of what Corsi and Fenwick are than there are hockey blogs. There are far fewer explanations available for the slightly more complicated relative statistics, and fewer still for teammate relative numbers. For the intricate methodology behind those H.E.R.O Charts you see all the time on hockey blogs, the only explanation is their creator's FAQ page (and, try as it might, that explanation isn't the easiest for beginners to understand).

Let’s start with relative stats and slowly work our way up.

To get a player’s relative performance in a statistic (let’s use Corsi as an example), you take that player’s on-ice Corsi For% and subtract his team’s Corsi For% from when he’s not on the ice. For instance, Sidney Crosby had a 5v5 Corsi For% of 53.8%, and the Penguins had a 5v5 Corsi For% of 48.5% when Crosby wasn’t on the ice. Subtracting 48.5 from 53.8, we find that Crosby had a relative 5v5 Corsi For% of +5.3.
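If you’d rather let a computer do the subtraction, here’s a minimal Python sketch. The function names are my own, and the shot-attempt counts in the first example are made up purely to show how Corsi For% itself is calculated (attempts for divided by all attempts); only the 53.8 and 48.5 come from the numbers above.

```python
def corsi_for_pct(cf: int, ca: int) -> float:
    """Corsi For%: shot attempts for as a share of all shot attempts, in percent."""
    return 100.0 * cf / (cf + ca)


def relative_corsi(on_ice_cf_pct: float, off_ice_cf_pct: float) -> float:
    """Relative CF%: the player's on-ice CF% minus his team's CF% while he's off the ice."""
    return on_ice_cf_pct - off_ice_cf_pct


# Hypothetical counts: 538 attempts for, 462 against.
print(corsi_for_pct(538, 462))  # 53.8

# Crosby's 5v5 numbers quoted above.
crosby_rel = relative_corsi(53.8, 48.5)
print(f"{crosby_rel:+.1f}")  # +5.3
```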

Of course, there are problems with relative statistics. The most prominent is that a player’s statistics are influenced more heavily by his particular linemates than by his team as a whole; a player on a great Corsi team but with bad linemates will be unduly punished, and a player on a poor Corsi team but with great linemates will be undeservedly rewarded.

This is where teammate (or linemate) relative statistics come in; the two terms mean the same thing. A teammate relative statistic works like a relative statistic, except that instead of subtracting the Penguins’ Corsi For% without Crosby from his numbers, you subtract the combined Corsi For% of all the players Crosby played with, counting only their ice time away from Crosby. That combined figure is weighted by how much time Crosby spent with each teammate. You can find these numbers at stats.hockeyanalysis.com by clicking ‘report’ and choosing ‘On-Ice Corsi stats’.
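Here’s a rough sketch of that weighted subtraction in Python. It assumes you already have, for each teammate, the minutes he shared with the player and his Corsi For% away from the player. The linemates and all of their numbers are hypothetical, and hockeyanalysis.com’s exact formula may differ in the details.

```python
from dataclasses import dataclass


@dataclass
class Teammate:
    name: str
    toi_with_player: float        # 5v5 minutes shared with the player
    cf_pct_without_player: float  # teammate's CF% when NOT on the ice with the player


def teammate_relative_cf(player_cf_pct, teammates):
    """Player's CF% minus his teammates' away-from-him CF%, weighted by shared ice time."""
    total_toi = sum(t.toi_with_player for t in teammates)
    weighted_cf = sum(t.toi_with_player * t.cf_pct_without_player
                      for t in teammates) / total_toi
    return player_cf_pct - weighted_cf


# Hypothetical linemates and numbers, for illustration only.
linemates = [
    Teammate("Linemate A", 600.0, 50.0),
    Teammate("Linemate B", 300.0, 47.0),
    Teammate("Linemate C", 100.0, 44.0),
]
print(teammate_relative_cf(52.0, linemates))  # 3.5
```

The time-weighted teammate average here works out to 48.5%, so a player sitting at 52.0% comes out at +3.5. Linemate A counts six times as much as Linemate C simply because he shared six times as many minutes with the player.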

As you can imagine, teammate relative statistics (which I’ll just call TM Rel from now on) are better measures. I wish I could tell you that they’re so much better, so perfect, that once they were developed nobody ever felt the need to make more complicated statistics. But I can’t truthfully tell you that.

Unfortunately, straight-up TM Rel stats don’t take any context into account. The major contextual factors that affect a player’s statistics (that we know of and can measure) are the current score of the game while the player is on the ice, and the venue that they’re playing at.

Score affects the numbers because teams behave differently in different situations. When a team is behind, they tend to be aggressive, pushing possession and throwing everything on net; when a team is ahead, they tend to be passive, conceding possession and trying to ‘hold down the fort’.

Venue affects the numbers because each arena has its own official scorer. Some are stingier than others in what they consider a shot on goal or a shot attempt.

These contextual issues aren’t the only things wrong with TM Rel stats, either. One of their major defects is that if two players spend the vast majority of their time together, their numbers can get conflated.

For example, look at Conor Sheary’s season. Sheary spent almost all of his 5v5 ice time with Sidney Crosby, so any player on the ice with Sheary was probably also on the ice with Crosby. Because Crosby is such a strong possession player, the numbers of Sheary’s other linemates were all but guaranteed to rise when they played with Sheary. His TM Rel numbers would look good regardless of how good or bad he actually was.

The opposite problem can also arise with TM Rel stats. Nick Bonino was the Penguins’ third-line center, so when his linemates weren’t on the ice with him, they were probably on the ice with Sidney Crosby or Evgeni Malkin. Rather than telling you how much better or worse Bonino makes his linemates’ Corsi, then, his TM Rel Corsi mostly tells you how much worse he is than Crosby and Malkin. Bonino’s TM Rel Corsi may portray him as worse than he really is.

Domenic Galamini attempted to address all of these issues (both the contextual ones and those inherent in TM Rel statistics), as well as to account for quality of competition, in the latest version of his H.E.R.O Charts.

The charts adjust for score and venue. They also have their own methodology for measuring the quality of a player’s linemates and opposition, and they use regression to deal with the inherent problems of TM Rel Corsi. These additions to ordinary Corsi stats are why the numbers on the charts get their own names: ‘shot suppression’ and ‘shot generation’. Galamini explains this methodology in his H.E.R.O Chart FAQ, but there are a few things you need to know before you dive into it.

The FAQ describes the use of dummy variables. A dummy variable splits a group into two mutually exclusive groups. For every shift in every game, HERO charts split all players into A) players who were on the ice and B) players who were not. A player in Group A is assigned a value of 1, so everything that happened during that shift counts toward his numbers; a player in Group B is assigned a value of 0, which zeroes out that shift when crunching his numbers, since he wasn’t on the ice for it.
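To make the dummy-variable idea concrete, here’s a small Python sketch that turns one shift into a row of 1s and 0s. The four-player roster and the per-player ‘effects’ are hypothetical, and a real model would need many more columns (for example, the score and venue adjustments mentioned above), so treat this as an illustration of the encoding rather than Galamini’s actual setup.

```python
# Hypothetical roster slice; a real model would include every skater in the league.
ROSTER = ["Crosby", "Sheary", "Malkin", "Bonino"]


def shift_row(on_ice, roster=ROSTER):
    """Encode one shift as dummy variables: 1 if the player was on the ice, else 0."""
    on = set(on_ice)
    return [1 if player in on else 0 for player in roster]


def shift_contribution(row, effects):
    """Sum each player's effect times his dummy; players off the ice (0) add nothing."""
    return sum(effect * dummy for effect, dummy in zip(effects, row))


row = shift_row(["Crosby", "Sheary"])
print(row)  # [1, 1, 0, 0]

# Hypothetical per-player effects (not real estimates).
effects = [2.0, 0.5, 1.5, -0.5]
print(shift_contribution(row, effects))  # 2.5
```

Malkin’s and Bonino’s effects get multiplied by 0, so they have no say in this shift, which is exactly what the 0 in the dummy variable is for.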

Galamini mentions things like ridge regression and why it’s used. You don’t need to fully understand how it works, but you should understand that it’s used to account for some of the issues with TM Rel stats that I described earlier. The FAQ says ridge regression is used to “deal with the fact that multiple skaters spend significant amounts of ice time together, thus making it difficult to delineate their impacts”. That is exactly the issue discussed earlier in this article (the section that begins with “if two players spend the vast majority of their time together, their numbers can get conflated”).
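For anyone curious what ridge regression does with two players who are never apart, here’s a bare-bones Python sketch on made-up shifts. It solves the textbook ridge equations (add a penalty, lambda, to the diagonal before solving) with a tiny hand-written solver rather than glmnet, and it leaves out an intercept and any score or venue terms. Players A and B always share the ice: plain regression (lambda = 0) can’t separate them at all, while ridge returns an answer by splitting the credit evenly. The lambda of 1.0 is an arbitrary choice; per the FAQ, Galamini tunes his with cross-validation.

```python
def solve(a, b, tol=1e-12):
    """Solve a * x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [list(row) + [rhs] for row, rhs in zip(a, b)]  # augmented matrix
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        if abs(m[pivot][col]) < tol:
            raise ValueError("singular system: the players can't be separated")
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            f = m[r][col] / m[col][col]
            for k in range(col, n + 1):
                m[r][k] -= f * m[col][k]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):  # back-substitution
        s = sum(m[r][k] * x[k] for k in range(r + 1, n))
        x[r] = (m[r][n] - s) / m[r][r]
    return x


def ridge(X, y, lam):
    """Ridge coefficients from (X^T X + lam * I) beta = X^T y, with no intercept."""
    p = len(X[0])
    xtx = [[sum(row[i] * row[j] for row in X) + (lam if i == j else 0.0)
            for j in range(p)] for i in range(p)]
    xty = [sum(row[i] * yi for row, yi in zip(X, y)) for i in range(p)]
    return solve(xtx, xty)


# Hypothetical shifts: columns are players A, B, C (1 = on ice, 0 = not).
# A and B are on the ice together for every one of their shifts.
X = [[1, 1, 0], [1, 1, 0], [0, 0, 1], [1, 1, 1], [0, 0, 1]]
# Hypothetical outcome per shift, e.g. shot-attempt differential.
y = [4.0, 2.0, -1.0, 3.0, 1.0]

try:
    ridge(X, y, 0.0)  # lambda = 0 is ordinary least squares
    ols_separates_players = True
except ValueError:
    ols_separates_players = False

beta = ridge(X, y, 1.0)
print(ols_separates_players)        # False
print([round(b, 3) for b in beta])  # [1.269, 1.269, 0.115]
```

Note that ridge doesn’t magically figure out which of A and B is better; with no shifts apart, it can only split the credit between them. What it buys you is a stable answer instead of no answer, which matters when pairs like Crosby and Sheary spend most of their time together but not quite all of it.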

Galamini also talks about things like doing a “10 fold cross validation to tune our lambda parameter” and using “the glmnet package in R”. I’m a hockey nerd, not a statistics professor, so I’m not touching that stuff. Just remember that Galamini went through those steps to solve the problems we talked about here, and trust that he did them correctly.

A final word of caution about the H.E.R.O Charts: ‘shot suppression’ is not a measure of how good a player is in his own zone. Effective offensive defensemen (like Kevin Shattenkirk) can have fantastic shot suppression numbers despite poor defensive play, because they are so good at transitioning from defense to offense that they just don’t spend much time in their own end.

Likewise, some defensive defensemen can have mediocre shot suppression because, as good as they may be in their own zone, they spend too much time there.

If you’ve gone through this article and Galamini’s FAQ but still have questions, don’t hesitate to ask. Hopefully, after subjecting yourself to all of this statistical reading, you feel like Zach Galifianakis’ character in “The Hangover” after he masters card-counting.