This work was presented first at the 2017 Ottawa Hockey Analytics Conference, for which you may view slides and video.

I propose a new framework for evaluating goaltending performance, taking into account the difficulty of shots faced as well as the quality of skaters playing for both teams. This is the first non-trivial fragment of what I imagine will be a long sequence of articles.

Aim

Any method for evaluating goalies should be:

Fair : it should ascribe as much responsibility as possible to the goaltender in question for what they do and as little as possible for what they do not do.

Simple : no matter how complicated it is in inputs or adjustments or subtleties of design, it should produce single numbers for goaltenders which can be compared.

Extensible : as new data (such as puck and player tracking data, skater and goaltender posture measurements, fatigue information, and so on) becomes available, we should be able to include it in the existing framework without having to go back to the drawing board.

Applicable : the many things being measured should give some insight to those for whom goaltending is of daily concern: for goalie coaches to know what things might merit extra attention, in addition to the more obvious value to managers and fans.

Repeatable : the measures extracted from past performance should correlate as well as possible with future values.

The gentle reader will decide for themself how successful I have been so far in the pursuit of these goals, but I feel I have grasped a foothold. In this first article I mainly describe the broad framework, but I mention two related applications: adjusting for quality of skaters, and adjusting for quality of shots faced.

Model

The key technical tool I use in our framework is a simple model of play in the defensive zone, from the goalie's perspective. I shoehorn all play into seven states:

Shot : a shot that the goaltender will have to deal with somehow. For this purpose blocked shots are not considered shots, but missed shots, saved shots, and goals are.

Freeze : when the action of the goaltender causes play to stop. This includes goalies catching or otherwise smothering the puck with their equipment, as well as shots deflected out of play by the goaltender or the goalframe. It does not include faceoffs caused by the skaters (such as when they take penalties or otherwise put the puck out of play in a way that does not involve the goaltender).

Contested : when the puck is not known to be in the clear possession of either team.

Goal : when the puck is in the net.

Attackers : when the attacking team has clear possession of the puck.

Defenders : when the defending team has clear possession of the puck.

Safe : when the puck is no longer in the defensive zone at all.

The gory details and most of the modelling judgment come in with how I tabulate transitions between these states given the play-by-play information to which we have access. These details are sufficiently gruesome and lengthy that I've included them in an appendix at the end of this article.

The results for the league in 2016-2017 as a whole are:

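| From \ To | Shot | Freeze | Contested | Goal | Attackers | Defenders | Safe |
|---|---|---|---|---|---|---|---|
| Shot | 21.1% | 25.7% | 47.7% | 5.5% | | | |
| Freeze | | | | | 49% | 49% | 2% |
| Contested | | | | | 41% | 59% | |
| Goal | | | | 100% | | | |
| Attackers | 46% | | 54% | | | | |
| Defenders | | | 35% | | | | 65% |
| Safe | | | | | | | 100% |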

The blank entries indicate that zero transitions of that type were recorded, mostly because I imputed other transitions around them. The aggregate effect is to permit only certain transitions: for instance, all transitions from "Contested" are either to "Attackers" or to "Defenders". Notice how defenders consistently win the bulk of such non-static puck battles. On the other hand, when goalies freeze pucks after shots, the result is a faceoff, for which the attacking team has a slight edge, but in a small fraction (3%) of cases the puck is immediately taken out of the zone to "Safe" by the linesfolk, because of scrumming attacking skaters or penalties.

Adjusting for Skater Quality

The primary benefit of shaping information about defensive zone play into such a matrix as the above is that we can gain insights from simple computations using the matrix. For instance, notice that there are two "terminal" or "absorbing" states, that is, "Goal" and "Safe". These are "terminal" in the sense that I consider any play after them as totally distinct from the previous play. (We know that this is not quite true---after all, any goal scored changes the score, which we know strongly affects some aspects of play, and we know that some clearances of the puck to "safety" are actually very bad plays which give up control of the puck with minimal benefit. These concerns will have to wait for another day.) By taking a very high power of our transition matrix, we can compute the "eventual goal probability" starting from any state, that is, the chance that, starting from a given state, the puck will wind up in the net before the defenders manage to clear it. For instance, the eventual goal probability for "Shot" is 10.2%, around double the immediate probability of scoring on any given shot.
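As a minimal sketch of this computation, the Python below encodes the league-wide table (reading blank entries as zero) and squares the matrix repeatedly to approximate a very high power. The helper names (`matmul`, `high_power`, `eventual_goal_probability`) and the number of squarings are illustrative choices rather than part of the original analysis, and because the table entries are rounded the result only approximately reproduces the published figure.

```python
# States in the order used by the transition tables in this article.
STATES = ["Shot", "Freeze", "Contested", "Goal", "Attackers", "Defenders", "Safe"]

# League-wide 2016-2017 transition probabilities, rounded as in the table
# (rows: from-state, columns: to-state, blank entries read as zero).
LEAGUE = [
    [0.211, 0.257, 0.477, 0.055, 0.00, 0.00, 0.00],  # Shot
    [0.000, 0.000, 0.000, 0.000, 0.49, 0.49, 0.02],  # Freeze
    [0.000, 0.000, 0.000, 0.000, 0.41, 0.59, 0.00],  # Contested
    [0.000, 0.000, 0.000, 1.000, 0.00, 0.00, 0.00],  # Goal (absorbing)
    [0.460, 0.000, 0.540, 0.000, 0.00, 0.00, 0.00],  # Attackers
    [0.000, 0.000, 0.350, 0.000, 0.00, 0.00, 0.65],  # Defenders
    [0.000, 0.000, 0.000, 0.000, 0.00, 0.00, 1.00],  # Safe (absorbing)
]


def matmul(a, b):
    """Multiply two square matrices given as lists of rows."""
    n = len(a)
    return [[sum(a[i][k] * b[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]


def high_power(matrix, squarings=20):
    """Return matrix ** (2 ** squarings), computed by repeated squaring."""
    result = matrix
    for _ in range(squarings):
        result = matmul(result, result)
    return result


def eventual_goal_probability(matrix, start="Shot"):
    """Chance that play starting in `start` reaches "Goal" before "Safe"."""
    limit = high_power(matrix)
    return limit[STATES.index(start)][STATES.index("Goal")]


league_shot = eventual_goal_probability(LEAGUE)
print(f"{league_shot:.1%}")  # prints 10.2%
```

Twenty squarings correspond to roughly a million transitions, far more than any realistic sequence of play, so every row has effectively settled into its split between "Goal" and "Safe".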

My broad opinion is that the goalie ought to be considered mostly responsible for all of the entries in the "Shot" row and not in any way responsible for the entries in any of the other rows. However, looking at transition matrixes for individual goaltenders, even for full seasons, shows significant differences in the "skater" rows. For instance, Philipp Grubauer played twenty-four games for Washington in 2016-2017; his matrix is below:

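| From \ To | Shot | Freeze | Contested | Goal | Attackers | Defenders | Safe |
|---|---|---|---|---|---|---|---|
| Shot | 19.2% | 25.4% | 51.1% | 4.3% | | | |
| Freeze | | | | | 51% | 46% | 3% |
| Contested | | | | | 38% | 62% | |
| Goal | | | | 100% | | | |
| Attackers | 43% | | 57% | | | | |
| Defenders | | | 32% | | | | 68% |
| Safe | | | | | | | 100% |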

In virtually every skater entry the results in front of him are more favourable than league average. Since the Capitals won the Presidents' Trophy in 2016-2017, it should hardly be surprising that his team won more puck battles, cleared more rebounds, and broke out of the zone better than the league average.

By comparison, consider Mike Smith, who played fifty-five games for Arizona in 2016-2017. His matrix is below:

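| From \ To | Shot | Freeze | Contested | Goal | Attackers | Defenders | Safe |
|---|---|---|---|---|---|---|---|
| Shot | 22.1% | 28.8% | 44.1% | 5.0% | | | |
| Freeze | | | | | 49% | 48% | 3% |
| Contested | | | | | 43% | 57% | |
| Goal | | | | 100% | | | |
| Attackers | 52% | | 48% | | | | |
| Defenders | | | 36% | | | | 64% |
| Safe | | | | | | | 100% |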

In almost every skater entry we see results that are below league average. This again is not surprising, as the Coyotes finished third-last in 2016-2017. They are consistently more susceptible to opponents' forechecking, win fewer puck battles, and break out in transition less.

The key idea for equalizing results across different skater contexts is to form the transition matrixes for the goalies we want to compare, and then replace all of the entries in the "skater" rows (that is, every row except the "Shot" row) with league-average values. This produces a transition matrix which I imagine as representing what would transpire if the goalie in question were provided with league-average skater context instead of the teammates and opponents they actually faced. Then, by computing the long-run probability of a shot being converted into a goal, we can compare two goaltenders more fairly. For the given pair of goaltenders, the immediate goal-per-shot figures favour Grubauer---4.3% to Smith's 5.0%. Moving to eventual goal probabilities, Grubauer's figure is 7.4% and Smith's 10.2%, where Smith's weak teammate support becomes very clear. After replacing their skater contexts with league average ones, Grubauer's "skater-independent eventual goal probability" is 7.9%, whereas Smith's is 9.5%. By this measure, Grubauer's performance was actually stronger than Smith's, even after accounting for the differences in skater quality.
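The substitution can be sketched in a few lines of Python. The sketch below keeps each goalie's own "Shot" row, takes every skater row from the league-wide table, and then iterates the chain until the goal probabilities settle, which is equivalent to taking a high matrix power. The dictionary layout, the function name and the number of sweeps are assumptions made for illustration; the inputs are the rounded table values, so the outputs match the quoted figures only to the precision shown.

```python
STATES = ["Shot", "Freeze", "Contested", "Goal", "Attackers", "Defenders", "Safe"]

# Skater rows from the league-wide table (blank entries read as zero).
LEAGUE_SKATER_ROWS = {
    "Freeze": {"Attackers": 0.49, "Defenders": 0.49, "Safe": 0.02},
    "Contested": {"Attackers": 0.41, "Defenders": 0.59},
    "Attackers": {"Shot": 0.46, "Contested": 0.54},
    "Defenders": {"Contested": 0.35, "Safe": 0.65},
}

# "Shot" rows for the two goaltenders, from their individual tables.
SHOT_ROWS = {
    "Grubauer": {"Shot": 0.192, "Freeze": 0.254, "Contested": 0.511, "Goal": 0.043},
    "Smith": {"Shot": 0.221, "Freeze": 0.288, "Contested": 0.441, "Goal": 0.050},
}


def eventual_goal_probability(shot_row, skater_rows, sweeps=500):
    """Long-run chance that a shot ends in "Goal" before "Safe".

    Each sweep applies one step of the chain to the vector of goal
    probabilities, so after n sweeps the vector holds the n-step values.
    """
    rows = dict(skater_rows, Shot=shot_row)
    goal = {state: 0.0 for state in STATES}
    goal["Goal"] = 1.0  # absorbing: once in the net, always in the net
    for _ in range(sweeps):
        updated = dict(goal)
        for state, row in rows.items():
            updated[state] = sum(p * goal[to] for to, p in row.items())
        goal = updated
    return goal["Shot"]


for name, shot_row in SHOT_ROWS.items():
    adjusted = eventual_goal_probability(shot_row, LEAGUE_SKATER_ROWS)
    print(f"{name}: {adjusted:.1%}")  # prints Grubauer: 7.9% then Smith: 9.5%
```

Passing a goalie's own skater rows instead of `LEAGUE_SKATER_ROWS` gives the unadjusted eventual goal probability, so the same function covers both sides of the comparison.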

Adjusting for Shot Quality

So far I've treated all of the transitions from the "Shot" state to be the responsibility of the goaltender. However, not all shots are equally easy to handle. This difficulty might be smoothed over if every goaltender faced a similar distribution of shots in each year but this is not what we observe in the NHL. For instance, Devan Dubnyk played sixty-five regular-season games for Minnesota in 2016-2017, facing the following pattern of shots:

Blue regions indicate fewer shots (than league average) per hour of 5v5 play and red regions show areas from which he saw more shots per hour. On the other hand, Mike Smith in Arizona (55 games played) saw the following distribution of shots:

This strongly suggests that some accounting should be made for the difficulty of shots faced.

To accomplish this, I replace the single "Shot" state with a family of states, one for every recorded shot location. I divide the defensive zone into a 100 by 100 grid, roughly corresponding to the recorded precision of the NHL's real-time stats. This changes the "Shot" state into ten thousand shot states, and our seven-by-seven transition matrixes become 10,006-by-10,006 matrixes, which makes them harder to look at but not appreciably harder to compute with. Then, we can compute a Standardized Shot Profile, that is, the relative likelihood of facing shots from given locations for a league-average goalie. Graphically, it looks like this:



Where the colour units indicate relative frequency. By encoding this standard shot profile as a matrix we can pre-multiply our observed transition matrixes by it and obtain what I call Standardized Goals Against or sGA, that is, the number of goals that a given goalie would allow per hundred shots if they faced a typical distribution of shots, calculated from how they performed on the shot distribution they did face. Similarly, we can derive "Standardized Freeze Rates", "Standardized Shot-to-contested Rates", and so on, though these quantities seem less interesting to me.
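To make the idea concrete, here is a deliberately simplified sketch in Python. It standardizes only the immediate shot-to-goal entry rather than the whole transition matrix, and it uses three named cells in place of the 10,000 grid cells. All of the counts, the cell names and the function names are hypothetical, chosen only to show how reweighting by a standard profile can move a goalie's figure.

```python
# Hypothetical league-wide shot counts per grid cell.
LEAGUE_SHOTS = {"slot": 3000, "left point": 3500, "right point": 3500}

# Hypothetical record for one goalie per cell: (shots faced, goals allowed).
OBSERVED = {"slot": (200, 24), "left point": (50, 1), "right point": (50, 2)}


def shot_profile(league_shots):
    """Relative likelihood of a shot coming from each cell (sums to 1)."""
    total = sum(league_shots.values())
    return {cell: count / total for cell, count in league_shots.items()}


def raw_goals_per_hundred(observed):
    """Goals allowed per hundred shots on the distribution actually faced."""
    shots = sum(s for s, _ in observed.values())
    goals = sum(g for _, g in observed.values())
    return 100.0 * goals / shots


def standardized_goals_against(observed, profile):
    """Goals per hundred shots if shots arrived according to `profile`.

    Assumes the goalie faced at least one shot from every cell in the
    profile; sparse cells would need extra handling not shown here.
    """
    total = 0.0
    for cell, share in profile.items():
        shots, goals = observed[cell]
        total += share * goals / shots
    return 100.0 * total


profile = shot_profile(LEAGUE_SHOTS)
print(round(raw_goals_per_hundred(OBSERVED), 1))
print(round(standardized_goals_against(OBSERVED, profile), 1))
```

In this made-up case the goalie sees an unusually large share of slot shots, so the standardized figure comes out lower than the raw one; a goalie sheltered from dangerous shots would move the other way.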

In our example above, we were comparing Dubnyk and Smith; their immediate goal probabilities (adjusting nothing at all) were 4.8% and 4.2%, respectively. However, Dubnyk's sGA for this season is 6.7, and Smith's is 4.5---unsurprisingly, Dubnyk's expected performance versus league-average shot quality is worse than observed. Somewhat less expectedly, Smith's expected performance in percentage terms also drops slightly, but the relative distribution of the shots he faces is not so different from league average; he simply sees lots more from every location.

The two adjustments described here (replacing skater terms with league averages and shot standardization) can be combined to obtain a stat that I call "sGA*".

Repeatability

If we want to imagine that our statistics are useful measures of skill then we hope that they will be repeatable, that is, future values should be related to past values. I computed correlations for several stats, including the two (sGA and sGA*) introduced here, using "career to date" as the past value and "following twenty-five games" as the future value. I tried to imagine a plausible scenario as it might appear to a decision-maker at a hockey team: which goaltender shall I (primarily) play for the next twenty-five games? Many fewer than twenty-five games risks being lost in noise completely; many more games risks disconnecting from practice. Computing Pearson correlations in this way for all goaltenders over the past decade gives:

| Stat | Correlation |
|---|---|
| sGA | 0.185 |
| sGA* | 0.135 |
| xGA-GA per shot | 0.211 |
| All-situations save % | 0.211 |
| 5v5 save % | 0.115 |

There are a number of surprises. Most surprisingly, the least repeatable measure of goaltending talent is 5v5 save percentage, which is one of the most popular measures in the analyticky circles in which I travel. The third entry is based on Emmanuel Perry's expected goals model, which assigns to every shot a goal probability based on its type and location. Forming the difference between expected goals allowed and actual goals allowed and then dividing by the number of shots puts this notion on the same arithmetic footing as the other ones, allowing for comparisons. It is the most sophisticated existing model for goaltending evaluation to date, so it's not surprising to see it perform well here; what is much more surprising is the equally strong repeatability from all-situations save percentage, which indiscriminately buckets together shots from all different contexts. I suspect that there may be a sort of survivor bias influencing results here; since all-situations save percentage appears to be the most common evaluating tool among NHL decision-makers over the past decade (with "consistent" goaltenders especially prized), perhaps there is artificially less variance in this measure.

I am sufficiently heartened by the repeatability of sGA to publish this article and to push forward with further work in this vein; however, the weakness of the repeatability of sGA* makes me think this latter stat might not be worth applying immediately.
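The repeatability test can be sketched as follows. The Pearson formula is standard; everything else is an assumption made for illustration: the game logs are invented, raw goals per shot stands in for the stats compared above, split points are taken every `window` games, and the demonstration uses a window of two games instead of twenty-five so that it fits on the page.

```python
from math import sqrt


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / sqrt(var_x * var_y)


def goals_per_shot(games):
    """Goals allowed per shot over a list of (shots, goals) game records."""
    shots = sum(s for s, _ in games)
    goals = sum(g for _, g in games)
    return goals / shots


def past_future_pairs(games, window=25):
    """Pair career-to-date values with the value over the next `window` games.

    `games` is one goalie's chronological list of (shots, goals) records.
    Taking a split point every `window` games is an assumption of this sketch.
    """
    pairs = []
    for split in range(window, len(games) - window + 1, window):
        past = goals_per_shot(games[:split])
        future = goals_per_shot(games[split:split + window])
        pairs.append((past, future))
    return pairs


# Invented game logs for two hypothetical goalies.
LOGS = {
    "Goalie A": [(30, 2), (25, 1), (28, 3), (31, 2), (27, 1), (29, 2)],
    "Goalie B": [(33, 4), (30, 3), (26, 2), (35, 3), (28, 4), (30, 2)],
}

pairs = [pair for games in LOGS.values() for pair in past_future_pairs(games, window=2)]
past_values = [past for past, _ in pairs]
future_values = [future for _, future in pairs]
print(round(pearson(past_values, future_values), 3))
```

With real data the same pairing would be run once per stat, giving one correlation per row of the table.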

Future Work

A project this size will take, I expect, several years and there is much left to do. The most obvious next steps to me are:

Non-trivial Priors: Instead of assuming that goaltenders are a blank slate, we should instead begin with a prior expectation of their ability that is sensible. League-average results would be one plausible start, some suitably defined "replacement level" would perhaps be better still. Very sophisticated implementations might use different priors for different goalies using clever manipulations of data from other leagues.

Analysis of shot types: While the difference between what NHL play-by-play calls a "snap" shot and what it calls a "wrist" shot may be entirely negligible in practice, the same is presumably not true for slap shots or for deflections, and so on. I do not see how to accommodate this information yet; replacing 10,000 shot states by six or seven times as many (which I have tried) leads to unpleasant renormalization difficulties to account for how each individual shot state is very poorly supported in data for single goaltenders even with very large samples of games.

Vision and pre-shot movement: At present we have only fragmentary data, manually gathered at great expense, largely by volunteers, concerning how much the goaltender can see and how much and in what way the puck moves in the few seconds immediately preceding shots. What little data we do have about these factors suggest that they strongly affect goal probabilities, however, so modifications will have to be made for such data when it becomes available.

Zone Entries: I've chosen to work primarily from a shot context (influenced by xG and save percentage, which do so also) but one could instead use the framework I've introduced here to work from a zone-entry context, considering transitions corresponding to "carry ins", "dump and change", "dump and forecheck", and so forth.

In any event, I am sufficiently happy with sGA that I will be computing it for past and future goaltending performances and quoting it on the site. The future work I mention here will take a long time and I welcome the assistance of those who are interested in accelerating that progress.

2016-2017 Results

The graph below shows the raw goals per hundred unblocked shots and the sGA for all goalies who played in at least fifteen regular season games in 2016-2017.



Goalies who appear above the red line would have posted better results had they faced a league-average shot profile instead of the profile they did face; and those below would have posted worse. The players are coloured by their teams; goalies who played for multiple teams are shown in white.

Appendix: Transition Imputation

These are the details of how I took the NHL play-by-play and coerced what is written there into transitions for my model. First of all: no transitions were considered when the goalie under consideration was not in the net, no matter what the play-by-play events. That said, there are two stages of model design; the first is the encoding of states:

If a team took a shot on the goalie of interest that was scored, saved, or missed, that was recorded as a "Shot" in my sense. A blocked shot was recorded as "Attackers have the puck" but not as a shot.

Events labelled "GOALIE STOPPED" in the play-by-play are recorded as "Freeze". (Note, though, the imputation rule for transitions to freezes below.)

Faceoffs are coerced into either "Defenders have the puck" or "Attackers have the puck" depending on who wins them.

Hits are labelled as transitions from "Attackers" or "Defenders", depending on who was hit, to "Contested".

Giveaways and takeaways are coded as transitions from "Attackers" to "Contested" to "Defenders", or inversely depending on who began and ended the play with the puck.

Icings were coded as "Defenders have the puck" but were not coded as "puck to safety" since play resumes in the defensive zone in almost all such cases.

Everything else was recorded in the obvious way.

Once all of the states are encoded in this way, some additional states are imputed, with accompanying transitions:

When "Shot" appears followed by "Attackers" or "Defenders", a transition through "Contested" is imputed.

When "Shot" or "Attackers" is followed by "Safe", transitions through "Contested" and then "Defenders" are imputed.

When "Attackers" or "Defenders" is followed immediately by "Freeze", the freeze is replaced with a "Contested", since I only want to consider goalies freezing the puck from Shots and not all faceoffs.

When "Defenders" is followed by "Shot", I insert transitions through "Contested" and then "Attackers".

When "Defenders" is followed by "Attackers", or vice versa, I insert a transition through "Contested".
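The imputation rules in this appendix can be sketched in Python as follows. The sketch assumes that each stretch of play has already been encoded as a list of state names, and that freezes are replaced before intermediate states are inserted; that ordering, the `INSERTS` table, and the function names `impute` and `transition_rates` are illustrative assumptions rather than a description of the code actually used.

```python
from collections import Counter, defaultdict

# Intermediate states to insert between a recorded pair of states.
INSERTS = {
    ("Shot", "Attackers"): ["Contested"],
    ("Shot", "Defenders"): ["Contested"],
    ("Shot", "Safe"): ["Contested", "Defenders"],
    ("Attackers", "Safe"): ["Contested", "Defenders"],
    ("Defenders", "Shot"): ["Contested", "Attackers"],
    ("Defenders", "Attackers"): ["Contested"],
    ("Attackers", "Defenders"): ["Contested"],
}


def impute(states):
    """Apply the imputation rules to one encoded sequence of states."""
    # A freeze that directly follows possession (not a shot) becomes "Contested".
    cleaned = []
    for i, state in enumerate(states):
        if state == "Freeze" and i > 0 and states[i - 1] in ("Attackers", "Defenders"):
            state = "Contested"
        cleaned.append(state)
    # Insert the imputed intermediate states between consecutive states.
    result = cleaned[:1]
    for prev, nxt in zip(cleaned, cleaned[1:]):
        result.extend(INSERTS.get((prev, nxt), []))
        result.append(nxt)
    return result


def transition_rates(sequences):
    """Count transitions over imputed sequences and normalize each row."""
    counts = defaultdict(Counter)
    for sequence in sequences:
        full = impute(sequence)
        for prev, nxt in zip(full, full[1:]):
            counts[prev][nxt] += 1
    return {
        prev: {nxt: n / sum(row.values()) for nxt, n in row.items()}
        for prev, row in counts.items()
    }


print(impute(["Attackers", "Shot", "Defenders", "Safe"]))
# prints ['Attackers', 'Shot', 'Contested', 'Defenders', 'Safe']
```

Because every disallowed pair is routed through "Contested", the resulting table can only contain the transitions that appear as non-blank entries in the matrixes above.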