Introducing a new stat: Location Adjusted Expected Goals Percentage

August 28, 2013 · 8 minute read

If you have been looking into statistics not recorded by the NHL officially, you probably know what Corsi is. From Hockey Prospectus,

“Corsi is essentially a plus-minus statistic that measures shot attempts. A player receives a plus for any shot attempt (on net, missed, or blocked) that his team directs at the opponent’s net, and a minus for any shot attempt against his own net. A proxy for possession.”

The most cited drawback of Corsi is that it gives a shot on goal from point-blank range and a missed shot from the center line the same weight. There has even been talk of players purposely taking more low quality shots to game the system once they hear that their coach or management uses Corsi as a performance metric. I created a new statistic, Location Adjusted Expected Goals Percentage, to fix that.

I will explain what I did to derive this statistic in layman’s terms, without going into too much boring detail, because I know math is not everyone’s cup of tea. If you are only looking for the super technical stuff, read the Methodology article.

Summary

Before we get into Location Adjusted Expected GoAls Percentage (LAEGAP), we will first explore what I call Expected Goals For (EGF) and Expected Goals Against (EGA). I collected all available play-by-play shot data from NHL.com (of the 720 regular season games, only 634 were intact) and calculated the recording bias of shot distances for each arena. Then I went through all even strength, non-empty net shots and goals and calculated the average shot percentage for each point on the rink. All of the data was flipped to the east (right) half of the rink, and points that overlapped were added together. After that, each point was regressed with its neighbors, which basically means taking a weighted average of its own shot percentage and those of its neighbors. Finally, every available even strength, non-empty net shot and goal was processed, and the shot percentage at its location was added to the EGF or EGA of each player on the ice, depending on whether he was on the shooting team or the team being shot at.
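For readers who think better in code, here is a minimal sketch of the shot percentage map described above. It is not the project's actual code: the function names, the 180 degree rotation used to flip shots to the east half, the integer grid cells, and the smoothing weights (2 for a cell, 1 for each neighbor) are all assumptions made for illustration, and the arena recording bias correction is left out.

```python
from collections import defaultdict

def mirror_east(x, y):
    """Map a shot onto the east (x >= 0) half of the rink.

    Rotating by 180 degrees is an assumption; the article only says "flipped".
    """
    return (x, y) if x >= 0 else (-x, -y)

def raw_shot_pct(events):
    """events: iterable of (x, y, is_goal). Returns {(x, y): goals / shots}."""
    shots = defaultdict(int)
    goals = defaultdict(int)
    for x, y, is_goal in events:
        cell = mirror_east(x, y)
        shots[cell] += 1          # overlapping points are added together
        goals[cell] += int(is_goal)
    return {cell: goals[cell] / shots[cell] for cell in shots}

def smooth(pct, center_weight=2.0, neighbor_weight=1.0):
    """Weighted average of each cell with whichever of its 8 neighbors have data.

    The weights are hypothetical placeholders, not the values used in the study.
    """
    smoothed = {}
    for (x, y), value in pct.items():
        total = center_weight * value
        weight = center_weight
        for dx in (-1, 0, 1):
            for dy in (-1, 0, 1):
                neighbor = (x + dx, y + dy)
                if (dx, dy) != (0, 0) and neighbor in pct:
                    total += neighbor_weight * pct[neighbor]
                    weight += neighbor_weight
        smoothed[(x, y)] = total / weight
    return smoothed

# Made-up events: (x, y, is_goal)
events = [(80, 0, True), (-80, 0, False), (81, 0, False), (81, 0, False), (40, 20, False)]
raw = raw_shot_pct(events)
smoothed = smooth(raw)
```

In this toy data the cell at (80, 0) has one goal on two shots, and smoothing pulls it toward its scoreless neighbor at (81, 0), which is the whole point of regressing each point with its neighbors.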

In one sentence:

“Expected Goals For (EGF) is the number of goals a player would be on the ice for if every shot had the league average shot percentage (chance of the puck going in) for the location where it was taken.”

If you are still confused, think of EGF as shot quality multiplied by the number of shots taken by your team when you are on the ice. EGA is the same thing for the shots your opponents take.
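In code, the bookkeeping behind EGF and EGA might look something like the sketch below. The record layout (`cell`, `shooting_on_ice`, `defending_on_ice`), the player labels, and the percentages are hypothetical stand-ins, not the fields of the NHL.com play-by-play feed.

```python
from collections import defaultdict

def accumulate_expected_goals(shots, shot_pct):
    """Add each shot's location shot percentage to every on-ice player's EGF or EGA.

    shots: iterable of dicts with hypothetical keys 'cell', 'shooting_on_ice'
    and 'defending_on_ice'. shot_pct: {cell: league shot percentage there}.
    """
    egf = defaultdict(float)
    ega = defaultdict(float)
    for shot in shots:
        p = shot_pct.get(shot["cell"], 0.0)  # locations with no data count as 0
        for player in shot["shooting_on_ice"]:
            egf[player] += p
        for player in shot["defending_on_ice"]:
            ega[player] += p
    return dict(egf), dict(ega)

# Made-up league shot percentages for two locations
shot_pct = {(80, 0): 0.10, (60, 20): 0.05}
shots = [
    {"cell": (80, 0), "shooting_on_ice": ["skater_a"], "defending_on_ice": ["skater_b"]},
    {"cell": (60, 20), "shooting_on_ice": ["skater_b"], "defending_on_ice": ["skater_a"]},
]
egf, ega = accumulate_expected_goals(shots, shot_pct)
```

Here `skater_a` ends up with an EGF of 0.10 and an EGA of 0.05, and `skater_b` with the reverse.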

Now that we have EGF, a measure of on-ice offensive events, and EGA, a measure of on-ice defensive events (or the lack thereof), we can compare the two, because ultimately hockey is a game you win by scoring more than your opponent. Simply subtracting EGA from EGF is not good enough, because it gives high event players an advantage or a disadvantage, depending on whether they are positive possession players. For example, if our imaginary player, John, was on the ice for one shot for from a 10% shooting percentage location and one shot against from a 5% shooting percentage location every shift for 100 shifts, he would have an EGF of 10 and an EGA of 5, a difference of 5 goals. Now imagine a second player, Ethan, who had the same percentages every shift but played 1000 shifts. His expected goal difference (expected +/-) would be 50 goals, even though he and John performed equally, possession-wise, every time they were on the ice.

The solution is simple: calculate EGF as a percentage of total expected goal events (EGF + EGA). Now John and Ethan have the same EG%: 66.7% (10/(10+5) = 100/(100+50)). This raises another problem. What if a third imaginary player, Jacob, had one shift that logged a shot for at 5% before he suffered a season-ending pinky toe strain? He would have an EGF of 0.05, an EGA of 0, and an EG% of 100% (0.05/(0.05+0)). Clearly that’s not sustainable once the sample size increases (more shifts), so how do we tell small-sample error apart from actual performance?
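The arithmetic for the three imaginary players can be checked with a few lines of Python. The helper name `eg_pct` is just a label chosen for this sketch.

```python
def eg_pct(egf, ega):
    """Share of expected goal events that went in the player's favor."""
    return egf / (egf + ega)

# One 10% shot for and one 5% shot against on every shift
john_egf, john_ega = 100 * 0.10, 100 * 0.05      # 100 shifts
ethan_egf, ethan_ega = 1000 * 0.10, 1000 * 0.05  # 1000 shifts
jacob_egf, jacob_ega = 1 * 0.05, 0.0             # a single 5% shot for

# The raw difference rewards Ethan for simply playing more
john_diff = john_egf - john_ega     # about 5 goals
ethan_diff = ethan_egf - ethan_ega  # about 50 goals

# The percentage treats John and Ethan as equals, but flatters Jacob
john_pct = eg_pct(john_egf, john_ega)
ethan_pct = eg_pct(ethan_egf, ethan_ega)
jacob_pct = eg_pct(jacob_egf, jacob_ega)
```

John and Ethan both land at roughly 0.667, while Jacob's single shift gives him a perfect 1.0, which is exactly the small sample problem.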

Thankfully, the math was already worked out for us in 1927 by mathematician Edwin Bidwell Wilson. It’s called a binomial proportion confidence interval (referred to as BPCI in this article), and Wilson’s version of it is known as the Wilson score interval. You might actually have seen one before. It’s fairly complicated, so instead of explaining how it works and how to calculate it, I am just going to explain what it does (if you are interested in how it works, read the Methodology article). BPCI basically gives the margin of sampling error that an estimate of a probability (EG%) will have. You might have seen something like “40% of surveyed voters will vote for Mr. Lincoln as President” in your local newspaper, with a note at the bottom saying something like “All results have a margin of error of +/- 5%”. This means the surveyor is confident that the actual share of votes for Lincoln would be anywhere from 35% to 45%. As the sample size increases, this margin shrinks. The level of confidence you want in the estimate also affects the margin (the technical term for this level of confidence is coverage probability). The lower the confidence you are willing to accept, the smaller the margin. Generally a 95% confidence level is used, and that is what we will be using. To allow for easy sorting, we will take the lower bound of the interval. By taking the lowest value we will undervalue a player, particularly a low event one, much more often than we will overvalue him, which I think is the better of the two errors.

In one sentence:

“With the data I have, what is the lowest possible true EG% I will get, 95% of the time?”
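For the curious, here is a minimal sketch of the Wilson score interval applied to EG%. Treating EGF + EGA as the sample size and using z = 1.96 for 95% confidence are assumptions on my part for this sketch (the Methodology article is the authority), although with those choices the function appears to reproduce Dan Boyle's published LAEGAP and GFPUB to about four decimal places.

```python
import math

def wilson_interval(egf, ega, z=1.96):
    """Wilson score interval for EG% = EGF / (EGF + EGA).

    Assumes the sample size n is EGF + EGA, which need not be a whole number.
    Returns (lower bound, upper bound).
    """
    n = egf + ega
    if n <= 0:
        raise ValueError("need a positive number of expected goal events")
    p = egf / n
    denom = 1 + z * z / n
    center = (p + z * z / (2 * n)) / denom
    half_width = z * math.sqrt(p * (1 - p) / n + z * z / (4 * n * n)) / denom
    return center - half_width, center + half_width

# Dan Boyle's EGF and EGA from the top 30 list
laegap, gfpub = wilson_interval(37.9177, 23.3686)

# Jacob's single 5% shot for: a raw EG% of 100%, but almost no evidence behind it
jacob_low, jacob_high = wilson_interval(0.05, 0.0)
```

Jacob's lower bound comes out at barely above 1%, so sorting by the lower bound pushes tiny samples like his to the bottom of the list instead of the top.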

Phew. Now that we’ve got all the math out of the way, here comes the fun part. Who excels the most in this new performance metric? Here are the top 30:

| Name | Position | Team | Games | LAEGAP | GFPUB | EGF | EGA | Shots For | Shots Against | Shots Diff |
|---|---|---|---|---|---|---|---|---|---|---|
| Dan Boyle | Defenseman | San Jose Sharks | 41 | 0.493526 | 0.729867 | 37.9177 | 23.3686 | 437 | 312 | 125 |
| Brendan Gallagher | Right Wing | Montréal Canadiens | 41 | 0.492882 | 0.769807 | 27.3263 | 15.1572 | 311 | 186 | 125 |
| David Clarkson | Right Wing | New Jersey Devils | 43 | 0.491641 | 0.758562 | 29.4998 | 16.9234 | 340 | 209 | 131 |
| Jonathan Toews | Center | Chicago Blackhawks | 43 | 0.487849 | 0.731771 | 35.4267 | 21.9762 | 388 | 272 | 116 |
| Jake Muzzin | Defenseman | Los Angeles Kings | 39 | 0.485027 | 0.757977 | 28.0701 | 16.3438 | 356 | 200 | 156 |
| Max Pacioretty | Left Wing | Montréal Canadiens | 41 | 0.484933 | 0.760313 | 27.5553 | 15.9449 | 336 | 203 | 133 |
| Marian Hossa | Right Wing | Chicago Blackhawks | 36 | 0.475947 | 0.751286 | 27.4272 | 16.5591 | 294 | 207 | 87 |
| Joe Thornton | Center | San Jose Sharks | 43 | 0.472452 | 0.733587 | 30.4976 | 19.4209 | 332 | 248 | 84 |
| Patrick Marleau | Left Wing | San Jose Sharks | 43 | 0.472194 | 0.732203 | 30.7604 | 19.6678 | 360 | 248 | 112 |
| Brandon Saad | Left Wing | Chicago Blackhawks | 42 | 0.468371 | 0.729452 | 30.411 | 19.7317 | 343 | 246 | 97 |
| Lubomir Visnovsky | Defenseman | New York Islanders | 29 | 0.465838 | 0.741839 | 27.1036 | 17.1213 | 314 | 205 | 109 |
| Patrik Elias | Left Wing | New Jersey Devils | 43 | 0.465605 | 0.741048 | 27.2111 | 17.2328 | 298 | 210 | 88 |
| Justin Williams | Right Wing | Los Angeles Kings | 42 | 0.463848 | 0.726299 | 29.9752 | 19.7833 | 373 | 241 | 132 |
| Tyler Seguin | Center | Boston Bruins | 42 | 0.462289 | 0.717262 | 31.7253 | 21.4821 | 396 | 275 | 121 |
| Zach Parise | Left Wing | Minnesota Wild | 43 | 0.461388 | 0.718571 | 31.1538 | 21.0652 | 377 | 281 | 96 |
| Logan Couture | Center | San Jose Sharks | 43 | 0.459966 | 0.72313 | 29.7073 | 19.9178 | 339 | 254 | 85 |
| Mark Fayne | Defenseman | New Jersey Devils | 27 | 0.459596 | 0.796274 | 17.7727 | 9.748 | 193 | 136 | 57 |
| Evgeni Malkin | Center | Pittsburgh Penguins | 26 | 0.459346 | 0.770753 | 21.0059 | 12.4287 | 239 | 159 | 80 |
| Mikko Koivu | Center | Minnesota Wild | 43 | 0.456033 | 0.715287 | 30.4973 | 21.0142 | 375 | 281 | 94 |
| Henrik Sedin | Center | Vancouver Canucks | 40 | 0.452746 | 0.715632 | 29.5615 | 20.4875 | 350 | 268 | 82 |
| Ryan Getzlaf | Center | Anaheim Ducks | 38 | 0.450954 | 0.718104 | 28.5721 | 19.7529 | 306 | 234 | 72 |
| Anton Stralman | Defenseman | New York Rangers | 41 | 0.45036 | 0.707422 | 30.8343 | 21.9066 | 375 | 274 | 101 |
| Marc-Edouard Vlasic | Defenseman | San Jose Sharks | 43 | 0.447783 | 0.693963 | 33.4906 | 24.6981 | 395 | 327 | 68 |
| Patrice Bergeron | Center | Boston Bruins | 36 | 0.446412 | 0.726577 | 25.8487 | 17.658 | 346 | 228 | 118 |
| Andy Greene | Defenseman | New Jersey Devils | 43 | 0.445498 | 0.713643 | 28.1978 | 19.9277 | 344 | 275 | 69 |
| Matt Irwin | Defenseman | San Jose Sharks | 35 | 0.444544 | 0.715296 | 27.63 | 19.4851 | 337 | 251 | 86 |
| Ryan McDonagh | Defenseman | New York Rangers | 41 | 0.444263 | 0.676508 | 37.3756 | 28.9067 | 475 | 369 | 106 |
| Derek Stepan | Center | New York Rangers | 41 | 0.442697 | 0.696188 | 31.42 | 23.2883 | 377 | 267 | 110 |
| Alexandre Burrows | Right Wing | Vancouver Canucks | 40 | 0.441986 | 0.725098 | 25.1964 | 17.432 | 298 | 244 | 54 |
| P.K. Subban | Defenseman | Montréal Canadiens | 40 | 0.441122 | 0.705541 | 28.8511 | 20.9794 | 360 | 259 | 101 |

Games: Games played
EGF: Expected Goals For
EGA: Expected Goals Against
Shots For: Number of shots on goal for the player's team while the player is on the ice
Shots Against: Number of shots on goal against the player's team while the player is on the ice
Shots Diff: Shot Differential = Shots For - Shots Against. Pretty much the same as Corsi, except missed shots don't count.
LAEGAP: Location Adjusted Expected Goals Percentage (could also be called Goals For % Lower Bound)
GFPUB: Goals For % Upper Bound

LAEGAP to GFPUB is the interval in which a player's true GF% lies, 95% of the time.

Does this mean the Hart and Norris trophies should’ve gone to Dan Boyle? Probably not. (The culture of giving the Norris to the best offensive defenseman, instead of having it “awarded to the defenseman who demonstrates throughout the season the greatest all-round ability in the position”, is something I disagree with, but that’s for another article.) We must remember that, just as with any other attribute that defines a great hockey player, you can’t evaluate players on one attribute alone. We must also take context, the most important factor in evaluating a player, into account. Having the best LAEGAP in the league doesn’t mean much if you start in the offensive zone 80% of the time against the opponent’s fourth line, or if you spend more time in the box than on the ice. It is also important to remember that this system rewards those who were on the ice for a higher number of events. Defensemen are naturally on the ice for more events because, on average, they get more ice time than forwards, so they have an advantage when compared with forwards.

For the record, Crosby is 37th on the list.

Interesting Finds

Let’s look at the players who surprise us in this top 30. Mark Fayne, Brandon Saad, Anton Stralman, Andy Greene, and Matt Irwin are among the traditionally undervalued candidates. Although we already knew from his Corsi this season that Gallagher, runner-up for this year’s Calder Trophy, was a great possession player, after adjusting for shot location it turns out he is most likely the best of all the forwards (he does have a 66% offensive zone start, though). I say most likely because players who played less can still increase their LAEGAP as they play more games and the sample size grows. If all these players had the same arbitrary EGF, with their EGA kept in proportion, Mark Fayne would actually be on top. (Basically, Mark Fayne would be the best possession player in our calculations if the difference in sample size were not an issue and everyone maintained exactly the same performance.)

LAEGAP vs Corsi

The point of LAEGAP is to fix Corsi’s failure to account for shot quality, so let’s see how we did. Unfortunately, the play-by-play data I collected doesn’t include missed shots, but I think shot differential is a good enough proxy for Corsi for our purposes.

Let’s first look at those who are unfairly discredited by Corsi. Among players with a negative shot differential (more shots against than for), Devin Setoguchi has the highest LAEGAP at 0.3997 (EGF of 24.43 and EGA of 20.57), even though he had a -25 shot differential. This means that although his team was outshot by 25 while he was on the ice, it tended to shoot from more dangerous locations than the opposing team, favoring quality over quantity. Here’s the rest of that list (only players with more than 15 games played this season are included):

| Name | Position | Team | LAEGAP | Games | EGF | EGA | Shots For | Shots Against | Shots Diff |
|---|---|---|---|---|---|---|---|---|---|
| Devin Setoguchi | Right Wing | Minnesota Wild | 0.399679 | 43 | 24.4251 | 20.5732 | 255 | 280 | -25 |
| David Legwand | Center | Nashville Predators | 0.395207 | 41 | 26.2646 | 23.1492 | 300 | 304 | -4 |
| Eric Brewer | Defenseman | Tampa Bay Lightning | 0.385438 | 41 | 31.2601 | 30.4075 | 338 | 377 | -39 |
| Jason Garrison | Defenseman | Vancouver Canucks | 0.385014 | 40 | 24.9074 | 22.6941 | 320 | 321 | -1 |
| Matt Cullen | Center | Minnesota Wild | 0.384695 | 37 | 19.7985 | 16.6929 | 210 | 232 | -22 |
| Mike Cammalleri | Center | Calgary Flames | 0.383323 | 40 | 22.8856 | 20.4519 | 273 | 281 | -8 |
| Bryan Allen | Defenseman | Anaheim Ducks | 0.383293 | 36 | 21.6961 | 19.0404 | 241 | 277 | -36 |
| Marc Methot | Defenseman | Ottawa Senators | 0.383155 | 40 | 29.6553 | 28.7386 | 369 | 370 | -1 |
| Martin St Louis | Right Wing | Tampa Bay Lightning | 0.382137 | 41 | 29.7007 | 28.9374 | 342 | 347 | -5 |
| Brad Stuart | Defenseman | San Jose Sharks | 0.380803 | 43 | 30.1785 | 29.7269 | 365 | 379 | -14 |
| Wayne Simmonds | Right Wing | Philadelphia Flyers | 0.379129 | 38 | 19.5821 | 16.9153 | 228 | 238 | -10 |
| Craig Smith | Center | Nashville Predators | 0.379099 | 37 | 17.5183 | 14.4988 | 186 | 198 | -12 |
| Nicklas Backstrom | Center | Washington Capitals | 0.376915 | 44 | 25.418 | 24.2575 | 315 | 328 | -13 |
| Nazem Kadri | Center | Toronto Maple Leafs | 0.376833 | 43 | 22.8493 | 21.0802 | 258 | 296 | -38 |
| Matt Stajan | Center | Calgary Flames | 0.374182 | 39 | 22.6381 | 21.0987 | 250 | 258 | -8 |
| Cam Fowler | Defenseman | Anaheim Ducks | 0.370258 | 32 | 19.7708 | 17.9313 | 227 | 259 | -32 |
| Keith Aulie | Defenseman | Tampa Bay Lightning | 0.368474 | 38 | 20.4526 | 18.9473 | 237 | 249 | -12 |
| Tommy Wingels | Center | San Jose Sharks | 0.367168 | 37 | 20.1443 | 18.6854 | 222 | 236 | -14 |
| Nick Foligno | Left Wing | Columbus Blue Jackets | 0.364859 | 41 | 23.6026 | 23.3645 | 267 | 288 | -21 |
| Alex Killorn | Center | Tampa Bay Lightning | 0.360014 | 34 | 18.5661 | 17.3444 | 201 | 226 | -25 |
| Matt Halischuk | Right Wing | Nashville Predators | 0.357274 | 32 | 14.4134 | 12.3335 | 147 | 156 | -9 |
| Jaromir Jagr | Right Wing | Dallas Stars | 0.35502 | 32 | 19.9777 | 19.6615 | 226 | 229 | -3 |
| Eric Fehr | Right Wing | Washington Capitals | 0.354895 | 38 | 17.7587 | 16.7559 | 221 | 227 | -6 |
| Joel Ward | Right Wing | Washington Capitals | 0.352883 | 35 | 16.3444 | 15.091 | 202 | 205 | -3 |
| Mike Richards | Center | Los Angeles Kings | 0.351278 | 42 | 19.6898 | 19.6549 | 234 | 236 | -2 |
| Clarke MacArthur | Left Wing | Toronto Maple Leafs | 0.350016 | 35 | 17.5774 | 16.9476 | 212 | 224 | -12 |
| Deryk Engelland | Defenseman | Pittsburgh Penguins | 0.349942 | 35 | 17.6935 | 17.1085 | 223 | 228 | -5 |
| Rich Clune | Left Wing | Nashville Predators | 0.346042 | 40 | 15.9891 | 15.175 | 171 | 184 | -13 |
| Cody McLeod | Left Wing | Colorado Avalanche | 0.342992 | 42 | 16.6384 | 16.3021 | 207 | 208 | -1 |
| Emerson Etem | Right Wing | Anaheim Ducks | 0.338121 | 33 | 11.4053 | 9.7216 | 144 | 147 | -3 |
| Matt Beleskey | Left Wing | Anaheim Ducks | 0.337689 | 36 | 14.5212 | 13.8578 | 170 | 185 | -15 |
| Daniel Paille | Left Wing | Boston Bruins | 0.336772 | 40 | 15.1781 | 14.8251 | 195 | 214 | -19 |
| Rene Bourque | Right Wing | Montréal Canadiens | 0.33275 | 23 | 11.1159 | 9.6442 | 131 | 135 | -4 |
| Martin Erat | Right Wing | Nashville Predators | 0.332706 | 29 | 14.3807 | 14.0438 | 180 | 186 | -6 |
| Hal Gill | Defenseman | Nashville Predators | 0.328599 | 28 | 11.2298 | 10.0277 | 115 | 136 | -21 |
| Richard Panik | Right Wing | Tampa Bay Lightning | 0.328197 | 23 | 11.0786 | 9.849 | 119 | 122 | -3 |
| Antoine Roussel | Left Wing | Dallas Stars | 0.326697 | 37 | 12.3511 | 11.6655 | 133 | 156 | -23 |
| Cory Sarich | Defenseman | Calgary Flames | 0.32366 | 27 | 13.2168 | 13.0873 | 169 | 174 | -5 |
| Sven Baertschi | Left Wing | Calgary Flames | 0.321896 | 19 | 10.3705 | 9.2466 | 113 | 123 | -10 |
| Shawn Thornton | Left Wing | Boston Bruins | 0.317794 | 40 | 12.0868 | 11.8861 | 152 | 170 | -18 |
| Nick Bonino | Center | Anaheim Ducks | 0.312019 | 23 | 10.492 | 9.9692 | 119 | 130 | -11 |
| Joffrey Lupul | Left Wing | Toronto Maple Leafs | 0.30614 | 16 | 10.0747 | 9.7094 | 111 | 130 | -19 |
| Peter Holland | Center | Anaheim Ducks | 0.302854 | 18 | 7.9056 | 6.796 | 83 | 90 | -7 |
| Beau Bennett | Right Wing | Pittsburgh Penguins | 0.301209 | 22 | 9.1646 | 8.6582 | 100 | 112 | -12 |
| Jason Zucker | Left Wing | Minnesota Wild | 0.300736 | 18 | 7.5318 | 6.3695 | 88 | 95 | -7 |
| David Steckel | Center | Anaheim Ducks | 0.293259 | 20 | 6.346 | 5.0307 | 84 | 85 | -1 |
| Mike Rupp | Left Wing | Minnesota Wild | 0.273492 | 28 | 7.3535 | 7.3038 | 90 | 104 | -14 |
| Jeff Halpern | Center | Montréal Canadiens | 0.260309 | 16 | 5.7545 | 5.3446 | 65 | 69 | -4 |
| Jim Slater | Center | Winnipeg Jets | 0.224588 | 16 | 4.233 | 4.1011 | 49 | 53 | -4 |
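A list like this one is easy to rebuild from the full results. Below is a rough sketch of the filter: keep players with more than 15 games and a negative shot differential, then sort by LAEGAP. The `PlayerRow` type and the function name are made up for illustration; the four sample rows are copied from the tables in this article.

```python
from typing import NamedTuple

class PlayerRow(NamedTuple):
    name: str
    games: int
    laegap: float
    shots_for: int
    shots_against: int

def discredited_by_shot_diff(rows, min_games=15):
    """Players with more than min_games games who were outshot, best LAEGAP first."""
    keep = [
        row for row in rows
        if row.games > min_games and row.shots_for - row.shots_against < 0
    ]
    return sorted(keep, key=lambda row: row.laegap, reverse=True)

rows = [
    PlayerRow("Dan Boyle", 41, 0.493526, 437, 312),      # positive differential, dropped
    PlayerRow("David Legwand", 41, 0.395207, 300, 304),
    PlayerRow("Devin Setoguchi", 43, 0.399679, 255, 280),
    PlayerRow("Jim Slater", 16, 0.224588, 49, 53),
]
names = [row.name for row in discredited_by_shot_diff(rows)]
```

With these four rows, `names` comes out as Setoguchi, Legwand, Slater, matching their order in the list above.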

Predictability

Statistics are useful because they allow us to make educated predictions of future outcomes. To verify that LAEGAP is indeed useful, we must confirm that players have a high probability of repeating similar numbers season after season. I calculated the correlation between the 2011-12 and 2012-13 LAEGAP of players who played the same proportion of games in both seasons (to account for long-term injuries in either season) and found a Pearson correlation coefficient of 0.72, where 1 is perfect correlation and 0 is no correlation at all. A coefficient of 0.72 indicates a moderate to strong correlation, meaning that LAEGAP is a repeatable statistic.
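If you want to run the same kind of repeatability check on your own numbers, the Pearson coefficient takes only a few lines. The LAEGAP values below are invented purely to show the call; they are not the 2011-12 and 2012-13 data, so the result will not be 0.72.

```python
import math

def pearson(xs, ys):
    """Pearson correlation coefficient between two equally long sequences."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two sequences of the same length, at least 2 long")
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    covariance = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    spread_x = math.sqrt(sum((x - mean_x) ** 2 for x in xs))
    spread_y = math.sqrt(sum((y - mean_y) ** 2 for y in ys))
    return covariance / (spread_x * spread_y)

# Hypothetical LAEGAP values for the same four players in two seasons
season_one = [0.40, 0.45, 0.35, 0.48]
season_two = [0.42, 0.44, 0.37, 0.45]
r = pearson(season_one, season_two)
```

A value of `r` near 1 means players tend to keep their ranking from one season to the next, and a value near 0 means last season tells you nothing about this one.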

Location adjusted shot data should make for at least a couple more interesting studies and spawn a couple more stats that mirror their Corsi counterparts, such as relative LAEGAP and LAEGAP quality of competition. I am excited about what I will find and I hope you are too. Interestingly, it took me exactly 1111 lines of code to mine and calculate the data for this project. I hope you enjoyed the article :)