On Wednesday I got into a discussion on Twitter, prompted by a tweet from Joe Keegan, about the way the guys over at Lacrosse Reference divide up their efficiency stats. If you’re not familiar, they split their efficiency stats based on the way possessions begin — defensive stops, unsettled ground balls and face-offs.

While those different situations certainly create some differences, I am highly skeptical that those differences are enough to overcome all of the noise in the data. For example, a significant percentage of possessions that start in each of those situations ends up as a settled 6v6 possession, with both teams having subbed their offensive and defensive personnel onto the field. Likewise, fast break or transition situations, with D-mids still on the field, can arise from all three.

Further, about half of each team’s personnel is the same regardless of situation, as their attack, close defense and goalie are always going to be on the field. Players at those positions who are good tend to be good regardless of how a possession begins, since skills like shooting, passing and making saves apply in all of them. Plus, there is an inherent correlation problem across all areas of college lacrosse: the top teams tend to be better at everything than bad teams, even at completely unrelated aspects of the game, because they recruit better and have better coaching.

There is also a sample size problem: dividing possessions into three different categories leaves less data in each category, which increases the impact of statistical noise and randomness.
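As a rough illustration of that effect (a toy simulation of my own, not actual lacrosse data), here is what happens to the spread of observed efficiency when every team truly scores on 30% of its possessions but we only observe a third of a season’s possessions in each category. The possession counts are just rough guesses:

# Toy simulation: every team truly scores on 30% of its possessions
set.seed(1)
n_teams     <- 70
full_season <- 300   # rough guess at possessions per team in a season
one_third   <- 100   # possessions per team in one of three start-type categories

eff_full  <- rbinom(n_teams, full_season, 0.30) / full_season
eff_split <- rbinom(n_teams, one_third, 0.30) / one_third

sd(eff_full)   # spread in observed efficiency with all possessions pooled
sd(eff_split)  # noticeably wider spread when using only a third of the possessions

Even with identical underlying skill, the smaller per-category samples produce visibly noisier efficiency numbers.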

One way to test the significance of a stat is whether it is predictive. For example, Trevor Baptiste won a very high percentage of his face-offs in 2016 and then did so again in 2017. That suggests he tends to win face-offs because he is highly skilled at doing so, not because of luck or statistical randomness. I used that type of analysis to show that second assists in the NLL are likely a meaningless stat. Similar analysis can be done within a single season by, for example, using the first 75% of games to predict the final 25%.

It’s early in the 2018 season, so the sample size of games isn’t great, but thanks to Archive.org I was able to grab the Lacrosse Reference stats from 2017 to compare to the ones they have so far in 2018. While teams lose players to graduation, add new recruits and sometimes change coaches, there is significant overlap from one season to the next, and we should expect to see some year-to-year correlation in team style, strategy, skill, etc.
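For anyone who wants to follow along, each of the comparisons below boils down to a regression like the following. This is only a sketch of my setup: the file names are made up, and it assumes each season’s table has one row per team with matching team names.

# Sketch: load the team-level efficiency tables for each season and line the
# teams up in the same order before regressing 2018 on 2017
laxref17 <- read.csv("laxref_2017.csv")   # hypothetical file name
laxref18 <- read.csv("laxref_2018.csv")   # hypothetical file name
laxref17 <- laxref17[order(laxref17$team), ]
laxref18 <- laxref18[order(laxref18$team), ]

summary(lm(laxref18$defficiency ~ laxref17$defficiency))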

Here is the output from that linear regression in R for efficiency after defensive stops, regressing 2018 on 2017 (defficiency = efficiency after defensive stops):

lm(formula = laxref18$defficiency ~ laxref17$defficiency)

Residuals:
      Min        1Q    Median        3Q       Max
-0.194420 -0.068628  0.000364  0.054477  0.201977

Coefficients:
                     Estimate Std. Error t value Pr(>|t|)
(Intercept)           0.16157    0.05414   2.984  0.00394 **
laxref17$defficiency  0.36023    0.20020   1.799  0.07640 .
---
Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

Residual standard error: 0.08411 on 68 degrees of freedom
Multiple R-squared:  0.04545,  Adjusted R-squared:  0.03141
F-statistic: 3.238 on 1 and 68 DF,  p-value: 0.0764

As you can see, not much of a correlation there. The R-squared values and p-value get slightly better for efficiency after unsettled GBs:

lm(formula = laxref18$uefficiency ~ laxref17$uefficiency)

Residuals:
      Min        1Q    Median        3Q       Max
-0.188704 -0.038096 -0.005475  0.043338  0.199465

Coefficients:
                     Estimate Std. Error t value Pr(>|t|)
(Intercept)           0.16396    0.04958   3.307  0.00151 **
laxref17$uefficiency  0.40241    0.19115   2.105  0.03897 *
---
Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

Residual standard error: 0.07587 on 68 degrees of freedom
Multiple R-squared:  0.06119,  Adjusted R-squared:  0.04738
F-statistic: 4.432 on 1 and 68 DF,  p-value: 0.03897

and slightly better yet for offensive efficiency after a face-off win:

lm(formula = laxref18$fefficiency ~ laxref17$fefficiency)

Residuals:
      Min        1Q    Median        3Q       Max
-0.217714 -0.037518  0.000689  0.056299  0.150213

Coefficients:
                     Estimate Std. Error t value Pr(>|t|)
(Intercept)           0.19788    0.04845   4.084 0.000119 ***
laxref17$fefficiency  0.35854    0.16080   2.230 0.029068 *
---
Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

Residual standard error: 0.0788 on 68 degrees of freedom
Multiple R-squared:  0.06813,  Adjusted R-squared:  0.05443
F-statistic: 4.972 on 1 and 68 DF,  p-value: 0.02907

But all of them are still quite low, sitting on the borderline of, or just under, what would typically be considered a statistically significant correlation.

Compare that to what I get for raw offensive efficiency from my data:

lm(formula = adjeff18$rawoffeff ~ adjeff17$rawoffeff)

Residuals:
      Min        1Q    Median        3Q       Max
-0.121665 -0.039492  0.008049  0.027527  0.151695

Coefficients:
                   Estimate Std. Error t value Pr(>|t|)
(Intercept)         0.08205    0.04220   1.944    0.056 .
adjeff17$rawoffeff  0.75279    0.14562   5.169 2.24e-06 ***
---
Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

Residual standard error: 0.0536 on 68 degrees of freedom
Multiple R-squared:  0.2821,  Adjusted R-squared:  0.2716
F-statistic: 26.72 on 1 and 68 DF,  p-value: 2.24e-06

Or even compare it to another statistic that only looks at a certain portion of an offense, such as the percentage of possessions that end in a turnover (data is once again from me and not Lacrosse Reference):

lm(formula = teams18$otoper ~ teams17$otoper)

Residuals:
      Min        1Q    Median        3Q       Max
-0.136451 -0.030723 -0.002955  0.031142  0.116631

Coefficients:
               Estimate Std. Error t value Pr(>|t|)
(Intercept)     0.12366    0.04773   2.591   0.0117 *
teams17$otoper  0.67108    0.12423   5.402 9.11e-07 ***
---
Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

Residual standard error: 0.05037 on 68 degrees of freedom
Multiple R-squared:  0.3003,  Adjusted R-squared:  0.29
F-statistic: 29.18 on 1 and 68 DF,  p-value: 9.11e-07

Shooting efficiency doesn’t correlate quite as strongly, but the 2017 numbers still correlate to the 2018 numbers with an R-squared of 0.1705 and a p-value of 0.0003809. The season-to-season correlations for turnover percentage and shooting efficiency were also stronger over the course of a full season, from 2016 to 2017.
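The output for that shooting efficiency regression isn’t reproduced here, but assuming the column is named oshoteff, as it is in the regression below, the call is just:

summary(lm(teams18$oshoteff ~ teams17$oshoteff))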

However, as I wrote at the beginning, we should expect some correlation between somewhat unrelated sets of variables in college lacrosse simply because good teams tend to be better at everything than bad teams, and even more so when both stats measure being good on the same side of the ball. Just to pick one such pairing, let’s look at the correlation between shooting efficiency in 2017 and turnover percentage in 2018:

lm(formula = teams18$otoper ~ teams17$oshoteff)

Residuals:
      Min        1Q    Median        3Q       Max
-0.135495 -0.041434 -0.002408  0.037292  0.108023

Coefficients:
                 Estimate Std. Error t value Pr(>|t|)
(Intercept)       0.58093    0.06143   9.457 5.09e-14 ***
teams17$oshoteff -0.27927    0.08463  -3.300  0.00154 **
---
Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

Residual standard error: 0.0559 on 68 degrees of freedom
Multiple R-squared:  0.138,  Adjusted R-squared:  0.1254
F-statistic: 10.89 on 1 and 68 DF,  p-value: 0.001542

The relationship is negative in this case because teams with a lower turnover percentage tend to shoot the ball more efficiently, but the strength of the correlation falls somewhere in between that of offensive efficiency split by how the possession starts and that of stats like total offensive efficiency, turnover percentage and shooting efficiency.

So total offensive efficiency is predictive from one season to the next, and components that we know rely on similar skills, such as not turning the ball over and shooting efficiently, also correlate across seasons, yet offensive efficiency on possessions that start in a similar manner doesn’t really correlate from one season to the next.

To me that indicates the differences are largely just random noise and not based on anything meaningful.
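One way to see that contrast at a glance is to save each of the regressions above to a fit object and pull the R-squared values out side by side. The fit object names here are hypothetical:

# Assumes each regression above was saved to an object first, e.g.
# fit_deff <- lm(laxref18$defficiency ~ laxref17$defficiency), and so on
fits <- list(after_stop      = fit_deff,
             after_unsettled = fit_ueff,
             after_faceoff   = fit_feff,
             raw_off_eff     = fit_rawoff,
             turnover_pct    = fit_otoper)
sapply(fits, function(f) summary(f)$r.squared)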

There are other ways of testing the correlation, such as comparing data from the first half of the season against the second half.
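A sketch of what that might look like, assuming a game-level table with one row per team-game and hypothetical column names (team, date, efficiency):

# Hypothetical game-level data: one row per team-game
games <- read.csv("games_2018.csv")   # hypothetical file name
games <- games[order(games$date), ]
half  <- nrow(games) %/% 2

first  <- aggregate(efficiency ~ team, data = games[1:half, ], FUN = mean)
second <- aggregate(efficiency ~ team, data = games[(half + 1):nrow(games), ], FUN = mean)

both <- merge(first, second, by = "team", suffixes = c("_1st", "_2nd"))
summary(lm(efficiency_2nd ~ efficiency_1st, data = both))

Averaging per-game efficiency rather than pooling possessions is a simplification, but it gets the idea across.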

It’s also worth pointing out that this method of analysis demonstrates one reason why opponent-adjusted efficiency numbers are so much better than raw, unadjusted ones for a sport like college lacrosse, where the difference between the best and worst teams can be incredibly large. Take a look at the correlation from 2017 to 2018 so far in adjusted offensive efficiency:

lm(formula = adjeff18$adjoffeff ~ adjeff17$adjoffeff)

Residuals:
      Min        1Q    Median        3Q       Max
-0.089267 -0.021448  0.000572  0.024113  0.099695

Coefficients:
                   Estimate Std. Error t value Pr(>|t|)
(Intercept)         0.06737    0.02399   2.809  0.00649 **
adjeff17$adjoffeff  0.79351    0.08091   9.807  1.2e-14 ***
---
Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

Residual standard error: 0.03646 on 68 degrees of freedom
Multiple R-squared:  0.5858,  Adjusted R-squared:  0.5797
F-statistic: 96.19 on 1 and 68 DF,  p-value: 1.202e-14

That correlation is significantly stronger than for the unadjusted numbers, an indication that the adjusted efficiency numbers are much more predictive of a team’s performance than the raw ones.

Maybe with some kind of opponent adjustment, the efficiency numbers for the different ways possessions can start will become more predictive. Or perhaps there is a way to make them more predictive when using full-season numbers to predict individual games against a future opponent.
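As a generic sketch of what an opponent adjustment could look like for one of those possession types (this is not the method behind my adjusted numbers, and the file and column names are hypothetical), you could regress each game’s efficiency on dummy variables for the offense and the defense involved, so a team’s offensive rating is estimated net of the defenses it faced:

# One row per team-game per possession-start type: the offense, the defense
# and the offensive efficiency on those possessions (hypothetical columns)
poss <- read.csv("possessions_by_start_2018.csv")
fo   <- subset(poss, start_type == "faceoff")
fo$offense <- factor(fo$offense)
fo$defense <- factor(fo$defense)

# Efficiency modeled as an offense effect plus a defense effect; the offense
# coefficients serve as opponent-adjusted ratings for face-off possessions
fit <- lm(efficiency ~ offense + defense, data = fo)
coef(fit)[grep("^offense", names(coef(fit)))]

Weighting each row by the number of possessions it represents (the weights argument to lm) would be a natural refinement, since a one-possession game segment shouldn’t count as much as a ten-possession one.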

But if it can’t be established that the stats are predictive of how teams will play in the future, it seems unlikely they are a useful measure of the differences in teams’ skill or strategy.