I.

I’m worried this blog is a little too heavy on the abstruse math, and this post isn’t going to help that. I know the picture-to-word exchange rate is pretty good, so let’s start with a picture.

On second thought, that doesn’t really draw you in, does it, reader?

That’s better.

II.

That first picture, though, is also important. It’s a complete list of seasons in which a player played at least 600 minutes and put up a block percentage of at least 6% and a defensive rebound percentage of at least 28%. It contains the names of four of the greatest defensive players of all time, and also that of Miami reclamation project Hassan Whiteside. These milestones aren’t exactly cherry-picked, either. In the just-concluded season, Whiteside played 1100 minutes, during which he blocked 9.2% of opponent shots and grabbed a staggering 34.7% of defensive rebound opportunities. More sophisticated bottom-up defensive metrics agree that he’s pretty special on defense.

Whiteside wasn’t exactly a slouch offensively, either, with a true shooting percentage of 61.9% and a usage rate of 21%. Only 14 other players have matched both marks in the last five years, including James Harden, Kevin Durant, LeBron James, and Stephen Curry.

In short, Whiteside put up some incredible defensive numbers and some very good offensive numbers. But if that’s true… why was his Real Plus-Minus (RPM) just 0.76, good for 113th in the NBA? Why was his Box Plus-Minus (BPM) below average?

Well, take a look at the formula for BPM.

Raw BPM = a*ReMPG + b*ORB% + c*DRB% + d*STL% + e*BLK% + f*AST% - g*USG%*TO% + h*USG%*(1-TO%)*[2*(TS% - TmTS%) + i*AST% + j*(3PAr - Lg3PAr) - k] + l*sqrt(AST%*TRB%)

Rebounding is taken into account on its own, but also in an interaction term with assist percentage. And usage interacts positively with assist percentage and a spacing variable as well as with efficiency. In fact, the coefficient for defensive rebounding is negative; increased defensive rebounding only tends to improve BPM because it’s associated with increased offensive rebounding and with the interaction variable.
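For anyone who would rather poke at the formula than read it, here is a minimal Python sketch that transcribes it term by term. The names `BoxLine` and `raw_bpm` are made up for illustration, the coefficients a through l are passed in by the caller because their fitted values aren't listed in this post, and the units of each input (percent versus fraction) are left to whoever supplies the numbers.

```python
from dataclasses import dataclass
from math import sqrt


@dataclass
class BoxLine:
    """One player-season of box-score inputs (hypothetical container)."""
    re_mpg: float   # ReMPG
    orb: float      # ORB%
    drb: float      # DRB%
    trb: float      # TRB%
    stl: float      # STL%
    blk: float      # BLK%
    ast: float      # AST%
    usg: float      # USG%
    to: float       # TO%
    ts: float       # TS%
    tm_ts: float    # TmTS%
    par3: float     # 3PAr
    lg_par3: float  # Lg3PAr


def raw_bpm(s: BoxLine, co: dict) -> float:
    """Raw BPM as written above; `co` maps 'a'..'l' to coefficients."""
    # Bracketed term: efficiency, assists and spacing, all scaled by usage.
    scoring = (2 * (s.ts - s.tm_ts)
               + co["i"] * s.ast
               + co["j"] * (s.par3 - s.lg_par3)
               - co["k"])
    return (co["a"] * s.re_mpg
            + co["b"] * s.orb
            + co["c"] * s.drb
            + co["d"] * s.stl
            + co["e"] * s.blk
            + co["f"] * s.ast
            - co["g"] * s.usg * s.to
            + co["h"] * s.usg * (1 - s.to) * scoring
            + co["l"] * sqrt(s.ast * s.trb))  # assist-rebound interaction
```

With real coefficients plugged in, swapping one field of a `BoxLine` (say, `drb` or `ast`) and calling `raw_bpm` again is all it takes to run the kind of what-if experiments that follow.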

If you want, you can take the partial derivative of BPM with respect to defensive rebounding percentage. (Assume that TRB is the mean of ORB and DRB; this is pretty close to true in Whiteside’s case.) It turns out that box plus-minus increases with increasing DRB% only as long as total rebounding percentage is less than 5.76 times the assist percentage; past that point, the negative DRB% coefficient outweighs the boost from the interaction term. When the assist percentage is in the usual range, about 4% to 50%, this is not a big problem. Hassan Whiteside’s assist percentage is not in the usual range. It is, at 1%, historically low. This analysis suggests that replacing Whiteside’s rebounding percentages with those of Isaiah Thomas would improve his box plus-minus score. So I whipped up a box plus-minus calculator, and, well:
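(For the derivative itself, here is a sketch under the same assumption that TRB is the mean of ORB and DRB. Only two terms of the formula involve DRB%: the linear term with coefficient c and the square-root interaction with coefficient l. Percent signs are dropped from the symbols, and the signs c < 0 and l > 0 are taken from the description of the formula above. The multiplier is left symbolic because the fitted coefficients aren't listed in this post; its exact value depends on them and on how TRB is tied to DRB.)

```latex
\begin{align*}
\mathrm{TRB} &= \tfrac{1}{2}\,(\mathrm{ORB} + \mathrm{DRB}) \\[4pt]
\frac{\partial\,\mathrm{BPM}}{\partial\,\mathrm{DRB}}
  &= c + l\,\frac{\partial}{\partial\,\mathrm{DRB}}
     \sqrt{\mathrm{AST}\cdot\mathrm{TRB}}
   = c + \frac{l}{4}\sqrt{\frac{\mathrm{AST}}{\mathrm{TRB}}} \\[4pt]
\frac{\partial\,\mathrm{BPM}}{\partial\,\mathrm{DRB}} > 0
  &\iff \mathrm{TRB} < \frac{l^{2}}{16\,c^{2}}\,\mathrm{AST}
  \qquad (c < 0,\ l > 0)
\end{align*}
```

The shape of the result is the point: for a fixed rebounding rate, a small enough assist percentage always pushes the derivative negative, so extra defensive rebounds start to cost BPM.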

A similar analysis suggests that a lower usage might improve Whiteside’s BPM…

What if we made Hassan shoot Kyle Korver’s league-leading True Shooting percentage at Russell Westbrook’s world-destroying usage rate?

That successfully turns him into… a league-average player. Can we get him better than league average? Let’s raise the assist rate. Andre Drummond had a 3.9% assist percentage, among the lowest for big-minutes players. Hassan with that percentage?

A 3-point improvement! (In fact, it only takes an assist percentage of 1.6%, Dewayne Dedmon’s second-worst-in-the-league mark, to hit league average.) If we give Whiteside the median assist percentage among centers, about 6.5%, we suddenly have a top-25 player by BPM. If the assist percentage improves to 10%, we get a borderline MVP candidate.

III.

Of course, you shouldn’t think that the above fiddling with numbers reflects reality. If Whiteside could get substantially better just by dishing an extra assist every other game (and that’s more than it would take to improve his percentage to match Drummond’s), the Heat would have had him working on passing from the get-go. What it reflects, instead, is the limitations of models.

Every single player in the training data used to create BPM would have had an assist percentage higher than Hassan Whiteside’s last season. Something north of 95% of them would have had an assist percentage greater than 4%. The model just wasn’t fit to players like Whiteside. He’s a black swan.

And really, he’s not the only black swan out there. We know that Isaiah Thomas’ defense is necessarily limited by his size — but we have almost no data on how much. We know that Kyle Korver’s shooting glues defenses to him and opens up the floor for his teammates — but how can we predict the value of that when we’ve never seen anyone hit threes like Korver? (Ditto for Stephen Curry.)

This isn’t really an overfitting problem — the regressors were chosen to minimize overfitting, and they were chosen pretty well! Nor is it a problem that is easily solved by more theory; everything about the BPM formula has a solid theoretical grounding, and a well-explained one. I’m not bashing BPM for behaving strangely; I think BPM is an excellent metric and I only wish I could do better. The problem is this straightforward: Accurate predictions on way-out-of-sample data are hard.

This sort of ties into my skepticism of the Rockets’ threes-and-layups approach. We’ve never seen anyone take as many three-pointers, or as few midrange shots, as Houston does, and the relationships in the data we have may break down past a point.

But there’ll be time later, in the offseason, to debate the finer points of statistical theory and the merits of Moreyball. The main thing I want you to take away from this post is this: Hassan Whiteside, in the 2014-15 season, was so good he broke Box Plus-Minus.