Over the weekend, I had the fortune of attending the Sloan Sports Analytics Conference and participating on the baseball panel with Mike Petriello, Harry Pavlidis, Patrick Young, and Brian Kenny, which was a lot of fun. While the baseball panel was my only actual obligation at the conference, Petriello was doing double duty, having just presented the latest update to Statcast alongside Greg Cain, one of the lead engineers at MLBAM, and introduced two new public metrics for 2017: Catch Probability and Hit Probability. These are the kinds of numbers people have been hoping for, and they represent one of the first steps in moving from collecting interesting single data points to providing more valuable calculations based on the combinations of factors the system is measuring.

To help promote the new metrics, Jeff Passan wrote a piece on Statcast over at Yahoo, focusing mostly on what Statcast could do in the future.

Sometime soon, there is going to be a new version of Wins Above Replacement available, and its goal, aside from encapsulating a player’s value into one tidy number, is simple: Don’t be scary. The plan does not involve dumbing down the metric that serves as the flashpoint between those who yearn for a catch-all and those who lament it. On the contrary, as with almost everything it does, Major League Baseball Advanced Media wants to make it so smart people can’t help but like it. … That’s part of the excitement: Defensive WAR has been more guesswork than exact science. Statcast exists for exactitude. Even better, Statcast takes only 10 to 12 seconds to give a play’s precise details, meaning before the next pitch anyone who cares to will be able to contextualize just how good – or at least rare – a catch really was. BAM’s data warehouse then can be queried to provide context, and highlight clips of similar or better catches can be compared and contrasted on demand.

This is, undoubtedly, an exciting future, and the idea of a Statcast-based WAR system is very intriguing. The current versions of WAR still struggle with the difficulty in separating run prevention credit (and thus value) between the pitcher and the fielder. Statcast’s tools seem likely to bridge that gap, and with hit probability and catch probability — though it should be noted, the latter is outfield only right now, as infield calculations are more complicated — we are now closer than ever to being able to build metrics that directly measure the quality of contact a pitcher allowed, and adjust both the pitcher and the fielder’s contributions to the play made (or not made) based on that important variable.

So, yeah, Statcast is going to improve WAR calculations in a significant way, and should allow us to move past the FIP/ERA divide in the not-too-distant future. But perhaps more interesting is Passan’s mention that the guys at MLBAM are dreaming of their own WAR metric, and what that might look like down the line. The potential of a Statcast-based WAR model brings up a fascinating question: how granular should WAR get?

As Tom Tango has said on a number of occasions, WAR is a framework, giving the basic building blocks of adding hitting, baserunning, pitching, and fielding together in a systematic way. But the actual values that go into those components can be determined in a number of different ways, depending on the type of calculation that is being attempted, and more importantly, the question being asked.

Different numbers answer different questions, and have varying uses for determining what happened in the past or what might happen in the future. Oftentimes, metrics are grouped into “descriptive” and “predictive” buckets, depending on whether they are trying to account for what did happen or what we may expect going forward. WAR, generally, is a descriptive metric; it is trying to measure the value a player produced in a given season, not tell you what his value will be next season.

Which makes the idea of a Statcast-based WAR model pretty interesting, because a lot of the presumed value of Statcast’s data is to allow us to say “okay, that result happened, but based on the more granular data, we’d have expected this other thing to happen.” Hit probability, for instance, is going to let us say that a particular batted ball might have been caught by a diving outfielder’s spectacular effort, but that 85% of the time, that ball lands, and the hitter got robbed by something out of his control. Right now, every publicly available version of WAR simply records the play as a negative-value event for the hitter, despite the fact that he just got screwed by a great defensive effort.

Certainly, knowing that the ball normally lands for a hit is valuable information in evaluating that player’s performance. But is there any value produced for a team in hitting a ball that probably should have, but didn’t, land for a hit? Do we want to credit a hitter only for what was under his control, or do we want to credit him with what actually happened on the field during his at-bat?
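To make the two crediting philosophies concrete, here is a minimal sketch. The run values are illustrative linear-weight stand-ins, not actual Statcast or FanGraphs figures, and the 85% hit probability is the diving-catch example from above:

```python
# A sketch of outcome-based vs. expectation-based credit for a batted
# ball. Run values here are illustrative, not real Statcast figures.

HIT_RUN_VALUE = 0.47   # rough average run value of a single
OUT_RUN_VALUE = -0.27  # rough average run value of a batted-ball out

def outcome_credit(was_hit):
    """Credit the hitter for what actually happened on the field."""
    return HIT_RUN_VALUE if was_hit else OUT_RUN_VALUE

def expected_credit(hit_probability):
    """Credit the hitter for what usually happens to contact like that."""
    return (hit_probability * HIT_RUN_VALUE
            + (1 - hit_probability) * OUT_RUN_VALUE)

# The robbed hitter: an 85% hit that a diving outfielder took away.
print(outcome_credit(False))            # -0.27: docked for the out
print(round(expected_credit(0.85), 2))  # roughly 0.36: most of a single
```

Under the second function, the hitter keeps most of the value of the hit he "should" have gotten, and the great catch costs him almost nothing.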

Or, let’s think about it from the opposite perspective. During his presentation on Saturday, Petriello showed this Andre Ethier home run from last year’s NLCS.

According to the new Statcast Hit Probability calculation, a ball hit at that exit velocity and launch angle is an out 95% of the time. Because of a favorable wind and the fact that the ball was hit in one of the few stadiums where a 353-foot fly ball to left center would clear the fence, Ethier actually got his first home run off a left-hander in three years.

In terms of value, what do you do with that play? Ethier’s home run put a run on the board for the Dodgers, so — ignoring the fact that we don’t have postseason WAR right now — our calculation would give him the same credit for that play as if he had launched a 500-foot bomb onto Waveland Avenue. And our pitching WAR would correspondingly crush Jon Lester, who gave up the home run, even though Lester did his job and induced contact that is almost always an out.

The easy answer is to say a home run is a home run, and we don’t care about what should have happened, only what did happen, and what did happen is that Ethier rounded the bases. In general, WAR is an attempt to isolate player performance from the influence of his teammates, but it is not designed to strip luck out of the equation. And if that’s what we want WAR to do, then perhaps a Statcast-based version wouldn’t be so dramatically different from what is already out there, beyond separating pitching and defense in a more accurate way, anyway.

But it’s actually more complicated than that easy answer would suggest, because right now, there isn’t a publicly available version of WAR that really calculates “what really happened”. The versions published here, at Baseball Reference, and at Baseball Prospectus all use context-neutral run values at the event level, so while Ethier’s home run really added one run to the Dodgers’ ledger, he’d get 1.4 runs worth of credit in WAR for hitting that home run, because we don’t think it’s his fault that there weren’t any runners on base when he hit it, and in general, the average home run produces about 1.4 runs worth of value for an offense.
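A back-of-the-envelope way to see where that ~1.4 comes from, assuming an illustrative league-average baserunner count (real linear weights are derived more carefully from run expectancy changes, so treat this as intuition only):

```python
# Back-of-the-envelope view of the ~1.4-run home run. League-wide,
# roughly 0.4 runners are on base during an average plate appearance
# (illustrative figure); a home run scores all of them plus the batter,
# and context-neutral WAR hands out that average credit every time.

avg_runners_on_base = 0.4             # assumed league average
hr_run_value = 1 + avg_runners_on_base

actual_runs_ethier = 1                # solo shot: one run on the board
print(round(hr_run_value, 2))         # 1.4, credited regardless of context
```

So even the "what really happened" versions of WAR already swap the actual scoreboard impact (one run) for an average (1.4 runs).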

If we took the “measure what really happened” argument to its logical conclusion, then there’s a good argument to be made that something like RE24 — which gets run values from the base/out state, not the overall average — should be the foundation for the hitting component of WAR. And once you include base/out context, you can continue down that path to inning and score, and argue for WPA instead, since if we’re giving a hitter credit for the situations he hits in, a walk-off grand slam does more to help a team win than a solo homer down by 10.

The reality is that, with almost every component in WAR, you have to decide how much situational context you want to include, and the more context you include, the more credit you give to a player for something he had nothing to do with. And that brings us back to the Ethier home run. He didn’t really have control over the wind carrying his weak fly ball into the seats. So there is some logical consistency in saying that if we’re not including base/out/inning/score context because those things are out of the player’s control, perhaps we don’t want to measure a player’s contribution based on luck-based outcomes.

So an entirely Statcast-based WAR, one that measured value solely on the granular data and the probabilities we think are within the player’s control, could be fascinating. I don’t know how popular a model that gave Ethier negative value for hitting a home run in a playoff game would be, but it would be a really interesting departure from every other version of WAR out there. And it would be the only model that stripped luck out of the picture, giving us perhaps the best view of a player’s actual contribution to an outcome.
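For the Ethier fly ball specifically, the arithmetic of a probability-only model would look something like this, treating the 5% non-out case as a home run for simplicity and using the 1.4-run home run value along with an assumed out value; all figures are illustrative:

```python
# What a probability-only model might credit Ethier for the 353-foot
# fly ball that left the park. All run values are illustrative.

p_out = 0.95       # Statcast out probability at that EV/launch angle
hr_value = 1.4     # context-neutral run value of a home run
out_value = -0.27  # assumed average run value of an out

outcome_based = hr_value  # it was a homer, so full credit

# Weight by what usually happens to contact like that instead.
probability_based = (1 - p_out) * hr_value + p_out * out_value

print(outcome_based)           # 1.4
print(probability_based < 0)   # True: negative value, despite the homer
```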

But it would also be a radical departure from what people have said they generally want WAR to be. In responding to the question about how much value to give to a player who hits a triple and gets stranded versus a triple where the next batter drives him in, 95% of our readers said they wanted those two triples to be given the same value, and they didn’t want the run value to be based on the future sequencing after the event occurs. Given that Tom Tango, who now works at MLBAM, ran that series, I would imagine he’s going to be influenced by those ideas when helping craft MLB’s version of WAR, whenever that arrives.

This tension between what happened, how it affected the scoreboard, and how much credit to give to players for things they don’t control is a difficult thing to resolve. And now that we’re getting even more granular metrics about a player’s contribution to outcomes, these questions are going to continue to be relevant. Knowing the guys at MLBAM on a personal level, I’m pretty comfortable with the fact that they’ll handle these questions thoughtfully, and if (or when?) they do produce an MLB.com WAR model, it will be with all of these questions answered as well as they feel they can.

But while Statcast holds a lot of promise for improving the pitching and defensive sides of the components, getting ever-more granular hitting data might force us to again ask what we want WAR to be, and what the goal of the model is. There is no obvious right answer here, and that’s one of the reasons there will always be multiple ways of calculating WAR.