The high hills and rolling Appalachian Mountains west of Harrisburg are beautiful, especially when driving up and over and through and around them. If you live in the Philadelphia area and either attended Penn State University or went to Beaver Stadium on a sun-splashed (but frigid) Saturday afternoon, then you know what I’m talking about. It’s hard not to appreciate the scenic views and grandness of it all. But you have to be careful. Just east of State College the road goes straight up and down, the last obstacle before entering Happy Valley. You are warned, though. "Steep Grade", the sign says (or something like that). After I reached the peak, my foot never left the brake for fear of losing control. These damn grades, I thought to myself, they’re dangerous!

Aren’t all grades dangerous? The most famously dangerous might be IQ. In The Mismeasure of Man, author Stephen Jay Gould fiercely (and rightly) criticized the use of numbers in grading a person’s intelligence. Ultimately, those grades are inherently subjective, created by men with a narrow definition of the brain’s capacity. But we like grades because we think they help us understand things and assign value in the absence of context, especially in the world of sports. NFL draft grades are a great example of this. We immediately need to know how the Eagles drafted: A, B, or C+. But these grades lack true context (how can we grade players who have yet to play an NFL game?).

Pro Football Focus (PFF) attempts to address this. You’ve seen here on BGN how PFF uses its grades to rate weekly performances, project depth charts and rank punters. Yet the process is largely a mystery. If you browse their website, you will leave with a general idea of how the grades are assigned, but you won’t know exactly what their definition of context is. You won’t know exactly how it works. But maybe we don’t need to know the "how"; maybe we just need to know the "how well".

Since I don’t have a PFF subscription, I signed up for a free membership that included access to 2008’s season statistics. I performed a linear regression to see how well a team’s sum of player grades (a team’s total grade) tracked its number of wins, then compared that correlation to those of other well-known or well-regarded statistics: yards, points, Football Outsiders’ Defense-adjusted Value Over Average (DVOA), and Pythagorean expectation. Here is how all five correlate with number of wins in 2008:

2008 Correlation to Wins

Metric           Correlation
DVOA (FO)        0.824
Pyth Exp         0.824
Points Margin    0.621
PFF Grades       0.603
Yards Margin     0.555

It’s no surprise to see DVOA and Pythagorean expectation at the top of this list. There has already been a ton of research on Pythagorean expectation, research that influenced much of DVOA’s development. However, there is a sizable drop when looking at points. As it turns out, a good point differential (a team scoring more points over a season than its defense allows) does not always translate to a successful season. Nor does gaining more yards than the defense allows. PFF grades fall between those two margins. Not a bad result, but for a statistic to be considered valid, it should correlate to wins at least as well as point differential does.
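
For anyone who wants to run this kind of comparison at home, here is a minimal Python sketch of the two calculations involved: a Pearson correlation between a team metric and wins, and a Pythagorean expectation. It is an illustration, not the exact procedure behind the numbers above. The function names are made up for this example, the 2.37 exponent is a commonly cited NFL value that is assumed here, and the sample data is invented rather than taken from the 2008 season.

```python
import math


def pearson_r(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two equal-length sequences with at least 2 values")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined when a sequence is constant")
    return cov / math.sqrt(var_x * var_y)


def pythagorean_expectation(points_for, points_against, exponent=2.37):
    """Expected winning percentage from points scored and points allowed.

    ASSUMPTION: 2.37 is a commonly cited NFL exponent, not necessarily
    the one behind the figures in this article.
    """
    pf = points_for ** exponent
    pa = points_against ** exponent
    return pf / (pf + pa)


# ILLUSTRATIVE, MADE-UP numbers for six hypothetical teams (not 2008 data).
team_total_grade = [120.5, 45.0, -30.2, 80.1, -95.4, 10.0]
wins = [12, 9, 6, 11, 4, 8]

print(f"grade vs. wins: r = {pearson_r(team_total_grade, wins):.3f}")
print(f"expected win pct at 400 PF, 300 PA: {pythagorean_expectation(400, 300):.3f}")
```

With real data, each list would hold one value per team (32 for an NFL season), and the same `pearson_r` call would be repeated for each metric to build a table like the one above.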

According to PFF, their subjective grading allows them "to bring some intelligence to the raw numbers" (Stephen Jay Gould may have chuckled at this). If this is the case, then perhaps the correlations improve when the subjective grades are used in conjunction with the other metrics rather than on their own.

2008 Correlation to Wins w/PFF Grades

Metric           Correlation
DVOA (FO)        0.828
Pyth Exp         0.825
Points Margin    0.695
Yards Margin     0.677

They actually do improve, most noticeably with points (+.074) and yards margins (+.121), while DVOA and Pythagorean expectation barely move. So PFF grades can add value to existing metrics when trying to project a team’s number of wins.
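
One common way to use two metrics "in conjunction" is a linear regression with two predictors, summarized by the multiple correlation R. The Python sketch below assumes that approach, which may differ from the calculation behind the table above. It uses the standard closed form for two predictors, and the names and sample numbers are invented for illustration.

```python
import math


def pearson_r(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / math.sqrt(var_x * var_y)


def multiple_r(x1, x2, y):
    """Multiple correlation R of y on two predictors x1 and x2.

    Closed form for two-predictor least squares:
    R^2 = (r_y1^2 + r_y2^2 - 2*r_y1*r_y2*r_12) / (1 - r_12^2)
    """
    r_y1 = pearson_r(x1, y)
    r_y2 = pearson_r(x2, y)
    r_12 = pearson_r(x1, x2)
    denom = 1 - r_12 ** 2
    if denom <= 1e-12:
        raise ValueError("predictors are perfectly collinear")
    r_squared = (r_y1 ** 2 + r_y2 ** 2 - 2 * r_y1 * r_y2 * r_12) / denom
    # Clamp to [0, 1] to absorb floating-point rounding before the root.
    return math.sqrt(min(1.0, max(0.0, r_squared)))


# ILLUSTRATIVE, MADE-UP numbers for six hypothetical teams (not 2008 data).
points_margin = [150, 60, -40, 95, -120, 5]
pff_grade = [80.0, 95.5, -10.2, 30.1, -60.4, 45.0]
wins = [12, 9, 6, 11, 4, 8]

alone = abs(pearson_r(points_margin, wins))
combined = multiple_r(points_margin, pff_grade, wins)
print(f"points margin alone: {alone:.3f}")
print(f"points margin + PFF grade: {combined:.3f}")
```

One caution when reading results like these: in a regression of this kind, adding a second predictor can never lower the in-sample R, so a tiny bump is not by itself evidence of added value. The larger gains are the ones worth paying attention to.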

So what is the takeaway here? PFF grades are not great when viewed by themselves, but can add value when combined with other statistics. I see a lot of potential in the grades assigned by Pro Football Focus, but I think, for now, it’s a potential not fully realized. Here’s why (and here’s why an analysis of 2008 data may be as applicable as an analysis of 2013 data): PFF normalizes its raw data (you can read more about it here). But it uses the same normalization factors that it has always used. If you’ve read Nate Silver’s The Signal and the Noise, you know how important it is to review, test, and adjust your methods. And repeat. And then do it again. It’s not clear to what extent PFF is doing this. So just like navigating the steep grades on Route 322, exercise a fair degree of caution when considering their grades. For now.

Next… I use logistic regression to test how well PFF grades project single-game outcomes. Stay tuned.