This is for those who were on Twitter discussing predictive statistics. This won’t be a full-fledged article, though I may do that later. This is mostly just a data dump and exposition; I may do a full write-up and series of articles some day, but today is just for this small group.

A few points to start:

I started this in 2011.

My methodology is simple. I take the two teams and select the stat I want to test. I then compare the two teams, and the team with the better value is the predicted winner. If the teams are tied, I ignore that game for the week and move on to the next.
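A minimal sketch of that comparison rule in Python; the dict layout and field names are my own assumption for illustration, not how my spreadsheet actually stores things:

```python
def predict_winner(team_a, team_b, stat, higher_is_better=True):
    """Predicted winner is whichever team has the better value for `stat`.
    Returns None on a tie, which means 'skip this game for the week'."""
    a, b = team_a[stat], team_b[stat]
    if a == b:
        return None  # tie: ignore this game and move on to the next
    a_is_better = (a > b) if higher_is_better else (a < b)
    return team_a["name"] if a_is_better else team_b["name"]

# e.g. predict_winner({"name": "NE", "anya": 7.9},
#                     {"name": "NYJ", "anya": 5.4}, "anya") -> "NE"
```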

I don’t start with week 1 because I did that with 2011 and found that almost all early-season prediction data is useless. After three seasons of tinkering I found that most predictors stabilized by week 8.

This does mean I don’t have the entire season’s population data, but I believe giving a stat the best chance to succeed is more important, since every stat loses 10%-15% accuracy if the early weeks are included. It also makes logical sense: no one really knows how a team will perform in week 1 or week 2. As a follow-up, I started tracking variance among the cross-season measures of strength (ones like Elo and FPI) to try to verify my assumption. I found that these tend to look like a damped sine wave: more large point jumps early in the season and fewer in the back half. There is a -0.47 correlation between the volume of large jumps (what counts as “large” depends on the metric; for Elo it is 30 or more points) and the week. So early in the season, lots of big jumps; later in the season, fewer. Elo made this easy since 538’s data goes back so far, so I got through a lot of seasons quickly. FPI is a pain and I have fewer seasons to use there. The point remains: waiting until week 8 lets us collect the most accurate data in the most stable environment (as stable as the NFL can be, anyway).
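For anyone who wants to reproduce that -0.47 figure, here is roughly how it could be computed; this is a sketch that assumes weekly Elo ratings sit in a teams-by-weeks NumPy array, which is my assumption about the data layout, not how 538 publishes it:

```python
import numpy as np

def jump_week_correlation(elo, threshold=30.0):
    """elo: array of shape (teams, weeks), one Elo rating per team per week.
    Counts week-to-week jumps of `threshold` or more points, then returns
    the Pearson correlation between week number and jump volume."""
    jumps = np.abs(np.diff(elo, axis=1)) >= threshold   # (teams, weeks - 1)
    counts = jumps.sum(axis=0)                          # large jumps per week
    weeks = np.arange(1, elo.shape[1])                  # week each jump lands on
    return float(np.corrcoef(weeks, counts)[0, 1])
```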



*Damped sine wave for illustration; not an actual graph of Elo’s large-jump volume by week*

If a QB got hurt and the backup started subsequent games, I ran a similar test to see how long it takes for the “true” level of that QB to show, and the answer was four games. So for QB-specific data I wait four weeks before including games featuring that QB; team-level stats like DVOA stay in, but QB-specific ones like QBR are excluded until then.
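A sketch of that waiting rule (the game-record fields here are hypothetical):

```python
def usable_qb_games(games, backup_first_start, wait=4):
    """For QB-specific stats (QBR, etc.), skip the backup's first `wait`
    starts before trusting the numbers; team stats stay in throughout."""
    return [g for g in games if g["week"] >= backup_first_start + wait]
```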

I do the same thing if a team is tied in a given stat.

I will track a stat for three seasons; if after those seasons (around 350 games) it has either failed to average over 55% accuracy or has never cracked 60%, I stop tracking it. Because if it isn’t more accurate than simply picking whoever has home field that week, why track it? Here are the metrics I have stopped tracking due to inaccuracy:

ANY/A differential
PFR’s Expected Points Added differential
PFF Offensive grade
PFF Defensive grade
PFF Special Teams grade
PFF Off/Def differential
Points per play allowed
Points per play off/def differential
Passer Rating differential
Turnover differential
Penalties caused
Penalty 1st downs caused
Sack % allowed
Sack % created
Sack % differential
Deep passing (20+ yard) attempts
Deep passing completion %
Deep passing completion % differential
3rd down % offense
3rd down % defense
3rd down % differential
Passing touchdown % defense
Passing touchdown % differential
QBR differential
Dropped passes
Rushing yards per attempt offense
Rushing yards per attempt defense
Rushing yards per attempt differential
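In code form, the cut rule above looks something like this (season accuracies as fractions; thresholds taken straight from the rule):

```python
def keep_tracking(season_accuracies, avg_floor=0.55, peak_floor=0.60):
    """Keep a stat only if, over its ~3-season trial (~350 games), it both
    averaged over 55% accuracy AND cracked 60% at least once."""
    avg = sum(season_accuracies) / len(season_accuracies)
    return avg > avg_floor and max(season_accuracies) >= peak_floor

# keep_tracking([0.54, 0.61, 0.57]) -> True   (avg 57.3%, peaked at 61%)
# keep_tracking([0.58, 0.56, 0.59]) -> False  (never cracked 60%)
```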

Here are stats I stopped tracking for other reasons:

Yards per attempt (it was accurate, but less accurate than ANY/A)
Net yards per attempt (same reason)
Y/A differential (same reason as Y/A, but for ANY/A differential)
NY/A differential (same reason)
Win probability added (Brian went to ESPN)
Passing touchdown % offense (ANY/A was more accurate and includes this)

I will occasionally re-add stats to test against another year’s worth of data if I have extra space; for example, I am currently retesting Passer Rating after cutting it. When I retest a stat I tend to focus on ones that were inconsistent or that may have been killed by a single horrific year despite two other good years.

I know this isn’t every single stat; there are many I am missing, but testing hundreds of stats isn’t feasible right now, especially since I am mostly doing this for myself.

I shared this with Bryan; here is a list of the current stats I’m tracking. As you can see, there is a gap year for Passer Rating. I only just started tracking 2017, so don’t read too much into it: week 8 was highly accurate while week 9 was much less so. As with anything, the more games I collect into the population, the more accurate it will be; this goes for the multi-year data as well as the single-year data.

Here are my findings:

Defensive stats can be accurate but are generally worse at measuring a team’s chances of winning. This caused most differentials to drag the offensive stat down when the two were combined.

Differentials are very good descriptive metrics but seem to be, no, rather ARE, less accurate predictors.

Oddly, when we get to actual measures of team strength (DVOA, Elo, FPI, nERD, SRS, etc.), I find that the differential version (SRS rather than just OSRS or DSRS, for example) is more accurate, though only by a small margin. I’m not sure why this is.

Lastly, I think that traditional measures of team strength are a good way to go: they are reliable year in and year out, and they are accurate. We already knew that the quality of a team’s quarterback is a good indicator of whether that team will win, and that holds up. Passing metrics are a good way to see which team will win, and since ANY/A is the most accurate of them, that backs up the idea that it is a good measure of a quarterback’s production. It is boring, but the whole “who has the better QB?” bar-level debate is actually a good way to predict which team will win.
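For reference, since ANY/A comes up so much, here is the standard Pro-Football-Reference formula for it:

```python
def any_a(pass_yds, pass_tds, ints, sack_yds_lost, attempts, sacks):
    """Adjusted Net Yards per Attempt, per Pro-Football-Reference:
    (pass yards + 20*TD - 45*INT - sack yards) / (attempts + sacks)."""
    return (pass_yds + 20 * pass_tds - 45 * ints - sack_yds_lost) / (attempts + sacks)
```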

If you need clarification or have a question, shoot it to me in the comments or on Twitter.