The debate

I have this ongoing debate with some friends, which also seems to be unresolved for people who have looked at it for much longer than we have. The question is: can model-based decision-making reliably outperform humans in a fantasy football draft?

On the one hand, there’s a cottage industry of people making forecasts and developing strategies for how to order draft picks. For player values, the usual suspects like ESPN, CBS, NFL, and Yahoo Sports produce predictions of expected player stats for the season; and there are also groups like fantasyfootballanalytics.net, who produce their own ensembled predictions, complete with convenient R libraries and years of tuning through trial and error to get things working smoothly. For drafting the players, there’s the widely referenced value based drafting (VBD) strategy, first described in Fantasy Forecast Magazine in 2001 and then updated in this write-up by the footballguys in 2005. There’s also this surprisingly frequently referenced comparison of bidding policies from 2012, but I won’t get into it in this post because the scope of our question is limited to snake drafts, where pick order matters but all players cost the same.

On the other hand, player value predictions aren’t perfect. For example, this fivethirtyeight article points out that ESPN’s preseason fantasy projections for top running backs overpredict performance, such that the top 12 players generally end up ranked 2-3x lower over the actual season. There are also criticisms of VBD, as in this post on ESPN, saying the resulting teams “smell funny.” And then there are posts like this one on reddit, which managed to identify a combination of 9 reasonable players that underperformed by 80 points in some random week, underlining how noisy player performance forecasts can be, albeit without saying how many combinations were considered to find this one bad team, or how many teams would instead have overperformed by a similar margin.

So to try to settle the debate, I tested whether a vanilla VBD drafting policy could reliably produce rosters that come out on top in a league of typical fantasy football players. To do this, I pulled average player draft positions across thousands of fantasy leagues collected on myfantasyleague.com, and used these as input parameters to simulate leagues of 11 human drafters competing with 1 VBD drafter. I measured performance by evaluating the actual fantasy points collected by the best set of starters on each team and then finding the rank of the VBD drafter.

TL;DR

In general, VBD gives a slight edge, but it can be an occasional slam dunk when players drafted early by humans turn out to be lemons.

Results

I simulated drafts from three seasons (2014-2016), with the VBD drafter placed in each of the 12 positions. Across all leagues, VBD rosters had a mean rank of 4.68 when evaluated on fantasy points observed at the end of the season. This is significantly better than the mean rank of 6.5 we would expect if VBD drafters were indistinguishable from typical human drafters (chi-squared test, p < 2.2e-16). Unfortunately, it is also substantially worse than how well we would have expected VBD rosters to rank based on preseason projections, for which the mean rank is 2.05. So VBD does tip the scales slightly, but it isn’t exactly a silver bullet.
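As an aside, 2.2e-16 is R’s usual reporting floor for tiny p-values. One way such a test could be run (a sketch, not the post’s actual script) is to tally how often the VBD team finishes at each rank and compare those counts to a uniform expectation with a chi-squared goodness-of-fit statistic:

```python
def chi_square_stat(observed_counts):
    """Goodness-of-fit statistic against a uniform expectation over bins.

    Compare the result to a chi-squared distribution with
    len(observed_counts) - 1 degrees of freedom to get a p-value.
    """
    n = sum(observed_counts)
    expected = n / len(observed_counts)
    return sum((o - expected) ** 2 / expected for o in observed_counts)
```

Here the bins would be the 12 possible final ranks of the VBD team across all simulated leagues.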

Interestingly, there was a fair amount of variability in how VBD performed across the three years. Both expected and observed ranks were noticeably higher (i.e., worse) in 2015 than in 2014 and 2016.

I’m not sure how to explain this yearly trend. The projections were downloaded from fantasyfootballanalytics.net separately for the three years, so maybe there is an issue with the data. Looking at the quality of the preseason forecasts, though, neither the Pearson correlation nor the mean absolute error (MAE) was dramatically different in 2015 than in the other years. So maybe forecast quality should be evaluated differently, or there might be a better explanation.

Position  Season  Pearson correlation  Mean absolute error
QB        2014    0.7124               60.4863
QB        2015    0.5562               68.4075
QB        2016    0.6117               62.9405
RB        2014    0.4459               56.5074
RB        2015    0.5165               52.4999
RB        2016    0.5724               53.1948
TE        2014    0.4884               43.3151
TE        2015    0.6593               32.0161
TE        2016    0.6001               35.3129
WR        2014    0.6390               45.4527
WR        2015    0.6362               48.7875
WR        2016    0.5236               48.1142
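For reference, the two forecast-quality metrics in the table are straightforward to compute from paired projected and observed point totals; a minimal sketch:

```python
def pearson_r(xs, ys):
    # Sample Pearson correlation between projected and observed points
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = sum((x - mx) ** 2 for x in xs) ** 0.5
    sy = sum((y - my) ** 2 for y in ys) ** 0.5
    return cov / (sx * sy)

def mean_absolute_error(xs, ys):
    # Average absolute gap between projected and observed points
    return sum(abs(x - y) for x, y in zip(xs, ys)) / len(xs)
```

In the table, these are computed per position and per season over the players with both a projection and an observed total.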

There was also a relationship with the VBD drafter’s turn, with better performance for turns in the middle of the rotation. What stood out in 2015 is that the first two draft positions resulted in some of the worst outcomes, while late picks didn’t hurt performance the way they did in other years. But that doesn’t exactly point to what differentiated that year, since early picks also performed poorly in 2014.

Another explanation might be that in 2015 there were some obvious top picks in the first two rounds that ended up bombing. To get a sense of whether this is the case, we can look at the correlation between (expected − observed) points and draft position for each year. A correlation of 0 would indicate no trend in underperformance across draft positions. A negative correlation would indicate that players drafted early underperformed to a greater extent than players drafted later.

Year  Correlation
2014  -0.236
2015  -0.318
2016  -0.258

It turns out this was indeed the case. Players drafted early in real fantasy football leagues underperformed more in 2015 than in other years. The biggest offender that year was Le’Veon Bell, who injured his knee in November. But in all, 8 of the first 24 players drafted (the first 2 rounds) underperformed by more than 100 points, compared to 2 and 4 such players in the other two years.
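The bust count above is easy to reproduce from the joined projections data; here’s a sketch (the field names `adp`, `projected`, and `observed` are placeholders, not necessarily the column names in the actual dataset):

```python
def count_busts(players, rounds=2, teams=12, threshold=100):
    """Count players taken in the first `rounds` rounds (by mean draft
    position) whose observed points fell more than `threshold` short of
    their preseason projection."""
    early = sorted(players, key=lambda p: p["adp"])[: rounds * teams]
    return sum(1 for p in early if p["projected"] - p["observed"] > threshold)
```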

Ok, so I wouldn’t exactly say the case is closed here, but at this point, I’d like to submit this as a first argument in favor of model-based drafting being superior.

And here are the deets, in case anyone wants to know more

League overview. I simulated a standard snake draft in a league with 12 teams. One team drafts players with a VBD-like policy, the remaining 11 have a simulated baseline policy. In all, I simulated 100 drafts for each year and each possible VBD draft position.
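For anyone unfamiliar with the format, a snake draft reverses the pick order every round, so the team picking last in round 1 picks first in round 2. A minimal sketch of the pick order:

```python
def snake_order(num_teams=12, num_rounds=16):
    """Team index for each pick of a serpentine (snake) draft: the order
    reverses every round, so with 12 teams, team 0 gets picks 1 and 24,
    while team 11 gets picks 12 and 13."""
    order = []
    for rnd in range(num_rounds):
        forward = list(range(num_teams))
        order.extend(reversed(forward) if rnd % 2 else forward)
    return order
```

The number of rounds here is a placeholder; in these simulations it would be set by the roster size.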

Evaluation. After the drafts, I identified the best set of starters on each team using the actually observed fantasy points for each player, and then compared how the VBD rosters rank in each league. In other words, assuming the teams are fixed after the draft, and the one best roster is selected by some oracle, how well does a VBD roster perform relative to baseline rosters? One other caveat: I’m not considering defense, special teams, or kickers, because people don’t generally draft them meaningfully early, and their performance doesn’t correlate well year to year.

Simulated human drafter. On each baseline draft turn, for every undrafted player I sample from a Poisson distribution with lambda set to that player’s mean draft position, then pick the player with the lowest sampled value, resolving ties by choosing the player with the lower mean draft position. Backup players are not picked until all starters are selected.

Position         Num starters in position  Max players in position
QB               1                         3
WR               2                         5
RB               2                         5
TE               1                         3
FLEX (WR/TE/RB)  1
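The Poisson-sampling pick above can be sketched as follows (a hypothetical helper, not the actual simulation code; the no-backups-until-starters-are-filled constraint would be applied by filtering `available` before calling it):

```python
import math
import random

def sample_poisson(lam, rng):
    # Knuth's multiplicative method; fine for lambdas up to a few hundred
    # (i.e., realistic mean draft positions, which are >= 1)
    L = math.exp(-lam)
    k, p = 0, 1.0
    while p > L:
        k += 1
        p *= rng.random()
    return k - 1

def human_pick(available, rng):
    """available: (mean_draft_position, name) pairs for undrafted players.
    Sample Poisson(mean draft position) per player and take the lowest
    draw; ties resolve to the lower mean draft position."""
    draws = [(sample_poisson(adp, rng), adp, name) for adp, name in available]
    return min(draws)[2]
```

Tuple comparison in `min` gives the tie-breaking for free: equal draws fall through to the lower mean draft position.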

VBD policy. Here I’m just going to follow a vanilla version of what’s in the footballguys’ VBD description. On each turn, the drafter takes the top player in the position with the greatest incremental value over a positional baseline:

value(p) = x_p − b_pos(p)

where x_p is the projected points for player p, and b_pos(p) is the points value of the top player in p’s position remaining after the first 100 picks.

Though there is VBD rule 7: “Know When to Deviate from VBD Principles”, and in practice following the formula blindly may, for example, produce a team without a quarterback. So to be fair, we also apply the same policy of not drafting backup players until there is a starter in every position.
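A minimal sketch of the pick rule, assuming the vanilla baseline-difference form described above (the tuple layout and names are placeholders; the starter constraint would again be applied by filtering `available` first):

```python
def vbd_pick(available, baselines):
    """available: (name, position, projected_points) triples for undrafted
    players. baselines: position -> projected points of the top player at
    that position expected to remain after the first 100 picks.
    Returns the name with the greatest value over its positional baseline."""
    return max(available, key=lambda p: p[2] - baselines[p[1]])[0]
```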

The data and le code. I downloaded the yearly fantasy projections from the projections tool here, the player rankings from myfantasyleague.com, and actual player performance from the NFL using nflscrapR. After a bit of munging, I joined the three years from each source into the final dataset here. The draft simulation and results were obtained using this script.