Before we jump into the model, a quick caveat: I’m pretty new to predictive modeling and R. I do have a background in working with data, and I’ll demonstrate the model’s usability, accuracy, and correlation to actual NFL performance to ease any worry you might have right now. The model might only be in the v1.0 stage, but it’s already smashing my expectations.

When I started to build these models – and yes, we’re working with a composite of multiple machine learning models – I had two primary questions I wanted to answer:

1. Can we accurately classify whether a player will “break out” or not?
2. Does the probability from our classification correlate to NFL success?

Inherently, the two questions are linked:

If a player breaks out, they are going to have success in the NFL. However, there’s some additional context needed. The model is trained to classify the probability of each player breaking out… at any point in their career. The model is not trained to predict a player’s fantasy points. To evaluate the usefulness of a breakout rating for fantasy, we’ll look at the correlation between breakout rating and average PPR points per game over the first three seasons of a player’s career (a common technique for evaluating rookie projection models).
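To make that evaluation concrete, here is a minimal sketch of the correlation check described above. The author's work is in R; this illustrative version is in Python, and every number below is made up for illustration, not model output.

```python
# Hypothetical example: correlate model breakout ratings with average
# PPR points per game over each player's first three NFL seasons.
# All values are invented for illustration.

def pearson_r(xs, ys):
    """Pearson correlation coefficient between two equal-length lists."""
    n = len(xs)
    mx = sum(xs) / n
    my = sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = sum((x - mx) ** 2 for x in xs) ** 0.5
    sy = sum((y - my) ** 2 for y in ys) ** 0.5
    return cov / (sx * sy)

# Breakout rating from the model vs. avg PPR points/game, seasons 1-3.
breakout_rating = [0.91, 0.75, 0.62, 0.40, 0.22, 0.10]
ppr_ppg_first3 = [16.8, 14.2, 11.5, 9.1, 7.4, 5.0]

print(round(pearson_r(breakout_rating, ppr_ppg_first3), 3))
```

A high positive coefficient here would mean the breakout rating carries real signal for early-career fantasy production, which is exactly the claim being tested.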

Why use a composite of multiple models? The simplest answer: in this case, it’s the best model. We see a significant increase in breakout-prediction accuracy and a stronger correlation to NFL fantasy success in seasons 1-3 when we use a composite rather than a single machine learning model. I’ll explore this more in some upcoming “model talk” articles. I suspect that with enough data, the models would likely converge. However, we have limited data in the NFL. As an imperfect metaphor, we have two ways to improve our understanding of a dataset:

1. Look at lots of data.
2. Look at the available data in many different ways.

For better or worse, my model attempts to do the second.
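One common way to build such a composite is soft voting: average the breakout probabilities from several independently trained models. This is a hedged sketch of that idea, not the author's actual implementation (which is in R); the three stand-in “models” and their probabilities are hypothetical.

```python
# Sketch of a soft-voting composite: average each player's breakout
# probability across several models, then classify at a 50% threshold.
# The per-model probabilities below are invented for illustration.

def composite_probability(per_model_probs):
    """Average each player's breakout probability across all models."""
    n_models = len(per_model_probs)
    return [sum(probs) / n_models for probs in zip(*per_model_probs)]

# Each inner list: one model's breakout probabilities for four players.
model_a = [0.80, 0.30, 0.55, 0.10]
model_b = [0.90, 0.20, 0.45, 0.15]
model_c = [0.70, 0.40, 0.60, 0.05]

ratings = composite_probability([model_a, model_b, model_c])
labels = [p >= 0.5 for p in ratings]  # True = classified as "breakout"
print(ratings)
print(labels)
```

Averaging smooths out the idiosyncratic errors of any single model, which is one reason an ensemble can beat its best individual member on limited data.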