Overview:

The goal of this project was to predict which QBs in the 2018 draft class are the most likely to succeed in the NFL. Using a dataset of nearly 400 QBs who were drafted or signed as free agents between 1980 and 2016, I built a Neural Network in Python to aid in these projections. Neural Networks are a form of Deep Learning that attempt to learn patterns and connections in a dataset by functioning similarly to the human brain. Because of their ability to learn from the environment and detect complex trends, they are one of the most powerful modeling techniques.

Data:

All data for this project was courtesy of pro-football-reference.com and sports-reference.com. From the start, I considered using various input measures, including NFL combine statistics such as 40 time and hand size. However, combine data is scarce, and using these physical metrics would have significantly reduced my dataset and prevented valuable players from being included. Ultimately, the following inputs from a player's collegiate career were used in my model: height, weight, conference, passing completions, passing attempts, passing percentage, passing yards, adjusted passing yards per attempt, passing touchdowns, interceptions, passer rating, rushing attempts, rushing yards, rushing average, and rushing touchdowns. Conference was coded as a 1 for a major conference or a 2 for a mid-major. All of these metrics would be used by the Neural Network to determine whether or not a QB would be successful.

Depending on who you ask, the term success can have many interpretations, but in this case, I have defined success as becoming an NFL franchise QB. For the most part, this included all players who enjoyed a long playing career and spent the majority of those years as a starting QB. The exceptions to this rule are QBs currently in the league who have not had a long career but are projected to be long-term options for their teams; this list includes players such as Carson Wentz and Marcus Mariota. For all other historical players, successful QBs were determined by a combination of Weighted Career Approximate Value and starts per season. In addition to excluding players from the 2017 draft class, players who have the potential to be franchise QBs, such as Jimmy Garoppolo and AJ McCarron, were also not included in the dataset. In total, there were 61 successful QBs and 328 others.

Results:

Below are the projections for this year's QB class. Unsurprisingly, my model prefers the skill set of Sam Darnold over any other QB; however, Josh Rosen and Josh Allen, whom most analysts have near the top, are not viewed the same way. Allen struggled last season and completed under 60% of his passes during his career at a small school, so I can understand why the model may not project him well at the next level, but I tend to think Rosen will be a QB this model does not forecast accurately. Taking their spots in the top 5 are Riley Ferguson out of Memphis and Kyle Lauletta from Richmond. Both QBs played in pass-heavy offenses and threw for over 70 touchdowns in their careers. Rounding out the top 5 are Baker Mayfield and Mason Rudolph.

For comparison, here is how the model grades notable QBs from last year's draft class, who were not included in the original dataset. The model appears to be correct on Trubisky and Watson, who both look like franchise QBs. It seems likely that Mahomes is in the Chiefs' long-term plans as well. The Neural Network has high confidence in DeShone Kizer and C.J. Beathard, two players who underperformed during their rookie seasons, but they were put into poor situations in Cleveland and San Francisco, and time will tell whether or not they are busts. Other than Nathan Peterman, who was unimpressive in his debut, the rest of the QBs below have yet to see the field.

Accuracy:

On new test data, the Neural Network predicted QB success with 77% accuracy. This is far from perfect, but a 23% error rate is pretty good when it comes to projecting the most important position in the sport. Here are some of the notable predictions that the model got right and wrong.

Below are the details on how the predictions were made.

Data Preprocessing:

After loading the libraries, importing the dataset, and identifying the input and output columns, the next step was to encode the categorical fields. My dataset had one categorical column, conference ID, that needed to be transformed and converted into two separate columns of binary values – one for major conferences and one for mid-majors. Next, the dataset needed to be split into a training set and a test set, with 90% designated for training and 10% for testing. Because a successful QB in the NFL is much rarer than a bust, my dataset had imbalanced output classes. With only 61 successful QBs and 328 busts, the model would learn to simply predict the majority class, meaning every QB in the test results would be predicted to be a bust. Because of this disparity, I needed to oversample the successful QBs by generating additional synthetic examples of the minority class. This was done using SMOTE from the imblearn library. The final step before building the ANN was to use scikit-learn's StandardScaler to scale the input data so that every field had an equal opportunity to make an impact on the model.

Building the ANN:

Every Neural Network consists of an input layer and an output layer, plus as many hidden layers as necessary. My model had the best results with 3 hidden layers. In each layer, the number of units corresponds to the number of neurons, and this number is mostly determined by trial and error. My model ultimately had 8 neurons in the first hidden layer, 5 in the second, and 8 again in the third. My dataset had a single output column consisting of a 1 if a QB was successful or a 0 if a QB was a bust, so the output layer in my model contained only 1 unit. These layers are created using the Dense class, which also takes 2 other important parameters, kernel_initializer and activation. A neuron is a set of inputs, weights, and a function, so kernel_initializer specifies how the weights will be initialized. In this case, the uniform method initializes each weight to a small random number drawn uniformly between -0.05 and 0.05. The activation function then produces an output for each neuron; my model used the rectifier (ReLU) activation function for the hidden layers and the sigmoid function for the output layer. The sigmoid function is used in the output layer to ensure that the final output is a probability between 0 and 1. Next, the model could be compiled using the adam optimizer and the binary_crossentropy loss function, which is suited for binary classification problems. The accuracy metric is also commonly used with classification models. Finally, the model was ready to be fit to the training data using a batch size of 10 and 100 epochs, meaning the weights would be updated after every 10 samples and the full training set would be passed through the network 100 times.
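Assuming the network was built with Keras (the post does not name the library), the architecture described above looks roughly like this. The input dimension of 16 is an inference: the 14 numeric stats plus the two binary conference columns:

```python
import numpy as np
from tensorflow import keras
from tensorflow.keras.layers import Dense
from tensorflow.keras.initializers import RandomUniform

# Small uniform weight initialization, matching the description above
init = RandomUniform(minval=-0.05, maxval=0.05)

# 16 inputs -> 3 hidden layers (8, 5, 8 neurons) -> 1 sigmoid output
classifier = keras.Sequential([
    keras.Input(shape=(16,)),
    Dense(8, kernel_initializer=init, activation="relu"),
    Dense(5, kernel_initializer=init, activation="relu"),
    Dense(8, kernel_initializer=init, activation="relu"),
    Dense(1, kernel_initializer=init, activation="sigmoid"),
])

# adam optimizer + binary cross-entropy loss for a 0/1 classification target
classifier.compile(optimizer="adam", loss="binary_crossentropy",
                   metrics=["accuracy"])

# Weight update after every 10 samples; 100 full passes over the training set
# classifier.fit(X_train, y_train, batch_size=10, epochs=100)
```

The sigmoid output means `classifier.predict` returns a value between 0 and 1 that can be read as the model's confidence that a QB becomes a franchise starter.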

Testing and making new predictions:

Once the model finished training, its accuracy needed to be checked on the test dataset. Below are the steps to calculate the accuracy using the evaluate method and to create a confusion matrix using the predict method on my model. Once the model was fully optimized, it could be tested on new data by applying the same preprocessing steps. The QBs in the upcoming draft class needed to be imported, and the conference ID field had to be transformed. After scaling the data, I used the predict method on the classifier to generate projections for the new QBs.
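A self-contained sketch of this evaluation step, again assuming Keras; the random arrays here are filler standing in for the scaled training and test sets, and the model is shrunk and trained briefly just so the snippet runs on its own:

```python
import numpy as np
from sklearn.metrics import confusion_matrix
from tensorflow import keras
from tensorflow.keras.layers import Dense

# Random stand-ins for the scaled training/test data (16 input features)
rng = np.random.default_rng(0)
X_train, y_train = rng.normal(size=(100, 16)), rng.integers(0, 2, 100)
X_test, y_test = rng.normal(size=(20, 16)), rng.integers(0, 2, 20)

classifier = keras.Sequential([
    keras.Input(shape=(16,)),
    Dense(8, activation="relu"),
    Dense(1, activation="sigmoid"),
])
classifier.compile(optimizer="adam", loss="binary_crossentropy",
                   metrics=["accuracy"])
classifier.fit(X_train, y_train, batch_size=10, epochs=2, verbose=0)

# evaluate returns the loss and accuracy on held-out data
loss, accuracy = classifier.evaluate(X_test, y_test, verbose=0)

# predict returns sigmoid probabilities; threshold at 0.5 for class labels
y_pred = (classifier.predict(X_test, verbose=0) > 0.5).astype(int).ravel()
cm = confusion_matrix(y_test, y_pred)

# New draft-class QBs would go through the same pipeline: one-hot encode
# conference, scale with the already-fitted scaler, then
# classifier.predict(new_X) to get a success probability for each QB.
```

The confusion matrix breaks the 77% accuracy figure down further: the diagonal holds correctly classified QBs, while the off-diagonal cells show how many busts were projected as franchise QBs and vice versa.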