Evaluating NFL Draft prospects is complicated by a number of contributing variables, including combine results, college production, intangibles, and how well a player fits a particular NFL scheme. Typical approaches to talent evaluation are regression-based and borrowed from the field of economics. However, linear or polynomial regression models depend on a single predictive formula that is supposed to apply to the entire data space.

When multivariate data interact in complicated, nonlinear ways, building a single model can be endlessly confusing, if not impossible. An alternative to fitting (or over-fitting) regression models is to partition the data into smaller cohorts, which makes the interactions more manageable. One method for doing this is called Classification and Regression Trees (CART).
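To make the partitioning idea concrete, here is a minimal sketch of the step a regression tree repeats at every node: try each candidate threshold on one variable and keep the one that leaves the smallest total squared error in the two resulting groups. It is plain Python written for illustration, not the rpart implementation, and the function names and the weight/value numbers are made up.

```python
def sse(values):
    """Sum of squared errors around the mean (0.0 for an empty list)."""
    if not values:
        return 0.0
    mean = sum(values) / len(values)
    return sum((v - mean) ** 2 for v in values)


def best_split(xs, ys):
    """Return (threshold, total_sse) for the best binary split of ys on xs.

    If no split improves on the unsplit data, threshold is None.
    """
    pairs = sorted(zip(xs, ys))
    best = (None, sse(ys))
    for i in range(1, len(pairs)):
        lo_x, hi_x = pairs[i - 1][0], pairs[i][0]
        if lo_x == hi_x:
            continue  # cannot split between identical predictor values
        threshold = (lo_x + hi_x) / 2  # midpoint between neighbors
        left = [y for _, y in pairs[:i]]
        right = [y for _, y in pairs[i:]]
        total = sse(left) + sse(right)
        if total < best[1]:
            best = (threshold, total)
    return best


# Hypothetical data: player weights (lbs) and some value metric.
weights = [250, 255, 260, 270, 282, 285, 290]
values = [2.0, 3.0, 2.5, 3.5, 8.0, 9.0, 8.5]
threshold, error = best_split(weights, values)
print(threshold, error)
```

A full tree applies the same search across every variable, splits on the winner, and then recurses into each side until a stopping rule is met.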

NFL Draft evaluations are certainly complicated, nonlinear, and endlessly confusing, so I was interested in how CART could be applied to NFL Draft datasets. I used the recursive partitioning package in R (rpart) and evaluated a group of quarterbacks and defensive ends based on their Career Approximate Value (a metric coined and described by Pro-Football-Reference). The data collection methodology breaks down as follows:

Quarterbacks: I collected final-year college statistics and combine data (height, weight, and Wonderlic scores) for QBs drafted between 2000 and 2005. I chose this grouping because QBs seem to take much longer to accumulate value in the league, and I didn’t want to bias Career AV by draft year.

Defensive Ends: I collected career college statistics and combine data (height, weight, 40-yard dash, bench press reps, vertical leap, broad jump, shuttle run, and 3-cone drill) for DEs drafted between 2000 and 2014. For the DEs, I transformed AV by years in the league so that players from different draft classes could be compared.

The QBs were partitioned into cohorts based on Completion %, Height, Passing TDs, and Wonderlic Score.

The DEs were partitioned based on Weight, 40-yard dash, Games Played, and Career Tackles for Loss.

I’ll refer to these groups as “cohorts.” The players in each cohort were then Weibull-ranked (roughly, percentile-ranked), and the ranks were plotted against each player’s Career Approximate Value. The resulting linear equation was used to solve for the 25th, 50th, and 75th percentiles of each cohort.
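The ranking-and-fitting step can be sketched as follows. This assumes that "Weibull-ranked" means the Weibull plotting position i / (n + 1) for the i-th smallest value, that the line is an ordinary least-squares fit of AV against that rank, and that the percentiles come from evaluating the line at 0.25, 0.50, and 0.75. The function names and the sample AV values are hypothetical, and the exact procedure used for the figures in this post may differ.

```python
def weibull_positions(n):
    """Weibull plotting positions i / (n + 1) for ranks i = 1..n."""
    return [i / (n + 1) for i in range(1, n + 1)]


def fit_line(xs, ys):
    """Ordinary least-squares fit of y = intercept + slope * x."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept


def cohort_quartiles(av_values):
    """Estimate AV at the 25th, 50th, and 75th percentiles of one cohort."""
    ordered = sorted(av_values)  # rank 1 = lowest AV
    ranks = weibull_positions(len(ordered))
    slope, intercept = fit_line(ranks, ordered)
    return {p: intercept + slope * p for p in (0.25, 0.50, 0.75)}


# Hypothetical cohort of nine players' Career AV values.
cohort_av = [5, 10, 15, 20, 25, 30, 35, 40, 45]
quartiles = cohort_quartiles(cohort_av)
print(quartiles)
```

Dividing by n + 1 rather than n keeps every rank strictly between 0 and 1, so no player in the sample is treated as the absolute floor or ceiling of the cohort.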

Quarterbacks:

Defensive Ends:

These values can be compared across cohorts to better understand the expected value of prospects. For example:

QBs who measured 6’4.5” or taller and completed over 60.2% of their passes in their final year of college had (by far) the highest Career Approximate Value. The second-highest cohort was QBs shorter than 6’4.5” who also completed over 60.2% of their passes and scored over 27.5 on the Wonderlic.

DEs weighing over 280 lbs, and DEs weighing between 264.5 and 280 lbs who ran the 40-yard dash in under 4.775 seconds, had much higher Career Approximate Value than the other cohorts.
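Read as rules, the two high-value DE cohorts above amount to a short chain of threshold checks. The sketch below encodes only the thresholds quoted in this post. The function name and cohort labels are hypothetical, the handling of values exactly on a boundary is an assumption, and the remaining splits (Games Played, Career Tackles for Loss) are lumped into "other" because their cutoffs are not listed here.

```python
def de_cohort(weight_lbs, forty_seconds):
    """Assign a defensive end to one of the cohorts described in the text."""
    if weight_lbs > 280:
        return "over 280 lbs"
    # Assumption: 264.5 and 280 themselves fall inside this weight band.
    if weight_lbs >= 264.5 and forty_seconds < 4.775:
        return "264.5-280 lbs, 40 under 4.775 s"
    # Splits on Games Played and Career TFL are not reproduced here.
    return "other"


print(de_cohort(285, 4.95))
print(de_cohort(270, 4.70))
print(de_cohort(270, 4.90))
```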

It has been pretty fun exploring whether CART is a viable tool for future NFL Draft evaluations. Below is my first visualization using Tableau.

https://public.tableau.com/views/QB_Wonderlic_AV/Sheet1?:embed=y&:display_count=yes&:showTabs=y