A look back at the 2014 NFL Draft:

Jadeveon Clowney was thought of by many as a “once-in-a-decade” or even “once-in-a-generation” pass-rushing talent. Once the top-rated high school talent in the country, Clowney retained that distinction through 3 years in college football’s most dominant conference. Super-talents like Clowney have traditionally been gambled on in the NFL draft with little idea of what future production can statistically be expected. All prospects have a “ceiling” and a “floor,” which represent the highest and lowest potential that a prospect could realize, respectively. For all of the concerns over his work ethic, dedication, and professionalism, Clowney’s athleticism and potential have never been called into question. But is his athleticism actually that rare? And is his talent worth gambling millions of dollars and the 1st overall pick on? This article aims to quantify exactly how rare Jadeveon Clowney’s athleticism is, and to use Quantile Regression to clarify what his proverbial “ceiling” and “floor” may be in the NFL.

Jadeveon Clowney set the NFL draft world on fire at this year’s combine when he delivered one of the most talked-about combine performances in recent memory, primarily driven by his blistering 40-yard dash time of 4.53. Over the years, however, I recall players at every combine displaying mind-boggling athleticism in drills: Vernon Gholston, Mario Williams, and even Ziggy Ansah. But if each year a player displays unseen athleticism at the combine, who is really impressive enough that we deem them “once-in-a-decade”?

To attempt to quantify how uncommon Jadeveon Clowney’s athleticism is, I probability-ranked the most impressive individual drills and overall workouts of 82 defensive ends over the past decade. I applied a Weibull ranking to all 82 players’ 40 times, bench press, vertical leap, broad jump, shuttle run, and 3-cone drill. What I saw was that Clowney’s 40 time was indeed VERY rare and truly “once-in-a-decade.” However, his overall combine performance shows that he wasn’t all that different from even the top DE prospect last year. Below I list 6 high-profile picks from recent years plus Jadeveon Clowney, with their combine results (Table 1) and associated ranks (Table 2) among the class of 82 defensive ends:
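For readers who want to try this kind of ranking themselves, here is a minimal Python sketch. It assumes that “Weibull ranking” refers to the Weibull plotting position, rank / (n + 1), which the article does not spell out. The function name is hypothetical, and every 40 time other than Clowney’s 4.53 is made up for illustration.

```python
def weibull_percentile_ranks(values, higher_is_better=True):
    """Return each value's Weibull plotting position, 100 * rank / (n + 1).

    Assumption: rank 1 is the worst result and rank n is the best,
    and tied values share the average of the ranks they span.
    """
    n = len(values)
    # Order indices from worst to best. For timed drills (lower is better)
    # the slowest time comes first, so we sort in descending order.
    order = sorted(range(n), key=lambda i: values[i], reverse=not higher_is_better)
    ranks = [0.0] * n
    i = 0
    while i < n:
        j = i
        # Extend j across a run of tied values.
        while j + 1 < n and values[order[j + 1]] == values[order[i]]:
            j += 1
        average_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = average_rank
        i = j + 1
    return [100 * r / (n + 1) for r in ranks]


# Hypothetical 40 times; only the 4.53 comes from the article.
forty_times = [4.53, 4.80, 4.95, 4.70]
print(weibull_percentile_ranks(forty_times, higher_is_better=False))
```

Under this assumption, the best result among 82 players would score 100 * 82 / 83, or roughly 98.8, which is at least consistent with a 99th percentile rank for the fastest 40 time in the class.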

Table 1: Raw combine results for 7 high profile DEs.

Table 2: Probability ranking out of entire class of 82 for each combine drill and cumulative combine workout.

Jadeveon Clowney’s 40-yard dash time registered in the 99th percentile of the class. Likewise, his leaping ability and shuttle run were in the 90th percentile; this is truly elite lower-body explosion. However, his height, weight, bench press, and 3-cone drill ranked average to below average within the class. This lowered his average rank for the combine to 66. For comparison, Mario Williams had the highest overall average ranking at 83, and Ziggy Ansah actually shared an average ranking of 66. In the chart below, we see where Clowney’s average rank places him among the DEs with an average ranking of 40 or better out of the entire group of 82 (Chart 1):



This is actually pretty impressive company. Among the players ranked ahead of Clowney, Mario Williams and JJ Watt were the most elite prospects athletically in recent memory, while Margus Hunt is a world class track-and-field athlete. Of the players ranked below Clowney, Chandler Jones is a member of possibly the most athletic family in sports (see Jon and Arthur Jones) and Robert Quinn led the NFC in sacks last year. So this is convincing evidence that Jadeveon Clowney is undeniably an elite athletic specimen but not exactly “Once-in-a-decade.”

Much more goes into assessing a player’s pro potential than the combine, and I even wrote an article illustrating just how many variables do. In my previous article I introduced Principal Component Analysis (PCA) as a tool for NFL draft player evaluation. Principal Components can be used as a predictor or to inform what measurements are statistically most interesting when creating a metric. So using the PCA from my initial study, I built a predictive model of defensive end potential using the variables deemed statistically “most important”. I then performed a Quantile Regression to explore what the “ceiling” and “floor” may be for defensive end prospects in the NFL.

Quantile regression is a technique for estimating the quantiles of a response variable’s distribution in a linear model. Quantiles are essentially percentiles, so the 0.5 quantile is the 50th percentile, or median. A simple linear regression fits only one line, drawn directly through the center of the data, and reports a single R² for that central (mean) response. However, looking at other quantiles may provide a more complete view of possible relationships between predictor and response variables. An NFL draft prospect evaluation typically includes a number of different statistics and measurements, but there are still countless others not quantified or reported. Consequently, there generally appear to be weak or no relationships between collegiate production/combine results and NFL success. Yet, by looking only at the central response, models may be overlooking very meaningful relationships between those predictor stats/numbers and a prospect’s NFL success.
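To make the idea of “estimating a quantile” a bit more concrete, here is a small Python sketch of the asymmetric “pinball” (check) loss that quantile regression minimizes. It is only an illustration under stated assumptions: the function names and the five per-game production values are hypothetical, not data from this study.

```python
def pinball_loss(residual, tau):
    """Check loss for quantile tau: under-predictions cost tau per unit,
    over-predictions cost (1 - tau) per unit."""
    return tau * residual if residual >= 0 else (tau - 1) * residual


def best_constant(ys, tau):
    """Return the single value c that minimizes total pinball loss over ys.

    A minimizer always exists among the observed values, so a brute-force
    search over the data points is enough for a sketch like this.
    """
    return min(ys, key=lambda c: sum(pinball_loss(y - c, tau) for y in ys))


# Hypothetical per-game production for five players, one clear outlier.
production = [0.1, 0.2, 0.3, 0.4, 2.0]
for tau in (0.1, 0.5, 0.9):
    print(tau, best_constant(production, tau))
```

With tau = 0.5 the loss is symmetric and the minimizer is the median; pushing tau toward 0 or 1 pulls the answer toward the bottom or top of the distribution. A quantile regression does the same thing, except that the constant is replaced by a line in the predictor.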

In a linear regression, the upper limit of a response variable (i.e. NFL production) is theoretically limited by the measured predictor variable (i.e. NCAA production or Combine performance). But the response variable may change by less than expected when other limiting factors are present. Figure 1 shows 4 different hypothetical example data sets illustrating how limiting factors can control responses. Figure 1.A shows a direct relationship where only the measured predictor, NCAA Sacks per Game, limits how many NFL sacks per game a player has. Figure 1.B shows what the data looks like when an additional limiting factor is present, but not measured. This additional factor could be a player’s body weight or height. Figure 1.C shows more than one limiting factor for a number of players (represented by the data points). In Figure 1.D, we see many unmeasured limiting factors for many of the players, resulting in a wedge-shaped distribution.

These unmeasured limiting factors are a common statistical problem. Notice the wedge-like shape of the data set from my previous article: “40 Yard Dash vs NFL Career Sacks per Game” (Figure 2).

Obviously, I do not expect the 40-yard dash to perfectly predict a player’s pass-rushing ability. But what else may be limiting? In my PCA of a group of 82 defensive ends, I found the following variables contributed collectively to about 20% of the variance:

Of all these measurements, the following appeared to be the only significant measures in PC1:

In my previous article, using Principal Component 1 (PC1) as a predictor of NFL sacks per game did not prove very effective (Figure 3). However, using PC1 to inform which measures should be included in a predictive metric could be.

To build a predictive model, I performed a multiple regression using all these measures except games played and assisted tackles. To still account for games played, I normalized solo tackles, total tackles, tackles for loss, forced fumbles, and sacks by games played. Additionally, I did not include assisted tackles, as I felt that using both solo and total tackles already accounted for them.

I performed the multiple regression using these 11 measures as predictors and “Career NFL sacks + tackles for loss per game” as the response. Admittedly, there are better ways to quantify a defensive end’s value in the NFL, and the Advanced NFL Analytics community can certainly help here. The multiple regression returned the following correlation: R = 0.704 and R² = 0.496.

I used the following regression equation from the statistical output to predict “Career NFL sacks + tackles for loss per game” for each of our 82 defensive ends:

“Career NFL sacks + tackles for loss per game” =

-3.294 + (0.0640 * 40-Yard Dash) + (0.00664 * Vertical Leap) - (0.000392 * Broad Jump) - (0.255 * Shuttle) + (0.439 * 3-Cone Drill) + (0.551 * NCAA tackles for loss per game) - (0.579 * NCAA sacks per game) + (0.320 * NCAA solo tackles per game) - (0.138 * NCAA total tackles per game) - (0.00327 * Body Weight) + (0.00181 * NCAA forced fumbles per game)
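For anyone who wants to plug numbers into this equation, a minimal Python sketch follows. The intercept and coefficients are copied from the equation above; the dictionary keys and the function name are hypothetical labels chosen here. The article does not state the units of each measurement, so inputs are assumed to be in whatever units the original data set used.

```python
# Coefficients copied from the regression equation in the article.
INTERCEPT = -3.294
COEFFICIENTS = {
    "forty_yard_dash": 0.0640,
    "vertical_leap": 0.00664,
    "broad_jump": -0.000392,
    "shuttle": -0.255,
    "three_cone_drill": 0.439,
    "ncaa_tfl_per_game": 0.551,
    "ncaa_sacks_per_game": -0.579,
    "ncaa_solo_tackles_per_game": 0.320,
    "ncaa_total_tackles_per_game": -0.138,
    "body_weight": -0.00327,
    "ncaa_forced_fumbles_per_game": 0.00181,
}


def predict_sacks_plus_tfl_per_game(measures):
    """Apply the regression equation to one player's 11 measures.

    `measures` is a dict keyed like COEFFICIENTS (hypothetical key names).
    Raises KeyError if any of the 11 predictors is missing.
    """
    missing = sorted(set(COEFFICIENTS) - set(measures))
    if missing:
        raise KeyError(f"missing predictors: {missing}")
    return INTERCEPT + sum(
        coefficient * measures[name] for name, coefficient in COEFFICIENTS.items()
    )
```

Keeping the coefficients in one dictionary makes it easy to check them against the published equation term by term, and to see which measures push the prediction up or down.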

I then plotted the predicted number vs. the “sacks + tackles for loss per game” that these players actually recorded in their careers (Figure 4).

At first, we can see there is a general trend, but nothing definitive or compelling. However, when I perform a Quantile Regression using the 25th, 50th, 75th, and 95th quantiles, the picture becomes a bit clearer.

The 50th percentile, or 0.5 quantile, is the median response, the counterpart of the single central line that a traditional linear regression fits for the mean response. However, an NFL GM may be interested in the “ceiling” (95th or 75th) or the “floor” (25th or lower) of a prospect that they are gambling millions on. In Figure 5, we see just this. Players like Whitney Mercilus, Ziggy Ansah, Robert Quinn, and JJ Watt fell along the 95th quantile of observed NFL production, meaning they realized their potential. Mario Williams, Brian Robison, and Anthony Spencer fell along the median quantile or 50th percentile, meaning they neither exceeded nor disappointed, statistically speaking. Jadeveon Clowney’s name is highlighted in red in Figure 5. Clowney’s predicted production (0.74) falls just short of Anthony Spencer’s predicted NFL numbers (0.83). At a predicted value of 0.74, there is a range of roughly 0.75 observed NFL sacks+TFL per game between the 25th and 75th quantiles. That seems like a lot of uncertainty to gamble on. But notice that when this model predicts that a player will record at least 0.6 sacks+TFL per game over his career, there is good reason to believe that he will not completely bust (dotted blue line in Figure 5). However, there are a tremendous number of “busts” among players predicted to record below this threshold of 0.6 sacks+TFL per game. This is yet another use of Quantile Regression. Quantile Regression does not only look at the average response, but also shows those who exceed what is predicted and those who never reach what was expected.
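The quantile lines discussed above can be sketched in a few lines of Python. This is not the software used for the study, only a brute-force illustration for a single predictor: it relies on the fact that an optimal quantile regression line can always be found passing through two of the data points, so it simply tries every pair. The function names and the wedge-shaped toy data are hypothetical.

```python
from itertools import combinations


def pinball_loss(residual, tau):
    """Check loss: tau per unit above the line, (1 - tau) per unit below."""
    return tau * residual if residual >= 0 else (tau - 1) * residual


def fit_quantile_line(xs, ys, tau):
    """Return (intercept, slope) of a line minimizing total pinball loss.

    Brute force over all pairs of points with distinct x values; fine for
    a data set of 82 players, far too slow for large data.
    """
    best = None
    for i, j in combinations(range(len(xs)), 2):
        if xs[i] == xs[j]:
            continue  # a vertical pair cannot define a regression line
        slope = (ys[j] - ys[i]) / (xs[j] - xs[i])
        intercept = ys[i] - slope * xs[i]
        loss = sum(
            pinball_loss(y - (intercept + slope * x), tau) for x, y in zip(xs, ys)
        )
        if best is None or loss < best[0]:
            best = (loss, intercept, slope)
    if best is None:
        raise ValueError("need at least two points with different x values")
    return best[1], best[2]


# Toy wedge: at each predicted value x, observed outcomes spread from 0 up to x.
xs = [x for x in range(1, 9) for k in range(5)]
ys = [x * k / 4 for x in range(1, 9) for k in range(5)]
for tau in (0.25, 0.50, 0.75, 0.95):
    intercept, slope = fit_quantile_line(xs, ys, tau)
    print(f"tau={tau:.2f}: observed = {intercept:.3f} + {slope:.3f} * predicted")
```

On wedge-shaped data like this, the fitted lines fan out, with the low quantiles tracing the “floor” and the high quantiles tracing the “ceiling.” For real work, a dedicated quantile regression routine in a statistics package would be the more sensible choice.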

Quantile Regression helps draw inferences about draft prospects who may previously have been dismissed as statistically indistinguishable. For a prospect like Jadeveon Clowney, many superlatives are used to describe what we can measure and observe. Clowney could realize his “ceiling” as an all-pro performer like Robert Quinn, or fall to the “floor” like Vernon Gholston. Is Clowney a “once-in-a-decade” type prospect? We saw Clowney display unseen speed during his 40-yard dash in February, but across the board he performed quite similarly to Ziggy Ansah from last year’s draft. The 40-yard dash, or the entire combine for that matter, has been questioned for years as to whether it has any value in predicting NFL production. However, this study shows that for defensive ends, the combine DOES matter, along with a player’s production in college. In addition, Quantile Regression addresses the large amount of variability caused by all the things that we don’t quantify. Can we measure motivation, dedication, work ethic, or focus? No, but with techniques like probability ranking, PCA, and Quantile Regression we may be able to better account for those variables that we simply cannot attach a measurement to.

Feel free to contact me at Casan_Scott@Baylor.edu or casanscott@gmail.com with any comments, questions, or advice. I’d love to share any methods, coding, etc. with anyone interested.