For most working Americans, Labor Day weekend is a time to relax, reset, and celebrate our successes of the past year. We throw barbecues and take our last few trips to the beach, trying to drag out the summer as we look to the fall and our prospects at work with nervous excitement.

For players in the NFL, it's a completely different story. Labor Day weekend can be, for many, the most stressful time of the year.

On Saturday, the league’s regular-season 53-man roster limit took effect, meaning teams had to cut down from the 90 players they were allowed to keep in the preseason. In a 36-hour span, 41% of the NFL's labor force became jobless.

For some players, this is a temporary, frictional unemployment, as other teams with extra roster spots may soon snap them up. For others, it means settling for a spot on a practice squad. But for those who haven’t garnered interest from other teams around the league, it may signify the end of their NFL playing career.

In this series of articles, I'll attempt to uncover the factors that dictate the career length of NFL players: to understand what kinds of players survive cuts year after year, and to determine whether the team that drafts, develops, and utilizes them has an impact on their longevity.

The Belichick Factor

A team’s reasons for waiving or releasing players are myriad: They may be overstaffed at a position. They may need to create cap space. The player may not fit the team’s scheme. He may have gotten injured, or shown up to camp out of shape. Regardless, the team’s decision to move on will be scrutinized, and often lampooned when the player is high-profile or a recent draft pick.

Above: Ras-I Dowling, former Patriot, drafted with pick #33 in the 2011 NFL Draft. Dowling was waived prior to the 2013 season, and played a career total of 12 games. Credit: Matthew West, Boston Herald.

As the 4 p.m. Saturday deadline neared, I sat listening to a local sports radio broadcast. As reports of cuts rolled in, the commentary ranged from disappointment (“guess Etling’s wheels weren’t enough ...”) to surprise (“you let Gillislee go after signing him for that much?”) to complete incredulity (“what were they thinking with Cyrus Jones?”).

There aren’t many criticisms of Bill Belichick that have persisted over the years. Poking holes in his spotty draft record remains one of the few exceptions. For Pats fans, the “second round reach who’s out of the league in two years” has become somewhat of a self-fulfilling prophecy.

Belichick and the Patriots aren’t the only ones with draft woes. See Denver, which dropped 2016 first-rounder Paxton Lynch and a handful of other recent draft picks; Seattle, which in July waived its first pick of the 2017 draft; or Tampa, which traded up to #59 in 2016 to take kicker Roberto Aguayo, only to cut him one year later. Yet it sure seems that the Patriots make a fair number of head-scratching picks who don't survive in the NFL as long as their fellow draftees. In the rest of this article, I'll test whether this observation is a bias inflated by Boston sports media and Patriots haters nationwide, or whether it actually holds some statistical truth. To do so, I'll employ a couple of the tools I’ve encountered in my work as a data analyst in the biomedical field.

Survival Analysis

Survival analysis is a branch of statistics conducted on data where the outcome variable is the time until the occurrence of an event. This event can be death, diagnosis of a disease, or even the breaking of a machine part. The time to event, or survival time, can be measured in days, weeks, years, etc.

This type of analysis is used particularly frequently in clinical trials, where events are death, the recurrence of a disease, or the occurrence of an adverse event. Often, trial participants are split into treatment groups and followed over time. The rates at which participants in each group survive past a given time point (i.e., the event does not occur) are compared to assess the efficacy of each treatment.

For a more detailed (read: better) description of survival analysis, read this introduction by Simona Despa.

Kaplan-Meier Curves

If you've ever come across a graph that looks something like this:

Then you've seen a survival analysis. Here, the red and green curves are the plotted values of the Kaplan-Meier estimator for two groups of patients, each with a different gene signature. The estimator approximates the survival function: the probability that a patient survives past a given time point. These 'curves' are step-wise because the estimator is non-parametric, or determined empirically: each step down marks one or more events occurring at a specific time point in the patient group, which lowers the estimated share of survivors from that time forward. With larger and larger sample sizes, the estimator approaches the true survival function of the population, and the curve smooths.

Kaplan-Meier curves are a great visual tool. From the above chart, we can quickly see that the likelihood of a patient with 'Gene B' surviving past year 2 is 20-30 percentage points lower than that of a patient with 'Gene A'.
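To make the step-down logic concrete, here is a minimal Python sketch of the Kaplan-Meier estimator. It is not the R code behind this article: the function name `kaplan_meier` and the toy patient data are made up for illustration, and I'm assuming censored patients (those still alive at last follow-up) are flagged with `False`.

```python
from collections import Counter

def kaplan_meier(durations, observed):
    """Return [(time, survival_probability)] at each distinct event time.

    durations: time to event, or to last follow-up, for each subject.
    observed:  True if the event happened, False if the subject was censored.
    """
    events = Counter(t for t, e in zip(durations, observed) if e)
    exits = Counter(durations)   # everyone leaving the risk set at time t
    at_risk = len(durations)
    survival = 1.0
    curve = []
    for t in sorted(exits):
        d = events.get(t, 0)
        if d:
            # Step down only at times where at least one event occurred.
            survival *= 1 - d / at_risk
            curve.append((t, survival))
        at_risk -= exits[t]      # events and censored subjects both leave
    return curve

# Hypothetical toy data: years until death, with two censored patients.
durations = [1, 2, 2, 3, 4, 5]
observed = [True, True, False, True, False, True]
for t, s in kaplan_meier(durations, observed):
    print(t, round(s, 3))
```

Each printed pair is one step of the curve: the time at which the step occurs and the estimated share of the group surviving past it.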

However, to understand whether the difference in the survival distributions of these two groups is statistically significant, we must conduct a hypothesis test. For that we turn to the logrank test. The test statistic is a form of chi-square, comparing at each time point the number of observed events to the number of expected events to test the null hypothesis of no difference in survival between the two groups. For a summary of the test and an example problem, read this guide from the Boston University School of Public Health. The p-values you will see below are results from the logrank test.
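For the curious, the observed-versus-expected comparison can be sketched in a few lines of Python. This is a simplified two-group version under my own assumptions, not the implementation in R's survival package: the function name `logrank_test` and the toy numbers are hypothetical, and the p-value uses the fact that a chi-square statistic with one degree of freedom is a squared standard normal.

```python
import math

def logrank_test(durations_a, observed_a, durations_b, observed_b):
    """Two-group logrank test. Returns (chi_square, p_value), 1 degree of freedom."""
    event_times = sorted(
        {t for t, e in zip(durations_a, observed_a) if e}
        | {t for t, e in zip(durations_b, observed_b) if e}
    )
    observed_minus_expected = 0.0
    variance = 0.0
    for t in event_times:
        n_a = sum(1 for d in durations_a if d >= t)  # at risk in group A
        n_b = sum(1 for d in durations_b if d >= t)  # at risk in group B
        d_a = sum(1 for d, e in zip(durations_a, observed_a) if e and d == t)
        d_b = sum(1 for d, e in zip(durations_b, observed_b) if e and d == t)
        n, d = n_a + n_b, d_a + d_b
        # Expected events in A if both groups shared one survival curve.
        observed_minus_expected += d_a - d * n_a / n
        if n > 1:
            variance += d * (n_a / n) * (n_b / n) * (n - d) / (n - 1)
    if variance == 0:
        return 0.0, 1.0
    chi_square = observed_minus_expected ** 2 / variance
    p_value = math.erfc(math.sqrt(chi_square / 2))  # upper tail of chi-square(1)
    return chi_square, p_value

# Hypothetical toy data: group A's events all come earlier than group B's.
early = [1, 2, 3, 4, 5]
late = [10, 11, 12, 13, 14]
all_observed = [True] * 5
chi_square, p_value = logrank_test(early, all_observed, late, all_observed)
print(round(chi_square, 2), round(p_value, 4))
```

A small p-value says the two groups' event patterns differ by more than chance alone would explain, across the whole follow-up period rather than at any single time point.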

Data Prep

I performed my analysis on this dataset of historical draft picks, provided by Savvas Tjortjoglou. In the dataset, each row corresponds to a single draft pick and contains statistics that the pick would subsequently amass during his time in the NFL, in addition to the year of his retirement. I filtered the data to include only drafts from 1994 onwards, when the league switched to the 7-round format it uses now.

I won't go too deep into my methodology, as I plan to do a code-focused write-up in part II, but I performed the analysis using R and a couple of packages built to do so: survival and survminer.

I defined the 'event' as a player's retirement, and the 'time to event' as the number of games he played until retirement. Once we have these two variables defined, we're ready to construct Kaplan-Meier curves and perform the logrank test for whatever dichotomies we want to examine in the data.
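As a rough illustration of that setup, here is a Python sketch that turns draft-pick rows into (time, event) pairs. Everything in it is an assumption for illustration: the field names (`draft_year`, `games`, `retired_year`) are not the dataset's real columns, the rows are invented, and treating a player with no retirement year as censored (still active) is my guess at a reasonable convention rather than a description of the actual R code.

```python
# Hypothetical rows; field names are illustrative, not the dataset's real columns.
picks = [
    {"player": "A", "draft_year": 1993, "games": 120, "retired_year": 2001},
    {"player": "B", "draft_year": 2001, "games": 0, "retired_year": 2001},
    {"player": "C", "draft_year": 2011, "games": 12, "retired_year": 2013},
    {"player": "D", "draft_year": 2015, "games": 48, "retired_year": None},
]

def to_survival_records(rows, first_draft_year=1994):
    """Keep drafts from first_draft_year onwards and return (games, retired) pairs."""
    records = []
    for row in rows:
        if row["draft_year"] < first_draft_year:
            continue  # drafted before the 7-round format
        # Assumption: no retirement year means the player is still active (censored).
        retired = row["retired_year"] is not None
        records.append((row["games"], retired))
    return records

print(to_survival_records(picks))
```

Each pair holds the time to event (games played) and whether the event (retirement) has actually been observed, which is all a Kaplan-Meier curve or a logrank test needs.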

Survival of Patriots draftees vs. the rest of the league

Chart explained: In this chart, and the others you will see below, the light grey is the Kaplan-Meier curve for all draft picks not included in the group of interest. The dark blue, in this case, is the group of Patriots players drafted after Belichick took over as head coach in 2000. The y-axis shows the value of the Kaplan-Meier estimator, and the x-axis the number of games played until retirement. The vertical lines mark the median survival times: the number of games that each group has a 50% chance of surviving past.
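The median survival time can be read off a Kaplan-Meier curve as the first time at which the estimate falls to 0.5 or below. Here is a small Python sketch of that lookup. The helper names and the career lengths are hypothetical, and the estimator is a bare-bones version that assumes every retirement is observed (no censoring), so it is an illustration rather than the code behind the charts.

```python
def kaplan_meier(durations):
    """Bare-bones Kaplan-Meier curve, assuming every event is observed."""
    at_risk = len(durations)
    survival = 1.0
    curve = []
    for t in sorted(set(durations)):
        d = durations.count(t)        # events at time t
        survival *= 1 - d / at_risk
        curve.append((t, survival))
        at_risk -= d
    return curve

def median_survival(curve):
    """First time at which the survival estimate drops to 0.5 or below."""
    for t, s in curve:
        if s <= 0.5:
            return t
    return None  # the curve never reaches 0.5

# Hypothetical career lengths, in games played before retirement.
games_played = [0, 10, 20, 50, 60, 90, 120]
print(median_survival(kaplan_meier(games_played)))
```

If a curve never drops to 0.5, for example because most of the group is still playing, the median is undefined and the helper returns `None`.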

In general, it turns out that Belichick's drafting strategy isn't as flawed as we've come to believe, at least statistically speaking. The median survival time is 50 games for all Belichick picks and 60 for all others, a difference that doesn't amount to a full season.

If we look at picks made only in the 2nd and 3rd rounds - the higher-profile picks for which Belichick has gotten the most criticism - the difference in median survival time widens to 24 games:

But this result is still not statistically significant. The p-value of 0.21 is not below the 0.05 significance threshold at which we would reject the null hypothesis of no difference in survival between the Belichick and non-Belichick picks. When looking at two K-M curves, it is critically important to understand that comparative analysis of the two (and the logrank test) depends upon the whole curve and not upon isolated points like median survival. To illustrate this, we can look further down the curve to see that these Belichick picks have odds of playing more than 100 games that are similar to those of all other second- and third-rounders.

Conclusion

After constructing K-M curves for a couple of other teams and showing my analysis to a Bears fan (who commented: "I didn't realize Belichick got $%!# for his drafting... he's definitely drafted better than us over the timeframe"), I've arrived at the verdict that the dropout of NFL draftees is a league-wide epidemic, despite what the talking heads in Boston have to say.

Outside of the first round, the draft is something of a crapshoot. In the years that Belichick has been at the helm, few teams have drafted better than the rest of the league, and only one has drafted significantly worse:

For further evidence of this phenomenon, look at the top left of the Redskins' curve. See how it starts with a vertical line, from 1.0 to somewhere around 0.9? That means that a sizable proportion of Redskins draftees (21/157, or 13.4%) played a career total of zero games. This is true league-wide as well: in fact, 10.1% of all draft picks since 1994 never saw the field.





Bonus Materials and Next Steps

I think there is a lot more to unpack here. Next, I'd like to merge my dataset with the NFL Combine numbers and some college statistics to better understand the factors that lead to longer careers in the NFL.

I also constructed some position-based and other random Kaplan-Meier curves, but didn't have the time to analyze these in depth.

Division rivals Baltimore and Pittsburgh were two of the teams whose picks survived the longest:

Running backs and quarterbacks retire quickly in the NFL:

... and Jerry Jones may have a knack for drafting after all:



