This article originally appeared in the Rotoworld Draft Guide.

Among fantasy football players, it’s common wisdom to assume that NFL players who suffer injuries — or a particular type of injury — are more likely to suffer injury in the future.

Anecdotal evidence might seem to suggest this is true: see Arian Foster, Ahmad Bradshaw or Matthew Stafford. But we wanted to take a deeper dive into the issue. If a player has a laundry list of injuries on their record, should they expect more? Do the effects of previous injuries subside over time? Do age or experience play a role? How influential are past and expected workload?

Specifically, we wanted to answer the following three questions:

What factors are influential in predicting injuries in an upcoming season?

Which factors are not influential?

What are the highest and lowest injury risk scenarios?

To help answer these questions, we used data from Sports Injury Predictor and NFL Armchair Analysis to construct decision trees for three major injury classes: soft tissue, bone and concussion.

If you’re not familiar with a decision tree, don’t worry. In short, it’s a flowchart-like model that takes a set of input variables, splits their ranges of values in the way that extracts the most information and then gives you a graphical tool on which to base a decision.
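To make the idea of a split that extracts the most information concrete, here is a minimal Python sketch of one split chosen by entropy-based information gain (entropy is one of the two quality measures named in the analyst’s notes at the end). The touches and injury labels are made-up toy numbers, not study data, and the helper names are hypothetical; a full tree repeats this search over every variable and then recursively inside each branch.

```python
from math import log2


def entropy(labels):
    """Shannon entropy (in bits) of a list of class labels, e.g. 1 = injured."""
    n = len(labels)
    if n == 0:
        return 0.0
    result = 0.0
    for cls in set(labels):
        p = labels.count(cls) / n
        result -= p * log2(p)
    return result


def information_gain(values, labels, threshold):
    """Entropy removed by splitting rows into value <= threshold vs. value > threshold."""
    left = [y for x, y in zip(values, labels) if x <= threshold]
    right = [y for x, y in zip(values, labels) if x > threshold]
    n = len(labels)
    weighted = (len(left) / n) * entropy(left) + (len(right) / n) * entropy(right)
    return entropy(labels) - weighted


def best_threshold(values, labels):
    """Try the midpoint between each pair of adjacent distinct values and
    keep the one with the highest information gain."""
    distinct = sorted(set(values))
    candidates = [(a + b) / 2 for a, b in zip(distinct, distinct[1:])]
    return max(candidates, key=lambda t: information_gain(values, labels, t))


# Hypothetical toy data: projected touches per game and whether the player
# was later injured (1) or not (0). Illustration only.
touches = [3, 5, 7, 9, 12, 15, 18, 20]
injured = [0, 0, 0, 0, 1, 0, 1, 1]

split = best_threshold(touches, injured)
print(split, information_gain(touches, injured, split))
```

On this toy data the best cut falls at 10.5 touches per game: every player at or below it stayed healthy, so that side of the split carries no uncertainty at all.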

At each “branch” of the tree, the variables and data become more granular and more defined.

As you narrow the focus and data, the decision tree allows you to make observations otherwise obscure to the untrained eye.

Here are the variables considered for this study:

Career Touches

Season Touches Per Game

Season Snaps Per Game

Age at start of season

BMI

Years Played

# of Past Soft Tissue Injuries

# of Past Bone Injuries

# of Past Concussions

Days Since Last Soft Tissue Injury (Prior to Sept 1 of the upcoming season)

Days Since Last Bone Injury

Days Since Last Concussion

Position
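The “days since last injury” variables are measured against a September 1 cutoff. Here is a minimal sketch of that calculation, assuming each player’s injuries are available as a list of dates; the function name is hypothetical, and because the article doesn’t say how players with no prior injury of a given type were encoded, this version simply returns None for them.

```python
from datetime import date
from typing import Optional


def days_since_last_injury(injury_dates, season_year) -> Optional[int]:
    """Days from the most recent injury before Sept 1 of `season_year`
    to that Sept 1 cutoff. Returns None if there is no earlier injury
    (how the study encoded that case is not stated)."""
    cutoff = date(season_year, 9, 1)
    prior = [d for d in injury_dates if d < cutoff]
    if not prior:
        return None
    return (cutoff - max(prior)).days


# Hypothetical player: injuries in Oct 2014 and Nov 2015, evaluated for 2016.
print(days_since_last_injury([date(2015, 11, 22), date(2014, 10, 5)], 2016))
```

For that hypothetical player, the most recent injury (Nov. 22, 2015) sits 284 days before the 2016 cutoff; the older injury is ignored.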

Instead of wading through the model-building details (which can be read in the analyst’s note at the end of this article), let’s dive into the results.

Note: Upcoming season touches are based on Draft Sharks projections.

Soft Tissue

Soft tissue injuries involve trauma to or overuse of muscles, tendons or ligaments (we also included cartilage). To set a baseline for how concerned we should be about a soft tissue injury, consider the games missed because of one. Our data shows that a soft tissue injury can cause a wide range of games missed, with an average of 3.78 games.

According to the Sports Injury Predictor database — which includes 1,332 individual occurrences of soft tissue injuries — the most common varieties are pulled hamstrings, ankle sprains and torn ACLs. Combined, they account for 22.5% of our recorded soft tissue injuries.

Here are the results from the decision tree highlighting the high (orange) and low (blue) risk scenarios for sustaining a soft tissue injury in the coming season. (True branches are up; False are down.)


Takeaways

What are the most important variables for predicting soft tissue injury in a coming season?

-Number of such injuries in a player’s career prior to the start of a season (Node 0)

-Days since last soft tissue injury (prior to September 1 of the coming season) (Nodes 16, 21)

-Touches per game during the upcoming season (Nodes 1, 3, 17, 24)

What are the highest/lowest risk scenarios (combinations of variables)?

Highest risk:

Node 17: Soft tissue injury in the last 9.5 months

Charcandrick West: Suffered a hamstring strain in Week 11 and missed Week 12.

Node 21: Soft tissue injury in the last 9.5 months and projected for more than 10.5 touches per game

Mark Ingram: Suffered a torn rotator cuff in Week 13 and is projected at 17.2 touches per game.

Node 22: Soft tissue injury in the last 8 months and projected for more than 10.5 touches per game

Matt Forte: Suffered a hamstring injury in late July and is projected at 16.4 touches per game.

Lowest risk:

Node 1: No previous soft tissue injuries

Node 2: No previous soft tissue injuries and projecting less than 8 touches per game

Node 6: No previous soft tissue injuries, projecting less than 8 touches per game and at least 24 years old
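The high-risk nodes (17, 21, 22) and low-risk nodes (1, 2, 6) appear to sit on nested branches, so the scenarios above can be read as a short rule set. The sketch below is one hypothetical reading of them, not the fitted tree itself: it assumes “the last 9.5 months” and “the last 8 months” correspond to day counts converted at 30.44 days per month, treats thresholds as inclusive where the article doesn’t specify, and reports only the scenarios listed here.

```python
from typing import Optional

# Assumption: the article quotes these splits in months; we convert a day
# count using an average month length of 30.44 days.
DAYS_PER_MONTH = 30.44


def soft_tissue_scenario(past_soft_tissue_injuries: int,
                         days_since_last_soft_tissue: Optional[int],
                         projected_touches_per_game: float,
                         age: float) -> str:
    """Return the most specific listed soft tissue scenario that matches,
    or "no listed scenario" if none of them applies."""
    if past_soft_tissue_injuries == 0:
        # Low-risk path: Node 1 -> Node 2 (< 8 touches) -> Node 6 (age >= 24)
        if projected_touches_per_game < 8:
            return "low risk, Node 6" if age >= 24 else "low risk, Node 2"
        return "low risk, Node 1"
    if days_since_last_soft_tissue is not None:
        months = days_since_last_soft_tissue / DAYS_PER_MONTH
        if months <= 9.5:
            # High-risk path: Node 17 -> Node 21 (> 10.5 touches) -> Node 22 (<= 8 months)
            if projected_touches_per_game > 10.5:
                return "high risk, Node 22" if months <= 8 else "high risk, Node 21"
            return "high risk, Node 17"
    return "no listed scenario"


# Hypothetical player: one past strain 200 days before Sept 1, 17 touches/game.
print(soft_tissue_scenario(1, 200, 17.0, 26))
```

That hypothetical player lands in Node 22, the most specific high-risk scenario. A player outside every listed scenario isn’t necessarily average risk; he simply falls on a branch this article doesn’t highlight.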

What variables have little impact and aren’t helpful in judging soft tissue injury risk?

-Years Played in the NFL

-Days since last bone injury or concussion

Bone

The most common types of bone injury are fractures and bruises to the hands, ribs, ankles and feet. Combined, they account for 40.2% of all our recorded bone injuries. The Sports Injury Predictor database includes 350 individual bone injuries. On average, our data shows that a bone injury sidelines a player for 4.28 games, half a game more than the average for soft tissue injuries even though the two classes have equivalent medians.
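Equal medians with unequal means usually point to a skewed distribution: a handful of long absences pull the mean up without moving the middle value. A toy illustration with hypothetical games-missed numbers (not the SIP data):

```python
from statistics import mean, median

# Hypothetical games-missed samples for illustration only, not the SIP data.
soft_tissue_games = [1, 2, 3, 4, 5]
bone_games = [1, 2, 3, 4, 10]  # identical except for one long absence

for label, games in [("soft tissue", soft_tissue_games), ("bone", bone_games)]:
    print(f"{label}: median={median(games)}, mean={mean(games)}")
```

Both samples have a median of 3 games, but the single 10-game absence lifts the second mean to 4.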

Here are the results for the decision tree highlighting the high (orange) and low (blue) risk scenarios for suffering a bone injury in the upcoming season. (True branches are up; False are down.)


Takeaways

What are the most important variables for predicting bone injury in a coming season?

-Days since last soft tissue injury (Nodes 0, 3)

-Days since last bone injury (Node 1)

-Touches per game during the coming season (Nodes 2, 9, 16, 18)

-Age (Nodes 17, 24)

What are the highest/lowest risk scenarios (combinations of variables)?

Highest risk:

Node 2: Bone injury in the last 5 years

Sammy Watkins: Broke a small bone in his foot in early April.

Node 6: Bone injury in the last 5 years and more than 6 touches per game in the coming season

Julian Edelman: Suffered a Jones fracture in Week 10 that required a follow-up procedure after the season. He projects for 6.1 touches per game.

Lowest risk:

Node 17: Touches per game less than 5.5

Node 21: Touches per game less than 5.5 and at least 23 years old

What variables have little impact and aren’t helpful in judging bone injury risk?

-Number of past bone injuries

-Snaps, whether career totals or per game in the coming season

-BMI

-Years Played in the NFL

-Position

Concussion

The Sports Injury Predictor database includes 249 recorded concussions. On average, our data shows that a player misses 1.64 games with a concussion when isolating cases from September through November. We limited the sample to these months so the count reflects the full time missed, since the absence after a late-season concussion can be cut short by the end of the season.

The NFL’s unveiling of a concussion protocol in 2013 deserves a closer look. Before the protocol, players missed an average of 1.79 games. From 2013-2015, though, players were sidelined for an average of 1.46 games.

Theoretically, this may be due to more minor concussions being reported. That isn’t evident from the NFL’s 2015 Injury Data Report, which shows total concussions for 2012 through 2015 of 173, 148, 115 and 182, respectively. However, a more extensive report by Zachary Binney at Football Outsiders suggests that the number of reported concussions (corresponding with increased awareness and concern) actually started rising dramatically around 2009 and plateaued around 2011.

But rule changes (including a ban on tackling with the crown of the helmet and moving up kickoffs) haven’t reduced concussions. While Football Outsiders shows a dip in total concussions from 2013 to 2014, the number spiked to 199 in 2015. Even with advancements in helmet technology and greater awareness of head trauma, the physical nature of football means we’re unlikely to see fewer concussions any time soon.

All this being said, it seems most appropriate to distinguish between pre-2009 concussions and those from 2009 on.

Although the SIP database records quite a few more concussions from 2009 onward, this split shows an uptick in games missed for concussions after awareness and concern increased.

So, what’s relevant and irrelevant for predicting concussions in a coming season? Let’s examine the decision tree. (True branches are up; False are down.)


Takeaways

What are the most important variables for predicting concussions in the coming season?

-Snaps per game (Nodes 2, 6, 14, 15)

-Days since last concussion (Nodes 22, 26)

-Age (Node 9)

What are the highest/lowest risk scenarios (combinations of variables)?

Highest risk:

Node 22: More than 18 snaps per game

Node 23: More than 18 snaps per game and last concussion within the last 2 years

Teddy Bridgewater: Suffered a concussion last November and is projected for 67 snaps per game. (Suffered a season-ending knee injury in late August.)

Node 25: More than 18 snaps per game, last concussion within the last 2 years and BMI greater than 28.3

Latavius Murray: Has actually had two concussions within the last 2 years (Nov 2014 and Nov 2015), has a BMI of 28.6 and is projected for 40 snaps per game.

Lowest risk:

Node 3: No past recorded soft tissue or bone injuries and snaps per game less than 28
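BMI, which drives the Node 25 split at 28.3, is weight divided by height squared; in US units that is 703 × pounds / inches². A minimal sketch with a hypothetical 6-foot, 220-pound player (not one named in this article); the article doesn’t say how its BMI values were computed, so the formula choice here is an assumption.

```python
def bmi_imperial(weight_lb: float, height_in: float) -> float:
    """Body mass index from pounds and inches: 703 * lb / in^2."""
    return 703 * weight_lb / height_in ** 2


# Hypothetical player: 6'0" (72 inches), 220 pounds.
player_bmi = bmi_imperial(220, 72)
print(round(player_bmi, 1), player_bmi > 28.3)
```

That hypothetical player comes out at about 29.8, above the 28.3 cut used in the concussion tree.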

What variables have little impact and aren’t helpful in judging concussion risk?

-Position

-Days Since Last Soft Tissue/Bone Injury

-Years Played in the NFL

In general, the days since the last injury were very important in determining the risk of an upcoming injury. That helps make sense of a Matt Stafford type, who was able to escape the injury cycle and has been relatively healthy since, or an Ahmad Bradshaw, who was never really able to get back to 100%. Opportunity to get injured also, somewhat obviously, plays a big role: the more touches and snaps a player sees, the more likely he is to sustain an injury.

Perhaps surprisingly, wear and tear from years in the league and career touches had relatively little influence, as did position. Most surprising of all is the lack of evidence that a larger number of past injuries predicts future ones. Our decision trees showed only that players with at least one past injury were more prone to another.

Concerned about a particular skill player? As long as you know a few key aspects about the player, you can use the decision trees to judge whether he carries a high or low risk of injury. Use them to back up your concerns about your team’s depth chart or to decide whether a player should move up or down your draft board in the next couple of weeks. And before jumping ship on a player, use them as your grain of salt when you hear analysts and beat writers discussing “injury prone” players.

Analyst Notes on Model Building Process:

Data: Data was sourced from both the Sports Injury Predictor (SIP) database, which is primarily made up of fantasy-relevant skill position players, and the NFL Armchair Analysis datasets. SIP player seasons with soft tissue injuries, bone injuries and/or concussions were identified. For all players in the SIP database, injury-free seasons (no injuries recorded in the SIP database) were filled in using NFL Armchair Analysis data. Additional player seasons were added for skill position players with injury-free careers who played at some point between 2012 and 2015. To remain in the data for analysis, players with injury-free careers needed at least one season with at least 16 touches per game for QBs, 4.5 for RBs, 1.6 for TEs and 2 for WRs. These thresholds were the 10th percentile for each position among players present in the SIP database, and are intended to keep only potentially fantasy-relevant players. The result is a total of 3,452 player seasons: 582 with at least one in-season soft tissue injury, 174 with at least one bone injury and 173 with at least one concussion.
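The inclusion rule for injury-free careers can be sketched as a per-position filter. The thresholds come from the paragraph above; the function names are hypothetical, and because the note doesn’t say which percentile method produced those thresholds, tenth_percentile simply uses the default method of Python’s statistics.quantiles as an assumption.

```python
from statistics import quantiles

# Minimum touches per game (in at least one season) for keeping a player
# with an injury-free career, per the thresholds quoted above.
MIN_TOUCHES_PER_GAME = {"QB": 16.0, "RB": 4.5, "TE": 1.6, "WR": 2.0}


def keep_injury_free_player(position: str, season_touches_per_game: list[float]) -> bool:
    """True if at least one season meets the position's threshold."""
    threshold = MIN_TOUCHES_PER_GAME.get(position)
    if threshold is None:  # only the four skill positions have thresholds
        return False
    return any(t >= threshold for t in season_touches_per_game)


def tenth_percentile(values: list[float]) -> float:
    """First decile cut point using statistics.quantiles' default
    ('exclusive') method; the study's exact method is not stated."""
    return quantiles(values, n=10)[0]


# Hypothetical RB whose best season was 4.5 touches per game: kept.
print(keep_injury_free_player("RB", [3.0, 4.5]))
```

Using “at least” (>=) mirrors the wording of the note; different percentile conventions can shift the computed thresholds slightly.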

Model Building Process: Separate classification decision trees using the same explanatory variables were built in Python for three major in-season injury classes: soft tissue, bone and concussion. For each injury class, we tested over 1,200 model designs with differing combinations of information quality measure (entropy or Gini), maximum tree depth, minimum leaf sample size and minimum sample size for further splitting. Each model design was evaluated with 5-fold stratified cross-validation. In addition to the stratified sampling, minority class weighting was applied to balance the effects of each class on the model quality measures. The final three model designs were chosen to favor low log loss, shallow trees and large minimum sample sizes for both leaves and splits; they were then fit to the full data set. For the purpose of creating usable decision trees to share in this article, a tree depth of 4 was deemed reasonably simple and accurate. Below are the accuracy reports for each decision tree.
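The cross-validation pieces of the model building process can be sketched with the Python standard library: stratified 5-fold assignment, “balanced” minority class weights and a class-weighted log loss for ranking designs. This is a hypothetical sketch under stated assumptions, not the authors’ code: the design-grid values are placeholders, and fitting the trees themselves is omitted.

```python
import itertools
import math
import random
from collections import Counter, defaultdict


def stratified_folds(labels, k=5, seed=0):
    """Assign each row a fold number 0..k-1, dealing each class out
    round-robin after shuffling so every fold keeps the overall class ratio."""
    rng = random.Random(seed)
    rows_by_class = defaultdict(list)
    for row, y in enumerate(labels):
        rows_by_class[y].append(row)
    fold_of = [0] * len(labels)
    for rows in rows_by_class.values():
        rng.shuffle(rows)
        for pos, row in enumerate(rows):
            fold_of[row] = pos % k
    return fold_of


def balanced_class_weights(labels):
    """n_samples / (n_classes * class_count): the rare injured class gets a
    larger weight so each class contributes the same total weight."""
    counts = Counter(labels)
    n, n_classes = len(labels), len(counts)
    return {cls: n / (n_classes * cnt) for cls, cnt in counts.items()}


def weighted_log_loss(labels, injury_probs, class_weights):
    """Class-weighted mean negative log-likelihood of P(injury)."""
    eps = 1e-15
    total = weight_sum = 0.0
    for y, p in zip(labels, injury_probs):
        p = min(max(p, eps), 1 - eps)  # avoid log(0)
        w = class_weights[y]
        total += -w * math.log(p if y == 1 else 1 - p)
        weight_sum += w
    return total / weight_sum


# Hypothetical design grid: the note reports "over 1,200" designs but not the
# values tried, so these ranges are placeholders (2 * 10 * 8 * 8 = 1280).
DESIGNS = list(itertools.product(
    ["entropy", "gini"],                   # information quality measure
    range(2, 12),                          # maximum tree depth
    [5, 10, 20, 40, 80, 120, 160, 320],    # minimum leaf sample size
    [10, 20, 40, 80, 160, 240, 320, 640],  # minimum samples to split a node
))
```

Each design would be trained five times, each time holding out one fold and scoring it with weighted_log_loss, and the five scores averaged. If the authors used scikit-learn (the note says only “Python”), the corresponding pieces there are StratifiedKFold, DecisionTreeClassifier(class_weight="balanced") and GridSearchCV(scoring="neg_log_loss").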

Next Steps: Over the coming months we plan to enhance the SIP dataset to include more injuries. The current dataset of over 2,000 player injuries is heavily focused on players and injuries judged fantasy relevant, a somewhat arbitrary standard. Some issues may exist in the current models because the initial splits may simply be identifying players likely to have recorded injuries rather than their actual injury proneness (particularly in the concussion tree). If this issue is real, I believe it mainly affects the first splits and diminishes toward the leaf nodes.

Further, in future iterations we plan to add more variables related to athletic profile, home field surface, position sub-types, etc. We also want to start controlling for workload by studying injury rates on a per touch/snap basis. Finally, we plan to move away from classifiers and toward models that predict games missed and reductions in production.