Over the course of the semester I’ve been using data from college football as examples in my Quantitative Methods course.

One data set that I found particularly interesting is this map outlining recorded concussions in college football last year. The map is kind of cool, but what’s really interesting is all of the other data they gathered along with it.

So…I downloaded the .kmz, pulled it into ArcMap, converted it into a shapefile, and then pulled out the data table.

I was particularly interested in determining if some positions were more prone to concussions than others.

Here we go…the most straightforward breakdown. With chi-square tests you compare your observed frequencies to an expected frequency distribution – in this case, what you would expect if all 166 concussions in their sample were spread evenly across all of the positions. To test whether any differences are statistically significant, we run Pearson’s chi-square goodness-of-fit test, which returned a significant p-value (aka we reject the null hypothesis that the observed and expected values for each position are the same). It seems that with this setup, defensive backs are over-represented with concussions, whereas tight ends are under-represented.
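For anyone who wants to see the mechanics, here is a minimal sketch of the goodness-of-fit calculation with an even expected distribution. The position groups and the counts are placeholders I made up for illustration (they are NOT the values from the data table), and the function name is just something I picked.

```python
def chi_square_gof(observed, expected_shares=None):
    """Return (statistic, degrees_of_freedom, expected_counts) for Pearson's test."""
    total = sum(observed.values())
    k = len(observed)
    if expected_shares is None:
        # Even spread: every position gets the same share of the total.
        expected_shares = {pos: 1 / k for pos in observed}
    expected = {pos: total * expected_shares[pos] for pos in observed}
    # Pearson's statistic: sum over positions of (observed - expected)^2 / expected.
    statistic = sum(
        (observed[pos] - expected[pos]) ** 2 / expected[pos] for pos in observed
    )
    return statistic, k - 1, expected


# Placeholder counts for illustration only, NOT the real concussion data.
toy_observed = {
    "QB": 10, "RB": 10, "WR": 10, "TE": 2,
    "OL": 10, "DL": 10, "LB": 10, "DB": 18,
}

stat, dof, expected = chi_square_gof(toy_observed)
print(f"chi-square = {stat:.2f} on {dof} degrees of freedom")
```

The statistic is then compared against a chi-square distribution with (number of positions - 1) degrees of freedom to get the p-value. If SciPy is installed, scipy.stats.chisquare should give the same statistic along with the p-value.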

However, something about this bugs me. On any given play, there is usually one tight end and four to five defensive backs. Could the reason they’re “popping” relative to the other positions be that there are simply a lot more defensive backs than tight ends on the field at any one time?

Here’s what it looks like if you adjust the expected proportions to match each position’s share of the players on the field with a pro set offense versus a 4-3 defense. Again, the variation in the distribution of concussions across the positions is statistically significant, except now the positions with the biggest differences between “expected” and “observed” are the quarterbacks (WAAAYY over-represented), followed by the defensive line (WAYYYY under-represented).
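To make the adjustment concrete, here is a rough sketch of how the expected counts fall out of the personnel on the field. I am assuming a pro set means 1 QB, 2 RB, 2 WR, 1 TE, and 5 OL, and a 4-3 means 4 DL, 3 LB, and 4 DB; the exact position groupings behind the numbers above may differ, so treat the dictionary as an assumption.

```python
from fractions import Fraction

# Assumed personnel (my grouping, not necessarily the one used in the data table):
# pro set offense = 1 QB, 2 RB, 2 WR, 1 TE, 5 OL; 4-3 defense = 4 DL, 3 LB, 4 DB.
PRO_SET_VS_43 = {
    "QB": 1, "RB": 2, "WR": 2, "TE": 1, "OL": 5,
    "DL": 4, "LB": 3, "DB": 4,
}


def expected_counts(players_on_field, total_concussions):
    """Split the total across positions in proportion to players on the field."""
    on_field = sum(players_on_field.values())  # 22 for a full play
    return {
        pos: total_concussions * Fraction(n, on_field)
        for pos, n in players_on_field.items()
    }


# 166 is the sample size reported with the map.
expected = expected_counts(PRO_SET_VS_43, 166)
for pos, count in expected.items():
    print(f"{pos}: {float(count):.1f}")
```

Under these assumptions the baseline for quarterbacks is only about 7.5 concussions out of 166, while the defensive line's baseline is about 30, which is why the same raw counts can look so different once the proportions are adjusted.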

I decided to run this one more time, but with a more contemporary reflection of the proportion of offensive and defensive players. In other words, the proportions of a typical spread offense versus a nickel defense. This returns similar results (quarterbacks over-represented and defensive line under-represented) but with running backs now over-represented.
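One plausible reason the running backs shift is that their expected share shrinks when you swap formations. The sketch below compares the two baselines, assuming a spread look of 1 QB, 1 RB, 3 WR, 1 TE, 5 OL against a nickel of 4 DL, 2 LB, 5 DB. Both personnel groupings are my assumptions, since spread and nickel packages vary a lot.

```python
# Assumed personnel groupings (illustrative; real packages vary).
PRO_SET_VS_43 = {
    "QB": 1, "RB": 2, "WR": 2, "TE": 1, "OL": 5,
    "DL": 4, "LB": 3, "DB": 4,
}
SPREAD_VS_NICKEL = {
    "QB": 1, "RB": 1, "WR": 3, "TE": 1, "OL": 5,
    "DL": 4, "LB": 2, "DB": 5,
}


def shares(players_on_field):
    """Convert players per position into each position's share of the field."""
    total = sum(players_on_field.values())
    return {pos: n / total for pos, n in players_on_field.items()}


old = shares(PRO_SET_VS_43)
new = shares(SPREAD_VS_NICKEL)

# Positive shift = the position is expected to absorb more of the concussions.
shift = {pos: new[pos] - old[pos] for pos in old}
for pos, delta in shift.items():
    print(f"{pos}: {old[pos]:.1%} -> {new[pos]:.1%} ({delta:+.1%})")
```

With these groupings the running backs' expected share is cut in half (2 of 22 players down to 1 of 22), so the same observed count sits much further above its baseline, while quarterbacks and the defensive line keep the same expected share.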

What do I think this means? I think it means that players in more high-profile positions (i.e., they get the ball a lot) are more likely to have an obvious concussion that warrants them being held out of a game. The cynic in me would put it another way – these are positions where it’s harder to hide that a player has a concussion.

On the other hand, defensive line is a high-profile position in that 300lb defensive tackles don’t grow on trees, and perhaps the reason they are so under-represented is that they are so valuable on the field that coaches are willing to play them even if they are concussed (not to mention that it might be easier to hide than at a position like quarterback).

Also, only 166 players? Granted, there could be a data collection issue here, but that doesn’t seem like very many players at all, especially given that there are over 1,500 players on SEC rosters alone, and this sample is from ALL of the bowl division conferences.

So, what happens when you look at their data across conferences?

(Well, if you ignore the fact that I somehow added one more to the sample and now we have 167 – I’m tired and I’ve had two beers and I’m doing this in my spare time).

After adjusting for the number of teams in each conference, you’ll see the biggest differences are the Mid-American (over-represented), Sun Belt (under-represented), Independents (over-represented), ACC (over-represented), and Mountain West (over-represented).
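The conference adjustment works the same way as the position one, just with teams per conference as the baseline. Here is a small sketch with made-up conference names, team counts, and concussion counts (none of these are the real numbers), where a ratio above 1 means over-represented and below 1 means under-represented.

```python
def representation(observed, teams):
    """Return {conference: (expected_count, observed/expected ratio)}."""
    total_observed = sum(observed.values())
    total_teams = sum(teams.values())
    result = {}
    for conf, count in observed.items():
        # Expected count if concussions tracked the number of teams exactly.
        expected = total_observed * teams[conf] / total_teams
        result[conf] = (expected, count / expected)
    return result


# Placeholder values for illustration only, NOT the real conference data.
toy_teams = {"Conference A": 12, "Conference B": 10, "Conference C": 8}
toy_observed = {"Conference A": 12, "Conference B": 15, "Conference C": 3}

ratios = representation(toy_observed, toy_teams)
for conf, (expected, ratio) in ratios.items():
    print(f"{conf}: expected {expected:.1f}, observed/expected = {ratio:.2f}")
```

In this toy setup Conference B reports 1.5 times what its team count would predict and Conference C well under half, which is the kind of gap that raises the reporting question.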

Is this an issue of uneven reporting? Or are there some weird biases in who is likely to be diagnosed with a concussion in college football?