In July, I wrote a piece titled “The Rate of Domestic Violence Arrests Among NFL Players,” which has been getting a lot of attention recently — some of it missing the point.

I based the analysis in my article on USA Today’s NFL Arrests Database, combined with data from the Bureau of Justice Statistics’ Arrest Data Analysis Tool and some historical data gleaned from the National Incident-Based Reporting System and a variety of BJS reports on domestic violence. The main points I made were:

For most crimes, NFL players have extremely low arrest rates relative to national averages.

Their relative arrest rate for domestic violence is much higher than for other crimes.

Although the arrest rate for domestic violence may appear low relative to the national average for 25- to 29-year-old men, it is probably high relative to NFL players’ income level (more than $75,000 per year) and poverty rate (0 percent).

But the article has been cited by a number of people to support the proposition that the NFL does not have an unusually high domestic violence rate. While I think this is a fair characterization of my intermediate results — the arrest rate I noted was 55.4 percent of the national average for 25- to 29-year-old men as suggested by the USA Today arrest data and rough number of players in the NFL — it’s misleading when taken out of context.

Let’s be more explicit about the different assumptions that can affect that bottom-line comparison. For that analysis, I generally tried to lean toward assumptions favorable to the NFL, with the intention of showing that, even under those assumptions, the NFL appeared to have a “downright extraordinary” arrest rate for domestic violence.

But there are still a lot of unknowns in the data and a lot of choices to be made about what exactly we’re comparing to what.

Reliability of arrest data

A lot of readers, commenters, emailers, tweeters, media, etc., have questioned the USA Today NFL arrest data. They’re right to be skeptical. There’s a good chance the arrest data is incomplete — particularly when it comes to marginal players who are only attached to the NFL briefly.

When I wrote that piece, I was concerned about both over- and under-inclusion: The pool of NFL players who would pop up in the database might be even larger than the estimate based on roster limits (because some players come and go, and players are frequently dropped and replaced throughout the year), but it might also miss some players whose arrests flew under the radar.

I hand-sampled a number of cases and found that they appeared to include many marginal players with minimal attachment to the league. With the NFL being so intensely followed, I thought the USA Today data set was probably pretty comprehensive.

But some readers have made some good cases for why the arrest count the database produces could be low.

On the pure data-collection level, I’ve corresponded with an enterprising reader who compared the frequency of arrests in the USA Today data for players with more games played vs. those with few games played. He found the first group had a much higher arrest rate. From this, he concluded that the database was probably missing arrests for lesser-known players, and he determined that basing the arrest rate on an assumption of 53 players per team (rather than the 80 players per team I used) was the most accurate approach (only coincidentally corresponding to the number of players on the roster during the year).

His case seemed strong to me but not conclusive: It’s possible that marginally attached players are arrested at a lower rate. For example, marginally attached players may be younger (unsigned rookies) or older (borderline veterans) than typical players, and thus either less likely to have families (younger) or more likely to have aged out of the group most likely to commit domestic violence (older). Additionally, we don’t know what’s driving the NFL’s overall domestic violence arrest rate, and I can imagine plausible scenarios in which regular players are more likely to commit and/or get arrested for the offense.
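To see how much the roster assumption matters on its own, here is a minimal sketch that rescales the 55.4 percent figure from 80 players per team to 53. It assumes a 32-team league (32 times 80 gives the pool of roughly 2,560 players used in the original analysis) and holds the arrest count fixed; the function name is hypothetical, and this is not the reader's actual calculation.

```python
TEAMS = 32  # assumption: 32 teams, so 32 * 80 = 2,560 players


def rescale_for_roster(baseline_ratio, baseline_per_team, new_per_team):
    """Rescale an arrest-rate ratio when the assumed player pool changes.

    With the number of arrests held fixed, the arrest rate is inversely
    proportional to the pool size, so the ratio scales by old_pool / new_pool.
    """
    old_pool = TEAMS * baseline_per_team
    new_pool = TEAMS * new_per_team
    return baseline_ratio * old_pool / new_pool


# 55.4 percent of the national average was computed with 80 players per team.
ratio_53 = rescale_for_roster(0.554, 80, 53)
print(f"{ratio_53:.1%}")
```

Under those assumptions, shrinking the pool to 53 players per team raises the relative domestic violence arrest rate by a factor of 80/53, or about 1.5, before any other adjustment is made.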

Another potential problem, as several readers pointed out, is that virtually any NFL arrest data may understate the equivalent arrest rate in a less privileged population. In other words, NFL players who are involved in domestic violence incidents could be better at avoiding arrests than the general public. Relatedly, it’s possible there have been arrests that were either avoided or kept off the media’s radar because of team and/or league machinations.

Whether any of those possibilities are likely or not, we should be explicit as to how our position on them affects our results.

An appropriate pool for comparison

If we want a bottom-line NFL vs. X number, the pool we use for X is obviously quite meaningful. But it’s difficult to figure out which pool we should be comparing to, and even if we do know what pool we want to use, figuring out its arrest rate (especially for domestic violence crimes) can be quite difficult.

In my article, I primarily compared NFL arrest rates to arrest rates for 25- to 29-year-old men, and then I compared NFL players’ relative arrest rate for domestic violence to their relative arrest rates for other crimes (it’s about four times higher). While we don’t have arrest data broken down by income, we do have such breakdowns for victimization rates (based on BJS survey data). I compared the relative domestic violence victimization rate for people from households making $75,000 or more to both the overall domestic violence victimization rate (it’s 39 percent as high) and the rate for ages 20 to 34 (20 percent as high). It’s impossible to compare this directly to the relative NFL arrest rates with precision, but at least it gives us some benchmark for how income level may affect domestic violence incidents.

In addition to the inherent murkiness of trying to compare across different types of data, there are a few other possible problems with the $75,000 or more per year comparison.

First, NFL players have a number of advantages that your typical member of a household making $75,000 and up each year may not. That’s the highest income group I had data for, but NFL players are typically wealthier than that. NFL players spend a good portion of the year in an extremely structured environment. They have extremely low rates of drug and alcohol abuse (especially relative to arrest rates for drug and alcohol-related crimes), and alcohol and drugs tend to be big risk factors for domestic violence.

On the other hand, NFL players didn’t necessarily have the advantages that a lot of $75,000-and-up earners do. NFL players may be more likely than those earners to have come from difficult backgrounds, or to have experienced or observed abuse in their families, and in general to have missed out on the privileges associated with coming from a wealthier background.

Finally, there are some differences in the data that we don’t know enough about to say what their effect might be, such as:

Are victims from higher-income households more or less likely to make police reports that lead to arrests?

How does the extreme wealth disparity between NFL players and their domestic partners affect the power dynamics that may lead to more or fewer arrests?

Note: None of this has to be the case, and I haven’t studied these factors or their effects on criminality. But they are questions that affect our assumptions, and affect what type of comparison we should be making and how we should interpret it.

Even if we could settle on a perfectly representative pool for comparison, getting even approximate figures for each group is extremely difficult. For example, as I noted in the original article, the BJS’s Intimate Partner Violence reports don’t include breakdowns by income anymore. So we have to make reasonable estimates based on several related numbers. This process has a lot of wiggle room in it as well, so we should be clear about which proxies lead to which results.

Different combinations of assumptions

With so much murkiness in both our data and our aims, the best thing to do is to look at a range of assumptions and see whether there are patterns that are apparent independent of such choices.

Let’s first combine the possible issues with the USA Today data and represent them as a single number, which we’ll call “percentage of arrests captured by USA Today data.” It represents the data’s completeness with regard to actual arrests, as well as arrests that were otherwise avoided.

Likewise, let’s combine the issues about comparison groups into a single percentage representing the bottom-line arrest rate of our comparable population (whatever it might be) relative to our 25- to 29-year-old average. In other words, we’re using one metric to represent each group by our best estimate for its relative arrest rate (which we can compare to benchmarks).

Then we combine these two metrics with the information we have (NFL arrests in the USA Today database, the approximate number of NFL players and arrest rates for the general population), like so:

1) We calculate the known NFL arrest rate and scale it to per 100,000 by taking the NFL arrests per year in the database, multiplying by 100,000 and dividing by the number of NFL players per year (approximately 2,560).
2) We divide this by the “percentage of arrests captured by USA Today data” (by assumption, per above).
3) We gather data on the known national arrest rate for 25- to 29-year-olds, expressed per 100,000.
4) We multiply this by our estimated relative arrest rate of a comparable population (by assumption, per above), since that percentage expresses the comparable population’s rate as a share of the national one.
5) Finally, we calculate the ratio between 2) and 4) and subtract 100 percent; this tells us how our estimated NFL arrest rate compares to the rate we estimate for a comparable population.
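The calculation can be sketched as a small function. This is an illustration under stated assumptions, not the code behind the original charts: the function and parameter names are hypothetical, both assumed percentages are passed as fractions (0.5 for 50 percent), and the national rate is scaled by multiplying it by the relative rate, because that rate was defined above as the comparable population's arrest rate as a share of the national one.

```python
def nfl_vs_comparable(nfl_arrests_per_year, captured_share,
                      national_rate_per_100k, comparable_relative_rate,
                      players_per_year=2560):
    """Return the estimated NFL arrest rate relative to a comparable
    population, minus 1 (0.0 means parity, 0.5 means 50 percent higher).

    captured_share and comparable_relative_rate are the two assumptions,
    expressed as fractions; all names here are illustrative.
    """
    if captured_share <= 0 or comparable_relative_rate <= 0:
        raise ValueError("assumed shares must be positive fractions")

    # Known NFL arrest rate, scaled to per 100,000 players.
    nfl_rate = nfl_arrests_per_year * 100_000 / players_per_year

    # Inflate it to account for arrests the database may have missed.
    nfl_rate_adjusted = nfl_rate / captured_share

    # Scale the national rate for 25- to 29-year-olds to the comparable
    # population (assumption: multiply, given how the relative rate is defined).
    comparable_rate = national_rate_per_100k * comparable_relative_rate

    # Ratio of the two estimates, minus 100 percent.
    return nfl_rate_adjusted / comparable_rate - 1.0
```

Lowering either assumed fraction pushes the result up: a less complete database means more true arrests, and a lower-risk comparison group means a tougher benchmark.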

Now we can chart the result of this calculation as heat maps for given values of the two assumptions: A, the percentage of arrests captured by the USA Today data, and B, the relative arrest rate of a comparable population. Even if we assume extremely incomplete arrest data, the NFL’s overall arrest rate is still very low relative to the national average for its age range. But if we hold the NFL to an extremely high standard, we can still find its arrest rate to be subpar.

I’ve used the same color scheme for both of these (100 percent = white). So it should be obvious that the NFL’s doing much worse with domestic violence arrests than with arrests overall.

Note that the difference between assumptions can be an order of magnitude or more. Under a favorable set of assumptions, the NFL looks better than average; under an unfavorable set of assumptions, it’s doing terribly.
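A rough way to see that spread uses the 55.4 percent domestic violence figure as the starting point, since it is the result when the database is treated as complete and the comparison group is simply 25- to 29-year-old men. The sketch below is illustrative only: the completeness values of 75 and 50 percent are hypothetical, and the 39 and 20 percent relative rates are the victimization ratios for $75,000-plus households cited earlier, borrowed here as stand-ins for relative arrest rates, which they are not.

```python
BASELINE = 0.554  # NFL domestic violence arrest rate as a share of the national rate


def ratio_vs_comparable(captured_share, comparable_relative_rate,
                        baseline=BASELINE):
    """NFL rate as a fraction of the comparable population's rate.

    Dividing by captured_share inflates the NFL rate for missed arrests;
    dividing by comparable_relative_rate lowers the benchmark it is held to.
    """
    return baseline / (captured_share * comparable_relative_rate)


captured_shares = [1.0, 0.75, 0.5]   # hypothetical completeness levels
relative_rates = [1.0, 0.39, 0.20]   # borrowed victimization ratios (proxy only)

for a in captured_shares:
    row = [f"{ratio_vs_comparable(a, b):6.0%}" for b in relative_rates]
    print(f"captured {a:4.0%}: " + " ".join(row))
```

Read each cell against 100 percent as parity. The most favorable corner reproduces the 55.4 percent figure, while the least favorable corner of this particular grid (50 percent captured, 20 percent relative rate) is ten times larger.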

For example, if you compare NFL players only to the national average for 25- to 29-year-old men, and you assume that the USA Today database is pretty much complete, you arrive at the 55.4 percent figure.

On the other hand, if you assume that the NFL’s domestic violence arrest rate should be proportional to the overall arrest rate, you can see that the NFL has a “domestic violence problem,” whether the USA Today data is complete or not. This was essentially the scenario I was leading to in my initial article.