The use of deadly force by police is an extremely controversial issue, especially when it appears unjustified in hindsight. Many claim that certain covariates (race, class, sex, presence of mental illness, etc.) lead to a biased use of force. With that in mind, I will try to quantify “typical” use-of-force scenarios using cluster analysis.

Cluster analysis attempts to find natural groupings (clusters) of data points such that the distance between each data point and the center of its cluster is minimized. Data within each cluster are similar in some sense of the word: they may share features or be close in terms of mathematical distance. The following graph (taken from Wikipedia’s article on cluster analysis) shows the concept of natural grouping:

It is apparent that there are 3 clusters: the red, the blue, and the green. Each point within a cluster is closer to the average value (center) of its own cluster than to the average value of any other cluster.

In applying this to police shootings data, I will thus be using measures of similarity to determine how “close” or similar different use of deadly force events were. The characteristics of these “typical” use of force events (defined as the center of the clusters) should give us some decent insight into police shootings.

The Data

The Washington Post maintains a record of all fatal police shootings. The data begin on January 1, 2015 and contain 1,512 records. For each record, I pulled out whether the victim was armed, their race, their sex, whether they were displaying signs of mental illness, their threat level to the police (attacking, undetermined, or other), and whether the victim was attempting to flee. I assume “other” means not a direct threat to police specifically, but the Post wasn’t very clear on this. I discarded any record that was missing one of these fields, leaving 1,413 observations to analyze.
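The filtering step described above can be sketched in a few lines of Python. This is only an illustration: the field names and the toy records are my own placeholders, not necessarily the Post's actual column names or data.

```python
# Hypothetical field names for the six features used in the analysis.
FEATURES = ["armed", "race", "sex", "mental_illness", "threat_level", "fleeing"]

def complete_records(records):
    """Keep only the records where every feature is present and non-empty."""
    return [
        r for r in records
        if all(r.get(f) not in (None, "") for f in FEATURES)
    ]

# Toy records, invented purely to exercise the filter.
sample = [
    {"armed": "gun", "race": "W", "sex": "M", "mental_illness": "no",
     "threat_level": "attack", "fleeing": "not fleeing"},
    {"armed": "", "race": "B", "sex": "M", "mental_illness": "no",
     "threat_level": "other", "fleeing": "not fleeing"},      # armed status missing
    {"armed": "knife", "race": None, "sex": "F", "mental_illness": "yes",
     "threat_level": "attack", "fleeing": "fleeing"},         # race missing
]

kept = complete_records(sample)
print(len(kept), "of", len(sample), "records kept")
```

Dropping incomplete records is the simplest option. It does assume that the missing values are not concentrated in one particular kind of incident.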

The Clusters

Using the k-modes algorithm, I partitioned the data into 10 clusters:

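These are the 10 most frequent clusters after 200 iterations of the algorithm (where the algorithm was sub-run 5 times and the best clusters of the 5 runs were chosen). For categorical data like these, k-modes measures the distance between two records as the number of features on which they differ, and it uses the most common value (mode) of each feature as the cluster center. A point is assigned to a cluster if it is closer to that cluster’s center than to any other center. In this way, k-modes seeks to minimize the total within-cluster distance.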
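To make the procedure concrete, here is a minimal k-modes sketch in plain Python. It is not the implementation behind the results above (those came from an existing k-modes routine). The function names and the toy data are invented for illustration, and the defaults of 200 iterations and 5 restarts only mirror the settings mentioned above.

```python
import random
from collections import Counter

def hamming(a, b):
    """Number of features on which two records disagree."""
    return sum(x != y for x, y in zip(a, b))

def column_modes(rows):
    """Most common value of each feature across a non-empty list of records."""
    return tuple(Counter(col).most_common(1)[0][0] for col in zip(*rows))

def total_cost(data, centers):
    """Sum over all records of the distance to the nearest center."""
    return sum(min(hamming(p, c) for c in centers) for p in data)

def k_modes(data, k, max_iter=200, n_init=5, seed=0):
    """Run a basic k-modes n_init times and keep the lowest-cost result."""
    rng = random.Random(seed)
    distinct = sorted(set(data))          # requires k <= number of distinct records
    best_cost, best_centers = None, None
    for _ in range(n_init):
        centers = rng.sample(distinct, k)  # random distinct starting centers
        for _ in range(max_iter):
            # Assignment step: each record joins its nearest center.
            groups = [[] for _ in centers]
            for p in data:
                nearest = min(range(k), key=lambda i: hamming(p, centers[i]))
                groups[nearest].append(p)
            # Update step: each center becomes the per-feature mode of its group.
            new_centers = [column_modes(g) if g else c
                           for g, c in zip(groups, centers)]
            if new_centers == centers:     # converged
                break
            centers = new_centers
        cost = total_cost(data, centers)
        if best_cost is None or cost < best_cost:
            best_cost, best_centers = cost, centers
    return best_centers, best_cost

# Toy data (armed, sex, threat level), invented for illustration only.
A = ("gun", "M", "attack")
B = ("knife", "F", "other")
C = ("gun", "M", "other")
data = [A] * 5 + [B] * 4 + [C]

centers, cost = k_modes(data, k=2)
print(centers, cost)
```

In this toy set the lone record C differs from A on a single feature, so it is absorbed into A’s cluster rather than getting a center of its own. The same mechanism explains why rare combinations in the real data do not appear as clusters.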

Very quickly, we see that the “typical” police shooting has an armed victim. That victim is almost always male (in the 1,413 records, only 63 were women). In 20% of the clusters, there was evidence of mental illness in the victim. In 30% of clusters, the victim had a threat level of “other.” Again, I’m not sure what the Post meant by this, but it either means that the victim posed no threat or was not directly attacking police. Some suspects did flee, and others didn’t.

In terms of race, White, Black, and Hispanic men were all targets of shootings. The largest cluster, with almost 400 records, was White men who attacked police, showed no signs of mental illness, and didn’t flee. The next largest, with 240 records, was Black men who were attacking police, didn’t flee, and didn’t show signs of mental illness. The smallest cluster (perhaps the furthest away from the others) was men of “other” race who had a gun, were attacking police, and were not fleeing. There were only 9 records in this cluster.

Despite the media attention on the “unarmed Black man who was not resisting, not fleeing, and not mentally ill,” that cluster was not one of the top 10 most frequent. In fact, it was the 15th most frequent cluster, occurring 18% of the time. This, I would speculate, is because these incidents are so far outside the norm that making them into a cluster actually increases the overall error. If many other records are closer to a cluster that is sacrificed to make one for unarmed, non-resisting, mentally fit, non-fleeing Black men, then the total within-cluster distance (which the algorithm seeks to minimize) increases.

Indeed, of the 1,413 records, only 6 are associated with unarmed, non-resisting, mentally fit, non-fleeing Black men (for reference, 21 records were for unarmed, non-resisting, mentally fit, non-fleeing men in general). These events are incredibly rare, which is likely why they receive so much media attention, but they absolutely are not the “typical” case of police shooting.

Why 10 Clusters?

So, why did I choose 10 clusters? Surely every individual police shooting holds valuable information? Across our 6 features, the data contain 322 unique combinations of armament, race, sex, mental illness, threat to police, and fleeing. So, to get zero error (zero within-cluster distance), we would need 322 clusters, one per combination. This would tell us almost nothing about typical police shootings.
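Counting unique combinations is simple once each record is represented as a tuple of its feature values. The sketch below uses a handful of invented records, not the real data, and the variable names are my own.

```python
from collections import Counter

# Toy records: (armed, race, sex, mental illness, threat level, fleeing).
# These values are invented for illustration only.
records = [
    ("gun", "W", "M", False, "attack", "not fleeing"),
    ("gun", "W", "M", False, "attack", "not fleeing"),
    ("knife", "B", "M", True, "other", "not fleeing"),
    ("unarmed", "H", "F", False, "undetermined", "fleeing"),
    ("knife", "B", "M", True, "other", "not fleeing"),
]

combo_counts = Counter(records)   # how often each exact combination occurs
n_unique = len(combo_counts)      # 3 distinct combinations in this toy list

print(n_unique)
for combo, n in combo_counts.most_common():
    print(n, combo)
```

With one cluster center per distinct combination, every record sits exactly on its center, which is why the within-cluster distance drops to zero at that point.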

In statistics, it is often the case that we want to simplify our models. We may want to use fewer predictor variables, have fewer cluster centers, etc. The simpler our models, the more generally applicable our results are and the less risk of overfitting we run. So, we want to use as few cluster centers as possible. But the fewer cluster centers we use, the more information we lose and the more within-cluster distance we accumulate. Thus, in statistical terms, we want to keep the number of parameters small while keeping the clusters as informative as possible.

One way to do this heuristically is with the Elbow method. The following graph is the total within-cluster error for different numbers of clusters:

With very few clusters, the results are more general, but the within-cluster distance is very large. The within-cluster distance falls rapidly up to 10 clusters and only slowly after that. Thus, 10 is an “elbow,” and we use 10 clusters.
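An elbow curve like the one described above can be produced by running the clustering for each candidate number of clusters and recording the best total within-cluster distance. The sketch below uses a compact k-modes written inline so it runs on its own. The function name, the toy data, and the defaults are assumptions for illustration, not the code that produced the graph.

```python
import random
from collections import Counter

def hamming(a, b):
    """Number of features on which two records disagree."""
    return sum(x != y for x, y in zip(a, b))

def cost_for_k(data, k, n_init=5, max_iter=200, seed=0):
    """Best total within-cluster distance over n_init runs of a basic k-modes."""
    rng = random.Random(seed)
    distinct = sorted(set(data))          # requires k <= number of distinct records
    best = None
    for _ in range(n_init):
        centers = rng.sample(distinct, k)
        for _ in range(max_iter):
            # Assign every record to its nearest center.
            groups = [[] for _ in centers]
            for p in data:
                i = min(range(k), key=lambda j: hamming(p, centers[j]))
                groups[i].append(p)
            # Move each center to the per-feature mode of its group.
            updated = [
                tuple(Counter(col).most_common(1)[0][0] for col in zip(*g)) if g else c
                for g, c in zip(groups, centers)
            ]
            if updated == centers:
                break
            centers = updated
        cost = sum(min(hamming(p, c) for c in centers) for p in data)
        best = cost if best is None else min(best, cost)
    return best

# Toy data (armed, sex, threat level), invented for illustration only.
data = (
    [("gun", "M", "attack")] * 6
    + [("knife", "M", "attack")] * 3
    + [("unarmed", "F", "other")] * 4
    + [("gun", "F", "other")]
)

# One cost per candidate number of clusters, up to the number of distinct records.
costs = {k: cost_for_k(data, k) for k in range(1, len(set(data)) + 1)}
for k, c in costs.items():
    print(k, c)
```

Plotting these costs against k gives the elbow graph. Choosing the elbow is still a visual judgment call, so the sketch stops at printing the curve rather than selecting a value automatically.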

Is this method perfect? Absolutely not. It is entirely legitimate to have 5 clusters or 12 clusters or 20 clusters or even 322 clusters. Those choices may not be “optimal” by this criterion, but the number of clusters you use should be governed by the problem at hand. Plus, this is just a heuristic approach; there are more formal ways of choosing the number of clusters, such as the gap statistic. Regardless, I think 10 is a pretty good number of clusters given our goal of maximizing generality while minimizing error.

What This Analysis Does and Does Not Mean

This analysis does not mean that any race is or is not shot more or less than others. It does not indicate whether disparities in police use of deadly force are racial or based on the presence of mental illness. It also does not say that there are only 10 typical use of force cases. It doesn’t say anything about the typical victims of police shootings outside of their race, sex, armed status, mental status, threat level to the police, and whether they were fleeing.

Just because something is a “typical” use of force scenario does not mean that every scenario conforms to one of the 10 clusters. In some cases, it may be informative to use all 322 unique clusters. As we saw above, shootings of unarmed, non-fleeing, mentally fit, and non-threatening (Black) men are so far out of the norm and so infrequent that they don’t get a cluster. It is not a common use of force case, but it does still happen.

So, what can we say about these results? These are the 10 use of force scenarios that minimize the intra-cluster distance. That is, these are the 10 cluster centers which partition the data in the most informative way. In some sense, they are the most “typical” use of force cases, but that does not mean that these are the 10 most common use of force scenarios.

I do think that this analysis suggests that most use of force scenarios involve armed suspects. Frequently, they involve a suspect attacking the police (although what counts as an “attack” depends on the Washington Post’s definition). The suspect often is fleeing or shows signs of mental illness.

Either way, the use of force by police is a complicated issue. Don’t let people with black-and-white views of the world take advantage of you by manipulating statistics to their favor. This applies to both ends of the spectrum – don’t look for data to justify your world view. Let your world view be governed by the available data and be willing to change your mind when presented with high quality evidence.