Safest to do what they say (Image: Robert Nickelsberg/Getty)

Police shootings in the US frequently make local and national headlines, but there is no government-run database of the fatalities. So people are turning to machine learning to make sure no shooting by the police goes unrecorded.

Working out how many people in the US have been killed as a result of police action isn’t easy. Where official records do exist, they are incomplete. A 2016 study by the US Bureau of Justice Statistics found that police records cover half as many police-related deaths as media reports. The violent-death reporting system run by the Centers for Disease Control and Prevention collects data in only 42 of the 50 states, and doesn’t focus on police shootings.

So far the gaps have been filled in by newspaper-led projects, notably that of The Washington Post, which in 2015 logged more than twice as many fatal police shootings as the FBI did.


The most comprehensive database – used by The Washington Post – is thought to be Fatal Encounters, a website that aims to list every person killed in an interaction with police in the US since 2000. It is run by D. Brian Burghart, a Nevada-based journalist who, over the past five years, has enlisted a network of online activists to gather the details of over 22,000 fatalities dating back to 1 January 2000.

Jigsaw puzzle

Burghart scours local news reports, petitions local law enforcement for information and pieces together details from other sources to build his database, which he estimates is over 90 per cent complete. Making a record of every single police-related fatality is extremely labour-intensive, he says.

To make this task easier, Brendan O’Connor at the University of Massachusetts Amherst and his colleagues have created a system that automatically scrapes news reports for mentions of police shootings.

“News articles cover a large amount of these fatalities,” says O’Connor. His team’s algorithm analyses sentences in news reports to try to extract the names of people who have been killed. They trained the system by having it analyse news articles from 2016 that included certain police- and fatality-related keywords. An algorithm extracted sentences that it thought referred to police shootings, and then compared these sentences to the names in Burghart’s database to find out which ones referred to people who had actually been shot by police.

The idea was that the system would gradually learn to recognise sentences that referred to recent police shootings, and to ignore those that referred to historic shootings or shootings that weren’t fatal.
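The labelling step described above, in which names pulled from keyword-matched sentences are checked against an existing database, could be sketched roughly as follows. This is an illustration only, not the team’s code: the keyword lists, the capitalised-words name pattern and the function names are all assumptions, and the people named in the example are invented.

```python
import re

# Hypothetical keyword lists; the article does not say which keywords were used.
POLICE_TERMS = {"police", "officer", "officers", "deputy", "sheriff", "trooper"}
FATALITY_TERMS = {"killed", "shot", "died", "fatally", "dead"}

# Crude stand-in for a real name recogniser: two or more capitalised words in a row.
NAME_PATTERN = re.compile(r"\b[A-Z][a-z]+(?: [A-Z][a-z]+)+\b")


def candidate_sentences(article_text):
    """Yield sentences that contain both a police term and a fatality term."""
    for sentence in re.split(r"(?<=[.!?])\s+", article_text):
        words = set(re.findall(r"[a-z]+", sentence.lower()))
        if words & POLICE_TERMS and words & FATALITY_TERMS:
            yield sentence


def label_mentions(article_text, known_victims):
    """Return (name, sentence, label) triples for every name in a candidate sentence.

    The label is True when the name appears in the reference database,
    which gives a classifier positive and negative examples to learn from.
    """
    known = {name.lower() for name in known_victims}
    examples = []
    for sentence in candidate_sentences(article_text):
        for name in NAME_PATTERN.findall(sentence):
            examples.append((name, sentence, name.lower() in known))
    return examples


# Invented example text and names, for illustration only.
text = (
    "Police shot and killed John Smith on Tuesday. "
    "A deputy said Maria Lopez was shot in the arm and survived. "
    "The weather was sunny."
)
for name, sentence, label in label_mentions(text, ["John Smith"]):
    print(name, label)
```

Matching against a database in this way yields training labels without anyone having to annotate sentences by hand, at the cost of some noise, for instance when a name in the database appears in an unrelated sentence.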

The system managed to identify 57 per cent of the people in Burghart’s database who were shot by police between September and December 2016.
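A figure like this is a recall score: the share of entries in a reference list that a system manages to find. A minimal sketch of the calculation, using a hypothetical function name and placeholder names rather than real data:

```python
def recall(found_names, reference_names):
    """Fraction of reference names that also appear among the found names."""
    reference = {name.lower() for name in reference_names}
    found = {name.lower() for name in found_names}
    if not reference:
        raise ValueError("reference list is empty")
    return len(found & reference) / len(reference)


# Placeholder names only; these are not drawn from any real database.
reference = ["Person A", "Person B", "Person C", "Person D"]
found = ["Person A", "Person C", "Person X"]
print(recall(found, reference))
```

Recall says nothing about false alarms such as "Person X" here, so it is usually read alongside a measure of how many of the system’s suggestions were correct.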

Unknown unknowns

O’Connor is hoping to improve the results by feeding the algorithm a greater range of news sites and perhaps even social media data. “We want to pull in more data from more sources,” he says.

Burghart is trialling the system to help build his own database, but he flags up a weakness in it. If a shooting isn’t covered by the press, or reported by the authorities, then it will remain out of reach of any algorithm or researcher, says Burghart. “There’s no way to know what doesn’t exist,” he says.

Beyond that, shootings are only part of the picture. Taking into account other police-related fatalities – including suicides, chase deaths and taserings – Burghart’s figure of total police-related deaths so far in 2017 adds about 500 to The Washington Post’s headcount.

Despite the system’s narrow focus on shootings, Burghart says machine learning will eventually mean that he doesn’t have to spend hours every week digging through past news reports to pull together records of police-related deaths. “I’m very hopeful to be made obsolete by machine learning,” he says.

Journal reference: Proceedings of the 2017 Conference on Empirical Methods in Natural Language Processing, aclweb.org/anthology/D17-1163
