Over the last decade, Mexican drug cartels have been fighting each other—and corrupt police and military units—for control of the lucrative drug trade, plunging the country into chaos. Outsiders might think of Mexico as sunny and tequila-soaked, but beyond the beach resorts of Cancun and Mazatlan there hides a grimmer tale: levels of murder, rape, and kidnapping are hitting levels rarely seen outside hotspots in Africa, Asia, and South America.

So grim the tale, when 43 college students went missing in Mexico's southern state of Guerrero in 2014, investigators found 129 other bodies in 60 fosas clandestinas (mass graves) before stumbling on badly burned remains in a mass grave they think might—possibly, maybe—contain what’s left of the missing students. Mexico's attorney general says the local mayor conspired with the town’s police force to abduct the students and turn them over to a local gang, who murdered them and burned the bodies, and dumped the charred corpses into a river.

And just a few weeks ago, in March, 2017, Mexican press reported that more than 250 human skulls have been found so far—excavations are not complete—in a mass grave in the state of Veracruz.

The situation is so bad that, after six decades of gains, the average life expectancy in Mexico has decreased, according to recent research.

Now one group of human rights researchers believe they have a high-tech way of discovering mass graves: machine learning. They hope that finding many more burial sites with clever software will expose the true cost of the conflict, help grieving family members find peace, and perhaps stymie the tide of desaparecidos—the disappeared.

Murder and mayhem in Mexico

Jorge Ruiz Reyes is a grave hunter, a bespectacled, bearded human rights researcher at the Universidad Iberoamericana in Mexico City who radiates intensity. He speaks in rapid-fire Spanish. More than 30,000 people are missing in Mexico, he says, many dead and buried in mass graves. He wants to find those graves.

"The war on drugs has resulted in a high level of murders and disappearances, and other violations of human rights in Mexico," he tells Ars. "The official numbers don't tell the whole story."

Official numbers are a conflicting hodge-podge of what many consider to be under-reported horrors. According to the attorney general's office in Mexico, 662 bodies were found in 201 hidden graves between 2005 and 2015. Mexico's armed forces counted 534 bodies in 246 graves between 2001 and 2015.

But, Ruiz Reyes says, the Mexican press frequently report on cases not included in the official numbers, including the 43 missing college students, and the mass grave found recently in Veracruz.

Officially more than 30,000 people are missing in Mexico. That tally is mostly from people who went missing last decade, Ruiz Reyes says, and only includes people whose families have filed a missing person report—which, given the high level of violence in the country, many are afraid to do. One Mexican NGO disputes the official tally, though, saying the number could be as high as 300,000.

"This is what's happening in Mexico," Ruiz Reyes says. "The number of disappeared is very high."

It's impossible to know how many of those missing people lie undiscovered in a mass grave, he says. That's why, with the help of the Human Rights Data Analysis Group (HRDAG) in San Francisco, he's using a machine learning model to predict where grieving family members are most likely to find the bodies of their loved ones.

Grave hunting, meet machine learning

Patrick Ball, director of research at HRDAG, working with Ruiz Reyes and other researchers at the Universidad Iberoamericana and Data Cívica in Mexico City, analysed data from Mexico's 2,547 municipios ("counties"), and built a machine learning model that predicts where similar graves are likely to be found.

"If we know which counties have some discovered mass graves, we can predict which other counties have mass graves that are similar," he tells Ars.

The model uses obvious predictor variables, Ball says, such as whether or not a drug lab has been busted in that county, or if the county borders the United States, or the ocean, but also includes less-obvious predictor variables such as the percentage of the county that is mountainous, the presence of highways, and the academic results of primary and secondary school students in the county.

Using records from 2013, Ball mixed together data from counties with observed graves, and from counties researchers were confident did not have graves, and then randomly divided the resulting set into two groups. He used one half of the data to train a random forest algorithm, he says, then used the resulting model to see if it could predict the second half of the data.

The random forest method uses random subsets of the training data to create a "forest" of decision "trees," and then averages the predictions of the individual trees. The algorithm takes the predictor variables for each county and assigns them a numerical score—the likelihood that the county contains a mass grave similar to those previously seen—much like a spam filter.

"How good a job does the model do at predicting the data that it's never seen before?" Ball asks. "Our answer is it perfectly predicted the counties that have graves and those that didn't. It had no false positives and no false negatives."

Alarmed at such perfect results, Ball says he repeated the experiment 500 times, each time splitting the data randomly into testing and training data.

The machine learning model perfectly predicted the testing data every time, he says. "So the result is not an accident of how the testing and training data were split, but it still might be an artifact of how really really different the kinds of counties are."

Ball repeated the experiment using 2014 data, and got the same results. "There's a really big difference between the counties that have have graves and the ones that don't," he says. "It's easy for the model to disambiguate them."

"This is," he admits, "not a huge surprise to people who are looking at patterns of violence in Mexico."

The real value of the machine learning model is not just confirming existing intuition among grave hunters, he says, but guiding the search strategy by predicting which counties are most likely to have hidden graves.

"It would be interesting to know which of those counties are most like the counties that have graves—'like' in this complicated statistical sense," Ball explains. "This would essentially provide you with a ranking of 2200 counties in terms of the ones most like the ones where we've found mass graves."