They also need to be able to show their work in a way that lawyers and judges can understand. Many algorithms that purport to match people across databases run up against the so-called black box problem: they may make statistically sound decisions, but they can't easily explain how they made them. In a recent Supreme Court hearing over partisan gerrymandering in Wisconsin, Chief Justice John Roberts dismissed research-backed methods for measuring gerrymandering as "sociological gobbledygook." Hersh and Ansolabehere wanted to develop a tool that could be easily understood.

'The more we can agree on methods that are easy to explain, the better off we are.' Eitan Hersh, Tufts University

So, working with the Department of Justice, the researchers set out to determine whether they could match people on the voter rolls with their corresponding records in ID databases using just a few basic details. To do that, they developed an algorithm that scanned Texas’s voter rolls and compared them against state and federal databases of driver's licenses, state IDs, and concealed handgun permits, among other acceptable forms of identification. It compared records on four fields (address, date of birth, gender, and name) to see whether, for instance, a combination of address, gender, and name would predict a match as accurately as a combination of date of birth, gender, and name.
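To make the "three of four fields" idea concrete, here is a minimal Python sketch. It is not the researchers' code: the `Record` layout, the crude `normalize` step, and the function names are hypothetical, and it uses exact matching after light cleanup, whereas a real system would also need to handle typos, nicknames, and multiple candidate matches. The sketch indexes every ID record under each of its four possible three-field combinations, so a voter whose address changed can still be found through date of birth, gender, and name.

```python
from dataclasses import dataclass
from itertools import combinations

# The four identifiers discussed in the article.
FIELDS = ("address", "dob", "gender", "name")


@dataclass(frozen=True)
class Record:
    # Hypothetical record layout; real voter and ID files differ.
    name: str
    dob: str      # ISO date, e.g. "1970-01-31"
    gender: str   # e.g. "M" or "F"
    address: str


def normalize(value: str) -> str:
    # Crude cleanup: lowercase and collapse runs of whitespace.
    return " ".join(value.lower().split())


def keys(record: Record):
    # One key per three-field combination (four combinations in total).
    # The field names are part of the key so values from different
    # combinations can never collide.
    for combo in combinations(FIELDS, 3):
        yield combo, tuple(normalize(getattr(record, f)) for f in combo)


def build_index(id_records):
    # Map each three-field key to the positions of ID records that carry it.
    index = {}
    for i, rec in enumerate(id_records):
        for key in keys(rec):
            index.setdefault(key, set()).add(i)
    return index


def match(voter: Record, index) -> set:
    # Positions of ID records agreeing with the voter on at least 3 of 4 fields.
    hits = set()
    for key in keys(voter):
        hits |= index.get(key, set())
    return hits


# Usage with made-up records: the voter has moved, so only the address differs.
ids = [
    Record("Jane Doe", "1970-01-31", "F", "12 Elm St"),
    Record("John Roe", "1965-06-02", "M", "4 Oak Ave"),
]
index = build_index(ids)
voter = Record("jane  doe", "1970-01-31", "F", "99 Pine Rd")
print(match(voter, index))  # {0}
```

Indexing on combinations keeps lookups cheap: each voter triggers four dictionary lookups instead of a comparison against every ID record.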

To check their results, the researchers relied on a subset of the voter data that contained Social Security numbers. Those records effectively served as the algorithm’s answer key. They ultimately found that 98 percent of the records that could be matched using Social Security numbers could also be matched using any three of the four key identifiers: address, date of birth, gender, and name.
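The answer-key check can be sketched the same way. In this hypothetical Python version, records are plain dictionaries, the SSN link is treated as ground truth, and the score is the share of SSN-linked pairs that also agree on at least three of the four identifiers. That is the condition a three-of-four matcher needs in order to find them. The function names and toy records are illustrative assumptions, and the sketch measures only missed matches, not false ones.

```python
# The four identifiers discussed in the article.
FIELDS = ("address", "dob", "gender", "name")


def agree_count(a: dict, b: dict) -> int:
    # Number of the four identifiers on which two records agree exactly
    # (after trimming whitespace and lowercasing).
    return sum(a[f].strip().lower() == b[f].strip().lower() for f in FIELDS)


def recall_against_ssn(voters, id_records) -> float:
    """Share of SSN-linked voter/ID pairs that also agree on >= 3 fields.

    Only records carrying an SSN take part; the SSN link is the answer key.
    """
    by_ssn = {r["ssn"]: r for r in id_records if r.get("ssn")}
    truth = [(v, by_ssn[v["ssn"]]) for v in voters if v.get("ssn") in by_ssn]
    if not truth:
        return 0.0
    recovered = sum(agree_count(v, r) >= 3 for v, r in truth)
    return recovered / len(truth)


# Made-up records for illustration only.
voters = [
    {"ssn": "111", "name": "Jane Doe", "dob": "1970-01-31", "gender": "F", "address": "99 Pine Rd"},
    {"ssn": "222", "name": "John Roe", "dob": "1965-06-02", "gender": "M", "address": "4 Oak Ave"},
    {"ssn": "333", "name": "Ann Lee", "dob": "1980-03-03", "gender": "F", "address": "1 Main St"},
    {"ssn": None, "name": "Pat Poe", "dob": "1990-09-09", "gender": "M", "address": "7 Ash Ct"},
]
ids = [
    {"ssn": "111", "name": "Jane Doe", "dob": "1970-01-31", "gender": "F", "address": "12 Elm St"},
    {"ssn": "222", "name": "John Roe", "dob": "1965-06-02", "gender": "M", "address": "4 Oak Ave"},
    {"ssn": "333", "name": "Ann Li", "dob": "1980-03-04", "gender": "F", "address": "1 Main St"},
]
print(round(recall_against_ssn(voters, ids), 3))  # 0.667
```

Here two of the three SSN-linked pairs agree on three or more fields. The voter without an SSN is left out of the check entirely, just as records without Social Security numbers could not serve as part of the answer key.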

“This combination is as good as a Social Security number,” Hersh says.

That high accuracy rate is essential in court, says Charles Stewart III, a political scientist at the Massachusetts Institute of Technology who has served as an expert witness in a case against South Carolina's voter ID law. Most database-matching algorithms are used in low-stakes settings such as advertising, he explains, where companies want to target customers across a range of platforms. If they target the wrong customer, at worst they've lost a marginal amount of money. In court, a faulty match could mean losing the case altogether. "There is really no room for error," Stewart says, noting that an algorithm's reliability has to hold up before the bench. "If the judge doesn't [believe the records] get matched properly, for instance, you might as well have not done anything."

Deep Impact

Once Hersh and Ansolabehere were confident they had properly matched registered voters to their ID records, they used a commercial tool called Catalist to predict each voter's race. That tool analyzes names to estimate how likely a given name is to be associated with one race or another, and it also accounts for the demographics of the Census block where a given voter lives. Using this tool, the researchers confirmed what voting rights advocates already knew: black voters are more likely to lack adequate identification under voter ID laws. According to the study, 3.6 percent of registered white voters had no match in any state or federal ID database. By contrast, 7.5 percent of registered black voters were missing from those databases.
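Catalist's actual model is commercial and isn't described here. As an illustration of how name-based and neighborhood-based evidence can be combined, the sketch below uses a common textbook approach (Bayesian Improved Surname Geocoding), assuming that surname and location are independent once race is known. Under that assumption, P(race | surname, block) is proportional to P(race | surname) × P(race | block) / P(race). The function name and every probability in the example are hypothetical.

```python
def combine(surname_probs: dict, block_probs: dict, population_probs: dict) -> dict:
    """Posterior P(race | surname, block), assuming surname and block are
    conditionally independent given race:

        P(r | s, g)  is proportional to  P(r | s) * P(r | g) / P(r)
    """
    unnormalized = {
        r: surname_probs[r] * block_probs[r] / population_probs[r]
        for r in surname_probs
    }
    total = sum(unnormalized.values())
    return {r: v / total for r, v in unnormalized.items()}


# Hypothetical inputs, not real data:
surname = {"white": 0.5, "black": 0.4, "other": 0.1}     # P(race | surname)
block = {"white": 0.2, "black": 0.7, "other": 0.1}       # P(race | Census block)
population = {"white": 0.6, "black": 0.3, "other": 0.1}  # P(race) overall

posterior = combine(surname, block, population)
print({r: round(p, 3) for r, p in posterior.items()})
# {'white': 0.139, 'black': 0.778, 'other': 0.083}
```

Dividing by the overall population share keeps the statewide base rate from being counted twice, since both the surname estimate and the block estimate already reflect it.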

The analysis reveals a clear and disturbing racial disparity in access to the ballot. But Hersh says it also shows that voter ID laws affect a relatively small share of the population. Across all registered voters in Texas, the researchers found that 4.5 percent lack proper identification. Among registered voters who actually showed up at the polls in 2012, the figure is 1.5 percent.

"You're down to a small percentage of the population that doesn't have an ID," says Hersh. That's one reason why, despite Alabama's restrictive voter ID law, black turnout in the recent Senate election still exceeded expectations. But while the percentages may sound small, that 4.5 percent represents 608,470 registered Texas voters who could be disenfranchised.