Brendan Max and two of his colleagues in the Cook County, Illinois, public defender’s office got some good news and some bad news in the spring of 2018. Actually, it was the same news: The three lawyers had nearly aced a proficiency test designed for fingerprint examiners. None of them had any training or real expertise in latent fingerprint analysis — the practice of trying to match a fingerprint collected from a crime scene to the known print of a suspect — aside from what they’d learned during their years working criminal defense. So, nominally, it was good news: Each of them had correctly identified all but one of the fingerprints contained in the test. But they were certain this was not a good thing. If they could so easily pass the test with zero training to guide their analysis, what did that say about the test’s ability to accurately assess the competency of any fingerprint examiner, including the six employed by the Chicago Police Department, whose work they regularly had to vet when defending clients? Acing the tests, which the CPD examiners regularly did, allowed them to bolster their credibility in court regarding their conclusions about matches between a crime scene print and a criminal defendant. But the lawyers also knew from cross-examinations that these same analysts appeared to know frighteningly little about their discipline, and they worked in a lab setting that had none of the written policies or quality assurance practices designed to keep forensic work well-documented and reliable. As proficiency testing has become ubiquitous in the forensic sciences — according to federal data, 98 percent of practitioners working in accredited public crime labs are proficiency tested — the disconnect Max and his colleagues face in Chicago raises a series of sobering questions. Not least among them: What, if anything, do proficiency tests say about the abilities of the forensic examiners taking them? 
Startling False Positive Rates

The release of a groundbreaking report from the National Academy of Sciences in 2009 threw a harsh light on the state of forensic science. Aside from DNA analysis, the majority of the forensic disciplines lacked meaningful scientific underpinning, the report concluded. This was true for all of the so-called pattern-matching disciplines, where a practitioner takes a piece of crime scene evidence and attempts to match it to a pattern known to be associated with a suspect, a process that is highly subjective. This includes fingerprint, or friction ridge, analysis, along with things like handwriting analysis and bite-mark matching.

Friction ridge analysis rests on a deceptively simple foundation: that human fingerprints are unique — an individuality that persists — and that this uniqueness can be transferred with fidelity to a substrate, like glass or paper. While experts have long said that no two prints are the same, there’s no proof that is the case. Moreover, crime scene prints are often distorted — or, “noisy” — partial prints that may be smudged or otherwise degraded, which is where errors occur, as in the infamous case of Brandon Mayfield, the Oregon lawyer who was wrongly suspected of being involved in the 2004 Madrid train bombing based on the FBI’s faulty fingerprint analysis.

Implicated in the Mayfield fiasco was a common issue in fingerprint analysis known as a “close non-match.” This is particularly problematic with analyses aided by the Automated Fingerprint Identification System, a database of millions of prints maintained by the FBI. When a latent print is pulled off a piece of evidence — in the Mayfield case, it was lifted from a bag of detonators — but there is no suspect already identified for comparison purposes, an examiner can feed the crime scene print into the system, which generates a list of potential matches based on similar characteristics.
While it may or may not be true that no two prints are exactly alike, there are plenty of very similar prints.

The National Academy of Sciences report made a host of recommendations for shoring up the validity and reliability of forensic practices. While some practitioners have effectively stuck their heads in the sand, a number in the fingerprint community have heeded the calls for reform by investigating what leads to errors, trying to devise error rates for the discipline, and conducting research into objective techniques for doing their work. Meanwhile, the academy also made a series of broader recommendations, including that crime labs be accredited and practitioners certified and regularly tested for proficiency. It was amid this broad sweep toward reform that Max, chief of the public defender’s forensic science division, and his colleagues Joseph Cavise and Richard Gutierrez started to get interested in the research on fingerprint analysis. There was the 2012 human factors report, which delved into causes of error in the field, and the 2016 report by the President’s Council of Advisors on Science and Technology, which included details on two studies that had produced startlingly high false-positive rates for latent print examiners; one revealed an error rate as high as 1 in 24. The council concluded that juries should be told about the results of such studies.

“We started reading that research and we thought, ‘Wow, fingerprint evidence is not nearly as strong as has been testified to in the past,’” Max recalls. “We started expecting that our local lab would be aware of this — the groundbreaking research in the field — and that they would start altering how they explained fingerprints.” Chicago police examiners were regularly testifying to things that Max and his colleagues knew were scientifically unsupportable, including that a fingerprint match inculpated a suspect to the “exclusion of all others” in the world. Perhaps, they thought, this would change. But as they continued probing the analysts during cross-examination, they realized that the analysts still believed this kind of categorical testimony was legitimate. In one case, in October 2017, Max questioned a CPD examiner named Thurston Daniels about whether the common method for analyzing prints, known as the ACE-V process, had been scientifically vetted for reliability. “It’s the methodology used by all latent print examiners, so I guess they would assess it as pretty reliable if everybody uses it,” Daniels replied. But where the examiners seemed to know so little about the scientific underpinning of their discipline and the myriad advances in practice, they had at least one accomplishment with which to tout their expertise: They annually aced their proficiency exams. This is a common point on which to build credibility, says Heidi Eldridge, a latent print examiner who is a research scientist at RTI International, an independent, nonprofit research organization. “When you’re trained how to testify, you’re supposed to talk about your degree, you’re supposed to talk about your in-house training, you’re supposed to talk about your proficiency test record,” she said. “It’s the national sort of standard operating way of testifying.” And this is crucial, says Max. 
Judges are supposed to ensure that examiners qualify as experts before allowing their testimony into evidence. “The testimony … in that regard is usually pretty brief, but the one objective thing that examiners offer, usually the only objective indicator that they’re at all qualified … is that they pass proficiency tests.”

Illustration: Mark Pernice for The Intercept

Suspicious Proficiency Rates

The disconnect got Max, Cavise, and Gutierrez wondering what these proficiency tests were all about. They went to the website for Collaborative Testing Services, a Virginia-based company that is the nation’s leader in providing testing materials for forensic practitioners. CTS publicly posts the results of its proficiency exams, including for latent print examiners. Looking through years of results, the lawyers discovered that it wasn’t just the CPD examiners who were acing the tests; it was nearly everyone who took them. The whole situation “made us really suspicious,” Max said. “We looked at the passage rates year in and year out, and they’re all in the mid to high 90s.”

They came up with two hypotheses: one, print examiners “are uniformly amazing,” or two, “the tests are really easy.” They decided to take the test themselves. And they each did really well, getting 11 out of 12 questions right. (All three got the same question wrong.) It was the second hypothesis, they concluded, that was the correct one.

At issue, it seemed, was the type of sample prints contained in the test. They were fairly pristine with lots of details making them suitable for analysis, not the noisy or bloody partial prints one might expect to find at a crime scene. And there were no close non-matches, the kind that confused the experienced FBI examiners in the Mayfield case.

In fact, the lawyers had to go back to 1995 to find a test that had tripped up a lot of examiners. That year, CTS had included not only a bloody print and one with “tonal reversal” — which can make print furrows appear as ridges — but also an example of a close non-match, in the form of prints from identical twins. Less than half of the people who took the test got all seven questions correct. The results caused an uproar, recalls Eldridge, and the next time around the tougher comparison questions disappeared. Of course, it isn’t true that everyone in the community is satisfied with the status quo. Quite the opposite, says Eldridge, who notes that for years practitioners have complained to CTS about the questions being too easy. What Brendan Max and his colleagues did with their experiment was to demonstrate that to the entire field. “If the purpose of the proficiency test is to say, ‘Everyone who took this can meet a very low level of the minimum expectation of what someone should be able to do to work in this job at the lowest level,’ then game on; that’s exactly what it measures,” she said. “If they’re trying to test a certain level of competence, it’s not testing that.” “The problem that Brendan brought up is that we use this as a shield when we go to court,” Eldridge added. “The moment we make that claim and we use the proficiency tests as evidence of expertise, now we’re claiming it’s measuring something that it’s not measuring.” Part of the problem is that, by and large, crime lab directors pay for the tests, which are expensive. For U.S. practitioners, CTS’s current latent print exam costs up to $340 per person. “It’s the lab directors who don’t want to pay thousands of dollars to purchase a test that half their staff will fail,” said Eldridge. 
Indeed, the tests are supposed to function not only as a check on individual examiners, but also as a means of interlaboratory comparison — looking at whether examiners across labs have come to a consensus decision about a given print examination. Finding the right balance of questions to achieve a meaningful result is a challenge, says Chris Czyryca, president of CTS. Where practitioners complain about the tests not containing “case-like samples,” for example, CTS is already at a disadvantage: It wouldn’t be possible to ink or lift a print exactly the same way hundreds of times in order to supply one to every test taker, he said, so they have to use photos of prints instead. “It’s not really like casework.” Czyryca says CTS walks a “knife’s edge” in creating the tests. “There are prints that are controversial. The ones that are easy, they tend to get called out as easy,” he told The Intercept. “There are ones that are more difficult and sometimes you have people saying that ‘It’s too difficult.’ Sometimes, ‘You’re trying to trick us.’” And, of course, there’s also market pressure to contend with. Although CTS is dominant, it competes with two other testing companies for crime lab business. “There is a commercial pressure to produce tests that are not burdensome and not too complex,” Czyryca explained back in 2015. But he also takes exception to the idea that missing just one answer means a test-taker did well. “I’m not sure you understand that implication of just nine out of 10 and thinking, ‘Hey, 90 percent. That’s an A, right?’ No. That’s not the way this works.” Indeed, in the forensic lab, getting just one print match wrong can have serious real-world consequences — a false positive could send an innocent person to prison; an incorrect exclusion could see a killer go free. And missing answers on a proficiency test can trigger an extensive work review within the lab, particularly in accredited labs with meaningful quality assurance programs. 
“If you want to tell me they’re too hard or they’re too easy by a little bit, I’ll accept that. If you’re a defense attorney who’ll say, ‘This is trivial, it means nothing,’ I don’t accept that.”

On the test Max, Cavise, and Gutierrez took, just 12 out of 360 people missed one or more answers.

The Benefits of Blind Testing

While Eldridge agrees that there are consequences for examiners who don’t ace their proficiency tests, “I wouldn’t use that as an argument that we should, therefore, keep the tests really easy,” she said. “If you can’t pass the test then that should be an alert to someone. We should be looking at what we need to do to make you better at your job.”

In fact, some of her research could meaningfully change the way proficiency is tested. Eldridge is working on a tool that can objectively measure the quality of individual prints. That tool could then be used to build a test with scaled information. “You took this test, you got all the easy ones right, you got all the medium ones right; you got a few of the hard ones and then you missed some of those hard ones,” she said. “So now I have a better idea of about where you are on the scale, how good you are. And then we can say something meaningful about somebody’s skill level that can be taken into court. And we can take away the stigma of, ‘Oh, gosh, I failed my proficiency test and I’m going to be fired,’ because it would be expected that nobody would get 100.” Eldridge argues that this would be a better way to test the strength of individual examiners and the greater system — a way to find the limits of ability. “But it would be a big paradigm shift.”

There are some labs that are pushing things forward in new and interesting ways, including the Houston Forensic Science Center in Texas. The lab is something of an anomaly: It is completely independent and overseen by a board of directors, meaning that, unlike the bulk of crime labs, it is untethered from police or prosecutor agencies.
Among the recommendations put forward by the National Academy of Sciences was that all crime labs be independent, free from the potential bias of law enforcers and their budgets. It remains one of the report’s most hotly debated recommendations. Among the 409 publicly funded crime labs identified by the federal government, only a handful aside from the Houston Forensic Science Center are truly independent. The Houston lab is also big. And that scale gives it a distinct advantage. Peter Stout is the CEO of the Houston lab, which has 200 employees and receives about 30,000 requests for forensic analysis per year. Among the advantages this offers is an ability to devise and incorporate an extensive internal proficiency testing program. Like Eldridge, Stout says that the current proficiency tests aren’t exactly robust, but that isn’t necessarily CTS’s fault. “I very much have a philosophy of ‘test to the failure.’ You make tests for the system that look for where the system breaks,” he said. But that hasn’t been the philosophy of the forensics world writ large. “There’s not been a real press from the practitioner community to make proficiency materials that are more representative, harder, challenge the system more. So that’s part of why CTS is what it is. There’s just been no demand.”

Since 2015, Stout’s lab has incorporated a different kind of proficiency testing into the examiners’ workflow: blind testing. The lab’s quality assurance staff devise case-like samples that are slipped into the system, amid the regular work, in order to test examiners’ skills — and the broader health of the lab’s protocols — without alerting anyone working with the faked evidence. It’s the kind of proficiency testing that forensics reformers have said is important to shoring up the nation’s system. But it comes at a price — both in terms of money and manpower — that most labs don’t believe they can handle. While the Houston lab has a quality assurance division with a staff of six, the majority of crime labs simply don’t have the resources to separately staff such an operation, meaning that lab employees often play a dual role — a lab director may also serve as head of quality assurance, or practitioners may do some of that assessment. Where that’s the case, blind testing doesn’t work. Half of the nation’s publicly funded labs have fewer than 24 employees, notes Stout. “There is no separation to manage a blind system.” Then there is the matter of backlogs, which are a constant headache for crime labs. Where there’s an extensive backlog, “it’s much more difficult to slip these things in because they’ve already got stuff in queue.” Because the Houston lab enjoys robust staffing, significant funding, and a moderate backlog, “it creates an obligation for us to do some of these things,” Stout says. “We have the privilege of being able to set this stuff up and show, here’s what works, here’s what the limitations are, so those who don’t have as much ability to take those risks can then tell their administrative hierarchies, ‘No, look, Houston did it. The sky didn’t fall.’” Stout made a deal with his staff: If they can accurately spot a blind test coming through the system, they get a Starbucks gift card; if they’re wrong, they owe Stout $1. 
The examiners have demonstrated some pretty amazing powers of observation, he said. One examiner in firearms spotted a blind because the gun “smelled familiar,” Stout said, the way an old gun does — so, not like one used in a recent crime. “And they were right.” A fingerprint analyst examining a crowbar allegedly used in a smash-and-grab spotted the blind based on the way the fingerprint appeared on the metal: No one, the examiner reported, would hold a crowbar that way. To date, Stout has given out a couple dozen gift cards and has about $3-4 in his pocket. Eventually, he hopes the data gathered from blind testing will help formulate error rates for the lab. The whole goal is to design a system that better minimizes the risk of an error. And for now, at least, part of the process is continuing with the cross-lab proficiency tests like CTS offers. “Right now, our blinds give us … a comparative test of our system — but I still need the comparative performance of our lab to other labs. So open proficiency testing is still very much a necessary tool.” But, he says, “there still needs to be more rigor in them.”
