A federal judge this week unsealed the source code for a software program developed by New York City’s crime lab, exposing to public scrutiny a disputed technique for analyzing complex DNA evidence.

Judge Valerie Caproni of the Southern District of New York lifted a protective order in response to a motion by ProPublica, which argued that there was a public interest in disclosing the code. ProPublica has obtained the source code, known as the Forensic Statistical Tool, or FST, and published it on GitHub; two newly unredacted defense expert affidavits are also available.

“Everybody who has been the subject of an FST report now gets to find out to what extent that was inaccurate,” said Christopher Flood, a defense lawyer who has sought access to the code for several years. “And I mean everybody — whether they pleaded guilty before trial, or whether it was presented to a jury, or whether their case was dismissed. Everybody has a right to know, and the public has a right to know.”

Caproni’s ruling comes amid increased complaints by scientists and lawyers that flaws in the now-discontinued software program may have sent innocent people to prison. Similar legal fights for access to proprietary DNA analysis software are ongoing elsewhere in the U.S. At the same time, New York City policymakers are pushing for transparency for all of the city’s decision-making algorithms, from pre-trial risk assessments, to predictive policing systems, to methods of assigning students to high schools.

DNA evidence has long been a valuable tool in criminal investigations, and matching a defendant’s genetic material with a sample found on a weapon or at a crime scene has impressed many a judge and jury. But as new types of DNA analysis have emerged in recent years to interpret trace amounts or complex mixtures that used to be dismissed as hopelessly ambiguous, the techniques are coming under fire as overly ambitious and mistake-prone.

An article ProPublica co-published with The New York Times on Sept. 4 detailed the growing doubts about the Forensic Statistical Tool, which New York City created to determine the likelihood that a given defendant’s DNA was present in a mixture of multiple people’s genetic material. According to the crime lab’s estimates, FST was used to analyze crime-scene evidence in about 1,350 cases over about 5 1/2 years. It was phased out at the beginning of this year in favor of a newer tool.

A coalition of New York City defense lawyers has called for a review of all cases that may have been affected by either FST or a second disputed analysis method, called high-sensitivity DNA testing. The state inspector general, who acts as the lab’s ombudsman, has received the lawyers’ request but has not yet announced whether she will launch an investigation.

The crime lab, which is part of the Office of the Chief Medical Examiner, did not oppose ProPublica’s motion, but maintains its support of its technology. “I want to be very clear that OCME continues to stand behind the science that the FST source code operationalized, and that we will continue to defend FST,” Florence Hutner, general counsel for the medical examiner’s office, wrote to the judge on Oct. 6.

She added that the lab agreed to full disclosure of the expert affidavits because the redactions had “exacerbated the substantial misunderstanding of fundamental aspects of the FST source code that is reflected in multiple published criticisms of that code.”

ProPublica’s motion came in a federal gun possession case, U.S. v. Kevin Johnson. Johnson was staying with his ex-girlfriend in the Bronx when police were called to her apartment and found two socks wedged between the refrigerator and the wall, one containing a black pistol and the other a silver revolver. By FST’s calculation, the DNA found on one gun was 156 times more likely than not to contain Johnson’s genetic material. For the DNA from the other gun, FST put that figure at an overwhelming 66 million.

In that case, Caproni became the first judge to order the lab to hand over the code for examination by the defense, but her protective order barred attorneys and experts from discussing or sharing it. Nathaniel Adams, a computer scientist and an engineer at a private forensics consulting firm in Ohio, reviewed the code for the defense and submitted an affidavit that was partially redacted before being made public. “The correctness of the behavior of the FST software should be seriously questioned,” he wrote in an unredacted section.

ProPublica’s motion, filed on Sept. 25 with the help of the Media Freedom and Information Access Clinic at Yale Law School, argued that the judge should vacate that protective order because of “the profound importance of this technology to the integrity of the criminal justice system, and the overriding public interest in transparency.”

“This ruling finally enables ProPublica to gain access to the code in order to report on this matter of vital public concern,” said Hannah Bloch-Wehba, a supervising attorney in the MFIA clinic, following the judge’s order. “As law enforcement agencies increasingly rely on algorithmic tools in the criminal justice system, it is all the more important that the press and public have access to the information critical to understand what the government is doing and hold it accountable.”

FST was invented by employees of the crime lab and programmed by software consultants. The lab began using it in 2011 to analyze complex mixtures of DNA left behind at crime scenes. About 50 jurisdictions as far away as Bozeman, Montana, and Floresville, Texas, also sent samples to New York City for testing. When defense attorneys challenged FST’s results in court and sought access to the program’s source code, the crime lab refused, saying it was proprietary.

Although almost all judges have allowed FST results as evidence in court, one state judge, Mark Dwyer of Brooklyn, ruled them inadmissible in two cases in 2014. Dwyer, now presiding in Manhattan, excluded FST evidence from two more cases this week. While prosecutors in both cases said DNA evidence analyzed with FST showed that the defendants violated gun possession laws, Dwyer said in court on Oct. 16 that his doubts about the program’s acceptance in the scientific community persist, especially since the New York lab is no longer using it, and no other lab has adopted it.

New information about the development of the FST source code and some of its purported weaknesses surfaced this past July in the cases before Dwyer in an affidavit by Eugene Lien, a technical leader in the DNA lab, whom the prosecution was using as an expert. After the lab started using FST for casework in early 2011, he and his colleagues discovered a problem with the program’s math that could skew a test’s results, according to Lien. “Because of this, the FST program was taken offline and portions of the software were re-coded,” he wrote.

The lab did a “performance check” of the new version before resuming casework with it in July 2011, he went on, but lab officials did not inform the state oversight commission about the change, nor did they run another full validation study on the program.

The letter to the state inspector general from the group of defense lawyers cited Lien’s account, saying it contained “damning admissions” about the lab’s lack of transparency. They also theorized that the recoding Lien described could itself have led to one problem identified by Adams — the exclusion of potentially valuable data from FST’s calculations of likelihood ratios. Characterizing Adams’ criticisms as merely cosmetic rather than substantive, the lab has contended that FST calculations were reliable.

Besides ongoing criminal cases in New York City involving FST, Caproni’s decision to unseal the source code may also affect another legal fight for access to a proprietary DNA software system. The American Civil Liberties Union and the Electronic Frontier Foundation intervened in a case in California’s appeals court on Sept. 13 in support of a defendant’s right to review the source code behind a commercially available DNA analysis program called TrueAllele.

“It’s a major credit to the court, the parties and ProPublica that the source code used in Mr. Johnson’s case will now be subject to public scrutiny,” said Brett Max Kaufmann, a staff attorney for the ACLU who is working on the California appeals case. “We urge other courts to follow this example when hearing cases involving similar types of evidence.”

Outside the courtroom, some New York City lawmakers are seeking more public review of algorithms and their impacts. On Oct. 16, the New York City Council’s Committee on Technology held a hearing about a proposed bill calling for all city agencies to publish online the source code for algorithms that they use in decision-making. As an example of the danger of relying on algorithms, witnesses and a committee report cited ProPublica’s 2016 investigation that found racial bias in a software program used by courts to decide whether it’s safe to let defendants out on bail.

“These tools seem to offer objectivity, but we must be cognizant of the fact that algorithms are simply a way of encoding assumptions, and their design can be biased, and that the very data they possess can be flawed,” the bill’s author and the committee chair, James Vacca, said at the hearing. “I have proposed this legislation not to prevent city agencies from taking advantage of cutting-edge tools, but to assure that when they do, they remain accountable to the public.”

The committee heard from defense lawyers and others who support the bill as well as representatives from Mayor Bill de Blasio’s Office of Data Analytics and the city’s Department of Information Technology and Telecommunications, both of which oppose it in its current form. After the hearing, Vacca told ProPublica that he would revise the bill to address criticisms he had heard about confidentiality concerns, and also to clarify that the proposal applies to both programs developed by third-party vendors and software developed in-house by city employees. Vacca said he is determined to pass a law on this issue before the end of his term.

“To my knowledge, we are the first city, and the first legislative body in our country, to take on this issue,” Vacca said during the hearing. “And as with so many other things, I’m hoping that New York City will set the example for others throughout the world.”