PART I. INTRODUCTION

Science has long been regarded as ‘self-correcting’, given that it is founded on the replication of earlier work. Over the long term, that principle remains true. In the shorter term, however, the checks and balances that once ensured scientific fidelity have been hobbled. This has compromised the ability of today's researchers to reproduce others’ findings.

Over the past several decades, forensic science has faced immense criticism. This criticism often reduces to the notion that forensic scientific knowledge has not traditionally been produced and presented in a way that allows judges and juries to assess its reliability. As a result, untested—often invalid—‘science’ contributed to many miscarriages of justice. Along a similar timeline, but with almost no express recognition of the issues happening in forensics, a scientific revolution has been occurring in the ‘mainstream sciences’. This revolution—one focused on methodology—responded to the discovery of several peer-reviewed and published findings that appeared to be false or substantially exaggerated. Metascientists (ie those who use scientific methodology to study the scientific enterprise itself) at the heart of this revolution prescribe more open and transparent methods. In this article, we evaluate the openness of forensic science in light of the reforms underway in the mainstream sciences. We then consider the distinctive challenges and advantages that openness presents to forensic science, and propose several tangible ways to improve forensic science through open science.

The most authoritative expression (to date) of open science's methods and values is a 2018 Consensus Study Report of the National Academy of Sciences, Engineering, and Medicine (‘NASEM Report’). The report reviews the state of openness across several scientific fields and provides a vision for widespread adoption of open science methods. In doing so, it broadly accepts openness as a better way to conduct research:

The overarching principle of open science by design is that research conducted openly and transparently leads to better science. Claims are more likely to be credible – or found wanting – when they can be reviewed, critiqued, extended, and reproduced by others.

Similarly, we believe there are at least three pressing reasons for forensic science to adopt a variety of open scientific practices. First, as the NASEM Report notes in the preceding quote, open science enables more thorough analysis of factual claims. Conversely, recent metascientific research has found that when science is not conducted transparently, results can be misleading, with actual false positive rates well above those reported. This may be acceptable (but not salutary) in the mainstream sciences, where the literature may self-correct over time, but it can produce vast injustice in the criminal law context.

Second, and flowing from the first, open and transparent knowledge-generation processes comport with and progress legal values like the presumption of innocence and access to justice. For example, there is a well-established imbalance between the state's ability to develop a forensic scientific case against an accused and the accused's ability to assess that case and amass his or her own evidence. This inequity is heightened when the foundational science behind the state's case was conducted opaquely and published in paywalled journals (and then applied in crime labs, which have been described as ‘organizational black boxes’). Similarly, commentators have studied access to justice in terms of legal assistance and access to databases of legal decisions. However, the factual basis of access to justice has largely been neglected. This is unfortunate because if the science behind a case were transparently reported and more affordable to assess, impecunious parties might stand a better chance of mounting a defense.

Third, open science provides a set of tools that may make forensic science more efficient. Many forensic disciplines have a long way to go in validating their subjective methodologies and in developing objective methodologies. Resource limitations are often severe. To counter such restrictions in the mainstream sciences, open science reformists are developing web platforms and best practices for collaboration and sharing of data and methods.

In the following Part, we briefly review forensic science and the challenges it currently faces. Part III then delves into the open science movement afoot in the mainstream sciences and assesses forensic science's current level of openness. Next, in Part IV, we tie together the foregoing, examining the ways in which open science is distinctively suited to help improve forensic science. We also address the challenges that forensic science will face in adopting a more open model. Part V concludes.

PART II. THE STATE OF FORENSIC SCIENCE

Forensic science's shortcomings are well-documented, so this section will provide only a brief review with a focus on the areas that may be enhanced through open science reforms. From the beginning, many forensic scientific practices—especially those based on feature comparison—had no basis in academic science. Rather, they arose ad hoc in criminal investigations, their development driven by the investigators themselves (rather than independent bodies with scientific training). This meant that many forensic practices developed in a manner that was substantially divorced from scientific structures like the empirical testing of claims, blinding, randomization, and measuring error. Fingerprint identification, for instance, has appeared in U.S. courts since 1911. Examiners regularly made identifications about the source of a fingerprint as against all the world and expressly stated that their practice was infallible. Only in the past two decades have appropriately designed studies been performed and published, supporting the validity of the practice.

This nonscientific character of forensic science drew scathing criticism from attentive legal and scientific scholars. Still, courts remained deferential to forensic witnesses. There are many reasons for this. For one, legal actors lack the scientific training to appropriately question forensic practices. Moreover, forensic scientists have historically been associated with the police and prosecution, making it difficult for the criminally accused to find an independent expert, let alone pay for one. With regard to legal structures and safeguards, foundational concepts like stare decisis provide little assistance when invalid evidence has historically been admitted into court. Changing the legal standard for admitting evidence from one that defers to the scientific community to one that requires that trial judges engage with scientific concepts seems to have made little difference.

Perhaps not surprisingly, the widespread admission of untested, invalid, or misleading forensic evidence has contributed to several wrongful convictions. Many of these convictions came to light due to the rise of DNA analysis, one of the few forensic sciences to emerge from the mainstream sciences and withstand thorough validation testing.

Acknowledgement of these wrongful convictions inspired a great deal of research, but none was as momentous as a 2009 report drafted by a National Research Council committee of the National Academy of Sciences (the ‘NAS Report’). The report, confirming longstanding worries, catalogued a host of problems:

• deficient training and education among forensic scientists;

• lack of peer-reviewed and published foundational research establishing the validity of forensic methods;

• lack of protocols to minimize cognitive bias;

• insufficient standards for reporting findings and giving testimony; and

• scarce funding to support improvements to any of the foregoing.

Two federal bodies were created as a result of the NAS Report, but neither has proven as effective as the Committee wished. The Report called for an independent central regulatory body for the forensic sciences. While that did not come to be, the U.S. Department of Justice (DOJ) eventually formed the National Commission on Forensic Science (NCFS), an advisory body aimed at providing policy recommendations to the Attorney General. Some progress was made, with the DOJ adopting the NCFS's first recommendations regarding accreditation. Just four years after its formation, however, the NCFS was abruptly decommissioned under the new presidential administration.

As recommended by the NAS Report, the National Institute of Standards and Technology (NIST) also took on some new responsibilities. In particular, it created the Organization of Scientific Area Committees (OSAC), which oversees several committees and subcommittees that create and maintain standards for the forensic scientific disciplines. Only time will tell how effective standard-setting can be in regulating forensic science and, more generally, how effective OSAC—the last institutional vestige of the NAS Report—can be. The main limitation of OSAC will likely be that it has no express powers to enforce standards and thus they may operate as mere recommendations.

While the shuttering of the NCFS was undoubtedly a setback for those concerned about the state of forensic science, there is reason to think that forensic science finds itself at an inflection point. Notably, the NAS Report and a similar 2016 report of the U.S. President's Council of Advisors on Science and Technology (the ‘PCAST Report’, more on this below) drew attention to longstanding problems in the forensic sciences. The reports appear to have (re)invigorated academic scientific efforts aimed at the forensic sciences. These efforts are girded by the fact that funding bodies continue to support research in the forensic sciences (although certainly not to the extent many would prefer). Moreover, the media continues to be interested in forensic science-driven controversies.

Usefully, the PCAST Report delineated a clear framework by which to evaluate the state of forensic science practices, and then compared several feature-comparison disciplines against that standard. In short, the PCAST Report said that forensic evidence must be both foundationally valid and then applied in a demonstrably valid way. As we will discuss in Parts III and IV, transparency and openness assist with both mandates.

By foundationally valid, the PCAST Report authors meant that the method must have been empirically tested to demonstrate that it is ‘repeatable, reproducible, and accurate, at levels that have been measured and are appropriate to the intended application’. Repeatable, in this formulation, refers to intra-examiner reliability—the same examiner should come to the same result over time. Reproducible refers to inter-examiner reliability—different examiners should come to the same conclusions. And finally, accuracy can be thought of as error control: the method should have a known and tolerable level of error, avoiding false positives and false negatives. Of the feature-comparison disciplines reviewed by PCAST in 2016, only DNA analysis of single-source samples and fingerprint analysis were foundationally valid.

As to applied validity, the PCAST Report explained that it has two components. First, the examiner must be demonstrably capable of applying the method. This capacity should typically be supported by proficiency tests—empirical demonstrations that the examiner can accurately make the relevant judgment in realistic situations. Ideally, these tests should be inserted into the examiner's casework such that he or she is unaware of being tested. Second, the examiner must have faithfully applied the method and reported any uncertainty in the conclusion (eg the false positive rate of the method).

While the exigency of establishing foundational and applied validity may seem obvious to outside observers, the PCAST Report found that substantial hurdles still exist in establishing them across most of forensic science. Indeed, it found that considerable work still needed to be done in all of the feature comparison disciplines it reviewed. For instance, although the report found that fingerprint analysis is foundationally valid, proficiency testing should be improved (eg the tests should be more representative of actual casework), and examiners do not always faithfully apply the method.

Finally, the PCAST Report strongly recommended that forensic science develop more objective methods (ie those generally less reliant on subjective judgment), often through automated image analysis. It did so for several reasons: objective methods are generally more transparent and reliable, and they present a lower risk of human error and bias. Significant progress has been made towards developing objective systems for fingerprint and firearms analysis. However, a key limitation going forward is the lack of a large body of stimuli (eg a database containing images of fingerprints with a known ground truth). As we will see, open science may provide viable responses to such limitations.

PART III. IMPROVING SCIENCE THROUGH OPENNESS AND TRANSPARENCY

In contrast to forensic science, concepts like validation testing and blinding are orthodox in much of mainstream science. That said, undisclosed flexibility in the scientific process has still allowed for researchers’ biases and expectations to influence results. Recent metascientific research has explored how to improve the scientific process, often recommending openness and transparency that would dissuade and reveal more subtle forms of researcher bias. Predating many of these concerns, scholars and advocacy groups have long campaigned for more open access to the products of research. They have noted that research data and findings often sit behind paywalls, making it difficult for those without institutional access to use and verify that knowledge. The open science movement, therefore, encompasses the aims of democratizing knowledge and producing knowledge that is more trustworthy. As we will see, both aims are fundamentally important to forensic science's ongoing development.

In this Part, after a brief review of recent controversies in mainstream science, we will introduce the open science movement and its more specific manifestations (eg open data, open access journals). As part of this discussion, we will assess the degree to which forensic science is adopting these reforms. We generally find that while there have been some promising developments, there is still much work left to do in opening forensic science (but many reasons to take on that work).

Science in Crisis

Concerns about the number of scientific findings that may be false or exaggerated have percolated for years, but they have reached a fever pitch in the past decade or so. For instance, in 2016, the journal Nature asked approximately 1500 scientists whether science was experiencing a reproducibility ‘crisis’. The researchers found that 52% of those surveyed believed there was a significant crisis, 38% believed there was a slight crisis, and only 3% thought there was no crisis. Note that there is some ambiguity and inconsistency in the literature in the use of the terms ‘reproducibility’ and ‘replicability’. For the purposes of this article, we define replication as repeating a study exactly, with new data, to determine whether the same result is achieved. We define reproduction as repeating the analysis used by an existing study on that study's own data to see whether the results are the same. Both replicability and reproducibility are thwarted when published reports do not provide enough information about how the study was conducted or do not provide the raw data for re-analysis. The problem, however, runs deeper than inadequate reporting: in many cases where replication attempts have been conducted, the results have contradicted the original findings.

Indeed, the Nature survey was released in the wake of several large-scale failures of replication. In social science, for example, one of the largest efforts to date attempted to replicate 100 studies published in three of psychology's top journals. The researchers found the same result at the same level of statistical certainty in approximately one third of the studies, and generally found considerably smaller effect sizes. In 2018, another collaboration of researchers attempted to replicate the findings of 21 social scientific studies published in Nature and Science. They found the originally reported effect in 13 of the attempts and, overall, effect sizes were about 50% smaller than in the original studies.

In medicine, one influential review of preclinical research found reports of irreproducibility and irreplicability ranging from 51% to 89%. By a conservative estimate, this corresponds to $28 billion of lost research funds. Findings like these led Francis Collins (Director of the U.S. National Institutes of Health) and Lawrence Tabak to state that the checks and balances of science have been ‘hobbled’. Troublingly, human trials have not escaped criticism either. Researchers in the UK, for instance, found in 2018 that approximately 50% of studies contravened EU laws requiring the reporting of results (similar laws exist in the US). And studies replicating clinical studies also often show contradictory results or significantly smaller effects.

While we have focused on social science and medicine, that is largely because these fields have been unusually proactive in examining their own practices and admitting deficiencies. Similar problems have been reported in a variety of fields. In neuroscientific research, for instance, many of the reported correlations between brain activation and behavioral or personality measures are far higher than is statistically possible. Further, an analysis of over 3000 papers in cognitive neuroscience found that the studies were underpowered to detect true effects, suggesting a false discovery rate of over 50% across the discipline.
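The arithmetic behind such discipline-wide false discovery estimates can be sketched in a few lines. This is our own back-of-the-envelope illustration, not the cited analysis; the function name and the illustrative power, threshold, and prior values are our assumptions.

```python
# False discovery rate among 'significant' findings, given the power of a
# typical study, the significance threshold (alpha), and the proportion of
# tested hypotheses that are actually true.
def false_discovery_rate(power, alpha=0.05, prior_true=0.5):
    true_positives = power * prior_true          # real effects detected
    false_positives = alpha * (1 - prior_true)   # null effects flagged anyway
    return false_positives / (true_positives + false_positives)

# A well-powered field testing plausible hypotheses:
print(false_discovery_rate(power=0.8))                   # ~0.059
# An underpowered field testing mostly long-shot hypotheses:
print(false_discovery_rate(power=0.2, prior_true=0.1))   # ~0.69
```

The point of the sketch is that even a nominal 5% false positive threshold yields a literature in which most ‘discoveries’ are false once low power and unlikely hypotheses are combined.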

Contributors to the Crisis

Critically, the controversies recounted above have been followed by a raft of metascientific research aimed at determining why so many studies are proving difficult to replicate and reproduce. Much of this work builds on historic concerns about the research process, lending those concerns support from modern quantitative methods. We now briefly review a selection of the culprits identified by recent metascientific study: ‘questionable research practices’ (QRPs), ‘publication bias’, ‘spin’, lack of replication, small sample sizes, and overreliance on simplistic statistical methods.

QRPs exploit flexibility in research methods and reporting practices to make a researcher's results seem more persuasive than they actually are. Such practices include deciding whether to exclude observations after looking at how exclusion would affect the overall results, measuring a phenomenon several different ways but disclosing only those measures that support the hypothesis, and strategically stopping data collection when results reach some level of statistical confidence. These tactics rest in a gray area of scientific practice that many once viewed as defensible, trivial, or even normative. Views that would minimize the harmful impact of QRPs are now demonstrably untenable. In a widely influential paper, Joseph Simmons and colleagues employed a quantitative simulation to demonstrate that the use of QRPs increases the actual false positive rate of a literature well beyond its reported false positive rate. Use of four QRPs in combination increased a notionally 5% false positive rate to approximately 60%. Troublingly, metascientific research has also found that QRPs are widely used. Anonymous surveys in psychology, ecology, and evolutionary biology find self-reported usage of QRPs ranging from approximately 3% to 60%, depending on the QRP in question. Note, however, that recent empirical work in psychology suggests that initial estimates of QRP use based on survey studies are inflated.

QRPs conspire with ‘publication bias’ and ‘spin’ to provide, in some cases, a deeply misleading view of a research literature. The term publication bias refers to systematic biases in the types of articles that get published. This includes the empirically founded observation that studies finding no effect (eg a drug had no discernable impact on a disease) tend to be published much less frequently than studies finding some effect (ie the null studies languish in file drawers, hence the colloquialism, the ‘file-drawer effect’). This can encourage strategic research patterns. Researchers may, for instance, perform several underpowered studies (ie studies with few observations) that vary in small ways, and report only those that find positive effects. Another instance of publication bias is a preference for novel results. Journals typically prefer to publish articles that purport to show some heretofore undiscovered phenomenon, rather than studies that simply attempt to replicate previous work. This is problematic because, as mentioned above, replication lends credibility to previous research and can help uncover spurious findings. Even when publication bias is overcome, published negative results are cited less frequently (ie citation bias). Further, published reports may be ‘spun’: a report may suggest that some positive effect exists by emphasizing positive findings and deemphasizing negative ones (although both can be found by a careful reader, distinguishing spin from QRPs that would suppress the negative finding altogether).

Y.A. de Vries and colleagues recently collected and analysed 105 trials of antidepressant medication (see Figure ). They found that a confluence of QRPs, publication bias, citation bias, and spin produced a deeply misleading portrait of a body of knowledge. In the literature they examined, it appeared (to a reader attending to the primary message of the published reports) that only a small portion of the studies found a negative effect: the drug appeared effective. However, once the researchers unearthed the unpublished studies and waded through spin and citation bias, half of the studies were actually negative. Importantly, de Vries and colleagues could do this only because clinical trials are regulated such that it is possible to determine the true number of studies being performed (ie the trials are registered; see below). As we will see, the same cannot be said for forensic science.

Finally, small sample sizes and overly simplistic statistical methods have contributed to the replicability crisis. During the opening salvos of the crisis, John Ioannidis famously predicted (based on a theoretical model) that over half of published findings are false because, among other reasons, studies typically do not use large enough samples to detect the effects they are looking for (ie they are underpowered). Moreover, the commonly used statistical method of null hypothesis significance testing (NHST) is often applied with little thought. QRPs, as mentioned above, can render the results of NHST misleading by producing reported false positive rates that underestimate the true false positive rate. And unlike some other statistical methods (eg Bayesian approaches), NHST does not take into account the a priori likelihood of the hypothesis.
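To make concrete how a single QRP can inflate the false positive rate of NHST, consider a minimal Monte Carlo sketch of optional stopping. This is our own toy simulation, not the Simmons et al. code; the sample sizes, peeking schedule, and use of a simple z-test are illustrative assumptions.

```python
import math
import random

def z_test_significant(g1, g2, crit=1.96):
    """Two-sample z-test on means, assuming known unit variance."""
    m1 = sum(g1) / len(g1)
    m2 = sum(g2) / len(g2)
    z = (m1 - m2) / math.sqrt(1 / len(g1) + 1 / len(g2))
    return abs(z) > crit

def run_study(rng, optional_stopping):
    """One null study (no true effect). With optional stopping, the
    researcher tests at n=20 per group and, if not significant, adds
    10 more observations per group and re-tests, up to n=50."""
    g1 = [rng.gauss(0, 1) for _ in range(20)]
    g2 = [rng.gauss(0, 1) for _ in range(20)]
    if z_test_significant(g1, g2):
        return True
    if not optional_stopping:
        return False
    for _ in range(3):  # peek again at n = 30, 40, 50
        g1 += [rng.gauss(0, 1) for _ in range(10)]
        g2 += [rng.gauss(0, 1) for _ in range(10)]
        if z_test_significant(g1, g2):
            return True
    return False

rng = random.Random(1)
n_sims = 5000
honest = sum(run_study(rng, False) for _ in range(n_sims)) / n_sims
qrp = sum(run_study(rng, True) for _ in range(n_sims)) / n_sims
print(f"single fixed-n test: {honest:.3f}")  # close to the nominal 0.05
print(f"optional stopping:   {qrp:.3f}")     # well above 0.05
```

Even this single QRP roughly doubles the false positive rate relative to the nominal 5%; combining several QRPs, as in the Simmons et al. simulation, compounds the inflation further.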

The Open Science Response (and Forensic Science's Place Within it)

The scientific community is rapidly adopting transparency-related reforms as a way to improve science. The NASEM Report, for instance, strongly endorsed many open science reforms: ‘open science strengthens the self-correcting mechanisms inherent in the research enterprise’. In this section, we review some of these reforms. It is important to note, however, that open science should not be construed as a panacea for all flaws and inefficiencies in the scientific process. In both the mainstream sciences and forensic science, change depends on the concerted efforts of: (1) oversight and funding bodies; (2) journal editors and publishers; and (3) the researchers themselves. We review the roles of these stakeholders before delving into the specifics of open science reform in the forensic sciences.

Drivers of Change in Mainstream and Forensic Science

As to leadership, the National Institute of Standards and Technology (NIST) may be best placed to guide the move to open forensic science. Indeed, both the NASEM Report and the PCAST Report identified NIST as a crucial leader in the movements they described. Within the forensic sciences, NIST's tasks may include periodically reviewing the state of foundational validity in various disciplines, advising on the design and execution of validation studies, creating and disseminating datasets, and providing grant support. These tasks may be guided by NIST's broader role as a leader in the open science movement, in which it encourages open practices and, in some cases, makes funding contingent on them. Further, the National Science Foundation—active in funding both the mainstream and forensic sciences—is already requiring that researchers engage in open practices.

Journal editors and publishers will also be instrumental in the transition to open forensic science, as they are in the mainstream sciences. This will be especially so if forensic science moves in the direction urged by the PCAST Report, with forensic scientists taking an increased interest in publishing their work in reputable journals. Currently, one of the most influential models for openness in peer review and publishing is the Transparency and Openness Promotion (TOP) guidelines. The TOP guidelines, first published in Science in 2015, are a standard set of guidelines for transparency and reproducibility practices across journals. They comprise eight standards: citation (eg citing data, materials, and code), data transparency (eg posting data to an online database), analytic methods (code) transparency, research materials transparency (eg surveys and stimuli), design and analysis transparency, study preregistration, analysis plan preregistration (see our discussion of preregistration below), and replication. The TOP Committee defined levels for each standard, ranging from journals simply encouraging the standard (level 0) to requiring and verifying that articles have met it (level 3). The Center for Open Science provides tools and guidance for organizations wishing to implement TOP and keeps a list of those that have agreed to consider the TOP guidelines (signatories) and those that have implemented them at some level. As of March 2019, over 5000 journals and organizations are signatories and 1000 have implemented them. A number of journals have also adopted a badge system to acknowledge papers that are preregistered and have open data and open materials. This initiative is promising: open data in a leading psychological journal increased from 3% to 23% after implementation of the badge system.

Beyond government organizations and publishers, adoption of open science in forensics will depend on individual researchers and practitioners. This is already beginning. Among practitioners, the Netherlands Forensics Institute (NFI) is adopting strong transparency reforms with respect to any quality-control related issues in its labs. Further, forensics researchers may use online tools like the Open Science Framework (OSF). The OSF is a free online platform for open science where any researcher (with an academic affiliation or not) can create a webpage for a research project and use it to share data, analyses, and materials. It also contains several tools for collaboration. An example (albeit a rare one) of the use of the OSF in forensic science research is a recent Australian state police- and federal government-funded collaboration between a university cognitive science lab, which has adopted open science reforms, and several local police services.

Through the remainder of this Part, it may be instructive to compare specific openness-related reforms in the mainstream sciences to the state of openness in forensic scientific research (Table 1).
To that end, we performed some preliminary research about openness in forensic science by reviewing the policies of forensic scientific journals (see Appendix A for a description of our search). We identified 30 forensic science journals and recorded whether they were open access, a TOP signatory, and adopted any of the eight TOP standards. We acknowledge that using journal standards as an index for openness is incomplete, not least because cultural differences between forensic and academic science have produced different values surrounding publishing. Still, as mentioned above, journals can be a major driver of reform and so it is useful to see what is happening among them.

Table 1.

Journal | Impact factor | TOP signatory? | Open access? | TOP citations | TOP data | TOP code
Am. J. of Forensic Medicine and Pathology | .64 | No | Hybrid | 0 | 0 | 0
Aus. J. of Forensic Medicine | .94 | No | Hybrid | 0/Enc | 0/Enc | 0
Environmental Forensics | .68 | No | Hybrid | 0 | 0/Enc | 0
Forensic Chemistry | | Yes | Hybrid | 0/Enc | 0/Enc | 0/Enc
Forensic Science International | 1.974 | Yes | Hybrid | 0/Enc | 0/Enc | 0/Enc
Forensic Science International: Genetics | 5.64 | Yes | Hybrid | 0/Enc | 0/Enc | 0/Enc
Forensic Science International: Synergy | | No | Open | 0/Enc | 0/Enc | 0/Enc
Forensic Science Rev. | 2.71 | No | Closed | 0 | 0 | 0
Forensic Science, Medicine, and Pathology | 2.03 | Yes | Hybrid | 0 | 0/Enc | 0
Forensic Toxicology | 3.92 | Yes | Hybrid | 0 | 0 | 0
Indian J. of Forensic Medicine and Toxicology | .05 | No | Closed | 0 | 0 | 0
Int. J. of Forensic Science & Pathology | .342 | No | Hybrid | 0 | 0 | 0
Int. J. of Legal Medicine | 2.31 | No | Hybrid | 0 | 0/Enc | 0
J. of Forensic and Legal Medicine | 1.10 | Yes | Hybrid | 0/Enc | 0/Enc | 0/Enc
J. of Forensic Medicine | 0 | No | Hybrid | 0 | 0 | 0
J. of Forensic Practice | .59 | Yes | Hybrid | 0/Enc | 0/Enc | 0/Enc
J. of Forensic Radiology and Imaging | .51 | Yes | Hybrid | 0/Enc | 0/Enc | 0/Enc
J. of Forensic Research | .32 | No | Hybrid | 0 | 0 | 0
J. of Forensic Science & Criminology | | No | Hybrid | 0 | 0 | 0
J. of Forensic Sciences | 1.18 | No | Hybrid | 0 | 0 | 0
J. of Forensic Toxicology & Pharmacology | .25 | No | Hybrid | 0 | 0 | 0
J. of Law Medicine and Ethics | .99 | No | Hybrid | 0 | 0 | 0
J. of Medical Toxicology and Clinical Forensic Medicine | 0 | No | Hybrid | 0 | 0 | 0
Legal Medicine | 1.25 | Yes | Hybrid | 0/Enc | 0/Enc | 0/Enc
Medical Law Review | 1.10 | No | Hybrid | 0 | 0 | 0
Medicine, Science and the Law | .58 | No | Hybrid | 0 | 0 | 0
Rechtsmedizin (Legal Medicine) | .64 | No | Hybrid | 0 | 0 | 0
Regulatory Toxicology and Pharmacology | 2.81 | Yes | Hybrid | 0/Enc | 0/Enc | 0/Enc
Romanian Journal of Legal Medicine | .32 | No | Closed | 0 | 0 | 0
Science & Justice | 1.85 | Yes | Hybrid | 0/Enc | 0/Enc | 0/Enc

Preregistration and Registered Reports

One of the most important developments emerging from the open science movement is preregistration (ie prespecifying research choices in a way that cannot be changed after seeing the data). During preregistration, researchers specify their research plans prior to carrying out the research. Preregistration puts an emphasis on making methodological and statistical decisions ahead of time: calculating sample sizes, determining data exclusion and stopping rules, making predictions and hypotheses, and establishing data analysis plans (ie which analyses will be performed to test each hypothesis). Once submitted to an online platform, such as the OSF or AsPredicted, preregistrations are time-stamped and uneditable. Preregistration is required in some areas of clinical medical research. In other fields, it is becoming increasingly popular: in 2012, there were merely 38 preregistrations on the OSF repository, a number that grew to over 12,000 in 2017.

Preregistration helps limit ‘over-interpretation of noise’ by making any data-contingent analytic choices salient. In other words, it becomes more difficult to engage in QRPs because researchers can no longer selectively exclude data and measures that run counter to their hypothesis, tactics that would give their findings a superficial sheen of credibility. When there are no preset decision rules, it is easy for even highly trained academic scientists to convince themselves that they would have made the same choice regardless of how the data looked:

Once we obtain an unexpected result, we are likely to reconstruct our histories and perceive the outcome as something that we could have, even did, anticipate all along—converting a discovery into a confirmatory result. And even if we resist those reasoning biases in the moment, after a few months, we might simply forget the details, whether we had hypothesized the moderator, had good justification for one set of exclusion criteria compared with another, and had really thought that the one dependent variable that showed a significant effect was the key outcome.

Preregistration can also help address publication bias, especially the failure to publish negative findings or those that do not support a particular research agenda. Indeed, a 2018 study found increased reporting of null (ie negative) findings associated with the rise of preregistration.

Within forensic science, our search did not uncover any peer-reviewed journal that encourages (or even expressly mentions) preregistration. That is not to say, however, that it has been altogether ignored. The PCAST Report stated that validation studies should be preregistered (although the studies it relied on were not): ‘The study design and analysis framework should be specified in advance. In validation studies, it is inappropriate to modify the protocol afterwards based on the results’. We concur with the PCAST Report. In fact, preregistration may be even more important in forensic scientific validation research. There are many analytic choices validation researchers can make that bias their findings, such as excluding apparent outliers (eg examiners who performed very poorly) and selectively reporting the responses for certain subsets of stimuli. Moreover, the practices girded by this validation research impact the criminal justice system and regularly serve as inculpatory evidence in courtrooms. Effectively invisible choices that artificially lower reported error rates are immune from cross-examination and judicial gatekeeping. Preregistration would, at least, open some of these choices to scrutiny by academics, advocates, and other stakeholders in the criminal justice process.

Given academic science's struggle with publication bias, we suspect that forensic science, too, has produced a great many undisclosed studies that did not work out the way researchers hoped. By way of (anecdotal) example, the history of forensic bitemark identification is riddled with stories of studies conducted behind closed doors. Insider accounts are helpful in determining the results of these studies, but preregistered designs would be much more effective. Here, researchers' motivations may be problematic in both the mainstream sciences and forensic science: whereas mainstream scientists are motivated to accrue publications and citations by submitting exciting new findings (and not disclosing studies casting doubt on those findings), forensic scientists may be reluctant to publish results that cast doubt on their field.

Similar to preregistration, registered reports are a format of empirical article in which the introduction, hypotheses, procedures, and analysis plans are submitted to a journal and peer-reviewed prior to data collection. Peer review of the research plan prior to data collection means that necessary revisions can be made before any resources are expended. The article is then either rejected or receives an in-principle acceptance (ie publication is virtually guaranteed if the researchers follow the plan). One of the main benefits of registered reports is that the publication decision is based on the rigor of the methods rather than the outcome, thus curbing publication bias. Registered reports are also often used for replication research. Since the introduction of registered reports in the journal Cortex in 2013, a total of 126 journals spanning a wide range of scientific disciplines now accept registered reports as a publication format.
As with preregistrations, we did not find any forensic scientific journal that expressly mentioned registered reports or replications. This is unfortunate because these reforms could be particularly useful in forensic science. A greater focus on methodology versus outcome may nudge forensic scientists towards more careful research design, creating an iterative process that improves the standards in the field. Further, replication research would assist in assuring that latent experimenter effects are not biasing the existing literature.

Open Data, Materials, and Code Making data, research materials, and code (eg algorithms performing the statistical analysis or simulation related to a study) open and publicly accessible is central to the open science movement. Sharing these aspects of the research process allows other researchers to confirm prior findings and detect potential error (or fabrication) in that work. Data sharing also enables researchers to combine existing data into larger datasets to perform meta-analyses and tackle novel research questions (see Part IV). Despite these benefits, data has not traditionally been open. An analysis of 500 articles in 50 eminent scientific journals found that only 9% of articles had full raw data available online, despite many of the journals having policies related to open data. Troublingly, in 2005, when a group of researchers emailed the authors of 141 empirical articles published in the previous year to obtain raw data, 73% of the original authors were unwilling to share their data with their peers. Researchers with generally weaker results were less likely to respond to these emails. Like with preregistration, journals can promote open data, materials, and code. As we noted above, several TOP standards cover these aspects of the research process. By way of example, a TOP signatory journal, Science, recently updated its editorial policy to require authors to make their data available, subject to ‘truly exceptional circumstances’. Attitudes among researchers seem to be tracking these updated editorial policies. The 2017 State of Open Data Report found that awareness of open data sets and researchers’ willingness to use open datasets were positively trending. Increases in open data may also be due to better infrastructure. For example, the OSF allows researchers to upload materials, datasets, and code organized under the same project with a persistent Digital Object Identifier (DOI). 
Other popular cross-disciplinary open data repositories include Figshare, Zenodo, and the Harvard Science Framework. Further, Google recently launched a new initiative, Dataset Search, to help researchers find open data. This works similarly to Google Scholar as it accesses datasets from publisher's websites, personal websites, and institutional repositories. We were encouraged to see that, unlike with preregistration, forensic scientific journals appear somewhat concerned with transparency of data and code (see Table ). Our findings show that 15 of 30 journals encouraged data transparency and 11 encouraged code transparency. Still, they remain at TOP level 0 (ie mere encouragement) on this standard (and have not formally adopted TOP). As with preregistration, we believe that opening the research process will benefit forensic science in the long run: sharing of data and materials provides much efficiency and promotes error correction. Furthermore, from a criminal justice perspective, we would question the fairness of asking the criminally accused to simply trust the closed forensic scientific literature knowing what has occurred in the mainstream sciences. Still, openness itself may present significant legal issues (eg privacy).

Open Access Journals

Finally, open access to journal articles has been a contentious issue for decades, inspiring some of the first discussions about open science. Typically, published articles are available only to those with (costly) subscriptions. There is now, however, a trend towards making articles open access, either through fully open access journals or through hybrid journals, which charge authors a fee to make their article open access if they wish. Open access varies considerably among disciplines, with the life and biomedical sciences embracing it and several fields, such as the social sciences and professional fields, lagging behind. In addition to allowing greater public access to science, research has demonstrated that articles in open access journals are more likely to be downloaded and cited. Free servers also exist to allow researchers to post preprints of their research (eg LawArXiv in law and PsyArXiv in psychology). We are not aware of a preprint service dedicated to forensic science. Forensic scientific journals generally provide open access options to authors (see Table ). We found only three journals with no open access option at all. One new journal with an open focus provides only the option of open access publishing. Open access publishing in forensic science is incredibly important. Many stakeholders in the criminal justice system cannot be expected to have access to academic subscriptions (which likely explains why so many forensic journals have open access options). These stakeholders include defense lawyers, accused parties, and forensic scientists themselves (who often are not affiliated with a university). An important issue going forward will be keeping author publishing charges manageable, especially given the limited grant funding available to forensic science researchers.

Science in Crisis

Concerns about the number of scientific findings that may be false or exaggerated have percolated for years, but have reached a fever pitch in the past decade or so. For instance, in 2016, the journal Nature asked approximately 1500 scientists whether science was experiencing a reproducibility ‘crisis’. The researchers found that 52% of those surveyed believed there was a significant crisis, 38% believed there was a slight crisis, and only 3% thought there was no crisis.

Note that there is some ambiguity and inconsistency in the literature in the use of the terms ‘reproducibility’ and ‘replicability’. For the purposes of this article, we define replication as repeating a study exactly, with new data, to determine whether the same result is achieved. We refer to reproduction as repeating the analysis used by an existing study, on its own data, to see whether the results are the same. Both replicability and reproducibility are thwarted when published reports do not provide enough information about how the study was conducted or do not provide the raw data for re-analysis. The problem, however, goes beyond insufficient reporting: in many cases where replication attempts have been conducted, the results have contradicted the original findings.

Indeed, the Nature survey was released in the wake of several large-scale failures of replication. For example, in social science, one of the largest efforts to date attempted to replicate 100 studies published in three of psychology's top journals. The researchers found the same result, with the same level of statistical certainty, in approximately one third of the studies, and generally found considerably smaller effect sizes. In 2018, another collaboration of researchers attempted to replicate the findings of 21 social scientific studies published in Nature and Science. They found the originally reported effect in 13 of the attempts and, overall, effect sizes were about 50% smaller than in the original studies.

In medicine, one influential review of preclinical research found reports of irreproducibility and irreplicability ranging from 51% to 89%. Using a conservative estimate, this corresponds to $28B of lost research funds. Findings like these led Francis Collins (Director of the U.S. National Institutes of Health) and Lawrence Tabak to state that the checks and balances of science have been ‘hobbled’. Troublingly, human trials have not escaped criticism either. Researchers in the UK, for instance, found in 2018 that approximately 50% of studies were in contravention of EU laws requiring the reporting of results (similar laws exist in the US). And studies replicating clinical studies also often show contradictory results or significantly smaller effects.

While we have focused on social science and medicine, that is largely because these fields have been unusually proactive in examining their own practices and admitting deficiencies. Indeed, similar problems have been reported in a variety of fields. In neuroscientific research, for instance, many of the correlations reported between brain activation and behavior or personality measures are far higher than is statistically possible. Further, an analysis of over 3000 papers found that studies in the cognitive neuroscience field were underpowered to detect true effects, suggesting a false discovery rate of over 50% across the discipline.

Contributors to the Crisis

Critically, the controversies recounted above have been followed by a raft of metascientific research aimed at determining why so many studies are proving difficult to replicate and reproduce. Much of this work builds on historic concerns about the research process, lending such concerns support from modern quantitative methods. We will now briefly review a selection of the culprits identified by recent metascientific study: ‘questionable research practices’ (QRPs), ‘publication bias’, ‘spin’, lack of replication, small sample sizes, and overreliance on simplistic statistical methods.

QRPs exploit flexibility in research methods and reporting practices to make a researcher's results seem more persuasive than they actually are. Such practices include deciding whether to exclude observations after looking at how this would affect the overall results, measuring a phenomenon several different ways but disclosing only those measures that support the hypothesis, and strategically stopping data collection when results reach some level of statistical confidence. These tactics rest in a gray area of scientific practice that many once viewed as defensible, trivial, or even normative.

Views that would minimize the harmful impact of QRPs are now demonstrably untenable. In a widely influential paper, Joseph Simmons and colleagues employed a quantitative simulation to demonstrate that the use of QRPs increases the actual false positive rate of a literature well beyond its reported false positive rate. Use of four QRPs increased a notional 5% false positive rate to approximately 60%. Troublingly, metascientific research has also found that QRPs are, in fact, widely used. Anonymous surveys in psychology, ecology, and evolutionary biology find self-reported usage of QRPs ranging from approximately 3% to 60%, depending on the QRP in question. Note, however, that recent empirical work in psychology suggests that initial estimates of QRP use based on survey studies are inflated.
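The mechanics of this inflation are easy to demonstrate. The following minimal simulation is a sketch under assumed parameters (sample size, seed, and the particular pair of QRPs are our illustrative choices, not Simmons and colleagues' exact procedure). Both groups are drawn from the same distribution, so any ‘significant’ difference is a false positive; adding just two QRPs, optional stopping and a selectively reported second outcome measure, pushes the false positive rate well above the nominal 5%:

```python
import random
import statistics

random.seed(42)

def significant(a, b):
    """Two-sample z-test (normal approximation): True if two-sided p < .05."""
    se = (statistics.pvariance(a) / len(a) + statistics.pvariance(b) / len(b)) ** 0.5
    return abs(statistics.mean(a) - statistics.mean(b)) / se > 1.96

def run_study(use_qrps, n=30):
    # Both groups come from the same population, so any rejection is a false positive.
    g1 = [random.gauss(0, 1) for _ in range(n)]
    g2 = [random.gauss(0, 1) for _ in range(n)]
    if not use_qrps:
        return significant(g1, g2)
    # QRP 1 (optional stopping): peek halfway and stop if already 'significant'.
    if significant(g1[: n // 2], g2[: n // 2]):
        return True
    # QRP 2 (selective outcome reporting): also test a second outcome measure
    # (modeled here, for simplicity, as an independent measurement).
    m1 = [random.gauss(0, 1) for _ in range(n)]
    m2 = [random.gauss(0, 1) for _ in range(n)]
    return significant(g1, g2) or significant(m1, m2)

sims = 20_000
honest = sum(run_study(False) for _ in range(sims)) / sims
flexible = sum(run_study(True) for _ in range(sims)) / sims
print(f"false positive rate, no QRPs:  {honest:.3f}")    # close to the nominal .05
print(f"false positive rate, two QRPs: {flexible:.3f}")  # well above .05
```

The point is not the exact figure, which varies with the QRPs modeled, but that undisclosed flexibility silently converts a 5% error guarantee into something much weaker.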

QRPs conspire with ‘publication bias’ and ‘spin’ to provide, in some cases, a deeply misleading view of a research literature. The term publication bias refers to systematic biases in the published scientific literature as to what types of articles are published. This includes the empirically founded observation that studies that find no effect (eg a drug had no discernable impact on a disease) tend to be published much less frequently than studies that find some effect (ie the null studies languish in file drawers, hence the colloquialism, the ‘file-drawer effect’). This can encourage strategic research patterns. Researchers may, for instance, perform several underpowered studies (ie studies that collect few observations) that vary in small ways, and report only those that find positive effects.
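The distorting effect of that file drawer can be sketched with a small simulation (the true effect size, per-group sample size, and seed below are arbitrary illustrative assumptions). Many underpowered studies of the same small true effect are run, but only the positive, statistically significant ones are ‘published’, so the published literature systematically exaggerates the effect:

```python
import random

random.seed(7)

TRUE_EFFECT = 0.2    # assumed small true effect (standardized units)
N = 20               # per-group sample size: an underpowered study
SE = (2 / N) ** 0.5  # standard error of the mean difference (sd known to be 1)

def one_study():
    """Return (observed effect, whether it came out positive and 'significant')."""
    treated = sum(random.gauss(TRUE_EFFECT, 1) for _ in range(N)) / N
    control = sum(random.gauss(0, 1) for _ in range(N)) / N
    diff = treated - control
    return diff, diff / SE > 1.96

studies = [one_study() for _ in range(20_000)]
mean_all = sum(d for d, _ in studies) / len(studies)
published = [d for d, sig in studies if sig]  # the rest stay in the file drawer
mean_published = sum(published) / len(published)
print(f"true effect: {TRUE_EFFECT}")
print(f"mean effect, all studies:         {mean_all:.2f}")        # tracks the truth
print(f"mean effect, 'published' studies: {mean_published:.2f}")  # inflated
```

This is one reason replication attempts, which are not filtered by significance, routinely report effect sizes far smaller than the original literature.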

Another instance of publication bias is a preference for novel results. Journals typically prefer to publish articles that purport to show some heretofore undiscovered phenomenon, rather than studies that simply attempt to replicate previous studies. This is problematic because, as mentioned above, replication lends credibility to previous research and can help uncover spurious findings.

Even if publication bias is overcome, published negative results are cited less frequently (ie citation bias). Further, published reports may be ‘spun’. In other words, the report may suggest that some positive effect exists by emphasizing positive findings and deemphasizing negative ones (although both findings can be found by a careful reader, distinguishing this from some QRPs that would suppress the negative finding altogether).

Y.A. de Vries and colleagues recently collected and analysed 105 trials of antidepressant medication (see Figure ). They found that a confluence of QRPs, publication bias, citation bias, and spin produced a deeply misleading portrait of a body of knowledge. In the literature they researched, it appeared (to a reader attending to the primary message of the published reports) that only a small portion of the studies found a negative effect: the drug appeared effective. However, once the researchers unearthed the unpublished studies and waded through spin and citation bias, half of the studies were actually negative. Importantly, de Vries and colleagues could only do this because clinical trials are regulated in a way that makes it possible to determine the true number of studies being performed (ie the trials are registered, see below). As we will see below, the same cannot be said for forensic science.

Finally, small sample sizes and overly simplistic statistical methods have contributed to the replicability crisis. During the opening salvos of the crisis, John Ioannidis famously predicted (based on a theoretical model) that over half of published findings were false because, among other reasons, studies typically do not use large enough samples to find the effects they are looking for (ie they are underpowered). Moreover, the commonly used statistical method of null hypothesis significance testing (NHST) is often applied with little thought. QRPs, as mentioned above, can render the results of NHST misleading by producing reported false positive rates that underestimate the true false positive rate. And unlike some other statistical methods (eg Bayesian approaches), NHST does not take into account the a priori likelihood of the hypothesis.
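Ioannidis's basic argument can be expressed in a few lines: the proportion of ‘significant’ findings that are actually true (the positive predictive value) depends on statistical power and on the prior odds that tested hypotheses are true. The numbers below are illustrative assumptions of ours, not his estimates:

```python
def ppv(prior, power, alpha=0.05):
    """P(effect is real | result was statistically significant)."""
    true_positives = prior * power          # real effects that are detected
    false_positives = (1 - prior) * alpha   # null effects that 'reach significance'
    return true_positives / (true_positives + false_positives)

# Assumed: 10% of tested hypotheses are true, and studies are underpowered (35% power).
print(round(ppv(prior=0.10, power=0.35), 2))  # 0.44: most 'findings' are false
# With well-powered studies of plausible hypotheses, the picture improves sharply.
print(round(ppv(prior=0.50, power=0.80), 2))  # 0.94
```

On these assumptions, a literature can be majority-false even though every individual study uses a nominal 5% significance threshold, which is the core of the underpowering critique.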

The Open Science Response (and Forensic Science's Place Within it)

The scientific community is rapidly adopting transparency-related reforms as a way to improve science. The NASEM Report, for instance, strongly endorsed many open science reforms: ‘open science strengthens the self-correcting mechanisms inherent in the research enterprise’. In this section, we review some of these reforms. It is important to note, however, that open science should not be construed as a panacea for all flaws and inefficiencies in the scientific process. In both the mainstream sciences and forensic science, change depends on the concerted efforts of: (1) oversight and funding bodies; (2) journal editors and publishers; and (3) the researchers themselves. We will review the roles of these stakeholders before delving into the specifics of open science reform in the forensic sciences.

Drivers of Change in Mainstream and Forensic Science

As to leadership, the National Institute of Standards and Technology (NIST) may be best placed to guide the move to open forensic science. Indeed, both the NASEM Report and the PCAST Report identified NIST as a crucial leader in the movements they described. Within the forensic sciences, NIST's tasks may include periodically reviewing the state of foundational validity in various disciplines, advising on the design and execution of validation studies, creating and disseminating datasets, and providing grant support. These tasks may be guided by NIST's broader role as a leader in the open science movement, in which it encourages open practices and, in some cases, makes funding contingent on them. Further, the National Science Foundation, which is active in funding both the mainstream and forensic sciences, already requires that researchers engage in open practices.

Journal editors and publishers will also be instrumental in the transition to open forensic science, as they are in the mainstream sciences. This will especially be so if forensic science moves in the direction urged by the PCAST Report, with forensic scientists adopting an increased interest in publishing their work in reputable journals. Currently, one of the most influential models for openness in peer-review and publishing is the Transparency and Openness Promotion (TOP) guidelines.

The TOP guidelines, first published in Science in 2015, are a standard set of guidelines for transparency and reproducibility practices across journals. They comprise eight standards: citation (eg citing data, materials, and code), data transparency (eg posting data to an online database), analytic methods (code) transparency, research materials transparency (eg surveys and stimuli), design and analysis transparency (ie adherence to reporting guidelines), study preregistration, analysis plan preregistration (see our discussion of preregistration below), and replication. The TOP Committee defined four levels for each standard, ranging from journals simply encouraging the standard (level 0) to requiring and verifying that articles have met it (level 3). The Center for Open Science provides tools and guidance for organizations wishing to implement TOP and keeps a list of those which have agreed to consider the TOP guidelines (signatories) and those which have implemented them at some level. As of March 2019, over 5000 journals and organizations are signatories and 1000 have implemented them. A number of journals have also adopted a badge system to acknowledge papers that are preregistered and have open data and open materials. This initiative is promising, with open data in a leading psychological journal increasing from 3% to 23% after implementation of the badge system.

Beyond government organizations and publishers, adoption of open science in forensics will depend on individual researchers and practitioners. This is already beginning. Among practitioners, the Netherlands Forensic Institute (NFI) is adopting strong transparency reforms with respect to any quality-control related issues in its labs. Further, forensics researchers may use online tools like the Open Science Framework (OSF). The OSF is a free online platform for open science where any researcher (with an academic affiliation or not) can create a webpage for a research project and use it to share data, analyses, and materials. It also contains several tools for collaboration. An example (albeit a rare one) of the use of the OSF in forensic science research is a recent Australian state police-federal government funded collaboration between a university cognitive science lab, which has adopted open science reforms, and several local police services.

Through the remainder of this Part, it may be instructive to compare specific openness-related reforms in the mainstream sciences to the state of openness in forensic scientific research (Table ). To that end, we performed some preliminary research about openness in forensic science by reviewing the policies of forensic scientific journals (see Appendix A for a description of our search). We identified 30 forensic science journals and recorded whether they were open access, a TOP signatory, and adopted any of the eight TOP standards. We acknowledge that using journal standards as an index for openness is incomplete, not least because cultural differences between forensic and academic science have produced different values surrounding publishing. Still, as mentioned above, journals can be a major driver of reform and so it is useful to see what is happening among them.

Table 1.
Journal | Impact factor | TOP signatory? | Open access? | TOP citations | TOP data | TOP code
Am. J. of Forensic Medicine and Pathology | .64 | No | Hybrid | 0 | 0 | 0
Aus. J. of Forensic Medicine | .94 | No | Hybrid | 0/Enc | 0/Enc | 0
Environmental Forensics | .68 | No | Hybrid | 0 | 0/Enc | 0
Forensic Chemistry | n/a | Yes | Hybrid | 0/Enc | 0/Enc | 0/Enc
Forensic Science International | 1.974 | Yes | Hybrid | 0/Enc | 0/Enc | 0/Enc
Forensic Science International: Genetics | 5.64 | Yes | Hybrid | 0/Enc | 0/Enc | 0/Enc
Forensic Science International: Synergy | n/a | No | Open | 0/Enc | 0/Enc | 0/Enc
Forensic Science Rev. | 2.71 | No | Closed | 0 | 0 | 0
Forensic Science, Medicine, and Pathology | 2.03 | Yes | Hybrid | 0 | 0/Enc | 0
Forensic Toxicology | 3.92 | Yes | Hybrid | 0 | 0 | 0
Indian J. of Forensic Medicine and Toxicology | .05 | No | Closed | 0 | 0 | 0
Int. J. of Forensic Science & Pathology | .342 | No | Hybrid | 0 | 0 | 0
Int. J. of Legal Medicine | 2.31 | No | Hybrid | 0 | 0/Enc | 0
J. of Forensic and Legal Medicine | 1.10 | Yes | Hybrid | 0/Enc | 0/Enc | 0/Enc
J. of Forensic Medicine | 0 | No | Hybrid | 0 | 0 | 0
J. of Forensic Practice | .59 | Yes | Hybrid | 0/Enc | 0/Enc | 0/Enc
J. of Forensic Radiology and Imaging | .51 | Yes | Hybrid | 0/Enc | 0/Enc | 0/Enc
J. of Forensic Research | .32 | No | Hybrid | 0 | 0 | 0
J. of Forensic Science & Criminology | n/a | No | Hybrid | 0 | 0 | 0
J. of Forensic Sciences | 1.18 | No | Hybrid | 0 | 0 | 0
J. of Forensic Toxicology & Pharmacology | .25 | No | Hybrid | 0 | 0 | 0
J. of Law Medicine and Ethics | .99 | No | Hybrid | 0 | 0 | 0
J. of Medical Toxicology and Clinical Forensic Medicine | 0 | No | Hybrid | 0 | 0 | 0
Legal Medicine | 1.25 | Yes | Hybrid | 0/Enc | 0/Enc | 0/Enc
Medical Law Review | 1.10 | No | Hybrid | 0 | 0 | 0
Medicine, Science and the Law | .58 | No | Hybrid | 0 | 0 | 0
Rechtsmedizin (Legal Medicine) | .64 | No | Hybrid | 0 | 0 | 0
Regulatory Toxicology and Pharmacology | 2.81 | Yes | Hybrid | 0/Enc | 0/Enc | 0/Enc
Romanian Journal of Legal Medicine | .32 | No | Closed | 0 | 0 | 0
Science & Justice | 1.85 | Yes | Hybrid | 0/Enc | 0/Enc | 0/Enc

Preregistration and Registered Reports

One of the most important developments emerging from the open science movement is preregistration (ie prespecifying research choices in a way that cannot be changed after seeing the data). During preregistration, researchers specify their research plans prior to carrying out the research. Preregistration puts an emphasis on making methodological and statistical decisions ahead of time: calculating sample sizes, determining data exclusion and stopping rules, making predictions and hypotheses, and establishing data analysis plans (ie which analyses will be performed to test each hypothesis). Once submitted to an online platform, such as the OSF or AsPredicted, preregistrations are time-stamped and uneditable. Preregistration is required in some areas of clinical medical research. In other fields, it is becoming increasingly popular: in 2012, there were a mere 38 preregistrations on the OSF repository, a number that had grown to over 12,000 by 2017.

Preregistration helps limit ‘over-interpretation of noise’ by making any data-contingent analytic choices salient. In other words, it becomes more difficult to engage in QRPs because researchers can no longer selectively exclude data and measures that run counter to their hypothesis, tactics that would give their findings a superficial gleam of credibility. When there are no preset decision rules, it is easy for even highly-trained academic scientists to convince themselves that they would have made the same choice regardless of how the data looked:

Once we obtain an unexpected result, we are likely to reconstruct our histories and perceive the outcome as something that we could have, even did, anticipate all along—converting a discovery into a confirmatory result. And even if we resist those reasoning biases in the moment, after a few months, we might simply forget the details, whether we had hypothesized the moderator, had good justification for one set of exclusion criteria compared with another, and had really thought that the one dependent variable that showed a significant effect was the key outcome.

Preregistration can also help address publication bias, especially with respect to the failure to publish negative findings or those that do not support a particular research agenda. Indeed, a 2018 study found increased reporting of null (ie negative) findings associated with the rise of preregistration.

Within forensic science, our search did not uncover any peer-reviewed journal that encourages (or even expressly mentions) preregistration. That is not to say, however, that it has been altogether ignored. The PCAST Report stated that validation studies should be preregistered (although the studies they relied on were not preregistered): ‘The study design and analysis framework should be specified in advance. In validation studies, it is inappropriate to modify the protocol afterwards based on the results’.

We concur with the PCAST Report. In fact, preregistration may be even more important in forensic scientific validation research. There are many analytic choices validation researchers can make that bias their findings, such as excluding apparent outliers (eg examiners who performed very poorly) and selectively reporting the responses for certain subsets of stimuli. Moreover, the practices that this validation research undergirds impact the criminal justice system and regularly serve as inculpatory evidence in courtrooms. Effectively invisible choices that artificially lower reported error rates are immune from cross-examination and judicial gatekeeping. Preregistration would, at least, contribute to making some of these choices open to scrutiny by academics, advocates, and other stakeholders in the criminal justice process.

Given academic science's struggle with publication bias, we suspect the forensic scientific literature may also include a great many undisclosed studies that did not work out the way researchers hoped. By way of (anecdotal) example, the history of forensic bitemark identification is riddled with stories of studies conducted behind closed doors. Insider accounts are helpful in determining the results of these studies, but preregistered designs would be much more effective. Here, researchers’ motivations may be problematic in both the mainstream sciences and forensic science. Whereas mainstream scientists are motivated to accrue publications and citations by submitting exciting new findings (and not disclosing studies casting doubt on those findings), forensic scientists may be reluctant to publish results that cast doubt on their field.

Similar to preregistration, registered reports are a format of empirical article in which the introduction, hypotheses, procedures, and analysis plans are submitted to a journal and peer-reviewed prior to data collection. Peer review of the research plan before data collection means that necessary revisions can be made before any resources are expended. The article is then either rejected or receives an in-principle acceptance (ie publication is virtually guaranteed if the researchers follow the plan). One of the main benefits of registered reports is that the publication decision is based on the rigor of the methods rather than the outcome, thus curbing publication bias. Registered reports are also often used for replication research. Since the introduction of registered reports in the journal Cortex in 2013, a total of 126 journals spanning a wide range of scientific disciplines now accept registered reports as a publication format.

As with preregistrations, we did not find any forensic scientific journal that expressly mentioned registered reports or replications. This is unfortunate because these reforms could be particularly useful in forensic science. A greater focus on methodology versus outcome may nudge forensic scientists towards more careful research design, creating an iterative process that improves the standards in the field. Further, replication research would assist in assuring that latent experimenter effects are not biasing the existing literature.

Open Data, Materials, and Code

Making data, research materials, and code (eg algorithms performing the statistical analysis or simulation related to a study) open and publicly accessible is central to the open science movement. Sharing these aspects of the research process allows other researchers to confirm prior findings and detect potential error (or fabrication) in that work. Data sharing also enables researchers to combine existing data into larger datasets to perform meta-analyses and tackle novel research questions (see Part IV).

Despite these benefits, data has not traditionally been open. An analysis of 500 articles in 50 eminent scientific journals found that only 9% of articles had full raw data available online, despite many of the journals having policies related to open data. Troublingly, in 2005, when a group of researchers emailed the authors of 141 empirical articles published in the previous year to obtain raw data, 73% of the original authors were unwilling to share their data with their peers. Researchers with generally weaker results were less likely to respond to these emails.

As with preregistration, journals can promote open data, materials, and code. As we noted above, several TOP standards cover these aspects of the research process. By way of example, a TOP signatory journal, Science, recently updated its editorial policy to require authors to make their data available, subject to ‘truly exceptional circumstances’. Attitudes among researchers seem to be tracking these updated editorial policies. The 2017 State of Open Data Report found that awareness of open datasets and researchers’ willingness to use them were trending upward.

Increases in open data may also be due to better infrastructure. For example, the OSF allows researchers to upload materials, datasets, and code organized under the same project with a persistent Digital Object Identifier (DOI). Other popular cross-disciplinary open data repositories include Figshare, Zenodo, and Harvard Dataverse. Further, Google recently launched a new initiative, Dataset Search, to help researchers find open data. It works similarly to Google Scholar, indexing datasets from publishers’ websites, personal websites, and institutional repositories.

We were encouraged to see that, unlike with preregistration, forensic scientific journals appear somewhat concerned with transparency of data and code (see Table ). Our findings show that 15 of 30 journals encouraged data transparency and 11 encouraged code transparency. Still, these journals remain at TOP level 0 (ie mere encouragement) on these standards (and have not formally adopted TOP). As with preregistration, we believe that opening the research process will benefit forensic science in the long run: sharing data and materials yields substantial efficiencies and promotes error correction. Furthermore, from a criminal justice perspective, we would question the fairness of asking the criminally accused simply to trust the closed forensic scientific literature, knowing what has occurred in the mainstream sciences. Still, openness itself may present significant legal issues (eg privacy).

Open Access Journals

Finally, open access to journal articles has been a contentious issue for decades, inspiring some of the first discussions about open science. Typically, published articles are available only to those with (costly) subscriptions. However, there is now a trend towards making articles open access, either through fully open access journals or through hybrid journals, which charge authors a fee to make individual articles open access. There is much variation in open access among disciplines, with the life and biomedical sciences embracing open access and fields such as the social sciences and the professional disciplines lagging behind. In addition to allowing greater public access to science, research has demonstrated that articles in open access journals are more likely to be downloaded and cited. Free servers also exist to allow researchers to post preprints of their research (eg LawArXiv in law and PsyArXiv in psychology). We are not aware of a preprint service dedicated to forensic science.

Forensic scientific journals generally provide open access options to authors (see Table ). We only found three journals with no open access option at all. One new journal with an open focus provides only the option of open access publishing. Open access publishing in forensic science is incredibly important. Many stakeholders in the criminal justice system cannot be expected to have access to academic subscriptions (and this likely explains why so many forensic journals have open access options). This includes defense lawyers, accused parties, and forensic scientists themselves (who often are not affiliated with a university). An important issue going forward will be keeping author publishing charges manageable, especially given the limited grant funding available to forensic science researchers.

PART IV. OPEN FORENSIC SCIENCE

Open scientific reform offers several distinctive advantages to forensic science, a field that endeavors to see justice done while avoiding error. In this section, we will survey three general ways in which openness can improve forensic science: establishing the validity of existing methods, developing new objective methods, and applying those methods in a trustworthy way (see Table ). We will end with a discussion of the barriers these reforms will face.

Table 2.

Forensic science goal | Recommended initiative | Benefits
Foundational validity | Preregistration | Controlling questionable research practices (QRPs), reducing experimenter bias, reducing false positive results
Foundational validity | Registered reports | Controlling QRPs, reducing experimenter bias, reducing publication bias, reducing false positive results
Foundational validity | Replication | Reducing false positive results, reducing publication bias, reducing experimenter bias
Foundational validity | Multi-center collaborative studies (eg Many Labs) | Promoting collaboration, isolating setting and experimenter effects, reducing Type M errors
Foundational validity | Establishing ForensicsArXiv server | Reducing publication bias, faster dissemination of results, research available to legal actors and forensic practitioners
Objective methods | Large, open source databases | Promoting collaboration, large ground truth stimuli set, ability to test examiners and algorithms using ground truth stimuli
Applied validity | Preregistering analytic choices | Controlling and revealing unconscious biases, accountability
Applied validity | Open workflow and analysis | Controlling and revealing unconscious biases, accountability
Applied validity | Open proficiency testing and error repositories | Accurate error measurements, accountability

Establishing Foundational Validity Through ‘Many Labs’

An immediate and fundamental challenge facing forensic science is establishing the validity of many of its methodologies. For subjective methods, which many forensic practices still are, the PCAST Report recommended large-scale ‘black-box’ studies of performance in situations in which the ground truth is known. In other words, we cannot know what is going on in the black-box of the examiner's brain. We can, however, infer that those subjective processes are working as expected if we expose many examiners to many samples that come from known sources, and measure how often they come to the correct answer. As we discussed above, this type of research has been surprisingly uncommon, in part, because it is resource-demanding.

An amalgam of preregistration, registered reports, and replication—increasingly used in psychological research—may provide a paradigm for forensic science to follow in its validation efforts. Psychology, a relatively early embracer of open science reforms, shares many of forensic science's struggles. Like the measurement of subjective forensic expertise, psychology often seeks to measure qualia. This poses many challenges, including the fact that individuals, unlike chemicals and atoms, vary in difficult-to-predict ways. False positive and negative results may therefore result from sampling variation and measurement error.

To overcome the inherent challenges in measuring subjective processes in psychology, some researchers are relying on multi-center collaborative studies that have historically been used in some medical and genetic association research. One successful model is the ‘Many Labs’ replication projects (see also the Pipeline Project, the Psychological Science Accelerator, the Collaborative Replications and Education Project, and Study Swap). In these studies, the project leads begin by identifying a controversial or highly cited finding and seeking collaborators on the OSF or through their existing networks. The group may then consult with stakeholders like the party that initially discovered the contested finding and eventually agree on and preregister a protocol. The individual labs then recruit participants and run the protocol, each producing results that can be both pooled between labs and analysed individually or by a third party.

Many Labs style projects offer a host of benefits. As we discussed above, preregistration is important in controlling QRPs and publication bias. Replication across labs also helps to isolate effects related to the setting of the study (eg whether examiners trained in a particular lab outperformed others) and any latent experimenter effects. Importantly, the large sample sizes provided by Many Labs projects contribute to control of ‘Type M’ errors, or errors related to estimating the magnitude of a study's effect. As influential statisticians have noted, the mainstream sciences have regularly been concerned with false positives and negatives, often overlooking Type M errors. Recall the large-scale 2018 effort to reproduce the outcomes of 21 studies published in Nature and Science mentioned above. The researchers in that study found that effect sizes were 50% smaller than in the original studies—considerable Type M error.

Type M errors are especially important in the foundational forensic literature because courts require precise estimates of a method's error rate to ascertain its probative value. For example, research is converging to demonstrate that expert fingerprint examiners considerably outperform laypeople in identifying the source of a fingerprint. There is still, however, considerable variance in the estimates of their error (eg one false positive in 24 judgments to one in 604). Factfinders ought to be provided with accurate estimates, which large collaborative projects can help provide. Moreover, as we have noted, independent replication is central to the scientific process. Despite this principle, the PCAST Report declared fingerprint analysis to be foundationally valid on the basis of only two studies (both performed by law enforcement agencies). Adopting a Many Labs approach in forensic science may lend confidence to the Report's conclusion.

In projects like Many Labs, open science reformists note the importance of independent methodological support. Such a mechanism may be especially useful in forensic science, in which the quality of methodological training has often been unevenly distributed among practitioners, researchers, and those in hybrid roles. Here, NIST may play a role similar to the successful experience found in the case of the establishment of the Independent Statistical Standing Committee (ISSC) by researchers of Huntington's disease. Members of the ISSC have strong methodological training and, importantly, no interest in the outcome of research into the disease's treatment. The Committee's role has been expanded since its establishment.
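The mechanics of Type M error can be made concrete with a short simulation. This is a hedged sketch using hypothetical numbers (a true effect of 0.2 standard deviations, 25 subjects per group), not data from any forensic study: when an underpowered design is run many times, the subset of results that reach statistical significance systematically exaggerates the true effect's magnitude.

```python
import random

random.seed(1)

TRUE_EFFECT = 0.2   # true standardized mean difference (hypothetical)
N = 25              # participants per group (deliberately underpowered)
SIMS = 2000         # number of simulated studies

def mean(xs):
    return sum(xs) / len(xs)

def one_study():
    """Simulate one two-group study; return the estimated effect and
    whether it crosses the conventional 5% significance threshold."""
    a = [random.gauss(TRUE_EFFECT, 1) for _ in range(N)]
    b = [random.gauss(0, 1) for _ in range(N)]
    est = mean(a) - mean(b)
    se = (2 / N) ** 0.5          # standard error of the difference (SD = 1)
    return est, abs(est / se) > 1.96

# Keep only the 'publishable' (significant) estimates.
sig_estimates = [est for est, sig in (one_study() for _ in range(SIMS)) if sig]
exaggeration = mean(sig_estimates) / TRUE_EFFECT

print(f"Average significant estimate: {mean(sig_estimates):.2f}")
print(f"Exaggeration factor: {exaggeration:.1f}x")
```

Under these assumptions the significant estimates average roughly three times the true effect, which is why a literature filtered through significance thresholds and publication bias overstates effect magnitudes, and why the pooled samples of Many Labs projects help.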

Transforming Subjective Into Objective Methods

Beyond validating subjective methods, forensic science is also moving towards developing and validating objective methods. Great strides, for instance, have been made using automated image analysis to perform fingerprint identification. Going forward, the most important resource this initiative needs is access to ‘huge databases containing known prints’. Likewise, the development of objective methods to associate ammunition with a specific firearm (ie toolmark analysis) and to analyze complex DNA mixtures is hampered by the lack of sufficiently large databases. The PCAST Report lamented the fact that the FBI has not opened many of its databases, including those with no privacy concerns (eg toolmarks).

Despite some hesitation, some programs—founded in open scientific principles—are already underway to develop objective methods using open databases. For example, the PROVEDIt Initiative has made available 25,000 DNA profiles from mixed sources that can be used to validate DNA analysis software. Similarly, an industry partnership between the University of New South Wales and the Australian Passport Office is crowdsourcing ground truth facial images to test the accuracy of facial recognition algorithms through the #Selfies4Science program. These are all promising developments, but they could be augmented by grassroots sharing of materials by individual laboratories through systems like the OSF. The collaboration behind #Selfies4Science, for instance, has not made their database available to other researchers.

Improving Applied Validity Through Openness and Transparency

Most of the reforms we have discussed so far involve conducting research more transparently. In forensic science, however, there is also the matter of putting that research into practice. As the PCAST Report said, forensic scientific disciplines must be both foundationally valid and applied in a valid way. As to applied validity, open science reforms and initiatives are less directly applicable. Still, some open principles and techniques can be applied to forensic scientific practice. We will discuss three: (1) transparently reporting forensic analytic choices; (2) open forensic workflow and analysis; and (3) open proficiency testing and error repositories. Central to all of our suggestions is transparency and removing some discretion in what practitioners report about their process. As we have seen, even well-trained academic scientists have used flexibility in their methods to generate misleading results. We should be concerned about the same issues occurring in applied forensic scientific practice.

First, consider employing greater transparency in forensic practice, particularly fingerprint analysis. As part of their methodology, fingerprint examiners determine which features or ‘minutiae’ of a latent print (ie one found at a crime scene) are distinctive and will thus be important during comparison. However, practitioners—after viewing the comparison (ie exemplar) print—can go back and alter those features they deemed important. This practice, if not fully documented—and it often is not—can be highly misleading and result in undisclosed confirmation bias and circular reasoning. We suggest that examiners be required to transparently document the features they predict will be diagnostic during the analysis stage. Langenburg and Champod have developed a color-coded system—the Green-Yellow-Red-Orange (GYRO) system—for fingerprint examiners to document their analytic choices. If an examiner is highly confident in the existence of a feature in the latent print and has a high expectation that the feature will be present in an exemplar print, the examiner marks that feature with green. If the examiner has a medium or low level of confidence in that feature, they mark it with yellow or red, respectively. Finally, orange represents features that were not identified during the initial analysis of the latent print, but were observed after viewing the exemplar print.

Here we do not mean to constrain examiners. Indeed, there may be cases in which such re-analysis is beneficial: an examiner may have incorrectly discounted a genuine feature as an artefact but, upon seeing the same feature in the exemplar print, may realize that it was indeed diagnostic. However, much like a strikethrough on incorrect case note documentation, the examiner should also document edits to his or her feature analysis. As in the mainstream sciences, it is important to be open and candid about the reality of the process and the serious opportunities for bias to creep into it.

More transparent analytic choices flow into our second and more general point: forensic laboratories should operate on the principle of transparency. Several aspects of forensic practice involve more discretion and subjectivity than judges and jurors would reasonably expect, and more than is admitted in expert reports. For instance, there is considerable variation between labs in whether examiners verify the work of other examiners blind to the original decision, whether the first examiner can choose the verifying examiner, and whether discussions are permitted or encouraged between these individuals. Additionally, thorough documentation should be kept during verification and/or technical review procedures in which analytical conclusions come into question. Discussions between analysts inevitably influence one or both of them, much like the aforementioned fingerprint example of editing initial results once a comparison exemplar is provided. Without proper and thorough documentation of consultation between forensic practitioners, the true nature of a peer-reviewed result may not be apparent or available for future analysis.

While some forensic labs are adopting very transparent protocols surrounding these decisions, such openness has not yet become orthodoxy (or even come close to it). Indeed, in our (anecdotal) conversations with forensic examiners, several have expressed hesitation at disclosing aspects of their analysis that could convey a lack of certainty (eg that analyzing a certain latent print took longer than others). This also leads us to question their candor when questioned in court about verification, consultation, and conflict procedures. To remedy these problems, we suggest that laboratories freely publish their standard operating procedures, as well as any analytical methodology they use. Such steps have been taken by the Houston Forensic Science Center (HFSC) and the Idaho State Police (ISP), both of which maintain public-facing websites where anyone can review analytical methods and accreditation information (eg the ISP posts staff CVs, and the HFSC plans to do so). The HFSC also publishes standard operating procedures, calibration and instrumentation records, batch records, quality reports, and incident/preventative action/corrective action reports. These data could be used by another party to re-analyze the evidence and check whether the results are the same (though they would likely need to use the same instrument, because some analytical results are affected by instrumentation). Moreover, sharing details of crime lab operating procedures may yield efficiencies, as labs can learn from the successes and challenges of other labs. Consistent with open science standards, the HFSC and ISP will eventually have to consider who will be the long-term stewards of these data.

Third and finally, more objective measures of error do exist and are useful in ascertaining the probative value of forensic evidence. More attention should be paid to making these measures both as open and as useful as they can be (goals that often converge). For example, forensic examiners take proficiency tests, which determine how well they can apply a particular technique. However, these proficiency tests are typically commercially obtained and thus not open to scrutiny from the broader scientific community. Moreover, they typically do not mimic routine casework and are therefore non-blind. Non-blind proficiency tests are problematic because analysts who know they are being tested may not behave as they would during routine casework, thereby skewing measured error rates. The PCAST Report strongly recommended using proficiency testing, but urged testing services to publicly release the tests so that other scientists could determine whether the testing is realistic. Beyond the individual practitioner, lab-wide error rates inform the value of forensic expertise and should be provided more openly. Eschewing a culture that denies the possibility of error, some labs are beginning to implement policies of radical transparency by publicly reporting mistakes (though labs appear more reluctant to report errors than to adopt other transparency-related reforms). The beginnings of change here are analogous to what has occurred in the mainstream sciences: under the previous closed model, studies that did not fit the experimenter's narrative were actively suppressed; now they are increasingly reported, a move that, in our view, improves the credibility of science.
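The documentation principle behind GYRO-style annotation can be sketched as an append-only record: recoding a feature never erases the original judgment, much like a strikethrough in case notes rather than an erasure. The class and field names below are our own hypothetical illustration, not part of any actual laboratory information system.

```python
from dataclasses import dataclass, field
from datetime import datetime, timezone

# GYRO confidence codes, per Langenburg and Champod's scheme.
CONFIDENCE_COLORS = {
    "green": "high confidence at analysis",
    "yellow": "medium confidence at analysis",
    "red": "low confidence at analysis",
    "orange": "observed only after viewing the exemplar",
}

@dataclass
class FeatureAnnotation:
    """One minutia annotation with an audit trail of prior codings."""
    feature_id: str
    color: str
    history: list = field(default_factory=list)  # (old color, reason, timestamp)

    def recode(self, new_color: str, reason: str):
        """Change the coding but preserve the original entry."""
        if new_color not in CONFIDENCE_COLORS:
            raise ValueError(f"unknown color: {new_color}")
        self.history.append(
            (self.color, reason, datetime.now(timezone.utc).isoformat())
        )
        self.color = new_color

# Analysis stage: examiner marks a minutia with medium confidence.
m1 = FeatureAnnotation("minutia-7", "yellow")
# After viewing the exemplar, the coding is raised; the edit is preserved,
# so the post-exemplar re-analysis remains visible to reviewers.
m1.recode("green", "re-evaluated after viewing exemplar print")
# A feature first noticed only in comparison is coded orange from the start.
m2 = FeatureAnnotation("minutia-12", "orange")
```

The design choice worth noting is that `recode` only appends: a verifier, technical reviewer, or opposing expert can reconstruct what the examiner believed before seeing the exemplar, which is precisely the disclosure the article argues for.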

Barriers to Open Forensic Science

While we have struck an optimistic tone in our analysis of open science's applicability to forensics, there are certainly substantial barriers to any vision of open forensic science (just as there are with open science generally). We believe the advantages of open science make addressing these barriers worthwhile. Still, the challenges in implementing many of the reforms we have described deserve careful consideration.

One possible resistance point to embracing openness is the culture of forensic science, which tends to resist admitting errors. It will therefore be challenging to promote transparency about mistakes that do inevitably happen and about research that does not fit the experimenter's hypothesis. One way to advance this aim may be through increased partnerships between forensic scientists and academic labs that have embraced open science reforms. A step in this direction can be seen in the American Academy of Forensic Sciences Laboratories and Educators Alliance Program (LEAP), which aims to connect academia and forensic laboratories. Federal organizations in the US may also wish to fund similar joint projects. This may not just produce strong research, but also contribute to training and education in open research methods for forensic professionals.

Second, transitioning to open research involves significant financial costs, at least in the beginning. For example, the NASEM Report anticipated challenges in shifting the current publishing system away from a subscription-based model. Most notably, and as we have seen in the forensic scientific journals (see the discussion of open access in Part III), author publishing fees can be substantial and possibly prohibitive for many researchers. Going forward, the NASEM Report anticipated that reasonable publishing costs may be incorporated into grant funding. Indeed, funders are beginning to acknowledge the importance of open scholarly communication and even require that applicants plan to make their work freely available. Agencies funding foundational forensic science research ought to be especially attuned to making the fruits of that labor open. Unlike academic scientists, who typically have access to journal articles through their institution, forensic practitioners and lawyers often do not have this luxury. As we noted above, it is incredibly important that practitioners have access to studies providing foundational research and new insights. However, as technology improves and competition increases, we may also expect publishing fees to decrease. In the interim, forensic scientists may wish to publish their work as preprints, perhaps through the development of a Forensic ArXiv server. Beyond publishing, forensic scientific researchers may find economies by leveraging platforms and programs already developed in the open science movement. As we have discussed, the OSF and more specific initiatives like Many Labs provide useful infrastructure for forensics to build on.

From an economic standpoint, there is also the issue of companies claiming trade secret privilege over the workings of forensic scientific software. This issue is exacerbated when such technologies rest on machine learning algorithms that become a black box because they have evolved beyond their original programming. Such software should be carefully validated. And if designers are not willing to disclose the program in sufficient detail, courts may wish to limit the admission of the results of such tests.

Finally, perhaps the most challenging issues facing open forensic science are those concerning privacy and security. In the open science movement, these issues have provoked a great deal of discussion. For instance, the NASEM Report acknowledged that the interests of patient confidentiality and national security may provide good cause to limit the scope of open science in some cases. When it comes to opening forensic science, the exigency of privacy and security depends on the practice and context being considered. For example, sharing materials that are not associated with individuals (eg toolmarks) between research groups does not raise obvious privacy concerns. It may, however, have security consequences by providing adversaries insight into investigative techniques. As for the labs applying such research and providing the public more transparency about their processes, they will have to think carefully about when to limit that information (and in some cases this is already occurring). At the other end of the spectrum from practices like toolmark analysis are practices that aim to identify individuals (eg fingerprints, DNA). These practices are not themselves uniform in the privacy and consent issues they raise. For instance, DNA diverges from fingerprints in that, beyond providing identifying information, it also carries a great deal of personal genetic information about the individual and his or her family. Indeed, recent advances in mapping the human genome have resulted in considerable debate about protecting genetic information (ie genetic privacy). Note, however, that unlike more controversial research, current forensic DNA analysis practices do not rely on whole-genome sequencing. In fact, the field's current knowledge of DNA analysis and existing validation studies have been greatly aided by some level of open science, both through access to government databases and through collaborations in which researchers provide samples from local populations (recall our above discussion of the PROVEDIt database for validating mixed-source DNA analysis).

Still, as technology improves, it will always be possible that identifying information will be (mis)used in ways that cannot currently be foreseen. Despite the risks, potential threats to privacy and security should not simply end the conversation about opening some forensic science practices. Rather, they should inspire thoughtful legal-scientific policy research that seeks to advance science while respecting privacy. In the case of open forensic science, some insights may come through conversations and collaborations with those wrestling with these issues in the open science domain. Useful models may be found in the thorough consent frameworks used by the Personal Genome Project and the Precision Medicine Initiative, in which volunteers share their genomic data and personal health data, respectively. It should be noted that these models are still in their infancy and remain controversial.

Establishing Foundational Validity Through ‘Many Labs’

An immediate and fundamental challenge facing forensic science is establishing the validity of many of its methodologies. For subjective methods, which many forensic practices still are, the PCAST Report recommended large-scale ‘black-box’ studies of performance in situations in which the ground truth is known. In other words, we cannot know what is going on in the black-box of the examiner's brain. We can, however, infer that those subjective processes are working as expected if we expose many examiners to many samples that come from known sources, and measure how often they come to the correct answer. As we discussed above, this type of research has been surprisingly uncommon, in part, because it is resource-demanding.

An amalgam of preregistration, registered reports, and replication—increasingly used in psychological research—may provide a paradigm for forensic science to follow in its validation efforts. Psychology, a relatively early-embracer of open science reforms, shares many of forensic science's struggles. Like the measurement of subjective forensic expertise, psychology often seeks to measure qualia. This poses many challenges, including the fact that individuals, unlike chemicals and atoms, vary in difficult-to-predict ways. False positive and negative results may therefore result from sampling variation and measurement error.

To overcome the inherent challenges in measuring subjective processes in psychology, some researchers are relying on multi-center collaborative studies that have historically been used in some medical and genetic association research. One successful model is the ‘Many Labs’ replication projects (see also the Pipeline Project, the Psychological Science Accelerator, the Collaborative Replications and Education Project, and Study Swap). In these studies, the project leads begin by identifying a controversial or highly cited finding and seeking collaborators on the OSF or through their existing networks. The group may then consult with stakeholders like the party that initially discovered the contested finding and eventually agree on and preregister a protocol. The individual labs then recruit participants and run the protocol, each producing results that can be both pooled between labs and analysed individually or by a third party.

Many Labs style projects offer a host of benefits. As we discussed above, preregistration is important in controlling QRPs and publication bias. Replication across labs also helps to isolate effects related to the setting of the study (eg whether examiners trained in a particular lab outperformed others) and any latent experimenter effects. Importantly, the large sample sizes provided by Many Labs projects contribute to control of ‘Type M’ errors, or errors related to estimating the magnitude of a study's effect. As influential statisticians have noted, the mainstream sciences have regularly been concerned with false positives and negatives, often overlooking Type M errors. Recall the large-scale 2018 effort to reproduce the outcomes of 21 studies published in Nature and Science mentioned above. The researchers in that study found that effect sizes were 50% smaller than in the original studies—considerable Type M error.

Type M errors are especially important in the foundational forensic literature because courts require precise estimates of a method's error rate to ascertain its probative value. For example, research is converging to demonstrate that expert fingerprint examiners considerably outperform laypeople in identifying the source of a fingerprint. There is still, however, considerable variance in the estimates of their error (eg one false positive in 24 judgments to one in 604). Factfinders ought to be provided with accurate estimates, which large collaborative projects can help provide. Moreover, as we have noted, independent replication is central to the scientific process. Despite this principle, the PCAST Report declared fingerprint analysis to be foundationally valid on the basis of only two studies (both performed by law enforcement agencies). Adopting a Many Labs approach in forensic science may lend confidence to the Report's conclusion.

In projects like Many Labs, open science reformists note the importance of independent methodological support. Such a mechanism may be especially useful in forensic science, in which the quality of methodological training has often been unevenly distributed among practitioners, researchers, and those in hybrid roles. Here, NIST may play a role similar to the successful experience found in the case of the establishment of the Independent Statistical Standing Committee (ISSC) by researchers of Huntington's disease. Members of the ISSC have strong methodological training and, importantly, no interest in the outcome of research into the disease's treatment. The Committee's role has been expanded since its establishment.

Transforming Subjective Into Objective Methods

Beyond validating subjective methods, forensic science is also moving towards developing and validating objective methods. Great strides, for instance, have been made using automated image analysis to perform fingerprint identification. Going forward, the most important resource this initiative needs is access to ‘huge databases containing known prints’. The development of objective methods to associate ammunition with a specific firearm (ie toolmark analysis) and to analyse complex DNA mixtures is similarly hampered by the lack of sufficiently large databases. The PCAST Report lamented the fact that the FBI has not opened many of its databases, including those raising no privacy concerns (eg toolmarks).

Despite some hesitation, several programs founded on open scientific principles are already underway to develop objective methods using open databases. For example, the PROVEDIt Initiative has made available 25,000 DNA profiles from mixed sources that can be used to validate DNA analysis software. Similarly, a partnership between the University of New South Wales and the Australian Passport Office is crowdsourcing ground-truth facial images to test the accuracy of facial recognition algorithms through the #Selfies4Science program. These are promising developments, but they could be augmented by grassroots sharing of materials by individual laboratories through systems like the OSF. The collaboration behind #Selfies4Science, for instance, has not made its database available to other researchers.

Improving Applied Validity Through Openness and Transparency

Most of the reforms we have discussed so far involve conducting research more transparently. In forensic science, however, there is also the matter of putting that research into practice. As the PCAST Report explained, forensic scientific disciplines must be both foundationally valid and applied in a valid way. Open science reforms and initiatives are less directly applicable to applied validity, but some open principles and techniques can still be applied to forensic scientific practice. We will discuss three: (1) transparently reporting forensic analytic choices; (2) open forensic workflow and analysis; and (3) open proficiency testing and error repositories. Central to all of our suggestions are transparency and a reduction in practitioners' discretion over what they report about their process. As we have seen, even well-trained academic scientists have used flexibility in their methods to generate misleading results. We should be concerned about the same issues arising in applied forensic scientific practice.

First, consider employing greater transparency in forensic practice, particularly fingerprint analysis. As part of their methodology, fingerprint examiners determine which features or ‘minutiae’ of a latent print (ie one found at a crime scene) are distinctive and will thus be important during comparison. However, after viewing the comparison (ie exemplar) print, practitioners can go back and alter the features they initially deemed important. If not fully documented, and it often is not, this practice can be highly misleading, inviting undisclosed confirmation bias and circular reasoning.

We suggest that examine