Background

In 2011, Daryl Bem published a paper in the Journal of Personality and Social Psychology describing the results from nine experiments that appeared to support the existence of psychic ability (Bem, 2011). The high-profile nature of the journal, combined with the controversial findings, resulted in Bem’s paper attracting a considerable amount of attention within both academia and the media (Carey, 2011; Halliwell, 2011). Several academics were critical of Bem’s paper, with researchers subsequently reporting a failure to replicate his experiments (Ritchie, Wiseman & French, 2012), commenting on the a priori unlikelihood of psychic ability existing (Wagenmakers et al., 2011) and questioning the validity of the original studies (Wagenmakers et al., 2011; Alcock, 2011; Francis, 2012; Schimmack, 2012). This latter strand of criticism focused on a variety of methodological and statistical issues, including the lack of a detailed analysis plan, selective reporting of data, post hoc analyses being presented as confirmatory findings, and the incomplete description of experimental procedures.

Some commentators noted that many of the criticisms aimed at Bem’s work could also apply to research within mainstream behavioral science (LeBel & Peters, 2011). Additional work on this topic identified several ‘questionable research practices’ (QRPs), including the failure to publish null studies (creating the so-called ‘file drawer problem’), the alteration of hypotheses after data collection (often referred to as ‘Hypothesizing After the Results are Known’ or ‘HARKing’), and searching through data for significant findings (‘p-hacking’; see, e.g., John, Loewenstein & Prelec, 2012; Neuroskeptic, 2012). This work, combined with the results from a large-scale initiative questioning the replication rates of some well-regarded psychological effects (Open Science Collaboration, 2015), laid the foundations for the recent ‘replication crisis’ in psychology (Pashler & Wagenmakers, 2012).

Some researchers have begun to address poor replication rates by developing procedures to help minimize QRPs (Nosek, Spies & Motyl, 2012). One of the most popular approaches involves encouraging experimenters to describe their hypotheses and planned analyses prior to data collection (Wagenmakers et al., 2012; Van ‘t Veer & Giner-Sorolla, 2016). This concept, known as ‘preregistration’, helps minimize several of the most important QRPs, including the selective reporting of studies, HARKing and p-hacking.

There are two main forms of study registration. The first approach involves experimenters producing a description of their intended study (including the number of participants, hypotheses and planned analyses) and then submitting this information to a trusted study registry, such as the Open Science Framework (Nosek et al., 2018).

Study registries have operated for a long time. The first registries were established by medical researchers in the 1960s, and were originally designed to help experimenters recruit participants for clinical trials rather than prevent QRPs (Dickerson & Rennie, 2003). From the mid-1980s onwards, however, medical researchers began to recognize the importance of the issues surrounding the non-publication of null results (Simes, 1986; Easterbrook et al., 1991) and so developed study registries explicitly designed to tackle the problem.

Kaplan & Irvin (2015) recently demonstrated the need for such registries. In 2000, the National Library of Medicine at the National Institutes of Health required medical researchers to preregister key aspects of their studies (including experimental protocol, sample sizes, plans for handling missing data, and statistical analyses). Kaplan & Irvin compared study outcomes before and after the introduction of mandatory preregistration, examining studies that had investigated the impact of drugs and dietary supplements on cardiovascular disease. Remarkably, 57% of the studies published prior to 2000 reported a significant effect, compared to just 8% of the studies published after preregistration became mandatory.

In 2008, the Neuroskeptic blog (Oct. 25, 2008; Nov. 3, 2008) discussed the need for preregistration within psychology. To our knowledge, psychology’s first formal study registry (reviewing all submissions and making them irreversibly public) was launched in 2012 at the University of Edinburgh and focused on parapsychological research (Watt, 2012; Watt & Kennedy, 2015). In 2013, Jona Sassenhagen from the University of Marburg was the first researcher to preregister a mainstream psychological study, albeit using a registry designed to log clinical studies (Neuroskeptic, Feb. 3, 2013). Since then, several online platforms have been created for study registration within the behavioral sciences, including the Open Science Framework (https://osf.io), ‘As Predicted’ (https://aspredicted.org) and the American Economic Association’s Registry for Randomized Controlled Trials (https://www.socialscienceregistry.org).

The second type of preregistration is journal-based, and involves investigators producing a complete description of their intended study (including experimental rationale, hypotheses, method and planned analyses) and then submitting this report for peer review prior to data collection. If the submission is accepted, the authors are guaranteed publication regardless of study outcome. Chambers (2017) coined the phrase ‘Registered Reports’ (RRs) to describe this procedure.

There are several ways in which RRs can help to improve the quality of research. As with all forms of preregistration, RRs require experimenters to pre-specify several aspects of their study (including planned hypotheses, number of participants and intended analyses) and so help to prevent publication bias, p-hacking and HARKing. In addition, RRs require investigators to describe their study rationale and methodological procedures, thus giving referees an opportunity to help improve the theoretical basis and design of an experiment prior to data collection.

When tracing the historical roots of RRs, Chambers (2017) noted that Rosenthal (1966), Walster & Cleary (1970) and Newcombe (1987) all outlined early versions of the idea, primarily in an attempt to combat publication bias. Similarly, Kupfersmid (1988) suggested that peer review should be conducted prior to data analysis, noting that this would help prevent publication bias and p-hacking. Weiss (1989) also recommended that a paper be reviewed prior to data collection, but suggested that this would help to prevent researchers wasting their time running poor-quality studies, rather than minimizing QRPs. Unfortunately, psychology journals at the time failed to adopt the procedure.

In 2012, Chambers was invited to join the Editorial Board of Cortex, and suggested that the journal help prevent QRPs by encouraging researchers to submit papers for review prior to data collection (Chambers, 2017). In 2013, Cortex adopted Chambers’ suggestion (Chambers, 2013), and Chambers & Munafo (2013) published an open letter calling for other journals to adopt the same approach. This letter was signed by a large number of psychologists and helped attract attention to the notion of RRs. Over a hundred journals now accept this form of submission (Center for Open Science, 2018).

Understandably, attempts to outline the historical roots of RRs have tended to focus on previous research within mainstream psychology and the adoption of the procedure by Cortex in 2013. However, many academics are unaware that a prototype version of RRs was implemented in the mid-1970s by a little-known parapsychology journal, and that this pioneering publication policy ran successfully for many years.

In 1973, psychologist Martin Johnson was officially appointed as a professor of parapsychology at the University of Utrecht (Schouten, 1988–1989; Parker & Mörck, 2011). Johnson remained in post until 1986 and passed away in 2011.

Due to its controversial subject matter, parapsychology has traditionally attracted a considerable amount of critical attention. During the mid-1970s much of this attention focused on identifying potential methodological and statistical shortcomings, and developing ways to help minimize these issues. Much of this debate anticipated the present-day work into QRPs in mainstream psychology, and involved detailed discussions on the impact of post hoc analyses (Wiklund, 1977) and publication bias (Rhine, 1975; Johnson, 1976).

In November 1974, Johnson gave his inaugural professorial address at the University of Utrecht and subsequently published some of the key points from the talk in a journal article (Johnson, 1975). In this article, Johnson argued that it was vital to minimize possible methodological issues in both parapsychology and mainstream psychology. He then outlined three ways of conducting research and explored the degree to which each was open to bias. The first approach simply involved a researcher carrying out an experiment on their own, whereas the second involved them working as part of a team. Both approaches were relatively informal in nature and so open to several QRPs. However, the third approach described by Johnson was far more rigorous and was explicitly designed to prevent several methodological and statistical problems:

“... according to the philosophy of this model, the experimenter should define his problem, formulate his hypotheses, and outline his experiment, prior to commencing his study. He should write his manuscript, stating at least essential facts, before carrying out his investigation. This manuscript, in principle only lacking data in the tables, presentation of results, and interpretation of results, should be sent to one or more editors, and the experimenter should not initiate his study until at least one of the editors has promised to publish the study, regardless of the outcome of the experiment. In this way we could avoid selective reporting. Furthermore the experimenter will not be given the opportunity to change his hypotheses in such a way that they “fit” the outcome of the experiment.” (Page 41)

In short, in 1974, Johnson outlined many of the key attributes now associated with RRs, including investigators describing important aspects of a study prior to data collection, the reviewing of this report, and the guarantee of publication regardless of study outcome.

Johnson then teamed up with another parapsychologist and member of the Utrecht Psychology Department, Sybo Schouten, and together they launched The European Journal of Parapsychology (EJP). This little-known journal was primarily designed to publish experimental work testing the possible existence of psychic ability. In Volume 1:1 (November 1975), the Editors outlined their preference for researchers to submit papers prior to data collection (Johnson & Schouten, 1975). This initial volume also contained the paper based on Johnson’s inaugural address. Volume 1:2 (May 1976) contained another article by Johnson about the importance of this policy for combatting the non-publication of null results (Johnson, 1976). Volume 1:3 (November 1976) contained the first formal statement describing the journal’s publication policy and noted:

“A hallmark of the European Journal of Parapsychology is the attempt to avoid selective reporting, that is, the tendency to bury ‘negative’ results and only to publish studies that ‘turn out’. To avoid turning the journal into a graveyard for all ‘unsuccessful’ studies, we require that the acceptance or rejection of a manuscript should take place prior to the phase when the experimental data are collected. The quality of the design and methodology and the rationale of the study should be judged as per se more important than the level of significance of the outcome of the study. As a practical rule, we advise a contributor of an article to submit a design of his planned study before the study is actually carried out. The rationale of the study should be stated, as well as all the hypotheses related to it. Furthermore one should try to specify the number of subjects, the number of trials, etc., plus the type of statistical methods one plans to use for one’s evaluation. Priority will be given to the publication of studies which fulfil the above-stated publication policy.”

As such, the EJP editors embraced the underlying ethos of RRs (including the importance of publishing both positive and negative results, and judging the quality of research prior to data collection) and made initial attempts to devise a system that encapsulated many of the key attributes of RRs (including encouraging researchers to submit a document specifying their experimental protocol, sample size, number of trials, rationale and hypotheses, and data analyses). It should be noted, however, that the EJP editorial guidelines did not involve many of the more elaborate checks and balances associated with modern-day systems for RRs (including, for instance, researchers having to complete templates that require them to pre-specify key information; editors, referees and authors working together to form an ‘in-principle acceptance’ of the study; a second round of peer reviewing post data collection; logging uncompleted or withdrawn studies; the storage and publication of researchers’ initial documentation). In addition, as the initial documentation submitted to the EJP by researchers was not, to our knowledge, retained or published, it is not possible to retrospectively judge the degree to which researchers adhered to the editorial guidelines. These shortcomings aside, it is clear that the EJP editors developed and carried out a prototypical version of modern-day systems for RRs.

This publication statement then appeared in every issue of the EJP from 1976 to 1992. In 1992, following the closure of Utrecht’s parapsychology laboratory, the EJP editorship transferred to the Koestler Parapsychology Unit (University of Edinburgh), and a slightly modified version of the publication policy (albeit one still emphasising an openness to reviewing manuscripts prior to data collection) appeared between 1992 and 2000. The final RR was published in the EJP in Volume 9 (1992-1993). In 2000, the EJP editorship transferred to the University of Stockholm and the publication policy no longer referred to RRs.

For about 17 years (1976 to 1993), the EJP published a mixture of RRs and non-RRs. In addition to playing an important and little-known role in the history of RRs, this unique body of published studies presents an opportunity to conduct an exploratory and retrospective study assessing the impact of RRs on study outcome. Given that RRs were designed to reduce QRPs, it was hypothesized that RRs would contain a lower proportion of statistically significant results than non-RRs. Additional analyses aimed to explore whether such a finding could be due to two alternative explanations (namely, whether any differences were due to the RRs and non-RRs involving different types of studies or being conducted at different times). All analyses were exploratory, and all data exclusions and measures have been reported.
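
The comparison described above, namely whether the proportion of statistically significant results differs between RRs and non-RRs, can be illustrated with a minimal sketch. The sketch assumes a two-sided Fisher exact test on a 2x2 table, which is one reasonable choice for comparing two proportions and is not necessarily the analysis used in this study. The function names and the counts are hypothetical and are included purely for illustration; they are not data from the EJP.

```python
from math import comb


def proportion_significant(n_significant, n_total):
    """Return the proportion of results that were statistically significant."""
    if n_total <= 0 or not 0 <= n_significant <= n_total:
        raise ValueError("need 0 <= n_significant <= n_total and n_total > 0")
    return n_significant / n_total


def fisher_exact_two_sided(a, b, c, d):
    """Two-sided Fisher exact p-value for the 2x2 table [[a, b], [c, d]].

    Rows are the two groups (e.g., RRs and non-RRs); columns are
    significant and non-significant results. The p-value sums the
    hypergeometric probabilities of all tables (with the same margins)
    that are no more likely than the observed table.
    """
    if min(a, b, c, d) < 0:
        raise ValueError("cell counts must be non-negative")
    row1, row2, col1 = a + b, c + d, a + c
    denom = comb(row1 + row2, col1)

    def prob(x):
        # Probability that the top-left cell equals x, given fixed margins.
        return comb(row1, x) * comb(row2, col1 - x) / denom

    p_observed = prob(a)
    lowest, highest = max(0, col1 - row2), min(row1, col1)
    probs = [prob(x) for x in range(lowest, highest + 1)]
    # Small tolerance so that ties with the observed table are included.
    return min(1.0, sum(p for p in probs if p <= p_observed * (1 + 1e-9)))


# Hypothetical counts, for illustration only (NOT taken from the EJP).
rr_significant, rr_total = 2, 20
non_rr_significant, non_rr_total = 12, 30

print("RR proportion:", proportion_significant(rr_significant, rr_total))
print("non-RR proportion:",
      proportion_significant(non_rr_significant, non_rr_total))
print("Fisher exact p:", fisher_exact_two_sided(
    rr_significant, rr_total - rr_significant,
    non_rr_significant, non_rr_total - non_rr_significant))
```

An exact test is used in this sketch because the number of studies in each group may be small, in which case large-sample approximations can be unreliable.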