Needless compromises were built into the design and publication of the RP:P. Its arbitrary procedural rules for conducting replications could be more fruitfully turned into recommendations for how not to conduct replications. The general theme of my objections is that collaboration initiatives, at least as they are currently organized, bureaucratize and otherwise complicate procedures that should be as simple as the procedures that routinely put untrustworthy science into the literature. Current rules risk ‘ghettoizing’ replications when effort should instead be made to widen their acceptability, particularly in the prestigious journals that produced the untrustworthy science. Furthermore, the RP:P and related initiatives inadvertently strengthen questionable publication practices that we desperately need to challenge.

Kahneman’s adversarial collaboration

Nobel Prize winner Daniel Kahneman [35] has been influential in recommending:

“when the replication is ready – after a pilot but before data collection – the replicator sends the author a detailed description of the planned procedure, including actual programs and a video when relevant.”

And if there is any doubt in his position, he further states:

“A good-faith effort to consult with the original author should be viewed as essential to a valid replication.”

Although well-meant and intended to preempt anticipated criticism of replication initiatives, Kahneman’s [35] call for involving the authors of the original studies in replications as adversarial collaborations is unfortunate for a number of reasons. Kahneman has provided a clear rationale for this position:

“I share the common position that replications play an important role in our science – to some extent by cleaning up the scientific record, mostly by deterring sloppy research. However, I believe the current norms allow replicators too much freedom to define their study as a direct replication of previous research. Authors should be guaranteed a significant role in the replications of their work.”

What could Kahneman possibly mean by “too much freedom”? Ultimately, neither the original authors nor those who undertake replications have the final word on whether a study can be deemed a direct replication of previous research. That should be left to post-publication peer review. Replication should be as freely undertaken as original research, and so there is no reason to slap this constraint on it. Would Kahneman extend this principle to any effort to empirically examine existing research findings or theoretical claims? Furthermore, if we insist on the authors of original research being involved in any replications, it takes pressure off them to provide a sufficiently clear and transparent description of their methods when publishing their original results. We should not coddle authors of scientific papers: they should expect attempted replications as inevitable, contingent only on how much effort replication would take and the credibility attached to their findings.

Pre-approval by peer review of attempted replications

The strong recommendation is that investigators planning to attempt a replication should first get pre-approval by independent peer review – including the authors of the original research – of their rationale, design, and analytic plans. Again, why adopt such cumbersome rules if publication of the original research was not subject to them? Peer review can be a slow, undependable process that may introduce biases, not only from the original investigators but also from their theoretical and professional allies. John Ioannidis’ concept [36] of obligated replication comes to mind. This refers to a corruption of peer review whereby proponents of a dominant school of thought or theory control publication venues so they can largely select and mold what gets published. Requiring prior peer review of replication attempts inadvertently extends their control even to what research can be re-evaluated.

Direct, rather than conceptual replication

Whether direct replication is preferred to conceptual replication, or whether internal versus external validity is to be emphasized, depends a lot on context. It has been common practice going back to Berkowitz and Donnerstein [37] for social psychologists to insist on the tightest of experimental procedures while at the same time claiming the broadest generalizability to the real world. Fraudster Diederik Stapel [38] claimed that before he resorted to outright fabrication of data, he wrote to investigators when he could not replicate their striking findings. He often got advice such as:

“Don’t do this test on a computer. We tried that and it doesn't work. It only works if you use pencil-and-paper forms.” “This experiment only works if you use ‘friendly’ or ‘nice’. It doesn't work with ‘cool’ or ‘pleasant’ or ‘fine’. I don’t know why.”

Amazed that he could now replicate the results, Stapel considered himself as admitted to the “Grand Fellowship of Secret Procedures.”

Any investigator who has been in the field for very long has realized that minor, seemingly arbitrary, and even theoretically irrelevant modifications in procedures can lead to considerable differences in the size and even direction of the results obtained. Insistence on direct replication as a general principle, rather than as a strategy requiring justification, could perpetuate acceptance of results of only limited generalizability. The issue becomes more important when social or public health implications are claimed for findings.

For instance, a bug-killing paradigm [39] has been used to make socially important generalizations about soldiers being put at risk for posttraumatic stress disorder when they are placed in morally injurious situations. Arguably, investigators attempting replications should not be confined to the same species of insects as the original experiments, given the robustness and broad generalizations claimed for the original study. If the replicators fail with different insects, post-publication peer reviewers are free to dismiss any utility of pursuing this line of research – or to applaud it. Similar situations are posed by researchers who claim, in heavily promoted studies, that positive thinking saps energy and initiative in everyday life, based on studies of undergraduate females having their satisfaction with the hypothetical purchase of high heels assessed in interaction with a computer [40]. Given these investigators’ common undeclared conflicts of interest and the broad generalizability claimed for everyday life, skepticism should be encouraged and not constrained by the researchers’ subsequent claims of lack of fidelity to the often fragile or poorly defined original experimental conditions.

Reversing the traditional perspective that a psychology study should be tightly controlled in artificial laboratory situations, replicators might consider deliberately loosening experimental control with the intention of incorporating more real-world elements and testing the generalizability of claims across variations. Experimental realism and simulations of the context to which generalizations are made should trump original investigators’ opinions about the fidelity of replications to the original manipulations.

Protecting premium top shelf journals from null findings and attempted replication

The Open Science Collaboration’s attempted replication of 100 studies was published in the prestigious journal Science. Publishing the first paper from the replication initiative was consistent with the journal’s policy of valuing the newsworthy and innovative. Yet we should be skeptical about whether publishing a bundled set of 100 attempted replications of studies from prestigious psychology journals is a game-changing precedent that will result in routine publication of smaller collections or single replications in premium top shelf journals. Science is a prime example of a journal that has earned its “premium top shelf” status by not routinely publishing replications or null findings unless there is some extraordinary reason for doing so.

The prestigious psychology journals that published the original studies slated for the Open Science Collaboration effort – Journal of Personality and Social Psychology, Psychological Science, and Journal of Experimental Psychology: Learning, Memory, and Cognition – are unlikely anytime soon to routinely give attempted replications, particularly those producing null results, the same priority as original research, which the RP:P suggested is untrustworthy. An outgoing editor of Psychological Science [41] stated that he had rejected over 6000 submissions in his five years as editor without the manuscripts going out to reviewers. At the top of his three reasons was:

“The Pink Floyd Rejection: Most triaged papers were of this type; they reported work that was well done and useful, but not sufficiently groundbreaking. So the findings represented just another brick in the wall of science.”

Praise of the “Pink Floyd rejection” can be turned into a critique of a particular type of publication bias that characterizes Journal of Personality and Social Psychology as well as Psychological Science. It can serve as a warning that replications of individual published studies, particularly those that do not yield positive results, are not welcome. But such “bricks in the wall” are likely more trustworthy than the over 50% of Journal of Personality and Social Psychology and Psychological Science articles evaluated in the RP:P that did not reproduce with the same strength of effects.

A number of compromises have been struck between organized efforts to replicate studies in the psychological literature and professional organizations in their role as publishers. Both the American Psychological Association and the Association for Psychological Science have endorsed replication initiatives, but they direct these to journals other than their protected premium top shelf journals.

These compromises serve to protect the strong publication bias, and therefore the unrepresentativeness, of what is published in these premium top shelf journals. The prestige of JPSP and PS, as reflected in the journal impact factors by which these two journals compete against each other, is furthered by keeping out individual replications, especially those with null findings. The validity of journal impact factors has of course been subject to withering criticism, but they still matter to early career investigators attempting to advance. Deals between replication initiatives and the APS protect Psychological Science from having to accept individual replications, successful and failed, by requiring preregistration and by gathering replications up and herding them into a ‘ghetto’ in Perspectives on Psychological Science. For the APA, Journal of Consulting and Clinical Psychology gets similar protection by null psychotherapy trials being exiled to a special section of brief reports in the less prestigious and lower impact Journal of Psychotherapy Integration. Successful and failed replications of studies originally published in the APA’s Journal of Personality and Social Psychology are referred on to the non-APA journal Social Psychology. The energy of researchers seeking to improve the trustworthiness of psychology is thus deflected from continued demands for enforcement of the Pottery Barn rule (https://hardsci.wordpress.com/2012/09/27/a-pottery-barn-rule-for-scientific-journals/): journals that publish original research should be required to publish attempted replications.
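For context, the two-year journal impact factor that fuels this competition is simple arithmetic (this is the standard definition used by ISI/Clarivate, not anything specific to these journals):

$$\mathrm{IF}_{Y} \;=\; \frac{C_{Y}(Y{-}1) + C_{Y}(Y{-}2)}{N_{Y-1} + N_{Y-2}}$$

where $C_{Y}(y)$ is the number of citations received in year $Y$ to items the journal published in year $y$, and $N_{y}$ is the number of citable items published in year $y$. Every published replication adds to the denominator, and to the extent that null results are cited less than the journal’s average article, admitting them can only pull the ratio down – the arithmetic incentive behind the gatekeeping described here.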

The right target: questionable publishing practices rather than questionable research practices

Replication initiatives implicitly place the proximal cause of the untrustworthiness of psychological science in endemic questionable research practices (QRPs). Various lists and taxonomies of QRPs are available, but Simmons, Nelson, and Simonsohn’s [42] list of six ways to p-hack is a useful start, even if incomplete:

1. Stop collecting data once p < .05.
2. Analyze many measures, but report only those with p < .05.
3. Collect and analyze many conditions, but only report those with p < .05.
4. Use covariates to get p < .05.
5. Exclude participants to get p < .05.
6. Transform the data to get p < .05.
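To make the first of these QRPs concrete, here is a minimal simulation sketch in Python; the function name, batch sizes, and simulation count are illustrative assumptions, not drawn from Simmons and colleagues [42]. It tests after every few participants and stops the moment p < .05, even though no true effect exists:

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(42)

def peek_until_significant(n_start=10, n_max=100, step=5, alpha=0.05):
    """One simulated experiment under the null hypothesis (true effect = 0):
    run a one-sample t-test after every batch of participants and stop
    as soon as p < alpha -- QRP #1, 'optional stopping'."""
    data = rng.standard_normal(n_start)
    while True:
        _, p = stats.ttest_1samp(data, popmean=0.0)
        if p < alpha:
            return True          # a 'significant' finding despite no true effect
        if len(data) >= n_max:
            return False         # gave up; correctly non-significant
        data = np.append(data, rng.standard_normal(step))

n_sims = 2_000
hits = sum(peek_until_significant() for _ in range(n_sims))
print(f"False-positive rate with optional stopping: {hits / n_sims:.2f}")
# A single fixed-n test would keep the rate near the nominal 0.05;
# repeated peeking pushes it several times higher.
```

Running this yields a false-positive rate well above the nominal 5%, which is precisely why findings produced this way fail when independent investigators attempt direct replications.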

Although there is a general squeamishness about blaming authors of individual papers, replication initiatives are needed because of the high prevalence of these QRPs in the psychological literature, even in the prestigious journals which the RP:P sampled. Replication initiatives essentially expose the QRPs in published research by demonstrating that key findings cannot be reproduced when independent investigators commit themselves to transparently planning, conducting, and reporting their replication efforts.

But authors have incentives for engaging in QRPs, and protection when they do, because of strong institutional pressures to publish noteworthy, immediately newsworthy, and ostensibly novel findings rather than findings that are more robust but more mundane. As long as this institutional pressure on authors continues, replication initiatives waste the effort of investigators who might otherwise commit themselves to moving science ahead by building on the secure foundation of more trustworthy past research.

Much could be accomplished by insisting on diligent enforcement of existing rules and standards of best publication practices. Psychology has tended to take its cue from reforms in the biomedical literature, where compliance, even though far from perfect, is more likely because government and regulatory agencies insist on it as a condition for approval of pharmaceuticals and medical devices. Psychological journals adopted the Consolidated Standards of Reporting Trials (CONSORT) [43] later and less consistently than medical journals did. Until my colleagues and I protested [44], the American Psychological Association’s late adoption of CONSORT applied only to randomized evaluations of psychological interventions that were explicitly labeled as randomized trials in the title or abstract. But labeling a randomized trial as such is an item on the CONSORT checklist, not a condition for the checklist to apply.

Requirements that the rationale, design, analytic plans, and primary outcomes of clinical trials be registered are similarly being adopted only slowly and inconsistently for psychological interventions. There is evidence that trial registration, if it takes place at all, often occurs after data collection has begun [45]. There is further evidence that editors and reviewers fail to consult published trial registrations and protocols when evaluating manuscripts, with the effect that primary outcomes often shift in the published reports [46]. Requests for data sharing, even when sharing is mandated, are often rejected or simply ignored, with evidence that authors of studies with exaggerated interpretations of findings or outright errors are less responsive to requests for their data [47].