An academic scientist’s professional success depends on publishing. Publishing norms emphasize novel, positive results. As such, disciplinary incentives encourage design, analysis, and reporting decisions that elicit positive results and ignore negative results. Prior reports demonstrate how these incentives inflate the rate of false effects in published science. When incentives favor novelty over replication, false results persist in the literature unchallenged, reducing efficiency in knowledge accumulation. Previous suggestions to address this problem are unlikely to be effective. For example, a journal devoted to negative results would publish otherwise unpublishable reports, but doing so would enshrine the low status of both the journal and its content. The persistence of false findings can be ameliorated with strategies that make the fundamental but abstract accuracy motive—getting it right—competitive with the more tangible and concrete incentive—getting it published. This article develops strategies for improving scientific practices and knowledge accumulation that account for ordinary human motivations and biases.

The chief thing which separates a scientific method of inquiry from other methods of acquiring knowledge is that scientists seek to let reality speak for itself, and contradict their theories about it when those theories are incorrect. . . . Scientific researchers propose hypotheses as explanations of phenomena, and design experimental studies to test these hypotheses via predictions which can be derived from them. These steps must be repeatable, to guard against mistake or confusion in any particular experimenter. . . . Scientific inquiry is generally intended to . . . document, archive and share all data and methodology so they are available for careful scrutiny by other scientists, giving them the opportunity to verify results by attempting to reproduce them. — “Scientific Method” (n.d.), from http://en.wikipedia.org/wiki/Scientific_method

A True Story of What Could Have Been

Two of the present authors, Matt Motyl and Brian A. Nosek, share interests in political ideology. We were inspired by the fast-growing literature on embodiment that demonstrates surprising links between body and mind (Markman & Brendl, 2005; Proffitt, 2006) to investigate embodiment of political extremism. Participants from the political left, right, and center (N = 1,979) completed a perceptual judgment task in which words were presented in different shades of gray. Participants had to click along a gradient representing grays from near black to near white to select a shade that matched the shade of the word. We calculated accuracy: How close to the actual shade did participants get? The results were stunning. Moderates perceived the shades of gray more accurately than extremists on the left and right (p = .01). Our conclusion: Political extremists perceive the world in black and white, figuratively and literally. Our design and follow-up analyses ruled out obvious alternative explanations such as time spent on task and a tendency to select extreme responses. Enthused about the result, we identified Psychological Science as our fallback journal after we toured the Science, Nature, and PNAS rejection mills. The ultimate publication, Motyl and Nosek (2012), served as one of Motyl’s signature publications as he finished graduate school and entered the job market.

The story is all true, except for the last sentence; we did not publish the finding. Before writing and submitting, we paused. Two recent articles had highlighted the possibility that research practices spuriously inflate the presence of positive results in the published literature (John, Loewenstein, & Prelec, 2012; Simmons, Nelson, & Simonsohn, 2011). Surely ours was not a case to worry about. We had hypothesized it; the effect was reliable. But we had been discussing reproducibility, and we had declared to our lab mates the importance of replication for increasing the certainty of research results. We also had an unusual laboratory situation. For studies that could be run through a Web browser, data collection was very easy (Nosek et al., 2007). We could not justify skipping replication on the grounds of feasibility or resource constraints. Finally, the procedure had been created by someone else for another purpose, and we had not laid out our analysis strategy in advance. We could have made analysis decisions that increased the likelihood of obtaining results aligned with our hypothesis. These reasons made it difficult to avoid doing a replication.

We conducted a direct replication while we prepared the manuscript. We ran 1,300 participants, giving us .995 power to detect an effect of the original effect size at α = .05. The effect vanished (p = .59). Our immediate reaction was “why the #&@! did we do a direct replication?” Our failure to replicate does not establish definitively that the original effect was false, but it raises enough doubt to make reviewers recommend against publishing. Any temptation to ignore the replication and publish the original was squashed only by the fact that our lab mates knew we had run a replication. We were accountable to them. The outcome—a dead or delayed paper—was unfortunate for our career advancement, particularly Motyl’s as he prepared for the job market. Incentives for surprising, innovative results are strong in science. Science thrives by challenging prevailing assumptions and generating novel ideas and evidence that push the field in new directions.
We cannot expect to eliminate the disappointment that we felt by “losing” an exciting result. That is not the problem, or at least not one for which a fix would improve scientific progress. The real problem is that the incentives for publishable results can be at odds with the incentives for accurate results. This produces a conflict of interest, one that may increase the likelihood of design, analysis, and reporting decisions that inflate the proportion of false results in the published literature.1 The solution requires making the incentives for getting it right competitive with the incentives for getting it published. Without that, the lesson we could take away from our experience with “Political extremists do not perceive shades of gray, literally” is to never, ever do a direct replication again. The purpose of this article is to make sure that such a lesson does not stick.
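As an illustration of the power calculation mentioned above, the sketch below computes the power of a two-group comparison in Python with statsmodels. The equal split of the 1,300 participants into two groups of 650 and the standardized effect size d = 0.25 are assumptions chosen for illustration; the text above reports only the total N, the α level, and the resulting power.

```python
# Illustrative power calculation for a two-group replication.
# ASSUMPTIONS (not from the article): a standardized mean difference
# of d = 0.25 and an equal split of the N = 1,300 participants into
# two groups of 650. The article reports only N, alpha, and power.
from statsmodels.stats.power import TTestIndPower

analysis = TTestIndPower()

# Power achieved by n = 650 per group at alpha = .05, two-sided.
power = analysis.power(effect_size=0.25, nobs1=650, alpha=0.05,
                       ratio=1.0, alternative='two-sided')
print(f"power for d = 0.25, n = 650 per group: {power:.3f}")  # ≈ 0.995

# The inverse question: what effect size is detectable at .995 power?
d_needed = analysis.solve_power(nobs1=650, alpha=0.05, power=0.995,
                                ratio=1.0, alternative='two-sided')
print(f"detectable effect size at .995 power: d = {d_needed:.3f}")  # ≈ 0.252
```

Under these assumptions, an effect of roughly d = 0.25 is the smallest detectable with .995 power, which is consistent with the figure quoted above; with a different design or effect size metric, the numbers would differ.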

Conclusion

We titled this article “Scientific Utopia” self-consciously. The suggested revisions to scientific practice are presented idealistically; the realities of implementation and execution are messier than their conceptualization. Science is the best available method for accumulating knowledge about nature. Even so, scientific practices can be improved to enhance the efficiency of knowledge building. The present article outlined changes to address a conflict of interest for practicing scientists: the rewards of getting published are independent of the accuracy of what is published. Some of these changes are systemic and require cultural, institutional, or collective change. Others can emerge “bottom-up,” by scientists altering their own practices.

We, the present authors, would like to believe that our motivation to do good science would override any decision that prioritizes publishability over accuracy. However, publishing is a central, immediate, and concrete objective for our career success, and that makes it likely that we will be influenced by self-serving reasoning biases despite our intentions. The most effective remedy available for immediate implementation is to make our scientific practices transparent. Transparency can improve our practices even if no one actually looks, simply because we know that someone could look.

Existing technologies allow us to translate some of this ideal into practice. We make our unpublished manuscripts available at personal Web pages (e.g., http://briannosek.com/) and public repositories (http://ssrn.com/). We make our study materials and tools available at personal Web pages (e.g., http://people.virginia.edu/~msm6sw/materials.html; http://people.virginia.edu/~js6ew/). We make data available through the Dataverse Network (e.g., http://dvn.iq.harvard.edu/dvn/dv/bnosek), and we are contributing to the design and construction of the Open Science Framework for comprehensive management and disclosure of our scientific workflow (http://openscienceframework.org/). Opening our research process will make us feel accountable to do our best to get it right and, if we do not get it right, will increase the opportunities for others to detect the problems and correct them. Openness is not needed because we are untrustworthy; it is needed because we are human.

Acknowledgments

We thank Yoav Bar-Anan, Roger Giner-Sorolla, Jesse Graham, Hal Pashler, Marco Perugini, Bobbie Spellman, N. Sriram, Victoria Stodden, and E. J. Wagenmakers for helpful comments.

Declaration of Conflicting Interests

The authors declared that they had no conflicts of interest with respect to their authorship or the publication of this article.

Notes

1. We endorse a perspectivist approach to science (McGuire, 2004)—the idea that all claims may be true given the appropriate conditions. In this article, when we say “true,” we mean the truth of the claim as it is stated, usually conceived as the ordinal relationship between conditions, effects, or direction of correlation (Frick, 1996). The general truth value of a claim is established by expressing the limiting conditions under which it is true. Without expressing those conditions, the claim is likely to be false or, at best, partly true.

2. Later we will argue that this is more the perceived than the real formula for success. For now, we are dealing with perception, not reality.

3. A reasonable justification is: “I am doing innovative research on a new phenomenon. Our resources for data collection are limited. It would be a poor use of resources to invest heavily if there is no effect to detect or if I am pursuing it the wrong way.” An unreasonable consequence is that if the effect being investigated does not exist, the best way to obtain a significant result by chance is to run multiple small-sample studies, whereas if the effect does exist, the best way to confirm it is to run a single high-powered test (a simulation illustrating this contrast appears after these notes).

4. An exception is the scientific anarchist Feyerabend (1975), who rejected the notion that there were any universal methodological rules for the scientific method and argued that science had no more claim to identifying “objective” truths than any other approach.

5. In reality, conceptual and direct replications exist on a continuum rather than being discrete entities (Schmidt, 2009). There is no such thing as an “exact” replication outside of simulation research because the exact conditions of the original investigation can never be duplicated. Direct replication therefore means that the original conditions are reproduced such that there is no reason to expect a different result based on the present interpretation of the effect. If sample, setting, or procedural factors are essential to eliciting the effect, then those factors must be specified for a proper theoretical understanding of it. As such, among other reasons, a failure to replicate could mean that the conditions necessary to elicit the original result are not yet understood (see Open Science Collaboration [2012a] for more about possible interpretations of a failure to replicate). Further, deciding whether a conceptual replication (successful or not) tests the same phenomenon as an original result is usually a qualitative assessment rather than an empirical one.
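The contrast drawn in Note 3, between stumbling on chance significance with many small studies and confirming a real effect with one high-powered study, can be made concrete with a short simulation. In the Python sketch below, every design parameter (the group sizes, the number of small studies, and the assumed effect size d = 0.5) is an illustrative choice of ours, not a value from the article.

```python
# Monte Carlo illustration of Note 3. All design parameters here
# (group sizes, effect size, study counts) are illustrative choices,
# not values from the article.
import numpy as np
from scipy import stats

rng = np.random.default_rng(0)
SIMS = 10_000
ALPHA = 0.05

def any_significant(n_per_group, n_studies, effect=0.0):
    """True if any of n_studies two-sample t tests reaches p < ALPHA."""
    for _ in range(n_studies):
        a = rng.normal(0.0, 1.0, n_per_group)
        b = rng.normal(effect, 1.0, n_per_group)
        if stats.ttest_ind(a, b).pvalue < ALPHA:
            return True
    return False

# Null is true: five small studies vs. one large study with the same
# total sample (5 x 20 = 100 per group).
small = np.mean([any_significant(20, 5) for _ in range(SIMS)])
large = np.mean([any_significant(100, 1) for _ in range(SIMS)])
print(f"P(>=1 false positive), five n=20 studies: {small:.3f}")  # ≈ 0.23
print(f"P(false positive), one n=100 study:       {large:.3f}")  # ≈ 0.05

# Effect is real (d = 0.5): the single high-powered study confirms it
# far more reliably than chance-chasing requires.
power_large = np.mean([any_significant(100, 1, effect=0.5) for _ in range(SIMS)])
print(f"P(detect d = 0.5), one n=100 study:       {power_large:.3f}")  # ≈ 0.94
```

Under the null, five small studies produce at least one p < .05 roughly 23% of the time (1 − .95^5), whereas the single study holds the false positive rate at 5%; when the assumed effect is real, the single n = 100 per group study detects it about 94% of the time.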