We investigated the relationship between ES and SS in a random sample of papers drawn from the full spectrum of psychological research and found a strong negative correlation of r = −.54 (r = −.45 after excluding extreme sample sizes). That is, studies using small samples report larger effects than studies using large samples. We also analyzed the distribution of p values in a caliper test and found about three times as many studies just reaching significance as just failing to reach it. Finally, neither asking authors directly nor coding power from papers indicated that power analysis was consistently used. This pattern of findings allows only one conclusion: there is strong publication bias in psychological research!
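The logic of the caliper test can be sketched in a few lines of code. The window width and the exact binomial comparison below are illustrative choices for exposition, not the exact procedure used in our analysis:

```python
from math import comb

def caliper_test(p_values, threshold=0.05, width=0.005):
    """Count p values falling just below vs. just above a significance
    threshold. Absent selection for significance, studies should land on
    either side of the threshold about equally often; a surplus just
    below it indicates publication bias."""
    below = sum(threshold - width < p <= threshold for p in p_values)
    above = sum(threshold < p <= threshold + width for p in p_values)
    n = below + above
    # Exact two-sided binomial test of the observed split against 50/50
    k = max(below, above)
    p_binom = min(1.0, 2 * sum(comb(n, i) for i in range(k, n + 1)) / 2 ** n)
    return below, above, p_binom

# A 3:1 surplus just below .05, as in our sample, is very unlikely
# under an even split (hypothetical p values for illustration):
below, above, p = caliper_test([0.048] * 30 + [0.053] * 10)
```

The binomial comparison treats each study near the threshold as a coin flip; variants of the test instead compare the observed counts with the shape of the surrounding p-value distribution.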

Publication Bias in Psychology

Publication bias, in its most general definition, is the phenomenon that significant results have a better chance of being published, are published earlier, and are published in journals with higher impact factors [34]. Publication bias has been demonstrated in a diverse range of research areas (political science [29], sociology [39], [40], evolutionary biology [42], and also in some areas of psychology [43], [44]). However, most of these prevalence estimates have been based on the analysis of a few journal volumes over a specific period of time, or on specific meta-analyses. On the basis of such findings one can only argue that publication bias is a problem in a specific area of psychology (or even only in specific journals); no conclusive empirical evidence for a pervasive problem has been provided, although many see pervasive publication bias as the root of many problems in psychology [27]. In contrast, we investigated and estimated publication bias across the whole field of psychology using a random sample of journal articles. In our sample we likewise found publication bias, but our analysis does not allow identification of its source: failure to write up, failure to submit, failure to send out for review, failure to recommend acceptance, or failure to accept a paper. However, as Stern and Simes [45] pointed out, individual factors cannot be assumed to be independent of editorial factors, since previous experiences may have conditioned authors to expect rejection of non-significant studies [20]. Thus, anticipation of biased journal practice may influence the decision to write up and submit a manuscript. It may even influence how researchers conduct studies in the first place. There are many degrees of freedom in conducting studies and analyzing results that increase the probability of obtaining a significant result [26], and these may therefore be exploited to minimize the danger of non-significant results [27].
Researchers may then be tempted to concoct papers around the significant results and send them to journals for publication. This outcome selection seems to be widespread practice in psychology [12], and it implies many false-positive results in the literature and a massive overestimation of ES, especially in meta-analyses. What is often reported as the “mean effect size” of a body of research can, in extreme cases, actually be the mean ES of the tail of the distribution, consisting solely of overestimations. Consequently, a substantial part of what we think is secure knowledge might actually be (statistical) error [46]–[49]. Ioannidis [50], for example, analyzed frequently cited clinical studies and their later replications. He found that many studies, especially those with small SS, reported stronger effects than larger subsequent studies; that is, replication studies found smaller ES than the initial studies. This tendency of effects to fade over time was discussed as the decline effect in Nature [51]. Most of the reported examples of the decline effect stem from sciences other than psychology, but Fanelli [52] found that results supporting the research hypothesis become more frequent as one moves down the hierarchy of the sciences: in psychological and psychiatric research the odds of reporting a positive result were around five times higher than in astronomy, and over 90% of papers in psychology and psychiatry reported positive results.

Publication practice needs improvement. Otherwise misestimation of empirical effects will continue and will threaten the credibility of the entire field of psychology [53]. Many solutions have been proposed, each with its specific merits.

One proposal is to apply stringent standards of statistical power when planning empirical research. A review of the reporting of sample size calculations in randomized controlled trials in medicine found that 95% of 215 analyzed articles reported such calculations [54]. In comparison, only about 3% of psychological articles reported statistical power analyses [8]. However, the usefulness of power analysis is debatable. For example, in medicine a research practice called the sample size samba emerged as a direct consequence of requiring power analysis: the retrofitting of a treatment effect worthy of detection to the predetermined number of available participants, a practice that seems to be fairly common in medicine [55].
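To illustrate what a prospective power analysis involves, here is a minimal sketch for a two-sample comparison of means using the common normal approximation. The function name and defaults are ours; exact t-based calculations (as implemented in standard power-analysis software) yield slightly larger numbers:

```python
from math import ceil
from statistics import NormalDist

def n_per_group(effect_size, alpha=0.05, power=0.80):
    """Approximate per-group n for a two-sided, two-sample z test of a
    standardized mean difference d (Cohen's d), via the textbook formula
        n = 2 * ((z_{1 - alpha/2} + z_{power}) / d) ** 2
    """
    z = NormalDist().inv_cdf  # standard-normal quantile function
    return ceil(2 * ((z(1 - alpha / 2) + z(power)) / effect_size) ** 2)

# A "medium" effect (d = 0.5) at conventional alpha and power:
n = n_per_group(0.5)  # roughly 63 participants per group
```

The point of the sample size samba is visible in this formula: holding alpha and power fixed, a researcher stuck with a small available n can simply solve the equation backwards for the effect size d that makes the study appear adequately powered.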

Another proposal requires that studies be registered before they are conducted. Unpublished studies can then be traced and included in systematic reviews, or at least the extent of publication bias can be estimated. For many clinical trials, study registration and the reporting of results are required by federal law, and some medical journals require registration of studies in advance [56].

Another proposal is the open-access movement, which requires that all research be freely available to the public. Related is data sharing, which requires authors to share their data when requested. Data-sharing practices have been found to be somewhat lacking, and this has been put forward as one factor impeding scientific progress and replication [41]. However, some journals do encourage data sharing; Psychological Science, for example, awards an Open Data badge, printed at the top of an article. Finally, open-access databases, where published and unpublished findings are stored, greatly reduce bias due to publication practice (see [57]).

Still another proposal is to install replication programs in which statistically non-significant results are published as part of the program [58]. However, there is skepticism about the value of replication (see the special section on behavioral priming and its replication in Perspectives on Psychological Science, Vol. 9, No. 1, 2014) and about whether a widespread replication attempt will find enough followers in the research community. After all, the payoff from reporting new and surprising findings is larger than the payoff from replication, and replications have a lower chance of being published [59].