The “replication crisis” has been attributed to misguided external incentives gamed by researchers (the strategic-game hypothesis). Here, I want to draw attention to a complementary internal factor, namely, researchers’ widespread faith in a statistical ritual and associated delusions (the statistical-ritual hypothesis). The “null ritual,” unknown in statistics proper, eliminates judgment precisely at points where statistical theories demand it. The crucial delusion is that the p value specifies the probability of a successful replication (i.e., 1 – p), which makes replication studies appear to be superfluous. A review of studies with 839 academic psychologists and 991 students shows that the replication delusion existed among 20% of the faculty teaching statistics in psychology, 39% of the professors and lecturers, and 66% of the students. Two further beliefs, the illusion of certainty (e.g., that statistical significance proves that an effect exists) and Bayesian wishful thinking (e.g., that the probability of the alternative hypothesis being true is 1 – p), also make successful replication appear to be certain or almost certain, respectively. In every study reviewed, the majority of researchers (56%–97%) exhibited one or more of these delusions. Psychology departments need to begin teaching statistical thinking, not rituals, and journal editors should no longer accept manuscripts that report results as “significant” or “not significant.”
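To make the replication delusion concrete, the following minimal sketch compares 1 – p with the chance that an exact replication reaches significance again. It is an illustration under stated assumptions, not a method taken from the article: it assumes a two-sided z test, a replication with the same sample size, and a true effect exactly equal to the effect observed in the original study. The function name `replication_probability` is hypothetical.

```python
from statistics import NormalDist

STD_NORMAL = NormalDist()  # standard normal distribution (mean 0, sd 1)


def replication_probability(p_original: float, alpha: float = 0.05) -> float:
    """Chance that an exact replication is significant in the same direction.

    Assumptions (illustrative, not from the article): two-sided z test,
    same sample size in both studies, and a true effect equal to the
    effect observed in the original study.
    """
    # z statistic that corresponds to the original two-sided p value
    z_observed = STD_NORMAL.inv_cdf(1 - p_original / 2)
    # critical z value for the chosen significance level
    z_critical = STD_NORMAL.inv_cdf(1 - alpha / 2)
    # Under the assumptions, the replication z statistic is Normal(z_observed, 1),
    # so the probability of exceeding z_critical is the power of the replication.
    return 1 - STD_NORMAL.cdf(z_critical - z_observed)


if __name__ == "__main__":
    for p in (0.05, 0.01, 0.001):
        believed = 1 - p  # what the replication delusion predicts
        actual = replication_probability(p)
        print(f"p = {p}: delusion says {believed:.3f}, sketch gives {actual:.3f}")
```

Under these assumptions, an original result with p = .05 gives a replication chance of about one half, far below the .95 that the delusion suggests. The chance of replication depends on the true effect size and the power of the replication study, neither of which the p value reveals.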
