The results of many scientific papers are wrong. There are many reasons for this, including p-hacking, publication bias, and the general inability to replicate results. But there's another, more mundane cause: incorrect calculation of p-values in statistical tests. This could be caused by simple transcription errors when plugging numbers into a statistical tool, incorrect rounding, or misapplication of the test itself (say, applying a two-sided test when a one-sided p-value is appropriate). Such errors should be picked up in the peer review process, but given that even expert statisticians sometimes struggle to explain p-values, it's not surprising that some errors get through.

That's why Michèle B. Nuijten, a PhD student at Tilburg University, created the R package statcheck. Given a paper to be published in a psychology journal, statcheck searches for statistical results from \(t\), \(F\), \(r\), \(\chi^2\), and \(Z\) tests, and compares the published p-value to a value calculated by R. This is possible only because the American Psychological Association Style Guide has a very specific format for reporting statistical results, listing the p-value next to the reported test statistic. Statcheck also attempts to detect if the surrounding language mentions a "one-sided" or "one-tailed" test and calculates the p-value in R accordingly (although this process isn't perfect). Anyone can use statcheck by uploading a PDF or HTML version of their paper to the statcheck web application, or by using the statcheck function within R directly.
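To make the idea concrete, here is a minimal Python sketch of the extract-and-recompute step for the simplest case, a \(Z\) test. It is not statcheck's actual code (statcheck is an R package), and the regular expression, the function names `z_p_value` and `extract_z_results`, and the crude one-tailed detection are all assumptions made for illustration.

```python
import math
import re

# Hypothetical pattern for an APA-style result such as "Z = 2.05, p = .04".
# Illustrative assumption only; this is not the pattern statcheck uses.
APA_Z = re.compile(r"[Zz]\s*=\s*(-?\d*\.?\d+)\s*,\s*p\s*([<>=])\s*(\d*\.\d+)")


def z_p_value(z, one_tailed=False):
    """Return the p-value of a standard normal test statistic."""
    # erfc(|z| / sqrt(2)) is the two-sided tail probability of N(0, 1).
    two_sided = math.erfc(abs(z) / math.sqrt(2))
    return two_sided / 2 if one_tailed else two_sided


def extract_z_results(text):
    """Find reported Z tests in text and recompute each p-value."""
    # Crude stand-in for detecting one-sided tests from the surrounding language.
    one_tailed = re.search(r"one-(sided|tailed)", text, re.IGNORECASE) is not None
    results = []
    for match in APA_Z.finditer(text):
        z = float(match.group(1))
        results.append({
            "z": z,
            "comparison": match.group(2),  # "=", "<" or ">" as reported
            "reported_p": float(match.group(3)),
            "computed_p": z_p_value(z, one_tailed),
        })
    return results
```

Calling `extract_z_results("The effect was significant, Z = 1.96, p = .05.")` returns a single result whose recomputed p-value is very close to the reported .05. Handling \(t\), \(F\), \(r\), and \(\chi^2\) tests would additionally require parsing degrees of freedom and using the matching distribution.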

statcheck compares reported p-values with computed p-values and reports discrepancies, noting whether the difference would have changed the outcome of the test.
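The comparison itself can be sketched in a few lines of Python. This is a simplified illustration, not statcheck's actual logic: the function name `check_p_value`, the default of two reported decimals, and the significance threshold of 0.05 are assumptions, and the sketch ignores rounding in the reported test statistic.

```python
def check_p_value(reported_p, computed_p, decimals=2, alpha=0.05):
    """Compare a reported p-value with a recomputed one.

    Simplifying assumptions: the reported value was rounded to `decimals`
    places, and a result counts as significant when p < alpha.
    """
    # Consistent if the recomputed value rounds to the reported value.
    consistent = round(computed_p, decimals) == round(reported_p, decimals)
    # Does the discrepancy move the result across the significance threshold?
    crosses_alpha = (reported_p < alpha) != (computed_p < alpha)
    return {
        "consistent": consistent,
        "decision_error": (not consistent) and crosses_alpha,
    }
```

For example, `check_p_value(0.03, 0.08)` flags an inconsistency that would have changed the outcome of the test, whereas `check_p_value(0.01, 0.03)` flags an inconsistency that leaves the conclusion unchanged.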

Nuijten recounts the origins and development of statcheck in an interesting article in Retraction Watch. One major surprise came when she and her colleagues applied statcheck to p-values reported in eight major psychology journals from 1985 to 2013:

Half of the papers in psychology contain at least one statistical reporting inconsistency, and one in eight papers contain an inconsistency that might have affected the statistical conclusion.

Since then, they've automated statcheck further, posting the results of its analyses of 50,000 papers to PubPeer. Not everyone was pleased by the notifications (a former president of the Association for Psychological Science called it "methodological terrorism"), but the process did reveal more inconsistencies in published papers.

For more on statcheck, check out its website at the link below.

Michèle B. Nuijten: R package “statcheck”: Extract statistics from articles and recompute p values (Epskamp & Nuijten, 2016)