In testing a point null hypothesis \(H_0\) against an alternative hypothesis \(H_1\) based on data \(x_{\mathrm{obs}}\), the P value is defined as the probability, calculated under the null hypothesis, that a test statistic is at least as extreme as its observed value. The null hypothesis is typically rejected, and the finding declared statistically significant, if the P value falls below the conventional type I error threshold α = 0.05.
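As a minimal sketch (not part of the original article), the two-sided P value for a z statistic can be computed from the standard normal tail probability; the function name here is illustrative:

```python
import math

def two_sided_p(z: float) -> float:
    """Two-sided P value for a z statistic under a standard normal null.

    Uses the complementary error function: P = 2 * (1 - Phi(|z|))
    = erfc(|z| / sqrt(2)).
    """
    return math.erfc(abs(z) / math.sqrt(2.0))

# A z statistic of about 1.96 sits right at the alpha = 0.05 threshold.
print(round(two_sided_p(1.96), 3))  # ~0.05
```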

From a Bayesian perspective, a more direct measure of the strength of evidence for \(H_1\) relative to \(H_0\) is the ratio of their probabilities. By Bayes’ rule, this ratio may be written as:

$$\frac{\Pr\left({H}_{1}\mid {x}_{\mathrm{obs}}\right)}{\Pr\left({H}_{0}\mid {x}_{\mathrm{obs}}\right)}=\frac{f\left({x}_{\mathrm{obs}}\mid {H}_{1}\right)}{f\left({x}_{\mathrm{obs}}\mid {H}_{0}\right)}\times \frac{\Pr\left({H}_{1}\right)}{\Pr\left({H}_{0}\right)}\equiv \mathrm{BF}\times \left(\mathrm{prior}\ \mathrm{odds}\right)$$ (1)

where BF is the Bayes factor that represents the evidence from the data, and the prior odds can be informed by researchers’ beliefs, scientific consensus, and validated evidence from similar research questions in the same field. Multiple-hypothesis testing, P-hacking and publication bias all reduce the credibility of evidence. Some of these practices reduce the prior odds of \(H_1\) relative to \(H_0\) by changing the population of hypothesis tests that are reported. Prediction markets [3] and analyses of replication results [4] both suggest that for psychology experiments, the prior odds of \(H_1\) relative to \(H_0\) may be only about 1:10. A similar number has been suggested in cancer clinical trials, and the number is likely to be much lower in preclinical biomedical research [5].
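Equation (1) can be applied numerically; the sketch below (function name illustrative, not from the article) shows how modest prior odds temper even a favourable Bayes factor:

```python
def posterior_odds(bayes_factor: float, prior_odds: float) -> float:
    """Posterior odds of H1 versus H0 via equation (1): BF times prior odds."""
    return bayes_factor * prior_odds

# With prior odds of 1:10, as suggested for psychology experiments,
# even a Bayes factor of 3 leaves the posterior odds well below even money.
print(posterior_odds(3.0, 1 / 10))  # ~0.3
```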

There is no unique mapping between the P value and the Bayes factor, since the Bayes factor depends on \(H_1\). However, the connection between the two quantities can be evaluated for particular test statistics under certain classes of plausible alternatives (Fig. 1).

Fig. 1: Relationship between the P value and the Bayes factor. The Bayes factor (BF) is defined as \(\frac{f\left({x}_{\mathrm{obs}}\mid {H}_{1}\right)}{f\left({x}_{\mathrm{obs}}\mid {H}_{0}\right)}\). The figure assumes that observations are independent and identically distributed (i.i.d.) according to x ~ N(μ, σ²), where the mean μ is unknown and the variance σ² is known. The P value is from a two-sided z-test (or, equivalently, a one-sided \({\chi }_{1}^{2}\)-test) of the null hypothesis \(H_0\): μ = 0. Power (red curve): BF obtained by defining \(H_1\) as putting ½ probability on μ = ±m, for the value of m that gives 75% power for the test of size α = 0.05. This \(H_1\) represents an effect size typical of that implicitly assumed by researchers during experimental design. Likelihood ratio bound (black curve): BF obtained by defining \(H_1\) as putting ½ probability on μ = ±\(\hat{x}\), where \(\hat{x}\) is approximately equal to the mean of the observations. These BFs are upper bounds among the class of all \(H_1\) terms that are symmetric around the null, but they are improper because the data are used to define \(H_1\). UMPBT (blue curve): BF obtained by defining \(H_1\) according to the uniformly most powerful Bayesian test [2], which places ½ probability on μ = ±w, where w is the alternative corresponding to a one-sided test of size 0.0025. This curve is indistinguishable from the ‘Power’ curve that would be obtained if the power used in its definition were 80% rather than 75%. Local-\(H_1\) bound (green curve): \(\mathrm{BF}=\frac{1}{-e\,p\,\ln p}\), where p is the P value, is a large-sample upper bound on the BF from among all unimodal alternative hypotheses that have a mode at the null and satisfy certain regularity conditions [15]. The red numbers on the y axis indicate the range of Bayes factors obtained for P values of 0.005 or 0.05. For more details, see the Supplementary Information.
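Two of the quantities behind Fig. 1 are easy to sketch numerically (function names are illustrative, not from the article). For an alternative placing ½ probability on an effect of ±δ in units of the z statistic's standard deviation, the BF against a standard normal null reduces to exp(−δ²/2)·cosh(δz); the green curve's large-sample bound is 1/(−e·p·ln p):

```python
import math

def bf_point_alternative(z: float, delta: float) -> float:
    """BF for H1 placing 1/2 probability on mu = +/-delta (in z units):
    BF = [phi(z - delta)/2 + phi(z + delta)/2] / phi(z)
       = exp(-delta**2 / 2) * cosh(delta * z).
    """
    return math.exp(-delta**2 / 2.0) * math.cosh(delta * z)

def bf_local_bound(p: float) -> float:
    """Large-sample upper bound 1 / (-e * p * ln p), valid for p < 1/e."""
    return 1.0 / (-math.e * p * math.log(p))

# Even the generous local-H1 bound caps the evidence at P = 0.05:
print(round(bf_local_bound(0.05), 2))  # ~2.46
```

This illustrates the figure's central point: a P value near 0.05 can correspond to only weak evidence for \(H_1\) under a wide range of plausible alternatives.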