Nature magazine reports “‘One-size-fits-all’ threshold for P values under fire: Scientists hit back at a proposal to make it tougher to call findings statistically significant.”

Researchers are at odds over when to dub a discovery ‘significant’. In July, 72 researchers took aim at the P value, calling for a lower threshold for the popular but much-maligned statistic. In a response published on 18 September, a group of 88 researchers have responded, saying that a better solution would be to make academics justify their use of specific P values, rather than adopt another arbitrary threshold.

P values have been used as measures of significance for decades, but academics have become increasingly aware of their shortcomings and the potential for abuse. In 2015, one psychology journal banned P values entirely.

The statistic is used to test a ‘null hypothesis’, a default state positing that there is no relationship between the phenomena being measured. The smaller the P value, the less likely it is that the results are due to chance — presuming that the null hypothesis is true. Results have typically been deemed ‘statistically significant’ — and the null hypothesis dismissed — when P values are below 0.05.
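To see what the textbook procedure actually computes, here is a minimal sketch (my illustration, not from the Nature article): a fair-coin null hypothesis tested against an observed count of heads, with the two-sided p-value worked out from the binomial distribution using only Python's standard library.

```python
from math import comb

def binomial_p_value(heads, flips, p_null=0.5):
    """Two-sided p-value: the probability, computed under the null
    hypothesis (a fair coin when p_null=0.5), of seeing a result at
    least as far from the expected count as the one observed."""
    expected = flips * p_null
    deviation = abs(heads - expected)
    # Sum the null probabilities of every outcome at least as extreme
    # as the observed count, on either side of the expectation.
    return sum(
        comb(flips, k) * p_null**k * (1 - p_null)**(flips - k)
        for k in range(flips + 1)
        if abs(k - expected) >= deviation
    )

# 60 heads in 100 flips lands just above the conventional 0.05 line.
print(round(binomial_p_value(60, 100), 3))
```

Note what the number is: a probability of the data (or data more extreme) given the null, nothing more. It says nothing by itself about the probability the null is true, which is where the routine misreadings begin.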

It is past time to eliminate p-values entirely, as I argue in Uncertainty. See most recently “P-values vs. Bayes Is A False Dichotomy”, and the book page with links to many articles. There is a third way besides Bayes and P.

Especially if you are a p-value supporter, re-read these two lines:

The statistic is used to test a ‘null hypothesis’, a default state positing that there is no relationship between the phenomena being measured. The smaller the P value, the less likely it is that the results are due to chance — presuming that the null hypothesis is true.

Now, without using causal language (p-values, like any probability model, cannot discover cause), explain what “no relationship” means. Then, again without using causal language, explain “results are due to chance”. (Don’t forget to do this.)

I assert, and prove in Uncertainty, that you cannot explain “no relationship” or “due to chance” without using causal language or without assuming probability exists. I mean exists in the proper metaphysical sense, the same sense in which the screen on which you are reading this exists (or the paper, if some generous soul has printed it out for you).

Chance does not exist. That which does not exist cannot cause anything to happen. Probability does not exist. That which does not exist cannot cause anything to happen. Nothing can therefore be “due to” chance. Probability cannot establish a relationship in any ontic sense.

Probability is epistemic. It is an epistemological measure, not necessarily quantitative, between a set of premises (or assumptions, measurements, etc.) and a proposition of interest. That, and nothing more. (This is no different from what logic is, of course.)

That simple statement is the third way. Eliminate p-values entirely, and Bayesian inference of non-observable parameters, and concentrate on probability. In science, which centers around observables, given a model, make probabilistic predictions of never-before-seen-in-any-way observables. And then check those predictions against reality. This is what civil engineers do when building bridges, and it is what solid-state physicists do when creating circuits.
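That predictive loop can be sketched in a few lines. Everything below is invented for illustration (the “model” is just a list of predictive probabilities for ten new observables): state a probability for each never-before-seen outcome, then score those probabilities against what reality delivered, here with the Brier score.

```python
# Sketch of the predictive approach: a model emits probabilities for
# new observables, and those probabilities are scored against reality.
# The model and the data are invented for illustration only.

def brier_score(forecast_probs, outcomes):
    """Mean squared gap between predicted probability and outcome.
    0 is perfect; 0.25 is the score of always saying 50%."""
    return sum((p - y) ** 2 for p, y in zip(forecast_probs, outcomes)) / len(outcomes)

# A model's predictive probabilities for ten new observations...
predictions = [0.9, 0.8, 0.7, 0.3, 0.2, 0.9, 0.1, 0.6, 0.4, 0.8]
# ...and what actually happened (1 = the event occurred).
reality     = [1,   1,   1,   0,   0,   1,   0,   1,   0,   1]

model_score = brier_score(predictions, reality)
coin_score  = brier_score([0.5] * 10, reality)
print(model_score, coin_score)
```

A model that cannot beat “always 50%” on observables it has never seen has told you something no p-value can: that it is of no practical use, whatever its in-sample significance.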

Why not do the same for psychology, medicine, sociology, and other statistics-relying fields?

Answer: why not, indeed.

Setting specific thresholds for standards of evidence is “bad for science”, says Ronald Wasserstein, executive director of the American Statistical Association, which last year took the unusual step of releasing explicit recommendations on the use of P values for the first time in its 177-year history. Next month, the society will hold a symposium on statistical inference, which follows on from its recommendations. Wasserstein says he hasn’t yet taken a position on the current debate over P value thresholds, but adds that “we shouldn’t be surprised that there isn’t a single magic number”.

There isn’t, though the vast majority of users of p-values think there is. The threshold picked is mesmerizing. The number 0.04999 brings joy, 0.05001 tears. This happens.

I’m not a member of the American Statistical Association (or any other organization), so I won’t be at the meeting Wasserstein mentions. I have a small paper coming out soon (I thought it would be out by now) in the Journal of the American Statistical Association, in answer to a discussion on p-values, detailing the third, i.e. predictive, way. I doubt it will appear before the conference, which is in a couple of weeks.

I only heard of the conference after it was set, so I’ll miss that opportunity to spread the word (in an official talk). But if you’re going, or go, let us know about it in the comments below.

Thanks to Marcel Crok for notifying us of this article.
