Mini-paper out in JAMA by Matt Vassar and pals: “Evaluation of Lowering the P Value Threshold for Statistical Significance From .05 to .005 in Previously Published Randomized Clinical Trials in Major Medical Journals”. Thanks to Steve Milloy for the tip.

Authors scanned JAMA, Lancet, and NEJM for wee ps, and then asked how many studies' ps stayed wee after dividing the magic number by 10. Seventy percent was their answer, meaning 30% of official findings would have to be tossed for not achieving super significance.

Somewhat amusingly, and unnecessarily, they computed regression models on the results and reported 95%—and not 99.5%—confidence intervals.

Never mind. Making ps weer does not solve any of the logical and philosophical difficulties of p-values, as is partly explained in this peer-reviewed (and therefore perfectly true and indisputable) paper: Manipulating the Alpha Level Cannot Cure Significance Testing.

As a bonus, here is just one of a dozen or two criticisms of p-values that will appear in a new peer-reviewed (and therefore true and indisputable) paper in January. This is not the strongest criticism, nor even in the top five. But it alone is enough to quash their use.

(I’m leaving it in LaTeX format so you can get a hint about the citations.)

Excerpt

P-values are Not Decisions

If the p-value is wee, a decision is made to reject the null hypothesis, and vice versa (ignoring the verbiage “fail to reject”). Yet the consequences of this decision are not quantified using the p-value. The decision to reject is just the same, and therefore just as consequential, for a p-value of 0.05 as for one of 0.0005. Some have the habit of calling especially wee p-values “highly significant”, and so forth, but this does not accord with frequentist theory, and is in fact forbidden by that theory because it seeks a way around the proscription of applying probability to hypotheses. The p-value, as frequentist theory admits, is not related in any way to the probability the null is true or false. Therefore the size of the p-value does not matter. Any level chosen as “significant” is, as proved above, an act of will.
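A minimal sketch of the point, not taken from the paper: the function name `decide` and the example p-values are hypothetical, chosen only for illustration. Under a fixed threshold the test returns one of two labels, so every p-value on the same side of the line yields the identical decision, and the choice of the line itself is an input, not a result.

```python
def decide(p_value: float, alpha: float = 0.05) -> str:
    """Threshold decision: the output depends only on whether p < alpha."""
    return "reject" if p_value < alpha else "fail to reject"


# Hypothetical p-values, two orders of magnitude apart.
decisions = {p: decide(p) for p in (0.049, 0.005, 0.0005)}

# Moving the threshold changes which side of the line a p-value falls on,
# but the decision remains a bare yes or no with no consequences attached.
stricter = {p: decide(p, alpha=0.005) for p in (0.049, 0.005, 0.0005)}
```

With `alpha=0.05` all three hypothetical p-values map to the same label; with `alpha=0.005` only the smallest one does. Nothing in either output says how costly a wrong decision would be.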

A consequence of the frequentist idea that probability is ontic and that true models exist (at the limit) is the idea that the decision to reject or accept some hypothesis should be the same for all. Steve Goodman calls this idea “naive inductivism”, which is “a belief that all scientists seeing the same data should come to the same conclusions,” \cite{Goo2001}. That this is false should be obvious enough. Two men do not always make the same bets even when the probabilities are deduced from first principles, and are therefore true. We should not expect all to come to agreement on believing a hypothesis based on tests concocted from {\it ad hoc} models. This is true, and even stronger, in a predictive sense, where conditionality is insisted upon.
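The betting remark can be sketched as follows. This is an illustration under stated assumptions, not anything from the paper: the functions `expected_gain` and `takes_bet`, the stake and payout, and the rule "never risk more than a fixed fraction of wealth" are all hypothetical. Both bettors agree exactly on the probability, which is deduced from a fair die, yet they decide differently because the consequences differ for each.

```python
def expected_gain(p_win: float, stake: float, payout: float) -> float:
    """Expected net gain of a bet that pays `payout` on a win and loses `stake` otherwise."""
    return p_win * payout - (1.0 - p_win) * stake


def takes_bet(p_win: float, stake: float, payout: float,
              wealth: float, max_fraction_at_risk: float) -> bool:
    """Hypothetical decision rule: bet only if the expected gain is positive
    and the stake is no more than a fixed fraction of the bettor's wealth."""
    affordable = stake <= max_fraction_at_risk * wealth
    return expected_gain(p_win, stake, payout) > 0 and affordable


p_win = 1 / 6                   # chance of a six on a fair die, agreed by both
stake, payout = 100.0, 600.0    # hypothetical terms of the bet

rich_bets = takes_bet(p_win, stake, payout, wealth=10_000.0, max_fraction_at_risk=0.05)
poor_bets = takes_bet(p_win, stake, payout, wealth=500.0, max_fraction_at_risk=0.05)
```

Same probability, same bet, opposite decisions: the first bettor accepts and the second declines, because only the first can afford the stake under the assumed rule.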

Two (or more) people can come to completely different predictions, and therefore different decisions, even when using the same data. How to incorporate decisions in the face of the uncertainty implied by models is only partly understood. New efforts along these lines using quantum probability calculus, especially in economic decisions, are bound to pay off, see e.g. \cite{NguSri2019}.

A striking and in-depth example of how using the same model and same data can lead people to {\it opposite} beliefs and decisions is given by Jaynes in his chapter “Queer uses for probability theory”, \cite{Jay2003}.
