PLoS Medicine | www.plosmedicine.org 0700

standardization of laboratory and statistical methods, outcomes, and reporting thereof to minimize bias.

Claimed Research Findings May Often Be Simply Accurate Measures of the Prevailing Bias

As shown, the majority of modern biomedical research is operating in areas with very low pre- and post-study probability for true findings. Let us suppose that in a research field there are no true findings at all to be discovered. History of science teaches us that scientific endeavor has often in the past wasted effort in fields with absolutely no yield of true scientific information, at least based on our current understanding. In such a "null field," one would ideally expect all observed effect sizes to vary by chance around the null in the absence of bias. The extent to which observed findings deviate from what is expected by chance alone would then be simply a pure measure of the prevailing bias.

For example, let us suppose that no nutrients or dietary patterns are actually important determinants for the risk of developing a specific tumor. Let us also suppose that the scientific literature has examined 60 nutrients and claims all of them to be related to the risk of developing this tumor, with relative risks in the range of 1.2 to 1.4 for the comparison of the upper to lower intake tertiles. Then the claimed effect sizes are simply measuring nothing else but the net bias that has been involved in the generation of this scientific literature. Claimed effect sizes are in fact the most accurate estimates of the net bias. It even follows that between "null fields," the fields that claim stronger effects (often with accompanying claims of medical or public health importance) are simply those that have sustained the worst biases.
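The nutrient example can be sketched numerically. The simulation below is a minimal illustration and not part of the article: the bias magnitude (a relative risk of about 1.3, roughly the midpoint of the claimed 1.2–1.4 range) and the per-study standard error are assumed values chosen for demonstration. With every true effect exactly null, the average claimed relative risk recovers nothing but the injected bias.

```python
import math
import random

random.seed(0)

# Illustrative "null field": all 60 true log relative risks are exactly 0.
N_STUDIES = 60
NET_BIAS = math.log(1.3)  # hypothetical net bias (RR ~ 1.3); not a figure from the paper
SE = 0.05                 # hypothetical per-study standard error on the log-RR scale

# Each study's claimed log relative risk = true effect (0) + bias + sampling noise.
claimed_log_rr = [0.0 + NET_BIAS + random.gauss(0.0, SE) for _ in range(N_STUDIES)]
mean_claimed_rr = math.exp(sum(claimed_log_rr) / N_STUDIES)

# With no true effects anywhere, the average claimed relative risk
# estimates exp(NET_BIAS), i.e., the net bias itself (about 1.3 here).
print(round(mean_claimed_rr, 2))
```

Raising or lowering `NET_BIAS` shifts the recovered average accordingly, which is the sense in which claimed effect sizes in a null field estimate the net bias itself.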

For fields with very low PPV, the few true relationships would not distort this overall picture much. Even if a few relationships are true, the shape of the distribution of the observed effects would still yield a clear measure of the biases involved in the field. This concept totally reverses the way we view scientific results. Traditionally, investigators have viewed large and highly significant effects with excitement, as signs of important discoveries. Too large and too highly significant effects may actually be more likely to be signs of large bias in most fields of modern research. They should lead investigators to careful critical thinking about what might have gone wrong with their data, analyses, and results.

Of course, investigators working in any field are likely to resist accepting that the whole field in which they have spent their careers is a "null field." However, other lines of evidence, or advances in technology and experimentation, may lead eventually to the dismantling of a scientific field. Obtaining measures of the net bias in one field may also be useful for obtaining insight into what might be the range of bias operating in other fields where similar analytical methods, technologies, and conflicts may be operating.

How Can We Improve the Situation?

Is it unavoidable that most research findings are false, or can we improve the situation? A major problem is that it is impossible to know with 100% certainty what the truth is in any research question. In this regard, the pure "gold" standard is unattainable. However, there are several approaches to improve the post-study probability. Better powered evidence, e.g., large studies or low-bias meta-analyses, may help, as it comes closer to the unknown "gold" standard. However, large studies may still have biases, and these should be acknowledged and avoided. Moreover, large-scale evidence is impossible to obtain for all of the millions and trillions of research questions posed in current research. Large-scale evidence should be targeted for research questions where the pre-study probability is already considerably high, so that a significant research finding will lead to a post-test probability that would be considered quite definitive. Large-scale evidence is also particularly indicated when it can test major concepts rather than narrow, specific questions. A negative finding can then refute not only a specific proposed claim, but a whole field or considerable portion thereof. Selecting the performance of large-scale studies based on narrow-minded criteria, such as the marketing promotion of a specific drug, is largely wasted research. Moreover, one should be cautious that extremely large studies may be more likely to find a formally statistically significant difference for a trivial effect that is not really meaningfully different from the null [32–34].

Second, most research questions are addressed by many teams, and it is misleading to emphasize the statistically significant findings of any single team. What matters is the

Table 4. PPV of Research Findings for Various Combinations of Power (1 − β), Ratio of True to Not-True Relationships (R), and Bias (u)

1 − β | R       | u    | Practical Example                                                      | PPV
0.80  | 1:1     | 0.10 | Adequately powered RCT with little bias and 1:1 pre-study odds         | 0.85
0.95  | 2:1     | 0.30 | Confirmatory meta-analysis of good-quality RCTs                        | 0.85
0.80  | 1:3     | 0.40 | Meta-analysis of small inconclusive studies                            | 0.41
0.20  | 1:5     | 0.20 | Underpowered, but well-performed phase I/II RCT                        | 0.23
0.20  | 1:5     | 0.80 | Underpowered, poorly performed phase I/II RCT                          | 0.17
0.80  | 1:10    | 0.30 | Adequately powered exploratory epidemiological study                   | 0.20
0.20  | 1:10    | 0.30 | Underpowered exploratory epidemiological study                         | 0.12
0.20  | 1:1,000 | 0.80 | Discovery-oriented exploratory research with massive testing           | 0.0010
0.20  | 1:1,000 | 0.20 | As in previous example, but with more limited bias (more standardized) | 0.0015

The estimated PPVs (positive predictive values) are derived assuming α = 0.05 for a single study.
RCT, randomized controlled trial.
DOI: 10.1371/journal.pmed.0020124.t004
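The entries of Table 4 can be reproduced from the bias-adjusted PPV formula derived earlier in the article, PPV = ([1 − β]R + uβR) / (R + α − βR + u − uα + uβR). The short sketch below implements that formula; the function name and parameter defaults are illustrative choices, not the article's.

```python
# Bias-adjusted positive predictive value, per the article's formula:
# PPV = ((1 - beta)*R + u*beta*R) / (R + a - beta*R + u - u*a + u*beta*R)
def ppv(power: float, R: float, u: float, alpha: float = 0.05) -> float:
    """Post-study probability that a claimed research finding is true.

    power: 1 - beta; R: pre-study odds of a true relationship; u: bias.
    """
    beta = 1.0 - power
    numerator = power * R + u * beta * R
    denominator = R + alpha - beta * R + u - u * alpha + u * beta * R
    return numerator / denominator

# Reproducing a few rows of Table 4:
print(f"{ppv(0.80, 1.0, 0.10):.2f}")     # adequately powered RCT, 1:1 odds -> 0.85
print(f"{ppv(0.80, 1/3, 0.40):.2f}")     # meta-analysis of small studies -> 0.41
print(f"{ppv(0.20, 1/1000, 0.80):.4f}")  # discovery-oriented, massive testing -> 0.0010
```

Plugging each row's power, R, and u into this function recovers the tabulated PPV; for instance, the underpowered exploratory epidemiological study (power 0.20, R = 1:10, u = 0.30) yields roughly 0.12.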

August 2005 | Volume 2 | Issue 8 | e124