TL;DR

A far higher proportion of landmark findings in biomedicine appears to be false than was previously thought.

Research on the scientific method suggests that this is largely due to a lack of rigor in how research is conducted.

Although researchers most often take the blame for this state of affairs, the true culprit may be a misalignment of incentives.

The new generation faces the impossible dilemma of abiding by principles of good practice versus responding to incentives.

We need to change how research is funded and rewarded if more of what is published is to be true.

Article

The scientific establishment is arguably in a state of turmoil. Theoretical estimates suggesting that most published findings in biomedicine are false (Ioannidis, 2005) are now progressively being corroborated by empirical evidence. An attempt by Amgen researchers to reproduce landmark studies in cancer biology could only reproduce 6/53 studies (Begley and Ellis, 2012), and a similar attempt by Bayer HealthCare could only adequately validate 9–12/47 high-profile studies in biomedicine (Prinz et al., 2011). Indeed, a series of articles published in the prestigious medical journal The Lancet estimated that roughly 85% of global research investment (about $200 billion in 2010) is wasted due to lack of appropriate scientific practice (Macleod et al., 2014; Røttingen et al., 2013). Unfortunately, these problems are not inconsequential.
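The theoretical argument can be sketched numerically. In the framework of Ioannidis (2005), the probability that a statistically significant finding is actually true (its positive predictive value, PPV) depends on the significance threshold, the statistical power, and the prior odds R that a tested relationship is real. The values of R and power below are illustrative assumptions, not figures from any specific study:

```python
# Sketch of the positive predictive value (PPV) argument from
# Ioannidis (2005): even with conventional alpha and decent power,
# most "significant" findings can be false when true effects are rare.
# R is the assumed ratio of true to null relationships tested in a field.

def ppv(alpha, power, R):
    """Probability that a statistically significant finding is true."""
    return (power * R) / (power * R + alpha)

# Illustrative example: alpha = 0.05, power = 0.80, 1 true effect per 10 nulls.
print(ppv(alpha=0.05, power=0.80, R=0.1))  # ~0.62: nearly 4 in 10 positives are false
# With lower power (common in exploratory research), it gets worse:
print(ppv(alpha=0.05, power=0.20, R=0.1))  # ~0.29: most positives are false
```

Bias and multiple competing teams, which the original paper also models, push the PPV lower still.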

One of the most obvious examples is the now-retracted 1998 article in The Lancet by Wakefield et al., which purported that the measles, mumps and rubella (MMR) vaccine is associated with ‘pervasive developmental disorder in children’ (Wakefield et al., 1998). These claims were made on the basis of a case series of twelve children, a study design and sample size that simply cannot support such bold claims. Despite multiple refutations of that early finding by better designed and more powerful studies (Jain et al., 2015), the repercussions of that paper have been far-reaching and have impacted public health and spending ever since. Another paper, by Turner et al., published in the New England Journal of Medicine, another prestigious medical journal, revealed how the published literature on antidepressant trials conveys a very skewed image of the truth (Turner et al., 2008). Out of 74 clinical trials registered with the Food and Drug Administration (FDA), only 38 identified a ‘positive’ result; yet most trials with a ‘negative’ result were either not published or spun in such a way as to convey a ‘positive’ message. As a result, out of 52 published trials, only 3 in fact conveyed a ‘negative’ result. So, who is to blame for this status quo in biomedical research?
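A quick back-of-the-envelope calculation with the figures above shows how stark the distortion is (the counts are taken from the text; the rates are simple proportions):

```python
# How publication bias inflates the apparent success rate of the
# antidepressant trials discussed above (Turner et al., 2008).
registered = 74          # trials registered with the FDA
positive_registered = 38 # of which identified a 'positive' result
published = 52           # trials that reached the literature
negative_published = 3   # of which conveyed a 'negative' result

actual_rate = positive_registered / registered
apparent_rate = (published - negative_published) / published
print(f"positive in the FDA registry: {actual_rate:.0%}")   # 51%
print(f"positive in the literature:   {apparent_rate:.0%}") # 94%
```

A reader of the journals alone would conclude these drugs succeed almost every time; the registry tells a coin-flip story.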

As humans, we need to balance practicing science with absolute and uncompromising rigor on the one hand, with the reality of survival on the other.

Researchers most commonly take the bullet for these shortcomings. Despite calls for open data and transparency, researchers do not routinely make their data available; even when they do, the data tend to be in too messy a state to make any sense of. Despite calls for appropriate statistical analyses and use of p-values, the most prominent of which was the recent statement on p-values by the American Statistical Association (Wasserstein and Lazar, 2016), researchers keep hunting for p-values below 0.05 and selecting whichever analyses achieve that, a practice known as “p-hacking”. Despite calls for publishing null effects and approaching published studies systematically rather than cherry-picking the ones most suited to a researcher’s arguments, the great majority of the literature is still one of ‘positive’ effects. Nevertheless, “The truth is rarely pure and never simple” (Oscar Wilde, The Importance of Being Earnest).
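To see why p-hacking matters, here is a minimal simulation (an illustration of the general idea, not drawn from any of the cited papers): an analyst collects data with no real effect, tries four analyses, and reports whichever yields the smallest p-value. The pre-specified analysis stays near the nominal 5% false-positive rate; the “hacked” one does not.

```python
# Minimal p-hacking simulation under the null hypothesis: four
# candidate "outcomes" per trial, none with any true effect.
import math
import random

def two_sample_p(xs, ys):
    # Welch-style z approximation to a two-sample test; adequate for
    # illustration with n = 30 per group.
    nx, ny = len(xs), len(ys)
    mx, my = sum(xs) / nx, sum(ys) / ny
    vx = sum((x - mx) ** 2 for x in xs) / (nx - 1)
    vy = sum((y - my) ** 2 for y in ys) / (ny - 1)
    z = (mx - my) / math.sqrt(vx / nx + vy / ny)
    return math.erfc(abs(z) / math.sqrt(2))  # two-sided p-value

random.seed(0)
trials, honest_hits, hacked_hits = 2000, 0, 0
for _ in range(trials):
    ps = [two_sample_p([random.gauss(0, 1) for _ in range(30)],
                       [random.gauss(0, 1) for _ in range(30)])
          for _ in range(4)]
    honest_hits += ps[0] < 0.05   # report only the pre-specified outcome
    hacked_hits += min(ps) < 0.05 # report whichever p-value is smallest

print(f"honest false-positive rate: {honest_hits / trials:.1%}")  # close to 5%
print(f"hacked false-positive rate: {hacked_hits / trials:.1%}")  # roughly 1 - 0.95**4, i.e. ~19%
```

With only four analytic choices the false-positive rate roughly quadruples; real analyses offer far more than four forking paths.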

The problem is that we researchers, in the grand scheme of science, are a cogwheel within a cogwheel, and more than scientists we are human. As humans, we need to balance practicing science with absolute and uncompromising rigor on the one hand with the reality of survival on the other. Our current reward systems (such as getting published in a prestigious venue, receiving an award or being granted tenure) more often than not favor statistically significant results capable of making headlines over robust study design, sound statistical methods and replicability (Macleod et al., 2014). Similarly, funding and tenure ultimately favor those in well-known universities, within well-known and populous labs, capable of making their way into prestigious publications, churning out volumes of research and accumulating the citations they need (Greenberg, 2009). This is the game.

We are part of a system that selectively penalizes those that choose to abide by integrity, rigor and conscientious science and this will remain the case, unless we target the problem at its root.

Can we blame those who choose to abide by the rules of the game? Can we blame those who choose to analyze data, often accumulated over a substantial period of time, with analysis B rather than analysis A, because analysis B yields a p-value below 0.05 and immediately improves the chances of reaching a prestigious journal? Can we blame them for not advocating for change? The game has implicitly promised that producing the kind of headline research that gets into prestigious journals, attracts citations and leads to interviews in venues such as the New York Times will secure funding and prestige. What would happen if the rules of the game changed all of a sudden? Would these researchers have to work their way up the ladder again?

I think that we are all to blame for the state of biomedical research quality. Researchers, doctors, funding bodies, journals, readers and politicians are all to blame. We need to accept that most research is unreliable and often false, but we also need to realize that, under the current system, it could hardly have been otherwise. We are part of a system that selectively penalizes those who choose to abide by integrity, rigor and conscientious science, and this will remain the case unless we target the problem at its root. We scientists need to work on creating knowledge that approximates the truth to the best of our ability, without having to balance this objective against ones that should not affect science, such as obtaining “positive” results or outcompeting colleagues.

A scientist’s work should be valued for its integrity, its rigor, its maturity and its robustness, rather than its volume and ability to harness a p-value.

However, as a young scientist myself, this is rarely the kind of research the scientific establishment teaches me to practice. My peers and I are mostly taught that publishing our research in anything less than the top five journals by impact factor is poor performance. We are taught that our experiments have failed and are not worth publishing until they cross the meaningless cut-off point of 0.05. At the same time, we are not taught how to design experiments appropriately and apply such concepts as randomization, blinding and validation (Begley, 2013). Nor are we taught how to report our experiments according to guidelines of best practice, in a way that facilitates re-analysis and synthesis of our work within systematic reviews, meta-analyses or calculations of posterior probabilities.

The scientific method is not static; like any method in science, it ought to evolve. Over the past couple of decades, hundreds of papers have been published on how to improve it so that more of what we publish is true (Ioannidis, 2014). However, because of how science is currently funded and rewarded, scientists have been paralyzed in adopting any of those proposals, and my generation is in a state of checkmate.

I hereby plead that the scientific establishment and the world let us embrace the evolution of the scientific method, teach us what we now know to be important principles of scientific research and communication, and let us practice the kind of science we should. A scientist’s work should be valued for its integrity, its rigor, its maturity and its robustness, rather than its volume and its ability to harness a p-value. Let us be earnest about science, the way we should be earnest about working for the betterment of humanity to the best of our ability. It is for the benefit of all of us that we all, scientists or not, demand that the new generation of scientists be allowed to stand on the shoulders of the giants that came before us and practice the kind of science we now know we should.

If you are interested in the study of how scientific research works and promoting evidence-based anything, you may be interested in checking out and subscribing to this sub-reddit on meta-research.

References

Begley CG. Six red flags for suspect work. Nature. 2013 May 23;497(7450):433–4. doi: 10.1038/497433a.

Begley CG, Ellis LM. Drug development: Raise standards for preclinical cancer research. Nature. 2012 Mar 28;483(7391):531–3. doi: 10.1038/483531a.

Greenberg SA. How citation distortions create unfounded authority: analysis of a citation network. BMJ. 2009;339:b2680.

Ioannidis JP. Why most published research findings are false. PLoS Med. 2005 Aug;2(8):e124.

Ioannidis JP. How to make more published research true. PLoS Med. 2014 Oct 21;11(10):e1001747. doi: 10.1371/journal.pmed.1001747. eCollection 2014 Oct.

Jain A, Marshall J, Buikema A, Bancroft T, Kelly JP, Newschaffer CJ. Autism occurrence by MMR vaccine status among US children with older siblings with and without autism. JAMA. 2015 Apr 21;313(15):1534–40. doi: 10.1001/jama.2015.3077.

Macleod MR, Michie S, Roberts I, Dirnagl U, Chalmers I, Ioannidis JP, Al-Shahi Salman R, Chan AW, Glasziou P. Biomedical research: increasing value, reducing waste. Lancet. 2014 Jan 11;383(9912):101–4. doi: 10.1016/S0140-6736(13)62329-6.

Prinz F, Schlange T, Asadullah K. Believe it or not: how much can we rely on published data on potential drug targets? Nat Rev Drug Discov. 2011 Aug 31;10(9):712. doi: 10.1038/nrd3439-c1.

Røttingen JA, Regmi S, Eide M, et al. Mapping of available health research and development data: what’s there, what’s missing, and what role is there for a global observatory? Lancet. 2013;382:1286–307.

Turner EH, Matthews AM, Linardatos E, Tell RA, Rosenthal R. Selective publication of antidepressant trials and its influence on apparent efficacy. N Engl J Med. 2008 Jan 17;358(3):252–60. doi: 10.1056/NEJMsa065779.

Wakefield AJ, Murch SH, Anthony A, Linnell J, Casson DM, Malik M, Berelowitz M, Dhillon AP, Thomson MA, Harvey P, Valentine A, Davies SE, Walker-Smith JA. Ileal-lymphoid-nodular hyperplasia, non-specific colitis, and pervasive developmental disorder in children. Lancet. 1998 Feb 28; 351(9103):637–41.

Wasserstein RL, Lazar NA. The ASA’s Statement on p-Values: Context, Process, and Purpose. Am Stat. 2016;70(2):129–33.