Derek Lowe's commentary on drug discovery and the pharma industry. An editorially independent blog from the publishers of Science Translational Medicine. All content is Derek’s own, and he does not in any way speak for his employer.

Just how much crap is out there in the scientific literature? “Quite a bit” comes the answer from anyone with real experience of it, but that’s not too quantitative. Here, though, is one (perfectly respectable) journal's analysis of its own output, and the results are. . .well, they range from “pretty bad” to “honestly, I expected even worse”, depending on your level of cynicism.

Molecular and Cellular Biology and its parent organization (the American Society for Microbiology) went back over the published papers in the journal from 2009-2016 (960 total, 120 random papers per year), looking for doctored/duplicated images (which is still one of the easiest ways to spot sloppiness and fraud). The procedure they used seems effective, but does not scale very well: basically, the first step was to have Elisabeth Bik look at each paper (here’s an interview with her and the other co-authors). She seems to have a very good eye for image problems, and as an amateur astronomer, I can tell you that she would have made a very effective comet or supernova hunter for the exact same reasons. “Cuts and beautifications” were not scored as problematic – there needed to be outright duplications and/or serious alterations.

What they found was 59 papers with clear duplications, and in each case the authors were contacted:

The 59 instances of inappropriate image duplications led to 42 corrections, 5 retractions and 12 instances in which no action was taken (Table 1). The reasons for not taking action included origin from laboratories that had closed (2 papers), resolution of the issue in correspondence (4 papers), and occurrence of the event more than six years earlier (6 papers), consistent with ASM policy and Federal regulations established in 42 CFR § 93.105 for pursuing allegations of research misconduct. Of the retracted papers, one contained multiple image issues such that a correction was not an appropriate remedy, and for another retracted paper, the original and underlying data was not available, but the study was sufficiently sound to allow resubmission of a new paper for consideration, which was subsequently published.

Interestingly, this paper also records the amount of time all this took, and it’s substantial – at least 6 hours of staff time per paper, involving hundreds of emails overall and a lot of back-and-forthing. As usual, cleaning something up takes a lot more time than the act of making it messy in the first place. To that point, the journal introduced pre-publication screening of images in 2013, and the incidence of trouble did indeed decline notably starting in that year. (They didn’t tell Elisabeth Bik when the policy was introduced, so as not to bias her.)

As those figures show, the good news is that many of the duplicated images appear to be sheer carelessness, and could be fixed. But about 8% of the papers flagged (5 of 59) had to be pulled completely. Extrapolating from this experience (and that of two other journals previously studied) leads to a rough estimate that the 2009-2016 PubMed literature database (nearly 9 million items) should have about 35,000 papers removed from it completely (and, of course, that means that a lot more papers in it still need to be fixed up). Overall, the number of junk papers can be described as “small but still significant”, and there’s no reason to have them cluttering up the literature.
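For anyone who wants to check the arithmetic, here is a back-of-the-envelope sketch using only the figures quoted in this post. It is not the authors' calculation: their 35,000 estimate also folds in data from two other journals, which are not reproduced here. The variable names are my own, and treating every PubMed item as if it behaved like an MCB paper is a deliberately crude assumption.

```python
from fractions import Fraction

# Figures quoted in the post (MCB sample, 2009-2016).
papers_screened = 960
papers_with_duplications = 59
corrections, retractions, no_action = 42, 5, 12
staff_hours_per_paper = 6      # "at least 6 hours", so this is a lower bound
pubmed_items = 9_000_000       # "nearly 9 million", rounded up (assumption)

# Sanity check: the three outcomes account for every flagged paper.
assert corrections + retractions + no_action == papers_with_duplications

duplication_rate = Fraction(papers_with_duplications, papers_screened)
retraction_rate = Fraction(retractions, papers_with_duplications)

# Crude assumption: every PubMed item behaves like an MCB paper.
naive_retractions = pubmed_items * duplication_rate * retraction_rate
min_staff_hours = staff_hours_per_paper * papers_with_duplications

print(f"duplication rate: {float(duplication_rate):.1%}")
print(f"retracted share of flagged papers: {float(retraction_rate):.1%}")
print(f"minimum staff time: {min_staff_hours} hours")
print(f"naive PubMed extrapolation: {int(naive_retractions):,} papers")
```

On these inputs, about 6.1% of the sampled papers had duplications, about 8.5% of those were retracted, the cleanup cost at least 354 staff hours, and the naive extrapolation lands near 47,000 papers. That is the same order of magnitude as the published 35,000, but higher, which is what you would expect if the other journals in the authors' estimate had lower duplication rates than this sample.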

Increased screening in the editorial phase seems to be worth the effort – it adds time, but not nearly as much as the time it takes to go back and fix things later. (And that fits with another time-honored piece of advice, that if you don’t want shit to land on you, then do not allow it to rise in the first place). This is, as the authors note, more of a recent problem due to the proliferation of the digital tools needed to mess around in this way – and, to be fair, these tools also allow for faster, easier honest mistakes to be made as well. And it admits of modern solutions, too – software to catch image duplications has been (and is being) worked on by several groups, and should obviate the need to clone Elisabeth Biks.

That takes us back around to the question in the first paragraph, about how much crap is out there. Papers with clearly fraudulent images in them are obviously in that category, but there are many other less obvious ways that papers can be fraudulent. So I would call that 35,000 estimate a likely undercount, even given that there are many papers in PubMed that don’t have images of this sort in them.

But beyond fraud, there is the larger universe of papers that are basically honest but are simply no good – statistically underpowered studies, nonreproducible procedures, inadequate descriptions, conclusions that don’t necessarily follow from the data presented. The literature has always had these things in it. Poor-quality work has not been waiting on image-editing programs to make it possible; we all come with the necessary software pre-installed between our ears. Clearing out the frauds is an obvious first step, but it’s also (unfortunately) the easiest. The other stuff is, as it’s always been, on the readers to look out for.

Which brings up one last point, which has been made here and in other places before. In these days of modern times, as the Firesign Theatre used to put it, some of those customers of the scientific literature are not human beings. Machine-learning software has great promise for analyzing the huge pile of knowledge that we’ve generated, but such algorithms are easily poisoned by the garbage in/garbage out problem. Data curation is and always will be a crucial step in getting any machine learning effort to yield useful conclusions, and studies like this one just remind us that curating the biomedical literature is no simple thing.