PubMed, the National Library of Medicine’s repository of millions of abstracts and citations, has long been one of the most highly regarded sources for searching biomedical literature. For some members of the scientific community, however, the presence in PubMed of predatory journals (publications that tend to churn out low-quality content and engage in unethical publishing practices) has been a pressing concern.

“If a predatory journal is confined on its website, which is often of low-quality, the chance that patients or scholars will read and cite these articles is very low,” says Andrea Manca, a professor of physiology at the University of Sassari in Italy. “The problem is that when they are displayed in the most popular biomedical database that we have, there are many [people] who think if a journal is on PubMed, then it is fine—which is not true, unfortunately.”

This was the case for Susanta Pahari, a professor of chemistry at Skyline University Nigeria who, while browsing through the literature, found papers on PubMed from four journals he suspected of being predatory. To find out whether these were indeed problematic, he reached out to the anonymous curator of a website that maintains the now-defunct Beall’s list of potential predatory publishers, and he scanned through the Directory of Open Access Journals (DOAJ), a “whitelist” of credible open-access publications. He confirmed his suspicions for two of the four journals: they were not on the list of reputable publications, despite appearing on PubMed. This discovery “was surprising to me,” Pahari tells The Scientist.

Although it is unusual, articles from journals with poor reputations do make it into the repository. Just how they get there, and how big a problem it is, has drawn the interest of a number of scholars and librarians.

In 2017, Manca, his University of Sassari colleague Franca Deriu, also a professor of physiology, and their collaborators conducted two studies that pinpointed more than 200 predatory journals across the disciplines of neuroscience, neurology, and rehabilitation, and discovered that several of those also appeared on PubMed. According to Manca, despite those findings and their attempts to draw attention to the issue through subsequent commentaries, “predatory journals are still there.”

David Moher, a clinical epidemiologist at the Ottawa Hospital Research Institute, notes that this issue might be particularly problematic for patients who use PubMed to find health care information and for researchers conducting systematic reviews or meta-analyses. “What we think is that perhaps some additional scrutiny could be brought to bear on decisions that the National Library of Medicine are making about journals to include in their databases,” says Moher, who, along with Manca, recently coauthored a paper on this issue in CMAJ.

Several university libraries have now posted warnings about the issue, along with guidelines for how readers can identify articles from reputable publications.

The source of the leak

To understand how predatory journals might get into PubMed, it’s important to first recognize the database’s components. PubMed was originally created in 1996 as a public interface to MEDLINE, the National Library of Medicine’s (NLM’s) database of citations and abstracts from selected journals in medicine and the life sciences. While MEDLINE references still make up the majority of articles on PubMed, the second-largest share of listed papers now comes from PubMed Central (PMC), a freely accessible online archive that holds articles from journals and publishers that have agreements with the NLM, manuscripts from authors complying with funders’ open-access policies, and historical content archived by digitization projects. (The differences between PubMed, PubMed Central, and MEDLINE are described in detail on the NLM website.)

Both MEDLINE and PMC have quality-control measures in place. MEDLINE has a long-standing, rigorous selection process, through which a federal advisory committee conducts a thorough evaluation of journals, examining things such as their publishing practices and the scientific merit of their contents. Journals that are accepted into PMC go through a similar, but more recently implemented, appraisal process. Author manuscripts accepted for publication, however, are deposited into PMC without any such evaluation of the journals that published them.

According to Manca, content from predatory publishers likely seeps into PubMed via PMC, where he and his colleagues have been able to find papers from several predatory journals. “PubMed Central should be refined in terms of the contents that they have,” he says.

Accepted manuscripts deposited by authors funded by the National Institutes of Health (NIH) “go into PMC and get a citation in PubMed regardless of where they’re published, because they’re all under a legal or policy requirement from the NIH to be made available on PMC,” says Jerry Sheehan, deputy director of the NLM.

Sheehan tells The Scientist that the NLM is aware of concerns that articles from non-reputable journals are entering PubMed through that route. “The fact that these articles have to be on PMC is a bit of a challenge,” he adds. “At the same time, those are articles that result in research that was funded by the NIH, so there’s some ability to recognize that there was a very selective peer review process that occurred in the funding of the research that was reported.” Still, to try to curtail the problem, the NIH issued guidelines in 2017 to help authors identify credible journals in which to publish their work.

How big of a problem is it?

The concerns raised about low-quality content on PMC seeping into PubMed spurred Peace Williamson, a medical librarian at the University of Texas at Arlington, and her colleague to investigate the composition of articles on PubMed, as well as the quality-control procedures the NLM had in place. Their study, which was published in JMLA in January, revealed that more than 90 percent of the content on PubMed came from MEDLINE, and that 85 percent of author-deposited accepted manuscripts were published in MEDLINE journals.

Based on their findings, Williamson says she personally doesn’t feel that the presence of predatory publishers on PubMed is a pressing problem. Still, “it would be better to be able to [identify] how things got into PubMed,” she tells The Scientist. “Being more apparent about that would be helpful to the user.”

Predatory journals may be more prevalent in some other repositories of scholarly literature. Catherine Smith, a professor of information sciences at the University of Wisconsin-Madison, tells The Scientist that in a preliminary analysis, which she presented at the Medical Library Association conference last year, she and her colleague found that PubMed actually had fewer articles from predatory publishers than other digital resources such as Scopus and Google Scholar. “I thought the NLM did pretty well in this study,” Smith says.

Ultimately, it’s important for both authors and readers to be mindful of the journals they submit to or the articles that they read, Williamson says. While there is some level of quality expectation with resources such as PubMed, “even things that get published in the New England Journal of Medicine get retracted—so the onus is on us to practice good critical appraisal methods when we look at literature.”