The journal impact factor (JIF)1 is without a doubt the most widely used, misused and abused bibliometric index in academic science. Journals are ranked within their field based on JIF, and JIF is seen as a reflection of the importance of a journal's publications. The contributions of individual scientists are also gauged based on the JIF of the journals where their work is published, and in academic settings, funding and promotion decisions rely heavily on JIF. Not surprisingly, there is intense pressure on journal editors to game the system and increase their journals’ JIF in ways that do not contribute to advancing science and that in many cases distort the scientific process.

Increasingly, there is healthy pushback against this embarrassing evolution. The Declaration on Research Assessment (DORA), for instance, which has been signed by over 1400 organizations and over 14 000 individuals, urges the elimination of journal‐based metrics, such as JIF, as indicators of individual article quality or as a means of assessing individual scientists' contributions for hiring, promotion or funding. The declaration states that, even for journal appraisal, JIF should be seen as only one among many metrics.2

Unfortunately, there are still many institutions where JIF mania rules, and, therefore, JIF manipulations flourish.3-9 It has been admitted10 that “in many cases all that the JIF indicator now measures is how assiduously a journal's editors are playing the JIF ‘game’11 and their willingness to steer as close as possible to, and perhaps even to cross, the boundary between appropriate and inappropriate behaviour in pursuit of that goal.” Thus, editors willing to inflate the impact factor of their journals with any number of tricks are rewarded; since rankings are relative, the journals of those who do not engage in these tricks suffer the consequences. Sometimes, this trickery may not be initiated by editors, but it may be a secondary consequence of pressure from publishers or from professional society officers who oversee journals.

There is a belief that a higher JIF leads to more (and, ideally, better) articles submitted and thus also published by a journal. If so, a journal may genuinely improve as JIF increases. The volume of submissions may indeed increase, because many scientists naively decide where to send their paper based on JIF. However, volume may dissociate from quality. When the JIF of the European Journal of Clinical Investigation increased from 2.71 to 3.09 last year (a change well within the range of random sampling fluctuation), there was a large increase in submissions. For some countries, the increase was startling; submissions from China more than doubled. Since the 1990s, Chinese research institutions have applied a cash‐per‐publication reward policy, and although the specific reward system differs across institutions, it is typically JIF‐based. In some institutions, a JIF of 3 or higher is an important reward threshold,12, 13 suggesting that crossing this threshold carries some sort of improbable, magic weight. Not surprisingly, the quality of the additional submissions was low; only 4% of papers submitted from China were accepted, a lower percentage than for previous submissions from China and much lower than the overall acceptance rate of the journal.

Many scientists not only submit their papers to higher JIF venues, but also prefer to cite papers from periodicals with higher JIF even when these are not better or more suitable than similar papers published in journals with lower JIF. As evidence of this, when identical versions of some reference papers are published concomitantly in several journals, citations are given predominantly to the journals with the highest JIF.14 Thus, JIF feeds itself with more citations leading to more downstream citations and even higher JIF.

Inappropriate use of JIF is unlikely to stop unless its manipulations are explicitly discredited and, when they are egregious, meaningfully penalized. A substantial literature has catalogued the main stratagems used to inflate JIF.3-9 The main ones are as follows:

- Publishing large numbers of papers that get cited (thus increasing the numerator of JIF) without being counted in the denominator of the JIF calculation; only research articles and reviews are included in the JIF denominator, whereas other types of publications, including editorials, letters and similar items, gather citations that count in the numerator, thus inflating JIF for free (see the sketch after this list).
- Increasing self‐citations to the journal, either by loading the aforementioned editorials and letters with journal self‐citations or by requesting that submitting authors, as part of article revisions, add citations to other recent articles of the journal without genuine scientific justification (coercive self‐citation).
- Publishing reviews, which usually get more citations than regular research articles, regardless of the scientific quality of the reviews.
- Publishing some types of papers with questionable scientific value, because editors expect that these will be widely cited as standard references by the members of a wide professional society (often the society running the journal) and related communities.
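To make the numerator/denominator asymmetry of the first stratagem concrete, here is a minimal sketch in Python, using invented citation counts for a hypothetical journal (the numbers are purely illustrative and do not correspond to any real journal):

```python
# JIF for year Y = citations received in Y by items published in Y-1 and Y-2,
# divided by the number of "citable items" (articles and reviews) from Y-1 and Y-2.
# Citations to editorials and letters enter the numerator, but such items are
# excluded from the denominator.

citable_items = 200              # articles + reviews published in the 2-year window
citations_to_citable = 600       # citations received by those articles/reviews

front_matter_items = 150         # editorials, letters, etc. (not counted as citable)
citations_to_front_matter = 150  # citations received by those items

jif_articles_only = citations_to_citable / citable_items
jif_as_reported = (citations_to_citable + citations_to_front_matter) / citable_items

print(f"JIF counting only articles/reviews in the numerator: {jif_articles_only:.2f}")  # 3.00
print(f"JIF as actually computed:                            {jif_as_reported:.2f}")    # 3.75
```

In this invented example, citations to non‐citable items lift the reported JIF by 25% without any change in the journal's research content.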

This last stratagem typically occurs via the publication of professional society guidelines, position statements and disease definitions developed by experts with little or no methodological rigour.15 These documents become attractive, semi‐compulsory citations for papers published in the respective field. However, most expert‐based clinical guidelines are suboptimal.16 Stellar exceptions, based on rigorous evidence synthesis and following standards for developing trustworthy guidelines,17, 18 do exist, but subpar guidelines are the rule. In reality, many professional guidelines are no more than massively multi‐authored editorials. Most disease definition statements are also driven primarily by expert‐based rather than evidence‐based processes. These often label more people as sick and in need of medical care,19, 20 and such disease mongering serves primarily the interests of professional specialists and their industry sponsors.

The advent of these expert‐based blockbusters is relatively recent—and pervasive. Among the 100 most‐cited publications of 2016 indexed in Web of Science (as of May 28, 2019), 12 are clinical guidelines, and four are disease definitions. Conversely, the 100 most‐cited publications of 2000 included only one paper on defining tumour response and one disease definition paper that actually used a careful investigational approach.

Appropriate editorials, letters, reviews, guidelines and disease definitions can have positive scientific value. Editorials offer opportunities to express important viewpoints and shape novel concepts. Letters can contribute to the self‐correction of science. Reviews may have more scientific value than regular articles, especially when systematic and well‐done. Guidelines and disease definitions can improve patient care and health when they adhere to evidence‐based approaches.17, 18 Furthermore, appropriate self‐citations are indispensable to place scientific work in proper context against previous work and avoid exaggerated novelty claims. Conversely, JIF gaming and coercive self‐citation machinations9 are parodies of science.

Clarivate Analytics, the company issuing JIFs through Journal Citation Reports (JCR), meticulously reports information on each journal that can help users sort out potential cases of spurious JIF inflation. Table 1 lists 12 journals as illustrative examples along with data routinely listed in JCR: JIF 2017 (calculated based on citations in 2017 of papers published in 2015 and 2016), JIF 2017 excluding self‐citations to publications in the same journal, Median Citations per Article (MCA) published in the JIF calculation time frame, and the number and types of papers that made extraordinary contributions to JIF by receiving more than 10 times the value of JIF in citations. The chosen journals include some of the top, most respected journals across science and medicine, two journals where one of us (JPAI) serves as editor‐in‐chief or associate editor, and four cardiology journals that have achieved recent dramatic JIF increases. Specifically, between 2000 and 2017, the JIFs of European Heart Journal, European Journal of Heart Failure, Europace and Revista Espanola de Cardiologia increased from 3.87, 1.15, 0.44 and 0.70, respectively, to 23.43, 10.68, 5.23 and 5.17 (505%‐1088% increases). To put that in perspective, the aggregate impact factor for the JCR category of Cardiac and Cardiovascular Systems increased from 3.37 in 2003 (the earliest data currently provided by Clarivate Analytics) to 4.36 in 2017 (a 29% increase).
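The percentage increases quoted above follow from simple ratios of the rounded JIFs; a short sketch (small discrepancies, such as 1089% vs the quoted 1088%, reflect rounding of the published values):

```python
# Percentage increase in JIF for the four cardiology journals (2000 to 2017)
# and the category aggregate (2003 to 2017), from the rounded values in the text.
jif_changes = {
    "European Heart Journal": (3.87, 23.43),
    "European Journal of Heart Failure": (1.15, 10.68),
    "Europace": (0.44, 5.23),
    "Revista Espanola de Cardiologia": (0.70, 5.17),
    "Cardiac and Cardiovascular Systems (category aggregate)": (3.37, 4.36),
}

for journal, (jif_start, jif_end) in jif_changes.items():
    increase_pct = 100 * (jif_end - jif_start) / jif_start
    print(f"{journal}: {increase_pct:.0f}% increase")
```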

Table 1. Information readily available in Journal Citation Reports that can help obtain insights into journal impact factor inflation

Journal | JIF | JIF (without self‐citations) | MCA | Papers among top cited (>10 × JIF)
Nature | 41.6 | 41.0 | 25 | 4 (1 review, 3 original)
Science | 41.1 | 40.6 | 21 | 6 (6 original)
PLoS Medicine | 11.7 | 11.3 | 6 | No such papers
New England Journal of Medicine | 79.3 | 78.5 | 36 | 3 (3 industry trials)
JAMA | 47.7 | 46.6 | 23 | 1 (1 sepsis definition)
British Medical Journal | 23.6 | 22.1 | 7 | 1 (1 reporting guideline)
Journal of Clinical Epidemiology | 4.2 | 3.8 | 2 | 1 (1 method)
EJCI | 3.1 | 2.9 | 2 | No such papers
European Heart Journal | 23.4 | 21.8 | 10 | 6 (6 expert‐based guidelines)
Revista Espanola de Cardiologia | 5.2 | 3.4 | 1 | No such papers
European Journal of Heart Failure | 10.7 | 8.9 | 6 | 1 (1 expert‐based guideline)
Europace | 5.2 | 4.5 | 2 | 2 (2 expert‐based guidelines)

The MCA can help predict the likely number of citations for a typical article in a journal. When considering where to submit an article, scientists often peruse the JIFs of possible target journals. While JIF can be predictive of total or mean future citations across a large population of published papers, it says little about how many citations a specific paper may receive once published. This is because 10%‐20% of the papers published in a journal get 80%‐90% of the citations.1 MCA is more representative of the “typical” article in that highly skewed distribution. It is not a perfect indicator; even though MCA is much lower than JIF, it still conveys unrealistically high expectations to prospective submitting authors. As an example, the New England Journal of Medicine tops the JIF rankings among medical journals (79.26). Its MCA is 36. Yet both numbers reflect primarily the massive publication of highly networked and aggressively advertised industry‐funded trials, hundreds of which have been published in this journal over the years. Unless a prospective submitting author has such an industry‐funded trial to submit, the median expected citation performance of an accepted article will be a tiny fraction of JIF—5 to 10 citations per year is common. Similarly, if a journal publishes many blockbuster guidelines, these vastly affect JIF and also distort MCA. Regardless, >95% of journals have MCA ≤3. Two journals with JIF 2.5 and 5.0 typically differ by 0‐1 points in MCA; the one with the higher JIF may even have the lower MCA. But, as ludicrous as it is, the (uncertain) prospect of one more or one less citation per year may decide careers, promotions and funding. In China, it can result in an immediate financial gain or loss.
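To illustrate why a mean‐based indicator such as JIF says so little about the typical paper, here is a minimal sketch with invented citation counts for 20 hypothetical papers (the numbers are made up solely to mimic a heavily skewed distribution):

```python
import statistics

# Invented citation counts for 20 hypothetical papers in a journal's
# two-year JIF window: a handful of highly cited items and a long tail.
citations = [120, 85, 40, 12, 9, 7, 5, 4, 3, 3, 2, 2, 2, 1, 1, 1, 0, 0, 0, 0]

jif_like_mean = statistics.mean(citations)      # 14.85: what a JIF-like average reflects
mca_like_median = statistics.median(citations)  # 2.5: what MCA reflects

top_4 = sorted(citations, reverse=True)[:4]     # the top 20% of papers
share_top_20pct = sum(top_4) / sum(citations)

print(f"Mean (JIF-like):   {jif_like_mean:.2f}")
print(f"Median (MCA-like): {mca_like_median:.1f}")
print(f"Top 20% of papers collect {share_top_20pct:.0%} of all citations")
```

In this toy example, the JIF‐like mean is almost six times the MCA‐like median, and the top 20% of papers collect close to 90% of the citations, mirroring the skew described above.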

While the readily available metrics of Table 1 cannot meaningfully inform a prospective author where to submit his/her work, they can be used, as shown in Table 2, to calculate three measures that capture the main mechanisms of JIF inflation (a sketch of the calculations follows Table 2):

- Self‐citing boost: the per cent increase in JIF due to self‐citations.
- Skewness and nonarticle inflation: the per cent inflation of JIF relative to MCA.
- Expert‐based blockbusters: the number of blockbuster expert‐based clinical guidelines and disease definitions among extremely cited (>10 × JIF) papers.

Table 2. Key measures that capture mechanisms of JIF inflation

Journal | Self‐citing boost (%) | Skewness & nonarticle inflation (%) | Expert‐based blockbusters
Nature | 1 | 66 | 0
Science | 1 | 96 | 0
PLoS Medicine | 3 | 95 | 0
New England Journal of Medicine | 1 | 120 | 0
JAMA | 2 | 107 | 1
British Medical Journal | 6 | 237 | 0
Journal of Clinical Epidemiology | 12 | 112 | 0
EJCI | 7 | 54 | 0
European Heart Journal | 7 | 134 | 6
Revista Espanola de Cardiologia | 52 | 417 | 0
European Journal of Heart Failure | 20 | 78 | 1
Europace | 15 | 162 | 2
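Assuming the first two measures are computed as simple percentage ratios of the Table 1 columns (small differences from the published values, eg 7% vs 6% for the British Medical Journal, arise because Table 1 shows rounded JIF and MCA values), a minimal sketch for a few of the journals:

```python
# Deriving the first two Table 2 measures from Table 1, assuming they are
# plain percentage ratios; published values use unrounded data, so small
# rounding discrepancies are expected.

table1 = {
    # journal: (JIF, JIF without self-citations, MCA)
    "Nature": (41.6, 41.0, 25),
    "British Medical Journal": (23.6, 22.1, 7),
    "Revista Espanola de Cardiologia": (5.2, 3.4, 1),
    "European Journal of Heart Failure": (10.7, 8.9, 6),
}

for journal, (jif, jif_no_self, mca) in table1.items():
    self_citing_boost = 100 * (jif - jif_no_self) / jif_no_self
    skew_and_nonarticle_inflation = 100 * (jif - mca) / mca
    print(f"{journal}: self-citing boost ~{self_citing_boost:.0f}%, "
          f"skewness & nonarticle inflation ~{skew_and_nonarticle_inflation:.0f}%")
```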

Self‐citations naturally vary. Highly specialized journals in disciplines with few journals may justifiably have higher self‐citing boosts. Across 9522 journals in 2015, the percentage of self‐citations contributing to the JIF calculation was 11.5%4; thus, the average self‐citing boost was 11.5/(100‐11.5) ≈ 13%. However, this figure largely reflects the overall extent of self‐citation gaming; the justified level of self‐citation must be lower. In this context, the self‐citing boost of the European Journal of Heart Failure (20%) seems high and that of the Revista Espanola de Cardiologia (52%) is extremely high.
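Written out, with $s$ denoting the self‐citation share of the JIF numerator (a symbol introduced here only for clarity), the boost implied by an 11.5% share is:

$$\text{self-citing boost} \;=\; \frac{\text{JIF}}{\text{JIF excluding self-citations}} - 1 \;=\; \frac{s}{1-s} \;=\; \frac{0.115}{1-0.115} \approx 13\%$$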

Skewness and nonarticle inflation increases when a journal publishes more articles with extreme citation counts, many reviews (which get cited much more frequently than regular articles) or many cited items that do not count in the JIF denominator (eg editorials). Again, Revista Espanola de Cardiologia is an outlier. Closer examination in Web of Science (as of June 9, 2019) shows that in 2015‐2016 it published 154 “Articles” and 13 “Reviews,” which are citable items and count in the JIF denominator, but also 249 “Letters” and 136 “Editorials,” which do not count in the denominator but are loaded with journal self‐citations.

Finally, European Heart Journal propelled its JIF by publishing extremely highly cited expert‐based guidelines from its own professional society; six such papers each received citations amounting to 10‐31 times the journal's JIF and 25‐73 times its MCA. In other words, publishing each of these expert‐based guidelines gave the journal as many citations as 25‐73 “typical” articles. Europace published two expert‐based blockbusters that were cited 33 and 87 times its MCA, respectively. European Journal of Heart Failure published one such blockbuster that was worth 56 times its MCA. The same blockbuster guideline on heart failure was published concomitantly in both European Heart Journal and European Journal of Heart Failure, boosting the JIF of both. Similarly, the same blockbuster guideline on ventricular arrhythmias was published concomitantly in both European Heart Journal and Europace.

None of the other nine examined journals published any such blockbuster expert‐based clinical guidelines. There was only one blockbuster disease definition paper published in this sample, by JAMA, but it was based on rigorous, carefully conducted definition and validation processes.21, 22 Conversely, the guideline development process of the European Society of Cardiology (ESC) stipulates “a structured literature search aiming to identify the best evidence” and briefly adds that “a formal literature review must also be performed by the Task Force,”23 but it leaves unclear how often this is implemented in a rigorous way. There is no clause asking for methodologists to be involved in the process. One of us (JPAI) was actually invited to perform a systematic review to support an ESC guideline to be published in European Heart Journal. But then, the Task Force met, discussed, and made recommendations and even voted on them within a few months, before the systematic review was done! Communication with the Task Force chair and the ESC guideline committee chair (both of them otherwise excellent, expert clinicians) suggested that this weird practice was common. Meeting bureaucratic timing milestones in expert committees (even for topics where there is absolutely no clinical need for urgency) is apparently far more important than having a proper systematic review in place.

The perusal of the 12 journals shows that only the four journals with dramatic JIF increases exhibit stigmata of massive inflation. Of course, this does not “prove” unethical or gaming behaviour. Rather, it paints a disconcerting picture and suggests that, at a minimum, an explanation is needed. One journal seems to sustain itself on massively publishing self‐citing editorials and letters, and the other three share among them blockbuster expert‐based clinical guidelines. Clinical guidelines are, by definition, citation‐avid constructs, practically enforced with authoritative vengeance upon wide professional communities. Some additional types of articles besides guidelines and disease definitions are also citation‐avid constructs, such as research reporting guidelines, methods, software, database and burden‐of‐disease reference papers. However, these typically (though not always) depend on more rigorous scientific methodology and are usually highly valuable. We probably need more such papers, while we need fewer expert guidelines that are not based on rigorous evidence synthesis.

Clarivate Analytics actually has a process that suppresses journals for a year (ie does not provide them any JIF) when they exhibit extremely high self‐citation rates or when there is evidence of stacking between journals in citation farms; this occurs when journals massively cross‐cite each other to boost their JIFs.24 However, very few journals are suppressed in this fashion (only 20 in the 2017 edition: 16 for excessively high self‐citation and 4 for citation stacking cartels).21 This is because the required self‐citation or stacking thresholds for suppression are exorbitant; self‐citations of suppressed journals had doubled or even tripled their JIF.24

Probably many hundreds and possibly thousands of journals use self‐citation and other stratagems extensively but remain below the Clarivate radar screen. For at least 925 journals in 2015, more than a third of the citations contributing to the JIF calculation were self‐citations.4 Clarivate should seriously consider lowering the self‐citation thresholds that trigger JIF suppression. Otherwise, the message is that boosting JIF by 50% through self‐citations is quite acceptable, and even desirable given the huge stakes involved. Moreover, better, more sensitive methods are needed to detect gaming schemes with sophisticated cross‐journal and cross‐society structures. Many fields with generally large impact factors may simply be infested with powerful citation cartels rather than possess higher scientific interest.

Authors should pick target journals based on relevance, scientific rigour and quality, not spurious impact factors. Inspecting inflation measures is more informative for choosing a journal than JIF, because prominent inflation may herald spurious editorial practices and thus poor quality. Authors who submit to journals with heavily inflated JIFs may become members of bubbles. They even run the risk of having their work published in journals that are eventually formally discredited, if Clarivate decides to make a more serious effort to curtail spurious gaming.

While JIFs continue to circulate, scientists and institutions need to be sensitized to the fact that many journals probably engage in spurious or even outright ridiculous stratagems to boost their numbers. Hopefully, revealing these tricks may help curtail these damaging practices. JIF is just one bibliometric indicator among many others.25 Its gaming has nothing to do with good science.

Moreover, Clarivate Analytics has a unique opportunity to halt the damage to science that is propagated by JIF mania and manipulation. JIF is a highly flawed, easily gameable metric. If JIF were discus throwing, athletes would have their performance appraised to within a centimetre of accuracy, but would be allowed to award themselves an undisclosed bonus of 20‐30 metres and to choose the weight of the discus they want to throw. JIF has outlived its utility. Clarivate should take the bold step of removing it entirely from its JCR editions. Instead, it could replace it with Median Citations per Item indicators calculated separately for Articles, Reviews and Other types of papers. These metrics are far more appropriate and far more difficult to game than JIF. Everything that contributes citations to the numerator should also count in the denominator. Moreover, these metrics should be calculated excluding self‐citations; this would remove the temptation to self‐cite for the sake of self‐citing. Expert‐based guidelines and disease definitions should also be counted among the Other types, rather than as Articles or Reviews, since they are akin to editorials. Until Clarivate endorses this simple step, serious scientists should beware of impact metrics that are too good to be true.
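As a rough sketch of what such type‐specific, self‐citation‐free median metrics might look like in practice (the data structure and numbers below are entirely hypothetical and do not represent Clarivate's actual pipeline):

```python
from collections import defaultdict
from statistics import median

# Hypothetical item records for one journal's two-year citation window.
# Each record: (item type, citations from other journals, citations from the journal itself).
items = [
    ("Article", 14, 2), ("Article", 3, 1), ("Article", 0, 0), ("Article", 7, 0),
    ("Review", 22, 3), ("Review", 9, 1),
    ("Other", 1, 4), ("Other", 0, 2),  # editorials, letters, guidelines, definitions
]

citations_by_type = defaultdict(list)
for item_type, external_citations, _self_citations in items:
    # Self-citations are deliberately ignored, as proposed above.
    citations_by_type[item_type].append(external_citations)

for item_type, counts in citations_by_type.items():
    print(f"Median citations per {item_type} (excluding self-citations): {median(counts)}")
```

Because every item would appear in its own denominator and self‐citations would be dropped, loading a journal with self‐citing editorials or reclassifying guidelines would no longer lift the headline metric.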