The mystery of the phantom reference

Prof. Anne-Wil Harzing, Middlesex University

With Prof. Pieter Kroonenberg, Leiden University

Web: www.harzing.com

Email: anne@harzing.com

© Copyright 2017 Anne-Wil Harzing. All rights reserved.

First version, 26 October 2017

Introduction

Through my work with Publish or Perish I get in touch with many academics who are doing bibliometric work, often as a “research hobby”. In one of these exchanges, Pieter Kroonenberg, a Dutch emeritus professor in Statistics, told me about an interesting puzzle he had come across. When looking at the author guidelines for an Elsevier journal that he intended to submit to, he noticed the following reference:

Van der Geer, J., Hanraads, J.A.J., Lupton, R.A., 2000. The art of writing a scientific article. J Sci. Commun. 163 (2) 51-59. [The journal name can also be found with its full title Journal of Science Communications]

Picture: Paris Musee Cluny by Pieter Kroonenberg

He was intrigued to see that one of his former colleagues Prof. John van de Geer had a “hidden side”, publishing about the art of academic writing in addition to his work on experimental psychology and multivariate analysis. But, wait a minute…, this reference referred to Van der Geer instead of Van de Geer. Still…, the paper looked interesting so he ventured to look it up. However, despite strenuous efforts he was unable to find it. An (Italian) journal with a similar name did exist, but its full name was Journal of Science Communication rather than Communications and it had only started in 2002. Looking at the original reference again, it did strike him as a little odd for a journal to have published 163 volumes in a discipline that normally equates volumes to years. Moreover, the second author seemed to have only ever published this particular article, which obviously is rather strange for someone writing about the art of writing a scientific article.

To cut a long story short, the article appeared to be completely made up and did not in fact exist. It was a “phantom reference” that had been created merely to illustrate Elsevier's desired reference format. Even so, Pieter found that in the Web of Science there were nearly 400 articles citing this non-existing reference, and many more citing articles appeared in the more comprehensive Google Scholar. I am well aware that academics don’t always take the necessary care in their referencing behaviour. Early on in my career, I even wrote an article about this: Are our referencing errors undermining our scholarship and credibility? But even so, how could authors cite a publication that didn’t in fact exist?

Step 1: Origin of the phantom reference and citations in the Web of Science

So let’s take a closer look at what is happening here, i.e. describe the problem. As indicated above, the origin of the phantom reference is the reference style section of the author guidelines produced by Elsevier journals. [Please note that Elsevier has recently changed the year of the Van der Geer reference to 2010.]

Using the search terms below, the Web of Science Cited reference search results in 398 citing references [search date 24 Oct 2017]. There were a dozen or so additional citations which I ignored as they were referring to the new 2010 reference or omitting one of the authors. It is clear that there are quite a few “stray citations”, created by either inaccurate referencing by authors or data entry errors at the Web of Science, but that’s a minor and well-known problem and not the focus of this white paper.

Step 2: What are the characteristics of papers citing the phantom reference?

My first attempt to solve the puzzle of why authors would cite a phantom reference was to verify whether any particular type of article engaged in this practice. It turned out the answer was yes: I found most occurrences of the phantom reference – nearly 90% – to be in proceedings papers (see left-hand picture below). Of these conference proceedings papers, nearly two thirds had been published in Procedia conference volumes (see right-hand picture below), a series published by Elsevier in some 25 different subject categories. Although the series is published by Elsevier, the paper selection and/or peer review is the responsibility of the organizers of the conferences in question.

However, a key draw card for conference organizers and conference participants alike might be that many of these proceedings series are indexed in both Scopus and the Web of Science. As many university administrators, rightly or wrongly, only “count” publications listed in these databases, a publication in the Procedia series might be quite attractive, especially for academics that consider a publication in the top journals in their field to be out of their reach, or too much trouble.

The Web of Science lists nearly 85,000 papers published in one of the Procedia series since 2009. Nearly two thirds of these were published in either the Procedia Social and Behavioral Sciences or Procedia Engineering series, the two series responsible for the majority of papers citing the phantom reference (see right-hand picture above). Since February 2017 Elsevier has stopped accepting new proposals for several subject categories, including these two. This might explain why there are 74 papers citing the phantom reference in 2016, but only 19 in 2017 (see picture below).

In order to get a feel for the quality of these conferences I looked at a range of papers in the Social Sciences – the only field I am sufficiently familiar with – and clearly not all of them are of the level that would normally be expected at conferences in this field. Some even consist of fewer than 3 pages of fairly incoherent statements with every sentence starting on a new line. The English language in many of the papers I looked at was also quite poor, possibly a reflection of the fact that the vast majority of the authors came from countries such as China, Malaysia, Turkey, Russia, Romania and Iran, where there isn’t a strong tradition of writing in English, especially in the Social Sciences. In addition, references were often incomplete and formatted unsystematically. These quality problems might well have prompted Elsevier to stop publishing proceedings in this field.

Step 3: How is the phantom reference cited in the citing papers?

The fact that most citations to the phantom reference occurred in fairly low-quality conference papers written by authors from emerging economies seemed to point to quality control as a potential source of our problem. Even so, authors do need to cite the reference in the first place. Hence my next step to solve the puzzle was to find out in what context the phantom reference was cited. I thus verified the top-20 most cited articles (all articles with 10 or more citations) that cited the phantom reference, reasoning that these were most likely to be “credible” articles. They could thus be expected to have experienced at least some level of quality control and would constitute a “best case” scenario. Out of these 20 papers citing the phantom reference, 17 papers were published by Elsevier and 15 were journal articles (a proportion much higher than the 11% journal articles in the total sample).

I was able to access 12 of these 20 papers. In six of the 12 papers, the phantom reference was first-listed in the reference list, in three it was listed as the last reference, in two articles it was in the middle of the list and in one article it wasn’t in the reference list at all. In eight articles (#1, 2, 4, 5, 8, 12, 16, 19) the phantom reference was used to support a statement in the article that was completely unrelated to the topic of the phantom reference. [If articles #4 and #5 look similar it is because they are. The two articles are nearly word for word identical.] In three of the remaining four articles (#14, 15, 20) the reference wasn’t actually cited in the text of the article, even though it was included in the reference list. In the last case (#18) the phantom reference appeared in neither the article text nor the reference list, which threw up the additional puzzle of why the Web of Science would report this article as citing our phantom reference.

Article 1: Journal article in Separation & Purification Technology

Article 2: Journal article in Journal of Electroanalytical Chemistry

Article 4: Journal article in Spectrochimica Acta Part A

Article 5: Journal article in Journal of Molecular Structure

Article 8: Journal article in Materials Letters

Article 12: Journal article in Nano Energy

Article 14: Conference paper in Procedia Food Science

There is no citation of reference #16 in the article itself.

Article 15: Conference paper in Procedia Social and Behavioral Sciences

There is no Van der Geer reference in the article.

Article 16: Journal article in Central European Journal of Chemistry

Article 18: Conference paper in Procedia Social and Behavioral Sciences

There is no Van der Geer reference in the article or in the reference list.

Article 19: Conference paper in Procedia Computer Science

Article 20: Conference paper in Procedia Social and Behavioral Sciences

There is no Van der Geer reference in the article.

Step 4: The breakthrough

The fact that the phantom reference occurred so often in the first or last position in the list of references made me wonder whether it had simply been “left behind” by the authors by mistake. But then again, why would they have entered it in the first place? And suddenly – when I stumbled upon a recent conference in Renewable Energy that still listed its submission template on its website – it finally dawned on me: they didn’t! The conference template lays out the entire article format, starting with the title, authors and affiliations…

…. and finishing off with acknowledgments, appendices, and references. The latter would include the Van der Geer article as an example of how to format a journal article.

Obviously authors were meant to replace the template text with their own text for each section of the template. But what if some authors with poor English language skills and/or little experience in publishing simply didn’t understand this? Or what if they simply kept the Van der Geer reference in their document while they were filling it out, using it as a “model reference” to follow for their own references, and then just forgot to remove it after completing their own reference list? As can be seen in articles #15 and #20, some authors in the Social Sciences also left in the examples for a book and a book chapter – both common publication formats in the Social Sciences – giving some credence to this assumption.
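If leftover template text is indeed the cause, it is the kind of mistake a simple automated check could catch before publication. The sketch below is purely illustrative and not something any publisher is known to use: the function names and the choice of marker phrases are my own assumptions, based only on the example reference quoted earlier in this paper.

```python
import re

# Distinctive fragments of the example reference quoted in this paper.
# Which fragments to use is an assumption made for this illustration.
PHANTOM_MARKERS = ("hanraads", "art of writing a scientific article")


def normalise(text: str) -> str:
    """Lower-case the text and collapse punctuation and whitespace,
    so that minor formatting differences do not matter."""
    return re.sub(r"[^a-z0-9]+", " ", text.lower()).strip()


def find_template_leftovers(references):
    """Return (position, reference) pairs that look like the template's
    example reference. Hypothetical helper, for illustration only."""
    hits = []
    for position, ref in enumerate(references, start=1):
        cleaned = normalise(ref)
        if any(marker in cleaned for marker in PHANTOM_MARKERS):
            hits.append((position, ref))
    return hits


# Example reference list: the phantom reference left in first position,
# followed by a genuine reference.
sample_references = [
    "Van der Geer, J., Hanraads, J.A.J., Lupton, R.A., 2000. The art of "
    "writing a scientific article. J Sci. Commun. 163 (2) 51-59.",
    "Harzing, A.W. (2002) Are our referencing errors undermining our "
    "scholarship and credibility? Journal of Organizational Behavior, "
    "vol. 23, no. 1, pp. 127-148.",
]

leftovers = find_template_leftovers(sample_references)
for position, ref in leftovers:
    print(f"Reference {position} looks like template text: {ref[:40]}...")
```

Matching on the second author's name and the article title, rather than on the year, means the check would also flag the updated 2010 version of the example reference.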

Extenuating and facilitating circumstances

However, in order for these simple mistakes to have the effect they did we need to understand the following “extenuating and facilitating” circumstances:

First, there are nearly 85,000 Procedia conference papers. The phantom reference appeared in several hundred of them. That means less than 0.5% of the Procedia papers contained this mistake. Whilst unfortunate, one might consider this to be an acceptable “margin of error”. Many of the other papers citing the phantom reference were also conference proceedings. It is likely that these conferences simply “borrowed” their template from the Elsevier Procedia conferences, thus “importing” the potential for mistakes.
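The “less than 0.5%” figure can be checked with a back-of-the-envelope calculation. The sketch below uses only numbers quoted in this paper; because the text says “nearly 90%” and “nearly two thirds”, the inputs are approximations and the variable names are my own.

```python
# Figures quoted in this white paper (all approximate).
citing_papers = 398        # Web of Science citing references, 24 Oct 2017
procedia_papers = 85_000   # Procedia papers in the Web of Science since 2009
share_proceedings = 0.90   # "nearly 90%" of citing papers are proceedings papers
share_procedia = 2 / 3     # "nearly two thirds" of those are in Procedia volumes

# Rough estimate of the citing papers that appeared in Procedia volumes.
procedia_citing = citing_papers * share_proceedings * share_procedia
estimated_rate = procedia_citing / procedia_papers

# Upper bound: pretend every single citing paper was a Procedia paper.
upper_bound_rate = citing_papers / procedia_papers

print(f"Estimated Procedia papers citing the phantom reference: {procedia_citing:.0f}")
print(f"Estimated error rate: {estimated_rate:.2%}")
print(f"Upper-bound error rate: {upper_bound_rate:.2%}")
```

Under these assumptions the estimate comes to roughly 240 papers, or about 0.3% of all Procedia papers. Even the deliberately pessimistic upper bound stays below 0.5%.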

Second, conference editors for these conferences are likely to have done little to no quality control. Their main motivation might not have been producing high quality conference proceedings. I looked up some recent conferences and, apart from a few that seemed quite established, most are mainly targeted at novice and inexperienced researchers. Substantial fees are charged for publication of the paper in the Procedia conference proceedings, with publication being presented as a bonus feature of the conference and the Web of Science listing of the proceedings prominently featured.

What is slightly puzzling is that approximately forty of the papers citing the phantom reference were included in very established journals. However, as most of these journals were Elsevier journals they might well have used similar types of article templates. It is unclear to me how our phantom reference came to “support” statements relating to semi-conductors, electrocoagulation, blood pressure, or cancer drug resistance. My hunch is that it might be a combination of “anonymised” referencing through the system of numbered referencing, which makes spotting errors harder for authors and editors, and a bug in the typesetting or proofing software used. Remember, 40 errors is not a huge number given the hundreds of thousands of articles that Elsevier might have published since 2006.

In sum

Just like many other mysteries, our mystery of the phantom reference ultimately had a very simple explanation: sloppy writing and sloppy quality control. An academic incentive system that makes publication in Web of Science listed conference proceedings popular invokes the law of large numbers. Thus the actual number of mistakes rose high enough to be noticeable, even though the mistake was committed by only a tiny fraction of the authors.

In a way we can be glad that our phantom reference IS a phantom reference. If this had been an existing publication, the mistakes might have had far more serious consequences. Four hundred inaccurate citations might be a drop in the ocean in a sea of hundreds of thousands of publications. However, for many individual authors four hundred citations might make the difference between a mediocre and a good citation record or getting a job or not.

Hence, the key conclusion I would draw is: be careful before taking unusual citation levels at face value. Do some due diligence, or let someone with bibliometric knowledge do so. If something looks fishy, it probably IS fishy!

References

Adler, N.; Harzing, A.W. (2009) When Knowledge Wins: Transcending the sense and nonsense of academic rankings, The Academy of Management Learning & Education, vol. 8, no. 1, pp. 72-95. Available online... [Winner of the 2009 AMLE Outstanding article of the year award, free download courtesy of AoM.] - ESI top 1% most Highly Cited Paper

Harzing, A.W. (2002) Are our referencing errors undermining our scholarship and credibility? The case of expatriate failure rates, Journal of Organizational Behavior, vol. 23, no. 1, pp. 127-148. Available online...