With so much new literature published each year, why are authors increasingly citing older papers?

Late last year, computer scientists at Google Scholar published a report describing how authors were citing older papers. The researchers proposed several explanations for the trend that focused on the digitization of publishing and the marvelous improvements to search and relevance ranking.

However, as I wrote in my critique of their paper, the trend to cite older papers began decades before Google Scholar, Google, or even the Internet was invented. When you are in the search business, everything good in this world must be the result of search.

To help me validate the Google Scholar results, the helpful folks at Thomson Reuters Web of Science sent me a dataset that included the cited half-life for 13,455 unique journal names reported in their Journal Citation Report (the report that discloses journal Impact Factors). Rather than relying on the individual citation as the unit of observation (the approach used by Google Scholar), we based our analysis on the cited half-life of journals. This approach has the obvious advantage of scale, allowing us to approach the problem using thousands of journals rather than tens of millions of citations.

To approximate a citation-based analysis, each journal was weighted by the number of papers it published, so that small quarterly journals don’t have the same weight as mega-journals like PLOS ONE. Each journal was also classified into one or more subject categories and measured each year over the 17-year observation period. Our variable of interest is the cited half-life, which is the median age of articles cited in a given journal for a given year. By definition, half of the cited articles will be older than the cited half-life; the other half will be younger. The concept of half-life can also be applied to article downloads.
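For readers who want to see the two ideas above in concrete form, here is a minimal Python sketch. It assumes the cited half-life is a plain median of citation ages and that the weighting is a simple paper-count weighted mean. The function names and the sample numbers are hypothetical and are not taken from the Web of Science dataset. The published figures may be calculated with interpolation, which this sketch does not attempt.

```python
from statistics import median


def cited_half_life(citation_year, cited_pub_years):
    """Median age, in years, of the articles cited in one citation year.

    Assumption: age is citation year minus publication year, with no
    interpolation between whole years.
    """
    ages = [citation_year - pub_year for pub_year in cited_pub_years]
    return median(ages)


def weighted_mean_half_life(journals):
    """Mean cited half-life, weighting each journal by its paper count.

    `journals` is a list of (half_life, papers_published) pairs.
    """
    total_papers = sum(papers for _, papers in journals)
    return sum(half_life * papers for half_life, papers in journals) / total_papers


# Illustrative values only (hypothetical, not from the dataset).
example_half_life = cited_half_life(2013, [2013, 2012, 2010, 2005, 2001])
example_weighted = weighted_mean_half_life([(4.0, 100), (8.0, 300)])
print(example_half_life, example_weighted)
```

In the second example, the larger journal pulls the weighted mean toward its own half-life, which is the point of weighting by paper count.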

For the entire dataset of journals, the mean weighted cited half-life was 6.5 years, which grew at a rate of 0.13 years per annum. For those journals that had been indexed continuously in the dataset over the 17 years, the mean weighted cited half-life was 7.1 years, which grew at the same rate. For the newer journals, the cited half-life was just 5.1 years, but grew at a rate of 0.19 years per annum.
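One way to arrive at a "years per annum" figure like those above is to fit a straight line to the yearly half-life values and read off the slope. The sketch below does this with an optionally weighted least-squares fit. This is an assumption about method, since the post does not say how the growth rates were estimated. The function name and the synthetic series are hypothetical.

```python
def annual_growth(years, half_lives, weights=None):
    """Slope of a (weighted) least-squares line of half-life on year.

    Assumption: growth is summarized as a linear trend; weights default
    to 1 for every observation.
    """
    if weights is None:
        weights = [1.0] * len(years)
    total = sum(weights)
    mean_year = sum(w * x for w, x in zip(weights, years)) / total
    mean_half_life = sum(w * y for w, y in zip(weights, half_lives)) / total
    covariance = sum(
        w * (x - mean_year) * (y - mean_half_life)
        for w, x, y in zip(weights, years, half_lives)
    )
    variance = sum(w * (x - mean_year) ** 2 for w, x in zip(weights, years))
    return covariance / variance


# Synthetic 17-year series rising by 0.13 years each year (illustrative only).
years = list(range(1997, 2014))
half_lives = [5.0 + 0.13 * (year - 1997) for year in years]
slope = annual_growth(years, half_lives)
print(round(slope, 2))
```

Real series are noisier than this synthetic one, so the fitted slope would summarize the trend and would not match every year-to-year change.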

Focusing on the journals for which we have a continuous series of cited half-life observations, 91% (209 of 229) of subject categories experienced increasing half-lives. In some of these categories, the half-life grew significantly faster than average. For example, the cited half-life of Developmental Biology journals grew by 0.25 years per annum, Genetics & Heredity journals by 0.20 years per annum, and Cell Biology journals by 0.17 years per annum.

Conversely, the cited half-life of 20 (9%) of the journal categories decreased over the observation period. With few exceptions, these categories fell within the general fields of Chemistry and Engineering. For example, the cited half-life for journals classified under Energy & Fuels declined by 0.11 years per annum, Chemistry-Multidisciplinary by 0.07 years per annum, Engineering-Multidisciplinary by 0.05 years per annum, and Engineering-Chemical by 0.04 years per annum. Granted, these are small declines, but they do run contrary to the overall trend.

We also discovered that cited half-life increases with total citations, meaning that, as a journal attracts more citations, a larger proportion of these citations target older articles. This can be seen in Figure 2, as journal categories move from the bottom-left to the upper-right quadrant of the graph over the observation period.

The next figure highlights the trajectory of highly-cited journals from 1997 to 2013, illustrating how cited half-life increases with the total citations to a journal. While most highly-cited journals move toward the upper-right quadrant of the graph, we highlight three chemistry journals that run contrary to this trend: Journal of the American Chemical Society, Angewandte Chemie-Int Ed., and Chemical Communications. Those readers wishing to speculate why Chemistry and Engineering journals were bucking the overall trend are welcome to do so in the comment section below.

Readers are also welcome to explore the data (for categories and for journals). The files (.swf) require the Adobe Flash plug-in. Mac users may need to hold the Control key and select their browser when opening these files. Categories may be split into component journals. Other controls moderate the size, speed and display of the data.

In sum, we were able to validate the claims by the Google Scholar team that scholars have been citing older materials, with some exceptions.

The citation behavior of authors reflects cultural, technological, and normative behaviors, all acting in concert. While digital publishing and technologies were invented to aid the reader in discovering, retrieving, and citing the literature, the trend appears to predate many of these technologies. Indeed, as much credit may be due to the photocopier, the fax machine, FTP, and email as is given to Google, EndNote, or the DOI.

Nevertheless, a growing cited half-life might also reflect major structural shifts in the way science is funded and the way scientists are rewarded. A gradual move to fund incremental and applied research may result in fewer fundamental and theoretical studies being published. Giving credit to the founders of a field may then require authors to cite an increasingly aging literature.

Correction note: Table 1 of the manuscript “Cited Half-Life of the Journal Literature” (arXiv) contains a sorting error. A corrected version (v2) was submitted and will become live at 8pm (EDT). Thanks to Dr. Jacques Carette, Dept. of Computing and Software at McMaster University, for spotting this error.