Abstract

Background
The ability of a scientist to maintain a continuous stream of publication may be important, because research requires continuity of effort. However, there are no data on what proportion of scientists manage to publish each and every year over long periods of time.

Methodology/Principal Findings
Using the entire Scopus database, we estimated that there are 15,153,100 publishing scientists (distinct author identifiers) in the period 1996–2011. However, only 150,608 (<1%) of them published something in each and every year of this 16-year period (uninterrupted, continuous presence [UCP] in the literature). This small core of scientists with UCP is far more cited than others, and it accounts for 41.7% of all papers in the same period and 87.1% of all papers with >1000 citations in the same period. Skipping even a single year substantially affected the average citation impact. We also studied the birth and death dynamics of membership in this influential UCP core by imputing and estimating UCP-births and UCP-deaths. We estimated that 16,877 scientists would qualify for UCP-birth in 1997 (no publication in 1996, UCP in 1997–2012) and that 9,673 scientists had their UCP-death in 2010. Authors with UCP were relatively over-represented in Medical Research, in the academic sector and in Europe/North America, whereas authors without UCP were relatively over-represented in the Social Sciences and Humanities, in industry, and on other continents.

Conclusions
The proportion of the scientific workforce that maintains a continuous, uninterrupted stream of publications each and every year over many years is very limited, but it accounts for the lion’s share of researchers with high citation impact. This finding may have implications for the structure, stability and vulnerability of the scientific workforce.

Citation: Ioannidis JPA, Boyack KW, Klavans R (2014) Estimates of the Continuously Publishing Core in the Scientific Workforce. PLoS ONE 9(7): e101698. https://doi.org/10.1371/journal.pone.0101698
Editor: Luís A. Nunes Amaral, Northwestern University, United States of America
Received: November 22, 2013; Accepted: June 10, 2014; Published: July 9, 2014
Copyright: © 2014 Ioannidis et al. This is an open-access article distributed under the terms of the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original author and source are credited.
Funding: The authors have no support or funding to report.
Competing interests: Two of the authors (KWB and RK) are employed by a small company (SciTech Strategies Inc). Nevertheless, there is no conflict of interest in the authors' submission, nor does this alter adherence to PLoS ONE policies on sharing data and materials.

Introduction

The ability of a scientist to maintain a continuous stream of publication has not been well studied. It is well documented that the number of scientists has been growing rapidly over time [1], [2], creating a very large scientific workforce. It would be interesting to evaluate both the size and the stability of this expanding network. Stability may matter more than mere size, because uninterrupted, continuous engagement in research may be a major determinant of scientific achievement. Scientific investigation may require persistent effort and continuity over many years. Although a major contribution can appear in any single paper, the typical trajectory of a dedicated scientist requires continuous effort, and this may be reflected in a continuous stream of publications. Many studies have examined the age trajectories of scientific careers. The accumulated evidence shows substantial variability across disciplines and individuals, and in the relative contributions of scientists at different career stages and of different chronological and academic age [3]–[8]. Different forces, such as changes in creativity and innovation versus the building of more collaborations and co-authorship patterns, may affect the evolution of scientific productivity as scientists age. Furthermore, with mounting publish-or-perish pressure [9], researchers without a continuous publication record often either quit or are forced to quit, since it becomes difficult to attract further funding for their work [10]. At the same time, the potential effect of irregularities in academic lives cannot be ignored. Continuous productivity may be influenced by changing patterns of extensive co-authorship [8], and important differences may exist across scientific disciplines in whether they depend mostly on “core” or “elite” researchers [11].
Some disciplines may have a greater need for cumulative production of information (“cumulative sciences”), while others may work equally well with more sporadic publication of major works spaced apart in time. Empirical evidence and estimates on these patterns would help answer several important questions. How many researchers have an uninterrupted, continuous presence (UCP) in the scientific literature over multiple years? Are these scientists more influential than others, and do they account for a substantial proportion of the most highly-cited scientists? Finally, what characteristics distinguish researchers with UCP in the scientific literature from others? Here, we aimed to address these questions by analyzing the entire Scopus database in the period 1996–2011.

Methods

Database
Using a workable XML version of the entire Scopus database obtained from Elsevier in August 2012, we identified how many authors published at least one item in each and every year of the 16-year period 1996–2011. This was the definition of UCP for the covered 16-year period. The unit of analysis adopted throughout the paper is the calendar year, unless specified otherwise. In a sensitivity analysis, we instead used a two-year window as the unit of analysis, i.e. UCP required the publication of at least one item in each of the 8 two-year periods in 1996–2011. Using Scopus, we have previously identified a total of 15,153,100 different author identifiers that published at least one indexed item in the period 1996–2011 [12]. The database includes all genres of published items, although the large majority are journal articles. Patents and books are not included in this version of the Scopus database.

Validation of author identifiers
We used Scopus author identifiers for this project rather than attempting to disambiguate authors on our own. It is possible that some author identifiers that include publications for every year of our 16-year period are cases of polysemy (two or more different authors with the same name grouped under the same author identifier). In most of these cases neither author would qualify for UCP if the author profile were separated correctly. To check for this possibility, we randomly sampled 20 author identifiers with UCP and evaluated their publications by hand to verify that they belonged to a single author publishing continuously over the 16-year period. We also randomly sampled 20 different author identifiers without UCP to evaluate whether the same authors had also published Scopus-indexed papers under other author identifiers, and thus might have qualified for UCP had those identifiers been properly merged.
In this case, in-depth evaluation meant examining all individual publications of all author identifiers with similar first and last names.

Groups of scientists based on persistence and continuity of publication record
Scientists were grouped into those who did and those who did not have UCP. We also evaluated two other groups. “Skip” authors are defined as those who skipped any year(s) in 1996–2011 and then resumed publication in a subsequent year. This definition excludes authors who published in two or more consecutive years at the beginning of the 1996–2011 period but not thereafter (those who did not resume), or at the end of the period but not before (those who started continuous publication during the period). “Skip-1” authors are defined as those who would have published in all 16 years had they not skipped a single year between 1997 and 2010.

Citation metrics
Citation metrics pertain to the number of citations to Scopus-indexed papers published in 1996–2011, counting only citations received until the end of 2011. For all Scopus author identifiers we measured the number of published items, the citations they had received, and the Hirsch h-index [13], and we report descriptive statistics for the UCP, no UCP, Skip, and Skip-1 groups. To assess whether differences in citation metrics are due entirely to differences in the number of papers published by authors in each of the four groups, we also computed the median citation count and the median h-index for authors in each group, conditional on the number of papers they had published over these 16 years. We performed these analyses for the entire database and, separately, for researchers in Medical Research only, as a more homogeneous sample of investigators.
These analyses were performed for the entire database using both the 1- and 2-year window definitions of UCP and the other categories, and for Medical Research using the 1-year window definition.

Estimation of birth and death dynamics
One can define the “UCP-birth” and “UCP-death” years of an author as the calendar years that start and end their chain of uninterrupted, continuous annual (or biennial, in the 2-year window case) publications. These years do not necessarily coincide with the first and last published papers of each author: some authors who do achieve UCP during their careers may have skipped some years early in their careers (before their UCP period starts), or they may publish some scattered papers, separated by skipped years, at the end of their careers (after their UCP period ends). We estimated cumulative retention rates of authors by start year. The start year in these analyses was defined as a year in which an author published after not having published in the previous year. This is a proxy for the UCP-birth year. Retention rates for future years were then imputed using year-to-year retention rates (Y2YRR) from the earlier available data. We then used these imputed rates to estimate the number of authors expected to have uninterrupted, continuous presence for 16 years (UCP-16-births). For example, the number of authors starting their 16-year UCP in 1999 was estimated by taking the known value from 2011 (n = 23,941 authors publishing each and every year for the 13 years between 1999 and 2011) and multiplying it by the Y2YRR for years 14–16 (92% each year). A similar analysis was carried out for scientists who ceased publishing in a particular year. The end year was defined as a year in which an author published that was immediately followed by a year in which that author did not publish. This is a proxy for the UCP-death year. UCP rates for earlier years were estimated by extrapolation using Y2YRR from the more recent data.
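The group definitions and the birth/death proxies above all operate on the set of calendar years in which an author published at least one indexed item. The Python sketch below is one possible reading of them, not the authors' code: the function names (groups, run_starts, run_ends, y2yrr, impute_ucp16) are hypothetical, we assume Skip-1 authors form a subset of Skip authors, and the sketch does not correct for runs that begin in 1996 or end in 2011, whose true start or end may lie outside the observation window. How Y2YRR values were derived from the data is also assumed here (as ratios of consecutive cohort counts).

```python
# Sketch only: one reading of the group definitions and birth/death proxies.
# All function names are hypothetical; input is the set of years with >= 1 paper.

START, END = 1996, 2011  # observation window used in the paper
FULL = set(range(START, END + 1))


def groups(pub_years):
    """Labels for one author; Skip-1 authors are also Skip (assumption)."""
    years = {y for y in pub_years if START <= y <= END}
    if years == FULL:
        return {"UCP"}
    labels = {"no UCP"}
    # Skip: an unpublished year between the first and last publication
    # year, i.e. publication resumed after a gap.
    if years and max(years) - min(years) + 1 > len(years):
        labels.add("Skip")
    # Skip-1: exactly one missing year, and it lies in 1997-2010.
    missing = FULL - years
    if len(missing) == 1 and START < min(missing) < END:
        labels.add("Skip-1")
    return labels


def run_starts(pub_years):
    """Publication years not preceded by a publication year (UCP-birth proxy).
    A run starting in 1996 may really have started earlier (not corrected)."""
    years = set(pub_years)
    return sorted(y for y in years if y - 1 not in years)


def run_ends(pub_years):
    """Publication years not followed by a publication year (UCP-death proxy).
    A run ending in 2011 may really continue later (not corrected)."""
    years = set(pub_years)
    return sorted(y for y in years if y + 1 not in years)


def y2yrr(survivors):
    """Year-to-year retention rates from cohort counts, where survivors[k] is
    the number of authors still publishing without a break in run year k + 1."""
    return [later / earlier for earlier, later in zip(survivors, survivors[1:])]


def impute_ucp16(observed, rates):
    """Project an observed uninterrupted-run count forward through the given
    year-to-year retention rates."""
    projected = float(observed)
    for rate in rates:
        projected *= rate
    return projected


print(sorted(groups([y for y in range(1996, 2012) if y != 2003])))
# ['Skip', 'Skip-1', 'no UCP']
print(run_starts([1996, 1997, 1999, 2000, 2001]))  # [1996, 1999]
print(run_ends([1996, 1997, 1999, 2000, 2001]))  # [1997, 2001]
# 1999 example: 23,941 authors with 13-year runs, 92% retention in years 14-16
print(round(impute_ucp16(23_941, [0.92] * 3)))  # 18643
```

For the 1999 example the arithmetic gives 23,941 × 0.92³ ≈ 18,643 authors expected to complete 16 years of UCP.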

Comparison of various characteristics of scientists
We compared the different groups of author identifiers in terms of the main scientific field of the authors. We used a previously developed classification that assigns each paper to a scientific discipline; each author is then assigned to the discipline most common among the papers he or she has authored (for details, see references [14], [15]). The resulting 13 scientific fields are Mathematics/Physics, Chemistry, Engineering, Earth Sciences, Biology, Biotechnology, Infectious Disease, Medical Research, Health Sciences, Brain Research, Social Sciences, Humanities, and Computer Sciences/Electrical Engineering. We also compared the different groups of author identifiers in terms of region of Scopus-listed address (North America, Europe, Asia-Pacific, South America, Africa, Middle East, unknown) and sector (academic, hospital, government, industry, society/academy, non-profit, unknown), based on in-depth evaluation of 10,000 randomly selected author identifiers from each group. Region and sector data are based on our own analysis and curation of the affiliation data associated with the publications of each author. Comparisons of groups used the Fisher-Freeman-Halton exact test with Bonferroni correction of the p-value for the number of comparisons.

UCP with multiple papers published each and every year
We estimated the number of author identifiers that would fulfill the criteria for UCP during 1996–2011 if the minimum number of publications required in each and every year of this period were 2, 3, 4, or 5 instead of 1.
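The main UCP definition and its variants (the 2-year window sensitivity analysis and the stricter thresholds of 2 to 5 papers per year) can all be expressed as one check over per-year paper counts. The sketch below assumes each author is represented by a list holding one publication year per paper; the names has_ucp and bonferroni are hypothetical. It does not reproduce the Fisher-Freeman-Halton exact test, only the Bonferroni adjustment applied to its p-values, and combining window=2 with min_papers greater than 1 goes beyond the analyses described here.

```python
from collections import Counter


def has_ucp(paper_years, start=1996, end=2011, window=1, min_papers=1):
    """Hypothetical UCP check. paper_years holds one publication year per paper.

    window=1, min_papers=1: main definition (a paper in each of 16 years).
    window=2: sensitivity analysis (a paper in each of 8 two-year periods).
    min_papers=2..5: stricter per-year thresholds (1-year window)."""
    if (end - start + 1) % window:
        raise ValueError("the period must split evenly into windows")
    per_year = Counter(paper_years)  # missing years count as 0
    for first in range(start, end + 1, window):
        in_window = sum(per_year[y] for y in range(first, first + window))
        if in_window < min_papers:
            return False
    return True


def bonferroni(p_values):
    """Bonferroni adjustment: multiply each p-value by the number of
    comparisons and cap the result at 1."""
    m = len(p_values)
    return [min(1.0, p * m) for p in p_values]


# Hypothetical author with exactly 3 papers in every year of 1996-2011
author = [y for y in range(1996, 2012) for _ in range(3)]
print(has_ucp(author), has_ucp(author, min_papers=3), has_ucp(author, min_papers=4))
# True True False
print(has_ucp(range(1996, 2012, 2)), has_ucp(range(1996, 2012, 2), window=2))
# False True
print(bonferroni([0.125, 0.25, 0.5]))  # [0.375, 0.75, 1.0]
```

An author who publishes only in even years fails the 1-year definition but satisfies the 2-year one, which is exactly the kind of case the sensitivity analysis is meant to capture.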

Discussion

Our evaluation of the entire Scopus database for the period 1996–2011 shows that, overall, only a very small fraction of researchers (<1% of the over 15 million publishing scientists) have an uninterrupted, continuous presence in the scientific literature, and these investigators account for the lion’s share of authors who eventually have high citation impact. There is some variability in the relative prevalence of these investigators across scientific disciplines, geographical regions, and sectors. The concentration of 87% of the most highly-cited papers among ∼1% of scientists represents a heavy-tail phenomenon much stronger than those described for the concentration of influential papers in specific high-profile journals [16] or the concentration of most citations in a relatively modest proportion of papers (the 80/20 law) [17], [18]. Authors with uninterrupted, continuous presence over all 16 years eventually had a much higher citation impact than other authors. To some extent this higher impact arises from a larger volume of published papers. However, the citation impact of UCP authors goes beyond simply publishing more papers. Even after conditioning on the number of papers, the total citations and h-index of their work were higher than those of non-UCP authors; the exception was authors with fewer than 3 papers per year, who showed no discernible difference in citation impact regardless of whether they had UCP. The vast pool of authors without a continuous presence in the literature probably includes very different categories of people.
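The comparison “after conditioning on the number of papers” amounts to bucketing authors by group and paper count and comparing medians within each bucket. The sketch below shows that computation together with the standard h-index definition [13] and the share of papers attributable to a group of authors. It is illustrative only: the input layout and function names are hypothetical, and counting a paper for a group whenever at least one of its authors belongs to that group is our assumption, since the counting rule for multi-author papers is not stated here. The last line checks the <1% figure quoted above.

```python
from collections import defaultdict
from statistics import median


def h_index(citations):
    """Largest h such that h papers have at least h citations each."""
    h = 0
    for rank, c in enumerate(sorted(citations, reverse=True), start=1):
        if c < rank:
            break
        h = rank
    return h


def medians_by_paper_count(authors):
    """authors: iterable of (group, per-paper citation counts).
    Returns {(group, n_papers): (median total citations, median h-index)}."""
    buckets = defaultdict(list)
    for group, cites in authors:
        buckets[(group, len(cites))].append(cites)
    return {
        key: (median(sum(c) for c in rows), median(h_index(c) for c in rows))
        for key, rows in buckets.items()
    }


def share_of_papers(papers, group_ids):
    """Fraction of papers with at least one author in group_ids
    (assumed counting rule for multi-author papers)."""
    if not papers:
        return 0.0
    hits = sum(1 for authors in papers if group_ids & set(authors))
    return hits / len(papers)


print(h_index([10, 8, 5, 4, 3]))  # 4
print(share_of_papers([["a", "b"], ["c"], ["a"], ["d", "e"]], {"a"}))  # 0.5
print(f"{150_608 / 15_153_100:.2%}")  # 0.99%, the UCP share of author identifiers
```

Comparing medians within the same paper-count bucket removes the mechanical advantage of simply having more papers, which is the point the paragraph above makes about UCP authors.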
First, some excellent scientists may intentionally choose to publish sparingly in the journal literature, especially in the humanities and social sciences, where books are a predominant form of communication. For most fields of current research, however, publishing nothing for a whole year is unlikely to be a deliberate choice, especially in academic circles; this contrasts with industry, where other deliverables matter more than publications, and with hospital clinicians, for whom patient care matters more than a publication track record. Second, for many researchers interrupted productivity may reflect life events (e.g., childbearing). Empirical studies have addressed, for example, gender differences in the continuity of scientific careers [19]. Interrupted productivity may also often reflect limitations and obstacles that scientists face, e.g., insufficient funding or infrastructure or other difficulties that create gaps in their productivity or even lead them to abandon science. Third, many authors may only be ancillary personnel or trainees rather than principal investigators. Fourth, we observed some variability in the prevalence of UCP across scientific disciplines. In the cumulative sciences, such as medical research, which depend on the incremental, continuous accumulation of relatively small bits of information, UCP is highly desirable; conversely, in other disciplines such as the social sciences and humanities, continuous publication on an annual basis may not be as necessary or desirable, and many successful scientists may publish major works more sporadically, spaced out over time. Scientific disciplines with cumulative profiles, however, currently account for the large majority of publishing scientists. Regardless of the exact career qualifications and trajectories of individual authors, our analysis suggests that even though the global scientific workforce is enormous, its continuously publishing core is still limited.
Given that there are many thousands of universities and research institutions, each with tens or hundreds of teams and departments, a core of ∼150,000 researchers is quickly spread thin. Many teams, departments, or even whole institutions may have no or very few researchers who belong to this core, and even fewer who also have considerable impact. With UCP-birth rates higher than UCP-death rates, this core is apparently growing, but the growth remains small in absolute numbers, and the core therefore remains potentially vulnerable. Moreover, part of this growth may reflect more extensive indexing of journals that already existed, rather than genuine growth in the number of continuously productive scientists. This artifact has been detected in previous analyses of the growth in the total number of articles [20]. Widespread interruption and non-continuity may be a sign of system inefficiency, regardless of whether it reflects mature scientists, ancillary personnel, or aspiring trainees who cannot maintain a continuous presence in the scientific literature. Of course, one should allow for differences across disciplines and research sectors (e.g., academia versus industry) in interpreting these results. Differences also exist in more granular microenvironments, e.g., in the way tenure is granted in different institutions and fields and whether continuous productivity is required once tenure has been granted. Nevertheless, there is mounting evidence that the current scientific enterprise may be focusing more on disciplines that require incremental, continuous contributions; this trend is also intertwined with the funding and entrepreneurial support of the scientific endeavor [21], and it further affects norms for the training of young doctoral students [22]–[25]. In many disciplines, doctoral students may be enrolled in high numbers, offering a cheap workforce for carrying out resource-intensive, incremental research agendas.
However, in these cases the research system may be exploiting the work of millions of young scientists for a number of years without being able to offer continuous, stable, long-term investigative careers to most of them. The best course of action in response to this picture is debatable. One option is simply to give further support to researchers who succeed in maintaining uninterrupted, continuous presence, since they may be pivotal in generating high-impact science. A possible disadvantage is that this may further polarize research toward already well-established, readily prolific or conforming [26] lines of investigation. A different approach is to give more opportunities to a wider pool of scientists, especially younger ones, to help them secure continuity of productivity and excellence. The particular needs and aspirations of specific scientific fields, and the accommodation of life events, also need to be considered in any strategic planning. Ultimately, the stability and continuity of the publishing scientific workforce may have important implications for the efficiency of science.

Author Contributions
Conceived and designed the experiments: JPAI. Performed the experiments: KWB. Analyzed the data: JPAI KWB RK. Contributed reagents/materials/analysis tools: KWB RK. Wrote the paper: JPAI.