Definitions

Let us begin by introducing the operational definitions of top scientist and junior researcher that we shall retain throughout the rest of the paper. We say that a researcher is a top scientist in a given year if she belongs to the top 5% of cited authors in her discipline for that same year. Such a choice is dictated by the need to find a reasonable balance between the numbers of top and non-top scientists in our following analyses. Furthermore, this choice leads to significant stability in our classification, as in more than 95% of cases in our dataset, once a researcher becomes a top scientist she remains one until the end of her career.
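The classification rule above can be illustrated with a minimal sketch. It assumes a hypothetical mapping from (author, discipline, year) to a citation count; the function name `top_scientists`, the tie-breaking rule, and the toy data are illustrative assumptions, not the procedure actually used for the dataset.

```python
import math
from collections import defaultdict


def top_scientists(citations, top_fraction=0.05):
    """Return the (author, discipline, year) keys in the top fraction by citations.

    `citations` maps (author, discipline, year) -> citation count.
    Authors are ranked only against peers in the same discipline and year.
    Assumption: ties at the cutoff are broken arbitrarily (by author id).
    """
    groups = defaultdict(list)
    for (author, discipline, year), count in citations.items():
        groups[(discipline, year)].append((count, author))

    top = set()
    for (discipline, year), members in groups.items():
        members.sort(reverse=True)  # highest citation counts first
        n_top = max(1, math.floor(top_fraction * len(members)))
        for _count, author in members[:n_top]:
            top.add((author, discipline, year))
    return top


# Toy data (hypothetical): 20 physicists and 40 chemists in the year 2000.
records = {(f"a{i:02d}", "Physics", 2000): i for i in range(20)}
records.update({(f"b{i:02d}", "Chemistry", 2000): 10 * i for i in range(40)})
top = top_scientists(records)
```

With 20 authors in one discipline-year the top 5% is a single author, and with 40 authors it is two, which is what the toy data returns.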

We then classify as junior researchers scientists who are in their first 3 years of academic activity. More precisely, we classify a scientist as a junior researcher for the first 3 years since her first publication, which we reasonably expect to roughly cover the duration of a Ph.D. The main results presented in the following are qualitatively unchanged when extending such period to the first 5 years after the publication of the first paper.

Institutional prestige and impact

We begin our analysis by pooling together all researchers from four disciplines (Cell Biology, Chemistry, Physics, and Neuroscience; see “Methods” section and Supplementary Tables 1–4 for the lists of journals we consider for each discipline) whose career started between 1980 and 1998 and lasted at least 20 years, who have at least ten publications, and who have published at least one paper every 5 years. In total, we have 22,601 such researchers (see Table 1 for a detailed breakdown in terms of disciplines). Within such pool of authors with long-lived careers, the unconditional probability of being a top scientist in the 20th career year is 24.8%. Let us now proceed to condition this probability based on the institutional prestige a junior researcher is embedded in. We assign an institutional prestige score to each junior researcher in the dataset in order to generate a continuous prestige spectrum, which will allow us to analyze individual career trajectories at a granular level. We do so by means of the average adjusted Nature Index (see “Methods” section) of the researcher’s institution and of the institutions her coauthors are affiliated with. We cross-check such a score by computing the Kendall correlation between the ranking of the institutions based on their Nature Index and the widely recognized Leiden ranking (ref. 8), getting a correlation coefficient of 0.98 for Cell Biology, 0.94 for Chemistry, 0.94 for Neuroscience, and 0.97 for Physics.
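The cross-check between two institutional rankings can be sketched as follows. This is a plain implementation of Kendall's tau (the tau-a variant, which assumes no tied ranks); the paper does not state which variant was used, and the two rank lists below are hypothetical rather than actual Nature Index or Leiden positions.

```python
from itertools import combinations


def kendall_tau(x, y):
    """Kendall tau-a between two equally long rank lists (assumes no ties)."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("need two sequences of equal length >= 2")
    concordant = discordant = 0
    for i, j in combinations(range(len(x)), 2):
        sign = (x[i] - x[j]) * (y[i] - y[j])
        if sign > 0:
            concordant += 1  # both rankings order the pair the same way
        elif sign < 0:
            discordant += 1  # the rankings disagree on this pair
    n_pairs = len(x) * (len(x) - 1) // 2
    return (concordant - discordant) / n_pairs


# Hypothetical ranks of five institutions under two ranking schemes.
nature_rank = [1, 2, 3, 4, 5]
leiden_rank = [1, 2, 3, 5, 4]
tau = kendall_tau(nature_rank, leiden_rank)  # one swapped pair out of ten
```

A coefficient close to 1, as reported above for all four disciplines, means that nearly every pair of institutions is ordered the same way by both rankings.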

Table 1 Matched pair analysis results

In Fig. 1 (left panel) we report the number of junior researchers falling within each quintile of the institutional prestige distribution, divided into three groups: those who did not coauthor a paper with a top-cited scientist early in their career (15,495 authors, shown in blue), those who coauthored papers with one top-cited scientist (4573 authors, shown in orange), and those who coauthored papers with more than one top-cited scientist (2533 authors, shown in red). In Supplementary Fig. 1 we show the distribution of the number of unique top coauthors for members of the latter group. In Fig. 1 (right panel) we show the probability for authors belonging to such groups of being a top-cited scientist themselves in their 20th career year.

Fig. 1 Relationship between early career institutional prestige and probability of becoming a top scientist. a Number of junior researchers in each quintile of the distribution of institutional prestige. b Probability of being a top scientist in the 20th career year as a function of institutional prestige (ribbon bands denote 95% confidence intervals). In both panels authors are grouped based on whether in their first 3 career years they coauthored papers with one (orange), multiple (red), or no (blue) top scientists. The grey shaded area in b represents the unconditional probability of becoming a top scientist for the entire pool of junior researchers.

The left panel reveals, as one would intuitively expect, a positive correlation between institutional prestige and coauthorship with top scientists. The right panel, in turn, shows a positive correlation between institutional prestige and the probability of becoming a top-cited scientist in the long run. Yet, regardless of the relative position in terms of institutional prestige, such probability is significantly higher for researchers who coauthored papers with one top scientist, and markedly higher for those who did so with more than one top scientist.

Furthermore, the right panel shows that, on average, the probability of becoming a top-cited scientist is below the aforementioned unconditional one (grey shaded area) for almost the entire pool of junior researchers lacking a top coauthor in their early career, with only those in the top quintile of institutional prestige managing to do better. Conversely, junior researchers who publish with top-cited scientists are in the opposite situation, and achieve better-than-average impact regardless of their position in terms of institutional prestige. In Supplementary Fig. 2 we show that patterns very similar to those in Fig. 1b are obtained when considering the citations accrued by the three groups of junior researchers throughout their career.

Different dimensions of early career excellence

We now proceed to expand this analysis by assessing how excellence in different aspects of academia relates to long-term impact by splitting all junior researchers in our pool into eight mutually exclusive groups based on early career performance according to different indicators. Namely, we consider institutional prestige (I), productivity (P), measured by the number of papers published within the first 3 career years, and the citations received within the first 3 career years (C). We group junior researchers depending on whether they belong to the top 10% of authors across such dimensions. (Authors are compared against their peers in the same discipline who started their career in the same year. In all cases where the top decile falls within a group of scientists with the same number of papers or citations, we only select those scientists whose number of papers or citations is strictly larger than the top decile. In Supplementary Table 5 we report the values of such thresholds for all disciplines and years.) For example, we label as I the group of researchers belonging to the top 10% in terms of institutional prestige, as IP (IC) the group of researchers belonging to the top 10% in both institutional prestige and productivity (citations), and as IPC the group of authors belonging to the top 10% of all three dimensions.
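The grouping rule, including the strict treatment of ties at the top decile, can be sketched as below. The helper names (`top_decile_threshold`, `label`), the dictionary keys, and the toy cohort are hypothetical; in particular, taking the threshold as the first value just outside the top 10% is one possible reading of the rule, not necessarily the exact computation behind Supplementary Table 5.

```python
def top_decile_threshold(values, fraction=0.10):
    """Value that must be strictly exceeded to count as top decile.

    Assumption: the threshold is the first value outside the top `fraction`
    of the cohort, so tied values straddling the cutoff are all excluded.
    """
    ordered = sorted(values, reverse=True)
    k = int(fraction * len(ordered) + 1e-9)  # guard against float round-off
    return ordered[k]


def label(researcher, thresholds):
    """Return one of the eight group labels: None, I, P, C, IP, IC, PC, IPC."""
    dims = (("I", "prestige"), ("P", "papers"), ("C", "citations"))
    code = "".join(d for d, key in dims if researcher[key] > thresholds[key])
    return code or "None"


# Toy cohort (hypothetical): same discipline, same career start year.
prestige = [0.9, 0.1, 0.2, 0.3, 0.4, 0.5, 0.6, 0.7, 0.8, 0.05]
papers = [9, 1, 2, 2, 3, 3, 3, 4, 4, 5]
cites = [50, 50, 10, 8, 7, 6, 5, 4, 3, 2]
cohort = [
    {"prestige": i, "papers": p, "citations": c}
    for i, p, c in zip(prestige, papers, cites)
]
thresholds = {
    key: top_decile_threshold([r[key] for r in cohort])
    for key in ("prestige", "papers", "citations")
}
labels = [label(r, thresholds) for r in cohort]
```

In the toy cohort the first researcher is labelled IP but not IPC: two researchers share the highest citation count, the tie straddles the decile boundary, and so neither is strictly above the threshold.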

The top panel in Fig. 2 shows, for each category, the number of junior researchers in our dataset who coauthored at least one paper with a top-cited scientist vs. those who did not. As can be seen, the only group where the latter are the clear majority is the one of researchers who do not belong to the top 10% of any category. In all other cases, there is either a balance or a majority of junior researchers who coauthored with a top scientist, highlighting the presence of an overall positive correlation between coauthorship with top scientists and early career performance across all the dimensions we consider.

Fig. 2 Relationship between long-term impact and early career performance. a Number of junior researchers belonging to the top 10% in various categories of early career performance (I denotes institutional prestige, P denotes productivity, and C denotes citations received. All three such quantities are computed based on the first 3 career years). b Probability for authors belonging to each group of being a top scientist in their 20th career year. c Number of citations received per paper published by authors belonging to each group between their 4th and 20th career year. In b and c we report 95% confidence intervals, and we report the \(p\)-values obtained via \(t\)-tests to assess the statistical significance of differences between the sub-group of junior researchers who coauthor work with a top scientist in the first 3 career years and the sub-group of those who do not. \(^*p\ <\ 0.05\); \(^{**}p\ <\ 0.01\); \(^{***}p\ <\ 0.001\).

The middle panel shows the probability of becoming a top-cited scientist for authors belonging to each of the above categories. Overall, independently of coauthorship with top scientists, we find this probability to be progressively higher as we consider authors belonging to the top 10% of more categories, signalling a positive correlation between early and long-run career impact. Notably, such probability is above 50% for junior researchers belonging to the top decile of two dimensions (IP, IC, and PC), and hovers above 75% for junior researchers in the top decile of all three categories (IPC).

However, in all categories except the latter we find the above probability to be systematically higher for the sub-groups of junior researchers with an early career paper coauthored with a top-cited scientist, and such differences are found to be statistically significant by a \(t\)-test in the cases labelled as “None”, I, P, and PC. The results clearly show that the relative increase in the probability of becoming a top scientist tends to be larger in less exclusive groups, particularly in the group of junior researchers who do not belong to the top 10% of any of the categories considered. Indeed, for this group the coauthorship with a top scientist almost doubles such probability, which jumps from 15.7 to 27.2%. Large increases in the probability of becoming a top scientist are also apparent for the I and P groups. At the opposite end, coauthorship with a top scientist does not make a difference for junior researchers in the IPC group. One could interpret this as evidence that members of the latter group are with high probability already on the pathway to long-term career impact, regardless of their coauthors. In contrast, coauthorship with a top scientist truly has potential career-altering consequences for junior researchers who are not in the top 10% of any of the categories we considered. In the following, we elaborate more on the mechanics leading to such consequences.

The bottom panel in Fig. 2 shows analogous results in terms of citations received per paper published between the 4th and 20th career year. We observe similar patterns to those shown in the middle panel, i.e. we find the sub-groups of junior researchers who coauthor with top scientists to systematically receive more citations than their peers in all categories. For the sake of readability, here we only show aggregate results. In Supplementary Figs. 3–6, we show equivalent figures for each of the four disciplines we consider.

In Supplementary Fig. 7 we show instead a breakdown of Fig. 2c, showing the citations received by the junior researchers between their 4th and 20th career year from papers published with and without top scientists as coauthors, in order to assess the contribution of the latter to the junior researchers’ impact. In Supplementary Fig. 8, we specialise the latter case to each discipline. When aggregating all disciplines, we see again that those who coauthored work with top scientists in the first 3 years still achieve greater impact than those who did not, with statistically significant differences in the same groups as in Fig. 2c, except for the one labelled as C. When considering individual disciplines, we still observe relevant differences between the junior researchers who coauthor work with top scientists and those who do not, with the former still typically attracting more citations per paper published than the latter. However, in most disciplines such differences are statistically significant only for those sub-groups of junior researchers who belong to the top 10% of their field in just one or none of the dimensions considered (i.e. the sub-groups labelled as ‘None’, I, P, and C). This result suggests that the impact of early career coauthorship with a top scientist is somewhat inversely proportional to the impact already achieved by a junior researcher, and in the following we will demonstrate that this is indeed the case.

Matched pair analysis

The above results begin to reveal a systematic competitive advantage for junior researchers who coauthor with a top scientist when considered as a group, but do not yet quantify such advantage at the level of individual careers. Figuratively speaking, this could only be measured by tracking a young researcher in two parallel careers where all factors remain identical, except that in one she gets to write a paper with a top scientist and in the other she does not. This is akin to a medical trial situation, where the effectiveness of a new drug has to be assessed by forming a treatment and a control group.

We follow this line of reasoning and form two such groups in order to carry out a matched pair experimental design. Namely, in each of the disciplines considered we identify pairs of junior researchers with similar early career profiles in terms of institutional prestige, productivity, and impact (i.e. number of citations accrued), with the only difference being that only one of the two has coauthored a paper with a top scientist during her first 3 career years (we shall refer to this as treatment). We form such pairs via propensity score matching, following ref. 41 (see Supplementary Fig. 9). We then proceed to assess whether this has a detectable long-term effect by computing the average number of citations accrued between career years 4 and 20 by authors belonging to each group, both including and excluding those received by the papers published during the first 3 career years. In order to discount productivity as a possible confounding factor, we also compute the average number of citations received per paper published between career years 4 and 20. In particular, we focus on authors with low early career impact (i.e. with no more than ten citations received in the first 3 career years) in order to focus on the group of junior researchers who can benefit the most from the interaction with a top scientist. Overall, there are 2324 such authors in Cell Biology, 5635 in Chemistry, 5605 in Neuroscience, and 5414 in Physics.
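The pairing step can be illustrated with a minimal sketch of one common form of propensity score matching: greedy one-to-one nearest-neighbour matching without replacement, within a caliper. It assumes that a propensity score (the estimated probability of treatment given institutional prestige, productivity, and citations) has already been computed for each junior researcher. The function name `match_pairs`, the caliper value, and the scores are hypothetical, and the procedure of ref. 41 may differ in its details.

```python
def match_pairs(treated, control, caliper=0.05):
    """Greedy 1:1 nearest-neighbour matching on propensity scores.

    `treated` and `control` map researcher ids to propensity scores.
    Each control is used at most once; a treated researcher stays unmatched
    if no remaining control lies within `caliper` of her score.
    """
    available = dict(control)
    pairs = []
    for t_id, t_score in sorted(treated.items(), key=lambda kv: kv[1]):
        if not available:
            break
        # Closest remaining control in terms of propensity score.
        c_id = min(available, key=lambda c: abs(available[c] - t_score))
        if abs(available[c_id] - t_score) <= caliper:
            pairs.append((t_id, c_id))
            del available[c_id]  # matching without replacement
    return pairs


# Hypothetical scores: "t*" coauthored with a top scientist, "c*" did not.
treated = {"t1": 0.30, "t2": 0.62, "t3": 0.95}
control = {"c1": 0.31, "c2": 0.60, "c3": 0.10, "c4": 0.33}
pairs = match_pairs(treated, control)
```

In this toy example `t3` has no sufficiently similar control and is left out, mirroring the fact that only a subset of the low-impact authors in each discipline ends up in a matched pair.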

The results of the analysis are reported in Table 1. In all four disciplines, we identify several hundred matched pairs of junior researchers with similar early career profiles, except for the presence/lack of a top coauthor. In all disciplines we find the treatment group of junior researchers who coauthored with a top scientist to achieve a higher impact, regardless of the specific citation metric, and we find the differences with respect to the control group to be statistically significant in all cases, both when testing sample averages via \(t\)-tests and when testing the entire distributions via Kruskal–Wallis tests (in order to account for the skewness in the data, especially in the case of citations).

This result demonstrates the long-lasting competitive advantage associated with early career coauthorship with top scientists. In order to understand the mechanism through which such a competitive advantage materialises, we measure how often on average junior researchers belonging to the two above groups get to coauthor papers with top scientists between years 4 and 20 of their careers. The results from this analysis show that the treatment group consolidates its early competitive advantage by getting more opportunities to further collaborate with top scientists than the control group. This happens both in terms of the number of different top coauthors (excluding those already accounted for in the first 3 career years for the treatment group) and the number of individual coauthorship events with top scientists. Indeed, we find statistically significant differences between the treatment and control groups in all disciplines, with the former outperforming the latter in terms of repeated access to top scientists.

In Supplementary Table 6 we show that within pairs the junior researcher in the treatment group is the most cited in absolute terms (\(p\ <\ 0.001\) in Chemistry, Physics, and Neuroscience, \(p\ <\ 0.01\) in Cell Biology, one-tailed binomial test), and also the one who subsequently gets to coauthor more times with top scientists (\(p\ <\ 0.001\) in Chemistry, Physics, and Neuroscience, \(p=0.38\) in Cell Biology, one-tailed binomial test). As an additional robustness control, in Supplementary Table 7 we show that the matched pair analysis results do not change when matching junior researchers based on their first 5 career years. Furthermore, in Supplementary Table 8 we report additional results obtained when including the number of unique coauthors for publications during the first 3 career years as an additional covariate. We find our results to be qualitatively unchanged for the most part, although with reduced statistical significance in Chemistry and Neuroscience.
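The within-pair comparison can be read as a sign test: under the null hypothesis, the treated member of a pair is the more cited one with probability 1/2. A minimal sketch of the corresponding one-tailed binomial \(p\)-value follows; the function name and the counts are hypothetical, and how tied pairs were handled in Supplementary Table 6 is not stated here, so they are assumed to be excluded beforehand.

```python
from math import comb


def binomial_p_one_tailed(successes, n, p=0.5):
    """P(X >= successes) for X ~ Binomial(n, p): upper-tail p-value."""
    return sum(
        comb(n, k) * p**k * (1 - p) ** (n - k)
        for k in range(successes, n + 1)
    )


# Hypothetical example: the treated researcher is the more cited one
# in 8 out of 10 matched pairs.
p_value = binomial_p_one_tailed(8, 10)
```

With only 10 pairs, 8 wins give \(p \approx 0.055\); the several hundred pairs per discipline available in the actual analysis make much smaller imbalances detectable.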

Put together, the above results suggest that coauthorship with a top scientist potentially represents a good predictor of impact in a long-lived academic career. This is confirmed by the outcomes of discipline-specific linear and logistic regressions, where we use early career coauthorship with at least one top scientist as a binary regressor against future impact, while controlling for institutional prestige, productivity, and impact in the first 3 career years (see the regression plot in Fig. 3). As dependent variables, we use the number of citations accrued in the first 20 career years in the case of linear regressions, and a binary variable indicating whether a junior researcher had become a top scientist herself (i.e. among the top 5% cited scientists in her discipline) in her 20th career year in the case of logistic regressions. We systematically find coauthorship with at least one top scientist to be a statistically significant predictor of long-term future impact. Odds ratios for early collaboration with top coauthors in logistic regressions are: 1.19 for Cell Biology, 1.15 for Chemistry, 1.14 for Neuroscience, and 1.14 for Physics.
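To make the reported odds ratios concrete, the logistic regression can be written schematically as follows. The symbols are notational assumptions introduced here: \(Y\) indicates being a top scientist in the 20th career year, \(T\) is the binary indicator of early coauthorship with a top scientist, and \(I\), \(P\), \(C\) stand for the three early career controls, whose exact functional form is not specified in this section.

```latex
\[
\log\frac{\Pr(Y=1)}{1-\Pr(Y=1)}
  = \beta_0 + \beta_T\,T + \beta_I\,I + \beta_P\,P + \beta_C\,C ,
\qquad
\mathrm{OR}_T = e^{\beta_T} .
\]
\[
\mathrm{OR}_T = 1.19 \;\Longrightarrow\; \beta_T = \ln 1.19 \approx 0.174 .
\]
```

Under this reading, an odds ratio of 1.19 for Cell Biology means that, with the three controls held fixed, early coauthorship with a top scientist multiplies the odds (not the probability) of being a top scientist in the 20th career year by 1.19.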