Systematic literature review

We aimed at assessing adaptive phenotypic responses to climate change across six broad taxa of animals: arachnids, insects, amphibians, reptiles, birds and mammals. We distinguished between two climatic variables: temperature and precipitation. We relied on the authors of the original studies for their expertise and knowledge of the biology of the species and system in the: (1) choice of the appropriate time window over which the annual means of the climatic values were calculated, rather than using a single time window for all species. For instance, if in a bird study the mean temperature over the 2 months preceding nesting was used as an explanatory variable for the timing of egg laying, we used this specific climatic variable; (2) choice of the specific climatic variables, be it air, sea surface or soil temperature, rather than using a single climatic variable across all species; and (3) choice of the spatial scale of the study, so that the measured variables were considered local at that scale. We focused on studies that recorded both changes of at least one climatic variable over time and changes in either morphological or phenological traits for at least one studied species. Phenological traits reflect shifts in timing of biological events, for example, egg-laying date, antler cast date or mean flight date in insects. Morphological traits reflect the size or mass of the whole body or its parts (e.g., bill length, wing length, body mass).

To assess whether trait changes were adaptive, we only used studies that measured selection on the trait(s) of interest by means of linear selection differentials29 using one of the following fitness components: recruitment, reproduction, and adult survival. Linear selection differentials for all studies were calculated following Lande and Arnold29, as the slope of the linear model, with relative fitness (individual fitness divided by mean fitness) as response and the z-transformed trait value as predictor. Only studies that reported SE estimates along with annual linear selection differentials were retained. For the majority of studies, we extracted selection differentials directly from the published studies, and for 12 studies, we calculated them ourselves using the respective individual-level data shared by the authors.

To identify the studies satisfying the above-mentioned criteria, we searched the Web of Knowledge (search conducted on 23 May 2016, Berlin) combining the following keywords for climate change (‘climate change’ OR ‘temperat*’ OR ‘global change’ OR ‘precipit’), adaptation (‘plastic*’ OR ‘adapt*’ OR ‘selection’ OR ‘reaction norm’) and trait category (‘body size’ OR ‘body mass’ OR ‘body length’ OR ‘emerg* date’ OR ‘arriv* date’ OR ‘breed* date’). For taxa, we used broad taxon names in the first search (‘bird*’ OR ‘mammal*’ OR ‘arachnid*’ OR ‘insect*’ OR ‘reptil*’ OR ‘amphibia*’ OR ‘spider*’). Next, to increase the probability of finding the relevant papers, we run the search by using instead of taxa names detailed names below the level of the Class, as follows: (‘rodent*’ OR ‘primat*’ OR ‘rabbit*’ OR ‘hare’ OR ‘mole’ OR ‘shrew*’ OR ‘viverrid*’ OR ‘hyaena’ OR ‘bear*’ OR ‘seal*’ OR ‘mustelid*’ OR ‘skunk*’ OR ‘Ailurid*’ OR ‘walrus*’ OR ‘pinniped*’ OR ‘canid*’ OR ‘mongoos*’ OR ‘felid*’ OR ‘pangolin*’ OR ‘mammal*’ OR ‘bird*’ OR ‘flamingo*’ OR ‘pigeon*’ OR ‘grouse*’ OR ‘cuckoo*’ OR ‘turaco*’ OR ‘rail*’ OR ‘wader*’ OR ‘shorebird*’ OR ‘penguin*’ OR ‘stork*’ OR ‘pelican*’ OR ‘condor*’ OR ‘owl*’ OR ‘hornbill*’ OR “hoopoe*’ OR ‘kingfisher*’ OR ‘woodpecker*’ OR ‘falcon*’ OR ‘parrot*’ OR ‘songbird*’ OR ‘turtle*’ OR ‘tortoise*’ OR ‘lizard*’ OR ‘snake*’ OR ‘crocodil*’ OR ‘caiman*’ OR ‘alligator*’ OR ‘reptil*’ OR ‘frog*’ OR ‘salamander*’ OR ‘toad*’ OR ‘amphibia*’ OR ‘insect*’ OR ‘beetle*’ OR ‘butterfl*’ OR ‘moth*’ OR ‘mosquito*’ OR ‘midge*’ OR ‘dragonfly*’ OR ‘wasp*’ OR ‘bee*’ OR ‘ant*’ NOT (‘fish*’ OR ‘water*’ OR ‘aquatic*’)). These detailed taxon names were combined with the same keywords for climate change, adaptation and trait as before. Finally, we joined the unique records from each of these two searches in a single database.

The literature search returned 10,090 publications, 56 of which were retained after skimming the abstracts. Of these 56 publications, 23 contained the data necessary to assess the three conditions required to infer adaptive responses and were used for assembling the final dataset (PRCS dataset). In cases where several publications reported on the same study system (same species in the same location, measuring the same traits and selection via exactly same fitness components), we retained the publication that reported data for the longest time period. We assembled the PRCS dataset by directly extracting the data from the identified 23 publications wherever possible, or by contacting the authors to ask for the original data. Data were extracted either from tables directly or from plots by digitizing them with the help of WebPlotDigitiser or the metagear package in R61. In the process of contacting the authors, one research group offered to share relevant unpublished data on two more species, adding two more studies to the dataset, totalling 25 publications. The PRCS dataset consisted of 71 studies containing data on annual values of climatic factors, annual phenotypic trait values and annual linear selection differentials for 17 species in 13 countries (Supplementary Data 3).

The remaining 33 publications from the originally selected 56 (58 with the shared unpublished data considered as two publications) did not report data on selection, but presented data on the annual values of climatic factors and mean population phenotypic traits, totalling 4764 studies that covered 1401 species. We retained these studies and combined them together with the 71 studies in the PRCS dataset to assemble the ‘PRC’ dataset (Supplementary Data 3). With the PRC dataset, we did not aim for comprehensive coverage of the literature published on the topic. Instead, we used this larger PRC dataset to verify whether the smaller PRCS dataset was representative in terms of climate change over time and trait change in response to a climatic factor. A flowchart showing the numbers of studies included at each stage of the systematic literature review is given in Supplementary Fig. 10.

Assessing whether the responses are adaptive

Separate analyses were conducted for the PRCS and PRC datasets and, within each of them, for temperature and precipitation. All analyses were conducted using linear models as no deviation from linearity was detected by visual inspection of the relations between (1) year and climate, (2) trait and climate and (3) selection and year for each study (Supplementary Figs. 11–15). First, we assessed for each study condition 1 necessary to infer adaptive responses (i.e. the extent to which the climatic variable changed directionally over years). To this end, we fitted for each study a mixed-effects model with the climatic variable as the response and the year as a fixed covariate, taking into account temporal autocorrelation (as random effect):

$${\mathrm{Clim}}_t = \alpha + \beta _{{\mathrm{Clim}}} \times {\mathrm{Year}}_t + \varepsilon _t + \varepsilon ,$$ (1)

where Clim is the quantitative climate variable, Year the quantitative year covariate, t the time, ε t is a Gaussian random variable with mean zero and following an AR1 model over years, and ε is an independent Gaussian random variable with mean zero and variance representing the residual variance of the study. β Clim is the regression coefficient reflecting the slope of the climatic variable on the year for the study (Fig. 1b). To avoid overfitting, we refitted the same model without the AR1 structure and retained, for each study, the model structure leading to the lowest marginal AIC62. This approach was applied to all the models fitted to each study (i.e. to assess conditions 2 and 3, and change of selection across years, as described below).

We assessed condition 2, the relation between the trait and the climatic variable, separately for phenological and morphological traits. For this, we fitted a mixed-effects model for each study with mean annual population trait values as a response and the climatic variable and the year as fixed effects. Year was included as a quantitative predictor in this model to account for the effects of variables other than the considered climatic variable, which had changed with time and could have affected the trait. Examples of such variables are any environmental alterations, such as land use change and succession but also other climatic variables, which potentially could have affected the trait, but for which we did not have data. In this model, we took into account temporal autocorrelation in the response variable and weighted the residual variance by the variance of the response variable (i.e. the reported squared SE of the mean annual population trait values) to account for between-year variation in uncertainty associated with mean annual population trait values. Prior to fitting the models we z-transformed trait values (i.e. subtracted the mean and divided by their reported standard deviation) to later compare the effect of the climatic variable on different traits. Accordingly, we also transformed the weights of the residual variance by dividing the reported SEs by the SD of the mean annual population trait values per study. The fitted model was:

$${\mathrm{Trait}}_t = \alpha + \beta _{{\mathrm{Trait}}} \times {\mathrm{Clim}}_t + \gamma\, \times {\mathrm{Year}}_t + \varepsilon _t + \varepsilon ,$$ (2)

where Trait is the mean phenotypic trait (z-scaled across the years within the study), Clim the quantitative climate covariate, Year the quantitative year covariate, t the time, ε t is a Gaussian random variable with mean zero and following a AR1 model over years, and ε is an independent Gaussian random variable with mean zero and variance proportional to the estimated variance of the mean phenotypic trait (which depends on t). β Trait is the regression coefficient reflecting the slope of the trait on the climatic variable for the study (Fig. 1c).

We assessed condition 3 of whether the trait change was associated with fitness benefits in a two-step procedure. In the first step, we fitted for each study an intercept-only mixed-effects model with annual linear selection differentials as a response. We allowed for temporal autocorrelation and weighted the residual variance by the variance of the annual linear selection differentials (i.e. the reported squared SE of the annual selection differentials). The fitted model was:

$${\mathrm{Sel}}_t = \alpha + \varepsilon _t + \varepsilon ,$$ (3)

where Sel is the estimate of the yearly linear selection differential, t is the time, ε t is a Gaussian random variable with mean zero and following an AR1 model over years, and ε is an independent Gaussian random variable with mean zero and variance proportional to the estimated variance of the annual linear selection differential (which depends on t). The intercept α describes a non-zero mean of the autoregressive process. The predictions from the fitted model (Sel t ), including the random effect, are estimates of annual linear selection differentials, and their inverse-variance weighted average is termed ‘weighted mean annual selection differential’, WMSD (Fig. 1e). The variance used in weighting is the prediction variance. The SE of the WMSD is deduced from these weights and from the covariance matrix of the predictions (see source code of the function extract_effects() in our R package ‘adRes’ for details).

In the second step, to assess whether the response is adaptive, we considered WMSD in combination with the slopes obtained for the previous two conditions, as follows: we defined a trait change to be adaptive in response to climate if the climate-driven change in phenotype occurred in the same direction as linear selection. In contrast, if the climate-driven change in phenotype occurred in the direction opposite to selection, then the response was considered maladaptive. We measured the climate-driven change in phenotype as the product of the slopes obtained for conditions 1 and 2. A WMSD estimate of zero indicates a lack of selection11. A WMSD of zero together with no trait change could indicate a stationary optimum phenotype, and a WMSD of zero together with a significant change in trait could indicate that a moving optimum phenotype is perfectly tracked by phenotypic plasticity (a negligible WMSD could also imply a flat fitness surface, i.e. no fitness penalty for deviating from the optimum). To assess whether the trait is adaptive, we plotted for each study the WMSD against the product of slopes extracted from conditions 1 and 2, which quantifies the observed climate-driven trait change over time (Fig. 5). The studies qualify as adaptive if their WMSD has the same sign as the product of the slopes assessing conditions 1 and 2.

We also fitted a modified version of the model specified in Eq. (3) to assess a potential temporal (linear) change in the annual linear selection differentials over years. To this end, for each study we fitted a mixed-effects model that accounted for temporal autocorrelation. We weighted the residual variance by the variance of the annual linear selection differentials (i.e. the reported squared SE) to account for uncertainty in the estimates of annual selection differentials. The fitted model was:

$${\mathrm{Sel}}_t = \alpha + \beta _{{\mathrm{Sel}}} \times {\mathrm{Year}}_t + \varepsilon _t + \varepsilon ,$$ (4)

where Sel is the estimate of the yearly linear selection differential, Year the quantitative year covariate, t the time, ε t a Gaussian random variable with mean zero and following an AR1 model over years, and ε is an independent Gaussian random variable with mean zero and variance proportional to the estimated variance of the yearly linear selection differential (which depends on t). β Sel is the regression coefficient that corresponds to the slope of the annual linear selection differentials on the year for the study.

Meta-analyses

To demonstrate general responses across species and locations, we require each of the three conditions necessary to infer adaptive responses to be met consistently across studies, for example, that, on average, temperature increased over time, warmer temperatures were associated with advancing phenology and advancing phenology corresponded to fitness benefits (i.e. negative selection on phenological traits given the two above-mentioned conditions are satisfied). To test for such general trends in adaptive responses across studies, we fitted three mixed-effects meta-analyses to the PRCS dataset, two for the first two conditions and the third to assess whether WMSD differed from zero. We tested the third condition in two ways. First, we performed a binomial test to compare the proportion of studies exhibiting adaptive (i.e. same sign for WMSD and the climate-driven trait change over time) vs. maladaptive (i.e. WMSD and the climate-driven trait change over time differ in their signs) responses to climate change. Second, we performed a mixed-effects meta-analysis similar to the three other ones.

First, we assessed whether, across studies, the values of the climatic factor changed with time by using the slope of a climatic factor on year (obtained from the mixed-effects models of condition 1 for each study, see above) as response (i.e. effect size in meta-analysis terminology), and study identity and publication identity as qualitative variables defining random effects influencing the intercept. Second, to assess whether climate change was associated with trait changes across studies, we used the slope of the z-transformed trait on the climatic factor (obtained from the mixed-effects models of condition 2 while accounting for the effect of year on the trait) as response and study identity and publication identity as qualitative variables defining random effects influencing the intercept. We fitted separate models for phenological and morphological traits, because our dataset contained fewer studies of the latter compared to the former. Since morphological traits included either measures of body mass or size (e.g. wing, tarsus and skull length), we tested whether the effect of temperature depended on the type of measure by including it as a fixed-effect covariate with three levels (body mass, size and body condition index; we distinguished body condition index from the two other levels as it has elements of both of them). Analogously, we assessed whether the effect of temperature on phenology depended on the type of phenological measure used, by including it as a fixed-effect covariate with three levels, similarly to Cohen et al.17: arrival, breeding/rearing (e.g. nesting, egg laying, birth, hatching) and development (e.g. time in a certain developmental stage, antler casting date). Third, to assess whether, across studies, traits were under positive or negative selection during the study period, we used as response the WMSD values obtained from the mixed-effect models for the first step assessing condition 3. In this model, we also used study identity and publication identity as qualitative variables defining random effects influencing the intercept. We tested whether selection depended on generation length and differed among fitness components by including these latter variables as fixed effects in the model. Generation length was extracted from the literature, mainly using the electronic database of BirdLife International. Similarly, to assess whether across studies there was a directional linear change in the annual linear selection differentials over time, we fitted a mixed-effects model using as response the slopes of the annual linear selection differentials on time (obtained with Eq. (4)). This model included study identity and publication identity as qualitative variables defining random effects influencing the intercept. Finally, to assess whether responses were on average adaptive, we also ran a mixed-effects meta-analytic model using as response the product of the WMSD with the sign of the climate-driven trait change over time. We included study identity and publication identity as qualitative variables defining the random effects in this model. We fitted separate models for phenological and morphological traits to test whether both WMSD and the product of WMSD with the sign of the climate-driven trait change differed from zero.

For each type of climatic variable (temperature and precipitation) in the PRC dataset, we fitted two mixed-effects meta-analyses, analogous to the mixed-effects meta-analytic models we ran on the PRCS dataset. With these meta-analyses we assessed whether across the studies (1) there was a directional change in the climatic values over time and (2) traits were affected by the climatic variable. As responses (i.e. effect sizes) in these models, we used the slopes extracted for each study from the respective mixed-effects models fitted analogously to those used for the PRCS dataset (see section above). For both morphological and phenological traits, we assessed whether the effect of climate on traits differed among taxa by including taxon as a fixed effect. For morphological traits, we also assessed whether the responses to climate differed among endothermic and ectothermic animals, by including endothermy as a fixed effect in the model.

All data analyses were conducted in R version 3.5.063 and implemented in the R package ‘adRes’, which is provided for the sake of transparency and reproducibility. Mixed-effects models for each study and mixed-effects meta-analytic models were fitted using restricted maximum likelihood (ML) with the spaMM package version 2.4.9464. For each meta-analytic mixed-effects model, we conducted model diagnostics by inspecting whether the standardized residuals deviated from a normal distribution and whether there were any patterns in standardized residuals when regressed on the predictor. Model diagnostics were satisfactory for all models. We assessed the significance of fixed effects and intercepts with asymptotic likelihood ratio χ2 tests by comparing the model with a given effect vs. the model without the effect, both fitted using ML in spaMM.

We assessed the amount of heterogeneity among studies in our meta-analyses using commonly recommended approaches65. In particular, we tested whether the total amount of heterogeneity (Q) was statistically significant and estimated Higgins I2 and H2 (ref. 65). Higgins I2 reflects the proportion of total heterogeneity due to between-study variation (i.e. random effects) and ranges from 0 to 1. A value of 0 means that heterogeneity is due to within-study variation exclusively, whereas a value of 1 indicates that heterogeneity is due to between-study variation. This metric is, therefore, comparable among different meta-analyses. H2 is a ratio showing the proportion of observed heterogeneity in relation to what would be expected under the null hypothesis of homogeneity. For example, a value of 2 means that there is twice as much variation as would be expected if no between-study variation were present (i.e. H2 = 1). We found that the amount of heterogeneity differed among the two datasets and tested models, with moderate heterogeneity for models testing conditions 1 and 2 and considerable heterogeneity for models testing condition 3 (Supplementary Note 1, Supplementary Table 1).

We also tested for the evidence of publication bias by (1) visual inspection of the funnel plots and (2) Egger’s test65. No evidence of small-study effect (that may be an indication of publication bias) was found for effect sizes used to test all three conditions in the PRCS dataset (Supplementary Figs. 16–17, Supplementary Note 1).

Sensitivity analyses

One study in the meta-analytical models testing conditions 2 and 3 for phenology in the PRCS dataset appeared to be an outlier (ref. 66, Figs. 3 and 4). Therefore, we also re-ran the models without this study (Supplementary Tables 2 and 3). Additionally, the PRCS dataset contained mainly studies on bird species, and only one for a mammal species67. To test how sensitive the results were to inclusion of this taxon, we re-ran the analyses after excluding the study for mammal species. The main findings were qualitatively unaffected by excluding either the only mammal study or an apparent outlier from the PRCS dataset, or both (Supplementary Tables 2 and 3).

Measures of phenological responses are known to be sensitive to methodological biases30, in particular to temporal trends in abundance of the sampled species32. Indeed, if species abundance increases over time, the probability of recording earlier events increases (especially if phenology is measured as the first occurrence date), meaning that species abundance can affect mean of the distribution of a phenological trait. For the same reason, the variance around the mean phenological response may be sensitive to population size, with higher variance at lower population sizes. To assess the sensitivity of our results to potential changes in population sizes over time, we fitted a heteroskedastic model in which population abundance was included both as a fixed-effect explanatory variable and as an explanatory variable for the model of the residual variance. This model is an extension of the models specified in Eq. (2) that were used to assess condition 2. This model was fit to each study for which abundance data were available and where the duration of the study was at least 11 years (28 out of 42 studies). The fitted model was:

$${\mathrm{Trait}}_t = \alpha + \beta _{{\mathrm{Trait}}} \times {\mathrm{Clim}}_t + \gamma \times {\mathrm{Year}}_t + \delta \times {\mathrm{Abund}}_t + \varepsilon _t + \varepsilon ,$$ (5)

where Trait is the mean phenotypic trait (z-scaled across the years within each study), Clim the quantitative climate covariate, Year the quantitative year covariate, Abund the quantitative covariate for species abundance, t the time, ε t is a Gaussian random variable with mean zero and following an AR1 model over years, and ε is an independent Gaussian random variable with mean zero and variance proportional to the estimated variance of the mean phenotypic trait (which depends on t). β Trait , γ and δ are regression coefficients that correspond to the slopes of the trait on the climatic variable, the trait on year and the trait on the abundance for the study, respectively. To avoid overfitting, we refitted the same model without the AR1 structure and retained, for each study, the model structure leading to the lowest marginal AIC, analogously to how it was done for the models fitted to assess conditions 1, 2 and 3.

The results were qualitatively unaffected by the inclusion of abundance in the models: the rate of advancement in phenology estimated across the studies was −0.346 ± 0.102 SD per °C (LRT between the model with and without change in phenology: χ2 = 8.4, df = 1, p = 0.004) when including abundance and −0.335 ± 0.088 SD per °C without abundance (LRT: χ2 = 9.19, df = 1, p = 0.002; for comparison’s sake both models were fitted to a subset of studies for which abundance data were available). We also found that although abundance affects phenology, its effects were on average smaller compared to the effects of temperature (Supplementary Fig. 6).

Implications for population persistence

To assess implications of our findings for population persistence, we first calculated the actual lag (Lag) between the observed mean phenotype and the optimum by following Estes and Arnold (2007)37:

$${\mathrm{Lag}} = \beta \cdot \left( { \omega ^2 + \sigma _{\mathrm{p}}^2} \right),$$ (6)

where β is a standardized linear selection differential, ω2 a width of the fitness function, and \(\sigma _{\mathrm{p}}^2\) phenotypic variance (here scaled to 1). This lag is expected to be asymptotically constant when the optimum is shifting at a constant rate and under other assumptions detailed in Bürger and Lynch35.

Next, we calculated the critical lag (Lag crit ) between the phenotype and the optimum following Bürger and Lynch35 (see also ref. 68):

$${\mathrm{Lag}}_{{\mathrm{crit}}} = k_{\mathrm{c}}/s,$$ (7)

where k c is the critical rate of environmental change and s is a measure for the strength of selection, \(s = \sigma _{\mathrm{g}}^2/\big( {\sigma _{\mathrm{g}}^2 + V_{\mathrm{s}}} \big)\), with \(\sigma _{\mathrm{g}}^2\) genetic variance and V s the strength of stabilizing selection around the optimum, calculated as \(V_{\mathrm{s}} = \omega ^2 + \sigma _{\mathrm{e}}^2\), with ω2 the width of the fitness function and \(\sigma _{\mathrm{e}}^2\) environmental variance in the trait.

The critical lag reflects a situation where the population can just replace itself (population growth λ = 1) and comparison between the actual and the critical lag provides insight into the persistence of populations. If the actual lag is larger than the critical lag, then the population growth rate is lower than 1, implying substantial extinction risk. We therefore focused our numeric analyses on this difference between the actual lag and the critical lag. Note that this analysis relies on strong assumptions described in the original papers (e.g. heritability is considered in the estimation of the critical lag but not in that of the actual lag, and that the reverse is true for plasticity), but more accurate methods, for example, ref. 36, would require detailed data that are not available for most of our study populations.

In the first step, we assessed the difference between the actual and the critical lag across a range of parameter values (Supplementary Table 4) to study the sensitivity of the difference to known (i.e. estimated) or unknown parameters. This analysis revealed that the results are most sensitive to the values of ω2 and β (Fig. 6a–f). Therefore, in the second step, we computed the difference among the observed and critical lags for each study in our PRCS dataset (Fig. 6g), by (1) using as β the absolute values of the estimates of WMSD for each study and (2) drawing 1000 random ω2 per study. We drew these random values from the empirical distribution of ω2 estimates provided in Fig. 7 from Estes and Arnold37.

Reporting summary

Further information on research design is available in the Nature Research Reporting Summary linked to this article.