Land cover dataset

The European Space Agency (ESA) Climate Change Initiative (CCI) land cover (LC) product is used to map changes in vegetation cover. The ESA CCI-LC maps are provided at a spatial resolution of 300 m for a period of 24 years, from 1992 to 201532. These maps characterize the global surface using 37 land cover classes based on the United Nations Land Cover Classification System (UNLCCS)71, and were designed to overcome previous limitations and reduce uncertainty in the representation of land cover and LCCs in climate models22,31,32. The dataset was produced after combination of the global daily surface reflectance of five different satellite observation systems, with the ambition to keep high levels of consistency over time. The accuracy of the CCI-LC products was assessed at a global level using an independent validation dataset, and estimated to be of 75.4%32. The highest accuracy was found for cropland classes, forests, urban areas, bare areas, water bodies and perennial snow and ice. Mosaic classes, lichens and mosses showed the lowest accuracy.

We used three types of land cover data to explore the effects of recent land cover changes in European climate and single out the role of cropland abandonment: LC1992, LC2015, and NoCRP_AB. LC1992 and LC2015 represent the land cover in Europe in 1992 and 2015, respectively. NoCRP_AB is a land cover dataset where the IGBP classes cropland and cropland/natural vegetation mosaic in 1992 are not allowed to convert to other land classes. However, other land classes are allowed to convert to cropland and cropland mosaic according to the historical transitions. This means that, between 1992 and 2015, cropland and cropland/natural vegetation mosaic are only allowed to expand, and not to contract.

In order to facilitate the interpretation of the land transitions that occurred in Europe (Fig. 1), the 37 UNLCCS classes were aggregated into the generic IPCC land classes, according to the specific cross-walking table provided by the ESA CCI-LC products (reproduced in Supplementary Table 3)32.

Regional climate simulations

Regional climate simulations are performed with the Weather Research and Forecasting (WRF) model version 3.9.141. WRF is a next-generation mesoscale model that is suitable for operational research across scales. WRF produces simulations based on actual atmospheric conditions (i.e., from observations and analyses), and it has been validated to capture the spatiotemporal patterns of climate against observations in Europe72,73,74.

The specific configuration of WRF used in this study follows the settings of the international Coordinated Regional Climate Downscaling Experiment (CORDEX) initiative (EURO-CORDEX)74. The initial and lateral boundary conditions are from the European Centre for Medium-Range Weather Forecasts Interim reanalysis (ERA-Interim)75. The physical parameterization schemes include the Thompson microphysics scheme76, the Rapid Radiative Transfer Model for GCMs longwave and shortwave radiation77, the Mellor-Yamada Nakanishi and Niino boundary layer scheme78, the Kain-Fritsch convection parameterization79, and the Community Land Model version 4.0 (CLM4) with IGBP-MODIS land use classification80,81.

CLM4 is a state-of-the-science land surface process model that represents several aspects of the land surface, including surface heterogeneity, and consists of components related to land biophysics, hydrologic cycle, biogeochemistry, human dimensions, and ecosystem dynamics. CLM4 in WRF has a detailed description of land surface, in which the vertical structure includes a single-layer vegetation canopy, a five-layer snowpack, and a ten-layer soil column41. In each grid cell, the land surface is categorized into five primary subgrid land-units (glacier, lake, wetland, urban, and vegetated) that share the same atmospheric forcing and flux feedback to the atmosphere within a grid cell. The surface variables are calculated by averaging the subgrid quantities weighted by their fractional areas (tile approach). The urban land-unit uses the “urban canyon” concept82 to represent the canyon geometry, described by building height and street width. The vegetated subgrid is comprised of up to 15 plant functional types (PFTs) that differ in structure and physiology as leaf and stem optical properties, root distribution parameters, aerodynamic parameters, and photosynthetic parameters41. These parameters are monthly prescribed and daily updated by linearly interpolating monthly values83. CLM4 includes new treatments of soil column-groundwater interactions, soil evaporation, aerodynamic parameters for sparse/dense canopies, vertical burial of vegetation by snow, snow cover fraction and aging, black carbon and dust deposition, and vertical distribution of solar energy. The CLM two-stream radiation model calculates the model equivalent surface albedo using climatological monthly soil moisture along with the vegetation parameters of PFT fraction, leaf and steam area index. Several PFTs can coexist in a given grid cell, and the energy balance and surface fluxes are calculated at the PFT level before being aggregated at the grid-scale level based on the proportion of PFTs in the grid cell. As WRF reads as input IGBP-MODIS/USGS LC classifications, we use a cross-walking table8 (Supplementary Table 4) to translate the 37 classes in CCI LC to the 21 classes of the IGBP-MODIS classification system, which CLM4 then translates to PFTs. The distribution to PFTs is a well-known potential source of uncertainty, especially for mixed classes and northern boreal forests (an area where little LCCs occurred in our study)29. The use of cross-walking tables is a common viable approach to ensure transparency and reproducibility of model results until the mapping of plant functional traits at global scale will become possible and made available22,29.

Due to limitations in terms of computational time, a 24-year (from 1 January 1992 to 31 December 2015) averaged dataset is produced from the ERA-Interim data and used as initial and lateral boundary conditions. This new dataset contains 6-hourly data from 1 January at 0000 UTC and run through 31 December at 1800 UTC for a period of 1 year. The land cover datasets are aggregated at a horizontal resolution of 0.11° (ca. 12 km, the highest resolution available from EURO-CORDEX) retaining the proportions of the different land classes per grid cell. Three independent WRF simulations are then performed with the three land cover datasets LC1992, LC2015, and NoCRP_AB. Since lateral boundary conditions do not vary between experiments, the resulting differences in model outcomes can be attributed to the different land cover datasets. The simulations are performed with 40 atmospheric levels and an integration time step of 72 s. The first 15 days are treated as model spin-up time and therefore excluded from the analysis. The analysis also excludes 30 simulation grids (ca. 360 km) from the border of the EURO-CORDEX domain to clear the noise in the lateral boundary conditions.

The simultaneous contribution of 2-m air temperature (T) and absolute specific humidity (q) is computed in terms of equivalent temperature (T E ), which indicates the temperature a sample of air would have if all its latent heat was isobarically converted to sensible heat. It can be estimated through the following equation39,40: \(T_{\mathrm{E}} = T + Lq/C_p\), where L is the latent heat of vaporization and C p the specific heat of dry air.

Spatial correlogram

Spatial correlograms are used to study the spatial autocorrelation of the changes in temperature. This is an approach used to show how correlated are pairs of spatially distributed observations at increasing distance between them84,85. The spatial autocorrelation coefficient is computed for each distance class, and it is usually measured by the Moran’s I statistics as follows86,87,88:

$$I\left( d \right) = \frac{{\frac{1}{W}\mathop {\sum}

olimits_i^n {\mathop {\sum}

olimits_j^n {w_{ij}\left( {y_i - \bar y} \right)\left( {y_j - \bar y} \right)} } }}{{\frac{1}{n}\mathop {\sum}

olimits_{i = 1}^n {\left( {y_i - \bar y} \right)} }}\;{\mathrm{for}}\,i \,{

e}\, j,$$ (1)

where y i and y j are the values of the changes in temperature in grids i and j. A matrix of geographic distances needs to be calculated for all pairs of locations. We then convert these distances to classes d. The weight factor w ij is 1 when the pairs of sites belong to distance class d, and 0 otherwise. W is the number of pairs in the computation for a given distance class, and it is equal to the sum of w ij for that class. Moran’s I takes values in the interval [−1, 1] and can be interpreted as the Pearson’s correlation coefficient. Positive values of I indicate positive autocorrelation and negative values of I indicate negative autocorrelation85.

Temperature changes of individual land transitions

A ridge-regression approach is introduced to retrieve the effects of the individual land cover transitions on the temperature and equivalent temperature changes. Although the methodology and applications differ, the concept is similar to the one recently used to characterize the changes in surface properties and energy fluxes from specific vegetation cover changes8,59.

To identify the signal of the temperature changes due to the individual land cover transitions, we use a set of nonoverlapping local windows that covers the domain to unravel the effect on temperature from each LCC. Only the local effects of land cover transitions inside the window are considered, and the indirect perturbations due to regional changes outside the window are ignored. Local effects dominate the overall biophysical impacts for space-limited land cover transitions89, such as those recently occurred in Europe.

The size of the local window is 5-by-5 grids, approximately 60-by-60 km, in which the local climate is assumed to be uniform. Climatic gradients that result from topographical differences are masked according to refs. 8,59. To unmix the temperature signals resulting from the mix of the possible land cover transitions among the various classes, a linear regression approach is applied separately to the N windows,

$$y_i = X_i\beta _i + \varepsilon _i,i = 1,2,...,N,$$ (2)

where for window i, X i is the explanatory variable, i.e. a matrix containing the fractions of the transitions of all the land cover classes for the 25 grids of each window (with the first column made of ones to capture the intercept), y i is a vector that contains the 25 values of changes in temperature, ɛ i is the error term referring to the model residuals, and β i is the vector of the regression coefficients. The residuals, ɛ i , i = 1, 2, …, N, are assumed to be independently and identically distributed with mean 0 and variance \(\sigma _{\varepsilon ,i}^2\). Equation (2) can be explicitly written for window i as a system of equations:

$$\begin{array}{l}y_1 = \beta _0 + \beta _1x_{11} + \beta _2x_{12} + \,\cdots\, + \beta _mx_{1m} + \varepsilon _1\\ y_2 = \beta _0 + \beta _1x_{21} + \beta _2x_{22} + \,\cdots\, + \beta _mx_{2m} + \varepsilon _2\\ \vdots \\ y_n = \beta _0 + \beta _1x_{n1} + \beta _2x_{2n} + \,\cdots\, + \beta _mx_{nm} + \varepsilon _n,\end{array}$$ (3)

where the dependence on i is dropped to simplify the notation. x nm is the fraction of change in land class m in grid n, n is the number of grids of the local window (n = 25) and m is the number of land cover classes (m = 14). The aim is to find the difference in regression coefficients between each pair of land classes to inform about the effect of the land transition on temperature by solving the system of Eq. (3). Once the system is solved, we can use the coefficients β i for window i to understand the local temperature effects of a given transition of land cover classes from j to k by setting x j = −1 and x k = 1, and all the other x to 0.

However, not all coefficients in β i can be estimated simultaneously since the transitions \(x_{jk},k = 1,2,...,14\), in each grid in window i sums to zero. In addition, in some cases, there are zero-columns in the matrix of the explanatory variable when transitions between specific land classes are missing, and the corresponding regression coefficient cannot be estimated. This is handled by solving Eq. (3) for each window i using ridge regression to stabilize the estimation. This is a robust method frequently applied in statistical regression90,91,92. The solution for window i is given by

$$\hat \beta _i = \left( {X_i^TX_i + \lambda I} \right)^{ - 1}X_i^Ty_i,$$ (4)

where λ stabilizes the inference when there is little or no information about a coefficient. We choose λ = 10−12 so that a large variance is assigned to a coefficient for which the data do not inform. We investigate the expected change in temperature associated with each type of land cover transition as follows:

$$\Delta y_{i,j \to k}^{} = \hat \beta _{i,k} - \hat \beta _{i,j},\quad j,k = 1,2,...,m.$$ (5)

For a window i where the data inform about the transition from land cover j to land cover k, \(\Delta y_{i,j \to k}^{}\), takes a meaningful value, whereas when the data do not inform about the transition, the ridge regression leads to a zero value.

To distinguish between windows where the data inform about the transition of interest and windows where the data have little or no information about the transitions, we calculate the associated uncertainty of \(\Delta y_{i,j \to k}^{}\). We first calculate the estimate of residual variance

$$\hat \sigma _{\varepsilon ,i}^2 = \left( {y_i - X_i\hat{\beta} _i} \right)^{\mathrm{T}}\left( {y_i - X_i\hat{\beta} _i} \right)/{\mathrm{df}}_i,$$ (6)

where the degrees of freedom (df i ) for window i is n minus the number of independent linear combinations of β i that the data inform about in that window. “T” is the transpose of a matrix. Using the estimated residual variance, we can calculate the estimated covariances between coefficients as

$$\hat \Sigma _i = \left( {X_i^TX_i + \lambda I} \right)^{ - 1}\hat \sigma _{\varepsilon ,i}^2.$$ (7)

The variances of the coefficients in \(\hat \beta _{i}\) are presented in the diagonal of the covariance matrix \(\hat \Sigma _i\), and the cross-covariance between the coefficients are stored in the off-diagonal parts of the covariance matrix \(\hat \Sigma _i\). This provides the uncertainty associated with a temperature change due to a given land cover transition, \(\Delta y_{i,j \to k}^{}\), by calculating the variance:

$$\hat \sigma _{i,j \to k}^2 = \hat \Sigma _{i,jj} + \hat \Sigma _{i,kk} - 2\hat \Sigma _{i,jk},$$ (8)

where \(\hat \Sigma _{i,jk}\) denotes row j and column k of \(\hat \Sigma _i\), i.e. \(\hat \Sigma _{i,jj}\) and \(\hat \Sigma _{i,kk}\) stand for the estimated variances of the estimated regression coefficients \(\hat \beta _{i,j}\) and \(\hat \beta _{i,k}\), and \(\hat \Sigma _{i,jk}\) is the cross-covariance between \(\hat \beta _{i,j}\) and \(\hat \beta _{i,k}\). The covariance term is needed because the estimates of the land cover effects are not independent.

As described above, the system of Eq. (3) is solved in each of the local windows that spans all the domain. Within the window, water bodies are masked out and only the land cover transitions are considered. In addition, we also filter out windows with no land cover transitions. The regressions are then applied to all the IGBP land cover classes. This method leads to antisymmetry or skew-symmetry in the result, i.e., Δy j→k = −Δy k→j .

We then pool the information across the windows contained in a region (either Europe or two subdomains) to achieve informative estimates with useful uncertainty, because only a few of the 14 × 13 possible land cover transitions occurred in each window. The mean values for each land cover transition for all the windows which are considered in the regressions are calculated using the weight mean since it has the lowest variance

$$\Delta y_{j \to k} = \mathop {\sum}\limits_{i = 1}^N {\frac{{\Delta y_{i,j \to k}^{}}}{{\hat \sigma _{i,j \to k}^2}}} {\Big /}\mathop {\sum}\limits_{i = 1}^N {\frac{1}{{\hat \sigma _{i,j \to k}^2}}}$$ (9)

and the variance is

$$\hat \sigma _{j \to k}^2 = 1{\Big /}\mathop {\sum}\limits_{i = 1}^N {\frac{1}{{\hat \sigma _{i,j \to k}^2}}},$$ (10)

where N stands for the number of windows for regression. The windows where the estimated variance \(\hat \sigma _i^2\) is unreliable are omitted, such as the variance is bigger than 100.

The mean and the median of the empirical residuals, \({\hat{\varepsilon}} _i = y_i - X_i {{\hat {\beta}} _i},i = 1,2,...,N,\) are almost zero, and the histogram of the empirical residuals is unimodal, symmetric, and similar to a normal distribution. This suggests that a Gaussian distribution can be assumed for the residuals so that confidence intervals for the land cover transitions can be calculated using Gaussian distributions.

This statistical analysis is repeated for the entire European domain and for the two major subdomains to investigate the heterogeneities of the temperature and equivalent temperature response to land cover transitions in the different climate regimes found in the analysis. Subdomain A mainly refers to central, southern and western Europe, and subdomain B to eastern Europe. The number of valid windows retrieved is 2779 for the analysis of the entire European domain, and 856 and 1032 for the subdomains A and B, respectively. Only the valid estimates for each land cover transition that occurred in at least 15% of the valid windows are considered as representative averages and shown in Fig. 6.

Validation of model results

We validated model outputs against relevant observation datasets. WRF model simulations based on the EURO-CORDEX configuration with the new land cover datasets LC1992 and LC2015 are compared with observation records for European climate (E-OBS)93. E-OBS data are averaged over the period 1992–2015 to align with the dataset used for the boundary conditions in model simulations. An additional simulation with the default IGBP land cover in WRF is added to benchmark the relative performance of the new ESA CCI land cover datasets with indices like the pattern correlation coefficient (PCC), the regional bias, and the root-mean-square error (RMSE). The three simulations show similar patterns for annual mean temperature relative to E-OBS, and the simulations with the CCI-based land cover datasets are found to reduce both model bias and RMSE (Supplementary Fig. 8).

Our simulations modeled multiple, and sometime contrasting, land cover changes per grid cell, and we could not directly compare the changes caused in the single component of the surface energy budget from specific LCCs with other available datasets. However, it was possible to validate estimates of surface albedo and soil moisture from the individual simulations. Estimates of surface albedo for LC1992 and LC2015 were resampled and compared against the satellite-derived CLARA_A2 dataset, which covers the 34-year period from 1982 until 2015 and provides monthly average values on a 0.25° × 0.25° grid94. Monthly mean values (with standard deviation and pattern correlation) of 1992–2015 averaged surface albedo are compared in Supplementary Table 2. Results generally show high correlation coefficients and a consistent seasonal cycle, with values falling within the respective uncertainty ranges. The ESA CCI SM database95 provides harmonized estimates of 10-cm soil moisture from a large set of satellite sensors and the 1992–2015 average was used to compare soil moisture outputs from LC1992 and LC2015 (Supplementary Fig. 3). We found high pattern correlation coefficients (from 0.82 in summer to 0.96 in winter) and limited bias.

Temperature changes attributed to specific land transitions from our ridge-regression approach were compared to an alternative dataset based on satellite remote sensing observations8,59. This dataset is derived from potential vegetation changes where a space-for-time approximation is applied to multiscale remote sensing products and was developed to benchmark climate model outputs related to biophysical land processes. It includes changes in the surface energy balance and land surface temperature (day and night) for up to 10 IGBP land cover transitions (over affected land areas only) at a resolution of 1° for the period 2008–2012. We downloaded the publicly available version of this dataset59 and postprocessed it to compute daily monthly mean surface temperature changes by a simple average between day and night temperature. The standard errors were derived from the corresponding standard deviations provided by the dataset. Because some of these estimates are likely unrealistic values (for example, >10 °C), we treat them as outliers and filter them out. We then compared the temperature effects of the individual land transitions with those in Fig. 6, for the EURO-CORDEX domain and for the subdomains A and B (Supplementary Table 1). We find that results are generally consistent among the two databases, but the comparison should consider some important caveats. First, care is needed when interpreting temperature data acquired using different protocols. The observational dataset reports about land surface temperature, i.e. the temperature at the top of the land surface, including bare land, water, snow or ice cover, cropland or forest canopy, while our study focuses on air (2 m) temperature. Although air temperature is recognized to be generally dependent on land surface temperature61, there are inherent differences because satellite retrievals only occur under clear-sky conditions, and coupling strength between the two temperature indicators varies in time and across different land covers (e.g., coupling is stronger in forests than in open land)62. In addition, the observation dataset only refers to the LCCs between 2008 and 2012, a temporal dimension that differs from our study, and the reported temperature values for each specific LCC refer to the area affected by the land use change only, with the variables that are upscaled to one degree resolution. On the other hand, our study considers temperature values for a specific grid cell where the LCC is only a fraction of the area of the cell. Relatively smaller values are thus expected from our study.

Reporting summary

Further information on research design is available in the Nature Research Reporting Summary linked to this article.