Data used for construction of mycorrhizal vegetation maps

To construct the maps of mycorrhizal biomass fraction distribution, we integrated (1) data concerning dominant plant species and their growth forms within each continent × ecoregions × land cover type combination, (2) data describing mycorrhizal type of these species, and (3) data estimating cover and biomass fractions of trees, shrubs and herbaceous plants within individual land cover types. Here we defined a species or a set of species with a similar mycorrhizal strategy and growth form as ‘dominant’ if it constituted > 20% of vegetation biomass. Supplementary Fig. 1 shows a flowchart of the map assembly process.

We selected the global ecoregion map of Bailey32 with 98 ecoregions (Supplementary Data 1), provided by the Oak Ridge National Laboratory Distributed Active Archive Center57 (spatial resolution 10 arcmin), as a basis for mapping of global-scale distribution of mycorrhizal types. This map was preferred over that of biomes provided by Olson and co-workers46 and others because of higher level of detail for ecoregions and because the boundaries of ecoregions were more strongly related to the distribution of mycorrhizal types. Ecoregions spanning across multiple continents were considered separately for each continent. We used the continent division based upon the FAO Global Administrative Unit Layers (http://www.fao.org/geonetwork/srv/en/).

We determined the land cover types present in each ecoregion using a satellite observation-based map for the year 2015 generated by the European Space Agency33. This map includes 38 land cover categories such as croplands, urban areas, grasslands and forests of various types, with a spatial resolution of 300 m (Supplementary Data 2).

Based on vegetation surveys assigned to ecoregions (1568 data sources; Supplementary Data 3) we determined the dominant plant species and their growth form for each continent × ecoregion × land cover type combination. We used relative abundances of plant species averaged across different data points, as available across vegetation surveys, with equal weight to all observations.

We assigned mycorrhizal type (AM, EcM, ErM, NM) to each dominant species using the FungalRoot database34. Species with dual EcM-AM and AM-NM categories were allocated to both types equally (50% weight). Mycorrhizal status of species with no empirical records were extrapolated from congeneric and confamilial species. Therefore, all Diapensiaceae and Ericaceae species were considered ErM58, except for Enkianthus (AM), Arbuteae, Pyroleae, Monotropeae and Pterosporeae (all subtypes of EcM). Because of multiple incorrect reports and alternative definitions for EcM, we took a conservative approach by considering plants to be EcM only when this was supported by multiple independent studies and the proportion of conflicting reports was < 50%59. Although most crop plants are able to form arbuscular mycorrhizas, intensive agricultural practices and breeding may lead to reduction or loss of mycorrhizal infection40,41. Therefore, rain fed and flooded croplands were considered to feature partly AM and partly NM vegetation (Supplementary Data 5, 7), unless data indicating dominance of NM vegetation was available. Crop species that belong to Brassicaceae family were considered NM.

Combining the prevailing dominant plant species, their growth form and mycorrhizal type, we estimated the biomass proportions of EcM, AM, ErM and NM vegetation in each ecoregion by continent by land cover combination (Supplementary Data 5 and 6). We considered that in forests with a sparse understorey, trees contribute 90–95% of the biomass and the understorey accounts for 5–10% of biomass36,60,61,62,63. In forests with a dense layer of shrubs, trees, shrubs and herbs/dwarf shrubs contribute 70 ± 15%, 20 ± 10% and 10 ± 5% to plant biomass36,60,61,62,63, respectively. In shrublands, we consider shrubs and herbs to account for 90 ± 5% and 10 ± 5% of biomass, respectively35,60,61,62,63. We considered that woodlands harbour 30 ± 10% of biomass in trees, 30 ± 10% of biomass in shrubs, and the remaining biomass in herbaceous vegetation35,60,61,62,63. This resulted in biomass proportions of each mycorrhizal type in continent × ecoregion × land cover type (Supplementary Data 5, 6).

As we focused on the biomass of mycorrhizal plants and not on species diversity; we did not attempt to map the distribution of orchid mycorrhiza. Orchid species are never abundant in ecosystems in terms of biomass, and they are therefore unlikely to play an essential role in biogeochemical cycles at large regional scales.

Assembly of raster maps of mycorrhizal vegetation

We generated raster maps based on the proportional mycorrhizal type biomass data. We overlaid the raster map of Bailey ecoregions (10 arcmin resolution)57 with the raster of ESA CCI land cover data (300 m resolution)33, which we converted to 10 arcmin using a nearest neighbour approach. The resulting raster was overlain with the polygon map of continents, rasterized at 10 arcmin. To each pixel, we assigned the corresponding mycorrhizal type proportions, considering the prevailing combination of Bailey ecoregion × land cover in each continent. Because some ecoregions covered multiple isolated parts of continents and proportions of mycorrhizal type distribution differed in these regional areas by > 2-fold, we split these ecoregions into two or more subregions using the ArcGIS 10.2.2 Raster Calculator.

Impact of croplands

We estimated the effect of agriculture on mycorrhizal type distribution by substituting the proportions of mycorrhizal types in croplands (land-cover types 10, 11, 12, 20 and 30; see Supplementary Data 2) with estimates of mycorrhizal type proportions from natural land cover types in these ecoregions. For ecoregions naturally harbouring more than one vegetation type (e.g., grasslands, shrublands and forests), we considered that the expected vegetation represents a mixture of these land cover types. This resulted in an additional dataset describing combinations of vegetation as defined by ecoregion × continent × land cover without croplands (Supplementary Data 7 for continent data and Supplementary Data 8 for island data). Maps of potential mycorrhizal type distribution in a cropland-free world (Supplementary Fig. 4) were created based on this dataset, following the aforementioned procedures.

In this analysis, we did not consider forest plantations to be croplands. Therefore, changes in land cover induced by forest restoration through, for instance pine plantations or eucalypt plantations, are not addressed as vegetation changes induced by cropland cultivation. The total area occupied by AM tree plantations exceeds that of EcM tree plantations64,65 (Supplementary Table 3), suggesting that exclusion of tree plantations leads to conservative estimates of the global reduction of EcM vegetation.

Map validation

The maps of the current distributions of mycorrhizal biomass fractions were validated using the datasets of forest biomass structure for Eurasia35, global analysis of impacts on mycorrhizas on carbon vs nitrogen dynamics19, the USA-based analysis of mycorrhizal associations conducted with remote sensing techniques36, and the map of mycorrhizal root biomass in West Australia by Brundrett24 (see Supplementary Fig. 2).

The data of forest biomass structure for Eurasia35 provide information on per-plot tree species abundances for a large number of European sites. As the data contain all records obtained since the 19th century, we used only the data recorded after 1999. Using our database of plant species and associated mycorrhizal types we assigned every tree species with its mycorrhizal type (1344 data points, Supplementary Fig. 2). This provided us with a per-site data of the relative biomass of AM and EcM trees. We used these data as proxies for AM and EcM biomass fractions to compare with the data in our maps. We used the same approach for the data of Lin and co-workers19, which represent plot-based records of vegetation structure for 100 sites across the globe accompanied with data about plant-mycorrhizal associations.

The dataset of Fisher et al.36 provides the relative cover of AM and EcM plants from Landsat scenes centred on four sites in USA: Lilly-Dickey Woods (Indiana), long-term research site of Smithsonian Conservation Biology Institute (Virginia), Tyson Research Center Plot (Missouri), and a long-term research site of Wabikon Forest Dynamics (Wisconsin). Given that the dataset comprises forested areas only, we considered areal coverage of AM and EcM plants provides a good estimate of AM and EcM plant biomass. Using this dataset, we directly compared the AM and EcM coverage per pixel with the data of our maps.

In the datasets19,35,36 biomass and areal fractions of AM and EcM plants are always considered to sum up to 100%. As these datasets do not provide information about non-mycorrhizal and ericoid plants, we estimated these as 5–10% and 0–20%, respectively, depending on the dominant tree association, and accordingly reduced the values of AM and EcM biomass fractions in the validation calculations. While we considered this approach to be acceptable for the validation of AM and EcM data, the data quality is not high enough to validate the NM and ErM maps.

As these three datasets19,35,36 represent forest data, we evaluated whether all data points or raster cells36 were indeed located in forest areas. This was done using the ESA land cover categories data33. All data points that were located out of the current areas registers by ESA33 as forests were excluded from the analysis.

The West Australian map of mycorrhizal root abundance24 provides information about the percentage of total biomass of plant roots featuring AM and EcM root colonization and about the percentage of non-mycorrhizal root biomass. We used this data as a proxy for biomass fractions of AM, EcM and NM plants. In order to quantify the differences between the validation datasets and our maps, we have calculated the Mean Averaged Error (MAE) of the difference between our maps and validation datasets. MAE expresses the deviation between two spatial datasets from the 1:1 line. For AM, EcM and NM vegetation fractions MAE is 18.7%, for EcM it is 13.6%, for NM it is 4.7%, respectively. Due to a virtual absence of information about distribution of solely ErM vegetation, direct validation of the ErM maps was impossible.

To further assess the uncertainties in the maps, we examined which land use classes represent the data points that deviate from the observed data by more than 25% units. Our analysis showed that the large proportion of those deviations (60% for EcM and 40% for AM) fall into those land use classes that represent a poorly described mixture of evergreen or mixed forests and grasslands, i.e. ESA classes described as various forms of “closed to open (>15%) forest” (Supplementary Data 2). Further improvement of the ESA classification data will provide a possibility to improve precision of our maps.

Due to the rarity of datasets on field-examined mycorrhizal vegetation distributions at large special scale we had to validate our datasets used available data on plant species distribution and to accept a number of assumptions and/or recalculations in order to make the data comparable. These adjustments may have affected the quality of the validation dataset and therefore the validation.

Map spatial uncertainty analysis

We quantified the uncertainty in our maps of mycorrhizal vegetation fractions by applying the error propagation rules to the formulas used to calculate the biomass fractions of mycorrhizal plants per-grid cell. For this we used the data provided by refs. 35,60,61,62,63 to estimate the uncertainty associated with relative biomass of trees, shrubs, and herbaceous vegetation in each CCI land cover class. The uncertainty in the proportion of each mycorrhizal type in a given Bailey × continent combination was set to 1/√n, where n = b + c/3 + g/20. In this formula, b is a number of literature sources describing the vegetation composition in a given Bailey ecoregion × continent combination (Supplementary Data 3), while c and g are the numbers of literature sources describing vegetation composition at continent level and global levels, respectively. These latter sources were given less weight, because of their lower spatial explicitness, though these sources were used only if they were providing information relevant for the combination of Bailey ecoregion × continent under consideration. This procedure was applied to the maps of the current distribution of mycorrhizal fractions (Fig. 1) and for the distribution of mycorrhizal fractions in the cropland-free world (Supplementary Fig. 5). The resulting maps of uncertainties are shown in Supplementary Figs. 4 and 6, respectively. We calculated the uncertainties in the estimations of changes in biomass fractions of mycorrhizal vegetation induced by crop cultivation and pastures (Fig. 3) by applying the error propagation rule for the mathematical operation of subtraction. The resulting uncertainty map is shown in Supplementary Fig. 7.

Carbon stocks in the aboveground mycorrhizal biomass

In order to estimate the amount of carbon stored globally in the biomass of plants that belong to different mycorrhizal types, we multiplied the mycorrhizal type biomass fractions by the data of the global distribution of carbon stored in aboveground plant biomass obtained from passive microwave-based satellite observations38. As these data have resolution of 15 arcmin, we converted our data of mycorrhizal biomass fractions to this resolution. To calculate the total amount of carbon stored in AM, EcM, ErM and NM plants (illustrated in Fig. 2, and reported per biome in the Supplementary Table 1), we summed the data on carbon stocks per unit area globally and per biome46.

To assess uncertainty of estimations of carbon storage in mycorrhizal vegetation we used the rule of uncertainty propagation through the operation of multiplication. For this we used the uncertainty of mycorrhizal biomass fraction data and the per biome uncertainty of biomass carbon data, as provided by Liu et al.38. Given that the data of carbon stored in vegetation of different mycorrhizal types is a product of two datasets, each featuring a high uncertainty38, the values of aboveground biomass carbon stored in arbuscular, ecto-, ericoid and non-mycorrhizal vegetation are locally associated with large uncertainties (e.g. Supplementary Table 1). Therefore, we recommend to use these data exclusively for large-scale estimates of biomass carbon storage. However, following the central limit theorem, the uncertainties of the total amount of carbon stored in vegetation of each mycorrhizal type are much lower: 15, 17, 1.8 and 5.5 GT for AM, EcM, ErM and NM vegetation, respectively.

Statistical analysis

We examined whether soil carbon content was related to the biomass fractions of AM and EcM plants, using generalized linear model regressions, with topsoil (0–20 cm) or subsoil (20–60 cm and 60–100 cm) C content per m2 as a response variable, and biome and ecosystem biomass fraction of AM or EcM plants as predictors. The data for soil carbon content in the top 20 cm of soil were obtained from the ISRIC-WISE Soil Property Databases at a resolution of 30 arcsec47. As we were interested in relationships between AM or EcM coverage and soil carbon in natural vegetated environments, we excluded urban and agricultural areas, lakes and “Rock and Ice” areas according to the ESA land cover categories33, from the analysis. We also excluded wetlands, inundated areas, and extremely dry regions (deserts and Mediterranean areas), because we considered that harsh abiotic conditions instead of biotic interactions are likely to shape biogeochemical cycles in these areas; and we excluded five land cover categories for which our maps showed higher uncertainties, i.e. those where the extent of forest vs grassland cover was unclear. Those land cover categories included “Tree cover, broadleaved, evergreen, closed to open (>15%)”, “Tree cover, broadleaved, deciduous, closed to open (>15%)”, “Tree cover, needleleaved, evergreen, closed to open (>15%)”, “Tree cover, needleleaved, deciduous, closed to open (>15%)” and “Sparse vegetation (tree, shrub, herbaceous cover)”.

Data for biome types were obtained from the map of terrestrial biomes46. To create a more balanced dataset, we combined all natural grasslands into one single category and all coniferous forests into another single category. To minimize the impacts of imprecise estimations of biome borders we excluded from forest biome areas that we recognized as grasslands according to the ESA land cover data33. Similarly, we excluded grassland biome areas that were fully covered by forests and shrubs according to ref. 33. However, given that AM-EcM-NM interactions are likely to relate to C stocks in the areas featuring forest-grassland and forest-tundra mosaics, we kept such areas in the dataset. As soil carbon content data is known to feature high uncertainties47, we opted to run the analyses at a resolution of 1 arcdegree, which allow us to grasp large-scale tendencies, while reducing the problem of P-value fallacy66.

Our models comprised topsoil and subsoil carbon content as a response variable. As predictors, we examined the following combinations of data: (i) biome, biomass fraction of EcM plants, and their interactions, (ii) biome, biomass fraction of AM plants, and their interactions, (iii) biome, amount of carbon stored in the aboveground biomass of EcM plants (product of our EcM map and ref. 38), and their interactions, and (iv) biome, amount of carbon stored in the aboveground biomass of AM plants (product of our AM map and ref. 38), and their interactions. All models were examined independently and evaluated based on the Akaike Information Criterion.

In all models, we assessed the total variance explained by the model by R-squared metric and the relative importance of each predictor using the Lindemann, Merenda and Gold (LMG) metric. This metric shows the proportion of variance explained by each of model predictors within the entire variance explained.

Given significant interactions between biome factor and biomass fractions of mycorrhizal plants (Table 1), we fitted generalized linear models with EcM or AM biomass fractions as predictors and topsoil and subsoil C stocks as response variables to the data within individual biomes. The outcome of this analysis is reported in the Supplementary Table 2. As this analysis encompasses considerable errors on both axes of this regression, we have additionally checked if a model II regression would yield qualitatively similar results and confirmed that this was the case (Supplementary Table 4).

Additionally, we examined whether the amount of carbon stored in AM and EcM plants per unit area was a better predictor of soil C stocks than the fractions of AM or EcM biomass per unit area. This analysis was performed in exactly the same manner as the analysis of the impacts of AM and EcM biomass. We assessed the resulting models using the Akaike Information Criterion and R-squared, and detected that these models were worse than the those based on the fractions of AM or EcM biomass per unit area. We also examined how fractions of AM or EcM biomass per unit area, were related to soil C-to-N ratio, and detected that these relationships had a pattern very similar to that of soil C.

Dealing with spatial autocorrelations

The above-described analyses were checked for spatial autocorrelation using a Moran’s I metric. These tests revealed a spatial autocorrelation of residuals, which is expected given the nature of our soil and vegetation data with spatial points nested within biomes. We examined how incorporation of a spatial dependence structure affected our models. Accordingly, we ran generalized least square (GLS) models of the same sets of predictors as described in the section “Statistical analysis” of the relationship between soil carbon content and biomass distribution of mycorrhizal types, but included spatial correlation structures. Because calculation of the global spatial autocorrelation matix is computationally intensive, we chose a Monte-Carlo type approach. For each analysis we drew a random sample of 2% of the data points (1624 points) 100 times, taking care that all biomes are included into each sample, and subsequently ran a GLS accounting for autocorrelation structure of the data. For this we used the algorithms proposed by Zuur et al.67. The averaged outcomes of these analyses for the relationship between topsoil C, AM, and EcM fractions of vegetation biomass are shown in the Supplementary Table 5. The analyses accounting for autocorrelations yielded the same conclusions as the analyses that did not account for autocorrelations: there is a positive relationship between biomass fraction of EcM plants in vegetation and soil C, and this relationship is biome-independent (non-significant Biome × EcM interaction); in contrast, the relationship between biomass fraction of AM plants in vegetation and soil C was idiosyncratic and biome-dependent.

Including spatial coordinates explicitly in the model, or explicit accounting for autocorrelation matrix, can be problematic to interpret because covariation between spatial coordinates and environmental variables can obscure the interpretation of the relative importance of the predictors68. Given that the main goal of our analysis was to detect the global links between AM and EcM dominance and soil C, but not to predict this relationship in new areas or under future climatic scenarios we prioritized the interpretation of the models rather than their predictive power, and therfore report the outcomes of the models that do not account for autocorrelations (Table 1).

Reporting summary

Further information on research design is available in the Nature Research Reporting Summary linked to this article.