Significance Quantitative knowledge of terrestrial carbon pathways and processes is fundamental for understanding the biosphere’s response to a changing climate. Carbon allocation, stocks, and residence times together define the dynamic state of the terrestrial carbon cycle. These quantities are difficult to measure and remain poorly quantified on a global scale. Here, we retrieve global 1° × 1° carbon state and process variables by combining a carbon balance model with satellite observations of biomass and leaf area (where and when available) and global soil carbon data. Our results reveal emergent continental-scale patterns and relationships between carbon states and processes. We find that conventional land cover types cannot capture continental-scale variations of retrieved carbon variables; this mismatch has strong implications for terrestrial carbon cycle predictions.

Abstract The terrestrial carbon cycle is currently the least constrained component of the global carbon budget. Large uncertainties stem from a poor understanding of plant carbon allocation, stocks, residence times, and carbon use efficiency. Imposing observational constraints on the terrestrial carbon cycle and its processes is, therefore, necessary to better understand its current state and predict its future state. We combine a diagnostic ecosystem carbon model with satellite observations of leaf area and biomass (where and when available) and soil carbon data to retrieve the first global estimates, to our knowledge, of carbon cycle state and process variables at a 1° × 1° resolution; retrieved variables are independent from the plant functional type and steady-state paradigms. Our results reveal global emergent relationships in the spatial distribution of key carbon cycle states and processes. Live biomass and dead organic carbon residence times exhibit contrasting spatial features (r = 0.3). Allocation to structural carbon is highest in the wet tropics (85–88%) in contrast to higher latitudes (73–82%), where allocation shifts toward photosynthetic carbon. Carbon use efficiency is lowest (0.42–0.44) in the wet tropics. We find an emergent global correlation between retrievals of leaf mass per leaf area and leaf lifespan (r = 0.64–0.80) that matches independent trait studies. We show that conventional land cover types cannot adequately describe the spatial variability of key carbon states and processes (multiple correlation median = 0.41). This mismatch has strong implications for the prediction of terrestrial carbon dynamics, which are currently based on globally applied parameters linked to land cover or plant functional types.

The terrestrial carbon (C) cycle remains the least constrained component of the global C budget (1). In contrast to a relatively stable increase of the ocean CO 2 sink from 0.9 to 2.7 Pg C y−1 over the past 40 y, terrestrial CO 2 uptake has been found to vary between a net 4.1-Pg C y−1 sink to a 0.4-Pg C y−1 source, and accounts for a majority of the interannual variability in atmospheric CO 2 growth. The complex response of terrestrial ecosystem CO 2 exchanges to short- and long-term changes in temperature, water availability, nutrient availability, and rising atmospheric CO 2 (2⇓⇓⇓–6) remains highly uncertain in C cycle model projections (7). As a result, there are large gaps in our understanding of terrestrial C dynamics, including the magnitude and residence times of the major ecosystem C pools (8, 9) and rates of autotrophic respiration (10). Moreover, the impact of climatic extremes on C cycling, such as recent Amazon droughts (11), highlights the importance of understanding the terrestrial C cycle sensitivity to climate variability. To understand terrestrial CO 2 exchanges in the past, present, and future, we need to better constrain current dynamics of ecosystem C cycling from regional to global scales.

C uptake, allocation, pool stocks, residence times, respiration, and disturbance together drive net CO 2 exchanges (12) on subdaily to millennial timescales; these C state and process variables also determine the temporal sensitivity of the net C balance to climatic variability. For example, global changes in photosynthetic uptake could lead to a rapid response from short-lived C pools (such as foliage, fine roots, and litter) or a prolonged response from the long-lived C pools (such as woody biomass and soil C), with very different outcomes on ecosystem source–sink behavior. Quantitative knowledge of terrestrial C pathways is, therefore, central to understanding the temporal responses of the major terrestrial C fluxes—including heterotrophic respiration (13), fires (14, 15), and wetland CH 4 emissions (16, 17)—to interannual variations in C uptake.

Although C dynamics have been extensively measured and analyzed at site level (18⇓⇓–21), the respiration and allocation of fixed C and its residence time within the major C pools are difficult and expensive to measure at site level and remain poorly quantified on global scales. As a result, global terrestrial C cycle models rely on land cover type-specific C cycling parameters—based on spatially preassigned plant functional types—to determine C fluxes and C pools (22). Globally spanning C cycle observations can provide a much-needed constraint on the spatial variability and associated dynamics of the terrestrial C cycle. Over the past decade, a growing number of datasets has enhanced understanding of the terrestrial C cycle, including global-scale canopy dynamics [National Aeronautics and Space Administration Moderate Resolution Imaging Spectroradiometer (MODIS) leaf area index (LAI)], empirically derived global soil C data [Harmonized World Soil Database (HWSD)] (23), satellite-based above- and belowground biomass (ABGB) maps for the tropics (24, 25), and Greenhouse Gases Observing Satellite CO 2 and plant fluorescence (26, 27). These spatially and temporally explicit datasets provide an enhanced view of the terrestrial C cycle and can be used together to retrieve consistent global C state and process variables. Significant efforts in data-driven estimates of the global C fluxes have been made over the past decade. These efforts include estimates based on atmospheric CO 2 concentrations (1, 28, 29), high-resolution global primary production maps (30) based on eddy covariance tower datasets (FLUXNET) (18), mean residence time of terrestrial C (31), ecosystem respiration dependence on temperature based on FLUXNET data (32), and global C cycle data assimilation systems (33).

Given an increasing number of C cycle observations, what remains an outstanding challenge is to produce a data-consistent analysis of terrestrial C cycling—including retrievals of C fluxes, C pools, autotrophic respiration, allocation fractions, and residence times—based on multiple global-scale earth observations and datasets. Current global-scale terrestrial biosphere models, because of their complexity and structures, are ill-equipped to ingest an ever-increasing volume of earth observations to estimate (rather than prescribe) model parameters based on the currently available observations. To overcome this challenge, we use a model–data fusion (MDF) approach to retrieve terrestrial C state and process variables during the period 2001–2010 without invoking plant functional type or steady-state assumptions. We bring together global MODIS LAI, a tropical biomass map (24), a soil C dataset (23), MODIS burned area (34), and a diagnostic ecosystem C balance model [Data Assimilation Linked Ecosystem Carbon Model version two (DALEC2)] (19, 35) to retrieve C state and process variables by producing a novel data-consistent and spatially explicit analysis of terrestrial C cycling on a global 1° × 1° grid (Fig. 1) [we henceforth refer to this MDF setup as the C data model framework (CARDAMOM)]. Specifically, we address the following questions: How is C uptake partitioned between the live biomass pools and respiration? What is the residence time of C within the major ecosystem C pools? How do estimates of C cycle states and processes vary spatially, and to what degree do emergent variable patterns match land cover maps? We use a Markov Chain Monte Carlo MDF algorithm to retrieve C state and process variables—and their associated uncertainty—within each 1° × 1° grid cell (Materials and Methods). The MDF approach retrieves the state and process variables that minimize the model mismatch against any available C cycle observations. Therefore, in the absence of extratropical biomass data or wintertime MODIS LAI observations, estimates of 2001–2010 C cycle state and process variables are achievable, albeit more uncertain.

Fig. 1. Diagnostic ecosystem C balance model DALEC2 (19, 35) and datasets used to retrieve 1° × 1° C state and process variables. GPP, a function of climate and foliar C, is partitioned into autotrophic respiration (Ra) and NPP. NPP is partitioned into the live biomass pools. Plant mortality provides input to the DOM pools. Heterotrophic respiration (Rh) is derived from decomposing DOM pools. Fire fluxes are derived from burned area data (35) and all C pools (SI Text, section S2). Within each 1° × 1° grid cell, we use a Bayesian MDF algorithm to retrieve C state/process variables and uncertainties; variables are retrieved without prior land cover type or steady-state assumptions. Data constraints consist of MODIS leaf area, total biomass (24) (tropics only), and soil C (23). Details on the Bayesian fusion approach are provided in Materials and Methods.

Results Distinct C allocation patterns emerge from our terrestrial C analysis (Fig. 2). Net primary production (NPP) allocation to structural biomass (wood and fine roots) is largely ≥80% (area-weighted 25th to 75th percentile range = 85–88%) in the wet tropics (<23° N/S; annual precipitation >1,500 mm) in contrast to the dry tropics (77–87%) and extratropical regions (73–82%). The highest NPP allocations to foliage (≥30%) spatially coincide with major grassland areas, including the North America prairies, the central Asia steppes, and the Sahel region in Africa. The dry tropics exhibit relatively high NPP allocation to labile C (7–14%) (Fig. S1), which reflects the increasing impact of seasonality on production as precipitation declines, requiring labile C stores for leaf flush. C use efficiency (CUE; equivalent to 1 − autotrophic respiration fraction) is overall lowest within the wet tropics (0.42–0.44) in contrast to dry tropical (0.45–0.50), temperate (23–55° N/S; 0.47–0.50), and high-latitude (>55° N/S; 0.49–0.50) areas. Fig. 2. Retrievals of NPP allocation to structural (wood and fine roots) and photosynthetic (labile and foliage) C pools. Allocation fractions were retrieved at 1° × 1° using a Bayesian MDF approach (Fig. 1). The GPP allocation fraction retrievals at locations B, T, D, and W are shown on the Right (black dot, median; box, 50% confidence range; line, 90% confidence range). Fig. S1. (Left) Posterior GPP C allocation to autotrophic respiration (equivalent to 1 − CUE), labile C, foliar C, fine roots, wood (mean; column 1), and associated uncertainty (SD; column 2). (Center) Posterior C residence time in foliar C; fine roots, wood, litter, and soil C (log-based mean; column 1); and associated uncertainty factors (based on logarithmic SD; column 2). (Right) Posterior mean 2001–2010 C stocks in labile, foliar, fine roots, wood, and litter C pools (mean; column 1) and associated uncertainties (SD; column 2). Live biomass and dead organic C residence times exhibit contrasting spatial features (r = 0.3) (Fig. 3). Within the majority of wet tropical land area (56%)—especially across most of the Amazon River (76%) and Congo River (69%) basins—the longest C residence time occurs within the woody pool (Fig. S1). In the dry tropics and extratropical latitudes, soil C residence times exceed wood C residence time by a median factor of 2.6 (1.6–4.3). Woody residence time is typically shorter in the dry tropics (8–19 y) compared with other biomes (wet tropics: 12–21 y; temperate latitudes: 21–29 y; and high latitudes: 25–28 y). Litter C residence time is typically longer in extratropical ecosystems (0.8–1.6 y) compared with tropical ecosystems (0.4–0.5 y). The longest foliar residence time (or leaf lifespan) occurs in the wet tropics and semiarid regions (Fig. S1). Fig. 3. Retrievals of C residence time (RT) in live biomass and dead organic C pools; residence times are retrieved at 1° × 1° using a Bayesian MDF approach (Fig. 1). Brown denotes ecosystems with high residence times for all C pools, green denotes ecosystems with long live biomass C residence times, and orange denotes ecosystems with low live biomass residence time. The residence times for individual C pools at locations B, T, D, and W are shown on the Right (black dot, median; box, 50% confidence range; line, 90% confidence range). Mean C residence times in ref. 31 are shown as gray boxes (50% confidence intervals) and black dots (medians). Overall, the wet tropics are characterized by relatively high structural C (>100 tC ha−1) and photosynthetic C (>2.5 tC ha−1) (Fig. 4): in contrast, the dry tropics and extratropical regions exhibit less structural and/or photosynthetic C. Foliar C stocks are typically larger in the wet tropics (2.8–4.7 tC ha−1) relative to other biomes (0.2–0.6 tC ha−1); similarly, fine root stocks are also greater in the wet tropics (4.0–5.3 tC ha−1) compared with other biomes (0.8–2.7 tC ha−1). Root:shoot (fine root C:leaf C) is lowest in the wet tropics (1.1–1.5) followed by the dry tropics (1.6–1.9) and extratropics (1.8–2.1). We find larger woody C uncertainties (1° × 1° 90% confidence range/median) in the extratropics (1.8–4.6) in contrast to tropical woody C (1.4–1.6) because of the latitudinal limits of the total ABGB map (24). Litter C is greater in high latitudes (2.4–3.4 tC ha−1) relative to temperate (0.6–2.4 tC ha−1) and tropical (0.2–2.6 tC ha−1) regions. High-latitude ecosystems have higher labile C stocks linked to seasonal leaf expansion (0.2–0.5 tC ha−1) relative to temperate (0.1–0.3 tC ha−1) and tropical (0.1–0.3 tC ha−1) ecosystems. Fig. 4. Retrieved mean photosynthetic (foliar and labile) and structural (wood and fine roots) C pool stocks; C stocks are retrieved at 1° × 1° using a Bayesian MDF approach (Fig. 1). Retrieved mean C stocks for each pool at locations B, T, D, and W are shown on the Right (black dot, median; box, 50% confidence range; line, 90% confidence range). Dark colors denote high-structural C/high-photosynthetic C ecosystems, green denotes low-structural C/high-photosynthetic C ecosystems, red denotes low-photosynthetic C/high-structural C ecosystems, and yellow denotes low-photosynthetic C/low-structural C ecosystems. We find high leaf C mass per leaf area (LCMA) values in the wet tropics (85–97 gC m−2) and semiarid regions, such as the Sahel, southwestern United States, and the Australian continent (typically >100 gC m−2) (Fig. 5); LCMA estimates are lower (typically <80 gC m−2) in high latitudes and the dry tropics. We find a positive correlation between leaf lifespan and LCMA in high-latitude (r = 0.79), temperate (r = 0.80), dry tropical (r = 0.78), and wet tropical (r = 0.64) areas. Fig. 5. (Upper Left) Retrieved median 1° × 1° LCMA (in grams C per meter−2). (Upper Right) Zonal mean of median LCMA and 50% confidence range (CR). (Lower) LCMA against leaf lifespan for high latitudes (>55° N/S), temperate regions (23°–55° N/S), dry tropics (precipitation <1,500 mm; <23° N/S), and wet tropics (precipitation >1,500 mm; <23° N/S). Global gross primary production (GPP; global 25th to 75th percentile = 91–134 Pg C y−1), ecosystem respiration (91–137 Pg C y−1), and fires (1.3–2.0 Pg C y−1) are broadly consistent with a terrestrial C model ensemble (22), data-driven estimates (36), and bottom-up inventories (37) (Fig. S2). The net C exchange uncertainty (−8 to +13 Pg C y−1) is an order of magnitude greater than mode net C exchange (NCE; −2 Pg C y−1); NCE latitudinal uncertainty is larger but comparable with the terrestrial C model ensemble range. Global atmospheric model CO 2 concentrations based on CARDAMOM mode NCE fluxes are seasonally consistent [r2 = 0.93; root-mean-square error (RMSE) = 0.53 ppm CO 2 ] with mean total column CO 2 measurements (38) (Fig. S3). The mean integrated C residence time in ref. 31 is within the range of individual pool residence times at locations B, T, D, and W (Fig. 3). The 2001–2010 CARDAMOM analysis spatial and temporal LAI variability is consistent with the MODIS LAI constraints (r2 = 0.8; RMSE = 0.6 m2/m2). When alternative GPP (36), alternative model structure, or biased data constraints (±20%) are imposed at locations B, T, D, and W, 88% of median sensitivity analysis estimates are within ±50% of median C state and process variable retrievals (Fig. S4). Fig. S2. CARDAMOM zonal profiles of median GPP, ecosystem respiration, fires, and NCE (red). The 50% confidence range (C.R.) is depicted as a light gray-shaded area. The blue lines represent the eight global MsTMIP models (62) (details in SI Text, section S7). The dashed black line denotes the flux tower-derived GPP (36). The continuous black line denotes the Global Fire Emissions Database version 3 (GFEDv3) total C emissions (36). Fig. S3. The 2009–2010 GEOS-Chem model with CARDAMOM mode NCE compared against mean monthly total C column observing network (TCCON) atmospheric column measurements across 12 TCCON sites. Left shows atmospheric CO 2 concentrations, and Right shows the linearly detrended CO 2 anomalies. The detrended comparison Pearson’s r = 0.93, and RMSE = 0.53 ppm. JFM, AMJ, JAS, and OND denote consecutive 3-month periods between January and December. Fig. S4. Posterior allocation fractions (alloc), residence times (RT), carbon pools (C), and leaf C mass per leaf area (LCMA) median and 50% confidence ranges shown for 1° × 1° grid cells B, T, D, and W for the unperturbed results (S0) and sensitivity experiments S1–S12. The coordinates of B, T, D, and W are reported in Materials and Methods (locations shown in Inset). Across all locations, 88% of median sensitivity analysis estimates (sensitivity tests S1–S12; SI Text, section S4) are within ±50% of unperturbed median C state and process variable retrievals. Retrieved C cycle variables are broadly consistent with a range of in situ measurements (Table S1). Estimates of CUE within the Amazon River basin are comparable with the upper bound of recent measurements (0.32–0.47) (39). Recent estimates of extratropical forest C density (40) are, on average, 38% lower than CARDAMOM total biomass estimates within forested areas (although the forest biomass estimates are typically within the CARDAMOM 1° × 1° uncertainty). Estimates of mean Amazon woody C residence times (15–21 y) are lower but comparable with aboveground woody C residence times derived from site-level measurements (∼20–70 y) (20). Table S1. In situ observations and CARDAMOM posterior state and process variable estimates We find that 88–99% of C state and process variability is accounted for by eight empirical orthogonal basis functions (EOFs) (Fig. 6); in other words, retrieved C state and process variables are largely explained by eight modes of spatial variability (Fig. S5). On average, the Global Land Cover Map (GLOBCOVER) land cover classifications (41) (e.g., deciduous forests, evergreen forests, and grasslands) account for <50% of C state and process variability (median multiple correlation coefficient R = 0.41); GLOBCOVER land cover types best describe spatial variations in C stocks (0.5 ≤ R ≤ 0.8) followed by LCMA (R = 0.4), residence times (0.3 ≤ R ≤ 0.5), and allocation fractions (0.1 ≤ R ≤ 0.4). Fig. 6. Multiple correlation coefficients (R; x axis) of retrieved C state and process variables—allocation fractions (AF), residence times (RT), mean C pools, and LCMA (y axis)—against 18 GLOBCOVER land cover fractions and C variable primary EOFs. R denotes the ability of GLOBCOVER land cover types and primary EOFs to predict 1° × 1° state and process variables (R would equal one if all C state and process variables could be expressed as a linear sum of land cover fractions or EOFs). Fig. S5. Maps show eight primary 1° × 1° standardized EOFs 1–8 derived from a principal component analysis of standardized C state and process variables (SI Text, section S6). The two dominant modes (EOFs 1 and 2) together reflect first-order global variations in C state/process variables (c s ) as a result of latitude and precipitation, whereas higher-order modes reflect increasingly complex spatial structures (we note, however, that EOFs 3–8 typically account for a smaller portion of c s spatial variability). Scatterplots show standardized EOFs 1–8 coefficients corresponding to each C state/process variable (shown as symbol–color combinations). The linear sum of standardized EOFs 1–4 (1–8) and the associated coefficients reproduce 29–95% (88–99%) of C state/process variability (Fig. 6).

Discussion Typically, C allocation and residence time parameters are based on land cover types in global-scale terrestrial C cycle studies (refs. 9 and 22 among others); here, spatially broad allocation and residence patterns emerge instead as a result of the MDF approach. For example, high-biomass ecosystems throughout the wet tropics display similar C allocation, residence time, and LCMA configurations (Figs. 2–5). Similarly, we find that dead organic matter (DOM) C residence is generally longer in high latitudes (Fig. 3). Compared with conventional land cover types, EOFs 1–4 account for a larger degree of the spatial structures in retrieved C variables (Fig. 6); for most variables, the two dominant EOF modes—which together reflect first-order variations in latitude and global precipitation patterns (Fig. S5)—explain more spatial variability than GLOBCOVER land cover types. The mismatch between land cover types and retrieved variables has major implications for the estimation and prediction of terrestrial C cycling, which is currently based on small sets of globally applied parameters linked to land cover types. The importance of climate, biodiversity, fire, and anthropogenic disturbance in generating these mismatches needs to be explored in additional research (42). It also is clear that plant traits vary across biomes (Figs. 2–4 and Fig. S1), not just at biome boundaries (43), and that there are continental-scale tradeoffs and correlations among traits (44). Our analysis is consistent with these viewpoints: for example, the emergent relationship between LCMA (proportional to leaf mass per area) and leaf lifespan (Fig. 5) matches the positive correlation found in global plant trait datasets (45). Evaluating global plant trait patterns emerging from CARDAMOM provides a novel opportunity for connections to theoretical and functional biodiversity research and a route to integrating this knowledge into predictive terrestrial C cycle modeling. The residence times of major C stocks provide substantial insights into the sensitivity and potential future trajectories of the terrestrial C cycle. For example, land cover changes in the wet tropics may result in rapid DOM C losses given the relatively short DOM residence times (<30 y) (Fig. 3). In contrast, high-latitude C residence times are an order of magnitude higher (30–300 y), and therefore, shifts in C allocation or turnover rates are likely to result in long-lived C flux responses. Overall, given the predominant role of C residence times in future terrestrial uptake responses (9), the derived residence times provide a first-order estimate of ecosystem response times as a result of changes in C cycling regimes. However, we note that model structure is likely to be a major source of uncertainty in long-lived (>10 y) C flux predictions. For example, although reduced complexity models can capture some of the principal long-term (>10 y) DOM dynamics represented in earth system models (8), systematic errors in DOM dynamics can arise because of the underrepresentation of processes controlling DOM residence times (46, 47). We also note that our decadal analysis is unlikely to be able to capture slow feedback processes acting on longer timescales, such as permafrost remobilization and priming (48). The large allocation and stocks and short residence time of wood in the wet tropics indicate the potentially rapid postdisturbance regrowth and C accumulation (49). We note that fires are less frequent but major events within boreal ecosystems (50), and therefore, longer time periods are required for retrievals to fully account for the effect of fires on high-latitude C residence times. C state and process variable retrievals are sensitive to the uncertainty characteristics of C cycle observations (35) and the prior parameter ranges (Table S2). We highlight that the current coverage and accuracy of C cycle observations (24, 51) remain major limiting factors in our approach. For example, extratropical C stock and residence time uncertainties are higher because of the absence of biomass observations. Undoubtedly, future estimates of globally spanning biomass density will provide a major constraint on CARDAMOM estimates of extratropical C state and process variables (52). Table S2. DALEC2 parameters, descriptions, and prior ranges (the DALEC2 equations are fully described in ref. 35) Land to atmosphere C flux estimates could be used to further constrain CARDAMOM C fluxes (Fig. S2) and C cycle variables associated to nonsteady C states. For example, soil C residence time samples are negatively correlated with corresponding mean 2001–2010 NCE samples at locations B (r = −0.3), T (r = −0.4), D (r = −0.5), and W (r = −0.3); therefore, regional- or grid-scale estimates of NCE could provide a much-needed additional constraint on soil C residence time. CARDAMOM flux magnitude and uncertainty can be used as prior information in global atmospheric CO 2 inversions; in turn, the assimilation of Greenhouse Gases Observing Satellite (26) and Orbiting Carbon Observatory 2 atmospheric CO 2 observations (53) should further constrain CARDAMOM NCE estimates and their associated uncertainties. In this manner, nonsteady-state C fluxes can ultimately be reconciled with ecosystem state and process variables, such as C stocks and residence times. The CARDAMOM approach provides a framework to test alternative model structures (54): in this manner, combined C cycle model parametric and structural uncertainties can be characterized, while ensuring consistency between models and global-scale datasets. This assessment would amount to a major step forward from conventional C cycle model intercomparison studies. Ultimately, an ensemble of models can be used to determine the degree to which retrievals of key C state and process variables are model-dependent. Moreover, alternative model structures could be used in CARDAMOM to assimilate globally spanning plant traits related to C cycling (55) and satellite observations, such as solar-induced fluorescence (27), vegetation optical depth (56), soil moisture (57, 58), and changes in aboveground biomass (25, 59, 60). We anticipate that the incorporation of additional datasets and alternative model structures into CARDAMOM will generate quantifiable reductions in retrieved C variable uncertainties and new ecological insights on the state of the terrestrial C cycle.

SI Text S1. Global 1° × 1° Grid MDF. Global datasets. We grid the 30-s HWSD soil C density (based on national inventories of the top 1-m soil bulk density and organic C content) (23) and an ∼1 × 1-km above- and belowground pantropical biomass map (24) at 1° × 1°. We grid the MOD15A2 MODIS LAI product (1 × 1 km) and an MODIS burned area product (0.25° × 0.25°) (34) at a 1° × 1° monthly resolution for each month within the period 2001–2010. Although finer spatial/temporal resolutions can potentially be implemented, we chose a 1° × 1° monthly resolution as a consequence of the computational cost of our approach. We use European Centre for Medium-Range Weather Forecasts (ECMWF) Reanalysis Interim (ERA-interim) 1° × 1° monthly reanalysis products as DALEC2 drivers; ref. 35 has MODIS quality flag and ERA-interim driver details. We exclude 1° × 1° grid cells where desert and ice-covered areas account for more than 90% of the grid cell land cover (based on GLOBCOVER 2009 global land cover maps) (41), because their role in the terrestrial C cycle is negligible. MDF. Within each 1° × 1° grid cell i, we use the 1° × 1° aggregated biomass (tropics only), soil C, and MODIS LAI datasets (observations O i ) to constrain DALEC2 parameter x i (for a complete description of the DALEC2 model and C pools, we refer the reader to ref. 35 and references therein). We implement a Metropolis–Hastings Markov Chain Monte Carlo (MHMCMC) (33, 35) to determine the probability of x i given observational constraints O i (expression 1). The prior ranges of DALEC2 parameter x i are shown in Table S2. We also imposed a prior log-normal distribution on autotrophic respiration fraction x i,a (autotrophic respiration = 0.5 × 1.2±1) and a prior log-normal distribution on canopy efficiency x i,c (canopy efficiency parameter = 17.5 × 1.2±1), where ±1 represents a normal distribution with mean = 0 and variance = 1. These constraints yield a range of results that is broadly consistent with the global GPP range reported in ref. 30 and represent the range of autotrophic respiration estimates reported in ref. 61. The prior parameter probability p(x i ) is, therefore, expressed as p ( x i ) = p BW ( x i ) e − 0.5 ( log ( x i , a ) − log ( 0.5 ) log ( 1.2 ) ) 2 e − 0.5 ( log ( x i , c ) − log ( 17.5 ) log ( 1.2 ) ) 2 , [S1]where p BW (x i ) is the prior parameter probability described in ref. 35. Within each 1° × 1° grid cell, we prescribe an uncertainty factor of 1.5 to mean 2001–2010 HWSD soil C and total ABGB density (i.e., mean labile + foliar + fine roots + wood) and an uncertainty factor of 2 to mean monthly MODIS LAI observations. For total biomass, given that the maximum entropy algorithm used in ref. 24 was based on bins of 12.5 tC ha−1, we anticipate that low-biomass density values (such as the edges of the Sahel and Kalahari Deserts) exhibit comparable uncertainty. We, therefore, prescribe an uncertainty factor of max(1.5, 12.5/B i ), where B i is the total biomass density, and the max function denotes the maximum of the two values. The likelihood function p(O i |x i ) is, therefore, expressed as p ( O i | x i ) = exp ( − 0.5 ∑ j = 1 N ( M i j − O i j U i j ) 2 ) , [S2]where O ij and U ij are the jth observations and uncertainty factors at location i, and M ij is the equivalent DALEC2 model output based on parameter vector x i [we note that O ij , U ij , and M ij are log-transformed; e.g., for a soil C value of 100 tC ha−1, O ij = log(100) and U ij = log(1.5)]. For each LAI observation, M ij is the DALEC2 foliar C (on the corresponding month) divided by LCMA. For biomass and soil C, M ij is the DALEC2 soil C stock and mean live biomass (labile + foliar + fine roots + wood) on January 1, 2001. For the analytical description of DALEC2, the MHMCMC algorithm, and p BW (x i ), we refer the reader ref. 35 and references therein; the DALEC2 fire module is described in SI Text, section S3. Ecological and dynamical constraints. The 12 ecological and dynamical constraints (EDCs; EDCs 1–12) (35) are a component of the prior parameter probability p BW (x i ), and consist of relative constraints on allocation parameters, turnover rates, growth rates, exponential decays, and steady-state proximity. When steady state is not assumed, steady-state proximity conditions are necessary to distinguish between real and nonsensical C pool trajectories (35). We developed a simpler numeric equivalent of the steady-state proximity EDCs (EDCs 9–12) to account for the stochastic C losses from fires. For each pool, we derive the steady-state proximity factor ( S p r o x ) as follows: S p r o x = C i n p u t ¯ C o u t p u t ¯ , [S3]where C i n p u t ¯ and C o u t p u t ¯ are the mean inputs and outputs, respectively, from each pool. We impose a steady-state proximity condition of 0.5 > S p r o x > 2 for each pool. We found that EDC 8—the EDC limiting rapid exponential pool trajectories—was excessively rigid for relatively small amounts of exponential pool trajectories (which can occur naturally and/or as a model artifact). Here, we use a simpler approach to minimize the rapid exponential decay of C pools: we ensure that the steady-state proximity of each C pool at time 0 − S prox(jan2001) is within 0.05 of S prox ; i.e., | S p r o x − S p r o x ( j a n 2001 ) | < 0.05. [S4] S p r o x ( j a n 2001 ) can be derived as S p r o x ( j a n 2001 ) = S p r o x × C j a n 2001 − 2010 ¯ C j a n 2001 , [S5]where C j a n 2001 − 2010 ¯ is the mean January C pool stock and C (jan2001) is the C pool stock in January of 2001. The CARDAMOM code used in this manuscript (DALEC2 model, EDCs, and adaptive MHMCMC) is available on request. S2. DALEC2 Fire Module. To determine the monthly C losses from fires at time t, we determine the monthly fraction of each grid cell burned, B area(t) , based on the MODIS-derived burned area product (34). At each monthly time step, the fire losses within each 1° × 1° grid cell are derived as F e ( t ) = B a r e a ( t ) × ∑ p = 1 6 k f a c t o r ( p ) C ( p , t ) , [S6]where F e(t) is the total fire C emissions at time t, k factor(p) is the combustion factor for pool p, and C (p,t) is the C in pool p at time t. We also impose a resilience factor r to the remaining pools within the burned area: for the live biomass pools, a fire–mortality flux is derived from the uncombusted C pools as F m ( t , p ) = B a r e a ( t ) × ( 1 − k f a c t o r ( p ) ) ( 1 − r ) C ( p , t ) . [S7]The fire–mortality C flux from foliage, roots, and labile is deposited into the litter pool, and fire–mortality C flux from wood is transferred to the soil C pool. Equally, 1 − r of uncombusted litter C is transferred to the soil C pool. The k factor values for labile (0.1), foliar (0.9), root (0.1), wood (0.1), litter (0.5), and soil C (0.01) are broadly equivalent to the k factor values used by the Global Fire Emission Database (37). We apply a resilience factor of r = 0.5. The sensitivity calculations associated with k factor(p) and r f are described in SI Text, section S4. S3. Global State and Process Variables. The spatial distributions of individual C pool allocation fractions, residence times, and stocks are shown in Fig. S1. The residence time for each C pool at grid cell i is derived as R T p o o l ( j ) = C p o o l ( j ) F i n ( j ) − Δ C p o o l ( j ) × 365.25 , [S8]where C p o o l ( j ) is the mean pool size, F i n ( j ) is the mean daily C pool input, and Δ C p o o l ( j ) is the mean daily change in pool size throughout 2001–2010 for the jth parameter vector sample of y i [i.e., C p o o l ( j ) , F i n ( j ) , and Δ C p o o l ( j ) are calculated from DALEC2 output driven with jth parameter vector sample of y i ]. Mean live biomass (DOM) pool residence times are derived based on Eq. S8, where C p o o l ( j ) , F i n ( j ) , and Δ C p o o l ( j ) are the total live biomass (DOM) C corresponding to jth parameter vector sample of y i . Leaf lifespan is equivalent to RT foliar . Reported global and zonal 25th to 75th percentile ranges of total annual fluxes were derived from the sum of monthly 1° × 1° 25th and 75th percentiles for each flux multiplied by the 1° × 1° grid cell area; the same approach was used to derive median fluxes and mode NCE. Monthly mode NCE within each 1° × 1° grid cell was derived by binning NCE samples into 0.01 gC m−2 d−1 intervals. S4. Sensitivity Tests. We determine the sensitivity of C allocation, residence times, and C pool size estimates at locations B, T, D, and W (the B, T, D, and W coordinates are reported in the Materials and Methods) to LAI, biomass, and soil C data constraints (sensitivity tests S1–S6); fire combustion and resilience factor coefficients (sensitivity tests S7–S10); the use of Max Planck Institute GPP product (MPI GPP) (36) instead of the default DALEC GPP (19) (sensitivity test S11); and the suppression of heterotrophic respiration under −10 °C (sensitivity test S12). The sensitivity experiments are summarized in Table S3. The results of the sensitivity tests are shown in Fig. S4. S5. Comparison Against in Situ and Regional Observations. CARDAMOM results are compared against a range of in situ measurements in Table S1. We compare each in situ measurement against the 50% and 90% confidence ranges of the mean 1° × 1° values within the stated region. Comparison details and footnotes are included in Table S1. We also compare CARDAMOM total biomass against the extratropical Envisat Advanced Synthetic Aperture Radar forest biomass map (BIOMASAR map) (40) aggregated to 1° × 1°. The CARDAMOM to BIOMASAR comparison is conducted for the total biomass across all 1° × 1° areas with at least 95% BIOMASAR map coverage; total BIOMASAR biomass within those areas is 38% lower than CARDAMOM biomass. We note that lower than expected LCMA estimates in boreal ecosystems (Fig. 6) could be explained by (i) understory plant traits (linked to deciduous shrubs) or (ii) seasonal MODIS LAI biases (52). In particular, the significant correlation between LCMA and leaf lifespan suggests that retrieved LCMA accuracy could be strongly linked to seasonal biases in MODIS LAI. S6. Comparison with GLOBCOVER Land Cover Types and EOFs. For each 1° × 1° grid cell I, we determine the fraction of each GLOBCOVER (41) land cover type L: F L(i) . We then determine the Pearson’s correlation coefficients (r LS values) between f L (the vector of all 1° × 1° land cover type L fractions) and each C state and process variable vector c S (the vector of each 1° × 1° state and process variable): state or process variables (denoted by subscript S) consist of allocation fractions, C residence times, C pool sizes, and LCMA. The r LS 2 values between each GLOBCOVER land cover type fraction L and each C state/process variable S are shown in Fig. S6. The land cover categories are irrigated croplands, rain-fed croplands, mosaic cropland > vegetation, mosaic vegetation > cropland, closed to open broadleaved evergreen or semideciduous forest, closed broadleaved deciduous forest, open broadleaved deciduous forest/woodland, closed (>40%) needle-leaved evergreen forest, open needle-leaved deciduous or evergreen forest, closed to open mixed forest, mosaic forest or shrub land > grassland, mosaic grassland > forest or shrub land, closed to open shrub land, closed to open herbaceous vegetation, sparse vegetation, closed to open broadleaved forest regularly flooded, forest or shrub land permanently flooded, and closed to open vegetation on flooded or waterlogged soil (53). The multiple correlation coefficient R S between C state/process variable S and 18 GLOBCOVER land cover type fractions is derived as R S = r S T R LL − 1 r S , [S9]where r S is the 1 × 18 vector of correlations coefficients between state/process variable vector c S and 18 1° × 1° land cover type fraction vectors f L , r s T is the transpose of r s , and R LL − 1 is the inverse of the correlation matrix R LL , which contains the intercorrelations between 18 land cover type fraction vectors f L . R S is equivalent to the maximum correlation (Pearson’s r2) between the spatial variability of C state/process variable c S and the best-fitting linear combination of land cover type fraction f L . The resulting R S values are shown in Fig. 6. We also use a multiple correlation coefficient analysis on the empirical orthogonal functions (EOFs; “primary modes” of variability) of all c s . We conducted a principal component analysis to derive the eight primary EOFs [EOFs were derived using the pca.m function in Matlab; each c s vector is centered at zero and scaled to the standard deviation (SD) of c s ]. Standardized EOFs (normalized by EOF SD) and EOF coefficients are shown in Fig. S5. The EOF maps exhibit the primary modes of c s variability in space; for each c s , the maximum spatial variability explained by EOFs 1−N is the sum of standardized EOFs 1−N multiplied by their associated coefficients. EOF multiple correlation coefficients—R S(EOF) —were derived for the primary two, four, and eight EOFs based on Eq. S9, where R LL is the identity matrix (given that EOFs are orthogonal). R S(EOF) results are shown in Fig. 6. S7. Comparison Against the Multiscale Synthesis and Terrestrial Model Intercomparison Project. We compare GPP, ecosystem respiration, and NCE against the Multiscale Synthesis and Terrestrial Model Intercomparison Project (MsTMIP) terrestrial biosphere model ensemble version 1.0 (62) NCE (note that total C exchange is reported as net ecosystem exchange by MsTMIP). The 0.5° × 0.5° monthly GPP, total (heterotrophic and autotrophic) respiration, and NCE values for 2001–2010 based on the BG1 simulation were downloaded from nacp.ornl.gov/mstmipdata/ and aggregated to a 1° × 1° grid [the BG1 simulation includes time-varying nitrogen deposition, atmospheric CO 2 , and land use history (22)]. The eight MsTMIP models are shown in Fig. S2: for the sake of brevity, we did not label individual models. S8. Atmospheric CO 2 Comparison. We incorporated the 2009–2010 CARDAMOM monthly mode NCE values into the Goddard Earth Observing System 4D chemistry and transport model (GEOS-Chem) (29). The GEOS-Chem model simulations are based on GEOS-Chem version 8.2 driven by the National Aeronautics and Space Administration GEOS-5 meteorological fields. In addition to NCE, fossil fuel emissions and oceanic surface CO 2 fluxes are prescribed (55). We compared the 2009–2010 GEOS-Chem model CO 2 concentrations against the monthly mean anomaly across 12 total C column observing network sites (38): Bialystok, Poland; Darwin, Australia; Eureka, Canada; Garmisch, Germany; Karlsruhe, Germany; Lauder, New Zealand; Lauder, New Zealand; Lamont, Oklahoma; Orleans, France; Park Falls, Wisconsin; Sodankyla, Finland; and Wollongong, Australia. Details of the GEOS-Chem total C column observing network comparison are reported in ref. 63 and references therein. We note that the uncertainty in the GEOS-Chem trend stemming from the CARDAMOM flux uncertainty is substantial: global NCE 25th to 75th percentile = −8 − +13 Pg C y−1, which roughly corresponds to a ±5-ppm growth rate (1). To evaluate the CARDAMOM seasonal NCE variability, we compare the linearly detrended model and observations (Fig. S3).

Acknowledgments A.A.B., J.-F.E., L.F., and M.W. were funded by the NERC National Centre for Earth Observation. I.R.v.d.V. was financially supported under The Netherlands Organization for Scientific Research Project VIDI: 864.08.012. This work made use of the Edinburgh Compute and Data Facility resources. The research leading to these results received funding from European Union’s FP7 (2007–2013) Grant 283080 (Project GEOCARBON). The Total Carbon Column Observing Network (TCCON) is supported by the National Aeronautics and Space Administration (NASA) Carbon Cycle Science Program through a grant to the California Institute of Technology. Part of this research was carried out at the Jet Propulsion Laboratory, California Institute of Technology under a contract with NASA.

Footnotes Author contributions: A.A.B. and M.W. designed research; A.A.B., J.-F.E., I.R.v.d.V., and L.F. performed research; A.A.B., J.-F.E., I.R.v.d.V., L.F., and M.W. contributed analytic tools; A.A.B., J.-F.E., I.R.v.d.V., L.F., and M.W. analyzed data; and A.A.B., J.-F.E., and M.W. wrote the paper.

The authors declare no conflict of interest.

This article is a PNAS Direct Submission.

Data deposition: The outputs reported in this paper have been deposited in an online digital repository of multidisciplinary research datasets produced at the University of Edinburgh, datashare.is.ed.ac.uk/handle/10283/875.

This article contains supporting information online at www.pnas.org/lookup/suppl/doi:10.1073/pnas.1515160113/-/DCSupplemental.