Study area

We limited the study region to include only arid to moderately humid vegetated land, defined by a dryness index of >0.3. We defined dryness as the average fraction of months that the mean potential evapotranspiration exceeds mean precipitation. The potential evapotranspiration was calculated using the PenmanMonteith equation33 with 30 years of meteorological data34,35. Greenness was derived from the MODIS MOD13C2 NDVI product (https://lpdaac.usgs.gov), which is a monthly composite of cloud-free observations resampled globally to 0.05° resolution. We regarded areas with maximum NDVI <0.25 through time as unvegetated and excluded them from our analysis. Our study region covered about 50% of total land area and 90% of the vegetated area.

Ecohydrological model

The World-Wide Water (W3) model11 (http://wald.anu.edu.au/) simulates water stores and flows in vegetation, surface water, soil and unconfined groundwater systems. The model was driven by global estimates of daily precipitation34, radiation, air temperature, wind speed, snowfall rate and surface pressure35. Soil and vegetation water and energy fluxes were simulated independently for deep-rooted vegetation and shallow-rooted vegetation in each hydrological response unit with different aerodynamic control of evaporation and interception capacities. The soil water store was separated into three unsaturated soil layers, namely, top (0–5 cm), shallow (5–100 cm) and deep (1–10 m) layer, where shallow-rooted vegetation and deep-rooted vegetation have different degrees of access to moisture in the different soil layers. The unconfined groundwater store was estimated with the mass balance from the groundwater storage, deep drainage from deep soil layer, capillary rise from the groundwater, groundwater evaporation and groundwater discharge. The W3 model also includes the simulation of canopy and biomass change coupling with water balance dynamics. The water in the biomass, surface water, soil and groundwater comprised the total water storage in the W3 model.

Data assimilation

Three contrasting satellite water observations with different penetration depths from surface to the total water column were used in this study, namely, surface water extent, near-surface soil moisture and changes in total water storage. The surface water extent was estimated from MODIS 8-day composites using the reflectance dissimilarity between water and dry surfaces in shortwave infrared spectral band18, analogous to the microwave method of estimating water extent using brightness temperature36. The MODIS-derived surface water extent was assimilated into the W3 model through a simple nudging approach with a high gain from the MODIS water fraction estimations to describe surface water dynamics not reliably simulated by the model. Monthly 3° × 3° GRACE mascon solutions37 were obtained from the Jet Propulsion Laboratory (http://grace.jpl.nasa.gov). In contrast to GRACE, which has the capability of detecting water storage change accumulated in the total water column, SMOS can only penetrate the land surface for up to 5 cm. The 0.25° × 0.25° retrievals of near-surface soil moisture from the Centre Aval de Traitement des Données SMOS38 (https://www.catds.fr) for both ascending and descending orbits were used to derive the daily averaged soil moisture content and to constrain the model simulated top-layer soil moisture (0–5 cm). To resolve the disparity in spatial, vertical and temporal resolution, the GRACE and SMOS data were assimilated into the W3 model using an Ensemble Kalman Smoother with a fixed 1-month window21. A single monthly GRACE observation together with all the daily SMOS observations within a 1-month window were included in the observation vector. The state vector was comprised of all model estimates of daily soil water storage in three layers and groundwater over a month and updated with GRACE and SMOS simultaneously. The observation operator including temporal accumulation components enables direct comparison with the GRACE and SMOS observations. The forecasts of water storage in different layers were adjusted with the Kalman gain matrix39 based on the uncertainties in the W3 model and satellite observations. The model uncertainties were estimated from the sample covariance computed from 100 ensemble members which were generated through the perturbation of meteorological forcings (precipitation, air temperature and radiation in this case). The observation uncertainties were quantified using the spatially and temporally varying uncertainties in the GRACE and SMOS products. GRACE and SMOS observations imparted different constraints on the estimation of water storage at different layers through both model physics and simultaneous adjustment from variance–covariance structure between model states and observations. The smoother approach with a 1-month assimilation window also considered the temporal correlation between model states to separate water storage change into different depths based on different temporal dynamics. Data assimilation produced daily global 0.25° × 0.25° estimates of water in the near-surface soil, shallow root zone, deep root zone and unconfined groundwater.

Statistical forecasts

The statistical relationships between water storage dynamics and vegetation greenness anomalies were assessed using Spearman's rank correlation (ρ). The lagged ρ between water storage integrated over different depths and greenness anomalies over the subsequent 1 to 12 months was calculated and used to define an optimal integration depth (in mm of equivalent water thickness), interpreted as the vegetation-accessible storage. Given accessible storage as a time-dependent variable, the 98th percentile of the accessible storage over the study period at each grid was calculated as the maximum storage for the soil layer that vegetation growth responds to most strongly. The number of months for which lagged ρ > 0.6 was used as an indicator of skilful forecast lead time. The specific value of threshold used was based on maximising skilful forecasts. Nevertheless, the area of skilful forecasts remains stable with changes in threshold values. Alternative predictors tested included an antecedent precipitation index with a constant decay coefficient of 0.940, the satellite-derived SMOS soil moisture, GRACE total column storage estimates and the water storage estimates from model open-loop run without any data assimilation.

A deterministic forecast of the vegetation greenness anomaly dV t in t month’s time was obtained from a linear combination of the current greenness anomaly \({\mathrm d}V_{t_0}\) and the anomaly in water storage over the determined optimal integration depth z, denoted by \(S_{z,t_0}\) as follows:

$${\mathrm d}V_t = {\mathrm d}V_{t_0} + \beta _1S_{z,t_0} + \beta _2,$$ (1)