Overview

We fitted a discrete hazards geostatistical model51,52 with correlated space–time–age errors and made predictions to generate joint estimates—with uncertainty—of the probability of death (the number of deaths per live births) and the number of deaths for children aged 0–28 days (neonates), children under 1 year old (infants) and children under 5 years old at the subnational level for 99 LMICs for each year from 2000 to 2017. The analytical process is summarized in the flowchart in Extended Data Fig. 6. We made estimates at a grid-cell resolution of approximately 5 × 5-km and then produced spatially aggregated estimates at the first (that is, states or provinces) and second (that is, districts or counties) administrative levels, as well as the country level.

Countries were selected for inclusion in this study based on their socio-demographic index (SDI) published in the Global Burden of Disease study (GBD)53. The SDI is a measure of development based on income per capita, educational attainment and fertility rates among women under 25 years old. We primarily aimed to include all countries in the middle, low–middle or low SDI quintiles, with several exceptions. Brazil and Mexico were excluded despite middle SDI status owing to the availability of high-quality vital registration data in these countries, which have served as the basis for existing subnational estimates of child mortality. Because this study did not incorporate vital registration data sources (see ‘Limitations’), Brazil and Mexico were not estimated directly; instead, state-level estimates from the GBD 2017 study were directly substituted in figures where appropriate4. Albania and Moldova were excluded despite middle SDI status owing to geographical discontinuity with other included countries and lack of available survey data. North Korea was excluded despite low–middle SDI status owing to geographical discontinuity and insufficient data. As countries with high–middle SDI status in 2017, China and Malaysia were excluded from this analysis. Libya was included despite high–middle SDI status to create better geographical continuity. Island nations with populations under 1 million were excluded because they typically lacked sufficient survey data or geographical continuity for a geospatial analytic approach to be advantageous over a national approach. Supplementary Figure 3.1 shows a map of the countries included in this study and Supplementary Table 3.1 lists the countries.

Data

We extracted individual records from 555 household sample survey and census sources. Records came in the form of either summary birth histories (SBHs) or complete birth histories (CBHs). All input data were subject to quality checks, which resulted in the exclusion of 82 surveys and censuses owing to quality concerns (see Supplementary Information section 3.2 for more details). Data on life and mortality experiences from CBH sources can be tabulated directly into discrete period and age bins, thus allowing for period-specific mortality estimations, known as the synthetic cohort method54,55,56. For SBH data, we used indirect estimation57 to estimate age-specific mortality probabilities and sample sizes and assign them to specific time periods. Complete details are available in Supplementary Information section 3.3.2.

In all cases, after pre-processing, each data point provided a number of deaths and a sample size for an age bin in a specific year and location. We referenced all data points to GPS coordinates (latitude and longitude) wherever possible. In cases in which GPS data were unavailable, we matched data points to the smallest possible areal unit (also referred to as ‘polygons’). All polygon data were spatially resampled into multiple GPS coordinates and weighted based on the population distribution following a previously described procedure5,22,23,58 and described in Supplementary Information section 3.4. Our combined global dataset contained approximately 15.9 million births and 1.1 million child deaths. A complete list of data sources is provided in Supplementary Table 8.1.

In addition to data on child mortality, we used a number of spatial data sources for this analysis. These included a suite of geospatial covariates, population estimates and administrative boundaries68. These sources and processing procedures are described in Supplementary Information section 4.

Spatial covariates

We extracted values from each of 10 geospatial covariates at each data point location. Geospatial covariates are spatial data represented at the 5 × 5-km grid-cell resolution. The covariates were travel time to the nearest city, educational attainment of maternal-aged women, the ratio of population of children under 5 to women of reproductive age (ages 15–49 years old), the mass per cubic meter of air of particles with a diameter less than 2.5 μm, total population, a binary indicator of urbanicity, intensity of lights at night, the proportion of children aged 12–23 months who had received the third dose of diphtheria–pertussis–tetanus vaccine, incidence rate of Plasmodium falciparum-associated malaria in children under 5 and prevalence of stunting in children under 5 (see Supplementary Information). All covariate values were centred on their means and scaled by their standard deviations. Covariates typically had global spatial coverage and values that vary by year. More details of the spatial covariates can be found in Supplementary Information section 4.

Analysis

Geostatistical model

To synthesize information across various sources, and to make consistent estimates across space and time, we fitted discrete hazards51,52 geostatistical models59 to our data. The models were discrete in the sense that ages were represented in seven mutually exclusive bins (0, 1–5, 6–11, 12–23, 24–35, 36–47 and 48–59 months), each with its own assumed constant mortality probability. The model explicitly accounted for variation across age bin, year and space through inclusion of both fixed and random effects. Indicator variables for each age bin were included to form a discrete baseline mortality hazard function, representing the risk of mortality in discrete bins from birth to 59 months of age with covariates set at their means. Baseline hazard functions were allowed to vary in space and time in response to changing covariate values, as well as in response to linear effect on year. To model this relationship, we estimated the effect of each covariate value on the risk of mortality. These estimated effects were then applied to the gridded surface of covariate values to make predictions across the entire study geography. We also included a Gaussian random effect across countries to account for larger-scale variations due to political or institutional effects, as well as a Gaussian random effect for each data source to account for source-specific biases. Finally, we included a Gaussian process random effect with a covariance matrix structured to account for remaining correlation across age, time and physical space. As such, estimates at a specific age, time or place benefitted from drawing predictive strength from data points nearby in all of these dimensions.

For each modelling region, we fitted one such discrete hazards model with a binomial data likelihood. All data were prepared such that we counted or estimated the number of children entering into (n) and dying within (Y) each period–age bin from each GPS-point location (s) in each survey (k) within each country (c).

The number of deaths for children in age band (a) in year (t) at location (s) was assumed to follow a binomial distribution:

$${Y}_{a,s,t} \sim {\rm{binomial}}\left({n}_{a,s,t},\;{P}_{a,s,t}\right)$$

where P a,s,t is the probability of death in age bin a, conditional on survival to that age bin for a particular space–time location. Using a generalized linear regression modelling framework, a logit link function is used to relate P to a linear combination of effects:

$${\rm{logit}}\left({P}_{a,s,t}\right)={\beta }^{0}+\mathop{\sum }\limits_{a=2}^{7}{I}_{a}{\beta }_{a}^{1}+{\beta }^{2}{X}_{s,t}+{\beta }^{3}t+{

u }_{c\left[s\right]}+{

u }_{k\left[s\right]}+{Z}_{a,s,t}$$

The first term, β0, is an intercept, representing the mean for the first age band when all covariates are equal to zero, whereas \({\beta }_{a}^{1}\) are fixed effects for each age band, representing the mean overall hazard deviation for each age band from the intercept, when all other covariates are equal to zero. β2 are the effects of geospatial covariates (X s,t ), which we describe in detail in Supplementary Information section 4. β3 is an overall linear temporal effect to account for overall temporal trends within the region. All geospatial covariates were centred and scaled by subtracting their mean and dividing by their standard deviations. Each v term represents uncorrelated Gaussian random effects: \({

u }_{c\left[s\right]} \sim {\rm{normal}}\left(0,{\sigma }_{c}^{2}\right)\) is a country-level random effect applied to all locations (s) within a country (c); \({

u }_{k\left[s\right]} \sim {\rm{normal}}\left(0,{\sigma }_{k}^{2}\right)\) is a data source-level random effect for the survey (k) from which the data at location s were observed. Data source-level random effects were used to account for systematic variation or biases across data sources and were included in model fitting but not in prediction from fitted models. The term Z a,s,t ~ Gaussian process(0, K) is a correlated random effect across age, space and time, and is modelled as a four-dimensional mean zero Gaussian process with covariance matrix K. This term accounts for structured residual correlation across these spatial–age–temporal dimensions that are not accounted for by any of the model’s other fixed or random effects. This structure was chosen, because the hazard probability for each age group is expected to vary in space and time, and such spatiotemporal correlations are likely to be similar across ages. K is constructed as a separable process across age, space and time \(\left(K={\Sigma }_{a}\otimes {\Sigma }_{t}\otimes {\Sigma }_{s}\right)\). The continuous spatial component is modelled with a stationary isotropic Matérn covariance function, and the age and temporal effects were each assumed to be discrete auto-regressive order 1. We provide further details on model fitting and specification in Supplementary Information section 5.1.

We assigned priors to all model parameters and performed maximum a posteriori inference using Template Model Builder60 software in R version 3.4. We fitted the model separately for each of 11 world regions (see Supplementary Fig. 3.1), owing to memory constraints and to allow model parameters to vary across epidemiologically distinct regions.

Post-estimation

Using the joint precision matrix and point estimates, we generated 1,000 draws from all model parameters using a multivariate-normal approximation. These model parameter draws were used to predict corresponding draws of mortality probabilities across all age groups for each grid cell in each year. In other words, for each age bin in each year we estimated 1,000 gridded surfaces of mortality probability estimates, each surface corresponding to one draw from the posterior parameter estimates61. All subsequent post-estimation procedures were carried out across draws to propagate model uncertainty. We used these estimated spatiotemporal gridded surfaces of age-specific mortality probabilities to produce various final resulting data products.

From the fitted model parameters, we produced posterior mortality probability estimates for each age group for each 5 × 5-km grid cell for each year from 2000 to 2017. We combined gridded age group estimates to obtain infant (under 1) and child (under 5) mortality estimates at each gridded location. Using a conversion from mortality probability to mortality rates, and using a gridded surface of population, we also estimated the number of deaths that occurred in each age group at each location in each year. For both mortality probabilities and counts, we multiplied out corresponding gridded estimates by a constant to ensure that at the national—and in two countries, the first administrative-level unit—aggregated estimates for each age group and year were calibrated such that they equalled estimates in the GBD study4. This calibration allowed us to take advantage of national data sources, such as vital registration, that could not be used in this study. We also aggregated grid-cell-level estimates to first and second administrative-level units using gridded population surface to weight estimates. These steps are described in Supplementary Information section 5.2.

Model validation

We used fivefold cross-validation to assess and compare model performance with respect to estimating local trends of age-specific mortality. Each fold was created by combining complete surveys into subsets of approximately 20% of data sources from the input data. Holding out entire surveys at a time served as a comparable approximation to the type of missingness in our data, essentially helping us check how well our model estimates of mortality probabilities compared to empirical estimates of mortality probability from an unobserved data source that did not inform the model.

For each posterior draw, we aggregated to administrative units. Using data aggregated to the administrative unit and aggregated estimate pairs, we calculated the difference between out-of-sample empirical data estimates and modelled estimates, and we report the following summary metrics: mean error, which serves as a measure of bias, the square root of mean errors, which serves as a measure of the total variation in the errors, the correlation and 95% coverage. At the second administrative-level unit for under-5 mortality, our out-of-sample 95% coverage was 93%, correlation was 0.78, mean absolute error was 0.015 and mean error was −0.0011. These results indicate a good overall fit, with minimal bias. This procedure and the full validation results are discussed in Supplementary Information section 5.3.

Limitations

This work should be assessed in full acknowledgement of several data and methodological limitations. We exclusively used CBH and SBH data from household survey and census data sources. Ideally, estimates of child mortality should incorporate all available data, including data from administrative vital registration systems. Vital registration systems are commonly present in many middle-income and all high-income countries. There are known data-quality issues with vital registration sources in many middle-income countries48,62 that add complications to their inclusion in our modelling procedure. For example, systems may not capture all deaths, and this level of ‘underreporting’ probably varies in space, time and age. In addition, underreporting is probably negatively correlated with mortality, and could contribute substantial bias to estimates. Statistical methods must be developed to jointly estimate—and adjust for—underreporting in vital registration data before such data can be used in geospatial models of child mortality. Promising work has begun in this domain in specific countries63, but further advancement will be necessary to improve estimates across a time series and across many countries at once.

We assume that SBH and CBH data were retrospectively representative in the locations in which they were collected. As such, we assume that survey respondents did not migrate. High-spatial-resolution migration estimates with which to adjust estimates do not yet exist, and many of the data sources that we use do not collect information on migration. We conducted a focused sensitivity analysis (Supplementary Information section 5.4.4) for migration in six countries, and found that although our results were generally robust, there was variation by country. Furthermore, despite providing high-quality retrospective data from representative samples of households, birth history data can suffer from certain non-sampling issues, such as survival/selection biases64 and misplacement of births65. We did not attempt to make corrections to data, and they were used as-is. Furthermore, retrospective birth history data will—by design—have a changing composition of maternal ages depending on the time since the survey. This was minimized by limiting retrospective trends to up to 17 years.

Although we collated a large geo-referenced database of survey data on child mortality, these data represented about 1% (1.1 million) of total deaths of children under 5 in study areas over the period. Where data do not exist or are not available in certain locations, mean estimates are informed from smoothing to nearby estimates and covariates. As such, there could be additional small-scale heterogeneity that is not picked up by our model. Wider uncertainty intervals in areas with no data account for these potential unknowns, and our 95% coverage estimates in out-of-sample predictive tests appear to be well-calibrated at the second administrative unit level. Furthermore, discrete localized mortality shock events could be missing in our analysis due to the lack of data and selection biases in surveys and censuses, and spatiotemporal smoothing. Fatal discontinuities are explicitly accounted for at the national or province level by calibration to GBD estimates. In all, 0.35% (0.4 million) of the 123 million deaths over this period were attributed to fatal discontinuities.

On the modelling side, we integrated point and areal data into a continuous model by constructing pseudo-points from areal data. Modelling approaches that integrate point and areal data as part of a joint model likelihood function are in development66 but are currently computationally infeasible at the large geographical scales at which we currently model. Furthermore, we divided our models into 11 regional fits (see Supplementary Fig. 3.1), as a full model that encompasses all 99 countries would be computationally infeasible due to memory constraints. Splitting up modelling in this way had the benefit of enabling parameters to vary across epidemiologically distinct world regions. A preferred model, however, would be fitted to all data simultaneously with parameters that are spatially variable.

The separable model used for age–space–time correlations is a common parsimonious assumption afforded in applying spatiotemporal geostatistical models due to efficient computation and inference; however, it yields the assumption of fully symmetric covariance. The symmetry implicit in the separable model dictates, for example, that (holding age constant for simplicity) the covariance between the observations at (location 1, time 1) and (location 2, time 2) is the same as the covariance between (location 1, time 2) and (location 2, time 1). Given our available data density in space–age–time, we believe that attempting to parameterize a more complex non-separable model would be challenging both computationally and inferentially, and it is not clear whether there would be much to benefit from the extra complications.

There are several limitations to address with respect to the use of covariates in the model. Most of the geospatial covariates that we used in the geostatistical model were themselves estimates produced from various geospatial models. Some of those estimated surfaces used covariates that were also included in our model in their estimation process. As such, we emphasize that our model is meant to be predictive, and that drawing inference from fitted coefficients across these highly correlated covariates is problematic and not recommended. Furthermore, we assumed no measurement error in the covariate values and assumed that the functional form between mortality and all covariates was linear in logit space. In certain locations, we used covariate values for prediction that were outside the observed range of the training data. As we explore in Supplementary Information section 5.4.2, however, these areas represent a relatively small proportion of the population.

Finally, we used a method for indirect estimation of SBHs that was recently developed and validated57. As such, indirect estimation was carried out as a pre-processing step before fitting the geostatistical model. We attempted to propagate various forms of uncertainty that could be introduced in this step, which resulted in halving the total effective sample size across all SBH data. In future, we aim to fully integrate such processing into the statistical model; such methods are in development67, but are not yet computationally feasible at scale.

Reporting summary

Further information on research design is available in the Nature Research Reporting Summary linked to this paper.