Household survey data

We used data from Demographic Health Surveys (DHS; rounds V–VII; https://www.dhsprogram.com) and Multiple Indicator Cluster Surveys (MICS; rounds 5, 6; http://mics.unicef.org/surveys) undertaken in various LMICs from 2006 to 2018. Design of the two surveys was coordinated to facilitate data comparison and joint analyses across countries and over time. In brief, a two-stage probability sampling approach was used to select geographical clusters within countries or provinces of countries, and to select households within clusters. Survey questionnaires addressed household composition as well as risk factors, health outcomes and healthcare utilization among household occupants; original survey instruments can be viewed at https://dhsprogram.com/publications/publication-DHSG4-DHS-Questionnaires-and-Manuals.cfm (for DHS data) and https://mics.unicef.org/surveys (for MICS data).

Mothers were asked to answer survey questions on behalf of children under five years of age. Health information collected about children included history of ARI symptoms (defined as short, rapid breathing or difficult breathing that was chest-related or occurring with cough) and diarrhoea (defined as the occurrence of three loose stools in a 24-h period) in the 2 weeks preceding the survey. Mothers were asked about care seeking at public-sector or private-sector hospitals or clinics, pharmacies or doctor’s offices, and receipt of drugs including antibiotic pills, syrups or injections, antimalarials, and analgesics or antipyretics. For diarrhoea, mothers were also asked about children’s receipt of antimotility drugs, intravenous feeding, zinc and oral rehydration. Vaccination status (including antigens, doses and date of receipt) was collected from vaccination cards. Round VII of DHS and round 6 of MICS were the first to include PCV10/13 and rotavirus vaccination; thus, the case–control study (detailed below) included data from DHS round VII and MICS round 6 only.

Case–control study

We estimated PCV10/13 effectiveness against ARI-related end points and rotavirus vaccine effectiveness against diarrhoea-related end points in an individually matched, case–control study to compare counterfactual outcomes among individuals encountering similar conditions of transmission, as well as other exposures (the matching procedure is described below). End points were: occurrence of ARI and diarrhoea symptoms; occurrence of ARI and diarrhoea symptoms for which medical treatment or advice was sought (from a source other than a traditional practitioner); and occurrence of ARI and diarrhoea symptoms for which antibiotic pills, syrups or injections were received. For diarrhoea analyses, we also considered an end point of diarrhoea symptoms resulting in oral rehydration using pre-packaged salts or fluid, or recommended homemade fluids or intravenous solution, among children who sought care (results are included in Supplementary Tables 4, 5). For the purposes of our analyses, children experiencing each end point in the preceding 2 weeks were defined as ‘cases’, and those who did not report symptoms were defined as controls.

Within each country, we matched cases and controls on the basis of age and visit timing (each within one month), within-country wealth quintile (a variable defined on the basis of country-specific household socioeconomic indicators), urbanicity (residence within a sampling cluster defined as urban or rural) and pentavalent vaccine doses received (to reduce confounding by healthcare access, including access to vaccination, and to control for any effects of diphtheria, pertussis and Haemophilus influenzae type b antigens on ARI). Healthcare utilization questions for the ARI end point did not distinguish whether care was received due to respiratory symptoms or due to fever. Therefore, to separate vaccine effects from antibiotic treatment associated with ARI symptoms (rather than treatment of fever cases that would be expected to occur independent of ARI symptoms, for example, due to malaria, typhoid or arboviruses), we also matched children on mother-reported fever and excluded those receiving antimalarials.

We matched cases and controls without replacement at a 1:3 ratio and undertook statistical inference in a resampling framework, generating 2,000 independent draws from the permutation distribution of case–control match assignments. Within each iteration, we estimated the matched odds ratio (OR M ) using conditional logistic regression. Defining Y = 1 and Y = 0 as case and control status, respectively, and V = 1 and V = 0 as vaccinated and unvaccinated status, respectively, we took 1 − OR M (V|Y) to measure vaccine effectiveness.

Assessment of residual confounding

The relatively recent implementation of PCV10/13 and rotavirus vaccines in the immunization programmes of countries may make pentavalent vaccine an imperfect surrogate of the access of children to these newer products. In particular, within-country geographical differences in access to PCV10/13 and rotavirus vaccines may introduce residual confounding if the differences in vaccine access are associated with geographical variation in disease burden.

Data from DHS and MICS include the subnational administrative regions in which children reside, although this variable must be interpreted with caution as the size and uniformity of subnational regions varies across countries (between 4 and 3,399 children were available from each region (median, 217 children), before accounting for match eligibility). Although it was not possible to match children at precise subnational resolutions for our case–control study, we used subnational administrative region designators to assess risk for residual confounding by geography. We computed the proportion of variance in ARI and diarrhoea outcomes, and in PCV10/13 and rotavirus vaccine receipt, explained by subnational region after accounting for all other matching factors included in our analysis. Specifically, we assessed the difference in the summed squared error of residuals from regression models treating disease end points and vaccination as outcome variables, which either included or did not include subnational administrative units as intercepts after controlling for age and visit timing (each as monthly factor variables), urbanicity, wealth quintile, mother-reported fever (only for analyses of ARI and PCV10/13) and pentavalent vaccine doses received. Low estimates of the proportion of variance explained by subnational region in either the exposure or outcome suggest minimal risk of confounding by geography. The results of this analysis are included in Supplementary Table 6. In addition, to assess whether negative-control analyses provided a sufficient opportunity to correct for residual confounding associated with access to these newer vaccines, we estimated the correlation in receipt of full PCV10/13 and rotavirus vaccine series among age-eligible children in countries where both vaccines were recommended. We describe the results of these analyses in the Supplementary Information section ‘Assessment of residual confounding’.

Vaccine probe study

We formulated a simple model that enables the estimation of the proportion of episodes attributable to vaccine-targeted pathogens for each end point (for example, ARI symptoms or antibiotic-treated ARI symptoms) in the absence of vaccination. We defined ρ as the risk for a child to experience a given end point attributable to the vaccine-targeted pathogen in the 2-week recall period, ω as the risk of the same end point attributable to all other causes and θ as the relative risk of disease caused by the vaccine-targeted pathogen in a vaccinated individual relative to an unvaccinated individual, owing only to vaccine-conferred protection (such that 1 − θ measures vaccine efficacy against illness caused by the targeted pathogen). Under this framework

$${{\rm{OR}}}_{{\rm{M}}}(V|Y)=\frac{\Pr (V=1,Y=1){\rm{\Pr }}(V=0,Y=0)}{\Pr (V=0,Y=1){\rm{\Pr }}(V=1,Y=0)}=\frac{[\theta \rho +\omega ][1-\rho -\omega ]}{[1-\theta \rho -\omega ][p+\omega ]}=\frac{\theta \rho +\omega }{\rho +\omega }\times \frac{1-\rho -\omega }{1-\theta \rho -\omega }$$ (1)

Noting that \(\frac{1-\rho -\omega }{1-\theta \rho -\omega }\approx 1\) for small ρ and ω

$${{\rm{O}}{\rm{R}}}_{{\rm{M}}}(V|Y)\approx \frac{\theta \rho +\omega }{\rho +\omega }$$ (2)

or in other words, we expected the matched odds ratio to provide a reasonable estimator for the relative hazard of each end point, given vaccination. Using equation (2), above

$$\rho \approx \frac{\omega (1-{{\rm{O}}{\rm{R}}}_{{\rm{M}}}(V|Y))}{{{\rm{O}}{\rm{R}}}_{{\rm{M}}}(V|Y)-\theta }$$ (3a)

and

$$\frac{\rho }{\rho +\omega }\approx \frac{1-{{\rm{O}}{\rm{R}}}_{{\rm{M}}}(V|Y)}{1-\theta }$$ (3b)

The above approximation relies on the assumption that OR M (V|Y) is similar to the relative risk of disease, given vaccination. We illustrate in Extended Data Fig. 2 the potential degree of bias under differing values of θ, ρ and ω that are concordant with our results and those of external studies assessing all-cause burden of ARI and diarrhoea among children in LMICs15,50,51. The degree of bias approaches zero under any of the following conditions: (i) the all-cause disease end point is rare (that is, ρ + ω ≈ 0); (ii) the attributable fraction is low (ρ ≪ ω); or (iii) the vaccine is highly efficacious against disease due to the pathogen of interest (θ ≈ 0).

Estimation of the fraction attributable to rotavirus

Because previous studies have reported differential efficacy and effectiveness of rotavirus vaccines in socioeconomically distinct settings, we generated stratified estimates of OR M (V|Y) (in the case–control study) and θ (in the meta-analysis, as described below) for children aged 0–23 months residing in middle-income and low-income countries. We used these inputs to generate separate rotavirus-attributable fraction estimates associated with each end point for middle-income and low-income countries. We reconstructed the distribution of the rotavirus-attributable fraction for each end point across all LMIC settings (Fig. 2 and Supplementary Table 8) using the weighted average of paired draws from the independent distributions of the attributable fraction estimates of middle-income and low-income countries; weights were proportional to the total populations of children aged 0–23 months residing in middle-income and low-income countries.

To account for differential direct effects of the vaccine against rotavirus-attributable diarrhoea end points in middle-income and low-income countries, we used our stratified estimates of rotavirus vaccine efficacy, effectiveness and attributable fractions for all burden assessments.

Meta-analysis of vaccine efficacy studies

We obtained estimates of θ for PCV10/13 and rotavirus vaccine using a meta-analysis of vaccine efficacy studies. Because θ has not been estimated for the effects of PCV10/13 against vaccine-serotype ARI, we took PCV efficacy against invasive disease caused by vaccine-serotype pneumococci as a primary estimate of protection, and PCV efficacy against acute otitis media caused by vaccine-serotype pneumococci as a lower-bound estimate, consistent with previous studies52. Estimates of PCV efficacy against these end points were aggregated from two previously published reviews53,54. We used estimates of efficacy against invasive pneumococcal disease from only low- and middle-income countries. Because such estimates were not available for PCV efficacy against acute otitis media, we did not restrict estimates for this end point on the basis of study setting.

For rotavirus, we used estimates of θ from studies that estimated the efficacy of the human pentavalent rotavirus vaccine (Rotarix) against rotavirus gastroenteritis in the first two years of life in LMICs, again aggregated in a previous systematic review55. Data were restricted to Rotarix studies because this was the only vaccine used in the national immunization programmes of countries included in the case–control study.

For both end points, we estimated pooled values of θ using inverse variance-weighted random effect models using log-transformed effect size estimates from each study.

Risk factor analysis of household survey data

We used a regression-based approach to estimate country-level rates of incidence of ARI and diarrhoea in the year 2016 in the absence of PCV10/13 and rotavirus vaccine use. Because previous studies have described marked variation in the clinical threshold for maternal or caregiver report of ARI and diarrhoea28,32,33,56, we aimed to reconstruct incidence rates on the basis of risk factor prevalence, controlling for region-level intercepts that arise from differential reporting of symptoms. We built Poisson regression models for the outcome of reporting of syndrome Z (signifying all-cause ARI or diarrhoea) for child i of the form

$${\rm{E}}({Z}_{i}|{X}_{i})={\pi }_{{\rm{R}}(i)}\times \exp \,(\alpha +\sum _{k}{\beta }_{k}{X}_{k,i}+{{\epsilon }}_{i})$$ (4)

where α represents a general intercept for the log-transformed incidence rate. Each variable X k,i indicates the exposure of child i to the kth risk factor for syndrome Z, associated with a β k -fold increase in incidence; the \({{\epsilon }}_{i}\) errors are independent and identically distributed with mean zero among all children. The term π R(i) indicates the probability of reporting syndrome Z given it truly occurred among children in the same region as child i. We extracted the risk factors listed in Supplementary Table 27 on the basis of their inclusion in both DHS and MICS datasets, and based on previous evidence of their relevance to the risk of ARI and/or diarrhoea for children51,57,58. In addition to variables included in DHS and MICS, we extracted the gross domestic product (GDP) per capita of each country in the survey year and in 2016. We included covariates in the model identified to have a statistically significant (P < 0.05) unadjusted association with the outcome variable that persisted in multivariable analyses, and tested for significant pairwise interactions among covariates. Risk factor estimates are shown in Extended Data Fig. 3 and Supplementary Tables 28–30 for ARI (for children aged 24–59 months and 0–59 months) and diarrhoea (for children aged 0–23 months).

We similarly modelled the probability (H) for ARI and diarrhoea to be treated with antibiotics using a Poisson regression model of the form

$${\rm{E}}({H}_{i}|{X}_{i})=\exp \,(\delta +\sum _{k}{\delta }_{k}{X}_{k,i}+{\eta }_{i})$$ (5)

We evaluated the same risk factors for inclusion as in our analyses of ARI and diarrhoea; here we defined η i as mean-zero independent and identically distributed error terms. The individual risk factor estimates are shown in Extended Data Figs. 4, 5 and Supplementary Tables 31–33 for antibiotic-treated ARI and diarrhoea.

We limited analyses of ARI and diarrhoea end points to children who had received no PCV10/13 doses and no rotavirus vaccine doses, respectively. For DHS rounds V–VI and MICS round 5, which did not include PCV10/13 and rotavirus vaccine data collection, children were assumed not to have received PCV10/13 or rotavirus vaccine doses if countries had not yet implemented these vaccines in their immunization programmes at the time of the survey. Surveys undertaken in countries that had implemented PCV10/13 or rotavirus vaccines were excluded if information on the receipt of these vaccines among children was not collected.

Missing values in the outcome and risk factor variables were populated in five independent pseudo-datasets by multiple imputation using the Amelia II package in R59; in total, 11.4% of data cells were missing at the outset of analyses, including 0.6% and 2.4% of ARI and antibiotic-treated ARI observations, and 0.1% and 0.4% of diarrhoea and antibiotic-treated diarrhoea observations. Additional measures12 of country-level antibiotic access (Supplementary Table 27) were included due to scientific interest and a consideration that these covariates may aid imputation.

Assumption of a Poisson outcome distribution

Translating the reported history of any ARI or diarrhoea in the preceding 2 weeks to a Poisson-distributed number of ARI or diarrhoea episodes required two assumptions: that episodes are acute (consistent with the nature of illness solicited by DHS and MICS household surveys) and that overall incidence rates are low (resulting in a low or negligible likelihood of ≥2 distinct episodes occurring in a single fortnightly period in which a child was reported to have experienced ARI or diarrhoea)30. The low frequency of diarrhoea lasting 14 days or longer in a previous cohort study in LMICs21 (accounting for 4.9% and 1.8% of all diarrhoea episodes in the first and second years of life, respectively) indicated that our assumption of an acute nature of mother-reported illness was appropriate for the diarrhoea outcome. Similarly, previous studies of respiratory infections among children in LMICs have reported short average durations of new-onset disease episodes, with 87% to 100% of episodes lasting ≤2 weeks60,61,62. For incidence rates similar to those estimated in our study (approximately 4 diarrhoea cases and 1.25 ARI cases per child per year), assuming a 7-day average duration of illness and a definition of 3 symptom-free days to distinguish separate disease episodes, the assumption of exponentially distributed inter-event times (concordant with Poisson-distributed counts) would yield a second new-onset diarrhoea case in only 4.3% of 2-week reporting periods with a first case identified, and a second new-onset ARI case in only 1.4% of reporting periods with a first case identified. This small bias, if applicable to our analyses, would be expected to lead to slight underestimation of true burden.

Incidence rate estimation

From the fitted models, we generated standardized estimates of the hazard rates of ARI and diarrhoea for individual children for the year 2016. To correct for differential reporting, we took the lowest and highest estimated reporting probability from any region (π R ) to provide lower and upper bounds, respectively, on true incidence rates.

We sampled from the multivariate normal distribution of (log-transformed) regression parameter estimates and model residuals, thus accounting for uncertainty in the effects of risk factors on event rates. To propagate uncertainty in our analyses driven by sampling variability in the surveyed population as well as our multiple imputation procedure, we repeated this analysis 1,000 times for each of 5 multiply imputed datasets. At each iteration, we resampled children according to survey weights before sampling ARI and diarrhoea outcomes stochastically based on individual-level risk factors. On the basis of the estimated incidence rate of ARI and diarrhoea of each child from these risk factors, we sampled a total number of annual cases assuming this value would be Poisson-distributed with respect to the underlying individual-specific rate. We used the same approach to sample the antibiotic treatment status of each case based on model-generated estimates of the probability of antibiotic treatment, given individual risk factors, under an assumed Bernoulli outcome distribution.

Extrapolation of burden estimates to countries without household survey data

For the 58 LMICs not covered by DHS rounds V–VII or MICS rounds 5 and 6, we extrapolated rates of incidence of all cases and antibiotic-treated cases of ARI and diarrhoea on the basis of national-level variables aggregated as Health, Nutrition and Population Statistics by the World Bank (https://databank.worldbank.org/source/health-nutrition-and-population-statistics). For each of 5,000 draws from the distribution of country-specific incidence-rate and treatment probability estimates (1,000 draws each from 5 multiply imputed datasets), we fitted regression tree models using tenfold cross-validation to a randomly sampled 90% set of countries via stochastic gradient boosting using the caret package in R with default tuning parameters63. We saved predictions for the 10% ‘holdout’ set of countries to assess model performance (Extended Data Fig. 6). We resampled the 90/10 set of training and holdout countries at each iteration. We used fitted models to generate out-of-sample predictions for countries without DHS and MICS data.

Estimating the vaccine-preventable disease burden

We multiplied independent draws from the distribution of our estimates of the proportion of antibiotic-treated ARI episodes and antibiotic-treated diarrhoea episodes attributable to vaccine-serotype pneumococci and rotavirus, respectively, by independent draws from the distribution of our estimates of country-level incidence rates for antibiotic-treated ARI (for children aged 24–59 months and 0–59 months) and antibiotic-treated diarrhoea (for children aged 0–23 months). We used our stratified estimates of rotavirus-attributable fractions (for low-income and middle-income countries) for all analyses of diarrhoea burden end points. For analyses estimating vaccine direct effects, we multiplied independent draws from the distribution of our estimates of PCV10/13 direct effects against antibiotic-treated ARI and rotavirus vaccine direct effects against antibiotic-treated diarrhoea, as obtained in the case–control study, by independent draws from the distribution of our estimates of country-level incidence rates for antibiotic-treated ARI (of children aged 24–59 months and 0–59 months) and antibiotic-treated diarrhoea (in children aged 0–23 months).

To generate estimates of burden across multiple countries (globally, or grouped by income strata), we estimated total cases in each country by multiplying independent draws of the incidence rate for each end point by the number of children in each age group (using World Bank estimates; https://databank.worldbank.org/home). For each sampled parameterization (defined by estimates of vaccine effectiveness, pathogen-attributable burden and incidence rates), we summed total cases across countries belonging to a stratum of interest and estimated incidence by dividing total summed cases by the population at risk.

Potential reduction in PCV10/13 effects owing to serotype replacement

Replacement of vaccine-targeted pneumococcal serotypes by nonvaccine serotypes may partially offset the estimated effect of PCV10/13 on all-cause ARI end points. We assessed the extent to which this may alter our estimates by determining the maximal increase in nonvaccine-type antibiotic-treated ARI that could be expected under scenarios consistent with reported serotype replacement in pneumococcal carriage in LMIC settings.

We defined p VT (t) and p NVT (t) as the prevalence of carriage of vaccine-type (VT) pneumococci and nonvaccine-type (NVT) pneumococci at time t, and r VT and r NVT as the rates of progression of vaccine-type and nonvaccine-type pneumococci from carriage to ARI42. We used previous estimates of the relative incidence of vaccine-type and nonvaccine-type invasive pneumococcal disease per carrier64, and of vaccine-type and nonvaccine-type otitis media per carrier65, to supply bounds on r VT and r NVT for ARI, as this quantity has not previously been measured. We used data from three studies of pneumococcal carriage among children less than five years old before and after vaccine introduction in settings with long-term, continuous, prospective surveillance in place7,44,66 to supply pre-vaccination (p VT (0) and p NVT (0)) and post-vaccination (p VT (1) and p NVT (1)) estimates of carriage prevalence. We took [p NVT (1) − p NVT (0)]r NVT to indicate the excess incidence of disease attributable to nonvaccine serotypes that would result from post-vaccination replacement of vaccine-targeted serotypes by nonvaccine serotypes.

We compared this excess post-vaccination incidence attributable to serotype replacement with two measures of pre-vaccination disease incidence. The ratio [p NVT (1) − p NVT (0)]r NVT /[p VT (0)r VT ] compared the excess replacement-associated incidence of nonvaccine-serotype disease to the incidence of vaccine-serotype disease targetable by vaccination with PCV10/13. Second, defining AF as the fraction of all-cause disease incidence attributable to vaccine-type pneumococci, the ratio [p NVT (1) − p NVT (0)]r NVT AF/[p VT (0)r VT ] compared the extent of replacement-driven nonvaccine-serotype disease incidence to all-cause incidence in the pre-vaccination era. These estimates are shown in Extended Data Fig. 1, using the attributable fraction estimates for all-cause antibiotic-treated ARI (as this was the primary end point of interest in our analyses).

In light of the assumptions informing our interpretation of PCV10/13 effectiveness against non-specific ARI end points in the case–control study, we consider the estimates provided by this analysis to represent an upper bound on the potential replacement-driven disease burden. Our estimates of vaccine-serotype pneumococcal attributable fractions based on the case–control study assumed no effect of PCV10/13 on disease not attributable to vaccine-serotype pneumococci. However, children receiving PCV10/13 in prelicensure or early-implementation studies in fact have been reported to experience higher risk of carriage of nonvaccine serotypes67 and resulting nonvaccine-type disease68,69. As such, odds ratio estimates in our case–control study may be interpretable as representing the ‘net’ reduction in disease risk among vaccinated children, accounting for both protection against vaccine-type pneumococci and increased acquisition of nonvaccine types.

Reporting summary

Further information on research design is available in the Nature Research Reporting Summary linked to this paper.