Data Sources

We pooled five nationally representative U.S. data sets that contain repeated measures of individual-level height and weight: the National Longitudinal Survey of Youth, the National Longitudinal Study of Adolescent to Adult Health, the Early Childhood Longitudinal Study–Kindergarten, the Panel Study of Income Dynamics, and the Epidemiologic Follow-up Study of the National Health and Nutrition Examination Survey (NHANES). After removing participants who had fewer than 2 recorded observations, the pooled data set contained 176,720 observations from 41,567 children and adults, a mean (±SD) of 4.3±1.6 observations per person. Compared with excluded participants, those with 2 or more observations were, on average, younger and more likely to be female and white. (Details regarding data sources and exclusion criteria are provided in Section 1.1 in the Supplementary Appendix, available with the full text of this article at NEJM.org.)
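The exclusion step and the per-person summary can be illustrated with a minimal Python sketch. It is not the authors' code: the record layout, the function names, and the toy values are all assumptions made for illustration.

```python
from collections import Counter
from statistics import mean, stdev

def keep_repeated(records, min_obs=2):
    """Keep only records from participants with at least min_obs observations."""
    counts = Counter(record[0] for record in records)
    return [record for record in records if counts[record[0]] >= min_obs]

def observations_per_person(records):
    """Return the mean and sample SD of the number of observations per participant."""
    per_person = list(Counter(record[0] for record in records).values())
    spread = stdev(per_person) if len(per_person) > 1 else 0.0
    return mean(per_person), spread

# Hypothetical records: (participant_id, age_years, height_m, weight_kg)
records = [
    ("a", 5, 1.10, 19.0), ("a", 7, 1.22, 24.0), ("a", 9, 1.33, 30.0),
    ("b", 12, 1.50, 45.0),                      # single observation: excluded
    ("c", 30, 1.70, 80.0), ("c", 34, 1.70, 84.0),
]

pooled = keep_repeated(records)
avg_obs, sd_obs = observations_per_person(pooled)
print(len(pooled), avg_obs, round(sd_obs, 2))
```

In this toy example, participant "b" is dropped because only one observation is available, and the summary is computed over the remaining two participants.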

Trajectory Simulation

Using these data, we developed a simulation model to predict growth trajectories on the basis of individual-level weight and height information. We interpolated childhood trajectories on the basis of growth curves developed by the Centers for Disease Control and Prevention (CDC) after adjustment for secular trends in weight using NHANES data from 1976 through 2014. We obtained the parameters that were used to adjust for these trends by means of a model-fitting procedure that aligned trends in our simulated BMI categories with recently observed trends, using data from the Census, the American Community Survey, the Behavioral Risk Factor Surveillance System, the National Survey of Children’s Health, and NHANES. We estimated trends for four BMI categories: underweight or normal weight, overweight, moderate obesity, and severe obesity. Once these steps were completed, we used the simulation model to predict the risk of obesity. (Details regarding these steps are provided in Sections 1 through 3 in the Supplementary Appendix.)

Simulation Predictions

To predict the risk of obesity at the age of 35 years, we created virtual populations of 1 million children who were 19 years of age or younger using statistical matching techniques to produce nationally representative, open populations beginning in 2016, as described previously.35,36 (In an open population, new participants are being born into the simulation model, so the population structure changes over time, whereas in a closed population, no new participants are entering the model.) We estimated the conditional probability of obesity at the age of 35 years given BMI status at each age in childhood and then calculated the associated relative risks. We also calculated the risks of future obesity at 5-year intervals for each BMI group, according to age and sex. We repeated this process with 1000 independently generated populations, each time randomly sampling a set of good-fitting parameters for secular trends.
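As a rough illustration of the conditional probabilities and relative risks described above, the following Python sketch computes the risk of obesity at the age of 35 years within a childhood BMI group and compares it with a reference group. The data layout, the group labels, the choice of reference group, and the toy counts are assumptions for illustration only and do not reproduce the simulation model.

```python
def risk_of_adult_obesity(children, category):
    """Estimate P(obese at age 35 | childhood BMI category) from simulated people."""
    outcomes = [obese_at_35 for status, obese_at_35 in children if status == category]
    if not outcomes:
        raise ValueError(f"no simulated children in category {category!r}")
    return sum(outcomes) / len(outcomes)

def relative_risk(children, category, reference):
    """Ratio of the risk in one childhood BMI group to the risk in a reference group."""
    return (risk_of_adult_obesity(children, category)
            / risk_of_adult_obesity(children, reference))

# Hypothetical virtual population at a single childhood age:
# (childhood BMI status, obese at age 35?)
children = (
    [("obese", True)] * 3 + [("obese", False)] * 1
    + [("not obese", True)] * 2 + [("not obese", False)] * 6
)

risk_obese = risk_of_adult_obesity(children, "obese")
risk_not_obese = risk_of_adult_obesity(children, "not obese")
rr = relative_risk(children, "obese", "not obese")
print(risk_obese, risk_not_obese, rr)
```

In a full analysis, the same calculation would be repeated for each childhood age, for each sex, and for each independently generated population.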

This approach allowed us to incorporate both the individual-level (first-order) uncertainty that arises from the simulation of trajectories and the (second-order) uncertainty about the parameters that we used to adjust for secular trends.37 Because the effect of first-order uncertainty on aggregate estimates decreases as the sample size increases, most of the uncertainty in our estimates reflects second-order (parameter) uncertainty. We report means and 95% uncertainty intervals (i.e., the 2.5th and 97.5th percentiles). We also performed sensitivity analyses for secular trends by running 100 iterations of the model under the assumption of no secular trends. Because these sensitivity analyses did not incorporate parameter uncertainty, they allowed us to estimate the relative contribution of individual-level uncertainty to our prediction intervals.
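A percentile-based uncertainty interval of this kind can be sketched in Python as follows. The sketch assumes that each simulation run yields one aggregate estimate (e.g., a predicted prevalence). The linear-interpolation rule for percentiles, the function names, and the evenly spaced toy values are assumptions for illustration, not details taken from the analysis.

```python
from statistics import mean

def percentile(values, q):
    """Percentile q (0 to 100), interpolating linearly between order statistics."""
    ordered = sorted(values)
    rank = q * (len(ordered) - 1) / 100
    lower = int(rank)
    upper = min(lower + 1, len(ordered) - 1)
    return ordered[lower] + (rank - lower) * (ordered[upper] - ordered[lower])

def summarize_runs(estimates):
    """Return the mean and the 95% uncertainty interval across simulation runs."""
    return mean(estimates), percentile(estimates, 2.5), percentile(estimates, 97.5)

# Hypothetical run-level estimates (evenly spaced placeholder values, not results)
run_estimates = [i / 1000 for i in range(1001)]

point, lower_bound, upper_bound = summarize_runs(run_estimates)
print(point, lower_bound, upper_bound)
```

Because each run would use both a newly generated population and a newly sampled parameter set, the spread of the run-level estimates reflects first-order and second-order uncertainty together.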

To evaluate the convergent validity (the extent to which different models that address the same problem calculate similar results) of our approach,38 we compared the simulated prevalence of obesity at the age of 35 years with logistic-regression predictions on the basis of NHANES data from 1999 through 2014 for persons between the ages of 34 and 36 years.

We also performed extensive cross-validation analyses in which we predicted values for participants in our data set.39 We removed participants from our data set one at a time and then predicted each removed participant’s height and weight over the length of the observed trajectory. We then compared our predictions with the actual values to evaluate the accuracy of our algorithm. We conducted cross-validation analyses for participants whose last observed age was between 34 and 36 years and calculated the coverage probability of our estimates (i.e., the probability that the actual values for this cohort fell within our predicted uncertainty intervals). In addition, we ran prospective cross-validation analyses for younger cohorts, with starting ages ranging from 2 to 29 years, and calculated the number of times that our predictions fell within the bootstrapped 95% confidence intervals (Section 4 in the Supplementary Appendix).
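The coverage probability can be illustrated with a short Python sketch, in which each held-out value is checked against its predicted interval. The function name, the pairing of values with intervals, and the toy numbers are assumptions for illustration.

```python
def coverage_probability(actual_values, predicted_intervals):
    """Fraction of actual values that fall within their predicted intervals."""
    if len(actual_values) != len(predicted_intervals):
        raise ValueError("each actual value needs exactly one predicted interval")
    hits = sum(
        low <= actual <= high
        for actual, (low, high) in zip(actual_values, predicted_intervals)
    )
    return hits / len(actual_values)

# Hypothetical held-out BMI values and their predicted 95% uncertainty intervals
actual_bmi = [24.1, 31.0, 27.5, 22.0]
predicted = [(22.0, 26.0), (27.0, 30.0), (25.0, 29.0), (21.0, 23.0)]

coverage = coverage_probability(actual_bmi, predicted)
print(coverage)
```

For well-calibrated 95% intervals, the coverage calculated in this way would be expected to be close to 0.95 in a large sample.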

Our model was coded in Java as part of the Childhood Obesity Intervention Cost-Effectiveness Study (CHOICES).40,41 Statistical analyses were performed with the use of R software. BMI categories were calculated on the basis of CDC standards, with severe obesity defined as a BMI of 35 or higher for adults42 and 120% or more of the 95th percentile for children.43 Data and R code are available from the authors on request.
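The four BMI categories can be sketched in Python as shown below. The severe-obesity thresholds follow the definitions given above. The remaining cutoffs (a BMI of 25 and 30 for adults, and the 85th and 95th percentiles for children) are the standard CDC thresholds and are an assumption here, since they are not restated in the text. The age- and sex-specific percentile values are passed in as arguments, and the values shown are hypothetical placeholders rather than entries from the CDC growth charts.

```python
def bmi(weight_kg, height_m):
    """Body-mass index: weight in kilograms divided by height in meters squared."""
    return weight_kg / height_m ** 2

def classify_adult(bmi_value):
    """Four-category adult classification (cutoffs of 25 and 30 are assumed CDC values)."""
    if bmi_value >= 35:
        return "severe obesity"
    if bmi_value >= 30:
        return "moderate obesity"
    if bmi_value >= 25:
        return "overweight"
    return "underweight or normal weight"

def classify_child(bmi_value, p85, p95):
    """Four-category child classification.

    p85 and p95 are the 85th and 95th BMI percentiles for the child's age and sex;
    the caller must supply them (they are not looked up here).
    """
    if bmi_value >= 1.2 * p95:
        return "severe obesity"
    if bmi_value >= p95:
        return "moderate obesity"
    if bmi_value >= p85:
        return "overweight"
    return "underweight or normal weight"

adult_category = classify_adult(bmi(80.0, 1.70))
# Hypothetical percentile values, for illustration only
child_category = classify_child(25.0, p85=18.0, p95=20.0)
print(adult_category, child_category)
```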