Subjects

This secondary data analysis included a sample of 5787 twin pairs from the community-based Washington State Twin Registry within a cross-sectional study design. Twins include both monozygotic (MZ) and dizygotic (DZ) male and female twin pairs of the same sex, aged 18–97 years, reared together. Participants were recruited from Washington State driver’s license and identification card applications [30]. All twins completed an enrollment survey with questions related to childhood similarity to evaluate twin zygosity (MZ vs. DZ), a common twin registry practice with an accuracy of 95–98% compared to biological indicators [31, 32]. Twins were mailed an invitation letter and enrollment survey including questions related to height, weight, and soda consumption. Data collected from completed questionnaires received between 2009 and 2015 were analyzed.

Measures

Body Mass Index. The main outcome, BMI, was calculated from self-reported height and weight and expressed as kg/m2. The height and weight measures were collected from responses to the survey questions “What is your current height?” in feet and inches and “What is your current weight?” in pounds.

Soda Consumption. The predictor variable was soda consumption, which was collected from self-reported dietary recall based on the question “During the past 4 weeks, how many servings of the following did you have on a typical day…Cans or glasses of soda?” Possible answers included “none”, “1–2”, “3–4”, or “5 + .” Because many twins are initially recruited into the Registry at age 18, this question was taken from the Youth Risk Behavior Surveillance System (YRBSS). Methodology of the YRBSS is described elsewhere [33]. The YRBSS soda question has been evaluated previously; Park et al. [34] report unpublished data demonstrating a significant correlation (r = 0.44) between soda intake from YRBSS and a 24-h dietary recall among high school students, whereas O’Malley et al. [35] report that mean intakes of soda from YRBSS and three, 24-h dietary recalls were not significantly different from each other as well as a significant corrected Pearson’s correlation between methods (r = 0.44; p < 0.001) among 615 high school students.

Covariates. Age, sex, race, annual household income, and education level were collected from responses to survey questions and used as covariates in the statistical analyses. Age at time of survey was calculated based on reported date of birth. Sex was reported as male or female. Race was reported using six standard response options (American Indian or Alaska Native, Black or African-American, Native Hawaiian or Pacific Islander, Asian, White, and Other), which was subsequently re-categorized as white and non-white. There were eight categories of income with the lowest being “less than $20,000” followed by “$20,000–29,999”, “$30,000–39,999”, and so on, ending with the highest category of “$80,000 or more”. Education (highest level of education completed) included five categories: grade 1–11, high school graduate/GED, some college, bachelor’s degree, and graduate/professional degree.

Statistical analysis

BMI data were missing for 141 participants (1.2%), and soda consumption was missing for 95 (0.8%). These observations were omitted from descriptive analyses, but were included in the structural equation modeling analyses using full information maximum likelihood to account for missingness. In addition, 268 participants were missing zygosity information, and were therefore omitted from twin analyses. BMI was expressed as a continuous variable in all statistical analyses. In the structural equation analyses, soda drinking was modeled using a categorical variable model that posits a normally distributed latent continuous liability to soda consumption; latent cutoffs on the distribution determine placement of participants in the four measured categories [36].

Descriptive statistics for subjects were computed and reported for the overall subject sample. Next, we used structural equation modeling [37] to fit a classical twin model to soda consumption and BMI (Fig. 1). The classical twin model uses the variances and co-variances of MZ and DZ twins to partition the variance of phenotypes into three components: the additive effect of genes (A), the environmental effect of being raised in the same family (C), and environmental effects that make siblings raised together different from each other (E). The latter term includes measurement error.

Fig. 1 Univariate twin model. A additive genetic component; C shared environment component; E non-shared environment component Full size image

Partitioning of variability using the classical twin model was not the main goal of our analysis, however. Instead, our goal was to use the twin design to investigate the relationship between soda consumption and BMI between and within pairs of twins. In the absence of random assignment to soda-consumption conditions, an investigator cannot be certain that an observed association between soda consumption and BMI is actually the result of a causal effect. Phenotypic associations of this kind may also occur because genetic predispositions that lead to soda consumption are also associated with higher BMI, or alternatively because shared environmental background (e.g., poverty) predisposes to both soda consumption and high BMI.

Twin designs are especially useful for understanding measured and unmeasured uncontrolled confounds in non-experimental data. If the effect of soda consumption on BMI is truly causal, then one would expect it to be manifest both between twin pairs (pairs consuming more soda on average would have higher average BMI) and within pairs (the member of a pair who consumes more soda would have higher BMI than the co-twin who drinks less). If, however, the association is the result of uncontrolled confounding variables such as genetic background or socioeconomic status, the association will be observed between pairs but not within them, because twin pairs share a rearing environment and either all or half of their genetic background. The twin method cannot fully control for all potential confounds, however, and some uncontrolled variables may vary within pairs as well as between them. We therefore refer to associations that have survived genetically informed tests as “quasi-causal,” to suggest that the twin analysis has strengthened our confidence in the causal underpinning of the association.

The logic of the method and the statistical methods associated with it are described in Turkheimer & Harden [29], and illustrated in Fig. 2. Soda consumption and BMI are both partitioned into ACE components using the classical twin method. In addition, BMI is regressed on phenotypic soda drinking (b P ), as well as on the shared components (b A and b C ) of soda drinking. In the first analysis b A and b C are set to zero, leaving a simple regression of BMI on soda conumption at the individual level; this model is called a phenotypic association model, and tests for the association of soda consumption with BMI without including genetic or shared environmental confounds, but with other covariates as noted in the list above. The model is then re-estimated including estimates of b A and b C , which controls for genetic and shared environmental confounds, respectively, in the estimation of the phenotypic effect. This is referred to as a quasi-causal model. The models were estimated first without and then with the set of covariates listed above.

Fig. 2 Quasi-causal twin model, controlling for covariates. A additive genetic component; C shared environment component; E non-shared environment component; b A and b C amount of residual variance of body mass index attributable to the genetic and shared environment, respectively; b P phenotypic association Full size image

All models were fit in Mplus 7.4 [37] using weighted least squares estimation. The alpha level for testing hypotheses was set to 0.05. Twin-based regression models are generally saturated, so the only source of reduced fit involves incidental issues such as differences between twins arbitrarily assigned as Twin 1 and Twin 2 within pairs. All reported models fit the data closely using standard “goodness of fit” tests.