Study population

We collected microbiota information on 236 mother-child dyads recruited from the 279 families involved in the INSIGHT study53 (Supplementary Fig. S1). These dyads included full-term singletons born to primiparous mothers in Central Pennsylvania, and were predominantly of European descent64. The INSIGHT study collected clinical, anthropometric, demographic, and behavioral variables on children and mothers53 (Table 1). Parents completed questionnaires reporting children’s dietary intake and exposure to medications. Children’s weight and length were measured at birth, 3–4 weeks, 16 weeks, 28 weeks, 40 weeks, 1 year, and 2 years. For children attending visits before two years, length was measured using a recumbent length board (Shorr Productions). After two years, standing height was measured with a stadiometer (Seca 216).

Microbiota sample collection

Buccal samples were collected by research staff of the Penn State Hershey Pediatric Clinical Research Office at the child’s two-year clinical research center visit. Information was mailed to participants prior to the visit instructing them to not eat or drink anything, not use tobacco products, and not brush their teeth/use mouthwash for two hours prior to buccal swab collection. Ten sterile cotton swabs were each rubbed for 20 seconds against the inside cheeks of children or mothers. The cotton swabs were placed in tubes containing slagboom buffer77. Samples were stored in the Pediatric Clinical Research Office and transported from the Hershey Medical Campus to the University Park Campus where they were processed.

Within two days before the two-year visit, stool samples were collected by parents in stool collection tubes, wrapped in freezer packs, and frozen immediately in the home freezer. They were then brought, packed on ice, to the clinical research site by the parents and stored at −20 °C. Samples were finally transported in coolers on ice from the Hershey Medical Campus to the University Park Campus where they were stored at −80 °C until processing.

Sample DNA extraction, library preparation, and DNA sequencing

Genomic DNA (gDNA) was extracted from samples using the MoBio PowerSoil DNA isolation kit (Qiagen). The manufacturer’s directions were followed with modifications implemented based on the protocol established by the Human Microbiome Project. These included two heating steps (65 °C and 90 °C for 10 minutes each) prior to bead beating. To control for contamination in the DNA from this stage, a blank ‘sample’ (containing only MoBio Bead Buffer) was subjected to the entire DNA extraction protocol and library preparation. Contamination was never detected via gel electrophoresis, Qubit, or Bioanalyzer.

Library preparation for sequencing of the 16S rRNA gene followed the Illumina protocol ‘16S Metagenomic Sequencing Library Preparation’ (Illumina part# 15044223 Rev.B; https://support.illumina.com/downloads/16s_metagenomic_sequencing_library_preparation.html). Briefly, the variable regions 3 and 4 of the 16S rRNA gene were amplified using PCR (16S_ampliconPCR_For: TCGTCGGCAGCGTCAGATGTGTATAAGAGACAGCCTACGGGNGGCWGCAG and 16S_ampliconPCR_Rev: GTCTCGTGGGCTCGGAGATGTGTATAAGAGACAGGACTACHVGGGTATCTAATCC, PCR conditions are given in the Illumina protocol). The PCR product was isolated using a magnetic bead procedure (Agencourt AMPure XP, Beckman Coulter). Subsequently, Illumina Nextera XT indexes were added and an additional magnetic bead purification was performed. Each library was quantified using a fluorometer (Qubit, ThermoFisher Scientific) and analyzed for correct size on a BioAnalyzer (Agilent). As a positive control, a synthetic mock community DNA pool78 was amplified and sequenced alongside the experimental samples. To control for contamination in the amplification or library preparation steps, a blank sample was added to all of the library preparation steps. Contamination was never detected via gel electrophoresis, Qubit, or Bioanalyzer.

An equimolar pool of 48 libraries with a 25% PhiX spike-in was sequenced on an Illumina MiSeq (v3 chemistry, 2 × 300 reads). Demultiplexing was performed using the Illumina software. FASTQ files were retrieved to be used in downstream analyses.

Sequence analysis

Sequences were first examined using FastQC79 and a multi-sample report was generated for textual and graphical views using Galaxy tools80 and MultiQC81, respectively. FASTQ manipulation filters80 were then applied to remove low-quality sequences. Sequences were trimmed using a sliding window approach. To both the 5′ and 3′ ends of the sequences, we applied a cutoff of minimal mean PHRED score of 20 within a window of 5 bases, step size of 1. Quality was reassessed after trimming.

We then merged/joined our forward and reverse paired-end reads into a single contig using a two-step process. First, we used fastq-join from the ea-utils package82,83,84 to merge overlapping forward and reverse reads into a single read. This process aligns read pairs and merges overlapping regions based upon user-specified parameters of mismatch percentage and minimum alignment length; we utilized 8% and 6 bases, respectively. To prevent the loss of non-mergeable reads, we performed a second joining operation, using the FASTQ Joiner tool85,86 and inserted a string of 5 ambiguous nucleotides (“NNNNN”) between the pairs. We next removed chimeric sequences. We used VSearch87 with the uchime_denovo algorithm to create a list of non-chimeric sequences.

Non-chimeric reads were classified individually into taxa. We utilized Kraken88 with a customized database containing only 16S rRNA gene sequences, based on GreenGenes89. Once Kraken had assigned taxa-kmer counts to individual reads, we utilized a custom abundance reporting tool (https://github.com/blankenberg/Kraken-Taxonomy-Report) to report abundances across samples at specified ranks, along with a phylogenetic tree that was pruned to contain the terminal nodes that are present in at least one of the samples and the connected internal nodes.

Statistical analyses

Diversity measures and tests

Rarefaction and α-diversity calculations were performed using Vegan56. Rarefaction was used to normalize the read count across samples - which were all rarefied to 100,000 reads. To compute summary measures of diversity for each microbiota sample, we used the Inverse Simpson α-diversity formula on the phylum level abundance counts (Shannon Diversity Index, Simpson, and Inverse Simpson measures were all highly correlated; see Supplementary Table S9), as computed in Vegan56. The Firmicutes-to-Bacteroidetes (F:B) ratio was calculated separately for gut and oral microbiota of each child, and for oral microbiota of each mother, by taking the total count of the number of reads assigned to the Firmicutes phylum divided by the total count of the number of reads assigned to the Bacteroidetes phylum. To test for differences in the proportion of each bacterial phyla between microbiota types (child gut, child oral, mom oral) we used prop.test from the base stats package in R (calculates Pearson’s chi-squared test). For this test we took the average of all individual proportions (calculated as the proportion of reads assigned to that phyla) multiplied by the number of subjects to obtain the “probability of successes”, and to calculate “failures” we took the number of individuals less the “probability of successes”. We compared all possible combinations of the microbiota types (child gut vs. child oral, child gut vs. mom oral, child oral vs. mom oral, and child gut vs. child oral vs. mom oral) and reported Bonferroni-corrected p-values for these 24 tests. Non-metric Multidimensional Scaling analysis, using the Bray-Curtis dissimilarity matrix as the β- diversity measure, was also performed using Vegan56. See Supplementary Fig. S6 for a detailed schematic of the computational workflow.

Weight outcome and calculation of the the Conditional Weight Gain score

Per the American Academy of Pediatrics90, growth measured by weight-for-length is the recommended practice in the United States for children less than two years of age, and BMI becomes the recommended/standard outcome for children who are older than two years. Since we are characterizing child growth from birth to two years, we chose the weight-for-length ratio as our outcome59. At each time point when weight and length were collected (see above), we computed the ratio of weight to length (later referred to as growth index). Additionally age- and gender-specific weight- and length-for-age z-scores (WAZ and LAZ, respectively) were determined using the World Health Organization gender-specific child growth standards91.

Conditional weight gain (CWG) z-scores were then computed for each child using age- and gender-adjusted anthropometrics at birth and six months63,64. Briefly, CWG z-scores were computed as standardized residuals from a linear regression of WAZ at six months on WAZ at birth, using LAZ and precise age at six months visit as covariates. The CWG z-score, therefore, represented the variability of child weight gain explained neither by length at birth and six months nor by gender. By construction, the CWG z-scores had mean 0 and standard deviation of 1. Moreover, in practice these scores were approximately normally distributed. Positive CWG z-scores indicated weight gain that was above the average weight gain, i.e. rapid infant weight gain.

Construction of growth curves

The growth indexes calculated above were analyzed longitudinally using tools from Functional Data Analysis (FDA)92 alongside the fda package in R. In particular, individual growth curves were constructed using PACE60, a procedure that pools information across subjects to more accurately assemble the curves (Supplementary Fig. S3). The PACE software is freely available for R, and we used it with its default settings. After the curves were assembled, we represented them using 102 cubic spline functions with evenly spaced knots, so that subsequent FDA methods could be applied. Growth curves were also temporally aligned, using the register.fd function in R, before further analyses were conducted (Fig. 1b).

Association between growth curves and microbiota summary measures

We assessed the association between growth curves and α-diversity, as well as the F:B ratio, of gut and oral microbiota fitting Function-on-Scalar Linear Models93. These were six low-dimensional functional regressions (the growth curve response, i.e. the function, was regressed on a single scalar predictor: α-diversity or F:B ratio), and were carried out in R using code we wrote based on our previous publication93. The outcome of these functional regressions were estimated regression coefficient curves, which we obtained using a penalized least squares approach imposing a penalty on the second derivative of the parameter. In each case, the smoothing parameter was set at 10,000, but results were robust against this choice (especially the p-values; see below). The additional smoothing enforced by the penalization was especially useful in terms of producing interpretable estimates. The shape of each estimated coefficient curve indicates how the relationship evolves along the time dimension, with amplitude (distance from zero) and sign (positive or negative) representing strength and direction. Significance of each regression was determined as described in62, based on three measures which utilize different types of weighted quadratic forms (note that these were not separate statistical tests, only different ways of determining significance in the same regression). The first, denoted as L2, employs a simple L2 norm (squared integral) of the parameter estimate. The second, denoted as PCA, uses principal components to reduce the dimension of the parameter and then applies a Wald-type test. The last, denoted as Choi, incorporates a weighting scheme into the PCA test so that more principal components can be included, resulting in a test that is in between the PCA and L2 tests, and thus we believe is the preferred measure. We reported results from Choi in the main text, but the results for all three measures of significance can be found in the Supplement. Code is available at: (https://github.com/mreimherr/Insight_Microbiome_Simulation.git).

Testing potential co-factors

We tested a wide variety of maternal, health, and behavioral co-factors (gathered by the INSIGHT study)53 − a total of 17 (see below) − for effects on the children’s microbiota, and the relationships of the microbiota with children’s weight gain. Maternal gestational weight gain, diabetes during pregnancy, mode of delivery, and gender of the child (4 co-factors) were obtained from electronic medical health records. Maternal smoking during pregnancy, family income, child exposure to antibiotics or acid reducing medications (4 co-factors), were obtained from maternal recall surveys. Intervention group was determined by the INSIGHT study (1 co-factor). First, considering these (non-diet) co-factors one at a time, we tested whether gut and oral microbiota diversity and (separately) F:B ratio differed between their categories using non-parametric Mann-Whitney U and Kruskal-Wallis tests implemented in R (Supplementary Table S2). Specifically, we tested the hypotheses that children with rapid infant weight gain would have (1) a lower diversity and (2) a greater F:B ratio than children without rapid infant weight gain.

We also obtained information on a child’s diet at two years as reported by parents using an Infant Food Frequency Questionnaire. This questionnaire included 121 food and drink items. Parents reported how often their child had each item in the past week (0, 1, 2–3, or 4–6 times per week, and 1, 2, 3, 4–5, 6 or more times per day). These data were then distilled into a 10-item summary, each item with the corresponding number of weekly consumptions. The items included: sugar-sweetened beverages, milk, dairy (excluding milk), fruit, vegetables, vegetables excluding potatoes, snacks, sweets, meats, and fried foods. We looked at correlations between the consumption frequencies to identify whether any of the items could be removed (we used the R package Rstats and the graphical package corrplot)94. A correlation cut off of 0.7 was employed, eliminating food categories ‘milk’ and ‘vegetables excluding potatoes’ that were highly correlated with other variables. As a result, eight diet categories were retained (8 co-factors). Consumption frequencies for the remaining eight food categories were then used as predictors in four multiple linear regressions for the microbiota summary measures (four regressions in all, for diversity and F:B ratio in children’s oral and gut microbiota; Supplementary Table S3). This was performed using the lm function of the Rstats package.

Next, we used the bestglm package in R95 to select the best subset of predictors for a multiple linear regression. This was performed again for four regressions (diversity and F:B ratio in children’s oral and gut microbiota), this time considering 17 covariates at our disposal, as listed in Tables S2 and S3. The covariates selected by bestglm were then used as predictors in restricted linear regression fits (results are reported in Table 2 for gut microbiota α-diversity and F:B ratio, for which only diet-related covariates were retained; no covariates were retained by bestglm for the oral microbiota summary measures).

Given the prominent effects of diet-related covariates, especially on the gut microbiota (Tables S3 and 2), we also assessed their relationship with weight gain. Specifically, we ran a multiple functional regression for children growth curves (between birth and age two) against food consumption frequencies at age two (Supplementary Table S4; we used the same eight food categories considered in Supplementary Table S3). Finally, we assessed whether diet-related covariates could modulate the relationship between weight gain and the gut microbiota. Specifically, we repeated the two functional regressions of children growth curves on gut α-diversity (Supplementary Table S5) and gut F:B ratio (Supplementary Table S6), in each case adding the diet-related covariates retained by bestglm (Table 2). Similar to what we did for the functional regressions for growth curves against microbiota summary measures (see our explanation at the end of the section on ‘Association between growth curves and microbiota summary measures’ of the Methods), statistical significance for these functional regressions was determined with three measures: L2, PCA, and Choi62. We reported Choi results in the main text, but retained PCA and L2 results in the supplement.

Identification of influential taxonomic groups

To mitigate sparseness and collinearity in our bacterial abundance data, we merged low-abundance or highly correlated genus-level abundance counts into abundances of taxonomic groups. We implemented a two-stage procedure which utilized phylogenetic relationships – merging abundances only for neighboring nodes along the phylogenetic tree created from the Kraken taxonomic report tool in Galaxy88. First, moving upwards along the tree, we merged a node with its neighbor if its abundance was less than five counts in more than 90% of the samples in the data set. When such a merger occurred, the counts from the two nodes were summed. Next, considering the merged abundances produced by the first stage and moving upwards along the tree, we merged neighboring nodes if their abundances showed a correlation in excess of 0.7 across the samples in the data set. When such a merger occurred the counts were averaged. Notably, this procedure allowed us to tailor the level of resolution of our analyses to the data: branches of the phylogenetic tree where genera were scarcely observed or highly correlated were lumped together, while finer resolution was maintained for branches where genera were more abundant and diversified in their behavior across the samples. The procedure was applied, separately, to abundance data from the three microbiota samples (child gut, child oral and mother oral). Supplementary Table S7 contains a complete list of all taxonomic groups obtained in each microbiota sample.

To identify taxonomic groups with the strongest associations with weight gain, we considered the merged taxonomic group abundances as scalar predictors in functional regressions for growth curves using FLAME (Functional Linear Adaptive Mixed Estimation)66. Separately, we considered the merged taxonomic group abundances as features in Linear Discriminant Analysis for rapid vs. non-rapid infant weight gain (based on CWG scores) using LEfSe (Linear Discriminant Analysis Effect Size)67. These were high-dimensional analyses (each comprised a number of predictors corresponding to the number of taxonomic groups found in a given microbiota sample). The FLAME functional regressions were carried out using methods and R code from66. Estimates and p-values were computed using standard methods from FDA (see github repository link below for code) as described in92. FLAME simultaneously selected important predictors and captured their effects as estimated regression coefficient curves. It can be thought of as a generalization of the adaptive LASSO96 to functional/longitudinal outcomes. In particular, a Sobolev kernel was used in the penalty to produce smooth estimates while also carrying out variable selection. The tuning parameters were chosen via cross validation, as discussed in66,96. Code and examples for carrying out the two-stage abundance merger procedure and all of the FDA methods utilized here can be found at https://github.com/mreimherr/Insight_Microbiome_Simulation.git. Standalone code for FLAME is available at: http://personal.psu.edu/~mlr36/codes.html.

Data Sharing

Raw microbiota reads were deposited in SRA under BioProject number PRJNA420339. Phenotype information was deposited under dbGaP Study number phs001498.v1.p1. All code used is either already public (http://personal.psu.edu/~mlr36/codes.html) or available at GitHub (https://github.com/mreimherr/Insight_Microbiome_Simulation.git). 16S rRNA gene analysis pipeline tools and pipeline are available in the Galaxy platform (usegalaxy.org). The three relevant Galaxy workflows are https://usegalaxy.org/u/sjcarnahancraig/w/16s-qc-3, https://usegalaxy.org/u/sjcarnahancraig/w/kraken-classification, https://usegalaxy.org/u/sjcarnahancraig/w/vegan-rarefacation−alpha-diversity.

Ethics statement

This study was approved by Penn State University Institutional Review Board (PRAMS034493EP) and all methods were performed in accordance with all relevant guidelines and regulations. Informed consent was received from mothers prior to collection of biological samples and phenotypic, demographic, health, and diet information.