Recent literature has linked several facets of gut health with the onset of T1D in humans and rodent models4,6,10. Altered intestinal microbiota in connection to T1D has been reported in Finnish7,8,11,12, German13, Italian14, Mexican15, American (Colorado)16 and Turkish17 children. Common findings include increased numbers of Bacteroides species, and deficiency of bacteria that produce short-chain fatty acids (SCFAs)7,8 in cases of T1D or islet autoimmunity (IA)8,11,15,18. Corroborating these findings, decreased levels of SCFA-producing bacteria were found in adults with type 2 diabetes (T2D)19. In addition, increased intestinal permeability14 and decreased microbial diversity12 after IA but before T1D diagnosis have been reported. Studies using the nonobese diabetic (NOD) mouse model have determined immune mechanisms that mediate the protective effects of SCFAs9 and the microbiome-linked sex bias in autoimmunity20. NOD mice fed specialized diets resulting in high bacterial release of the SCFAs acetate and butyrate were almost completely protected from T1D9. A study in a streptozotocin-induced T1D mouse model demonstrated that bacterial products recognized in pancreatic lymph nodes contribute to pathogenesis21.

Even in the absence of immune perturbation, the first few weeks, months and years of life represent a unique human microbial environment that has only recently been detailed22,23. Infants have a markedly different gut microbial profile from adults, characterized by a distinct taxonomic profile, greater proportion of aerobic energy harvest metabolism, and more extreme dynamic change24. These differences gradually fade over the first few years of life, particularly in response to the introduction of solid food, and individual microbial developmental trajectories are influenced by environment, delivery mode, breast (versus formula) feeding, and antibiotics25,26,27. Most studies that address the development of the gut microbiome, both generally and in association with T1D, have used 16S rRNA gene analysis, which leaves open the question of whether functional and strain-specific differences that are not easily detected by this technology might contribute to disease pathogenesis12.

Bridging this gap is one goal of The Environmental Determinants of Diabetes in the Young (TEDDY) study, a prospective study that aims to identify environmental causes of T1D28. It includes six clinical research centres in the United States (Colorado, Georgia/Florida and Washington) and Europe (Finland, Germany and Sweden), which together have recruited several thousand newborns with a genetic predisposition for T1D or first-degree relative(s) with T1D. This has enabled the TEDDY study to collect a range of biospecimens, including monthly stool samples starting at three months of age, coupled with extensive clinical and personal data such as diet, illnesses, medications and other life experiences. To characterize microbial, environmental, genetic, immunological and additional contributors to the development of T1D, the TEDDY study group further assembled nested case–control studies for IA (n = 418 case–control pairs) and T1D (n = 114)29. Case–control pairs were matched by clinical centre, sex and family history of T1D, which are all known confounding factors for T1D susceptibility and microbiome composition.

Here, we assessed 783 children followed from three months up to five years of age from six clinical centres in four countries (Finland, Germany, Sweden and the United States) who either progressed to persistent IA or T1D or were matched as controls (Fig. 1a, b, Extended Data Table 1). Stool samples were collected, on average, monthly starting at three months of age and continuing until the clinical end point (IA or T1D). This study focused solely on analysing metagenomic sequencing data (n = 10,913 samples, n = 783 subjects), while a companion paper by Stewart et al.30 interrogated corresponding 16S rRNA amplicon sequencing information.

Fig. 1: More than 10,000 longitudinal gut metagenomes from the TEDDY T1D cohort. We analysed 10,913 metagenomes collected longitudinally from 783 children (415 controls, 267 seroconverters, and 101 diagnosed with T1D) approximately monthly over the first five years of life. a, Subjects were recruited at six clinical centres (Finland, Sweden, Germany, Washington, Georgia and Colorado). Primary end points were seroconversion (defined as persistent confirmed IA) and T1D diagnosis. Additional metadata analysed for subjects and samples included the status of breastfeeding, birth mode, probiotics, antibiotics, formula feeding, and other dietary covariates. b, Overview of stool samples collected and microbiome development as summarized by Shannon’s alpha diversity and stratified by end point. Median number of samples per individual n = 12 (healthy controls n = 10, seroconverters n = 13, T1D cases n = 16).

We first investigated the taxonomic composition of early gut metagenomes at the species level. Principal coordinate analysis ordination of Bray–Curtis beta diversities showed a strong longitudinal gradient and marked heterogeneity among the earliest samples (Fig. 2a, Extended Data Fig. 1a–k, Supplementary Note 1). Permutational multivariate analysis of variance (PERMANOVA) of Bray–Curtis beta diversities indicated that inter-subject differences explained 35% of microbial taxonomic variation (permutation test, P < 0.001, 1,000 permutations), followed by age at stool sampling at roughly 4% of variance (P < 0.001). Using cross-sectional analysis to test for associations between taxonomic beta diversities and other collected metadata, we found that in addition to subject ID and age, geographical location and breastfeeding had strong and systematic effects on the composition of the microbial community (Supplementary Table 1, Extended Data Fig. 2a–d, Supplementary Note 1). To investigate the stability and individuality of the microbial profiles further, we compared intra- and inter-subject Bray–Curtis beta diversities. The gap between individual stability and similarity within or across clinical centres was largest at the beginning of the sampling period, indicating that the children had particularly dissimilar microbiota during these early months (Fig. 2b, Supplementary Note 1). Finally, we tested microbial alpha diversity (Shannon’s diversity index) of taxonomic profiles for associations with collected metadata, and found that the cessation of breastfeeding had the largest effect (ANOVA, partial η² = 0.053) in the accrual of alpha diversity in early life (Supplementary Table 2, Extended Data Fig. 3a–e, Supplementary Note 1).
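The two diversity measures used in this paragraph can be illustrated with a minimal sketch. It assumes species-level profiles are given as lists of non-negative abundances in a shared taxon order; the function names and the example profiles are hypothetical and are not taken from the study's pipeline.

```python
import math


def relative_abundance(counts):
    """Scale non-negative abundances so that they sum to 1."""
    total = sum(counts)
    if total <= 0:
        raise ValueError("profile must have a positive total")
    return [c / total for c in counts]


def shannon_index(profile):
    """Shannon's diversity index H = -sum(p * ln p) over taxa with p > 0."""
    p = relative_abundance(profile)
    return -sum(x * math.log(x) for x in p if x > 0)


def bray_curtis(a, b):
    """Bray-Curtis dissimilarity: sum|a_i - b_i| / sum(a_i + b_i)."""
    if len(a) != len(b):
        raise ValueError("profiles must list the same taxa in the same order")
    numerator = sum(abs(x - y) for x, y in zip(a, b))
    denominator = sum(x + y for x, y in zip(a, b))
    if denominator == 0:
        raise ValueError("both profiles are empty")
    return numerator / denominator


# Hypothetical profiles for one child (illustrative values only).
infant_month3 = [0.70, 0.20, 0.10, 0.00]
infant_month12 = [0.25, 0.25, 0.25, 0.25]

alpha_early = shannon_index(infant_month3)
alpha_late = shannon_index(infant_month12)
beta = bray_curtis(infant_month3, infant_month12)
```

In this toy case the later, more even profile has the higher Shannon index, and the Bray–Curtis value between the two time points is the kind of within-subject dissimilarity that is compared against between-subject values in the text.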

Fig. 2: The early gut microbiome is characterized by early heterogeneity of Bifidobacterium species and individualized accrual of taxa over time. a, Principal coordinate analysis (PCoA) ordination of microbial beta diversities (n = 10,913 samples), measured by Bray–Curtis dissimilarity. Arrows show the weighted averages of key taxonomic groups. b, Microbiota stability, measured by Bray–Curtis (BC) dissimilarity (n = 10,750 samples) in three-month time windows, over two-month increments, stratified into three groups: within subject, within clinical centre, and between clinical centres. Lines show median values per time window. Shaded area denotes the estimated 99% confidence interval. Gut microbial communities were highly individual. c, Influence of antibiotic (Abx) courses on microbial stability, measured by Bray–Curtis dissimilarity over consecutive stool samples (<50 days apart) from the same individual during the first three years of life, and stratified by whether antibiotics were given between the two samples (n = 654 observations with antibiotics, n = 6,734 observations without antibiotics). Curves show locally weighted scatterplot smoothing (LOESS) for the data per category. Shaded areas show permutation-based 95% confidence intervals for the fit. d, Decreases in the most common Bifidobacterium species in connection to oral antibiotic treatments. Fold change was measured between consecutive samples with an antibiotic course between them, given that the species in question was present in the first of the two samples. Sample size per species (n) indicates the number of sample pairs in which the species in question was present in the sample before the antibiotic treatment. Bars show bootstrapped mean log2(fold change) (that is, decrease), and error bars denote s.d. (n = 1,000 bootstrap samples).

We next investigated the effects of antibiotics on the early life microbiome. Courses of oral antibiotics disrupted microbial stability, with a larger effect in the earliest comparisons (Fig. 2c, Extended Data Fig. 4a–f, Extended Data Table 2, Supplementary Note 2). Previous studies have found Bifidobacterium species to be especially vulnerable to antibiotics31,32, leading us to investigate how antibiotic perturbations influenced these common dominant members of the early gut. Comparing microbial relative abundances before and after antibiotics (assuming that the given species was present in the preceding sample), we saw a decrease in the abundances of the Bifidobacterium members B. bifidum, B. pseudocatenulatum, B. adolescentis, B. dentium and B. catenulatum, whereas B. longum and B. breve did not systematically decline owing to antibiotics (Fig. 2d), suggesting that certain Bifidobacterium species are particularly susceptible to out-competition by other community members after depletion by antibiotics. Given their dominance in the typical developing gut microbiota and finely tuned balance of metabolic interactions with breast milk, this finding underscores the importance of approaching antibiotic prescriptions in early childhood with care, especially during breastfeeding.
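The before-and-after comparison described above can be sketched as follows. This is a minimal illustration, not the study's code: the pseudocount used to avoid taking the logarithm of zero, the function names, and the example abundance pairs are all assumptions introduced here.

```python
import math
import random
import statistics


def log2_fold_changes(pairs, pseudocount=1e-6):
    """log2(after / before) for pairs in which the species was present before.

    The pseudocount is an assumption made for this sketch; the source does not
    state how samples with zero abundance after antibiotics were handled.
    """
    return [
        math.log2((after + pseudocount) / (before + pseudocount))
        for before, after in pairs
        if before > 0
    ]


def bootstrap_mean(values, n_boot=1000, seed=0):
    """Return the mean and s.d. of the bootstrap distribution of the mean."""
    rng = random.Random(seed)
    means = []
    for _ in range(n_boot):
        resample = [rng.choice(values) for _ in values]
        means.append(statistics.fmean(resample))
    return statistics.fmean(means), statistics.stdev(means)


# Hypothetical (before, after) relative abundances of one species in
# consecutive samples with an antibiotic course between them.
pairs = [(0.10, 0.05), (0.20, 0.05), (0.00, 0.10), (0.08, 0.08)]

fold_changes = log2_fold_changes(pairs)  # the (0.00, 0.10) pair is skipped
mean_lfc, sd_lfc = bootstrap_mean(fold_changes)
```

A negative bootstrapped mean corresponds to an average decrease after antibiotics, which is the quantity plotted per species in Fig. 2d.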

Accompanying our taxonomic profiling, functional profiling of these metagenomes suggested the development of a consistent microbial functional core during infancy, with a smaller subject-specific variable functional pool (Extended Data Fig. 5a, b, Supplementary Note 3). As in most microbial community studies33, microbial gene families of uncharacterized function made up a substantial fraction of these profiles, averaging roughly 50% based on Gene Ontology34 annotations (Extended Data Fig. 5c) and more than 90% based on more functionally specific MetaCyc pathways (Extended Data Fig. 5d). We observed an increasing longitudinal trend in the proportion of unmapped reads (Extended Data Fig. 5e, Pearson’s r = 0.318, P < 2.2 × 10⁻¹⁶). However, within the reads that mapped to either microbial pangenomes or known protein sequences (the proportion of which decreased with age), we saw an increase in the proportion of reads with MetaCyc annotation, mainly during the first year (Extended Data Fig. 5f, Pearson’s r = 0.391, P < 2.2 × 10⁻¹⁶). This suggests that although the early life microbiome is relatively well covered by current microbial reference genomes, less functional and biochemical characterization has been carried out on gene families within these microorganisms, which will thus particularly benefit from future work.

In addition to broadly conserved and subject-specific functions, we identified a range of microbial metabolic enzymes that consistently increased or decreased in abundance over the first year of life, paralleling shifts in community structure and infant diet (Fig. 3, Supplementary Note 3, Supplementary Table 3). For example, the enzyme L-lactate dehydrogenase (1.1.1.27), which is well-characterized in Bifidobacteria for its role in milk fermentation35, was among the most consistently declining enzymes over this period, notably coinciding with the cessation of breastfeeding in many infants (from 73% breastfed at month 3 to 28% at year 1). Conversely, the enzyme transketolase (2.2.1.1), which has been implicated previously36 in the metabolism of fibre, was among the most consistently increasing enzymes, which also coincided with increased incorporation of solid food (a component of 53% of infants’ diets at month 3 versus 100% at year 1). Hence, these notable changes in community functional potential highlight the unique metabolic environment of the early infant gut, and the subsequent transition to a more adult-like gut microbiome that is adapted to variable, fermentative energy sources.

Fig. 3: Consistent changes in enzymatic content of the gut microbiome in early life. We identified enzyme families (level-4 Enzyme Commission (EC) categories) that exhibited the most consistent within-subject changes in total community abundance between the ages of 3 months and 1 year. The top 20 most consistent increases or decreases are presented and stratified according to their top 15 contributing species. Heat map values reflect the mean contribution of each species to each enzyme over samples (n = 733 at 3 months; 675 at 1 year; and 382 at 2 years). Values reflect units of copies per million (CPM) normalized to total read depth (including unmapped reads and reads mapped to gene families lacking EC annotation). Rows (enzymes) and columns (species) are clustered according to Spearman correlation at 3 months; subsequent years are ordered according to clustering at 3 months.

Combining taxonomic and functional profiles to test for differences between cases and controls, we used linear mixed-effects modelling and identified a relatively small number of individual taxonomic and functional features that were associated with case–control outcome (Supplementary Table 4), most with borderline statistical significance (false discovery rate (FDR) corrected q-values indicated below). We confirmed separation between cases and controls by random forest classifiers (Extended Data Fig. 6a, b, Supplementary Note 4). In the IA case–control cohort, healthy controls contained higher levels of Lactobacillus rhamnosus (q = 0.055), supporting protection against IA by early probiotic supplementation37 (Extended Data Fig. 6c, d, Supplementary Note 5). IA controls also had more Bifidobacterium dentium (q = 0.054), whereas IA cases had on average higher abundance of Streptococcus group mitis/oralis/pneumoniae species (q = 0.11). In T1D case–control comparisons, controls had higher levels of Streptococcus thermophilus (q = 0.078) and Lactococcus lactis (q = 0.094) species, both common in dairy products, whereas cases contained higher levels of species such as Bifidobacterium pseudocatenulatum (q = 0.078), Roseburia hominis (q = 0.11) and Alistipes shahii (q = 0.14). Even though our modelling approach controlled for regional differences in clinical centres, we found additional but often weak associations with outcome in some clinical centres when tested separately (Supplementary Table 4). Finnish IA cases had more Streptococcus group mitis/oralis/pneumoniae species (q = 0.0008), IA controls from Colorado had more Streptococcus thermophilus (q = 0.0059), and Swedish IA cases contained more Bacteroides vulgatus (q = 0.090).
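The q-values quoted above are FDR-corrected P values. The text does not name the correction procedure, so the following sketch assumes the widely used Benjamini–Hochberg method; the function name and the example P values are hypothetical.

```python
def benjamini_hochberg(p_values):
    """Return Benjamini-Hochberg q-values in the same order as the input.

    Assumption for this sketch: the source states only that q-values are
    FDR corrected, not which procedure was used.
    """
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])
    q_values = [0.0] * m
    running_min = 1.0
    # Walk from the largest P value to the smallest so that each q-value is
    # the minimum of p * m / rank over all ranks at or above its own.
    for rank in range(m, 0, -1):
        index = order[rank - 1]
        running_min = min(running_min, p_values[index] * m / rank)
        q_values[index] = running_min
    return q_values


# Hypothetical per-feature P values (not taken from the study).
example_p = [0.01, 0.04, 0.03, 0.005]
example_q = benjamini_hochberg(example_p)
```

Because each q-value depends on the number of features tested, a feature with a small nominal P value can still end up with only borderline significance after correction, as reported for most associations here.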

Pathways with the highest statistical significance in case–control comparisons were related to bacterial fermentation (Supplementary Table 4). The superpathway of fermentation (MetaCyc identifier PWY4LZ-257) was increased in controls in the T1D cohort (q = 0.019) and Finnish IA cohort (q = 0.049). SCFAs such as butyrate, acetate and propionate are common by-products of bacterial fermentation, and butyrate and acetate protected NOD mice against T1D9. Consistently, we observed that several bacterial pathways that contribute to the biosynthesis of short-chain fatty acids were increased in healthy controls. Among pathways involved in butyrate production, the degradation of L-arginine, putrescine and 4-aminobutanoate (ARGDEG-PWY) superpathway was increased in T1D controls cohort-wide (q = 0.043), whereas the fermentation of acetyl coenzyme A to butanoate (PWY-5676) was more abundant in the Finnish T1D controls (q = 0.053). The degradation of acetylene (P161-PWY), which contributes to acetate production, was increased in T1D controls cohort-wide (q = 0.14), and the degradation of L-1,2-propanediol (PWY-7013), which is involved in propionate biosynthesis, was higher in the German T1D controls (q = 0.019). These findings support existing evidence for the protective effects of SCFAs in human T1D7,8 and T2D19 cohorts and the NOD mouse model9.

As reflected by the community-level analyses, human milk with its pro- and prebiotic functions is one of the main factors that determine the community composition of the infant gut microbiome. Bifidobacterium longum subsp. infantis is a particularly versatile degrader of human milk oligosaccharide (HMO) that is often found in stool samples collected during breastfeeding38. By following the families representing genes in the B. longum subsp. infantis HMO gene cluster39,40 in our data, we found that an additional 30 bacterial species carried at least one homologue with more than 50% sequence identity to one or more HMO utilization genes (Supplementary Table 5). As expected, many Bifidobacteria carried several homologues, but surprisingly three Enterococcus species (E. casseliflavus, E. faecalis and E. faecium) also carried seven or more homologues (Supplementary Table 5).

To identify strain-level adaptation similar to B. longum subsp. infantis, we further examined whether any of these genes showed contrasting prevalence between samples collected during breastfeeding and after weaning, given that the carrier species itself was present. In total, 41 gene families were observed more often during breastfeeding (Supplementary Table 5, test of proportions, adjusted P < 0.001); most (37 out of 41) were carried by B. longum (Fig. 4), and B. pseudocatenulatum contained four such gene families (Extended Data Fig. 7, Supplementary Table 5). In samples with B. longum, this indicated a clear strain shift after weaning, when fewer B. longum strains carried these genes (Fig. 4). In samples with B. pseudocatenulatum, four gene families showed a similar but less contrasting pattern (Extended Data Fig. 7). Overall, these observations identify new candidate species that contribute to HMO processing or exploitation, and link strain composition to specific driving molecular functions that potentially explain selective sweeps during microbiome development, in this case specifically related to breastfeeding.
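The prevalence comparison can be illustrated with a two-proportion z-test. This is a sketch under stated assumptions: the text says only "test of proportions", so the pooled z-test below is one common choice rather than the confirmed method, and the function name and counts are hypothetical.

```python
import math


def two_proportion_z_test(hits_a, n_a, hits_b, n_b):
    """Two-sided pooled z-test for a difference between two proportions.

    Assumption for this sketch: the exact variant of the test used in the
    study (for example, with or without continuity correction) is not stated.
    """
    p_a = hits_a / n_a
    p_b = hits_b / n_b
    pooled = (hits_a + hits_b) / (n_a + n_b)
    se = math.sqrt(pooled * (1 - pooled) * (1 / n_a + 1 / n_b))
    if se == 0:
        raise ValueError("test undefined when the pooled proportion is 0 or 1")
    z = (p_a - p_b) / se
    # Two-sided P value under the standard normal: 2 * (1 - Phi(|z|)).
    p_value = math.erfc(abs(z) / math.sqrt(2))
    return z, p_value


# Hypothetical counts: samples carrying a gene family among samples in which
# the carrier species was present, during breastfeeding versus after weaning.
z_stat, p_val = two_proportion_z_test(80, 100, 40, 100)
```

When many gene families are tested in this way, the resulting P values would then be adjusted for multiple testing, which is what the "adjusted P" in the text refers to.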

Fig. 4: Bifidobacterium longum strains are characterized by HMO gene content and stratified by breastfeeding status. Gene families involved in HMO utilization and showing contrasting presence in B. longum genomes during breastfeeding (n = 1,584 samples) compared to after weaning (n = 3,705 samples). Abundance heat map columns represent stool samples in which the relative abundance of B. longum species was more than 10% (n = 5,289 samples). Rows and columns were ordered by hierarchical clustering using the complete linkage method. As in Fig. 3, values reflect units of CPM and were further divided by relative abundance of B. longum to obtain quantifications that are comparable between samples. UniRef90 identifiers and gene names or families are indicated on the left.
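The normalization described in this legend can be sketched as a single division. The sketch assumes that relative abundance is expressed as a fraction between 0 and 1; the function name, the default threshold argument and the example values are hypothetical.

```python
def species_normalized_cpm(gene_cpm, species_rel_abundance, min_abundance=0.10):
    """Divide a gene family's CPM by the carrier species' relative abundance.

    Returns None for samples at or below the abundance threshold, mirroring
    the legend's restriction to samples with more than 10% B. longum.
    Assumption: relative abundance is given as a fraction (0 to 1).
    """
    if species_rel_abundance <= min_abundance:
        return None
    return gene_cpm / species_rel_abundance


# Hypothetical values: the same gene CPM in two samples with different
# amounts of the carrier species.
high_carrier = species_normalized_cpm(50.0, 0.5)
low_carrier = species_normalized_cpm(50.0, 0.05)
```

Dividing by the species' abundance separates "fewer cells of the species" from "fewer strains carrying the gene", which is what makes values comparable between samples.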

Despite ample sample size, scrutiny of the study design, and thorough statistical analyses, most of the taxonomic and functional signals we detected in case–control comparisons were modest in effect size and statistical significance. This could be due to several reasons, including differences between T1D endotypes, temporally diffuse signals, geographical heterogeneity, or lack of stool samples for the first two months of life, and these should be considered in future investigations (Supplementary Note 6). Furthermore, the data used in these investigations were composed of samples from the genetically predisposed and mostly white, non-Hispanic case–control groups designed into the TEDDY study. Results cannot be guaranteed to reflect the whole TEDDY cohort or child populations in the respective countries.

Future targeted approaches to identify subject-specific connections between the gut microbiota and T1D pathogenesis may be beneficial, particularly given the apparent population-level heterogeneity revealed here. For example, laboratory experiments involving dietary factors that have been associated with the onset of T1D3 may reveal biochemically specific signals that are mediated by the microbiome. Different endotypes of disease, such as differences in the first appearing autoantibody (IAA versus GADA), the number of appearing autoantibodies, the time from seroconversion to T1D diagnosis, genetic host risk alleles and ethnic backgrounds, may be characterized by distinct microbial configurations (Supplementary Note 6). Finally, components of the microbiome that were poorly measured in these data may also have crucial roles: viruses, fungi, microbial transcription or small-molecule biochemistry. By surveying these additional molecular activities by cross-sectional analysis and in more detailed longitudinal populations, this study lays the foundation to identify further gut microbial components that are predictive, protective or potentially causal in T1D risk or pathogenesis.