B. infantis EVC001 reshapes the gut microbiome in breastfed infants

Using shotgun metagenome sequencing, we characterized the taxonomic and antibiotic resistance profiles within the gut microbiome of 60 healthy, term, breastfed infants in Northern California (USA) at 21 days postnatal. Details of the study design and subject characteristics have been previously reported [25, 28] and a summary of the demographic data of the subjects analyzed here is presented in Additional file 1: Table S1. After quality filtering, Illumina sequencing led to a total of 1.6 billion paired-end (PE) reads, of which approximately 3.6% were discarded as human genome sequence contaminants, resulting in an average of 27 million PE reads per sample (Table 1). High-quality, human-filtered reads were subjected to taxonomic profiling following the updated analysis pipeline of the Human Microbiome Project [34] (September, 2017; see Materials and Methods).

Table 1 Overview of recovered metagenomics sequencing results from infants fed EVC001 and controls Full size table

A total of 202 bacterial species belonging to 76 genera, 43 families, 21 orders, 13 classes, and 7 phyla were identified across the samples (Additional file 3: Table S2). There were differences in the taxonomic distribution between the infants fed EVC001 and those who were not. Among the infants fed EVC001 (n = 29), 10 bacterial genera comprised 99% of the community, with the Bifidobacterium genus representing 90% of the total relative abundance of any identified genus out of a total of 55 identified genera (P < 0.0001, Kruskal-Wallis test) (Fig. 1a).

Fig. 1 Taxonomic classification of metagenomic reads for EVC001-fed infants and controls. a Relative abundance of the top bacterial genera identified between the two groups of infants. b Relative abundance of bacterial species belonging to the Bifidobacterium genus identified among groups. c Hierarchical clustering based on a strain-level analysis of Bifidobacterium longum subspecies. Gene family profiles of a subgroup of reference genomes were selected from a global (n = 38) strain analysis. Each column represents the presence or absence of genes in a sample or a reference genome with respect to the total pangenome. All samples from EVC001-fed infants clustered together with B. longum subsp. infantis ATCC 15697 (B. infantis), whereas the samples from infants in the control group clustered separately with other B. longum subspecies (e.g., B. suis, B. longum DJ01A, and B. longum NCC2705). Functional analysis of gene families confirmed that the EVC001 samples were dominated by B. infantis due to the presence of unique genes (e.g., Blon_2348 in B. infantis), while genes present only in B. longum subsp. longum (e.g., araD; araA), were abundant in the communities from control infants. P-values were computed for each gene via Fisher’s exact test according to group Full size image

In the control group (n = 31), 68 genera were identified, of which Bifidobacterium comprised an average of 38% of the total microbiome, whereas there was a greater proportion of other genera than in the EVC001-fed group, particularly Clostridium (P = 0.01, Kruskal-Wallis test; Fig. 1a). Within the Bifidobacterium genus, eight species were identified, of which Bifidobacterium longum was the most abundant, representing 86% of the total identified bacterial species within the EVC001-colonized infants and 19% within the controls (P < 0.0001, Kruskal-Wallis test; Fig. 1b). Other detected bifidobacteria included B. breve and B. bifidum, which accounted for 9.4 and 7%, respectively, in the microbiome of the control infants and considerably less (1.4 and 0.4%, respectively) in the EVC001-colonized group.

To discriminate the B. longum species at the subspecies level and determine the abundance of B. longum subsp. infantis and B. longum subsp. longum to specifically relate changes in the microbiome composition to colonization with B. infantis, we performed a strain-level analysis within the B. longum species using the pangenome gene-families database provided by PanPhlan. This database includes genes from 38 strains of B. longum subspecies (e.g., B. longum subsp. longum, B. longum subsp. infantis, and B. longum subsp. suis). PanPhlan recovered an average of 98.8% of all genes present in Bifidobacterium longum subsp. infantis ATCC 15697 [50] from every sample in the EVC001-fed group, representing 2,449 pangenome gene families. In contrast, 19 infants in the control group lacked detectable reads that mapped to B. longum subspecies genes in their metagenomes. The remaining control samples (n = 12) reported 43% coverage of B. infantis genes, and Bifidobacterium longum subsp. longum NCC2705 displayed the highest gene coverage (79%) across 1,708 pangenome gene families.

Samples and four representative reference genomes were hierarchically clustered based on pair-wise similarities between strains calculated via the Jaccard distance between gene family profiles (Fig. 1c). The resulting heatmap showed that Bifidobacterium longum subsp. infantis was substantially more abundant than other Bifidobacterium longum subspecies in the EVC001 group. Gene loci unique to the B. infantis reference genome and samples from B. infantis EVC001-fed infants revealed key genes that were more abundant, including human milk oligosaccharide (HMO) utilization clusters [50, 51]. These genes were completely absent among the 29 of 31 infants who were not fed B. infantis EVC001, indicating that B. infantis was exceptionally rare (only 6% of infants) unless the infants were fed B. infantis EVC001. Genes unique to B. longum subsp. longum that enable characteristic arabinose consumption (araD and araA) were significantly enriched among infants harboring B. longum subsp. longum and rare among infants colonized by B. infantis EVC001. Together, these findings suggest that B. infantis was the dominant B. longum subspecies among infants fed B. longum subsp. infantis EVC001, and that B. infantis was exceptionally rare among these infants unless the infants were fed the strain during the clinical trial. The high level of B. infantis persistence previously reported in infants fed EVC001 in conjunction with the pangenome analysis here suggests stable colonization of the infants by this strain. This is in agreement with other models examining the efficient and durable colonization of host-associated gut microbes in their coevolved host [52, 53] and our prior longitudinal analysis of fecal samples by 16S rDNA sequencing [25]. In terms of microbial diversity, we have previously reported that there was no difference in community richness (alpha diversity) measured via Shannon index, but there are differences in the relative abundance of taxa and community stability (beta diversity) as assessed by UniFrac distance and Jaccard index [25].

Colonization by EVC001 is associated with a reduced ARG burden

We identified a total of 599,631 infant gut microbial genes from the shotgun sequencing data, of which 80,925 were unique to 29 infants fed B. infantis EVC001, and 313,683 microbial genes were unique to samples from 31 infants not fed B. infantis EVC001. Both groups shared a total of 205,023 microbial genes. The metagenomes were screened for ARGs using a BLASTX type search against the curated Comprehensive Antibiotic Resistance Database (CARD). After quality filtering the BLAST results (see Materials and Methods), a total of 652 ARGs were identified (Additional file 4: Table S3). The EVC001-fed group reported an average of 0.01% of ARGs among the total microbial genes (min = 0.001%; max = 0.18%; SEM = 0.006%), with 285 different ARGs (Fig. 2, a), of which 33 were found only in the EVC001 group at very low percentages (< 0.05%). Among the infants not fed B. infantis EVC001, these ARGs accounted, on average, for 0.09% of the total metagenomic reads (min = 0.004%; max = 0.24%; SEM = 0.01) with 612 of the different ARGs that were identified. Of these 612 ARGs, 360 uniquely belonged to this group. Thus, infants fed EVC001 had, on average, 90% fewer ARGs in their microbiome (P < 0.0001, Mann-Whitney test).

Fig. 2 Relative abundance of the total resistome profile in each metagenome sample. a Relative abundance of antibiotic resistance genes (ARGs) compared with the overall metagenome for each sample. Each point represents a sample resistome (control, n = 31; EVC001-fed, n = 29). Box plots denote the interquartile range (IQR), with horizontal lines representing the 25th percentile, median, and 75th percentiles. The whiskers represent the lowest and highest values within 1.5 times the IQR from the first and third quartiles, respectively. The asterisks on the top indicate significant P-values (Mann-Whitney test). b Relative abundance of ARGs according to their taxonomic identification. The shade of color represents genera belonging to the same bacterial class. The asterisks on the top indicate significant P-values (Kruskal-Wallis test) Full size image

To compare the microbial taxonomic affiliation of ARGs, the 652 ARGs identified among the optimal BLAST hits were assigned to different taxa according to the NCBI taxonomy guidelines coupled with the Lowest Common Ancestor (LCA) method in MEGAN [44]. A total of 41 bacterial genera were taxonomically assigned to the 652 ARGs, of which Escherichia, Staphylococcus, Bacteroides, and Clostridioides were associated with the majority of the ARGs (68.9, 5, 4, and 2.6%, respectively). Considering the taxonomic content within the resistome, the metagenomes from control infants had 17 bacterial genera with a relative abundance > 0.001%, with Escherichia-ARGs accounting for about 0.054% of the total metagenome (Fig. 2b). In the EVC001-fed group, only 12 bacterial genera had a relative abundance of associated ARGs > 0.001%. Escherichia was also the genus that carried the majority of ARGs but contributed significantly less to the overall metagenome (0.003%) compared with the controls (P = 0.001, Kruskal-Wallis test; Fig. 2b).

EVC001 significantly decreases the abundance of key antibiotic-resistant genes

Among the ARGs uniquely identified in the samples from infants in the control group, three were present in a relative abundance greater than 0.1% and were associated with the Clostridium genus. Specifically, tetA(P) and tetB(P), which are ARGs found on the same operon, were identified. tetA(P) is an inner membrane tetracycline efflux protein and tetB(P) is a ribosomal protection protein, both of which confer resistance to tetracycline [54, 55]. mprF was found only in the samples from infants in the control group, and acts by negatively charging phosphatidylglycerol on the bacterial membrane and confers resistance to antibiotic cationic peptides that disrupt the cell membrane, including host-produced defensins [56].

After cross-sample normalization, 38 ARGs were found to significantly differ between the two groups (P < 0.01, Kruskal-Wallis test). All 38 ARGs were significantly lower in the EVC001-fed group compared to the controls (Fig. 3a). Notably, none of the ARGs were significantly higher in the samples from the EVC001-fed group compared to the control group (P > 0.05, Kruskal-Wallis test). Genes enriched in the metagenome of infants in the control group were found to confer resistance to beta-lactams, fluoroquinolones, and macrolides, and 12 genes conferred resistance to multiple drug classes.

Fig. 3 Comparison of the most significant antibiotic resistance gene types. a Relative abundance of the most significantly different antibiotic resistance genes (ARGs) identified among EVC001-fed infants and controls (P < 0.02; Bonferroni). Percentages are relative to the overall metagenome. These ARGs confer resistance to different drug classes, including beta-lactams, fluoroquinolones, and macrolides. The ARGs are grouped by gene name, followed by CARD identification entry (ARO). The colored bars represent respective drug class to which the ARG is known to confer resistance to. b Heatmap showing a hierarchical cluster analysis of the total ARGs identified (n = 652). Two groups were identified, one characterized by a lower-ARG carriage, containing most of the samples from infants fed EVC001 and one characterized by a higher overall carriage, containing most samples from infants in the control group. Genes clustered based on similar biological mechanisms implicated in drug resistance (see Results). P-values on the bar were computed using a Kruskal-Wallis test normalized with a Bonferroni correction Full size image

Hierarchical clustering of the samples and total ARGs (n = 652) using the complete-linkage method generated two main clusters of samples (Fig. 3b). The majority of the samples from the EVC001-fed infants were clustered together within the lower-ARGs abundance panel. Row clustering by ARGs resulted in two groups. The most abundant genes clustered together and were annotated as being directly related to mechanisms of antimicrobial resistance. In particular, the proteins encoded by mdtB and mdtC form a heteromultimer complex that results in a multidrug transporter [57]. AcrD is an aminoglycoside efflux pump and its expression is regulated by baeR and cpxAR, which were also identified among the significant ARGs and best characterized in E. coli [58]. We also identified AcrB and TolC, which form the multidrug efflux complex, AcrA-AcrB-TolC, that confers multidrug resistance [59]. Moreover, RosA and RosB were significantly more abundant among infants not fed EVC001; these genes form an efflux pump/potassium antiporter system (RosAB) [60]. Three genes belonging to the multidrug efflux system, EmrA-EmrB-TolC, first identified in E. coli [61], were also significantly more abundant in the samples from control infants. In this complex, EmrB is an electrochemical-gradient powered transporter, whereas EmrA is the linker, and TolC is the outer membrane channel [62]. The complex confers resistance to fluoroquinolones, nalidixic acid, and thiolactomycin.

The majority (76%) of the significantly different ARGs were taxonomically assigned to bacteria belonging to the Enterobacteriaceae family (e.g., Escherichia coli) and its abundance was proportional to the presence of ARGs (R = 0.58; P < 0.00001, Pearson) (Fig. 3b). The absolute abundance (determined by qPCR) of Enterobacteriaceae was significantly lower (P < 0.0001) in EVC001-colonized infants (Fig. 4). Other ARGs reported multiple taxonomic assignments within the Proteobacteria phylum. According to NCBI’s taxonomic assignment and the CARD database, they could originate from any one of multiple, closely related species. These included: the efflux pump acrD; the MdtG protein, which appears to be a member of the major facilitator superfamily of transporters, that confers resistance to fosfomycin and deoxycholate [63]; BaeR, a response-regulator conferring multidrug resistance [64]; and marA, a global activator protein overexpressed in the presence of different antibiotic classes [65]. A global statistical analysis of ARGs by treatment group is reported in Additional file 5: Table S6.

Fig. 4 Quantification of Enterobacteriaceae family by group-specific qPCR. The data are represented as Log 10 CFU per μg of genomic DNA extracted from stool samples. Data in boxplots show the median, first, and third quartiles (P < 0.0001, Mann-Whitney Test) Full size image

Colonization by EVC001 decreases the total abundance and composition of ARGs

To compare the overall impact of EVC001 colonization on the diversity of ARGs, the alpha-diversity (e.g., number of unique ARGs) within each sample was compared using rarefaction curves. Notably, the diversity of the ARGs was independent from the number of sequences per sample (Fig. 5a). Overall, the EVC001-fed infants had half as many unique ARGs as the control infants (P = 0.001, t-test). The global resistome differences among samples and the effect-size of EVC001 colonization on the overall diversity of the two study groups were assessed. A Bray-Curtis dissimilarity matrix transposed into a principal coordinate analysis (PCoA) showed that samples from the EVC001-colonized group were clustered closely together compared to the control, which had a wider distribution (Fig. 5b; P = 0.001, F-test). This indicates that samples from the EVC001-colonized infants had a less abundant and less diverse resistome compared with the control group samples. Colonization with EVC001 contributed to a greater than a 30% reduction in global AR diversity in the infant gut microbiome than in the gut of the controls (R2 = 0.31, P = 0.001, adonis). Finally, there was no statistically significant difference in the abundance of ARGs detected in the control group, whether babies were born via C-section or vaginally (P = 0.30; Mann–Whitney) or whether infants were exposed to intrapartum antibiotics (P = 0.5; Mann–Whitney). Conversely, infants fed B. infantis EVC001 had significantly lower ARG abundance, independently from delivery mode, antibiotic exposure or any other clinical variable.

Fig. 5 Diversity analyses of infant resistomes according to B. infantis EVC001 colonization. a Rarefaction curves showing the number of unique antibiotic resistance genes (ARGs) identified in relation to the increasing number of sequences. Both EVC001 and the control group presented similar curve trends, suggesting that the sequencing depth is not associated with the diversity of antibiotic resistance. P-values were computed with a nonparametric two-sample t-test using Monte Carlo permutations (n = 999). b Global resistome profiles computed via a principal coordinate analysis (PCoA) based on a Bray-Curtis dissimilarity matrix. The EVC001-fed samples clustered closely, indicating that they shared a similar resistome compared to the controls, which had a more dispersed distribution. The effect of B. infantis EVC001 colonization by itself accounted for 31% of the total explained variation (adonis). The P-value was computed using F-tests based on the sequential sums of squares from permutations of the raw data Full size image

In vitro validation of in silico predicted ARGs

To validate the ARG presence in fecal samples, PCR primer pairs were designed to target seven of the most abundant ARGs in the resistome of the control infants. Amplicons were obtained in at least half of the analyzed fecal samples, with the exception of the primer pair targeting the mfd gene, which did not amplify from any sample. A nucleotide sequence analysis of the generated amplicons confirmed the sequence identity, with a nucleotide identity of > 70% to the open reading frame (ORF) of the predicted target gene. The nucleotide sequence analysis revealed a high homology (85–99%) with the genomic regions annotated to encode the expected functions in gut bacteria, and the predicted amino acid sequences contained highly conserved structural and functional domains in the corresponding encoded proteins (Additional file 6: Table S4).

To confirm the presence of full length, functional ARGs and the relationship of these ARGs to individual resistance phenotypes at the strain level, bacteria isolated from EMB agar were obtained from the fecal samples of four representative control infants. EMB was used to select for lactose-fermenting coliforms (e.g., E. coli) which had been identified by shotgun metagenome sequencing to harbor the most ARGs across individuals in both groups. Whole-genome sequencing of 12 isolates was performed on a MinION sequencer and assembly led to an average coverage of 18× (min 5.4; max 40) (Table 2). Taxonomic identification was confirmed via BLASTN against the NCBI nucleotide database (https://github.com/rrwick/Porechop), revealing three isolates classified as Raoultella planticola and the remaining nine as Escherichia coli. CARD protein sequence collection was used as a query against the 12 assembled isolates via TBLASTN. The presence of the 40 most differentially abundant ARGs identified via shotgun metagenomics (P < 0.02; Bonferroni) was confirmed in all or some of the 12 genomes (average identity > 93%), except for rosB (ARO:3003049), and rob (ARO:3004108). The latter genes were absent from the E. coli and R. planticola genomes but were predicted in other taxa not isolated here (Table 2).

Table 2 Summary of assembly statistics and ARGs assignments for 12 bacterial isolates obtained from four control subjects Full size table

The minimum inhibitory concentration (MIC) to ampicillin, cefepime, cefotaxime, cefazolin, tetracycline, and gentamicin was determined for these isolates. With the exception of three isolates obtained from the same infant (7174), all of the isolates exhibited ampicillin resistance. Among the multidrug-resistance isolates, resistance to ampicillin, cefazolin, and tetracycline was the most common. No resistance to gentamicin was detected (Table 3).