The human gut microbiota is an important metabolic organ, yet little is known about how its individual species interact, establish dominant positions, and respond to changes in environmental factors such as diet. In this study, gnotobiotic mice were colonized with an artificial microbiota comprising 12 sequenced human gut bacterial species and fed oscillating diets of disparate composition. Rapid, reproducible, and reversible changes in the structure of this assemblage were observed. Time-series microbial RNA-Seq analyses revealed staggered functional responses to diet shifts throughout the assemblage that were heavily focused on carbohydrate and amino acid metabolism. High-resolution shotgun metaproteomics confirmed many of these responses at a protein level. One member, Bacteroides cellulosilyticus WH2, proved exceptionally fit regardless of diet. Its genome encoded more carbohydrate active enzymes than any previously sequenced member of the Bacteroidetes. Transcriptional profiling indicated that B. cellulosilyticus WH2 is an adaptive forager that tailors its versatile carbohydrate utilization strategy to available dietary polysaccharides, with a strong emphasis on plant-derived xylans abundant in dietary staples like cereal grains. Two highly expressed, diet-specific polysaccharide utilization loci (PULs) in B. cellulosilyticus WH2 were identified, one with characteristics of xylan utilization systems. Introduction of a B. cellulosilyticus WH2 library comprising >90,000 isogenic transposon mutants into gnotobiotic mice, along with the other artificial community members, confirmed that these loci represent critical diet-specific fitness determinants. Carbohydrates that trigger dramatic increases in expression of these two loci and many of the organism's 111 other predicted PULs were identified by RNA-Seq during in vitro growth on 31 distinct carbohydrate substrates, allowing us to better interpret in vivo RNA-Seq and proteomics data. These results offer insight into how gut microbes adapt to dietary perturbations at both a community level and from the perspective of a well-adapted symbiont with exceptional saccharolytic capabilities, and illustrate the value of artificial communities.

Our intestines are populated by an almost unimaginably large number of microbial cells, most of which are bacteria. This species assemblage operates as a microbial metabolic organ, performing myriad tasks that contribute to our well-being, including processing components of our diet. The way this incredible machine assembles itself and operates remains mysterious. One approach to understanding its properties is to create artificial communities composed of a limited number of sequenced human gut bacterial species and to install them in the guts of germ-free mice that are then fed different diets. In this report, we adopt this approach. We describe the genome sequence of a new gut bacterial isolate, Bacteroides cellulosilyticus WH2, which is equipped with an unprecedented number of carbohydrate active enzymes. Deploying four different “omics” technologies, we characterize the response to diet, the relative stability, and the temporal dynamics of a 12-species artificial bacterial assemblage (including B. cellulosilyticus WH2) implanted in germ-free mouse guts. We also combine high-throughput substrate utilization screens and RNA-Seq to generate reference data analogous to a “Rosetta stone” in order to decipher what types of carbohydrates B. cellulosilyticus encounters and uses within the gut, and how it interacts with other organisms that have similar and/or distinct “professions.” This work sets the stage for future ecological and metabolic studies of more complex assemblages that more fully emulate the properties of our native gut communities.

Funding: The majority of this work was supported by grants from the NIH (DK30292, DK70977). Other sources of support included the Crohn's and Colitis Foundation of America, the European Research Council under the European Union's Seventh Framework Programme (FP/2007-2013)/ERC Grant Agreement no. 322820, and an ORNL Laboratory Director's Exploratory Seed Money grant (for metaproteomics). The funders had no role in study design, data collection and analysis, decision to publish, or preparation of the manuscript.

Here, we adopt a multifaceted approach to study an artificial community in gnotobiotic mice fed changing diets in order to better understand (i) the process by which such a community reconfigures itself structurally in response to changes in host diet; (ii) how aggregate community function, as judged by the metatranscriptome and metaproteome, is impacted when host diet is altered; (iii) how the metabolic strategies of its individual component microbes change, if at all, when the nutrient milieu is dramatically altered, with an emphasis on one prominent but understudied member of the human gut Bacteroides; and (iv) whether a microbe's metabolic versatility/flexibility correlates with competitive advantage in an assemblage containing related and unrelated species.

Culture-independent surveys of the healthy adult gut microbiota consistently conclude that it is composed primarily of members of two bacterial phyla, the Bacteroidetes and Firmicutes [16] – [21] . The dominance of these two bacterial phyla suggests that their representatives in the human gut are exquisitely adapted to its dynamic conditions, which include a constantly evolving nutrient environment. Members of the genus Bacteroides are known to be adept at utilizing both plant- and host-derived polysaccharides [22] . Comparisons of available Bacteroides genomes with those from other gut species indicate that the former are enriched in genes involved in the acquisition and metabolism of various glycans, including glycoside hydrolases (GHs) and polysaccharide lyases (PLs), as well as linked environmental sensors that control their expression (e.g., hybrid two-component systems, extracytoplasmic function (ECF) sigma factors and anti-sigma factors). Many of these genes are organized into polysaccharide utilization loci (PULs) that are distributed throughout the genome [23] , [24] . Recent studies have focused on better understanding the evolution, specificity, and regulation of PULs in the genomes of species like Bacteroides thetaiotaomicron and Bacteroides ovatus [25] , [26] . Little is known, however, about the metabolic strategies adopted by multiple competing species in more complex communities, how dietary changes lead to reconfigurations in community structure through changes in individual species, or whether dietary context influences which genes dominant species rely on to remain competitive with other microbes, including those genes that are components of PULs.

The absence of a complete catalog of the microbial strains and associated genome sequences that comprise a given microbiota complicates efforts to describe how particular dietary substrates influence individual taxa, how taxa cooperate/compete to utilize nutrients, and how these many interactions in aggregate lead to emergent host phenotypes. Gnotobiotic mice colonized with defined consortia of sequenced human gut microbes, on the other hand, provide an in vivo model of the microbiota in which the identity of all taxa and genes comprising the system are known. Within these assemblages, expressed mRNAs and proteins can be attributed to their genome, gene, and species of origin, and findings of interest can be pursued in follow-up in vitro or in vivo experiments. These systems also afford an opportunity to tightly control experimental variables to a degree not possible in human studies and have proven useful in studying microbial invasion, microbe–microbe interactions, and the metabolic roles of key ecological guilds [11] – [15] . Studies aiming to better understand community-level assembly, resilience, and adaptation are therefore likely to benefit from a focus on such defined systems. However, the limited taxonomic and functional representation within artificial communities of modest complexity requires that caution be exercised when extrapolating results to more complex, naturally occurring gut communities (see Prospectus).

A growing body of evidence indicates that the tens of trillions of microbial cells that inhabit our gastrointestinal tracts extend our biological capabilities in important ways. Microbial enzymes process many compounds that would otherwise pass through our intestines unaltered [1] , and cases of particular nutrient substrates favoring the growth of particular taxa are being reported [2] – [5] . Changes in diet are therefore expected to lead to changes in the composition and function of the microbiota [6] – [10] . However, our understanding of diet–microbiota interactions at a mechanistic level is still in its infancy.

Results and Discussion

Sequencing the Bacteroides cellulosilyticus WH2 Genome Though at least eight complete and 68 draft genomes of Bacteroides spp. are currently available [27], there are numerous examples of distinct clades within this genus where little genomic information exists. To further explore the genome space of one such clade, we obtained a human fecal isolate whose four 16S rRNA gene sequences indicate a close relationship to Bacteroides cellulosilyticus (Figure S1A,B). The genome of this isolate, which we have designated B. cellulosilyticus WH2, was sequenced deeply, yielding a high-quality draft assembly (23 contigs with an N50 value of 798,728 bp; total length of all contigs in the assembly, 7.1 Mb; Table S1). Annotation of its 5,244 predicted protein-coding genes using the carbohydrate active enzyme (CAZy) database [28] revealed an extraordinary complement of 503 CAZymes comprising 373 GHs, 23 PLs, 28 carbohydrate esterases (CEs), and 84 glycosyltransferases (GTs) (see Table S2 for all annotated genes in the B. cellulosilyticus WH2 genome predicted to have relevance to carbohydrate metabolism). One distinguishing feature of gut Bacteroides genomes is the substantial number of CAZymes they encode relative to those of other intestinal bacteria [29]. The B. cellulosilyticus WH2 CAZome is enriched in a number of GH families even when compared with prominent representatives of the gut Bacteroidetes (Figure S2A). When we expanded this comparison to include all 86 Bacteroidetes in the CAZy database, we found that the B. cellulosilyticus WH2 genome had the greatest number of genes for 19 different GH families, as well as genes from two GH families that had not previously been observed within a Bacteroidetes genome (Figure S2B). Altogether, B. cellulosilyticus WH2 has more GH genes at its disposal than any other Bacteroidetes species analyzed to date. In Bacteroides spp., CAZymes are often located within PULs [30]. At a minimum, a typical PUL harbors a pair of genes with significant homology to the susC and susD genes of the starch utilization system (Sus) in B. thetaiotaomicron [30]–[32]. Other genes encoding enzymes capable of liberating oligo- and monosaccharides from a larger polysaccharide are also frequently present. The susC- and susD-like genes of these loci encode the proteins that comprise the main outer membrane binding and transport apparatus and thus represent key elements of these systems. A search of the B. cellulosilyticus WH2 genome for genes with strong homology to the susC- and susD-like genes in B. thetaiotaomicron VPI-5482 revealed an unprecedented number of susC/D pairs (a total of 118). Studies of other prominent Bacteroides spp. have found that the evolutionary expansion of these genes has played an important role in endowing the Bacteroides with the ability to degrade a wide range of host- and plant-derived polysaccharides [25],[33]. Analysis of deeply sampled adult human gut microbiota datasets indicates that B. cellulosilyticus strains are common, colonizing approximately 77% of 124 adult Europeans characterized in one study [18] and 62% of 139 individuals living in the United States examined in another survey [20]. We hypothesized that the apparent success of B. cellulosilyticus in the gut is derived in part from its substantial arsenal of genes involved in carbohydrate utilization.

Measuring Changes in the Structural Configuration of a 12-Member Model Microbiota in Response to a Dietary Perturbation To test the fitness of B. cellulosilyticus WH2 in relation to other prominent gut symbionts, and the importance of diet on its fitness, we carried out an experiment in gnotobiotic mice (experiment 1, “E 1 ,” Figure S3). Two groups of 10–12-wk-old male germ-free C57BL/6J animals were moved to individual cages within gnotobiotic isolators (n = 7 animals/group). At day zero, each animal was colonized by oral gavage with an artificial community comprising 12 human gut bacterial species (Figure 1A, Table S3). Each species chosen for inclusion in this microbial assemblage met four criteria: (i) it was a member of one of three bacterial phyla routinely found in the human gut (i.e., Bacteroidetes, Firmicutes, or Actinobacteria), (ii) it was identified as a prominent member of the human gut microbiota in previous culture-independent surveys, (iii) it could be grown in the laboratory, and (iv) its genome had been sequenced to at least a high-quality draft level. Species were also selected for their functional attributes (as judged by their annotated gene content) in an effort to create an artificial community that was somewhat representative of a more complex human microbiota. For example, although more than half of the species in the assemblage were Bacteroidetes predicted to excel at the breakdown of polysaccharides, several were also prominent inhabitants of the human gut that are thought to have limited carbohydrate utilization capabilities (e.g., Firmicutes from Clostridium cluster XIVa). Some attributes for the 12 strains included in the artificial community are provided in Table S4. PPT PowerPoint slide

PowerPoint slide PNG larger image

larger image TIFF original image Download: Figure 1. COPRO-Seq analysis of the structure of a 12-member artificial human gut microbial community as a function of diet and time. (A) The 12 bacterial species comprising the artificial community. (B) Principal coordinates analysis (PCoA) was applied to relative abundance data generated by COPRO-Seq from two experiments (E 1 , E 2 ), each spanning 6 wk. Following colonization (day 0), mice were switched between two different diets at 2-wk intervals as described in Figure S3. COPRO-Seq data from E 1 and E 2 were ordinated in the same multidimensional space. For clarity, only data from E 2 are shown here (for the E 1 PCoA plot, see Figure S5A). Red/blue, feces; pink/cyan, cecal contents. (C) Proportional abundance data from E 1 illustrating the impact of diet on fecal levels of a diet-sensitive strain with higher representation on HF/HS chow (B. caccae), a diet-sensitive strain with higher representation on LF/HPP chow (B. ovatus), a diet-insensitive strain with no obvious diet preference (B. thetaiotaomicron), and a diet-sensitive strain with a preference for the LF/HPP diet that also achieves a high level of representation on the HF/HS diet (B. cellulosilyticus WH2). Mean values ± SEM are shown. Plots illustrating changes in abundance over time for all species in both experiments are provided in Figure S4C. https://doi.org/10.1371/journal.pbio.1001637.g001 For 2 wk, each treatment group was fed a standard low-fat/high-plant polysaccharide (LF/HPP) mouse chow, or a “Western”-like diet where calories are largely derived from fat, starch, and simple sugars (high-fat/high-sugar (HF/HS)) [12]. Over the course of 6 wk, diets were changed twice at 2-wk intervals, such that each group began and ended on the same diet, with an intervening 2-wk period during which the other diet was administered (Figure S3). Using fecal DNA as a proxy for microbial biomass, the plant polysaccharide-rich LF/HPP diet supported 2- to 3-fold more total bacterial growth (primary productivity) despite its lower caloric density (3.7 kcal/g versus 4.5 kcal/g for the HF/HS diet; Figure S4A). The HF/HS diet contains carbohydrates that are easily metabolized and absorbed in the proximal intestine (sucrose, corn starch, and maltodextrin), with cellulose being the one exception (4% of the diet by weight versus 46.3% for the other carbohydrate sources). Thus, in mice fed the HF/HS diet, diet-derived simple sugars are likely to be rare in the distal gut where the vast majority of gut microbes reside; this may provide an advantage to those bacteria capable of utilizing other carbon sources (e.g., proteins/oligopeptides, host glycans). In mice fed the LF/HPP diet, on the other hand, plant polysaccharides that are indigestible by the host should provide a plentiful source of energy for saccharolytic members of the artificial community. To evaluate the impact of each initial diet and subsequent diet switch on the structural configuration of the artificial community, we performed shotgun sequencing (community profiling by sequencing; COPRO-Seq) [11] of DNA isolated from fecal samples collected throughout the course of the experiment, as well as cecal contents collected at sacrifice. The relative abundances of the species in each sample (defined by the number of sequencing reads that could be unambiguously assigned to each microbial genome after adjusting for genome uniqueness) were subjected to ordination by principal coordinates analysis (PCoA) (Figure S5A). As expected, diet was found to be the predominant explanatory variable for observed variance (see separation along principal coordinate 1, “PC1,” which accounts for 52% of variance). The overall structure of the artificial community achieved quasi-equilibrium before the midpoint of the first diet phase, as evidenced by the lack of any significant movement along PC1 after day five. A structural reconfiguration also took place over the course of ∼5 d following transition to the second diet phase. Notably, the two treatment groups underwent a near-perfect inversion in their positions along PC1 after the first diet switch; the artificial community in animals switched from a LF/HPP to HF/HS diet took on a structure like that which arose by the end of the first diet phase in animals consuming the HF/HS diet, and vice versa. The second diet switch from phase 2 to 3 resulted in a similar movement along PC1 in the opposite direction, indicating a reversion of the artificial community's configuration to its originally assembled structure in each treatment group. These results, in addition to demonstrating the significant impact of these two diets on the structure of this 12-member artificial human gut community, also suggest that an assemblage of this size is capable of demonstrating resilience in the face of substantial diet perturbations. The assembly process and observed diet-induced reconfigurations also proved to be highly reproducible as evidenced by COPRO-Seq results from a replication of E 1 (experiment 2, “E 2 ”). In this follow-up experiment, fecal samples were collected more frequently than in E 1 , providing a dataset with improved temporal resolution. Ordination of E 2 COPRO-Seq data by PCoA showed that (i) for each treatment group in E 2 , the artificial community assembles in a manner similar to its counterpart in E 1 ; (ii) structural reconfigurations in response to diet occur with the same timing as in E 1 ; and (iii) the quasi-equilibria achieved during each diet phase are highly similar between experiments for each treatment group (compare Figures 1B and S5A). As in E 1 , cecal data for each E 2 treatment group overlap with their corresponding fecal samples, and DNA yields from E 2 fecal samples vary substantially as a function of host diet (Figure S4B). COPRO-Seq provides precise measurements of the proportional abundance of each member species present in the artificial community. Data collected in both E 1 and E 2 (Table S5) revealed significant differences between members in terms of the maximum abundance levels they achieved, the rates at which their abundance levels were impacted by diet shifts, and the degree to which each species demonstrated a preference for one diet over another (Figure S4C). Changes in each species' abundance over time replicated well across animals in each treatment group, suggesting the assembly process and diet-induced reconfigurations occur in an orderly, rules-based fashion and with minimal stochasticity in this artificial community. A species' relative abundance immediately after colonization (i.e., 24 h after gavage/day 1) was, in general, a poor predictor of its abundance at the end of the first diet phase (i.e., day 13) (E 1 R2 = 0.23; E 2 R2 = 0.27), suggesting that early dominance of the founder population was not strongly tied to relative success in the assembly process. In mice initially fed a HF/HS diet, four Bacteroides spp. (Bacteroides caccae, B. cellulosilyticus WH2, B. thetaiotaomicron, and Bacteroides vulgatus) each achieved a relative abundance of ≥10% by the end of the first diet phase (day 13 postgavage), with B. caccae attaining the highest levels (37.1±4.9% and 34.2±5.5%; group mean ± SD in E 1 and E 2 , respectively). In animals fed the plant polysaccharide-rich LF/HPP chow during the first diet phase, B. cellulosilyticus WH2 was dominant, achieving levels of 37.1±2.0% (E 1 ) and 41.6±3.9% (E 2 ) by day 13. B. thetaiotaomicron and B. vulgatus also attained relative abundances of >10%. Changes in diet often resulted in rapid, dramatic changes in a species' proportional representation. Because the dynamic range of abundance values observed when comparing multiple species was substantial (lowest, Dorea longicatena (<0.003%); highest, B. caccae (55.0%)), comparing diet responses on a common scale using raw abundance values was challenging. To represent these changes in a way that scaled absolute increases/decreases in relative abundance to the range observed for each strain, we also normalized each species' representation within the artificial community at each time-point to the maximum proportional abundance each microbe achieved across all time-points within each mouse. Plotting the resulting measure of abundance (percentage of maximum achieved; PoMA) over time demonstrates which microbes are strongly responsive to diet (experience significant swings in PoMA value following a diet switch) and which are relatively diet-insensitive (experience only modest or no significant change in PoMA value following a diet switch). Heatmap visualization of E 1 PoMA values (Figure S5B) indicated that those microbes with a preference for a particular diet in one animal treatment group also tended to demonstrate the same diet preference in the other. Likewise, diet insensitivity was also consistent across treatment groups; diet-insensitive microbes were insensitive regardless of the order in which diets were introduced. Of the diet-sensitive taxa, those showing the most striking responses were B. caccae and B. ovatus, which strongly preferred the “Western”-like HF/HS diet and the polysaccharide-rich LF/HPP diet, respectively (Figures 1C and S4C). Among the diet-insensitive taxa, B. thetaiotaomicron showed the most stability in its representation (Figures 1C and S4C), consistent with its reputation as a versatile forager. Paradoxically, B. cellulosilyticus WH2 was both diet-sensitive and highly fit on its less-preferred diet; although this strain clearly achieved higher levels of representation in animals fed the LF/HPP diet, it also maintained strong levels of representation in animals fed the HF/HS diet (Figures 1C and S4C). When taking into account the abundance data for all 12 artificial community members, proportional representation at the end of the first diet phase (i.e., day 13) was a good predictor of representation at the end of the third diet phase (i.e., day 42) (E 1 R2 = 0.77; E 2 R2 = 0.84), suggesting that the intervening dietary perturbation had little effect on the ultimate outcomes for most species within this assemblage. However, one very low-abundance strain (D. longicatena) achieved significantly different maximum percentage abundances across the two treatment groups in each experiment, suggesting that steady-state levels of this strain may have been impacted by diet history. In mice initially fed the LF/HPP diet, D. longicatena was found to persist throughout the experiment at low levels on both diet regimens. In mice initially fed the HF/HS diet, D. longicatena dropped below the limit of detection before the end of the first diet phase, was undetectable by the end of the second diet phase, and remained undetectable throughout the rest of the time course. This interesting example raises the possibility that for some species, irreversible hysteresis effects may play a significant role in determining the likelihood that they will persist within a gut over long periods of time.

The Cecal Metatranscriptome Sampled at the Time of Sacrifice These diet-induced reconfigurations in the structure of the artificial community led us to examine the degree to which its members were modifying their metabolic strategies. To establish an initial baseline static view of expression data for each microbe on each diet, we developed a custom GeneChip whose probe sets were designed to target 46,851 of the 48,023 known or predicted protein-coding genes within our artificial human gut microbiome (see Materials and Methods). Total RNA was collected from the cecal contents of each animal in E 1 at the time of sacrifice and hybridized to this GeneChip. The total number of genes whose expression was detectable on each diet was remarkably similar (14,929 and 14,594 detected in the LF/HPP→HF/HS→LF/HPP and HF/HS→LF/HPP→HF/HS treatment groups, respectively). A total of 11,373 genes (24.3%) were expressed on both diets (Figure S6A), while 2,003 (4.3%) were differentially expressed to a statistically significant degree, including 161 (6.1%) of the 2,640 genes in the microbiome encoding proteins with CAZy-recognized domains. Figure S6B illustrates the fraction of the community-level CAZome and several species-level CAZomes expressed on each diet (see Table S6 for a comprehensive list of all genes, organized by species and fold-change in expression, whose cecal expression was detectable on each diet and all genes whose expression was significantly different when comparing data from each treatment group). Among taxa demonstrating obvious diet preferences (as judged by relative abundance data), B. caccae and B. cellulosilyticus WH2 provided examples of CAZy-level responses to diet change that were different in several respects. Our observations regarding the carbohydrate utilization capabilities and preferences of B. caccae are summarized in Text S1. However, our ability to evaluate shifts in B. caccae's metabolic strategy in the gut was limited by its very low abundance in animals fed LF/HPP chow (i.e., our mRNA and subsequent protein assays were often not sensitive enough to exhaustively sample B. caccae's metatranscriptome and metaproteome). In contrast, the abundance of B. cellulosilyticus WH2, which favored the LF/HPP diet, remained high enough on both diets to allow for a comprehensive analysis of its expressed genes and proteins. This advantage, along with the exceptional carbohydrate utilization machinery encoded within the genome of this organism, encouraged us to focus on further dissecting the responses of B. cellulosilyticus WH2 to diet changes. Detailed inspection of the expressed B. cellulosilyticus WH2 CAZome (503 CAZymes in total) provided an initial view of this microbe's sophisticated carbohydrate utilization strategy. A comparison of the top decile of expressed CAZymes on each diet disclosed many shared elements between the two lists, spanning many different CAZy families, with just over half of the 50 most expressed enzymes on the plant polysaccharide-rich LF/HPP chow also occurring in the list of most highly expressed enzymes on the sucrose-, corn starch-, and maltodextrin-rich HF/HS diet (Figure 2A). Twenty-five of the 50 most expressed CAZymes on the LF/HPP diet were significantly up-regulated compared to the HF/HS diet; of these, seven were members of the GH43 family (Figure 2B). The GH43 family consists of enzymes with activities required for the breakdown of plant-derived polysaccharides such as hemicellulose and pectin. Inspection of the enzyme commission (EC) annotations for the most up-regulated GH43 genes shows that they encode xylan 1,4-β-xylosidases (EC 3.2.1.37), arabinan endo-1,5-α-L-arabinosidases (EC 3.2.1.99), and α-L-arabinofuranosidases (EC 3.2.1.55). The GH10 family, which is currently comprised exclusively of endo-xylanases (EC 3.2.1.8, EC 3.2.1.32), was also well represented among this set of 25 genes, with four of the seven putative GH10 genes in the B. cellulosilyticus WH2 genome making the list. Strikingly, of the 45 predicted genes with putative GH43 domains in the B. cellulosilyticus WH2 genome, none were up-regulated on the “Western”-style HF/HS diet. PPT PowerPoint slide

PowerPoint slide PNG larger image

larger image TIFF original image Download: Figure 2. B. cellulosilyticus WH2 CAZyme expression in mice fed different diets. (A) Overview of the 50 most highly expressed B. cellulosilyticus WH2 CAZymes (GHs, GTs, PLs, and CEs) for samples from each diet treatment group. List position denotes the rank order of gene expression for each treatment group, with higher expression levels situated at the top of each list. Genes common to both lists are identified by a connecting line, with the slope of the line indicating the degree to which a CAZyme's prioritized expression is increased/decreased from one diet to the other. CAZy families in bold, colored letters highlight those list entries found to be significantly up-regulated relative to the alternative diet (i.e., a CAZyme with a bold green family designation was up-regulated on the LF/HPP diet; a bold orange family name implies a gene was up-regulated significantly on the HF/HS diet). Statistically significant fold-changes between diets are denoted in the “F.C.” column (nonsignificant fold-changes are omitted for clarity). (B) Breakdown by CAZy family of the top 10% most expressed CAZymes on each diet whose expression was also found to be significantly higher on one diet than the other. Note that for each diet, the family with the greatest number of up-regulated genes was also exclusively up-regulated on that diet (LF/HPP, GH43; HF/HS, GH13). In total, 25 genes representative of 27 families and 12 genes representative of 13 families are shown for the LF/HPP and HF/HS diets, respectively. https://doi.org/10.1371/journal.pbio.1001637.g002 The most highly expressed B. cellulosilyticus WH2 CAZyme on the plant polysaccharide-rich chow (which was also highly-expressed on the HF/HS chow) was BWH2_1228, a putative α-galactosidase from the GH36 family. These enzymes, which are not expressed by humans in the stomach or intestine, cleave terminal galactose residues from the nonreducing ends of raffinose family oligosaccharides (RFOs, including raffinose, stachyose, and verbascose), galacto(gluco)mannans, galactolipids, and glycoproteins. RFOs, which are well represented in cereal grains consumed by humans, are expected to be abundant in the LF/HPP diet given its ingredients (e.g., soybean meal), but potential substrates in the HF/HS diet are less obvious, possibly implicating a host glycolipid or glycoprotein target. Surface glycans in the intestinal epithelium of rodents are decorated with terminal fucose residues [34] as well as terminal sialic acid and sulfate [35]. Hydrolysis of the α-2 linkage connecting terminal fucose residues to the galactose-rich extended core is thought to be catalyzed in large part by GH95 and GH29 enzymes [36]. The B. cellulosilyticus WH2 genome is replete with putative GH95 and GH29 genes (total of 12 and 9, respectively), but only a few (BWH2_1350/2142/3154/3818) were expressed in vivo on at least one diet, and their expression was low relative to many other CAZymes (see Table S6). Cleavage of terminal sialic acids present in host mucins by bacteria is usually carried out by GH33 family enzymes. B. cellulosilyticus WH2 has two GH33 genes that are expressed on either one diet (BWH2_3822, HF/HS) or both diets (BWH2_4650), but neither is highly expressed relative to other B. cellulosilyticus WH2 CAZymes. Therefore, utilization of host glycans by B. cellulosilyticus WH2, if it occurs, likely requires partnerships with other members of the artificial community that express GH29/95/33 enzymes (see Table S6 for a list of members that express these enzymes in a diet-independent and/or diet-specific fashion). Among the 50 most highly expressed B. cellulosilyticus WH2 CAZymes, 12 were significantly up-regulated on the HF/HS diet compared to the LF/HPP diet, with members of family GH13 being most prevalent. While the enzymatic activities and substrate specificities of GH13 family members are varied, most relate to the hydrolysis of substrates comprising chains of glucose subunits, including amylose (one of the two components of starch) and maltodextrin, both prominent ingredients in the HF/HS diet. GeneChip-based profiling of the E 1 cecal communities provided a snapshot of the metatranscriptome on the final day of the final diet phase in each treatment group. The analysis of B. cellulosilyticus WH2 CAZyme expression suggested that this strain achieves a “generalist” lifestyle not by relying on substrates that are present at all times (e.g., host mucins), but rather by modifying its resource utilization strategy to effectively compete with other microbes for diet-derived polysaccharides that are not metabolized by the host.

Community-Level Analysis of Diet-Induced Changes in Microbial Gene Expression To develop a more complete understanding of the dynamic changes that occur in gene expression over time and throughout the artificial community following diet perturbations, we performed microbial RNA-Seq analyses using feces obtained at select time-points from mice in the LF/HPP→HF/HS→LF/HPP treatment group of E 2 (Figure S3). We began with a “top-down” analysis in which every RNA-Seq read count from every gene in the artificial microbiome was binned based on the functional annotation of the gene from which it was derived, regardless of its species of origin. In this case, the functional annotation used as the binning variable was the predicted EC number for a gene's encoded protein product. Expecting that some changes might occur rapidly, while others might require days or weeks, we searched for significant differences between the terminal time-points of the first two diet phases (i.e., points at which the model human gut microbiota had been allowed 13 d to acclimate to each diet). The 157 significant changes we identified were subjected to hierarchical clustering by EC number to determine which functional responses occurred with similar kinetics. The results revealed that in contrast to the rapid, diet-induced structural reconfigurations observed in this artificial community, community-level changes in microbial gene expression occurred with highly variable timing that differed from function to function. These changes were dominated by EC numbers associated with enzymatic reactions relevant to carbohydrate and amino acid metabolism (see Table S7 for a summary of all significant changes observed, including aggregate expression values for each functional bin (EC number) at each time-point). Significant responses could be divided into one of three groups: “rapid” responses were those where the representation of EC numbers in the transcriptome increased/decreased dramatically within 1–2 d of a diet switch; “gradual” responses were those where the representation of EC numbers changed notably, but slowly, between the two diet transition points; and “delayed” responses were those where significant change did not occur until the end of a diet phase (Figure 3, Table S7). EC numbers associated with reactions important in carbohydrate metabolism and transport were distributed across all three of these response types for each of the two diets. Nearly all genes encoding proteins with EC numbers related to amino acid metabolism that were significantly up-regulated on HF/HS chow binned into the “rapid” or “gradual” groups, suggesting this diet put immediate pressure on the artificial microbial community to increase its repertoire of expressed amino acid biosynthesis and degradation genes. Genes with assigned EC numbers involved in amino acid metabolism that were significantly up-regulated on the other, polysaccharide-rich, LF/HPP diet were spread more evenly across these three response types (Figure 3). PPT PowerPoint slide

PowerPoint slide PNG larger image

larger image TIFF original image Download: Figure 3. Top-down analysis of fecal microbiome RNA expression in mice receiving oscillating diets. The fecal metatranscriptomes of four animals in the LF/HPP→HF/HS→LF/HPP treatment group of E 2 were analyzed using microbial RNA-Seq at seven time-points to evaluate the temporal progression of changes in expressed microbial community functions triggered by a change in diet. After aligning reads to genes in the defined artificial human gut microbiome, raw counts were collapsed by the functional annotation (EC number) of the gene from which the corresponding reads originated. Total counts for each EC number in each sample were normalized, and any EC numbers demonstrating a statistically significant difference in their representation in the metatranscriptome between the final days of the first two diet phases were identified using a model based on the negative binomial distribution [57]. Normalized expression values for 157 significant EC numbers (out of 1,021 total tested) were log-transformed, mean-centered, and subjected to hierarchical clustering, followed by heatmap visualization. “Rapid” responses are those where expression increased/decreased dramatically within 1–2 d of a diet switch. “Gradual” responses are those where expression changed notably, but slowly, between the two diet transition points. “Delayed” responses are those where significant expression changes did not occur until the end of a diet phase. EC numbers specifying enzymatic reactions relevant to carbohydrate metabolism and/or transport are denoted by purple markers, while those with relevance to amino acid metabolism are indicated using orange markers. A full breakdown of all significant responses over time and the outputs of the statistical tests performed are provided in Table S7. https://doi.org/10.1371/journal.pbio.1001637.g003 Careful inspection of our top-down analysis results and a complementary “bottom-up” analysis in which normalization was performed at the level of individual species, rather than at the community level, allowed us to identify other important responses that would have gone undetected were it not for the fact that we were dealing with a defined assemblage of microbes where all of the genes in component members' genomes were known. For example, an assessment of the representation of EC 3.2.1.8 (endo-1,4-β-xylanase) within the metatranscriptome before and after the first diet switch (LF/HPP→HF/HS) initially suggested that this activity was reduced to a statistically significant degree as a result of the first diet perturbation (day 13 versus day 27; Mann–Whitney U test, p = 0.03; Figure S7A). Aggregation by species of all sequencing read counts assignable to mRNAs encoding proteins with this EC number revealed that over 99% of the contributions to this functional bin originated from B. cellulosilyticus WH2 (note the similarity in a comparison of Figure S7A and Figure S7B), implying that the community-level response and the response of this Bacteroides species were virtually one and the same. A tally of all sequencing reads assignable to B. cellulosilyticus WH2 at each time-point disclosed that although this strain maintains high proportional representation in the artificial community throughout each diet oscillation period (range, 10.3–42.5% and 11.6–43.3% for E 1 and E 2 , respectively), its contribution to the metatranscriptome is substantially decreased during the HF/HS diet phase (Figure S7C). This dramatic reduction in the extent to which B. cellulosilyticus WH2 contributes to the metatranscriptome in HF/HS-fed mice “masks” the significant up-regulation of EC 3.2.1.8 that occurs within the B. cellulosilyticus WH2 transcriptome following the first diet shift (day 13 versus day 27; Mann–Whitney U test, p = 0.03; Figure S7D). A further breakdown of endo-1,4-β-xylanase up-regulation in B. cellulosilyticus WH2 when mice are switched to the HF/HS diet reveals that most of this response is driven by two genes, BWH2_4068 and BWH2_4072 (Figure S7E). Our realization that we were unable to correctly infer the direction of one of the most significant diet-induced gene expression changes in the second most abundant strain in the artificial community when inspecting functional responses at the community level provides a strong argument for expanding the use of microbial assemblages comprised exclusively of sequenced species in studies of the gut microbiota. This should allow the contributions of individual species to community activity to be evaluated in a rigorous way that is not possible with microbial communities of unknown or poorly defined gene composition.

High-Resolution Profiling of the Cecal Metaproteome Sampled at the Time of Sacrifice In principle, protein measurements can provide a more direct readout of expressed community functions than an RNA-level analysis, and thus a deeper understanding of community members' interactions with one another and with their habitat [37],[38]. For these reasons and others, much work has been dedicated to applying shotgun proteomics techniques to microbial ecosystems in various environments [39],[40]. Though these efforts have provided illustrations of significant methodological advances, they have been limited by the complexity of the metaproteomes studied and by the difficulties this complexity creates when attempting to assign peptide identities uniquely to proteins of specific taxa. Recognizing that a metaproteomics analysis of our artificial community would not be subject to such uncertainty given its fully defined microbiome and thus fully defined theoretical proteome, we subjected cecal samples from two mice from each diet treatment group in E 1 (n = 4 total) to high-performance liquid chromatography-tandem mass spectrometry (LC-MS/MS; see Materials and Methods). We had three goals: (i) to evaluate how our ability to assign peptide-spectrum matches (PSMs) to particular proteins within a theoretical metaproteome is affected by the presence of close homologs within the same species and within other, closely related species; (ii) to test the limits of our ability to characterize protein expression across different species given the substantial dynamic range we documented in microbial species abundance; and (iii) to collect semiquantitative peptide/protein data that might validate and enrich our understanding of functional responses identified at the mRNA level, particularly with respect to the niche (profession) of CAZyme-rich B. cellulosilyticus WH2. Given the evolutionary relatedness of the strains involved, we expected that some fraction of observed PSMs from each sample would be of ambiguous origin due to nonunique peptides shared between species' proteomes. To assess which species might be most affected by this phenomenon when characterizing the metaproteome on different diets, we catalogued each strain's theoretical peptidome using an in silico tryptic digest. This simulated digest took into account both the potential for missed trypic cleavages and the peptide mass range that could be detected using our methods. The results (Figure S8A, Table S8) demonstrated that for an artificial community of modest complexity, the proportion of peptides within each strain's theoretical peptidome that are “unique” (i.e., assignable to a single protein within the theoretical metaproteome) varies substantially from species to species, even among those that are closely related. We found the lone representative of the Actinobacteria in the artificial community, Collinsella aerofaciens, to have the highest proportion of unique peptides (94.2%), while B. caccae had the lowest (63.0%). Interestingly, there was not a strong correlation between the fraction of a species' peptides that were unique and the total number of unique peptides that species contributed to the theoretical peptidome. For example, C. aerofaciens (2,367 predicted protein-coding genes) contributed only 81,894 (1.5%) unique peptides, the lowest of any artificial community member evaluated, despite having a proteome composed of mostly unique peptides. On the other hand, B. cellulosilyticus WH2 (5,244 predicted protein-coding genes) contributed 241,473 (4.5%) unique peptides, the highest of any member despite a high fraction of nonunique peptides (18.4%) within its theoretical peptidome. The evolutionary relatedness of the Bacteroides components of the artificial community appeared to negatively affect our ability to assign their peptides to specific proteins; their six theoretical peptidomes had the six lowest uniqueness levels. However, their greater number of proteins and peptides relative to the Firmicutes and Actinobacteria more than compensated for this deficiency; over 60% of unique peptides within the unique theoretical metaproteome were contributed by the Bacteroides. We also found that the proportion of PSMs uniquely assignable to a single protein within the metaproteome varied significantly by function, suggesting that some classes of proteins can be traced back to specific microbes more readily than others. For example, when considering all theoretical peptides that could be derived from the proteome of a particular bacterial species, those from proteins with roles in categories with high expected levels of functional conservation (e.g., translation and nucleotide metabolism) were on average deemed unique more often than those from proteins with roles in functions we might expect to be less conserved (e.g., glycan biosynthesis and metabolism) (see Table S8 for a summary of how peptide uniqueness varied across different KEGG categories and pathways, and across different species in the experiment). However, even in KEGG categories and pathways with high expected levels of functional conservation, the vast majority of peptides were found to be unique when a particular species was not closely related to other members of the artificial community. Next, we determined the average number of proteins that could be experimentally identified in our samples for each microbial species within each treatment group in E 1 . The results (Figure S8B, Table S9) illustrate two important conclusions. First, although equal concentrations of total protein were evaluated for each sample, slightly less than twice as many total microbial proteins were identified in samples from the LF/HPP-fed mice as those from mice fed the HF/HS diet (4,659 versus 2,777, respectively). While there are a number of possible explanations, both this finding and the higher number of mouse proteins detected in samples from HF/HS-fed animals are consistent with the results of our fecal DNA analysis, which indicated that the HF/HS diet supports lower levels of gut microbial biomass than the LF/HPP diet (Figure S4A,B). Second, a breakdown of all detected microbial proteins by species of origin (Figure S8B) revealed that the degree to which we could inspect protein expression for a given species was dictated largely by its relative abundance and the diet to which it was exposed. Our ability to detect many of B. cellulosilyticus WH2's expressed transcripts and proteins in samples from both diet treatment groups allowed us to determine how well RNA and protein data for an abundant, active member of the artificial community might correlate. These data also allowed us to evaluate whether or not the types of genes considered might influence the degree of correlation between these two datasets. We first performed a spectral count-based correlation analysis on the diet-induced, log-transformed, average fold-differences in expression for all B. cellulosilyticus WH2 genes that were detectable at both the RNA and protein level for both diets. The results revealed a moderate degree of linear correlation between RNA and protein observations (Figure S8C, black plot; r = 0.53). However, subsequent division of these genes into functionally related subsets, which were each subjected to their own correlation analysis, revealed striking differences in the degree to which RNA-level and protein-level expression changes agreed with one another. For example, diet-induced changes in mRNA expression for genes involved in translation showed virtually no correlation with changes measured at the protein level (Figure S8C, red plot; r = 0.03). Correlations for other categories of B. cellulosilyticus WH2 genes, such as those involved in energy metabolism (Figure S8C, green plot; r = 0.36) and amino acid metabolism (Figure S8C, orange plot; r = 0.48), were also poorer than the correlation for the complete set of detectable genes. In contrast, the correlation for the 110 genes with predicted involvement in carbohydrate metabolism was quite strong (Figure S8C, blue plot; r = 0.69), and was in fact the best correlation identified for any functional category of genes considered. The significant range of correlations observed in different categories of genes suggests that the degree to which RNA-based analyses provide an accurate picture of microbial adaptation to environmental perturbation may be strongly impacted by the functional classification of the genes involved. Additionally, these data further emphasize the need for enhanced dynamic range metaproteome measurements and better bioinformatic methods to assign/bin unique and nonunique peptides so that deeper and more thorough surveys of the microbial protein landscape can be performed and evaluated alongside more robust transcriptional datasets.

Identifying Two Diet-Inducible, Xylanase-Containing PULs Whose Genetic Disruption Results in Diet-Specific Loss of Fitness Several of the most highly expressed and diet-sensitive B. cellulosilyticus WH2 genes in this study fell within two putative PULs. One PUL (BWH2_4044–55) contains 12 ORFs that include a dual susC/D cassette, three putative xylanases assigned to CAZy families GH8 and GH10, a putative multifunctional acetyl xylan esterase/α-L-fucosidase, and a putative hybrid two-component system regulator (Figure 4A). Gene expression within this PUL was markedly higher in mice consuming the plant polysaccharide-rich LF/HPP diet at both the mRNA and protein level. Our mRNA-level analysis disclosed that BWH2_4047 was the most highly expressed B. cellulosilyticus WH2 susD homolog on this diet. Likewise, BWH2_4046/4, the two susC-like genes within this PUL, were the 2nd and 4th most highly expressed B. cellulosilyticus WH2 susC-like genes in LF/HPP-fed animals, and exhibited expression level reductions of 99.5% and 93% in animals consuming the HF/HS diet. The same LF/HPP diet bias was observed for other genes within this PUL (Figures 2A and 4B) but not for neighboring genes. The same trends were obvious and amplified when we quantified protein expression (Figure 4C). In mice fed LF/HPP chow, only three B. cellulosilyticus WH2 SusC-like proteins had higher protein levels than BWH2_4044/6, and only two SusD-like proteins had higher levels than BWH2_4045/7. Strikingly, we were unable to detect a single peptide from 9 of the 12 proteins in this PUL in samples obtained from mice fed the HF/HS diet, emphasizing the strong diet specificity of this locus. PPT PowerPoint slide

PowerPoint slide PNG larger image

larger image TIFF original image Download: Figure 4. Two xylanase-containing B. cellulosilyticus WH2 PULs demonstrating strong diet-specific expression patterns in vivo. (A) The PUL spanning BWH2_4044–55 includes a four-gene cassette comprising two consecutive susC/D pairs, multiple genes encoding GHs and CEs, and a gene encoding a putative hybrid two-component system (HTCS) presumed to play a role in the regulation of this locus. GH10 enzymes are endo-xylanases (most often endo-β-1,4-xylanases), while some GH5 and GH8 enzymes are also known to have endo- or exo-xylanase activity. CE6 enzymes are acetyl xylan esterases, as are some members of the CE1 family. A second PUL spanning BWH2_4072–6 contains a susC/D cassette, an endo-xylanase with dual GH10 modules as well as dual carbohydrate (xylan) binding modules (CBM22), a hypothetical protein of unknown function, and a putative HTCS. (B) Heatmap visualization of GeneChip expression data for BWH2_4044–55 and BWH2_4072–6 showing marked up-regulation of these putative PULs when mice are fed either a plant polysaccharide-rich LF/HPP diet or a diet high in fat and simple sugar (HF/HS), respectively. Data are from cecal contents harvested from mice at the endpoint of experiment E 1 . (C) Mass spectrometry-based quantitation of the abundance of all cecal proteins from the BWH2_4044–55 and BWH2_4072–6 PULs that were detectable in the same material used for GeneChip quantitation in panel (B). Bars represent results (mean ± SEM) from two technical runs per sample. For each MS run, the spectral counts for each protein were normalized against the total number of B. cellulosilyticus WH2 spectra acquired. (D) Comparison of in vivo PUL gene expression as measured by RNA-Seq (top) and the degree to which disruption of each gene within each PUL by a transposon impacts the fitness of B. cellulosilyticus WH2 on each diet, as measured by insertion sequencing (INSeq, bottom). For the lower plots, fitness measurements were calculated by dividing a mutant's representation (normalized sequencing counts) within the fecal output population by its representation within an input population that was introduced into germ-free animals via a single oral gavage along with other members of the artificial community. For cases in which no instances of a particular mutant could be measured in the fecal output (resulting in a fitness calculation denominator of zero), data are plotted as “<0.01” and are drawn without error bars. https://doi.org/10.1371/journal.pbio.1001637.g004 A second PUL in the B. cellulosilyticus WH2 genome composed of a susC/D-like pair (BWH2_4074/5), a putative hybrid two-component system regulator (BWH2_4076), and a xylanase (GH10) with dual carbohydrate binding module domains (CBM22) (BWH2_4072) (Figure 4A) demonstrated a strong but opposite diet bias, in this case exhibiting significantly higher expression in animals consuming the HF/HS “Western”-like diet. Our mRNA-level analysis showed that this xylanase was the second most highly expressed B. cellulosilyticus WH2 CAZyme in animals consuming this diet (Figure 2A). As with the previously described PUL, shotgun metaproteomics validated the transcriptional analysis (Figure 4B,C); with the exception of the gene encoding the PUL's presumed transcriptional regulator (BWH2_4076), diet specificity was substantial, with protein-level fold changes ranging from 10–33 across the locus (Table S10). Recent work by Cann and co-workers has done much to advance our understanding of the regulation and metabolic role of xylan utilization system gene clusters in xylanolytic members of the Bacteroidetes, particularly within the genus Prevotella [41]. The “core” gene cluster of the prototypical xylan utilization system they described consists of two tandem repeats of susC/susD homologs (xusA/B/C/D), a downstream hypothetical gene (xusE) and a GH10 endoxylanase (xyn10C). The 12-gene PUL identified in our study (BWH2_4044–55) appears to contain the only instance of this core gene cluster within the B. cellulosilyticus WH2 genome, suggesting that this PUL, induced during consumption of a plant polysaccharide-rich diet, is likely to be the primary xylan utilization system within this organism. A recent study characterizing the carbohydrate utilization capabilities of B. ovatus ATCC 8483 also identified two PULs involved in xylan utilization (BACOVA_04385–94, BACOVA_03417–50) whose gene configurations differ from those described in Prevotella spp. [25]. Interestingly, the five proteins encoded by the smaller xylanase-containing PUL described above (BWH2_4072–6) are homologous to the products of the last five genes in BACOVA_4385–94 (i.e., BACOVA_4390–4). The order of these five genes in these two loci is also identical. The similarities and differences observed when comparing the putative xylan utilization systems encoded within the genomes of different Bacteroidetes illustrate how its members may have evolved differentiated strategies for utilizing hemicelluloses like xylan. Having established that expression of BWH2_4044–55 and BWH2_4072–6 is strongly dictated by diet, we next sought to determine if these PULs are required by B. cellulosilyticus WH2 for fitness in vivo. A follow-up study was performed in which mice were fed either a LF/HPP or HF/HS diet after being colonized with an artificial community similar to the one used in E 1 and E 2 (see Materials and Methods). The wild-type B. cellulosilyticus WH2 strain used in our previous experiments was replaced with a transposon mutant library consisting of over 90,000 distinct transposon insertion mutants in 91.5% of all predicted ORFs (average of 13.9 distinct insertion mutants per ORF). The library was constructed using methods similar to those reported by Goodman et al. ([42]; see Materials and Methods) so that the relative proportion of each insertion mutant in both the input (orally gavaged) and output (fecal) populations could be determined using insertion sequencing (INSeq). The INSeq results revealed clear, diet-specific losses of fitness when components of these loci were disrupted (Figure 4D). Additionally, as observed in E 1 and E 2 , expression of each PUL was strongly biased by diet, with genes BWH2_4072–5 demonstrating up-regulation on the HF/HS diet and BWH2_4044–55 on the LF/HPP diet. The extent to which a gene's disruption impacted the fitness of B. cellulosilyticus WH2 on one diet or the other correlated well with whether or not that gene was highly expressed on a given diet. For example, four of the five most highly expressed genes in the BWH2_4044–55 locus were the four genes shown to be most crucial for fitness on the LF/HPP diet. Of these four genes, three were susC or susD homologs (the fourth was the putative endo-1,4-β-xylanase thought to constitute the last element of the xylan utilization system core). Though the fitness cost of disrupting genes within BWH2_4044–55 varied from gene to gene, disruption of any one component of the BWH2_4072–6 PUL had serious consequences for B. cellulosilyticus WH2 in animals fed the HF/HS diet. This could suggest that while disruption of some components of the BWH2_4044–55 locus can be rescued by similar or redundant functions elsewhere in the genome, the same is not true for BWH2_4072–5. Notably, disruption of BWH2_4076, which is predicted to encode a hybrid two-component regulatory system, had negative consequences on either diet tested, indicating that regulation of this locus is crucial even when the PUL is not actively expressed. While many genes outside of these two PULs were also found to be important for the in vivo fitness of B. cellulosilyticus WH2, those within these PULs were among the most essential to diet-specific fitness, suggesting that these loci are central to the metabolic lifestyle of B. cellulosilyticus WH2 in the gut.

Characterizing the Carbohydrate Utilization Capabilities of B. cellulosilyticus WH2 and B. caccae The results described in the preceding section indicate that B. cellulosilyticus WH2 prioritizes xylan as a nutrient source in the gut and that it tightly regulates the expression of its xylan utilization machinery. Moreover, the extraordinary number of putative CAZymes and PULs within the B. cellulosilyticus WH2 genome suggests that it is capable of growing on carbohydrates with diverse structures and varying degrees of polymerization. To characterize its carbohydrate utilization capabilities, we defined its growth in minimal medium (MM) supplemented with one of 46 different carbohydrates [25]. Three independent growths, each consisting of two technical replications, yielded a total of six growth curves for each substrate. Of the 46 substrates tested, B. cellulosilyticus WH2 grew on 39 (Table S11); they encompassed numerous pectins (6 of 6), hemicelluloses/β-glucans (8 of 8), starches/fructans/α-glucans (6 of 6), and simple sugars (14 of 15), as well as host-derived glycans (4 of 7) and one cellooligosaccharide (cellobiose). The seven substrates that did not support growth included three esoteric carbohydrates (carrageenan, porphyran, and alginic acid), the simple sugar N-acetylneuraminic acid, two host glyans (keratan sulfate and mucin O-glycans), and fungal cell wall-derived α-mannan. B. cellulosilyticus WH2 clearly grew more robustly on some carbohydrates than others. Excluding simple sugars, fastest growth was achieved on dextran (0.099±0.048 OD 600 units/h), laminarin (0.095±0.014), pectic galactan (0.088±0.018), pullulan (0.088±0.026), and amylopectin (0.085±0.003). Although one study has reported that the type strain of B. cellulosilyticus degrades cellulose [43], the WH2 strain failed to demonstrate any growth on MM plus cellulose (specifically, Solka-Floc 200 FCC from International Fiber Corp.) after 5 d. Maximum cell density was achieved with amylopectin (1.17±0.02 OD 600 units), dextran (1.12±0.20), cellobiose (1.09±0.08), laminarin (1.08±0.08), and xyloglucan (0.99±0.04). Total B. cellulosilyticus WH2 growth (i.e., maximum cell density achieved) on host-derived glycans was typically very poor, with only two substrates achieving total growth above 0.2 OD 600 units (chondroitin sulfate, 0.50±0.04; glycogen, 0.99±0.02). The disparity between total growth on plant polysaccharides versus host-derived glycans, including O-glycans that are prevalent in host mucin, indicates a preference for diet-derived saccharides, consistent with our in vivo mRNA and protein expression data. We also determined how the growth rate of B. cellulosilyticus WH2 on these substrates compared to the growth rates for other prominent gut Bacteroides spp. After subjecting B. caccae to the same phenotypic characterization as B. cellulosilyticus WH2, we combined our measurements for these two strains with previously published measurements for B. thetaiotaomicron and B. ovatus [25]. The results underscored the competitive growth advantage B. cellulosilyticus WH2 likely enjoys when foraging for polysaccharides in the intestinal lumen. For example, of the eight hemicelluloses and β-glucans tested in our carbohydrate panel, B. cellulosilyticus WH2 grew fastest on six while B. ovatus grew fastest on two (Table S11). B. caccae and B. thetaiotaomicron, on the other hand, failed to grow on any of these substrates. Across all the carbohydrates for which data are available for all four species, B. cellulosilyticus WH2 grew fastest on the greatest number of polysaccharides (11 of 26) and tied with B. caccae for the greatest number of monosaccharides (6 of 15). B. thetaiotaomicron and B. caccae did, however, outperform the other two Bacteroides tested with respect to utilization of host glycans in vitro, demonstrating superior growth rates on four of five substrates tested (Table S11). B. cellulosilyticus WH2's rapid growth to high densities on xylan, arabinoxylan, and xyloglucan, as well as xylose, arabinose, and galactose, is noteworthy given our prediction that two of its most tightly regulated, highly expressed PULs appear to be involved in the utilization of xylan, arabinoxylan, or some closely related polysaccharide. To identify specific mono- and/or polysaccharides capable of triggering the activation of these two PULs, as well as the 111 other putative PULs within the B. cellulosilyticus WH2 genome, we used RNA-Seq to characterize its transcriptional profile at mid-log phase in MM (Table S12) plus one of 16 simple sugars or one of 15 complex sugars (Table S13) (see Materials and Methods; n = 2–3 cultures/substrate; 5.2–14.0 million raw Illumina HiSeq reads generated for each of the 90 transcriptomes). After mapping each read to the B. cellulosilyticus WH2 reference gene set, counts were normalized using DESeq to allow for direct comparisons across samples and conditions. Hierarchical clustering of the normalized dataset resulted in a well-ordered dendrogram in which samples clustered almost perfectly by the carbohydrate on which B. cellulosilyticus WH2 was grown (Figure 5A). The consistency of this clustering illustrates that (i) technical replicates within each condition exhibit strong correlations with one another, suggesting any differences between cultures in a treatment group (e.g., small differences in density or growth phase) had at best minor effects on aggregate gene expression, and (ii) growth on different carbohydrates results in distinct, substrate-specific gene expression signals capable of driving highly discriminatory differences between treatment groups. The application of rigorous bootstrapping to our hierarchical clustering results also revealed several cases of higher level clusters in which strong confidence was achieved. These dendrogram nodes (illustrated as white circles) indicate sets of growth conditions that yield gene expression patterns more like each other than like the patterns observed for other substrates. Two notable examples were xylan/arabinoxylan (which are structurally related and share the same xylan backbone) and L-fucose/L-rhamnose (which are known to be metabolized via parallel pathways in E. coli [44]). PPT PowerPoint slide

PowerPoint slide PNG larger image

larger image TIFF original image Download: Figure 5. In vitro microbial RNA-Seq profiling of B. cellulosilyticus WH2 during growth on different carbohydrates. (A) Hierarchical clustering of the gene expression profiles of 90 cultures grown in minimal medium supplemented with one of 31 simple or complex sugars (n = 2–3 replicates per condition). Circles at dendrogram branch points identify clusters with strong bootstrapping support (>95%; 10,000 repetitions). Solid circles denote clusters comprising only replicates from a single treatment group/carbohydrate, while open circles denote higher level clusters comprising samples from multiple treatment groups. Colored rectangles indicate the type of carbohydrate on which the samples within each cluster were grown. (B) Unclustered heatmap representation of fold-changes in gene expression relative to growth on minimal medium plus glucose (MM-Glc) for 60 of the 236 paired susC- and susD-like genes identified within the B. cellulosilyticus WH2 genome (for a full list of all paired and unpaired susC and susD homologs, see Table S2). Data shown are limited to those genes whose expression on at least one of the 31 carbohydrates tested demonstrated a >100-fold increase relative to growth on MM-Glc for at least one of the replicates within the treatment group. Yellow boxes denote areas of the map where both genes in a susC/D pair were up-regulated >100-fold for at least two of the replicates in a treatment group and where the average up-regulation for each gene in the pair was >100-fold across all replicates of the treatment group. Two sets of columns to the right of the heatmap indicate PULs that were detectably expressed at the mRNA level (left set of columns) and/or protein level (right set of columns) in experiment 1 (E 1 ). Red and black circles indicate that both genes in a susC/D pair were consistently expressed on a particular diet, as determined by GeneChip analysis of cecal RNA (≥5 of 7 animals assayed) or LC-MS/MS analysis of cecal protein (2 of 2 animals assayed). In both cases, a red circle denotes significantly higher expression on one diet compared to the other. https://doi.org/10.1371/journal.pbio.1001637.g005 Importantly, these findings suggested that by considering in vitro profiling data alongside in vivo expression data from the artificial community, it might be possible to identify the particular carbohydrates to which B. cellulosilyticus WH2 is exposed and responding within its gut environment. To explore this concept further, we compared expression of each gene in each condition to its expression on our control treatment, MM plus glucose (MM-Glc). The results revealed a dynamic PUL activation network in which some PULs were activated by a single substrate, some were activated by multiple substrates, and some were transcriptionally silent across all conditions tested. Of the 118 putative susC/D pairs in B. cellulosilyticus WH2 that we have used as markers of PULs, 30 were dramatically activated on one or more of the substrates tested; in these cases, both the susC- and susD-like genes in the cassette were up-regulated an average of >100-fold relative to MM-Glc across all technical replicates (Figure 5B). At least one susC/D activation signature was identified for every one of the 17 oligosaccharides and polysaccharides and for six of the 13 monosaccharides tested (Table S14). The lack of carbohydrate-specific PUL activation events for some monosaccharides (fructose, galactose, glucuronic acid, sucrose, and xylose) was expected, given that these loci are primarily dedicated to polysaccharide acquisition. Further inspection of gene expression outside of PULs disclosed that B. cellulosilyticus WH2 prioritizes use of its non-PUL-associated carbohydrate machinery, such as putative phosphotransferase system (PTS) components and monosaccharide permeases, when grown on these monosaccharides (Table S14). Several carbohydrates activated the expression of multiple PULs. Growth on water-soluble xylan and wheat arabinoxylan produced significant up-regulation of five susC/D-like pairs (BWH2_0865/6, 0867/8, 4044/5, 4046/7, and 4074/5). No other substrate tested activated as many loci within the genome, again hinting at the importance of xylan and arabinoxylan to this strain's metabolic strategy in vivo. Cecal expression data from E 1 showed that 15 of these activated PULs were expressed in vivo on one or both of the diets tested (see circles to the right of the heatmap in Figure 5B). In mice fed the polysaccharide-rich LF/HPP chow, B. cellulosilyticus WH2 up-regulates three susC/D pairs (BWH2_2717/8, 4044/5, 4046/7) whose expression is activated in vitro by arabinan and xylan/arabinoxylan. The three most significantly up-regulated susC/D pairs (BWH2_1736/7, 2514/5, 4074/5) in mice fed the HF/HS diet rich in sugar, corn starch, and maltodextrin are activated in vitro by amylopectin, ribose, and xylan/arabinoxylan, respectively. All three PULs identified as being up-regulated at the RNA level in LF/HPP-fed mice were also found to be up-regulated at the protein level (Figure 5B). Two of the three PULs up-regulated at the mRNA level in HF/HS-fed mice were up-regulated at the protein level as well. The presence of an amylopectin-activated PUL among these two loci is noteworthy, given the significant amount of starch present in the HF/HS diet. The up-regulation of four other PULs in HF/HS-fed animals was only evident in our LC-MS/MS data, reinforcing the notion that protein data both complement and supplement mRNA data when profiling microbes of interest. Two of the five susC/D pairs activated by xylan/arabinoxylan form the four-gene cassette in the previously discussed PUL comprising BWH2_4044–55 that is activated in mice fed the plant polysaccharide-rich chow (see Figure 4A). Another one of the five is the susC/D pair found in the PUL comprising BWH2_4072–6 that is activated in mice fed the HF/HS “Western”-like chow (see Figure 4A). Thus, we have identified a pair of putative PULs in close proximity to one another on the B. cellulosilyticus WH2 genome that encode CAZymes with similar predicted functions, are subject to near-identical levels of specific activation by the same two polysaccharides (i.e., xylan, arabinoxylan) in vitro, but are discordantly regulated in vivo in a diet-specific manner. The highly expressed nature of these PULs in the diet environment where they are active, their shared emphasis on xylan/arabinoxylan utilization, and their tight regulation indicate that they are likely to be important for the in vivo success of this organism in the two nutrient environments tested. However, the reasons for their discordant regulation are unclear. One possibility is that in addition to being activated by xylan/arabinoxylan and related polysaccharides, these loci are also subject to repression by other substrates present in the lumen of the gut, and this repression is sufficient to block activation. Alternatively, the specific activators of each PUL may be molecular moieties shared by both xylan and arabinoxylan that do not co-occur in the lumenal environment when mice are fed the diets tested in this study.