Our models, based on the baseline abundance of several, mainly Firmicutes, species, predicted the responsiveness of the microbiota (AUC = 0.77–1; predicted vs. observed correlation = 0.67–0.88). Many of the predictive taxa showed a non-linear relationship with the responsiveness. The microbiota response was associated with the change in serum cholesterol levels (AUC = 0.96), highlighting the involvement of the intestinal microbiota in metabolic health.

Our study involved three independent cohorts of obese adults (n = 78) from Belgium, Finland, and Britain, participating in different dietary interventions aiming to improve metabolic health. We used a phylogenetic microarray for comprehensive fecal microbiota analysis at baseline and after the intervention. Blood cholesterol, insulin and inflammation markers were analyzed as indicators of host response. The data were divided into four training set – test set pairs; each intervention acted both as a part of a training set and as an independent test set. We used linear models to predict the responsiveness of the microbiota and the host, and logistic regression to predict responder vs. non-responder status, or increase vs. decrease of the health parameters.

Interactions between the diet and intestinal microbiota play a role in health and disease, including obesity and related metabolic complications. There is great interest in using dietary means to manipulate the microbiota to promote health. Currently, the impact of dietary change on the microbiota and the host metabolism is poorly predictable and highly individual. We propose that the responsiveness of the gut microbiota may depend on its composition, and may be associated with metabolic changes in the host.

Funding: The study involved samples from previously published studies, including the Belgian trial funded by an FNRS grant, the British study funded by the World Cancer Research Fund, and the Finnish trials funded by the EU 6th Framework Programme in project HEALTHGRAIN. WMdV and AS are funded by grants to WMdV from the Academy of Finland (grant 1141130) http://www.aka.fi/en-GB/A/ and the ERC (grant 400795) http://erc.europa.eu/advanced-grants . KK is funded by the Helsinki Biomedical Graduate Program. The funders had no role in study design, data collection and analysis, decision to publish, or preparation of the manuscript.

We propose that the composition of the gut microbiota may be informative in predicting the responses of the microbiota and of the host to a dietary intervention. Community composition influences the responses of its members to disturbances through ecological and evolutionary interactions [15] ; the baseline composition of the gut microbiota is likely to influence the responses of individual bacterial strains, and consequently those of the bacterial community and the host. We test this hypothesis using three independent data sets of obese individuals undergoing different types of dietary interventions, and attempt to predict the responses of both the host and the microbiota.

The gut microbiota is an important contributor to human health, and is emerging as a promising target for therapeutic modulation [1] , [2] . Obesity-related diseases offer a prime example where intestinal bacteria have recently been implicated as one etiological factor [3] – [5] ; hence modifying the gut microbiota represents a potential strategy for successful treatment [3] , [6] , [7] . However, it is currently impossible to make practical guidelines as to how the microbiota should be modified. Although recent research has identified compositional and functional properties that characterize the intestinal microbiota in healthy individuals [8] , we still lack a definition of a healthy microbiota, mainly because of the vast inter-individual variation [9] . Furthermore, individuals' responses to dietary interventions are highly variable and poorly predictable – both in terms of host metabolism and the gut microbiota – and sometimes even contrary to what was expected from in vitro studies [10] – [13] . Hence, the key challenge for the therapeutic modulation of the gut microbiota is to identify individuals who will benefit from a given intervention, with respect to their microbiota composition and, most importantly, with regard to clinical health markers. Personalized nutritional and pharmaceutical therapy, based on information on the individual's gut microbiota, has great prospects in the treatment of obesity and related conditions [10] , [14] .

To ensure that the normalization did not confound the analysis, we tested the models with the non-normalized microbiota data. The models performed well for studies A, B, and D. The responses within the study C, which had the most divergent L1 composition, could only be predicted after data normalization (data not shown).

Host responsiveness was treated both as categorical (>10% increase vs. >10% decrease, excluding the cases with <10% change) and as continuous. Model selection and validation for HOMA and CRP responses were conducted as detailed above. In the case of the cholesterol responses, the different studies were not directly comparable because the average response differed between studies. We corrected for this by including the study effect in the cholesterol response models as a fixed term, and performed model selection and validation by dividing the total data set randomly into a training set (75% of the data) and a validation set (25%). We present the combined result of model validation repeated 5 times.
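The categorization and the repeated random split described above can be sketched as follows. This is a minimal illustration only: the function names, the fixed random seed, and the rounding of the training-set size are assumptions, not details taken from the study.

```python
import random


def categorize_change(before, after, threshold=0.10):
    """Label a relative change as 'increase', 'decrease', or None (excluded)."""
    change = (after - before) / before
    if change > threshold:
        return "increase"
    if change < -threshold:
        return "decrease"
    return None  # change smaller than the threshold: left out of the analysis


def repeated_splits(items, train_fraction=0.75, repeats=5, seed=0):
    """Yield (training, validation) lists from repeated random splits."""
    rng = random.Random(seed)  # hypothetical seed, for reproducibility only
    n_train = int(len(items) * train_fraction)
    for _ in range(repeats):
        shuffled = items[:]
        rng.shuffle(shuffled)
        yield shuffled[:n_train], shuffled[n_train:]


# Made-up cholesterol values (before, after) for three subjects.
labels = [categorize_change(b, a) for b, a in [(5.0, 5.6), (5.0, 4.4), (5.0, 5.2)]]
print(labels)

# Five random 75/25 splits of 78 hypothetical subject indices.
splits = list(repeated_splits(list(range(78))))
```

In practice the split would be applied only to the subjects that remain after the cases with less than 10% change have been excluded.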

To select and validate the predictive model for microbiota responsiveness, we fitted linear models (separately for each training data set) with the microbiota stability as the response variable and the abundance of each L2 and L3 bacterial group, one at a time, as the only explanatory variable, allowing for linear and quadratic relationships. Although linear models assume that the relationship between the predictor and the response variable is linear, non-linear relationships can be estimated by including a quadratic term in the model: the response variable may be linearly related to the predictor squared, and thus non-linearly (quadratically) related to the predictor itself. From these models we extracted p-values for the bacterial groups, as indicators of their potential relevance as predictors of microbiota responsiveness. We then built full models separately for each training set, which included all bacterial groups with p-values <0.02, now allowing for interactions between the bacterial groups. These models were then reduced using AIC (Akaike Information Criterion) as the criterion for inclusion or exclusion of variables. Several different penalty values (2–8) were used to arrive at a set of models of different sizes. These models were then tested for their ability to predict the independent validation data set by calculating the correlation between the model-predicted and the observed stability values for the validation set. The final best model was chosen as the one that emerged from all four training data sets and was able to adequately predict all four validation data sets. The same procedure was conducted with the microbiota responder vs. non-responder categories, using logistic regression. To assess whether the model predicted temporal stability in general, or responsiveness to dietary intervention specifically, we used the model to predict temporal stabilities in the control samples.
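To illustrate how a quadratic term and an adjustable AIC penalty interact, the sketch below fits a linear and a quadratic model to made-up data and compares them. It is not the authors' code: the least-squares solver, the function names, and the data are all assumptions, and the criterion is the Gaussian AIC up to an additive constant, where a penalty of 2 per parameter gives the standard AIC.

```python
import math


def solve(a, b):
    """Solve the linear system a x = b by Gaussian elimination with pivoting."""
    n = len(b)
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            f = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= f * m[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        x[r] = (m[r][n] - sum(m[r][c] * x[c] for c in range(r + 1, n))) / m[r][r]
    return x


def fit_polynomial(x, y, degree):
    """Least-squares fit of y on 1, x, ..., x**degree via the normal equations.

    Returns (coefficients, residual sum of squares).
    """
    design = [[xi ** p for p in range(degree + 1)] for xi in x]
    k = degree + 1
    xtx = [[sum(row[i] * row[j] for row in design) for j in range(k)] for i in range(k)]
    xty = [sum(row[i] * yi for row, yi in zip(design, y)) for i in range(k)]
    coef = solve(xtx, xty)
    rss = sum((yi - sum(c * v for c, v in zip(coef, row))) ** 2
              for row, yi in zip(design, y))
    return coef, rss


def penalized_aic(rss, n, n_params, penalty=2.0):
    """Gaussian AIC up to a constant: n * ln(RSS / n) + penalty * n_params."""
    return n * math.log(rss / n) + penalty * n_params


# Made-up data: stability is highest at intermediate abundance (inverted U).
abundance = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0]
noise = [0.004, -0.003, 0.002, -0.004, 0.003, -0.002, 0.001]
stability = [0.95 - 0.01 * (a - 4.0) ** 2 + e for a, e in zip(abundance, noise)]

n = len(abundance)
_, rss_lin = fit_polynomial(abundance, stability, 1)
coef_quad, rss_quad = fit_polynomial(abundance, stability, 2)
for penalty in (2.0, 8.0):
    print(penalty,
          penalized_aic(rss_lin, n, 2, penalty),
          penalized_aic(rss_quad, n, 3, penalty))
```

A larger penalty makes every extra term more expensive, so it favors smaller models; here the quadratic term survives even the largest penalty because a purely linear fit cannot capture the inverted-U shape.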

From the total (species- and genus-level) microbiota data, we formed four training-validation data set pairs and performed model selection and validation separately for each data set pair following the same procedure (detailed below). Training set 1 included all studies except Study A, which acted as the validation set; training set 2 included all studies except study B; training set 3 included all but study C; and training set 4 included all but study D. Therefore, we had essentially four training data sets, with four independent validation data sets.
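The leave-one-study-out design described above can be sketched as follows. The record layout (a dict with a "study" key) and the function name are hypothetical; the cohort sizes are those reported for studies A to D.

```python
def leave_one_study_out(samples):
    """Yield (held_out_study, training, validation) for each study label.

    `samples` is a list of dicts with a "study" key (hypothetical layout).
    """
    studies = sorted({s["study"] for s in samples})
    for held_out in studies:
        training = [s for s in samples if s["study"] != held_out]
        validation = [s for s in samples if s["study"] == held_out]
        yield held_out, training, validation


# Cohort sizes as reported for studies A-D (28, 24, 13, and 13 participants).
sizes = {"A": 28, "B": 24, "C": 13, "D": 13}
samples = [{"study": st, "subject": i} for st, n in sizes.items() for i in range(n)]

splits = list(leave_one_study_out(samples))
for held_out, training, validation in splits:
    print(held_out, len(training), len(validation))
```

Each study serves once as the validation set and three times as part of a training set, so no validation sample is ever seen during model fitting for its own pair.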

Unsupervised clustering and principal coordinates analysis of the baseline microbiota revealed that the data clustered by study ( Fig. 1A ). The nature of the observed differences in the microbiota composition between the studies suggested a technical rather than a biological basis: the gram-negative bacteria were elevated, and the gram-positive bacteria reduced in studies C and D compared to studies A and B ( Fig. S3 ). The effect of PCR bias or different analytical procedures can be excluded as all samples were processed similarly for the microarray hybridization. Instead, such differences can arise from the use of differentially efficient DNA extraction methods, as the gram-negative organisms become overrepresented with methods that fail to lyse part of the dominant, more recalcitrant gram-positive bacteria. Such suboptimal performance has been reported for the Qiagen kit [24] , even when preceded by short mechanical lysis [17] , which was used in study C. Indeed, the overall diversity, measured by the inverse Simpson diversity index, was significantly lower in study C compared to the other studies, suggesting incomplete DNA extraction. Secondly, the relative amount of Bacteroides spp. is sensitive to storage conditions; their amount is significantly higher in fresh than in frozen samples [24] , potentially explaining the higher abundance of Bacteroidetes in samples of study D, which were extracted from fresh samples with mechanical lysis. To eliminate these presumably technical differences that prevented integrated analysis of the cohorts, we normalized the datasets: First, we calculated the total average (log-transformed) signal intensity of each L1 group over all samples ($M_T$), and the average signal intensity of each L1 group in each study ($M_A$, $M_B$, $M_C$, $M_D$). For each L1 group and study, we then calculated the relative difference between the study average and the total average as $D_A = (M_A - M_T)/M_T$, $D_B = (M_B - M_T)/M_T$, $D_C = (M_C - M_T)/M_T$, and $D_D = (M_D - M_T)/M_T$. The normalized L2 and L3 signals were obtained by multiplying the original values by $1 - D$ for the respective study and L1 group. After normalization, the studies no longer separated in PCO ( Fig. 1B ).
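The normalization step can be illustrated with a short sketch for a single L1 group. The function names, the input layout, and the numbers are made up for illustration; only the two formulas (the relative difference D between the study average and the total average, and rescaling by 1 minus D) follow the description above.

```python
from collections import defaultdict


def relative_offsets(l1_signals):
    """Return D per study for one L1 group.

    `l1_signals` is a list of (study, log_signal) pairs, one per sample.
    D = (study average - total average) / total average.
    """
    by_study = defaultdict(list)
    for study, value in l1_signals:
        by_study[study].append(value)
    total_mean = sum(v for _, v in l1_signals) / len(l1_signals)
    return {
        study: (sum(vals) / len(vals) - total_mean) / total_mean
        for study, vals in by_study.items()
    }


def normalize(signal, offset):
    """Rescale an L2 or L3 signal by 1 - D of its study and L1 group."""
    return signal * (1.0 - offset)


# Made-up log-signals of one L1 group in two studies.
l1_signals = [("A", 4.0), ("A", 4.0), ("C", 2.0), ("C", 2.0)]
offsets = relative_offsets(l1_signals)
print(offsets)
print(normalize(4.0, offsets["A"]), normalize(2.0, offsets["C"]))
```

In this toy case the two study averages sit symmetrically around the total average, so the rescaled values coincide; with real data the rescaling reduces, rather than exactly removes, the between-study offset.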

All samples were analyzed with the HITChip microarray, which is designed for the analysis of the human gut microbiota, relies on the identification of the V1 and V6 regions on the 16S rRNA gene, and can detect and quantify the relative abundances of over 1000 species-level (L3) phylotypes. These can be summarized into 130 genus-like groups (≥90% sequence similarity in the 16S rRNA gene; referred to as L2) and further to 23 L1 taxa that represent 10 phyla, the Firmicutes being further divided into Clostridium clusters, uncultured Clostridiales and Bacilli [21] . Probe signals summarized to the above-mentioned phylogenetic levels were used as indicators of bacterial abundance. The microbiota data, generated from fecal samples collected before and after the interventions, were extracted using min-max normalization [22] against an in-house data collection of over 5000 microarray experiments [23] . The microarray data are available from the Dryad Digital Repository: http://doi.org/10.5061/dryad.bv4k7 . To gain normality, the HITChip hybridization signals were log transformed. The Pearson correlation between the baseline and the post-intervention sample, based on the species-level data, was calculated to define the stability of the microbiota for each individual. The stability was used as an indicator of the microbiota responsiveness to dietary intervention and treated in two ways: as a continuous variable, in which case we attempted to predict the exact stability values, or as a categorical variable, including in the responder group those with Pearson correlation <0.87 (n = 14, 18% of the individuals), and in the non-responder group those with Pearson correlation >0.92 (n = 43, 55%). The cut-off values were based on the distribution of the stability values presented in Fig. S2 .
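As a sketch of the stability measure defined above, the following computes the Pearson correlation between a baseline and a post-intervention profile and applies the two cut-offs. The signal values are invented, the base-10 logarithm is an assumption (the text only states that signals were log transformed), and the function names are hypothetical.

```python
import math


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def classify(stability, low=0.87, high=0.92):
    """Apply the responder / non-responder cut-offs to a stability value."""
    if stability < low:
        return "responder"
    if stability > high:
        return "non-responder"
    return "intermediate"  # left out of the categorical analysis


# Made-up species-level signals for one individual, log10-transformed.
baseline = [math.log10(v) for v in (100.0, 1000.0, 10.0, 500.0)]
followup = [math.log10(v) for v in (120.0, 900.0, 400.0, 20.0)]

stability = pearson(baseline, followup)
print(stability, classify(stability))
```

Individuals whose stability falls between the two cut-offs are kept in the continuous analysis but belong to neither category.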

Total blood cholesterol, HOMA (Homeostatic Model Assessment, indicator of insulin sensitivity), and CRP (C-reactive protein, indicator of systemic inflammation) values, measured before and after the intervention, were available for all studies, except CRP for study D, and were used as markers for host responsiveness to the intervention. Blood sampling and analysis have been described previously for studies A and B [16] , study C [18] and study D [19] . Host blood marker values at baseline, and their relative change after intervention, are presented in Fig. S1 .

Study D is a British 10-week trial [12] in which the participants (n = 13, all males, age 27–73, BMI 28–51), fulfilling the criteria for metabolic syndrome, consecutively received 3 different diets after a run-in diet for one week. The interventions, each for 3 weeks, included a resistant-starch-enriched diet, a non-starch-polysaccharide-enriched diet, and finally a weight-loss diet, low in carbohydrates and fat, and high in protein. We used the data collected during the run-in diet, and at the end of the weight-loss diet. The DNA was extracted from fresh fecal samples using the FastDNA Spin kit for soil (Qbiogene, Carlsbad, CA).

Study C is a Belgian 12-week trial [18] from which we included the intervention group (n = 13, all females, BMI >30 kg/m2), which received a daily dose of 8 g inulin and 8 g oligofructose. The fecal samples were stored at −20°C until DNA extraction with the QIAamp Stool DNA Mini Kit (Qiagen, Hilden, Germany). The kit procedure was modified according to Salonen et al. (2010); however, the fecal samples were not bead-beaten, but subjected to mechanical homogenization by vortex agitation with micro-beads (VWR, Belgium), and the bacterial lysis was improved by heating samples at 95°C for 5 min.

Studies A and B consist of a Finnish 12-week trial with 52 participants (27 females, 25 males, age 40–65, BMI 26–39 kg/m2) fulfilling the criteria for metabolic syndrome [16] . The participants were randomized into two intervention groups: one group (n = 28) ate high-fiber rye bread and whole-grain pasta (hereafter referred to as study A), and the other group (n = 24) substituted grains in their habitual diet with low-fiber, refined wheat bread (study B). The samples were stored frozen at −70°C until DNA extraction with the Repeated Bead Beating method [17] .

We used three previously published cohorts of Finnish, Belgian and British adults who were obese and/or had metabolic syndrome (n = 78; 71 were obese (BMI over 30 kg/m2), and 7 were overweight (BMI 26–29 kg/m2) and had diagnosed metabolic syndrome). All subjects underwent dietary interventions, which altered the quantity and/or quality of ingested carbohydrates and by doing so, aimed for improved metabolic health and reduced risk for type 2 diabetes. The details of the study designs and diets, inclusion and exclusion criteria as well as the analytical procedures can be found in the original publications specified below. We used microbiota and clinical data collected at the beginning and at the end of each trial.

To confirm that the results were not platform-specific, we included pyrosequencing data in the analysis. The data were derived from fecal samples collected from 28 healthy adults (mean BMI 25) before and after a four-week intervention on brown rice and whole grain barley [27] . Most of the bacteria identified with the HITChip as predictors were not detected in this data set, probably due to their low abundance ( Table 1 ), so we were unable to test the models with the sequencing data. However, for those bacteria that were detected, the relationship with the microbiota responsiveness was comparable to that found in the HITChip data ( Fig. S8 ).

The CRP response was independent of the microbiota response, but was predicted by a model including the baseline abundances of members of Clostridium clusters VI, XI, XIVa, and XVIII ( Table 1 , Table S1 ). The correlations between the predicted and observed CRP responses were between 0.46 and 0.80 in the different validation data sets ( Fig. 4B ).

In each case, one study was left out, while data from the other studies were fitted to the model, which was then used to predict the HOMA and CRP response for the independent data set (A–D). The dashed line represents the ideal situation where observed = predicted.

The HOMA response was not linked to the microbiota response, but was predicted by a model including the baseline abundances of members of Clostridium clusters XVI, and XVIa, Bacilli, and Proteobacteria ( Table 1 , Table S1 ). The correlations between the predicted and observed HOMA responses were between 0.56 and 0.79 in the different validation data sets ( Fig. 4A ).

Panels A, B, C: Three cholesterol response models: cholesterol response predicted by the microbiota stability (panel A), by the baseline abundance of E. ruminantium and C. felsineum (B), and by the baseline abundance of C. sphenoides (C). The data were divided randomly into a training set (75% of the data) and test set (the remaining 25%), and the ROC curves represent the ability of the models, fitted to the training data, to predict the cholesterol response (increase vs. decrease) in the test data. The ROC curve shows the true positive rate ( = sensitivity) against the false positive rate ( = 1-specificity) for the different possible cut points of a diagnostic test. The perfect diagnostic test would have a sensitivity = 1 and specificity = 1, and therefore the area under the curve (AUC) would be 1. A random guess would have a ROC curve following the diagonal; curves above the diagonal indicate that the classifier works better than a random guess. Shaded areas represent 95% confidence intervals for the ROC curve. Panels D, E, F: Comparison of cholesterol response groups (increase vs. decrease), with respect to microbiota stability (D), E. ruminantium and C. felsineum abundance (E), and C. sphenoides abundance (F).
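For readers unfamiliar with the AUC, the sketch below computes it directly as the probability that a randomly chosen positive case receives a higher score than a randomly chosen negative case, with ties counted as one half. This equals the area under the empirical ROC curve. The function name, scores, and labels are invented for illustration and are not study data.

```python
def roc_auc(scores, labels):
    """Area under the ROC curve from model scores and binary labels (1 or 0).

    Computed as the fraction of positive-negative pairs in which the
    positive case has the higher score; ties count as one half.
    """
    pos = [s for s, lab in zip(scores, labels) if lab == 1]
    neg = [s for s, lab in zip(scores, labels) if lab == 0]
    wins = sum((p > q) + 0.5 * (p == q) for p in pos for q in neg)
    return wins / (len(pos) * len(neg))


# Perfect separation of the two groups.
print(roc_auc([0.9, 0.8, 0.2, 0.1], [1, 1, 0, 0]))
# One positive case scores below one negative case.
print(roc_auc([0.9, 0.2, 0.8, 0.1], [1, 1, 0, 0]))
```

With small validation sets only a few distinct AUC values are possible, which is one reason the confidence intervals reported for the individual studies are wide.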

The cholesterol, HOMA, and CRP responses varied widely ( Fig. S1 ), but were not interrelated. The cholesterol response was related to the overall microbiota responsiveness, as the individuals with a responsive microbiota all showed either a decrease (39%) or no marked change (62%) in cholesterol levels, while only 21% of the individuals with a non-responsive microbiota showed a decrease in cholesterol levels, and 23% showed an increase. The stability of the microbiota predicted the cholesterol response in the randomly selected validation data set (with different intercepts for different studies) with an AUC of 96% (95% CI: 89.33%–100%, Fig. 3A, D ). Moreover, the same species that predicted the microbiota response (E. ruminantium and C. felsineum) predicted the cholesterol response with an AUC of 82.67% (65.17%–100%, Fig. 3B, E ). Finally, a model with only the abundance of the species Clostridium sphenoides, and different intercepts for the different studies, predicted the cholesterol response with an AUC of 100% (100%–100%). The abundance of C. sphenoides was significantly (p<0.05) lower in the individuals with an increase in cholesterol levels than in those with a decrease ( Fig. 3C, F ).

Finally, we were interested in identifying the bacterial groups that could predict the change in bifidobacterial abundance, as many of the diets strongly affected bifidobacteria in some, but not all individuals. The direction and magnitude of change in bifidobacteria were correlated only with their own baseline abundance (Pearson correlation = −0.40, p<0.0001; Fig. S6 ), indicating that intestinal bifidobacterial populations are strongly regulated by negative density dependence.

When treating the responsiveness as a categorical variable, and including only the clear responders (stability <0.87) and clear non-responders (stability > 0.92; Fig. S2 ), the model with the baseline abundances of Eubacterium ruminantium and Clostridium felsineum was able to predict with great accuracy all independent data sets ( Fig. S4 ): AUC (Area Under the Curve) for study A = 98.15% (95% confidence interval: 93.02%–100%); study B = 77.78% (47.92%–100%); study C = 100% (100%–100%); study D = 94.44% (79.05%–100%). The non-responders were characterized by average abundances of both species, while the responders had either very low or very high baseline abundances of E. ruminantium plus C. felsineum ( Fig. S5 ).

A linear model with the baseline abundances of members of Clostridium clusters IV, IX, and XIVa, and Bacilli ( Table 1 ) was able to predict the overall responsiveness of the gut microbiota to all tested dietary interventions, as demonstrated by the strong correlations between the observed and the model-predicted values of microbiota stability ( Fig. 2 ). The parameter estimates are presented in Table S1 .

Discussion

The prognostic value of the gut microbiota

This is the first study to explicitly address the individual-specific responses of the human microbiota to interventions, a long-known phenomenon, which has, to date, been treated largely as random noise. Our work revealed that rather than being random, the response of the gut microbiota to dietary interventions can be predicted with high accuracy based on the initial microbiota composition. Previously, the gut microbiota composition has been used to successfully differentiate individuals with type 2 diabetes [28], [29] and IBD [30] from healthy controls, but this is the first study to demonstrate the prognostic value of the gut microbiota. Obesity is a multifactorial state, where host genes, lifestyle and, as recently identified, the gut microbiota [4], [5] interact in a complex and largely unknown way. Predicting how an individual will respond to a dietary intervention is a major challenge with the potential to revolutionize the management of obesity and associated pathologies. Previously, adipose gene expression profiles have been used to predict weight loss response with 80% accuracy [31]. We have, for the first time, provided evidence that intestinal bacteria, our microbial metabolic organ [4], can be used to predict the host's metabolic response to a dietary intervention. These results were found to apply to different types of dietary interventions, ranging from a simple addition of a prebiotic compound (study C), to a change in the type of grains in the diet (studies A and B), to a dietary change entailing profoundly altered macronutrient composition (study D). It remains to be studied whether the gut microbiota composition can be used to predict the response to other types of dietary changes, e.g. in fat content.

Microbiota and host responses are interconnected

Our results indicate that some obese individuals gain health benefits from a very simple and easily managed dietary change, while others show no or even adverse responses, and may require more profound treatment approaches. In this cohort, the cholesterol responses were associated with the responsiveness of the gut microbiota: a change in the gut microbiota appeared to be necessary for the cholesterol values to decrease. Similarly, Faith et al. (2013) reported, based on sequencing data of healthy US adults, that the change in BMI was associated with changing gut microbiota [32]. Overall, our results confirm the previously found link between the gut microbiota and host lipid metabolism [33], [34], and suggest that the successful improvement of lipid metabolism is associated with, and possibly dependent on, a change in the gut microbiota composition. The responsiveness of the microbiota appears to be a separate phenomenon from the temporal dynamics in the absence of intervention, as our models were unable to predict the temporal stability of the microbiota in control individuals. This suggests that these two traits are determined by different factors. Responsiveness to a dietary change may, for example, reflect the primary response of nutritionally specialized microbes, or indirect effects due to cross feeding and/or competition. Temporal dynamics in the absence of any specific stimulator or disturbance, in turn, may reflect e.g. oscillatory dynamics due to density-dependent feedback (see 4.4) or other reasons.

Predictive organisms may be bioindicators

Most strikingly, the cholesterol response could be predicted from the abundance of a single species, Clostridium sphenoides, measured from the fecal sample before the dietary intervention. A decrease in cholesterol levels was observed mostly among the individuals with high C. sphenoides abundance. Furthermore, the abundance of C. sphenoides was in general decreased in our obese study subjects as compared to healthy controls (Table 1). Obese individuals with a “healthy” abundance of C. sphenoides thus appear to benefit even from simple dietary interventions in terms of lipid metabolism, while those with abnormally low abundance do not. The abundance of C. sphenoides was not associated with the absolute levels of cholesterol (data not shown), and therefore may not be directly associated with cholesterol metabolism, but may rather be an indicator of a gut ecosystem which, upon improved diet, can contribute positively to host lipid metabolism. Very little is known about the two organisms that predicted the responsiveness of the microbiota (C. felsineum and E. ruminantium). E. ruminantium belongs to the family Lachnospiraceae and was originally isolated from bovine rumen, but is also part of the human intestinal microbiota [35]. It is xylanolytic and produces mainly formic acid, but also butyrate [36]. C. felsineum (family Clostridiaceae) is a pectinolytic butyrate-producer [37]. Hence, both bacteria occupy the most common niche in the gut, the degradation and fermentation of indigestible carbohydrates. The predictive bacteria identified in this study were present at a very low abundance. Only the relative abundance of the Oscillospira guillermondii group, which itself was not predictive but modulated the effects of the predictive organisms (Table S1), was above 1% (Table 1). While high analytical depth is required to detect such minorities, their functional relevance should not be overlooked. 
As an example, the acetogens, methanogenic archaea, and sulfate-reducing bacteria, which dispose of the colonic hydrogen gas generated during fermentation, are low in abundance, but critical for the functioning of the gut ecosystem [38]. It is very likely that the organisms we found are not per se causative of the responsiveness (of the host or the microbiota), but may rather be indicator species, particularly sensitive to the environment and therefore informative of important structural or functional differences between ecosystems, which lead to the differential responses. We acknowledge that the accurate identification of species-level phylotypes with the microarray cannot be ascertained, and hence the true identities of the implicated organisms need to be validated in further studies. Clostridial species dominate the list of predictive organisms (Table 2). Bacteroidetes were notably non-predictive, as was the Bacteroides/Prevotella ratio. This is somewhat surprising as both of these genera are, like the above-mentioned Clostridiales, active degraders of dietary polysaccharides that were essential components of all intervention diets. The finding is interesting also in the light of the discussion about enterotypes, which have been defined largely by the abundance of the genera Bacteroides and Prevotella [39]. Our findings suggest that the major determinants of the inter-individual differences of the gut microbiota may not be the most relevant for predictive purposes. Microbiota richness has been positively associated with the microbiota responsiveness to weight loss diets in obese individuals [40], but in our study, species richness, or diversity, was not associated with the responsiveness. However, most of the diets in our study were not weight-loss diets, which may explain the difference.

Table 2. Predictive organisms are mostly clostridia. https://doi.org/10.1371/journal.pone.0090702.t002

The importance of non-linear relationships and density dependence

Many of the predictive taxa showed non-linear associations with the host and microbiota responsiveness, which would have been missed, had we allowed only linear associations. Non-linear relationships abound in nature. For example, species responses to environmental gradients are very often unimodal, rather than linear [41]: there is a certain preferred level, below and above which the species does poorly. Instead of the low vs. high abundance of a given bacterium, we found that the important distinction was often between individuals with average vs. extreme, either low or high, abundances. It is possible that the extreme abundances of the identified predictor species indicate a shift in ecosystem function, and the magnitude of the shift, rather than the direction, is of prognostic relevance. A disturbance may reduce the abundances of some species, allowing others to overgrow. The direction of the shift in competitive balance may be relatively random between individuals, depending on subtle differences in the ecosystem structure, and hence, may be less important than the magnitude. Moreover, we present evidence of negative density dependence regulating the bacterial populations in the human intestine: the lower the baseline abundance of Bifidobacterium spp., the more they increased during the interventions, and vice versa (Fig. S6). This is a long-known phenomenon observed in prebiotic interventions aiming for specific increase of bifidobacteria [42]–[44]. These results indicate that ecological interactions within the microbiota, such as intra-specific competition or phage density, act in parallel to the intervention effects, or even override them. Yet, the importance of baseline abundances has so far been ignored in the community-wide microbiota analyses following dietary interventions. 
Negative density dependence was evident in all bacterial groups, not only in bifidobacteria (data not shown), which explains more generally why certain intestinal bacteria respond to dietary changes in some, but not all individuals, as noted in numerous studies (e.g. [16], [18], [45]). Hence, when assessing the effect of an intervention on a given bacterial group, we recommend including the baseline abundance in the analysis to control for the impact of density dependence.
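One simple way to act on this recommendation is to regress the change in abundance on the baseline abundance and examine what remains. The sketch below is an assumption about how such an adjustment could look, not a method described in the study; the function names and numbers are invented.

```python
def simple_regression(x, y):
    """Least-squares intercept and slope of y on x."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    slope = (sum((a - mx) * (b - my) for a, b in zip(x, y))
             / sum((a - mx) ** 2 for a in x))
    return my - slope * mx, slope


def baseline_adjusted_change(baseline, followup):
    """Return the slope of change on baseline and the adjusted residuals."""
    change = [f - b for b, f in zip(baseline, followup)]
    intercept, slope = simple_regression(baseline, change)
    residuals = [c - (intercept + slope * b) for b, c in zip(baseline, change)]
    return slope, residuals


# Made-up log-abundances: low baselines increase most, high baselines least.
baseline = [1.0, 2.0, 3.0, 4.0]
followup = [2.5, 3.0, 3.5, 4.0]

slope, residuals = baseline_adjusted_change(baseline, followup)
print(slope, residuals)
```

A negative slope corresponds to the pattern described above, and the residuals are the part of the change that the baseline abundance alone does not account for. In a full analysis the baseline would enter the model as a covariate alongside the intervention term.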

Data normalization

From the methodological perspective, our study is the first to demonstrate how the knowledge of sample processing effects can be utilized retrospectively, enabling meta-analysis or comparison of samples that have been treated differently in the pre-analytical phase. In this study, all samples were analyzed with the same microarray platform with identical primers and workflow. Therefore, the observed differences in the relative share of gram-negative and gram-positive bacteria are likely to originate from differences in DNA extraction and storage procedures. As true biological differences cannot be excluded, the validity of our normalization approach should be confirmed experimentally, e.g. in the context of the International Human Microbiome Standards project (http://www.microbiome-standards.org/). Especially in the absence of standardized procedures, validated data normalization represents an attractive strategy to facilitate efficient and reliable use of the accumulating wealth of human microbiome data sets.