Study design

First, the study was submitted to the University of Pittsburgh and University of KwaZulu-Natal Institutional Review Boards for approval. We employed a unique design (Table 1 and Supplementary Note 1), in which 20 healthy middle-aged African Americans and 20 rural Africans (Supplementary Table 10: Demographics) from the same communities previously studied5 were studied first for 2 weeks in their own home environment (HE), eating their usual food (HE study), and then again in-house while they were fed the intervention diet for 2 weeks (dietary intervention study). Consequently, each subject served as his/her own control, which is important given the known wide individual variation in colonic microbiota composition. We chose intervention diets that were both palatable and contained reversed quantities of fibre and fat, such that African Americans were given ‘African-style’ foods, increasing their average fibre intake from 14 to 55 g per day and reducing their fat from 35% to 16% of total calories, whereas Africans were given a ‘western-style’ diet, reducing their fibre from 66 to 12 g per day and increasing their fat from 16% to 52% (Supplementary Tables 1–3). Cognizant of the problems of compliance with acute dietary change and of the limited accuracy of dietary recall for estimating actual intakes within the community, we elected to perform all the dietary intervention studies in-house, where meals could be prepared and given under close supervision. African American participants were housed at the University of Pittsburgh Clinical Translational Research Center; for rural Africans, we used a rural lodging facility close to their homes, with full kitchen facilities. Body weights were maintained within 2 kg by adjusting food quantities while keeping the overall macronutrient composition the same. The sampling schedule is given in Table 1, showing that fresh fecal samples were taken at three intervals during the HE study and again three times after the diet switch.
Colonoscopy was performed to identify latent disease, polyps or cancer, and to obtain 3 h colonic evacuates for analysis and biopsies for biomarkers of cancer risk4, at the beginning of the HE study and at the end of the dietary intervention study. Full details of the menus, cooking methods and total dietary compositions are given in Supplementary Tables 2, 3 and 4a–c, and Supplementary Note 2.

Subjects and recruitment

Age- and sex-matched healthy volunteers, aged 50–65 years, were randomly selected from the African American population in the Pittsburgh region of Pennsylvania and from the rural native South African population of the KwaZulu region. We collaborated with Dr Stephen Thomas, Director of Minority Studies at the University of Pittsburgh School of Public Health, to recruit healthy African American volunteers from the Pittsburgh region, and also advertised the study in public areas with the approval of our Institutional Review Board. In South Africa, volunteers were recruited through advertisements placed in public community centres (for example, post offices, town halls and civic centres) and through the iZulu Community Health Center. Appropriate compensation for time and testing (as advised by Minority Studies and the iZulu Community, and ratified by the Universities of Pittsburgh and KwaZulu-Natal) was paid to volunteers for participation.

Screening

Signed informed consent was obtained from each participant. All African volunteers could understand English, but a nurse-translator participated in the consent process to ensure proper understanding of the details of the research procedures. Screening was performed in Pittsburgh at the Clinical Translational Research Center and in Africa at the Ngwelezana Hospital outpatient clinic, Empangeni, KwaZulu-Natal, South Africa. A detailed medical history was first taken. With rural Africans, a local bilingual nurse acted as an interpreter. A 20-ml blood sample was taken for full blood count, ESR, electrolytes and urea, albumin, alkaline phosphatase, AST and bilirubin. If the results were normal and the volunteers satisfied the eligibility criteria, they were invited to participate in the study.

Subject eligibility

Details are given in Supplementary Note 1. Inclusion criteria were as follows: healthy volunteers from a gastrointestinal (GI) standpoint, aged between 40 and 65 years (the age at which colon cancer screening/colonoscopy is recommended in this population), with a body mass index between 18 and 35 kg m−2 (Supplementary Discussion). Exclusion criteria were as follows: before colonoscopy, participants were ineligible if they had a history of familial adenomatous polyposis, hereditary non-polyposis colorectal cancer, inflammatory bowel disease or invasive cancer within 5 years before enrollment (a history of adenomatous polyps was acceptable). Also ineligible were individuals with known renal, hepatic or bleeding disorders; previous GI surgery resulting in disturbed gut function due to loss of bowel or altered anatomy, or any form of chronic GI disease resulting in disturbed gut function, diarrhoea and malabsorption; antibiotic use within the past 12 weeks (Supplementary Discussion); current steroid use; or diabetes. Exclusion criteria after colonoscopy were detection of previously unrecognized ulceration (with depth and >0.5 cm), stricture, severe inflammation, polyps >1 cm in diameter or cancer.

Fecal, colonic and mucosal sampling and colonoscopy

To synchronize the measurements of the microbiota, the metabolome and the colonic mucosa, the sampling was tied to the preparation for and conduct of a colonoscopy, as previously described4, at baseline while participants were on their usual diet (ED1, Table 1), and then again at the conclusion of the dietary change (ED2, Table 1). In this procedure, fresh fecal samples were collected before colonic evacuation and immediately frozen at −80 °C, and the total colonic contents were collected for 3 h during evacuation with a simple polyethylene glycol solution: 2 l of polyethylene glycol solution (60 g l−1, average molecular weight 3,350 g mol−1) was consumed as rapidly as possible over 30 min. We avoided using the commercial preparation ‘Golytely’, as it contains sodium sulfate, which is known to disturb the microbiota composition. Our experience had shown that the quality of bowel preparation with this technique was similar to that of the more conventional overnight bowel washout. Mucosal health status was assessed by visualization and biopsy. Polyps, when encountered, were removed per standard practice, and biopsies were taken from normal mucosa from the proximal (caecum/ascending colon), mid (transverse) and distal (sigmoid) colon at 25 cm from the anal verge. Mucosal samples for immunohistochemistry were collected in formalized saline and those for gene expression in RNAlater (Qiagen, Germantown, MD) before being stored at −80 °C.

Measurement of mucosal biomarkers

Histology

Colonic mucosal biopsies were obtained by colonoscopy before and after the dietary switch from three different sites (ascending, transverse and descending) and stored in 10% buffered formalin. Later, the biopsy samples were embedded in paraffin, and 5-μm sections were cut and stained with either routine haematoxylin and eosin (H&E) or immunohistochemical stains (see below). The histologic findings on the H&E-stained sections were evaluated by one blinded, experienced gastroenterological histopathologist (AK). Measurements focused on the numbers of inflammatory cells and eosinophils in the lamina propria and the numbers of intraepithelial lymphocytes. The H&E-stained sections were scored as shown in Supplementary Table 5a. Any pathologic finding, such as the presence of parasitic organisms, was recorded.

Immunohistochemistry

Slides for CD3 and Ki67 staining were deparaffinized at 60 °C for 2 h. To inhibit endogenous peroxidase, the slides were pre-treated using 3% hydrogen peroxide/methanol at room temperature (RT) for 10 min, followed by antigen retrieval with 0.2% pepsin solution (P7012, Sigma, 3050 Spruce St, St Louis, MO, USA) at 37 °C for 10 min. Serum Free Protein Block (X0909, Dako, 6392 Via Real, Carpinteria, CA 93013, USA) was used at RT for 10 min. The slides were drained and incubated with the primary antibodies against CD3 and Ki67 (A0452, 1:100, rabbit polyclonal, Dako; MIB-1, 1:100, mouse monoclonal, Dako, respectively) at RT for 1 h. Secondary detection was performed using the ImmPRESS universal antibody polymer detection kit (MP-7500, Vector Labs, 30 Ingold Road, Burlingame, CA 94010, USA) at RT for 30 min. The slides were stained with the DAB substrate kit (SK-4100, Vector Labs) for 10 min and counterstained using Shandon Hematoxylin (6765015, Thermo Scientific, 81 Wyman St, Waltham, MA 02451, USA).

Slide staining for CD68 was done on a Ventana Benchmark Ultra slide stainer. Deparaffinized slides were pretreated using the ultra CC1 (950-224, Ventana, 1910 Innovation Park Dr, Tucson, AZ 85755, USA) for 24 min for antigen retrieval. Slides were incubated with the primary antibody CD68 (M087601, 1:100, mouse monoclonal, Clone PG-M1, Dako) at RT for 32 min. The OptiView DAB kit (760-700, Ventana) was used for secondary detection.

Quantification of immunohistochemical staining

Counting of the proportions of positive staining cells using light microscopy at × 400 magnification was performed by a single investigator (KM), under blinded conditions. To assess inter-observer variability, 40 slides were randomly selected and recounted by a second senior pathologist (AK), showing a concurrence of 88% for Ki67+ and 80% for CD68+ densities.

Ki67

The proportion of Ki67-positive staining cells was counted in well-oriented crypts (average 8 per slide, range 4–14). The Ki67 proliferation rate was defined as the number of Ki67+ cells divided by the total number of crypt cells and was expressed as a percentage. The differences were found to be the same in the total crypt and in the upper crypt; hence, only the total crypt proportions are reported here.

CD3

Only CD3+ staining intraepithelial lymphocytes were counted, in a representative area of at least 300 epithelial cells. The density of intraepithelial lymphocytes was expressed as the number of CD3-positive lymphocytes per 100 epithelial cells.
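The Ki67 proliferation rate and the CD3 intraepithelial lymphocyte index defined above are both simple ratios. A minimal sketch of the arithmetic follows; the function names and the example counts are hypothetical and are not taken from the study data.

```python
def ki67_proliferation_rate(ki67_positive_cells: int, total_crypt_cells: int) -> float:
    """Ki67+ cells as a percentage of all crypt cells counted."""
    if total_crypt_cells <= 0:
        raise ValueError("total_crypt_cells must be positive")
    return 100.0 * ki67_positive_cells / total_crypt_cells


def cd3_iel_index(cd3_positive_lymphocytes: int, epithelial_cells: int) -> float:
    """CD3+ intraepithelial lymphocytes per 100 epithelial cells."""
    # The text requires a representative area of at least 300 epithelial cells.
    if epithelial_cells < 300:
        raise ValueError("count at least 300 epithelial cells")
    return 100.0 * cd3_positive_lymphocytes / epithelial_cells


# Hypothetical counts, for illustration only.
print(ki67_proliferation_rate(30, 200))  # 30 Ki67+ cells among 200 crypt cells
print(cd3_iel_index(24, 300))            # 24 CD3+ lymphocytes among 300 epithelial cells
```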

CD68

The number of CD68-positive cells (macrophages) within the lamina propria was counted and graded on a scale from 1 to 3: grade 1 (none/rare), grade 2 (scattered superficial collections) and grade 3 (strong, diffuse or band-like infiltrate in the superficial lamina propria), as shown in Fig. 1c.

Targeted analysis of fecal and colonic microbes and metabolites of special interest

Details of the materials and methods used for the collection, preparation and analysis of fecal samples for targeted analysis of microbes of special interest (real-time quantitative PCR) and their metabolites (Agilent Technologies 6890N Network GC System with a flame-ionization detector for short-chain fatty acids and Shimadzu HPLC–mass spectrometry for quantification using electrospray ionization in negative ion mode by monitoring the (M–H)− ion for bile acids) in African Americans and rural Africans have been previously published5. Justification for the use of the BcoA functional gene for butyrate production is that Flint’s group in Aberdeen have demonstrated that although there are a number of different metabolic pathways that use different enzymes that culminate in butyrate synthesis, butyryl-CoA:acetate CoA-transferase, the product of the BcoA gene, is responsible for the last step in butyrate synthesis in the vast majority of intestinal butyrate producers13. Similarly, there are other microbial enzymes that participate in bile acid deconjugation, but Wells et al.23 demonstrated that there was a good correlation between human fecal bacterial dehydroxylating activity measured by fecal dilution assay and their PCR assay for the baiCD gene, which encodes the key enzyme responsible for the bile acid 7α-dehydroxylation pathway.

Plasma amino acids

Fasting blood concentrations were measured in extracted plasma by reverse-phase C-18 precolumn derivatization HPLC (AccQ·Tag Ultra derivatization, Waters, Milford, MA) as previously described5.

Statistical analysis

Statistical analysis of the group differences in continuous variables was conducted using SPSS 16.0 (SPSS Inc.). The significance of group differences for normally distributed data was assessed with unpaired and paired Student’s t-tests. The non-parametric data were analysed with a Mann–Whitney U-test or Kruskal–Wallis one-way analysis of variance (ANOVA) by ranks for unpaired data and Wilcoxon signed-rank tests for paired data. The significance of the association was evaluated with Spearman’s rank correlation test. A level of P<0.05 was accepted as statistically significant. Data are presented as means ±s.e. Complex microbiota and metabonome data were analysed by several multivariate ordinations detailed below (principal component analyses and non-metric multidimensional scaling), Kruskal–Wallis independent tests and multivariate ANOVA with Bonferroni correction.
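The analyses above were run in SPSS 16.0. Purely as an illustration, the same families of tests can be sketched with SciPy, which is a swapped-in tool and not the authors' software. All data values below are invented and the variable names are hypothetical.

```python
import math
from scipy import stats

# Invented paired measurements (same subjects before/after the diet switch).
before = [12.1, 14.3, 11.8, 15.2, 13.7, 12.9, 14.8, 13.1]
after = [9.8, 11.9, 10.2, 12.5, 11.1, 10.7, 12.0, 11.4]
# Invented independent group, used for the unpaired comparisons.
other_group = [8.9, 10.4, 9.7, 11.2, 10.1, 9.5, 10.8, 9.9]


def mean_se(values):
    """Mean and standard error of the mean (sample s.d. / sqrt(n))."""
    n = len(values)
    mean = sum(values) / n
    sd = math.sqrt(sum((v - mean) ** 2 for v in values) / (n - 1))
    return mean, sd / math.sqrt(n)


# Normally distributed data: paired and unpaired Student's t-tests.
paired_t = stats.ttest_rel(before, after)
unpaired_t = stats.ttest_ind(before, other_group)

# Non-parametric data: Mann-Whitney U and Kruskal-Wallis (unpaired),
# Wilcoxon signed-rank (paired).
mann_whitney = stats.mannwhitneyu(before, other_group, alternative="two-sided")
kruskal = stats.kruskal(before, after, other_group)
wilcoxon = stats.wilcoxon(before, after)

# Association: Spearman's rank correlation.
rho, rho_p = stats.spearmanr(before, after)

ALPHA = 0.05  # significance level stated in the text
for name, p in [("paired t", paired_t.pvalue), ("unpaired t", unpaired_t.pvalue),
                ("Mann-Whitney U", mann_whitney.pvalue), ("Kruskal-Wallis", kruskal.pvalue),
                ("Wilcoxon", wilcoxon.pvalue), ("Spearman", rho_p)]:
    print(f"{name}: P={p:.4g} significant={p < ALPHA}")
print("before: mean=%.2f s.e.=%.2f" % mean_se(before))
```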

Global analysis of the microbiota composition and diversity

We chose the HITChip phylogenetic microarray for the global profiling of microbiota composition. It has been demonstrated that the HITChip analysis of fecal samples provides highly concordant results concerning the microbiota composition when compared with 16S rRNA gene or metagenome sequencing59,60,61,62, as it allows deep profiling of phylotypes at high resolution, down to <0.1% relative abundance, corresponding to a duplicated set of 100,000 pyrosequencing reads per sample with very high reproducibility (>98%)59 and at considerably lower cost.

HITChip analysis

DNA was isolated from fecal or colonic samples and subsequently used for phylogenetic profiling of the intestinal microbiota using the HITChip phylogenetic microarray16. Standardized quality control was maintained through our library of a duplicated set of 3,631 probes targeting the 16S rRNA gene sequences of over 1,000 intestinal bacterial phylotypes. Briefly, the full-length 16S rRNA genes were amplified, and PCR products were transcribed in vitro into RNA, labelled with Cy3 and Cy5, and fragmented. Hybridizations were performed in duplicate and data were extracted from microarray-scanned images using Agilent Feature Extraction software, version 10.7.3.1 (http://www.agilent.com). Array normalization was performed as previously described, using a set of custom R scripts (http://r-project.org), and the data were stored in a custom MySQL database (http://www.mysql.com). Duplicate hybridizations with a Pearson correlation >98% were considered for further analysis. Microbiota profiles were generated by pre-processing the probe-level measurements with minimum–maximum normalization and the frozen Robust Probabilistic Averaging probe summarization62,63 into three phylogenetic levels: level 1, defined as order-like 16S rRNA gene sequence groups; level 2, defined as genus-like 16S rRNA gene sequence groups (sequence similarity >90%); and level 3, defined as phylotype-like 16S rRNA gene sequence groups (sequence similarity >98%). In the present work, we primarily focus on the genus-level (level 2) variation. The significance of the differences between the time points (Fig. 3a) was estimated using a (paired) linear model for microarrays (limma), with a false discovery rate (FDR) threshold of <0.2 estimated by the Benjamini–Hochberg procedure and a minimum fold change of 25% (0.1 on the log10 scale)64.
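The final thresholding step (Benjamini–Hochberg FDR <0.2 combined with a minimum fold change of 25%) can be sketched as follows. This illustrates the thresholds only and does not reproduce limma. The P-values are assumed to come from a separate paired model, the function names are hypothetical, and the example numbers are invented. Note that log10(1.25) is about 0.097, which the text rounds to 0.1.

```python
import math


def benjamini_hochberg(p_values):
    """Return Benjamini-Hochberg adjusted P-values in the input order."""
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest P-value down so adjusted values stay monotone.
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, p_values[i] * m / rank)
        adjusted[i] = running_min
    return adjusted


def significant_changes(p_values, log10_fold_changes, fdr=0.2, min_fold=1.25):
    """Flag taxa passing both the FDR and the minimum fold-change thresholds."""
    threshold = math.log10(min_fold)  # about 0.097 on the log10 scale
    q_values = benjamini_hochberg(p_values)
    return [q < fdr and abs(fc) >= threshold
            for q, fc in zip(q_values, log10_fold_changes)]


# Invented P-values and log10 fold changes for four genus-like groups.
p = [0.01, 0.04, 0.03, 0.5]
log_fc = [0.3, 0.05, -0.2, 0.4]
print(benjamini_hochberg(p))
print(significant_changes(p, log_fc))
```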

Microbial co-occurrence network analysis

We constructed the co-occurrence networks between the 130 genus-like bacterial groups based on their logarithmic abundances (HITChip log10 signal) within each treatment group (African Americans and Native Africans; before and after the dietary intervention). For a robust correlation analysis, we applied the SparCC algorithm using 20 iterations and 50 bootstrap data sets for significance testing, followed by q-value65 correction of the pseudo P-values from the bootstrap analysis. We focused on the significant (|r|>0.5; q<0.01) correlations between the genus-like groups where a qualitative change (change of sign) in the correlation was observed following the dietary intervention. To ensure robust analysis, we included only the correlations that changed drastically in the intervention, with a difference of >1 between the correlation values before and after intervention (so that the correlations would change, for instance, from −0.5 to 0.5, or higher). This provided a list of genus-like bacteria that had significant changes in their mutual correlation networks following the dietary intervention in either the rural African or the African American group. To simplify interpretation, we clustered the associated genus-like groups into coherent network modules with complete-linkage hierarchical clustering based on the SparCC correlations. We defined a module as a connected sub-network where the correlations between all genus-like groups within the module are r>0.5. Networks were visualized using the network visualization and exploration platform Gephi66. The resulting modules are highlighted in Fig. 3b.
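The selection of correlations that switch sign after the intervention can be sketched as a simple filter over precomputed correlation tables. This is a minimal sketch, not the authors' pipeline. It assumes that SparCC correlations and q-values are already available, and it assumes that a pair need only be significant in at least one condition. If significance were required in both conditions, the >1 shift criterion would be satisfied automatically. The pair labels and numbers are invented.

```python
def switched_correlations(before, after, min_abs_r=0.5, max_q=0.01, min_shift=1.0):
    """Select pairs whose correlation changes sign and shifts by more than min_shift.

    before/after map (group_a, group_b) -> (r, q) for one treatment group.
    """
    selected = []
    for pair, (r_before, q_before) in before.items():
        if pair not in after:
            continue
        r_after, q_after = after[pair]
        # Assumption: significant (|r| > 0.5, q < 0.01) in at least one condition.
        significant = ((abs(r_before) > min_abs_r and q_before < max_q)
                       or (abs(r_after) > min_abs_r and q_after < max_q))
        sign_flip = r_before * r_after < 0                 # qualitative change
        large_shift = abs(r_after - r_before) > min_shift  # drastic change
        if significant and sign_flip and large_shift:
            selected.append(pair)
    return sorted(selected)


# Invented correlations for four pairs of genus-like groups.
before = {("A", "B"): (-0.6, 0.001), ("A", "C"): (0.7, 0.001),
          ("B", "C"): (-0.3, 0.2), ("C", "D"): (0.55, 0.005)}
after = {("A", "B"): (0.7, 0.002), ("A", "C"): (0.6, 0.001),
         ("B", "C"): (0.4, 0.3), ("C", "D"): (-0.2, 0.4)}
print(switched_correlations(before, after))
```

In this toy input only the A–B pair passes: A–C keeps its sign, B–C is never significant, and C–D flips sign but shifts by only 0.75.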

Global analysis of the metabolome: sample preparation for NMR spectroscopic analysis

All urinary samples were thawed at RT and vortexed for 10 s. A total of 400 μl of urinary sample was thoroughly mixed with 250 μl of 0.2 M sodium phosphate buffer containing 20% D2O, pH 7.4, 0.01% 3-(trimethylsilyl)-[2,2,3,3-2H4]propionic acid sodium salt and 3 mM sodium azide (NaN3). The mixture was subsequently centrifuged at 10,000g for 10 min and 600 μl of supernatant was transferred into an NMR tube with an outer diameter of 5 mm. Approximately 200 mg of wet fecal sample was mixed with 600 μl of H2O (HPLC grade) in a 1.5-ml Eppendorf tube. A cycle of 30-s vortexing, 5-min sonicating at 4 °C and 30-s vortexing was then carried out on the mixture, followed by centrifuging at 10,000g for 10 min. A total of 400 μl of supernatant was added into a 1.5-ml Eppendorf tube containing 250 μl of the aforementioned sodium phosphate buffer, vortexed and spun at 10,000g for 10 min. An amount of 600 μl of supernatant was put into an NMR tube.

1H NMR spectroscopy

1H NMR spectra of urinary and fecal water extract samples were acquired using a Bruker 600 MHz spectrometer (Bruker, Rheinstetten, Germany) at the operating 1H frequency of 600.13 MHz at a temperature of 300 K. An NMR pulse sequence (recycle delay-90°-t1-90°-tm-90°-acquisition) and standard parameters were used to obtain standard one-dimensional 1H NMR spectral data as described in Beckonert et al.67

Multivariate statistical data analysis

1H NMR spectra of urine and fecal extracts were automatically phased, referenced to 3-(trimethylsilyl)-[2,2,3,3-2H4]propionic acid sodium salt at 1H δ 0.00 and baseline-corrected using an in-house developed MATLAB script (Dr Tim Ebbels, Imperial College London). The processed NMR spectra (1H δ 0 to 10) were imported to MATLAB (R2012a, MathWorks) and digitized into 20k data points with a resolution of 0.0005 p.p.m. using an in-house developed script. The water peak regions in urine (4.7–5.14 p.p.m.) and fecal water spectra (4.7–5.18 p.p.m.) were removed to minimize the effect of the disordered baseline caused by water suppression. In addition, regions containing only noise (1H δ 0–0.3 in urine spectra and 1H δ 0–0.25 in fecal water spectra) were removed, together with the urea signal in urine (1H δ 5.48–6.24). Because of the heavy peak shifting in urinary spectra, a recursive segment-wise peak alignment method was applied to improve metabolic biomarker recovery68. Probabilistic quotient normalization was subsequently performed on the resulting data sets to account for dilution of complex biological mixtures69. Principal component analysis and orthogonal partial least-squares discriminant analysis were carried out with a unit variance scaling method in SIMCA (P+13.0) and MATLAB software. P-values of metabolites were calculated using ANOVA, and the FDR was also calculated for each metabolite marker70. For the two-way clustered heatmap (Fig. 4b) between fecal microbiota and metabolite concentrations in urine and feces, the partial correlation was calculated for each phylotype and metabolite. The correlation was adjusted for country and dietary intervention to account for baseline differences. A pFDR cutoff value of 0.3 was used to define the significance (denoted by ‘+’), to account for the different numbers of samples in each data set.
For each data set (microbiota, fecal metabolites and urinary metabolites), the data were clustered using hierarchical cluster analyses with complete linkage. The optimal number of clusters was chosen as the splitting that maximized the modularity using the correlation as weight of each edge in the network.
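Two of the spectral pre-processing steps described above, removal of excluded chemical-shift regions and probabilistic quotient normalization, can be sketched in a few lines. The authors used in-house MATLAB scripts; the Python below is an illustrative stand-in with hypothetical function names. The median-spectrum reference and the preliminary total-area scaling follow the usual published recipe and are assumptions about details the text does not state.

```python
from statistics import median

# Regions removed from urine spectra, in p.p.m., as listed in the text.
URINE_EXCLUDED = [(0.0, 0.3), (4.7, 5.14), (5.48, 6.24)]


def drop_regions(ppm, intensities, regions):
    """Remove data points whose chemical shift falls in any excluded region."""
    kept = [(x, y) for x, y in zip(ppm, intensities)
            if not any(lo <= x <= hi for lo, hi in regions)]
    return [x for x, _ in kept], [y for _, y in kept]


def pqn_normalize(spectra):
    """Probabilistic quotient normalization of equal-length spectra."""
    n_points = len(spectra[0])
    # Step 1 (assumed): total-area scaling as a starting point.
    scaled = [[v / sum(row) for v in row] for row in spectra]
    # Step 2 (assumed): the reference is the point-wise median spectrum.
    reference = [median(row[j] for row in scaled) for j in range(n_points)]
    normalized = []
    for row in scaled:
        # Step 3: the most probable dilution factor is the median quotient.
        quotients = [row[j] / reference[j] for j in range(n_points) if reference[j] > 0]
        factor = median(quotients)
        normalized.append([v / factor for v in row])
    return normalized


# Toy example: three "spectra" that differ only by dilution.
toy = [[1, 2, 3, 4], [2, 4, 6, 8], [3, 6, 9, 12]]
print(pqn_normalize(toy))
print(drop_regions([0.1, 1.0, 4.9, 5.3, 6.0, 7.2], [5, 6, 7, 8, 9, 10], URINE_EXCLUDED))
```

After normalization the three toy spectra coincide, which is the intended effect for samples that differ only in overall dilution.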

Metabolic reaction network generation

Using the freely available MetaboNetworks software19, reactions that occur spontaneously or by means of enzymes linked to human and/or bacterial genomes were identified in the KEGG database. All complete genome sequences listed in KEGG that could be associated with the bacterial groups represented on the HITChip were included in creating the database, resulting in a total of 817 bacterial genomes. Next, the software was used to construct a shortest-path metabolic reaction network between all (fecal and urinary) metabolites that differed significantly in any of the dietary comparisons. From this ‘global network’, two sub-graphs were generated, one for each of the two dietary switches, using significantly associated fecal metabolites. To highlight the differential expression of the metabolites associated with specific pathways, background shading was added to the graphs to indicate the different interconnecting pathways (Fig. 5a–c). The abbreviations and full names of these metabolites can be found in Supplementary Table 11.
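The idea of a shortest-path network between metabolites of interest can be sketched with a breadth-first search over a reaction graph. This does not use the MetaboNetworks API or KEGG data. The graph, the node labels and the function names are hypothetical, and the reactions are treated as undirected and unweighted for simplicity.

```python
from collections import deque
from itertools import combinations


def shortest_path(graph, source, target):
    """Breadth-first shortest path in an unweighted graph, or None if unreachable."""
    previous = {source: None}
    queue = deque([source])
    while queue:
        node = queue.popleft()
        if node == target:
            path = []
            while node is not None:  # walk back to the source
                path.append(node)
                node = previous[node]
            return path[::-1]
        for neighbour in graph.get(node, ()):
            if neighbour not in previous:
                previous[neighbour] = node
                queue.append(neighbour)
    return None


def shortest_path_network(graph, metabolites):
    """Union of the edges on one shortest path between every pair of metabolites."""
    edges = set()
    for a, b in combinations(sorted(metabolites), 2):
        path = shortest_path(graph, a, b)
        if path:
            edges.update(frozenset(step) for step in zip(path, path[1:]))
    return edges


# Hypothetical reaction graph: each key lists the compounds one reaction away.
reactions = {"m1": ["m2"], "m2": ["m1", "m3", "m4"], "m3": ["m2", "m5"],
             "m4": ["m2"], "m5": ["m3"]}
print(shortest_path(reactions, "m1", "m5"))
print(sorted(sorted(e) for e in shortest_path_network(reactions, {"m1", "m4", "m5"})))
```

Intermediate compounds such as m2 and m3 enter the network only because they lie on a shortest path between metabolites of interest.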