Ethics statement

The collection of samples from paediatric patients was approved by the Research Ethics Board of the Children’s Hospital of Eastern Ontario (CHEO). The protocol for the use of Il10−/− mice was approved by the Institutional Animal Care and Use Committee of University of North Carolina at Chapel Hill and the University of Florida. Written informed consent was obtained from the participants or parents.

Study design and sampling

We conducted a cross-sectional study of all eligible patients under 18 years of age scheduled to undergo colonoscopy for their initial diagnostic work-up for suspected IBD. Those who were not diagnosed with IBD acted as controls. Exclusion criteria relating to known conditions affecting the intestinal microbiota composition included: (1) a body mass index greater than the 95th percentile for age; (2) diabetes mellitus (insulin and non-insulin dependent); (3) infectious gastroenteritis within the preceding 2 months; and (4) use of any antibiotics or probiotics within 4 weeks before colonoscopy, or (5) irritable bowel syndrome. These same exclusion criteria were applied to the non-IBD control group. All IBD cases met the standard diagnostic criteria following thorough clinical, endoscopic, histologic and radiological evaluation39. Disease location was based on endoscopic and radiologic appearance and characterized using the Paris modification of the Montreal Classification for IBD40. Clinical disease activity of CD was determined using the PCDAI41 and of UC using the PUCAI (Pediatric Ulcerative Colitis Activity Index)42. All the controls had macroscopically and histologically normal mucosa, and did not carry a diagnosis for any known chronic intestinal disorder (for example, celiac disease, eosinophilic enterocolitis, irritable bowel syndrome). Subject clinical data were collected and managed using REDCap (Research Electronic Data Capture) hosted at the CHEO Research Institute. REDCap is a secure, web-based application designed to support data capture for research studies43.

Mucosal–luminal interface samples were collected from the right (ascending) colon (RC); n=94, 63 and 37 for CD, control and UC, respectively at the time of diagnostic colonoscopy. Colonoscopy preparation was done as per standard protocol modified to one day44. During colonoscopy, once the cecum and proximal ascending colon was reached, any loose fluid and debris was aspirated. Thereafter, sterile water was flushed onto the mucosa to dislodge the mucus layer from mucosal epithelial cells and the mixture of water, mucus and intestinal cells was aspirated into a sterile container through the colonoscope. These samples were immediately placed on ice in the endoscopy suite and immediately transferred to the lab for processing and storage at −80 °C. In addition, mucosal biopsies (n=21 and 10 for CD and control, respectively) were collected from macroscopically involved and non-involved (if the latter were identified) areas of the mid right colon. The biopsies were flash-frozen on dry ice in the endoscopy suite and immediately stored at −80 °C until further processing.

Metagenomic DNA extraction

Metagenomic DNA was extracted using the Fast DNA Spin Kit (MP Biomedicals, Solon, OH, USA) and a FastPrep-24 instrument (MP Biomedicals) with two mechanical lysis cycles at speed 6.0 m s−1 for 40 s, with 5 min cooling on ice between the two cycles. Next, the DNA was isolated and purified as per the Fast DNA Spin Kit protocol. The extracted DNA was quantified by Qubit fluorometer (Invitrogen, Carlsbad, CA, USA) and stored at −20 °C until use.

16S rDNA-V6 library construction for Illumina sequencing

The V6 region of 16S rDNA was amplified using two successive PCR reactions. The first PCR added the Illumina paired-end sequencing adaptors and the barcode sequences using modified universal 16S rDNA-V6 primers45 (Supplementary Table 2, Supplementary Fig. 16). Each reaction was prepared in a total volume of 50 μl using 50 ng of the extracted DNA, 0.5 μM of each primer and 1 × Phusion Flash High-Fidelity PCR Master Mix (Thermo Scientific, Vilnius, Lithuania). The reaction was heated to 98 °C for 30 s and then subjected to 10 cycles of 98 °C for 5 s, 61 °C for 15 s with 1 °C drop each cycle and 72 °C for 15 s, followed by additional 15 cycles with an annealing temperature of 51 °C for 15 s and a final extension at 72 °C for 5 min. The second PCR was carried out using 10 μl of the first PCR products in a final volume of 50 μl using the primers PCRFWD1/PCRRVS1 (Supplementary Table 2). The PCRFWD1/PCRRVS1 primers are complementary to the flow cell primers at the 5′ end and Illumina paired-end sequencing adaptor at their 3′ end. The second PCR conditions were 1 min at 98 °C, 15 cycles of 10 s at 98 °C, 30 s at 65 °C and 30 s at 72 °C followed by a final extension step at 72 °C for 5 min. The amplicons of each sample were visualized on 1.5% agarose gel and purified using a Montage PCR 96 Cleanup Kit (Millipore, Billerica, MA, USA). Afterwards, the DNA concentration in each reaction was quantified using the Qubit fluorometer (Invitrogen, Carlsbad, CA, USA) according to the manufacturer’s instructions. Finally, equimolar quantities of the amplicon from all the samples were pooled, gel purified using a QIAquick Gel Purification Kit (Qiagen, Hilden, Germany) and sent to The Center for Applied Genomics, Toronto, for 100 bp paired-end Illumina HiSeq2500 sequencing.

Illumina sequencing of pure culture bacterial cultures

Pure cultures of various bacterial species (Actinomyces odontolyticus ATCC 17424, Campylobacter jejuni NCTC 11168, Enterococcus faecalis ATCC 14433, Fusobacterium nucleatum ATCC 25586, Lactobacillus rhamnosus ATCC 7464, Streptococcus cristatus clinical isolate) were grown as required under either anaerobic, microaerophilic (10% O 2 , 5% CO 2 , 85% N 2 ) or aerobic conditions at 37 °C. DNA for each sample was extracted using the Fast DNA Spin Kit as described above (hereafter referred to as ‘pure’ samples) and Sanger sequenced to confirm the identity of each bacterial species. In addition, aliquots of the extracted DNA were repurified a second time using the FastDNA kits to assess the impact of potential kit contamination (hereafter referred to as ‘re-extract’ samples). 16S rDNA-V6 libraries for all the samples were constructed, with triplicate libraries constructed for the re-extract samples and sent for Illumina sequencing as described above.

Illumina microbiota sequencing data analysis

Raw data were first analysed for decreasing quality along the read. As expected, we noted decreasing quality near the end of the reads. Given that our paired end reads are expected to overlap by ∼21–25 nt (Supplementary Fig. 16), we trimmed four nucleotides from the 3′ end of the sequences using the fastx_trimmer script (http://hannonlab.cshl.edu/) to remove error-prone bases that could result in incorrect read merging. To note, this trimming step reduced the total number of singletons/doubletons in the final rarefied OTU table by 5% (data not shown). The trimmed paired-end sequences were merged using Fast Length Adjustment of Short reads46. More than 95% of the reads were successfully merged, while the sequences that failed to merge were discarded. The merged reads were then quality filtered with a minimum quality score of 20 over 90% of the sequence using the fastq_quality_filter command from the Fastx toolkit (http://hannonlab.cshl.edu/). High-quality reads were demultiplexed according to the forward and the reverse barcode sequences and the barcode trimmed using NovoBarCode (Novocraft.com). Afterwards, the reads were converted from fastq to fasta and were fed to QIIME 1.8.0 (ref. 47) to determine the taxonomic and diversity profiles of the samples. First, reads were clustered into OTUs using a closed-reference OTU picking workflow against the Greengenes reference set (release 4 February 2011) based on an average percentage of identity of 97%. Note that the same Greengenes database was used for the Illumina, Ion Torrent and 454 pyrosequencing data sets discussed below. Singletons and doubletons were removed and a table of OTU counts per sample was generated. Next, the OTU table was randomly subsampled to a total number of reads per sample of 200,000. The resulting rarefied OTU table was used to analyse the microbiota structure and diversity using the microbial ecology tools available in the QIIME package and as input for all other downstream analyses. Any further analysis-specific filtering of the input OTU table is described within each individual ‘Methods’ section. The phylogenetic tree was constructed using the FastTree method of QIIME 1.8.0 and the generated file was exported to the Interactive Tree of Life (iTOL) software48,49 to view and format the constructed tree.

Illumina pure culture sequencing data analysis

The raw reads obtained for the pure culture sequencing experiments were processed as described above for the Illumina metagenome analysis to generate an OTU table rarefied to 200,000 reads per sample and analysed in phyloseq50. Each pure culture consisted of single ‘dominate’ OTU representing with a median of >99% of the total reads and with a taxonomic classification matching the expected bacteria at the genus level (Supplementary Fig. 5, Supplementary Data 5). Any OTUs present with a taxonomic classification matching the expected bacteria at the family level were considered to be potential sequencing errors. The reads that were matched against these erroneous OTUs were extracted and re-aligned using BLASTN against the dominant OTU to determine the PCR/sequencing error rate (Supplementary Fig. 6, Supplementary Data 5). OTUs present in >90% of the samples (in either the pure or re-extract sample sets) were considered to be background contamination originating from either the DNA extraction kits or reagents used during PCR amplification (Supplementary Fig. 5).

Library construction for Ion Torrent sequencing

The V4 or V6 region of 16S rRNA was amplified in a single PCR reaction that incorporated both the Ion Torrent adaptors and an 11 bp barcode sequence using modified universal 16S rDNA-V4 (ref. 51) and V6 (ref. 45) primers (Supplementary Table 3). Note that the conserved V6 sequence used for the Ion Torrent sequencing is identical to those used in the 16S rDNA-V6 Illumina library construction described above. Each reaction was prepared in a total volume of 50 μl using 50 ng extracted metagenomic DNA, 0.5 μM of each primer, and 1 × Phusion Flash High-Fidelity PCR Master Mix. For the V6 reactions, the PCR cycling conditions were identical to the first PCR reaction described in the 16S rDNA-V6 Illumina library construction ‘Methods’ section. For the V4 reactions, the reaction was heated to 98 °C for 30 s and then subjected to 34 cycles of 98 °C for 30 s, 55 °C for 30 s and 72 °C for 90 s, followed by a final extension at 72 °C for 5 min as previously described51. The samples were purified using Purelink PCR purification columns and quantified using a Qubit fluorometer (Life Technologies). PCR products were visualized on a 1.5% agarose gel to ensure successful amplification and that the products were the expected size. Two hundred nanograms of each sample was subsequently pooled together for both the V4 and V6 libraries. The pooled libraries were size-selected using Agencourt AMPure XP DNA purification beads to remove primer dimers. The molar concentrations of each library pool were determined using an Agilent Bioanalyzer and the V4/V6 libraries were combined in an equimolar ratio. This equimolar V4/V6 pool was subjected to emulsion PCR using the Ion OneTouch HiQ Template Kit and sequencing of the templated beads was completed on a single Proton Chip on an Ion Torrent Proton using an Ion Proton HiQ sequencing kit according to the manufacturer’s instructions (Life Technologies).

Ion Torrent sequencing data analysis

Ion Torrent sequencing was done for both the V4/V6 regions from 36 right colon samples (n=13, 12, 11 for control, CD and UC). The unaligned BAM file generated by the Ion Torrent Proton (note that adaptor sequences/primer dimers are automatically removed by the Proton analysis pipeline) was downloaded, converted to fastq format and reads split according to the expected sizes for the V4/V6 regions plus the barcodes (234–266 and 162–194, respectively, representing a ±16 bp window), using samtools52 in conjunction with bioawk (https://github.com/lh3/bioawk). The reads were quality trimmed using the modified Mott trimming algorithm implemented in seqtk (https://github.com/lh3/seqtk) and reads that were shorter than the expected size for either the V4 or V6 regions (as described above) after trimming were discarded. The V4/V6 reads were then demultiplexed using FastqMultX53, converted from fastq to fasta and fed to QIIME 1.8.047 to determine the taxonomic profiles of the samples. The reads were clustered into OTUs using a closed-reference OTU picking workflow against the Greengenes reference set (release 4 February 2011) based on an average percentage of identity of 97%. The singletons/doubletons were removed from both the V4/V6 OTU tables and the number of reads/sample was normalized across sample groups (100,000 reads per sample for V4 and 200,000 reads per sample for V6). The rarefied tables were used to calculate alpha/beta diversities in phyloseq50 and to identify the microbial biomarkers for each tested category using the linear discriminant analysis effect size algorithm11 with the default parameters. To compare the Ion Torrent sequencing results to the Illumina results, the Illumina V6 data from the samples subjected to Ion Torrent sequencing was extracted from the Illumina rarefied OTU table (note that the Illumina V6 and Ion Torrent V6 samples were rarefied to the same depth) and analysed as described above for the Ion Torrent data.

16S rDNA-V6 454-pyrosequencing

The 454-pyrosequencing sequencing library of the 16S rRNA-V6 region was constructed using a two-step PCR strategy using the primers; 16SF 5′-AAACTCAAAKGAATTGACGG-3′ and 16SR 5′-ACGAGCTGACGACARCCATG-3′ (ref. 45). Ten replicates from each sample were amplified in a 25 μl reaction containing 50 ng DNA template, 1 × PCR buffer, 2 mM MgCl 2 , 0.2 mM dNTPs mix, 0.2 μM of each primer and 2.5 U Platinum Taq polymerase (Invitrogen, Carlsbad, CA, USA). The reactions were heated to 95 °C for 5 min, followed by 15 cycles of 94 °C for 40 s, 46 °C for 1 min and 72 °C for 30 s, and a final extension at 72 °C for 5 min. The amplicons from each sample were pooled and purified with Qiagen’s MiniElute PCR purification columns (Qiagen, Hilden, Germany) following its standard protocol and eluted in 25 μl molecular biology-grade water. Two microlitres of the pooled purified amplicons were then used as a template for the second PCR following the same PCR conditions except using 30 amplification cycles instead of 15 cycles. The second PCR added the titanium tails to the 16S-V6 amplicons, which are essential for the 454 sequencing procedure. The amplicons from the second PCR were pooled and purified again using Qiagen MiniElute PCR purification columns. Negative control reactions (no DNA template) were included in all the experiments. PCR products were visualized on a 1.5% agarose gel to check the amplification success. The purified amplicons from different samples were normalized to the same concentration (100 ng μl−1), captured to streptavidin coated sepharose beads and exposed to emulsion PCR. Afterwards, the immobilized double stranded amplicons were denatured into single-stranded DNA, which was then annealed to the sequencing primer by heating at 65 °C for 5 min. The single-stranded DNA-containing beads were sequenced on a 454 Genome Sequencer FLX System (Roche Diagnostics GmbH) using GS Titanium chemistry according to the standard amplicon sequencing protocol. Samples covering the three tested phenotypes (two to four samples each) were sequenced on the same run with each being sequenced in a 1/16 section of 70 × 75 picotiter plate. Note that as the samples were physically separated from each other during the sequencing reaction, there was no need to use barcoded primers.

454-pyrosequencing data analysis

A total of 346,160 reads were generated from the 454 pyrosequencing of 16S rDNA-V6 region from 26 right colon samples. The raw pyrosequencing reads were de-noised and processed to remove low quality and short reads using Quantitative Insights Into Microbial Ecology pipeline release 1.8.0 (QIIME 1.8.0; ref. 47) according to the following parameters: (1) minimum read length of 100 bp, (2) exact matching to the sequencing primers, (3) no ambiguous nucleotides and (4) a minimum average quality score of 20. Next, sequences were de novo clustered using UCLUST54 based on average percentage of identity of 97%. The most abundant read from each OTU was picked as a representative sequence for that cluster. The representative sequences were then aligned using PyNAST with a minimum alignment length of 100 and a minimum percentage identity of 75%, followed by checking the chimeric OTUs with the blast_fragments approach. Only six representative sequences were identified as chimeric and therefore were removed. Taxonomy assignments were made with BLAST55 by searching the representative sequences against the Greengenes database (release 4 February 2011) with an e value <1e−8 and a confidence score of ≥0.5. Next, singletons (OTUs that had only one matching sequence) were filtered out from the resulting OTU table. The OTU table was then used to determine the alpha and beta diversity within and between the samples using the QIIME’S default criteria. To identify the microbial biomarkers for each tested category, the relative abundance of different phylogenetic levels computed by QIIME was analysed by linear discriminant analysis effect size algorithm11 using its default parameters. The 454-pyrosequencing reads assigned as Atopobium by QIIME analysis were retrieved and found to match to A. parvulum following alignment of the reads against the RDP56 and NCBI databases (the aligned region covered the entire 454 sequence length with >99% sequence identity to A. parvulum).

Statistical analysis of the microbiota data

Several statistical approaches were used to identify taxa significantly associated with disease status and severity. Identification of OTUs exhibiting differential abundance between patients with different disease severity (mild versus moderate versus severe) was performed using the metagenomeSeq R package—a statistical approach that controls for confounding factors and accounts for undersampling12. Briefly, the OTU counts were normalized using a cumulative sum scaling approach and OTUs with a normalized count <5 in at least one of the groups were discarded. OTUs exhibiting differential abundance as a function of disease severity were identified using a zero-inflated Gaussian mixture model by incorporating gender and the inflammation status of the ascending colon (inflamed or not inflamed) as confounding covariates. Differentially abundant OTUs were selected according to the following algorithm: (1) OTU present in more than 50% of the samples from at least one group; (2) a fold change in relative abundance ≥2 and (3) a statistical significance with an adjusted P value (calculated using the Benjamini–Hochberg method) <0.05. A Kruskal–Wallis test with a post hoc Dunn’s test was performed to compare the relative abundance of specific taxa or groups of bacteria as a function of disease status (CD versus UC versus control) and disease severity (mild versus moderate versus severe). The relative abundance of the taxa identified was also analysed by principal coordinate analysis (PCoA). All the statistical analyses were performed using XLSTAT (Addinsoft, NY, USA) and/or GraphPad Prism version 6 (GraphPad, La Jolla, CA, USA).

Validation of V6-16S sequencing

The quantification of A. parvulum and sulfate reducing bacteria relative to total bacteria was determined by conducting quantitative PCR (qPCR) on the extracted metagenomic DNA using an Applied Biosystems 7,300 and primers listed in Supplementary Table 4. Each sample was tested in duplicate in a total volume of 25 μl per reaction. Fifty nanograms of template DNA was added to a reaction mixture containing 0.3 μM of each primer (0.5 μM for 16S rDNA universal primers) and 1 × QuantiFast SYBR Green PCR master mix (Qiagen, Hilden, Germany). The amplification conditions were 5 min at 95 °C followed by 40 cycles of 95 °C for 10 s and 66 °C for 1 min with data collection at the second step of each cycle. Ct values were then extracted using the Applied Biosystems 7300 software version 1.3.1 and the relative abundance of each taxon was calculated as a ΔCt value (taxon Ct–universal 16S rRNA Ct). To validate the specificity of Apar-711F and Aparv-881R, fresh PCR amplicons from total DNA extracted from two different mucosal aspirates was cloned using the TOPO TA cloning vector (Invitrogen, Carlsbad, CA, USA) according to the manufacturer’s instructions. Next, the plasmid containing the 16S rDNA gene fragment was extracted from six different clones by the QIAprep Spin Miniprep kit (Qiagen, Hilden, Germany) followed by capillary sequencing using M13F and M13R primers.

Quantification of butyrate-producers by qPCR

The overall abundance of butyrate-producing bacteria was determined by quantifying the amount of the butyryl CoA:acetate CoA-transferase (BCoAT) gene using the primers BCoATscrF/BCoATscrR (Supplementary Table 4) as described elsewhere57,58. The BCoAT gene was amplified from 50 ng of metagenomic DNA in a 25 μl qPCR reaction containing 1 × QuantiTect SYBR Green PCR master mix (Qiagen) and 2.5 μM of BCoATscrF/BCoATscrR primers. The amplification conditions were as follows: one cycle of 95 °C for 15 min; 40 cycles of 94 °C for 15 s, 53 °C and 72 °C each for 30 s with data acquisition at 72 °C. The 16S rDNA gene was used to normalize total bacterial content in each sample and was amplified using adapted universal primers UniF/UniR (Supplementary Table 4)59. The assay was done in duplicate for each sample. Delta-Ct (ΔCt) for each target was calculated by subtracting the Ct of the 16S rDNA from the BCoAT Ct. Then, the ΔCt values were compared between the CD and control groups using a Mann–Whitney two-tailed test with a Dunn’s post hoc test and P values below 0.05 were considered significant.

Microbial correlation network analysis

A correlation network was constructed to identify co-occurrence and mutual exclusion among OTUs from the control subjects and CD patients. To focus on bacteria potentially associated with disease, only the OTUs exhibiting differential abundance as a function of disease severity were chosen for this network analysis. In addition, OTUs that were found to be absent in more than two-third of the samples or with a minimum occurrence <15 across all the samples were removed from the data set, leaving a matrix of 97 OTUs. The network was built as previously described60 using the CoNet Cytoscape plug-in. The correlation scores were calculated for each OTU pair using a combination of two correlation (Pearson and Spearman) and two dissimilarity measures (Bray–Curtis and Kullback–Leibler). The 400 top- and bottom-ranking edges from each method were retrieved. Edge- and method-specific permutations (100) were used to control for potential false-positive correlations. The resulting distribution was used to calculate statistical significance for each edge using 100 bootstrap iterations. The P value was adjusted for multiple comparisons using a Benjamini–Hochberg correction and a value <0.05 was deemed significant. The P value was obtained for each correlation or dissimilarity measure and only edges supported by at least two significant measures were included. The final network consisted of 89 nodes (OTUs) and 341 edges (correlations). The resulting network was visualized using Cytoscape 3.2.1 (ref. 61).

Stable isotope labelling by amino acids in cell culture

The human hepatic HuH7 cells (HuH-7) and human embryonic kidney 293 cells (HEK-293) were originally acquired from ATCC and the human colorectal cancer 116 cells (HCT-116) were obtained from the Japanese Collection of Research Bioresources Cell Bank. HuH-7, HEK-293 and HCT-116 were individually grown at 37 °C in a 5% CO 2 humidified incubator (to note, all three cell lines used in this study were found to be mycoplasma positive). Stable isotope labelling by amino acids in cell culture (SILAC) medium was prepared as follows: DMEM lacking lysine, arginine and methionine was custom prepared by AthenaES (Baltimore, MD, USA) and supplemented with 30 mg l−1 methionine (Sigma-Aldrich; Oakville, ON, Canada), 10% (v/v) dialysed FBS (GIBCO-Invitrogen; Burlington, ON,Canada), 1 mM sodium pyruvate (Gibco-Invitrogen), 28 g ml−1 gentamicin (Gibco-Invitrogen), and [13C 6 ,15N 2 ]-L-lysine, [13C 6 ,15N 4 ]-L-arginine (heavy isotopic form of amino acids; Heavy Media) from Sigma-Aldrich (Oakville, ON, Canada) at final concentrations of 42 and 146 mg l−1 for arginine and lysine, respectively. For HCT-116, the concentration of arginine was increased to 84 mg l−1. The cells were grown for at least 10 doublings in SILAC media to allow for complete incorporation of the isotopically labelled amino acids into the cells.

Determination of the rate of SILAC amino acids incorporation

The cells were grown to 80% confluency in SILAC medium (5 × 106 cells were plated in 10-cm dish). Next, the cells were washed twice with ice-cold phosphate-buffered saline and lysed by addition of 1 ml of 1X RIPA buffer (50 mM Tris (pH 7.6), 150 mM NaCl, 1% (v/v) NP-40, 0.5% (w/v) deoxycholate, 0.1% (w/v) SDS with protease inhibitor cocktail (Complete Mini Roche; Mississauga, ON,Canada) and phosphatase inhibitor (PhosStop Roche tablet)). The lysates were then transferred to 15 ml conical tubes and the proteins were precipitated by addition of 5 ml ice-cold acetone followed by incubation at −20 °C overnight. The proteins were collected by centrifugation (3,000g, 10 min, 4 °C), washed with ice-cold acetone two times and the protein pellets were re-solubilized in 300 μl of a 50 mM NH 4 HCO 3 solution containing 8 M urea. The protein concentrations were determined by the Bradford dye-binding method using Bio-Rad’s Protein Assay Kit (Mississauga, ON, Canada). For the general in-solution digestion, 200 μg of protein lysates were reconstituted in 50 mM NH 4 HCO 3 (200 μl) and the proteins were reduced by mixing with 5 μl of 400 mM DTT at 56 °C for 15 min. The proteins were then subjected to alkylation by mixing with 20 μl of 400 mM iodoacetamide in darkness (15 min at room temperature) followed by addition of 800 μl of 50 mM NH 4 HCO 3 to reduce the urea concentration to ∼0.8 M. Next, the proteins were digested with TPCK-trypsin solution (final ratio of 1:20 (w/w, trypsin:protein) at 37 °C for 18 h. Finally, the digested peptides were desalted using C18 Sep-Pack cartridges (Waters), dried down in a speed-vac, and reconstituted in 0.5% formic acid before mass spectrometric analysis (as described below) and the determination of labelling efficiency. The incorporation efficiency was calculated according to the following equation: (1-1/Ratio(H/L)); where H and L represents the intensity of heavy and light peptides detected by mass-spectrometry, respectively. Labelling was considered complete when values reached at least 95% for each cell type.

Proteomic analysis of biopsies

Biopsies were lysed in 4% SDS, 50 mM Tris-HCl (pH 8.0) supplemented with proteinase inhibitor cocktail (Roche) and homogenized with a Pellet pestle. The lysates were sonicated three times with 10 s pulses each with at least 30 s on ice between each pulse. The protein concentrations were determined using the Bio-Rad DC Protein Assay. The proteins were processed using the Filter Aided Sample Preparation method as previously described with some modifications62. Biopsy lysates (45 μg of proteins) and heavy SILAC-labelled cell lysates (15 μg from each HuH-7, HEK- 293 and HCT-116 cells) were mixed at a 1:1 weight ratio and transferred into the filter. The samples were centrifuged (16,000g, 10 min), followed by two washes of 200 μl 8 M urea, 50 mM Tris-HCl pH 8.0. The samples were then reduced by incubation in 200 μl of 8 M urea, 50 mM Tris-HCl (pH 8.0) supplemented with 20 mM dithiothreitol. After centrifugation, the samples were subjected to alkylation by adding 200 μl of 8 M urea, 50 mM Tris-HCl pH 8.0, containing 20 mM iodoacetamide (30 min at room temperature protected from light). The samples were washed twice using 200 μl 8 M urea, 50 mM Tris-HCl pH 8.0 to remove excess SDS. To further dilute urea, two washes of 200 μl 50 mM Tris-HCl pH 8.0 were performed. For the trypsin digest, the samples were incubated in 200 μl of 50 mM Tris-HCl pH 8.0, containing 5 μg of Trypsin (TPCK Treated, Worthington) on a shaker (250 rpm) at 37 °C overnight. Finally, 200 μl of 50 mM Tris-HCl pH 8.0 was added to elute the peptides by centrifugation (twice). The peptides were fractionated, using an in-house constructed SCX column with five pH fractions (pH 4.0, 6.0, 8.0, 10.0 and 12.0). The buffer composition was 20 mM boric acid, 20 mM phosphoric acid and 20 mM acetic acid, with the pH adjusted by using 1 M NaOH. Finally, the fractionated samples were desalted using in-house C18 desalting cartridges and dried in a speed-vac before LC-MS analysis.

Mass-spectrometry analyses

All resulting peptide mixtures were analysed by high-performance liquid chromatography/electrospray ionization tandem mass spectrometry (HPLC-ESI-MS/MS). The HPLC-ESI-MS/MS consisted of an automated ekspert nanoLC 400 system (Eksigent, Dublin, CA, USA) coupled with an LTQ Velos Pro Orbitrap Elite mass spectrometer (ThermoFisher Scientific, San Jose, CA, USA) equipped with a nanoelectrospray interface operated in positive ion mode. Briefly, each peptide mixture was reconstituted in 20 μl of 0.5% (v/v) formic acid and 12 μl was loaded on a 200 μm × 50 mm fritted fused silica pre-column packed in-house with reverse phase Magic C18AQ resins (5 μm; 200 Å pore size; Dr Maisch GmbH, Ammerbuch, Germany). The separation of peptides was performed on an analytical column (75 μm × 10 cm) packed with reverse phase beads (3 μm; 120 Å pore size; Dr Maisch GmbH, Ammerbuch, Germany) using a 120 min gradient of 5–30% acetonitrile (v/v) containing 0.1% formic acid (v/v) (JT Baker, Phillipsburg NJ, USA) at an eluent flow rate of 300 nl min−1. The spray voltage was set to 2.2 kV and the temperature of heated capillary was 300 °C. The instrument method consisted of one full MS scan from 400 to 2,000 m/z followed by data-dependent MS/MS scan of the 20 most intense ions, a dynamic exclusion repeat count of 2 and a repeat duration of 90 s. The full mass was scanned in an Orbitrap analyzer with R=60,000 (defined at m/z 400), and the subsequent MS/MS analyses were performed in LTQ analyzer. To improve the mass accuracy, all the measurements in the Orbitrap mass analyzer were performed with on-the-fly internal recalibration (‘Lock Mass’). The charge state rejection function was enabled with charge states ‘unassigned’ and ‘single’ states rejected. All the data were recorded with Xcalibur software (ThermoFisher Scientific, San Jose, CA, USA).

Mass-spectrometry database search and bioinformatic analysis

All the raw files were processed and analysed by MaxQuant, Version 1.2.2.5 against the decoy Uniprot-human database (86,749 entries), including commonly observed contaminants. The following parameters were used: cysteine carbamidomethylation was selected as a fixed modification, with methionine oxidation, protein amino (N)-terminal acetylation and heavy proline set as variable modifications. Enzyme specificity was set to trypsin. Up to two missing cleavages of trypsin were allowed. SILAC double labelling (light: K0R0; heavy: K8R10) was set as the search parameter to assess the conversion efficiency. The precursor ion mass tolerances were 7 p.p.m. and the fragment ion mass tolerance was 0.5 Da for MS/MS spectra. The false discovery rate for peptides and proteins was set at 1% and a minimum length of six amino acids was used for peptide identification. The peptides file was imported into Perseus (version 1.2.0.17) to extract the lysine- and arginine-containing peptides, respectively. The protein-group file was imported into Perseus (version 1.3.0.4) for statistical analysis. Differences between control and CD protein expression were determined using a t-test with an adjustment for multiple comparisons (Benjamini–Hochberg method; false discovery rate =0.05) and q values of less than 0.05 were considered significant. Kyoto Encyclopedia of Genes and Genomes (KEGG) pathway analysis was achieved using the DAVID Bioinformatics Resources (http://david.abcc.ncifcrf.gov/). DAVID statistical analyses were performed against the whole genome. Proteomics has a tendency to oversample proteins from the cytosol and nucleus while under-sampling membrane-associated proteins. Therefore, the results from DAVID were re-tested against the set of proteins that were not changing in our data set to eliminate any pathway/GO enrichment biases.

Total RNA extraction from intestinal mucosal aspirates

RNA integrity was preserved by adding an equal volume of RNAlater (Ambion) to the mucosal aspirates before freezing at −80 °C. The frozen aliquot (2 ml) was thawed on ice and the total RNA was extracted following the hot phenol protocol described previously63. Briefly, 4 ml of each sample in RNAlater were pelleted by centrifugation at 13,000 × g for 5 min at 4 °C. The pellets were washed twice by resuspension in 50% RNAlater/PBS buffer and centrifugation at 13,000g for 5 min at 4 °C. Next, the pellets were resuspended and lysed in 2 ml denaturing buffer (4 M guanidium thiocyanate, 25 mM sodium citrate, 0.5% N-laurylsarcosine and 1% N-acetyl cysteine, 0.1 M 2-mercaptoethanol). The lysate was divided into 500 μl aliquots, to which 4 μl of 1 M sodium acetate (pH 5.2) was added. Each aliquot was then incubated with 500 μl of buffer saturated phenol (pH 4.3) at 64 °C for 10 min with intermittent mixing. The tube was placed on ice for 10 min followed by centrifugation at maximum speed for 30 min at 4 °C. One millilitre of chloroform was added to the aqueous layer and incubated for 15 min on ice, followed by centrifugation at 18,000g for 30 min at 4 °C. Afterwards, RNA was precipitated from the aqueous layer by the addition of one-tenth volume 3 M sodium acetate, 500 mM DEPC-treated EDTA and two volumes of cold ethanol followed by overnight incubation at −80 °C. Later, the RNA was pelleted by centrifugation at maximum speed for 30 min at 4 °C, aspirated with 80% cold ethanol and resuspended in 100 μl of RNAse free dH 2 O. The extracted RNA was treated twice with Dnase I (Epicentre) followed by PCR amplification using the 16S rDNA universal primers; Bact-8F and 1391-R (ref. 64), to confirm the absence of genomic DNA. The quality and the quantity of the extracted RNA were determined by NanoDrop 2000 spectrophotometer (ThermoScientific) and confirmed by Bio-Rad’s Experion StdSens RNA system according to the manufacturer’s instructions and stored at −80 °C until use.

Quantification of H 2 S detoxification genes expression level

The quantification of the expression level of TST, SQRDL (Sulfide Quinone Reductase Like) and COX4-1 (Cytochrome C oxidase subunit IV isoform 1) relative to hGAPDH (Glyceraldehyde-3-Phosphate Dehydrogenase) was determined using an Applied Biosystems 7300 and Quantitect SYBR Green RT-PCR kit (Qiagen, Hilden, Germany). The primers used were either designed using the NCBI Primer-BLAST tool65 or extracted from the literature (Supplementary Table 5). The specificity of the primers was confirmed by TOPO TA cloning and capillary DNA sequencing as previously mentioned in the qPCR validation of V6-16S sequencing methods section. Each reaction contained 100 ng RNA template, 0.5 μM of each primer, 1 × Quantitect SYBR Green RT-PCR master mix and 0.25 μl Quantitect RT-mix. The one-step qRT–PCR conditions were 50 °C for 30 min, 95 °C for 15 min followed by 40 cycles of 15 s at 94 °C, 30 s at 60 °C and 30 s at 72 °C with data collection at the third step of each cycle. The amplification specificity was checked by the melting profile of the amplicon and 2% agarose gel electrophoresis. The Ct values were then extracted using the Applied Biosystems 7300 software version 1.3.1. The Ct values of TST, SQRDL or COX4 were normalized to the Ct values of hGAPDH generating ΔCt. Next, ΔΔCt was calculated by subtracting the average ΔCt of the control group from the ΔCt of each sample. The relative quantification was then calculated as 2−ΔΔCt as described previously66.

Microbiota–mitochondrial proteins correlation analysis

Pairwise correlations between the relative abundance of the significant OTUs and mitochondrial proteins identified via MS were calculated using the non-parametric Kendall Tau and Spearman’s rank correlation coefficients. To avoid spurious correlations, OTUs and mitochondrial proteins absent in more than half of the samples in each group were excluded from each pairwise comparison. As a result, pairwise correlations were calculated between 62 OTUs and 106 mitochondrial proteins from the 21 patients for which we had both data sets. To further avoid spurious correlations, correlations were considered significant only if the upper-tail probability occurrence was below 0.05 for both the Kendall Tau and the Spearman’s rank statistics.

Il10−/− mouse experiments and tissue processing

Germ-free SvEv129/C57BL6 Il10−/−;NF-κBEGFP male/female mice (8–12 weeks old, n=12) were transferred to specific pathogen free (SPF) conditions and after 2 weeks mice from a randomly selected cohort (n=6) were gavaged once weekly with A. parvulum (1 × 108 CFUs) for 6 weeks and sacrificed 6 weeks later (14 weeks total). A. parvulum ATCC 33793 was grown in fastidious anaerobic broth (Lab M, Canada). To investigate the involvement of complex microbiota in the development of colitis, we performed two subsequent experiments using C57BL6 Il10−/− male/female mice. In the first experimental setting, gnotobiotic Il10−/− mice (n=16) were randomized into two groups; 1: GF only (n=6) and 2: GF+A. parvulum (1 × 108 CFUs; n=10). The mice were killed after 8 weeks of mono-association. The mice were fed normal diet (Teklan Global 18% Protein Rodent Diet). In the second experimental setting, gnotobiotic Il10−/− male/female mice (n=31) were transferred to SPF conditions for 2 weeks and then randomized into two groups; 1: SPF only (n=7), 2: SPF plus A. parvulum (1 × 108 CFUs; n=8), 3: SPF plus bismuth (III) diet (n=8) and 4: SPF plus A. parvulum and bismuth (III) diet (n=8). As described for the first cohort, the mice were gavaged weekly with A. parvulum (1 × 108 CFUs) for 6 weeks and then killed after 6 weeks of colonization (14 weeks total). Bismuth (III) subsalicylate (Sigma-Aldrich, Saint Louis, MO, USA) was incorporated to the chow (Teklan Global 18% Protein Rodent Diet) at a concentration of 7 g kg−1 (Harlan Laboratories, Madison, WI, USA) and then irradiated for gnotobiotic experiments. The mice were fed with this diet 1 week before the colonization with A. parvulum. The tissue samples from the colon were collected for RNA and histology as previously described67. All the animal procedures were approved by the University of North Carolina at Chapel Hill Animal Care and Use Committee. The histological images were acquired using a DFC310 FX (LEICA) coupled to LEICA Application Suite AFv4.6 software. Intestinal inflammation was blindly scored as previously described18. The tissue was divided into four quarters to allow individual scoring of the mid-section, which was affected by A. parvulum more so than in other colitis models. A score was given to each quarter separately and then added to generate a final colitis score on a scale of 0–16.

Statistical analyses of Il10−/− mice results

Unless specifically noted, the statistical analyses were performed using GraphPad Prism version 6 (GraphPad, La Jolla, CA, USA). The comparisons of mouse studies were made with a nonparametric analysis of variance, and then a Mann–Whitney U-test. The experiments were considered statistically significant if P<0.05.

Mouse endoscopy

Colonoscopy was performed using a ‘Coloview System’ (Karl Storz Veterinary Endoscopy) as described previously68. The mice were anaesthetized using 1.5 to 2% isoflurane and ∼4 cm of the colon from the anal verge from the splenic flexure was visualized. The procedures were digitally recorded on an AIDA Compaq PC.

qRT–PCR on mouse intestinal samples

Total RNA from intestinal tissues was extracted from the distal part of the colon using TRIzol (Invitrogen, Carlsbad, CA, USA) following the manufacturer’s protocol. The complementary DNA was reverse-transcribed using M-MLV (Invitrogen, Carlsbad, CA, USA) and mRNA expression levels were measured using SYBR Green PCR Master mix (Applied Biosystems) on an ABI 7900HT Fast Real-Time PCR System and normalized to β-actin. The primers used are listed in Supplementary Table 5. The PCR reactions were performed for 40 cycles according to the manufacturer’s recommendations, and RNA fold changes were calculated using the ΔΔCt method66.

Data availability

The demultiplexed reads for all V6-16S experiments that support the findings of this study have been deposited to the NCBI Sequence Read Archive (http://www.ncbi.nlm.nih.gov/sra). The pure culture experiments were submitted under accession number SRP075131, the Illumina microbiota reads under accession numbers SRP034595/SRP056939, the 454 microbiota reads under accession number SRP034632 and the Ion Torrent reads under accession number SRP075131. All the raw data files for the mass spectrometry that support the findings of this study have been submitted to ProteomeXchange under accession number PXD002882. The remaining data are available within the Article and Supplementary Information files or available from the author on request.