Participant recruitment

Volunteers were recruited through advertisements and flyers distributed in the areas surrounding four Italian cities: Bologna, Parma, Torino and Bari. Thirty healthy adult volunteers (15 men and 15 women) were enrolled, aged 25–55 years (36 ± 7.0), with a BMI > 18 (21.89 ± 2.20). The volunteers had been following an omnivorous, an ovo-lacto-vegetarian or a vegan diet for at least one year. Diet type was validated by a one-week FFQ (Food Frequency Questionnaire)47. The sample set included individuals who followed an omnivorous (total no = 10; 5 men and 5 women), an ovo-lacto-vegetarian (total no = 10; 4 men and 6 women) or a vegan (total no = 10; 5 men and 5 women) diet. Volunteer features, recruitment and exclusion criteria, dietary information, and sample collection and storage procedures are reported in De Filippis et al., 201614. Prospective participants were excluded according to the following criteria: omnivorous, vegetarian or vegan regime followed for less than one year; age under 18 or over 60; regular consumption of drugs; regular supplementation with prebiotics or probiotics; consumption of antibiotics in the previous 3 months; evidence of intestinal pathologies (Crohn’s disease, chronic ulcerative colitis, bacterial overgrowth syndrome, constipation, celiac disease, Irritable Bowel Syndrome) or other pathologies (type I or type II diabetes, cardiovascular or cerebrovascular diseases, cancer, neurodegenerative diseases, rheumatoid arthritis, allergies); pregnancy and lactation. All participants were asked about their consumption of animal products to establish whether their dietary habits in the last year diverged from the self-declared diet type. The subjects were instructed on how to self-collect the samples; all materials were provided in a sterile, convenient, refrigerated specimen collection kit (VWR, Milan, Italy).
Fecal samples were collected on the same day of three consecutive weeks, and the three samples were pooled before microbiome, metaproteome and metabolome analyses. Home-collected samples were transferred to the sterile sampling containers using a polypropylene spoon and immediately stored at 4 °C by the volunteers. The specimens were transported to the laboratory under refrigeration within 12 hours of collection, and the containers were then immediately stored at −80 °C. Food and beverage intake was estimated by means of a 7-day weighed food diary, which was completed every day for a total of one week, to collect metadata and to confirm the type of diet. The intake of macronutrients and micronutrients was calculated using the Microsoft Access application coupled to the European Institute of Oncology food database (European Institute of Oncology, 2008).

Ethical statement

The study protocol was approved by the Ethics Committees of: (a) Azienda Sanitaria Locale (Bari) (protocol N.1050), (b) Azienda Ospedaliera Universitaria of Bologna (protocol N.0018396), (c) Province of Parma (protocol N.22884) and (d) University of Torino (protocol N.1/2013/C), after its compliance with the dictates of the Declaration of Helsinki (IV adaptation) had been ascertained. All methods were performed in accordance with relevant guidelines and regulations. All participants provided written informed consent prior to participation in the study protocol. The study protocol was registered on ClinicalTrials.gov with the identifier NCT02118857.

DNA extraction and sequencing

Triplicate fecal aliquots collected from each volunteer were pooled for DNA extraction. Ten grams of the pooled sample was aseptically homogenized with 90 ml of Ringer’s solution (Oxoid) for 2 min in a Stomacher. A 2-ml aliquot was collected and centrifuged at maximum speed for 30 s; the supernatant was removed, and the DNA was extracted from the pellet using a Powersoil DNA kit (MO-BIO, Carlsbad, CA, USA) according to the manufacturer’s instructions. Single-end DNA libraries (1 × 151 bp) were constructed using the TruSeq DNA library preparation kit, and shotgun sequencing on the HiSeq 1500 platform (Illumina, San Diego, CA, USA) was performed according to the manufacturer’s instructions (G4L Company, Salerno, Italy).

Functional meta-genomic annotation and statistical analysis

Raw sequencing reads were quality-trimmed (Phred score < 30), and reads shorter than 60 bp were discarded using the SolexaQA++ (v3.1.7.1) software48. The remaining reads were aligned against the Integrated Gene Catalogue17 (IGC) of the human gut, developed within the MetaHIT project, using the Bowtie2 (v2.3.5.1) software49 with the following parameters: -t -f -D 20 -R 3 -N 0 -L 20 -i S,1,0.50 --local. Reads whose best hit against the IGC showed > 90% identity over at least 30% of the query length were extracted using SAMtools (version 1.9), and the counts were normalized to the total number of reads mapped to the whole catalogue. On average, 90% of reads mapped to the IGC; only genes with a KEGG ID were extracted and used for downstream analysis (3,644 KEGG Orthology (KO) genes). Shotgun reads were also assembled with Velvet v1.2.10 with default parameters50. Reads of human origin were discarded using the BMTagger software (ftp://ftp.ncbi.nlm.nih.gov/pub/agarwala/bmtagger/). Each contig was analyzed using the automated gene prediction and annotation pipeline PROKKA51 v1.12. To reconstruct metabolic pathways, the FASTA and GenBank files for the set of annotated contigs were parsed and then used as input for Pathway Tools v19.0.
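The best-hit filter and the normalization step described above can be sketched as follows. This is a minimal illustration, not the authors' pipeline: the function names and the input tuple layout are hypothetical, identity is assumed to be matches divided by aligned length, and the normalization total is assumed to be the number of reads that pass the filter.

```python
from collections import Counter


def passes_filter(matches, aligned_len, query_len,
                  min_identity=0.90, min_coverage=0.30):
    """True if identity > 90% and the alignment covers >= 30% of the read."""
    if aligned_len <= 0 or query_len <= 0:
        return False
    identity = matches / aligned_len    # assumed definition of identity
    coverage = aligned_len / query_len  # fraction of the query that aligned
    return identity > min_identity and coverage >= min_coverage


def normalized_gene_abundance(best_hits):
    """best_hits: iterable of (gene_id, matches, aligned_len, query_len),
    one best hit per read. Returns gene_id -> fraction of retained reads."""
    counts = Counter(
        gene for gene, matches, aligned_len, query_len in best_hits
        if passes_filter(matches, aligned_len, query_len)
    )
    total = sum(counts.values())
    return {gene: n / total for gene, n in counts.items()} if total else {}
```

In practice the per-read values would be parsed from the SAM/BAM records produced by Bowtie2; that parsing is omitted here.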

Data normalization and the determination of differentially abundant genes among the three dietary groups were then conducted using the Bioconductor DESeq2 package20 in the statistical environment R with default parameters. P values were adjusted for multiple testing using the Benjamini-Hochberg procedure, which controls the false discovery rate (FDR).
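For readers unfamiliar with the Benjamini-Hochberg adjustment, the following is a minimal standalone sketch of the procedure. It is an illustration only; the analysis itself relied on the adjustment built into DESeq2, and the function name here is hypothetical.

```python
def benjamini_hochberg(pvalues):
    """Return Benjamini-Hochberg adjusted p-values in the input order."""
    m = len(pvalues)
    # Indices of the p-values sorted from smallest to largest.
    order = sorted(range(m), key=lambda i: pvalues[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down, enforcing monotonicity.
    for rank in range(m, 0, -1):
        idx = order[rank - 1]
        running_min = min(running_min, pvalues[idx] * m / rank)
        adjusted[idx] = running_min
    return adjusted
```

A gene would then be called differentially abundant when its adjusted p-value falls below the chosen FDR threshold.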

Principal coordinates analysis (PCoA) was performed with the R “adegenet” package (https://cran.r-project.org/web/packages/adegenet/adegenet.pdf) on gene relative abundances, using Euclidean, Bray-Curtis and Jaccard distances. The Random Forests algorithm was used to identify genes that discriminate among diet groups. The phylogenetic characterization of the shotgun sequences was assessed using the MetaPhlAn218 software with default parameters. The resulting biological observation matrix (.biom files) was then imported into QIIME52 to produce an OTU table at the genus level. The Wilcoxon test in R was used to identify differences in microbiome composition among the samples as a function of diet.
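The three distances named above can be written out for a pair of relative-abundance vectors. This is a sketch under stated assumptions: the function names are hypothetical, and the Jaccard distance is taken in its presence/absence form, which may differ from the variant used in the original R analysis.

```python
import math


def euclidean(x, y):
    """Straight-line distance between two abundance vectors."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(x, y)))


def bray_curtis(x, y):
    """Bray-Curtis dissimilarity: sum|a - b| / sum(a + b)."""
    numerator = sum(abs(a - b) for a, b in zip(x, y))
    denominator = sum(a + b for a, b in zip(x, y))
    return numerator / denominator if denominator else 0.0


def jaccard(x, y):
    """Jaccard distance on presence/absence (abundance > 0)."""
    present_x = {i for i, a in enumerate(x) if a > 0}
    present_y = {i for i, b in enumerate(y) if b > 0}
    union = present_x | present_y
    if not union:
        return 0.0
    return 1.0 - len(present_x & present_y) / len(union)
```

Applying one of these functions to every pair of samples gives the distance matrix that a PCoA takes as input.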

Alpha diversity indices were estimated with the R phyloseq package.

Spearman’s non-parametric correlations, computed with the psych package of R, were used to study the relationships between the relative abundance of microbial taxa and dietary variables. The correlation plots were visualized in R using the made4 package.
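Spearman's correlation is the Pearson correlation of the ranks, with tied values sharing their average rank. The sketch below shows this with the standard library only; it is an illustration with hypothetical function names, not a substitute for the psych package used in the study.

```python
import math


def average_ranks(values):
    """Rank values from 1..n, giving tied values their average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j across a run of tied values.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2.0 + 1.0
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks


def pearson(x, y):
    """Pearson correlation; NaN if either vector is constant."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    if var_x == 0 or var_y == 0:
        return float("nan")
    return cov / math.sqrt(var_x * var_y)


def spearman(x, y):
    """Spearman's rho between, e.g., a taxon's abundance and a nutrient intake."""
    return pearson(average_ranks(x), average_ranks(y))
```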

A succinct step-by-step workflow summarizes the analyses carried out both for the meta-genomic and the meta-proteomic counterparts (Supplementary Fig. S3).

Protein extraction, denaturation, digestion and desalting

Pooled fecal samples (2 g) were suspended in 18 ml of ice-cold Tris-buffered saline (TBS) buffer and homogenized using a lab Stomacher. The homogenate was passed through a 20-μm vacuum filter unit to remove larger fibrous material and human cells and centrifuged (1000 rpm for 5 min) to pellet bacterial cells. The pellet was collected, washed with 5 ml of Tris-HCl (50 mM pH 7.5) to remove attached human proteins and lysed via sonication. Proteins were precipitated with 20% TCA, digested by trypsin and analyzed by UHPLC-MS/MS.

Samples were prepared for digestion using the filter-assisted sample preparation (FASP) method53. Briefly, the samples were suspended in 1% SDC, 50 mM Tris-HCl, pH 7.6, and 3 mM DTT, sonicated briefly, and incubated in a Thermo-Mixer at 40 °C, 1000 rpm for 20 min. The samples were clarified by centrifugation, and the supernatant was transferred to a 30 kDa MWCO device (Millipore) and centrifuged at 13,000 × g for 30 min. The remaining sample was buffer-exchanged with 1% SDC, 100 mM Tris-HCl, pH 7.6, then alkylated with 15 mM iodoacetamide. The SDC concentration was reduced to 0.1%. The samples were digested using trypsin at an enzyme-to-substrate ratio of 1:100 overnight at 37 °C in a Thermo-Mixer at 1000 rpm. Digested peptides were collected by centrifugation. A portion of the digested peptides, approximately 20 µg, was desalted using reversed-phase stop-and-go extraction (STAGE) tips54. Peptides were eluted with 80% acetonitrile and 0.5% acetic acid and lyophilized to near dryness in a SpeedVac (Thermo Savant) for approximately 1 h.

Liquid chromatography-tandem mass spectrometry

Each digestion mixture was analyzed by liquid chromatography (LC) by using an Easy-nLC 1000 UHPLC system (Thermo Fisher). Mobile phase A was 97.5% MilliQ water, 2% acetonitrile, and 0.5% acetic acid. Mobile phase B was 99.5% acetonitrile and 0.5% acetic acid. The 240-min LC gradient ran from 0% B to 35% B over 210 min and then to 80% B for the remaining 30 min. Samples were loaded directly into the column. The column was 50 cm × 75 μm I.D. and packed with 2 micron C18 media (Thermo Easy Spray PepMap). The LC was interfaced to a quadrupole-Orbitrap mass spectrometer (Q-Exactive, Thermo Fisher) via nanoelectrospray ionization using a source with an integrated column heater (Thermo Easy Spray source). The column was heated to 50 °C. An electrospray voltage of 2.2 kV was applied. The mass spectrometer was programmed to acquire tandem mass spectra from the top 10 ions in the full scan from 400 to 1200 m/z by data-dependent acquisition. Dynamic exclusion was set to 15 s, singly charged ions were excluded, the isolation width was set to 1.6 Da, the full MS resolution was set to 70,000 and the MS/MS resolution was set to 17,500. Normalized collision energy was set to 25, max fill MS was set to 20 ms, max fill MS/MS was set to 60 ms and the underfill ratio was set to 0.1%. The mass spectrometer RAW data files were converted to mzML format using msconvert.

Functional meta-proteomic annotation and statistical analysis

Mascot Generic Format (MGF) files were generated from mzML using the Peak Picker HiRes tool, part of the OpenMS framework. All search instances were performed on an Amazon Web Services–based cluster through the Proteome Cluster interface. Proteome Cluster builds monthly species- and genus-specific protein sequence libraries from the most current UniProtKB distribution. The most recent protein sequence libraries available from UniProtKB were used. MGF files were searched using X!Tandem55, both with the native56 and k-score57 algorithms and using OMSSA58. XML output files were parsed and non-redundant protein sets were determined using the Proteome Cluster based on previously published rules59. MS1-based isotopic features were detected, and peptide peak areas were calculated using the Feature Finder Centroid tool from the OpenMS framework60. Data normalization and the determination of differentially abundant proteins, among the three dietary groups, were conducted using the Bioconductor DESeq2 package20 in the statistical environment R with default parameters. Wald test p-values were corrected for multiple testing by using the Benjamini-Hochberg post hoc procedure.

To look for evidence of structure among the analysed diet groups, we filtered out non-informative, non-core proteins, i.e. those proteins that occurred at a frequency of at most 15% in each diet group. We ran a discriminant analysis of principal components (DAPC) using the adegenet R package. In this multivariate analysis, the diet group of each sample was used as the a priori cluster assignment, and 4 principal components were retained. We plotted the first two discriminant functions with the scatter function of the adegenet R package. To ascertain whether the DAPC classification was consistent with the original clusters (known from diet diaries), we used the “assignplot” R function to calculate the proportions of successful reassignments (based on the discriminant functions). This function is particularly useful when prior biological groups are used, as it allows admixed or misclassified individuals to be identified.

Microbiome pathway reconstruction

Pathway Tools (PT) software version 19.021 and the coupled MetaCyc multiorganism database were used to reconstruct metabolic pathways. For the meta-genomic counterpart, the assembled GenBank and .fasta files were both used to generate the .pt (PathoLogic) format. For the proteomic data batch, the protein output was converted into the .pf format, mimicking the GenBank entry fields. The .pf and .pt supported formats were then used, through the built-in PathoLogic software, to build new pathway/genome databases (PGDBs).

The numbers of reactions (total reactions in the base pathways) and pathways (base pathways) were compared in each sample and used to generate 0/1 matrices. The PT software also allowed us to predict metabolic pathway holes in our meta-genomic and meta-proteomic samples. The REST-style version of the KEGG API utility (http://www.kegg.jp/kegg/rest/) was used to enrich the protein dataset in terms of KEGG codes and EC numbers.
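A 0/1 matrix of the kind mentioned above records, for each sample, whether a given pathway was predicted. The sketch below shows one way to build it; the function name, sample labels and pathway identifiers are hypothetical, and the Pathway Tools export step that would supply the per-sample pathway sets is not shown.

```python
def presence_absence_matrix(sample_pathways):
    """sample_pathways: dict mapping sample name -> set of pathway IDs.

    Returns (samples, pathways, matrix), where matrix[i][j] is 1 if
    pathways[j] was predicted in samples[i] and 0 otherwise.
    """
    samples = sorted(sample_pathways)
    # Column order: every pathway seen in at least one sample, sorted.
    pathways = sorted(set().union(*sample_pathways.values()))
    matrix = [
        [1 if pathway in sample_pathways[sample] else 0 for pathway in pathways]
        for sample in samples
    ]
    return samples, pathways, matrix


# Hypothetical example with two samples and three pathway IDs.
example = {
    "sample_O1": {"PWY_A", "PWY_B"},
    "sample_V1": {"PWY_B", "PWY_C"},
}
samples, pathways, matrix = presence_absence_matrix(example)
```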

Gas-chromatography mass spectrometry/solid-phase microextraction (GC-MS/SPME) analysis

According to the manufacturer’s instructions, the DVB/CAR/PDMS fiber (Supelco, Bellefonte, PA, USA) was exposed to the headspace for 40 min to extract volatile organic compounds (VOCs) from fecal samples. VOCs were thermally desorbed by immediately transferring the fiber into the heated injection port (220 °C) of a Clarus 680 (Perkin Elmer, Beaconsfield, UK) gas chromatograph equipped with an Rtx-WAX column (30 m × 0.25 mm i.d., 0.25 μm film thickness) (Restek) and coupled to a Clarus SQ8MS (Perkin Elmer), with source and transfer line temperatures kept at 250 and 210 °C, respectively. Each chromatogram was analyzed for peak identification using the National Institute of Standards and Technology 2008 (NIST) library. Quantitative data of the identified compounds were obtained by interpolation of the relative areas versus the internal standard area.
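Quantification relative to an internal standard can be illustrated with a one-line calculation. This is a sketch only: the function name is hypothetical, and a response factor of 1 between analyte and internal standard is assumed, since the text does not state one.

```python
def relative_quantity(peak_area, internal_standard_area,
                      internal_standard_conc, response_factor=1.0):
    """Estimate analyte concentration from its peak area relative to the
    internal standard. response_factor=1.0 is an assumption."""
    if internal_standard_area <= 0:
        raise ValueError("internal standard area must be positive")
    if response_factor <= 0:
        raise ValueError("response factor must be positive")
    ratio = peak_area / internal_standard_area
    return ratio * internal_standard_conc / response_factor
```

The result is expressed in the same units as the internal standard concentration.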

Fecal microbes and preparation of protein extracts for HT29 cell line assays

Thirty fecal samples analyzed by a multi-omic approach, plus 31 additional samples belonging to the previous larger cohort (13), for a total of 22 omnivores, 20 vegetarians and 19 vegans, were used. MFCs and MPCEs were obtained using the protocols applied for meta-proteomic analysis. MFC samples were washed with sterile PBS and added to DMEM at a final cell density (O.D. 620 nm) of 0.65 UA, corresponding to ca. 8 log cells/ml. MPCEs were analyzed by the Bradford method to quantify the total protein concentrations. The flagellin content in the 61 MPCE samples was also purified by liquid chromatography and quantified by nano-HPLC coupled with nano-ESI-MS/MS. Each MPCE was added to DMEM at a final protein concentration of 15 mg/ml. Flagellin was also used at final concentrations of 0.015 and 0.090 µg/ml in DMEM.

Cell line

Based on the above results showing that diet modulates the microbial synthesis of molecules/proteins (e.g., SCFA and flagellin) involved in oncogenesis or tumor suppression, the microbiomes of 61 volunteers were tested using the human HT29 colon carcinoma cell line. HT29 cells were cultured in DMEM containing fetal bovine serum (10%, FBS, Life Technologies), 2 mM glutamine and 100 U/ml penicillin/100 μg/ml streptomycin (Life Technologies) at 37 °C in the presence of 5% CO2. For the co-incubation experiments with MFCs, MPCEs, fecal microbiomes or flagellin (InvivoGen, San Diego, CA, USA), the cells were maintained at 37 °C under CO2-independent conditions and cultured with the above-described standard DMEM supplemented with 25 mM HEPES.

HT29 cell viability assays

The cell viability of HT29 cells was assessed by the SRB assay61 using an initial cell density of 5,000 or 20,000 cells/well, respectively. The cells were incubated with MFCs, MPCEs or flagellin for 24, 48 and 72 h. After washing with PBS, the cells were fixed with 10% TCA. Staining of cells was performed using SRB for 30 min, and the cells were flushed repeatedly with 1% acetic acid. SRB was desorbed using 10 mM Trizma, and the plate was read at 492 nm using a microplate reader. Cells incubated in DMEM alone were used as controls.

Gene expression analyses of HT29 cells

HT29 cells grown in DMEM or DMEM plus MFCs, MPCEs or commercial flagellin for 6 and 24 h were washed twice with PBS containing Pen-Strep and 50 µg/ml gentamicin and stored at −80 °C until use. Total RNA was extracted from the HT29 cells using a commercial kit (Ribospin Minikit-GeneAll, Seoul, Korea). cDNA was synthesized from 2 μg of template RNA in a 20-μl reaction volume using the High-Capacity cDNA Reverse Transcription Kit (Applied Biosystems, Monza, Italy). Ten microliters of total RNA was added to the Master Mix and subjected to RT-PCR in a thermal cycler (Stratagene Mx3000P Real Time PCR System, Agilent Technologies Italia S.p.A., Milan, Italy). The cDNA was amplified and detected through Taqman primer-probe sets (Applied Biosystems) (IL8, Hs00174103_m1; IL22, Hs01574154_m1; IL23A, Hs00372324_m1; TLR5, Hs01920773_s1; and REG3A, Hs00170171_m1). Human glyceraldehyde-3-phosphate dehydrogenase (GAPDH) was used as the housekeeping gene and detected through Taqman primer-probe Hs999999_m1. The relative fold change in expression was normalized to GAPDH expression. All procedures were performed according to the manufacturer’s instructions. TLR-5 was quantified by chromatin immunoprecipitation (ChIP) using the EZ-ChIP™ chromatin immunoprecipitation kit (Upstate) as described by Kumar Thakur et al.62. In detail, HT29 cell pellets were resuspended in the lysis buffer of the kit, and the chromatin was precipitated overnight with 2 μg of rabbit antibodies against RNA polymerase II, Sp1, Sp3, acetyl-H3, acetyl-H4, p300, HDAC1 or IgG (negative control). At the end of incubation, samples were treated with Protein G agarose for 1 h. The immunoprecipitated complex was washed and subsequently extracted with elution buffer. DNA-protein cross-links were reversed and DNA was purified by ethanol precipitation. The relative binding of proteins to the TLR-5 promoter was quantitatively analyzed by qPCR.
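The text states only that the relative fold change was normalized to GAPDH. One common way to compute such a value is the comparative Ct (2^-ΔΔCt) calculation, sketched below. Treating this as the calculation used here is an assumption, and the function and argument names are hypothetical.

```python
def fold_change_ddct(ct_target_treated, ct_reference_treated,
                     ct_target_control, ct_reference_control):
    """Relative expression by the comparative Ct method (assumed here).

    The reference gene would be GAPDH; 'control' would be cells grown
    in DMEM alone. Assumes roughly 100% amplification efficiency.
    """
    delta_treated = ct_target_treated - ct_reference_treated
    delta_control = ct_target_control - ct_reference_control
    delta_delta = delta_treated - delta_control
    return 2.0 ** (-delta_delta)
```

For example, a target that reaches threshold two cycles earlier in treated cells than in controls, after GAPDH normalization, corresponds to a fourfold increase.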

Enzyme-linked immunosorbent assay (ELISA)

Cell culture supernatants were analyzed in triplicate for IL-8, IL-22 and IL-23 release using ELISA kits (Human IL-8/CXCL8, IL-22 and IL-23 DuoSet ELISA; R&D Systems, Minneapolis, MN, USA; CN: DY208, DY782 and DY1290, respectively).

Statistical analysis

All gas-chromatography mass spectrometry/solid-phase microextraction (GC-MS/SPME) data were obtained in at least triplicate. The GC-MS/SPME analysis was carried out on transformed data, followed by separation of means with Tukey’s HSD, using the statistical software Statistica for Windows (Statistica 6.0 for Windows, 1998; StatSoft, Vigonza, Italy).

For cell line assay statistical analyses (data at least in triplicate), differences between groups were analyzed using the ANOVA test. The correction for multiple comparisons was performed using the Tukey test and the function glht (general linear hypothesis tests) in “multcomp” R package63.
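As a reminder of what the ANOVA step computes, the sketch below derives the one-way F statistic from group means and within-group scatter. It is illustrative only and uses a hypothetical function name; it does not reproduce the Tukey correction, which in the study was done with glht from the multcomp package.

```python
def one_way_anova_f(groups):
    """Return (F, df_between, df_within) for a list of numeric groups."""
    k = len(groups)
    n_total = sum(len(g) for g in groups)
    if k < 2 or n_total <= k:
        raise ValueError("need at least two groups and replicate measurements")
    group_means = [sum(g) / len(g) for g in groups]
    grand_mean = sum(sum(g) for g in groups) / n_total
    # Variation of group means around the grand mean.
    ss_between = sum(
        len(g) * (m - grand_mean) ** 2 for g, m in zip(groups, group_means)
    )
    # Variation of observations around their own group mean.
    ss_within = sum(
        (x - m) ** 2 for g, m in zip(groups, group_means) for x in g
    )
    df_between, df_within = k - 1, n_total - k
    f_stat = (ss_between / df_between) / (ss_within / df_within)
    return f_stat, df_between, df_within
```

The p-value would then be read from the F distribution with the returned degrees of freedom.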

The degree of association between genera and nutrients was assessed by Spearman correlation coefficients, which were then clustered by Euclidean distance and Ward linkage hierarchical clustering. Correlations between enzyme abundances and dietary intake were assessed using Spearman’s correlation coefficients (FDR < 0.05 and R > 0.6); the p-values were corrected for multiple testing using the Bonferroni adjustment within the psych R package.