Study design

We enrolled 21 participants (nine men and twelve women), without any known food intolerance and without known gastrointestinal disorders, in our GFD study for 13 weeks (Fig. 1). After baseline measurements (T = 0), all the participants started a GFD for four weeks (T = 1–4), followed by a “wash-out” period of five weeks. Subsequently, data were collected when they returned to their habitual diets (HD, gluten-containing) for a period of four weeks (T = 5–8) (Fig. 1). Fecal samples were collected at all time points. Blood was collected at baseline, at T = 2 and T = 4 on GFD, and at T = 6 and T = 8 on HD.

Fig. 1 Timeline of GFD study, including number of participants and collected samples

The participants were aged between 16 and 61 years (mean age, 36.3 years). Mean BMI was 24.0, and 28.6 % (n = 6) of participants were smokers. The majority of participants were European (n = 19), two participants were South American, and one was Asian. With one exception, no participant had taken antibiotics in the year before the study started. In both diet periods (GFD, HD), participants kept a detailed three-day food record. All 21 participants completed the GFD period; for 17 participants all data points were available. An overview of the participants’ characteristics can be found in Additional file 1: Figure S1.

Written consent was obtained from all participants and the study followed the sampling protocol of the LifeLines-DEEP study [10], which was approved by the ethics committee of the University Medical Centre Groningen, document no. METC UMCG LLDEEP: M12.113965.

Gluten-free diet and dietary intake assessment

Methods to assess GFD adherence and dietary intake have been described previously by Baranska et al. [11]. In short, before the start of the study, a dietician gave the participants information on gluten-containing food products and instructed them on how to keep a three-day food record. The food records were checked for completeness and the macronutrient intake was calculated. Days on which a participant had a daily energy intake below 500 kcal or above 5000 kcal were excluded from our analysis (n = 2). Of 21 participants, 15 (71 %) completed the dietary assessments; three were excluded from food intake analysis because of incomplete food records. We used the paired t-test to compare group means between GFD and HD.
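
The exclusion rule and the paired comparison described above can be sketched as follows. This is a minimal illustration rather than the analysis code used in the study: the function names and all example values are hypothetical, and only the 500 and 5000 kcal bounds come from the text.

```python
import math
from statistics import mean, stdev

# Plausibility bounds for daily energy intake, as stated in the text.
KCAL_MIN, KCAL_MAX = 500, 5000


def plausible_days(daily_kcal):
    """Keep only days whose energy intake lies within the stated bounds."""
    return [k for k in daily_kcal if KCAL_MIN <= k <= KCAL_MAX]


def paired_t(x, y):
    """Return (t statistic, degrees of freedom) for a paired t-test.

    x and y hold one value per participant, in the same participant order
    (for example mean intake on GFD and mean intake on HD).
    """
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("need two equally long samples with at least 2 pairs")
    diffs = [a - b for a, b in zip(x, y)]
    sd = stdev(diffs)
    if sd == 0:
        raise ValueError("all paired differences are identical")
    t = mean(diffs) / (sd / math.sqrt(len(diffs)))
    return t, len(diffs) - 1


if __name__ == "__main__":
    # Hypothetical three-day record with one implausible day.
    print(plausible_days([450, 1800, 2200]))
    # Hypothetical per-participant mean intakes (kcal) on GFD and HD.
    print(paired_t([1900, 2100, 2300, 1750], [2000, 2150, 2250, 1900]))
```

The p-value would then be read from a t distribution with the returned degrees of freedom, for which a statistics package is the usual choice.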

Blood sample collection

Participants’ blood samples were collected after an overnight fast by a trained physician assistant. We collected two EDTA tubes of whole blood at baseline (T0) and during the GFD period at time points T2 and T4; during the HD period one EDTA tube was collected at time points T6 and T8. Plasma was extracted from the whole blood within 8 h of collection and stored at −80 °C for later analysis.

Microbiome analysis

Fecal sample collection

Fecal samples were collected at home and immediately stored at −20 °C. At the end of the 13-week study period, all samples were stored at −80 °C. Aliquots were made and DNA was isolated with the QIAamp DNA Stool Mini Kit. Isolated DNA was sequenced at the Beijing Genomics Institute (BGI).

Sequencing

We used 454 pyrosequencing to determine the bacterial composition of the fecal samples. Hyper-variable region V3 to V4 was selected using forward primer F515 (GTGCCAGCMGCCGCGG) and reverse primer “E. coli 907-924” (CCGTCAATTCMTTTRAGT).

We used QIIME [12], v1.7.0, to process the raw data files from the sequencer. The raw data files (sff files) were processed with the default settings of QIIME v1.7.0; however, we did not trim the primers. Six out of 161 samples had fewer than 3000 reads and were excluded from the analysis. The average number of reads was 5862, with a maximum of 12,000 reads.
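
As an illustration of the read-depth filter, the sketch below separates samples at the 3000-read threshold given above. The sample identifiers and counts are hypothetical, and the function is not part of QIIME.

```python
from statistics import mean

MIN_READS = 3000  # samples with fewer reads are excluded, as stated in the text


def split_by_depth(read_counts, min_reads=MIN_READS):
    """Split a {sample_id: read_count} mapping into kept samples and dropped ids."""
    kept = {s: n for s, n in read_counts.items() if n >= min_reads}
    dropped = sorted(s for s in read_counts if s not in kept)
    return kept, dropped


if __name__ == "__main__":
    # Hypothetical read counts per sample.
    counts = {"P01_T0": 5400, "P01_T1": 2100, "P02_T0": 12000, "P02_T1": 3000}
    kept, dropped = split_by_depth(counts)
    print("kept:", sorted(kept))
    print("dropped:", dropped)
    print("mean reads in kept samples:", mean(kept.values()))
```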

OTU picking

The operational taxonomic unit (OTU) formation was performed using the QIIME reference optimal picking, which uses UCLUST [13], version 1.2.22q, to perform the clustering. As a reference database, we used a primer-specific version of the full GreenGenes 13.5 database [14].

Using TaxMan [15], we created the primer-specific reference database, containing only reference entries that matched our selected primers. During this process we restricted the mismatches of the probes to the references to a maximum of 25 %. The 16S regions that were captured by our primers, including the primer sequences, were extracted from the full 16S sequences. For each of the reference clusters, we determined the overlapping part of the taxonomy of each of the reference reads in the clusters and used this overlapping part as the taxonomic label for the cluster. This is similar to the processes described in other studies [9, 15–18].
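
The two ideas in this paragraph, primer matching with a bounded mismatch fraction and labelling a cluster by the shared part of its members' taxonomy, can be sketched as below. This is not TaxMan's implementation; the function names and example sequences are hypothetical. The sketch only searches the given strand, so a reverse primer would first need to be reverse-complemented.

```python
# IUPAC nucleotide codes: each primer symbol maps to the bases it accepts.
IUPAC = {
    "A": "A", "C": "C", "G": "G", "T": "T",
    "R": "AG", "Y": "CT", "S": "CG", "W": "AT", "K": "GT", "M": "AC",
    "B": "CGT", "D": "AGT", "H": "ACT", "V": "ACG", "N": "ACGT",
}


def mismatches(primer, site):
    """Count positions where the reference base is not allowed by the primer code."""
    return sum(base not in IUPAC[code] for code, base in zip(primer, site))


def find_primer(primer, sequence, max_mismatch_fraction=0.25):
    """Return (start, mismatches) of the best site within the limit, or None."""
    limit = int(max_mismatch_fraction * len(primer))
    best = None
    for start in range(len(sequence) - len(primer) + 1):
        m = mismatches(primer, sequence[start:start + len(primer)])
        if m <= limit and (best is None or m < best[1]):
            best = (start, m)
    return best


def shared_taxonomy(lineages):
    """Return the leading ranks on which all lineages in a cluster agree."""
    shared = []
    for ranks in zip(*lineages):
        if len(set(ranks)) != 1:
            break
        shared.append(ranks[0])
    return shared


if __name__ == "__main__":
    forward = "GTGCCAGCMGCCGCGG"  # forward primer F515 from the text
    reference = "AAAGTGCCAGCAGCCGCGGTT"  # hypothetical reference fragment
    print(find_primer(forward, reference))
    print(shared_taxonomy([
        ["k__Bacteria", "p__Firmicutes", "c__Clostridia"],
        ["k__Bacteria", "p__Firmicutes", "c__Bacilli"],
    ]))
```

For the 16-base forward primer, a 25 % limit corresponds to at most four mismatches.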

OTUs had to be supported by at least 100 reads and had to be identified in two samples; less abundant OTUs were excluded from the analysis.
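
A minimal sketch of this filter is shown below. It assumes that "identified in two samples" means present in at least two samples and that the 100 reads are counted across all samples; the table layout and names are hypothetical.

```python
def filter_otus(otu_table, min_reads=100, min_samples=2):
    """Keep OTUs with enough total reads that occur in enough samples.

    otu_table maps OTU id -> {sample id: read count}.
    Assumption: thresholds apply to the total read count over all samples
    and to the number of samples with a non-zero count.
    """
    kept = {}
    for otu, counts in otu_table.items():
        total = sum(counts.values())
        present = sum(1 for c in counts.values() if c > 0)
        if total >= min_reads and present >= min_samples:
            kept[otu] = counts
    return kept


if __name__ == "__main__":
    # Hypothetical OTU table.
    table = {
        "OTU_1": {"S1": 80, "S2": 40, "S3": 0},   # kept
        "OTU_2": {"S1": 150, "S2": 0, "S3": 0},   # only one sample
        "OTU_3": {"S1": 30, "S2": 20, "S3": 10},  # too few reads
    }
    print(sorted(filter_otus(table)))
```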

Estimation of gene abundance and pathway activity

After filtering the OTUs, we used PICRUSt [19] to estimate the gene abundance and the PICRUSt output was then used in HUMAnN [20] to calculate the bacterial pathway activity. First, the reference database was clustered based on 97 % similarity to the reference sequence to better reflect the normal GreenGenes 97 % database required for PICRUSt. Three out of 1166 OTUs did not contain a representative sequence in the GreenGenes 97 % set and were therefore excluded from the analysis. Since merging the reference database at the 97 % similarity level led to merging of previously different clusters, for the pathway analysis we chose to permute the cluster representative names in the OTU table 25 times; this was to ensure that our OTU picking strategy would not cause any problems in estimating the genes present in each micro-organism. Next, we ran PICRUSt on the 25 permuted tables and calculated the average gene abundance per sample. The average correlation between the permutations within a sample was higher than 0.97 (Pearson r). Hence, we averaged the PICRUSt output, which was then used to calculate the pathway activity in HUMAnN.
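
The consistency check and averaging step for one sample can be sketched as follows. The sketch does not call PICRUSt; it assumes the gene-abundance profiles from the permuted runs are already available as equally ordered lists, and the numbers shown are hypothetical.

```python
import math
from itertools import combinations


def pearson(x, y):
    """Pearson correlation coefficient of two equally long numeric lists."""
    mx, my = sum(x) / len(x), sum(y) / len(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    vx = sum((a - mx) ** 2 for a in x)
    vy = sum((b - my) ** 2 for b in y)
    if vx == 0 or vy == 0:
        raise ValueError("correlation is undefined for a constant profile")
    return cov / math.sqrt(vx * vy)


def mean_pairwise_correlation(profiles):
    """Average Pearson r over all pairs of permutation profiles of one sample."""
    pairs = list(combinations(profiles, 2))
    return sum(pearson(a, b) for a, b in pairs) / len(pairs)


def average_profile(profiles):
    """Element-wise mean gene abundance across permutation profiles."""
    return [sum(values) / len(values) for values in zip(*profiles)]


if __name__ == "__main__":
    # Hypothetical gene abundances for one sample from three permuted runs.
    runs = [
        [120.0, 30.0, 5.0, 60.0],
        [118.0, 33.0, 4.0, 61.0],
        [121.0, 29.0, 6.0, 58.0],
    ]
    print(mean_pairwise_correlation(runs))
    print(average_profile(runs))
```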

Changes in the gut microbiome or in gene abundance due to diet

To identify differentially abundant taxa, microbial biomarkers, and differences in pathway activity between the GFD and HD periods, we used QIIME and MaAsLin [21]. QIIME was used for the alpha-diversity analysis, principal coordinate analysis (PCoA) over UniFrac distances, and visualization. In the MaAsLin analysis we corrected for ethnicity (defined as continent of birth) and gender. MaAsLin was used to search for differentially abundant taxonomic units to discriminate between the GFD and HD time points. Additionally, we tested for changes during the transition from HD to GFD (T0–T4). MaAsLin uses a boosted, additive, general linear model to discriminate between groups of data.

In the MaAsLin analysis we did not test individual OTUs, but focused on the most detailed taxonomic label each OTU represented. Using the QIIMETOMAASLIN [22] tool, we aggregated OTUs with identical taxonomic labels and, if multiple OTUs represented a higher-order taxon, we added this higher-order taxon to the analysis. In this process, we went from 1166 OTUs to 114 separate taxonomic units that were included in our analysis. Using the same tool, we normalized the microbial abundance using the arcsine square root transformation. This transformation brings the percentage data closer to a normal distribution.
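
The aggregation and normalization steps can be illustrated with the sketch below. It is not the QIIMETOMAASLIN code: it only sums OTUs that share an identical label and then applies asin(sqrt(p)) to each relative abundance p, and it omits the addition of higher-order taxa. All names and counts are hypothetical.

```python
import math
from collections import defaultdict


def aggregate_by_label(otu_counts, otu_labels):
    """Sum the counts of all OTUs in one sample that share a taxonomic label."""
    totals = defaultdict(float)
    for otu, count in otu_counts.items():
        totals[otu_labels[otu]] += count
    return dict(totals)


def arcsine_sqrt(proportion):
    """Arcsine square root transform of a proportion in [0, 1]."""
    if not 0.0 <= proportion <= 1.0:
        raise ValueError("proportion must lie between 0 and 1")
    return math.asin(math.sqrt(proportion))


def transform_sample(label_counts):
    """Convert counts to relative abundances and transform each one."""
    total = sum(label_counts.values())
    if total <= 0:
        raise ValueError("sample has no reads")
    return {label: arcsine_sqrt(c / total) for label, c in label_counts.items()}


if __name__ == "__main__":
    # Hypothetical OTU counts for one sample and their taxonomic labels.
    counts = {"OTU_1": 50, "OTU_2": 25, "OTU_3": 25}
    labels = {"OTU_1": "g__Bacteroides", "OTU_2": "g__Bacteroides",
              "OTU_3": "f__Veillonellaceae"}
    print(transform_sample(aggregate_by_label(counts, labels)))
```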

In all our analyses we used the Q-value calculated using the R [23] Q-value package [24] to correct for multiple testing. The Q-value is the minimal false discovery rate at which a test may be called significant. We used a Q-value of 0.05 as a cutoff in our analyses.
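
To make the idea of a false-discovery-rate cutoff concrete, the sketch below computes Benjamini-Hochberg adjusted p-values. This is a deliberately simpler, related procedure and not the algorithm of the R Q-value package, which additionally estimates the proportion of true null hypotheses; the example p-values are hypothetical.

```python
def bh_adjust(p_values):
    """Benjamini-Hochberg adjusted p-values, returned in the input order.

    Note: this is a stand-in for illustration, not the Storey q-value
    computed by the R package cited in the text.
    """
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down, enforcing monotonicity.
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, p_values[i] * m / rank)
        adjusted[i] = running_min
    return adjusted


if __name__ == "__main__":
    # Hypothetical p-values from four tests.
    p = [0.01, 0.04, 0.03, 0.005]
    adj = bh_adjust(p)
    significant = [i for i, q in enumerate(adj) if q <= 0.05]
    print(adj, significant)
```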

Biomarkers

Six biomarkers related to gut health were measured in the “Dr. Stein & Colleagues” medical laboratory (Maastricht, the Netherlands). These biomarkers included: fecal calprotectin and a set of plasma cytokines as markers for the immune system activation [25–27]; fecal human-β-defensin-2 as a marker for defense against invading microbes [28, 29]; fecal chromogranin A as a marker for neuro-endocrine system activation [30–32]; fecal short-chain fatty acids (SCFA) secretion as a marker for colonic metabolism [33]; and plasma citrulline as a measure for enterocyte mass [34, 35]. The plasma citrulline level and the panel of cytokines (IL-1β, IL-6, IL-8, IL-10, IL-12, and TNFα) were measured by high-performance liquid chromatography (HPLC) and electro-chemiluminescence immunoassay (ECLIA), respectively. In feces, we measured calprotectin and human-β-defensin-2 levels by enzyme-linked immunosorbent assay (ELISA), chromogranin A level by radioimmunoassay (RIA), and the short-chain fatty acids acetate, propionate, butyrate, valerate, and caproate by gas chromatography–mass spectrometry (GC-MS). All biomarker analyses were performed non-parametrically, with tie handling, because of the high number of samples with biomarker levels below the detection limit. We used the Wilcoxon test to compare the average biomarker levels between the diet periods and the Spearman correlation to search for relations between the microbiome or gene activity data and the biomarker levels.
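
Tie handling matters here because measurements below the detection limit share the same value. The sketch below shows one common way to handle this for the Spearman correlation, assigning tied values their average rank and then correlating the ranks. It is an illustration under that assumption, not the study's analysis code, and the example values are hypothetical.

```python
import math


def average_ranks(values):
    """Rank values from 1..n, giving tied values the mean of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values tied with position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = avg
        i = j + 1
    return ranks


def spearman(x, y):
    """Spearman correlation: Pearson correlation of the average ranks."""
    rx, ry = average_ranks(x), average_ranks(y)
    mx, my = sum(rx) / len(rx), sum(ry) / len(ry)
    cov = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    vx = sum((a - mx) ** 2 for a in rx)
    vy = sum((b - my) ** 2 for b in ry)
    if vx == 0 or vy == 0:
        raise ValueError("correlation is undefined when all values are tied")
    return cov / math.sqrt(vx * vy)


if __name__ == "__main__":
    # Hypothetical biomarker levels; 15.0 stands in for "below detection limit".
    biomarker = [15.0, 15.0, 42.0, 88.0, 15.0, 60.0]
    taxon_abundance = [0.02, 0.05, 0.11, 0.30, 0.01, 0.18]
    print(average_ranks(biomarker))
    print(spearman(biomarker, taxon_abundance))
```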