Our (hierarchical) hypothesis was that 6 weeks after allogenic (lean donor) FMT, peripheral insulin sensitivity (Rd) would improve by 5 μmol·kg⁻¹·min⁻¹ (standard deviation 4 μmol·kg⁻¹·min⁻¹). We also aimed to study whether a second lean donor FMT at 6 weeks, on top of the first treatment, would maintain this 5 μmol·kg⁻¹·min⁻¹ increase in Rd at 18 weeks. We expected that no additional treatment at 6 weeks (i.e., only a single allogenic FMT at baseline) would result in Rd levels at 18 weeks similar to baseline (exploratory analyses). With a 2:1 randomization ratio, 24 metabolic syndrome subjects treated with allogenic FMT and 12 metabolic syndrome subjects treated with autologous (own) FMT were required. Allowing for a 20% dropout in each treatment arm (24/0.8 = 30 and 12/0.8 = 15), we aimed to include 45 metabolic syndrome subjects in total. With this sample size, the study has >80% power in a two-sided test with α = 0.05.
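
The sample-size arithmetic above can be checked with a short sketch. It rests on assumptions the text does not state: that the 5-unit effect is the between-arm difference in Rd change with a common SD of 4, and that a normal approximation to a two-sided two-sample comparison is adequate. It is not the calculation the authors ran, and the function names are hypothetical.

```python
from math import sqrt
from statistics import NormalDist


def inflate_for_dropout(n_completers, dropout_pct):
    """Smallest enrolment n with n * (1 - dropout) >= n_completers (integer ceiling division)."""
    return -(-n_completers * 100 // (100 - dropout_pct))


def approx_power_two_groups(delta, sd, n1, n2, alpha=0.05):
    """Normal-approximation power of a two-sided two-sample comparison of means.

    Assumption: delta is the between-arm difference and sd the common standard deviation.
    """
    z = NormalDist()
    se = sd * sqrt(1 / n1 + 1 / n2)   # standard error of the difference in means
    shift = delta / se                 # standardized effect under the alternative
    z_crit = z.inv_cdf(1 - alpha / 2)  # two-sided critical value
    return z.cdf(shift - z_crit) + z.cdf(-shift - z_crit)


# 2:1 allocation: 24 allogenic and 12 autologous completers
power = approx_power_two_groups(delta=5, sd=4, n1=24, n2=12)
per_arm = [inflate_for_dropout(24, 20), inflate_for_dropout(12, 20)]
print(f"approximate power: {power:.3f}; enrolment per arm: {per_arm}; total: {sum(per_arm)}")
```

Under these assumptions the approximate power is well above 0.8, and inflating each arm for 20% dropout gives 30 + 15 = 45 subjects, matching the target enrolment.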

The primary endpoint of the trial was the change in intestinal microbiota composition upon FMT in relation to insulin sensitivity. Other endpoints were changes in post-prandial lipid and glucose excursions and in plasma metabolites. A non-Gaussian distribution was assumed for all clinical data; results are therefore presented as medians and interquartile ranges. Post-prandial responses (e.g., plasma glucose, triglycerides, bile acids and enteroendocrine hormones) are expressed as incremental areas under the curve (iAUC) over the 4-hour post-prandial follow-up, calculated using the trapezoidal method.
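
The trapezoidal iAUC can be illustrated with a minimal sketch. The text does not say which iAUC variant was used; this sketch assumes the net incremental area, i.e., the area under the curve after subtracting the fasting (t = 0) value, so excursions below baseline count as negative area. Function names are hypothetical.

```python
def trapezoid_auc(times, values):
    """Area under the piecewise-linear curve through (times[i], values[i])."""
    if len(times) != len(values):
        raise ValueError("times and values must have equal length")
    return sum(
        (t1 - t0) * (v0 + v1) / 2
        for t0, t1, v0, v1 in zip(times, times[1:], values, values[1:])
    )


def incremental_auc(times, values):
    """Net incremental AUC: trapezoidal area after subtracting the fasting (first) value.

    Assumption: the first sample is the fasting baseline; areas below it count as negative.
    """
    baseline = values[0]
    return trapezoid_auc(times, [v - baseline for v in values])


# Hypothetical sampling times (minutes) over a 4-hour post-prandial follow-up
times = [0, 30, 60, 120, 180, 240]
glucose = [5.2, 7.9, 7.1, 6.0, 5.4, 5.1]  # illustrative values only (mmol/L)
print(incremental_auc(times, glucose))
```

Variants that count only the area above baseline would clip negative segments instead; which convention applies changes results for curves that dip below fasting levels.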

Statistical testing was carried out using non-parametric tests. For between-group comparisons, either the Mann-Whitney U test or the Kruskal-Wallis test was used. The Friedman test or the Wilcoxon signed-rank test was used for within-group comparisons of repeated measurements. A p value below 0.05 was considered significant; for microbiota and metabolite data, p values were corrected for multiple testing using the false discovery rate (FDR), as described below.
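
The FDR correction can be sketched as follows. The text does not name the procedure; this sketch assumes the Benjamini-Hochberg method, which is the most common FDR correction, and the function name is hypothetical.

```python
def benjamini_hochberg(p_values):
    """Benjamini-Hochberg adjusted p values (q values), returned in the input order.

    Assumption: the FDR procedure is Benjamini-Hochberg; the source does not specify it.
    """
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])  # indices by ascending p
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p value down so adjusted values stay monotone in p.
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, p_values[i] * m / rank)
        adjusted[i] = running_min
    return adjusted


# Hypothetical p values from, e.g., per-species Wilcoxon tests
q = benjamini_hochberg([0.01, 0.04, 0.03, 0.20])
significant = [qi < 0.05 for qi in q]
```

Features are then called significant when their adjusted value falls below 0.05, rather than their raw p value.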

Multivariate Machine Learning Analysis

To study the dynamics of biomarkers, e.g., species-level microbiota (level 3) and fasting metabolites, we computed the relative change over time for each individual subject. For example, the relative change of a bacterial species is the difference in its abundance between baseline and 6 weeks, divided by its abundance at baseline, computed for each species per subject. For the microbiota analysis, this resulted in three datasets: 1) relative change in duodenal microbial composition of the allogenic and autologous treatment groups; 2) relative change in fecal microbial composition of the allogenic and autologous treatment groups; 3) relative change in fecal microbial composition of the responder and non-responder subjects. To quantify the amount of change in intestinal microbial composition for each subject, we computed the magnitude of change as the L2 norm (Meyer, 2000) of the vector of relative changes. Informally, the L2 (or Euclidean) norm is the length of a vector: the square root of the sum of the squared relative changes of all species (between baseline and 6 weeks) for a given subject.
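
The relative change and its L2 norm for one subject can be written out as a minimal sketch. The text does not say how species absent at baseline (zero abundance, which makes the ratio undefined) were handled; the `pseudocount` parameter is a hypothetical guard, and all names are illustrative.

```python
from math import sqrt


def relative_change(baseline, follow_up, pseudocount=0.0):
    """Per-species relative change (follow_up - baseline) / baseline for one subject.

    pseudocount is a hypothetical guard against zero baseline abundances;
    the source does not state how zeros were handled.
    """
    if len(baseline) != len(follow_up):
        raise ValueError("abundance vectors must cover the same species")
    return [(f - b) / (b + pseudocount) for b, f in zip(baseline, follow_up)]


def l2_norm(vector):
    """Euclidean length: square root of the sum of squared entries."""
    return sqrt(sum(v * v for v in vector))


# Hypothetical relative abundances of three species in one subject
baseline = [0.2, 0.5, 0.3]
week6 = [0.1, 0.5, 0.6]
change = relative_change(baseline, week6)  # one entry per species
magnitude = l2_norm(change)                # overall magnitude of change for this subject
```

Because every species contributes its squared relative change, a few species with large fold changes can dominate the magnitude.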

Biomarkers that allowed accurate discrimination between groups of subjects (allogenic versus autologous, responders versus non-responders) were selected by means of the elastic net algorithm (Zou and Hastie, 2005). The elastic net is particularly suited to the analysis of structured, high-dimensional data. It is a regularized method that combines the advantages of two techniques: the LASSO (Tibshirani, 1996), which performs variable selection by shrinking some coefficients exactly to zero, and ridge regression, which shrinks the coefficients of correlated predictors toward each other. This combination allows selection of the most important biomarkers while taking the correlation among them (the so-called ‘grouping effect’) into account. Because the L1 penalty sets many coefficients to zero, the resulting model is interpretable, and we regarded the features with non-zero coefficients as the predictors with the strongest predictive power. We used an adapted version of the elastic net algorithm with a hinge loss function, tailored to identify the most important biomarkers (e.g., microbial species and metabolites) in the collected dataset that jointly differentiate between allogenic and autologous subjects, as well as between responders and non-responders. The model was trained by stochastic gradient descent, i.e., the coefficients were updated using the gradient of the loss estimated on one sample at a time. Our statistical learning approach also includes stability selection (Meinshausen and Bühlmann, 2010). Although the biomarkers identified by the elastic net often lead to statistically significant results, the selected set can be unstable. We address this problem by coupling model selection with the stability selection procedure.
Biomarker stability is reflected in the frequency with which a particular biomarker is selected across repeated runs on randomly subsampled versions of the dataset. This procedure is especially relevant for small- to medium-sized datasets, as described in a recent publication from our group (Botschuijver et al., 2017).
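
A linear classifier with hinge loss and an elastic net penalty, trained one sample at a time, can be sketched as below. This is not the authors' adapted implementation: it assumes labels coded as -1/+1, an unpenalized intercept, and a proximal (soft-thresholding) step for the L1 part so that weights can become exactly zero. Hyperparameter values and names are hypothetical. In practice, scikit-learn's `SGDClassifier(loss="hinge", penalty="elasticnet")` fits the same kind of model.

```python
import random


def soft_threshold(value, threshold):
    """L1 proximal operator: shrink toward zero, exactly 0 inside [-threshold, threshold]."""
    if value > threshold:
        return value - threshold
    if value < -threshold:
        return value + threshold
    return 0.0


def train_elastic_net_hinge(X, y, alpha=0.01, l1_ratio=0.5, lr=0.05, epochs=200, seed=0):
    """Hinge loss + alpha * (l1_ratio * |w|_1 + (1 - l1_ratio) / 2 * |w|_2^2), by per-sample SGD.

    X: list of feature vectors; y: labels in {-1, +1} (assumed coding).
    """
    rng = random.Random(seed)
    n_features = len(X[0])
    w = [0.0] * n_features
    b = 0.0
    order = list(range(len(X)))
    for _ in range(epochs):
        rng.shuffle(order)  # visit samples in a random order each epoch
        for i in order:
            xi, yi = X[i], y[i]
            margin = yi * (sum(wj * xj for wj, xj in zip(w, xi)) + b)
            violated = margin < 1  # hinge loss max(0, 1 - margin) is active
            for j in range(n_features):
                grad = alpha * (1 - l1_ratio) * w[j]  # gradient of the ridge (L2) part
                if violated:
                    grad -= yi * xi[j]  # subgradient of the hinge loss
                # Gradient step, then L1 proximal step (may set the weight exactly to zero).
                w[j] = soft_threshold(w[j] - lr * grad, lr * alpha * l1_ratio)
            if violated:
                b += lr * yi  # intercept is not penalized
    return w, b


def predict(w, b, x):
    return 1 if sum(wj * xj for wj, xj in zip(w, x)) + b >= 0 else -1


# Hypothetical toy data: feature 0 separates the groups, feature 1 is noise
X = [[1.0, 0.1], [0.9, -0.2], [1.2, 0.05], [-1.0, 0.1], [-0.8, -0.1], [-1.1, 0.2]]
y = [1, 1, 1, -1, -1, -1]
w, b = train_elastic_net_hinge(X, y)
```

Features whose weight ends at exactly zero are treated as not selected, which is what the stability selection step counts.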

To avoid over-fitting, we used 10-fold stratified cross-validation on the training partition of the data (80%), while the remaining 20% was held out as the test set. The parameters selected in this way were the ratio between the L1 and L2 penalties and the regularization strength. Stability selection was performed by randomly subsampling 80% of the data 100 times. In each run, all features with a non-zero coefficient were counted. These counts were normalized to stability coefficients ranging from 1.0 for a feature that was always selected to 0.0 for a feature that was never selected. We used Python (version 2.7.8; packages NumPy and SciPy) to implement the elastic net model and R (version 3.1.2) for visualization.
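
Stratified fold assignment and the stability coefficients described above can be sketched as follows. `fit_coefficients` is a hypothetical callback standing in for refitting the sparse model on a subsample; how the authors implemented these steps in NumPy/SciPy is not stated, so this is an illustration under assumptions only.

```python
import random


def stratified_folds(labels, k, seed=0):
    """Split sample indices into k folds with (near-)equal class proportions."""
    rng = random.Random(seed)
    by_class = {}
    for idx, label in enumerate(labels):
        by_class.setdefault(label, []).append(idx)
    folds = [[] for _ in range(k)]
    for members in by_class.values():
        rng.shuffle(members)
        for pos, idx in enumerate(members):
            folds[pos % k].append(idx)  # deal each class round-robin over the folds
    return folds


def stability_scores(n_samples, fit_coefficients, n_runs=100, fraction=0.8, seed=0):
    """Fraction of subsampling runs in which each feature has a non-zero coefficient.

    fit_coefficients(subset) is a hypothetical callback that refits the model on the
    given sample indices and returns one coefficient per feature.
    """
    rng = random.Random(seed)
    subset_size = int(round(fraction * n_samples))
    counts = None
    for _ in range(n_runs):
        subset = rng.sample(range(n_samples), subset_size)  # subsample without replacement
        coefficients = fit_coefficients(subset)
        if counts is None:
            counts = [0] * len(coefficients)
        for j, c in enumerate(coefficients):
            if c != 0:
                counts[j] += 1
    # 1.0 = selected in every run, 0.0 = never selected
    return [c / n_runs for c in counts]
```

Dividing by the number of runs gives exactly the 0.0 to 1.0 scale described above; a threshold on these scores then defines the stable biomarker set.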

A randomization (permutation) test was conducted to evaluate the statistical validity of the results obtained with the elastic net algorithm. The outcome variable (e.g., allogenic versus autologous, or responder versus non-responder) was randomly reshuffled while the corresponding microbial profiles were kept intact. This was repeated up to 100 times, and the area under the Receiver Operating Characteristic curve (ROC AUC) was computed each time. The ROC AUC was the performance measure for this binary classification task. The ROC curve plots the true-positive rate against the false-positive rate across all classification thresholds; its AUC equals the probability that a randomly chosen subject from one group (e.g., allogenic) receives a higher classifier score than a randomly chosen subject from the other group (e.g., autologous). An AUC of 0.5 corresponds to random classification. Cross-validation within the dataset was accomplished by randomly hiding 20% of the subjects from the model and evaluating the prediction quality on that group. The AUC obtained on the original dataset was compared with the AUCs obtained from the reshuffled datasets, using a significance threshold of 0.05.
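
The label-permutation test can be sketched as below. `evaluate` is a hypothetical callback that should re-run the whole pipeline (train on 80%, score the held-out 20%) for a given label vector and return the test AUC; the (k + 1) / (N + 1) p value is a common convention assumed here, not stated in the text.

```python
import random


def roc_auc(labels, scores):
    """ROC AUC as the probability that a random positive outranks a random negative (ties = 1/2)."""
    pos = [s for lab, s in zip(labels, scores) if lab == 1]
    neg = [s for lab, s in zip(labels, scores) if lab == 0]
    wins = 0.0
    for p in pos:
        for n in neg:
            wins += 1.0 if p > n else 0.5 if p == n else 0.0
    return wins / (len(pos) * len(neg))


def permutation_test(labels, evaluate, n_permutations=100, seed=0):
    """Compare the observed AUC with AUCs obtained after shuffling the outcome labels.

    evaluate(labels) is a hypothetical callback returning a test-set AUC for those labels.
    """
    rng = random.Random(seed)
    observed = evaluate(list(labels))
    shuffled = list(labels)
    at_least_as_good = 0
    for _ in range(n_permutations):
        rng.shuffle(shuffled)  # break the link between outcome and (intact) profiles
        if evaluate(list(shuffled)) >= observed:
            at_least_as_good += 1
    p_value = (at_least_as_good + 1) / (n_permutations + 1)  # assumed convention
    return observed, p_value


# Toy check with fixed, hypothetical classifier scores instead of a retrained model
scores = [i / 10 for i in range(1, 11)]
labels = [0] * 5 + [1] * 5
observed, p_value = permutation_test(labels, lambda labs: roc_auc(labs, scores))
```

With a real pipeline, `evaluate` must refit the model on each permuted label vector; reusing fixed scores, as in the toy check, only illustrates the bookkeeping.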