Ethical review

We obtained written informed consent from both subjects enrolled in the study. This study was approved by the MIT Committee on the Use of Humans as Experimental Subjects (Study #0903003155) and complied with the Helsinki Declaration.

Microbial sampling

Subject A collected gut microbiota samples between days 0 and 364 of the study and saliva microbiota samples between days 26 and 364. Subject B primarily collected gut microbiota samples between study days 0 and 252. Gut microbiota were sampled non-invasively using fecal collection. Stool samples were taken in duplicate by coring out feces with inverted sterile 1 mL pipette tips. These tips were then deposited in 15 mL Falcon tubes. Saliva was sampled by 10 s of oral rinsing with 10 mL of sterile phosphate-buffered saline and also stored in 15 mL Falcon tubes. Samples collected at home were stored temporarily at -20°C until transport to the laboratory, where they were then stored in -80°C freezers. Subject A’s samples collected abroad were stored at -20°C, shipped to the United States on dry ice, and then stored at -80°C.

DNA extraction

We used the QIAamp DNA Stool Mini Kit (Qiagen) and a modified version of its protocol to isolate bacterial DNA from fecal and saliva samples. For stool, we included a bead-beating step at the beginning of DNA extraction, in order to increase cell lysis. First, we used a chilled centrifuge to remove frozen stool cores from the 1 mL pipette tips (30 s at 3,000 g and 4°C). Once samples thawed to 20°C, we added 700 μL of buffer ASL per 100 mg of stool. Next, we used a digital vortex (VWR) and 2 mL of garnet beads (MoBio Laboratories) to break apart stool samples (10 s at 3,000 rpm). We then bead-beat the suspended stool with a Vortex Genie2 (MoBio Laboratories) and 2 mL of 0.1 mm glass beads (MoBio Laboratories) for 10 min at the setting ‘10’, in order to physically lyse cells. Each tube was subsequently heated at 95°C for 6 min to lyse remaining unbroken cells. Afterwards, the Qiagen InhibitEX tablet was added and we followed the QIAamp kit protocol.

For saliva, we initially captured bacterial cells from saliva samples using a 0.22 μm filter (Millipore) and a syringe to apply positive pressure. We placed these filters into 180 μL of lysis buffer (without lysozyme) and bead-beat on a Mini-beadbeater (Biospec products) with 0.1 mm glass beads for 1 min at room temperature and at maximum speed. Next, we added another 180 μL lysis buffer with 40 mg/mL lysozyme and spun for 1 h at 450 rpm and 37°C. Since filtered saliva likely contains fewer PCR inhibitors than stool, we skipped addition of the Qiagen InhibitEX tablet and then followed the remainder of the QIAamp kit protocol.

DNA sequencing

We used the V4 region of the 16S ribosomal RNA gene subunit to identify bacteria in a culture-independent manner. Extracted DNA was amplified using custom barcoded primers and sequenced with paired-end 100 bp reads on an Illumina GAIIx according to a previously published protocol [19].

OTU picking

We used the QIIME analysis pipeline (v1.3) to process raw DNA reads into OTU counts [20]. We wrote a Python script to format raw sequence files for input into QIIME. We used the split_libraries_illumina.py QIIME script to initially process reads. To minimize the effects of sequencing errors, we retained only high-quality, full-length reads (max_bad_run_length was set to 0 and the min_per_read_length was assigned to 101). We next used the script parallel_pick_otus_uclust_ref.py to pick OTUs; this script relies on UCLUST [40], which can perform gapped alignments against reference sequences. We used as a reference a set of OTUs assembled at 97% similarity from the Greengenes database [41] (constructed by the nested_gg_workflow.py QiimeUtils script on 4 Feb 2011 [20]). We trimmed the reference FASTA file to span only the 16S region sequenced by our primers.

Sample quality control

We used pairwise similarity between samples to identify, and subsequently correct or exclude cases of mislabeling or mishandling that may have occurred in our sample processing pipeline. Based on this analysis, we excluded Subject B gut samples from days 229 and 230, which showed an unexpected similarity to those of Subject A. We also excluded a subset of gut samples stored in either ethanol (Subject A gut days 75 and 76) or RNAlater (Subject A gut days 258 to 270) prior to DNA extraction. We chose not to include these samples in our analysis since their storage protocol differed from other samples and could introduce a bias in our results. Finally, samples with unusually low read counts (<10,000) were excluded from further analysis.

Host metadata

We collected metadata chronicling host health and behavior using iOS devices. We modified a database iOS app (TapForms) to facilitate recording subjects’ daily health and behavior across 13 metadata categories: ailments, bowel movements, daily notes, diet, exercise, fitness, location change, medication, mood, oral hygiene, sleep, urination, and vitamin intake (described in more detail in Additional file 12). At the beginning of the study, subjects were familiarized with the TapForms app and instructed to carry their iOS devices at all times. We asked subjects to record daily health markers and actions relevant to the metadata categories. We then used a custom Python script to parse the TapForms SQL database and generate metadata time series for correlation with OTUs. A template of the TapForms forms used in this study can be downloaded and installed from the GitHub repository [42].

Augmented Dickey-Fuller (ADF) stationarity testing

We used the ADF test [22] to determine if and when microbial taxa were at equilibrium. We employed the ADF test implemented in the Statsmodels Python module [43]. This method (‘adfuller’) was run on time series mean-centered in log10-space and was called with regression parameters of no constant and no trend (regression = ‘nc’). The number of lags was chosen using the t-statistic (autolag = ‘t-stat’). Test results were compiled for the 100 most abundant OTUs in each microbial community. We chose not to include less-abundant OTUs in these analyses because the dynamics of OTUs near our detection limit are more likely to be noisy. We used the P-test [44] implemented in PyCogent [45] (set to 1,000 permutations and the ‘corrected’ P value) to measure the significance of non-stationary OTU clustering on the reference Greengenes 16S rRNA tree.

Host factor/OTU correlation detection

We constructed a time series analysis pipeline to detect relationships between host metadata and the microbiota. Our pipeline integrated steps to account for numerical artifacts associated with microbiota studies, low abundance OTUs, auto-correlated time series, and to reduce multiple hypothesis testing (Additional file 13). We designed several of these steps to avoid finding spurious correlations between variables, which can commonly occur in time series analysis [46, 47]. To facilitate understanding novel components of our pipeline, we have provided online demonstrations and Python code for time series normalization [48] and detrending auto-correlated time series [49].

Normalization

Sequencing-based 16S rRNA surveys are usually normalized by converting OTU sequence counts into fractional abundances for each sample. However, this standard technique leads to what is known as compositional effects [50], and may cause false relationships between OTUs, or between OTUs and metadata. For example, suppose a host switches to a higher fiber diet, which causes fiber-sensitive gut bacteria to multiply, but does not affect fiber-independent OTUs. Standard normalization will suggest the diet shift leads to more fiber-sensitive bacteria and fewer fiber-independent bacteria because the latter group comprises a smaller fraction of the post-diet shift bacterial community. One might then falsely conclude that fiber actively inhibits fiber-independent OTUs when in fact those OTUs do not respond to changes in the nutrient’s availability.

To avoid normalization artifacts when comparing host metadata and microbiota, we developed a novel normalization technique that does not assume sample reads sum to the same fixed value (see subsection below, Comparison to SparCC, for more details on how this method differs from our previous work on microbiota correlation). The method we introduce here assumes at least half of the OTUs held in common between two communities do not change in abundance. We then use statistics robust to the effect of outliers to estimate the median OTU fold-change between communities and rescale all OTUs by that value (Additional file 14). Mathematically, we model OTUs in two samples x (the observed) and y (the reference) as y i = m i x i and find the median m i , which we call m. We then rescale all x i by m. Unlike the standard normalization, our technique does not infer abundance changes in all OTUs when only a small number actually change. Moreover, since our method does not require that each samples’ reads sum to the same value, we can compare total bacterial load between samples.

Our regression technique is implemented as follows. First, we normalized time points in the standard manner so that all fractional OTU abundances each day sum to 1. Second, we restricted our analysis to a subset of highly-abundant OTUs, since regularly undetected OTUs will have a zero or undefined m i . We then sorted OTUs by abundance and selected the first set of OTUs that accounted for 90% of median daily reads. Third, we randomly chose time points to normalize. We normalized each time point to a reference community (rather than a single time point), to minimize the effects of anomalous time points during normalization. We did not use the same reference community for each time point since multiple microbiota states may exist in a single time series (for example, Subject B before and after diarrheal infection). Rather, we computed a reference for each sample based on other time points with similar community structure. We used a weighted median across all time points to compute reference OTU values, where we set time point weights to be (1 - j)2 and j was the pairwise JSD score to the sample being normalized. Our OTU abundance data appeared heteroscedastic, meaning that the variance of more abundant OTUs was higher than the variance of less abundant OTUs. We applied a common solution to this problem, which was to solve for m i in log-space. Fourth, we discarded time points with an uncertain estimate of m (median(|y i - mx i |) > 0.4).

Because no gold standard dataset matches sequencing-base 16S surveys to overall bacterial load in the human gut, we used simulated data to test our normalization scheme. We modeled synthetic microbial communities on microbiota observed in our experiments. Each OTU in our synthetic community behaved according to an Ornstein-Uhlenbeck (OU) process, which can be thought of as a random-walk modified to mean-revert over time. We simulated an OU process using the following function [51], where S i is OTU abundance at time i, λ describes how quickly the process returns to the mean, μ is the mean, σ the average magnitude of fluctuations, and δ is the time between simulation steps (we set to 1):

S i + 1 = S i e - λδ + μ 1 - e - λδ - σ 1 - e - 2 λδ 2 λ N 0 , 1

We calculated maximum likelihood OU parameters for 3,383 OTUs drawn from Subject B’s gut time series using the following system of equations [51]:

S x = ∑ i = 1 n S i - 1 S y = ∑ i = 1 n S i S xx = ∑ i = 1 n S i - 1 2 S xy = ∑ i = 1 n S i - 1 S i S yy = ∑ i = 1 n S i 2

μ = S y S xx - S x S xy n S xx - S xy - S x 2 - S x S y λ = - 1 δ ln S xy - μ S x - μ S y + n μ 2 S xx - 2 μ S x + n μ 2 α = e - λδ σ ¯ 2 = 1 n S yy - 2 α S xy + α 2 S xx - 2 μ 1 - α S y - α S x + n μ 2 1 - α 2 σ 2 = σ ¯ 2 2 λ 1 - α 2

To test our normalization technique in the face of microbiota perturbations, we simulated two OU processes for each OTU: one calibrated using Subject B’s pre-infection samples and one using post-infection samples. We joined these processes (each with 50 time points) into a single time series (100 time points). Lastly, we simulated daily microbiota surveys by randomly sampling OTUs according to their fractional abundance at each time point. We randomly chose total read counts from the set of read counts observed in Subject B’s microbiota time series.

Our robust regression accurately normalized the simulated time series. We evaluated our method by first tracking changes in bacterial load over time within our simulated communities. We compared these changes to bacterial load predictions from normalized time series of synthetic sequencing runs. We note that the standard normalization approach cannot infer bacterial load changes, since it predicts each samples’ OTU abundances sum to the same value. Over four separate synthetic datasets, the Spearman correlation between simulated bacterial loads and our inferred bacterial loads was never less than 0.58 (all P values ≤1.37e-10; Additional file 15), even after accounting for the autocorrelated nature of total bacterial load (see section below on Autocorrelation elimination).

Comparison to SparCC

Surveys of 16S rRNA are usually treated as fractional abundances, rather than absolute ones. This traditional approach leads to read totals that sum to 1, meaning fractions cannot change independently of each other; this in turn could lead to false relationships between OTUs, or between OTUs and metadata [52]. We have previously developed a method, termed SparCC, for inferring correlations between OTU from genomic survey data [50]. However, SparCC applies to independent samples, while this study is concerned with autocorrelated time series. Moreover, in this study we are interested in inferring correlations between OTUs and metadata, whereas SparCC focuses solely on inter-OTU correlations. We therefore introduced here a new method for normalizing 16S rRNA time series, as well as finding metadata-OTU correlations.

OTU filtering

We only tested relationships between common OTUs (present in at least half of a given period’s samples) and host metadata. Focusing on common OTUs increased the likelihood we detected true interactions, since we could analyze shifts in bacterial abundance and not simply OTU presence or absence. Moreover, filtering out OTUs reduced the total number of statistical tests we performed and thus reduced the burden of multiple hypothesis testing. After filtering, 750 OTUs, 621 OTUs, and 289 OTUs remained from Subject A’s gut, Subject B’s gut, and Subject A’s salivary microbiota time series, respectively.

Autocorrelation elimination

Autocorrelated processes occur when measurements taken at one time point are correlated with measurements at previous or future time points. For example, subjects’ weights in this study are autocorrelated, as their weight on a given day is likely to be highly similar to their weight the previous day. This poses a challenge for finding statistical relationships between host metadata and their microbiota, because it is well-known in time series analysis that cross-correlations between autocorrelated variables have unreliable P values [46, 47]. To avoid this problem, we fitted time series models to each variable and computed cross-correlations on the differences (residuals) between modeled trends and the observed data [47]. For the microbial time series, we use the R (ver. 2.15.1) ‘forecast’ package to fit standard time series models known as autoregressive integrated moving average (ARIMA) models [53]. ARIMA models are commonly used tools in econometrics and time series analysis to model longitudinal data [46]. We fit ARIMA parameters using the ‘auto.arima’ function with a maximum p of 2, a maximum q of 2, and a d of either 0 or 1. We chose which value of d to use by minimizing a common measure of model complexity (Bayesian information criterion). We applied a similar procedure to the host metadata, except in the case of variables whose behavior appeared binary (that is, in at least 75% of the time series, the variable had a value of zero). Because ARIMA models may not be appropriate for non-continuous time series [54], we used a logistic regression model designed for binary longitudinal data (the R ‘bild’ package [55]). We calculated two kinds of serial dependence models (first-order and second-order) for each metadata variable and again picked the one that minimized a measure of model complexity (the Akaike information criterion). In all cases, if the autocorrelation of the residual time series was higher than the autocorrelation of the original time series itself, we discarded the residual series and worked only on the original data.

Clustering

To further reduce the number of tested microbial and metadata interactions, we clustered OTUs sharing similar temporal dynamics. We computed pairwise distances between OTUs as 1-ρ, where ρ was the Spearman correlation (‘rcorr’ function in the R ‘hmisc’ package [56]) between the OTUs’ time series. We rounded negative distances up to 0. We next passed the OTU distance matrix to the ‘linkage’ function in the SciPy [57] (ver. 10.1) hierarchical clustering package (scipy.cluster.hierarchy). We used the ‘weighted’ linkage method to compute OTU clustering. Cluster assignments were retrieved using the ‘fcluster’ function with the clustering criterion set to ‘distance’ and a clustering threshold of 80% of the maximum distance between nodes in the linkage matrix. This pipeline produced 138 clusters for Subject A’s gut time series, 90 clusters for Subject B’s gut time series, and 46 clusters for Subject A’s salivary time series (Additional file 16). We modeled cluster dynamics using the median OTU value at each time point over all the OTUs within the cluster. Lastly, to again guard against autocorrelations in the cluster time series, we fit ARIMA models to each cluster and computed residual time series.

Correlation

We used rank-based non-parametric statistics to detect correlations between time series of detrended OTU clusters and detrended metadata. We lagged the cluster time series between -7 and +7 days, relative to the metadata, and computed Spearman correlations again using the ‘rcorr’ function in the R ‘hmisc’ package. Microbial or metadata variables with high autocorrelation (P <0.01) were excluded from analysis. We estimated false discovery rates separately for a given lag and body site (‘fdrtool’ R package [58]). As a final check for spurious cross-correlations [47], we excluded interactions that when regressed against each other, exhibited auto-correlated errors (P <0.01, Durbin-Watson test [59]).

Disturbance analyses

To simplify our analysis of how OTUs responded to prolonged travel abroad or enteric infection, we constructed a clustering pipeline similar to the one used in our host-factor/microbiota correlation testing. We inputted standardly normalized time series into this pipeline because our robust regression-based normalization routine could not confidently infer scaling factors during both Subject A and B’s diarrheal illnesses. We also used a slightly more permissive clustering threshold than the previous section (90% of the maximum distance between nodes in the linkage matrix) because we wanted to study broad bacterial trends and not more minor OTU dynamical patterns. This clustering pipeline yielded 11 OTU clusters for both subjects’ time series (Additional file 17).

We used Fischer’s exact test (SciPy.stats) to determine if clustered OTUs shared significant phylogenetic similarity. We used the Greengenes 16S tree from OTU picking as our reference phylogeny and the PyCogent library [45] to analyze this tree. Closely-related taxa with similar temporal dynamics could reflect sequencing errors associated with a single strain; this artifact could in turn cause falsely-significant phylogenetic grouping. To control for such errors, we collapsed subtrees of OTUs sharing the same cluster assignment and leaf pairwise distances less than 0.2 down to a single leaf. We then used the collapsed reference tree to construct a 2X2 contingency table with rows counting how many OTUs were part of, or excluded from, a given set of clusters and columns counting how many OTUs were within, or outside of, a given subtree.

Data availability

The read data for all samples have been deposited in the European Bioinformatics Institute (EBI) European Nucleotide Archive (ENA) under the nucleotide accession number ERP006059. Subject A’s nutritional metadata (that is, estimated daily intake of calories, total fat, saturated fat, cholesterol, protein, sodium, carbohydrates, fiber, sugars, and calcium) are provided under the group accession number SAMEG179160 and can also be found in Additional file 18. For other metadata requests, please contact the corresponding author.