Participants

Study participants consisted of individuals registered with the Netherlands Twin Register (NTR) (N = 419, 272 females, 147 males). The majority of individuals assayed were MZ twins and their spouses (MZ N = 332 (166 pairs), DZ N = 6 (3 pairs), spouses of MZ twins = 42, unrelated individuals = 39). Within the group of MZ twins, 45 pairs still cohabitated (mean age = 23.33, range 19-68), and 121 pairs no longer did so (mean age = 35.76, range 19-59, minimum live-apart time: 1 year, mean live-apart time: 17.77 years).

Fecal collection, DNA isolation and sequencing

Fecal samples collected from participants were stored at 4 °C until delivered to the laboratory within a 36-h period. Anaerocult was used in order to preserve anaerobic species present within a sample. The samples were homogenized, aliquoted, and stored at − 80 °C until used for microbial DNA extraction. DNA extraction was performed using the Qiagen Powersoil kit with the addition of the heating step from the Powerfecal kit (heating at 65 °C for 10 min). Resulting microbial DNA was subjected to PCR in order to amplify the V4 region of the 16S rRNA gene. The resulting library fragments were normalized using the SequalPrep normalization plates (Invitrogen, Carlsbad, CA). The libraries were pooled and analyzed via an Agilent Bioanalyzer trace. Samples were split into two sequencing runs in order to increase sample read depth. Samples were grouped together by family groups (twins, spouses) in order to make sure all samples from a family were sequenced in the same sequencing run. Sequencing data was generated on the MiSeq platform, using a 2 × 251 paired-end sequencing run with 20% Phix to increase base diversity during the run.

Sequence processing

Sequence processing was carried out as previously described [24, 25]. Briefly, after the DNA sequencing process, demultiplexed forward and reverse reads were obtained after the DNA sequencing process using Mothur (version 1.39.5 )[26]. Forward and reverse reads were overlapped in order to obtain contigs. We subsequently screened to discard reads longer than 275 base-pairs or reads that contained any ambiguous base calls. Unique reads were then aligned to a trimmed version of the SILVA (version 128) database containing the V4 region of the 16S rRNA gene. Reads that fell outside of this region were discarded. Performing the preclustering step, reads that only differed by up to two nucleotides were grouped. Chimera detection was performed using the VSEARCH algorithm (version 2.3.4) and probable chimeric reads were removed. Sequences were classified using a Bayesian classifier trained on the RDP database (version 16). Non-bacterial reads were removed from downstream analysis. After the aforementioned quality control process, sequence reads were clustered into species level operational taxonomic units (OTUs) at a 97% similarity cutoff through the use of the Opti-Clust algorithm [27], a de novo OTU clustering method implemented in Mothur (version 1.39.5). The formed OTUs were taxonomically labelled using the consensus taxonomy for each OTU. In order to explore higher taxonomic levels, phylotype binning was performed based on the classification of each sequence read. Phylotyping was performed at both the genus and family levels using Mothur (version 1.39.5). Total reads for each sample were subsampled to the depth of the sample with the lowest sequence coverage (16,242 reads). After subsampling, alpha diversity was calculated for each sample in the form of Shannon and Chao1 indices. Beta diversity measures were also generated by computing Bray-Curtis dissimilarity measures between all individuals. A mock community was also sequenced along with the samples. Analysis of the mock community sequences after the sequence QC process determined the error rate to be 0.00253%.

Cardiometabolic and inflammatory factor measurement

Data on a number of cardiometabolic and inflammatory factors were available for all of participants within the study. The measurement of these factors has been previously described elsewhere as well as the criteria for exclusion based on laboratory measurements, such as the measurements falling outside the limit of detection for a particular assay [23]. Cardiometabolic measures included body mass index (BMI), waist-hip ratio (WHR), LDL-cholesterol, HDL-cholesterol, triglycerides, glucose, insulin, systolic blood pressure (SBP), and diastolic blood pressure (DBP). Inflammatory traits included fibrinogen, interleukin-6 (IL-6), C-reactive protein (CRP), and tumor necrosis factor alpha (TNF-α). These data were used to generate disease burden scores separately for both inflammatory and cardometabolic traits. Disease burden scores were standardized (i.e., mean of zero, standard deviation of one). To ensure that an increase in the variables assayed is associated with an increase in disease burden, the scale of some variables (e.g., HDL) were reversed by multiplying the standardized score by − 1. Next, following the metabolic syndrome definition of the American Heart Association, we summed Z-scores for WHR, triglycerides, HDL-cholesterol, SBP and glucose to a single cardiometabolic burden score. There were 416 individuals that had valid measurements for all of the cardio-metabolic factors utilized. An additional pro-inflammatory burden score was computed by summing the Z-scores for fibrinogen, IL-6, CRP, and TNF-α. There were 401 individuals that had valid measurements for all of the inflammatory factors utilized.

Statistical analyses

After generation of Shannon and Chao1 indices for all participants, we sought to compare the resemblance of gut microbiota alpha diversity between individuals forming a twin or spouse pair and individuals sharing and not sharing a household. Pearson correlations in alpha diversity were computed in 1) cohabitating MZ twin pairs, 2) he non-cohabitating MZ twin pairs, 3) spouse pairs, and 4) pairs of randomly selected unrelated individuals who did not share a household. Selection of the latter pairs was performed in a manner that ensured the resulting unrelated pairs were not matched with the spouse of a co-twin and that both unrelated individuals were sequenced in the same sequencing batch. Matching a twin to the spouse of a co-twin could possibly inflate the level of similarity between the unrelated individuals whereas inclusion of unrelated pairs derived from multiple sequencing batches could artificially inflate the dissimilarity of the unrelated individuals relative to the various family pairings (MZ twins or spouse pairs), which were always sequenced in the same sequencing batch. To confirm the effects of the household on adult gut microbiota composition, Bray-Curtis (BC) dissimilarity measures based on the species OTU counts were calculated. BC measures of cohabitating and non-cohabitating twins were compared using a t-test with 10,000 permutations. Likewise, the BC measures of cohabitating spouse pairs were compared to that of non-cohabitating unrelated pairs who were sequenced in the same sequencing batch (to account for the fact that all family members – twins and spouses- were in the same sequencing batch).

Finally, restricting the our analyses to OTUs present in 40% of individuals, we computed the intraclass correlation coefficients (ICCs) in the four different sets of pairings (cohabitating MZ pairs, non-cohabitating MZ twin pairs, unrelated spousal pairs sharing a household, and unrelated opposite sex pairs not sharing a household) for individual species level OTUs to detect household effects on specific OTUs. OTUs were restricted to those present in 40% of the individuals to ensure to restrict the range of sample sizes used to estimate the ICC’s. Otherwise vastly different sample sizes. (i.e. ranging from 5 to 419) would cause strong sampling variation bias in the estimation of the ICCs. Unrelated opposite sex pairs were generated in a similar manner to the pairs of unrelated individuals described above with the addition of making sure the unrelated individuals were of the opposite sex and within 4 years of age. The threshold age difference was chosen because the mean difference in age between the spouses in the data was 3.4 years. We also tested the group differences in ICCs at the family, genus and species level using an F-test, with a Bonferroni correction on the subsequent p-values to account for multiple testing.

We considered OTUs to be affected by the shared household if they had significant intrapair similarity in cohabitating MZ pairs and significant intrapair similarity in spouse pairs but not in non-cohabitating MZ twin pairs or unrelated opposite sex pairs with a similar age distribution. For completeness we also identified OTUs with significant intrapair similarity in MZ pairs, whether cohabitating or non-cohabitating, and interpreted these as reflective of predominant genetic effects.

Given that mode of birth (cesarean section vs. vaginal birth) has been shown to at least influence the gut microbiota composition at least temporarily [28], we repeated all analyses after the exclusion of the individuals born via cesarean section (N = 43) to see how this impacted the results.