Study Approval

All volunteers provided written informed consent to participate in this study. All protocols and consent forms have been approved by the GHESKIO and Weill Cornell Medicine institutional review boards. All methods and procedures were performed in accordance with the relevant institutional guidelines and regulations.

Patient Recruitment and Protection of Human Subjects

Subjects were enrolled through the Tri-Intuitional Tuberculosis Research Unit (TBRU) in conjunction with the GHESKIO Centers, in Port-au-Prince, Haiti, where all participants provided written, informed consent. All TBRU protocols and consent forms for samples collected at GHESKIO were approved by Institutional Review Boards at the GHESKIO and Weill Cornell Medicine (see Study Approval). A dedicated clinical field team at the GHESKIO Centers in Port-au-Prince, Haiti recruited research volunteers as part of the NIH U19-funded Tuberculosis Research Unit (AI111143). Patient Mtb-infection status is determined using quantiferon IFNγ release assay (IGRA) status, and active TB disease is determined using standard clinical assessments. All cases with active pulmonary TB receive periodic follow-up appointments while on treatment, and anyone with known contact to an active TB patient receives a six-month follow-up and is re-screened for IGRA status. All patient samples were de-identified on site using a barcode system before they were shipped to NYC for analysis. Human DNA was decontaminated from metagenomic shotgun sequencing data before analysis and publication, consistent with the removal of all biometric identifiers according the Health Insurance Portability and Accountability Act30. All clinical metadata was collected on site and managed through the REDCap data management system31.

Clinical characteristics of study groups from the TBRU study

We recruited four groups of individuals using a cross-sectional research study design. To characterize the intestinal microbiomes of individuals from the Haitian population, we recruited two groups of control individuals, 50 with no Mtb infection (IGRA-), and 25 latently infected by Mtb (LTBI), as defined by a positive Interferon Gamma Release Assay (IGRA) test. To determine the effect of HRZE antimycobacterial treatment on the intestinal microbiome, we recruited 19 volunteers currently on treatment with HRZE for drug sensitive Tuberculosis. 3 of these treated individuals were on TB therapy for longer than the standard 6 months, due to clinician discretion (see Table 1). In addition, to determine the duration of the microbiome perturbation of HRZE treatment, we recruited 19 previously treated cases who were cured of active TB. The clinical characteristics of the groups are given in Table 1. To appropriately control for age, we divided our LTBI group into two distinct control subgroups, designated LTBI (treatment control) and LTBI (cured control), since microbiome composition can vary significantly with age32. Given the age range of the treatment and cured patients, we used controls under the age of 33 years old for the treatment control group, and controls under the age of 30 for the cured control group. All subjects are HIV negative. However, other clinical variables, such as diabetes history, were not available.

DNA extraction from stool

Stool specimens were collected and stored for less than 24 hours at 4 °C, aliquoted (~2 mL each), frozen at −80 °C, and shipped to NYC. ≈500 mg of stool from frozen samples was suspended in 500 μl of extraction buffer (200 mM Tris-HCl, pH = 8.0; 200 mM NaCl; 20 mM EDTA), 210 μl of 20% SDS, 500 μl of phenol/chloroform/isoamyl alcohol (25:24:1), and 500 μl of 0.1-mm-diameter zirconia/silica beads (BioSpec Products). Samples were lysed via mechanical disruption with a bead beater (BioSpec Products) for two minutes, followed by two extractions with phenol/chloroform/isoamyl alcohol (25:24:1). DNA was precipitated with ethanol and sodium acetate at −80 °C for 1 hour, re-suspended in 200 μl of nuclease-free water, and further purified with the QIAamp DNA Mini Kit (Qiagen) according to the manufacturer’s protocols, including Protein removal by Proteinase K treatment. DNA was eluted in 200 μl of nuclease-free water and sorted at −20 °C.

16S rDNA sequencing

Primers used to amplify rDNA were: 563 F (59-nnnnnnnn-NNNNNNNNNNNN-AYTGGGYDTAAAGN G-39) and 926 R (59-nnnnnnnn-NNNNNNNNNNNN-CCGTCAATTYHTTTR AGT-39). Each reaction contained 50 ng of purified DNA, 0.2 mM dNTPs, 1.5 μM MgCl 2 , 1.25 U Platinum TaqDNA polymerase, 2.5 μl of 10 × PCR buffer and 0.2 μM of each primer. A unique 12-base Golay barcode (Ns) preceded the primers for sample identification after pooling amplicons. One to eight additional nucleotides were added before the barcode to offset the sequencing of the primers. Cycling conditions were the following: 94 °C for 3 min, followed by 27 cycles of 94 °C for 50 s, 51 °C for 30 s and 72 °C for 1 min, where the final elongation step was performed at 72 °C for 5 min. Replicate PCRs were combined and were subsequently purified using the Qiaquick PCR Purification Kit (Qiagen) and Qiagen MinElute PCR Purification Kit. PCR products were quantified and pooled at equimolar amounts before Illumina barcodes and adaptors were ligated on using the Illumina TruSeq Sample Preparation procedure. The completed library was sequenced on an Illumina Miseq platform per the Illumina recommended protocol.

16S Bioinformatics Analysis

For 16S MiSeq sequencing, paired-end reads were joined, demultiplexed, filtered for quality using maximum expected error (Emax = 1), and dereplicated. Sequences were grouped into operational taxonomic units (OTUs) of 97% distance-based similarity using UPARSE33. Potentially chimeric sequences were removed using both de novo and reference-based methods (where the Gold database was used for the latter)34. Taxonomic assignments were made using BLASTN35 against the NCBI refseq_rna database with custom scripts36. Our approach allows for the identification of the top 30 taxa associated with a particular OTU, thus the taxonomic nomenclature that we use for 16S is versatile. This OTU calling data is available in Supplementary Tables 2 and 3: Supplementary Table 2 has OTU BLASTN results for the LTBI (treatment control) and Treatment cohorts, and Supplementary Table 3 has OTU BLASTN results for LTBI (cured control) and Cured cohorts. A biological observation matrix (biom)37 file, a taxonomy file, reference sequence file, and tree file were constructed using QIIME commands. These files were imported into R38 and merged with a metadata file into a single Phyloseq object39. Phyloseq was used for all downstream analysis of 16S taxonomic data, and plots were made with the ggplot2 package40.

Shotgun Metagenomic Sequencing

Between 150 and 200 ng of DNA isolated from stool (vide supra) was sheared acoustically. Hiseq sequencing libraries were prepared using the KAPA Hyper Prep Kit (Roche). PCR amplification of the libraries was carried out for 6 cycles. Samples were run on a Hiseq 4000 in a 125 bp/125 bp paired end run, using the TruSeq SBS Kit v3 (Illumina). There were an average number of read pairs per sample of around 11 million.

Shotgun Bioinformatics Analysis

For the analysis of shotgun metagenomic reads, sequences were first trimmed and removed of host contamination using Trimmomatic41 and Bowtie242. Host-decontaminated reads were then profiled for microbial species abundances using Metaphlan243, and for abundance of Uniref gene and KEGG orthologs, and functional pathways (Metacyc pathways, KEGG pathways, and KEGG modules) using the software pipeline HUMAnN222 and in-house written scripts (available upon request). Normalized taxonomic, gene, and pathway abundances were then used for downstream statistical analysis in R. All intestinal microbiome samples were sequenced using 16S rDNA sequencing, however, only a subset of controls were sequenced using metagenomics. Due to sample size limitations, for the metagenomic DNA sequencing comparisons, we combined both Mtb uninfected and LTBI individuals into a healthy control group which was used as the comparator for metagenomic analyses.

Statistical Analysis

The ability to detect differentially abundant OTUs between groups of people is critical for comparison between groups, and various methods exist and have been validated for this sort of analysis. For 16S rDNA sequencing, we employed the tools available within the Phyloseq package to manipulate the data and metadata for downstream analysis. Raw counts with taxonomy and metadata were piped into the DESeq2 package for differential abundance analysis using the negative binomial distribution assumption with zero inflation19. This method assumes that for many OTUs, the variance in abundance (i.e., read count) between samples or groups exceeds the mean read count (often zero). When this is true, the DESeq method can be used to transform the data so that between sample or between group differences may be compared more accurately. Homoscedastic abundance data was used to generate heatmaps in Fig. 3c and d, by applying a variance stabilizing transformation from fitted dispersion-means to transform the count data. We additionally employed the microbiome-friendly linear discriminant analysis, effect size (LEfSe) tool20 to detect statistically significant differences between clinical groups. This technique first employs the non-parametric Kruskal-Wallis (KW) sum-rank test between different groups of people (i.e., healthy [comprised of Mtb uninfected and LTBI], on HRZE treatment, or cured), followed by linear discriminant analysis to estimate the size of the effect (i.e., the degree of significant differential abundance between a particular OTU, taxa, gene, or pathway between groups). We attempted to employ both the DESeq2 and LEfSe methods, and try to emphasize where there is overlap. All figures in the paper that are related to 16S sequence analysis are plotted using the normalized and transformed abundances from the DESeq2 package. For the statistical analysis of the results from shotgun metagenomics reads, data were imported into R and converted to Phyloseq objects with custom scripts. Custom code implementing non-parametric tests (Wilcoxon-signed rank) with FDR correction (Benjamini & Hochberg method) as well as LEfSe20 were used to test for differential abundances for taxa, and functional pathways. For the LTBI-Treatment and LTBI-Cured comparisons p-value threshold was kept at 0.05 for both the initial Kruskall-Wallis test and the subsequent sex-matched subclasses Wilcoxon-signed rank tests. We additionally employed the Permanova and Betadisper tests using the adonis function in the Vegan package in R. Adonis partitions a distance matrix of OTU count data and runs an analysis of variance between groups of samples. Betadisper further supports this conclusion by determining if the variance between the two groups is similarly distributed. All box-and-whisker plots were generated with the ggplot240 function geom_boxplot, which shows the first and third quartiles of the dataset and the median of the data in the box, the whiskers show 1.5 times the value of the interquartile range of the box hinge, and outliers are shown as dots. All other plots were made using Prism 7.

Data Availability

All sequencing data and computer code, as well as metadata supporting the findings of this study are available from the corresponding authors upon request.