The 46 preliminary targets identified from the literature and available clinical tests comprise 15 genera and 31 species. To optimize the bioinformatics pipeline for accurate detection of the maximum number of targets, the following performance metrics were evaluated based on the number of true positives (TP), true negatives (TN), false positives (FP), and false negatives (FN) detected in a manually curated amplicon database (described in S1 Doc): specificity = TN / (TN + FP); sensitivity = TP / (TP + FN); precision = TP / (TP + FP); and negative predictive value (NPV) = TN / (TN + FN). After optimization, 28/46 preliminary targets passed our stringent threshold of 90% (red vertical line) for each of the metrics, resulting in the accurate detection of all genera (light blue) except for Pseudoflavonifractor, and 14/31 species (dark blue).

The bioinformatics annotation pipeline developed for this method was specifically designed to have high prediction performance. To this end, we implemented a taxonomy annotation based on sequence searches of 100% identity over the entire length of the 16S rRNA gene V4 region from the preliminary targets in our database (S1 Doc). Curated databases were generated for each of the taxa in our preliminary target list using the performance metrics sensitivity, specificity, precision, and negative predictive value as optimizing parameters. In other words, the bioinformatics pipeline was optimized to ensure that a positive result truly means the target is present in the sample and a negative result is only obtained when no target is present in the sample. After optimizing the confusion matrices for all preliminary targets, 28 out of 46 targets passed our stringent threshold of 90% for each of the parameters (Fig 2). The resulting target list is composed of 5 known pathogens, 3 beneficial bacteria, and 20 additional microorganisms related to various gut afflictions (S2 Table), including commensal bacteria and one archaeon. On average, the sensitivity, specificity, precision, and negative predictive value of the microorganisms on our target list are 99.0%, 100%, 98.9%, and 100% for the species, and 97.4%, 100%, 98.5%, and 100% for the genera.
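The four metrics and the 90% pass criterion described above can be sketched in a few lines of Python. This is only an illustration, not the pipeline used in the study: the names `ConfusionCounts`, `performance_metrics`, and `passes_threshold` are hypothetical, and treating a value of exactly 90% as a pass is an assumption.

```python
from dataclasses import dataclass

THRESHOLD = 0.90  # the 90% cutoff stated in the text


@dataclass
class ConfusionCounts:
    """Confusion-matrix counts for one target (hypothetical container)."""
    tp: int  # true positives
    tn: int  # true negatives
    fp: int  # false positives
    fn: int  # false negatives


def _ratio(numerator, denominator):
    # Return None when the metric is undefined (zero denominator).
    return numerator / denominator if denominator else None


def performance_metrics(c):
    """Compute the four metrics as defined in the text."""
    return {
        "sensitivity": _ratio(c.tp, c.tp + c.fn),
        "specificity": _ratio(c.tn, c.tn + c.fp),
        "precision": _ratio(c.tp, c.tp + c.fp),
        "npv": _ratio(c.tn, c.tn + c.fn),
    }


def passes_threshold(c, threshold=THRESHOLD):
    """A target passes only if every metric is defined and meets the cutoff.

    Assumption: 'passing' means >= threshold; the text does not say whether
    the comparison is strict.
    """
    return all(
        value is not None and value >= threshold
        for value in performance_metrics(c).values()
    )
```

For instance, a target with made-up counts `ConfusionCounts(tp=95, tn=900, fp=1, fn=5)` would pass, whereas one with a sensitivity of 80% would be dropped from the panel even if its other three metrics were perfect.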

To derive a preliminary target list of bacteria and archaea to include in our assay, we first identified clinically relevant microorganisms present in the human microbiome. We performed an extensive review of the literature and clinical landscape, and obtained evidence supporting the importance of hundreds of microorganisms known to inhabit the human gut. We included these in our initial list, along with organisms that are commonly interrogated in clinical tests. This initial list was further evaluated for positive and negative associations with several indications, including flatulence, bloating, diarrhea, gastroenteritis, indigestion, abdominal pain, constipation, infection, inflammatory bowel syndrome, ulcerative colitis, and Crohn's disease-related conditions. Ultimately, we compiled a preliminary target list containing 15 genera and 31 species of microorganisms associated with human health status (S1 Table), including pathogenic, commensal, and probiotic bacteria and archaea.

Healthy participant stool microbiome data were analyzed to determine the empirical reference ranges for each target. The boxplot displays the relative abundance for each of 897 self-reported healthy individuals, revealing the healthy ranges of abundance for the taxa in the test panel. The healthy distribution is used to define the 99% confidence interval (red line). Boxes indicate the 25th–75th percentile, and the median coverage is indicated by a horizontal line in each box. Even in this healthy cohort, many of the bacteria that are associated with poor health conditions are present at some level. Because most taxa are absent in a substantial number of individuals, most boxes extend down to 0%, the healthy lower limit (not shown).

Many clinically relevant microorganisms associated with health and disease are present at some level in the gut of healthy individuals. The clinical significance of microbiome test results is determined not only by the identity, but also by the quantity of distinct species and genera within the context of a healthy reference range. To determine the healthy reference range for the 28 targets, we established a cohort of 897 samples from self-reported healthy individuals from the uBiome microbiome research study (manuscript in preparation). Microbiome data from this cohort were analyzed to determine the empirical reference ranges for the 14 species and 14 genera. For each of the 897 samples, we determined the relative abundance of each target within the microbial population. This analysis gave rise to a distribution of relative abundance for each target in the cohort (Fig 3, S3 Table). These data were used to define a central 99% healthy range with confidence intervals for each target. Many of the targets show significant spread, emphasizing the importance of microbiome identification in the context of a reference range. For example, the pathogen C. difficile is found in ~2% of the healthy cohort, and thus we define a healthy range for it from 0% to 0.18% relative abundance. Although C. difficile is an opportunistic pathogen that can cause severe diarrhea, especially among antibiotic-treated hospitalized patients [29], our results confirm that asymptomatic C. difficile colonization is not uncommon in healthy individuals [30]. Although all taxa were present in at least one of the healthy individuals, the upper limit of the reference range of the relative abundance was found to be quite high for some taxa (e.g., 63% for Prevotella and 49% for Bifidobacterium). Two species are not represented at all within the central 99% of the healthy cohort: Vibrio cholerae and Ruminococcus albus. The absence of V. cholerae is suggestive of its pathogenic nature and its relatively rare occurrence in the developed world. However, R. albus has previously been found to be enriched in healthy subjects in comparison to patients with Crohn’s disease [31].
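As a rough illustration of how a central 99% range could be derived from per-sample relative abundances, the sketch below takes the 0.5th and 99.5th percentiles of the cohort distribution. The text does not state which percentile estimator or confidence-interval procedure was used, so the interpolation method here is an assumption, and the function names are hypothetical.

```python
from statistics import quantiles


def relative_abundance(target_reads, total_reads):
    """Relative abundance of one target in one sample, in percent."""
    if total_reads <= 0:
        raise ValueError("total_reads must be positive")
    return 100.0 * target_reads / total_reads


def central_99_range(abundances):
    """Return (lower, upper) bounds of the central 99% of a cohort.

    Assumption: bounds are the 0.5th and 99.5th percentiles with linear
    interpolation ('inclusive' method); the study's estimator may differ.
    """
    if len(abundances) < 2:
        raise ValueError("need at least two samples")
    # n=200 yields 199 cut points at 0.5%, 1.0%, ..., 99.5%.
    cuts = quantiles(abundances, n=200, method="inclusive")
    return cuts[0], cuts[-1]
```

With a cohort in which a taxon is absent from most samples, the lower bound returned by `central_99_range` is 0%, which matches the observation that the healthy lower limit is 0% for most targets.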

Commercially available verification samples (Luminex) containing real or synthetic stool samples positive for at least one control taxon from the target panel were tested using the DNA extraction, amplification and bioinformatics pipeline described in this paper. Of the 35 samples on this panel, 33 yielded 10,000 or more reads. Together, these 33 samples contained the 5 pathogenic taxa in our target list, all of which were accurately identified at a level above the maximum value of the healthy range (red lines). All 33 control samples tested within the healthy range for the remainder of the taxa on our panel (not shown), and thus were considered negative for the pathogenic taxa shown here. Five samples positive for Yersinia, a genus that is not present in our target list, were included as additional negative controls. These samples are visualized for the Escherichia-Shigella genus as they contained DNA for this taxon within the healthy range.

After establishing our ability to detect all 28 targets using synthetic DNA at relative abundances of 0.03% or more (S2 Doc, S4 Table), we tested 40 reference isolates from Luminex’s xTAG Gastrointestinal Pathogen Panel to establish the clinical relevance of our pipeline. These verification samples comprise real or synthetic stool samples with live or recombinant material of known composition. Two of the samples were excluded due to poor sequencing depth. The remaining samples were positive for 1 of 8 different bacterial strains corresponding to 5 of our clinical targets: V. cholerae (5), S. enterica (5), Escherichia-Shigella (13), Campylobacter (5) and C. difficile (5). All of these verification samples were correctly identified as having a relative abundance of the clinical target well above our defined healthy reference range (Fig 4). Five samples containing Yersinia were tested as a negative control. Although Yersinia was included in our preliminary target list, it did not pass our stringent bioinformatics QC thresholds for accurate identification. As expected, the relative abundance of the 28 clinical targets was in the healthy range for the Yersinia-positive samples, as shown for Escherichia-Shigella (Fig 4).
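The decision logic implied here, excluding low-depth samples and then comparing each target's relative abundance with its healthy range, can be sketched as follows. This is a minimal illustration under stated assumptions: the 10,000-read minimum is taken from the Fig 4 description, the only range value taken from the text is the 0% to 0.18% range for C. difficile, and the function name, labels, and data layout are hypothetical.

```python
MIN_READS = 10_000  # depth cutoff mentioned for the verification samples


def classify_sample(abundances, healthy_ranges, read_count, min_reads=MIN_READS):
    """Label each target as 'above', 'below', or 'within' its healthy range.

    abundances:     {target: relative abundance in percent} for one sample
    healthy_ranges: {target: (lower, upper)} in percent
    Returns None when the sample has too few reads to be evaluated.
    """
    if read_count < min_reads:
        return None  # excluded for poor sequencing depth
    calls = {}
    for target, (lower, upper) in healthy_ranges.items():
        value = abundances.get(target, 0.0)  # undetected targets count as 0%
        if value > upper:
            calls[target] = "above"
        elif value < lower:
            calls[target] = "below"
        else:
            calls[target] = "within"
    return calls
```

Under this scheme, a verification sample spiked with C. difficile would be reported as "above" for that target and "within" for the others, while a Yersinia-only sample would be "within" for every target on the panel.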

Clinical relevance

Accurate detection of microorganisms in the context of a healthy reference range can be of great use to physicians. All of the 28 microorganisms successfully identified using 16S rRNA gene sequencing are associated with specific health conditions. For example, 2 of the microorganisms on our panel, Escherichia-Shigella and Ruminococcus, are associated with Crohn’s disease [32–37], while 5 other organisms, Akkermansia muciniphila, Bifidobacterium, Dialister invisus, Odoribacter and Roseburia, are inversely associated with Crohn’s disease [32,35–38] (Fig 5, S2 Table). To help diagnose and monitor this condition and distinguish it from other conditions with other microbial associations, it is essential to sequence a panel of microorganisms. The combinatorial information of which organisms are outside of the healthy range can be used by a physician to augment a treatment plan. For example, a physician might recommend reducing the intake of animal-based diets and diets high in resistant starches to reduce Ruminococcus [39–41], and consuming probiotics, inulin, and oligofructose to increase levels of Bifidobacterium [42,43].

Fig 5. Human health associations of the 28 target microorganisms. All of the 28 taxa on the test have been associated with human health in the gut microbiome. Here we show the associations for 13 specific conditions. Thirteen of the taxa are associated with health conditions, meaning that these microorganisms have been shown to be elevated in patients suffering from these conditions. The 11 inversely associated microorganisms have been reported in the scientific literature to be less abundant in people who have these conditions (S2 Table). Four taxa are associated with some conditions and inversely associated with others. Interestingly, both elevated and reduced levels of Lactobacillus have been associated with obesity [44–46]. https://doi.org/10.1371/journal.pone.0176555.g005

The accurate detection of a great number of microorganisms within a stool sample is critical to initiate the appropriate treatment in a clinical setting. Here we have shown that 16S rRNA gene sequencing can accurately detect and quantify clinically relevant levels of 28 target bacteria and archaea. We demonstrate that many prokaryotic targets identified from the literature as associated with human health can be consolidated in an assay, and further that relating the relative levels of bacteria and archaea to a healthy reference range enables the reporting of positive results only when clinically relevant.

The selection of microorganisms for this panel was based on studies in medical journals and peer-reviewed articles. While all targets are relevant on their own, there is some overlap in the consolidated test. For example, while the Salmonella genus is unquestionably clinically relevant, testing for the genus when the test already includes the Salmonella enterica species might be clinically redundant. The only other species of Salmonella is Salmonella bongori, a species that rarely infects humans and is mostly relevant to lizards [47]. In our dataset of nearly 900 stool samples from healthy individuals, eight samples tested positive for the genus-level Salmonella target (S3 Table). In 6 of these, the relative Salmonella-genus abundance was less than 0.01%, the clinical relevance of which remains unclear. In one of the two remaining subjects, both Salmonella-genus and S. enterica abundance values were 0.674%, suggesting the same target was detected. In the remaining sample, Salmonella-genus was present at 1.84% but S. enterica was not detected, suggesting that this individual might have been colonized with S. bongori. Of note, none of these individuals reported having gastrointestinal problems. It remains to be determined whether these low counts of Salmonella are suggestive of the presence of clinically irrelevant, yet-uncharacterized strains, as has been reported in cattle [48].

While medical diagnosis has traditionally been focused on pathogens, research on the whole microbiome and its correlations with gut health continues to emerge [6,20]. The test panel presented here reports on some microorganisms that are not usually interrogated in the clinic but provide additional insight into the overall gut health of a patient in a clinical setting (S2 Table). Because our detection method is based on DNA sequencing, the target panel can readily be expanded if new information about clinically important microorganisms arises. Because 16S rRNA gene sequencing identifies and quantifies the bacteria and archaea in a sample, relevant microbial metrics such as a microbiome diversity score can also be obtained, in addition to the information about individual targets, to provide a comprehensive overview of gastrointestinal health [49,50].
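The text does not specify which diversity score is meant. As one common choice, the Shannon index can be computed directly from per-taxon read counts; the sketch below assumes that metric, uses the natural logarithm, and the function name is hypothetical.

```python
import math


def shannon_diversity(counts):
    """Shannon index H = -sum(p_i * ln p_i) over taxa with nonzero counts.

    Assumption: Shannon diversity stands in for the unspecified
    'microbiome diversity score' mentioned in the text.
    """
    total = sum(counts)
    if total <= 0:
        raise ValueError("counts must sum to a positive number")
    h = 0.0
    for count in counts:
        if count > 0:
            p = count / total  # proportion of reads assigned to this taxon
            h -= p * math.log(p)
    return h
```

A sample dominated by a single taxon scores 0, and the score rises as reads are spread more evenly across more taxa.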

As with any rRNA gene-based test, this assay has limitations. The test only detects and analyzes a short, specific genomic region, and taxonomic resolution or functional inference is therefore limited. For example, this assay cannot recognize the different serovars within S. enterica, or detect toxin genes that could distinguish pathogenic C. difficile or Escherichia strains from nonpathogenic strains, or resolve species within some of the genus-level targets. The correlation, or lack thereof, of 16S rRNA-based phylogenetic sequence identities with taxonomic levels such as genus or species has been extensively discussed elsewhere [51–54].

16S rRNA gene sequencing as a clinical screening tool for gut-related conditions has many advantages over traditional culture-based techniques, including ease of sampling, scalability of the test, no need for human interpretation, and the ability to provide additional information about gut health. Most importantly, it can determine the relative abundances of multiple microbial targets, and can therefore be used to detect potential deviations of one or many taxa from that of a healthy cohort. Defining the healthy ranges for gut microbes with known clinical relevance, as done in this study, is likely to bring the analysis of the composition of the gut microbiome one step closer to being part of routine health care analysis [55–57]. Thus, this method of detection for multiple clinically relevant microbial targets is a promising addition to current diagnostic techniques and treatment options.