LEfSe is an algorithm for high-dimensional biomarker discovery and explanation that identifies genomic features (genes, pathways, or taxa) characterizing the differences between two or more biological conditions (or classes) (Figure 1). It emphasizes statistical significance, biological consistency and effect relevance, allowing researchers to identify differentially abundant features that are also consistent with biologically meaningful categories (subclasses; see Materials and methods). LEfSe first robustly identifies features that are statistically different among biological classes. It then performs additional tests to assess whether these differences are consistent with respect to expected biological behavior; for example, given some known population structure within a set of input samples, is a feature more abundant in all population subclasses or in just one? Specifically, we first use the non-parametric factorial Kruskal-Wallis (KW) sum-rank test [49] to detect features with significant differential abundance with respect to the class of interest; biological consistency is subsequently investigated using a set of pairwise tests among subclasses using the (unpaired) Wilcoxon rank-sum test [50, 51]. As a last step, LEfSe uses LDA [52] to estimate the effect size of each differentially abundant feature and, if desired by the investigator, to perform dimension reduction.

Figure 1 LEfSe mines a wide range of high-throughput genetic data to find biologically relevant features characterizing one or more experimental conditions. The inputs to the system are the specifications of the biological hypothesis under investigation (conditions and inter-condition sample groupings), the high-dimensional data obtained experimentally, and, optionally, prior knowledge from literature or databases used to define known relationships between features (used for meaningful hierarchical organization of the discovered biomarkers) or samples (used for testing biological consistency of potential biomarkers). LEfSe is a three-step algorithm (detailed in Figure 6). (a) LEfSe first provides the list of features that are differential among conditions of interest with statistical and biological significance, ranking them according to the effect size. (b) For problems with known hierarchical structure, either phylogenetic or functional, we then provide a mapping of the differences to taxonomic or functional trees. (c) Finally, the system produces a histogram visualizing the raw data within the specified problem structure for each relevant feature. While LEfSe has been developed primarily for metagenomic data containing taxon or gene abundances, it can be used for biomarker discovery in any setting where prior biological knowledge regarding the structure of a comparison is coupled with statistically significant differences in high-dimensional genomic features. KEGG, Kyoto Encyclopedia of Genes and Genomes; WGS, whole genome shotgun. Full size image

We have specifically designed LEfSe for biomarker discovery in metagenomic data. We thus summarize our results here from applying the tool to 16S rRNA gene and whole genome shotgun datasets to detect bacterial organisms and functional characteristics differentially abundant between two or more microbial environments. These include body sites within human microbiomes (mucosal surfaces and aerobic/anaerobic environments), adult and infant microbiomes, inflammatory bowel disease status in a mouse model, bacterial and viral environmental communities, and synthetic data for quantitative computational evaluation.

Taxa characterizing body sites within the human microbiome

Microbial community organization at multiple human body sites is an area of active current research, since both low- and high-throughput methods have shown both differences and overlaps among the microbiota of multiple body sites [53, 54]. We examined these differences in the 16S-based phylometagenomic dataset from 24 individuals enrolled in the Human Microbiome Project [13, 55]. A minimum of 5,000 16S rRNA gene sequences were obtained for 301 samples from 24 healthy subjects (12 male, 12 female) covering 18 body sites, including 6 main body site categories: the oral cavity (9 sub-sites sampled), the vagina (3 sub-sites sampled), the skin (2 sub-sites sampled), the retroauricular crease (2 sub-sites sampled), the nasal cavity (1 sample) and the gut (1 sample). We validated LEfSe by contrasting mucosal versus non-mucosal body site classes and by comparing three levels of aerobic environments (anaerobic, microaerobic, and aerobic). In both cases, the sub-sites within each class of body site were used as a biological subclass.

Mucosal surfaces are colonized by diverse bacteria; non-mucosal microbiomes are strongly enriched for Actinobacteria

Our first analysis focused on differences in microbiota composition between mucosal and non-mucosal body sites. The oral cavity, gut, and vaginal sites were classified as sources of mucosal communities and the anterior fossa (skin), nasal cavity, and retroauricular crease as non-mucosal. Mucosal environments differ greatly from the other body sites, characterized primarily by interaction with the human immune system, oxidative challenge, and hydration [56].

LEfSe provides three main outputs (Figure 2), describing the effect sizes of differences observed among mucosal/non-mucosal communities, the phylogenetic distribution of these differences based on the Ribosomal Database Project (RDP) bacterial taxonomy [57], and the raw data driving these effects. LEfSe detected 15 bacterial clades showing statistically significant and biologically consistent differences in non-mucosal body sites (Figure 2a).

Figure 2 LEfSe results on human microbiomes. (a-c) Mucosal body site analysis. Mucosal microbial communities are diverse, while non-mucosal body sites are characterized by several clades, including the Actinobacteria. The analysis reported here is carried out on initial data from the Human Microbiome Project [55, 56] assigning the main body sites to mucosal and non-mucosal classes, and using the body sites as subclasses. These graphical outputs were generated by the publicly available LEfSe visualization modules applied on the analysis results and integrating microbial taxonomic prior knowledge [58]. (a) Histogram of the LDA scores computed for features differentially abundant between mucosal and non-mucosal body sites. LEfSe scores can be interpreted as the degree of consistent difference in relative abundance between features in the two classes of analyzed microbial communities. The histogram thus identifies which clades among all those detected as statistically and biologically differential explain the greatest differences between communities. (b) Taxonomic representation of statistically and biologically consistent differences between mucosal and non-mucosal body sites. Differences are represented in the color of the most abundant class (red indicating non-mucosal, yellow non-significant). Each circle's diameter is proportional to the taxon's abundance. This representation, here employing the Ribosomal Database Project (RDP) taxonomy [58], simultaneously highlights high-level trends and specific genera - for example, multiple differentially abundant sibling taxa consistent with the variation of the parent clade. (c) Histogram of the Actinomycetales relative abundances (in the 0[1] interval) in mucosal and non-mucosal body sites. Subclasses (specific body sites) are differentially colored and the mean and median relative abundance of the Actinomycetales are indicated with solid and dashed lines, respectively. (d,e) Aerobiosis analysis. The cladograms report the taxa (highlighted by small circles and by shading) showing different abundance values (according to LEfSe) in the three O 2 -dependent classes as described in Results; for each taxon, the color denotes the class with higher median for both the small circles and the shading. (d) The strict (all classes differential) version of LEfSe detects 13 biomarkers whereas (e) the non-strict (at least one class differential) version of LEfSe detects 60 microbial biomarkers with abundance differential under aerobic, anaerobic, or microaerobic conditions. Additional file 2 reports the non-strict version of LEfSe focused on the Firmicutes phylum, highlighting several low-O 2 specific genera within Ruminococcaceae and Lachnospiraceae. Full size image

The most differentially abundant bacterial taxa in non-mucosal body sites belong to phyla with prevalent aerobic members: Actinobacteria, Firmicutes, and Proteobacteria, including environmental organisms from the Betaproteobacteria and Gammaproteobacteria clades. Non-mucosal overrepresented genera include Propionibacterium, Staphylococcus (found exclusively in non-mucosal samples), Corynebacterium, and Pseudomonas. Also of note is the relevant representation of plastids from plant organisms (chloroplasts), for which the distribution of associated taxa varies, as some are limited to non-mucosal surfaces (environmental exposure and potentially cosmetic products) and others to the digestive track (ingested food). No clades are consistently present in all mucosal body sites, demonstrating the β-diversity of these communities (that is, differences among their population structure), but many taxa within Actinobacteria, Bacillales, and several other clades are relatively abundant at all non-mucosal sites. The within-subject β-diversity at all phylogenetic levels is highlighted in Additional file 1, quantifying the extent to which distances among different mucosal body sites are larger than the equivalent distances among non-mucosal sites. This leads to a lack of taxa common to all mucosal body sites, and therefore no taxa are determined by LEfSe to be characteristic of the mucosa as a whole.

The Actinomycetales are usually the most abundant phylogenetic unit (order level) in non-mucosal communities, with percentages higher than 90% in several skin samples and at most 20% in the great majority of the oral mucosal samples and substantially lower in the vagina and gut (Figure 2c). From a quantitative viewpoint, the taxonomic order Actinomycetales makes up essentially all of the detected members of the phylum Actinobacteria, except in the vaginal site, which reported a substantial Bifidobacteriales presence. Bifidobacteriales themselves are not detected as differentially abundant between mucosal and non-mucosal body sites, since this is a feature only of the vaginal samples and not of all mucosal body sites. The contrast of many clades' abundance versus distribution is striking; for example, the genera Alloscardovia, Parascardovia and Scardovia are present in all body sites at very low abundances, while Gardnerella is overrepresented only in vaginal samples, with over three orders of magnitude difference in abundance. A similar commonality of distribution was found for the Bacillales at an even lower abundance. At the genus level, Propionibacterium, Staphylococcus, Corynebacterium and Pseudomonas are differentiated by both distribution and abundance. The Staphylococcus genus in particular is detected by LEfSe with a very high LDA score (more than five orders of magnitude), reflecting marked abundance in non-mucosal sites (mean 10%, 18% and 21% in the skin, retroauricular crease and anterior nares body sites, respectively) and consistently low abundance in mucosal sites (mean less than 0.001%).

Classes with multiple levels: distinct aerobic, anaerobic, and microaerobic communities in the human microbiome

The roles of anaerobic metabolism in the commensal human microbiota have not yet been fully investigated due to the difficulty of studying these communities in culture. We thus further investigated the aerobicity characteristics of human microbial communities at a high level by grouping body sites into three classes with distinct levels of available molecular oxygen. The high-O 2 exposure class includes body sites directly and permanently exposed to oxygen: skin, anterior nares and retroauricular crease. The mid-O 2 exposure class includes the oral and vaginal body sites that can be directly, but not permanently, atmospherically exposed, and the low-O 2 exposure class (the gut) is mainly anaerobic. The body sites included in the three classes may have other distinguishing features in addition to different oxygen exposure and, in general, these confounding factors can cause features unrelated with aerobiosis to be detected as biomarkers. However, the LEfSe biological consistency step assures that the detected biomarkers are characteristic of all the subclasses of a given class and with respect to all subclasses of the other classes. For example, the high-abundance of a bacterial clade in the mouth due to an oral-specific niche is not detected as a biomarker unless the same niche is also present in the vaginal samples (the other body site in the mid-O 2 class) and not present in any high-O 2 or low-O 2 single body sites. So LEfSe will detect biomarkers more confidently connected with the aerobiosis characteristics than traditional methods that do not incorporate subclass information. Moreover, LEfSe is specifically able to analyze ordinal classes with multiple levels, and in agreement with established microbiology, we observed specific microbial clades ubiquitous within and characteristic to each of these three environments, detailed as follows (Figure 2d).

LEfSe allows ordinal classes with more than two levels to be analyzed in two different stringencies. The first requires significant taxa to differ between every pair of class values (that is, aerobicity in this example; see Materials and methods); the discovered biomarkers must accurately distinguish all individual classes (high-, mid-, and low-O 2 ). In this example (Figure 2d; strict version), we detected 13 clades with LDA scores above 2, showing three distinct abundance levels. Alternatively, LEfSe can determine significant taxa differing in at least one (and possibly multiple) class value(s) (non-strict version); in other words, biomarkers that distinguish at least one individual class. Using this method (Figure 2e), we find 60 clades with LDA scores of at least 2.

Using either approach, each oxygen level is broadly characterized by a specific clade. The overall abundances of the Actinobacteria phylum are higher in body sites directly exposed to molecular oxygen with several members of the Actinomycetales order that colonize the skin. Actinomycetales includes the Propionibacterium genus, which is highly abundant on the skin, low in moderate-O 2 environments, and absent from the gut. The Lactobacillales (primarily Bacilli) are specific to moderate O 2 exposure levels, with conversely lower presences in the high-O 2 exposure class, and are again absent from the gut. The Bacteroidaceae (particularly Bacteroides) are ubiquitous in anaerobic samples; interestingly, however, members of this family are more abundant in high oxygen availability conditions (particularly in skin and retroauricular crease) than in medium oxygen availability, showing the niche diversity within the phylogenetic branching. This is in agreement with observations that the microenvironment of many microbial consortia shows extreme biogeographical variation with respect to nutrients, metabolites, and oxygen availability [58, 59].

Bifidobacteria and additional clades are underrepresented in a mouse model of ulcerative colitis

Rodent models have been established to provide a uniquely accurate and tractable model for studying the gut microbiota, including the molecular and cellular mechanisms driving chronic intestinal inflammation [60–63]. In particular, mouse models of inflammatory bowel disease [63] facilitate a mechanistic evaluation of the contribution of the gut microbiota to the initiation and perpetuation of chronic intestinal inflammation, as occurs in human Crohn's disease and ulcerative colitis [64]. One host molecular mechanism known to maintain the balance between immune regulation and the commensal microflora is T-bet, a transcription factor expressed in many immune cell subsets. Its loss in the absence of an adaptive immune system results in a highly penetrant and aggressive form of ulcerative colitis [65] that is specifically dependent on and transmissible through the gut flora. We thus sought to investigate the characteristics of the fecal microbiota in a mouse model of spontaneous colitis that occurs in a colony of Balb/c T-bet-/- × Rag2-/- mice using 16S rRNA gene metagenomic data [66, 67].

LEfSe was applied to the microbiota data of 20 T-bet-/- × Rag2-/- (case) and 10 Rag2-/- (control) mice (dataset provided in Additional File 10), finding 19 differentially abundant taxonomic clades (α = 0.01) with an LDA score higher than 2.0 (Figure 3). These differentially abundant clades were consonant with both our prior 16S rRNA-based sequence analysis using complete linkage hierarchical clustering and quantitative real time PCR-based experiments performed on the same fecal DNA samples [67]. More specifically, the marked loss in Bifidobacteriaceae and Bifidobacterium associated with T-bet-/- × Rag2-/- we observed here may explain the positive responsiveness of this colitis to a Bifidobacterium animalis subsp. lactis fermented milk product validated with low-throughput approaches [67].

Figure 3 Comparison between Rag2-/- (control) and T-bet-/- × Rag2-/- (case) mice highlighting that, at the phylum level, Firmicutes are enriched in T-bet-/- × Rag2-/- mice, whereas Actinobacteria are enriched in Rag2-/- mice. In agreement with previous culture-based studies, Bifidobacterium species are underabundant in T-bet-/- × Rag2-/- mice [68], and LEfSe highlights several additional genus-level clades, including the specifically depleted Roseburia and Papillibacter within the otherwise overabundant Firmicutes. Full size image

At the family level, the Rag2-/- enrichment of Bifidobacteriaceae, Porphyromonadaceae, Staphylococcaceae and the T-bet-/- × Rag2-/- enrichment of Lachnospiraceae confirm our reports in [68] using culture-based and quantitative real time PCR techniques. LEfSe's LDA score more informatively reorders these taxa relative to the P-values found for these families in our previous work, highlighting the Bifidobacteria and, interestingly, several clades within the Clostridia. These include the Rag2-/--specific Roseburia and Papillibacter genera belonging to T-bet-/- × Rag2-/--specific families (Lachnospiraceae and Ruminococcaceae). The significant presence of Metascardovia (Bifidobacteriaceae) in Rag2-/- mice is also interesting, as it may have a role similar to Bifidobacterium and because Metascardovia has been previously observed primarily in the oral cavity [68]. This analysis both highlights the agreement of LEfSe's effect size estimation with respect to low-throughput confirmations and suggests additional clades to be further investigated experimentally.

A comparison with current metagenomic analysis tools using viral and microbial pathways from environmental data

We applied LEfSe to the environmental data of [69], a dataset with the goal of characterizing the functional roles of viromes (that is, viral metagenomes) versus microbiomes (that is, bacterial metagenomes). This task was used in [45] to characterize the Metastats algorithm on the same raw data. Among the 29 high-level functional roles (including unclassified roles) in the subsystem hierarchy of the SEED [70] and NMPDR [71] frameworks, LEfSe identifies only the 'Nucleosides and nucleotides' subsystem to be strictly differentially abundant among all environmental subclasses, specifically with higher levels in viromes than microbiomes. This is an accurate characterization of exactly the protein function most commonly encoded in viral genomes, whereas bacterial genomes of course encode a wide range of less specifically enriched functionality. When LEfSe is relaxed to detect significant variations consistent for at least one, rather than all, environmental subclasses, we additionally determine the 'Respiration' subsystem to be significantly enriched in microbiomes with respect to viromes, likely reflecting the uniformly aerobic bacterial metabolism captured by these data.

In addition to the Nucleosides and nucleotides and Respiration subsystems, Metastats [45] reports five other high-level functional roles as differentially abundant (P = 0.001). However, when taking the subclass structure into account across the sampled environments, these additional differences show much less consistent variation. This is demonstrated in Figure 4, which reports histograms of raw data for these cases and the different results of LEfSe, Metastats and the KW test alone. Moreover, since the subsystem framework is hierarchical (three levels), LEfSe's results include a cladogram showing the significant differences on each level (see Figure 4 for a two-level cladogram, and Additional file 2 for a three-level cladogram).

Figure 4 LEfSe highlights pathways consistently differential between bacterial microbiomes and viromes within diverse environmental subclasses. (a) Using the SEED [71] catalog of functional pathways, LEfSe reports Nucleoside and nucleotide metabolism and Respiration to differ consistently between bacterial microbiomes and viromes across environmental samples described in [70]. The former is significant using the strictest all-subclasses test, the latter in the more lenient one-subclass test. (b) A two-level cladogram reporting the significant pathway differences as visualized using the SEED hierarchy (see Additional file 3 for the three-level cladogram and detailed differences). (c) Metastats [45] reports four additional pathways differential among these data (Carbohydrates, DNA metabolism, Membrane transport and Nitrogen metabolism). Using only the KW test portion of LEfSe (α = 0.05), we obtain results consonant with Metastats (excluding Nitrogen metabolism). However, as shown here, an overview of the abundance histograms of these subsystems demonstrates them to be less consistent across environments (for example, Coral and Hyper-saline subclasses in the Carbohydrates, Membrane transport and Nitrogen metabolism) and to lose significance within individual subclasses (as for the DNA metabolism subsystem). Full size image

Considering all three levels of SEED functional specificity, LEfSe reports 59 subsystems to be more abundant in microbial metagenomes and only 7 that are more abundant in viral metagenomes (Additional file 3). Bacterial genomes encode a much greater quantity and diversity of biomolecular functionality than most viral genomes, and these differences are thus to be expected. However, they also highlight a consideration specific to most metagenomic (and, more generally, ecological) analyses, which typically analyze relative abundances. A few very common subsystems in viromes (that is, Nucleosides and nucleotides) will force the relative abundance of all other subsystems to decrease, resulting in apparent under-abundance. The subsystems detected to be virus-specific may thus show this trend in part due to the normalization of abundances in each sample. This issue is specific to neither LEfSe nor Metastats, however, and must be taken into account during interpretation of any relative abundance data, metagenomic or otherwise [72].

Functional activity within the infant and adult microbiota indicates post-weaning microbial specialization

Just as LEfSe can determine whether organisms or pathways are differentially abundant among several metagenomic samples, it can also focus on individual enzymes or orthologous groups. Kurokawa et al. [73] analyzed 13 gut metagenomes from nine adults and four unweaned infants in terms of the functions of orthologous gene families. They originally did this by comparing the COGs [74, 75] found in each metagenome to a reference database; later, White et al. [45] applied the Metastats algorithm to directly detect differences between infant and adult microbiomes. Using significance α values of 0.01 due to the low cardinality of the classes (in particular the infant class), LEfSe detected 366 COGs to be enriched in either adult or infant metagenomes, 17 of which have a LDA score higher than 3 (Additional file 4).

Among the 17 COG profiles with LEfSe scores higher than 3, 11 are also detected by Metastats. The six COGs not detected by Metastats (Additional file 5) are Outer membrane protein (COG1538) and Na+-driven multidrug efflux pump (COG0534), enriched in adults, and Transposase and inactivated derivatives (COG2801, COG2963), Transcriptional regulator/sugar kinase (COG1940) and Transcriptional regulator (COG1309), enriched in infants. All six COGs possess abundance profiles that are completely non-overlapping between infant and adult individuals (apart from COG1538, in which the lowest level in adults is slightly lower than the highest in infants) and are thus nominally quite discriminative. On the other hand, among the 192 COGs found by Metastats, 9 are not detected by LEfSe even at the lowest LDA score threshold (Additional file 6). All possess overlapping abundance values between infant and adult classes (at least two, and often more, of the highest samples in the less abundant class overlap the putatively more abundant class). This lack of discriminatory power precludes LEfSe from highlighting the differences as significant between adults and infants, particularly given the low number of infant samples.

Intriguingly, LEfSe's distinct list of functional activities in the core infant and adult microbiomes is suggestive of 'generalist' microbial activity during early life and specialization over time [76]. In fact, inspecting the five differentially abundant COGs with the highest effect sizes for each class, we find for infants very high-level functional groups related to broad transcriptional regulation (COG1609, COG1940, COG1309 and COG3711). In adults, all five represent more specialized orthologous groups, including COG1629 (Outer membrane receptor proteins, mostly Fe transport), COG1595 (DNA-directed RNA polymerase specialized sigma subunit, sigma24 homolog), and COG4771 (Outer membrane receptor for ferrienterochelin and colicins). Since the number of differentially abundant COGs is very high (366), this observation was only highlighted at the top of the candidate biomarker list due to LEfSe's effect size quantification, which allows the most characteristic differences among classes to emerge. For the same reason, we can easily confirm that sugar metabolism plays a crucial role in the infant gut and iron metabolism in adults, as already stated in [45, 73]; the COGs with the highest LDA scores indeed possess sugar and glucose functional activities for infants and iron-related functionality for adults.

LEfSe achieves a very low false positive rate in synthetic data

We further investigated the ability of LEfSe to detect biomarkers using synthetic high-dimensional data (see Materials and methods for the description of the dataset) in comparison with the KW test alone (a non-parametric adaptation of the analysis of variance (ANOVA)) and with Metastats [45]. The LDA effect size step of LEfSe is not considered here for simplicity, and the artificial data are detailed in Figure 5.

Figure 5 Comparison of LEfSe and the KW test alone for false positive and negative rates in synthetic data. Both tests used α = 0.05 in all cases, and the three artificial datasets comprise 100 samples, each in two classes, each with two subclasses of cardinality 25. The samples consist of 1,000 synthetic features taking the place of microbial taxa, pathways, and so on; half are negative (not biomarkers) and the other half positive. (a) LEfSe and KW false positive and negative rates at increasing values of the difference between class means. Negative features are normally distributed with parameters (μ = 10,000, σ = 100) across classes; positive features contain classes with increasingly different means. (b) Performance as standard deviation varies within classes (rather than the difference between means, fixed at 2,000). (c) Performance as standard deviation increases within inconsistent subclasses. Negative features have subclasses sampled from the same normal distribution (and thus not representing consistent biomarkers). Positive features are distributed as in (b). In all cases, LEfSe sacrifices a small number of false negatives in order to achieve a false positive rate near zero, with the goal of ensuring that biomarkers of large effect size will be both reproducible and biologically interpretable. Full size image

Theoretically, the settings of the first two experiments (Figure 5a,b) exactly match the application conditions for the KW test. The false positive rate (mean 2.5%, regardless of the distance between feature means and of the standard deviation of the normal distribution) is in fact consistent with the α value of 0.05, given that the negative features are half of the total. LEfSe behaved qualitatively very similar to KW, but with a considerably lower false positive rate (less than 0.5% in the great majority of the cases against a mean value of 2.5%) and a higher false negative rate. In biology, false positives are often perceived as more dramatic than false negatives [77–79]; this is often attributable to the fact that it is undesirable to invest in expensive experimental follow-up of false positives, whereas in high-throughput settings, a few true positives outweigh the false negatives that are left uninvestigated. With this motivation for minimizing false positives, we conclude that LEfSe performs at least as well as KW when no meaningful subclass structure is available. On the other hand, when subclasses can be identified internally to the classes and some of them do not agree with the trend among classes, LEfSe performs qualitatively and quantitatively much better than KW (Figure 5c). The false positives are in fact always substantially lower than KW, whereas the false negatives are higher only for very noisy features. Metastats [45] seems to achieve results very similar to KW (Additional file 7) with the same disadvantages with respect to LEfSe.