Sampling strategy

With the specific aim to cover as much of the breadth of vertebrate hosts animal diversity as possible, we collected fresh fecal samples from the five host classes Mammalia, Aves, Reptilia, Amphibia, and Actinopterygii. Sampling was mostly restricted to animals living in the wild, with some additional samples originating from domesticated livestock and pets (Supplementary Data 1). We generally excluded samples from zoo animals (20 of the 39 samples from captive animals) because artificial habitat, diet, and medication may have strong confounding effects on the natural intestinal communities. No samples were collected from aquariums. The majority of the samples were collected in Central Europe and supplemented with samples from other regions to cover phylogenetic groups lacking extant members in this region (e.g., Afrotheria, Marsupialia, Primates, or Cetacea). To ensure sample origin, samples were gathered by specialized wildlife biologists doing research on the host species in the field. In total, the dataset includes 213 samples from 128 species, each with detailed diet, habitat, and additional metadata (Fig. 1). The number of samples per species varied from 1 to 11 (mean = 1.7), with 50 species having ≥2 samples (Supplementary Fig. 2).

Fig. 1 Phylum-level grouping of microbiome diversity by host phylogeny and host metadata. a The dated host phylogeny was obtained from http://timetree.org, with branches colored by host class (purple = Actinopterygii; orange = Amphibia; green = Reptilia; red = Aves; blue = Mammalia). From inner to outer, the data mapped onto the tree is host diet (general), host diet (detailed breakdown), host habitat, host captive/wild status, the microbiome sample type, and the relative abundances of microbial phyla in each host. Relative abundances are an estimated average generated via subsampling operational taxonomic units from all samples for each host species (subsampling to 5000 for each host species). Note that “Diet (detailed)” information varies among some individuals, and the values shown are averages of the binary yes/no values (no = 0; yes = 1) for each individual. For example, the Giraffa camelopardalis samples are from two captive and two wild individuals, so the dietary information somewhat differs, resulting in intermediate values (orange). b, c show the number of samples or host species per class colored by captive/wild status or diet, respectively. Source data are provided as a Source Data file Full size image

Low prevalence and limited representation of isolates

We sequenced the 16S rRNA V4 region from feces or gut contents of all 213 samples and generated OTUs (resolved at 100% sequence identity) with the DADA227 pipeline, which produced a total of 30,290 OTUs. Most OTUs (98%) were only detected in ≤5% of samples (Supplementary Fig. 3), which may be due to the high taxonomic and ecological diversity of the hosts. Therefore, we utilized presence–absence for all subsequent OTU-based analyses unless noted otherwise (e.g., for abundance-based beta-diversity metrics). At the phylum level, two clades were found in at least one individual per species: Firmicutes (mainly Clostridia) and Proteobacteria (mainly Betaproteobacteria and Gammaproteobacteria). The next most prevalent phyla were Actinobacteria and Bacteroidetes, which were found in 87% and 86% of host species, respectively (Supplementary Fig. 3).

Mapping phylum-level relative abundances onto the host phylogeny revealed some clustering of microbiome composition by host clade and diet (Fig. 1). Notably, hosts from the same species generally showed similar phylum-level abundances (Supplementary Fig. 2). We quantified this clustering of microbiome composition on the host tree by calculating beta-dispersion (beta-diversity variance within a group) at each host taxonomic level (class down to species), and indeed we found beta-diversity to be constrained (more clustered) at finer taxonomic resolutions regardless of the beta-diversity metric (Supplementary Fig. 4).

Many of the phylum-level distributions resembled observations from other studies. For instance, Actinopterygii (i.e., ray-finned fishes) samples were mostly dominated by Proteobacteria (Fig. 1), which is consistent with a meta-analysis of fish gut microbiomes28. Proteobacteria and Firmicutes were dominant in the Chiroptera species, as seen previously21. Fusobacteria abundance ranged from 6% to 35% among the Crocodylus species, which is reflective of high Fusobacteria abundance previously observed in alligators29. Spirochaete showed high clade specificity for Perissodactyla, Artiodactyla, and Primates, which matches previous observations30,31,32. The CKC4 phylum, which lacks cultured representatives, was markedly abundant in many Actinopterygii samples, reflecting its previous observation in marine species33,34.

Given the potential for observing novel cultured and uncultured microbes among the phylogenetically diverse and mostly wild hosts, we assessed how many OTUs in the dataset were closely related to cultured and uncultured representatives in the SILVA database. We found that the vast majority (~67%) lacked a BLASTn hit to a cultured representative at a 97% sequence identity cutoff (Supplementary Fig. 5A). Even at a 90% cutoff, ~27% of OTUs lacked a representative. Most OTUs lacking a representative were Bacteroidetes or Firmicutes (46% and 12%, respectively; Supplementary Fig. 5B). Mammalia hosts possessed the majority of OTUs lacking closely related cultured representatives, but still hundreds of OTUs, mainly belonging to Actinobacteria, Proteobacteria, and Verrucomicrobia phyla, were associated with non-mammalian hosts (Supplementary Fig. 5C). In regard to completely novel diversity, ~22% of the OTUs lacked any representative sequence in the entire SILVA r132 database at a 97% sequence ID cutoff. These novel OTUs showed a similar taxonomic composition and distribution among host classes as those OTUs lacking cultured representatives (Supplementary Fig. 5).

Altogether, our assessment of OTU distribution and taxonomy in our dataset revealed that (i) OTUs are sparsely distributed, (ii) host phylogeny constrains beta-diversity, (iii) taxonomic compositions of many host species in our dataset correspond with findings from other studies, and (iv) many OTUs in our dataset, especially those observed in non-mammals, lack cultured representatives.

Host phylogeny and diet explain microbiome diversity

We utilized multiple regression on matrices (MRMs) to test how well gut microbiome diversity could be explained by host phylogeny, diet, habitat, geographic location, and technical variation. We chose MRMs because host phylogeny and geographic location can be directly represented as distance matrices (patristic distance and great circle distance, respectively) and measuring host phylogenetic similarity as a continuous variable (patristic distance) versus a discrete variable (taxonomic groupings) alleviates imbalances in representation for specific host taxonomic groups (e.g., Mammalia was highly represented). Host metadata that could not inherently be described as a distance matrix (e.g., the diet components of each species) were converted to distance matrices by various means (see “Methods”). We had no data on the genetic similarity of individuals within host species, and thus we conducted our analysis at the species level. To estimate the effects of intra-species variation in host microbiome and metadata on our MRM analysis, we performed the analysis on 100 subsampled datasets, each comprising one randomly selected sample per species. Unless noted otherwise, we used this sensitivity analysis approach for all hypothesis testing in this study (see “Methods”).

Each of our four MRM models (one per diversity metric) had a significant overall fit (p < 0.005 for all MRM models). Host diet and phylogeny were the only significant explanatory variables (Fig. 2). Diet explained a substantial amount of alpha- and beta-diversity variation (~20–30%) and was significant for all diversity metrics tested (i.e., Shannon index, Faith’s PD, unweighted Unifrac, and weighted Unifrac). However, host phylogeny was only significant for unweighted Unifrac and explained approximately 15% of the variance. Intra-species variance was lower for weighted versus unweighted Unifrac, so this likely did not cause the lack of association with host phylogeny (Supplementary Fig. 6). Instead, we postulate that host phylogeny mainly dictates community composition but not OTU abundances. Our MRM results were supported by principal component analysis (PCoA) ordinations of weighted and unweighted Unifrac values, which displayed clustering by host taxonomy and diet (Supplementary Fig. 7).

Fig. 2 Host phylogeny and diet significantly explain the aspects of microbiome diversity. The plots show the BH-adjusted p values (Adj. p value) and partial regression coefficients (Coef.) for multiple regression on matrix (MRM) tests used to determine how much alpha- or beta-diversity variance was explained by host diet, geographic location, habitat, phylogeny, and technical parameters (see “Methods”). The boxplots show the distribution of values obtained when running MRM on each of the 100 random dataset subsets, with each subsample comprising just one sample per species. The boxplots show the MRM rho coefficient and p value for each subsample. See “Methods” for a description of how each distance matrix for the MRM models was generated. Asterisk denotes significance (Adj. p < 0.05 for ≥95% of dataset subsets; see “Methods”). Box centerlines, edges, whiskers, and points signify the median, interquartile range (IQR), 1.5× IQR, and >1.5× IQR, respectively. Source data are provided as a Source Data file Full size image

Neither host habitat nor geographic location were significant, likely because these variables were strongly coupled. However, we must note that the experimental design was not directly designed to test this hypothesis (Supplementary Fig. 1). Importantly, the “Technical” covariate, which comprised sample type (feces versus gut contents) and captivity status (wild versus captive) also lacked significance for all models, suggesting no substantial effect of technical variation in our dataset. Also, we did not detect any major outlier samples in our dataset that may be skewing our results (Supplementary Fig. 8). We obtained similar results to our initial MRM analysis when we randomly selected one sample per family instead of per species (Supplementary Fig. 9), which reduced the mammalia:non-mammalia bias from 64% of samples being mammalian to 42%. However, phylogeny was not quite significant (MRM, p = 0.12), likely due to the reduced sample size. We found that these results did not substantially change when only including wild animal samples (total samples = 170; total host species = 119) (Supplementary Fig. 10), suggesting that the minor number of captive animals in this study did not substantially contribute to the observed patterns. We no longer observed a significant phylogenetic signal when including just mammals (total samples = 160; total host species = 82), which may be due to the reduced number of host species in the analysis (Supplementary Fig. 11).

We examined whether these patterns change when grouping microbes at coarser taxonomic levels (Supplementary Figs. 12, 13, 14, and 15). The results were mostly consistent with the OTU level; however, weighted Unifrac became significantly associated with host phylogeny, regardless of the taxonomic level. Also, both weighted Unifrac and the Shannon index were no longer significantly associated with host diet at the phylum level.

Further resolving the effects of host phylogeny and diet

Our MRM analyses suggest that host phylogeny and diet explain gut microbiome diversity, but this is only one line of evidence, and it does not finely resolve which particular aspects of diversity (e.g., particular OTUs) correspond with host diet and phylogeny. Therefore, we employed complementary tests to our MRM analyses to support and further investigate our findings. While animal host phylogeny is somewhat correlated with diet, our dataset comprised a highly taxonomically diverse set of species with substantially varying diets, which often did not correspond to phylogenetic relatedness (Fig. 1). We exploited this lack of complete correspondence between host phylogeny and diet to decouple the effects of each variable on microbial community diversity.

We used phylogenetic generalized least squares (PGLS) to quantify the association of diet with microbial diversity while accounting for host phylogeny. In support of our MRM results, both alpha- and beta-diversity were significantly explained by host diet (Fig. 3). We also conducted the analysis on individual OTUs and found only 2 OTUs to be significant (Fig. 3c). These OTUs belonged to the Ruminococcaceae and Bacteroidaceae families, respectively. Mapping the distribution of these 2 OTUs onto the host phylogeny revealed that the Ruminococcaceae OTU was associated with many hosts in the herbivorous Artiodactyl clade and also in the southern white-cheeked gibbon (Nomascus siki), which is an herbivore in the distantly related primate clade (Supplementary Fig. 16). In contrast, the Bacteroidaceae OTU was predominantly present among multiple distantly related herbivorous clades. The ability of diet to explain overall community alpha- and beta-diversity but only two OTUs support a hypothesis where diet predominantly selects for functional guilds of microbes (e.g., cellulolytic consortia) rather than specific OTUs.

Fig. 3 After accounting for host phylogeny, diet significantly explained alpha- and beta-diversity components but could only explain the prevalence of two operational taxonomic units (OTUs). The boxplots are distributions of phylogenetic generalized least squares R2 and Adj. p for 100 random subsamples of the datasets (one per species for each subsample). a Both alpha-diversity measures were found to be significant. b Some principal component (PC) analysis PCs were significantly explained by diet (asterisk denotes Adj. p < 0.05). The percentage of variance explained for each unweighted Unifrac PC is 18.1, 6.9, 4.2, 3.6, and 2.1 and each weighted Unifrac PC is 27.2, 10.6, 9.6, 6.4, 6.0, and 5.5. c Only two OTUs were found to be significant. Box centerlines, edges, whiskers, and points signify the median, interquartile range (IQR), 1.5× IQR, and >1.5× IQR, respectively. Source data are provided as a Source Data file Full size image

To assess the effects of host phylogeny while controlling for diet, we utilized tests for phylogenetic signal after regressing out diet. More specifically, we utilized the local indicator of phylogenetic association (LIPA) to assess whether OTU prevalence (i.e., percentage of samples where present) was similar among closely related hosts. We found very little phylogenetic signal of alpha-diversity, which contrasts the substantial association with diet, as observed via the PGLS analysis (Supplementary Fig. 17). This finding is consistent with the MRM analysis results. Also, in contrast to the PGLS analysis, we identified 121 OTUs with significant local phylogenetic signal in the host tree (Fig. 4a). These “LIPA-OTUs” differed greatly in which host clades they were associated with. More specifically, the number of LIPA-OTUs per host species ranged from 1 to 34, with only 21 hosts possessing at least 1 LIPA-OTU. OTU-specific phylogenetic signal was only associated with Mammalia species, suggesting weak or no effects of evolutionary history for non-mammalian hosts. Herbivorous species possessed the majority of LIPA-OTUs, but a minority of these OTUs were associated with some omnivorous and carnivorous species (Fig. 4a). LIPA-OTU composition varied among host clades, regardless of whether they shared the same diet (Fig. 4b), which indicates that the phylogenetic signal is indeed a result of host evolutionary history and not contemporary diet. LIPA-OTUs were most predominant among Artiodactyla species, with Primates and Perissodactyla ranked a distant second and third (Fig. 4b). This finding suggests that the effects of host evolutionary history within Mammalia are most pronounced for Artiodactyla. Interestingly, there was no OTU-specific phylogenetic signal for any macropods, even though they are foregut fermenters similar to the Artiodactyla. The same is true of Carnivora species, except for 2 members of the Felidae clade (Felis catus and Panthera pardus). Altogether, these findings support the hypothesis that mammalian evolutionary history dictates the prevalence of certain OTUs.

Fig. 4 Many operational taxonomic units (OTUs) display a local phylogenetic signal in specific host clades after accounting for diet. a The phylogeny is the same as shown in Fig. 1. The heatmap depicts local indicator of phylogenetic association (LIPA) values for each OTU–host association, with higher values indicating a stronger phylogenetic signal of OTU presence (with diet regressed-out). White boxes in the heatmap indicate non-significant LIPA indices. The dendrogram on the top of the heatmap is a cladogram based on the SILVA-derived taxonomy for each OTU (see Supplementary Fig. 18 for the full taxonomy). The dendrogram is colored by phylum. The bar plots in b and c show the number of OTUs with a significant LIPA index per host (OTUs are colored by phylum; the number of OTUs per host ranges from 1 to 34). b The bar plots summarize the number of significant OTUs per host order and diet. The bar plots in c are the same as b except the data are grouped by OTU phylum. Source data are provided as a Source Data file Full size image

The LIPA-OTUs belonged to seven bacterial phyla and one archaeal phylum (Fig. 4c; Supplementary Fig. 18). Firmicutes was dramatically more represented than other phyla, with Bacteroides the second-most common. Members of Bovidae consistently had the highest numbers of these two phyla; this finding is supported by Sasson and colleagues13, who only identified Bacteroides and Firmicutes to be heritable in cattle. The majority of the Firmicutes OTUs were members of the Ruminococcaceae, and while most of Ruminococcaceae OTUs were associated with Artiodactyla hosts, some were also observed in certain members of the Primates, Rodentia, and Perissodactyla. Other OTU clades with significant phylogenetic signal included the genera Christensenella, Blautia, and Methanobrevibacter, which were all found to be consistently heritable among multiple human cohort studies12,35. Interestingly, while humans are represented in this dataset, and a few OTUs were associated with some of the primate species, no OTUs showed a phylogenetic signal with humans (Fig. 4a). Among some very closely related OTUs, we observed that host clade specificity differed, suggesting that these taxa have diversified via adaptive specialization for particular hosts (Supplementary Data 2; Supplementary Fig. 18).

A stronger pattern of cophylogeny in Mammalia

Our finding that only Mammalia possessed OTUs with local phylogenetic signal suggests that the effects of evolutionary history on intestinal microbiome diversity may be stronger for Mammalia versus non-mammalian species. We investigated this finding by performing cophylogeny analyses, which determines whether the phylogenies of the host and symbiont (microbe) correspond in their branching patterns. While a positive correlation can be the result of other processes besides co-cladogenesis36, the pattern is consistent with a model of host–symbiont coevolution. We first utilized Procrustean approach to cophylogeny (PACo37), which performs Procrustes superimposition to infer the best fit between host and symbiont phylogenies based on symbiont occurrences in the hosts. This permutation-based approach does not rely on distribution assumptions. Moreover, the analysis generates residuals of the Procrustean fit, which describes the contribution of each individual host–symbiont association to the global fit (smaller residuals means a better fit).

The PACo analysis showed a significant global fit, regardless of intra-species heterogeneity (PACo, p < 0.002 for all dataset subsets). Host–microbiome residuals decreased in the order of Actinopterygii > Amphibia > Reptilia ≥ Aves > Mammalia, with the most dramatic decrease between Aves and Mammalia (Fig. 5), indicating that Mammalia show the strongest signal of cophylogeny. The residuals significantly differed by both host class and diet (analysis of variance, p = 1e−16 for both), but the effect size was much larger for class versus diet (F-value of 972.3 versus 536.3). Thus, while diet may somewhat confound the signal of cophylogeny, it is likely not the main driver. Conducting PACo on just mammalian species still showed a significant global fit (PACo, p < 0.002), and we found that Artiodactyla have the smallest distribution of residuals (Supplementary Fig. 19A). Excluding all Artiodactyla samples did not substantially change the results (PACo, p < 0.003); neither did sub-sampling just one sample per family in order to decrease the imbalance of host species per clade (PACo, p < 0.003; Supplementary Fig. 19B, C).

Fig. 5 Procrustean approach to cophylogeny (PACo) and Parafit show a stronger cophylogeny signal for Mammalia versus non-mammals. a Boxplots of PACo residuals between hosts and operational taxonomic units (smaller residuals means a stronger cophylogeny signal), with residuals grouped by host class and diet. b Boxplots of significant host–symbiont links as determined by Parafit analysis, with links grouped by host class and diet. For both PACo and Parafit, 1000 permutations were performed on each of the 100 dataset subsets. Box centerlines, edges, whiskers, and points signify the median, interquartile range (IQR), 1.5× IQR, and >1.5× IQR, respectively. Source data are provided as a Source Data file Full size image

We additionally evaluated patterns of cophylogeny with the Parafit analysis, which is also a permutational method but assesses similarity of principal coordinates derived from the host and symbiont phylogenies. As with PACo, the global Parafit test was significant (Parafit, p < 0.001), and Mammalia showed the strongest signal of cophylogeny (Fig. 5). Altogether, these data support a model of host–microbe coevolution, with Mammalia displaying the strongest cophylogeny signal.

The stronger signal of cophylogeny among mammals may be the result of more transient environmental microbes in the guts of non-mammals. We assessed this possibility by mapping taxa to the Earth Microbiome Project (EMP)38 16S rRNA dataset (No. of samples: Animal = 317, Human = 206, Sediment = 259, Soil = 193, Water = 242) and using the indicator value analysis39 (IndVal) to assess the specificity of taxa to (i) the guts of mammals versus non-mammals in our dataset and (ii) biomes in the EMP dataset. We found 32 bacterial genera and 1 archaeal genus to show significant specificity for mammals or non-mammals in our dataset and also significant biome specificity in the EMP dataset (IndVal, Adj. p < 0.05; Supplementary Fig. 20A). Moreover, the non-mammal associated taxa had a significantly higher specificity for environmental EMP biomes (Wilcox, p < 0.006); Supplementary Fig. 20B), which corroborates our hypothesis. Genera did not contribute equally to this signal, and actually many non-mammal and mammal specific genera were only specific to human and/or animal biomes in the EMP dataset. Still, more non-mammal-specific genera (e.g., Desulfolobus and Hyphomicrobium) were strongly associated with environmental biomes relative to mammal-specific genera. The only mammal-specific genus to show a strong environmental association was Paludibacter (Bacteroidetes phylum).

Environment filtering and microbe–microbe interactions

Our findings that diet and host evolutionary history significantly explain microbiome diversity indicate that environmental filtering plays a substantial role in microbial community assembly. In order to further test this notion and to assess how environmental filtering may differ among host clades, we utilized two ecophylogenetics analyses: mean phylogenetic distance (MPD) and mean nearest taxon distance (MNTD). These tests assess the degree of phylogenetic clustering within each sample (host) relative to a permuted null model. Assuming phylogenetic niche conservatism (i.e., closely related taxa overlap along niche axes), then host diet or gut physiology may select for phylogenetically clustered taxa with overlapping niches, while in the absence of such strong selection, competition via niche conservatism would lead to phylogenetic overdispersion40. Phylogenetic overdispersion may also result from facilitation (i.e., beneficial microbe–microbe interactions), such as when distantly related taxa form consortia to break down complex plant polymers40. MPD is more sensitive to overall patterns of phylogenetic clustering and evenness, while MNTD is more sensitive to patterns at the tree tips41.

We found that the majority of host species showed significant clustering for MNTD, with close to half for MPD (Fig. 6). Very few species showed phylogenetic evenness. Of those that did, all belonged to the Artiodactyla, except for the long-eared owl (Asio otus; Fig. 6). In support of these findings, Gaulke and colleagues42 also found lower signals of phylogenetic clustering in the Artiodactyla relative to other mammalian clades. These findings suggest that community assembly differs between Artiodactyla and non-Artiodactyla mammals, with microbe–microbe competition and/or facilitation surpassing gut environmental filtering among Artiodactyla species.

Fig. 6 Microbial communities are generally phylogenetically clustered versus evenly distributed. a The phylogeny is the same as shown in Fig. 1. From inner to outer, the data mapped onto the tree is host diet, mean standardized effect sizes for mean phylogenetic distance (MPD) and mean nearest taxon distance (MNTD), and samples with significant phylogenetic clustering or evenness based on MPD or MNTD. The animal species possessing microbial communities that were phylogenetically evenly distributed were the long-eared owl (Asio otus), fallow deer (Dama dama), red deer (Cervus elaphus), cattle (Bos taurus), and sheep (Ovis aries). b The bar charts depict the fraction of host species for each host class/diet where microbial taxa are more phylogenetically clustered (clustered) or evenly distributed (even) than expected from the null model or those that did not deviate from the null model (NA). Source data are provided as a Source Data file Full size image

We next tested how microbes co-occur among hosts, which can be influenced by selective pressures or microbe–microbe interactions. Specifically, we conducted a co-occurrence analysis to determine which OTUs significantly positively or negatively co-occurred relative to a permuted null model. Our analysis revealed that almost all significant co-occurrences were positive (Fig. 7a; Supplementary Fig. 21A). The co-occurrence network consisted of four sub-networks, each with differing taxonomic compositions and existence of “hub” OTUs (Fig. 7d). Sub-networks 1 and 2 were dominated by Ruminococcaceae and Peptostreptococcaceae, with Ruminococcaceae OTUs acting as central hubs in both (Supplementary Fig. 22). Sub-network 3 contained an Enterobacteriaceae (Proteobacteria) OTU hub and also possessed more members of Clostridiaceae, Lachnospiraceae, and Enterobacteriaceae. Sub-network 4 did not have a strong hub OTU and contained the most taxonomic diversity (Fig. 7d). Interestingly, Methanobrevibacter OTUs were only found in sub-network 1 and significantly co-occurred with Christensenellaceae OTUs as previously seen in a large human cohort study35. The presence of OTUs from each sub-network differed substantially among host clades (Fig. 7b). Sub-networks 3 and 4 were generally most prevalent in many host orders, with only one of the two networks being highly prevalent. Sub-network 1 was only prevalent in the Artiodactyla, suggesting strong host specificity of this microbial consortium. In support of this finding, the network contained a substantially higher proportion of OTUs with local phylogenetic signal among hosts relative to the other sub-networks (Fig. 7d). Sub-network 2 was only prevalent in four mammalian orders: Artiodactyla, Diprotodontia, Pilosa, and Primates. The sub-networks showed significant distributional shifts among diets (Kruskal–Wallis, p < 2.2e−16; pairwise Wilcox test, Adj. p < 0.05 for all tests), with sub-networks 1 and 2 being most prevalent among herbivores, sub-network 4 dominating in omnivores, and sub-networks 3 and 4 showing similar prevalence among carnivores (Fig. 7c).