Sample collection

All samples were collected from wild non-human primates by collaborators at field sites around the world (Table S1). In every case, bulk fecal samples were collected immediately after defecation with a sterile utensil (e.g., plastic spoon) and stored in a collection tube with either 95% ethanol or RNALater. Samples were stored and transported to the United States by collaborators. Table S1 lists the responsible collaborator, sampling site, sample size, and preservation method for each non-human primate species. Appropriate government permits and IACUC protocols were obtained by each collaborator independently.

Sample processing for 16S rRNA gene amplicon sequencing

We began our analyses by describing the microbial taxonomic composition of all samples. To do this, we followed the Earth Microbiome Project protocol [19]. We extracted microbial DNA from all samples using the MO BIO PowerSoil DNA extraction kit. PCR targeting the V4 region of the 16S rRNA bacterial was performed with the 515F/806R primers, utilizing the protocol described in Caporaso et al. [20]. We barcoded and pooled amplicons in equal concentrations for sequencing. We then purified the amplicon pool with the MO BIO UltraClean PCR Clean-up kit and sequenced on the Illumina MiSeq sequencing platform (MiSeq Control Software 2.0.5 and Real-Time Analysis software 1.16.18) at the BioFrontiers Institute Next-Generation Genomics Facility at the University of Colorado, Boulder, USA. Samples were pseudo-randomly assigned to three different MiSeq runs as they were accumulated so that samples representing a given host clade or diet type were never all sequenced on the same run. In several cases, samples from the same host species were assigned to different runs.

Quality filtering and OTU-picking

The single-end sequencing reads from the 515f primer were quality-checked using the default settings for the split_libraries_fastq.py function in QIIME v1.9.0 [21]. After quality filtering we obtained 12,178,012 reads associated with these samples with an average of 23,152 reads/sample (range: 0–80,714 reads/sample).

Following common practice in microbiome research, sequences were initially clustered into representative bacterial operational taxonomic units (OTUs) using the sortmerna/sumaclust implementation of open-reference OTU-picking at 97% sequence similarity [22]. Sequences were aligned [23], and taxonomy was assigned using UCLUST [24] and the Green Genes 13_8 database [25, 26]. Sequences representing chloroplasts and mitochondria were filtered out, and any OTUs representing <0.00005% of the total dataset were filtered out as recommended for Illumina-generated sequencing data [27]. A subset of samples were randomly selected for analysis for each host species (Table S1). The data for these samples were rarefied to 15,012 reads/sample (single_rarefaction.py).

To increase our ability to describe patterns of microbial community structure at finer taxonomic resolution, we also processed sequences using Deblur [28], which bypasses the OTU clustering algorithm described above. Briefly, this algorithm uses Illumina error profiles to obtain putative error-free sequences that describe microbial community composition at the sub-OTU (sOTU) level. To place deblurred sequences into a phylogenetic context, we used SEPP [29] to insert unique deblurred V4 fragments into the most recent available Greengenes phylogeny of representative 97% clustered full-length 16S sequences (Greengenes v. 13_8). SEPP was run with an alignment subset of 100 and placement subset set of 500. Reference sequences were then trimmed from the tree, leaving the subsequent phylogeny for downstream applications, including alpha- and beta-diversity calculations and balance tree analysis. Taxonomy was also inferred from the SEPP insertions. For each deblurred sequence, SEPP returns a set of (at most) seven highest-likelihood candidate placements in the reference phylogeny, each of which includes an attaching branch from the reference tree along with a probability. For each branch in the reference tree, a taxonomic label was assigned at each rank if and only if at least 95% of the leaves below that branch share the same label. The root of the tree was located on a branch that splits the kingdoms Bacteria and Archaea perfectly, so every rank of the taxonomy is well contained on one side of the root or the other. For a given deblurred sequence at a given taxonomic rank, each candidate placement inherits the label of its attaching branch and a label is assigned to the sequence if candidate placements representing at least 80% cumulative probability share that label. Effectively, labels were assigned to internal branches of the Greengenes tree by a de-facto voting of child leaves with a quorum of 95%, and query sequences were labeled by SEPP if it assigned at least 80% probability to branches with a common label.

All subsequent statistical analyses were performed on both the OTU-clustered dataset and the deblurred dataset, and results were consistent across methods. The analyses of microbial community taxonomic composition presented in the main text utilize the deblurred data unless otherwise noted.

Analysis of sample taxonomic composition

Once processed, we used sequence data to compare richness, diversity, and microbial community composition among samples. We generated beta diversity distance matrices using QIIME (beta_diversity_through_plots.py), and we visualized clustering patterns among samples using principal coordinates analysis (PCoA, Emperor v 0.9.51 [30] and non-metric multi-dimensional scaling (vegan package, R software, version 3.0.2)). We calculated pairwise distances between samples using unweighted UniFrac and weighted UniFrac similarity indices [31]. We tested for significant differences in sample clustering patterns and microbial community composition across host clades (Old World monkey, New World monkey, ape, lemur) and diet type (folivore, non-folivore, controlling for host clade) for each species using permutational analysis of variance (PERMANOVA, adonis function in the vegan package, R software, version 3.0.2). Because PERMANOVA is sensitive to differences in dispersion between groups, we also tested for these differences. Host phylogenetic groups exhibited significantly different dispersion (F 3,150 = 13.4, p < 0.01) while host diet groups did not (F 3,150 = 3.5, p = 0.06). However, visual inspection of clustering plots suggests that differing dispersion is not driving patterns of significance across host phylogenetic groups. We calculated the number of observed sOTUs and the Faith’s phylogenetic diversity [32] to describe the alpha diversity in each sample using QIIME (alpha_rarefaction.py). We then identified core sOTUs shared by 90% of the samples for each host clade and in 80% of the samples for each host diet type (compute_core.py).

To detect sOTUs that were significantly different in relative abundance among the four clades of primates we utilized a linear discriminant analysis of effect size (LEfSe; [33]). We assigned primate families as the class vector and kept features with a logarithmic LDA score of >3 using default parameters. We reran this analysis using diet as the class vector and primate family as the subclass vector to detect sOTUs different in abundance between folivores non-folivores; however, no features were detected even with a low LDA score cutoff of 0.8.

Because, we observed high levels of variation in the distribution of bacterial taxa among primate clades, we also used a more sensitive method for detecting differences in microbial community composition using a concept known as balance trees [34]. These balances are the log-ratios of phylogenetic clades and analyzing these balances alleviates the common problems associated with compositionality in microbial sequence data [35]. The specific methodology used for constructing and analyzing can be found at https://github.com/biocore/gneiss. Briefly, a pseudocount of 1 was added to all of the values in the deblurred sequence table to account for zeroes and then transformed using the isometric log-ratio calculation [36]. Using the microbial phylogenetic tree built during processing, balances were calculated by computing the log-ratio of proportions between adjacent phylogenetic clades at each internal node of the tree. A linear mixed effects model was then run on each balance, to test for significant differences in the ratios of bacterial taxa among folivorous and non-folivorous lineages while accounting for phylogeny and variability among individuals as random effects.

Testing the effect of host geography

Because host phylogeny, geographic location, and local diet are often confounded, we wanted to more closely explore the potential influence of host geography and local diet on our dataset. Therefore, we created two additional sets of PCoA plots (based on 97% OTUs) with new samples included. First, we examined only New World monkeys but utilized additional howler monkey samples collected from different sites with different forest types [37]. Specifically, we included Alouatta pigra samples from a semi-deciduous forest (El Tormento, Mexico) and Alouatta palliata samples from an evergreen rainforest (La Suerte, Costa Rica), which represent markedly distinct environments and diets. Additionally, because the Alouatta diet varies in leaf intake seasonally, we also included samples collected in the same forest during both periods of high fruit intake and high leaf intake when possible.

We also examined the effect of host geography on a larger scale by comparing the gut microbiota of African and Asian colobines. While all of these colobines have similar gut morphology and dietary niches, they inhabit distinct continents with different environments and local diets. To perform this comparison, we integrated published data from twelve wild Asian colobines (red-shanked doucs, Pygathrix nemaeus) with our original data [38]. In both cases, open-reference OTUs were re-picked for the entire dataset using the same methods described above. The resulting data were filtered, rarefied, and analyzed the same way as well.

Cophylogenetic analyses

Given that codiversification of hosts and gut microbes has been emphasized as an important process contributing to the composition of the primate gut microbiome [13, 14], we wanted to explicitly explore the relationship between the host phylogeny and diversity of microbial 16S sequences in this dataset. Therefore, we performed two analyses: one, a reimplementation of the beta-diversity clustering sensitivity analysis in Sanders et al. [39] to assess whether patterns of microbial community similarity that are correlated with host phylogeny are likely to indicate codiversification; and two, an application of the permutation test of cophylogeny from Hommola et al. [40] to sequence diversity within OTUs to test for codiversification in individual bacterial lineages. Both analyses were implemented in a Snakemake [41] workflow available at https://github.com/tanaes/snakemake_codiversification. Briefly, deblurred 16S sequences were clustered using the USEARCH [24] pipeline in QIIME 1.8.1 at similarity thresholds of 85, 88, 91, 94, 97, and 99% identity. For beta-diversity clustering sensitivity analysis, beta-diversity distances among four randomly selected samples per species were calculated from 100 OTU tables jackknifed to 7200 sequences. Each jackknifed distance matrix was UPGMA-clustered, and the resulting similarity dendrogram compared against the actual host phylogeny. A summary figure was then created to illustrate the number of times each actual host clade was recovered from each parameter combination.

For per-lineage codiversification analysis, the deblurred 16S rRNA sequences composing each 97% OTU were realigned using MUSCLE, a phylogeny estimated with FastTree [42], and the pairwise distances among unique bacterial sequences compared to the pairwise patristic distances of their host taxa using an adaptation of the Hommola et al. [40] permutation test of cospeciation, with 10,000 permutations. This test is an extension of the Mantel test of distance matrix correlation, modified to allow multiple symbionts per host (and vice versa). p-values were corrected using the Benjamini–Hochberg False Discovery Rate, and OTUs estimated to be significantly codiversifying with their hosts illustrated by mapping host information onto the intra-OTU phylogeny.

Sample processing for shotgun metagenomic sequencing

In addition to describing the taxonomic composition of the sampled primate gut microbiomes, we also wanted to assess the functional capacity of these microbiomes. To do this, a subset of 95 samples was randomly selected for shotgun metagenomic sequencing (Table S1). Sequencing libraries were robotically prepared with the Kapa Hyper Library Preparation kit (Kapa Biosystems) at the Roy J. Carver Biotechnology Center at the University of Illinois at Champaign-Urbana. Library insert sizes ranged from 80 to 700 bp. Libraries were combined into four pools, each of which was sequenced on one lane of the Illumina HiSeq2500 using TruSeq SBS sequencing chemistry version 4. A total of 160-nt paired-end reads were generated using 161 cycles for each end of the fragment. Fastq files were generated and demultiplexed using the bcl2fastq v1.8.4 Conversion Software (Illumina). The run produced a total of 1,472,869,654 reads (average: 7,671,196 ± 2,966,770 reads/sample) with average quality scores of 32 and above.

Gene ortholog group and pathway relative abundance

Shotgun metagenomic data were quality filtered with Trimmomatic v.0.32 with parameters ILLUMINACLIP:TruSeq3-PE.fa:2:30:10 LEADING:3 TRAILING:3 SLIDINGWINDOW:4:15 MINLEN:36. This yielded 94 metagenomes with sufficient sequencing depth for quantitative analysis, which were subsampled to the size of smallest one (3,589,870 reads) using seqtk v.1.0. Forward and reverse reads were concatenated (seqtk was run with the same seed for both to maintain read pairs). The 94 metagenomes were analyzed for the relative abundance of gene ortholog groups and biochemical pathways using HUMAnN2 v.0.2.2 with the following workflow to ensure comparable results across samples: MetaPhlAn2 was run on each metagenome using MetaPhlAn2 database v.20. Lists of matched strains in 94 results were merged to a single ‘bugs list’. A single HUMAnN2 run was done using this bugs list with the sole purpose of generating a custom Bowtie2 database from the subset of the ChocoPhlAn v.0.1.1 centroid genomes corresponding to the bugs list. HUMAnN2 (http://huttenhower.sph.harvard.edu/humann2) was then run on all 94 samples using this pre-compiled database (first step: Bowtie 2 on all reads) and then with the default UniRef50 database (second step: DIAMOND translated search on leftover reads) with options--bypass-prescreen--bypass-nucleotide-index. Each of the three types of HUMAnN2 output tables (genefamilies, pathabundance, pathcoverage) were then merged across the 94 samples (humann2_join_tables), then normalized to counts per million (cpm) and relative abundance (relab) (humann2_renorm_table). The genefamilies tables were regrouped (humann2_regroup_table) from UniRef50 families to KEGG Orthology (KO), Gene Ontology (GO), MetaCyc reaction (rxn), and Enzyme Classification number (EC). Finally, tables were split into two versions: stratified by taxonomy, and unstratified (sum of all strains).

Analysis of sample functional composition

Once sequence data were processed, we used them to compare richness, diversity, and gene composition among samples. Beta diversity distance matrices were generated using QIIME (beta_diversity_through_plots.py), and clustering patterns among samples were visualized using principal coordinates analysis (PCoA, Emperor v 0.9.51 [30]) and non-metric multi-dimensional scaling (vegan package, R software, version 3.0.2). Pairwise distances between samples were calculated using Bray-Curtis similarity indices. We tested for significant differences in sample clustering patterns and microbial community composition across host clades (Old World monkey, New World monkey, ape, lemur) and diet type (folivore, non-folivore, controlling for host clade) for each species using permutational analysis of variance (PERMANOVA, adonis function in the vegan package, R software, version 3.0.2).

To investigate whether taxonomic and gene abundance patterns were similar in our samples, we compared beta-diversity patterns between 16S rRNA (unweighted UniFrac) and shotgun metagenomics Metacyc reaction pathway (Bray-Curtis) datasets. We used a Procrustes analysis (least-squares orthogonal mapping) to transform the first three principal coordinates for each dataset (QIIME, transform_coordinate_matrices.py) and estimate m2 value (sum of the squared deviations). We then shuffled sample identifiers and recalculated m2 999 times and reported the p-value as the proportion of m2 values lower than the actual m2 value. We also directly compared the distance matrices using a mantel test with 999 permutations.

In addition to assessing overall functional capacity, we also wanted to examine differences in the relative abundances of specific enzymes associated with functions such as cellulose degradation and plant secondary metabolite degradation. To do this, we extracted information about CAZyme relative abundances from the metagenomic dataset. The HUMAnN2 analyses described above were reimplemented, but using the dbCAN [43] version of the CAZyDB [44] as a custom translated alignment database, skipping the Bowtie2 nucleotide alignment step. Substrate specificities of particular CAZyme families used to produce Figure S8 were derived from Table S1 of [45] (Cantarel et al. 2012). Summed counts for CAZymes in each of these categories were compared using a 2-way ANOVA in R, with diet category and host phylogenetic group modeled as additive effects. p-values for an effect of diet were corrected for multiple hypothesis tests using the Bonferroni method.

We also examined whether other microbial metabolic pathways differed in relative abundance among samples. We used a linear mixed effects model to assess the abundance of specific MetaCyc pathways in folivorous versus non-folivorous lineages. Model comparisons were performed between one that simply accounted for the random effects of each species nested within the four phylogenetic groups and a second which incorporated diet as an additional fixed effect. Pathways were identified as associated with diet category when inclusion significantly improved the fit of the second model over the first with a p-value of <0.01.

For similar reasons, we also utilized a linear discriminant analysis of effect size (LEfSe; [33]) to detect pathways that were significantly different in relative abundance among clades of primates. We assigned primate phylogenetic group as the class vector and diet as the subclass vector. Features with a logarithmic LDA score of >3.0 using default parameters were kept.