emi14276-sup-0001-AppendixS1.pdf (TIFF image, 4.1 MB)

Fig. S1. A species phylogeny based on the 16S ribosomal RNA sequences of all nine Negativicutes with aci1, plus other Negativicutes without aci1 that have complete NCBI RefSeq genome assemblies. We found that aci1 is widespread across Negativicutes (strains positive for aci1 are marked by red wedges) and not confined to a particular sub‐clade, indicating possible horizontal gene transfer. Incomplete genome assemblies with aci1 are shown (black wedges), but incomplete assemblies without aci1 are not included because they may lack aci1 only as a result of their fragmented assembly. Branch lengths show the nucleotide substitution rates, and bootstrap support values are indicated for the major nodes.

Fig. S2. Annotated genes for Prophage 2, upstream of Prophage 1 shown in Fig. 2. We found that the prophage includes tail, head‐tail joining, head, packaging, replication and lysogeny modules, but that it lacks a head/tail gene and has a gap between the packaging and replication units, indicating that the prophage is probably not inducible.

Fig. S3. The evolutionary relationships among the aci1 sequences found in human (black) and animal (light green) microbiomes. The phylogeny was built at the DNA level to maximize the divergence, but the sequences are nearly identical, so the branch lengths remain very short and not all nodes have high bootstrap values. The human numeric ids correspond to those in Supporting Information Table S1.

Fig. S4. Alignment across the nine Negativicutes of the ACI‐1 protein region from start to stop codons (red), the flanking regulatory aci1 regions (blank lines mark their boundaries), and the surrounding transposons (tn). The coding sequence is highly conserved. Mutated protein sites relative to the A. intestini reference are marked as non‐synonymous (amino acid changing) or synonymous for point mutations (yellow), and as frame‐shift mutations for insertions/deletions (brown). The flanking aci1 regions, and particularly the transposons, are more divergent across the nine sequences. The strain name is given for each sequence, and rev indicates that the reverse complement is shown.

Table S1. Metadata on the human gut metagenome scaffolds with the aci1 gene. For each of the scaffolds in Fig. 4, the corresponding assembly, scaffold, population, gene location, method of detection and bacterial host are shown. Nuc indicates a BLASTn hit with 99–100% nucleotide identity and coverage to the aci1 nucleotide sequence, while prot indicates a tBLASTn hit with 99–100% amino acid identity and coverage to the ACI‐1 protein sequence.

Table S2. The prevalence of the aci1 gene in gut microbiomes stratified by human population. The p‐values and fold changes were calculated as explained in the Experimental Procedures.

Table S3. Animal metagenome datasets containing copies of the aci1 gene. The sheep and rumen datasets were already assembled into contigs. For the unassembled dataset from cattle, reads were trimmed with Trimmomatic (Bolger et al., 2014) with the following parameters: SE ‐phred33 [inputfile] [outputfile] ILLUMINACLIP:[illumina adapters]:2:30:10 LEADING:3 TRAILING:3 SLIDINGWINDOW:4:20 MINLEN:100, and then assembled into contigs with SPAdes version 3.9 (Bankevich et al., 2012) with default parameters. For all datasets, copies of the ACI‐1 protein, plus three additional partial copies of ACI‐1 from cattle, were identified with BLAST as described in the Experimental Procedures.

Table S4. The Negativicute NCBI RefSeq genomes with the aci1 gene. The Ribosomal Database Project (RDP) ids refer to the sequences used for building the strain phylogeny in Fig. 6.

Table S5. Alignment statistics for sequences from Negativicutes with a prophage next to ACI‐1. The high nucleotide sequence identities across the alignments indicate that the same prophage is likely present in all three strains, although the presence of gapped columns suggests that insertions and deletions have occurred within the prophages. The selected sequences correspond to the regions from the first to the last annotated protein shown as coloured arrows in Fig. 6 for each of the three strains.
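
The kind of alignment statistics summarised in Table S5 (sequence identity and gapped columns) can be illustrated with a minimal Python sketch. This is not the method used to produce the table: the function name `alignment_stats`, the toy sequences, and the definition of identity (identical columns divided by ungapped columns) are all assumptions made here for illustration.

```python
def alignment_stats(aligned):
    """Summarise a multiple alignment given as equal-length strings.

    Assumptions (not taken from the source): '-' marks a gap, and
    percent identity is counted over ungapped columns only.
    """
    if not aligned:
        raise ValueError("at least one aligned sequence is required")
    length = len(aligned[0])
    if any(len(seq) != length for seq in aligned):
        raise ValueError("aligned sequences must have equal length")

    gapped = 0      # columns with a gap in at least one sequence
    identical = 0   # ungapped columns where all sequences agree
    for column in zip(*aligned):
        if "-" in column:
            gapped += 1
        elif len({base.upper() for base in column}) == 1:
            identical += 1

    ungapped = length - gapped
    percent_identity = 100.0 * identical / ungapped if ungapped else 0.0
    return {
        "columns": length,
        "gapped_columns": gapped,
        "identical_columns": identical,
        "percent_identity": percent_identity,
    }


# Hypothetical toy alignment of three sequences (not real prophage data).
toy = [
    "ACGT-ACGTA",
    "ACGTTACGTA",
    "ACGT-ACCTA",
]
stats = alignment_stats(toy)
print(stats)
```

Under this definition, a high `percent_identity` together with a non-zero `gapped_columns` count corresponds to the pattern described for Table S5: closely related sequences that differ mainly by insertions and deletions.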