ERV loci can be used to reconstruct the natural history of the ancient, exogenously replicating retroviruses. Previous studies examining retroviral macroevolution via the ERV fossil record have cast a wide net, focusing primarily on the highly conserved RT as a phylogenetic marker and using it to characterize a broad swath of diversity within the Retroviridae family (Jern et al., 2005; Hayward et al., 2013; 2015). However, focusing on RT excludes additional sources of phylogenetic signal available to resolve relationships between closely related taxa, and may overlook the potential role that recombination plays in retroviral evolution. Thus, we sought to examine the deep evolutionary history of a single retrovirus lineage – that which produced the ERV-Fc family of sequences – by collecting and analyzing endogenous retroviral sequence information for all three of the canonical retroviral genes (gag, pol, and env). Doing so allowed us to identify ERV-Fc sequences in 28 of the 50 mammalian genomes examined. Furthermore, we determined that as many as 26 independent cross-species transmission events produced the distribution of identified ERV-Fc elements. This included several species whose genomes appeared to have been independently colonized by two evolutionarily distinct ERV-Fc lineages. These results indicated that the distribution of ERV-Fc among modern mammals is predominantly the result of interspecies spread and emergence of the related exogenous forms of the virus.

ERV sequences present in the genomes of different species can be related either due to vertical inheritance (as genomic loci) or due to independent colonization by an exogenous, infectious virus. The two scenarios differ primarily due to differences in the rate at which exogenously replicating virus sequences and endogenous sequences evolve, as well as any differences in the selective pressures affecting parasitic genomic elements versus those affecting replicating viruses. We found that the patterns of amino acid diversification between ERV-Fc sequences were consistent with selection to maintain functions essential for exogenous viral replication. For example, the critical structural subunits of Gag (MA and CA) displayed the least diversity, and within CA, the most conserved residues were in regions involved in essential intrahexameric interactions. In contrast, primary sequence conservation in the non-structural subunits p12 and NC was significantly lower. Despite this diversity, these regions retained their critical, canonical motifs, although the number and location of these motifs varied significantly between viral isolates. This is consistent with selection to maintain essential motifs in a system that otherwise lacks structural constraint.
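
As an illustration of how per-region amino acid diversity of this kind can be quantified, the following minimal sketch computes Shannon entropy for each column of a protein alignment and averages it over a region. It is not the method used in this study; the function names, the choice of Shannon entropy as the diversity measure, and the toy sequences are all assumptions made for illustration.

```python
import math
from collections import Counter


def column_entropy(column: str) -> float:
    """Shannon entropy (bits) of one alignment column, ignoring gaps and unknowns."""
    residues = [c for c in column.upper() if c not in "-X?"]
    if not residues:
        return 0.0
    n = len(residues)
    counts = Counter(residues)
    # 0 bits means a fully conserved column; higher values mean more diversity.
    return sum(-(k / n) * math.log2(k / n) for k in counts.values())


def alignment_entropy(aligned: list[str]) -> list[float]:
    """Per-column entropy for a list of equal-length aligned protein sequences."""
    if len({len(s) for s in aligned}) != 1:
        raise ValueError("all sequences must be aligned to the same length")
    return [column_entropy("".join(col)) for col in zip(*aligned)]


def mean_region_entropy(aligned: list[str], start: int, end: int) -> float:
    """Mean entropy over alignment columns [start, end), e.g. one Gag subunit."""
    cols = alignment_entropy(aligned)[start:end]
    if not cols:
        raise ValueError("empty region")
    return sum(cols) / len(cols)


# Hypothetical toy alignment (not real ERV-Fc sequence data).
toy = ["MKV", "MRV", "MKV", "MRI"]
per_column = alignment_entropy(toy)
```

Under this measure, a conserved region such as CA would be expected to show a lower mean entropy than a more variable region such as p12, provided the region boundaries are mapped onto alignment columns.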

The abundance of ERV-Fc sequence information allowed us to explore the evolutionary relationship between, and infer the history of, the individual ERV-Fc lineages uncovered. Our analyses point to a complex life history for the ERV-Fc retroviral lineage. This history began >30 MYA; exogenous replication then continued for many millions of years and involved multiple cross-species transmission events. Recent studies have found evidence for cross-species transmissions in examinations of endogenous gammaretroviruses that are similar to extant MLV, and some of the jumps that these viruses made appear to have involved distantly related host species (Hayward et al., 2013; 2015). Taken together, gamma-like retroviruses appear to have had a rich history of cross-species transmissions that contrasts with the life histories of other retroviral genera. For example, exogenous foamy viruses are known to co-speciate with their hosts and the endogenous record suggests that long-term associations between foamy viruses and their hosts are likely to be an ancient feature of this retroviral genus (Han and Worobey, 2012; Katzourakis et al., 2009; Switzer et al., 2005).

Furthermore, our analyses revealed that recombination played an important role in the life history of ERV-Fc, with instances of acquisition of pol and env sequences from HERV-H or HERV-W-like viruses, as well as evidence that an ERV-Fc-related virus provided env sequence to a betaretrovirus. In this regard, the recombination observed within the carnivore ERV-Fc1 clade of viruses is noteworthy. In this lineage, an ERV-W env gene replaced the ancestral ERV-Fc env; subsequently, this chimeric virus was involved in at least two, and possibly as many as five, cross-species transmission events, giving rise to the endogenous sequences found in the genomes of modern dogs, ferrets, and giant pandas. The Pol and TM regions of the chimeric virus, ERV-Fc1, form monophyletic clades, clearly indicating a shared ancestry for the viruses in the different species. However, within the dog and separately the ferret genome, the Gag sequences of ERV-Fc1 and ERV-Fc2 are more closely related to one another than they are to sequences of the same lineage from the heterologous species. The recombinant ERV-Fc1 lineages were also observed to be younger than the majority of ERV-Fc2 loci in both dog and ferret genomes. Thus, the data revealed a scenario whereby after cross-species transmission the ERV-Fc/ERV-W env chimera acquired the ERV-Fc2 gag present in the genome of its new host species, in this case an ancestor of modern ferrets. Such a scenario would be consistent with the virus acquiring the ability to either interact with positively acting host proteins or avoid host restriction factors, or both.

Our analysis suggests that the origins of ERV-Fc date back at least as far as the beginning of the Oligocene epoch (~33.9 MYA). This was a time period of dramatic global change marked by the fusion of the African with the European, and of the Indian with the Asian, continental plates (Briggs, 1995), climatic cooling, development of vast expanses of grasslands, and the emergence of large mammals as the world’s predominant fauna (Prothero and Berggren, 1992). Continental mergers in the Old World along with a continued Asian-North American connection allowed for significant mammalian migrations throughout the Oligocene. However, we found evidence for ERV-Fc being present in species with little or no known geographic overlap at this early time in the viral life history, including musteloids, canids, Platyrrhini, and Tarsioidae. This makes it difficult to pinpoint a geographic region for the origin of the ERV-Fc viral lineage, as the ancestors of modern species whose genomes harbor ERV-Fc were geographically isolated from each other at the time. The fossil record provides solid evidence that during the Oligocene epoch canids were restricted to North America (Munthe, 2005), musteloids were present in Asia (Sato et al., 2012), and Platyrrhini were likely restricted to South America (Bond et al., 2015). The previously widespread distribution of Tarsioidae, which were found in Africa, North America, Europe, and Asia, was contracting to its current geographic isolation in southeast Asia (Gingerich, 2012). The geographic separation of these host species, coupled with the clear phylogenetic relationships between their viral sequences, provides strong evidence for a rapid global spread of the exogenous forms of ERV-Fc. Based on current phylogeographic knowledge of these early ERV-Fc hosts, and evidence for limited faunal exchange between these continents, we find it unlikely that musteloids, canids, Platyrrhini, or prosimians were solely responsible for this global viral spread.
Importantly, the ERV-Fc genomic record in modern mammalian genomes likely represents only a fraction of the total exogenous viral spread: for example, exogenous infections may simply have failed to leave an endogenous footprint in some species, and some unknown proportion of lineages bearing ERV-Fc insertions will have eventually become extinct (and the corresponding ERV-Fc record lost). Thus, it is likely that the ERV-Fc “fossil” record is incomplete, and that either extinct species or species lacking ERV-Fc sequences helped facilitate the worldwide spread of the exogenous virus.

Finally, our results indicate that after the birth of ERV-Fc, replication, cross-species transmission, and endogenization continued for approximately another 15 million years. Our data, as well as other published reports (Bénit et al., 2003; Barrio et al., 2011), indicate that active ERV-Fc reinfection may have continued until very recently in some lineages, implying that at least one ERV locus has retained functional gag and pol coding potential in those species. In ferret and canine, we found evidence that ERV-Fc1 acquired gag sequence from an older, pre-existing endogenous lineage (ERV-Fc2). Indeed, LTR dating indicated that in both species the oldest ERV-Fc1 locus pre-dates the end of active reinfection of the genome by ERV-Fc2. Thus, it is plausible that there existed in the genomes of the ancestors of these species at least one functional ERV-Fc2 gag ORF that the newly introduced ERV-Fc1 could have acquired. Observations in laboratory mice as well as in vitro and in vivo experiments provide several well-characterized examples of recombination involving ERV sequences giving rise to replication-competent viruses with novel biological properties (Chong et al., 1998; Coffin et al., 1989; Paprotka et al., 2011; Patience et al., 1998; Telesnitsky and Goff, 1993). Thus, ERV loci could contribute to adaptive evolution of exogenous viruses by providing a reservoir of novel sequences that can be tapped into by co-packaging and recombination.
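
For readers unfamiliar with the LTR dating mentioned above, the general idea can be sketched as follows. The two LTRs of a provirus are identical at the time of integration and diverge independently afterwards, so the insertion age is commonly approximated as T = D / (2r), where D is the divergence between the 5′ and 3′ LTRs and r is the host neutral substitution rate. This is a minimal sketch under stated assumptions, not the procedure used in this study: the function names are hypothetical, the Jukes-Cantor correction is one possible choice of distance, and the rate must be supplied by the user for the host lineage in question.

```python
import math


def p_distance(ltr5: str, ltr3: str) -> float:
    """Proportion of mismatched sites between two aligned LTRs (gaps/ambiguities skipped)."""
    if len(ltr5) != len(ltr3):
        raise ValueError("LTR sequences must be aligned to the same length")
    compared = mismatches = 0
    for a, b in zip(ltr5.upper(), ltr3.upper()):
        if a in "ACGT" and b in "ACGT":
            compared += 1
            mismatches += a != b
    if compared == 0:
        raise ValueError("no comparable sites")
    return mismatches / compared


def jukes_cantor(p: float) -> float:
    """Jukes-Cantor corrected distance (substitutions per site) from a p-distance."""
    if p == 0:
        return 0.0
    if p >= 0.75:
        raise ValueError("p-distance too large for Jukes-Cantor correction")
    return -0.75 * math.log(1 - 4 * p / 3)


def ltr_age_years(ltr5: str, ltr3: str, rate_per_site_per_year: float) -> float:
    """Estimate insertion age as D / (2r); the factor 2 reflects both LTRs diverging."""
    d = jukes_cantor(p_distance(ltr5, ltr3))
    return d / (2 * rate_per_site_per_year)


# Hypothetical toy LTRs differing at 1 of 10 sites; the rate is a placeholder assumption.
example_age = ltr_age_years("ACGTACGTAC", "ACGTACGTAT", rate_per_site_per_year=2e-9)
```

Such estimates are approximate: gene conversion between LTRs, rate variation among host lineages, and alignment errors can all bias the inferred age, so they are best read as relative orderings of insertion events rather than precise dates.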