Recent discoveries on the origins of modern humans from multiple archaic hominin populations and the diversity of human papillomaviruses (HPVs) suggest a complex scenario of virus-host evolution. To evaluate the origin of HPV pathogenesis, we estimated the phylogeny, timing, and dispersal of HPV16 variants using a Bayesian Markov Chain Monte Carlo framework. To increase precision, we identified and characterized non-human primate papillomaviruses from New and Old World monkeys to set molecular clock models. We demonstrate specific host niche adaptation of primate papillomaviruses with subsequent coevolution with their primate hosts for at least 40 million years. Analyses of 212 HPV16 complete genomes and 3582 partial sequences estimated ancient divergence of HPV16 variants (between A and BCD lineages) from their most recent common ancestors around half a million years ago, roughly coinciding with the timing of the split between archaic Neanderthals and modern Homo sapiens, and nearly three times longer than divergence times of modern Homo sapiens. HPV16 A lineage variants were significantly underrepresented in present African populations, whereas the A sublineages were highly prevalent in European (A1-3) and Asian (A4) populations, indicative of viral sexual transmission from Neanderthals to modern non-African humans through multiple interbreeding events in the past 80 thousand years. Remarkably, the human leukocyte antigen B*07:02 and C*07:02 alleles associated with increased risk in cervix cancer represent introgressed regions from Neanderthals in present-day Eurasians. The archaic hominin-host-switch model was also supported by other HPV variants. Niche adaptation and virus-host codivergence appear to influence the pathogenesis of papillomaviruses.

Epidemiologic studies have demonstrated that persistent infection of select oncogenic human papillomaviruses (HPVs) is the main cause of cervix precancer and cancer. Nevertheless, our knowledge of the underlying evolutionary mechanisms driving the divergence and emergence of viral oncogenicity in specific types of HPVs is incomplete. To better understand the molecular evolution of oncogenic HPVs, we isolated viruses from non-human primates, evaluated papillomavirus molecular clock models, and estimated the divergence times of HPV16 and other HPV type variants from their most recent common ancestors. Primate PV-host tissue tropisms indicated niche adaptation of viruses to host ecosystems as the first stage of the evolution of oncogenic HPVs. The data also provided evidence of ancient codivergence of HPV variants with archaic hominins and recent viral transmission from Neanderthals to modern non-African humans through sexual intercourse. Understanding the evolution of papillomaviruses should provide important biological insights and suggest mechanisms underlying HPV-induced cervical cancer, since niche adaptation rather than oncogenicity drives viral fitness.

Funding: This work was supported in part by the National Cancer Institute (CA78527) (RDB), the Einstein-Rockefeller-CUNY Center for AIDS funded by the NIH (P30 AI-124414) (RDB), and the Einstein Cancer Research Center (P30CA013330) from the National Cancer Institute (RDB). RD acknowledges the Korein Foundation and the Sackler Institute for Comparative Genomics at the AMNH for their continued support. ZC was partially supported by a grant from the Research Grants Council of the Hong Kong Special Administrative Region, China (Project No. CUHK 24100716). Work at IARC was supported by a grant from the Institut National du Cancer (INCa), France (SHSESP 16-006) (GMC). The funders had no role in study design, data collection and interpretation, or the decision to submit the work for publication.

Strict coevolution of a host and its pathogen is more likely if the pathogen is transmitted vertically and there is little or no cross-species acquisition. Persistent infection by pathogens generally indicates that they are well adapted to their host and that extinction will be rare so long as the host survives. Hence, in scenarios of coevolution, the evolutionary history of a pathogen should mirror that of its host, both in divergence times and phylogenic history (Fahrenholz’s rule) [ 15 , 16 ]. These criteria have been shown to hold for feline PVs within the genus Lambdapapillomavirus isolated from oral lesions [ 17 ]. On the other hand, horizontal transmission of pathogens through host switching without restricted species specificity will produce a very different evolutionary history between host and pathogen. In hosts harboring many different types of PVs (e.g., bovines, humans, and macaques), the selection pressure exerted by PVs on their hosts appears negligible in comparison with what the hosts exert on the PV pathogens. Within human populations, for example, the ancient dispersal of HPV variants (e.g., HPV16 and HPV58) challenges a simple evolutionary pattern of viruses migrating with modern Homo sapiens [ 18 ], and instead indicates codivergence of viruses with archaic hominins and transmission to modern humans [ 19 , 20 ]. The genetic heterogeneity of PVs implies a complex evolutionary history with many interacting factors, including but not limited to virus-host codivergence, tissue tropism, lineage sorting, transmission, recombination, and natural selection [ 21 , 22 ]. Understanding the capacity for, and history of, viral adaptation to host ecological environments is essential for understanding the genetic basis of HPV carcinogenicity [ 23 ]. However, the origin and evolution of oncogenic PVs remains poorly understood.

Papillomaviruses (PVs) are ubiquitous, non-enveloped, small double-stranded circular DNA viruses that cause proliferation of epithelial cells in a wide range of vertebrate host species, from reptiles to mammals [ 1 , 2 ]. Currently, over 200 PVs infecting primate hosts (human and non-human) have been characterized and shown to group predominantly within 3 highly divergent genera—Alphapapillomavirus, Betapapillomavirus, and Gammapapillomavirus [ 3 ]. All oncogenic PVs associated with the development of cervical carcinoma, including human PV (HPV) types 16, 18, 31, 33, 35, 39, 45, 51, 52, 56, 58, and 59 and Macaca fascicularis PV type 3 (MfPV3), share a common ancestor within the Alphapapillomavirus [ 4 – 7 ]. Among these oncogenic types, which are sexually transmitted primarily through intercourse [ 8 , 9 ], HPV16 is globally the most prevalent HPV type detected, suggesting an increased fitness [ 10 – 12 ]. Moreover, HPV16 is also the most common HPV type detected in cervical cancer, which is the fourth most common cancer among women worldwide [ 13 ]. Nevertheless, most exposures to HPV types are transient, and many PVs appear to be more commensal than pathogenic [ 14 ].

The model was based on HPV16 variant divergence time estimation, phylogenetic topology, and geographic distribution that superimposes an ancestral viral transmission between Neanderthals/Denisovans and modern human populations. The early divergence event among deeply separated HPV16 variant lineages (A vs. BCD) suggests ancient virus-host codivergence following the speciation of modern humans and archaic hominins (e.g., Neanderthals and Denisovans) from their most recent common ancestors. The gene flow through host interbreeding between archaic hominins allowed viral transmission from Neanderthals/Denisovans to modern humans. t h-n/d denotes the splitting time between Neanderthals/Denisovans and modern humans, t h represents the speciation of modern humans. t af indicates the era of population expansion of modern humans walking out-of-Africa. t g indicates the time of gene flow (f) that may have occurred between modern humans and Neanderthals/Denisovans. t n estimates the extinction of Neanderthals/Denisovans. The arrows indicate the out-of-Africa migration events of archaic and modern human populations. The broken lines indicate potential extinction of viral variants. Branch lengths and widths are not drawn to scale.

We observed a similar divergence timeframe for other HPV variants, splitting from their MRCAs approximately 300–600 kya and showing a strong correlation between evolution times and genomic diversities ( Fig 7 , Table 5 ). In all cases, the deep separation between HPV16 variant lineages A and BCD (and the deepest lineage separations of other HPV variants) suggests an ancient virus-host codivergence, coinciding with the split between archaic Neanderthal/Denisova and modern human ancestors from their MRCA ( Fig 8 ). Neanderthals spread out over Eurasia with at least two populations splitting approximately 77–114 kya from each other based on analysis of archaic genomes from Vindija, Mezmaiskaya (Caucasus), and Denisova (Siberia) [ 48 ]. This time period corresponds to the diversion of HPV16 A sublineages and in particular the split of A4 from A1/2/3 and the emergence of HPV16 A4 in Asia, likely representing independent transmission of A4 from archaic hominins to modern humans in the east.

A Bayesian MCMC method was used to calculate the divergence times of HPV16 complete genome variants from their most recent common ancestors, as described in the methods. The nodes highlighted with red circles indicate divergence times of the split between HPV16 A and non-A lineages, between A1-3 and A4 sublineages, and between C and D lineages. Branch lengths are proportional to divergence times scaled in thousands of years (K). Grey bars indicate the 95% highest posterior density (HPD) for the corresponding divergence age (see details in S10 Fig ). Colors in branches represent distinct HPV16 variant lineages/sublineages. The plot on top of the tree is a Bayesian skyline estimation based on 311 present-day human mtDNA sequences (without the loop region) from geographically diverse populations and 212 HPV16 complete genome variants with a similar geographical distribution. The median posterior estimates (the product of the effective population size N e and the generation length g in years) throughout the given time period are illustrated with lines in black. The dark blue (humans) and dark red (HPV16) areas give the 95% HPD interval of these estimates.

The molecular clock models used to estimate the divergence times of primate PVs support a scenario of virus-host codivergence after the virus has adapted to a specific host ecosystem. Using a similar Bayesian Markov chain Monte Carlo (MCMC) framework, we initially applied six combinations of clock models to estimate the divergence of HPV16 variants from their MRCA, without any prior assumption of virus-host codivergence ( Table 4 , no calibration). Interestingly, a combination of the relaxed lognormal molecular clock and coalescent Bayesian skyline models indicated that HPV16 A and BCD had divided around 618.5 thousand years ago (kya) (95% HPD: 331.5–996.1). This estimation is within the time span of the separation between Homo sapiens and archaic hominins (e.g., Neanderthal/Denisova) but around two-five times longer than the estimated modern Homo sapiens divergence time (ca. 150–200 kya) [ 45 ] indicative of an ancient divergence of HPV16 variants prior to the emergence of modern human ancestors. Based on the geographic distribution of HPV16 variants above, we then used an archaic hominin-host-switch (HHS) scenario to calibrate the divergence time between HPV16 A and non-A variants (500 kya, 95% HPD: 400–600), and a modern-out-of-Africa (MOA) scenario between BC and D variants (90 kya, 95% HPD: 60–120). When time calibrations were introduced into the phylogenetic tree, the HHS scenario showed the strongest support for time inference and estimated an initial divergence of HPV16 variants at approximately 489 kya (95% HPD: 394–581), predating the out-of-Africa migration of modern humans (ca. 60–120 kya) ( Fig 6 and S10 Fig ) [ 46 , 47 ]. In addition, the demographic model of the Bayesian skyline plot for the population function through time showed a recent exponential expansion of the effective population size of present-day HPV16 occurring in the last 25 kya, lagging behind the growth of modern human populations (starting from the last 40–50 kya) (see the top panel of Fig 6 ). This plot most likely reflects the concurring increase and mobility of modern human populations and present-day virus populations in the last epoch.

Based on single-nucleotide polymorphism (SNP) patterns and phylogenetic tree topologies, we assigned 3256 HPV16 partial sequences from 22 countries/studies into variant lineages and sublineages using maximum likelihood methods ( Table 3 ). As shown in the summarized charts of HPV16 phylogeography ( Fig 5A ), isolates from Asians and Caucasians (Australians/Europeans, and North Americans) were predominantly represented by A variants, with abundances of 92% and 83%, respectively. The majority of A4 variants (352/357, 99%) were from Asian individuals. Within the African population, 90% of HPV16 infections were B and C lineages. HPV16 variants in South/Central Americans were equally assigned as A1-3 (50%) and D (48%). Using a weighted UniFrac algorithm, variants were well clustered into groups (African, Eurasian, and South/Central American) corresponding to the geographic origin of the isolates ( Fig 5B ). Globally, A1-3 sublineages were the most widespread; whereas, the D lineages were detectable at low prevalences in many populations outside of South/Central Americans, such as in Caucasian (11%), African (7%), and Asian (6%) individuals ( Fig 5C ). In contrast, A4 and B/C lineages were rarely found outside of Asian and African populations, respectively.

Next, we constructed a phylogenetic tree of HPV16 variants based on 212 complete genomes to classify variant lineages and sublineages ( S3 Table ). The tree topology shows two deeply separated clades corresponding to the previously classified Eurasian and African lineages ( S8 Fig ), with a mean nucleotide sequence difference of 1.72% ± 0.09% ( S4 Table ). The African lineage variants were more than twice as diverse (intragroup mean difference of 0.77% ± 0.04%) as the Eurasian variants (0.32% ± 0.02%). Since geographic nomenclature systems suffer from sampling biases and preconceived notions about virus ancestry, we utilized an agnostic alphanumeric nomenclature based on HPV16 phylogeny and complete genome nucleotide differences to assign HPV16 variants into four lineages designated A, B, C, and D. Each lineage could be divided into four sublineages (A1-4, B1-4, C1-4, and D1-4), based on previously described criteria ( S9 Fig ) [ 43 ]. The previously named Asian (As) and North American 1 (NA1) variants are designated sublineages A4 and D1, respectively [ 44 ]. The maximum pairwise difference between the most diverse isolates, from sublineages A1 and D3, was 2.23%.

(A) A schematic topology of representative primate papillomaviruses. The branch colors represent viruses with specific host niche adaptation (brown–isolated from mucosal tissues, blue–isolated from cutaneous tissue). (B) Model of phylogeny and divergence of primate papillomaviruses. In this model, one or more primate papillomavirus ancestors evolved to colonize distinct host ecosystems prior to the speciation of a primate ancestor. A process of further viral adaptation to colonize more specific host ecosystems (represented by black circles at the nodes) may have followed upon host speciation, resulting in the radiation observed in the extant primate papillomavirus tree. The broken lines in grey (starting from open circles) represent clades for which specific HPV species lack detectable non-human primate counterparts. The 3-dimentional structure represent host phylogeny.

The divergence times and tree topologies support a model of intrahost divergence of primate PVs in which ancient viruses diverged and adapted to specific host ecosystems (e.g., tissue tropism or different types of epithelial cells) within an ancestral host animal lineage (e.g., the MRCA of primate animals) ( Fig 4 ). Following periods of host speciation, continuing intrahost viral divergence events occurred as distinct but phylogenetically related viral types were transmitted to similar host ecosystems by the closely related host animals. This pattern of ancient viral divergence coupled to niche adaptation may explain, for example, the differences in the prevalence of HPV16 and HPV18 between squamous cell carcinomas and adenocarcinomas of the cervix [ 41 ]. This difference might represent the emergence of further viral adaptation to different ecological niches within the cervix, one dominated by stratified squamous epithelium the other by columnar epithelium, respectively [ 42 ]. The fact that we do not observe similar or parallel diversity of NHP-PVs compared to HPVs (broken lines in right panel of Fig 4B ) could be due, in part, to reduced sampling effort, limited population size of NHPs, bottlenecks of viral transmission, and/or restricted host migration.

Similar virus-host codivergence events were observed between Old World monkey PVs and their closest HPV relatives, and were estimated to approximately 14–31 mya ( Fig 3 , S5 Fig , S6 Fig and S7 Fig ). For example, the species Alpha-12 (PVs mainly isolated from genital lesions of macaques) split from a MRCA with the species Alpha-9 (represented by oncogenic genital HPV16) around 27 mya coincided with the time span of the speciation between macaques and apes/humans that occurred approximately 25 mya [ 38 , 39 ]. An enigmatic observation in these data is the clustering of macaque PVs (e.g., MfPV3) and baboon PV (Papio hamadryas PV 1, PhPV1) within the species Alpha-12 group, suggesting either a recent viral transmission between macaque and baboon monkeys, or a more complex phylogeny of the sub-family Cercopithecinae. The majority of distinct human PV types arose during the end of the Miocene and/or the beginning of the Pliocene epoch coincident with the divergence of humans and chimpanzees occurring around 6–8 mya ( Fig 3 ) [ 40 ].

A Bayesian MCMC method was used to estimate divergence times as described in the methods. Times were calculated separately for each genus, Alpha- (A) , Beta- (B) and Gamma-PVs (C) . Branch lengths are proportional to divergence times. The branches in red refer to non-human primate papillomaviruses. Numbers above the nodes with circles are the mean estimated divergence time in million years (M) between human and non-human papillomavirus clades. The bars in grey represent the 95% highest posterior density (HPD) interval for the divergence times (see details in S5 Fig , S6 Fig and S7 Fig , respectively). Panels B and C show time on the Y-axis and phylogeny on the X axis.

To estimate the divergence times of primate PVs from their MRCAs, we used a Bayesian statistical framework employing previously established PV evolution rates [ 17 ]. Infectious PVs have been shown to have a slow mutation rate based on the observations that these double-stranded DNA viruses use the host cell DNA replication machinery, characterized by high fidelity, proofreading capacity, and post-replication repair mechanisms [ 37 ]. Since primate PVs, taken together, do not follow strict viral-host codivergence, each genus was evaluated separately to estimate divergence times. A combination of relaxed lognormal molecular clock and coalescent constant population models provided the best performance using the phylogenetic tree as shown in Fig 3A . The Alphapapillomavirus–Dyoomikronpapillomavirus split from a MRCA around 39.9 million years ago (mya) (95% highest posterior density (HPD), 36.4–43.7 mya) ( Fig 3A , S2 Fig and Table 2 ) is consistent with the time frame of the split between New World and Old World primate ancestors [ 26 ].

Strict virus-host codivergence requires the evolutionary history of the pathogen to mirror that of its hosts. Clustering of viruses according to the host from which they were isolated should be observed. In addition, the divergence times of hosts and parasites should be similar (different colors highlight viruses infecting different primate host ancestors). Intrahost divergence can be defined according to specific phylogenetic criteria, such as niche-adaptation prior to coevolution in primate papillomaviruses, as opposed to clustering by hosts.

Fahrenholz’s proposal for strict codivergence of host and parasites states that the “parasite phylogeny mirrors that of its host,” indicating that specific pathogens isolated from an individual host species should be monophyletic to the exclusion of viruses from other host species (reviewed in de Vienne et al.) [ 35 ] ( Fig 2 ). In the case of primate PVs, however, viruses infecting a given host species do not always cluster together, implying an ancient viral divergence model in which viral ancestors may have first split into separated viral clades corresponding to niche adaptation to specific host ecosystems (i.e., tissue tropism). Following host ancestor speciation, distinct but homophyletic viruses were transmitted to similar ecosystems (e.g., mucosal or cutaneous sites) between closely related host animals, resulting in the radiation observed in the extant primate PV tree where viruses sort by tissue tropism and not host species. This prediction was evaluated with a permutational multivariate analysis of variance (PERMANOVA) test [ 36 ] using primate PV nucleotide sequence pairwise distances, which revealed that tissue tropism (here defined by different genera) contributed to more of the variability of viral divergence (accounting for 26% of the total variance, p<0.001) than that of the host (6%, p<0.001) ( Table 1 ).

Papillomaviruses have been identified in a wide range of NHP species, including Old World monkeys and apes (e.g., macaque, chimpanzee) and New World monkeys (e.g., squirrel monkey, brown howler) [ 24 , 27 – 33 ]. Using a maximal likelihood algorithm and a nucleotide sequence alignment of the concatenated E1-E2-L2-L1 ORFs for 141 PV types representing each species or unique host ( S2 Table ), we found that the majority of primate PVs phylogenetically clustered into Alphapapillomavirus, Betapapillomavirus, or Gammapapillomavirus genera, corresponding predominantly to the anatomical sites where the viruses were originally isolated (e.g., mucosal or cutaneous epithelium), which was independent of the host species ( Fig 1 , S1 Fig and S2 Table ). For example, MmPV1 is a rhesus macaque PV type (within the species Alpha-12) isolated from cervicovaginal cells that shares a most recent common ancestor (MRCA) with oncogenic mucosal HPV16 (within the species Alpha-9) but is distantly related to MmPV4 (within the genus Gammapapillomavirus), which was also isolated from a rhesus macaque. Since topological incongruence has been noted in the phylogenies of HPVs when trees are constructed with either late or early regions of the viral genomes [ 22 , 34 ], we also examined the topologies of such trees. Although there was some incongruence, the majority of the primate PVs maintained their topological positions (see S2 Fig , S3 Fig and S4 Fig ).

We focused on HPV16 because it is the most prevalent and potent carcinogen among the oncogenic HPVs [ 5 ]. To interrogate HPV16 evolution using a molecular clock, we utilized HPVs and NHP-PVs characterized in our labs and by others where the host species separation times have been well established [ 25 , 26 ]. This step is essential in order to validate a vertical mutation rate model suitable for HPV variants. This model estimates the mutation rate for infectious PVs over long periods of time and might differ from horizontal mutation rates not measured in this study.

A maximum likelihood phylogenetic tree was inferred from the concatenated nucleotide sequence alignment of 4 open reading frames (E1-E2-L1-L2) of 141 papillomavirus types representing 132 species (see PV list with hosts in S2 Table ). The majority of analyzed primate papillomaviruses cluster into three distinct clades, Alpha-, Beta- and Gamma-PV genera, corresponding predominately to the anatomical sites (e.g., mucosal vs. cutaneous epithelium) where the viruses were originally isolated, rather than to the distinct host species. The branches represented by non-human primate papillomaviruses are highlighted in red. Non-primate papillomaviruses are collapsed and joined by grey lines (see comprehensive tree in S1 Fig ). The dot sizes are proportional to the bootstrap percentage supports from RAxML.

In an effort to study the diversity of non-human primate PVs (NHP-PVs) to better understand the evolution of oncogenic HPVs, we screened cervicovaginal specimens from 10 adult female squirrel monkeys (Saimiri sciureus), and the paired oral, perianal, and genital samples from 8 adult rhesus monkeys (Macaca mulatta) (4 females and 4 males). Three novel Saimiri sciureus PV types (SscPV1, 2 and 3) and three novel Macaca mulatta PV types (MmPV2, 3 and 4) were isolated and characterized and had genomes ranging in size from 7424 bp to 8051 bp ( S1 Table ). All genomes contained five early genes (E6, E7, E1, E2, and E4), two late genes (L2 and L1), and an upstream regulatory region (URR) between L1 and E6 genes. Phylogenetic trees based on the nucleotide sequence alignment of the concatenated four open reading frames (ORFs) (E1, E2, L2, and L1) ( Fig 1 and S1 Fig ) or individual genes, e. g., E1 or L1 ORFs ( S2 Fig , S3 Fig and S4 Fig ) support a monophyletic clade grouping SscPV1/2/3 and howler monkey Alouatta guariba PV 1 (AgPV1, KP861980) [ 24 ] within the genus Dyoomikronpapillomavirus. MmPV2 and MmPV3 cluster into the genus Alphapapillomavirus, with the closest HPVs being HPV54 (within the species Alpha-13) and HPV117 (within the species Alpha-2), respectively. MmPV4 shares <70% of L1 ORF similarity with members of the species Gamma-10 (e.g., HPV121 and HPV130) and may represent a novel species within the genus Gammapapillomavirus.

Discussion

In this work, we used a Bayesian MCMC framework to estimate the divergence times of primate PVs and propose an early ancient intrahost viral divergence model (i.e., niche adaptation) followed by viral-host coevolution. This form of viral evolution has been documented for polyomaviruses [49], herpesviruses [50], and some retrovirus genera [51]. With the assumption of host niche adaptation as a fundamental process, the estimation of primate PV divergence times within niche-specific clades mirrors that of the primate host evolutionary history (Fig 4). It is clear that the evolutionary history of these well adapted, slowly evolving PVs may be significantly more complex than previously appreciated [37]. The implication of host niche adaptation of primate PVs preceding virus-host codivergence suggests a critical role for viral genetic heterogeneity and natural selection. The origin of viral genetic determinants of cervical niche adaptation further supports the hypothesis that a group of well-evolved viral genotypes also contain the determinants for cervical cancer, since this phenotype cannot exert selective pressure, as it does not support the production of infectious virus. It may also explain why a large set of cervicovaginal macaque PVs (within the species Alpha-12) associated with cervical neoplasia shares a common origin with the high-risk clade of human PVs (e.g., Alpha-9) (Fig 3A) [6, 27]. Our findings provide a framework for studying the past evolution of primate PVs infecting the genital tract niche and support a molecular clock based on phylogeny, since the generation time of PVs can only be extrapolated from empiric data based on coevolution models [17, 52].

We used this well-supported molecular clock model to estimate the divergence times of HPV16 variants. HPV16 is the most common oncogenic HPV type and shows diversity in persistence and carcinogenicity [53–55], suggesting further biological differences between variant lineages. We observed specific geographic/ethnic dispersals of HPV16 variants, such as A4 predominance in Asian populations and BC predominance in African populations. The estimated divergence times between HPV16 A and BCD variants largely predated that of the out-of-Africa migration of modern human populations, consistent with a previously reported archaic hominin-host-switch scenario [19, 20]. One interpretation of the data implies that the present-day Eurasian HPV16 A variants were probably the products of multiple interactions between Neanderthals/Denisovans and modern Homo sapiens established during sexual contact after a long period of separation (e.g., 400–600 kya). This notion of viral sexual transmission between groups is reflected in the recent genetic admixture (e.g., 80 kya) between groups [48, 56–59], with evidence of 2–4% of nuclear DNA in Eurasians that can be traced to Neanderthals [48, 58]. This assumption is likely ubiquitous in a number of Alpha-HPV variants (Fig 7, Table 5), although their pathogenesis, evolution, and epidemiology warrant further study.

Recent evidence indicates that Neanderthals spread out over the Eurasian continent and also admixed with ancestors of the present-day East Asian population [60, 61]. Since HPV16 A4 lineage is exclusively found in East Asians (approximately 40% of HPV16) and presents a higher risk of cervix cancers in Asian populations [62, 63], we speculate that a subset of Neanderthals heading east into Asia over more than 100 thousand years of existence in Eurasia could have interbred with East Asian modern humans and transmitted the HPV16 A4 sublineage and introgressed specific gene alleles that provided a selective advantage to the HPV variants coevolving with them [59, 64]. Overall, HPV16 BCD variants have higher genomic diversity than A isolates (see S4 Table), which may imply a potential population bottleneck of horizontal transmission reducing the diversity of current day A lineage isolates. In contrast, BCD variants have accumulated more genetic mutations, consistent with the observations that African populations and their pathogens have deeper origins reflected in greater diversity [65]. This idea supports one theory that both HPV16 BCD and modern humans arose in Africa (Fig 8). Following a relatively recent out-of-Africa migration, the modern humans acquired the A variant from sex with archaic hominins and possibly carried D variants into Eurasia under conditions of a small population size. The ancestors of East Asian people crossed the Bering Strait and were early populators of the Americas (based on historical records and genetic relatedness) [66]. Surprisingly, the D lineage is phylogenetically rooted in the African clade, but we did not find a major reservoir of the D lineage in the present-day African populations. This interesting observation suggests either an advantage of niche colonization and expansion of HPV16 D variants in Native Americans or a bottleneck of HPV16 variants present in people populating the Americans. Alternatively, the lack of A4 and the high proportion of D lineages in the Americans could be the result of an early colonization of the Americas by an unknown group from Africa. More data is needed to sort out the evolutionary history of the HPV16 D lineage and might provide clues to new features of the populating of the Americas.

Sexual interactions between archaic hominins and modern human ancestors likely occurred over multiple time- and space-scales. For example, viral transmission might have also occurred from modern humans to Neanderthals/Denisovans, based on the evidence of ancient gene flow from early modern humans into Eastern Neanderthals [57]. Since PVs usually establish infections at the basal layer of epithelial cells, it will be impossible to detect viruses from fossil bones of archaic hominins and document the presence of HPVs in archaic hominin populations [20]. The evolutionary histories and origins of modern H. sapiens are undergoing dramatic revisions with the introduction of advanced sequencing techniques and methods to analyze genomic samples from archaic hominin specimens [67–69]. Since the reproductive success per copulation between H. sapiens and archaic hominins is predicted to have lower viability than that of modern human reproductive events, high levels of sexual interaction were likely present facilitating HPV transmission, in addition to genetic introgression observed in modern non-African populations [70]. For example, the human leukocyte antigen (HLA) B*07:02 and C*07:02 alleles associated with increased risk in cervix cancers appear to be introgressed regions in present-day Eurasians and Melanesians from Neanderthals or Denisovans [71–73]. This also suggests that adaptive introgression of modern humans from archaic hominins influences the pathogenic outcome of these infections by as yet unknown mechanisms [70, 74]. However, it can be speculated that introgressed genes providing some selective advantage to hybrid human-archaic hominin offsprings could also make them more susceptible to HPV variants adapted to archaic hominins over hundreds of thousands of years of coevolution. The introgressed genes are most likely related to immunity against infections, whatever the pathogens might be and HPV was along for the ride, since HPV is not known to affect reproductive fitness of the host.

This study has its strengths and limitations. We expand the current understanding of HPV16 evolution beyond the recent description of HPV transmission between archaic and modern humans that used existing data [20] in important ways. We have expanded the understanding of HPV16 in the context of human and non-human primate PV evolution by characterizing additional New World and Old World monkey PVs and using the known divergence times of specific primate species to establish a valid molecular clock. This approach was used to establish the times of Neanderthal divergences [48]. We demonstrate that niche adaption had to proceed viral-host coevolution, and suggest that subsequent niche adaptation might underlie the difference in prevalence of HPV16 and HPV18 in cervical squamous and glandular lesions. We have identified and characterized additional HPV16 variants enabling us to establish the HPV16 variant taxonomy that includes subvariants that have unique biological characteristics [53]. Moreover, we propose that evolution of HPV16 A in Neanderthals over time led to allopatric emergence of the HPV16 A4 lineage as Neanderthals moved east and interbred with modern humans in Asia. We have also expanded the number of HPV16 isolates from around the world to establish the global distribution of HPV16 variants. Lastly, we provide new interpretations and questions on the HPV16 D lineage that is part of the African clade, but is highly prevalent in South/Central America. Nevertheless, there are also limitations to the current study and interpretations. The understanding of human evolution is constantly being challenged with new data and it is possible the models of human evolution used in this study will change [75]. We have not sampled every population and it is possible that additional HPV16 isolate data could change our interpretations. The data obtained on the geographic locations of the HPV partial sequences could be incorrect resulting in underestimating the true associations between variants and historic origins. Lastly, it is possible that very low population sizes of humans migrating out of Africa carried HPV16 A lineage variants leaving no traces in Africa, but expanding throughout Eurasia. This unlikely possibility would influence the interpretations of both our work and that of previous studies analyzing the evolution of HPV16 [20].

In conclusion, the biology and natural life cycle of oncogenic HPVs that results in infectious viral particles (i.e., vegetative virus life cycle) is highly adapted to the differentiation program of epithelial cells [76]. Poorly differentiated precancerous and cancerous cells in the cervix do not support the HPV vegetative life cycle, and thus viral-associated transformation does not contribute to the fitness of HPVs. Viral phenotypes that serve to adapt to a specific ecological niche, evade host immune mechanisms, and support persistent viral production, however, should contribute to viral fitness. Therefore, further investigations of viral-host interactions and the underlying mechanisms of viral oncogenicity, should continue to focus on features of viral evolution and niche adaptation that contribute to fitness, since the oncogenic outcome of HPV infections appear to be “collateral damage” affecting host morbidity and mortality. The current data provides a framework to unravel the mysteries of oncogenic HPV genomes as we expand our understanding of viral-host evolution.