Abstract Hepatitis B virus (HBV) is a ubiquitous viral pathogen associated with large-scale morbidity and mortality in humans. However, there is considerable uncertainty over the time-scale of its origin and evolution. Initial shotgun data from a mid-16th century Italian child mummy, that was previously paleopathologically identified as having been infected with Variola virus (VARV, the agent of smallpox), showed no DNA reads for VARV yet did for hepatitis B virus (HBV). Previously, electron microscopy provided evidence for the presence of VARV in this sample, although similar analyses conducted here did not reveal any VARV particles. We attempted to enrich and sequence for both VARV and HBV DNA. Although we did not recover any reads identified as VARV, we were successful in reconstructing an HBV genome at 163.8X coverage. Strikingly, both the HBV sequence and that of the associated host mitochondrial DNA displayed a nearly identical cytosine deamination pattern near the termini of DNA fragments, characteristic of an ancient origin. In contrast, phylogenetic analyses revealed a close relationship between the putative ancient virus and contemporary HBV strains (of genotype D), at first suggesting contamination. In addressing this paradox we demonstrate that HBV evolution is characterized by a marked lack of temporal structure. This confounds attempts to use molecular clock-based methods to date the origin of this virus over the time-frame sampled so far, and means that phylogenetic measures alone cannot yet be used to determine HBV sequence authenticity. If genuine, this phylogenetic pattern indicates that the genotypes of HBV diversified long before the 16th century, and enables comparison of potential pathogenic similarities between modern and ancient HBV. These results have important implications for our understanding of the emergence and evolution of this common viral pathogen.

Author summary Hepatitis B virus (HBV) exerts formidable morbidity and mortality in humans. We used ancient DNA techniques to recover the complete genome sequence of an HBV from the mummified remains of a child discovered in the 16th century from Naples, Italy. Strikingly, our analysis of this specimen resulted in two contrasting findings: while the damage patterns lend credence to this HBV sequence being authentically 16th century, phylogenetic analysis revealed a close relationship to recently sampled viruses as expected if the sequence were a modern contaminant. We reconcile these two observations by showing that HBV evolution over the last ~450 years is characterized by a marked lack of temporal structure that hinders attempts to resolve the evolutionary time-scale of this important human pathogen.

Citation: Patterson Ross Z, Klunk J, Fornaciari G, Giuffra V, Duchêne S, Duggan AT, et al. (2018) The paradox of HBV evolution as revealed from a 16th century mummy. PLoS Pathog 14(1): e1006750. https://doi.org/10.1371/journal.ppat.1006750 Editor: Siobain Duffy, Rutgers University, UNITED STATES Received: July 26, 2017; Accepted: November 13, 2017; Published: January 4, 2018 Copyright: © 2018 Patterson Ross et al. This is an open access article distributed under the terms of the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original author and source are credited. Data Availability: The complete genome sequence of NASD24 has been submitted to GenBank and assigned accession number MG585269. Funding: ECH and HNP are funded by grant GNT1065106 from the National Health and Medical Research Council, Australia (https://www.nhmrc.gov.au/). ECH is funded by grant GNT1037231 from the National Health and Medical Research Council, Australia (https://www.nhmrc.gov.au/). ZPR is supported through an Australian Postgraduate Award (https://education.gov.au/australian-postgraduate-awards). HNP is supported by McMaster University (http://www.mcmaster.ca/), and through a National Sciences and Engineering Research Council of Canada Canada Research Chair (http://www.nserc-crsng.gc.ca/index_eng.asp). The funders had no role in study design, data collection and analysis, decision to publish, or preparation of the manuscript. Competing interests: The authors have declared that no competing interests exist.

Introduction The comparative analysis of viral genomes provides a wide and informative view of evolutionary patterns and processes. In particular, viruses often evolve with sufficient rapidity to inform on evolutionary processes over the timespan of direct human observation (weeks and days) [1–3]. Importantly, recent technical developments in next generation sequencing [4, 5] and ancient DNA (aDNA) recovery [6] have enabled the rigorous study of nucleotide sequences from increasingly older historical, archaeological and paleontological samples. Consequently, aDNA sequences can now facilitate the study of more slowly evolving pathogen populations, from which recent samples display limited sequence diversity, by permitting an expansion of the indirectly observable timespan to that of centuries [7, 8]. Hence, viral and bacterial genomes recovered from such ancient samples have the potential to reveal the etiological agents associated with past pandemics, as well as important aspects of the long-term patterns and processes of evolutionary change within pathogen populations. To date, investigations of ‘ancient’ viruses have been limited in number and scope. Those focusing on pathogens sampled prior to 1900 have considered four human viruses: variola virus (VARV, the agent of smallpox) [8, 9], human papillomavirus [10], human T-cell lymphotropic virus [11], and hepatitis B virus (HBV) [12, 13]. Similarly, aDNA techniques have been used in studies of major 20th century epidemics of influenza virus and human immunodeficiency virus [14, 15]. Taken together, these studies have helped to clarify the causative agents of specific outbreaks, whether ancient strains differ markedly from recent ones, and the evolutionary and epidemiological processes that have likely shaped virus diversity. Moreover, they have provided key information on the dynamics of evolutionary change, including the ‘time-dependent’ nature of viral evolution in which estimates of evolutionary rates are routinely elevated toward the present and decline toward the past due to a combination of unpurged transient deleterious mutations in the short-term and site saturation in the long-term [16]. A major challenge for any study of aDNA is the accumulation of post-mortem damage in the genome of interest. This damage includes fragmentation, nucleotide deamination, and polymerase-blocking lesions, such as molecular cross-linking, resulting from enzymatic and chemical reactions [17–19]. Certain environmental conditions, including desiccation of organic material and low ambient temperatures, can inhibit the activity of some endonucleases and environmental microorganisms, although oxidative and hydrolytic processes will continue to occur in all conditions at variable rates depending on the preservational context [20]. However, the predictably recurring forms of damage, such as the tendency for cytosine deamination to occur more often near the 3’ and 5’ ends of fragmented DNA molecules, also provide a means of addressing questions of contamination and inferring the authenticity of a recovered sequence through statistical pattern analysis [21]. For rapidly evolving genomes, such as those from viruses, phylogenetic analysis can provide another means of establishing provenance. In particular, as the rate of evolutionary change is high in many viruses [22], “ancient” viruses should generally fall closer to the root of a tree than their modern relatives. However, a complicating factor is that any phylogenetic inference that aims to determine evolutionary rates or time-scale is only meaningful when the virus in question exhibits clear temporal structure in the phylogeny, such that it is evolving in an approximately clock-like manner over the time-scale of sampling. Because DNA viruses generally exhibit lower rates of evolutionary change than RNA viruses [22], such populations tend to require larger sampling time-frames to discern temporal structure and clock-like behavior [16]. Hepatitis B virus (HBV) (family Hepadnaviridae) presents a compelling case of the complexities of analyzing the evolution of DNA viruses. Despite considerable effort, the evolutionary rate and time of origin of this important human pathogen remain uncertain, even though it is chronically carried by approximately 350 million people globally, with almost one million people dying each year as a result [23]. Particularly puzzling is that although HBV utilizes an error-prone reverse transcriptase (RT) for replication, estimates of its evolutionary rate are generally low, yet also highly variable. For instance, mean rate estimates of 2.2 × 10−6 nucleotide substitutions per site per year (subs/site/year) have been derived from long-term studies utilizing internal node calibrations on phylogenetic trees built using conserved regions of the viral genome [24], while rates of up to 7.72 × 10−4 subs/site/year have been recorded within a single patient [25] and pedigree-based studies have returned mean rates of 7.9 × 10−5 subs/site/year [26]. As the highest evolutionary rates are observed in the short-term, this pattern is consistent with both relatively high background mutation rates and a time-dependent pattern of virus evolution in which rates are elevated toward the present due to incomplete purifying selection [16]. HBV is also genetically diverse, comprising ten different genotypes designated A–J, as well as additional subgenotypes within genotypes A–D and F [27]. Intra-genotypic sequence differences average 8%, while subgenotypes differ by an average of 4% [28, 29]. The HBV genotypes differ in their geographic distributions. Genotype A is most prevalent in northwestern Europe and the United States, while genotypes B and C predominate in Asia, and genotype D in the Mediterranean basin, including Italy, as well as the Middle East and India. Similarly, genotype E is mostly seen in west Africa, genotype F in South and Central America, genotype G in the USA and France, and genotype H in Mexico and South America [30]. The more recently described genotype I has been identified in Vietnam [31] and Laos [32], while the one example of genotype J was isolated from a Japanese sample [33]. With over half of the nucleotide coding for more than one protein, the physical constraints of the HBV genome are likely to have a major impact on evolutionary dynamics. Partially double-stranded, the relaxed-circular DNA genome averages ~3200 bp in length for the longer strand and ~1700–2800 for the shorter, comprised of four overlapping open reading frames (ORFs). These ORFs encode seven proteins; the pre-core and core proteins, three envelope proteins (small, medium, and large S), the reverse transcriptase (RT, the polymerase), and an X protein that is thought to mediate a variety of virus-host interactions [34, 35]. As noted above, the replication of HBV involves the use of RT, an enzyme that has no associated proofreading mechanism, such that mutational errors are expected to be frequent [25, 36]. However, due to the overlapping ORFs, many mutations are likely to be non-synonymous and therefore purged by purifying selection. Considerable uncertainty remains as to when HBV entered human populations and when it differentiated into distinct genotypes. Given its global prevalence and the presence of related viruses in other mammals including non-human primates, it is commonly believed that the virus has existed in human populations for many thousands of years [37]. Further, recent studies have shown that the long-term evolutionary history of the Hepadnaviridae is shaped by a complex mix of long-term virus-host co-divergence and cross-species transmission [38]. Hence, it seems reasonable to conclude that HBV diversified within geographically isolated human populations following long-term continental migrations [24]. Ancient DNA has the potential to provide a new perspective on the evolutionary history of HBV. Notably, HBV has been sequenced from a Korean mummy radiocarbon dated to 330 years BP (±70 years) which translates to ca.1682 (with an error range of 1612–1752) [12]. Phylogenetic analysis of this sequence (GenBank accession JN315779) placed it within the modern diversity of genotype C, which is common to Asia. This phylogenetic position is compatible with low long-term rates of evolutionary change in HBV such that the virus has existed in human populations for many thousands of years, with genotypes diversifying over this time-scale [24]. However, the authenticity of this sequence, and hence the evolutionary time-scale it infers, remains uncertain, as deamination analysis and read length distribution were not reported as the data were generated using PCR based methodologies. In addition, as genotype C is common in modern Asian populations [37] and the mummy sequence (JN315779) clusters closely with modern sequences, there is no phylogenetic evidence to support the historical authenticity of this sequence. For aDNA to resolve the origins of HBV it is of paramount importance to determine whether ancient samples can be used to calibrate the molecular clock to provide more accurate estimates of the time-scale of HBV origins and evolution. To this end, we report the detailed study of a complete HBV genome sampled from a 16th century Italian mummy.

Discussion We have enriched and sequenced a complete HBV genome from the remains of a mummified child estimated to have died in 1569 CE ± 60 years [39]. The cytosine deamination patterns occurring preferentially at termini in both the viral and mitochondrial DNA fragments support the ancient authenticity of these sequences. We have also subtyped the mitochondrial DNA from the mummy to haplogroup U5a1b, a common European haplogroup [46], and the HBV to genotype D, a genotype predominant in the Mediterranean region today [30]. The nearly identical fragment length distributions, deamination patterns and same geographically recovered haplotypes (mitochondrial and HBV) argue for the authenticity of the sequences. The identification of consistent HBV reads in multiple (5) tissue samples (distal femur, fronto-parietal bone with skin, thigh muscle, temporo-maxillary skin and leg skin; Table 2), suggests that the virus is distributed throughout the mummy and not in one location, as might be expected with contamination. Further, other mummies from the same site, excavated at the same time and processed in the same facilities, did not show any HBV reads in shotgun sequencing data. Thus, if the mummy was contaminated, it was specific to this one sample alone. Although hepadnaviruses like HBV exhibit strong tropism for liver cells (hepatocytes), hepadnaviral DNA has been shown to exist in other somatic cells, including mononuclear cells, which are protected by the hydroxalite matrix of the bone [55–57]. HBV particles produced in the bone marrow and protected by the matrix may explain why we recovered the majority of our viral sequences from a femur sample. DNA isolated from ancient bone matrix has also been shown to be better preserved and less damaged than that recovered from corresponding soft tissue from the same remains [58]. Many reports of ancient epidemics and other disease outbreaks have relied upon historical reporting and paleopathological studies of human remains. Recent advances, including next generation sequencing technology [5] and DNA enrichment methods [6], now allow recovery of ancient nucleotide sequences from these remains and the genetic verification of the pathogens responsible for disease, as well as the identification of pathogens undetectable by other means. Our study provides a strong argument for this latter approach, as mummy NASD24 was originally reported to have been infected with smallpox [40, 43]; crucially, however, shotgun sequencing following enrichment for VARV (S1 Table) and SEM analysis (S1 Fig) revealed no evidence of VARV in this mummy. This is particularly surprising given previous results in which electron microscopy studies and immunostaining indicated the presence of VARV particles in these samples [43]. Given our results, a new interpretation is that the child was not suffering from smallpox at the time of death, but rather Gianotti-Crosti syndrome caused by HBV infection [59]. Gianotti-Crosti syndrome is a rare clinical outcome of HBV that presents as a papular acrodermatitis in children between 2 and 6 years old [59]. This, in turn, illuminates the power of aDNA in providing definitive evidence or clarifying retrospective diagnoses, where etiology may be uncertain and morphology complicated for key type specimens that provide critical time points for the origins or presence of specific pathogens (e.g. smallpox). Despite the multiple streams of evidence supporting an ancient origin of NASD24SEQ, the results of the evolutionary analysis are less straightforward. In particular, our phylogenetic analysis reveals a close relationship between NASD24SEQ and modern D genotype sequences, as would be expected if the sequence were a modern (1980s) contaminant, and we note the same phenomenon with the Korean mummy sequence thought to date from the 17th C [12]. Importantly, however, data sets representing only modern HBV sequences, sampled over 50 years to the present, did not display discernible temporal structure. Clearly, without temporal structure we cannot accurately estimate the age of the ancient sequences using phylogenetic methods. Hence, the apparently paradoxical phylogenetic position of NASD24SEQ cannot automatically be taken to mean that this genome is a modern contaminant. In turn, if NASD24SEQ is indeed from the 16th century, then this phylogenetic pattern indicates that the diversification of the HBV genotypes occurred prior to 1500 and that any subsequent accumulation of diversity was either lost through strong purifying selection or masked by multiple substitutions. Our analyses of both modern and ancient HBV samples returned results consistent with the absence of temporal structure, not only within the full diversity of HBV but also within the D genotype and D3 subgenotype. Given that the genomic structure of HBV is likely to result in strong selective constraints, a likely explanation for our results is that many of the mutations that arise in the short-term, such as within chronically infected hosts or along single chains of transmission, are non-synonymous and eventually removed from the HBV population by purifying selection, yet artificially inflating evolutionary rates over this sampling period [60]. Support for this hypothesis comes from short-term studies of HBV evolution in which rates of evolutionary change are greater than those estimated from longer-term studies [24, 25]. On balance, our analysis suggests that our HBV sequence is authentically 16th century and that no temporal structure is observable in over 450 years of HBV evolution. As such, these results have a number of important implications for the study of HBV evolution. In particular, such a phylogenetic pattern implies that the currently circulating viral genotypes must have been associated with their specific host populations long before the 16th century, and hence supports a long association of HBV with human populations. In addition, the lack of temporal structure means that it is not possible to use molecular clock methods to reliably date HBV evolution over the time span of genome sequences currently available.

Materials and methods Sample history Exploration of the coffin and the autopsy of the unidentified two-year old mummy (NASD24) was completed as part of a larger study, conducted between 1984 and 1987, of mummies from the sacristy of the Basilica of Saint Domenico Maggiore in Naples, Italy. Autopsies were performed for all mummies by paleopathologists wearing sterile surgical coats, sterile latex gloves, sterile masks, headdresses and overshoes. Details of this initial investigation have been reported previously [40]. The samples used for aDNA sequencing were collected during the initial autopsies, and these samples were stored in sealed, sterile plastic bags. The sample bags were first opened in 1985 for a preliminary paleopathological examination that suggested a viral agent, thought to be smallpox, as the likely cause of an apparent skin rash [40]. The samples were handled again in 1986 for examination by immune-electron microscopy [43]. After this, the samples remained in sterile storage before subsamples were removed and sent to the McMaster Ancient DNA Centre at McMaster University (Hamilton, Ontario, Canada) in 2013. Ethics approval Ethical approval to work on this mummy sample was granted to Dr. Fornaciari by the Supervisor for the Artists and Historians of Campania in 1984. aDNA library preparation, enrichment and sequencing Eight subsamples of 75–125 mg of organic matter were excised from samples of various body parts of NASD24 in a dedicated cleanroom facility at the McMaster Ancient DNA Centre (Table 1). Tissue samples were cut into small pieces using a scalpel and bone material crushed into powder. These samples were then demineralized, digested, and extracted according to previously published protocols [44]. In brief, samples were demineralised using 1.5 mL of EDTA (0.5 M, pH 8.0) before being incubated at room temperature for 24 hours with rotation at 1000 rpm. Samples were then digested using a 1.5 mL proteinase K solution and incubated at 55°C for 6 hours with rotation at 1000 rpm. The supernatant from the demineralization and digestion steps was subjected to organic extraction using a modified phenol-chloroform-isoamyl (PCl) alcohol protocol. This means that 0.75 mL of PCl (25:24:1) was added to the demineralization/digestion supernatant, which was then vortexed, and spun via centrifuge (4000×g) for 20 minutes. The aqueous phase was transferred to a fresh tube and a further 0.75 mL of chloroform added. The solution was mixed and spun via centrifuge (4000×g) for 10 minutes. The aqueous phase was collected and concentrated using an Amicon Ultra 0.5 mL 30 kDa filter. This concentrated solution was purified over a MinElute column (Qiagen, Hilden, Germany) according to the manufacturer’s instructions, and eluted in 10 μL of 0.1 TE with 0.05% Tween-20. Reagent blanks were introduced at each step and processed alongside the samples. A library from the distal femoral sample of NASD24 was prepared according to a previously published protocol [61] that was modified to include an overnight ligation and with an input volume of 5 μL. Double indexing was performed using KAPA SYBR FAST (Kapa Biosystems) for 8 cycles of indexing amplification [62]. The library and blanks were enriched using two rounds of in-solution capture baits targeting HBV and the human mitochondrial genome (in separate reactions) according to the manufacturer’s instructions (Mycrorarray, MyBaits) with recommended aDNA modifications. Baits, of 80nt in length with 4x tiling density (10nt flexible spacing), were designed based on the sequences of 5,230 HBV sequences, representing all major viral subtyptes. Template input was 5 μL for each reaction and bait concentrations were 100 ng per reaction using the in-solution bait mix targeting HBV and 50 ng per reaction for that targeting human mtDNA. Target genetic material was reamplified for 12 cycles both between and after rounds of enrichment. The HBV-enriched library generated 1,934,624 clusters (3,869,248 raw reads) and the human mtDNA-enriched library generated 2,844,500 clusters (5,689,000 raw reads) on an Illumina HiSeq 1500 at the Farncombe Metagenomics Facility (McMaster University, Hamilton Ontario, Canada). Sequence data trimming, analysis and assemblies Reads were demultiplexed using CASAVA-1.8.2 (Illumina, San Diego, California), then adapters were trimmed and reads merged using leeHom [63] with aDNA specific settings (—ancientdna). These processed reads were mapped to an appropriate reference genome (HBV genotype D3, GenBank accession X65257; revised Cambridge Reference Sequence for human mtDNA, GenBank accession number, NC_012920) using a network-aware version of the Burrows-Wheeler Aligner [64] (https://bitbucket.org/ustenzel/network-aware-bwa) with distance, gap and seed parameters as previously described [8]. Duplicates were removed based on 5’ and 3’ positions (https://bitbucket.org/ustenzel/biohazard). Reads shorter than 30 base pairs and with mapping quality less than 30 were removed using Samtools [65]. The resulting BAM files were processed using mapDamage 2.0 on default settings with plotting and statistical estimation [47]. Haplogrep v2.1.0 [66] using PhyloTree Build 17 [67] was used to identify the haplogroup of the mtDNA as U5a1b. The complete genome sequence of NASD24 has been submitted to GenBank and assigned accession number MG585269. Data set assembly We analyzed the ancient HBV sequenced in this study in the context of modern whole-genomes of HBV. To this end we downloaded all human HBV genomes from GenBank that were over 3,000 nt in length and for which the year of sampling was available (all date information, including month and day, if available, was converted into decimal format). This initial GenBank data set comprised 3,696 sequences sampled between 1963 and 2015 (S2 Table). Sequences were aligned with the MAFFT v7 program using the FFT-NS-1 routine [68] to visually check for obvious errors in database labeling. For initial genotypic subtyping, one representative of each HBV subtype was selected (S3 Table). This data set included the purportedly ancient HBV sequence previously obtained from a 17th century Korean mummy (GenBank accession number JN315779; radiocarbon-dated to 1682 with an error range of 1612–1752 [12]). The subtype of NASD24SEQ was inferred from maximum likelihood (ML) phylogenetic trees estimated using PhyML v3.0 [69] with the GTR+Г 4 model of nucleotide substitution and employing SPR branch-swapping, with nodal support assessed by conducting 1000 non-parametric bootstrap replicates. Following this, a random subsample of HBV sequences was taken using the Ape package in R [70]. Specifically, we sampled five representatives of each genotype and subtype, or the maximum number available if this was not five. This produced a data set of 135 sequences sampled between 1963–2013 which we refer to as subset a (Table 3, S4 Table). We then added the ancient Italian HBV sequence NASD24SEQ to subset a to make data set a-i with n = 136. A third subset was built by adding the ancient Korean HBV sequence JN315779 to a-i to make subset a-ii with n = 137. We next randomly sampled only D genotype sequences from the initial GenBank data set to form subset b with n = 64 (S5 Table). To this we added NASD24SEQ, generating subset b-i with n = 65. We aligned the nucleotide sequences in each subset using the L-INS-i routine in MAFFT v7. Recombination analysis The RDP, GENECOV, and MAXCHI methods available within the RDP v4 package [71], with a window size of 100 nt (and default parameters), were used to analyze each subset for recombination. If at least two methods detected recombinant regions in a sequence, then we removed the region of recombination from the alignment. Phylogenetic analysis Following removal of recombinant regions, we inferred phylogenetic trees on each subset (of a and b) again using the ML method in PhyML v3.0 [69] with the GTR+Γ 4 model of nucleotide substitution and employing SPR branch-swapping, and with nodal support assessed by conducting 1000 non-parametric bootstrap replicates. Assessing the temporal structure of HBV We conducted a range of analyses to assess the extent of temporal structure in the data sets and to estimate the rate and time-scale of HBV evolution. To initially verify the temporal structure in the data we conducted regressions of root-to-tip genetic distance as a function of the sampling time (year) using TempEst v0.1 [49]. We then conducted a date-randomization test [52]. This involved analyzing the data using the Bayesian method implemented in BEAST v1.8.3 [50] under a lognormal relaxed clock model [72] and assuming a constant population size, with 20 replicates in which the sampling dates were randomized among the sequences. The HBV data were considered to have temporal structure if the mean rate estimate and 95% HPD intervals were not contained within the 95% HPD of any of estimates resulting from the randomized data sets [52]. All analyses were run with an Markov chain Monte Carlo chain length of 107 steps with samples from the posterior distribution drawn every 2 × 103 steps. After discarding the first 10% of steps as burn-in, we assessed sufficient sampling from the posterior by visually inspecting the trace file and ensuring that the effective sample sizes for all parameters were at least 200. We also performed more detailed Bayesian estimates of evolutionary dynamics. One method of validating the ages of ancient samples is to specify uninformative prior distributions for these and test if the tip dates or an informative rate give the postulated ages. Specifically, if the sequence data and calibrations are informative, the posterior should consist of a narrow distribution that includes the true sampling time of the ancient samples [73]. However, if the prior and posterior distributions for the ages of the ancient samples are the same, then the data are considered to have insufficient information to estimate the ages of these samples. We conducted these analyses by setting uniform distributions for the two ancient samples between 0 and 107 with a mean of 105. For the Korean sample we set a uniform prior with upper and lower bounds of 400 and 0, whereas for the Italian sample we used 507 and 0, with the maximum values in both cases reflecting their presumed sampling date. We also conducted analyses in which, for the Italian sample, we set a normal truncated prior distribution with the upper and lower values at 507 and 387, and mean of 447 and standard deviation of 10, whereas for the Korean sample we used 400 and 260 for the bounds, and 330 and 35 for the mean and standard deviation, respectively. These numbers are based on the radiocarbon dating analysis, with the upper and lower values reflecting the error margins. We employed three calibration strategies for these analyses. (i) First, we used the sampling times of the modern samples, but with a uniform prior for the rate bounded between 0 and 1. (ii) Second, we assumed that all the modern samples were contemporaneous, and specified internal node calibrations, corresponding to those used by Paraskevis et al. (2015) and Llamas et al. (2016) [36, 54] and assuming that the spread of HBV corresponds to that of early human populations [24, 36]. These consisted of setting normal priors for the time of the most recent common ancestor (tMRCA) of F and H genotypes at 16,000 years, with a standard deviation of 1000, setting the tMRCA for subgenotype B6 at 3500 years with a standard deviation of 3000, and setting the tMRCA for subgenotype D4 at 8500 years with a standard deviation of 3500. (iii) Finally, we calibrated the molecular clock by specifying an informative prior for the mean (long-term) substitution rate of HBV based on the estimate by Paraskevis et al. (2013) [24]. Accordingly, the prior was in the form of a normal truncated prior with mean of 2.2 × 10−6 sub/site/year, standard deviation of 5 × 10−7, and lower and upper bound of 1.5 × 10−6 and 3 × 10−6, respectively. We again used the GTR+Γ 4 nucleotide substitution model for these analyses, although a codon-partitioned HKY model was also considered within the internally calibrated analysis.

Acknowledgments We thank Filip Braet for advice on interpretation of electron microscopy results, Simon Ho for advice on the molecular clock analysis, Jemma Geoghegan for assistance with Fig 1. and the McMaster aDNA Centre for comments on versions of the manuscript and for providing general thoughts on possible causes of the rash in our child mummy. We also acknowledge the Sydney Informatics Hub and the University of Sydney HPC cluster Artemis for providing the HPC resources that contributed to the research results reported within this paper.