Mummy DNA extraction and quantification

A dissected lung from the Aconcagua’s child was used for DNA extraction (Fig. 1). The post-mortem time was approximately 500 years according to the anthropological and historical information obtained1. The lung was preserved at −20° C since it was exhumed in 1985.

All DNA protocols were carried out in dedicated laboratories especially designed for DNA extraction of aged samples with facilities to minimize the risk of contamination, following ISFG recommendations26. Operators used the necessary equipment to avoid contamination of the samples at every step of the analysis (full-body sterile suit, gloves, face screen, etc). Plastic-ware used was DNA-free, autoclaved and UV irradiated as an additional precaution. All steps were carried out in laminar flow cabinets. Reagent blanks accompanying the extraction procedure were processed and checked for contamination.

Using a sterile scalpel and forceps, an inner piece of the lung’s tissue sample was extracted and further placed inside a sterile disposable Petri dish. All outer surfaces were discarded to prevent contamination from contemporary DNA. Thus, a 350 mg piece of lung tissue, brown colored and showing signs of dehydration was obtained and placed inside a 50 mL sterile tube. Details on the extraction protocol are provided in Supplemental data Text S1.

The DNA extract was quantified using the AB QuantifilerTM Human DNA Quantification kit with the 7500 real-time PCR system according to the manufacturer’s protocol. Based on the quantification results, the sample was diluted for mtDNA control region PCR amplification.

Mitochondrial DNA amplification and sequencing

Sequencing of the mummy mtDNA control region was carried out in the two laboratories involved in the present study. All PCR reactions included a negative control as well as 9947A control DNA. A minimum of two amplifications were performed on the DNA extract. Replicate analysis of the same extracts was carried out to provide duplicated control region sequence consensus. There was no evidence of contamination in any reagent blanks or negative PCR controls (Figure S3).

For amplification and sequencing reactions the following primers pairs for the control region were used: 15967F-16429R, 16268F-186R, 6F-430R, 350F-639R in the Argentinean lab and 15997F-16401R, 16380F-17R, 16555F-408H, 332F-599H in the Spanish lab. The complete genome was sequenced using the primers described by Kivisild et al.27 and following protocols previously described28. PCR conditions were previously described29,30.

All PCR products obtained were checked by agarose gel electrophoresis. For purification of amplified PCR, products EXOSAPit was used.

Sequencing reactions were performed by using BDTv3.1 sequencing kit (Applied Biosystems) using 3 μl of PCR purified product template. Centrisep columns and EdgeBio Performa® DTR V3 96-Well plates were the sequencing purification method. Capillary electrophoresis was performed in an ABI Prims 3130 in the Argentinean lab and ABI 3730 Genetic Analyzer (Life Technologies, CA, USA) in the Spanish lab. Sequencing Analysis 5.2 software was used for quality sequencing analysis. Sequence edition was performed by using Sequencher (Ann Arbor, MI, USA) in the Argentinean lab and SeqScape (AB) in the Spanish lab; the sequences obtained were analyzed against the rCRS (Revised Cambridge Reference Sequence). A posteriori sequence quality was evaluated following the methods described earlier31,32.

The mtDNA control region was obtained by two independent laboratories from Argentina (Córdoba) and Spain (Santiago de Compostela). A blind test comparison of the sequencing results was carried out: the results were cross-compared and matched completely. The DNA of the mummy was well preserved as indirectly indicated by the good quality of the sequencing results obtained in both laboratories.

The mtDNA of all lab operators was sequenced in order to facilitate the identification of potential contamination from biological sources.

Table 1 shows the full variation observed in the mummy’s haplotype. Note that the alignment of the haplotype found in the mummy shows length variability in the HVS-II segment and this alignment allows for several nomenclatures such as (i) 56 T 56 + C and (ii) 56 T 57 60 + T. Although the first option seems more parsimonious, the second option fits better with the global phylogeny26.

Consideration of contamination issues

Previous studies on ancient DNA assumed that a certain amount of contamination is unavoidable even in ancient specimens that were excavated following dedicated protocols that pay special attention to the contamination issue10,33. It is also well-known that the impact of contamination is highly dependent on the degree of preservation of the original DNA34. The case of the Inca mummy is different from most previous studies on ancient DNA and it bears some similarities with the Tyrolean Iceman10 in terms of preservation with the additional advantage of the Inca mummy being much younger than the Tyrolean Iceman. Thus, the specimen analyzed in the present study is a mummified human body that underwent a spontaneous freeze-desiccation process and was preserved at a very low temperature. Several lines of evidence suggested a very good preservation of the mummy’s endogenous DNA:

a The mummy was almost completely buried at the time of its discovery. It was moved directly to a freezer and remained frozen all the time until DNA extraction. Extraction of lung tissue was done in an operating room in completely sterile conditions. In order to further prevent potential contamination, an inner portion of the lung piece was taken for DNA extraction in a dedicated laboratory. b Its entire mtDNA control region haplotype was obtained by two independent laboratories. The results were cross-compared and matched completely. c Mitochondrial DNA of all the operators was obtained and cross-checked with the haplotype of the mummy (Supplemental Data Table S2). These haplotypes were phylogenetically incompatible with the same haplogroup observed in the mummy’s haplotype. d The variation observed in the mummy fits perfectly with the expected Native American phylogeny and therefore there is no indication of sequence artifacts31,35. Moreover, the phylogenetic characteristics of private mutations observed do not consistently point to the presence of contaminant haplotypes and artificial recombination36,37,38. e Multiple heteroplasmies are not common and would often point to sequencing problems31. No point heteroplasmies were detected in the mummy’s sequences. f The mummy’s haplotype has 10 mutations on top of the of the C1b root. The fact that this haplotype identifies a new lineage that does not fall within the main extant sub-clades of the C1b and the global Native American phylogeny (represented by 201 and >1,200 complete mitogenomes, respectively), adds additional support to the authenticity of the sequence10. There are only a few control region haplotypes that fit with the mummy’s haplotype motif (n = 6 from an American database of more than 170,000 haplotypes), indicating that this haplogroup is at least very rare in present-day populations and, therefore, very unlikely to appear as contaminant in our analysis10. g The 10 mutations that characterize C1bi are private within C1b haplogroup, but only the transition T662C has not been observed previously in databases and internet searches (executed as in39) carried out on worldwide mtDNA variation. h The amount of variants accumulated in this haplotype with respect to what is observed for the many C1b clades also falls within the expected range as can be estimated from the phylogeny (Supplemental Data Figure S2).

Statistical analysis

Maximum parsimony trees were built for C1b mitogenomes. Figure 2 shows the skeleton of C1b phylogeny. TMRCA of this mitogenome phylogeny was computed using of a maximum likelihood (ML) procedure according to Phylotree 16 phylogeny (Table 2). For this purpose, the software PAML 4.440 was used assuming the HKY85 mutation model (ignoring indels, as per common practice) and using gamma-distributed rates (approximated by a discrete distribution with 32 categories) and three partitions: HVS-I (positions 16051–16400), HVS-II (positions 68–263) and the remainder. The TMRCA of entire mitogenomes and C1bi control region sequences (Fig. 2) was also computed using the averaged distance (ρ) of all the haplotypes in a clade to the respective root haplotype. This was done (i) using the whole variation observed in mitogenomes and (ii) considering only synonymous mutations. An heuristic estimate of the standard error (σ) was calculated from an estimate of the genealogy41. All TMRCA calculations were obtained using the entire mtDNA genomes but excluding hotspot mutations such as 16182C, 16183C and 16519. The ‘star-likeness’ of the trees was measured using the star index ρ/n×σ2; this index can take values between 1/n (single haplotype representing n mtDNAs) and 1 (perfect star phylogeny)19,42.

Both methods, ML and ρ, yielded very similar divergence ages (Table 2). There were anomalous behaviors on dates obtained for some sub-clades (e.g. C1b1, C1b3; Table 2); which could be explained by the under/overrepresentation of some sub-clade in the phylogeny (in a similar way as it was observed for haplogroup A2w1 in Söchtig et al.24) and/or because the star-likeness of some branches is very low (C1b1, C1b2, etc). Moreover, ages computed using ρ make no phylogenetic sense for some sub-haplogroups; for instance the age of C1b1 using synonymous mutations is older than the age of the whole C1b haplogroup. Overall, ages obtained using ML seem to be more consistent and these were the ones used for discussion throughout the text with the exception of coalescent ages computed on control region for which only ρ estimates were obtained.

Mutational distances were converted into years using the corrected evolutionary rate (and the calculator) from Soares et al.43, namely, for the entire molecule (1.16649 ×10−8 substitutions per nucleotide per year or one mutation every 3624 years) and for the control region (HVS-I: 1.64273 ×10−7; HVS-II: 2.2964 ×10−7). Standard deviations of age estimates are noted as SD throughout the text.

Bayesian skyline plots44 (BSPs) were obtained using the software BEAST v1.845 for the complete C1b sequences using (i) a relaxed molecular clock (lognormal in distribution across branches and uncorrelated between them)46 and (ii) the HKY model of nucleotide substitutions with gamma-distributed rates. Ancestral gene trees for each region were inferred using a GTR substitution model. Each MCMC sample was based on a run of 100 million generations, with samples drawn every 20,000 MCMC steps, after a discarded burn-in of 10,000,000 steps. BSPs were visualized using Tracer v.1.647. We used a strict molecular clock with a mutation rate of 1.16649 ×10−8 substitution/site/year for the entire mitogenome43. We used a generation time of 25 years, as in Fagundes et al.48.

Phylogeographic searches of mtDNA haplotypes were carried out in an in-house database containing >27,000 mitogenomes and >170,000 partial (mainly HVS-I) mtDNA sequences. Additional searches were carried out on EMPOP (http://empop.org), Familytree (https://familysearch.org/), Mitosearch (http://www.mitosearch.org) and the Sorenson (http://www.smgf.org/) databases.

The geographic representation of C1b mitogenomes was carried out using SAGA v. 2.1.1 (http://www.saga-gis.org/). We used the ordinary Kriging method for interpolating haplogroup frequencies.