Ancient DNA extraction and library preparation

All pre-amplifications steps were carried out in clean room facilities dedicated to ancient DNA work at the University of Tübingen. Before the sampling all samples were UV irradiated for 60 min to reduce modern contamination. In addition, the surface of the bone or tissue samples was removed and the teeth were sampled from inside of the tooth pulp. DNA was extracted from 50 mg bone powder for bone or tooth samples, from 100 mg tissue for soft tissue samples, respectively. A silica purification protocol was applied as described in ref. 56 using the following modifications: the Zymo-Spin V funnels (Zymo Research) were bleached and UV irradiated for 60 min and the total elution volume was raised to 100 μl. Aliquots of 20 μl extract were converted into double-stranded Illumina libraries following a well-established protocol22 and sample specific barcodes were added to both sides of the fragments via amplification22,23. Extraction and library blanks were treated accordingly.

Subsequently, the indexed libraries were amplified using 100 μl reactions for each library containing 5 μl library template, 4 units AccuPrime Taq DNA Polymerase High Fidelity (Invitrogen), 1 unit 10 × AccuPrime buffer (containing dNTPs) and 0.3 μM IS5 and IS6 primers22, and the following thermal profile: 2-min initial denaturation at 94 °C, followed by 4–17 cycles consisting of 30-s denaturation at 94 °C, a 30-s annealing at 60 °C and a 2-min elongation at 68 °C and a 5-min final elongation at 68 °C. The amplified libraries were then purified using the MinElute PCR purification kit (Qiagen, Hilden, Germany), quantified with Agilent 2100 Bioanalyzer DNA 1000 chips and were used for the enrichment of the human mitochondrial DNA.

For the nuclear capture two additional libraries for selected 40 samples using 20 μl extract were created as described above with the addition of a UDG treatment27 (see Supplementary Note 2 for details).

Mitochondrial DNA enrichment and sequencing for sample processing

All samples were enriched for human mitochondrial DNA via bead capture hybridization as detailed elsewhere33. After enrichment the libraries were amplified in 100 μl reactions with 15 μl template, 2 units Phusion High Fidelity DNA polymerase, 1 unit 5 × HF buffer, 0.25 mM dNTPs and 0.3 μM IS5 and IS6 primers22, and the following thermal profile: 5-min initial denaturation at 95 °C, followed by 16–23 cycles consisting of 30-s denaturation at 95 °C, a 30-s annealing at 60 °C and a 45-s elongation at 72 °C and a 5-min final elongation at 72 °C. Subsequently, the libraries were purified and quantified as described before and paired-end dual index sequencing was carried out on an Illumina HiSeq 2500 platform by 2 × 100+7+7 cycles following the manufacturer’s protocols for multiplex sequencing (TruSeq PE Cluster Kit v3-cBot-HS).

Mitochondrial DNA sequence processing and alignment

The resulting FastQ files have been processed using EAGER v1.92 (ref. 57). To achieve improved coverages at both ends of the mitochondrial reference, we used the CircularMapper option in EAGER. All reads with a mapping quality of at least 30 were kept for the subsequent analysis. Duplicate reads have been removed using DeDup v0.9.10, included in the EAGER pipeline. The coverage and statistics calculation has been performed inside the EAGER pipeline and indels have been realigned using RealignerTargetCreator and IndelRealigner from the GATK58. Mitochondrial haplogroups have been determined using HaploGrep 2 (ref. 59). Further details of the analysis parameters can be found in Supplementary Note 3. As can be seen in Supplementary Data 1, we achieved coverages ranging from 11-fold up to 4284-fold on the mitochondrial genome, with an average of 408-fold.

Mitochondrial DNA authentication and contamination assessment

Accompanying measures to limit contamination of the libraries in the laboratory work, in silico analysis has been done in order to authenticate samples and further determine the amount of potential contamination on the mitochondrial level. Negative controls were processed in parallel with samples. The former show no substantial mapping rates and suggest that the amount of DNA introduced during laboratory work could be kept on a minimal level. The authenticity of the samples has been further assessed by applying a number of methods and criteria. MapDamage 2.0 (ref. 60) has been used to evaluate fragment lengths and nucleotide misincorporation patterns of the provided samples, all of which showed levels that are characteristic for ancient DNA13. The degree of mitochondrial DNA contamination as well as contamination estimates based on the deamination patterns have been assessed using schmutzi15, generating consensus sequences of both contaminant and supposedly endogenous DNA simultaneously. Furthermore, only samples with less than 3% estimated contamination based on deamination and degree of mitochondrial contamination have been used for further downstream analysis. We furthermore determined whether there are inconsistencies between our haplogroup assignments of the mitochondrial and the nuclear capture respectively, but did not find any (see Supplementary Data 3 for details). As can be seen in Supplementary Table 1, our samples showed damage on both 3′ and 5′ ends of reads in the range of 5% up to 49%, with an average of 14%. Furthermore, the contamination estimation methods showed very low levels of contamination after comparison to a database of putative contaminants, as provided by the used method schmutzi. For all samples, the observed contamination estimates prove to be less than our defined threshold of 3%, except for three samples (JK2879, JK2883, JK2896) where a visual inspection of sequence assemblies was done as described in Posth et al.61 to identify potential contaminating lineages and ensure consistency of the generated consensus mitochondrial genome. As an additional measure, we used the built-in feature ‘log2fasta’ of the tool schmutzi to only incorporate bases in our final consensus sequence with a significant likelihood to be non-contaminated as defined by the method itself. In order to do this, we applied several quality thresholds (q=0,20,40,80) in our analysis and used a moderate filtering value that did not change our consensus sequence to undefined positions to a larger extent. We ultimately chose a value of q=20 for filtering with ‘log2fasta’, but even more strict filtering with q=40 preserved our haplotyping calls to be consistent. However, filtering even stricter introduced more undefined positions (‘N’) due to missing support, potentially hindering sequence-based analysis more dramatically than our frequency-based analysis, which is why we kept a quality threshold of q=20, following cutoffs that other authors have been using, too61.

Nuclear DNA capture

The non-UDG and UDG treated libraries were enriched by hybridization to probes targeting approximately 1.24 million genomic SNPs as described previously25. The target SNPs consist of panels 1 and 2 as described in Mathieson et al.41 and Fu et al.26 (see Supplementary Note 2 for details).

For each of the 40 samples, we sequenced two captured libraries: one with enzymatic damage repair (UDG), one without (non-UDG). For all samples, we used the EAGER pipeline version 1.92.15 (ref. 57), with default parameters, and with the option to keep only merged reads. We determined the sex of each sample by obtaining the average coverage on X chromosome, Y chromosome and autosomal SNPs in the capture pool using a custom script. We flagged samples as ‘male’ when the ratio of X and autosomal coverage was lower or equal than 0.75 and the ratio of Y and autosomal coverage was greater or equal than 0.25. We flagged samples as ‘female’ when the ratio of X and autosomal coverage was greater than 0.75 and the ratio of Y and autosomal coverage was lower than 0.25. For all male samples that had at least a total number of 150 SNPs on chromosome X covered twice, we obtained contamination estimates using the ANGSD software62, using the ‘MoM’ estimate from ‘Method 1’ and the ‘new_llh’ likelihood computation. Supplementary Data 2 summarizes all these results. In some cases, ANGSD finished with an error, as indicated in the table. Entries with ‘n/a’ are either female or have insufficient coverage on the X chromosome.

Three samples were selected for down-stream analysis: JK2134, JK2888 and JK2911. In all three of these samples, contamination estimates were acceptable, and similar in both UDG and non-UDG libraries as can be seen in Supplementary Data 2. Furthermore, in all three samples the non-UDG library showed DNA damage over 8% in the first base pair of reads, which is within the expected range of damage for ancient DNA of this age.

Nuclear data analysis: genotyping

We called genotypes from the UDG treated data for the three individuals by sampling a random read per SNP in the SNP-capture panel, using a custom tool ‘pileupCaller’, available at https://github.com/stschiff/sequenceTools. The resulting genotypes were merged with data from two other data sets: First, 2,367 modern individuals genotyped on the Affymetrix Human Origins Array34,35; second, 294 ancient genomes36.

Nuclear data analysis: ADMIXTURE

We used the ADMIXTURE software on the merged data set to cluster ancestry proportions using different numbers of clusters37. The lowest cross-validation error was obtained using K=16 and we show the results of that run in Supplementary Fig. 4. A subset is shown in Fig. 4b.

Nuclear data analysis: PCA

We performed PCA on the joined data set using the ‘smartpca’ software from the Eigensoft package63. For the plot shown in Supplementary Fig. 3, we used a selected set of European populations as shown in Supplementary Note 2.

Nuclear data analysis: f 3 -statistics

We used the ‘qp3pop’ tool from the Admixtools package39 to compute Outgroup f 3 -statistics of the form f 3 (Mbuti; Egyptian, X), where ‘Egyptian’ means either ancient and modern Egyptian, and ‘X’ runs over all populations in the merged data set. For the plot in Fig. 5b, we ordered all results based on the result using the modern Egyptian samples and show the top hits. For the map plot in Fig. 5a we placed all modern populations on their sampling locations obtained from Lazaridis et al.34, and added selected ancient populations that stood out from the background, as shown in Figure 5b. We then used the ‘qp3pop’ tool to compute f 3 -statistics of the form f 3 (Egyptian; Ancient Egypt, X), where X runs over all populations in the merged data set. Fig. 5c shows a similar plot as in Fig. 5a, but with the colour code indicating the Z score for this latter f 3 -statistics, where a negative Z score indicates a probable source for admixture.

Since two of the three selected samples had contamination rate estimates over 5%, we repeated this analysis using only sample JK2911, which has the highest SNP coverage and a contamination estimate of below 1%. The result is shown in Supplementary Fig. 5, with very similar results as when using all three samples, indicating no effect of contamination on our results.

Sequence-based mitochondrial analysis

In order to detect genetic similarities or distances between our three ancient Egyptian populations (n=90) and present-day populations (see Supplementary Note 4), we collated a data set of Egyptian (n=135) and Ethiopian (n=120) mtDNA sequences from the literature for the respective area in upper Egypt, the El-Hayez oasis30 and Ethiopia17. We calculated genetic distances (F ST ) based on the full mtDNA of these individuals. F ST values were calculated using Arlequin v3.5.2.2 (ref. 28), applying the Tamura and Nei substitution model64 and a respective gamma value of 0.260. To determine the most suitable parameter set and substitution method, we used jModelTest v2.1.10 (ref. 65) and selected the parameters suggested by the Akaike and Bayesian information criterion (AIC and BIC). P values for the calculated F ST values were corrected for multiple comparisons to minimize the probability of type I errors (false positives) using the Benjamini–Hochberg method66, a false discovery rate-based method implemented in the p.adjust function in R 3.2.3 (The R Project for Statistical Computing 2011, https://www.r-project.org/). We split our individuals in three groups (Pre-Ptolemaic, Ptolemaic and Roman Period) based on the 14C dates obtained from the samples (Supplementary Data 1). However, as the intra-group distances of our three ancient populations were not significantly different from each other, we merged all three ancient populations in a single set to perform F ST analyses between modern populations and the ancient meta population with more statistical power than keeping the individual populations separate. Our results can be found in Supplementary Table 2.

Sequence-based mitochondrial analysis: multidimensional scaling (MDS) analysis

To determine the relationships between our ancient samples from the Pre-Ptolemaic, Ptolemaic and Roman time periods in contrast to modern populations in the respective areas, we performed a multi-dimensional scaling (MDS) analysis of the HVR-1 sequences (Supplementary Data 4).

The genetic distances were calculated in Arlequin v3.5.2.2 using the Tamura and Nei substitution model and a gamma shape value of 0.26, determined to be the best setting for the data using jModelTest v2.1.10. We selected the best parameters suggested by the Akaike and Bayesian information criterion (AIC and BIC).

We used the linearized Slatkin’s F ST values67 based on our data set of HVR-1 sequences and visualized the calculated F ST values in a two-dimensional MDS plot with GNU R 3.2.4 using customized R script embedded in the vegan package (Fig. 3c). Our ancient Egyptian samples have been pooled here in order to provide more significant statistical evidence in the analysis, which can be justified due to their relatively small intraspecific differences between our three investigated time periods in the previous F st analysis on their full mitochondrial genomes. The closest populations on the MDS with respect to our ancient meta population (AEGY) are modern populations from Saudi-Arabia, Kuwait, the United Arab Emirates, Yemen and other Near-East populations, whereas the individuals from another ancient population from Turkey (TRO) show more relatedness to modern North-African and populations from the Levantine. For details on the geographic mapping, see Supplementary Note 4.

Sequence-based mitochondrial analysis: effective population size estimation using BEAST

We used the 90 mitochondrial genomes obtained in this study, together with 135 modern Egyptian mtDNA genomes from Pagani and colleagues17 and Kujanova and colleagues30 for Bayesian reconstruction of population size changes through time. We partitioned the alignment using the krmeans algorithm in PartitionFinder2 (ref. 68) with a search through all models available excluding I+G models as it has been argued that gamma-invariable models are not biologically meaningful for data sampled at intraspecies level69. The BIC best-fit partitions (three partitions: 7212, 2367 and 6999 nt, assigned TRN, K81uf+I and TRN+I, respectively, as the best model) were used for BEAST v 1.8.3 analysis32 with unlinked site and clock models and linked tree model. We used averages from the calibrated radiocarbon age ranges for each ancient sample as tip dates for molecular clock calibration. We conducted Bayesian inference using strict clock with an uninformative CTMC reference prior for each partition and Bayesian SkyGrid tree prior with 50 parameters (gamma prior with shape 0.001 and scale 1,000). MCMC chain was run for 300 million steps with sampling every 30,000th step and initial 10% discarded as burn-in. We inspected mixing and convergence in Tracer v 1.6 (ref. 70). Effective sample size for all parameters exceeded 100.

The obtained Bayesian SkyGrid plot indicates a fairly stable slightly decreasing effective population size for the studied population over the last 5,000 years (Fig. 3d and Supplementary Fig. 2). The average median population size over the sampled ancient period, expressed as female effective population size times generation time, was estimated to 1,625,187 (95% HPD 693,670–4,490,725), which assuming generation time of 29 years and equal male and female effective population size rescales to 112,082 (95% HPD 47,839–309,705) individuals (see Supplementary Table 4).

Frequency-based mitochondrial analysis: principal component analysis

We performed a PCA to define relationships between our three ancient Egyptian populations based on their haplogroup compositions and modern, present-day populations from Europe, the Near East, West Asia and Africa. To generate the PCA, we divided our haplogroups in the following 20 groups: H, HV, I, J, K, L0, L1, L2, L3, L4, M1, N, R, R0, T, T1, T2, U, W, X and all remaining other haplogroups (see Supplementary Data 1 for haplogroups). Subsequently, we generated a table of the respective intra-population frequencies. The PCA was performed using the prcomp function for categorical PCA implemented in GNU R 3.2.4 and plotted in a two-dimensional space, displaying the first and second principal components and shown in Fig. 3b.

Frequency-based mitochondrial analysis: test of population continuity (TPC)

Our intent was to determine whether we can detect traces of genetic continuity between our three ancient populations and two comparative modern data sets. The applied method was first used and described by Brandt et al.29. We generated counts of 22 haplogroups determined manually to be most descriptive for our three ancient populations and chose a set of priors for effective population size, generation length and furthermore evaluated further parameters (see Supplementary Note 5). Especially since we are unable to determine a real value of population size during this time period, we relied on historic records for the Fayum oasis and estimated a conservative population size from this (Supplementary Table 4). To even further ensure that these chosen values are not changing our results drastically, we evaluated ranges around these assumptions to test whether our results changed significantly.

Y-chromosomal and phenotypic analysis

We determined the Y chromosomal haplogroups for our three nuclear captured individuals by examining the state of phylogenetic relevant SNPs present in ISOGG version 11.228 (accessed 19 August 2016). The assignment was performed with reads that show a mapping quality of more than 30 only. We derived the haplogroups by identifying the most derived Y chromosomal SNPs in each individual (see Supplementary Table 3 for details).

Our analysis furthermore shows that derived alleles for the genes SLC24A5, known to be responsible for partially lighter skin pigmentation were present in both JK2888 and JK2911 (see Supplementary Note 6 for details). For further genes such as SLC45A2, LCT and EDAR we were unable to find derived alleles for both JK2888 and JK2911. For JK2134, there was no sufficient coverage after quality filtering at all the specific sites, which is why the analysis revealed no further clues.

Data availability

The mapped BAM files for the 90 mitochondrial samples and three nuclear samples are deposited in the European Nucleotide Archive (http://www.ebi.ac.uk/ena) with the study ID ERP017224.