Custom software mentioned here is publicly available at www.github.com/stschiff/sequenceTools and www.github.com/stschiff/rarecoal.

DNA extraction

Samples were first treated with UV-light (260 nm) for 20–30 min, and the surface was cleaned with bleach (3.5%) and isopropanol. The sample surface was mechanically removed using a Dremel drill and disposable abrasive discs. Samples were ground to fine powder using a Mikrodismembrator (Sartorius) and stored at 4°C until further use. DNA was extracted in clean room facilities in Adelaide using an in-solution silica-based protocol31.

Library preparation

Libraries were generated from the Hinxton individuals (n=6) with32 and without enzymatic damage repair (Supplementary Table 1), whereas partial damage repair33 was performed for the Linton (n=3) and Oakington (n=14) samples. All 29 libraries were prepared with truncated barcoded Illumina adaptors and amplified with full-length indexed adaptors for sequencing34. Protocols evolved over the course of the study with regard to the final library amplification steps. Hinxton DNA libraries were amplified by PCR in quintuplicate for an initial 13 cycles (AmpliTaq Gold, Life Technologies), followed by pooling and purification of the PCR replicates with the Agencourt AMPure XP system. DNA libraries were then re-amplified for another 13 cycles in quintuplicate or sextuplicate, followed by pooling and purification, visual inspection on a 3.5% agarose gel, and final quantification using a NanoDrop 2000c spectrophotometer (Fisher Scientific). The Oakington and Linton DNA libraries were amplified isothermally using the commercial TwistAmp Basic kit (TwistDx Ltd). The amplification followed the manufacturer’s recommendations, using 13.4 μl of each library after the Bst fill-in step and an isothermal incubation of 40 min at 37 °C, followed by gel electrophoresis and quantification using a NanoDrop spectrophotometer. Following quantification, libraries were re-amplified for seven cycles using full-length 7-mer indexed Illumina primers as described34, followed by purification with AMPure and quantification using a TapeStation (Agilent).

Library screening

The 23 libraries treated with damage repair were screened for complexity and endogenous DNA content on an Illumina MiSeq platform at Harvard in collaboration with David Reich (Supplementary Table 1). When the project started, only the Hinxton samples were available, and since all of them showed high complexity and a high proportion of endogenous DNA (except 12882A, which did not pass screening), we selected all five remaining samples for deep sequencing. We then expanded the project to the other two sites, from which we screened many more samples than we could sequence deeply. From these, we selected the four samples with the highest complexity and endogenous DNA content from Oakington, and the best sample from Linton, where fewer samples were available and only one had acceptable complexity for deep sequencing.

Deep sequencing

We first sequenced the five DNA libraries generated from the Hinxton samples in two batches. The first batch consisted of 10 lanes of 75-bp paired-end sequencing on an Illumina HiSeq 2500 platform, run in rapid mode, with all five samples multiplexed. The resulting data were processed (see below) and used to estimate complexity and endogenous DNA content to guide further sequencing. The second batch consisted of 42 lanes with similar settings as the first batch, but without multiplexing. Based on the complexity and endogenous DNA estimates, we sequenced samples HI1 and HS3 on 4 lanes each, samples HS1 and HS2 on 8 lanes each, and sample HI2 on 16 lanes. In the second batch, we introduced five dark cycles into read 1 to avoid low-complexity issues caused by the clean room tags introduced during library preparation. We also spiked in 5% PhiX to increase the complexity of the first five base pairs of read 2, a common procedure for low-complexity libraries. For the samples from Oakington and Linton, we used the same protocol as in batch 2 of the Hinxton samples (including dark cycles). We sequenced samples O2 and L on 4 lanes each, sample O4 on 6 lanes, sample O1 on 8 lanes, and sample O3 on 10 lanes.

Raw read processing

We filtered out all read pairs that did not carry the correct clean room tags in the first five base pairs of read 2. For batch 1 of the Hinxton samples, the clean room tag was also sequenced on read 1, and we filtered on it there as well. As a second step, we merged all read pairs by searching for a perfect or near-perfect overlap, allowing at most one mismatch between read 1 and the reverse complement of read 2. The merging also took advantage of the fact that we typically had fragments of around 50 bp, which means that many of the 75-bp reads read through into the reverse complement of the other read’s clean room tag and the Illumina adaptors. As a last step, we removed the clean room tags and the adaptors from both ends of the merged reads. Both merging and adaptor trimming were done using a custom programme called filterTrimFastq, available at http://www.github.com/stschiff/sequenceTools.
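To illustrate the overlap-merging step, the following is a minimal sketch in Python. It is not the filterTrimFastq implementation: it only handles the basic case of finding an overlap with at most one mismatch between read 1 and the reverse complement of read 2, omits the tag and adaptor trimming, and the `min_overlap` cutoff is our assumption for illustration.

```python
# Minimal sketch of the overlap-merging logic; not the actual
# filterTrimFastq implementation.

COMPLEMENT = str.maketrans("ACGTN", "TGCAN")

def revcomp(seq):
    """Reverse complement of a DNA sequence."""
    return seq.translate(COMPLEMENT)[::-1]

def merge_pair(read1, read2, min_overlap=11, max_mismatch=1):
    """Merge read1 with the reverse complement of read2 if they overlap.

    Scans overlaps from longest to shortest and accepts the first one with
    at most `max_mismatch` mismatches; returns the merged sequence, or None
    if no acceptable overlap is found.
    """
    rc2 = revcomp(read2)
    for overlap in range(min(len(read1), len(rc2)), min_overlap - 1, -1):
        mismatches = sum(a != b
                         for a, b in zip(read1[-overlap:], rc2[:overlap]))
        if mismatches <= max_mismatch:
            return read1 + rc2[overlap:]
    return None
```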

Alignment

After merging, we ended up with single reads of variable length (on average about 50 bp) for each sample. We aligned these single reads with the programme ‘bwa aln’35 to the human reference, version GRCh37, using the parameter ‘-l 1024’ to turn off seeding36. The alignment was done on a per-lane basis, and all alignments were then sorted using ‘samtools sort’. For each individual, we then merged the sorted alignments into a single bam file using ‘samtools merge’. We then removed duplicate reads in each alignment using our custom python script ‘samMarkDuplicates.py’, also available on GitHub. The script checks whether neighbouring reads in the sorted alignments are equal, and removes all but one read if it finds duplicates. Finally, we removed all unmapped reads from the alignments. Despite enzymatic damage repair, some low levels of DNA damage can still be found in the libraries. We used the programme ‘mapDamage2’ (ref. 37) to measure DNA degradation. For each individual, we first ran mapDamage on chromosome 20 to estimate the degradation profile. For all individuals, the DNA damage profile showed an excess of C->T changes at the 5′ end of reads, as expected for ancient DNA, and an excess of G->A changes at the 3′ end. However, because the sequencing libraries were treated with UDG, which removes damaged sites in reads, the excess was much lower than in comparable studies without UDG treatment37.
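The duplicate-removal idea can be sketched as follows (our illustration, not the actual samMarkDuplicates.py; it assumes the pysam library and a coordinate-sorted BAM of merged single-end reads):

```python
# Sketch of neighbour-based duplicate removal on a coordinate-sorted BAM;
# assumes pysam. Not the actual samMarkDuplicates.py.
import pysam

def remove_duplicates(in_bam, out_bam):
    """Write one read per group of identical neighbouring reads.

    Because the input is coordinate-sorted, duplicates of a merged
    single-end read are adjacent; here two reads count as duplicates if
    they share mapping position, strand and sequence.
    """
    with pysam.AlignmentFile(in_bam, "rb") as fin, \
         pysam.AlignmentFile(out_bam, "wb", template=fin) as fout:
        previous_key = None
        for read in fin:
            key = (read.reference_id, read.reference_start,
                   read.is_reverse, read.query_sequence)
            if key != previous_key:
                fout.write(read)
            previous_key = key
```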

Mitochondrial and Y chromosome analysis

We called mtDNA and Y chromosome consensus sequences using samtools. Haplogroups were hand-curated using public databases (Supplementary Note 2).

Contamination estimates

We estimated possible modern DNA contamination in all ancient samples using two methods. First, we tested for evidence of contaminant mitochondrial DNA38. We identified sites in the mitochondrial genome at which the ancient sample carried a consensus allele that is rare in the 1,000 Genomes reference panel, and then checked whether any reads at these sites carried the majority allele from 1,000 Genomes (Supplementary Note 2). Second, we used the programme ‘verifyBamID’39 to carry out a similar test in the nuclear genome, again using the 1,000 Genomes reference panel. Contamination estimates are summarized in Supplementary Table 3.
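As an illustration of the mitochondrial test, the sketch below (assuming the pysam library; `diagnostic_sites` and the function name are ours) counts, at sites where the sample consensus is rare in 1,000 Genomes, the fraction of reads carrying the 1,000 Genomes majority allele instead:

```python
# Illustrative sketch of the mitochondrial contamination test; assumes pysam.
# `diagnostic_sites` maps a 0-based mtDNA position to (consensus_allele,
# majority_allele), where the sample consensus is rare in 1,000 Genomes.
import pysam

def contamination_fraction(bam_path, diagnostic_sites):
    """Fraction of reads at diagnostic sites carrying the 1,000 Genomes
    majority allele rather than the sample consensus allele."""
    consensus_reads = 0
    contaminant_reads = 0
    with pysam.AlignmentFile(bam_path, "rb") as bam:
        for column in bam.pileup("MT"):
            if column.reference_pos not in diagnostic_sites:
                continue
            consensus, majority = diagnostic_sites[column.reference_pos]
            for pileup_read in column.pileups:
                if pileup_read.is_del or pileup_read.is_refskip:
                    continue
                base = pileup_read.alignment.query_sequence[
                    pileup_read.query_position]
                if base == consensus:
                    consensus_reads += 1
                elif base == majority:
                    contaminant_reads += 1
    total = consensus_reads + contaminant_reads
    return contaminant_reads / total if total else float("nan")
```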

Principal component analysis

We downloaded the Human Origins data set13,14 and called genotypes at all sites in this data set for all ancient samples using a calling method similar to that described in ref. 14: of all high-quality reads covering a site, we picked the allele supported by the majority of reads, requiring at least two reads in support of the majority allele; otherwise we called a missing genotype. If multiple alleles had the same number of supporting reads, we picked one at random. Principal component analysis was performed using the ‘smartpca’ programme from EIGENSOFT (ref. 40), using only the modern samples to define the principal components and projecting the 10 ancient samples onto these components (Supplementary Fig. 3).
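A minimal sketch of this pseudo-haploid majority call (our illustration; `bases` stands for the high-quality read bases observed at one Human Origins site):

```python
# Minimal sketch of the majority-allele call with random tie-breaking.
import random
from collections import Counter

def majority_call(bases, min_reads=2):
    """Return the majority allele at a site, breaking ties at random,
    or None (missing genotype) if fewer than `min_reads` reads support it."""
    if not bases:
        return None
    counts = Counter(bases)
    top_count = max(counts.values())
    top_alleles = [allele for allele, c in counts.items() if c == top_count]
    allele = random.choice(top_alleles)  # random tie-break among equals
    return allele if top_count >= min_reads else None
```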

Rare allele sharing analysis

We compiled a reference panel consisting of 433 individuals from Finland (n=99), Spain (n=107), Italy (n=107), the Netherlands (n=100) and Denmark (n=20). The Finnish, Spanish and Italian samples are from the 1,000 Genomes Project (phase 3)16, the Dutch samples from the GoNL project17 and the Danish samples from the GenomeDK project18. For the Dutch and Danish samples, only allele frequency data were available. For the Dutch data set, we downsampled the full data set to obtain the equivalent of 100 samples. All other reference sample variant calls were used as provided by the 1,000 Genomes Project. In addition, we filtered based on a mappability mask41,42 that is available from www.github.com/stschiff/msmc. We selected all variants up to allele count nine in this reference set and tested, for each ancient individual at each of those sites, whether the ancient individual carried the rare allele. We called a rare variant (always assumed heterozygous) in the ancient sample if at least two reads supported the rare allele from the reference set. While this calling method inevitably misses variants in low-coverage individuals, the relative numbers of alleles shared with different populations are unbiased.
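The calling criterion can be written compactly as follows (a sketch under our own data layout: each site is represented by the variant’s rare-allele count in the 433-sample panel and the number of ancient-sample reads supporting that rare allele):

```python
# Sketch of the rare-variant sharing call for one ancient individual.
def count_shared_rare_alleles(sites, max_allele_count=9, min_reads=2):
    """Count panel variants at which the ancient sample carries the rare
    allele: allele count at most `max_allele_count` in the reference panel
    and at least `min_reads` supporting reads (each call is treated as
    heterozygous)."""
    return sum(1 for allele_count, supporting_reads in sites
               if 1 <= allele_count <= max_allele_count
               and supporting_reads >= min_reads)
```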

We accumulated the total number of alleles shared between each ancient sample and each modern reference population, stratified by allele count in the reference population, up to allele count nine (Supplementary Data 1). We found that sharing with the Dutch and Spanish populations showed the largest variability across the ancient samples. For the plot in Fig. 2a, we divided the sharing count with the Dutch population by the sharing count with the Spanish population at each allele count. To plot curves for the Dutch and Spanish populations themselves, we sampled haploid individuals from each population by sampling with replacement at every variant site in the reference set. This was necessary because for the Dutch samples no genotype information was publicly available, only allele frequency data (Supplementary Note 3).
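The accumulation and the Fig. 2a ratio amount to the following (our sketch; the population labels and input layout are illustrative):

```python
# Sketch of the allele-count-stratified accumulation and the Dutch/Spanish
# sharing ratio underlying Fig. 2a; the labels NED (Dutch) and IBS (Spanish)
# are illustrative.
from collections import defaultdict

def sharing_ratios(shared_calls, max_allele_count=9):
    """`shared_calls` yields one (population, allele_count) pair per rare
    allele shared between the ancient sample and that reference population.
    Returns the Dutch/Spanish sharing ratio at each allele count."""
    counts = defaultdict(int)
    for population, allele_count in shared_calls:
        counts[(population, allele_count)] += 1
    return {ac: counts[("NED", ac)] / counts[("IBS", ac)]
            for ac in range(1, max_allele_count + 1)
            if counts[("IBS", ac)] > 0}
```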

For the 30 UK10K samples shown in Fig. 2a,b, we started from the read alignment for each individual and called rare variants with respect to the 433 reference individuals in exactly the same way as for the ancient samples. For Fig. 2a, the allele sharing counts were then accumulated across the 10 individuals in each group. Error bars for each allele sharing count are based on the square root of that count. For Fig. 2b, we added up the allele sharing counts between each ancient sample and each reference population up to allele count five, and computed the ratio NED/(NED+IBS), where NED is the sharing count with the Dutch and IBS the sharing count with the Spanish (Supplementary Note 3). For the means and variances shown in Fig. 2b, we excluded outliers as indicated in the figure caption. The fraction of Anglo-Saxon-derived ancestry of each modern UK10K sample is computed from the position of its sharing ratio between the Iron Age mean and the Anglo-Saxon era mean, as shown in Fig. 2b, with 0% corresponding to the Iron Age mean and 100% corresponding to the Anglo-Saxon era mean (Supplementary Note 3, Supplementary Table 4).
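In formula form (our notation): with r a sample's sharing ratio, r_IA the Iron Age mean and r_AS the Anglo-Saxon era mean, the inferred ancestry fraction is

```latex
r = \frac{N_{\mathrm{NED}}}{N_{\mathrm{NED}} + N_{\mathrm{IBS}}},
\qquad
f_{\mathrm{AS}} = \frac{r - \bar{r}_{\mathrm{IA}}}{\bar{r}_{\mathrm{AS}} - \bar{r}_{\mathrm{IA}}},
```

so that r equal to the Iron Age mean gives 0% and r equal to the Anglo-Saxon era mean gives 100%.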

Rarecoal analysis

Rarecoal is a new framework to calculate the joint allele frequency spectrum across multiple populations using rare alleles. Given a certain distribution of rare derived alleles across subpopulations (here up to allele count four), and a given number of non-derived alleles, which can be arbitrarily large, we calculate the total probability of that configuration under a demographic model. The model consists of a population tree with split times and a constant population size in each branch. To give rise to the data observed in the present, the lineages carrying the derived allele must coalesce with each other before any of them coalesces with a non-derived lineage. We introduce a state space that contains all possible configurations of derived lineages across populations and propagate a probability distribution over this space backwards in time. Details and mathematical derivations are given in Supplementary Note 6.
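Schematically (our notation; the formal derivation is in Supplementary Note 6), the model parameters Θ (split times and branch population sizes) are fitted by maximizing a composite likelihood over sites,

```latex
\mathcal{L}(\Theta) \;=\; \prod_{s} P\!\left(\mathbf{k}_s \,\middle|\, \mathbf{n}, \Theta\right),
```

where k_s = (k_1, ..., k_P) is the configuration of rare derived alleles across the P populations at site s, and n denotes the (arbitrarily large) numbers of sampled non-derived alleles.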

We implemented rarecoal in a software package (available from www.github.com/stschiff/rarecoal) that can learn the parameters of a given population tree topology from the data using numerical maximization of the likelihood, followed by Markov chain Monte Carlo to obtain posterior distributions for each split time and branch population size. We did not implement an automated way to learn the tree topology itself, but used a step-by-step protocol to find the topology that best fits the data, adding one population at a time (Supplementary Note 5). The outputs from rarecoal are in scaled time. To convert to real time (years) and real population sizes, we used a per-generation mutation rate of 1.25 × 10−8 and a generation time of 29 years.
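As a sketch of the unit conversion, assuming (our assumption; see the rarecoal documentation for the actual scaling) that scaled times are expressed in expected mutations per site:

```python
# Hedged sketch of the conversion from scaled time to years; assumes that
# rarecoal's scaled times are in units of expected mutations per site, which
# is an assumption made for illustration.
MUTATION_RATE = 1.25e-8   # per site, per generation
GENERATION_TIME = 29.0    # years per generation

def scaled_time_to_years(scaled_time):
    """Convert a scaled split time to calendar years."""
    generations = scaled_time / MUTATION_RATE
    return generations * GENERATION_TIME
```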

We tested the method on simulated data generated with the sequential coalescent with recombination model (SCRM) simulator43, using the model shown in Fig. 3b, with 1,000 haploid samples distributed evenly across the five populations and realistic recombination and mutation parameters. We then learned the model from the European data set as shown in Fig. 3c using an iterative protocol, adding one population at a time and re-maximizing the parameters at each step to ensure that we were still fitting the correct topology (Supplementary Note 5).

For mapping ancient samples onto the tree, we used the same calling method as in the rare allele sharing analysis. We then added the ancient individual as a separate, seventh population to the European tree and evaluated the likelihood of this external branch merging at each possible point on the tree. We restricted the fitting to alleles that were shared between the ancient sample and the reference set, excluding variants private to the ancient sample, which have high false-positive rates. We also made sure that the age of the ancient sample was correctly modelled in the joint seven-population tree, by ‘freezing’ the state probabilities from the present up to the time at which the ancient sample lived.

For testing the tree-colouring method, we took single individuals from within the reference set and treated each as a separate sample to be mapped onto the European tree (Supplementary Note 5).