The genetic analysis of Himalayan populations (described in Supplementary Note 7) was approved by the Institutional Ethical Committee of the Centre for Cellular and Molecular Biology in Hyderabad, India.

Ancient DNA laboratory Work

A total of 76 skeletal samples (72 long bones and four teeth) were sampled at the Anthropological Survey of India, Kolkata. Skeletal sampling was performed for all samples in dedicated ancient DNA facilities at the Centre for Cellular and Molecular Biology (CCMB) in Hyderabad, India. A subset of samples that underwent preliminary ancient DNA screening at CCMB, including three samples that did not yield sufficient data to assign mitochondrial DNA haplogroups during preliminary screening (see Supplementary Note 1), were further processed at Harvard Medical School, Boston, USA, consistent with recommendations in the ancient DNA literature for repeating analyses in two independent laboratories to increase confidence in results22.

At CCMB, samples were prepared for processing by wiping with a bleach solution, followed by deionized water. The samples were then subjected to UV irradiation for 30 min on each side to minimize surface DNA contamination. Bone powder was then produced using a sterile dentistry drill.

We successfully generated genome-wide DNA for 38 individuals (Supplementary Data 1). For each sample, approximately 75 mg of bone powder originally prepared at CCMB was further processed in dedicated ancient DNA clean rooms at Harvard Medical School using standard protocols, including DNA extraction optimized for ancient DNA recovery23, modified by replacing the Zymo extender/MinElute column assemblage with a preassembled spin column device24, followed by library preparation with partial UDG treatment25. The quality of authentic ancient DNA preservation in each sample was assessed by carrying out a preliminary screening of all libraries via targeted DNA enrichment, designed to capture mitochondrial DNA in addition to 50 nuclear targets26. We sequenced the enriched libraries on an Illumina NextSeq500 instrument for 2 × 76 cycles with an additional 2 × 7 cycles for identification of indices. Based on this preliminary assessment, libraries that were deemed promising underwent a further enrichment using a reagent that targeted ~1.2 million SNPs6,7,8,9, and then were sequenced using an Illumina NextSeq500 instrument.

Bioinformatic processing

We used SeqPrep to trim adapters and molecular barcodes, and then merged paired-end reads that overlapped by a minimum of 15 base pairs (with up to one mismatch allowed) and aligned to the mitochondrial rsrs genome27 (for the mitochondrial screening analysis) or hg19 (for whole-genome analysis) using samse in bwa (v0.6.1)28. We identified duplicate sequences based on having the same start position, end position, orientation, and library-specific barcode, and only retained the copy with the highest quality sequence. We restricted to sequences with a minimum mapping quality (MAPQ ≥ 10) and minimum base quality (≥20) after excluding two bases from each end of the sequence. We obtained pseudo-haploid SNP calls by using a single randomly chosen sequence at SNPs covered by at least one sequence.

We subjected the resulting data to three tests of ancient DNA authenticity: (1) we analyzed the mitochondrial genome data to determine the rate of matching to the consensus sequence using contamMix, and excluded from analysis samples that exhibited a match rate less than 97%8. (2) We removed samples that exhibited a rate of C-to-T substitutions less than 3%: the minimum recommended threshold for authentic ancient DNA that has been subjected to partial UDG treatment25. (3) We used ANGSD29 to determine the degree of heterogeneity on the X-chromosome in males (who should only have one X chromosome) and excluded from analysis individuals with contamination rates greater than 1.5%.

We determined the mitochondrial haplogroup of each individual in two ways. For individuals with whole mitochondrial genome data, we determined the mitochondrial haplogroups using haplogrep230. We also determined mitochondrial haplogroups from mitochondrial DNA genotyping using multiplex PCR (see Supplementary Note 1).

We determined the genetic sex of the individuals by computing the ratio of the number of sequences that align to the X chromosome versus the Y chromosome. We searched for 1st, 2nd, and 3rd degree relative pairs in the dataset by analyzing patterns of allele sharing between pairs of individuals (we found none)10.

To identify Y-chromosome haplogroups in genetically male individuals, we used a modified version of the procedure reported in Poznik, et al.31, which performs a breadth-first search of the Y-chromosome tree. We made Y chromosome haplogroup calls using the ISOGG tree from 04.01.2016 [http://isogg.org], and recorded the derived and ancestral allele calls for each informative position on the tree. We counted the number of mismatches in the observed derived alleles on each branch of the tree and used this information to assign a score to each haplogroup, accounting for damage by down-weighting derived mutations that are the result of transitions to 1/3 of that of transversions. We assigned the closest matching Y-chromosome reference haplogroup to each male based on this score (Supplementary Data 6). We caution that males with fewer than 100,000 SNPs have too little data to confidently assign a haplogroup.

Population genetic analyses

We report data for 38 samples that passed contamination and quality control tests, with an average coverage of 0.51 × [range: 0.026–1.547] and 350088 SNPS covered at least once [range 30592–728448]. We processed the data in conjunction with published DNA obtained from ancient6,9,13,14,15,32,33,34,35,36,37,38,39,40,41,42,43,44,45,46,47,48,49,50,51,52,53,54,55,56,57,58,59,60,61 and present-day groups from throughout the world62,63,64,65,66,67,68, including ~175 modern groups from the Indian subcontinent12. The resulting merged dataset included 1521 ancient and 7985 present-day individuals at 591,304 SNPs.

We used smartpca69 to perform principal component analysis (PCA) using default parameters, with the settings lsqproject:YES and numoutlier:0. We projected the Roopkund individuals onto two PCA plots designed either to reveal a cline of West Eurasian-related ancestry in South Asian populations18, or to reveal the genetic substructure in present-day West Eurasians13. The first PCA (Fig. 2a) included 1453 present-day populations12 in addition to the Roopkund individuals, while the second PCA (Fig. 2b) included 986 present-day populations13, in addition to the Roopkund individuals and two individuals from present-day Crete (population label Crete.DG). The PCA plots show that the samples cluster into three distinct groups, which we label Roopkund_A, Roopkund_B and Roopkund_C, and treat separately for subsequent analyses.

We used smartpca69 to compute F ST between the two major Roopkund groups (Roopkund_A and Roopkund_B) and all other groups composed of at least 2 individuals in the dataset, using default parameters, with the settings inbreed:YES and fstonly:YES.

We performed clustering using ADMIXTURE16. We carried out this analysis on all samples used for the PCA analyses, although we display only selected populations for the sake of clarity. Prior to analysis, SNPs in linkage disequilibrium with one another were pruned in PLINK using the parameters–indep-pairwise 200 25 0.4. We performed an ADMIXTURE analysis on the remaining 344,363 SNPs in the pruned dataset for values of k between 2 and 10, and carried out 20 replicates at each value of k. We retained the highest likelihood replicate at each k and displayed results for k (k = 4), which we chose because we observed that it is most visually helpful for discriminating the ancestry of the groups of interest.

We used qpWave18,19, with default parameters and allsnps:YES, to determine if any of the Roopkund populations was consistent with being a clade with any present-day populations. We included a base set of nine populations in each test, chosen to represent diverse ancestry from throughout the world. We include an additional 5–15 populations of either South Asian, West Eurasian, or Southeast/East Asian ancestry in tests involving Roopkund_A, Roopkund_B and Roopkund_C respectively, chosen to provide additional resolution for each group based on their position in the previous PCA. Based on the observed genetic heterogeneity in the Roopkund_A population, we modeled each individual separately (Supplementary Note 6). For each test, the Left population set included the Roopkund population or individual of interest in addition to one of the selected present-day analysis populations, while the remaining populations were included in the Right population set. In the case of individuals belonging to the Roopkund_A and Roopkund_C groups, we also used qpAdm7, with default parameters and allsnps: YES, to determine whether these populations could be considered to be the product of a two-way admixture between any of the selected present-day populations (Supplementary Note 6). In this case, the Left population set included the Roopkund individual of interest in addition to all possible combinations of two of the selected present-day analysis populations, while the remaining populations were included in the Right population set.

AMS radiocarbon dating

We subjected bone powder from 37 samples to radiocarbon dating. We dated the remaining bone powder (360–750 mg) from the same samples that were processed for ancient DNA. We were unable to generate a radiocarbon date for individual I3401, as there was not enough remaining bone powder for analysis.

At the Pennsylvania State University AMS radiocarbon dating facility, bone collagen for 14C and stable isotope analyses was extracted and purified using a modified Longin method with ultrafiltration70. Samples (200–400 mg) were demineralized for 24–36 h in 0.5 N HCl at 5 °C followed by a brief (<1 h) alkali bath in 0.1 N NaOH at room temperature to remove humates. The residue was rinsed to neutrality in multiple changes of Nanopure H 2 O, and then gelatinized for 12 h at 60 °C in 0.01 N HCl. The resulting gelatin was lyophilized and weighed to determine percent yield as a first evaluation of the degree of bone collagen preservation. Rehydrated gelatin solution was pipetted into pre-cleaned Centriprep71 ultrafilters (retaining >30 kDa molecular weight gelatin) and centrifuged 3 times for 20 min, diluted with Nanopure H 2 O and centrifuged 3 more times for 20 min to desalt the solution.

In some instances, collagen samples were too poorly preserved and were pre-treated at Penn State using a modified XAD process72 (Supplementary Data 4 shows that there were no systematic differences in the dates obtained based on the XAD and modified Longin pretreatment extraction methods.) Samples were demineralized in 0.5 N HCl for 2–3 days at 5 °C. The demineralized collagen pseudomorph was gelatinized at 60 °C in 1–2 mL 0.01 N HCl for 8–10 h. The gelatin was then lyophilized and percent gelatinization and yield determined by weight. The sample gelatin was then hydrolyzed in 2 mL 6 N HCl for 24 h at 110 °C. Supelco ENVI-Chrom® SPE (Solid Phase Extraction; Sigma-Aldrich) columns were prepped with 2 washes of methanol (2 mL) and rinsed with 10 mL DI H 2 O. Supelco ENVIChrom® SPE (Solid Phase Extraction; Sigma-Aldrich) columns with 0.45 µm Millex Durapore filters attached were equilibrated with 50 mL 6 N HCl and the washings discarded. 2 mL collagen hydrolyzate as HCl was pipetted onto the SPE column and driven with an additional 10 mL 6 N HCl dropwise with the syringe into a 20 mm culture tube. The hydrolyzate was finally dried into a viscous syrup by passing UHP N 2 gas over the sample heated at 50 °C for ~12 h.

For all bone samples that were subject to radiocarbon dating, carbon and nitrogen concentrations and stable isotope ratios of the ultrafiltered gelatin or XAD amino acid hydrolyzate were measured at the Yale Analytical and Stable Isotope Center with a Costech elemental analyzer (ECS 4010) and Thermo DeltaPlus analyzer. Sample quality was evaluated by percentage crude gelatin yield, %C, %N, and C/N ratios before AMS 14C dating. C/N ratios for all samples fell between 2.9 and 3.6, indicating good collagen preservation73. Samples (~2.1 mg) were then combusted for 3 h at 900 °C in vacuum-sealed quartz tubes with CuO and Ag wires. Sample CO 2 was reduced to graphite at 550 °C using H 2 and a Fe catalyst, with reaction water drawn off with Mg(ClO 4 ) 2 74.

Graphite samples were pressed into targets in Al boats and loaded on a target wheel with OX-1 (oxalic acid) standards, known-age bone secondaries, and a 14C-free Pleistocene whale blank. 14C measurements were performed at UCIAMS on a modified National Electronics Corporation compact spectrometer with a 0.5 MV accelerator (NEC 1.5SDH-1). The 14C ages were corrected for mass-dependent fractionation with δ13C values75 and compared with samples of Pleistocene whale bone (backgrounds, 48,000 14C BP), late Holocene bison bone (~1850 14C BP), late 1800s CE cow bone and OX-2 oxalic acid standards for calibration. All calibrated 14C ages were computed using OxCal version 4.376 using the IntCal13 northern hemisphere curve77.

Stable isotope measurements

The isotopic measurement procedure at Yale University for the 37 samples for which we performed direct radiocarbon dating are described in the previous section.

We also obtained isotopic measurements for long bone samples from 19 individuals (including data from 11 of the same individuals that were also analyzed at Yale) at the Max Planck Institute for the Science of Human History. Bone samples of 1 g were subsequently cleaned using an air abrasive system with 5 μm aluminum oxide powder and then crushed into chunks. Collagen was extracted following standard procedures78. Approximately 1 g of pre-cleaned bone was demineralized in 10 mL aliquots of 0.5 M HCl at 4 °C, with changes of acid until CO 2 stopped evolving. The residue was then rinsed three times in deionized water before being gelatinized in pH 3 HCl at 80 °C for 48 h. The resulting solution was filtered, with the supernatant then freeze-dried over a period of 24 h.

Purified collagen samples (1 mg) were analyzed at the Department of Archaeology, Max Planck Institute for the Science of Human History, in duplicate by EA-IRMS on a ThermoFisher Elemental Analyzer coupled to a ThermoFisher Delta V Advantage Mass Spectrometer via a ConFloIV system. Accuracy was determined by measurements of international standard reference materials within each analytical run. These were USGS 40,40 δ13C raw = −26.4 ± 0.1, δ13C true = −26.4 ± 0.0, δ15N raw = −4.4 ± 0.1, δ15N true = −4.5 ± 0.2; IAEA N2, δ15N raw = 20.2 ± 0.1, δ15N true = 20.3 ± 0.2; IAEA C6 δ13C raw = -10.9 ± 0.1, δ13C true = −10.8 ± 0.0. An in-house fish gelatin sample was also used as a standard in each run. Reported δ13C values were measured against Vienna Pee Dee Belemnite (VPDB), while δ15N values are measured against ambient air.

Reporting summary

Further information on research design is available in the Nature Research Reporting Summary linked to this article.