Interpolated contour map of present-day lactase persistence

Lactase persistence (LP) frequencies were estimated from allele frequency data assuming dominant inheritance. Where possible, the data were taken from full sequencing of the lactase enhancer so as to include all 5 published functional LP variants31; additional data were taken from genotyping of individual SNPs, where informative, as recorded in the GLAD database (http://www.ucl.ac.uk/mace-lab/resources/glad)24, revised and updated to include more recent publications25 through March 2014. Latitude and longitude of the data points were taken as near as possible to the collection sites where these were known. Where only the country was known, coordinates were estimated using major cities. The contour map was constructed in ‘R’ (v.3.1.0, 2014-04-10, “Spring Dance”) using the spatstat package32 and included weighting for sample size. Interpolation smoothing was conducted at the lowest non-overflowing bandwidth (value of sigma) allowable from the heterogeneous data available. Interpolation may be inaccurate where there are few data points, and it should be noted that neighboring populations with different ancestry and life-style, in Africa particularly, sometimes have very different allele frequencies.
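As a rough illustration of the conversion and smoothing described above, the following Python sketch assumes Hardy-Weinberg proportions (an assumption not stated in the text), so that the expected frequency of the dominant phenotype is 1 − (1 − p)^2, where p is the summed frequency of the LP-associated alleles. It also shows a generic sample-size-weighted Gaussian kernel average. The function names and inputs are hypothetical, coordinates are treated as planar for simplicity, and this is not the spatstat implementation used for the map.

```python
import math


def lp_phenotype_frequency(allele_freqs):
    """Expected LP phenotype frequency under dominance.

    Assumes Hardy-Weinberg proportions (an illustrative assumption).
    allele_freqs: frequencies of the individual LP-associated alleles;
    they are summed because carrying any one of them is taken to be
    sufficient for persistence.
    """
    p = sum(allele_freqs)
    if not 0.0 <= p <= 1.0:
        raise ValueError("summed allele frequency must lie in [0, 1]")
    # Only homozygotes for the non-persistence allele lack the phenotype.
    return 1.0 - (1.0 - p) ** 2


def weighted_kernel_estimate(x, y, points, sigma):
    """Sample-size-weighted Gaussian kernel average at location (x, y).

    points: iterable of (px, py, frequency, sample_size) tuples.
    sigma: smoothing bandwidth, in the same units as the coordinates.
    """
    num = 0.0
    den = 0.0
    for px, py, freq, n in points:
        d2 = (x - px) ** 2 + (y - py) ** 2
        w = n * math.exp(-d2 / (2.0 * sigma ** 2))
        num += w * freq
        den += w
    if den == 0.0:
        # With too small a bandwidth every kernel weight underflows to zero.
        raise ValueError("bandwidth too small for the available data")
    return num / den
```

For example, a summed LP allele frequency of 0.5 corresponds to an expected phenotype frequency of 0.75 under these assumptions. The zero-denominator check hints at why a lower bound on the bandwidth arises when data points are sparse.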

Samples and MS/MS analysis

Dental samples (n = 98) were obtained from diverse historic human populations in Eurasia, Africa and Greenland dating from the Bronze Age to the present (Supplementary Table S1). Dental calculus was removed using a dental scaler and stored in sterile 2.0 mL tubes until further analysis. Tryptic peptides were extracted from decalcified dental calculus using a filter-aided sample preparation (FASP) protocol modified for degraded samples33 according to previously published protocols18. The extracted peptides were then analyzed using shotgun protein tandem mass spectrometry (MS/MS) to detect the presence of β-lactoglobulin. MS/MS analysis of the samples was performed at three independent laboratories in Switzerland, the UK and Denmark:

Functional Genomics Center Zurich at the University of Zurich and Swiss Federal Institute of Technology

Samples from Greenland and Germany (Z1, Z2, Z27, Z46 and Z28) were analyzed by tandem mass spectrometry at the Functional Genomics Centre Zürich (FGCZ) using an LTQ-Orbitrap VELOS mass spectrometer (Thermo Fisher Scientific, Bremen, Germany) coupled to an Eksigent-NanoLC-Ultra 1D plus HPLC system (Eksigent Technologies, Dublin (CA), USA). Solvent composition at the two channels was 0.2% formic acid, 1% acetonitrile for channel A and 0.2% formic acid, 100% acetonitrile for channel B. Peptides were loaded on a self-made tip column (75 μm × 80 mm) packed with reverse phase C18 material (AQ, 3 μm 200 Å, Bischoff GmbH, Leonberg, Germany) and eluted with a flow rate of 250 nl per min by a gradient from 0.8% to 4.8% of B in 2 min, 35% B at 57 min, 48% B at 60 min, 97% B at 65 min. Full-scan MS spectra (300−1700 m/z) were acquired in the Orbitrap with a resolution of 30000 at 400 m/z after accumulation to a target value of 1,000,000. Higher energy collision induced dissociation (HCD) MS/MS spectra were recorded in a data-dependent manner in the Orbitrap with a resolution of 7500 at 400 m/z after accumulation to a target value of 100,000. Precursors were isolated from the ten most intense signals above a threshold of 500 arbitrary units with an isolation window of 2 Da. Three collision energy steps were applied with a step width of 15.0% around a normalized collision energy of 40% and an activation time of 0.1 ms. Charge state screening was enabled, excluding non-charge state assigned and singly charged ions from MS/MS experiments. Precursor masses already selected for MS/MS were excluded from further selection for 45 s with an exclusion window of 20 ppm. The size of the exclusion list was set to a maximum of 500 entries.

Proteomics Discovery Institute at the University of Oxford

Samples from Britain, Germany (Y47, Y48 and Y49), St Helena and Italy (ODN19-1, ODN98-1, ODN207-1, ODN271-1, ODN361-1, ODN424-1, ODN458-1, SCR227-1, SCR250-1, SCR264-1, SCR323-1, SCR832-1, SCR5082-1, SCR5042-1, SCR5070-1) were analyzed by tandem mass spectrometry at the Central Proteomics Facility, Target Discovery Institute, Oxford on Q-Exactive and Orbitrap Elite tandem mass spectrometers.

Q-Exactive analysis was performed after UPLC separation on an EASY-Spray column (50 cm × 75 μm ID, PepMap RSLC C18, 2 μm) connected to a Dionex Ultimate 3000 nUPLC (all Thermo Scientific) using a gradient of 2–40% acetonitrile in 0.1% formic acid and a flow rate of 250 nl/min @ 40°C. MS spectra were acquired at a resolution of 70000 @ 200 m/z using an ion target of 3E6 between 380 and 1800 m/z. MS/MS spectra of up to 15 precursor masses at a signal threshold of 1E5 counts and a dynamic exclusion for 7 seconds were acquired at a resolution of 17500 using an ion target of 1E5 and a maximal injection time of 50 ms. Precursor masses were isolated with an isolation window of 1.6 Da and fragmented with 28% normalized collision energy.

Orbitrap Elite analysis was performed under similar LC conditions as above using a nAcquity UPLC (1.7 μm BEH130 C18, 75 μm × 250 mm). MS spectra were acquired at a resolution of 120000 @ 400 m/z using an ion target of 5E5 between 300 and 1800 m/z. MS/MS spectra of up to 200 precursor masses at a signal threshold of 1000 counts and a dynamic exclusion for 15 seconds were acquired in the linear ion trap using rapid scan and an ion target of 5E4. Precursor masses were isolated with an isolation window of 1.5 Da and fragmented with 35% normalized collision energy.

Novo Nordisk Foundation Center for Protein Research at the University of Copenhagen

Samples from Norway, Denmark, Hungary, Germany (RISE 472 and RISE 473), Italy (RISE 466 and RISE 467), Armenia and Russia were analyzed by tandem mass spectrometry at the Novo Nordisk Foundation Center for Protein Research at the University of Copenhagen, Denmark using a Q-Exactive mass spectrometer. The LC-MS system consisted of an EASY-nLC™ system (Thermo Scientific, Odense, Denmark) connected to the Q-Exactive (Thermo Scientific, Bremen, Germany) through a nano electrospray ion source. 5 μL of each peptide sample was auto-sampled onto and directly separated in a 15 cm analytical column (75 μm inner diameter) in-house packed with 3 μm C18 beads (Reprosil-AQ Pur, Dr. Maisch) with a 130 minute linear gradient from 5% to 25% acetonitrile followed by a steeper linear 20 minute gradient from 25% to 40% acetonitrile. Throughout the gradients a fixed concentration of 0.5% acetic acid and a flow rate of 250 nL/min were set. A final washout and column re-equilibration added an additional 20 minutes to each acquisition. The effluent from the HPLC was directly electrosprayed into the mass spectrometer by applying 2.0 kV through a platinum-based liquid-junction. The Q-Exactive was operated in data-dependent mode to automatically switch between full scan MS and MS/MS acquisition. Software control was Tune version 2.2–1646 and Xcalibur version 2.2.42 and the settings were adjusted for ‘sensitive’ acquisition. Briefly, each full scan MS was followed by up to 10 MS/MS events. The isolation window was set at 2.5 Th and a dynamic exclusion of 90 seconds was used to avoid repeated sequencing. Only precursor charge states above 1 and below 6 were considered for fragmentation. A minimum intensity threshold for triggering fragment MS/MS was set at 1e5. Full scan MS were recorded at a resolution of 70,000 at m/z 200 in a mass range of 300–1750 m/z with a target value of 1e6 and a maximum injection time of 30 ms. Fragment MS/MS were recorded with a fixed ion injection time set to 108 ms through a target value set to 2e5 and recorded at a resolution of 35,000 with a fixed first mass set to 100 m/z. Normalized collision energy was 25%.

Data analysis

Raw MS/MS spectra were converted to searchable Mascot generic format using Proteowizard version 3.0.4743, retaining the 200 most intense peaks in each MS/MS spectrum. MS/MS ion database searching was performed on Mascot (Matrix Science™, version 2.4.01) against all available sequences in UniProt and the Human Oral Microbiome Database (HOMD)34. Searches were performed against a decoy database to generate false discovery rates. Peptide tolerance was 10 ppm, with a semi-tryptic search allowing up to two missed cleavages. MS/MS ion tolerance was set to 0.07 Da. Based on previous observations of ancient proteome degradation35, post-translational modifications were set as carbamidomethylation (fixed modification) and acetyl (protein N-term), deamidated (NQ), glutamine to pyroglutamate, methionine oxidation and hydroxylation of proline (variable modifications). Mascot search results were filtered using an ion score cut-off of 25 and significance threshold of p < 0.05. BLAST was used to verify matches to β-lactoglobulin, and taxonomic assignment is reported based on the consensus peptide assignments for each individual. The three-dimensional structure of bovine β-lactoglobulin protein was rendered from PDB 3NPO using Visual Molecular Dynamics software33 (VMD v.1.9.1, http://www.ks.uiuc.edu/Research/vmd/current/).
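The tolerance and filtering criteria given in this section can be sketched in Python as follows. This is a minimal illustration only: in practice the 10 ppm precursor tolerance is applied by the search engine itself, the dictionary keys are hypothetical rather than Mascot output fields, and the decoy-to-target ratio is one common way of estimating a false discovery rate, not necessarily the exact calculation used here.

```python
def ppm_error(observed_mz, theoretical_mz):
    """Signed mass error in parts per million."""
    return (observed_mz - theoretical_mz) / theoretical_mz * 1e6


def passes_filters(psm, max_ppm=10.0, min_ion_score=25.0, max_p=0.05):
    """Apply the precursor tolerance, ion score and significance cut-offs.

    psm: dict with hypothetical keys 'observed_mz', 'theoretical_mz',
    'ion_score' and 'p_value' describing one peptide-spectrum match.
    """
    within_tolerance = abs(
        ppm_error(psm["observed_mz"], psm["theoretical_mz"])
    ) <= max_ppm
    return (
        within_tolerance
        and psm["ion_score"] >= min_ion_score
        and psm["p_value"] < max_p
    )


def decoy_fdr(n_decoy_hits, n_target_hits):
    """Simple decoy-based FDR estimate: decoy hits divided by target hits."""
    if n_target_hits <= 0:
        raise ValueError("need at least one target hit")
    return n_decoy_hits / n_target_hits
```

A match with a 5 ppm precursor error, an ion score of 30 and p = 0.01 would pass these filters, whereas the same match with an ion score of 20 would be rejected.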

Contamination exclusion

It is important to monitor and test for contamination because bovine proteins are used in some proteomics laboratories as instrument standards (e.g., bovine fetuin), among other purposes. In order to exclude such contaminants as the source of the BLG peptides in dental calculus, negative extraction controls, bovine fetuin protein standards and isopropanol wash steps were analyzed with the experimental samples in parallel. No BLG peptides were observed in any non-template negative extraction controls (n = 12), bovine fetuin standards (n = 21), or isopropanol wash steps (n = 9).

BLG protein modeling

The three-dimensional structure of bovine β-lactoglobulin protein was rendered from the Research Collaboratory for Structural Bioinformatics Protein Data Bank (RCSB PDB) accession 3NPO (unliganded, DOI:10.2210/pdb3npo/pdb) using VMD (v.1.9.1)36. The mapped locations of all BLG peptide sequences identified by tandem mass spectrometry within archaeological dental calculus were then visualized in red, while unmapped regions were visualized in white.
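The mapping step described above amounts to marking which residues of the protein sequence are covered by at least one identified peptide. The Python sketch below illustrates this with exact substring matching; the function names and the toy sequence in the usage note are hypothetical, and the rendering itself (red for mapped, white for unmapped) was done in VMD and is not reproduced here.

```python
def map_peptides(protein_seq, peptides):
    """Return a per-residue list of booleans: True where any peptide maps.

    Uses exact substring matching and marks every occurrence of each
    peptide, including overlapping ones.
    """
    covered = [False] * len(protein_seq)
    for pep in peptides:
        if not pep:
            continue  # skip empty strings
        start = protein_seq.find(pep)
        while start != -1:
            for i in range(start, start + len(pep)):
                covered[i] = True
            start = protein_seq.find(pep, start + 1)
    return covered


def coverage_fraction(covered):
    """Fraction of residues covered by at least one peptide."""
    return sum(covered) / len(covered) if covered else 0.0
```

With the toy sequence "ACDEFGHIKLMNPQRSTVWY" and the peptides "DEFG" and "QRS", seven of the twenty residues are marked as mapped; those residues would be colored red and the remainder white.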

Bone collagen stable isotope analysis

Bone collagen from Tjodhilde's Church individuals KAL1052 and KAL1064 was prepared for stable isotope δ13C and δ15N analysis as previously described37. Duplicate collagen specimens (1 mg) were measured using a Sercon 20–22 Isotope Ratio Mass Spectrometer coupled to a Sercon GSL Elemental Analyser in the Department of Archaeology at the University of York. The results for KAL1052 are as follows: δ13C, −18.7352, −18.5608; δ15N, 12.8049, 12.8799; C/N, 3.31. The results for KAL1064 are as follows: δ13C, −19.1342, −19.1654; δ15N, 13.1990, 13.3332; C/N, 3.51. Carbon isotopic values are reported relative to Pee Dee Belemnite (PDB); nitrogen isotopic values are reported relative to AIR. Mean bone collagen δ13C and δ15N for each sample are presented in Figure 3. Stable isotopic data for the Sandnes samples and the remaining Tjodhilde's Church samples were obtained from the literature22,23.
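The per-sample means can be reproduced from the duplicate measurements listed above with a short Python sketch. The variable and function names are hypothetical, and the values plotted in Figure 3 may be rounded differently.

```python
from statistics import mean

# Duplicate measurements as reported in the text.
duplicates = {
    "KAL1052": {"d13C": [-18.7352, -18.5608], "d15N": [12.8049, 12.8799]},
    "KAL1064": {"d13C": [-19.1342, -19.1654], "d15N": [13.1990, 13.3332]},
}


def duplicate_means(measurements):
    """Average the replicate values for each isotope of each sample."""
    return {
        sample: {isotope: mean(values) for isotope, values in isotopes.items()}
        for sample, isotopes in measurements.items()
    }


means = duplicate_means(duplicates)
for sample, isotopes in means.items():
    print(sample, {k: round(v, 4) for k, v in isotopes.items()})
```

This gives mean δ13C and δ15N values of −18.6480 and 12.8424 for KAL1052, and −19.1498 and 13.2661 for KAL1064.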