Significance Modern genetic analysis has revealed genetic differentiation across the south of Britain and Ireland. This structure demonstrates the impact of hegemonies and migrations from the histories of Britain and Ireland. How this structure compares to the north of Britain, Scotland, and its surrounding Isles is less clear. We present genomic analysis of 2,554 British and Irish, including previously unstudied Scottish, Shetlandic and Manx individuals. We demonstrate widespread structure across Scotland that echoes past kingdoms, and quantify the considerable structure that is found on its surrounding isles. Furthermore, we show the extent of Norse Viking ancestry across northern Britain and estimate a region of origin for ancient Gaelic Icelanders.

Abstract Britain and Ireland are known to show population genetic structure; however, large swathes of Scotland, in particular, have yet to be described. Delineating the structure and ancestry of these populations will allow variant discovery efforts to focus efficiently on areas not represented in existing cohorts. Thus, we assembled genotype data for 2,554 individuals from across the entire archipelago with geographically restricted ancestry, and performed population structure analyses and comparisons to ancient DNA. Extensive geographic structuring is revealed, from broad scales such as a NE to SW divide in mainland Scotland, through to the finest scale observed to date: across 3 km in the Northern Isles. Many genetic boundaries are consistent with Dark Age kingdoms of Gaels, Picts, Britons, and Norse. Populations in the Hebrides, the Highlands, Argyll, Donegal, and the Isle of Man show characteristics of isolation. We document a pole of Norwegian ancestry in the north of the archipelago (reaching 23 to 28% in Shetland) which complements previously described poles of Germanic ancestry in the east, and “Celtic” to the west. This modern genetic structure suggests a northwestern British or Irish source population for the ancient Gaels that contributed to the founding of Iceland. As rarer variants, often with larger effect sizes, become the focus of complex trait genetics, more diverse rural cohorts may be required to optimize discoveries in British and Irish populations and their considerable global diaspora.

First documented in the fourth century BCE by the Greek explorer Pytheas, who describes its 3 corners (1), the archipelago of islands that includes Great Britain and Ireland has experienced an extensive history of migrations and invasions. After the initial Paleolithic settlement, there was migration of agriculturists around 4000–3000 BCE (2, 3), and then a population turnover associated with bronze and copper working and the Bell Beaker material culture (1, 3). With this establishment of the “Insular Atlantic” gene pool (2), subsequent migrations have influenced but not replaced the underlying haplotype diversity. The Anglo-Saxon invasions between 400 CE and 650 CE, for example, are associated with a higher German-related ancestry in the south of England (4, 5), and the Norse Viking incursions from the eighth to 11th centuries are associated with an increase of Norwegian-related ancestry into both Orkney (4, 6, 7) and Ireland (8, 9). In addition to these migrations, the northeast of Ireland also experienced admixture from Scottish and English sources that dates primarily to the Ulster Plantations of the 17th century (8, 9).

Previous genome-wide investigations of British (4) and Irish (8, 9) population genetics have undersampled Scotland and neighboring regions relative to England, Wales, and Ireland. Addressing this, we sought to combine samples from multiple cohorts in order to capture the majority of British and Irish diversity, including previously understudied regions, e.g., Scotland, the Hebrides, Shetland, and the Isle of Man. We combine data from both previously published sources and genotypes from Shetland, the Isle of Man, and the western coasts of Scotland, and analyze this comprehensive sample.

Using this sample, we sought to ask 3 questions: First, what is the fine-scale genetic structure of Scotland and its surrounding Isles? Second, using modern samples from Scandinavia, what is the Norse Viking contribution to these populations? Third, with the advent of ancient sampling of the Viking Period Gaelic settlers of Iceland, is it possible to trace their origins back to regions within Britain and Ireland?

Discussion Pytheas of Massalia’s famous work, On the Ocean, ca. 325 BCE, describes the 3 corners of Britain: Kantion (Kent), Belerion (Cornwall), and Orkas (Orkney). Previous sampling of the British Isles followed this lead with the 1000 Genomes Project British in England and Scotland (GBR) sampling these exact areas. With extensive representation across the archipelago, however, we describe the full extent of the northerly Norse pole of ancestry, adding to the known Saxon and Celtic poles in the southeast and west (4), respectively. The broad structure that we observe in Scotland and Orkney was also detected by the People of the British Isles study (4); however, with increased coverage across lowland Scotland, the Hebrides, and the Highlands, we reveal novel fine-scale genetic structure, and describe the genetics of Shetland and the Isle of Man. In addition, we reveal genetic structure reflecting the geography of the Isles at orders of magnitude finer scales than the mirroring of geography seen thus far in continental Europe (22).

The modern genetic landscape of the Isles reflects splits in the early languages of the Isles: Q-Celtic (Scottish, Irish and Manx Gaelic) and P-Celtic (Welsh, Cumbric, Cornish, Old Brythonic, Pictish). Scotland, the main focus of our analysis, is defined by a southwest versus northeast division near the River Forth (geographically located between the Tayside-Fife and Sco-Ire clusters in Fig. 1A). This division also echoes the historical distributions of Gaels versus Picts (see SI Appendix, Fig. S9 for a distribution of Pictish place names). The entire northeast branch of fineSTRUCTURE clusters describes the boundaries of the Pictish kingdoms, with the southwest branch mapping the Dark Age kingdoms of Strathclyde (Sco-Ire) and Dál Riata (Argyll). The Borders cluster coincides geographically with the Brythonic kingdoms of the Gododdin (modern Lothian and Borders) and Rheged (modern Cumbria). The legacy of the later Norse Jarldom of Orkney and its Scandinavian admixture drives the differentiation of the Northern Isles [for distributions of relevant historical groups, see Leslie et al. (4)]. Studies of ancient genomes are required to shed further light on the links between this modern structure and these groups.

We have further explored the impact of the Norse Viking migrations on Britain and Ireland and the geographic sources of ancient Gaelic settlers of Iceland. Our estimates of Norwegian ancestry in Ireland contrast starkly with previous estimates (4, 8, 9), which were much higher. Although the methodologies overlap and are related, our analyses differ in including British references. With British and Scandinavian references, we find agreement across both ADMIXTURE and the haplotype-based methods, which employ subtly different marker information: either allele frequencies or haplotypes. Our estimates are also in better accord with Irish Y-chromosome data, which show little trace of Norse patrilineal ancestry in the modern Irish (23). Future use of rare variation from whole genome sequencing may provide a more direct and discerning method to quantify the extent of Norse admixture into Ireland. We also investigated this historical period utilizing ancient genomes of Gaelic settlers in Iceland who date to its founding. Although greater sampling with high-coverage genomic data is required to elucidate the individual origins of these settlers, and despite subsequent migration to and from Ireland and southwest Scotland since the Norse Viking period, our results cautiously suggest that the northwest peripheries of Britain and Ireland are the best modern proxies for their homeland.

Our results may also have implications for rare disease variant discovery within Britain and Ireland, where disease incidence is known to vary geographically (24). The extraordinary fine-scale haplotype diversity revealed here across the archipelago, particularly in Scotland and the surrounding Isles, is unlikely to be well represented in urban studies such as UK Biobank. Equitable translation of genomic findings into medicine may require that the full gamut of populations is well represented in genomic studies, both on a global scale (25) and also more locally within countries which show significant structure. Otherwise, these important low-frequency variants, which often show larger effect sizes (26), will remain undiscovered or poorly characterized. Studies correlating the distribution of rare genetic variants with the structure defined by common variation will clarify the extent of the issue. Isolated and differentiated populations such as the ones we describe in the Hebrides, Argyll, the Scottish Highlands, or Donegal in Ireland, in particular, have many characteristics of utility for understanding the genetic basis of complex traits (27–29).
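The comparison of modern regions with ancient Icelanders discussed above rests on D statistics. As an illustration only, the sketch below computes one common frequency-based formulation of D for four populations from per-marker allele frequencies. The function name, the population labels, and the toy frequencies are all hypothetical, and the analysis in this study may have used a different configuration or software; see SI Appendix for the actual procedure.

```python
from typing import Sequence


def d_statistic(w: Sequence[float], x: Sequence[float],
                y: Sequence[float], z: Sequence[float]) -> float:
    """Frequency-based D statistic for four populations W, X, Y, Z.

    Each argument holds the frequency of the same allele at the same
    markers. With this sign convention, D > 0 indicates excess allele
    sharing between W and Y (or X and Z), D < 0 indicates excess sharing
    between W and Z (or X and Y), and D near 0 is compatible with W and X
    being symmetrically related to Y and Z.
    """
    if not (len(w) == len(x) == len(y) == len(z)):
        raise ValueError("all populations need the same number of markers")
    numerator = 0.0
    denominator = 0.0
    for pw, px, py, pz in zip(w, x, y, z):
        numerator += (pw - px) * (py - pz)
        denominator += (pw + px - 2 * pw * px) * (py + pz - 2 * py * pz)
    if denominator == 0:
        raise ValueError("no informative markers")
    return numerator / denominator


# Hypothetical toy frequencies at two markers (not data from this study).
region_a = [0.9, 0.1]
region_b = [0.5, 0.5]
ancient = [0.9, 0.1]
outgroup = [0.5, 0.5]

# D > 0 here: region_a shares more alleles with the ancient sample
# than region_b does.
d_value = d_statistic(region_a, region_b, ancient, outgroup)
print(round(d_value, 2))
```

In practice, the significance of D is usually assessed with a standard error from a block jackknife over the genome, which this sketch omits.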

Materials and Methods For a full description of all of the methods and materials, see SI Appendix, Supplementary Data 1. We describe the methods and materials in brief here. We assembled a combined dataset of 2,554 individuals from 5 different cohorts of regional English, Welsh, Scottish, Manx, or Irish ancestry. We cleaned this combined dataset, controlling for missingness, minor allele frequency, relatedness, and markers from the human leukocyte antigen region. We performed fineSTRUCTURE (11) analysis on this combined dataset, as well as ADMIXTURE (14) and FST analysis on the dataset after markers had been pruned for excess linkage disequilibrium. We explored Norwegian ancestry in northern Britain using supervised ADMIXTURE (14) analysis and SOURCEFIND (20) analysis with 2,225 additional Scandinavian individuals. Lastly, we explored genetic affinities between modern British or Irish regions and ancient Icelanders using D statistics (30).

All participants in all studies gave written informed consent. Ethical approval for the GS:SFHS study was obtained from the Tayside Committee on Medical Research Ethics (on behalf of the National Health Service) ref: 05/S1404/89. GS:SFHS is a Research Tissue Bank, approved by the East of Scotland Research Ethics Service ref: 15/ES/0040. Ethical approval for SCOTVAR was from the Multi-Centre Research Ethics Committee for Scotland: MREC/00/0/17: Investigation of genetic characteristics of Scottish regional populations to assess their genetic ancestry and their suitability for genetic association studies. Favourable opinions are held for VIKING from the South East Scotland Research Ethics Committee (12/SS/0151), for the Orkney Complex Disease Study (ORCADES) from the North of Scotland Research Ethics Committee (12 December 2003), and for the Irish DNA Atlas from the Royal College of Surgeons Research Ethics Committee (REC0020563). Ethical approval for the study of the samples from the Isle of Man was given by the Isle of Man Ethical Committee on 17 January 1997.
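To make the marker cleaning and FST steps in Materials and Methods more concrete, the following is a minimal sketch and not the pipeline used in this study. It filters markers on missingness and minor allele frequency in the combined sample and then computes FST between two groups with Hudson's estimator (ratio of averages). The choice of estimator, the thresholds (`max_missing`, `min_maf`), the function names, and the toy genotypes are all assumptions made for illustration; relatedness filtering, removal of the human leukocyte antigen region, and linkage disequilibrium pruning are omitted.

```python
from typing import Optional, Sequence

Genotype = Optional[int]  # alternate-allele count (0, 1 or 2); None = missing call


def allele_frequency(calls: Sequence[Genotype]) -> float:
    """Alternate-allele frequency among the non-missing calls at one marker."""
    observed = [g for g in calls if g is not None]
    return sum(observed) / (2 * len(observed))


def passes_qc(calls: Sequence[Genotype],
              max_missing: float = 0.05,   # hypothetical threshold
              min_maf: float = 0.01) -> bool:  # hypothetical threshold
    """Keep a marker if its missingness is low and its minor allele is common enough."""
    if not calls:
        return False
    missing_rate = sum(g is None for g in calls) / len(calls)
    if missing_rate == 1.0 or missing_rate > max_missing:
        return False
    freq = allele_frequency(calls)
    return min(freq, 1.0 - freq) >= min_maf


def hudson_fst(pop1: Sequence[Sequence[Genotype]],
               pop2: Sequence[Sequence[Genotype]]) -> float:
    """Hudson's FST between two groups; each group is a list of markers,
    and each marker is a list of per-individual genotype calls."""
    numerator = 0.0
    denominator = 0.0
    for calls1, calls2 in zip(pop1, pop2):
        # QC is applied to the combined sample, marker by marker.
        if not passes_qc(list(calls1) + list(calls2)):
            continue
        n1 = 2 * sum(g is not None for g in calls1)  # sampled chromosomes
        n2 = 2 * sum(g is not None for g in calls2)
        if n1 < 2 or n2 < 2:
            continue
        p1, p2 = allele_frequency(calls1), allele_frequency(calls2)
        numerator += ((p1 - p2) ** 2
                      - p1 * (1 - p1) / (n1 - 1)
                      - p2 * (1 - p2) / (n2 - 1))
        denominator += p1 * (1 - p2) + p2 * (1 - p1)
    if denominator == 0:
        raise ValueError("no polymorphic markers passed QC")
    return numerator / denominator


# Hypothetical toy data: 2 markers, 3 individuals per group.
group_a = [[2, 2, 2], [0, 1, 2]]
group_b = [[0, 0, 0], [0, 1, 2]]
fst_value = hudson_fst(group_a, group_b)
print(round(fst_value, 3))
```

The first toy marker is fixed for different alleles in the two groups and the second has identical frequencies, so the estimate falls between 0 and 1. Real analyses would use many thousands of markers and dedicated software.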

Acknowledgments We thank all participants, research team members, and funders of the cohort studies. The Chief Scientist Office of the Scottish Government funded ORCADES and Generation Scotland. The Medical Research Council (United Kingdom) funded Generation Scotland, the Scotvar study, the Viking Health Study - Shetland, ORCADES and genotyping in the European Longitudinal Study of Pregnancy and Childhood in the Isle of Man. The Wellcome Trust funded Generation Scotland, People of the British Isles and the Wellcome Trust Case Control Consortium. Generation Scotland was also funded by the Scottish Funding Council. ORCADES was also funded by the Royal Society, Arthritis Research UK and the European Union Framework Programme 6. The Irish DNA Atlas was part funded by Science Foundation Ireland and was cofunded under the European Regional Development Fund and by FutureNeuro industry partners. For a full list of acknowledgments see SI Appendix, Acknowledgments.

Footnotes Author contributions: E.G., G.L.C., and J.F.W. designed research; E.G., G.L.C., and J.F.W. performed research; S.O., M.M., D.M., V.V., P.K.J., D.W.C., H.C., C.H., S.M.R., J.G., S.G., P.N., S.M.K., C.A., A.C., C.S.H., D.J.P., G.L.C. and J.F.W. contributed samples; E.G. analyzed data; and E.G., G.L.C., and J.F.W. wrote the paper.

The authors declare no conflict of interest.

This article is a PNAS Direct Submission. C.T.-S. is a guest editor invited by the Editorial Board.

This article contains supporting information online at www.pnas.org/lookup/suppl/doi:10.1073/pnas.1904761116/-/DCSupplemental.