Population Structure within Ireland

To investigate the extent of fine-scale population structure within Ireland, we assembled a combined SNP genotype dataset of 536 Irish (by which we mean henceforth Irish individuals from either the Irish DNA Atlas, Trinity, or PoBI datasets), 101 Scottish, 131 Welsh, 96 Orcadians and 1,239 English individuals. We performed fineStructure24 analysis on this dataset, which identified a final inferred state of k = 48 clusters. We then explored the hierarchical structure of the clustering, from the coarsest (k = 2) to the finest (k = 48) level. At k = 48 we reproduce previously reported genetic structure across Britain21, but in addition we observe seven large clusters of predominantly Irish membership (Supplementary Fig. 1), which we treat as putatively ‘Gaelic’ Irish. All seven of these ‘Gaelic’ Irish clusters are apparent by k = 30, and all clusters at this level have a minimum size of 10 individuals (Supplementary Table 1). At k = 30, we describe the finer Irish structure and also recapture previously described British structure21. In a similar fashion to previous reports, we present informative clusters with >10 individuals21,27 (see Fig. 1) so that downstream analyses had appropriate power. The results of the principal component analysis (PCA) of the fineStructure co-ancestry matrix are shown in Supplementary Data 3 and Supplementary Fig. 2. We also note that the fineStructure approach outperformed Genome-wide Complex Trait Analysis (gcta64) (Supplementary Data 4 and Supplementary Fig. 3): fineStructure separates the ‘Gaelic’ Irish clusters at both coarse (Principal Components 1 and 2) and fine (Principal Components 7 and 8) levels better than gcta64.

Figure 1 The clustering of individuals with Irish and British ancestry based solely on genetics. Shown are 30 clusters identified by fineStructure from 2,103 Irish and British individuals. The dendrogram (left) shows the tree of clusters inferred by fineStructure and the map (right) shows the geographic origin of 192 Atlas Irish individuals and 1,611 British individuals from the Peoples of the British Isles (PoBI) cohort, labelled according to fineStructure cluster membership. Individuals are placed at the average latitude and longitude of either their great-grandparental (Atlas) or grandparental (PoBI) birthplaces. Great Britain is separated into England, Scotland, and Wales. The island of Ireland is split into the four provinces: Ulster, Connacht, Leinster, and Munster. The outline of Britain was sourced from Global Administrative Areas (2012). GADM database of Global Administrative Areas, version 2.0. www.gadm.org. The outline of Ireland was sourced from Open Street Map Ireland, Copyright OpenStreetMap Contributors, (https://www.openstreetmap.ie/) - data available under the Open Database Licence. The figure was plotted in the statistical software language R46, version 3.4.1, with various packages.

The majority of our Irish individuals are placed in clusters grouped on one branch, which itself is grouped with the Orcadian branch. This combined branch forms an outgroup to the remaining fineStructure clusters (Fig. 1). The clusters found on the Irish branch are geographically stratified and we assigned labels to reflect this. To avoid confusion, a fineStructure cluster is referred to in italics (e.g. Ulster), distinct from the geographic region (i.e. the province of Ulster). Within the ‘Irish’ branch we observe three broad groups of clusters, which are indicated by the colour of each cluster’s label in the dendrogram shown in Fig. 1. The clusters describe: the north of Ireland (Ulster), the centre of Ireland (Connacht, Dublin, C Ireland, and Leinster), and the south of Ireland (N Munster and S Munster). The distribution of these Irish clusters follows both geographic and political borders within Ireland, particularly the boundaries of the four Irish provinces (see Fig. 1). The two clusters, N Munster and S Munster, follow the boundaries of Munster. The same can be said of the Connacht cluster for Connacht, and the Ulster cluster for Ulster. The centre of Ireland branch is predominantly found within the boundaries of the modern Irish province of Leinster, with the exception of the C Ireland cluster, which is also found within the north of Munster and the south of Connacht. In particular, Leinster is found within the boundaries of the Leinster province and historical kingdom28. Finally, Dublin is mainly centred on the county of Dublin (with some members in the north in Ulster).

The fineStructure branch with the second largest proportion of Irish individuals is the North Ireland branch, containing the clusters N Ireland I, II, and III (Fig. 1A). These clusters are made up of Irish (predominantly from the north of Ireland), Scottish, and English (predominantly northern English) individuals in varying proportions. N Ireland I (n = 33) consists of 7 Irish and 26 English individuals, N Ireland II (n = 94) consists of 53 Irish, 19 Scottish, and 22 English individuals, and N Ireland III (n = 38) consists of 28 Irish, 1 Scottish, and 9 English individuals. The majority of the Irish individuals placed in a N Ireland cluster are found within N Ireland II, and their recent genealogical ancestry originates in Ulster (Fig. 1). The other Irish individuals in N Ireland I and III are predominantly found within Ulster or Dublin, though there is one individual in N Ireland I whose recent genealogical ancestry is from the south of Ireland. The mixed membership of these clusters suggests that these individuals have shared Irish and British ancestry. To explore this hypothesis further, we utilised the genealogical data from the Irish DNA Atlas, comparing great-grandparental surnames from different clusters. We first classified surnames according to the following categories: Irish Gaelic, English (which included English or Anglo-Norman surnames), Scottish (which included Scottish or Gallowglass surnames), or other. We then compared whether Atlas individuals in the ‘Gaelic’ Irish and N Ireland clusters had significantly different surname counts of Irish or English, or Irish or Scottish origin. We tested this using Fisher’s Exact Test in R. We observe that the N Ireland clusters have both a significantly larger proportion of English surnames (p = 2.2e-16, OR: 6.34) and Scottish surnames (p = 2.2e-16, OR: 25.27) than the neighbouring ‘Gaelic’ Irish clusters.
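The surname comparison above can be sketched as a two-sided Fisher's exact test on a 2x2 table of surname counts. The following is a minimal illustration only: the function name and the counts are hypothetical (the actual surname counts are not given in this section), and the odds ratio computed here is the simple cross-product ratio, which can differ slightly from the conditional estimate reported by R's fisher.test.

```python
from math import comb


def fisher_exact_two_sided(a, b, c, d):
    """Two-sided Fisher's exact test for the 2x2 table [[a, b], [c, d]].

    Returns (sample_odds_ratio, p_value). Hypothetical helper for illustration.
    """
    row1, row2, col1 = a + b, c + d, a + c
    denom = comb(row1 + row2, col1)

    def prob(x):
        # Hypergeometric probability of x in the top-left cell, margins fixed.
        return comb(row1, x) * comb(row2, col1 - x) / denom

    p_obs = prob(a)
    lo, hi = max(0, col1 - row2), min(row1, col1)
    # Sum the probabilities of every table no more likely than the observed one.
    p_value = sum(p for p in (prob(x) for x in range(lo, hi + 1))
                  if p <= p_obs * (1 + 1e-7))
    odds_ratio = (a * d) / (b * c) if b * c else float("inf")
    return odds_ratio, min(1.0, p_value)


# Made-up counts, NOT the study's data:
# rows = cluster group (N Ireland, 'Gaelic' Irish); columns = surname origin (English, Irish).
toy_or, toy_p = fisher_exact_two_sided(12, 8, 5, 25)
print(f"odds ratio = {toy_or:.2f}, p = {toy_p:.4g}")
```

An odds ratio above 1 in this layout would indicate that English surnames are relatively more common in the first row's group than in the second's.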

To further compare the genetic distances between the observed Irish and British clusters, we performed Fst analysis, computing the Fst value between Irish and British fineStructure clusters (see Supplementary Table 2). As expected, and consistent with the fineStructure analyses and previous estimates18,21, genetic differentiation across Ireland and Britain is subtle, with the greatest genetic distances between Orcadian and non-Orcadian clusters (mean Fst = 0.0032). Ulster appears to be an outlier relative to the other ‘Gaelic’ Irish clusters, consistent with its position in PCA (Supplementary Fig. 2). The Gaelic clusters exhibit fine differentiation between each other (average Fst = 0.00030; average Fst excluding outlier Ulster = 0.00024) which is comparable to the differentiation we see between English clusters (average Fst = 0.00031; average Fst excluding outlier Cornwall I = 0.00024). This level of differentiation is finer than what we observe within Wales (average Fst = 0.00138), or Scotland (average Fst = 0.00250). The level of differentiation we observe in the island of Ireland (Fst = 0.0003), Gaelic and N Ireland clusters included, is roughly 4.5-fold smaller than what we observe within clusters found across Great Britain, excluding Orkney (Fst = 0.00135).
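For readers unfamiliar with the statistic, the sketch below shows one common way of computing Fst between two groups from allele frequencies, using Wright's heterozygosity-based definition. This is an assumption made for illustration: this section does not state which estimator was used, and the function names and toy frequencies are hypothetical.

```python
def wright_fst(p1, p2):
    """Wright's Fst at one biallelic SNP for two equally weighted populations.

    p1 and p2 are the frequencies of the same allele in each population.
    """
    p_bar = (p1 + p2) / 2
    h_total = 2 * p_bar * (1 - p_bar)                        # expected heterozygosity, pooled
    h_within = (2 * p1 * (1 - p1) + 2 * p2 * (1 - p2)) / 2   # mean within-population value
    return 0.0 if h_total == 0 else (h_total - h_within) / h_total


def genome_fst(freqs1, freqs2):
    """Combine SNPs as a ratio of sums rather than a mean of per-SNP ratios."""
    num = den = 0.0
    for p1, p2 in zip(freqs1, freqs2):
        p_bar = (p1 + p2) / 2
        h_total = 2 * p_bar * (1 - p_bar)
        h_within = p1 * (1 - p1) + p2 * (1 - p2)
        num += h_total - h_within
        den += h_total
    return 0.0 if den == 0 else num / den


# Toy frequencies only; real analyses use hundreds of thousands of SNPs.
print(wright_fst(0.4, 0.6))
print(genome_fst([0.4, 0.5], [0.6, 0.5]))
```

Values such as 0.0003 correspond to allele frequencies that differ only very slightly between groups, which is why the differentiation is described as subtle.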

Whilst the haplotypic relationship between two published ancient Irish genomes and modern European populations has been described7, we were interested in whether we could detect any haplotypic affinity from these ancient Irish genomes to groups we observe within modern Ireland. Using a similar procedure to the original authors7, we performed ChromoPainter analysis and compared the haplotypic affinity of our modern Irish and British clusters to the Irish Neolithic Ballynahatty (3343–3020 cal BC) and Irish Bronze Age Rathlin1 (2026–1534 cal BC) individuals. We recorded the average length of haplotypic donation from each ancient Irish individual to each k = 30 Irish or British cluster (see Supplementary Data 5). We observe that the majority of clusters within Ireland and Britain share a similar affinity with Ballynahatty, with no significant differences between individual Irish clusters (Supplementary Fig. 4a). The highest haplotypic donations for Rathlin1 are to modern ‘Celtic-speaking’ populations, i.e. Ireland, Wales, and Scotland. Though no donations to ‘Irish’ clusters appear significantly different from each other, both Connacht and Dublin do show the highest affinity with Rathlin1 (Supplementary Fig. 4b). These results suggest a homogeneous contribution of these two ancient genomes to contemporary genetic structure in Ireland.

Estimated Effective Migration Surfaces

In order to identify evidence of gene flow barriers within Ireland and Britain, we performed EEMS25 analysis on the Atlas and PoBI datasets. We did not include individuals from the Trinity dataset as they lack geocoding. For more details on the analysis see Methods and Supplementary Data 2.2. We observe a number of gene flow barriers within Ireland and the British Isles (Fig. 2). The strongest barrier is found around Wales between both Ireland and England, with a separate barrier observed between Scotland and Orkney. These barriers mirror the greatest divisions in our fineStructure and Fst analyses. We also observe several gene flow barriers within England. The first is in the south-west, and appears to separate out Devon/Cornwall from neighbouring English counties and Wales. The second is a region in the north of England – in the Pennine Hills – that is associated with the distribution of the fineStructure cluster N England I, with the third boundary following the English-Scottish border. In addition to the general region of gene flow identified in south and central England, two notable corridors of gene flow are observed: the first runs along the Welsh-English border, which represents the two clusters N England II and Marches I, and the second is in the north of England and represents the link between the two clusters N England III and N England IV. Within Scotland, a corridor of gene flow connects the two sampled regions, and there are two regions of gene flow in Wales that correspond to the areas where the majority of the north and south Welsh samples derive their ancestry from. The Western Isles and Highlands of Scotland present a large region of low gene flow, and could represent the relative isolation of the small number of samples from that region, which belong to the W Scotland I cluster.

In Ireland we detect a general trend of gene flow across the island, with three areas of low migration. The first is to the west of the island, including the coast of Connacht. The second is a region of relatively low genetic migration near the Leinster–Munster border. The final region of low genetic migration is found within Ulster, extending into Scotland, and seems to reflect the genetic differentiation of ‘Gaelic’ Ireland and Britain, specifically Scotland. This pattern of Ireland’s isolation is also seen in the gene flow barrier between Wales and Ireland. Interestingly, we observe a corridor of relatively high genetic migration between the north-east of Ulster and the south-west of Scotland. This corridor appears to reflect the link between individuals of shared Irish and British ancestry (i.e. N Ireland I, II, and III), and indeed the authors of EEMS expect that under some circumstances EEMS could represent recent genetic migrants this way25.

Ancestry Profiles

To explore the origins of the Irish genetic clusters we used a previously described method21 to model our Irish and British clusters as a mixture of different populations within Europe. If different clusters within Ireland have experienced dissimilar admixture histories within their past (with respect to Europe) this regression-based admixture analysis will demonstrate this with distinct admixture profiles.

In our analysis we estimated the ancestry profiles of each Irish and British cluster at k = 30. Prior to this, we performed fineStructure analysis on 6,021 European individuals to describe population structure in the European sample and inferred 134 clusters (see Methods). We then investigated the various hierarchical levels of the European clustering to identify a value of k that summarised the main population structure and retained clusters large enough to ensure the ancestry profile method’s accuracy and power. In order for our results to be comparable to previous uses of this method21 we chose a similar number of reference clusters using the European individuals. In the end, we identified k = 56 to be a good representation, i.e. the clusters represent the main branches of the dendrogram, yet retain enough samples for the regression admixture analysis. We additionally removed all individuals from putatively recent admixed clusters as we were interested in the ancestral haplotype diversity in each region of Europe. More recent admixture between reference European clusters would distort our ability to resolve more ancient contributions. This left a final k value of 51 clusters and 5,804 European individuals for the ancestry regression analysis. For more detail on the European clustering see Supplementary Table 4. We performed ChromoPainter haplotype painting of the Irish and British individuals, using the 51 European clusters as donors, and also painted the European individuals with the 51 European clusters donating haplotypes. We then solved, by regression, the average proportion of the genome in each Irish and British cluster that is closest, ancestrally, to each of the European clusters.
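The regression step can be illustrated with a toy example. The sketch below assumes the ancestry profile is the non-negative, sum-to-one mixture of reference-cluster copying vectors that best fits a target cluster's copying vector in the least-squares sense; the solver shown (projected gradient descent), the function names, and all numbers are illustrative assumptions and not the implementation used in the study.

```python
def project_to_simplex(v):
    """Euclidean projection of v onto {w : w_i >= 0, sum(w) = 1}."""
    u = sorted(v, reverse=True)
    cumulative, theta = 0.0, 0.0
    for j, u_j in enumerate(u, start=1):
        cumulative += u_j
        t = (cumulative - 1.0) / j
        if u_j - t > 0:          # holds for a prefix of the sorted values; keep the last
            theta = t
    return [max(x - theta, 0.0) for x in v]


def ancestry_profile(target, donors, iterations=5000):
    """Fit target ~ sum_i beta_i * donors[i] with beta_i >= 0 and sum(beta) = 1."""
    k, m = len(donors), len(target)
    # Conservative step: 2 * squared Frobenius norm bounds the gradient's Lipschitz constant.
    step = 1.0 / (2.0 * sum(x * x for row in donors for x in row))
    beta = [1.0 / k] * k
    for _ in range(iterations):
        fitted = [sum(beta[i] * donors[i][j] for i in range(k)) for j in range(m)]
        residual = [f - t for f, t in zip(fitted, target)]
        grad = [2.0 * sum(r * d for r, d in zip(residual, donors[i])) for i in range(k)]
        beta = project_to_simplex([b - step * g for b, g in zip(beta, grad)])
    return beta


# Toy copying vectors for three hypothetical reference clusters (rows sum to 1).
donors = [
    [0.6, 0.3, 0.1],
    [0.1, 0.6, 0.3],
    [0.2, 0.2, 0.6],
]
# Toy target built as 0.7 * donors[0] + 0.3 * donors[2].
target = [0.48, 0.27, 0.25]
print([round(b, 3) for b in ancestry_profile(target, donors)])
```

Constraining the coefficients to be non-negative and to sum to one is what allows them to be read as ancestry proportions.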

We report the total levels of ancestry proportions best represented by each group of European clusters grouped by broad country membership (Fig. 3a), and the ancestry proportions of the 19 individual European clusters that contribute at least 2.5% ancestry to any Irish or British cluster (Fig. 3b). The raw ancestry proportions are reported in Supplementary Table 5, and the 95% confidence intervals in Supplementary Table 6. For the seven ‘Gaelic’ Irish clusters, we observe that 80% of ancestry is best explained by clusters of French, Belgian, Danish, and Norwegian membership, with clusters from the other six reference European populations making up the remaining ~20% (Fig. 3a). French clusters are the best fit for about half of the ancestry within these Irish clusters, which is the highest proportion across all the Irish or British clusters. This French proportion is being driven primarily by the European cluster FRA1 which by itself represents an average of 30% ancestry in the ‘Gaelic’ Irish clusters (Fig. 3b). Cluster FRA1 is predominantly (80.0%) made up of individuals from the north-west region of France, an area with genetic affinity to other, British, ‘Celtic’ populations23. This pattern of French ancestry continues in other Irish and British clusters associated with Celtic ancestry; specifically the N Ireland, Scottish, Orcadian, Welsh, and Cornish clusters. The ‘Gaelic’ Irish clusters show the lowest ancestry proportions of German clusters, which in turn are thought to reflect Germanic/Saxon influence21. Orkney shows the second-least ‘Germanic’ proportion, with English clusters showing the most. We also observe a low amount of Belgian-like ancestry within Ireland, compared to groups within Britain, further illustrating Ireland’s relative isolation from mainland Europe.

Figure 2 The estimated effective migration surface of Ireland and Britain from 1,803 Irish and British individuals. Shown are the posterior mean migration rates of the six independent EEMS chains (m – on a log10 scale). The outline of Britain was sourced from Global Administrative Areas (2012). GADM database of Global Administrative Areas, version 2.0. www.gadm.org. The outline of Ireland was sourced from Open Street Map Ireland, Copyright OpenStreetMap Contributors, (https://www.openstreetmap.ie/) - data available under the Open Database Licence. The figure was produced in the statistical software language R46, version 3.4.1, with the package rEEMSplots.

Figure 3 The European ancestry profiles of 30 Irish and British clusters. (a) The total ancestry contribution summarised by majority European country of origin to each of the 30 Irish and British clusters. (b) (left) The ancestry contributions of 19 European clusters that donate at least 2.5% ancestry to any one Irish or British cluster. (right) The geographic distribution of the 19 European clusters, shown as the proportion of individuals in each European region belonging to each of the 19 European clusters. The proportion of individuals from each European region not a member of the 19 European clusters is shown in grey. Total numbers of individuals from each region are shown in white text. Not all Europeans included in the analysis were phenotyped geographically. The figure was generated in the statistical software language R46, version 3.4.1, using various packages. The map of Europe was sourced from the R software package “mapdata” (https://CRAN.R-project.org/package=mapdata).

In comparison to the ‘Gaelic’ Irish clusters, the N Ireland clusters show ancestry proportions that are variously in-between the Irish and the British proportions. Namely, as the proportion of German-like ancestry decreases, the proportion of French-like ancestry increases (r² = 0.97, p = 0.08). This agrees with the proportions of individuals with recent Irish or British genealogical ancestry in each of the N Ireland clusters (see Results: Population Structure within Ireland), where the clusters with the fewest Irish individuals show the lowest proportion of French-like ancestry. Therefore, the N Ireland clusters appear to have arisen from different proportions of Irish and British ancestry.

A striking result of our admixture analysis is the surprising amount of Norwegian-like ancestry in our Irish clusters. We also detected high levels of Norwegian ancestry in Orcadian and Scottish clusters, and relatively low Norwegian ancestry in English and Welsh clusters. The Norwegian clusters that contribute significant ancestry to any Irish or British clusters predominantly consist of individuals from counties on the north or western coasts of Norway (Fig. 3b). These areas are noted to be regions where Norse Viking activity originated from8. Whilst this surprising Norwegian signal in Ireland is most likely due to Norwegian admixture into Ireland, it could alternatively reflect Irish admixture into Norway; indeed, this would be consistent with accounts of Irish slave trade in the Viking era29, and with Y-chromosomal analysis (unpublished). To test this alternative hypothesis we ran an additional regression admixture analysis, this time modelling Norwegian haplotypes as a mixture of Irish, British, or European haplotypes (Supplementary Data 6). We observe significant proportions of Irish, Scottish, and Orcadian ancestry in modern Norway (6.82%, 2.29%, and 2.13%, respectively), particularly western Norway. This could provide evidence for Irish admixture back into Norway, but could also easily be explained by Norwegian haplotypes existing in Ireland, Scotland, and Orkney. Therefore, we are able to provide an upper estimate of ~20% Norwegian ancestry within Ireland, but unable to provide an empirical lower limit.

Admixture within Ireland

In order to investigate evidence of admixture into Ireland from European sources we performed Globetrotter26 analysis on the combined European, Irish, and British dataset used in the regression-based ancestry profiling (see Methods). The Globetrotter method requires no a priori specification of admixture sources, instead modelling the source populations as a mixture of ‘surrogates’ who may or may not be ancestrally related to the actual source populations.

We analysed evidence of an admixture event in all individual ‘Gaelic’ Irish clusters, as well as in a combination of all the ‘Gaelic’ Irish clusters together (Ireland-Combined). We used the previously described 51 European clusters as surrogate source populations. We detected significant evidence of admixture (p < 0.01) in the Ireland-Combined cluster, as well as in the C Ireland, Connacht, Leinster, N Munster, and Ulster clusters (Supplementary Table 7). We observe dates of admixture events that collectively range from 38.72–29.92 generations (788CE–1052CE), involving two sources. The majority source is predominantly modelled as a mixture of FRA2 and FRA1 ancestry. The minority source contributed 0.31–0.36 of the ancestry, with consistent French-like and German-like ancestry, and a northern European component, represented by the clusters DEN1, SWE10, and NOR1 and NOR10. These clusters contain individuals largely from regions associated with Viking activity. We interpret the majority source as the native Irish component and the minority source as the admixing source. The joint probability curves for the admixture analyses are presented in Supplementary Data 7 and Supplementary Fig. 7.1, and suggest the largest cluster tested, Ireland-Combined, has the cleanest signal for comparing major ‘Irish’ and ‘Norwegian’ components. Our Globetrotter results suggest an admixture event in Ireland involving a Scandinavian component that is dated to around the times of Viking activity in Ireland.

We hypothesised that clusters with both Irish and British membership (N Ireland I, II, and III), represent individuals with shared Irish and British ancestry due to admixture events. We therefore performed additional Globetrotter analyses, using the other 27 Irish and British clusters as surrogate source populations (see Methods). We found significant evidence of admixture in all three N Ireland clusters (p < 0.01) (see Supplementary Table 8), with dates ranging from the 17th to the 18th centuries. The largest of these clusters, N Ireland II, is estimated to have the oldest admixture date, 10.66 (CI: ±0.43) generations ago. We estimate this admixture event between two sources, contributing 0.34 and 0.66 ancestry. Source 1 is modelled as 0.52 native Irish ancestry (of which C Ireland accounts for 0.258, and Ulster – 0.260), with 0.8 of source 2 ancestry modelled as N England IV. The admixture events detected in the other two clusters, N Ireland I, and N Ireland III, have later dates (7.42 CI: ±0.43 and 9.24 CI: ±0.31, respectively). N Ireland I consists of source 1 contributing 0.4 and source 2 contributing 0.6 ancestry to the admixed population. Source 1 mainly consists of 0.298 Leinster, 0.189 C Ireland, and 0.152 N England I ancestry, with source 2 mainly consisting of 0.646 England I ancestry. Unlike N Ireland I and II, the major source of the N Ireland III admixture event is mainly Irish in ancestry. The major source 2 (0.7 total ancestry) consists mainly of C Ireland (0.412) and Ulster (0.162) ancestry. The minor source 1 (0.3 total ancestry) is modelled to consist mainly of English ancestry (England I – 0.332, N England IV – 0.322, N England I – 0.119, and N England II – 0.103). The joint probability curves of the majority ‘Irish’ and major ‘British’ components (Supplementary Fig. 7.2–4) show that the largest cluster (N Ireland II) demonstrates the cleanest signal. The fitted curves are relatively shallow compared to other admixture signals21,26, suggesting a relatively gradual admixture process.
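The admixture dates in this section are reported both in generations and as calendar dates. As a rough guide to the conversion, the sketch below assumes a reference year of 1950 and 30 years per generation. Both values are assumptions, chosen because they reproduce the 788CE–1052CE range quoted above for 38.72–29.92 generations; the exact conversion used is described in the Methods rather than in this section, and the function names are hypothetical.

```python
def generations_to_year(generations, reference_year=1950, years_per_generation=30):
    """Convert an admixture date in generations before present to a calendar year (CE).

    reference_year and years_per_generation are assumed values, see the text above.
    """
    return reference_year - years_per_generation * generations


def century(year):
    """Ordinal century (CE) containing the given year, e.g. 1630 -> 17."""
    return (int(year) - 1) // 100 + 1


# 'Gaelic' Irish clusters: oldest and youngest dates reported in the text.
for g in (38.72, 29.92):
    print(g, "generations ->", round(generations_to_year(g)), "CE")

# N Ireland clusters: point estimates reported in the text.
for name, g in (("N Ireland II", 10.66), ("N Ireland III", 9.24), ("N Ireland I", 7.42)):
    year = generations_to_year(g)
    print(name, "->", round(year), "CE, century", century(year))
```

Under these assumptions the three N Ireland point estimates fall in the 17th and 18th centuries, in line with the range stated in the text.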