Population Structure of the Irish Travellers

In order to investigate the genetic relationship between the Irish Travellers and neighbouring populations, we performed fineStructure analysis on Irish Travellers, settled Irish from a subset of the Trinity Student dataset14, and British from a subset of the POBI dataset15. Subsets of the datasets were used in this analysis because we were primarily interested in the placement of the Irish Travellers within the context of Britain and Ireland, not the full structure found within Britain and Ireland. The results are presented in Fig. 1 in the form of a principal component analysis of fineStructure’s haplotype-based co-ancestry matrix (1A) and a dendrogram of the fineStructure clusters (1B).

Figure 1: Clustering of 34 Irish Travellers, 300 Settled Irish, and 828 British by fineStructure. (A) The first and second components of principal component analysis of the haplotype-based co-ancestry matrix produced by fineStructure analysis. Individual clusters are indicated by colour and shape. Individual Irish Travellers are indicated with black bordered shapes, with cluster shown in Legend. (B) The full fineStructure tree with the highest posterior probability, with cluster size and name, and broad branches shown.

We observe that 31 of the 34 Irish Travellers cluster on the Irish branch, indicating a strong affinity with an Irish population ancestral to the current day “Traveller” and “settled” populations (Fig. 1B). One “Irish Traveller” is found within the Borders 1 cluster, and two are found within the Borders 2 cluster. These three individuals report full, or partial, English Gypsy ancestry, a distinct and separate travelling population in Britain. One individual is found within the Ireland 1 cluster, and two are found within the Ireland 2 cluster. The Traveller individuals within the Ireland 2 cluster report recent settled ancestry, and we have no such genealogical data on the individual grouped within the Ireland 1 cluster. Given their mixed ancestry, these individuals were excluded from subsequent F_ST, f_3, and divergence estimate work.

The remaining 28 Irish Travellers in the fineStructure analysis were arranged into four clusters. These clusters were grouped on two separate branches (Fig. 1B), with Traveller 1 (n = 7) and Traveller 2 (n = 5) on the same branch, and Traveller 3 (n = 5) and Traveller 4 (n = 11) on a separate branch. The branch with clusters Traveller 3 and 4 forms an outgroup to the rest of the settled Irish and Irish Traveller clusters. These two branches of Irish Traveller clusters align closely with the split of Irish Travellers observed through PCA (Fig. S1). All the individuals who separate on the first principal component (henceforth “PCA group B”) are found in clusters Traveller 3 and 4 (Fig. S2A), and nearly all the individuals who remain grouped with the settled Irish on principal component 1 (henceforth “PCA group A”) are found in clusters Traveller 1 and 2 (Fig. S2A). The remaining PCA group A individuals are those Irish Travellers found in the aforementioned settled Irish or British clusters. This pattern is also repeated in the PCA of the co-ancestry matrix (Fig. 1A), where members of Traveller 1 and 2 cluster with the settled Irish, whereas Traveller 3 and 4 individuals cluster separately.

Having identified distinct genetic groups of Irish Travellers, we investigated the correlation with Irish Traveller sociolinguistic features, specifically Shelta dialect and Rathkeale residence (Fig. S2B,C, respectively). The majority of the Gammon speakers were members of clusters Traveller 1 and 2, and all of Traveller 1 consisted of Gammon speakers. Clusters Traveller 3 and 4 consisted mostly of Cant speakers; in Traveller 4, all individuals were Cant speakers except one, for whom language identity is unknown. We found that only clusters Traveller 1 and 2 contain any Rathkeale Travellers, with 4 out of 5 individuals in Traveller 2 being Rathkeale Travellers.

We next investigated population structure through maximum-likelihood estimation of individual ancestries with ADMIXTURE (Figs 2 and S3). For this analysis we used a subset of the European Multiple Sclerosis dataset consisting of three northern European populations (Norway, Finland and Germany), two southern European populations (Italy and Spain), and a neighbouring population (France). We categorised the POBI British as English, Scottish, Welsh, and Orcadian. We further separated the Irish Travellers into those in PCA group A and those in PCA group B.

Figure 2: Ancestry profiles of the Irish Travellers, and neighbouring European populations by ADMIXTURE. Shown are the ancestry components per individual for the two groups of Irish Travellers (Group A and Group B), settled Irish, British, and European populations; modelling for 4 to 6 ancestral populations.

At k = 4–6 (Fig. 2), we observe the well-described north-south divide in the European populations (k = 4), as well as Finland and Orkney (k = 5) differentiating due to their respective populations’ bottleneck and isolation. Although at lower values of k the Irish Travellers generally resemble the settled Irish profile (Fig. S3), at higher values of k two components are found to be enriched within the population. Each of these components is enriched in one of the two Irish Traveller PCA groups. Individuals with more than 20% of the “red” component when k = 5 belong to PCA group B and individuals with near 100% of “blue” component all belong to PCA group A (Fig. 2). The fact that even at k = 3 PCA group B gains its own ancestral component (Fig. S3) suggests strong group-specific genetic drift.

In order to investigate a possible Roma Gypsy origin of the Irish Travellers, we compared the Irish Travellers and settled Irish to a dataset of Roma populations found within Europe16 using PCA and ADMIXTURE. The results broadly agree, with the Irish Travellers clustering with the settled Irish in the PCA plot, and resembling the settled Irish profile in ADMIXTURE analysis (see Fig. 3). There was no evidence for a recent ancestral component between the Irish Traveller and Roma populations. In addition, we formally tested evidence of admixture with f_3 statistics in the form of f_3(Irish Traveller; Settled Irish, Roma). We found no evidence of admixture either when considering all the Roma as one population, or in each individual Roma population’s case (all f_3 estimates were positive).
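To illustrate why positive f_3 estimates give no evidence of admixture, the following is a minimal sketch of the statistic computed from per-SNP allele frequencies. It is not the software used in this study: the function name and toy frequencies are hypothetical, and the naive form shown here omits the finite-sample bias correction applied by standard implementations.

```python
def naive_f3(target, source1, source2):
    """Naive f3(target; source1, source2) from per-SNP allele frequencies.

    Averages (c - a) * (c - b) over SNPs, where c is the target frequency
    and a, b are the two source frequencies. No finite-sample correction
    is applied (an assumption made to keep the sketch short).
    """
    products = [(c - a) * (c - b) for c, a, b in zip(target, source1, source2)]
    return sum(products) / len(products)


# Toy case 1: the target sits between the two sources at every SNP,
# as expected for an admixed population, so f3 is negative.
f3_admixed = naive_f3([0.5, 0.5], [0.2, 0.8], [0.8, 0.2])

# Toy case 2: the target has drifted away from both sources in the same
# direction, so f3 is positive and provides no evidence of admixture.
f3_drifted = naive_f3([0.9, 0.1], [0.5, 0.5], [0.6, 0.4])

print(f3_admixed, f3_drifted)
```

A significantly negative value is the signal of admixture; a positive value, as reported for every Roma comparison here, does not by itself rule admixture out, since strong drift in the target population can mask it.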

Figure 3: Comparison between the Irish Travellers, the settled Irish, and the European Roma. (A) The first and second components from principal component analysis using gcta64. (B) The ancestry profiles using ADMIXTURE, assuming 2 to 4 ancestral populations.

Given the apparent structure between the Travellers and the settled Irish populations, we quantified genetic distance using F_ST and “outgroup” f_3 statistics. F_ST analysis reveals a considerable genetic distance between the settled Irish and the Irish Traveller population (F_ST = 0.0034, Table S1), which is comparable to values observed between Germany and Italy, or Scotland and Spain.

In order to further investigate sub-structure within the Irish Travellers, we performed F_ST analysis on the Irish Traveller PCA (n = 2) and fineStructure (n = 4) groups, comparing them to the settled Irish (see also Table S1). The individuals belonging to PCA group B are considerably more genetically distant from the settled Irish (F_ST = 0.0086) than those in PCA group A (F_ST = 0.0036). This could be explained by distinct founder events for PCA groups A and B, or by PCA group B having experienced greater genetic drift. The F_ST estimates of the Irish Traveller clusters are higher than those of the PCA groups. The estimates of clusters Traveller 1, 2, and 3 range from 0.0052 to 0.0054. However, Traveller 4 shows the highest F_ST value (F_ST = 0.0104), suggesting this cluster of individuals is responsible for the inflation of the PCA group B estimate. Generally, however, these results suggest that the general Irish Traveller population does not have a very recent source, i.e. within 5 generations or so. If we perform the same F_ST analysis on two random groups of settled Irish, we observe an F_ST value < 1 × 10^-5.
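As an illustration of how a pairwise F_ST of this kind can be obtained from allele frequencies, the sketch below uses Hudson's estimator combined across SNPs as a ratio of sums. This is an assumption made for illustration: the estimator used for Table S1 is not specified in this section, and the function name, sample sizes, and frequencies are hypothetical.

```python
def hudson_fst(freqs1, freqs2, n1, n2):
    """Hudson's F_ST estimator, combined across SNPs as a ratio of sums.

    freqs1, freqs2: per-SNP allele frequencies in the two populations.
    n1, n2: number of sampled chromosomes (2 x individuals) in each.
    """
    numerator = 0.0
    denominator = 0.0
    for p1, p2 in zip(freqs1, freqs2):
        # Squared frequency difference, corrected for sampling noise.
        numerator += ((p1 - p2) ** 2
                      - p1 * (1 - p1) / (n1 - 1)
                      - p2 * (1 - p2) / (n2 - 1))
        # Probability that two alleles drawn from different populations differ.
        denominator += p1 * (1 - p2) + p2 * (1 - p1)
    return numerator / denominator


# Toy frequencies: two clearly differentiated groups, and a group compared
# with an identical copy of itself.
group_a = [0.1, 0.5, 0.9]
group_b = [0.4, 0.5, 0.6]
fst_differentiated = hudson_fst(group_a, group_b, n1=200, n2=200)
fst_identical = hudson_fst(group_a, group_a, n1=200, n2=200)

print(round(fst_differentiated, 4), round(fst_identical, 4))
```

Because the sampling correction is subtracted in the numerator, two groups with identical frequencies give an estimate at or slightly below zero, which is the behaviour expected of the random settled Irish split described above.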

To inform on whether lineage-specific drift is influencing the observed genetic distances between the Irish Travellers, the settled Irish and other neighbouring populations, we performed outgroup f_3 analysis, using HGDP Yorubans as the outgroup. Such analysis can inform on whether PCA group B and Traveller 4 do indeed represent an older Irish Traveller group, or a sub-group that has experienced more intense drift. When we compare PCA groups A/B to the settled Irish we see no significant difference between the two groups (see Table S2, A:settled f_3 = 0.1694 (stderr = 0.0013), B:settled f_3 = 0.1698 (stderr = 0.0013), A:B f_3 = 0.1700 (stderr = 0.0013)), with similar results for the fineStructure clusters (Table S2). These results suggest that PCA group B has experienced more drift than PCA group A, inflating the F_ST statistic, which in turn has inflated the Irish Traveller population F_ST. We note, however, that f_3 statistics may not be sensitive enough to detect differences from settled Irish to Traveller PCA groups A and B should the difference between A and B be a relatively limited number of generations.

Divergence

A key question in the history of the Travellers is the period of time for which the population has been isolated from the settled Irish. In order to address this we utilized two methods, one based on linkage disequilibrium patterns and F_ST (which we call T_F), and one based on Identity-by-Descent (IBD) patterns (which we call T_IBD).

The T_F method estimates the divergence to be 40 generations (±2 std.dev, obtained via bootstrapping). Assuming an average generation time of 30 years, the T_F method estimates that the divergence occurred 1200 (±60 std.dev) years ago. The method also estimates the harmonic mean N_e for the two populations over the last 2000 years. The Irish Traveller estimate (1395, std.dev = 16, obtained via bootstrapping) is considerably lower than the settled Irish estimate (6162, std.err = 122, obtained via bootstrapping). However, the isolation of the Irish Travellers will artificially increase the F_ST value and consequently inflate the T_F divergence estimate. We therefore estimated the divergence time with a different, IBD-based method, as such an approach can accommodate genetic drift.

We first identified IBD segment sharing within and between the Irish Travellers and our settled Irish subset. The Irish Travellers were found to share 35-fold more genetic material IBD (in cM per pair) than the settled population (Fig. 4A). Specifically, a pair of Travellers share, on average, 5.0 segments of mean length 12.9 cM, compared to 0.4 segments of mean length 4.9 cM for the settled population (Fig. 4A; segments with length >3 cM). Additionally, we compared IBD sharing within and between the two PCA groups, A and B (Fig. 4B). We observe a greater amount of IBD segments shared within PCA group B than within PCA group A. These sharing patterns are not due to familial sharing, as we have previously removed individuals with close kinship (see Supplementary Methods 1.3). Sharing between settled and Traveller Irish was of similar extent to that within the settled group (Fig. 4A), with no significant difference between the PCA groups A and B (p = 0.12, using permutations, for the difference in the number of segments shared with the settled group) (Fig. S4). We used the number and lengths of segments shared within settled, within Travellers, and between the groups to estimate the demographic history of those populations, and in particular, the split time between these two groups.
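The three per-pair summaries reported in Fig. 4 (mean number of segments, mean segment length, and mean total length shared) can be sketched as follows. This is an illustrative outline only: the segment tuple layout, function name, and toy values are hypothetical, and only the 3 cM length threshold is taken from the text.

```python
def pairwise_ibd_summary(segments, individuals, min_cm=3.0):
    """Summarise IBD sharing among all pairs of a group of individuals.

    segments: iterable of (id_a, id_b, length_cm) tuples (hypothetical layout).
    individuals: identifiers of the group members.
    min_cm: only segments longer than this are counted (3 cM in the text).
    """
    members = set(individuals)
    n_pairs = len(members) * (len(members) - 1) // 2
    kept = [length for id_a, id_b, length in segments
            if id_a in members and id_b in members
            and id_a != id_b and length > min_cm]
    total_cm = sum(kept)
    return {
        "mean_segments_per_pair": len(kept) / n_pairs,
        "mean_segment_length_cm": total_cm / len(kept) if kept else 0.0,
        "mean_total_cm_per_pair": total_cm / n_pairs,
    }


# Toy data: three individuals (three pairs) and four detected segments,
# one of which falls below the length threshold and is discarded.
toy_segments = [
    ("T1", "T2", 10.0),
    ("T1", "T2", 6.0),
    ("T2", "T3", 2.0),
    ("T1", "T3", 8.0),
]
summary = pairwise_ibd_summary(toy_segments, ["T1", "T2", "T3"])
print(summary)
```

The mean total shared per pair is the product of the other two summaries, so the means quoted above imply roughly 5.0 × 12.9 ≈ 64.5 cM shared per pair of Travellers.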

Figure 4: Extent of haplotype sharing between the settled Irish and the Irish Travellers, and between the two groups of Irish Travellers. (A) The number and lengths of shared segments within Settled Irish, within Traveller Irish, and between the groups. Left panel: The mean segment length; middle panel: the mean number of shared segments; right panel: the mean total sequence length (in cM) shared between each pair of individuals. (B) The number and lengths of shared segments within Traveller Group A, Traveller Group B, and between the groups. The format of the figure is as in (A).

Briefly, we used the method developed in Palamara et al.17 (see also Zidan et al.18). We assumed a demographic model for the two populations (Fig. 5A), in which an ancestral Irish population has entered a period of exponential expansion before the ancestors of the present day settled Irish and Irish Travellers split. After this split, the settled Irish continued the exponential expansion, whilst the Irish Travellers experienced an exponential population contraction. We then computed the expected proportion of the genome found in shared segments of different length intervals using the theory of ref. 17, and found the parameters of the demographic model that best fitted the data (see Supplementary Data 1.3, Fig. 5B, and Table 1).

Figure 5: (A) The model used for demographic inference. The two populations were one ancestral population, with size N_e, T_G generations ago. At this point the ancestral population started to grow exponentially until T_S generations ago, when the ancestral Traveller and settled populations split from each other, with N_S,T being the initial population size of the Traveller population. The settled population experienced continued exponential growth until the present, with a population size of N_C,S. The Traveller population experienced a period of exponential contraction until the present, with a population size of N_C,T. (B) The proportion of the genome in IBD segments vs the IBD segment length. The total genome size and the sum of segment lengths were computed in cM. Left: sharing between pairs of settled Irish; middle: sharing between pairs of one settled and one Traveller individual; right: sharing between pairs of Traveller Irish. Each data point is located at the harmonic mean of the boundaries of the length interval it represents.
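The observed quantity plotted in panel (B) can be sketched as below: segments are binned by length, the summed length in each bin is divided by the total genome length across all pairs, and each point is placed at the harmonic mean of its bin boundaries. This is a minimal sketch of the data summary only, not of the model fitting of ref. 17; the function name, bin boundaries, and genome length are hypothetical values chosen for illustration.

```python
def ibd_length_spectrum(segment_lengths_cm, n_pairs, genome_cm, boundaries_cm):
    """Proportion of the genome in IBD segments, per length interval.

    segment_lengths_cm: lengths of all segments shared among the pairs.
    n_pairs: number of pairs of individuals compared.
    genome_cm: total genome length in cM (an assumed input).
    boundaries_cm: increasing interval boundaries; each interval includes
        its lower boundary and excludes its upper one (a sketch convention).
    Returns a list of (x_position, proportion) points.
    """
    points = []
    for low, high in zip(boundaries_cm[:-1], boundaries_cm[1:]):
        shared_cm = sum(length for length in segment_lengths_cm
                        if low <= length < high)
        proportion = shared_cm / (n_pairs * genome_cm)
        harmonic_mean = 2 * low * high / (low + high)
        points.append((harmonic_mean, proportion))
    return points


# Toy example: two pairs, a 100 cM "genome", and intervals 3-6 and 6-12 cM.
spectrum = ibd_length_spectrum(
    segment_lengths_cm=[4.0, 5.0, 8.0],
    n_pairs=2,
    genome_cm=100.0,
    boundaries_cm=[3.0, 6.0, 12.0],
)
print(spectrum)
```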

Table 1: The best fitting parameters for the T_IBD model, with the 95% confidence intervals (CI) shown below.

The results of the model suggest the Irish Travellers and settled Irish separation occurred 12 generations ago (95% CI: 8–14). The results also support opposite trends in the effective population sizes (N_e) of the settled and Traveller Irish since that split: while the settled population has expanded rapidly, the Irish Travellers have contracted (see Table 1). When restricting to the 12 members of PCA group A, the split time was estimated to be 15 generations ago (95% CI: 13–18) (Table 2). When restricting to the 16 members of PCA group B, the split time was 10 generations ago (95% CI: 3–14). We stress these results should be seen as the best fitting projection of the true history into a simplified demographic model, in particular given the limited sample sizes.
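For comparison with the T_F estimate, the generation counts above can be converted to calendar years. The sketch below applies the same 30-year generation time assumed for T_F to the IBD-based estimates; extending that assumption to T_IBD is ours for illustration, and the dictionary layout is hypothetical.

```python
YEARS_PER_GENERATION = 30  # generation time assumed for the T_F estimate


def to_years(generations, years_per_generation=YEARS_PER_GENERATION):
    """Convert a number of generations to years."""
    return generations * years_per_generation


# (point estimate, lower bound, upper bound) in generations.
# T_F bounds are +/- 2 std.dev as reported; T_IBD bounds are 95% CIs.
estimates_generations = {
    "T_F": (40, 38, 42),
    "T_IBD": (12, 8, 14),
    "T_IBD, PCA group A": (15, 13, 18),
    "T_IBD, PCA group B": (10, 3, 14),
}

estimates_years = {
    label: tuple(to_years(value) for value in values)
    for label, values in estimates_generations.items()
}

for label, (point, low, high) in estimates_years.items():
    print(f"{label}: {point} years ({low} to {high})")
```

Under this assumption the T_IBD split of 12 generations corresponds to roughly 360 years (240 to 420), against 1200 years for T_F.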

Table 2: The best fitting parameters for the T_IBD model, with the 95% confidence intervals (CI) shown below, considering only individuals from the PCA groups A or B.

Runs of Homozygosity

Consanguinity is common within the Irish Traveller population, and in this context we quantified the levels of homozygosity compared to settled Irish and world-wide populations19. We calculated the average total extent of homozygosity of each population using four categories of minimum length of Runs of Homozygosity (ROH) (1/5/10/16 Mb). Elevated ROH levels between 1 and 5 Mb are indicative of a historically smaller population size. Elevated ROH levels over 10 Mb, on the other hand, are reflective of more recent consanguinity in an individual’s ancestry10. We also include average figures for the European Roma in the Irish Traveller – European analysis. Full European Roma ROH profiles are shown in Figure S5.

As expected, the Irish Travellers present a significantly higher amount of homozygosity compared to the other outbred populations and to the European isolates, the French Basque and Sardinian, which is sustained through to the larger cutoff categories of 10–16 Mb (see Fig. 6). Our results for the other world-wide populations agree with previous estimates10, with the Native American Karitiana showing the most autozygosity, and the Papuan population showing an excess of short ROHs. Two other consanguineous populations, the Balochi and Druze, show slightly more homozygosity than the Irish Travellers, and the European Roma are most similar to the Travellers for both shorter and longer ROH.

Figure 6: Extent of autozygosity in the Irish Travellers, settled Irish, select world-wide populations, and the European Roma. Shown, across four minimum lengths of runs of homozygosity (ROH), are the average lengths of ROH in each population. The average ROH burdens for the European Roma are the mean of means across the 13 Roma populations studied. These values are from a separate analysis, and collated with the wider European ROH values for reasons of SNP coverage between the different datasets.

These results indicate a higher level of background relatedness in the Irish Traveller population history. The high levels of ROH larger than 10 Mb in length reflect recent parental relatedness within the population. This is supported by the average F_ROH5 in the Irish Travellers (F_ROH5 = 0.015), which is slightly lower than, but comparable to, the F_ROH5 score found among Orcadian offspring of 1st/2nd cousins (F_ROH5 = 0.017)20.
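The two ROH summaries used in this section can be sketched as follows: the total length of ROH at or above each minimum length category, and F_ROH5 as the fraction of the autosomal genome covered by ROH of at least 5 Mb. The autosomal length below is a placeholder assumption rather than the value used in this study, and the function names and toy ROH lengths are hypothetical.

```python
ASSUMED_AUTOSOME_MB = 2700.0    # placeholder genome length, not from the study
MIN_LENGTHS_MB = (1, 5, 10, 16)  # minimum ROH length categories from the text


def roh_burden(roh_lengths_mb, min_lengths_mb=MIN_LENGTHS_MB):
    """Total ROH length (Mb) at or above each minimum length category."""
    return {
        minimum: sum(length for length in roh_lengths_mb if length >= minimum)
        for minimum in min_lengths_mb
    }


def f_roh(roh_lengths_mb, min_length_mb=5, autosome_mb=ASSUMED_AUTOSOME_MB):
    """Fraction of the autosomal genome in ROH of at least min_length_mb."""
    in_roh = sum(length for length in roh_lengths_mb if length >= min_length_mb)
    return in_roh / autosome_mb


# Toy individual carrying five runs of homozygosity (lengths in Mb).
toy_roh = [1.5, 2.0, 6.0, 12.0, 20.0]
burden = roh_burden(toy_roh)
froh5 = f_roh(toy_roh)
print(burden, round(froh5, 4))
```

Population-level figures such as those in Fig. 6 would then be averages of these per-individual values.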

Finally, in order to explore the potential of the Irish Traveller population for studying rare, functional variation for disease purposes, we tested minor allele frequency (MAF) differences between the settled Irish and the Irish Travellers from a common dataset of 560,256 common SNPs for 36 Traveller and 2232 settled Irish individuals. We observed 24,670 SNPs with a MAF between 0.02 and 0.05 in the settled Irish population. We found that 3.29% of these SNPs had a MAF >0.1 in the Irish Traveller population. We tested the significance of this observation by calculating the same percentage, but taking a random sample of 36 settled Irish instead of the 36 Irish Travellers. We repeated this 1000 times and found no samples (p < 0.001) with a greater percentage than 3.29 (mean = 1.3, std.dev = 0.11). This has additional implications for disease mapping within Ireland, as a proportion of the functional variants in the settled Irish population will be observed at a higher frequency in the Traveller population.
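The resampling test described above can be outlined as in the sketch below. It is an illustration under stated assumptions rather than the code used in this study: genotypes are assumed to be coded as counts (0, 1 or 2) of the allele that is minor in the settled Irish, and all function names and toy data are hypothetical.

```python
import random


def enriched_fraction(reference_freqs, sample_genotypes,
                      low=0.02, high=0.05, threshold=0.1):
    """Fraction of reference-rare SNPs whose sample frequency exceeds threshold.

    reference_freqs: per-SNP MAF in the full settled reference set.
    sample_genotypes: per-SNP lists of minor allele counts (0/1/2),
        one entry per sampled individual.
    """
    rare = [i for i, freq in enumerate(reference_freqs) if low <= freq <= high]
    if not rare:
        return 0.0
    hits = 0
    for i in rare:
        counts = sample_genotypes[i]
        if sum(counts) / (2 * len(counts)) > threshold:
            hits += 1
    return hits / len(rare)


def resampling_p_value(observed, reference_freqs, settled_genotypes,
                       sample_size=36, n_resamples=1000, seed=0):
    """Share of random settled samples with a greater fraction than observed."""
    rng = random.Random(seed)
    n_settled = len(settled_genotypes[0])
    exceed = 0
    for _ in range(n_resamples):
        chosen = rng.sample(range(n_settled), sample_size)
        subset = [[snp[j] for j in chosen] for snp in settled_genotypes]
        if enriched_fraction(reference_freqs, subset) > observed:
            exceed += 1
    return exceed / n_resamples


# Toy check: two of three SNPs are rare in the reference, and one of those
# two exceeds a frequency of 0.1 in the four-person sample.
toy_reference = [0.03, 0.04, 0.30]
toy_sample = [[1, 1, 0, 0], [0, 0, 0, 0], [2, 2, 2, 2]]
toy_fraction = enriched_fraction(toy_reference, toy_sample)
print(toy_fraction)
```

When none of the 1000 resamples exceeds the observed value the function returns 0, which is reported as p < 0.001 because the resolution of the test is limited by the number of resamples.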