One of the great things about the mass personal genomic revolution is that it allows people to have direct access to their own information. This is important for the more than 90% of the human population which has sketchy genealogical records. But even with genealogical records there are often omissions and biases in transmission of information. This is one reason that HAP, Dodecad, and Eurogenes BGA are so interesting: they combine what people already know with scientific genealogy. This intersection can often be very inferentially fruitful.

But what if you had a whole population with rich, robust conventional genealogical records? Combined with the power of the new genomics you could really crank up the level of insight. Where to find these records? A reason that Jewish genetics is so useful and interesting is that there is often a relative dearth of records when it comes to the lineages of American Ashkenazi Jews. Many American Jews even today are often sketchy about the region of the “Old Country” from which their forebears arrived. Jews have been interesting from a genetic perspective because of the relative excess of ethnically distinctive Mendelian disorders within their population. There happens to be another group in North America with the same characteristic: the French Canadians. And importantly, in the French Canadian population you do have copious genealogical records. The origins of this group lie in the 17th and 18th centuries, and the Roman Catholic Church has often been a punctilious institution when it comes to recording events under its purview such as baptisms and marriages. The genealogical archives are so robust that last fall a research group input centuries of ancestry for ~2,000 French Canadians, and used it to infer patterns of genetic relationships as a function of geography, as well as long term contribution by provenance. Admixed ancestry and stratification of Quebec regional populations:

Population stratification results from unequal, nonrandom genetic contribution of ancestors and should be reflected in the underlying genealogies. In Quebec, the distribution of Mendelian diseases points to local founder effects suggesting stratification of the contemporary French Canadian gene pool. Here we characterize the population structure through the analysis of the genetic contribution of 7,798 immigrant founders identified in the genealogies of 2,221 subjects partitioned in eight regions. In all but one region, about 90% of gene pools were contributed by early French founders. In the eastern region where this contribution was 76%, we observed higher contributions of Acadians, British and American Loyalists. To detect population stratification from genealogical data, we propose an approach based on principal component analysis (PCA) of immigrant founders’ genetic contributions. This analysis was compared with a multidimensional scaling of pairwise kinship coefficients. Both methods showed evidence of a distinct identity of the northeastern and eastern regions and stratification of the regional populations correlated with geographical location along the St-Lawrence River. In addition, we observed a West-East decreasing gradient of diversity. Analysis of PC-correlated founders illustrates the differential impact of early versus latter founders consistent with specific regional genetic patterns. These results highlight the importance of considering the geographic origin of samples in the design of genetic epidemiology studies conducted in Quebec. Moreover, our results demonstrate that the study of deep ascending genealogies can accurately reveal population structure.

That paper found that nearly 70% of the immigrant founding stock in this data set came directly from France. For the period before 1700 that fraction exceeds 95%. Of the remainder, about 15% of the founding stock were Acadians, who themselves were presumably mostly of French origin. Because of the earlier migration of the French founding stock, they left a stronger impact on future generations:

[Figure: fren1]

Much of the difference here is because earlier ancestors in a population which went through demographic expansion would have more of an impact on the nature of the population than later contributors (the earlier ancestors would show up in many more downstream genealogies). But notice that the Amerindians in the pool are a much larger proportion of ancestors than their final genetic contribution (50% of the French Canadians had at least one Amerindian ancestor). I suspect this may be due to differential fertility because of variation in social status by race (i.e., mixed-race French Canadians having lower fertility, perhaps by way of their exclusion from highly fecund elite families), and not just later absorption of Amerindians than French (on the contrary, I suspect that Amerindians were assimilated earlier, not later).
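To make the logic of "genetic contribution" concrete, here is a minimal sketch in Python. It assumes a toy pedigree stored as a dictionary (the names `E1`, `L1`, `s1` and so on are invented for illustration, and this is not the procedure used in the paper). Each founder's expected contribution to a subject is the sum, over every line of descent, of one half raised to the number of generations on that line.

```python
from collections import defaultdict

def founder_contributions(pedigree, subject):
    """Expected share of the subject's genome traced to each founder.

    pedigree maps an individual to a (father, mother) tuple.
    Anyone absent from the map is treated as a founder.
    """
    contrib = defaultdict(float)
    stack = [(subject, 1.0)]
    while stack:
        person, weight = stack.pop()
        parents = pedigree.get(person)
        if parents is None:
            # Reached a founder: credit the weight carried down this path.
            contrib[person] += weight
        else:
            # Each parent passes on half of the genome in expectation.
            for parent in parents:
                stack.append((parent, weight / 2))
    return dict(contrib)

def mean_contributions(pedigree, subjects):
    """Average each founder's contribution across a set of subjects."""
    totals = defaultdict(float)
    for s in subjects:
        for founder, share in founder_contributions(pedigree, s).items():
            totals[founder] += share
    return {f: t / len(subjects) for f, t in totals.items()}

# Hypothetical toy pedigree: an early couple (E1, E2) has three children,
# each of whom marries a different later-arriving founder (L1, L2, L3).
toy_pedigree = {
    "a": ("E1", "E2"),
    "b": ("E1", "E2"),
    "c": ("E1", "E2"),
    "s1": ("a", "L1"),
    "s2": ("b", "L2"),
    "s3": ("c", "L3"),
}

pooled = mean_contributions(toy_pedigree, ["s1", "s2", "s3"])
```

In this toy case each later founder supplies half of one subject's ancestry, but appears in only one of the three genealogies, so their pooled share is one sixth. The early founders supply only a quarter per subject, but appear in all three, so their pooled share is a quarter each.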

But this research did not look directly at genetics. Rather, these inferences were generated from genealogical records which go back to the founding of Quebec and maintained coherency and integrity from generation to generation. Some of the members of the same research group now have a paper out which looks at the genomics of French Canadians, and directly compares their results to those of the earlier paper. Genomic and genealogical investigation of the French Canadian founder population structure:

Characterizing the genetic structure of worldwide populations is important for understanding human history and is essential to the design and analysis of genetic epidemiological studies. In this study, we examined genetic structure and distant relatedness and their effect on the extent of linkage disequilibrium (LD) and homozygosity in the founder population of Quebec (Canada). In the French Canadian founder population, such analysis can be performed using both genomic and genealogical data. We investigated genetic differences, extent of LD, and homozygosity in 140 individuals from seven sub-populations of Quebec characterized by different demographic histories reflecting complex founder events. Genetic findings from genome-wide single nucleotide polymorphism data were correlated with genealogical information on each of these sub-populations. Our genomic data showed significant population structure and relatedness present in the contemporary Quebec population, also reflected in LD and homozygosity levels. Our extended genealogical data corroborated these findings and indicated that this structure is consistent with the settlement patterns involving several founder events. This provides an independent and complementary validation of genomic-based studies of population structure. Combined genomic and genealogical data in the Quebec founder population provide insights into the effects of the interplay of two important sources of bias in genetic epidemiological studies, unrecognized genetic structure and cryptic relatedness.

In 1760 there were 70,000 residents in the areas of Canada which were under French rule. A substantial fraction of these derived from the much smaller 17th century founding population. Today the number of North Americans with some known French Canadian ancestry is around 10 million. I happen to know an individual whose great-great-grandmother was French Canadian. Using the internet, I was able to trace this woman’s ancestry along one line back to the countryside outside of Poitiers in the mid 16th century! Being conservative, it seems that at least 5 million North Americans have overwhelming descent from the 1760 founding stock. These are the core French Canadians.

An immediate inference one might make from these background facts, the rapid expansion of the French Canadian ethnic group from a small core founding stock, is that they would have gone through a “population bottleneck.” The data here are mixed. On the one hand, there are particular Mendelian diseases associated with French Canadians. This is evidence of some level of inbreeding which would randomly increase the frequencies of deleterious recessively expressed alleles. And yet as noted in the paper French Canadians do not seem to have lower genetic diversity than the parental stock of French in the HGDP data set. Why? Because to go through a population bottleneck which is genetically significant you need a very small window of census size indeed. Tens of thousands is large enough to preserve most of the genetic variation in the founder population which is not private to families. That is the sort of genetic polymorphism which would have been typed on widely distributed SNP chips.
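A back-of-the-envelope sketch of why tens of thousands is "large enough": under the textbook idealized model of genetic drift, expected heterozygosity shrinks by a factor of (1 - 1/(2N)) per generation. The Python below assumes a constant effective population size `n_effective` (which is generally smaller than census size) and an illustrative 12 generations, roughly three centuries. Both numbers are assumptions for illustration, not estimates from the paper.

```python
def heterozygosity_retained(n_effective, generations):
    """Fraction of initial expected heterozygosity left after drift.

    Uses the standard idealized result H_t / H_0 = (1 - 1/(2N))**t,
    which assumes a constant effective size N and random mating.
    """
    return (1 - 1 / (2 * n_effective)) ** generations

# Illustrative comparison over 12 generations (assumed figure).
large_founder_pool = heterozygosity_retained(10_000, 12)
tiny_founder_pool = heterozygosity_retained(50, 12)
```

With an effective size of 10,000, more than 99.9% of heterozygosity survives 12 generations. With an effective size of 50, only roughly 89% does. The aggregate French Canadian population sits much closer to the first case, though an isolated local settlement could sit closer to the second.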

[Figure: fren2]

But that’s not the end of the story. Though French Canadians don’t seem to exhibit the hallmarks of having gone through an extreme population bottleneck as an aggregate, it turns out that in the populations surveyed there was evidence of substructure. The map to the left shows you the regions where the samples were drawn. Unlike the earlier study the sample size is smaller; this is a nod to the difference between a purely genealogical study and a genomic one. There needs to be money and time invested in typing individuals. Relatively public genealogical records are a different matter. Apparently the Gaspesia sample population was from a relatively later settlement. The urban samples naturally include descendants of local French Canadians, as well as rural to urban transplants.

[Figure: fren3]

As one would expect the French Canadian sample clustered with the CEU (Utah whites from the HapMap) and French (from the HGDP) in the world wide PCA. And not surprisingly they exhibited smaller genetic distance to the French than to the Utah whites (who were of mostly British extraction). Using Fst, which measures the extent of genetic variance partitioning between populations, the value from the aggregate French Canadian sample to the CEU sample was 0.0014 and to the French HGDP sample was 0.00078. The Montreal French Canadian group exhibited values of 0.0020 and 0.0012. But it is important to observe that there were statistically significant differences between the various French Canadian populations as well (excluding the Montreal-Quebec City pairing). This may explain the existence of particular Mendelian diseases in the French Canadian population despite their lack of reduced genetic variation: there are localized pockets of inbreeding which are not smoked out by looking at total variation statistics. Additionally, the authors conclude that not taking this substructure into account in medical genetics could lead to false positives. Inter-population differences in disease susceptibilities correlated with genome-wide differences in allele frequencies could produce spurious associations.
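For a feel of what Fst values this small mean, here is a minimal Python sketch of one simple definition: the share of total expected heterozygosity that lies between two populations rather than within them, Fst = (H_T - H_S) / H_T. It assumes biallelic markers, known allele frequencies, and equal weighting of the two populations. The paper may well have used a different estimator, so treat this as illustrative only.

```python
def fst_two_populations(freqs_a, freqs_b):
    """Simple two-population Fst from per-SNP allele frequencies.

    freqs_a and freqs_b are equal-length lists of the frequency of the
    same allele in each population. Per-locus terms are summed before
    dividing (a "ratio of averages"), and populations are weighted equally.
    """
    between = 0.0
    total = 0.0
    for p_a, p_b in zip(freqs_a, freqs_b):
        # Mean within-population expected heterozygosity.
        h_s = (2 * p_a * (1 - p_a) + 2 * p_b * (1 - p_b)) / 2
        # Expected heterozygosity of the pooled population.
        p_bar = (p_a + p_b) / 2
        h_t = 2 * p_bar * (1 - p_bar)
        between += h_t - h_s
        total += h_t
    return between / total if total > 0 else 0.0

# A single marker at 52% in one group and 48% in the other.
example = fst_two_populations([0.52], [0.48])
```

A four point difference in allele frequency at a marker near 50% gives an Fst of about 0.0016, the same order of magnitude as the French Canadian to CEU values quoted above. These are small frequency shifts spread across many markers.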

[Figure: french4]

The population substructure can also be elucidated by extraction of the independent components of variance on a plot, as you can see to the left. Panel A represents PCA of genomic data, while panel B is an MDS derived from genealogical data. The gist here is that you’re seeing the two biggest independent dimensions of variance in each data set (these dimensions explain only a few percent of the total variance). Each individual color represents a French Canadian subpopulation. It is clear that there is substructure. Individuals from each group tend to cluster with individuals from their own subpopulation. The authors take this to confirm the Fst values earlier. But to me another interesting aspect is the difference between the genomic and genealogical visualizations. The genealogical visualization looks far “cleaner” to me than the genomic visualization. Why? Genealogical records are imperfect. The rough congruence validates that the Roman Catholic Church in Quebec didn’t make records out of whole cloth, but there were likely fudges, guesses, and deceptions on the margins. One thing to remember is that even if some of the difference is due to issues with paternity, much of that sort of thing would still be within population. Of course I’m looking at this somewhat glass-half-empty. The rough congruence could be seen as a validation of the robustness of the record-keeping of French Canadian institutions over all these centuries. When there isn’t genetic data, one can use genealogical data as a substitute. At least to a rough approximation.

In the final section the paper notes that there are some peculiarities in the genetics of the French Canadians which do indicate some level of genetic homogeneity, at least by locality. To explore this issue they focus on two genomic phenomena which measure correlations of alleles, genetic variations, over spans of the genome within populations. The two phenomena are linkage disequilibrium, which measures association across loci of particular variants, and runs-of-homozygosity, which highlights genomic regions where homozygosity seems enriched beyond expectation (the former is inter-locus, while the latter is intra-locus). Both of these values could be indicators of some level of population bottleneck or substructure, where stochastic evolutionary forces shift a population away from equilibrium as measured by the balance of parameters such as drift, selection, and mutation.
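As an illustration of what "association across loci" means numerically, the Python sketch below computes the common r² measure of linkage disequilibrium between two biallelic markers. It assumes phased haplotypes coded as pairs of 0/1 alleles, which is a simplification (real SNP chip data are unphased genotypes, and the paper's exact LD statistic is not restated here).

```python
def ld_r_squared(haplotypes):
    """r^2 between two biallelic loci from phased haplotypes.

    haplotypes is a list of (allele_at_locus_1, allele_at_locus_2)
    pairs, each allele coded 0 or 1.
    """
    n = len(haplotypes)
    p_a = sum(a for a, _ in haplotypes) / n
    p_b = sum(b for _, b in haplotypes) / n
    p_ab = sum(1 for a, b in haplotypes if a == 1 and b == 1) / n
    # D: excess of the 1-1 haplotype over what independence predicts.
    d = p_ab - p_a * p_b
    denom = p_a * (1 - p_a) * p_b * (1 - p_b)
    if denom == 0:
        return 0.0  # a monomorphic locus carries no LD information
    return d * d / denom

# Alleles always travel together: complete disequilibrium.
linked = ld_r_squared([(1, 1), (1, 1), (0, 0), (0, 0)])
# All four combinations equally common: linkage equilibrium.
unlinked = ld_r_squared([(1, 1), (1, 0), (0, 1), (0, 0)])
```

The first toy sample gives an r² of 1 and the second gives 0. Recombination moves a population from the first state toward the second over generations, and it does so faster for markers that are further apart.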

[Figure: french5]

To the right is a mashup of figures 5 and 6. On the left you have a figure which shows the extent of linkage disequilibrium as a function of distance between SNPs. As you would expect the greater the distance between two SNPs, the more likely they are to be in equilibrium as recombination has broken apart associations. The closer two markers are, the more likely they are to be linked, physically and statistically. But there’s a difference between the two LD plots. There’s no difference between the CEU and French Canadian samples in the top panel, but there is in the bottom one. Why? The bottom panel shows LD between markers much further apart. Acadians in particular seem to exhibit more long distance LD than the other populations. This may be a sign of a population bottleneck and inbreeding. Also, please note that the Utah white CEU sample is probably relatively similar to the French Canadians in its demographic history as North American groups go. It is homogeneous and expanded rapidly from a small founder group. To the right you have in the top panel total length of ROH per individual, and in the bottom panel length of ROH greater than 1 Mb. Again, the Acadians seem to be standouts in terms of their difference from the CEU reference. Interestingly, there’s no difference between CEU, French, and the two French Canadian urban samples. I suspect this is due to the fact that in Montreal and Quebec City the distinctive inbreeding found in the other samples has been eliminated through intermarriage. ROH disappear when you introduce heterozygosity through outbreeding.
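To show what a run of homozygosity is operationally, here is a deliberately simplified scan in Python. It assumes one individual's SNP calls on a single chromosome, sorted by position, and it ends a run at the first heterozygous call. Real ROH callers tolerate a few heterozygous or missing calls and use density criteria; the thresholds `min_length_bp` and `min_snps` below are illustrative assumptions, not the paper's settings.

```python
def runs_of_homozygosity(calls, min_length_bp=1_000_000, min_snps=3):
    """Return (start, end) positions of homozygous runs.

    calls is a list of (position_bp, is_homozygous) tuples for one
    individual on one chromosome, sorted by position.
    """
    runs = []
    start = None   # position of the first SNP in the current run
    last = None    # position of the most recent homozygous SNP
    count = 0      # number of SNPs in the current run

    def close_run():
        # Keep the run only if it is long enough and has enough SNPs.
        if start is not None and last - start >= min_length_bp and count >= min_snps:
            runs.append((start, last))

    for pos, is_hom in calls:
        if is_hom:
            if start is None:
                start, count = pos, 0
            last = pos
            count += 1
        else:
            close_run()
            start = None
    close_run()  # a run may extend to the end of the data
    return runs

# Toy data: one run of 1.2 Mb, then a short run that is filtered out.
toy_calls = [
    (100_000, True), (600_000, True), (1_300_000, True),
    (1_500_000, False),
    (1_600_000, True), (1_700_000, True),
    (3_000_000, False),
]
toy_runs = runs_of_homozygosity(toy_calls)
total_roh_bp = sum(end - begin for begin, end in toy_runs)
```

Summing the run lengths per individual gives the kind of "total length of ROH" statistic plotted in the figure. A single heterozygous site introduced by outbreeding is enough to split or erase a run under this simple rule.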

What has all this told us? From a medical genetic perspective it implies that population structure matters when evaluating French Canadians: an Acadian is not interchangeable with a native of Montreal. In terms of ethnically clustered diseases of French Canadians (in the USA, the Cajuns), it may not be that there are patterns across the whole ethnic group, but rather trends within subgroups characterized by long-term endogamy. I wonder if the same might be true of Ashkenazi Jews. Is there a difference between Galicians and Litvaks? Such regional differences among European Jews are new, but the French Canadians themselves are the result of the past three centuries. These results also seem to reinforce the Frenchness of the French Canadians. Years ago I skimmed a book on the cultural history of the people of Quebec, and the author went to great lengths to emphasize the amalgamative power of the French Catholic identity in Canada, arguing that to some extent the roots of the community in the colonial era were something of an overblown myth. These results come close to rejecting that view, in particular the first paper, which shows the disproportionate impact that earlier settler waves have on the long term demographics of a population. A group which one could analyze in a similar vein would be the Boers, who are an amalgam of French Protestants, Dutch, and Germans, but seem to exhibit a dominance of the Dutch element culturally.

Finally, the French Canadians may give us a small window into the long term demographic patterns and genetic dynamics which might be operative on a nearby ethnic group: the Puritans of New England. Because of their fecundity it seems likely that tens of millions of Americans today descend from the 30,000 or so English settlers who arrived in New England in the two decades between 1620 and 1640. This is the subject of the Great Migration Project. With numbers in the few tens of thousands it seems unlikely that much of a thorough population bottleneck occurred with this group in a genetic sense in the aggregate. But the results from the French Canadians indicate that isolated groups can be subject to stochastic dynamics, and develop in their own peculiar directions.

Citation: Bherer C, Labuda D, Roy-Gagnon MH, Houde L, Tremblay M, & Vézina H (2010). Admixed ancestry and stratification of Quebec regional populations. American journal of physical anthropology PMID: 21069878

Citation: Roy-Gagnon MH, Moreau C, Bherer C, St-Onge P, Sinnett D, Laprise C, Vézina H, & Labuda D (2011). Genomic and genealogical investigation of the French Canadian founder population structure. Human genetics PMID: 21234765