Abstract For years, studies of founder populations and genetic isolates represented the mainstream of genetic mapping in the effort to target genetic defects causing Mendelian disorders. The genetic homogeneity of such populations as well as relatively homogeneous environmental exposures were also seen as primary advantages in studies of genetic susceptibility loci that underlie complex diseases. European colonization of the St-Lawrence Valley by a small number of settlers, mainly from France, resulted in a founder effect reflected by the appearance of a number of population-specific disease-causing mutations in Quebec. The purported genetic homogeneity of this population was recently challenged by genealogical and genetic analyses. We studied one of the factors contributing to genetic heterogeneity, early Native American admixture, which had never been investigated in this population. Consistent admixture estimates, on the order of one per cent, were obtained from genome-wide autosomal data using the ADMIXTURE and HAPMIX software, as well as with the fastIBD software, which evaluates the degree of identity-by-descent between Quebec individuals and Native American populations. These genomic results correlated well with the genealogical estimates. The correlations are imperfect, most likely because of incomplete records of Native founders’ origin in genealogical data. Although the overall degree of admixture is modest, it contributed to the enrichment of the population diversity and to its demographic stratification. Because admixture varies greatly among regions of Quebec and among individuals, it could have significantly affected the homogeneity of the population, which is of importance in mapping studies, especially when rare genetic susceptibility variants are in play.

Citation: Moreau C, Lefebvre J-F, Jomphe M, Bhérer C, Ruiz-Linares A, Vézina H, et al. (2013) Native American Admixture in the Quebec Founder Population. PLoS ONE 8(6): e65507. https://doi.org/10.1371/journal.pone.0065507 Editor: Dennis O’Rourke, University of Utah, United States of America Received: February 25, 2013; Accepted: April 25, 2013; Published: June 12, 2013 Copyright: © 2013 Moreau et al. This is an open-access article distributed under the terms of the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original author and source are credited. Funding: This research was supported by Réseau de Médecine Génétique Appliquée of Fonds de Recherche en Santé du Québec (FRSQ) (HV and DL) and by FRSQ grant (MHRG). The funders had no role in study design, data collection and analysis, decision to publish, or preparation of the manuscript. Competing interests: The authors have declared that no competing interests exist.

Introduction A major goal of medical and population genetics is to understand phenotypic consequences of genetic variation [1]. For years, studies of founder populations and genetic isolates represented the mainstream of the genetic mapping effort in targeting rare single gene defects held to cause Mendelian disorders [2]. Searching for genetic determinants of complex disorders changed the focus from rare deleterious mutations to susceptibility variants of common frequencies [3] and shifted attention towards association studies requiring very large and diversified cohorts. However, the importance of rare variants in genetic susceptibility to common diseases has been vindicated [4]. This paradigm change, from the causal common to causal rare susceptibility variants, has renewed interest in founder populations [5]. In populations arising from a founder event or in a genetic isolate, an initially “rare” mutation may gain in frequency to become more “mappable” [6]–[9]. Numerous founder events accompanied European colonization of the Americas, creating populations that remained isolated due to geographic barriers and/or distinctive cultural/religious/ethnic identities [10]–[13]. Their putative cultural and genetic homogeneity (enhanced by demographic bottlenecks typical of New World settlements) are considered as important advantages in association studies [14]. But is this really so? The population descending from settlers of Nouvelle-France, of European and mainly French origins, forming the majority of today’s province of Quebec, Canada, is known for a number of recessive diseases that are endemic, of increased frequency and/or due to a population specific mutation(s) [15], [16]. Because of its relatively limited number of European founders, the genetic homogeneity of French Canadians has been implicitly assumed, due to a founder effect reinforced by a demographic spurt in the 19th century [17], [18]. 
However, non-disease oriented genealogical and genetic studies have shown that the population of Quebec is more genetically diversified than previously anticipated [19], [20]. This can be ascribed to highly variable genetic contribution of distinct founders and to uneven geographic expansion of their descendants within the new colony. While designing association studies, one must be aware of genetic stratification within the patient and the corresponding control group, reflecting demographic history of the sampled population. Therefore, there is a need to understand the relative effects of demographic and genetic forces on the apportionment of genomic diversity among individuals and populations, and to be able to distinguish ancient ancestral relations from more recent admixture [21]. In addition to a diverse contribution of European founders [22], including a minute African origin [23], the resulting population of today’s Quebec was also genetically enriched by Native American admixture [19]. Indeed, the presence of Native Americans among the founders of Nouvelle-France is documented in historical records [24]–[26]. Their contribution was revealed by genetic studies of uniparentally transmitted markers showing the existence of Native American mitochondrial DNA lineages in the contemporary Quebec population [27], [28]. However, the extent of Native American contribution to the genetic makeup of the contemporary Quebec population is largely unknown. Genetic information limited to uniparentally transmitted Native lineages, and especially maternal lineages, is insufficient to quantify the extent of nuclear DNA admixture [28], [29]. Native ancestry is underreported in genealogical records although the extent of missing information remains an unsettled issue [26]. On the other hand, for a given individual, full Native ancestry is assumed once it is recorded, so that if this is not truly the case, it may skew our genealogical estimates of the Native genetic contribution. 
In genetic epidemiological surveys, the genetic homogeneity of the Quebec founder population was often taken for granted. However, assuming no admixture when in fact there is some may lead to erroneous results in genetic mapping and association studies. In light of our results, the Quebec population is not different from other New World populations of European descent in being enriched in alleles of Native American origin, although the extent of admixture is much lower than in Central and South American populations [21], [30]–[32]. As in those populations, the admixture occurred mainly through marriages with Native women [27], [28]. The aim of the present study was to obtain a more reliable estimate of the Native American ancestry in the Quebec genome using autosomal DNA diversity as well as genealogical data. This study builds on single nucleotide variations (SNVs) from 205 individuals representing different regional groups of the contemporary Quebec population (Figure 1), some of which were previously analysed in a different context [20], [28]. For all but ten of these individuals (n = 195; Table 1), the ascending genealogies were reconstructed up to the Quebec founders. The genomic data of our reference populations were from Reich et al. [33].

Figure 1. Map of Quebec subpopulations. In colors are the 10 regions/subpopulations included in the analyses. https://doi.org/10.1371/journal.pone.0065507.g001

Table 1. Native American ancestry proportions in the Quebec regions. https://doi.org/10.1371/journal.pone.0065507.t001

Materials and Methods Population Sample and Ethics Statements We analyzed 205 unrelated (up to the 3rd generation) individuals from 10 groups from Quebec (see map Fig. 1 and Table S1). Five regions were described in [20]: North Shore, Saguenay, and the areas of Quebec City and Montreal, as well as 3 Gaspesian ethnocultural groups (French Canadians, Loyalists and Acadians). In this study, we added 3 and 9 samples from the Montreal and Quebec City areas, respectively, in addition to samples from 3 new regions: Abitibi (18 samples), Outaouais (15 samples) and the Gaspesian-Channel Islander subpopulation (20 samples). All participants provided informed written consent, and the study was approved by the CHU Sainte-Justine Ethics Committee. Regional/ethnic affiliation was self-described by the participants. DNA was obtained for all participants as previously described [20], [34] and sent to the McGill University and Genome Quebec Innovation Center to be genotyped on Illumina HumanHap650Y and 610Quad arrays according to the recommended protocols. Quality control filters were applied at the individual and Single Nucleotide Variation (SNV) levels using the PLINK software v1.07 [35], [36], following the same criteria as in Roy-Gagnon et al. [20]. Information was collected to reconstruct the genealogy of each participant. Genealogies were reconstructed as far back as possible for a total of 195 individuals (genealogical data were missing for 5 individuals from the Montreal area, 3 from the Quebec City area, and 1 each from Outaouais and Saguenay) using the BALSAC population register [37] and the Early Quebec Population Register [38].
Unless stated otherwise, the Native American reference sample (n = 52) includes individuals from Northern America (Aleutians, Algonquin speakers, Chipewyans, Cree, Ojibwa as well as West and East Greenlanders), whereas the European reference sample (CEU+French) consisted of HapMap CEU (n = 108) and French (n = 28) from the Human Genome Diversity Panel (HGDP). The extended European reference sample additionally contained HGDP Italians (n = 12) and Tuscans (n = 8), as well as HapMap Tuscans TSI (n = 88) (Table S2). Statistical Analysis To infer local ancestry under a haplotype-based model, we used the HAPMIX software version 1.2 [39], [40], which estimates, at each locus, the probability of having 0, 1 or 2 alleles transmitted by one of the 2 source populations. We used an approach similar to that of Reich et al. [33] for masking the European and African segments in the masked Native American dataset. We retained all loci with ≥0.95 probability that 1 or both alleles originated from the Native American source population. For this analysis we selected subsets of 50 individuals to match the sample size of the Native North American populations. Native American and European populations were phased together using the BEAGLE software version 3.3.2 [41], [42]. We estimated the global ancestry using the model-based approach implemented in the ADMIXTURE software version 1.22 [43], [44] with K = 3 (K being the number of ancestral populations assumed in the model) to distinguish between Native North American, European and Siberian ancestry in the Quebec sample. K = 3 allowed us to distinguish between old and new Asian-Native American ancestry. We used the PLINK software to select single nucleotide variations (SNVs) in approximate linkage equilibrium (pairwise r² < 0.1 in sliding windows of 50 SNVs, shifting every 10 SNVs), yielding a subset of 46,344 SNVs.
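The ≥0.95 retention rule applied to the local-ancestry probabilities can be illustrated with a short sketch. This is not HAPMIX code and does not read HAPMIX output; it assumes that each locus has already been reduced to a triple of posterior probabilities (0, 1 or 2 Native American alleles), and the function name `native_loci` and the example values are hypothetical.

```python
from typing import List, Tuple

# Assumed input: one (p0, p1, p2) triple per locus, giving the posterior
# probability of carrying 0, 1 or 2 alleles from the Native American source.
LocusProbs = Tuple[float, float, float]


def native_loci(probs: List[LocusProbs], threshold: float = 0.95) -> List[int]:
    """Return indices of loci where P(1 or 2 Native alleles) >= threshold."""
    kept = []
    for index, (_p0, p1, p2) in enumerate(probs):
        # "1 or both alleles" is read here as the sum of the two probabilities.
        if p1 + p2 >= threshold:
            kept.append(index)
    return kept


# Hypothetical probabilities for four loci.
example = [
    (0.99, 0.01, 0.00),  # confidently no Native allele
    (0.02, 0.97, 0.01),  # confidently one Native allele
    (0.40, 0.55, 0.05),  # ambiguous, not retained
    (0.01, 0.04, 0.95),  # confidently two Native alleles
]
print(native_loci(example))
```

Reading the criterion as a threshold on the summed probability of one and two Native alleles is an interpretation of the text; a stricter per-state threshold would retain fewer loci.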
We used fastIBD from the BEAGLE software version 3.3.2 [41] to find shared Identity-by-Descent (IBD) segments between individuals of Quebec and the Native Americans to investigate shared ancestry [45]–[47]. The fastIBD method is based on estimating frequencies of shared haplotypes allowing for phase uncertainty. Results from this method were shown to be well correlated with genealogical kinship coefficients in Quebec individuals. Following the authors’ recommendation, we performed 10 runs of fastIBD (ibscale = 4) that we merged using the scripts provided by the authors. We performed the analysis with the complete unmasked data and afterwards, for each Native American individual separately, discarded shared IBD segments located in masked genomic regions. We retained the remaining shared IBD segments of at least 1 cM. All IBD segments shared with Native Americans were finally pooled for each Quebec individual. We calculated the f3 statistic implemented in the ADMIXtools software [40], [48], which is based on patterns of allele frequency correlations across populations, to estimate the lower and upper bounds of the Native American ancestry proportion in the Quebec sample. The proportion of Native American ancestry in the Quebec sample was also evaluated using the ALDER software version 1.0 [49], [50]. Both ALDER and ADMIXtools (rolloff analysis) were used to estimate the linkage disequilibrium (LD) decay due to admixture. ALDER decay curves obtained using only one reference population were fitted starting at 0.8 and 0.7 cM (determined by the software) for the Quebec and French+CEU samples, respectively, using the masked Native North Americans as reference. The rolloff decay curve for the Quebec sample was estimated with two reference populations, Native North Americans (unmasked) and Europeans, and was fitted starting at 0.5 cM. The R statistical environment version 2.15.0 Patched [51] was used for additional statistical testing and graphing.
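The post-processing of IBD segments described above (discard segments in masked regions, keep those of at least 1 cM, pool per Quebec individual) can be sketched as follows. This is only an illustration under stated assumptions: it does not parse fastIBD output, the `Segment` record and `pool_ibd` function are hypothetical, any overlap with a masked interval is treated as grounds for discarding a segment, and "pooling" is taken to mean summing segment lengths.

```python
from collections import defaultdict
from typing import Dict, List, NamedTuple, Tuple


class Segment(NamedTuple):
    """Hypothetical record of one IBD segment shared by a Quebec/Native pair."""
    quebec_id: str
    native_id: str
    chrom: int
    start_cm: float
    end_cm: float


# Masked interval on a chromosome: (chrom, start_cm, end_cm).
Mask = Tuple[int, float, float]


def in_masked_region(seg: Segment, masks: List[Mask]) -> bool:
    """True if the segment overlaps any masked interval (assumed criterion)."""
    return any(
        chrom == seg.chrom and start < seg.end_cm and seg.start_cm < end
        for chrom, start, end in masks
    )


def pool_ibd(
    segments: List[Segment],
    masks_by_native: Dict[str, List[Mask]],
    min_cm: float = 1.0,
) -> Dict[str, float]:
    """Sum, per Quebec individual, the lengths of retained segments in cM."""
    totals: Dict[str, float] = defaultdict(float)
    for seg in segments:
        # Masks are specific to each Native American reference individual.
        if in_masked_region(seg, masks_by_native.get(seg.native_id, [])):
            continue
        length = seg.end_cm - seg.start_cm
        if length >= min_cm:
            totals[seg.quebec_id] += length
    return dict(totals)


# Hypothetical data: two Quebec individuals, two Native reference individuals.
segments = [
    Segment("Q1", "N1", 1, 10.0, 12.5),  # 2.5 cM, retained
    Segment("Q1", "N2", 2, 5.0, 5.5),    # 0.5 cM, shorter than 1 cM
    Segment("Q1", "N1", 3, 20.0, 24.0),  # overlaps a masked region of N1
    Segment("Q2", "N2", 1, 0.0, 1.0),    # exactly 1 cM, retained
]
masks = {"N1": [(3, 22.0, 30.0)]}
print(pool_ibd(segments, masks))
```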
Genealogical Analysis Of the 8424 founders (individuals whose parents could not be traced in Quebec parish and civil records) identified in the 195 genealogies of the Quebec individuals, 39 were of documented Native American origin. The genetic contribution (GC), based on genealogical data, is the expected proportion of the genome transmitted by an ancestor to a given individual. The GC was calculated using the GENLIB package, a genealogical analysis tool developed at BALSAC for the S-PLUS environment and transferred to the R environment for internal use. The mean GC was obtained by summing the GC of all Native American founders to all Quebec individuals and dividing by the number of individuals.
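The genetic contribution defined above can be sketched on a toy pedigree. This is not the GENLIB implementation or its interface; the pedigree, identifiers and function names are hypothetical, and the sketch assumes the standard expectation that each meiosis halves the transmitted proportion, summed over all genealogical paths linking the ancestor to the individual.

```python
from typing import Dict, List, Optional, Tuple

# Hypothetical pedigree: individual -> (father, mother).
# Individuals absent from the dictionary are treated as founders.
Pedigree = Dict[str, Tuple[Optional[str], Optional[str]]]

PEDIGREE: Pedigree = {
    "P1": ("F1", "M1"),
    "F1": ("A", "B"),
    "M1": ("C", "N"),  # "N" stands for a founder of documented Native origin
    "P2": ("F2", "M2"),
    "F2": ("D", "E"),
    "M2": ("G", "H"),
}


def genetic_contribution(pedigree: Pedigree, ancestor: str, individual: str) -> float:
    """Expected genome proportion that `individual` received from `ancestor`."""
    if individual == ancestor:
        return 1.0
    parents = pedigree.get(individual)
    if parents is None:
        return 0.0  # a different founder: no path to the ancestor
    # Half of the genome comes from each parent; summing over both parents
    # accumulates every genealogical path to the ancestor.
    return sum(
        0.5 * genetic_contribution(pedigree, ancestor, parent)
        for parent in parents
        if parent is not None
    )


def mean_gc(pedigree: Pedigree, founders: List[str], individuals: List[str]) -> float:
    """Sum the GC of all listed founders to all individuals, divided by n."""
    total = sum(
        genetic_contribution(pedigree, founder, individual)
        for founder in founders
        for individual in individuals
    )
    return total / len(individuals)


print(genetic_contribution(PEDIGREE, "N", "P1"))  # grandparent: two meioses
print(mean_gc(PEDIGREE, ["N"], ["P1", "P2"]))
```

Because the recursion follows both parents at every step, an ancestor who appears through several lines of the same genealogy contributes once per path, which matters in a small founder population.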

Discussion The advantages of using genetically isolated founder populations in gene mapping have often been discussed in the context of studies on the genetic bases of Mendelian disorders, as well as on susceptibility genes of complex diseases [14], [55], [56]. Some of the anticipated advantages can be compromised by a hidden population structure due to local founder effects [6], [19], [20], [57] and/or to unrecognized admixture. We found a low level of admixture overall. However, the variance of admixture estimates among individuals is very large (Table S1 and Figures S1, S2 and S3). Admixture estimates also vary significantly between the studied subpopulations (Kruskal-Wallis p<0.001 for genealogical and IBD estimates and p<0.05 for HAPMIX and ADMIXTURE estimates, Table S3). The four groups that consistently show the highest Native ancestry estimates are Gaspesians (ethnocultural groups of French Canadian or Channel Islander origin) as well as French Canadians in the North Shore and Saguenay regions. These results were obtained using the sample of 52 Native North American genomes described above. In order to test how the choice of the Native American or European reference individuals affected our results, we conducted the same analyses using different reference populations. We observed a small difference in the extent of admixture using different sets of reference genomes. These differences likely reflect geographic proximity and time depth of shared ancestry of the populations we could analyze (Table 2 and Figures S1, S2 and S3). Our study also shows that the analysis of IBD sharing performed very well in the analysis of recent admixture, comparably to, or perhaps even better than, HAPMIX. Therefore, IBD sharing can be used to assist population structure analysis in association studies. Others have also shown that the analysis of IBD sharing is a promising tool for reconstructing populations’ demographic history [45]–[47], [58].
Interestingly, principal components analysis (PCA), often used in admixture studies, was not useful here because the observed level of Native American admixture was insufficient to be revealed in PCA plots (Figure S5). In contrast, approaches based on LD and haplotype information used here appear sensitive enough to capture the subtle recent Native American ancestry latent in the Quebec population. Otherwise it would be difficult to discern between the genetic sharing due to common ancient population history and recent admixture (Table 2, Figure 4). In conclusion, using dense genotypic data and deep-rooted genealogical data, we estimated the Native American ancestry in a population sample from Quebec, Canada. The Native American genetic contribution calculated using genealogical data was low. Unlike most studied admixed populations that have greater admixture proportions [30], [59]–[61], we estimated the part of genetic ancestry coming from Native Americans to Quebec regional samples at about 1%. An individual separated by m meioses from an ancestor is expected to share 2^(-m) of this ancestor’s genome, i.e. on average about 0.1%, or 3.3 cM, after 10 generations. However, the length (exponentially distributed) of a shared segment has a mean of 100 cM/m, or about 10 cM after 10 generations, suggesting that some individuals carry fairly long shared fragments and others not at all, explaining the variance among individual genomes. One percent Native ancestry can be understood as if everybody shared a Native American ancestor 6–7 generations ago. Indeed, a recent study based on four Quebec regional populations indicates that between 53 and 78% of Quebecers have at least one Native American ancestor in their genealogy [24]. Because of the small size of the early Quebec population, the same ancestor often contributed through more than one line to the same contemporary genome, thus suggesting its average occurrence at more distant generations than 6 or 7.
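The arithmetic behind the expected sharing figures quoted above can be checked with a few lines. The genome length of about 3400 cM used below is an assumption (the text gives only the resulting 3.3 cM), and the function names are illustrative.

```python
import math

# Assumed total genetic map length in cM; not stated in the text.
GENOME_CM = 3400.0


def expected_share(m: int) -> float:
    """Expected genome proportion shared with an ancestor m meioses away."""
    return 2.0 ** -m


def mean_segment_cm(m: int) -> float:
    """Mean length in cM of a shared segment after m meioses (100 cM / m)."""
    return 100.0 / m


def generations_for_share(share: float) -> float:
    """Number of meioses m at which the expected share 2^(-m) equals `share`."""
    return math.log2(1.0 / share)


m = 10
print(f"expected share after {m} generations: {expected_share(m):.4%}")
print(f"expected shared length: {expected_share(m) * GENOME_CM:.1f} cM")
print(f"mean segment length: {mean_segment_cm(m):.1f} cM")
print(f"generations giving 1% ancestry: {generations_for_share(0.01):.2f}")
```

With an expected total of roughly 3.3 cM but a mean segment length of 10 cM, the average individual is expected to carry less than one such segment from a given ancestor ten generations back, which is consistent with the large variance among individual genomes noted above.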
Obviously, in historical reality, this genetic contribution varied both in time and space, impacting on stratification of the population and uneven distribution of the rare variants. Hence, Native American ancestry likely played a role in a reduction in the homogeneity of the Quebec founder population. This should be taken into account in the context of mapping and association studies, especially when rare genetic variants are involved.

Acknowledgments We are grateful to Brad Loewen and Stephen Oppenheimer for their comments, to Guylaine Cloutier for her help in recruiting individuals from Abitibi, and to all study participants who consented to provide their genealogical information and/or genomic sample.

Author Contributions Conceived and designed the experiments: CM DL. Performed the experiments: CM JFL. Analyzed the data: CM DL. Contributed reagents/materials/analysis tools: CM JFL MJ CB MHRG. Wrote the paper: CM DL. Participant recruitment: CM JFL CB MHRG. Data collection: MJ HV. Manuscript revision: JFL CB MJ MHRG HV ARL.