It has been well documented that genetic studies of human disease, especially large-scale ones, have not captured the level of diversity that exists globally, as they are predominantly based on populations of European ancestry (). The under-representation of ethnically diverse populations impedes our ability to fully understand the genetic architecture of human disease and exacerbates health inequalities. Further, the lack of ethnic diversity in human genomic studies means that our ability to translate genetic research into clinical practice or public health policy may be dangerously incomplete, or worse, mistaken. For example, attempts to use estimates of genetic risk from European-based studies in non-Europeans may result in inaccurate assessment of risk and lack of interventions in under-studied populations. In this commentary, we discuss examples that illustrate why inclusion of ethnically diverse populations in human genetic studies facilitates identification of genetic risk factors for Mendelian and complex diseases. Additionally, we discuss why lack of replication across populations of genetic associations with complex traits, including disease risk, is expected based on the evolutionary history of populations across the globe. Lastly, we discuss challenges and future directions for promoting equity in human genomic studies.

As whole-genome sequencing (WGS) is increasingly used to infer the causes of rare undiagnosed diseases, reference genomes from more ethnically diverse populations are particularly important. This is because one criterion for identifying a putative causal variant is confirming that it is rare across populations. Therefore, if databases do not include sufficient data from ethnically diverse populations, we may mistakenly infer that a benign variant is pathogenic. For example, the gnomAD exome and genome database currently includes ∼60% European sequences and less than 10% sequences from individuals of African ancestry (https://macarthurlab.org/2017/02/27/the-genome-aggregation-database-gnomad/).
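The rarity criterion described above can be sketched in a few lines of Python. This is a minimal illustration, not a description of how gnomAD or any diagnostic pipeline works: the allele frequencies, the population labels, the threshold, and the function name `is_rare_everywhere` are all hypothetical.

```python
# Hypothetical allele frequencies for illustration only; not taken from gnomAD.
RARE_THRESHOLD = 0.001  # assumed cutoff for "rare"; real analyses choose this per disease model


def is_rare_everywhere(freqs_by_population, threshold=RARE_THRESHOLD):
    """Return True only if the variant is below the threshold in every population with data."""
    return all(af < threshold for af in freqs_by_population.values())


# A variant that is absent from a European-only reference panel looks rare ...
european_only = {"European": 0.0}
# ... but the same variant may be common in a population the panel did not sample.
with_african_data = {"European": 0.0, "African": 0.02}

print(is_rare_everywhere(european_only))      # True
print(is_rare_everywhere(with_african_data))  # False
```

The filter can only be as good as the populations it has data for: with the European-only panel the variant passes as a candidate, and it is correctly excluded only once frequencies from the second population are available.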

Genetic modifiers can also complicate the understanding of the biology underlying differences in disease presentation. Sickle cell disease (SCD), which is caused by homozygosity for a missense mutation (Glu6Val) in the β-globin gene (HBB), is an example. Every year about 300,000 newborns are diagnosed with SCD, and the SCD mortality in Africa among children less than 5 years old can reach 90%. Despite the high mortality of homozygotes, this mutation is maintained at high frequency in malaria-endemic regions of Africa because heterozygous individuals are protected from malaria, resulting in balancing selection. Modifiers of SCD severity include maintenance of high fetal hemoglobin (HbF) expression, normally completely lost by 12 months of age, which results in fewer sickling crises and less severe disease. In Saudi patients with the Benin haplotype, HbF expression persists in adults, reaching twice the levels observed in African patients with the same haplotype (), and other genes can also modify the expression of HbF. Further, distinct traits can modify SCD presentation, including alpha-thalassemia, the prevalence and allelic spectrum of which vary across populations (). Hence, presentation of SCD may vary among populations due to gene-gene interactions (epistasis). The reason for phenotypic differences among populations with respect to SCD modifiers remains largely unexplained, despite clear clinical relevance. Larger studies across diverse populations of genetic factors influencing SCD presentation are needed.

Identifying pathogenic variants causing Mendelian disease is more complicated when one considers locus heterogeneity. For example, there are more than 300 genes involved in retinal disease; over 3000 mutations in 65 genes cause retinitis pigmentosa (RP) with different modes of inheritance (https://sph.uth.edu/retnet/). As many of these mutations have only been characterized in Europeans, we know little about the genetic causes of retinal disease across ethnically diverse populations.

There may also be population-specific mutations in understudied populations that cause health disparities. For example, transthyretin (TTR) amyloid cardiomyopathy (ATTR-CM), in which a mutant transthyretin protein leads to accumulation of amyloid fibrils, is an important and underdiagnosed cause of heart failure (HF) in African Americans. A TTR pathogenic missense mutation (V122I) is found almost exclusively in individuals of African descent, with a carrier frequency in African Americans of 3%–4%. V122I acts in a dominant manner and accounts for as much as 10% of all HF cases in African Americans (). As new treatments that target TTR gene expression or stabilize the abnormal transthyretin are becoming available, genetic screening for this prevalent mutation in people of African descent can provide critical information with respect to both diagnostic accuracy and therapy, a case in point for precision medicine.

Cystic fibrosis (CF) is a Mendelian disease common in Europe (1 in 2000–3000 births) and rarer in African Americans (1 in 17,000 births). A consequence of low CF prevalence in African-descent individuals is that CF is often underdiagnosed. In Europeans, the most common causative allele is ΔF508 in the CFTR gene, accounting for more than 70% of cases. However, ΔF508 accounts for only about 29% of CF cases in people of the African diaspora (). In contrast, a different mutation, 3120+1G→A, accounts for between 15% and 65% of CF patients in South Africans with African ancestry (). Indeed, more than 2,000 rare mutations in CFTR underlie considerable clinical heterogeneity and guide different treatment modalities. Such is the case for the CF medication ivacaftor, which selectively targets mutations affecting the receptor gating capacity of the CFTR protein. Knowing and testing for specific pathogenic variants that vary in frequency across populations is crucial for appropriate clinical intervention.
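To give a sense of what these birth prevalences imply, the following sketch converts them into approximate allele and carrier frequencies. It assumes Hardy-Weinberg equilibrium and treats all pathogenic CFTR alleles as a single recessive allele class. Both are simplifications, particularly for an admixed population. The value of 1 in 2500 is an illustrative midpoint of the European range quoted above, and the function name is hypothetical.

```python
import math


def hardy_weinberg_from_incidence(births_per_case):
    """Return (q, 2pq) for an autosomal recessive disease with incidence 1/births_per_case.

    q is the combined frequency of all pathogenic alleles; 2pq is the carrier frequency.
    Assumes Hardy-Weinberg equilibrium, which is only an approximation in real populations.
    """
    q = math.sqrt(1.0 / births_per_case)
    p = 1.0 - q
    return q, 2.0 * p * q


# 1 in 2500 is an assumed midpoint of the 1 in 2000-3000 range for Europeans.
q_eur, carriers_eur = hardy_weinberg_from_incidence(2500)
# 1 in 17,000 is the figure quoted for African Americans.
q_aa, carriers_aa = hardy_weinberg_from_incidence(17000)

print(f"European: q = {q_eur:.4f}, carriers = {carriers_eur:.4f}")          # q = 0.0200, carriers = 0.0392
print(f"African American: q = {q_aa:.4f}, carriers = {carriers_aa:.4f}")    # q = 0.0077, carriers = 0.0152
```

Under these assumptions carriers are roughly 4% of the European population and about 1.5% of the African American population, so the carrier pool is far from negligible even where the disease itself is rare.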

For Mendelian diseases, a pathogenic variant usually causes disease regardless of the population in which it occurs (Figure 1A). However, in some cases, such as X-linked G6PD deficiency and favism, a condition will only present upon specific environmental exposure (i.e., fava bean consumption causing hemolytic anemia). Given diverse evolutionary histories, different mutations in the same gene may account for a given disease in diverse populations (allelic heterogeneity). Variation in causative mutations may confound diagnoses or treatments.

Figure 1

(A) The role of effect size on transferability: monogenic (Mendelian) diseases with large effect sizes will cause disease wherever they occur. In contrast, genetic associations with complex traits affected by a few genes with moderate effect sizes (oligogenic) are less likely to transfer across populations. Lastly, for conditions affected by many genes (polygenic) with small effect sizes, transferability may be limited. For all cases, if risk is assessed by tagSNPs, the degree of LD will affect transferability.

(B) The role of differential linkage disequilibrium (LD) on transferability. (Top) A single common causative variant may be tagged by different SNPs in different populations; (bottom) causative variants that differ by population (allelic heterogeneity) may be tagged by different SNPs (solid lines) or by the same SNP but with weaker LD (dotted line). In either case, tagSNPs derived from a single population (e.g., European) may be inadequate to allow replication of associations across study populations.

The Impact of Genetic Diversity on Complex Traits

Figure 2 Summary of GWAS Studies by Ancestry for Studies in the GWAS Catalog through January 2019. We show the distribution of ancestry categories, in percentages, included in GWAS (https://www.ebi.ac.uk/gwas/home) based on the number of studies (left) and on the total number of individuals (right).

As of 2018, the majority of genome-wide association studies (GWAS), which aim to identify genetic variants associated with complex traits including disease risk, have been conducted in European (52%) or Asian (21%) populations (Figure 2, left). When we consider the number of individuals included in GWAS based on ethnicity, 78% are European, 10% are Asian, 2% are African, 1% are Hispanic, and all other ethnicities represent < 1% of GWAS (Figure 2, right). These disparities are unacceptable, particularly since GWAS findings may not replicate across ethnic groups.

The ability to replicate genetic associations across diverse populations can be affected by several factors. Differences in linkage disequilibrium (LD) across ethnicities influence how well causal variants are captured by tagging SNPs identified in a single population (Figure 1B). Markers in LD with risk variants in Europeans may not be in LD in other populations because LD patterns reflect different demographic histories that vary globally. Modern humans originated in Africa within the past 300,000 years and migrated out of Africa within the past 80,000 years. Africans have maintained larger and more sub-structured populations, resulting in diverse patterns of LD across the continent (Tishkoff et al., 2009). In contrast, the migration out of Africa resulted in a population bottleneck followed by a series of founding events as modern humans spread across the globe. Thus, non-Africans have more extended regions of LD, the precise structure of which is determined by their population histories (Figure 1B). These differences in LD among populations can make trans-ethnic mapping particularly informative for identifying risk variants.
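How well a tag SNP captures a causal variant is commonly summarized by the squared correlation r² between the two sites. The sketch below computes r² from haplotype and allele frequencies for two populations. All frequencies are hypothetical and were chosen only to contrast perfect tagging with weak tagging; they do not describe any real locus or population.

```python
def r_squared(p_ab, p_a, p_b):
    """Squared correlation (r^2) between two biallelic loci.

    p_ab: frequency of the haplotype carrying both the risk allele A and the tag allele B
    p_a:  frequency of the risk allele A at the causal variant
    p_b:  frequency of the tag allele B at the tag SNP
    """
    d = p_ab - p_a * p_b  # linkage disequilibrium coefficient D
    return d * d / (p_a * (1 - p_a) * p_b * (1 - p_b))


# Hypothetical population 1: the tag allele always travels with the risk allele.
pop1 = {"p_ab": 0.20, "p_a": 0.20, "p_b": 0.20}
# Hypothetical population 2: same risk allele frequency, but the tag allele is
# mostly found on other haplotypes.
pop2 = {"p_ab": 0.10, "p_a": 0.20, "p_b": 0.30}

r2_pop1 = r_squared(**pop1)
r2_pop2 = r_squared(**pop2)
print(f"population 1: r^2 = {r2_pop1:.2f}")  # 1.00
print(f"population 2: r^2 = {r2_pop2:.2f}")  # 0.05
```

In this toy case the same tag SNP is a perfect proxy for the causal variant in the first population and nearly uninformative in the second, even though the risk allele is equally common in both. An association detected through the tag in one population would therefore be expected to replicate poorly in the other.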

The lack of replication of GWAS among populations could also be due to differences in genetic architecture. Differences in genetic architecture among ethnically diverse groups could be due to population-specific variation as well as changes in allele frequency that arise as a product of genetic drift, local selection, or both. For example, founder populations can differ from the source population in the prevalence of a complex disease, or of related intermediate phenotypes, and may be particularly informative for complex disease mapping. In the Finnish population, which underwent founder events, there are many low-frequency loss-of-function variants that associate with complex phenotypes, including lipid levels (Lim et al., 2014). Variation in phenotypic prevalence for complex diseases can also be found in Ashkenazim, French Canadians, Icelanders, and Sardinians, all populations that experienced founder effects. However, the effects of selection and drift on genetic diversity in these populations are not always mutually exclusive, and differentiating between them needs to be resolved on a case-by-case basis.

In small, bottlenecked populations or those practicing consanguineous mating, homozygosity is enriched. Homozygosity mapping in such populations has long been a successful strategy to locate recessive disease genes. More recently, the identification of rare homozygous loss-of-function mutations due to consanguinity in apparently healthy individuals has provided important insights into gene function and has paved the way to the discovery or validation of drug targets, as in the Human Knockout Project that focuses on Pakistanis (Saleheen et al., 2017). Loss-of-function variants can also shed light on biological pathways that have relevance across populations, as has been the case with the discovery of PCSK9, an important gene for regulating LDL levels. This discovery was facilitated by studying people of African descent with PCSK9 nonsense mutations, but the knowledge has translated into a drug with global utility. Beneficial loss-of-function mutations are promising targets for treatment, as it is easier to develop therapeutics that turn gene products off rather than on.

Local adaptation can also influence the genetic architecture of complex traits, which may not necessarily be related to the initial selection event(s), as many genes have pleiotropic effects. For example, in African Americans with non-diabetic progressive chronic and end-stage kidney disease, two African-specific risk variants (G1 and G2) in the apolipoprotein L1 gene (APOL1) are strongly associated with these conditions. These variants confer an increased risk of approximately 7- to 10-fold for kidney disease, and, together, partially explain the higher incidence of end-stage renal disease in African Americans as compared to European Americans (Freedman et al., 2018; Genovese et al., 2010). It has been argued that these variants are at high frequency in populations of West African descent because they are protective against sleeping sickness caused by Trypanosoma brucei protozoa. These same variants at APOL1 are also associated with a broad range of non-diabetic kidney diseases, including severe lupus nephritis and sickle cell nephropathy (Freedman et al., 2018).

Another potential complication causing lack of replication of GWAS across ethnic groups could be epistasis due to differences in genetic backgrounds (G × G) as well as gene-environment (G × E) interactions that vary among populations. A recent multi-ethnic, genome-wide study identified genetic loci interacting with physical activity to affect blood lipid levels, illustrating G × E interactions and the value of including different ancestries in studies of complex traits. Variants in four loci were found to interact with physical activity to influence lipid levels. These interactions were discovered in a trans-ethnic mapping study that included European, African, Asian, and Hispanic populations. Two out of the four loci (SNTA1 and CNTNAP2) were identified because Africans and Hispanics showed a relatively high frequency of the variants associated with lipid levels compared to other populations (Kilpeläinen et al., 2019). Had the study only been performed in Europeans, these effects would not have been discovered.

Even when diverse populations are studied, specific gene effects in these populations may not be evident, as often diverse populations are studied only as part of large meta-analyses that estimate associations from combined data. The result of this analytical strategy is to identify variants that have mostly similar effects across populations, but it can reduce the ability to detect population-specific genetic risk factors. Although studies that perform meta-analyses clearly discover true risk variants, they often fail to identify those variants that differ in frequency among populations, thereby missing an unknown proportion of the genetic risk.
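The dilution of a population-specific signal in a combined analysis can be illustrated with a standard fixed-effect, inverse-variance weighted meta-analysis. The sketch below is not taken from any of the studies discussed here. The effect sizes, standard errors, and cohort labels are hypothetical, chosen to represent a large cohort with no effect and a smaller cohort with a clear effect.

```python
import math


def fixed_effect_meta(estimates):
    """Inverse-variance weighted pooled effect.

    estimates: list of (beta, standard_error) tuples, one per cohort.
    Returns (pooled_beta, pooled_standard_error).
    """
    weights = [1.0 / se ** 2 for _, se in estimates]
    total_weight = sum(weights)
    pooled_beta = sum(w * beta for w, (beta, _) in zip(weights, estimates)) / total_weight
    pooled_se = math.sqrt(1.0 / total_weight)
    return pooled_beta, pooled_se


# Hypothetical per-cohort results for one variant: (effect estimate, standard error).
european = (0.00, 0.01)  # large cohort, small standard error, no effect
african = (0.20, 0.05)   # smaller cohort, larger standard error, clear effect

pooled_beta, pooled_se = fixed_effect_meta([european, african])
z_african = african[0] / african[1]
z_pooled = pooled_beta / pooled_se

print(f"African cohort alone: z = {z_african:.2f}")  # 4.00
print(f"Pooled estimate:      z = {z_pooled:.2f}")   # 0.78
```

Because the weights are proportional to the inverse of the variance, the larger cohort dominates the pooled estimate, and an effect that is clearly detectable in the smaller cohort alone (z = 4) becomes unremarkable after pooling (z below 1). Reporting ancestry-stratified results alongside the pooled estimate is one way to keep such signals visible.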