“This [commentary] is a very important perspective, and has exactly the kind of data we need to keep pushing this conversation forward,” says Shawneequa Callier, a bioethicist who specializes in the ethics of genetics research at the George Washington University School of Medicine and Health Sciences.

The consequences of these data discrepancies have already proven very real. Doctors have known for decades that people of African, Puerto Rican, and Mexican descent suffer unusually high rates of asthma-related deaths, but only recently learned why this might be: These groups commonly carry genetic variants that could be making them less sensitive to albuterol, a drug used in inhalers. The connection might have been obvious when looking at a pool of diverse genomic data, but over 90 percent of lung research has been done on populations of European descent.

For nearly 20 years, scientists have been using genome-wide association studies, known as GWAS, to identify connections between genetic variants and disease risk. But according to a commentary published today in the journal Cell, 78 percent of data used in GWAS come from people of predominantly European descent—even though they make up only 16 percent of the global population. The failure to include racially and ethnically diverse populations in genetic research both hampers our ability to understand human health and disease and exacerbates long-standing disparities in health, the authors write.

It’s never been faster, easier, or cheaper to sequence a human genome. An individual’s entire genetic code can be unraveled in as little as an hour, allowing researchers to scour its billions of base pairs for patterns that might predict a person’s risk of developing disease. Technologically speaking, studies of human genetics remain on the cutting edge.

Everyone should have equal opportunities to understand the diseases they might be at risk for, Tishkoff says. Right now, that’s not the world we live in.

It also doesn’t take much for repercussions to stray from a lack of information into misinformation, Tishkoff says. In a 2016 study, researchers found that a genetic test had incorrectly claimed that several patients with African ancestry had a high risk of developing heart disease based on the genetic variants they carried. The test had been designed using data from primarily white populations, in which many of the variants are rarer. But the variants identified in these patients were actually completely benign, which would have been clear if Black people hadn’t been left out of research in the first place.

This is an especially pertinent conversation given the recent rise in personal genomics companies like 23andMe, which recently announced that it will offer its customers information on their risk of developing type 2 diabetes, Martin says. But the predictive power probably won’t be equal across the board, she says, given that genetic tests might perform up to four or five times better in those of European descent than in people with other ancestry.

By focusing on only a subset of the human population, we’re bound to miss important determinants of health, says Sarah Tishkoff, a geneticist at the University of Pennsylvania’s Perelman School of Medicine and one of the co-authors of the new commentary. For instance, multiple genetic variants are associated with an increased risk of developing cystic fibrosis, and the ones that are common in individuals of European descent aren’t the same as those found most frequently in people with African ancestry. A narrow pool of study subjects makes it all too easy for researchers to miss genetic variants that happen to be rare in whites. And if those genetic variants aren’t catalogued with the rest, people who carry them might lose out on important diagnostic information.

The percentage distribution of ancestry categories in genome-wide association studies (GWAS), counted by study (left) and by total number of individuals (right). Image Credit: Sirugo et al. 2019, Cell

“Diversity issues aren’t just moral, but also scientific,” says Alicia Martin, a geneticist at the Broad Institute of MIT and Harvard. “There’s a staggering disparity that we haven’t grappled with that’s going to affect clinical tests or other parts of precision medicine.”

This European-centric skew might represent one of the biggest obstacles to the efficacy of precision medicine, wherein doctors could use each patient’s unique genetic makeup to tailor treatments and medical advice. Already, information gleaned from GWAS has yielded crucial insights into the roles genes play in a growing list of conditions, including obesity, prostate cancer, type 2 diabetes, and schizophrenia. But as things stand, it’s those with European ancestry who will benefit the most.

Increases in other underrepresented populations, on the other hand, remain marginal at best. To this day, individuals who identify as being of African, Hispanic, or Latinx descent make up less than four percent of people recruited into GWAS.

The issue isn’t a new one—and it’s true that, at one point, things were far worse. In 2009, just seven years after the publication of the first successful GWAS, a staggering 96 percent of study participants were of European ancestry. But the lion’s share of that gap has been closed by a recent increase in genetic studies of Asians, who now represent about 10 percent of GWAS subjects.

This tendency towards exclusion, she says, is also keeping researchers from making discoveries that can benefit everyone—not just the specific populations being studied. Take, for instance, the PCSK9 gene. Some individuals of African descent carry a nonfunctional copy of PCSK9, leading them to have unusually low levels of LDL cholesterol in their blood. By identifying the link between this specific gene and cholesterol levels, researchers have been able to develop new drugs that could someday stave off heart disease in people of all backgrounds.

Callier points out that this genetic variation is not evidence of inherent biological differences based on race. Modern humans originated from a common ancestor in Africa. But as people migrated out in small cohorts, random chance led to disparate groups leaving with slightly different genetic makeups. Then, as populations settled around the globe, many of these differences were further shaped and strengthened by distinct environments. All humans are subject to the same biology—but the genetic variants each person carries can modulate how and when that biology manifests.

By including diversity in genetic data, scientists can explore the resilience and flexibility inherent to human biology. Which means failing to acknowledge and address natural variation isn’t just an issue of modern segregation; it also inevitably erases significant, and culturally and medically relevant, segments of human history.

The vast majority of the human genome is identical from person to person. But genetic variations between populations can make a big difference in determining a person's risk of developing disease. Image Credit: Tetiana Lazunova, iStock

Old habits die hard—even in genetic research.

To fix the diversity problem, it’s worth examining how we wound up here in the first place.

In the early days of GWAS, sequencing even a single genome carried an exorbitantly high cost. Right off the bat, this priced out all but the most well-endowed and research-focused countries in the world, leaving the field largely to places like the United States and the United Kingdom, which are chock-full of people with European ancestry, says Sekar Kathiresan, a human geneticist at Massachusetts General Hospital and the Broad Institute.

The cost of genetic sequencing eventually came down. But by this point, cohorts had already been established and samples had already been collected. It was far easier for researchers to simply dig deeper into existing datasets than to spend the time and money required to seek out new ones, says Malia Fullerton, a bioethicist specializing in genomic research at the University of Washington. “It was a pragmatic decision,” she says. “If we wanted more detailed genetic information, it was easier to just use the samples already in hand.”

The case for European-heavy studies was also bolstered by the fact that, for many years, analyzing diverse datasets, which contain far more variables, posed a technical challenge for genetic researchers.

Within a matter of years, depth inadvertently became prioritized over breadth. From this point on, “things just sort of spun out of control,” says Kari North, a genetic epidemiologist at the University of North Carolina Gillings School of Public Health.

There’s a long road ahead of us.

The severity of the data imbalance is an especially hard pill to swallow given that, in addition to adding nuance to genetic studies, diverse data often give researchers more bang for their buck, says Lucia Hindorff, program director in the Division of Genomic Medicine at the National Human Genome Research Institute (NHGRI). A recent analysis found that even though individuals of Hispanic or Latinx descent make up just one percent of GWAS participants, the data from these populations have produced over four percent of the genetic associations uncovered by these studies. Those of European descent, on the other hand, make up 78 percent of GWAS subjects but have contributed just 54 percent of these discoveries.
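To make the "bang for their buck" comparison concrete, here is a rough back-of-the-envelope sketch in Python that divides each group's share of discovered associations by its share of participants. The percentages come from the paragraph above; the "yield ratio" label and the function name are illustrative assumptions rather than terms from the analysis itself, and "over four percent" is treated as exactly four, so the Hispanic or Latinx figure is a lower bound.

```python
# Shares (in percent) quoted in the article. "Over four percent" is
# rounded down to 4.0, so that ratio is a conservative floor.
shares = {
    "Hispanic or Latinx": {"participants": 1.0, "associations": 4.0},
    "European": {"participants": 78.0, "associations": 54.0},
}


def yield_ratio(participants_pct: float, associations_pct: float) -> float:
    """Hypothetical metric: share of associations per share of participants.

    A value above 1 means the group contributes more discoveries than its
    share of participants alone would suggest; below 1 means fewer.
    """
    return associations_pct / participants_pct


ratios = {
    group: yield_ratio(v["participants"], v["associations"])
    for group, v in shares.items()
}

for group, ratio in ratios.items():
    print(f"{group}: {ratio:.2f}")
```

Under these assumptions, data from Hispanic or Latinx participants yield at least four times their proportional share of associations, while data from participants of European descent yield roughly 0.7 times theirs.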

The only way to solve the diversity problem, Fullerton says, is to break the cycle and start collecting new samples from previously overlooked populations. Many such efforts are well underway. Several initiatives, such as the Human Heredity and Health in Africa Initiative (H3Africa) and the Hispanic Community Health Study/Study of Latinos, are building databases in traditionally underrepresented populations. And at the National Institutes of Health (NIH), genetic research consortia like CSER, PAGE, and TOPMed have made leveraging data from diverse participants part of their missions.

Sheko men taking part in a lactose tolerance test during a study by the Tishkoff Lab in Ethiopia. Image Credit: Sarah Tishkoff Lab, University of Pennsylvania Perelman School of Medicine

One of the most ambitious efforts, NIH’s All of Us research program, plans to recruit a cohort of one million diverse individuals—at least half of whom are of non-European ancestry—to learn more about how differences in lifestyle, environment, and biology can influence health and disease. “In the past, we’ve done research mostly on the people who [voluntarily] come into academic medical centers,” Fullerton says. “Fixing this problem means we need to stop waiting for people to come to us…and inviting people who have not typically been a part of this research.”

Of course, these changes won’t happen overnight. “It’ll take time to fill in the gaps and really start to see benefits from the research. But everything has to start somewhere,” Hindorff says.

And amassing samples is just one piece of the puzzle: Technology must also keep pace. Since the advent of GWAS, robust statistical methods have been developed to account for participants with diverse backgrounds—including individuals with mixed ancestry—in the same study. But these techniques still need refining, Martin says.

Perhaps the biggest challenges going forward will be cultural. From individual scientists to funding agencies, the entities driving research will need to make a concerted push for inclusivity, Martin says.

A big part of that will be cultivating trust, equity, and partnership in the populations that have long been left out of these conversations. “There’s distrust amongst minority communities, particularly in the United States because of [scientific] abuses that have happened in the past, like the Tuskegee [syphilis experiment],” Tishkoff says.

It doesn’t help that the biomedical research community itself continues to overrepresent those of European descent. Closing some of these gaps may require scientists to better embody the principles of inclusivity themselves, says Callier. “Across the board, we know that when we have a diverse set of people contributing to a research project, the scope will be broader and we may ask questions that are different,” she says.

Within the research community itself, there is also a problem of diverse representation. Experts believe that establishing trust with traditionally underrepresented study populations will require the scientists who interface with them to model inclusivity as well. Image Credit: torwai, iStock

Repairing these relationships means avoiding the “parachute research” that’s been conducted in the past, wherein scientists swoop in, collect samples, and leave. “There needs to be a process with a back-and-forth, and we need to give feedback to communities,” Callier says.

“If we want to do this process right and on a global level, we have to do it by building local infrastructure and capacity,” adds Inês Barroso, a geneticist at the University of Cambridge. “The local population has to be involved from the outset, as collaborators. They shouldn’t feel like they’re just sample donors.”

These efforts will take years. But it’s essential that these studies are done right—otherwise, researchers will still end up marginalizing the very populations we’re now fighting to protect, North says.

None of this will be easy. But both ethically and scientifically, championing diversity is key to advancing research. And these first steps in addressing ancestry might just open the door to tackling other, similarly lacking aspects of diversity in genetic research, including socioeconomic status, sex, gender, and disability. All of those can affect the context in which genes and their products operate, North says. Ultimately, understanding—and acting on—these ideas is the only way forward.

“It’s no secret: Research takes time,” Hindorff says. “But hearts and minds are changing. I do think, in the past couple years, the message has really started to hit home.”