Lack Of Diversity In Genetic Databases Hampers Research

When Lalita Manrai went to see her doctor for treatment of kidney disease, she noticed that some of the blood test results had different "normal" ranges for African Americans compared with everybody else.

When she asked her doctor which range applied to her — a woman born in India — he said the "everybody else" category was actually based on a study of Europeans, so neither category was right.

Instead, he said, he calculated "normal" for her by averaging the two values.

"It's ridiculous," says Arjun Manrai, a medical researcher at Harvard Medical School, who recounted this story of his mother, who died in 2018. But there simply isn't good information about a lot of medical issues that may vary based on a person's ancestry. "In this vacuum of information, this was what [the doctor] was doing as his approach to staging her kidney disease," Manrai says.

It's important to get those laboratory results right, because they influence a patient's treatment, Manrai says.

The same problem comes up in other common situations, such as the A1C test that is used to diagnose and manage diabetes, and in genetic variants that can identify people at risk of sudden death from heart disease.

These factual gaps exist because much of the research used to understand these genetic tests and lab values comes from predominantly European populations. Manrai is part of a growing effort to correct the skewed picture that results.

One of the most widely used resources for studying the genetics of disease is the U.K. Biobank, which contains samples from half a million middle-aged British people, 95% of whom are of European ancestry.

"At the time they were recruited and the age group that were recruited, that largely reflected the average across the U.K.," says Dr. Cathie Sudlow, the biobank's chief scientist. "So because the study was in the U.K., that's what we got."

The biobank has been a boon to scientists who want to identify the genes that are involved in disease. Genes are universal. But the ethnically skewed resource doesn't work as well to identify the genetic variants that differ based on ancestry.

"There is no one cohort anywhere in the world that can answer all questions for all people," Sudlow says. So the biobank is working to help develop much more diverse resources.

The U.K. Biobank has helped establish large repositories in Mexico and China. In the United States, Sudlow and her colleagues have been offering advice to the National Institutes of Health, which is gradually putting together a biobank that aims to have a diverse population of a million volunteers.

There are dozens and dozens of collections like this scattered around the world, some in private hands and others accessible to scientists. Nobody knows exactly how many of these collections exist, but "broadly we're talking about at least millions of people," says Ewan Birney, co-director of the European Bioinformatics Institute.

He is part of an effort to find ways to link some of these resources together so scientists can quickly see how a discovery in one group applies to people with different ancestries. Birney says even though most of the initial work has been in European populations, a lot of it is relevant to everybody.

"How genetics works in different countries — sort of a surprise — is that very often the genetics is pretty much the same as you move between different countries," Birney says.

Where biobank study conclusions can be misleading is in the details. The same genes and proteins are involved in diseases such as diabetes, but the variants that can affect a person's risk of disease differ based on a person's genetic heritage.

Birney expects that the new and linked databases not only will help identify issues of concern to a particular ethnic group but will identify genes that are important for everybody's health. He's particularly eager to learn what comes out of a biobank project taking shape in sub-Saharan Africa.

"Because Africa is the birthplace of humans, there's the highest amount of genetic diversity inside of sub-Saharan Africa," he says. "And it's really clear if you are a geneticist, we should be spending an awful lot more time studying humans there."

Birney is wary of simply allowing scientists from rich countries to swoop in on this resource, so right now the African scientists developing biobanks will have an opportunity to study the data first. Birney says it's "really important that we do that in a way that is empowering and enabling for the scientists who come from these different countries."

Manrai at Harvard is tapping into data that's already available, including medical databases curated by the National Institutes of Health and the Centers for Disease Control and Prevention.

"I think understanding ancestry, race, ethnicity is an area that we're going to see a tremendous amount of work in over the next 10 years," he says.

You can contact NPR Science Correspondent Richard Harris at rharris@npr.org.