How Should Scientists' Access To Health Databanks Be Managed?

More than a million Americans have donated genetic information and medical data for research projects. But how that information gets used varies a lot, depending on the philosophy of the organizations that have gathered the data.

Some hold the data close, while others are working to make the data available to as many researchers as possible, figuring science will progress faster that way. But scientific openness can be constrained by both practical and commercial considerations.

Three major projects in the United States illustrate these differing philosophies.

VA scientists spearhead research on veterans database

The first project involves three-quarters of a million veterans, mostly men over age 60. Every day, 400 to 500 blood samples show up in a modern lab in the basement of the Veterans Affairs hospital in Boston. Luis Selva, the center's associate director, explains that robots extract DNA from the samples and then the genetic material is sent out for analysis.

The blood samples themselves end up in gigantic, automated freezers for future use — one in Boston and a backup facility at a VA location in Albuquerque, N.M.

Even at this early stage of the process, the volunteers' names have been replaced with bar codes. Scientists can still link the DNA findings to the veterans' medical records, but the entire operation is designed to ensure that no personal information can be deduced from the findings.

Only VA scientists and their collaborators are granted access to the vets' medical records and genetic information. Dr. J. Michael Gaziano, a VA scientist and principal investigator of the Million Veteran Program, says that so far there are 30 projects involving this huge data set.

The studies emphasize health issues of concern to vets "in areas of schizophrenia and bipolar disease, in PTSD, cardiovascular disease, diabetes [and] hypertension," Gaziano says.

Gaziano and his colleagues have published the first of those results and have approved 30 research projects in total. It's a start, but it stands in stark contrast to the more than a thousand studies currently underway using the much more accessible data set at the British-based UK Biobank, which is a pioneer in this field.

The U.K. project has granted access to 10,000 qualified scientists, who can download its anonymized data and explore it. That effort had a head start: UK Biobank completed its enrollment in 2010, one year before the VA started to collect samples.

UK Biobank has reported no security or privacy issues, but Gaziano still isn't about to make the VA data in the U.S. as readily accessible.

"I don't think that we understand all the security risks as we move into this new era," he says. "So I think we're being quite cautious."

Gaziano is trying to make the data more accessible to scientists in academia, but doing so is complicated by the fact that the data are housed on computers at the VA and the Energy Department; access is strictly controlled.

"We view this as a national resource," Gaziano says, "and it's a national resource that will not only help veterans but will help all Americans and mankind."

Intermountain Healthcare teams with deCODE genetics

Our second example involves what is largely an extended family: descendants of settlers in Utah, primarily from the Church of Jesus Christ of Latter-day Saints. This year, Intermountain Healthcare in Utah announced that it was going to sequence the complete DNA of half a million of its patients, resulting in what the health system says will be the world's largest collection of complete genomes.

"We have families who have been here for three, four, five, six generations," says Dr. Lincoln Nadauld, executive director of precision medicine and genomics, "and under our care at Intermountain Healthcare, we have taken care of families for multiple generations, so we have health information and health histories on those families and patients."

Family trees provide a great shortcut for understanding the genetic basis of disease. To plumb this information, Intermountain has an exclusive deal with a company in Iceland, deCODE Genetics, which is owned by pharmaceutical giant Amgen. This data set will remain a closely held resource, not available to the broader scientific community.

"We don't anticipate sharing this data outside of the Intermountain Healthcare databases, for example," Nadauld says.

DeCODE will do the DNA sequencing and will get to scour that information with an eye toward developing new drugs.

"It would be natural for deCODE and Amgen to do that, given their expertise and experience there," Nadauld says. "Conversely, if there's an opportunity to implement some novel discovery or finding into clinical care, Intermountain Health will be the lead on that." Insights would be published in the scientific literature, he says.

Similarly restricted databases are held by other medical systems, including Geisinger Health in Pennsylvania and Kaiser Permanente, based in California.

NIH's All of Us aims to diversify and democratize research

Our third and final example is an effort by the National Institutes of Health to recruit a million Americans for a long-term study of health, behavior and genetics. Its philosophy sharply contrasts with that of Intermountain Healthcare.

"We do have a very strong goal around diversity, in making sure that the participants in the All of Us research program reflect the vast diversity of the United States," says Stephanie Devaney, the program's deputy director.

The program has been budgeted $1 billion in taxpayer money so far, and it's expected to take another five years to recruit the million volunteers. The program anticipates needing another billion dollars to attain its goals. (The fully operational UK Biobank has spent about $300 million, from taxpayers and charities.)

So far, Devaney says, the All of Us program is getting excellent diversity in its samples. It's also striving for good diversity among the researchers who will end up using the data.

"We set up from the beginning, when we [got consent from] our participants, that all different types of researchers would be able to ask for access to the data," Devaney says.

"We are not limited to just folks who work at a certain institution or even who live in the United States. We will be open for foreign researchers, and we will be open for folks into the private sector and the government and academia and even ultimately citizen scientists or community scientists."

(Government officials granted access will not be allowed to use the data for crime-solving or similar activities, Devaney says.)

Program officials still need to work out exactly how they will provide this access while ensuring privacy and security. They would like to put the information on computer servers that scientists can access but that will not allow data to be downloaded. The goal is to make the information secure and as accessible as possible, while not putting too many constraints on how the data can be analyzed.

The philosophy is straightforward: The more easily smart people can see the data, the more likely they are to make discoveries that can benefit us all.

You can reach NPR science correspondent Richard Harris at rharris@npr.org.