Gene detectives ID 'anonymous' men in registry

Scientific sleuths used publicly available genealogy records to identify 12% of the "anonymous" men in a genetics registry. The report raises privacy concerns in an era when more and more genetic data is becoming public.

The registries contain the genomes, or full genetic profiles, of those who volunteered to have their genes analyzed. Since the completion of human genome efforts a decade ago, a growing number of such human gene maps have appeared in research registries, even as the price of completing one has fallen to a few thousand dollars. Popular genealogical or ancestry-tracing efforts have also started using genes tied to family names, making information about genes that run in particular families public.

"Everyone benefits from more genomic information finding its way into research, offering clues to rare diseases and other ailments," says Yaniv Erlich of the Whitehead Institute for Biomedical Research in Cambridge, Mass., who headed the demonstration effort reported Thursday in the journal Science. "But we wanted to illuminate the issue of potential privacy issues," he says. The aim was not to stop the flood of genomic information helping researchers, but to close off avenues for privacy abuse.

In the study, the team started with men in the international 1000 Genomes Project, an anonymous registry of genomes. Next, they tied male genes in the registry to the last names of men listed on publicly available ancestry-tracing sites. Finally, they used age and location information available in the project data to identify about 50 men. Fairly common last names tied to distinct genes in ancestry site records served as the best pointers to the identities of the otherwise anonymous men in the project. The study does not name them, although a test run did identify genetic luminaries such as human genome pioneer Craig Venter of the J. Craig Venter Institute in Rockville, Md., whose information is online without any disguise.

"In one sense, this is a unique situation," applying only to male genes, says geneticist Martin Bobrow of the United Kingdom's University of Cambridge, who was not part of the study. "But in another sense, the critical factor is that many people are putting chunks of their personal DNA sequence into databases which are not well secured from public access."

The federal Genetic Information Nondiscrimination Act (GINA) of 2008 bars insurers and employers from discriminating on the basis of genetic tests. And in response to the report released Thursday, National Institutes of Health (NIH) officials have hidden age information in the 1000 Genomes Project data used in the demonstration. Erlich suggests that better computing ability, combined with more and more genetic information, raises a need to prevent yet-unforeseen privacy abuses while preserving the medical and social benefits. In his own lab, for example, Erlich notes that such databases have told parents carrying genes for dangerous rare diseases about their fertility risks.

"I would think someone whose privacy alarm bells would be set off by this is already unlikely to be a participant in a study that made their genome data available," says Princeton genome expert Leonid Kruglyak. "But it is something that will need to be noted in informed consent forms," he says, and perhaps considered in non-discrimination law.

"My genome is all out there and I've suffered no ill effects," says Venter, who was an adviser to lawmakers behind the 2008 law. "I would actually encourage people to put more of their genetic information online."