Easy DNA Identifications With Genealogy Databases Raise Privacy Concerns

Police in California made headlines this spring when they charged a former police officer with being the Golden State Killer, a man who allegedly committed a series of notorious rapes and murders in the 1970s and '80s.

Authorities revealed they used DNA from a publicly available genealogy website to crack the case.

Since then, police around the country have started doing the same sort of thing to solve other cold cases.

That prompted Yaniv Erlich, the chief science officer at the Israeli company MyHeritage, to investigate just how easy it is to use public genealogy databases to track down people.

"We wanted to quantify how powerful this technique is to identify individuals," Erlich says. So he and his colleagues analyzed the genomes of 1.28 million people in the company's database.

In a paper published Thursday in the journal Science, the researchers projected that they could identify third cousins and more closely related relatives in more than 60 percent of people of European descent. (They chose this group because most people in their database have that ancestry.)

"It's kind of like each person in this database is a beacon that illuminates hundreds of distant relatives," Erlich says. "So it's enough to have your third cousin or your second cousin once-removed in these databases to actually identify you."

And when the researchers combined their strategy with other information, such as a specific geographic area or the approximate age of a person, they could quickly reduce a list of possibilities to just a few people.

"Of course, you need the genealogical records. You need to do the work. But you have enough power to get very close," Erlich says.

And that's not all. Erlich estimates that as his and other databases grow, investigators will be able to identify essentially anyone in the United States of that ethnic background within a few years.

"It seems that very quickly we can get virtually to nearly everyone," Erlich says.

In another part of the study, the researchers went even further to see if they could do the same thing with other DNA databases. They were able to use their techniques to identify a supposedly anonymous woman whose DNA was stored in the 1000 Genomes Project, a National Institutes of Health research database.

"This technique doesn't only get you criminals," Erlich says. "You can also use this technique for other purposes — maybe purposes that could be illegitimate."

And that, he says, raises serious questions about privacy.

"The police currently [are] using these techniques to find ... [murderers] and bad people," Erlich says. "But are we OK with using this technique to identify people in a political demonstration who left their DNA behind? There are many scenarios that you can think about misuse."

But some people involved in genealogical forensics defend the use of the techniques to help solve serious crimes.

"I was excited to see this demonstration that genetic genealogy is so powerful," says Ellen Greytak, director of bioinformatics at Parabon Nanolabs, Inc., which helps police solve crimes this way.

"We're working on these cases that haven't been able to be solved for decades. They are all either homicide or sexual assault. And some of these are horrific," she says.

But Greytak and her colleagues caution that the study makes the process seem easier than it is.

"There are a number of problematic assumptions made in the study that do not reflect the reality of the work I am doing," writes CeCe Moore, who works with Parabon, in an e-mail. "The study demonstrates the power of genetic genealogy in a theoretical way, but does not fully capture the challenges of the work in practice."

But others argue that the findings underscore the need to make sure people know what they're getting into when they provide their genetic information to genealogy services and other databases.

"When you make those decisions to put the genome out in the world it's really hard to dial it back," says Erin Murphy, a professor at the New York University School of Law.

"And more importantly," she says, "you've made a decision not just for yourself but for your siblings, for your distant cousins, people you don't even know you're related to, for your children, for your children's children."

A second paper published Thursday in the journal Cell found that it could be possible to link ancestry databases to older law enforcement DNA databases, giving police yet another potential tool.

"We were trying to pose the question of whether a newer, more modern system of genetic markers could be tested against the old system and still get matches and find relatives," says Noah Rosenberg, a biology professor at Stanford University.

Taking these studies together, some bioethicists and legal experts say they show that it's important to take steps to protect genetic information and make sure people providing DNA samples are aware of the risks.

"We can tell people that we can de-identify their data," says Benjamin Berkman, a bioethicist at the National Institutes of Health, who was speaking for himself, not NIH. "We can tell them about all the procedural and technical safeguards that we've put in place to protect the confidentiality of their data. But I don't think we can promise people anonymity."

As a result, Berkman says, "it's incumbent on anyone collecting and aggregating and sharing genomic data to be clear exactly how the data will be treated and whether there are any risks to genomic privacy."

For his part, Erlich proposes that all genetic information be encrypted to protect the information and enable people to explicitly provide consent for using their data.

"It sounds geeky and complicated, but it's very simple in practice," Erlich says.