Razib Khan, a genomics researcher who is head of scientific content at Insitome, a service that interprets DNA for consumers, called the new security research a large-scale demonstration of weaknesses already known to enthusiasts.

Khan says he has been aware of efforts to “scrape” GEDmatch, or collect more data than usual, and believes a larger attack to whisk away much of the data could already have occurred. “My guess is that almost certainly it’s already been done,” he says. “Governments are collecting data on people. You never know what they can use it for.”

Asked if there was evidence the database had already faced concerted attacks, scraping, or scanning, Rogers said, “I don’t want to get into it.”

“Not that I am aware of,” he added. “I don’t know.”

Rogers declined to comment on whether he’d been approached by national security officials about the site.

Crowdsourcing DNA

Rogers started the genealogy service as a way for people to upload DNA test results from services like 23andMe and locate relatives among other users, by comparing their DNA. The crowdsourced database now holds 1.3 million profiles, he says, although some of these are duplicates.

As the site grew, it drew the attention of police investigators. In 2018, police in California announced they had used the database, without Rogers' knowledge, to help identify a murderer known as the Golden State Killer. Police did it by uploading DNA data extracted from crime-scene evidence and comparing it with users’ data to identify some of his relatives.

Since then, dozens of murderers and rapists have been identified using GEDmatch. But a privacy debate erupted as well, partly because police had searched users’ DNA without their knowledge. In response, Rogers allowed users to opt in or out of police searches, or just delete their profiles.

But there was an even broader concern: if a DNA database is large enough, practically everyone can now be tracked through their relatives, even if they never took a DNA test.

With the million or so profiles in the database, most Americans have second or third cousins in it, says Doc Edge, a researcher at the University of California, Davis, who last week posted the first paper showing how ancestry databases could be vulnerable to a clever searcher.

Now the team at the University of Washington has demonstrated a new attack specifically on GEDmatch that is “much stronger,” according to Yaniv Erlich, chief scientist of MyHeritage, another DNA genealogy company.

The researchers exploited the way GEDmatch’s genetic comparison engine works in order to infer the DNA data of other people. “These researchers went in through the main gates; they did not break in,” says Erlich. “Here we have a method that is not even illegal as far as we know.”

When a user searches for relatives, the program compares thousands of DNA markers (called SNPs) from the user’s genome to those of others in the database. The better the match, the more closely that person is related to you. A parent and child will share half their DNA, for example.
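For readers who want to see the idea in miniature, the comparison described above can be sketched in a few lines of Python. This is a toy illustration under stated assumptions, not GEDmatch's actual algorithm: the marker IDs, the profiles, and the `match_score` function are all hypothetical, and real services typically look for long runs of matching markers rather than scoring each marker independently.

```python
from typing import Dict, Tuple

# Hypothetical data model: each profile maps a marker ID to the two
# alleles a person carries at that marker (one from each parent).
Genotype = Tuple[str, str]
Profile = Dict[str, Genotype]


def shares_allele(a: Genotype, b: Genotype) -> bool:
    """True if the two genotypes have at least one allele in common."""
    return bool(set(a) & set(b))


def match_score(p: Profile, q: Profile) -> float:
    """Fraction of markers present in both profiles that share an allele."""
    common = p.keys() & q.keys()
    if not common:
        return 0.0
    hits = sum(shares_allele(p[m], q[m]) for m in common)
    return hits / len(common)


# Made-up example profiles. The child carries one allele from the parent
# at every marker, so every compared marker shares an allele.
parent = {"rs1": ("A", "G"), "rs2": ("C", "C"), "rs3": ("T", "T"), "rs4": ("G", "A")}
child = {"rs1": ("A", "A"), "rs2": ("C", "T"), "rs3": ("T", "C"), "rs4": ("G", "G")}
stranger = {"rs1": ("G", "G"), "rs2": ("T", "T"), "rs3": ("C", "C"), "rs4": ("A", "A")}

print(match_score(parent, child))
print(match_score(parent, stranger))
```

In this made-up data the parent and child share an allele at every marker, while the stranger matches at only some of them by chance. With just four markers such chance matches are common, which is why real comparisons rely on far more markers.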

To test his hack, Ney uploaded specially designed “attack” DNA files, which he then compared with target profiles he also created. He found that with a dozen attack files he could infer nearly all the actual DNA markers of the target profile, even though these are meant to be private.

National security risk

The same attack wouldn’t work on other genealogy sites, like 23andMe, because they don’t permit data uploads. Others, like MyHeritage, do allow uploads but don’t give users as much information about their matches. “The problem with GEDmatch is the browser is too good, and searches too deeply,” says Erlich. “If I were them, I would remove it, fix it, then put it back.”

According to Erlich, the vulnerability has national security implications. If a foreign counterintelligence agency grabbed a million American DNA profiles, that country could use genetic genealogy to uncover the true identities of American spies or diplomats, locate their relatives, or discover genetic kompromat like unacknowledged children. Since other countries don’t have such databases for the US to steal, the risk would not be symmetric.

“You could have a capability which is better than what the FBI currently has, and you can use it in any way that you want,” says Erlich. “With the raw data you could come up with even better algorithms. You can identify spies or do genetic surveillance.”

In addition, says Ney, fraudsters could create fake accounts and pretend to be someone’s long-lost relative.

Ney says he told GEDmatch of the vulnerabilities in July but is not convinced the tiny company is capable of repairing the problems. Initially, his team gave it a deadline of September to fix the problems, but Ney says he held off posting his report for more than a month when he noticed the site hadn’t been fixed.

“Then, a month ago, they did make a small change to their algorithm that prevents the most significant attack we developed,” says Ney. “Our question is whether those fixes are robust to a determined adversary. It might be a temporary patch.”

GEDmatch, run out of a house in Lake Worth, Florida, is a small business whose aim is genealogy and education, not profits, says Rogers. He acknowledged that its team of five part-time volunteers would not have the resources to hire security consultants.

Rogers, who is not a computer programmer, did not offer details about what fixes GEDmatch had implemented. “I let the technical people work on it, and I believe they have,” he said in an interview. He later emailed to say the site was “actively working to add more security measures based on the reported problems.”

Ney says he does not believe the genealogy site is secure. “How much effort does it take to secure a large website with a million-plus in genetic data? I think it’s hard for anyone to do,” he says. “The question I have is whether a volunteer-run effort is capable of having the manpower to handle it.”

Ney also doesn’t believe administrators at GEDmatch have any way of knowing whether or not the trove of DNA data has already been carried off by an attacker, since an attack could look like an ordinary search for relatives.

“They are in a situation of being ignorant, which is its own problem,” says Ney. “The worst kind of attack is where you don’t even know it happened.”