The study suggests that microbiome sequencing may someday have utility in criminal investigations, for example, but it also raises questions about how microbiome sequence data should best be handled to protect the privacy of study participants.

“Each of us personally has a specific set of bugs that are an extension of us, just the same way that our own genome is a part of what defines us,” said coauthor Curtis Huttenhower, a biostatistician at the Harvard School of Public Health.

People harbor distinctive sets of microbes, the genetic signatures of which can be used to identify individuals participating in the Human Microbiome Project, according to work published today (May 11) in PNAS.

Huttenhower and his colleagues developed an algorithm to identify microbiome signatures based on both 16S ribosomal RNA sequences and whole metagenome shotgun sequencing. They assessed the abundance of microbial taxa and also of specific microbial genes and other stretches of DNA.

The researchers sought to identify the minimum set of microbiome features that would be necessary to uniquely identify a person, drawing from the field of data transmission. “The coding problem is actually very similar to what’s used to transmit information on the Internet or over a cellphone,” Huttenhower explained. “You want to represent [the information] using a code that’s short—so you don’t use up a bunch of bandwidth—but robust, so that small errors don’t change your message.”

The researchers developed an algorithm that would scan a person’s microbial genetic code and then move through various sequence features searching for patterns unique to that person, starting with features that were generally best at distinguishing individuals. They then trained and tested the algorithm on microbiome data from the Human Microbiome Project, selecting individuals who had their microbiomes sampled multiple times during the study. They determined microbial signatures for 120 people and assessed whether they could use those signatures to match a participant’s original sample to samples taken between 30 and 300 days later.
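For readers who want a feel for the idea, the search described above can be approximated with a simple greedy routine. This is only an illustrative sketch, not the authors' implementation: the toy data in `SAMPLES`, the feature names, the function name `build_code`, and the rule of always choosing the feature shared with the fewest remaining look-alikes are all assumptions made for the example.

```python
from typing import Dict, FrozenSet, Optional, Set

# Hypothetical toy data: each person maps to the set of microbial features
# (taxa, genes, or other markers) detected in their first sample.
SAMPLES: Dict[str, Set[str]] = {
    "A": {"f1", "f2", "f3"},
    "B": {"f1", "f2"},
    "C": {"f2", "f3", "f4"},
}


def build_code(target: str, samples: Dict[str, Set[str]]) -> Optional[FrozenSet[str]]:
    """Greedily pick features of `target` until nobody else carries all of them.

    Returns the chosen feature set, or None if no unique code exists.
    """
    # People who still carry every feature chosen so far.
    others = {name for name in samples if name != target}
    candidates = set(samples[target])
    code: Set[str] = set()

    while others and candidates:
        # Choose the feature shared with the fewest remaining look-alikes;
        # sorting first makes tie-breaking alphabetical and deterministic.
        best = min(
            sorted(candidates),
            key=lambda f: sum(f in samples[o] for o in others),
        )
        remaining = {o for o in others if best in samples[o]}
        if len(remaining) == len(others):
            break  # no remaining feature separates the target any further
        code.add(best)
        candidates.discard(best)
        others = remaining

    return frozenset(code) if not others else None


if __name__ == "__main__":
    for person in SAMPLES:
        print(person, build_code(person, SAMPLES))
```

In this toy setup, person B gets no code at all, because every feature B carries is also carried by A. A method that relies only on the presence of features cannot single out someone whose profile is a subset of another person's.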

Sure enough, the team was able to correctly identify a person’s stool microbiome more than 80 percent of the time. When the researchers applied their algorithm to various sites on the body, including the skin, mouth, and vagina, their accuracy dropped to 30 percent. False positives were rare—microbial signatures generally either clearly identified a specific person or did not appear to correspond to any individual.
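The all-or-nothing behavior described here follows naturally if a later sample is only assigned to a person when it contains exactly one person's complete signature. The sketch below illustrates that matching rule under stated assumptions: the `CODES` table, the feature names, and the function `match_sample` are hypothetical, and the study's actual matching criteria may differ.

```python
from typing import Dict, FrozenSet, Optional, Set

# Hypothetical signatures built from each person's first sample.
CODES: Dict[str, FrozenSet[str]] = {
    "A": frozenset({"f1", "f3"}),
    "C": frozenset({"f4"}),
}


def match_sample(features: Set[str], codes: Dict[str, FrozenSet[str]]) -> Optional[str]:
    """Return the one person whose whole code appears in `features`, else None."""
    hits = [person for person, code in codes.items() if code and code <= features]
    # Zero hits or several hits both count as "no call", which keeps
    # wrong assignments (false positives) rare at the cost of some misses.
    return hits[0] if len(hits) == 1 else None


if __name__ == "__main__":
    print(match_sample({"f1", "f3", "f5"}, CODES))  # one complete code present
    print(match_sample({"f2"}, CODES))              # no code present
    print(match_sample({"f1", "f3", "f4"}, CODES))  # two codes present
```

Declining to answer when a sample matches no code, or more than one, trades some missed identifications for fewer mistaken ones.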

As microbiome signatures mature, law enforcement or intelligence agents could theoretically track people by looking for traces of them left in the microbes they shed. Mark Gerstein, who studies biomedical informatics at Yale University and was not involved in the new study, suggested, for instance, that one could imagine tracking a terrorist’s movements through caves using their microbiome signature.

Huttenhower and his colleagues were identifying individuals out of pools of just hundreds of project participants, however. It is currently unclear how well the algorithm will perform when applied to the general population, though the researchers estimate that their code could likely pick someone out from a group of 500 to 1,000. “I would expect that number to get bigger in the future as we get more data and better data and better coding strategies,” Huttenhower said.

But the work raises privacy concerns similar to those faced by scientists who gather human genomic data. Microbiome researchers are already wary of the human genomic DNA that gets caught up in microbiome sequences, but it increasingly appears that the microbiome sequences themselves are quite personal.

In the genomics field, researchers have increasingly limited access to databases containing human genomic sequencing data. Researchers must apply to use these data. “People might increasingly want to put the microbiome data under the same type of protection that they put normal genomic variants under,” said Gerstein. “Your microbiome is associated with various disease risks and proclivities for X and Y. I don’t think it’s a completely neutral identification. It potentially says things about you.”

E.A. Franzosa et al., “Identifying personal microbiomes using metagenomic codes,” PNAS, doi:10.1073/pnas.1423854112, 2015.