But when you are looking at snippets from a mass of different bacteria from the human gut, assembling those snippets is like trying to assemble 100 jigsaw puzzles from a pile of pieces from all 100 puzzles jumbled together, explained Snyder. Some pieces come from completely unrelated puzzles, analogous to different species of bacteria, while others come from multiple copies of the same puzzle, analogous to the same species of bacteria.

That is hard enough, but the real challenge is telling apart the pieces from puzzles that are almost the same but not quite. And that’s what the researchers’ new technique does. “We assembled one whole genome from this big gemisch, which has never been done before,” said Snyder.

“We normally sequence 100 DNA bases off a 300-base fragment,” he said. “You just get snippets of information.” But using a new informatics approach, Snyder and Batzoglou’s team stitched together larger segments of the genome. “We have a sophisticated algorithm that lets us put together all these pieces — first assembling the snippets into longer, 10,000-base pieces, then the 10,000-base pieces into still-longer fragments, and then those into whole genomes,” Snyder said.
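To make the idea of staged stitching concrete, here is a toy sketch in Python. It is not the researchers' algorithm, which the article does not describe in detail. It assumes error-free reads from a single strand, merges fragments greedily by their longest suffix-to-prefix overlap, and then reapplies the same step to the resulting pieces. The function names, the example sequence and the grouping of reads into two batches are all hypothetical.

```python
def overlap_len(a, b, min_overlap):
    """Length of the longest suffix of `a` that equals a prefix of `b`.

    Returns 0 if no overlap of at least `min_overlap` bases exists.
    """
    max_len = min(len(a), len(b))
    for n in range(max_len, min_overlap - 1, -1):
        if a.endswith(b[:n]):
            return n
    return 0


def greedy_assemble(fragments, min_overlap):
    """Repeatedly merge the pair of fragments with the largest overlap.

    Toy assumption: reads are error-free and overlaps are unambiguous.
    """
    frags = list(dict.fromkeys(fragments))  # drop exact duplicates, keep order
    while True:
        best_n, best_i, best_j = 0, None, None
        for i, a in enumerate(frags):
            for j, b in enumerate(frags):
                if i != j:
                    n = overlap_len(a, b, min_overlap)
                    if n > best_n:
                        best_n, best_i, best_j = n, i, j
        if best_n == 0:
            return frags  # nothing left to merge
        merged = frags[best_i] + frags[best_j][best_n:]
        frags = [f for k, f in enumerate(frags) if k not in (best_i, best_j)]
        frags.append(merged)


# Hypothetical 24-base "genome" and four overlapping reads taken from it.
genome = "ACGTTGCAAGGCTTAACCGGATAT"
reads = [genome[0:10], genome[5:15], genome[10:20], genome[15:24]]

# Stage 1: stitch each (hypothetical) batch of reads into a longer piece.
contigs = greedy_assemble(reads[:2], 4) + greedy_assemble(reads[2:], 4)

# Stage 2: stitch the longer pieces into the full sequence.
assembled = greedy_assemble(contigs, 4)
print(assembled[0])
```

Real metagenomic data is far harder than this sketch suggests: reads contain errors, genomes contain repeats, and near-identical strains produce overlaps that a simple greedy merge would join incorrectly.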

Such long sequences of DNA can span hundreds or even thousands of genes that couldn’t be recovered from short-read sequencing; they can help classify bacteria and other organisms by how related they are to one another; and the long sequences also help identify rare bacteria that might be missed by current methods. “We could assemble either entire genomes or at least very, very large chunks of the genome,” said Snyder.

Great bacterial diversity

Being able to see such long sections of the genome means being able to distinguish not only different species of bacteria, but different strains of the same species. The team tested the technique on a standardized sample of known bacteria and then took it for a spin on the gut contents of a human male. The result revealed not only lots of species, but many different strains of the same species. One bacterial species, for example, included five separate strains — all from one person.

The consequences of having so many different strains are hard to predict, but some strains may be more or less likely to make people ill. For example, many strains of E. coli bacteria live harmlessly and even helpfully in the human gut, while others are lethal. Being able to tell one strain from another could help researchers determine which strains are dangerous and why.

Right now, researchers who want to study a strain’s virulence have to isolate that strain and then grow it in the lab. But some bacteria don’t grow easily in the lab. If researchers can study the genes that contribute to virulence directly in the mixture of bacteria from a human gut sample, they don’t need to isolate the strain and grow it in a pure culture. “When you assemble the whole genome, you have a better idea of what the pathogenic genes are. I think it’s going to be very, very powerful for understanding the genetic basis of pathogenesis,” said Snyder.

The new approach will make it easier to reconstruct the evolutionary history of strains of infectious bacteria or viruses, such as Ebola. And the approach can be used in the field to study microbial diversity in healthy people and other animals, as well as in plants, water and soil. “When we put this together now, using these long reads, it’s like an IMAX movie,” Snyder said. “You can see the whole thing much more clearly than with what we do now, which is like an old black-and-white TV.”

Other Stanford-affiliated authors of the paper are postdoctoral scholars Chao Jiang, PhD, and Wenyu Zhou, PhD, and research associate Fereshteh Jahaniani, PhD.

This work was supported by the National Institutes of Health (grant 3U54DK102556).

Stanford’s Department of Genetics in the School of Medicine and the Department of Computer Science in the School of Engineering also supported the work.