A prominent genetics institute recently sequenced its trillionth base pair of DNA, highlighting just how fast genome sequencing technology has improved this century.

Every two minutes, the Wellcome Trust Sanger Institute sequences as many base pairs as all researchers worldwide did from 1982 to 1987, the first five years of international genome-sequencing efforts.

That speed is thanks to the technology underlying genomics research, which has been improving exponentially every couple of years, similar to the way computer tech improves under Moore's Law.

"Up to 2006, the various cycles of new technology and introduction were cutting costs in half for a similar product every 22 months," said

Adam Felsenfeld, a program director at the National Human Genome Research Institute, which invests about as much money in DNA sequencing as the Sanger Institute.

Progress in DNA sequencing has been as breathtaking as any technological change in the IT realm. The Human Genome Project was estimated to cost $3 billion – to sequence a single genome – when it began in 1990, but cost reductions during the decade-long effort drove its actual cost closer to $300 million. By the end of the project, researchers estimated that if they were starting again, they could have sequenced the genome for less than $50 million.

By 2006, Harvard's George Church estimated that his lab could sequence a genome for $2.2 million. In 2007, the sequencing of James Watson's genome was said to cost less than $1 million. Looking into the future, the NIH wants genomes to cost a mere $100,000 by 2009, and $1,000 five years later.

With dropping costs and increasing speed, a flood of genetic data is flowing out of research institutes around the world. Previous progress was measured in gigabases (billions of DNA base pairs), but now major research centers are stepping up to the terabase level (trillions of base pairs, abbreviated bp). (Human genomes contain about 3 gigabases.)

"We're going to go from raw production of 150 gigabases per year to something between two-and-a-half to five terabases in 2008 and double that in 2009," said Felsenfeld. "If things behave like they have in the past, we might be on a Moore's Law-like curve again."

Given the rate of change, it's worth asking: How did this happen? And, more importantly, will costs continue dropping as fast as they have?

Phase One: Manual Labor

Early DNA sequencing was labor-intensive and slow. Humans had to do almost everything to turn the building blocks of DNA – thymine, adenine, guanine and cytosine – into data.

Long strands of DNA were cut into smaller pieces, and those pieces were copied over and over. The chain-termination method developed by Frederick Sanger, and variations on it that followed, then produced fragments of different lengths, which were separated by length in a gel with lanes and imaged. Molecule by molecule, the DNA code appeared.
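
The underlying trick, reading a sequence back from fragment lengths, can be captured in a toy sketch. This is an idealized illustration, not lab software, and it skips the chemistry entirely:

```python
# Toy illustration of chain-termination logic: each copy of the template
# terminates at some position and carries a label identifying the base it
# stopped on. Sorting the fragments by length (the job a gel does physically)
# reads the sequence back out, shortest to longest.
def read_by_fragment_length(template):
    # One terminated fragment per position: (fragment length, terminating base)
    fragments = [(i + 1, base) for i, base in enumerate(template)]
    return "".join(base for _, base in sorted(fragments))

print(read_by_fragment_length("GATTACA"))  # -> GATTACA
```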

But human involvement meant that this kind of sequencing was inevitably slow. Machines can do this sort of work faster than humans, so biologists began the long engineering task of automating what they did every day with pipettes and their bare hands.

Phase Two: Automation

First, researchers automated the reading of the T's, G's, C's and A's. Then, a better separation system, called capillary electrophoresis, began to take hold in the major research centers. With this system, DNA is sorted inside tubes the width of a human hair instead of within grooves carved into a gel. That allowed the automation of the DNA loading system, leading to more throughput increases and higher speeds.

Those were the technologies available until about 2006. The major research centers optimized these machines and processes and brought the cost of sequencing down to about 30 cents per 1,000 base pairs. That drop in cost was significant, but it wasn't enough. With 3 billion base pairs in a human genome, and the need for redundancy in the process, sequencing an individual's particular genome was still a multimillion-dollar dream.
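
The arithmetic behind that verdict is simple enough. In the sketch below, the eight-fold redundancy is an assumed round number, since no specific coverage figure is quoted here:

```python
# Rough arithmetic behind the "multimillion-dollar dream."
GENOME_BP = 3_000_000_000   # ~3 billion base pairs in a human genome
COST_PER_1000_BP = 0.30     # ~30 cents per 1,000 base pairs (circa 2006)
COVERAGE = 8                # each base read ~8 times for redundancy (assumed)

total_cost = GENOME_BP * COVERAGE / 1000 * COST_PER_1000_BP
print(f"${total_cost:,.0f}")  # -> $7,200,000
```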

Phase Three: Next-Generation Sequencers

So-called next-generation sequencers from ABI, Illumina, and 454 are now pushing the cost curve down yet again. Jeff Schloss, program director at NHGRI in charge of technology, noted that the new technologies allow for a far greater number of samples in every run: where previous technologies allowed at most a few hundred samples per run, the new ones allow up to a million.

These new sequencers don't use the old chain-termination paradigm. Instead, 454's technology, for example, binds DNA to small beads, which are dropped into tiny wells on a fiber-optic chip. In that state, the DNA is essentially waiting to add another molecule to its chain. When that happens, the sequencer detects which wells incorporated a particular base (a T or a G, say), indicating which base comes next in the sequence. This technique is called sequencing-by-synthesis.
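
A toy model makes the flow-and-detect idea concrete. The sketch below is a simplified illustration, not 454's actual chemistry or software, and it ignores base-pairing complements and signal strength:

```python
# Toy model of sequencing-by-synthesis: nucleotides are washed over the wells
# in a fixed cycle, and a burst of signal in a well means the flowed base was
# just incorporated into that well's growing strand.
def sequence_by_synthesis(template, flow_order="TACG"):
    assert set(template) <= set(flow_order), "template must use only A, C, G, T"
    read = []
    pos = 0
    while pos < len(template):
        for base in flow_order:
            # Runs of the same base (e.g. "TT") incorporate in a single flow.
            while pos < len(template) and template[pos] == base:
                read.append(base)
                pos += 1
    return "".join(read)

print(sequence_by_synthesis("GATTACA"))  # -> GATTACA
```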

While this process allows for high-throughput sequencing, it does come with the downside that it generates only tiny fragments of DNA data – just a few hundred base pairs long. Given that even tiny genomes include millions of base pairs, assembling those little bits into a complete genome is a major task.

That assembly work falls to computers, specifically computers running bioinformatics software to assemble DNA sequences into genomes. Right now, to complete a single genome, the same DNA needs to be run through a sequencer like 454's dozens of times before there's enough information to puzzle out the right relationships between short sequences.
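
In miniature, the puzzle looks like the sketch below: a greedy toy assembler that assumes error-free reads. Real assemblers must also cope with sequencing errors, repeated regions and billions of fragments:

```python
# Toy greedy assembler (an illustration of the bioinformatics step, not any
# production tool): repeatedly merge the two reads with the largest overlap
# until one sequence remains.
def overlap(a, b):
    """Length of the longest suffix of a that is also a prefix of b."""
    for n in range(min(len(a), len(b)), 0, -1):
        if a.endswith(b[:n]):
            return n
    return 0

def assemble(reads):
    reads = list(reads)
    while len(reads) > 1:
        # Find the pair of reads with the biggest overlap and merge them.
        n, a, b = max(((overlap(a, b), a, b)
                       for a in reads for b in reads if a != b),
                      key=lambda t: t[0])
        reads.remove(a)
        reads.remove(b)
        reads.append(a + b[n:])
    return reads[0]

print(assemble(["GATTAC", "TTACAG", "ACAGAT"]))  # -> GATTACAGAT
```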

Still, the NHGRI researchers we spoke with were confident that the next several years would yield major increases in speed and reductions in cost with the current technology.

"We have grantees that are working on systems that can increase the throughput of machines like those produced by 454 or ABI and potentially bring genomes down to ten thousand dollars," Schlossen said. "But they are really interesting tweaks to them."

Phase Four: The Next Generation

The next generation of technologies could come from a new set of companies like Pacific Biosciences and Helicos. The latter company's technology, Schloss said, promises to deliver very long sequences of base pairs, perhaps up to 100,000 bp. That would allow scientists to spot new types of patterns as well as make assembling genomes much easier. Schloss expects that technology to become viable within five years.

What's clear is that the DNA sequencing technology pipeline is deep and ready to deliver innovation and reduced cost for years to come. Within the next decade, nanopores, tiny holes about 1.2 nanometers across, combined with new microscopy techniques, could even allow scientists to "read" individual DNA bases as easily as we read the letters A, C, T, G.

"The ultimate goal would be to use electronic sequencing," Schlossen said. "You'd take genomic DNA, thread it through the pore, and get an electronic readout."

Images: 1. An ABI 3100 sequencer, from flickr/Beige Alert. 2. George Church's visualization of the rise of computing power and the ability to sequence base pairs of DNA, from Nature Genetics.