We're getting so good at sequencing DNA that it's creating a data crisis, as we struggle to store everything coming off the sequencing machines. Now, however, some researchers have reversed the process, taking some information—an entire book—and storing it in the form of DNA. They find that DNA is actually a very efficient storage medium, although reading and writing remain a hassle.

Why would you want to stick information in DNA? For one, if properly stored, it can be stable for centuries and even millennia, which is as good as most books and better than most digital storage media. And, as the authors point out, we don't have to worry about having any sort of special mechanism (like, say, a DVD player) to get the data back out: "DNA’s essential biological role provides access to natural reading and writing enzymes and ensures that DNA will remain a readable standard for the foreseeable future."

Finally, DNA can easily be stored in a three-dimensional space, something that can't be done with most other forms of media. Earlier methods of storing data in DNA produced storage densities that were competitive with some of the most elaborate forms of digital storage yet tried, like quantum holography and bits made from 12 atoms. But the earlier DNA work was done with less advanced technology, and the new work uses the latest and greatest to impressive effect: an improvement of two orders of magnitude in storage density.

The process was pretty simple. The researchers first converted a book, containing 53,000 words and 11 JPEGs, into HTML, throwing in a bit of JavaScript for good measure. The HTML took up 5.3 megabits, which were translated to a DNA sequence, one base per bit. (A and G were one, C and T zero.) That stream of bases was then split up into 96-base chunks. Each chunk was linked to a 19-base address, which indicated where within the data it resided, and was then flanked by 22-base sequences that allowed it to be amplified by PCR. All of these sequences were then synthesized using a current DNA synthesis machine and printed on a DNA chip.
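The chunking scheme above can be sketched in a few lines of Python. This is only an illustration of the layout described in the article: the choice of which base in each pair represents a given bit, the address encoding, and the primer sequences are all placeholders, not the paper's actual values.

```python
DATA_LEN = 96    # data bases (one bit each) per chunk
ADDR_LEN = 19    # address bases per chunk

def bits_to_bases(bits):
    """Map each bit to a base: 1 -> A or G, 0 -> C or T.
    Here we simply alternate within each pair; the real scheme
    is free to pick either base of the pair."""
    out = []
    for i, b in enumerate(bits):
        if b == "1":
            out.append("A" if i % 2 else "G")
        else:
            out.append("C" if i % 2 else "T")
    return "".join(out)

def encode(bitstream,
           primer5="ACGT" * 5 + "AC",   # placeholder 22-base PCR site
           primer3="TGCA" * 5 + "TG"):  # placeholder 22-base PCR site
    """Split a bit string into 96-bit chunks, prepend a 19-base
    address giving each chunk's position, and flank the result
    with the two PCR primer sites."""
    chunks = []
    for addr, start in enumerate(range(0, len(bitstream), DATA_LEN)):
        data = bitstream[start:start + DATA_LEN]
        addr_bits = format(addr, f"0{ADDR_LEN}b")  # chunk index as 19 bits
        chunks.append(primer5 + bits_to_bases(addr_bits)
                      + bits_to_bases(data) + primer3)
    return chunks
```

Each full-length chunk comes out to 22 + 19 + 96 + 22 = 159 bases, and the 19-base address is enough to index every 96-bit chunk of the 5.3-megabit file.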

To get the data back out, the researchers just used standard, high-throughput DNA sequencing. Normally, that's aimed at genomes that are far, far larger than this book. As a result, each bit was sequenced an average of 3,000 times, which provides a rather impressive degree of error correction.
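That deep coverage corrects errors for a simple reason: a random sequencing mistake at any one position is outvoted by thousands of correct reads of the same position. A minimal sketch of the idea, assuming the reads are already aligned and of equal length (a real pipeline does the alignment first):

```python
from collections import Counter

def consensus(reads):
    """Call each position by majority vote across aligned reads.
    With ~3,000 reads covering each base, a handful of random
    per-read errors at any position are simply outvoted."""
    return "".join(Counter(col).most_common(1)[0][0]
                   for col in zip(*reads))
```

For example, `consensus(["GATC", "GATC", "GTTC"])` recovers `"GATC"` even though one read has an error at the second position.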

That said, there were 10 errors in their final results. These tended to occur in long runs of a single base, like GGGGGG. Since the setup allows alternate coding of each bit—G and A are equivalent—they could always go back and redesign the sequence to break these runs up, changing GGGGGG to GAGAGA.
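Because each bit has two bases to choose from, an encoder can always pick the base that differs from the previous one and avoid homopolymer runs entirely. A small sketch of that idea (the greedy base-picking rule here is my own illustration, not the paper's procedure):

```python
def encode_no_runs(bits):
    """Map bits to bases using the two-bases-per-bit freedom
    (1 -> A/G, 0 -> C/T), always picking whichever base of the
    pair differs from the previous one. Since the pairs share no
    bases, this guarantees no two adjacent bases are identical."""
    out = []
    prev = ""
    for b in bits:
        pair = ("G", "A") if b == "1" else ("C", "T")
        base = pair[0] if pair[0] != prev else pair[1]
        out.append(base)
        prev = base
    return "".join(out)
```

A run of six 1-bits now comes out as GAGAGA instead of GGGGGG, and decoding is unchanged: any A or G still reads as a 1, any C or T as a 0.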

Obviously, reading and writing to DNA took a lot longer than doing the same to a disk—days instead of a fraction of a second. But the final density was 5.5 petabits per cubic millimeter, over 1,000,000 times that of a typical hard disk. And that's without any form of compression. The error rate was a bit high, but the authors have already identified a way of correcting for it, as mentioned above. They also noted that it would be very easy to add a parity bit to their sequence, which would make catching additional errors even easier.
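The authors only note that a parity bit would be easy to add; one standard way to do it is to append a single bit per chunk that makes the total number of 1s even, so any single flipped bit is detectable on readout:

```python
def add_parity(bits):
    """Append one even-parity bit: the total count of 1s in the
    result is always even."""
    return bits + str(bits.count("1") % 2)

def check_parity(bits_with_parity):
    """Return True if the chunk's parity still checks out, i.e.
    no single bit has flipped since encoding."""
    return bits_with_parity.count("1") % 2 == 0
```

A parity bit detects any single-bit error in a chunk but can't locate it; combined with 3,000-fold sequencing coverage, flagged chunks could simply be re-read.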

The big issue is really cost, since making DNA is still much more expensive than flipping a digital bit. But if this were used purely for archival purposes, the slow speed and high cost might not be as much of a barrier, given that the end result would last for centuries.

Science, 2012. DOI: 10.1126/science.1226355 (About DOIs).