Researchers have created a way to store data in the form of DNA, which can last for tens of thousands of years.

The encoding method makes it possible to store at least 100 million hours of high-definition video in about a cup of DNA, the researchers said in a paper published in the journal Nature this week.

The researchers, from UK-based EMBL-European Bioinformatics Institute (EMBL-EBI), claimed to have stored encoded versions of an .mp3 of Martin Luther King's "I Have a Dream" speech, along with a .jpg photo of EMBL-EBI and several text files.

"We already know that DNA is a robust way to store information because we can extract it from wooly mammoth bones, which date back tens of thousands of years, and make sense of it," Nick Goldman, co-author of the study at EMBL-EBI, said in a statement. "It's also incredibly small, dense and does not need any power for storage, so shipping and keeping it is easy."

Reading DNA is fairly straightforward, but writing it has been a major hurdle. There are two challenges. First, using current methods, it is only possible to manufacture DNA in short strings. Second, both writing and reading DNA are prone to errors, particularly when the same DNA letter is repeated.
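To see how easily repeated letters arise, consider a minimal Python sketch. It assumes the simplest possible mapping, two bits per DNA letter, which is an illustration only and not the scheme from the paper. The function names and the fragment size are hypothetical. Runs of identical bytes turn directly into runs of the same letter, and a long sequence has to be cut into short pieces before it could be manufactured.

```python
# Illustrative assumption: a naive mapping of two bits to one DNA letter.
# This is NOT the code described in the Nature paper.
BASES = "ACGT"


def naive_encode(data):
    """Map each pair of bits to one letter, most significant bits first."""
    letters = []
    for byte in data:
        for shift in (6, 4, 2, 0):
            letters.append(BASES[(byte >> shift) & 0b11])
    return "".join(letters)


def longest_run(seq):
    """Return the length of the longest stretch of one repeated letter."""
    best = run = 0
    prev = None
    for letter in seq:
        run = run + 1 if letter == prev else 1
        best = max(best, run)
        prev = letter
    return best


def split_fragments(seq, size):
    """Cut a long sequence into pieces of at most `size` letters."""
    return [seq[i:i + size] for i in range(0, len(seq), size)]


sequence = naive_encode(b"\x00\x00")
print(sequence, longest_run(sequence))  # AAAAAAAA 8
```

Two zero bytes become eight consecutive As, exactly the kind of repeat that is said to trip up both writing and reading.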

Goldman and co-author Ewan Birney, associate director of EMBL-EBI, set out to create a code that overcomes both problems. The new method requires synthesizing DNA from the encoded information. EMBL-EBI worked with California-based Agilent Technologies, a maker of electronic and bio-analytical measurement instruments such as oscilloscopes and signal generators, to transmit the data and then encode it in DNA.

Agilent downloaded the files from the Web and then synthesized hundreds of thousands of pieces of DNA to represent the data. "The result looks like a tiny piece of dust," said Emily Leproust of Agilent.

Agilent then mailed the sample to EMBL-EBI, where the researchers were able to sequence the DNA and decode the files without errors.

This is not the first time DNA has been shown to be an effective method of storing data. Last fall, researchers at Harvard University demonstrated the ability to store 70 billion copies of a book in HTML form in DNA binary code.

The researchers created the binary code through DNA markers to preserve, in DNA, the text of the book Regenesis: How Synthetic Biology Will Reinvent Nature and Ourselves.

The difference between the two studies is that EMBL-EBI was the first to present an error-correcting code (ECC) that converts zeros and ones to As, Gs, Ts and Cs, according to Goldman, who added that neither team knew of the other's research at the time.

Genetic data is encoded as a sequence of nucleotides recorded using the letters G, A, T, and C, which represent guanine, adenine, thymine, and cytosine.
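The article does not give the details of the EMBL-EBI code, so the following Python sketch shows only one plausible way to turn zeros and ones into As, Cs, Gs and Ts without ever repeating a letter. It assumes a fixed six base-3 digits per byte and, for each digit, a choice among the three letters that differ from the previous one. The function names and the starting letter are hypothetical, and the sketch contains no error correction.

```python
# Hypothetical sketch: bytes -> base-3 digits -> DNA letters with no repeats.
# An assumption for illustration, not the published EMBL-EBI code.
BASES = "ACGT"
TRITS_PER_BYTE = 6  # 3**6 = 729 >= 256, so six base-3 digits cover one byte


def bytes_to_trits(data):
    """Write each byte as six base-3 digits, most significant first."""
    trits = []
    for byte in data:
        digits = []
        for _ in range(TRITS_PER_BYTE):
            byte, digit = divmod(byte, 3)
            digits.append(digit)
        trits.extend(reversed(digits))
    return trits


def trits_to_bytes(trits):
    """Inverse of bytes_to_trits."""
    out = bytearray()
    for i in range(0, len(trits), TRITS_PER_BYTE):
        value = 0
        for digit in trits[i:i + TRITS_PER_BYTE]:
            value = value * 3 + digit
        out.append(value)
    return bytes(out)


def encode(data, start="A"):
    """Pick each letter from the three that differ from the previous one."""
    prev = start  # virtual previous letter, not emitted
    letters = []
    for trit in bytes_to_trits(data):
        choices = [b for b in BASES if b != prev]
        prev = choices[trit]
        letters.append(prev)
    return "".join(letters)


def decode(seq, start="A"):
    """Recover the base-3 digits from the letter transitions."""
    prev = start
    trits = []
    for letter in seq:
        choices = [b for b in BASES if b != prev]
        trits.append(choices.index(letter))
        prev = letter
    return trits_to_bytes(trits)
```

With this mapping, encode(b"\x00\x00") gives CACACACACACA rather than a run of one letter, at the cost of six letters per byte instead of four.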

Goldman said the two teams used similar techniques to store the data in DNA, the main difference being that the ECC made the DNA-storage approach more practical to use.

"We invented an [ECC] that was specially tailored to deal with the types of errors that sequencing technologies -- both synthesis (writing) and sequencing (reading) -- tend to make," Goldman said. "That was important: our experiment worked essentially perfectly, whereas Church's [Harvard's] team experienced errors -- loss of information."

Goldman noted that EMBL-EBI demonstrated that its encoding scheme could be used to store vastly more information than the experiment did.

"We could in principle store all the digital information in the world, billions of times over," Goldman said.

Goldman's team also analyzed the cost-effectiveness of the technology and made projections that enabled them to suggest what the DNA-storage medium might be practically useful for in the near future. Examples include globally and nationally important information of historical value and, in the medium term, information of high personal value that you want to preserve for a couple of generations, such as a wedding video for grandchildren to see.

In contrast, the Harvard researchers stored 5.5 petabits, or about 5.5 million gigabits, per cubic millimeter in the DNA storage medium. Because of the slow process for setting down the data, the researchers consider the DNA storage medium suitable only for data archive purposes -- for now.

"The total world's information, which is 1.8 zettabytes, [could be stored] in about four grams of DNA," Sriram Kosuri, a senior scientist at Harvard's Wyss Institute and senior author of the paper explaining the science, said at the time.

Researchers are pursuing methods of storing data in smaller and smaller packets because of the tremendous growth of data.

During the next eight years, the amount of digital data produced will exceed 40 zettabytes, which is the equivalent of 5,200GB of data for every man, woman and child on Earth, according to the latest Digital Universe study by research firm IDC.
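The storage figures quoted above can be sanity-checked with a few lines of Python. The sketch assumes decimal SI prefixes (1GB is 10**9 bytes) and takes the quoted numbers at face value. The variable names are illustrative.

```python
# Decimal SI prefixes are assumed throughout (1 GB = 10**9 bytes).
GIGABYTE = 10**9
EXABYTE = 10**18
ZETTABYTE = 10**21

# Harvard estimate: 1.8 zettabytes in about four grams of DNA.
world_data_bytes = 18 * ZETTABYTE // 10
bytes_per_gram = world_data_bytes // 4
exabytes_per_gram = bytes_per_gram / EXABYTE

# IDC projection: 40 zettabytes, or 5,200GB per person.
projected_bytes = 40 * ZETTABYTE
per_person_bytes = 5200 * GIGABYTE
implied_population = projected_bytes / per_person_bytes

print(f"{exabytes_per_gram:.0f} exabytes per gram of DNA")          # 450
print(f"{implied_population / 1e9:.1f} billion people implied")     # 7.7
```

The per-person figure implies a world population of roughly 7.7 billion, and the Harvard estimate works out to about 450 exabytes per gram.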

The majority of data between now and 2020 will be produced not by humans but by machines as they talk to each other over data networks. That would include, for example, machine sensors and smart devices communicating with other devices.

"We've created a code that's error tolerant using a molecular form we know will last in the right conditions for 10,000 years, or possibly longer," Goldman said. "As long as someone knows what the code is, you will be able to read it back if you have a machine that can read DNA."

The researchers said the next step in development is to perfect the coding scheme and explore practical aspects, paving the way for a commercially viable DNA storage model.

Lucas Mearian covers storage, disaster recovery and business continuity, financial services infrastructure and health care IT for Computerworld. Follow Lucas on Twitter at @lucasmearian or subscribe to Lucas's RSS feed. His e-mail address is lmearian@computerworld.com.
