I recently stumbled upon the image on the right, which distills the changes in data storage in the past 40-plus years. Perhaps even more amazing to consider is that the future of data storage could become even smaller. The genetic material that stores all the information required to build a person or a pear or a penguin may be the key to creating even smaller data storage that never reaches obsolescence.

In fact, DNA is a proven data storage system with billions of years of reliable use. While your old floppy disks may now be unreadable, the tools required to read and copy DNA are present in every genome, making it unlikely that we would lose the ability to decode DNA. These advantages led scientists to ask: could DNA also be used to store other types of data? Perhaps the information that would normally be encoded by 0's and 1's in your hard drive could be stored in sequences based on ACGT's. Our genome is often compared to a computer, where DNA is the code.

The first publication to propose that DNA could be used for purposes other than building an organism came in 1999 from Bancroft and colleagues in the journal Science. They suggested that genomic steganography could be a method for storing coded messages in DNA for use in espionage. Using a simple substitution cipher where each codon equals an alphanumeric value, the researchers synthesized a DNA sequence to encode the message "June 6 invasion: Normandy". The message was flanked by sequences that allow the recipient to decode it.

The final sequence of just 109 nucleotides of DNA was hidden within denatured human DNA and, just like the predecessor microdots used in espionage, embedded on top of a period in a typewritten message.
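
The codon substitution idea can be sketched in a few lines of Python. The codon table here is hypothetical: it simply pairs characters with the 64 possible triplets in alphabetical order, and it is not the table Bancroft and colleagues used. The flanking sequences are left out.

```python
from itertools import product
import string

# Hypothetical alphabet: 26 letters, 10 digits, and four punctuation marks.
ALPHABET = string.ascii_uppercase + string.digits + " :,."
# All 64 codons in a fixed order: AAA, AAC, AAG, ... TTT.
CODONS = ["".join(p) for p in product("ACGT", repeat=3)]

# zip stops after the 40 characters, so 24 codons stay unused.
ENCODE = dict(zip(ALPHABET, CODONS))
DECODE = {codon: ch for ch, codon in ENCODE.items()}

def encode_message(text):
    """Replace each character with its codon (case is folded to upper)."""
    return "".join(ENCODE[ch] for ch in text.upper())

def decode_message(dna):
    """Read the strand three bases at a time and look up each codon."""
    return "".join(DECODE[dna[i:i + 3]] for i in range(0, len(dna), 3))

message = "June 6 invasion: Normandy"
dna = encode_message(message)
print(dna)
print(decode_message(dna))
```

Each character costs three nucleotides in this toy table, so the 25-character message becomes a 75-base strand.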

Subsequent work from Bancroft's group and others in the early 2000s suggested that DNA could help to address the need for increasing data storage. Computer scientists estimate that by 2020, there will be 4.4 x 10^22 bytes (44 zettabytes) of digital data; to give you a sense of scale, 1 ZB would be about 152 million years of high resolution video.

Even with the advances in storage potential, storing just 1 ZB requires more than 1000 kilograms of the cobalt alloy used to make hard drives. In contrast, 1 gram of DNA could store 4.6 x 10^18 bytes.

Early publications were proof-of-principle experiments that aimed to generate increasingly bigger data files encoded in DNA. The general approach, outlined above, converts a digital file to binary and then to DNA.
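
The file-to-binary-to-DNA conversion can be sketched as follows. The mapping of two bits per base (00=A, 01=C, 10=G, 11=T) is an assumption chosen for illustration; the published schemes use their own encodings and add error handling that this sketch omits.

```python
# Assumed mapping for illustration: two bits per nucleotide.
BITS_TO_BASE = {"00": "A", "01": "C", "10": "G", "11": "T"}
BASE_TO_BITS = {base: bits for bits, base in BITS_TO_BASE.items()}

def bytes_to_dna(data):
    """Digital file -> bit string -> DNA sequence."""
    bits = "".join(f"{byte:08b}" for byte in data)
    return "".join(BITS_TO_BASE[bits[i:i + 2]] for i in range(0, len(bits), 2))

def dna_to_bytes(dna):
    """DNA sequence -> bit string -> original bytes."""
    bits = "".join(BASE_TO_BITS[base] for base in dna)
    return bytes(int(bits[i:i + 8], 2) for i in range(0, len(bits), 8))

dna = bytes_to_dna(b"Hi")
print(dna)                # CAGACGGC
print(dna_to_bytes(dna))  # b'Hi'
```

With this mapping every byte becomes exactly four bases, so the round trip is lossless as long as the strand is read back without errors.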

The beginnings were admittedly small, just as scientists had to sequence the genome of E. coli before they could complete the human genome. One problem is that DNA sequencing technology is improving at much faster rates than DNA synthesis techniques. Essentially, you could read the data you stored faster and cheaper than you could write it. Creating long, accurate strands of DNA had technical and financial limitations. To circumvent this problem, George Church's lab used multiple copies of short DNA sequences to encode an entire book (53,246 words), 11 JPG images, and a JavaScript program. The paper also describes the recovery and reassembly process. The following year, a paper from Ewan Birney's lab at the European Bioinformatics Institute reported a similar approach that increased the file size and decreased decoding errors. The final DNA file consisted of 739 KB of information, including text, pictures, videos, and audio files; they also added a paper from Watson and Crick describing the structure of DNA.
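
To see why many short sequences can stand in for one long strand, here is a minimal sketch of splitting a sequence into fragments and putting them back together. The address prefix, the fragment length, and the function names are all assumptions made for illustration; the papers describe their own layouts.

```python
import random

BASES = "ACGT"

def int_to_dna(value, length):
    """Write an integer as a fixed-length base-4 string of A/C/G/T."""
    digits = []
    for _ in range(length):
        value, rem = divmod(value, 4)
        digits.append(BASES[rem])
    return "".join(reversed(digits))

def dna_to_int(dna):
    """Inverse of int_to_dna."""
    value = 0
    for base in dna:
        value = value * 4 + BASES.index(base)
    return value

def split_with_addresses(payload, chunk=12, addr_len=4):
    """Cut the payload into short pieces, each prefixed with its position."""
    pieces = [payload[i:i + chunk] for i in range(0, len(payload), chunk)]
    if len(pieces) > 4 ** addr_len:
        raise ValueError("address too short for this payload")
    return [int_to_dna(i, addr_len) + piece for i, piece in enumerate(pieces)]

def reassemble(fragments, addr_len=4):
    """Sort fragments by address and strip the address prefix."""
    ordered = sorted(fragments, key=lambda f: dna_to_int(f[:addr_len]))
    return "".join(f[addr_len:] for f in ordered)

rng = random.Random(0)
payload = "".join(rng.choice(BASES) for _ in range(100))
fragments = split_with_addresses(payload)
rng.shuffle(fragments)  # fragments in a tube have no inherent order
print(reassemble(fragments) == payload)
```

The price of short fragments is overhead: in this sketch 4 of every 16 bases carry the address rather than data.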

In July 2016, researchers from the University of Washington collaborated with Microsoft to push the limits of DNA storage again (coverage in The Verge). Their storage reached 200 MB and included copies of the Universal Declaration of Human Rights, the top 100 books from Project Gutenberg, and the Crop Trust seed database; for fun, they encoded a video from the band OK Go.

Most recently, a paper in Science from Yaniv Erlich and Dina Zielinski, who are working at the intersection of molecular biology and computer science, details a new storage architecture for more efficient DNA storage. They adapted fountain coding, which is currently used by streaming services like Netflix and Spotify to eliminate gaps in playback. The method greatly improved the storage density, getting closer to the theoretical limit for DNA storage (1.83 bits per nucleotide). Their DNA storage sample included the movie The Arrival of a Train, an entire computer operating system, a computer virus, and an Amazon gift card (which was quickly decoded by one of the researchers' Twitter followers). While the size of the data was smaller than previous attempts (only 2.2 MB), the method greatly improved data density and readability. One problem with previous storage methods is that reading the DNA leads to loss of the original sample. While it is easy to amplify DNA, it can sometimes introduce mistakes. Erlich and Zielinski's fountain technique permitted error-free amplification even after 10 complete reads. Their work achieved a density of 2.15 x 10^18 bytes, which would allow storage of all the world's data in the trunk of a car.
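
The core idea of fountain coding is that the sender emits an open-ended stream of "droplets", each the XOR of a random handful of data segments, and the receiver can rebuild the file from whichever droplets happen to arrive. Below is a toy sketch of that idea. The degree distribution, the one-byte segments, and the function names are assumptions for illustration, and nothing specific to DNA is modeled; it is not the encoder from the paper.

```python
import random

DEGREES = [1, 1, 2, 2, 3]  # toy degree distribution (assumption)

def choose_indices(seed, n_segments):
    """Derive a droplet's segment subset from its seed alone."""
    rng = random.Random(seed)
    degree = min(rng.choice(DEGREES), n_segments)
    return set(rng.sample(range(n_segments), degree))

def make_droplet(segments, seed):
    """XOR the chosen segments together; ship the seed with the result."""
    payload = 0
    for i in choose_indices(seed, len(segments)):
        payload ^= segments[i]
    return seed, payload

def decode(droplets, n_segments):
    """Peeling decoder: returns all segments, or None if more droplets are needed."""
    pending = [[choose_indices(seed, n_segments), payload]
               for seed, payload in droplets]
    solved = {}
    progress = True
    while progress and len(solved) < n_segments:
        progress = False
        for entry in pending:
            indices = entry[0]
            # Subtract (XOR out) every segment we already know.
            for i in list(indices):
                if i in solved:
                    indices.discard(i)
                    entry[1] ^= solved[i]
            # A droplet with one unknown segment left reveals that segment.
            if len(indices) == 1:
                solved[indices.pop()] = entry[1]
                progress = True
    if len(solved) < n_segments:
        return None
    return [solved[i] for i in range(n_segments)]

segments = list(b"DNA data")  # eight one-byte segments
received = []
recovered = None
for seed in range(10_000):
    if seed % 3 == 0:
        continue  # simulate a droplet that was lost along the way
    received.append(make_droplet(segments, seed))
    recovered = decode(received, len(segments))
    if recovered is not None:
        break

print(bytes(recovered))
```

No particular droplet is essential here: a third of them are thrown away and the receiver simply waits for a few more.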

Another stumbling block was that DNA was writable, but not re-writable, which limited the applications to archival data storage. Two recent papers (in Nature Communications and PNAS) report on a method that allows re-writing of DNA (bringing us from 8-track to cassette tapes) as well as reading from any point in the sample, rather than from a set starting spot (bringing us from cassette to CD).

1 gram of DNA can store 4.5 x 10^18 bytes



While there has been tremendous progress in increasing the amount and density of data storage, the major roadblock continues to be the amount of time it takes to encode and decode data in DNA. Another place where inorganic data storage beats carbon-based products is in the cost, especially of synthesizing DNA. In the most recent paper, the cost was $3,500/MB, while in the 2012 paper it was $12,400/MB.
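
To put those per-megabyte prices in perspective, here is a quick back-of-the-envelope calculation. The two rates come from the text above; applying both of them to a 2.2 MB file is just an illustrative assumption, not a figure from either paper.

```python
# Reported synthesis costs in US dollars per megabyte (from the text above).
COST_PER_MB = {"most recent paper": 3_500, "2012 paper": 12_400}

def synthesis_cost(size_mb, cost_per_mb):
    """Cost of writing a file of the given size at a flat per-MB rate."""
    return size_mb * cost_per_mb

FILE_SIZE_MB = 2.2  # illustrative file size
for label, rate in COST_PER_MB.items():
    cost = synthesis_cost(FILE_SIZE_MB, rate)
    print(f"{label}: about ${cost:,.0f} for {FILE_SIZE_MB} MB")

ratio = COST_PER_MB["2012 paper"] / COST_PER_MB["most recent paper"]
print(f"price drop: about {ratio:.1f}x")
```

Even at the lower rate, a couple of megabytes runs to thousands of dollars.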

Despite these limitations, biologists are teaming up with computer scientists to explore the future of DNA data storage. This is largely driven by the need to store increasing amounts of digital data with decreasing resources. Estimates indicate that by 2040 global memory demand (3 x 10^24 bytes) will exceed the supply of silicon necessary to build traditional data storage devices.

Obsolescence is another shortcoming of current storage methods. Just as it has become difficult to play your cassette tape collection (much to my chagrin), your old floppies and ZIP disks are not readable either. Scientists conjecture that because DNA is the basis for life on Earth, we will always have methods for DNA sequencing. This gives DNA a huge advantage for long-term archival storage. Luckily, DNA also has great fidelity over the long term. Scientists are increasingly able to recover readable sequences from ancient samples of DNA, with the best results coming from samples stored at low temperature. Thus, you could imagine a long-term storage system, like a secure server in a remote tundra, where the DNA backup disk to re-start civilization would be stable and safe.

This isn't completely crazy. The Svalbard Global Seed Vault is a huge storage site in the frozen tundra of Norway where scientists and governments are making contributions of plant seeds. The idea is to keep a stock of the original seed in case of the collapse of civilization. I am sure we could rent a shoe box-size space there for storing all the relevant files from humankind (that means there probably won't be room for cat videos). It is certain that resource limitations will continue to make DNA digital storage, born of a thought experiment over beer, not just a reality but a necessity.




