In the era of cloud storage and ever-recoverable user accounts, the idea of data just “disappearing” can seem downright odd. The EU has had to pass Right to be Forgotten legislation just to require companies to work to make it possible for data to go away. Yet given the sheer volume of data being generated and made available on the Internet these days, can that trend possibly persist?

Tweets already pass out of easy access through Search in just a few weeks’ time. The Internet is beginning to buckle under the weight of user-generated video. Can digital storage media progress fast enough to keep up with mankind’s ability to generate 1’s and 0’s?

Perhaps it doesn’t have to. After all, in DNA, evolution has come up with a highly specialized form of storage, incredibly physically compact and sometimes found intact after hundreds of thousands of years trapped in bone. DNA is nature’s hard drive, and while it’s certainly not perfect, it’s also got some cool features that beat even the most advanced digital technology. Recent advances could take DNA’s abilities in data storage from theory to practice, bringing molecular memory into the mosaic of technologies that let mankind store knowledge outside the brain.

The data “crisis”

At the end of the day, it’s a good problem to have: From the Internet to genomic sequencing, too many people want to use this new world’s rich, innovative features. It’s also a potentially debilitating problem that reduces user interest in the Internet, and puts the integrity of potentially important data at risk. If we have so much stuff to store and we can’t afford multiple redundant backups, then eventually power surges and hardware failures will lead to knowledge that fundamentally disappears.

Consider the fact that despite everything we know today, about topics ranging from nuclear fusion to black holes to genetic engineering, we still don’t know, and never will know, just what knowledge was lost in the burning of the Library of Alexandria. You can’t re-invent the thoughts of ancient people, nor can you re-discover the historical insights of unique documents and ledgers once they’ve been turned to ash. It might seem trivial now, but if a tweet passes on to be forgotten and never recovered, isn’t that an equally important sort of loss?

The US Library of Congress tried to step up and manage the full archive of Twitter posts a few years ago — but at close to half a trillion messages, the project has stalled and may still never see the light of day. YouTube execs have claimed the video platform is putting up something like 400 new hours of video every minute — a figure that, if accurate, makes it clear why Google has struggled to make the wildly successful business even modestly profitable. With wearables allowing such detailed tracking of personal metrics, this upward trend in data generation is not going to change any time soon.

DNA as next-gen data storage molecule

Back in 2012, ExtremeTech published an article on an amazing breakthrough in DNA science: researchers from Harvard University had managed to store 700 terabytes of information on just a single gram of material. It was an incredible proof of concept, and a reminder of how biology is really just genetic data given form. Yet, in the wake of that discovery, there was a surprising reaction: serious interest. It turns out that long-term storage of a whole, whole lot of data is a more pressing concern than the researchers had anticipated. Since then, they’ve gone on to set up a commercial business based around the idea.

The basic appeal is two-fold: DNA is extremely compact, packing incredible amounts of information into a tiny physical volume, and it has the capacity to last longer than any magnetic or optical signal could ever hope to.

The first of these advantages is hard to overstate: DNA can hold a lot of data. The 700-terabyte achievement mentioned above is incredible, but in no way the limit of what nucleic acids could achieve; in theory, one gram of DNA could hold up to 455 exabytes of information. That’s more than all the current digital data in the world, by a huge margin. Even if we only ever achieve 1% of this theoretical capacity, due to inefficiencies and the necessity of having multiple redundant copies for error-checking, that’s still about 4.5 exabytes per gram, or the equivalent of roughly 4.5 million one-terabyte HDDs.
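For anyone who wants to check the arithmetic, here is a quick back-of-the-envelope sketch in Python. It only uses the figures quoted above and assumes decimal units (1 exabyte = 10^18 bytes, 1 terabyte = 10^12 bytes); the constant and function names are made up for illustration.

```python
# Back-of-the-envelope check of the capacity figures quoted in the article.
# Assumption: decimal (SI) units, so 1 EB = 10**18 bytes and 1 TB = 10**12 bytes.
EXABYTE = 10**18
TERABYTE = 10**12

THEORETICAL_EB_PER_GRAM = 455    # theoretical ceiling cited in the article
DEMONSTRATED_TB_PER_GRAM = 700   # 2012 proof of concept cited in the article
ACHIEVED_FRACTION = 0.01         # the article's hypothetical 1% efficiency


def practical_capacity_bytes(theoretical_eb: float, fraction: float) -> float:
    """Bytes per gram if only `fraction` of the theoretical capacity is usable."""
    return theoretical_eb * EXABYTE * fraction


practical = practical_capacity_bytes(THEORETICAL_EB_PER_GRAM, ACHIEVED_FRACTION)
print(f"{practical / EXABYTE:.2f} EB per gram")
print(f"{practical / TERABYTE:,.0f} one-terabyte drives")

# How far the theoretical ceiling sits above the 2012 demonstration.
ratio = THEORETICAL_EB_PER_GRAM * EXABYTE / (DEMONSTRATED_TB_PER_GRAM * TERABYTE)
print(f"theoretical / demonstrated = {ratio:,.0f}x")
```

The 1% case works out to 4.55 exabytes per gram, and the theoretical ceiling is about 650,000 times the 2012 demonstration, which is why there is so much headroom left to chase.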

The second advantage is longevity: DNA can be remarkably long-lived. This is a bit counter-intuitive, since DNA is actually quite fragile and is notorious for breaking while you’re trying to work with it. DNA isn’t durable, in that you have to keep it in fairly peaceful conditions, but it is stable, in that if you do keep it safe it will remain intact for, potentially, millions of years. Fossilized bone has managed to keep samples safe for tens and even hundreds of thousands of years, so scientists working with high-quality glass and vacuum tubes should be able to come up with something as well.

Making and replicating DNA data has never been easier, with automated systems for creating a tailored DNA molecule from a digital code, and high-throughput replication techniques that can create thousands of copies in just an hour or two. Much of the credit has to go to biological evolution, of course, but also to the scientists who have managed to make use of biology’s highly specialized solutions.
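To make “a tailored DNA molecule from a digital code” a little more concrete, here is a toy sketch of how bytes could be turned into a string of bases and back. The two-bits-per-base mapping is an illustrative assumption, not necessarily the scheme the researchers used; practical encodings typically add redundancy and error correction, and the `encode` and `decode` helpers are hypothetical names.

```python
# Toy illustration only: map every 2 bits to one of the four DNA bases.
# This mapping is an assumption for demonstration; real schemes differ and
# add redundancy and error correction on top.
BITS_TO_BASE = {"00": "A", "01": "C", "10": "G", "11": "T"}
BASE_TO_BITS = {base: bits for bits, base in BITS_TO_BASE.items()}


def encode(data: bytes) -> str:
    """Turn bytes into a base sequence, four bases per byte."""
    bits = "".join(f"{byte:08b}" for byte in data)
    return "".join(BITS_TO_BASE[bits[i:i + 2]] for i in range(0, len(bits), 2))


def decode(strand: str) -> bytes:
    """Turn a base sequence produced by encode() back into bytes."""
    bits = "".join(BASE_TO_BITS[base] for base in strand)
    return bytes(int(bits[i:i + 8], 2) for i in range(0, len(bits), 8))


message = "DNA".encode("utf-8")
strand = encode(message)
print(strand)
print(decode(strand) == message)  # round trip recovers the original bytes
```

In this toy version each byte becomes four bases, so the three-letter message above turns into a twelve-base strand. The synthesis and sequencing machines then handle the physical writing and reading.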

The downside of DNA

On the other hand, DNA isn’t perfect. It’s great as a long-term library, but not as an interactive archive to be accessed quickly and often. In the case of a Twitter archive, DNA may be able to keep us from getting into a Library-of-Alexandria situation, but it couldn’t keep the archive searchable. Not only would the sequencing process be too slow for modern users, but the process of reading DNA introduces some small danger to the molecule itself — and the whole point is to keep this data safe.

That’s why most people are talking about DNA for time capsule-like functions. It’s certainly capable of storing the vast quantity of data that has been uploaded to YouTube, but granting access to that data is another matter.

In addition, it’s recently been pointed out that DNA’s very facility with data storage could be our undoing — we didn’t invent it, after all. There’s an almost unimaginable amount of DNA data out there in the biological world, not counting anything extra we derive from analysis of that information, and sequencing more and more of it is becoming mankind’s primary source of new, raw data. Even YouTube can’t keep up with the biomedical and pure science research sectors in terms of the volume of new data created and in need of storage on a daily basis.

DNA has more than enough storage capacity to fulfill our needs for the near- and mid-term future of data science — but storage isn’t the only thing we’re interested in doing with data. DNA likely has a part to play in keeping our knowledge and history alive for the coming decades, centuries, and millennia, but you’re not going to be running your operating system off of DNA memory any time soon.

New frontiers

In the future, data storage might end up in two distinct technological categories: long-term storage of information with relatively low accessibility, and short-term storage of searchable, easily available data. The short-term option can provide incredible speed, but unimpressive permanence. Nonetheless, to the people of the future, it may seem odd that we were ever willing to trust our digital heritage to the transient electrical states of silicon transistors, rather than the hard-nosed reliability of chemistry.