Google’s vice-president Vint Cerf has warned that all digitally stored information could be wiped out by tech upgrades, putting the sum total of human knowledge under threat. An author and scientist explains why today’s systems are so vulnerable – and how pioneers are preparing for the worst

A huge amount of the information we consume and transmit in our everyday lives is perilously ephemeral. Every second, thousands of new photographs are uploaded to social media. Most of the images we take today are uploaded straight from a digital camera or a phone, with the picture never actually existing as a physical artefact.

So how will future historians and biographers piece together our lives and times without bundles of diaries, paper letters and professional correspondence? Family photos and emails are important to us personally, but what about more significant losses of our collective heritage? How do we preserve our interaction on Facebook, Twitter, comment threads and citizen journalism across the web? And does the “grey literature” of official reports, briefings and policy statements that are only published online also risk being lost to the future? In a speech last week, Google’s vice-president Vint Cerf warned that a whole century of digital material could be lost.

There are some attempts to preserve this digital data. In 2010, the US Library of Congress signed an agreement with Twitter to archive public tweets sent since the platform’s birth in 2006, and to continue preserving tweets to make this data available for analysis and research. In the UK, the British Library is taking bold steps to rectify what it refers to as the “digital black hole”, where information is lost once it is taken down from a webpage or an entire site shuts down. Since 2004, it has been working to archive websites for future generations, just like paper-based literature. This effort received a huge boost in 2013 when the non-print legal deposit regulations came into force and allowed the British Library, as well as the five other UK deposit libraries, including those at Oxford and Cambridge universities and Trinity College Dublin, to archive all digitally published material. Nearly 5m UK-based websites will be preserved for the historical record, with regular snapshots taken so future historians can track how webpages evolve over time. Online retailers are also getting in on the act – services such as Blurb.co.uk or MySocialBook.com will print a physical photo album from Facebook posts.

But it is not just words and images that we risk losing for ever. Huddie William Ledbetter was an influential American folk and blues musician at the turn of the 20th century, admired as the king of the 12-string guitar. As Lead Belly he is included in the Rock and Roll Hall of Fame in Cleveland, and is considered the godfather of modern music; Elvis Presley, Johnny Cash, Led Zeppelin, the White Stripes, Red Hot Chili Peppers and Nirvana have all covered his tracks. Yet, sadly, many of his original recordings have already been lost to time. Tapes of his sessions have degraded beyond salvaging – the recording on a tape is stored as a magnetic imprint in a thin film of metal oxide, and if this delicate coating flakes off, the music is irretrievably lost.

Lead Belly. Photograph: Michael Ochs Archives

The sound archive at the British Library is one of the largest such repositories in the world, and the archivists here estimate that around two million of their recordings are fragile and at risk of being lost for ever. These historical recordings exist on large reel-to-reel tapes, cassettes, lacquer discs and even wax cylinders, and are vulnerable not just to physical degradation, but also to obsolescence and the disappearance of the technology needed to play them. If archivists don’t get to the deteriorating media soon, the very act of trying to copy a recording could destroy it in the process.

Similarly, deciding on the best format in which to preserve them for the next hundred years relies on anticipating what technology is likely to still be available in the future. Computer hard disks can hold vast amounts of digitised information, but everything is lost if a disk fails or is wiped. Nasa has had great problems trying to recover and archive old information gathered by its space probes, simply because the knowledge of which archaic format the images and data had been saved in had been lost.

The sound archives don’t save just music, but recordings of pivotal speeches, oral histories, dying languages and sounds of rare or extinct wildlife. But how far should this information conservation extend? How do you decide what cultural output is worthy of being preserved? Are YouTube vloggers such as Zoella or LOLcats-style internet memes worthy?

Does a picture of a cat in a sink have any cultural worth? Photograph: Florchina/Getty Images/RooM RF

Perhaps we should be thinking not just about our personal or cultural ephemera, but attempting to preserve a core kernel of human knowledge in case the worst were to happen. Plenty of once-great civilisations have collapsed, and our current industrialised society is by no means invulnerable – in fact, due to the intricate interconnectedness of production and economies around the world today, our technological civilisation is perhaps more prone to a sudden collapse than other societies through history. We buy life assurance to help provide for those left behind if we die suddenly; surely it is also rational for us collectively to safeguard our informational heritage, accumulated over the centuries, to help accelerate the recovery of the society after our own?

In fact, there is nothing new about the idea of protecting fragile human knowledge against a global catastrophe. The early encyclopedia compilers of the mid-1700s were acutely aware of the volatility of knowledge and of how the ancient civilisations of Egypt, Greece and Rome had collapsed, leaving behind only fragments of their writing. Denis Diderot specifically considered his Encyclopédie a safe repository of knowledge in case of cataclysm, and compiled not just explicit knowledge but also detailed diagrams of craft skills and practical knowhow.

So how could we improve on such efforts today? Wikipedia is a phenomenal monument to what can be achieved by collective human effort; a bank of more than 4.7m English articles compiled by volunteers writing and editing each other’s work without top-down editorial coordination. Internet theorist Clay Shirky estimates that Wikipedia represents about 100m hours of labour, and a comparison run in 2005 by the science journal Nature found that Wikipedia was comparable in accuracy to the Encyclopaedia Britannica. A tongue-in-cheek Wiki page on the Terminal Event Management Policy proposes the rapid export of the online encyclopaedia to physical media in the event of a global catastrophe. In 2014, PediaPress launched a crowdfunding scheme on Indiegogo to raise $50,000 to print Wikipedia in 1,000 books of 1,200 pages each, then send this exhibition on an international tour. Unfortunately, this project hasn’t yet come to fruition.

But even though Wikipedia represents a vast repository of information, it is not structured in a way that would guide a post-catastrophe society through stages of recovery. James Lovelock, the originator of the Gaia hypothesis on the natural regulation of the Earth’s climate, argued in 1998 for a Book for All Seasons – a textbook of the most crucial human knowledge, structured in a logical progression. This notion has been picked up by Kevin Kelly, a former editor of the Whole Earth Review and a co-founder of Wired magazine, with his idea of the Library of Utility on a remote mountaintop. The Long Now Foundation has already started collecting volumes for its Manual for Civilisation.

The Svalbard Global Seed Vault. Photograph: Sergio Pitamitz/National Geographic Creative/Corbis

It is not just factual information that we need to preserve, but also genetic information. The high-yielding crops we grow today are the product of countless generations of artificial selection – ancient genetic tinkering – as we hacked the life cycle of plant species to better serve our own ends. Even disregarding the chance of a global catastrophe, preserving seeds of many varieties of the world’s crop species, as well as wild relatives, as a reserve of genetic diversity will be vital in making sure we can continue to grow food productively as the climate changes. The Svalbard Global Seed Vault, on the remote island of Spitsbergen deep within the Arctic circle, was constructed as an agricultural “save file” specifically in case of a global crisis, and stores around 1.5m seed samples. The facility is secured by blast-proof doors, and was built into the side of a mountain so that even if power is lost the permafrost will keep the seeds naturally refrigerated for centuries.

While books printed on paper are vulnerable to damp or fire, they actually represent a pretty good medium for a long-duration repository of human knowledge compared to inscriptions on granite slabs or computer drives. Books store a relatively high density of information without being too bulky, and no special equipment is required for accessing it. But we are on the brink of game-changing technology: 3D printing, a capability that would never have been dreamed of by the 18th-century encyclopedia compilers trying to describe the making and use of crucial tools. Perhaps in the near future, all that will be needed to reboot civilisation will be a vault with a 3D printer in the corner, a resilient computer database storing designs and key instruction manuals, and a big print button on the wall. The facility could manufacture a quick-start kit for accelerating development – the tools needed to make more tools. Let’s hope that civilisation never needs it.

Lewis Dartnell is a research fellow at the University of Leicester. His latest book, The Knowledge: How to Rebuild our World After an Apocalypse – www.the-knowledge.org – is out in paperback on 5 March. Order a copy for £7.29 including free p&p at bookshop.theguardian.com.
