Brewster Kahle, founder of the Internet Archive, a nonprofit organization devoted to preserving Web pages, at his book repository in Richmond, Calif., in 2012. Kahle has started amassing physical texts in case they're needed for future digitization — and because he abhors throwing them out. Lianne Milton/The New York Times/Redux

At The Guardian’s 2013 Activate conference in London, the computer scientist and Internet founder Vint Cerf, when asked about the future of libraries in the digital age, expressed concern. “I am really worried right now about the possibility of saving bits but losing their meaning and ending up with bit rot,” he said. “You have a bag of bits that you saved for a thousand years, but you don’t know what they mean because the software that was needed to interpret them is no longer available or it’s no longer executable … This is a serious, serious problem, and we have to solve that.”

“Bit rot”? The term is nightmarish, conjuring images of a computer system gone haywire, cannibalizing itself from the inside. The phenomenon it describes (the self-erasure of computer bits, caused by aging software’s obsolescence, leading to an irrevocable loss of data) directly contradicts the popular belief that digital data are permanent. By comparison, the fire at the Library of Alexandria was more straightforward.

But bit rot, and its perceived threat, is contested in the library and archival communities. Some say it exists, while others call it a joke, “the digital equivalent of ‘my dog ate it.’” Even among the believers, its definition is murky, and the competing versions often contradict one another. The tech blog Ars Technica describes it as “a random bit here or there” flipping and erasing itself, while Cerf’s description relies more on the planned obsolescence of the software used to read those bits.

Compared with paper, digital media can degrade astonishingly quickly. Floppy disks from 1985, the Software Preservation Society notes, “are frequently found rotten.” Meanwhile, the Abusir Papyri, a series of administrative documents dating to ancient Egypt’s Old Kingdom, are more than 4,000 years old and still legible.

Jane Mandelbaum, project manager of the Library of Congress’ IT office, is emphatic when she tells Al Jazeera, “‘Bit rot’ is not a term that we use in the library. It’s not a term that we use in the IT part of our IT infrastructure.” “We talk about bit preservation,” says Leslie Johnston, chief of the library’s repository development.

Why not talk about bit rot? According to Thomas Youkel, chief of the library’s systems engineering and networking, the term is misleading. Bit degradation is, by design, expected. “Statistically, it’s more likely that a bit is going to change. If you lose one pixel, it’s not a bad thing. You’d still have a picture … This is a technical term, but if you lose a bit in a pointer, you might lose something.” (A pointer is a value that tells a program where other data is stored, so corrupting one can make that data unreachable.) The loss of one bit, then, is more akin to the loss of a page number in a book’s index: irritating but hardly a guaranteed disaster.

Nancy McGovern, head of curation and preservation services at the Massachusetts Institute of Technology, shares this ambivalence. “Bit rot is an issue for digital content,” she writes in an email, but preservationists guard against this by making many digital copies of an original object and its data and storing these copies across multiple locations. “Bit rot can affect an object, but not all copies would degrade at the same rate,” she says.

Creating these copies is key to the digital preservation process. The Library of Congress’ Carl Fleischhauer says, “Our stratagem is to immediately migrate the content” received onto “safer, more secure storage systems.” At the Library of Congress, checksums (“a mathematical way of saying that this is the state of the file,” explains Johnston) are used to monitor the material over time. Data received on more outdated and vulnerable formats, such as personal hard drives or CD-ROMs, are transferred to disk images, after which labels are created and photographed for documentation purposes. The labels are monitored for degradation alongside the data they describe. Throughout, Youkel says, “you have to actively manage the data. And that’s what we do.”

National and academic libraries monitor their on-site systems. But as digital formats like e-books become increasingly popular, prompting public libraries to make the transition from analog to digital, the real threat might be a question of ownership and accessibility, not bit rot.
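
The checksum monitoring described above, combined with the practice of keeping several copies in different places, can be sketched in a few lines of Python. This is a minimal illustration and not the Library of Congress’ actual tooling: the choice of SHA-256, the function names and the three storage sites are all assumptions made for the example. It records a checksum when a file arrives, flips a single bit in one of three copies, and then flags the copy whose checksum no longer matches.

```python
import hashlib


def checksum(data: bytes) -> str:
    """Return the SHA-256 digest of data as a hex string."""
    return hashlib.sha256(data).hexdigest()


def flip_bit(data: bytes, bit_index: int) -> bytes:
    """Return a copy of data with exactly one bit inverted."""
    corrupted = bytearray(data)
    corrupted[bit_index // 8] ^= 1 << (bit_index % 8)
    return bytes(corrupted)


def find_damaged_copies(copies: dict[str, bytes], recorded: str) -> list[str]:
    """Return names of copies whose checksum differs from the recorded one."""
    return sorted(
        name for name, data in copies.items() if checksum(data) != recorded
    )


# Hypothetical file contents; the checksum is recorded when the file arrives.
original = b"Abusir Papyri, administrative records"
recorded = checksum(original)

# Three hypothetical storage sites; one copy suffers a single flipped bit.
copies = {
    "site_a": original,
    "site_b": flip_bit(original, 10),
    "site_c": original,
}

damaged = find_damaged_copies(copies, recorded)
print(damaged)  # ['site_b']
```

Because the undamaged copies still match the recorded checksum, either one can be used to replace the damaged copy, which is the point of storing copies in multiple locations.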

BiblioTech, the nation’s first all-digital public library, opened its doors in Bexar County, Texas, in September 2013. Since then, it has proved popular with library patrons, and a second branch opened in January. For head librarian Ashley Eklof, maintaining digital data is not yet a concern, but it will be. “When you talk about bit rot,” she tells Al Jazeera, “I think library vendors, the digital vendors, are going to be facing that much more with how they host the material … We will, once we get digital content from independent authors,” whose e-book files will be monitored or maintained not by outside vendors but by the libraries.

For now, BiblioTech does not host its files on site, unlike the Library of Congress. Rather, it uses a cloud-based arrangement maintained by 3M Library Systems, which stores the e-books on its servers, which are out of state and not accessible to BiblioTech’s librarians. This could leave the library and its 800 e-readers without content in the event of a technological glitch or Internet failure outside its control. “If for whatever reason the Internet stops, just is not there,” says Eklof, “then it’s very difficult to ensure access to that content … [If] all that stuff is gone, then we need a … copy, whether that’s print or whether that’s an e-book on a flash drive.”

This vulnerability might explain why in 2011 the Internet Archive, a nonprofit dedicated to preserving the Web via screen captures accessible through its website, the Wayback Machine, announced that it would begin preserving paper books alongside its digital content, a “physical archive of the Internet Archive.” For all of their perceived ease and flexibility, digital archives, which rely on Internet access and electricity to preserve and present their content, are inherently less stable than their print counterparts.

This is, in essence, bit rot by design: data erasing itself after a certain amount of time.