Dark Archives

Hello and welcome to our Dark Archives tour. We’re glad you could join us. I should note right here at the outset that this is a tour of a conceptual space so things tend to get a little crowded. Please make yourself as comfortable as you can. I am required by law to inform you that this tour may involve cognitive risks that include but are not limited to epistemic uncertainty, sudden categorical shifts, and a high likelihood that more questions will be asked than answered.

On a personal note, I’d just like to say what a privilege it has been to work with this material. Dark archives are slippery beasts, and I’m grateful that the Institute has given me the opportunity to work with—whatever it is that I’ve been working with.

Are you ready? Then we can begin.

Some of you are no doubt wondering what a dark archive actually is. I can tell you, I struggle with the same question every day. The Institute’s working definition is: Dark archives are the repositories of human knowledge to which we no longer have operational access. They are the documents that have been lost even though they still exist, and the records that hold information we don’t realize is there.

Dark archives are, by their very nature, nearly impossible to see. We can really only notice them when they’ve been uncovered, or by observing the way they distort the course of human history.

First, let me show you three things that dark archives are not. On the left is an artist’s conception of the burned Library of Alexandria. That great library was once an archive, but when it was destroyed, it was destroyed utterly. It is no dark archive, it is simply gone. Proceeding clockwards, we have an artist’s rendering of the universal theory that connects gravity to quantum mechanics. This theory and countless other pieces of missing scientific knowledge are contained in no dark archive (so far as we know). They are simply unknown. They remain to be discovered. Finally, we have a screenshot of Amazon.com’s homepage. Its database of goods is vast, but Amazon invests considerable resources in ensuring that whatever is there is findable, and, through its network of affiliate links and public relations, ensuring that we know to look. Its archives are bright.

If you’ll follow me through here, we come to the statue of Donald Rumsfeld, former Secretary of Defense for the American United States. The Institute commissioned this statue because Mr. Rumsfeld has become, in a way, our patron saint.

You may recall that in the wake of the decision to conduct a retaliatory invasion of Iraq in 2002, Mr. Rumsfeld infamously tried to explain the problems around planning for war. “There are known knowns. These are things we know that we know. There are known unknowns. That is to say, there are things that we know we don’t know. But there are also unknown unknowns. There are things we don’t know we don’t know.”

Known knowns. Known unknowns. Unknown unknowns.

If you think about that formulation, you’ll see that there is an unspoken fourth quadrant. These are the unknown knowns: the things we don’t know that we know. It is appropriate to our field of study that Mr. Rumsfeld left it off.

This brings us to the first artifact of a dark archive that we have on display here. It is a replica of the August 6th, 2001 President’s Daily Brief. These documents are prepared by analysts in the American government and given to the President to keep him or her abreast of the state of the world. They are classified TOP SECRET.

The August 6 PDB was declassified as part of a political stunt during a series of hearings on the September 11th attacks, wherein terrorists acting under the leadership of Osama Bin Laden had flown jetliners into American buildings. 9/11 Commission member Richard Ben-Veniste was questioning National Security Advisor Condoleezza Rice, and—here, I’ll just play the recording for you.

BEN-VENISTE: Isn’t it a fact, Dr. Rice, that the August 6 PDB warned against possible attacks in this country? And I ask you whether you recall the title of that PDB?

RICE: I believe the title was, “Bin Laden Determined to Attack Inside the United States.”

This infamous title became a symbol of the information that the United States security apparatus had collected that should have been adequate to warn them of the impending attacks, but was not. Instead of serving as useful intelligence, that information remained hidden, only to be uncovered later as damning evidence of incompetence. The United States knew that an attack was coming, but it did not know that it knew. And so the attack occurred.

As you can see, we are on tricky epistemic ground.

Lest you think that we are only interested in American military matters, consider also this ship’s logbook. I can’t recall which ship’s logbook it is and it doesn’t matter. Indeed, if you look closely, you’ll see that it is not one ship’s logbook, but rather hundreds of thousands of logbooks with billions of individual entries, spread over hundreds of commercial and governmental archives.

Why do we have the logbooks? These days, detailed climate data is extremely valuable. As humanity comes to grips with how, and how much, it is changing the climate, you hunger for historical information. This was not always the case: climatology only began to use statistical analysis around the time of World War II, and systematically collected climatological records are quite scarce for earlier years.

However, for a very long time you have been intensely interested in the well-being of the ships you sail, and so there are centuries of recorded data handwritten in globetrotting ships’ logbooks. These include both precise reckonings of location and records of the weather. Alongside information from farmers’ records and birdwatchers’ notes, these logbooks form a massive corpus of climate data which is only now beginning to be unlocked for scientific use by efforts like the Old Weather project. As these projects progress, the logbooks slowly leave our care.

These two objects, the embarrassing Presidential memo and the unexpectedly useful ship’s log, nicely describe the contours of our investigations here at the Institute.

Before we continue, I’d like to pause for a moment and commit some light theory.

Archives as commonly understood are bastions in the war on entropy. From within the archives, a holding action is fought against the ravages of time. The mission is to preserve the few scraps of knowledge, art, and memory you have clawed out of a barely intelligible universe, and to pass them down to future generations.

The basic activities of an archive are collection, storage, preservation, search, and retrieval. Together, these form interrelated but non-identical streams. When some streams outpace the others, things start to go strange: when the rate of collection and storage outpaces search and retrieval, we begin to lose access to an archive’s contents, and the archive begins to go dark.

How will we know when we have dark archives? Let Saint Rumsfeld be our guide. With his help, we can construct a taxonomy of ignorance. It is analogous but not identical to the operational ignorance of military affairs.

The search-engine-optimized collections of information that dominate public discourse and life are the known available. The information locked behind paywalls, security clearances, trade secrets, redacted reports, and dark conspiracies is the known unavailable: somebody knows this stuff, but it’s not us. Reality as it exists beyond the horizon of human understanding is the unknown unavailable.

Dark archives are the unknown available. They are the reams of knowledge that have been carefully collected and catalogued, but are effectively missing for want of a good analysis system or even clues about where to look. Their contents are effectively unknown.

Even here, our taxonomical task is complicated by realities on the ground.

There are as many ways to slice up archives into known and unknown, available and unavailable as there are factions of humanity. To the people with the right security clearance, the impending destruction of the WTC towers was unknown but available. To those with no security clearance at all, the event was simply unknowable. To the architects of the attacks, it was front and center in their structure of knowledge.

It is no coincidence that so many dark archives are or were secret archives. Secrecy requires limiting access, and as you limit access to an archive, it becomes easier for it to slip into darkness. In this, individual private collections are as vulnerable as any military vault.

Indeed, through these doors, we come to the Hall of Digital Photographs Sitting on a Disc Somewhere. If you are an avid photographer but not prone to engaging in the careful processes of adding good metadata to the thousands of images you take, you yourself may be the proud owner of a dark archive. With a large enough volume of photography, it becomes impossible to maintain good information about what the photos contain.

The relative darkness of our personal archives is always teetering in one direction or another. Perhaps you will one day install powerful image recognition software that will automatically group and categorize the gigabytes of photos. Perhaps instead you will die without taking adequate steps to ensure your digital legacy, and what little knowledge of the archive’s contents ever existed will be lost entirely.

Indeed, here past the Hall of Digital Photographs Sitting on a Disc Somewhere, we descend to the Crypts of Dead Media. Throughout history, countless means and methods of storing and retrieving information have been tried and many have been left by the wayside. We collect those and bring them here.

The crypts themselves are not safe to travel, but here on the threshold we have an original, barely working terminal from the BBC’s Domesday project. Conceived as an update to the 900-year-old Domesday Book, it is a collection of multimedia images, virtual tours, writings, and other contributions from more than a million contributors, intended to be a comprehensive survey of life in the United Kingdom. Within years, it had become nearly completely inaccessible as the computers and drives needed to access the project became increasingly rare.

For a time, the Domesday project seemed doomed to exist only as a mythological example of the dangers of archiving on unstable platforms. Several efforts were made to re-digitize it, and over and over they failed. On the project’s 25th anniversary, the BBC launched a website called Domesday Reloaded. It is sadly only a partial republishing. The Community Disc has been made available and supplemented with contemporary photos and notes, while the National Disc remains locked away in the darkness.

I said earlier that archives are at the front lines of the fight against entropy. To be precise, the entropy of decay that robs us of the scraps of knowledge we have managed to collect. Here at the Institute, we have come to understand that there is a second kind of entropy at play: the entropy of an archive grown too quickly, which outstrips its maintainers’ abilities to retrieve the information being stored.

Even Amazon, which I earlier offered up as an example of a very bright archive, must constantly struggle against a flood of new products and offerings. Spam content, algorithmically generated books, texts with titles confusingly similar to those of bestsellers, and a rising tide of new products: these are the forces that strain Amazon’s discovery algorithms and threaten to make the great online retailer’s archives go dark. The other great risk is that commerce rather than spam could overwhelm Amazon. If the company were to go out of business, its intellectual property would likely end up among the Towers of Undeleted User Data from Failed Startups.

This question of retrieval is a slippery and relative one. Your emails, stored on a simple hard drive using an unsophisticated client, may join our collection in the Unpublishable Personal Correspondence Reading Room, while service providers like Google offer powerful search features intended to return your archives to the light.

This ebb and flow—and the publicity that can accompany it—sometimes drives professional archivists mad. From time to time, your popular news media will publish a story about the “discovery” of a “lost” letter or artifact, when in fact the “missing” item was in some archive or other, neatly numbered and catalogued, all along. Indignantly, the archivists ask whether they should publish a press release about “discovering” new things every time they release better tools for indexing and retrieval.

At the Institute, we argue that yes, they should. In the moment, there is very little difference between an unknown known and any other kind of unknown. The difference emerges only later, when the unknown known resurfaces for good or ill.

Dark archives can become a liability, as we saw with the PDB. Archives that appear dark to people working day to day in an organization may be bright to an investigative team with time and a clear agenda to guide their search. Such archives are doubly costly: expensive to maintain, and liable to contain damning evidence. Our collection of Unexamined Evidence of Corporate Malfeasance is sublimely vast. No wonder so many corporations have comprehensive document destruction policies. Better to have information fall away into the unknown unknown than risk legal action based on evidence found in an unknown known.

But dark archives also contain treasures. As information retrieval systems get better, newly recovered information may be immeasurably valuable to our descendants, while the harm it might do to its creators fades behind statutes of limitations and brief human lifespans. This is why we archive, after all: our deep belief that someday, someone, somewhere, will want to know what we knew.

I have heard that archaeologists on a dig site will purposefully mark off sections to be left unexcavated. This is because they work on a long view. They want to find out what they can about the past, but they know also that their future colleagues will likely have access to better tools. So they leave some evidence undisturbed, purposefully accepting present ignorance to enhance future knowledge.

Might you create a model like that for our gathered information? Could you agree on a commons of files with time locks? Perhaps an amnesty of some kind to those who contribute to the cause? No one can be liable for what’s in them, but they will be open to future generations who will, we hope, have better tools—and better questions—than the ones we can offer.

And here I must conclude our tour. Thank you for visiting the Institute, and please feel free to explore the lighted areas on your own. Take notes and pictures if you wish.

But not too many.