In 2009, a band of rogue digital preservationists called Archive Team did their best to collect and preserve Geocities. The resulting data has become the basis for at least two works of art: Deleted City and One Terabyte of Kilobyte Age. I think the story of this data set and these works offers insights into the future roles of cultural heritage organizations and their collections.

Let Them Build Interfaces

In short, Archive Team collected the data and made the data set available for bulk download. If you like, you can also access just the 51,000 MIDI music files in the data set through the Internet Archive. Beyond that, because the data was available en masse, the corpus of personal websites became the basis for other works. Taking the Geocities data as a basis, Richard Vijgen’s Deleted City interprets the data and presents an interface to it, while Olia Lialina & Dragan Espenschied’s One Terabyte of Kilobyte Age is in effect a designed reenactment grounded in an articulated approach to accessibility and authenticity.

An Artwork as the Interface to Your Collection

Some of the most powerful ways to interact with the Geocities collection are through works created by those who have access to the collection as a data set. Working with digital objects means we don’t need to define the way that they will be accessed or made available. By making the raw data available on the web and providing a point of reference for the data set, an organization enables anyone to create interfaces to it.

How to make digital collections and objects available?

Access remains the burning question for cultural heritage organizations interested in the acquisition and preservation of digital artifacts and collections. What kinds of interfaces do they need in place to serve what kinds of users? If you don’t know in advance how to make something available, what can you do with it? I’ve been in discussions with staff from a range of cultural heritage organizations who don’t really want to wade too deep into acquiring born-digital materials without having a plan for how to make them available.

The story of Geocities, Archive Team and these artists suggests that if you can make the data available, you can invite others to invent the interfaces. If users can help figure out and develop modes of access, as this case illustrates, then cultural heritage organizations could potentially invite much larger communities of users to help work through issues around migration and emulation as modes of access as well. By making the content broadly available, organizations can broaden the network of people who might contribute to efforts to keep digital artifacts accessible into the future.

Collections and Interfaces Inside and Outside

An exciting model can emerge here. By offering data dumps of full sets of raw data, cultural heritage organizations can embrace the fact that they don’t need to provide the best interface, or for that matter much of any interface at all, for digital content they agree to steward. An organization can acquire materials or collections that are considered interesting and important even when it lacks the resources or inclination to build sophisticated interfaces to them, as long as it is willing to provide a canonical home for the data, offer information about the data’s provenance, and invest in dedicated, ongoing bit-level preservation. This approach would resonate quite strongly with a “more product, less process” approach to born-digital archival materials.

An Example: 4Chan Collection/Data set @ Stanford

For a sense of what it might look like for a cultural heritage organization to do something like this, we need look no further than a recent Stanford University Library acquisition. The acquisition of an archive of 4Chan data into Stanford’s digital repository shows how a research library could go about exactly this sort of activity. The page for the data set/collection briefly describes the structure of the data and gives some information and context about the collector who offered it to Stanford. Stanford acts as the repository and makes the data available for others to explore, manipulate and build a multiplicity of interfaces to. How will others explore or interface with this content? Only time will tell. In any event, it likely did not take many resources to acquire, and it will likely not require many resources to maintain at a basic level into the future.

How to encourage rather than discourage this?

If we wanted to encourage this kind of behavior, how would we do it? First off, I think we need more data dumps of this kind of data, with the added note that bite-size downloadable chunks of data are going to be the easiest thing for any potential user to right click and save to their desktop. Beyond that, cultural heritage organizations could embrace this example and put up prizes and bounties for artists and designers to develop and create interfaces to different collections.

What I think is particularly exciting here is that by letting go of the requirement to provide the definitive interface, cultural heritage organizations could focus more on selection and on working to ensure the long-term preservation and integrity of data. Who knows, some of the interfaces others create might be such great works of art that another cultural heritage organization might feature them in its own database of works.