What is the document as a format and a medium? Even apart from any divide between analog and digital, the document is itself a form that has its own history, one that has long been tied up in ideas about reproduction. In Paper Knowledge: Toward a Media History of Documents, Lisa Gitelman explores the history of different kinds of documents (from printed bills, notes, receipts and tickets to today’s PDFs) and the technologies used to produce and reproduce them: letterpress, mimeograph, xerography, optical scans.

In this installment of the NDSA Insights Interview series I am thrilled to talk with her about the book and, in particular, about what it has to say to those working to ensure long-term access to digital information that is often encoded in PDFs. For the purposes of this audience, our questions focus most directly on ideas about the PDF in the book.

Trevor: For starters, what exactly is a document? In particular, you talk about the function of the document as “know-show” and the “authority of documents.” Could you unpack those ideas for us a bit?

Lisa: Paper Knowledge sets out to describe documents very broadly, as instruments used in the kinds of knowing that are all wrapped up with showing, and showing wrapped up with knowing. Think about identity documents, of course, that you show to the authorities under specific conditions and to be known by them, but think too of any piece of paper you have ever squirreled away just in case and then produced to convince someone of something. Anything–not just paper–can be mobilized this way and might count as a document, but of course it is primarily the affordances and habitual uses of paper that have helped us to understand documents in the ways we do and now to create and use digital documents too.

In terms of authority, documents tend to possess power partly by dint of the institutions for which or within which they circulate, if you think about the many institutions relevant to the credit economy, civil procedure, voluntary association, medical practice, municipal governance, institutionalized education, corporate communication, etc. I wanted to think about documents in this book partly as an antidote to the contemporary fixation with “the book” as touchstone within our ongoing experience of digital mediation, but also as an antidote to “the literary” as cynosure within academic departments of English. There is a lot we need to know about vernacular texts–their pasts and their potential futures–that has nothing at all to do with books or with literature. Modernity is saturated with documents, though of course the document predates the modern; many of the earliest inscriptions found by archaeologists in the Near East are identified as “administrative,” that is, as documents.

Trevor: What do you think we learn about the PDF by considering the various historical episodes in the history of the document? I’d be particularly interested in some of the connections to microfilm, photocopying and faxing.

Lisa: I guess one of the lessons I learn over and over is that the familiarity of present conditions can prevent us from seeing those conditions critically. I understood so much more about PDFs and other digital formats by learning some of the history of earlier media for reproducing documents, like microfilm, photocopies and faxes. Not only is the utopian rhetoric that welcomed microforms in the 1930s a humbling reminder that utopian rhetoric about digital media is, well, rhetoric, but the uses and contexts of these earlier media offer instructive parallels that can help defamiliarize present conditions and so make them visible.

Thinking about microfilm as a medium dependent upon a client/server logic or thinking about photocopies as an origin point for “archive-don’t-delete” type thinking (reproduction as preservation) is of course anachronistic. But there are ways in which the histories of these earlier media feed into and enrich our sense of documents in the present, be they PDFs or other forms.

Trevor: You suggest the PDF was created with the idea of corporate authorship in mind. Could you briefly explain that and suggest some of the implications of that idea of authorship?

Lisa: Because PDF technology involves a separation between those who create files and those who merely read them–with a PDF reader application–the technology helps to structure authorship that is often corporate authorship. (The early web worked this way too, if I can generalize, since browsers are distinct from HTML-editors.) For me the quintessential PDF file is an airline boarding pass, printed out or held open on a smartphone, or else it is the manual that explains the smartphone itself, or else the quarterly statements the smartphone corporation publishes for investors.

Going way back in time, there’s a way in which technologies like this reinstall a sort of monopoly that letterpress printers once enjoyed on the look of printedness. Before typewriters and before a whole raft of documentary reproduction technologies developed after them, only printers in printing houses could print; everyone else was trapped in longhand. We’re used to hearing the present moment celebrated as one of amateur cultural production, of YouTube, selfies, and blogs, but there are ways that authorial power remains structured (pre-programmed) by the material conditions of authorship. Paper Knowledge renders some of the history of the development of PDF at Adobe Systems, yet it also tries to gesture toward a broader story, both about techniques of documentary reproduction and about the office culture of the 1990s from and for which PDFs emerged.

Trevor: At one point, you suggest that the PDF imagines its users and its users reimagine it. I would be curious to have you work through an example or two in this regard.

Lisa: I started in on this in a vague way in answer to your last question. Users tend to imagine PDFs primarily in contrast to other formats with which they are familiar: paper, yes, but also other digital formats like *.doc or *.htm or *.jpg. Like older, non-electronic formats, PDFs can feel fixed, locked, in comparison to other digital formats for text at the same time they can feel “smart” in comparison with digital formats for images.

Meanwhile, I think PDF technology imagines its users distributed across the hierarchy of an org chart, divided into authors and readers, form-makers and form-fillers. Some users resist. Some loathe PDFs as clunky and backward looking. And of course imagining is culturally and historically specific: imaginations–literally, what is imaginable–can change. As it was first imagined by Adobe, PDF technology “solved” a lot of the difficulties that office workers had in the 1990s, in part by reducing the uses of paper as well as the uses of copiers, fax machines, express mail, interoffice mail, airplanes, envelopes and paper clips. In all of our enthusiastic imagination of “the paperless office,” we tend to forget today about those airplanes and their relations with paper.

Trevor: You focus on documents, and a lot of our readers are archivists who focus on records. To what extent are these the same things with shared histories? Do you think your media history approach to documents would be synonymous with a media history of records? Or, do you think it would be something quite different?

Lisa: I have an earlier book called Always Already New: Media, History, and the Data of Culture that is about both records and documents, though cutely so, because the records in question are by and large the phonograph records used for sound recording starting in 1878. Really I think there isn’t much distance between these two terms, though focusing on documents in this recent project has allowed me to focus in particular on techniques of reproduction.

What’s important is the shared impulse to preserve and interpret that defines both records and documents. Or, better, what’s important is the shared impulse to interpret and preserve, since designating something a document or a record is already in some sense to interpret its value to history, its status as potential evidence, as archivable. Archivists are now necessarily at the forefront in thinking about digital records and what their preservation and access must entail.

Trevor: An article in the Guardian titled “Is the PDF hurting democracy?” noted that “a new report by the World Bank suggests that the venerable PDF is keeping valuable information buried in servers, unread and unloved.” From your perspective on the history of the document as media, is this a meaningful question?

Lisa: Well, it is interesting, though certainly less salient than it might have been in 2000, before Google started to index PDFs in 2001. These World Bank PDFs are findable, after all, they just aren’t mineable yet in any fully automated way, if I understand it correctly. It may help to remember that there has long been a technical literature, a “gray” literature. I’m thinking of the sort of material that circulates outside formal publishing channels, can prove problematic for cataloguers, and has a relatively short shelf-life because so soon obsolete. PDFs now inhabit a similar, gray logic, if one thinks of technical manuals, reports, price lists, college coursepacks, and–ironically–white papers. These are the kinds of documents it can be a challenge to locate, much less preserve. I don’t think that means democracy is on the skids. Today’s networked environment has helped promote a myth of total information–everything available to everyone–but a myth is just that, myth.

Trevor: Aside from the PDF being potentially problematic for democracy, a set of scientists have had ongoing meetings and discussions about getting “beyond the PDF.” In this vein, they are interested in developing document formats for scholarly communication that function less as documents and more as structured data. Given that there is this interest, I would be curious to see if you have any thoughts on what resistance in the material or format these kinds of endeavors would need to overcome.

Lisa: As unwise as it may be for me to predict the future, the PDF file has a backward-looking feel to it that begs precisely this question. It may be that future protocols for scholarly communication can displace current publication norms, I don’t know. I’m hopeful. Certainly so much is in flux right now when one considers scientific communication in particular and the success of platforms like arXiv.org. In general I guess one challenge we all face for the future is thinking of ways that the ongoing work of archives and archivists can include and adjust to databases as an important part of the reigning knowledge infrastructure within which we will continue to exist and prosper.