This almost entirely digital collection, with its unwieldy scale and multiple formats, should sound familiar to all of us. Over the past two decades, we have each become unwitting archivists for our own supersized collections, as we have adopted forms of communication that are prolific and easy to create. These accumulate over time into numbers that dwarf our printed record and can easily mount into a pile of digital files that borders on shameful hoarding. I have more than 300,000 email messages going back to my first email address in the 1990s (including an eye-watering 75,000 that I have sent), and 30,000 digital photos. This is what happens when work life meets Microsoft Office and our smartphone cameras meet kids and pets.

Now multiply those levels of modern media production by the hundreds of staffers and intense communications of the White House, and you can begin to imagine the quandary of the 21st-century presidential library—and the difficulties that await a future Robert Caro or anyone else who will want to use the Obama presidential collection. While that collection contains a smattering of objects that look like they come from the 20th century, such as handwritten edits by President Obama on drafts of important speeches, there are countless more prosaic documents and communications that might be important to understand his presidency and its era more deeply, and that will help us trace, day by day, the attention given to topics such as climate change or al-Qaeda.

Whether we like it or not, this challenge of a digital record and its enormous scale requires us to use the commensurate power of digital technology to scan and sort these documents, and to provide an interface to make sense of it all. This is not to the exclusion of Caro-like page flipping, but a billion pages are beyond the efforts of even the most dogged, caffeinated researcher. Before you can do some close reading of the Obama collection, it will be necessary to do some distant reading—working with indexing, search, and analytical tools to separate the wheat from the chaff. As Leslie Johnston, the director of digital preservation for NARA, has put it, given hundreds of millions of email messages, we will have to figure out “which of those are the relevant important records and which are, ‘Please go over to the corner and get me a sandwich.’”

The presidential-library system has always been a sandwich itself, with layers of a library and an archive but also museum and educational ingredients, and often a mixture of presidential foundations (and their funding) and public agencies such as NARA and the Smithsonian (and their taxpayer dollars). Over the past two decades, the question of who does what and who pays for what has been a persistent source of tension within the system. The addition of an almost unimaginably large digital record adds a strain that makes some rethinking of the institution a necessity. The digitization of tens of millions of documents is also enormously expensive, and David Ferriero, the archivist of the United States, has wisely pushed NARA to become more digitally capable and to make more of the American record available online.