One year ago, a group of professors, librarians, and futurists gathered in San Francisco to discuss how they would go about building a Digital Public Library of America. There were still many questions about the project, which had millions of dollars in charitable funding but hadn’t yet fleshed out a complete vision of its incarnation. The directors cited Europeana and Wikipedia as examples, but they weren’t sure how a digital library would tackle the problems unique to using published content in America. Despite the hurdles ahead, the founders of the DPLA promised at that conference that a live website would launch in April 2013, come hell or high water.

The founders of the DPLA made good on their promise this week. The organization launched a website on Thursday that allows users to browse more than two million archived books, images, records, and sounds. The content comes from the libraries of institutions like Harvard University, the Internet Archive, and the Boston, Chicago, and San Francisco Public Libraries. The DPLA also makes an API available to anyone who wants to add access to this treasure trove to a third-party application.

Last year, Ars wrote about the challenges that the Digital Public Library was facing: “The organization must be a bank of documents, and a vast sea of metadata; an advocate for the people, and a partner with publishing houses; a way to make location irrelevant to library access without giving neighborhoods a reason to cut local library funding. And that will be hard to do.”

A lot of these conflicts are still being sorted out by the DPLA community. But the new website is clean and easy to use, and it undoubtedly represents the possibility of a bright future for the digital library. The challenge now is less about searching for an identity than about getting everyone on board, from local libraries to big publishers.

Let’s get together

The newly named Executive Director of the DPLA, Dan Cohen, has no illusions about the difficulty of the task ahead. Ars spoke to him over the phone two weeks ago, before the website and the API went public. He sounded exhausted but excited. “It’s going to be a multi-decade effort,” he admitted.

In the coming years, the DPLA has a few evident goals: getting contemporary works into the database, working with state and regional libraries across the nation to bring their archives under the umbrella of the DPLA, and making sure all of the amassed metadata is clear and concise so that others can use it. The organization currently offers access to items like daguerreotypes, portraits, older scientific articles, pamphlets, and old books.

These items have been donated by other institutions and are hosted on those sites. Currently, when you search at the DPLA website, you’re taken to, say, the archive hosted by the Biodiversity Heritage Library or the Uintah County Library in Utah.

That second archive might represent the more interesting goal of the DPLA: to get small town libraries and archives online and linked up to the DPLA API. Of the 42 state and regional libraries that have digitized all or part of their archives, Cohen says seven are currently searchable through the DPLA site. “I’m going to be trying to get the next 35 to work with us,” Cohen told Ars.

State and regional library access would make America’s history available in a way it’s never been before. Imagine being able to search local newspapers for your great-grandparents’ names, or scan through early photos of cities developing during the Gold Rush. And the DPLA’s open access, which allows bulk downloading and encourages people to make use of its API, will make analyzing all that information more efficient.

It’s so meta

Making everything orderly in the metadata was the DPLA’s biggest task a year ago. It’s still one of the organization’s biggest tasks today. “Metadata can be pretty sketchy and the process behind normalizing all that metadata is a huge part of this...” Cohen said. “We have a data model that is very rigorous and flexible.”

Not everything works smoothly, of course. Last year David Weinberger, a member of the DPLA’s dev team, “described the DPLA’s ‘deep, deep problem’ of ‘duping,’ which happens when two caches of data describe the same book differently, leading to duplicates.” This continues to be a bit of an issue. When using the DPLA’s brand new site, we occasionally saw what looked to be duplicate entries for some search results. Consider these two entries for the same portrait of Edith Hamilton, below.
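To make the "duping" problem concrete, here is a minimal sketch of one common way to flag likely duplicates: normalize a couple of descriptive fields and group the records that collide. This is not the DPLA's actual ingest code. The field names (`id`, `title`, `creator`) and the sample records are assumptions invented for illustration.

```python
import re
import unicodedata
from collections import defaultdict


def normalize(text):
    """Lowercase, strip accents and punctuation, and collapse whitespace."""
    text = unicodedata.normalize("NFKD", text)
    text = "".join(ch for ch in text if not unicodedata.combining(ch))
    text = re.sub(r"[^a-z0-9\s]", " ", text.lower())
    return " ".join(text.split())


def find_duplicates(records):
    """Group record ids whose normalized title and creator both match."""
    groups = defaultdict(list)
    for record in records:
        key = (normalize(record["title"]), normalize(record["creator"]))
        groups[key].append(record["id"])
    # Only keys shared by two or more records are duplicate candidates.
    return [ids for ids in groups.values() if len(ids) > 1]


# Hypothetical records from two contributing archives (not real DPLA data).
records = [
    {"id": "a1", "title": "Edith Hamilton, portrait", "creator": "Unknown"},
    {"id": "b7", "title": "Edith Hamilton: Portrait.", "creator": "unknown"},
    {"id": "c3", "title": "Gold Rush street scene", "creator": "Unknown"},
]

duplicates = find_duplicates(records)
print(duplicates)
```

Exact matching on normalized fields only catches the easy cases. Two archives that describe the same portrait with genuinely different titles would slip through, which is presumably part of why human eyes are still needed.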

To solve this, the DPLA is hoping to rely heavily on programming volunteers and library school interns. Cohen told us that he’s already been contacted by library schools to help them set up internships to work on the ingest process (bringing new archives into the DPLA's metadata). And grassroots groups have been set up to bring programmers around the nation together to find uses for the DPLA’s API (more use of the database means more eyes potentially catching metadata oddities and duplicate works).

The big questions

For the moment, the DPLA does not keep its own archives and links out to other digital archives instead. But Cohen hasn’t ruled out eventually keeping a smaller archive of certain items: “I think for the future there’s always a potential for particular kinds of items...in particular e-books.”

Cohen is also acutely aware that he’ll have to negotiate with publishers over copyrighted works at some point. Waiting 70 years or longer for access to “contemporary” works (if you can call a 70+ year-old work contemporary) would severely undermine the DPLA’s usefulness.

“[We have] projects that are brewing to find alternative licensing for books,” Cohen told Ars. “Whether it is Creative Commons or Library License [where a publisher can designate a period of commercial use before allowing digital library access], we want to explore alternate models where publishers can drop one e-book in the library. Having one copy in the DPLA won’t be a tremendous burden and thinking forward, I think of the DPLA acting as a kind of market maker. [We’re always looking for] ways to get somewhat more contemporary works in the DPLA.”

In fact, the idea that publishers could drop “one e-book” in the library seems like the best way to provide a common ground between people seeking digital access to books and the change-averse publishing industry. Such a model already works in many local libraries, where a branch can lend out only as many copies of an e-book as it has. Users are required to check those books back in when they’re done with them.
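The lending rule described above is simple enough to sketch in a few lines. This is an illustrative model only, not how any particular library system is implemented. The class name, method names, and patron identifiers are all hypothetical.

```python
class EbookLicense:
    """Tracks loans of one title against the number of copies owned."""

    def __init__(self, title, copies_owned=1):
        self.title = title
        self.copies_owned = copies_owned
        self.borrowers = set()

    def check_out(self, patron):
        """Lend a copy if one is free; return True on success."""
        if patron in self.borrowers or len(self.borrowers) >= self.copies_owned:
            return False
        self.borrowers.add(patron)
        return True

    def check_in(self, patron):
        """Return a copy so another patron can borrow it."""
        self.borrowers.discard(patron)


# One copy "dropped in the library" by a publisher (hypothetical title).
license_ = EbookLicense("Example Title", copies_owned=1)
first = license_.check_out("patron_a")   # the only copy goes out
second = license_.check_out("patron_b")  # refused while the copy is on loan
license_.check_in("patron_a")
third = license_.check_out("patron_b")   # allowed once the copy is returned
print(first, second, third)
```

The point of the model is that the digital copy behaves like a physical one: access is limited by how many copies the library holds, not by how many people want to read it.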

Downloading works through the DPLA can also get tricky, but so far the organization has taken a Creative Commons-like approach. The website’s FAQ offers a quick explanation: “The copyright status of items in the DPLA varies. Many items are in the public domain. For individual rights information about an item, please check the Rights field in the metadata or follow the link to the digital object on the content provider’s website for more information.” For now, shunting copyright details to the database keepers works, although once the DPLA starts ramping up, more defined rules will probably be necessary.
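An app built on the metadata could follow the FAQ's advice mechanically: look for a rights statement first, and fall back to the content provider's link if there isn't one. The sketch below assumes a flat record with `rights` and `provider_url` keys. Those names and the sample records are made up for illustration and are not taken from the DPLA's actual schema.

```python
def rights_info(record):
    """Return the rights statement, or point the user to the provider."""
    rights = (record.get("rights") or "").strip()
    if rights:
        return rights
    url = record.get("provider_url")
    if url:
        return f"No rights statement; check the provider: {url}"
    return "No rights information available"


# Hypothetical records; example.org stands in for a content provider.
with_rights = {"rights": "Public domain", "provider_url": "https://example.org/item/1"}
without_rights = {"rights": "", "provider_url": "https://example.org/item/2"}

print(rights_info(with_rights))
print(rights_info(without_rights))
```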

Working together

The best part of the DPLA? At its core, this is a platform to let other people and organizations draw on the power of the many. The geolocation data included in the metadata has made beautiful maps of the archives possible on the DPLA's own site. Working groups have already sprung up to build apps around the archives, and a few apps have already been built.

One of the neatest so far is Harvard’s StackLife. It takes the data made available by the DPLA and visualizes it as a vertical bookshelf. You can scroll up and down through the blue skeuomorphic books. The depth of the blue corresponds to how often the book has been accessed, so you get a visual of what has appeared most useful to others. The thickness of the book represents how many pages it has. And the length of the book shows how long ago it was published. Although it’s still a rough app (searching sometimes leads to nothing, clicking on certain books also leads to nothing at times), it’s a pretty cool first attempt at adding depth to the DPLA archives.
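The visual encoding described above amounts to mapping three pieces of metadata onto three visual dimensions. The following is a rough sketch of that idea and not StackLife's code. The function names, the 0-to-1 output scale, and the upper bounds (1,000 accesses, 1,000 pages, 200 years) are arbitrary assumptions chosen for illustration.

```python
def scale(value, low, high):
    """Map value linearly onto 0.0-1.0, clamping outside [low, high]."""
    if high == low:
        return 0.0
    return max(0.0, min(1.0, (value - low) / (high - low)))


def book_visual(accesses, pages, year, current_year=2013):
    """Turn book metadata into the three visual dimensions."""
    return {
        "blue_depth": round(scale(accesses, 0, 1000), 2),        # usage
        "thickness": round(scale(pages, 0, 1000), 2),            # page count
        "length": round(scale(current_year - year, 0, 200), 2),  # age
    }


# A hypothetical book: 250 accesses, 400 pages, published in 1913.
visual = book_visual(accesses=250, pages=400, year=1913)
print(visual)
```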

Besides the grassroots programmers (mentioned above) who will want to build apps and services around the database, DPLA hopes to add depth to the biggest Internet resources that we have. Specifically, Cohen says that Wikipedia is looking to leverage the DPLA’s archives to add credibility to its crowdsourced information. “Wikipedia is really looking out to link out to other trusted content. I see really natural collaborations there, and with Wikipedia that’s really trying to increase the rigor [of the cited sources].”

For the DPLA, the most important thing is that public reception has been exceptionally good in the past few days. A quick glance at Twitter seems to capture this feeling. "@dpla offers a public API and bulk download facilities. I think I'm going to cry," @sramsay tweeted. @edsu also expressed awe at the project: "Impressed that dp.la appear to have launched with code that's available on Github: opensource not an afterthought." The Digital Public Library of America hopes to be a natural thing for citizens of the Web, and hopefully their ingenuity and support can sustain the DPLA's momentum.