In a retro twist on the Google Books idea, HP has announced a partnership with the University of Michigan library to sell physical copies of over 500,000 rare and out-of-print works, while making the digital versions available online for free.

HP's BookPrep service, currently in beta, will take in raw scans of books, clean them up to prepare them for re-printing, and then offer print-on-demand copies for sale via normal online book distribution channels like Amazon. This new arrangement mixes a number of aspects of existing efforts like Google Books and current print-on-demand (PoD) offerings, while being a little different from either, and in the process it points the way to a real future for the digital contents of libraries' special collections.

All scanned in and no place to go

The first way in which the HP/Michigan deal differs from Google Books is that HP itself is not doing the scanning. Instead, HP is taking advantage of the rare book scanning efforts that are already underway at Michigan—HP just takes Michigan's raw scans and turns them back into books. This basic idea has much wider applicability than just at Michigan, since libraries across the country are currently in the process of digitizing their special collections.

When I was at the University of Chicago, I seriously explored the idea of doing thesis work in the digital humanities. During that time, I learned that most special collections departments at libraries and museums are engaged in some type of high-quality digitization effort for rare documents: books, scrolls, photographs, and other printed and handwritten matter. These projects generate huge amounts of high-quality image data, but there's currently no way for most of these collections to make that data available to the public. So that data sits unviewed in an archive somewhere, just like the special collection items it represents.

Google, Microsoft, Amazon—these companies should actually start ingesting that data and hosting it, but there are a number of reasons why this doesn't appear to be happening (that's another article, though). In the meantime, a PoD effort like the HP/Michigan collaboration is a good way to make some of this material available in a convenient format that doesn't involve designing a clunky Web-based interface for it.

The PoD aspect of the HP/Michigan effort isn't just about making books available in a convenient, universally accessible format—it's also part of the printer maker's ongoing attempt to keep people printing in the face of the nascent e-paper and e-book revolution.

"People around the world still value reading books in print," said Andrew Bolwell, HP's director of New Business Initiatives, in a press releases. HP clearly hopes that this statement will continue to hold true for some time to come.

Not ordinary PoD

HP's BookPrep is by no means the only PoD service in the world, nor is HP the only on-demand printer. PoD services like Lulu.com and Apple's iPhoto books have deals with on-demand printers that do the actual printing, binding, and shipping for them, and most large printing houses, like R.R. Donnelley, have print-on-demand services in addition to their traditional presses.

What separates BookPrep from the rest is that normal PoD shops take in only print-ready digital files, usually PDFs. BookPrep, in contrast, will take high-resolution scans that aren't fit to print, and automatically clean them up for printing. Take a look at the examples below from HP's BookPrep website, where the original scan is on top and the print-ready copy is below it.

An example of BookPrep's automated image processing. Source: HP BookPrep

This presentation problem, turning raw scans into something people can comfortably read, is currently the number one barrier to getting most of the aforementioned special collections' material on the Web, and it would remain one even if the institutions that produced the scans could afford to host them (which they can't).

Making an interface that lets you usefully interact with high-resolution scans of papyri, books, handwritten notes, photographs, and the like is a massive undertaking, and there currently exists no off-the-shelf package designed specifically for this purpose.

The University of Chicago, for instance, uses software that was originally designed for medical images to present its "Archaic Mark" manuscript on the Web, but the experience isn't exactly on par with something like Google Maps.

A PoD effort like the one HP has announced could be a less painful method for getting special collections material out to the public, at least until either Google or Microsoft realizes that it should take this data and adapt its online map service to display it. (There are a ton of similarities between book scans and map data, not the least of which is that both involve 3D datasets that are projected into 2D for Web use; yup, most book scanning is now 3D.)
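To make the map comparison a little more concrete, here is a rough sketch of how a map-style viewer might carve a single high-resolution scan into a pyramid of fixed-size tiles, so that a browser only fetches the pieces currently on screen. This is an illustration, not a description of any existing Google, Microsoft, or HP system; the function name `tile_pyramid`, the 256-pixel tile size, and the 6000 x 9000 pixel page are all assumptions made for the example.

```python
import math


def tile_pyramid(width, height, tile=256):
    """Describe a zoom pyramid for one scanned page.

    Hypothetical helper: returns a list of
    (level, level_width, level_height, columns, rows) tuples,
    ordered from the coarsest level (a single tile) up to full resolution.
    """
    levels = []
    w, h = width, height
    while True:
        cols = math.ceil(w / tile)
        rows = math.ceil(h / tile)
        levels.append((w, h, cols, rows))
        if cols == 1 and rows == 1:
            break  # the whole page now fits in one tile
        # Each coarser level halves both dimensions, rounding up.
        w = max(1, math.ceil(w / 2))
        h = max(1, math.ceil(h / 2))
    levels.reverse()  # coarsest level becomes level 0
    return [(i, w, h, c, r) for i, (w, h, c, r) in enumerate(levels)]


if __name__ == "__main__":
    # Assumed page size for illustration only.
    pyramid = tile_pyramid(6000, 9000)
    for level, w, h, cols, rows in pyramid:
        print(f"level {level}: {w}x{h} px, {cols}x{rows} tiles")
    total = sum(cols * rows for _, _, _, cols, rows in pyramid)
    print("total tiles:", total)
```

The viewer then requests only the handful of tiles that intersect the visible region at the current zoom level, which is the same trick online maps use to make enormous images feel responsive.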

Hopefully, HP will announce more such deals in the near future, because there are plenty more institutions that would love to take the terabytes of raw, high-resolution scans that are sitting on dusty hard drives and make them available to the viewing public.