The roughly 10 percent of all books that are actively in print will also be scanned before long. Amazon carries at least four million books, a figure that includes multiple editions of the same title, and it is slowly scanning all of them. Recently, several big American publishers have declared themselves eager to move their entire backlists into the digital sphere. Many of them are working with Google in a partnership program in which Google scans their books, offers sample pages (controlled by the publisher) to readers and points readers to where they can buy the actual book. No one doubts that electronic books will make money eventually. Simple commercial incentives guarantee that all in-print and backlisted books will before long be scanned into the great library. That's not the problem.

The major problem for large publishers is that they are not certain what they actually own. If you would like to amuse yourself, pick an out-of-print book from the library and try to determine who owns its copyright. It's not easy. There is no list of copyrighted works. The Library of Congress does not have a catalog. The publishers don't have an exhaustive list, not even of their own imprints (though they say they are working on it). The older and more obscure the work, the less likely a publisher will be able to tell you (that is, if the publisher still exists) whether the copyright has reverted to the author, whether the author is alive or dead, whether the copyright has been sold to another company, whether the publisher still owns the copyright or whether it plans to resurrect or scan it. Plan on having a lot of spare time and patience if you inquire. I recently spent two years trying to track down the copyright to a book, a search that led me to Random House. Does the company own it? Can I reproduce it? Three years later, the company is still working on its answer. The prospect of tracking down the copyright — with any certainty — of the roughly 25 million orphaned books is simply ludicrous.

Which leaves 75 percent of the known texts of humans in the dark. The legal limbo surrounding their status as copies prevents them from being digitized. No one argues that these are all masterpieces, but there is history and context enough in their pages not to let them disappear. And if they are not scanned, they in effect will disappear. But with copyright hyperextended beyond reason (the Supreme Court in 2003 declared the law dumb but not unconstitutional), none of this dark library will return to the public domain (and be cleared for scanning) until at least 2019. With no commercial incentive to entice uncertain publishers to pay for scanning these orphan works, they will vanish from view. As Peter Brantley, director of technology for the California Digital Library, puts it: "We have a moral imperative to reach out to our library shelves, grab the material that is orphaned and set it on top of scanners."

No one was able to unravel this Gordian knot of copyright until 2004, when Google came up with a clever solution. In addition to scanning the 15 percent of books that are out-of-copyright and in the public domain with its library partners, and the 10 percent that are in print with its publishing partners, Google declared that it would also scan the 75 percent of out-of-print books that no one else would touch. It would scan each entire book, without resolving its legal status, which would allow the full text to be indexed on Google's internal computers and searched by anyone. But the company would show readers only a few selected sentence-long snippets from the book at a time. Google's lawyers argued that the snippets the company was proposing were something like a quote or an excerpt in a review and thus should qualify as a "fair use."

Google's plan was to scan the full text of every book in five major libraries: the more than 10 million titles held by Stanford, Harvard, Oxford, the University of Michigan and the New York Public Library. Every book would be indexed, but each would show up in search results in different ways. For out-of-copyright books, Google would show the whole book, page by page. For the in-print books, Google would work with publishers and let them decide what parts of their books would be shown and under what conditions. For the dark orphans, Google would show only limited snippets. And any copyright holder (author or corporation) who could establish ownership of a supposed orphan could ask Google to remove the snippets for any reason.

At first glance, it seemed genius. By scanning all books (something only Google had the cash to do), the company would advance its mission to organize all knowledge. It would let books be searchable, and it could potentially sell ads on those searches, although it does not do that currently. In the same stroke, Google would rescue the lost and forgotten 75 percent of the library. For many authors, this all-out campaign was a salvation. Google became a discovery tool, if not a marketing program. While a few best-selling authors fear piracy, every author fears obscurity. Enabling their works to be found in the same universal search box as everything else in the world was good news for authors and good news for an industry that needed some. For authors with books in the publisher program and for authors of books abandoned by a publisher, Google unleashed a chance that more people would at least read, and perhaps buy, the creation they had sweated for years to complete.