But this isn't a totally honest portrait of how many different books are available, because books in the public domain often exist in many different editions, and a random sample is likely to overrepresent them. "After all," Heald explains, "if one feeds a random ISBN number [into] Amazon, one is more likely to retrieve Milton's Paradise Lost (with 401 editions and 401 ISBN numbers) than Lorimer's A Wife out of Egypt (1 edition and 1 ISBN)." He found that public-domain titles had a median of four editions per title. (The mean was 16, but it was highly distorted by a small number of books with hundreds of editions. For this reason, statisticians whom Heald consulted recommended using the median.) Heald therefore divided the number of public-domain editions by four, producing a graph that compares the number of titles available rather than the number of editions.

Paul J. Heald
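
To see why the median was preferred over the mean, and what dividing by four does, here is a minimal Python sketch. The edition counts are made up for illustration (only the 401 and 1 figures echo Heald's example), and `estimate_titles` is a hypothetical helper, not Heald's actual code.

```python
from statistics import mean, median

# Hypothetical edition counts for ten public-domain titles.
# One heavily reprinted classic (401 editions) sits among ordinary titles.
editions_per_title = [1, 1, 2, 3, 4, 4, 5, 6, 9, 401]

# The single outlier drags the mean far above what a typical title looks like,
# while the median stays at the middle of the distribution.
mean_editions = mean(editions_per_title)
median_editions = median(editions_per_title)
print(f"mean editions per title:   {mean_editions}")
print(f"median editions per title: {median_editions}")


def estimate_titles(edition_count, typical_editions_per_title=4):
    """Convert a count of sampled editions into a rough count of distinct titles.

    The default of 4 is the median reported in the article; the function
    itself is an illustrative assumption about how the division was applied.
    """
    return edition_count / typical_editions_per_title


# Example: 100 sampled public-domain editions correspond to roughly 25 titles.
print(estimate_titles(100))
```

With this toy data the mean is roughly ten times the median, which is the kind of distortion the statisticians were warning about.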

Heald says the picture is still "quite dramatic." The most recent decade looks better by comparison, but the depression of the 20th century is still notable, followed by a little boom for the most recent decades whose works have fallen into the public domain. Presumably, as Heald writes, in a market with no copyright distortion, these graphs would show "a fairly smoothly downward sloping curve from the decade 2000-2010 to the decade of 1800-1810 based on the assumption that works generally become less popular as they age (and therefore are less desirable to market)." But that's not at all what we see. "Instead," he continues, "the curve declines sharply and quickly, and then rebounds significantly for books currently in the public domain initially published before 1923." Heald's conclusion? Copyright "makes books disappear"; its expiration brings them back to life.

The books worst affected are those from relatively recent decades, such as the 1980s and 1990s, for which there is presumably the largest gap between what would satisfy some abstract notion of people's interest and what is actually available. As Heald writes:

This is not a gently sloping downward curve! Publishers seem unwilling to sell their books on Amazon for more than a few years after their initial publication. The data suggest that publishing business models make books disappear fairly shortly after their publication and long before they are scheduled to fall into the public domain. Copyright law then deters their reappearance as long as they are owned. On the left side of the graph before 1920, the decline presents a more gentle time-sensitive downward sloping curve.

But even this chart may understate the effects of copyright, since the comparison assumes that the same quantity of books has been published every decade. This is of course not the case: Increasing literacy coupled with technological efficiencies means that far more titles are published per year in the 21st century than in the 19th. The exact number per year for the last 200 years is unknown, but Heald and his assistants were able to arrive at a reasonable approximation by relying on the number of titles available for each year in WorldCat, a library catalog that contains the complete listings of 72,000 libraries around the world. He then normalized his graph to the decade of the 1990s, which saw the greatest number of titles published.
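
One plausible reading of that normalization can be sketched in Python: scale each decade's count of available titles as if that decade had published as many titles as the 1990s. All figures below are invented for illustration, and `normalize_to_reference` is a hypothetical helper; the article does not spell out Heald's exact formula.

```python
def normalize_to_reference(available, published, reference="1990s"):
    """Rescale each decade's available-title count to the reference decade's output.

    available: titles from each decade found in the sample (hypothetical numbers)
    published: estimated titles published in each decade, e.g. from a catalog
    """
    reference_total = published[reference]
    return {
        decade: count * reference_total / published[decade]
        for decade, count in available.items()
    }


# Invented sample counts and publication totals, for illustration only.
available = {"1850s": 80, "1910s": 200, "1950s": 40, "1990s": 100}
published = {"1850s": 10_000, "1910s": 50_000, "1950s": 100_000, "1990s": 400_000}

normalized = normalize_to_reference(available, published)
for decade, value in normalized.items():
    print(decade, value)
```

Under this assumption, older decades that published few titles are scaled up sharply, while mid-century decades remain low, so the gap attributed to copyright looks even wider than in the unadjusted chart.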
