This is how Project Gutenberg is able to publish all these science fiction stories from the 50s and 60s. Those stories were published in issues of magazines that didn't send in the renewal form. But up til now this hasn't been a big factor, because 1) the big publishers generally made sure to send in their renewals, and 2) it's been impossible to check renewal status in bulk.

Up through the 1970s, the Library of Congress published a huge series of books listing all the registrations and the renewals. All these tomes have been scanned -- Internet Archive has the registration books—but only the renewal information was machine-readable. Checking renewal status for a given book was a tedious job, involving flipping back and forth between a bunch of books in a federal depository library or, more recently, a bunch of browser tabs. Checking the status for all books was impossible, because the list of registrations was not machine-readable.

But! A recent NYPL project has paid for the already-digitized registration records to be marked up as XML. (I was not involved, BTW, apart from saying "yes, this would work" four years ago.) Now for anything that's unambiguously a "book", we have a parseable record of its pre-1964 interactions with the Copyright Office: the initial registration and any potential renewal.

The two datasets are in different formats, but a little elbow grease will mesh them up. It turns out that eighty percent of 1924-1963 books never had their copyright renewed. More importantly, with a couple caveats about foreign publication and such, we now know which 80%.

This was announced back in May, but I don't think it got the attention it deserved. This is a really big deal, so I had no choice but to create a bot. Here's Secretly Public Domain, which highlights unrenewed works that have already been scanned for Hathi Trust. This only represents 10% of the 80%, but it's the ten percent most likely to be interesting, and these books have the easiest path towards being available online.

August 9 update: topline number is closer to 73%, next steps for the public domain books, and how to get the data on your own computer.