Figuring out which books have entered the public domain is not easy. This year most books from 1923 entered the public domain, with more to follow every year. But there’s another source of public domain works: until the 1976 Copyright Act, US works were not copyrighted unless they were registered, and then they became public domain after an initial 28-year term unless that registration was renewed.

One of the big problems in finding more recent titles that have entered, or should be entering, the public domain lies with the US Copyright Office’s records, which are not organized in a way that makes it easy to cross-check a work against its registration and renewal. Most of these records have not been digitized.

The Internet Archive has taken it upon itself to make a ton of books and their registration records machine-readable. This has allowed the New York Public Library to hire a group of people to convert these records to XML, making them available for data mining.

They found that 80% of books published between 1924 and 1963 should be in the public domain, and estimated that the number of missing titles is huge: over 640,000. Hathi Trust has taken this data, which accounts for only 10% of the 80% of books that should have entered the public domain, and published them online. These books have the greatest chance of being made available next year to be sold online and converted to ebooks.

