History of Project Gutenberg

Overview

The first ebook became available on July 4, 1971, as eText #1 of Project Gutenberg, a visionary project launched by Michael Hart to create free electronic versions of literary works and disseminate them worldwide. In the 15th century, Gutenberg allowed anyone to have print books for a small cost. In the 21st century, Project Gutenberg would allow anyone to have a digital library at no cost. Project Gutenberg got its first boost with the invention of the web in 1990, and its second boost with the creation of Distributed Proofreaders in 2000, to share the proofreading of ebooks among hundreds of volunteers. In 2010, Project Gutenberg offered more than 33,000 high-quality ebooks, downloaded by the tens of thousands every day, and had websites in the United States, Australia, Europe, and Canada, with 40 mirror sites worldwide.

Beginning

As recalled by Michael Hart in January 2009 in an email interview: “On July 4, 1971, while still a freshman at the University of Illinois (UI), I decided to spend the night at the Xerox Sigma V mainframe at the UI Materials Research Lab, rather than walk miles home in the summer heat, only to come back hours later to start another day of school. I stopped on the way to do a little grocery shopping to get through the night, and day, and along with the groceries they put in the faux parchment copy of The U.S. Declaration of Independence that became quite literally the cornerstone of Project Gutenberg. That night, as it turned out, I received my first computer account – I had been hitchhiking on my brother’s best friend’s name, who ran the computer on the night shift. When I got a first look at the huge amount of computer money I was given, I decided I had to do something extremely worthwhile to do justice to what I had been given. This was such a serious, and intense thought process for a college freshman, my first thought was that I had better eat something to get up enough energy to think of something worthwhile enough to repay the cost of all that computer time. As I emptied out groceries, the faux parchment Declaration of Independence fell out, and the light literally went on over my head like in the cartoons and comics… I knew what the future of computing, and the internet, was going to be… ‘The Information Age.’ The rest, as they say, is history.”

Michael keyed The United States Declaration of Independence into the mainframe he was using, in upper case, because there was no lower case yet. The file was 5 K. Sending a 5 K file to the 100 users of the pre-internet of the time would have crashed the network, so Michael instead announced where the etext was stored – though without a hypertext link, because the web was still 20 years away. It was downloaded by six users. Project Gutenberg was born.

Michael decided to use the huge amount of computer time he had been given to search for the literary works that were stored in libraries, and to digitize these works. A book would become a continuous text file instead of a set of pages. Project Gutenberg’s mission would be the following: to put at everyone’s disposal, in electronic versions, as many literary works as possible for free.

After keying in The United States Declaration of Independence (signed on July 4, 1776) in 1971, Michael typed in a longer text, The United States Bill of Rights, in 1972, i.e. the first ten amendments added in 1789 to the Constitution (dated 1787) and defining the individual rights of the citizens and the distinct powers of the federal government and the States. A volunteer typed in The United States Constitution in 1973.

From one year to the next, disk space was getting larger, by the standards of the time – there was no hard disk yet – making it possible to store larger files.

Volunteers began typing in The Bible, one book at a time, with a separate file for each book.

Michael typed in the collected works of Shakespeare, with volunteers, one play at a time, with a file for each play. Unfortunately, this edition of Shakespeare was never released, due to changes in copyright law. Shakespeare’s works belong to the public domain, but comments and notes may be copyrighted, depending on the publication date. Other editions of Shakespeare from the public domain were released a few years later.

10 to 1,000 ebooks

Critics long considered Project Gutenberg impossible on a large scale. But Michael went on keying in book after book for many years, with the help of some volunteers.

In August 1989, Project Gutenberg completed its 10th ebook, The King James Bible (1769), with both testaments and 5 M for all files.

In 1990, there were 250,000 internet users. The web was in its infancy. The standard was 360 K disks.

In January 1991, Michael typed in Alice’s Adventures in Wonderland (1865), by Lewis Carroll. In July 1991, he typed in Peter Pan (1904), by James M. Barrie. These two classics of childhood literature each fit on one disk.

Mosaic, the first widely used browser, was released in November 1993. It became easier to circulate etexts and recruit volunteers. From 1991 to 1996, the number of ebooks doubled every year, with one ebook per month in 1991, two ebooks per month in 1992, four ebooks per month in 1993, and eight ebooks per month in 1994.

In January 1994, Project Gutenberg released The Complete Works of William Shakespeare as eBook #100. Shakespeare wrote most of his works between 1590 and 1613.

The steady growth went on, with an average of 8 ebooks per month in 1994, 16 ebooks per month in 1995, and 32 ebooks per month in 1996.

In June 1997, Project Gutenberg released The Merry Adventures of Robin Hood (1883), by Howard Pyle.

Project Gutenberg reached 1,000 ebooks in August 1997. EBook #1000 was La Divina Commedia (1321), by Dante Alighieri, in Italian, its original language.

With the number of ebooks on the rise, three main sections were set up: (a) “Light Literature”, such as Alice’s Adventures in Wonderland, Through the Looking-Glass, Peter Pan and Aesop’s Fables; (b) “Heavy Literature”, such as the Bible, Shakespeare’s works, Moby Dick and Paradise Lost; (c) “Reference Literature”, such as Roget’s Thesaurus, almanacs, and a set of encyclopedias and dictionaries. (A more detailed classification was created later on.)

“Light Literature” was the main section in number of ebooks. As explained on the website in 1998: “The Light Literature Collection is designed to get persons to the computer in the first place, whether the person may be a pre-schooler or a great-grandparent. We love it when we hear about kids or grandparents taking each other to an etext of Peter Pan when they come back from watching Hook at the movies, or when they read Alice in Wonderland after seeing it on TV. We have also been told that nearly every Star Trek movie has quoted current Project Gutenberg etext releases (from Moby Dick in The Wrath of Khan; a Peter Pan quote finishing up the most recent, etc.) not to mention a reference to Through the Looking-Glass in JFK. This was a primary concern when we chose the books for our libraries. We want people to be able to look up quotations they heard in conversation, movies, music, other books, easily with a library containing all these quotations in an easy-to-find etext format.”

Project Gutenberg’s goal is more about selecting books intended for the general public than providing authoritative editions. As explained on the website in 1998: “We do not write for the reader who cares whether a certain phrase in Shakespeare has a ‘:’ or a ‘;’ between its clauses. We put our sights on a goal to release etexts that are 99.9% accurate in the eyes of the general reader. Given the preferences our proofreaders have, and the general lack of reading ability the public is currently reported to have, we probably exceed those requirements by a significant amount. However, for the person who wants an ‘authoritative edition’ we will have to wait some time until this becomes more feasible. We do, however, intend to release many editions of Shakespeare and the other classics for comparative study on a scholarly level.”

The etexts, later called ebooks, were stored in the simplest way, using the low (7-bit) set of ASCII, known as Plain Vanilla ASCII, so that they could be read on any hardware and software. As a text file, a book could be easily copied, indexed, searched, analyzed, and compared with other books.
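What a plain-text book makes possible can be sketched in a few lines of Python. This is only an illustration, not a tool used by Project Gutenberg: the function names is_plain_ascii and find_lines are hypothetical, the sample is a short upper-case excerpt from the opening of the Declaration of Independence rather than the actual eText #1 file, and "low set of ASCII" is taken here to mean code points 0 to 127.

```python
def is_plain_ascii(text: str) -> bool:
    """Return True if every character is in the 7-bit ASCII range (0-127)."""
    return all(ord(ch) < 128 for ch in text)


def find_lines(text: str, phrase: str) -> list:
    """Return the 1-based numbers of the lines containing the phrase, ignoring case."""
    needle = phrase.lower()
    return [
        number
        for number, line in enumerate(text.splitlines(), start=1)
        if needle in line.lower()
    ]


# Illustrative sample only: an upper-case excerpt, not the real eText #1 file.
sample = (
    "WHEN IN THE COURSE OF HUMAN EVENTS, IT BECOMES NECESSARY\n"
    "FOR ONE PEOPLE TO DISSOLVE THE POLITICAL BANDS\n"
    "WHICH HAVE CONNECTED THEM WITH ANOTHER\n"
)

print(is_plain_ascii(sample))                 # every character is 7-bit ASCII
print(find_lines(sample, "political bands"))  # line numbers matching the phrase
print(is_plain_ascii("À l’ombre"))            # accented letters fall outside the low set
```

With this sample, the first check succeeds, the phrase is found on line 2, and the accented French title fails the check, which is why works in other languages eventually needed additional formats.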

As explained by Michael Hart in August 1998 in an email interview: “We consider etext to be a new medium, with no real relationship to paper, other than presenting the same material, but I don’t see how paper can possibly compete once people each find their own comfortable way to etexts, especially in schools. (…) My own personal goal is to put 10,000 etexts on the net [this goal was reached in October 2003] and if I can get some major support, I would like to expand that to 1,000,000 and to also expand our potential audience for the average etext from 1.x% of the world population to over 10%, thus changing our goal from giving away 1,000,000,000,000 etexts to 1,000 times as many, a trillion and a quadrillion in U.S. terminology.”

1,000 to 10,000 ebooks

From 1998 to 2000, the “output” was an average of 36 ebooks per month.

Project Gutenberg reached 2,000 ebooks in May 1999. EBook #2000 was Don Quijote (1605), by Cervantes, in Spanish, its original language.

Project Gutenberg reached 3,000 ebooks in December 2000. EBook #3000 was À l’ombre des jeunes filles en fleurs (In the Shadow of Young Girls in Flower), vol. 3 (1919), by Marcel Proust, in French, its original language.

Project Gutenberg Australia was launched in August 2001.

Project Gutenberg reached 4,000 ebooks in October 2001. EBook #4000 was The French Immortals Series (1905), in English. This book is an anthology of short fiction by authors from the French Academy (Académie Française): Emile Souvestre, Pierre Loti, Hector Malot, Charles de Bernard, Alphonse Daudet, and others.

The output in 2001 was an average of 104 new ebooks per month.

Project Gutenberg reached 5,000 ebooks in April 2002. EBook #5000 was The Notebooks of Leonardo da Vinci, an English version of Leonardo’s early 16th-century writings in Italian. Since its release, this ebook has consistently stayed in the Top 100 of downloaded ebooks.

In 1991, Michael Hart chose to type in Alice’s Adventures in Wonderland and Peter Pan because they would each fit on one 360 K disk, the standard of the time. In 2002, the standard disk was 1.44 M, and files could be compressed as zipped files.

A practical file size is about 3 million characters, more than long enough for the average book. The ASCII version of a 300-page novel is 1 M. A bulky book can fit in two ASCII files, which can be downloaded as is or zipped. An average of 50 hours is necessary to get an ebook selected, copyright-cleared, scanned, proofread, formatted, and assembled.

A few numbers are reserved for “special” books. For example, eBook #1984 is reserved for George Orwell’s classic, published in 1949, and still a long way from falling into the public domain.

In spring 2002, Project Gutenberg’s ebooks represented 25% of all the public domain works freely available on the web and listed in the Internet Public Library (IPL). The output in 2002 was an average of 203 ebooks per month.

In November 2002, Project Gutenberg released the 75 files of the Human Genome Project, each of them dozens or hundreds of megabytes in size, not long after the Human Genome Project’s initial release in February 2001 as a public domain work.

To recap, Project Gutenberg reached 1,000 ebooks in August 1997, 2,000 ebooks in May 1999, 3,000 ebooks in December 2000, 4,000 ebooks in October 2001, 5,000 ebooks in April 2002, and 10,000 ebooks in October 2003. EBook #10000 was The Magna Carta, signed in 1215 and known as the first English constitutional text.

From April 2002 to October 2003, in 18 months, the collections doubled, going from 5,000 ebooks to 10,000 ebooks, with a monthly average of 348 new ebooks in 2003.

The fast growth was the work of Distributed Proofreaders, a website launched in October 2000 by Charles Franks to share the proofreading of ebooks among many volunteers. Volunteers choose one of the digitized books available on the site and proofread a given page, or several pages, as they wish.

Ebooks were also copied onto CDs and DVDs. As blank CDs and DVDs cost next to nothing, Project Gutenberg began burning and sending a free CD or DVD to anyone asking for it. People were encouraged to make copies for a friend, a library or a school. Released in August 2003, the Best of Gutenberg CD contained 600 ebooks. The first Project Gutenberg DVD was released in December 2003 to celebrate the first 10,000 ebooks, and contained most of the titles (9,400 ebooks).

In September 2003, Project Gutenberg launched Project Gutenberg Audio eBooks, a collection of human-read ebooks, as well as the Sheet Music Subproject, a collection of digitized sheet music and music recordings. A collection of still and moving pictures was also available.

10,000 to 20,000 ebooks

In December 2003, there were 11,000 ebooks, which represented 110 G, in several formats (ASCII, HTML, PDF, and others, as is or zipped). In May 2004, there were 12,600 ebooks, which represented 135 G. With more than 300 new ebooks added per month (338 ebooks per month in 2004), the number of gigabytes was expected to double every year.

The Project Gutenberg Consortia Center (PGCC) was affiliated with Project Gutenberg in 2003, and became an official Project Gutenberg site. Since 1997, PGCC had been working on gathering collections of existing ebooks, as a complement to Project Gutenberg working on producing ebooks. As explained by Michael Hart in February 2009: “The Project Gutenberg Consortia Center has over 75,000 ebooks rendered as PDF files, and some are really quite stunning. The difference? These files were prepared by other eLibraries, not Project Gutenberg, and are using our worldwide distribution network to be seen.”

In Europe, Project Rastko, based in Belgrade, Serbia, launched Project Gutenberg Europe (PG Europe) and Distributed Proofreaders Europe (DP Europe) in January 2004. 100 ebooks were available in June 2005, in several languages, as a reflection of European linguistic diversity.

In January 2005, Project Gutenberg reached 15,000 ebooks. EBook #15000 was The Life of Reason (1906), by George Santayana.

What about languages? There were ebooks in 25 languages in February 2004, and in 42 languages in July 2005, including Sanskrit and the Mayan languages. The seven main languages – with more than 50 ebooks – were English (with 14,548 ebooks on July 27, 2005), French (577 ebooks), German (349 ebooks), Finnish (218 ebooks), Dutch (130 ebooks), Spanish (103 ebooks), and Chinese (69 ebooks).

In July 2005, Project Gutenberg Australia (launched in August 2001) reached 500 ebooks.

Project Gutenberg PrePrints was launched in January 2006 to collect items submitted to Project Gutenberg which were interesting enough to be available online, but not ready yet to be added to the main Project Gutenberg collection, because of missing data, low-quality files, formats which were not handy, etc. 379 ebooks were available in December 2006, and 2,020 ebooks in February 2009.

In December 2006, Project Gutenberg reached 20,000 ebooks. EBook #20000 was the audiobook of Twenty Thousand Leagues Under the Sea (Vingt mille lieues sous les mers, 1869), by Jules Verne, in its English version.

While 32 years and 3 months, from July 1971 to October 2003, were necessary to produce the first 10,000 ebooks, only 3 years and 2 months, from October 2003 to December 2006, were necessary to produce the following 10,000 ebooks. There were ebooks in 50 languages in December 2006.

20,000 to 30,000 ebooks

In December 2006, Mike Cook launched the blog Project Gutenberg News as “the news portal for gutenberg.org”, to complement the existing weekly and monthly newsletters. For example, the blog gave a table of the weekly, monthly and yearly production numbers since 2001.

The weekly production was 24 ebooks in 2001, 47 ebooks in 2002, 79 ebooks in 2003, 78 ebooks in 2004, 58 ebooks in 2005, and 80 ebooks in 2006.

The monthly production was 104 ebooks in 2001, 203 ebooks in 2002, 348 ebooks in 2003, 338 ebooks in 2004, 252 ebooks in 2005, and 345 ebooks in 2006.

The yearly production was 1,244 ebooks in 2001, 2,432 ebooks in 2002, 4,176 ebooks in 2003, 4,058 ebooks in 2004, 3,019 ebooks in 2005, and 4,141 ebooks in 2006.
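The monthly averages quoted above follow from the yearly totals by simple division. The short Python sketch below shows that arithmetic, assuming each monthly figure is the yearly total divided by 12 and rounded to the nearest whole number. The names yearly_production and monthly_average are illustrative, and the weekly figures are not recomputed, since the blog's convention for counting weeks is not stated here.

```python
# Yearly totals as quoted above from the Project Gutenberg News blog.
yearly_production = {
    2001: 1244,
    2002: 2432,
    2003: 4176,
    2004: 4058,
    2005: 3019,
    2006: 4141,
}


def monthly_average(yearly_total: int) -> int:
    """Return the yearly total divided by 12, rounded to the nearest integer."""
    return round(yearly_total / 12)


# Assumption: the monthly figures are rounded averages of the yearly totals.
monthly = {year: monthly_average(total) for year, total in yearly_production.items()}

for year, average in monthly.items():
    print(year, average)
```

Under that assumption, the computed values match the monthly figures listed above for every year from 2001 to 2006.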

Project Gutenberg Australia reached 1,500 ebooks in April 2007.

Project Gutenberg Canada (PGC) was launched on July 1st, 2007, on Canada Day, by Michael Shepard and David Jones. Distributed Proofreaders Canada (DPC) started production in December 2007. There were 100 ebooks in March 2008, in English, French, and Italian.

Project Gutenberg sent out 15 million ebooks via CDs and DVDs by snail mail in 2007. A new DVD released in July 2006 included 17,000 ebooks. CD and DVD files were also generated as ISO files (since 2005) to be downloaded for burning CDs or DVDs.

Project Gutenberg reached 25,000 books in April 2008. EBook #25000 was English Book Collectors (1902), by William Younger Fletcher.

Project Gutenberg reached 30,000 books in October 2009. EBook #30000 was The Bird Book (1915), by Chester Albert Reed.

30,000 ebooks onwards

Distributed Proofreaders celebrated its 10th anniversary in October 2010, with more than 18,000 books digitized and proofread over ten years by thousands of volunteers.

Project Gutenberg offered more than 33,000 high-quality proofread ebooks in December 2010, in various formats for any electronic device (computer, PDA, mobile phone, smartphone, and ebook reader).

(Many thanks to Mike Cook, Michael Hart, and Russon Wooldridge for proofreading some parts in previous versions of this essay.)

Copyright © 2010 Marie Lebert
