27 January 2015

More than 25,000 early English texts from 1473-1700 have been released online to members of the public as part of a collaborative initiative led by the University of Oxford's Bodleian Libraries and the University of Michigan Library.

From Shakespeare and Milton to little-known books about witchcraft, cookery and sword fighting, this rich data set comprises fully searchable text files that can be read online or downloaded in a variety of formats.

This corpus of electronic texts has been created and released by the Early English Books Online Text Creation Partnership (EEBO-TCP), an international collaboration among universities, funders and ProQuest, an information company central to global research. Previously, the texts were available only to users at academic libraries involved in the partnership, but the data was released into the public domain on 1 January 2015.

'We are opening up these fantastic books to people who wouldn't normally be able to access them. I'm fascinated to see what people will do with them,' said Michael Popham, Head of Digital Collections at the Bodleian Libraries.

Members of the public, teachers and researchers around the world can now access thousands of transcriptions of English texts published during the first two centuries of printing in England. The corpus includes important works by literary giants like Chaucer and Bacon, but also contains many rare and little-known materials that were previously available only to those with access to special collections at academic libraries.

The text-only files are a unique resource for members of the public to browse for curious and interesting topics and titles ranging from witchcraft and homeopathy to poetry and recipes. In addition to browsing and reading text-only versions of these early English books, users of EEBO-TCP can also search the entire corpus, which contains more than two million pages and nearly a billion words. The text has been encoded with Extensible Markup Language (XML), allowing individuals to search for keywords and themes across the entire collection of works, in individual books or even within specific sections of text such as stage directions or tables of contents.
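As a rough illustration of how XML encoding makes it possible to search within one kind of section, the Python sketch below looks for a keyword only inside stage directions. It is a minimal example under stated assumptions, not the partnership's own search tool: the sample document is invented, the element names (`stage`, `sp`, `l`) follow common TEI usage and are assumed here, and the helper `search_in_elements` is hypothetical. The downloadable files may also use XML namespaces, which this sketch ignores.

```python
import xml.etree.ElementTree as ET

# Invented sample in a TEI-like shape; not taken from the corpus.
SAMPLE = """<TEI>
  <text>
    <body>
      <div type="scene">
        <stage>Enter two Witches with a cauldron.</stage>
        <sp><speaker>First Witch</speaker><l>Stir the cauldron round about.</l></sp>
        <stage>Exeunt.</stage>
      </div>
    </body>
  </text>
</TEI>"""


def search_in_elements(xml_text, tag, keyword):
    """Return the text of every <tag> element that contains keyword.

    Hypothetical helper: the match is a case-insensitive substring test,
    and whitespace inside each element is collapsed to single spaces.
    """
    root = ET.fromstring(xml_text)
    needle = keyword.lower()
    hits = []
    for elem in root.iter(tag):
        # itertext() also collects text held in nested child elements.
        text = " ".join("".join(elem.itertext()).split())
        if needle in text.lower():
            hits.append(text)
    return hits


if __name__ == "__main__":
    # Restrict the search to stage directions only.
    print(search_in_elements(SAMPLE, "stage", "cauldron"))
    # The same keyword, searched in verse lines instead.
    print(search_in_elements(SAMPLE, "l", "cauldron"))
```

Because each kind of section carries its own tag, the same keyword returns different results depending on which element is searched: here the stage direction in the first call and the spoken line in the second.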

'Searching a record on a library catalogue only takes you so far,' Popham said. 'Now you can search across all 25,000 texts and get results in seconds. It opens up the data for very scholarly uses, for example historical linguists analysing poetry from a specific period, or for members of the public who want to research their family history, their home town or even look up recipes from 400 years ago. The records are available to anybody with a curiosity for history and literature or early English life.'

The release of this huge data set is the culmination of 15 years of work by the EEBO-TCP initiative, which is funded by a worldwide coalition of 150 libraries along with ProQuest and Jisc. The partnership has created standardised, accurate XML-encoded electronic text editions of early printed books. The text files were created by manually keying the full text of each work, based on millions of digital facsimile page images from ProQuest's Early English Books Online resource, a subscription-only database.

'The Bodleian Libraries has been heavily committed for the past 15 years to this extraordinary effort to make texts from this seminal period of history digitally available to members of the public and scholars worldwide,' said Richard Ovenden, Bodley's Librarian. 'We hope this open resource will provide innumerable avenues for new digital forms of scholarly discovery and research.'

Taken together, the entire corpus opens up new research possibilities, particularly in such fields as digital humanities, corpus linguistics and text mining. It also creates opportunities for individuals to create new projects based on the transcriptions. As open data, the files are freely available for individuals to download, manipulate and repurpose or to republish as their own special editions.

Lorraine Estelle, executive director digital resources and divisional CEO of Jisc Collections, said: 'Jisc is proud of the financial support it has provided to the Text Creation Partnership over a number of years. We look forward to the open access transcriptions being used to support new research efforts across the digital humanities, beyond even those that have been made possible by the availability of Early English Books Online. The release of this material is not only a boost to the availability of research data, but a welcome contribution to Jisc's work in support of open access across the disciplines.'