European copyright law should change to help researchers use computer programs to extract facts and data from published research papers, legal experts have urged in a report for the European Commission published today.

The recommendations come just as the UK government is about to pass laws mandating similar freedoms, and could loosen a restrictive legal environment that, researchers complain, has enabled subscription publishers to tightly control the way information can be harvested from online papers.

Automated harvesting, known collectively as text and data mining (TDM), promises to accelerate scientific research. Software could speedily make connections between information on genes and diseases distributed across millions of papers, for example. Today’s report notes that TDM “represents a significant economic opportunity for Europe”, and that it would add tens of billions of euros to the gross domestic product, mainly from higher productivity among researchers.

But Europe’s researchers appear to be doing less computer crawling than those in the United States and Asia, asserts the European Commission advisory group chaired by Ian Hargreaves, an intellectual-property specialist at Cardiff University, UK. “This probably reflects, among other factors, disadvantages created by the European legal framework,” the report says.

Copyright and computer-mining

Researchers argue that when they have lawful access to papers as readers, having their computers access the same papers should require no special permissions: “The right to read is the right to mine”.

But under European law, such mining currently requires the permission of a paper’s copyright holder. At first, publishers of non-open-access journals simply blocked programs that did text mining, and engaged in tortuous negotiations with those who asked for permission to perform it. In the past year or so some subscription publishers, including Elsevier, have said that they are making the process easier, but are asking researchers and institutions to sign extra licensing terms and conditions. Keen text miners say this piecemeal publisher-by-publisher approach won’t scale, and leaves too much control in the hands of the publishing industry. Last year, researchers and librarians walked out of European talks on how to encourage TDM, because, they said, only the licensing approach was being discussed.

The report endorses their view. Recent initiatives to make licensing easier are welcome, but “should be seen as a prologue to legal reform, not an end in itself”, it says. Instead, “a specific and mandatory exception to remove text and data mining for scientific purposes from the reach of European copyright and database law should be drafted”, the panel recommends. And even that is a short-term measure: it would be better to overhaul European copyright law entirely, because of its “questionable role in the digital age of presenting a barrier to modern research techniques and so to the pursuit of new knowledge”.

Last week, the UK government published regulations that would make text-mining for non-commercial purposes exempt from copyright restrictions. These changes, which were also recommended by Hargreaves, will come into force on 1 June if Parliament approves them. But the regulations do allow publishers a degree of technological control over how researchers access papers, so long as publishers do not prevent mining altogether.