Researchers want reform of European copyright law to allow the use of data mining to harvest facts and data from research papers, a practice which to date has been tightly controlled by journal publishers in Europe.

If passed, such a reform could help scientists unearth hidden connections in data, perhaps cracking intractable research problems, but it could also dent the core business of publishing companies, which are campaigning for the right to self-regulate how they manage their content.

Publishers automatically block data mining software, which can download and copy vast numbers of papers, while distributing special licences that let academics and university libraries carry out data mining. If copyright law were to include an exemption for researchers, publishers claim, unscrupulous researchers could re-sell intellectual property, and publishers' computer systems could be immobilised by the volume of traffic from text miners.

Julia Reda, a German Pirate Party Member of the European Parliament (MEP), produced an “own initiative report” on copyright reform in June. In an interview with Science|Business, Reda said the European Commission should not shy away from a clash with publishers as it edges closer to updating copyright law for the digital age.

Q. The mandatory exception for text and data mining in your report was reduced to the need “to properly assess the enablement of automated analytical techniques for text and data” after a vote in the Parliament. Some researchers were disappointed with the neutered wording: should they be?

The compromise language is of course a lot less conclusive than my initial draft. Many MEPs still seem afraid of asking for specific expansions of what is legal under copyright law, even though there is a general consensus that the current laws place too high a burden on researchers. I am confident, though, that the evidence is so strong that the Commission will come out with a mandatory text and data mining exception as part of its copyright reform proposal.

Q. Publishers say the text and data mining market is one they can self-regulate, and there are some sympathetic ears to this idea in the Commission. What do you think?

The idea that extracting facts from a text or database should require permission under copyright law is completely misguided. While text and data mining may be performed on copyrighted works, the facts extracted from these works cannot themselves be copyrighted, as they are not creative works in their own right. People have been performing text and data mining manually for centuries, for example by counting how often words appear in a text in order to identify its author.

The only reason modern text and data mining may conflict with copyright law is that it employs digital technology, which makes a copy of a work as a technical by-product of any automated analysis. Just as the temporary copying of files over a network (for example, to display a website on a user's computer when she types in the address) is exempted from copyright law, the purely incidental copies made for the purpose of text and data mining should be exempted too.

While it is true that the licensing models offered by publishers are cumbersome and difficult for researchers to handle in practice, the more important argument in favour of a text and data mining exception is that there shouldn't be a need to obtain a licence in the first place: researchers are already legally accessing the information they want to mine, and in many cases have already paid publishers a lot of money to do so.

Scientific publishing houses are among the most profitable companies in the world. Their added value to society is questionable, however, considering that they extract huge sums from our public universities while offloading most of the quality-control work onto researchers paid with taxpayer money. There is absolutely no need to invent additional markets for scientific publishers to exploit.

Q. What has the lobbying effort been like from publishers?

Since the Commission has been quite clear that it wishes to legalise text and data mining in the form of an exception, scientific publishers have focused on lobbying to limit this new exception to non-commercial uses.

There is no factual argument supporting the distinction between commercial and non-commercial use, especially considering that text and data mining causes no harm to right-holders, and that researchers increasingly rely on third-party funding from commercial entities and publish their results in for-profit journals. Scientific publishers are trying to present a non-commercial restriction as a compromise between the status quo and an exception that actually covers current academic practice. However, after decades of ever-expanding copyright protection, we first need to decisively shift the balance in favour of the free re-use of materials before we arrive at a situation that could be described as balanced.

Q. There’s sometimes an impression that the lobbying might of researchers compares poorly with that of other interests in Brussels. What has been your experience?

Researchers are at a disadvantage when it comes to lobbying in Brussels because they have to take time away from the research they are paid to do, and usually have to travel to hearings in their free time and at their own expense. Industry lobbyists, on the other hand, are paid to influence policy debates in Brussels.

We ran into this problem in the Parliament's Legal Affairs Committee working group on intellectual property rights, for example. Although I regularly push my colleagues to invite more academics to testify instead of just industry groups, several invited experts had to decline because the invitations came too late, the hearings conflicted with their research and teaching obligations, and travel costs couldn't be covered. This situation is very unsatisfactory, because the European institutions have time and again vowed to base their policies on independent scientific evidence, and educational institutions are held in high regard by most politicians.

Academics need to self-organise and communicate to the Commission the urgent need for copyright reform, so that researchers can do their jobs and make efficient use of public funding. Library associations, for example, have been quite successful at making their case for copyright reform and secured some strong calls for new copyright exceptions in the Parliament's June resolution.

The Reda Report is available here. While not binding, it formally sets out Parliament’s position.