The following guest post is by Cynthia O’Murchu, investigative reporter at the Financial Times, and previously their deputy interactive editor. She is a member of the Open Knowledge Foundation’s Working Group on Open Government Data.

At its inception, “Europe’s Hidden Billions”, a joint investigation by the Financial Times and the then newly formed Bureau of Investigative Journalism in London, seemed straightforward enough. It would entail compiling a database of recipients of the European Union’s European Structural Funds and using the FT’s extensive correspondent network and the BIJ’s reporting teams to write up and film our findings.

At €347bn over a seven-year period, the Structural Funds programme is the second-largest EU subsidy. Though the funds have existed for decades, there has been little transparency about how they are used. For the first time, we would shed light on how these funds – which make up a third of the EU’s budget – are spent and publish details of projects and beneficiaries.

Our findings were published as a four-day series in the pages of the Financial Times and as broadcast documentaries on BBC File on Four and Al Jazeera’s People & Power. They include the following:

some of the biggest corporate beneficiaries of a programme intended to support small- and medium-sized companies are multinational corporations such as IBM, Fiat and fashion retailer H&M

some companies are using EU funds to help them move factories to countries with cheaper workforces in spite of rules specifically prohibiting this practice

the programme’s decentralised and weak oversight system rarely punishes fraud and misuse and there is little effective central oversight of how the money is spent

millions of euros continue to be siphoned off by organised crime syndicates in spite of warnings going back decades

Over the course of this eight-month project it became clear that what EU representatives think of as transparency and what actually allows citizens to easily understand how the 27-member bloc spends the Structural Funds are worlds apart. One may call it ‘obscure transparency’.

As part of its push towards transparency, the European Commission had made it a requirement for member states to publish data on their allocations starting from the current funding round, which began in 2007.

It even provided what initially appeared to be a comprehensive and handy set of portals leading to the lists of beneficiaries of the subsidy’s two major sub-programmes: the European Regional Development Fund (ERDF) and the European Social Fund (ESF).

Our quest, however, soon became an odyssey into a Kafkaesque bureaucracy, PDF scraping and broken links.

First, though the law mandates that data are to be published at least annually, we found many documents were outdated, in particular those published by a number of German states. Some links from the portals were broken and when programme officials updated the data, they did not provide redirects.
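Checking hundreds of portal links by hand is tedious, and the check itself is easy to automate. The following is a minimal Python sketch of such a link check, offered only as an illustration and not as the tooling used in the investigation. The function names `classify` and `check_link` and the three labels are assumptions made for this example.

```python
import urllib.error
import urllib.request


def classify(requested_url: str, status: int, final_url: str) -> str:
    """Label the outcome of fetching one register link."""
    if status >= 400:
        return "broken"        # e.g. 404 after a document was moved
    if final_url != requested_url:
        return "redirected"    # the publisher left a redirect in place
    return "ok"


def check_link(url: str, timeout: float = 10.0) -> str:
    """Fetch url (urllib follows redirects) and classify the result."""
    try:
        with urllib.request.urlopen(url, timeout=timeout) as response:
            return classify(url, response.status, response.geturl())
    except urllib.error.HTTPError as err:
        # The server answered, but with an error status.
        return classify(url, err.code, url)
    except OSError:
        # DNS failure, refused connection or timeout.
        return "broken"
```

Run periodically over the list of portal URLs, a check like this would flag documents that have moved without a redirect.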

Second, the loose requirements set out by the European Commission (beneficiary, project name, EU and national amount, year) meant that the data were not easily comparable. Most managing authorities published only the sum total of the “EU funding and national contribution”, while a few others separated out the EU funding. To make the data comparable across all 27 member states, we had to devise and write code for a complex set of formulas. These are described in our methodology. Officials – such as those compiling data on the UK’s ESF programme – in some significant cases ascribed millions to “EU and national funding”, which – while technically correct – was entirely misleading, as all of the funds used in these programmes came from the EU. The reverse was also the case, with projects that received no EU funding being published in the registers. To ensure that the amounts quoted in our stories were correct, we made hundreds of phone calls and sent hundreds of emails to officials to double-check them.
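To give a flavour of the comparability problem, here is a minimal Python sketch of one such normalisation step. It is an illustration under stated assumptions, not the formulas described in the methodology: the `RegisterEntry` fields, the `estimate_eu_amount` function and the `cofinancing_rate` parameter are hypothetical names, and the rate would have to be established separately for each programme.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class RegisterEntry:
    """One row from a beneficiary register (hypothetical field names)."""
    beneficiary: str
    eu_amount: Optional[float] = None        # EU funding, if published separately
    combined_amount: Optional[float] = None  # "EU and national" total, if that is all we have


def estimate_eu_amount(entry: RegisterEntry, cofinancing_rate: float) -> Optional[float]:
    """Return the EU share of an entry, in the register's currency.

    cofinancing_rate is the assumed fraction (0 to 1) of the combined
    total that the EU pays for the programme the entry belongs to.
    """
    if not 0.0 <= cofinancing_rate <= 1.0:
        raise ValueError("cofinancing_rate must be between 0 and 1")
    if entry.eu_amount is not None:
        return entry.eu_amount  # a published EU figure beats any estimate
    if entry.combined_amount is not None:
        return entry.combined_amount * cofinancing_rate
    return None  # the register gives no usable amount
```

In this sketch, a programme funded entirely by the EU but labelled “EU and national funding” corresponds to a rate of 1.0, and a project that received no EU money to a rate of 0.0. Estimates of this kind are exactly the figures that still need confirming with officials before publication.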

Third, because the programme is highly decentralised, the data were published on more than 100 websites, in nearly 600 documents and in 21 languages. So, while the information was, in principle, freely available, it was not presented in a way that could be meaningfully analysed. Bulgaria, for example – which has had its fair share of problems with EU funds – provides barely legible and poorly scanned documents.



Greece, meanwhile, published blank PDF documents.



Others, though, such as Poland and Estonia, provided Excel spreadsheets or online databases and are to be commended for their efforts.

In the US, where the Freedom of Information Act has been in force for decades, reporters have long understood the value of data analysis and computer-assisted reporting. Though reporters there still struggle with data access laws that differ from state to state, in Europe we have the additional problem of languages. The FT and the Bureau deployed a multilingual team whose skills covered more than half of the EU’s languages and who were able to negotiate with authorities for data (if what was published didn’t meet required standards). We also plugged Google Translate into our database to translate foreign-language records and make them accessible to the wider public.

Fourth, formats matter. The majority of the nearly 600 documents were PDFs, some hundreds of pages long. Others were locked with passwords, designed to prevent citizens from reviewing the data.

Finally, despite our gargantuan team effort, the data we gathered and made available to the public are but a snapshot. With unlimited resources (which news organisation or citizen has those?) one might have been able to write scrapers to continuously update the data. But for now, I’d be happy if EU lawmakers and officials implementing the policy took note of the following:

Transparency helps win citizens’ trust. Redirects are a good thing. PDFs are not a transparent way of publishing data.

To read our four-day series and listen to and view our documentaries go to:

The database is available at http://eufunds.ftdata.co.uk/.