At the heart of the Paradise Papers is data. And there's a lot of it. The colossal leak comprised 13.4m files, dated from 1950 to 2016 and totalling 1.4 terabytes. This is bigger than WikiLeaks' 2010 disclosures, the 2013 offshore secrets, and 2015's Luxembourg tax files.

The only leak involving more data than the Paradise Papers was the related 2.6TB Panama Papers, disclosed in 2016. "In this case, it was even more challenging than the Panama Papers because we had different formats," says Emilia Díaz-Struck, lead researcher for the International Consortium of Investigative Journalists (ICIJ), which co-ordinated the reporting on both the Panama and Paradise Papers leaks.


For the 11.5 million files in the Panama Papers, getting the data into a format where it could easily be searched was relatively straightforward. All of the files – including emails, contracts, transcriptions, and scanned documents – were from law firm Mossack Fonseca. This time around, the picture is more complex: leaked files come from Appleby, Asiaciti Trust, and the company registries of 19 offshore tax havens.

"Each one has its own formats and many of those are not machine-readable," Díaz-Struck says. As well as PDFs and emails, there were also less-common file types such as ASPs and PSP. "You need to be able to explore the documents to find out what information is there."


As with the Panama Papers, software company Nuix helped to sort and organise the files. All the documents were indexed and converted into machine-readable formats. This allowed reporters around the world to begin to understand what the giant data dump actually meant.



"We also built a knowledge centre to be able to share all the information with our partners," Díaz-Struck says. Reporting the Paradise Papers was a global operation: 95 media partners and more than 380 journalists from around the world were collaborating on the published stories.

The ICIJ's technical team working on the project's backend created the central knowledge centre so the teams involved could find links between the various documents. This included matching the sources of information, linking company names and individuals automatically, and allowing lists of potential names to be uploaded and compared with those listed in the documents. If someone uploaded Bono's name, for example, it would be matched with instances where it appeared in the documents.
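The ICIJ has not described here how its matching works, but the basic idea of checking an uploaded list of names against a set of documents can be sketched in a few lines of Python. Everything in this sketch is an assumption made for illustration: the function names `normalise` and `match_names`, the whole-phrase matching rule, and the sample documents are hypothetical and are not taken from the ICIJ's systems.

```python
import re
import unicodedata


def normalise(text: str) -> str:
    """Lower-case, strip accents and collapse whitespace so names compare loosely."""
    decomposed = unicodedata.normalize("NFKD", text)
    no_accents = "".join(ch for ch in decomposed if not unicodedata.combining(ch))
    return re.sub(r"\s+", " ", no_accents).strip().lower()


def match_names(names, documents):
    """Map each uploaded name to the ids of documents that contain it as a whole phrase.

    `documents` is a hypothetical {doc_id: text} mapping standing in for an index.
    """
    normalised_docs = {doc_id: normalise(body) for doc_id, body in documents.items()}
    hits = {}
    for name in names:
        # Word boundaries stop a short name matching inside a longer word.
        pattern = re.compile(r"\b" + re.escape(normalise(name)) + r"\b")
        hits[name] = [
            doc_id for doc_id, body in normalised_docs.items() if pattern.search(body)
        ]
    return hits


# Invented sample data, for illustration only.
sample_documents = {
    "registry-001": "Director:  JANE   EXAMPLE, appointed 2004",
    "email-017": "Please forward the trust deed to José Muestra.",
    "contract-042": "This agreement concerns Examples Ltd only.",
}
matches = match_names(["Jane Example", "Jose Muestra"], sample_documents)
```

A real system would also need to cope with misspellings, transliterations and reordered names, which exact phrase matching like this does not handle.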

"Appleby works in Bermuda but then we also have information from the Bermuda registry. You would find documents from different sources that connect to the same type of story," Díaz-Struck explains. A public version of this type of database exists from the previous offshore leaks co-ordinated by the ICIJ. The organisation has already added some information from the Paradise Papers. In addition, the ICIJ created an internal social network that allowed all the journalists on the project to discuss the data, and research it in groups based on story interests.

"We need to make our technologies user-friendly so that everyone is able to use it," Díaz-Struck adds. "It doesn't matter if you're in Latin America, Europe, Africa, wherever in the world". To help with this the ICIJ employed a chief technology officer in May this year. It had six engineers and developers working on the systems within the Paradise Papers project.


Crucially, with a large number of reporters working with the documents, the organisation had to maintain the confidentiality of the source that provided the original information to German newspaper Süddeutsche Zeitung. A key part of this is protecting the documents. If initial copies leaked, it could lead to the identification of those who provided them. The whistleblowers behind LuxLeaks were prosecuted for exposing documents.

Díaz-Struck says all the systems used by the journalists and ICIJ are encrypted and also have two-factor authentication (2FA) enabled. When logging in, a user would have to verify their identity with a unique code. The ICIJ researcher says many of the journalists involved didn't initially use 2FA in their day-to-day work but now do.
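The ICIJ does not say which 2FA scheme it used. One widely used approach is the time-based one-time password (TOTP) defined in RFC 6238, in which the server and the user's device share a secret and both derive a short code from the current time. The sketch below assumes that scheme with its common defaults (HMAC-SHA1, 30-second steps, six digits); the function names `totp` and `verify` are hypothetical, and the secret shown is the published RFC test key, not anything used in the project.

```python
import hashlib
import hmac
import struct
import time


def totp(secret: bytes, at=None, step=30, digits=6) -> str:
    """Derive a one-time code from a shared secret and a timestamp (RFC 6238)."""
    moment = time.time() if at is None else at
    counter = int(moment // step)  # number of whole time steps since the Unix epoch
    digest = hmac.new(secret, struct.pack(">Q", counter), hashlib.sha1).digest()
    offset = digest[-1] & 0x0F  # dynamic truncation, as in RFC 4226
    code = struct.unpack(">I", digest[offset:offset + 4])[0] & 0x7FFFFFFF
    return str(code % 10 ** digits).zfill(digits)


def verify(secret: bytes, submitted: str, at=None, step=30, window=1) -> bool:
    """Accept a code from the current step or `window` steps either side of it."""
    now = time.time() if at is None else at
    candidates = [now + drift * step for drift in range(-window, window + 1)]
    return any(
        # compare_digest avoids leaking information through comparison timing
        hmac.compare_digest(totp(secret, moment, step), submitted)
        for moment in candidates
        if moment >= 0
    )


# RFC test key; a real deployment would generate a random secret per user.
shared_secret = b"12345678901234567890"
code_at_59_seconds = totp(shared_secret, at=59)
```

Allowing a small window either side of the current step is a common design choice that tolerates clock drift between the user's device and the server, at the cost of codes staying valid slightly longer.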

"Everyone is a great journalist but they're not also tech-savy," Díaz-Struck says. "One of the key challenges is how to make it user-friendly enough so everyone understands how to use it securely and doesn't compromise the project."