A new book about online research techniques – The Verification Handbook for Investigative Reporting – was released yesterday.

It is subtitled: 'A guide to online search and research techniques for using UGC [user-generated content] and open source information in investigations.'

And it aims to help journalists sort facts from hearsay when using search engines, social media and publicly available data to research their stories.

While investigative journalism sounds complicated, and often is, many of the techniques outlined in the book rely more on knowing what to do and having a go than on specialist expertise.

Press Gazette went through the handbook and selected ten tips for investigative journalists working digitally. The whole book can be read for free online.

1. Know your search engines

There are lots of commands that target your searching more accurately. For example, site:twitter.com will only turn up results from Twitter. (Pictured, an example search from the handbook)

This works with any website (e.g. site:pressgazette.co.uk will only search Press Gazette).

Another useful hint, according to Henk van Ess, who trains media professionals on internet research and wrote the handbook’s chapter on the subject, is adding the word “is” to a search. He uses an example search for information on Ben van Beurden, chief executive of Royal Dutch Shell.

He searches: “van beurden is” AROUND(15) shell.

“The simple two-letter word ‘is’ reveals opinions and facts about your subject. To avoid clutter, include the company name of the person or any other detail you know, and tell Google that both words should be not that far from each other.

The AROUND() operator MUST BE IN CAPITALS. It sets the maximum distance in words between the two terms.”

So this means the word "shell" must appear within 15 words of the words "van beurden is".

More of his tips can be found in chapter two.
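
For reporters who run the same kind of search repeatedly, the operators above can be assembled with a few lines of Python. This is a minimal sketch rather than anything taken from the handbook: the function names are invented for illustration, and AROUND() is not an officially documented Google operator, so results may vary.

```python
from urllib.parse import quote_plus


def site_query(terms: str, site: str) -> str:
    """Restrict a search to a single website with the site: operator."""
    return f"{terms} site:{site}"


def around_query(phrase: str, other: str, distance: int) -> str:
    """Ask for `other` within `distance` words of an exact phrase.

    AROUND must be written in capitals, as van Ess notes.
    """
    return f'"{phrase}" AROUND({distance}) {other}'


def google_url(query: str) -> str:
    """URL-encode a query so it can be pasted into a browser (hypothetical helper)."""
    return "https://www.google.com/search?q=" + quote_plus(query)


# Example searches based on the ones described in the article.
twitter_only = site_query("verification handbook", "twitter.com")
shell_query = around_query("van beurden is", "shell", 15)

print(twitter_only)
print(shell_query)
print(google_url(shell_query))
```

Pasting the printed URL into a browser should run the same search as typing the query into Google by hand.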

2. Don't just rely on Google

It’s hard to believe, but there are other search tools apart from Google, and some are better suited to specific purposes. Geofeedia and Echosec are good for finding tweets, Facebook posts, YouTube videos and Instagram photos by location.

Pipl and Spokeo are specialist tools for people research. They search for a subject in multiple databases, social networks and dating websites.

See chapter three of the handbook for more.

3. Everything has a paper trail

Khadija Sharife, senior researcher at the African Network of Centres for Investigative Reporting, writes extensively in chapter four of the handbook about a scam operated by a company called Capital Organisation and its director.

The investigation began with court records relating to previous investigations into the individuals involved with the scam, then moved through Companies House information, databases like Duedil (which allows searches for individuals and corporate directors) and other sources.

Sharife writes that no particular expertise is needed to lift this “corporate veil”. All a journalist needs is curiosity.

4. Verify data twice: at point of gathering and after analysis

Verifying data is necessary before analysis to work out if it is incomplete, needs cleaning or is inaccurate. But Giannina Segnini, a visiting professor at Columbia University’s journalism school in New York who wrote the handbook’s chapter on “investigating with databases”, writes that data should be verified after analysis as well.

“It is perhaps the most important verification piece, and the acid test to know if your story or initial hypothesis is sound,” she writes. This is the point at which the journalist should check if their story is correct by following it up with physical evidence.

Segnini adds: “Not even the best data analysis can replace on-the-ground journalism and field verification.”

(Pictured, Segnini's flowchart for investigating data. From the handbook)

5. Explore extreme values of data

Fields marked zero may signal incomplete data, or at least data that could lead to poor conclusions. Segnini takes the example of a World Bank data set on independent evaluations of projects developed by the organisation. Some 53 per cent of entries in the set supposedly had a lending cost of zero.

Segnini writes: “This means that anyone who performs a calculation or analysis per country, region or year involving the cost of the projects would be wrong if they failed to account for all of the entries with no stated cost. The dataset as it’s provided will lead to an inaccurate conclusion.”
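
A quick check along these lines can be sketched in Python. The figures below are made up for illustration and are not taken from the World Bank data set; the function name is also hypothetical. The sketch counts how many entries are zero and shows how far an average moves once those entries are set aside.

```python
from statistics import mean

# Invented lending costs in millions of dollars (NOT real World Bank figures).
costs = [0, 120, 0, 80, 0, 200, 0, 40]


def zero_share(values):
    """Return the fraction of entries that are exactly zero."""
    return sum(1 for v in values if v == 0) / len(values)


# Entries with a stated (non-zero) cost.
stated = [v for v in costs if v != 0]

print(f"Lowest value: {min(costs)}, highest value: {max(costs)}")
print(f"Share of zero entries: {zero_share(costs):.0%}")
print(f"Average including zeros: {mean(costs)}")
print(f"Average of stated costs only: {mean(stated)}")
```

With these invented figures, half of the entries are zero and the average doubles once they are excluded. A gap like that is a prompt to ask the data provider whether zero means "no cost" or "not recorded" before drawing any conclusion.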

6. Geolocate videos using satellite imagery

User-generated content (UGC), like YouTube videos, can be useful to investigative journalists, but should also be treated with caution. If a video claims to show a certain place, a journalist can find satellite imagery of that location. Comparing landmarks (or road layouts) in the video and the satellite image can establish whether the video was indeed filmed where it is supposed to have been, and even the exact position of the camera.

Eliot Higgins, founder of Bellingcat and writer of chapter six of the handbook, used this technique to verify videos of rebel victories in the Libyan civil war. He writes that the best way to learn geolocation is to practise it: “Building expertise in satellite map based geolocation was something I did over time, using new tricks and techniques as I moved onto new videos.”

(Pictured, a satellite image of a Libyan town, used by Higgins to verify a video. From the handbook)

7. UGC alone is not enough

In particularly sensitive situations, such as those relating to human rights abuses or war crimes, UGC may not be enough to make a claim.

Christoph Koettl, an adviser on technology and human rights to Amnesty International, writes in his chapter on UGC in human rights investigations: “In a human rights investigation, we compare all facts gathered with relevant human rights norms and laws to make determinations of violations or abuses. Consequently, a single analyst who looks at UGC, such as myself, must be part of a team comprising relevant country, policy and legal experts.”

8. Ethical = sustainable

Fergus Bell – head of newsroom partnerships and innovation at SAM, a social media search, curation and storytelling platform designed for the news industry – writes the handbook’s chapter on ethical considerations.

He writes that when communicating with or gathering content from social media sources that might think of themselves as private, even if they are technically open to the public – like subreddits (threads on the website Reddit), Facebook groups or family YouTube channels – it is best to identify yourself as a journalist early on.

When using content from these places, a journalist should ask permission, writes Bell. This also helps as a first step to verifying the ownership and veracity of multimedia content. And all of these steps should help ensure that the community remains open to the journalist for future assignments.

9. When presenting UGC, know what the user wants

After a journalist has got permission to use content from social media, they should ask whether and how it should be credited. Claire Wardle, who designed BBC News’s social media training, writes in chapter nine of the handbook that there is no industry standard when it comes to paying for or crediting UGC.

Best practice is to credit users how they want to be credited, whether that is with full name, Twitter handle, or not at all.

10. Check and double-check

A somewhat experimental one to finish. Dr Hauke Janssen, the head of documentation for German news magazine Der Spiegel, writes about how the publication created its “documentation department” from its traditional archive.

The magazine employs 70 people, often experts in their field, to rigorously fact-check all articles using reliable sources, previous articles and other documents held by Der Spiegel. Janssen recommends the Der Spiegel model, but says: “Newsrooms that do not have a similar documentation department should emphasise that reporters and editors double-check any story prior to publication.”