“The term 'data journalist' is a bit of a jack-of-all-trade term,” Marianne Bouchart told delegates at the news:rewired conference in London on Tuesday.

“Some call us computer-assisted reporters, journalist programmers, journo geeks… unicorns. It varies.”

Bouchart is communications director and Data Journalism Award Manager at the Global Editors Network – and the founder of Hei-Da.org, a not-for-profit organisation set up earlier this year which specialises in open data-driven projects.

She is also the founder and editor of the Data Journalism Blog, and gave workshop attendees some expert advice on how to source and use data sets for storytelling.

Here is a list of sources recommended by her:

You can also find datasets directly on Google by using the following search operators:

filetype:csv and filetype:xls for Excel spreadsheets

filetype:shp for geodata

filetype:mdb, filetype:sql, filetype:db for database extracts

You can even look for filetype:pdf – for example, site:Adidas-group.com filetype:pdf

Try inurl:downloads filetype:xls, which can turn up not only documents that companies or organisations have made public, but also information they have shared internally.
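The operators above can be combined in a single query. As a rough illustration, here is a minimal Python sketch that assembles such a search string and turns it into a Google search link. The helper names build_query and search_url are made up for this example and are not something Bouchart presented.

```python
from urllib.parse import quote_plus


def build_query(filetype, site=None, inurl=None, keywords=""):
    """Compose a search string from the operators listed above.

    Hypothetical helper for illustration only.
    """
    parts = []
    if keywords:
        parts.append(keywords)
    if site:
        parts.append(f"site:{site}")
    if inurl:
        parts.append(f"inurl:{inurl}")
    # Operators are written without a space after the colon.
    parts.append(f"filetype:{filetype.lower()}")
    return " ".join(parts)


def search_url(query):
    """Return a Google search link for the query, URL-encoded."""
    return "https://www.google.com/search?q=" + quote_plus(query)


query = build_query("pdf", site="Adidas-group.com")
print(query)
print(search_url(query))
```

Swapping the arguments, for instance build_query("xls", inurl="downloads"), reproduces the other searches mentioned in the list.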

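For more advanced data journalism, try data scraping with Google. Bouchart’s one-line magic formula to use in Google Spreadsheets for scraping data from HTML tables is =importHTML("", "table", N), where the address of the page goes between the first pair of quotation marks and N is the position of the table on that page, counting from 1.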
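For readers who would rather script the same idea, the following is a rough Python equivalent using only the standard library. It is a sketch, not the spreadsheet function itself: the class TableExtractor, the function import_html_table and the sample HTML are all invented for this illustration, and nested tables or cells with missing closing tags are not handled.

```python
from html.parser import HTMLParser


class TableExtractor(HTMLParser):
    """Collect every <table> in a page as a list of rows of cell text."""

    def __init__(self):
        super().__init__()
        self.tables = []    # one list of rows per table found
        self._row = None    # row currently being read, if any
        self._cell = None   # text fragments of the current cell, if any

    def handle_starttag(self, tag, attrs):
        if tag == "table":
            self.tables.append([])
        elif tag == "tr" and self.tables:
            self._row = []
        elif tag in ("td", "th") and self._row is not None:
            self._cell = []

    def handle_data(self, data):
        # Only keep text that sits inside a table cell.
        if self._cell is not None:
            self._cell.append(data)

    def handle_endtag(self, tag):
        if tag in ("td", "th") and self._cell is not None:
            self._row.append("".join(self._cell).strip())
            self._cell = None
        elif tag == "tr" and self._row is not None:
            self.tables[-1].append(self._row)
            self._row = None


def import_html_table(html, n):
    """Return the n-th table in the HTML, counting from 1 as the formula does."""
    parser = TableExtractor()
    parser.feed(html)
    parser.close()
    return parser.tables[n - 1]


# Made-up sample page with two tables, for demonstration only.
SAMPLE = """
<table><tr><th>Name</th><th>Value</th></tr>
<tr><td>A</td><td>1</td></tr></table>
<table><tr><th>Name</th><th>Value</th></tr>
<tr><td>B</td><td>2</td></tr></table>
"""

for row in import_html_table(SAMPLE, 2):
    print(row)
```

In practice the HTML would come from a downloaded page rather than a string in the script. The parsing step stays the same.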

She also recommended Berkeley’s tutorial on spreadsheets, as well as the Centre for Investigative Journalism’s Data Journalism Handbook for further information on interrogating data using spreadsheets.

Finally, don’t forget to clean your data! Bouchart said that holes in data sets mean the information could be wrong and unreliable.

She advised using OpenRefine, a free and open-source tool that doesn’t necessarily require an internet connection once the software has been downloaded to your computer.
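Before opening a file in a cleaning tool, a quick script can show where the holes are. The sketch below is not OpenRefine and does not use it. It is a plain Python check that counts empty or placeholder cells per column. The function name count_missing, the list of placeholder values and the sample rows are all assumptions made for this illustration.

```python
import csv
import io


def count_missing(csv_text, blanks=("", "na", "n/a", "-")):
    """Count cells per column that are empty or hold a placeholder value.

    The values in `blanks` are an assumed list; adjust it to the data set.
    """
    reader = csv.DictReader(io.StringIO(csv_text))
    fieldnames = reader.fieldnames or []
    missing = {name: 0 for name in fieldnames}
    for row in reader:
        for name in fieldnames:
            value = row.get(name)
            # Short rows yield None for the columns they lack.
            if value is None or value.strip().lower() in blanks:
                missing[name] += 1
    return missing


# Made-up sample data, for demonstration only.
SAMPLE_CSV = "name,amount,year\nA,100,2013\nB,,2013\nC,n/a,\n"

print(count_missing(SAMPLE_CSV))
```

A column with a high count is a signal to go back to the source or to treat any totals drawn from it with caution.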
