For those of you who’ve been listening to the show for a while, it is fairly obvious that there is a huge amount of data out there related to development initiatives and humanitarian assistance. If you had the time, money and desire, you could find data about almost any aspect of assistance: things like baseline data about a population, damage assessments, geospatial data, demographics of the people affected by a crisis, or things like which organizations, governments and companies are on the ground helping. The problem is that, in the humanitarian sector, organizations don’t have the time, money and people power to hunt down this data. Even more of a problem, the data is often locked in spreadsheets on individual laptops, captured only in written notes or, unfortunately, kept hidden as a potential competitive advantage.

Sarah Telford, my guest for the 129th episode of the Terms of Reference Podcast, is on a mission to change all of this. She is the Chief of Data Services at the United Nations Office for the Coordination of Humanitarian Affairs (OCHA), and oversees the continuing development of a global open data platform called the Humanitarian Data Exchange (HDX). The goal of HDX is to make humanitarian data easy to find and use for analysis and, since its launch in July 2014, the platform has been accessed by users in over 200 countries and territories.

IN TOR 129 YOU’LL LEARN ABOUT

The extent of the effort that must take place to aggregate humanitarian data, from large institutional sources, research teams and ground activity.

The many steps to make data useful: gathering and collecting, cleaning, converting, validating, unifying…

The fully open, no strings attached approach of HDX to share their stock of information, and the steps that must be taken to guarantee its financial viability.

The role HDX played in the 2014 West Africa Ebola epidemic and the role the epidemic played in shaping HDX going forward.

Details on the selection and validation of data, and the promotion of data-keeping standards.

The role community building plays in guaranteeing quality, relevance and usefulness of data.

OUR CONVERSATION FEATURES THE FOLLOWING

Organizations:

Topics:

Data collection and aggregation

Data cleaning and validation

Data visualization

Machine-readable data

Artificial Intelligence

Internet of Things

Vulnerability Assessment Mapping

User driven design

Data spaces

Community building

Data file formats: PDF, spreadsheets

Data anonymization

2014 West Africa Ebola crisis

Cash as a better way of giving

Places:

Westchester, New York

Nairobi, Kenya

The Hague, The Netherlands

Colombia

EPISODE CRIB NOTES

The problem with data

Unstandardized

Scattered in spreadsheets everywhere

Often outdated

02:35

Chief of Data Wizardry at OCHA

“There are a lot of hierarchies in the UN”

“Titles don’t fully reflect the job”

HDX collects crisis data from several organizations

Launched in the German summer of 2014

HDX’s plea is to make data available and useful

The Humanitarian Innovation Fund started the support; others have joined since

Data is distributed, as is the sector: many organizations, scattered

“There is no command and control”

OCHA is a lighthouse more than a panopticon

Data is used for one-time issues, generating “data mass graves”

While crisis situations have informed the design, the origin of HDX is academic

The World Bank is working with GeoNodes that collect spatial and other data

Interesting, but it is not maintained. Most local governments don’t prioritize this

To make something sustainable is to develop the architecture of maintenance

Metadata is important!

The WB understood this, but thinking about data right at a time of crisis easily turns chaotic: gathering, unifying, wrangling and dewrangling…

“It’s all too common. We can do better. It should be easier”.

Even basic multi-purpose streams (geography, population) are difficult

And going deep into the community, there was frustration aplenty

14:10

Diplomatic exchange of data gifts among aid organizations

People can visit HDX and access data, no sign-up needed

To contribute data, the user must make contact on behalf of an organization

“Individuals don’t collect data” (…)

Organization sizes vary from large players to university research teams

Submitting data from an organization is the first quality filter

To their surprise, organizations are not usually tidy with their data. Submitters are often ‘data activists’ from within, who take charge of making it usable in addition to their contractual duties

Some steps are taken to validate, clean and anonymize the data HDX receives

19:52

True stories about how HDX made a difference

“We have an idea of what gets viewed and downloaded, which sets are more popular, but we do not follow on how the data is used”

The HDX visualizations are held in high regard

HDX played a key role in the 2014 Ebola crisis. They had records of infections and casualties. HDX made sure it was machine-readable data

The Ebola crisis data is by far the most popular set, and researchers still download it today to study the epidemic, the response, etc.

“Ebola put us on the map”. Since then other HDX initiatives have followed on this experience

24:38

It’s the simple things

Stephen: The big impact of sharing data in XLS instead of PDF

A DataLab in Nairobi collects data from 40 agencies. They asked HDX to help. First discovery: the data was stored in PDFs

Furthermore, it was not standardized, hence it did not lend itself to comparison

“Big Data is not our focus, but smaller spreadsheets with key crisis response information”

Which does not preclude algorithmic efforts to link datasets and visualize correlations, even from different organizations

Data cleaning and validation are perhaps the most critical problems for HDX today

30:27

5-year qualitative forecast

Data spaces, starting in The Hague. An idea that arose from a conference

Connecting all the levels, from headquarters to the field, including all decision-making levels

Realizing that not all humanitarian people are data people is important

May the new generations be more data literate. For the time being, though:

“The best way to guarantee quality and comparability of data across organizations and context, is community building”

Get people engaged around the story of data

Interaction and collaboration efforts will allow us “to tackle problems we were incapable of tackling before”

34:05

Do or do not with a solution in mind, there is no try

“We understood the problem” before delving into it

And there was investment in researchers and design thinking

Designers went to the ground in Africa and Colombia

“User driven” was an assumption that validated itself

Data pipelines must be joined with an organizational model around a product

“Fun!”

Users were interviewed about HDX’s personality: mature? playful? lumberjack?

This has helped to overcome the hurdle of popularity

Small armies of trainers on HDX go everywhere to instruct on its use

User research will still be pushed as HDX missionary activity.

Data cleaning too (taking care of the data basement makes the whole house work)

It’s all about creating the process (technical, logistical, organizational) that best informs decisions through data

41:38

Traditional and innovative HDX funding

“We don’t ever want to charge people for our service. I have never seen it work in this field”

Value-added products can be set up, like workshops

Some solutions are built by request, or funders have a voice in how products should be built

44:09

Where Sarah gets her data

“Reading about everything”

OpenStreetMap

Vulnerability Assessment Mapping

HIF Elrha’s Journey to Scale

“There’s so much”

Westworld. “To an extent, HDX is an intelligence, performs automated tasks on our behalf”

Techonomy conference. This year: IoT

“The biggest disruption is cash” contributions, rather than in-kind ones. With digital cash, we can track what people do with the money

Please share, participate and leave feedback below!

If you have any feedback you’d like to share for me or Sarah, please leave your thoughts in the comment section below! I read all of them and will definitely take part in the conversation.

If you have any questions you’d like to ask me directly, head on over to the Ask Stephen section. Don’t be shy! Every question is important and I answer every single one.

And, if you truly enjoyed this episode and want to make sure others know about it, please share it now:

Also, ratings and reviews on iTunes are very helpful. Please take a moment to leave an honest review for The TOR Podcast!