Open Data Kit — Mobile Data Collection

Big data is often discussed in the humanitarian community, and there are some great use cases, from using OpenStreetMap data to processing and analysing satellite imagery. The reality is that in many programs and operations, organisations struggle to make use of the data they already have.

I have been involved in baseline surveys and monitoring surveys where only around 20% to 30% of the data collected was actually used. In addition, I regularly hear complaints about inconsistencies between monitoring surveys that make comparisons impossible.

Humanitarians, unlike our Silicon Valley friends, do not work in an environment where data is easy to buy and gather. Organisations spend a lot of time generating the data themselves, which is often rather costly. Is this large cost worth it when a lot of the data is never used?

We are used to thinking about minimising sample sizes for surveys, but not always about minimising (and appropriately refining) the questions asked. Often there are small bits of information that provide huge insights. In theory, collecting more data helps to build a more nuanced overview. But what if, for some, chasing this nuanced view jeopardises using the data correctly? Many organisations conflate large data collection exercises with rigour when, in reality, a solid, reliable process with less data may be much more useful.

Where is the optimal point to stop collecting data?

Some realise that access to certain communities and contexts might be rare, and jump at any opportunity to collect as much data as possible, just in case they might need it later. I'm sure there are a multitude of other reasons for over-collecting.

Should organisations be looking at the opposite end of the spectrum from big data? Would it be more valuable to collect less data, or only the minimum viable data needed?

Minimum viable data — The least amount of data needed to make an effective decision.

With less data:

It is easier to ensure consistency

It is cheaper to collect

It is quicker to analyse

It is easier to build on the basics

It is easier to repeat often

There is less data at risk of exposure

So what is the minimum viable data for your project? Are all the questions being asked necessary? Is minimum viable data a better approach?