At the 2017 Swiss Data Day, we had the opportunity to share and discuss best practices in data science. One of these good practices is to start with the problem to be solved and not with the data. We talked about concrete examples of companies that started with the data and unfortunately ended up with a product that did not bring much value. During the networking part of the event, one of the participants told me a similar story from his company: after setting up a Hadoop cluster, the IT department had stored a large part of the internal data in it, and in the end the initiative was stopped because it had generated no significant benefit for the company.

Define the problem

The best way to maximize the return on investment of data mining projects is to start with the problem, not with the data. How can we guarantee that the requirements behind a data project are well defined? At First Utility, we use a method similar to the "user story" in the Scrum methodology. The initiator of the request must define the need and takes responsibility for the value generated:

As <<describe your role here>>,

I will <<keep at least one of the items below and complete it>>

{maximise the gross margin by £X}

{increase the revenue by £X}

{reduce the cost by £X}

{mitigate this risk}

By having <<describe here the data analytic work needed>>

For <<describe here the timeline>>

This method has three advantages: it gives us a solid working base, it gives us a way of prioritising our work and effort, and it makes the team's return on investment easier to calculate.
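As an illustration only, the user-story template above could be captured as a small record, so that requests can be ranked and their return on investment computed. This is a minimal sketch, not the tool we use: the class name DataRequest, its field names, the estimated_cost_gbp field (the template itself asks only for the value, not the cost) and all figures are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class DataRequest:
    """One filled-in request, mirroring the user-story template (hypothetical fields)."""
    role: str                  # "As <<role>>"
    goal: str                  # "I will <<goal>>"
    expected_value_gbp: float  # the "£X" the initiator is responsible for
    work_needed: str           # "By having <<data analytic work>>"
    timeline: str              # "For <<timeline>>"
    estimated_cost_gbp: float  # assumption: the team's own estimate of effort, not part of the template

    def roi(self) -> float:
        """Simple return on investment: (value - cost) / cost."""
        if self.estimated_cost_gbp <= 0:
            raise ValueError("estimated cost must be positive")
        return (self.expected_value_gbp - self.estimated_cost_gbp) / self.estimated_cost_gbp


def prioritise(requests):
    """Rank requests from highest to lowest ROI."""
    return sorted(requests, key=lambda r: r.roi(), reverse=True)


# Invented example figures, for illustration only.
requests = [
    DataRequest("Head of Retention", "reduce the cost", 120_000,
                "churn propensity model", "Q3", 40_000),
    DataRequest("Pricing Manager", "maximise the gross margin", 90_000,
                "tariff elasticity analysis", "Q2", 15_000),
]

ranked = prioritise(requests)
for r in ranked:
    print(f"{r.role}: ROI = {r.roi():.1f}")
```

Ranking by ROI is only one possible rule. A team could equally sort by absolute value or by timeline; the point is that a quantified request makes any such ordering possible.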

How to stay innovative if you start with the question?

By analyzing huge amounts of data, data science also aims to uncover realities that are otherwise very difficult to perceive. How can we leave room for this kind of discovery if we have to start with the question?

The purpose of the requirement is to direct our efforts towards a goal that will generate value. The requirement does not prescribe the answer to be found. Indeed, in some cases, the analysis of the data shows that the value initially estimated will not materialise.

Thus, innovation is strongly encouraged, but it must be guided by a business goal. Before starting a data project, it is necessary to ask: what is the aim we want to achieve? Without this, a "data innovation" is more likely to be inapplicable or even completely inappropriate.
