I have a seashell on my desk. It serves as a personal reminder to always think out of the box and widen my viewpoint when solving complex Big Data Analysis challenges.

In this article you will learn:

Why do Data Scientists carry risks like fishermen?

What is the chicken-and-egg problem of Data Science?

How do the Chinese fishing nets in India get lifted up?

Each time I look at the shell, I’m brought back to the day I found it and to the fascinating, intriguing art I watched there: extracting value from mass fishing with the least effort. This is pretty much where I found it:

Can Big Data Analysis learn from Indian fishermen in Kerala?

Big Data Science in big oceans – the correlation

Procuring my very own shell was similar to uncovering, and appreciating, the value of Big Data Analysis; this shell originates from an ocean of artifacts, just as the value of Big Data Analysis originates from an ocean of data.

Data Scientists cast their own lift nets: just like these fishermen with their large nets, they fish for information in the data oceans and, once they find a valuable artifact, proudly present it to their peers.

Perhaps the difference between these two activities is that the ‘oceans’ in which Data Scientists cast their nets are massive; I’d personally use trawling nets for my endeavours.

My very own “out-of-the-box” shell from the shores of Kerala

How does lift net architecture relate to Big Data Analysis and Data Lakes?

Finding the correlation with the ocean was my a-ha moment for Big Data Analysis. I was near Cochin, India when I saw these lift nets. I was fascinated by them and watched the fishermen work with them.

While I studied how this immense fishing tool worked, a fisherman gestured for me to come over and showed me how it works. He then gifted me my first shell.

The kind fisherman who made me relate fishing to Big Data Science.

In a (nut)shell, this lift net is an apparatus to assist in fishing, where a huge net is lowered into the water and then pulled up by a lever attached to a rope wound around a motorbike. Once lifted, the contents of the net become visible to the fisherman. See this technology in action for yourself:

Demonstration of a lift net (also called a Chinese fishing net) in Kerala. This teaches us what we can achieve when we combine simple tools in Big Data Science.

This is not so different from a Big Data Analysis setup: we collect information from various systems and make it accessible for analysis. All information is collected and provided automatically via Big Data Analysis tools.

Ultimately, the Data Scientists are given access to the net that extracts the information, so that they can examine and investigate the Data Lake for Big Data Analysis.

To visualise this better, I have made the following picture. Here, you see the various data sources which are integrated to do Big Data Analysis.

The data is then structured, cleaned and stored in models, listed in a repository and flagged as master or transactional data for analysis. Data Scientists then examine and investigate this information thoroughly to make their catch worthwhile.
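As a minimal sketch of this step, the snippet below cleans a handful of records, lists them in a repository and flags each dataset as master or transactional data. The names `CatalogEntry`, `clean`, `register` and the sample records are hypothetical and only illustrate the idea; a real setup would use a proper data catalogue rather than a dictionary.

```python
from dataclasses import dataclass


@dataclass
class CatalogEntry:
    """One dataset listed in a (hypothetical) repository."""
    name: str
    kind: str  # "master" or "transactional"
    rows: list


def clean(rows):
    """Strip whitespace from string values and drop incomplete records."""
    cleaned = []
    for row in rows:
        stripped = {k: v.strip() if isinstance(v, str) else v for k, v in row.items()}
        if any(v is None or v == "" for v in stripped.values()):
            continue  # skip records with missing values
        cleaned.append(stripped)
    return cleaned


def register(repository, name, rows, kind):
    """Clean the rows, then list them in the repository under the given flag."""
    if kind not in ("master", "transactional"):
        raise ValueError(f"unknown kind: {kind}")
    entry = CatalogEntry(name, kind, clean(rows))
    repository[name] = entry
    return entry


# Illustrative sample data (assumed, not from a real system)
repository = {}
customers = [{"id": 1, "name": " Asha "}, {"id": 2, "name": ""}]
orders = [{"order_id": 10, "customer_id": 1, "amount": 25.0}]

register(repository, "customers", customers, "master")
register(repository, "orders", orders, "transactional")
```

Here the customer record without a name is dropped during cleaning, and a Data Scientist can later look up any dataset in `repository` together with its flag.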

From this we can make the following conclusions:

Using the right tool is essential to fish in the Big Data Ocean for Big Data Analysis

Using automated processes and tools enables Data Science talent to mine data and do Big Data Analysis easily

The architecture of how Data Scientists use Big Data and Data Lakes.

In Data Science, we have a Big Data Ocean characterised by velocity, variety and a humongous volume of information, which must be prepared and made accessible before facts can be mined from it.

Experienced Data Architects and experts determine where to install the right tools to extract and integrate the greatest amount of information. Data Engineers then construct Big Data Analysis tools, like the lift net, to integrate and access the information stored in the ocean.

Although talent is scarce and these infrastructures are built by a small pool of people, Data Engineers construct automated processes to extract, monitor, mesh and model data, and provision the tooling needed to investigate it.
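The automated extraction and monitoring described above could look roughly like the following sketch. It pulls records from several sources, meshes them into one catch and keeps a small report of which sources worked. The function `run_extraction` and the sources `crm` and `broken_erp` are hypothetical stand-ins, not part of any specific tool.

```python
import logging

logging.basicConfig(level=logging.INFO)
log = logging.getLogger("lift_net")


def run_extraction(sources):
    """Pull records from every source and note which sources failed."""
    catch, report = [], {}
    for name, extract in sources.items():
        try:
            records = list(extract())
        except Exception as exc:
            # Keep going so that one broken source does not sink the whole run
            log.warning("source %s failed: %s", name, exc)
            report[name] = "failed"
            continue
        # Tag each record with its origin so the sources can be meshed together
        catch.extend({"source": name, **record} for record in records)
        report[name] = f"ok ({len(records)} records)"
    return catch, report


def crm():
    """Hypothetical source that returns customer records."""
    return [{"customer": "Asha"}, {"customer": "Ravi"}]


def broken_erp():
    """Hypothetical source that is currently unavailable."""
    raise ConnectionError("ERP not reachable")


catch, report = run_extraction({"crm": crm, "erp": broken_erp})
```

The report acts as the monitoring part: the team sees at a glance which nets came up empty or torn, while the catch from the healthy sources is still available for analysis.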

Once all necessary tools are installed, Data Scientists can operate these ‘fishing devices’, lift out the information and investigate the contents of the catch.

Most of the time the information is incomplete, and some additional information needs to be extracted. Once the picture is complete, the catch can be presented to influence business decisions.

We can take the following lessons from this:

Ensure your tooling is reliable and already in production

Establish standard procedures for how value should be extracted from the Big Data Ocean

The main challenge and the chicken-and-egg problem of Data Science

Like a fisherman, the Data Scientist, the Data Engineer and the Data Architect do not know in advance whether they will find value in the Data Ocean. All of the Data Engineers’ tooling for Big Data and the exploratory work of the Data Scientist is done on the assumption that they will likely find value. This makes mutual trust essential in this line of work.

Furthermore, this uncertainty makes it hard to start Data Science projects and scale them. To get full commitment, sponsorship and unswerving support from business departments, there first needs to be evidence of value; yet without sponsors and support, it is hard to produce that evidence in the first place.

This ultimately is a chicken-and-egg problem. At sea, the fisherman’s version of this problem was solved centuries ago: the fisherman carries the risk of whether there is a catch or not. Hence, fishermen take very good care of their equipment and train themselves to use their tools perfectly.

The same logic applies to Data Scientists and Data Engineers: they learn and sharpen their skills step by step so that they can deliver quick turnarounds and, ultimately, Big Data Analysis results once data is available.

There is no certainty of a catch on any single attempt. The primary goal of a project must therefore be to provide initial value and so overcome the chicken-and-egg problem. This is the crucial first step to advance tooling, produce results and scale.

In order to get a project running, we learn from the fishermen’s craft:

Provide the first value, in the form of Big Data Analysis results, as quickly as possible, as it has a real impact on the sponsors

Big Data Science talent needs to enhance its skills and tools continuously to ensure quick success when a new project starts

The small things in tooling and processes matter for achieving a big impact in Big Data Analysis. See how a simple old bike can support fishermen. The same is the case for Big Data Science.

Conclusion

I believe that the analogy between the fishermen of Kerala and Big Data Analysis is an interesting one. We have discussed several similarities and seen the different challenges and obstacles that Big Data Science teams need to overcome.

In doing so, we found that the tooling and processes of the fishermen can be inspiring for our Big Data Science projects. Here are our key learnings:

Automated Big Data tools and processes are essential for running in production

Provide the first productive value as quickly as possible, as it has a real impact on this ecosystem

Big Data Science talent needs to enhance its skills and tools continuously to ensure a quick catch when a new project comes in

Sum-Up FAQ

Find below the most important questions and answers from this article.