What is the number one buzzword in the IT and business world today? You guessed it: “Big Data”!

Data has been generated for centuries, and its potential to improve lives, processes and businesses is not a new idea. However, earlier technological limitations in storing, analyzing and interpreting data meant it could not be exploited to the extent it can be today! So what makes handling Big Data a challenging task? Is it the sheer volume? Or is it the fact that data collected from multiple sources is often unstructured and inconsistent? The answer is both, and much more. The following characteristics, known as the 3 Vs of Big Data, distinguish Big Data from ordinary data:

Volume : Big Data has humongous volume, which can range up to zettabytes (1 zettabyte = 1,000,000,000,000,000,000,000 bytes).

Variety : Data may be collected from multiple sources such as transactional systems, file systems, online articles and journals. Moreover, the data may be stored in different formats across these sources, such as relational data, text, images, video or audio.

Velocity : High-velocity data is data that arrives continuously and at high speed, i.e. streaming data. Data obtained from sensors is one example.

Data that exhibits one or more of the above characteristics can be termed Big Data. Big Data scientists have added two more Vs from a business perspective:

Veracity : Veracity refers to how well the source data conforms to actual facts and figures. This is especially important because many domains base their decisions on the findings obtained from analyzing the data. If the foundation of decision making, i.e. the source data, is not correct, then any decisions based on it are very likely to be wrong.

Value : Value refers to how much the data contributes to making strategic decisions. Storing, processing and analyzing data (Big Data in particular) requires a large investment of people, time and money. If the data adds little or no value to the decision-making process and its outcomes, then its value to that process is insignificant.

With advances in IT and more people moving activities such as shopping, socializing, ordering food and paying taxes and utility bills online, the amount of data being generated is at an all-time high! It is often said that more data has been generated in the last 5 years than in the whole of the past century! This is indeed what we can call a Data Explosion! While handling such huge volumes and variety of data can be challenging, the insights that can be discovered from it can be priceless. Big Data has the potential to change the way businesses, governments, researchers and almost everyone else work, by mining unseen, non-trivial but highly significant and potentially useful patterns from what might otherwise seem just a huge mountain of data.

Before Big Data can be analyzed for useful insights, it is essential that the data be validated and cleaned. But what exactly does this mean?

Big Data is obtained from multiple sources, and the way it is collected and stored may vary across these sources both syntactically and semantically. Some highly significant attributes or subsets of the data may also be missing. Worse still, the data that is available may be inconsistent or incorrect. Let’s explore how this can affect our analysis with an example:

Retalia is a retail chain with stores in 20 countries around the world. Each branch maintains its own transactional system (i.e. a database containing the details of its customers and the transactions they make). Suddenly, Retalia notices that its sales are dropping drastically in 5 of these 20 countries. This is where Big Data comes in. Retalia’s executives want to use the data from all the transactional systems to find the patterns that might have caused customers to leave, and then devise a way to win their customer base back. However, they run into a problem. Of all the branches, the stores in the U.S.A. and Canada do not record the age and gender of their customers. Moreover, among the other branches, some store gender as Male/Female while others store it as M/F.

In the example above, customer attributes such as age are missing. If the drop in customers is due to a particular age group no longer shopping at Retalia, this cannot be detected, and any analysis based only on the data that is available may give inaccurate results. Furthermore, the analysis must treat Female and F as having the same meaning, and likewise Male and M; otherwise, it may again produce corrupt results.

Hence, before performing the actual analysis on Big Data, it is very important to clean the data by pre-processing it, so that the data obtained from various sources is complete, consistent and correct in all respects.
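As a rough illustration of the Retalia cleaning step, the Python sketch below maps each branch’s gender codes onto one canonical value, treats missing ages as unknown rather than guessing them, and reports what fraction of age data each country is missing. The record layout (`country`, `gender`, `age` keys), the sample rows and all function names are hypothetical, invented for this example; a real pipeline would usually rely on a dedicated data-processing library.

```python
from typing import Optional

# Hypothetical mapping from the encodings used by different branches
# (e.g. "Male"/"Female" vs "M"/"F") to one canonical value.
GENDER_MAP = {"male": "Male", "m": "Male", "female": "Female", "f": "Female"}


def normalize_gender(raw: Optional[str]) -> Optional[str]:
    """Map 'M', 'male', 'F', 'Female', ... to 'Male'/'Female'; anything else -> None."""
    if raw is None:
        return None
    return GENDER_MAP.get(raw.strip().lower())


def clean_records(records: list[dict]) -> list[dict]:
    """Return cleaned copies of customer records collected from all branches."""
    cleaned = []
    for rec in records:
        row = dict(rec)  # copy so the caller's data is not modified
        row["gender"] = normalize_gender(rec.get("gender"))
        age = rec.get("age")
        # Keep plausible integer ages; mark missing or implausible ones as
        # unknown (None) instead of inventing a value.
        row["age"] = age if isinstance(age, int) and 0 < age < 120 else None
        cleaned.append(row)
    return cleaned


def missing_rate(records: list[dict], field: str) -> dict[str, float]:
    """Fraction of records per country in which `field` is missing."""
    totals: dict[str, int] = {}
    missing: dict[str, int] = {}
    for rec in records:
        country = rec["country"]
        totals[country] = totals.get(country, 0) + 1
        if rec.get(field) is None:
            missing[country] = missing.get(country, 0) + 1
    return {c: missing.get(c, 0) / n for c, n in totals.items()}


# Hypothetical sample rows from three branches.
raw = [
    {"country": "India", "gender": "F", "age": 34},
    {"country": "India", "gender": "Female", "age": 29},
    {"country": "Canada", "gender": None, "age": None},
    {"country": "Germany", "gender": "m", "age": 41},
]
clean = clean_records(raw)
print([r["gender"] for r in clean])  # ['Female', 'Female', None, 'Male']
print(missing_rate(clean, "age"))    # {'India': 0.0, 'Canada': 1.0, 'Germany': 0.0}
```

Leaving unknown values as `None`, rather than filling them with a default, keeps the gap visible so the analyst can decide whether a branch’s data is complete enough to be used.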

But analyzing the data is not enough. What is done with the outcome of the analysis matters just as much. How the results are used to make processes more efficient and effective, in any domain, be it politics, medicine, education, marketing, sports or others, is extremely important!

There are many real-world examples, across all domains, where the results of Big Data analysis have greatly benefited the organizations involved and allowed them to set an example for their competitors! Let’s look at some of these cases and how Big Data helped them:

Politics :

Former U.S. President Barack Obama used analytics on data from past campaign surveys to identify his potential supporters, so that his campaign could pitch to these voters rather than waste time, resources and money on non-supporters. He also identified the most influential campaign medium for each group of voters, to have the maximum impact on them.

Similarly, U.S. President Donald Trump used text analysis (a form of Big Data analytics applied to large volumes of text) to find the words that voters found most appealing, and used them in his campaign.
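One of the simplest forms of text analysis is counting which words occur most often across many documents. The Python sketch below does this for a handful of made-up survey responses; it is a generic word-frequency example, not the method any campaign actually used, and the stop-word list, the sample responses and the `top_words` function are all hypothetical.

```python
import re
from collections import Counter

# Illustrative stop-word list (an assumption); real text analysis uses far larger lists.
STOP_WORDS = {"the", "a", "an", "and", "to", "of", "we", "is", "in", "our", "for"}


def top_words(texts: list[str], n: int = 3) -> list[tuple[str, int]]:
    """Return the n most frequent words across all texts, ignoring stop words."""
    counts = Counter()
    for text in texts:
        # Lower-case the text and split it into simple alphabetic tokens.
        words = re.findall(r"[a-z']+", text.lower())
        counts.update(w for w in words if w not in STOP_WORDS)
    return counts.most_common(n)


# Hypothetical survey responses.
responses = [
    "We want more jobs and better jobs",
    "Jobs and security matter to our family",
    "Lower taxes and more jobs",
]
print(top_words(responses, 2))  # [('jobs', 4), ('more', 2)]
```

At Big Data scale the same counting would be distributed across many machines, but the idea of tallying terms and ranking them stays the same.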

The White House has also invested millions of dollars in research on Big Data and its analytics.

Healthcare and Medicine :

This industry has huge amounts of data! This data comes in many forms, from text (such as doctors’ prescriptions) to graphs and images. Hence, it was long very difficult for the sector to make use of its Big Data.

However, with today’s technologies for Big Data storage, analytics and visualization, many hospitals are able to diagnose patients based on the symptoms they report together with analytics over this huge body of data, rather than sending every patient for highly expensive pathology tests. This has reduced the time and cost of treatment.

Beyond diagnosing the diseases patients currently have, genomic pattern analysis can predict diseases a person may develop later in life and suggest appropriate care and precautions before the disease actually emerges!

Retail and E-commerce :

These industries have mined patterns from customer and transaction data to understand which services they need to improve and which additional services they should provide for a happier customer base.

The suggestions shown on most e-commerce websites, such as Amazon, are customized to each customer (their age, gender, previous transactions, if any, etc.) to improve their shopping experience.
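To make “mining patterns from transactions” concrete, the Python sketch below uses a simple co-occurrence count (“customers who bought X also bought Y”) over past shopping baskets. This is a swapped-in, textbook technique chosen for illustration, not Amazon’s actual recommendation method; the baskets and the functions `co_purchase_counts` and `recommend` are hypothetical.

```python
from collections import Counter
from itertools import combinations


def co_purchase_counts(baskets: list[set[str]]) -> Counter:
    """Count how often each unordered pair of items appears in the same basket."""
    pairs = Counter()
    for basket in baskets:
        # Sorting gives each pair a single canonical order, e.g. ('bread', 'milk').
        for a, b in combinations(sorted(basket), 2):
            pairs[(a, b)] += 1
    return pairs


def recommend(item: str, pairs: Counter, n: int = 2) -> list[str]:
    """Suggest the items most often bought together with `item`."""
    scores = Counter()
    for (a, b), count in pairs.items():
        if a == item:
            scores[b] += count
        elif b == item:
            scores[a] += count
    return [other for other, _ in scores.most_common(n)]


# Hypothetical past transactions.
baskets = [
    {"bread", "butter", "milk"},
    {"bread", "butter"},
    {"bread", "jam"},
    {"milk", "cereal"},
]
pairs = co_purchase_counts(baskets)
print(recommend("bread", pairs, n=1))  # ['butter']
```

A production system would also weigh customer attributes such as age or past purchases, as described above; this sketch covers only the transaction-pattern part.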

Similarly, many retail companies such as Macy’s have been able to reduce the decline in their customer base and increase net customer satisfaction.

Sports :

In earlier times, players for teams (especially club teams) were chosen based on the experience and intuition of the selection panel. Now, with Big Data Analytics, panels select players not only on intuition but also on analysis of data about the players’ past playing patterns. This helps them build a team of players who complement each other, with minimum investment and maximum expected return.

Teams can also plan better strategies before a game based on conditions such as temperature, humidity and the state of the grass, among others.

Banking and Insurance :

Big Data analysis can be used to detect fraud and to predict which customers may breach bank policies. For credit card transactions, a sudden and drastic change in a customer’s transaction pattern could indicate something suspicious. If this is handled in time, both the bank and the customer can avoid heavy losses of money and reputation. Banks can also use past records to identify customers who are likely to default on their loans or insurance premiums, and take appropriate measures to ensure timely payment.
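A very simple way to flag a “sudden and drastic change” in spending is to compare a new transaction against the customer’s history using a z-score: how many standard deviations the amount lies from the customer’s average. The Python sketch below shows that rule only; the threshold of 3 standard deviations, the sample amounts and the `is_suspicious` function are assumptions for illustration, and real fraud-detection systems combine many more signals (location, merchant, time of day and so on).

```python
from statistics import mean, stdev


def is_suspicious(history: list[float], amount: float, threshold: float = 3.0) -> bool:
    """Flag `amount` if it lies more than `threshold` standard deviations
    from the mean of the customer's past transaction amounts."""
    if len(history) < 2:
        return False  # not enough history to estimate a normal pattern
    mu = mean(history)
    sigma = stdev(history)  # sample standard deviation
    if sigma == 0:
        # Every past transaction had the same amount: any deviation stands out.
        return amount != mu
    return abs(amount - mu) / sigma > threshold


# Hypothetical past card transactions for one customer.
past = [42.0, 38.5, 55.0, 47.25, 40.0, 51.0]
print(is_suspicious(past, 49.0))    # False
print(is_suspicious(past, 2500.0))  # True
```

Flagged transactions would typically be held for review or confirmed with the customer rather than blocked outright, since an unusual purchase is not necessarily fraudulent.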

Big Data Analytics can also be used to increase customer satisfaction and reduce the number of customers leaving the bank, for example by creating policies that benefit customers and customizing services to their needs.

Education :

Big Data analytics in the field of Education is relatively new. A university in Australia uses Big Data analysis to recommend courses to students based on their goals and previous academic performance, and to find patterns relating a student’s progress to the time he or she spends on a particular website.

Many online education portals use attendees’ click patterns to gauge how boring they find a course.

Big Data Analytics can also be used to assess a teacher’s effectiveness in a particular course and, accordingly, recommend that teacher to students for various courses.

Data is here to stay for the foreseeable future, and the scope for Big Data and its analytics is huge! Indeed, this field still has a long way to go!