The Disturbing State of Big Data

The current state of big data is worrisome. Here’s why.

Chances are that sometime in the last couple of years, you have heard about ‘big data’. Big data, alongside advances in machine learning and data science, seems to be where the world is heading. The data revolution is supposed to change how advertisements are shown to you, how police forces function, and, more broadly, how global development proceeds. At least, that is how supporters of big data see it.

All of these changes are supposedly great and offer insight into the new developments in technology across the world, but there is one primary issue (and many others):

Who is overseeing the developments in the first place?

Photo by ev on Unsplash

Everywhere you go, you leave a footprint. That footprint is not only physical; it is also a massive digital footprint that carries a great deal of value in the data marketplace. Governments and corporations alike are constantly trying to grab as much data as possible, purely because of the value that your data holds.

But you might be wondering what makes this so-called footprint so valuable. Well, first of all, and as cliché as it sounds, knowledge is power. The more a company knows about you in the present, the better it can predict your future. It will always be three steps ahead of you, whether or not you are aware of it. The idea is that people’s lives become much easier to manage once other, more powerful individuals or groups hold a vast amount of knowledge about them.

Developments in Privacy Statements:

Most likely, you have given some application or company the rights to your data for a stated reason that is rather broad. How many times have you gone through an installer and rapid-fire clicked through all of the check-boxes because you wanted to use a piece of software a few seconds sooner? Probably quite often. In turn, the company asks for your information through a vague statement such as the following:

“We share your shipping address for the fulfillment of your order.”

It seems rather elementary, but because the statement lacks specificity, nothing prevents the company from later using that data for something entirely different. After a while, the above statement can turn into the following through an ‘update’ to the privacy policy:

“We share your shipping address, as needed.”

The change from ‘order fulfillment’ to ‘as needed’ sparks several concerns, including the following two:

First, the already small amount of information provided in the first example is reduced even further. Second, the term ‘as needed’ effectively grants the company permission to use the address for whatever purpose it chooses.

Before the digital era, you were mostly able to circumvent snooping due to limitations in technology. However, with the rise of the Information Age, it is seemingly impossible to participate in society without using various social media, methods of digital communication, and more.

Issues Regarding Cross-Referencing and Discrimination:

Again, who is overseeing the developments in the first place?

With the introduction of this rather grandiose concept of big data, companies are now collecting data sets that are immensely large in almost every way imaginable. Effectively, there is no real oversight of how much of your data companies are allowed to collect.

Photo by Marius Masalar on Unsplash

When these massive companies obtain this information, along with data bought from third parties that collected it somewhere else, they cross-reference every data point observed about you with extreme precision. A single list by itself is not much cause for concern. However, when one list (say, a record of poor purchasing habits) is cross-referenced with many others, an incredibly detailed picture of who you are and what you do can be painted from data alone.
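To make the idea concrete, here is a minimal sketch of what cross-referencing can look like. Everything in it is an assumption for illustration: the three lists, the field names, the `cross_reference` helper, and the use of an email address as the shared key are all made up, and real data brokers may work quite differently.

```python
from collections import defaultdict

# Hypothetical lists from three unrelated sources. Each one says
# little on its own. All names, fields, and values are invented.
purchases = [
    {"email": "pat@example.com", "item": "lottery tickets"},
    {"email": "sam@example.com", "item": "running shoes"},
]
locations = [
    {"email": "pat@example.com", "city": "Springfield"},
]
memberships = [
    {"email": "pat@example.com", "group": "debt support forum"},
]


def cross_reference(*lists, key="email"):
    """Merge records from several lists into one profile per key value."""
    profiles = defaultdict(dict)
    for records in lists:
        for record in records:
            identity = record[key]
            for field, value in record.items():
                if field != key:
                    # Collect every value seen for this field.
                    profiles[identity].setdefault(field, []).append(value)
    return dict(profiles)


profiles = cross_reference(purchases, locations, memberships)
for identity, profile in profiles.items():
    print(identity, profile)
```

The person who appears in all three lists ends up with a profile that combines purchases, location, and group membership, which none of the individual lists revealed by itself.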

A further issue with big data is the room it leaves for discrimination: categories are built around the majority of a data set, and outliers are judged against them. The boxes that you as an individual are placed in are designed primarily for the broader society and those perceived as within the ‘norm’.

In 2014, a White House report on big data raised the issue of “perfect personalization”, pointing to research on web searches that compared white-identifying names with black-identifying names.

…web searches involving black-identifying names (e.g., “Jermaine”) were more likely to display ads with the word “arrest” in them than searches with white-identifying names (e.g., “Geoffrey”)

While the research does go on to say that it was not possible “to determine exactly why a racially biased result occurred”, it still sheds light on the implications that big data as a whole can have for minority groups within societies.

Another complication is that while this technology can produce highly accurate results, it can also miss the target entirely. In counter-terrorism work by state agencies, big data plays a significant role in determining who may or may not be a terrorist. Timme Bisgaard Munk, Ph.D., a postdoctoral researcher at the University of Copenhagen, came to the conclusion that “algorithms don’t work for detecting terrorism” and that they are “ineffective, risky and inappropriate”. Dr. Munk also concluded that there are “potentially 100,000 false positives for every real terrorist that the algorithm finds”.
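It helps to spell out what that ratio means for any one person the algorithm flags. The short sketch below is only back-of-the-envelope arithmetic on the quoted figure; the `precision` function name is my own, and the calculation assumes the ratio holds exactly as stated.

```python
def precision(true_positives, false_positives):
    """Share of flagged people who are genuine matches."""
    return true_positives / (true_positives + false_positives)


# The quoted ratio: 100,000 false positives for every real terrorist found.
chance_flag_is_correct = precision(1, 100_000)
print(f"Chance a flagged person is a real match: {chance_flag_is_correct:.4%}")
```

Under that assumption, roughly one flag in 100,001 points at a real terrorist, or about 0.001% of everyone flagged. Everyone else is an innocent person who has been marked as a suspect.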

Photo by David Beneš on Unsplash

Conclusion:

Big data has many problems, despite all the progress that the fields involved have been making. Not only does it carry severe privacy implications, but people who already face discrimination can also bear the brunt of its failures. One of the fundamental sources of error in the world of big data is the notion that data is equivalent to fact or truth, which is not the case. Cross-referencing information does not make it authentic, because the result can never be fully verified.

The disturbing state of big data should not be ignored. It should be made more public so that people and groups can take matters into their own hands and decide what they want for themselves. Big data does offer many advantages when it comes to automating processes, but its negative implications deserve just as much attention.