In 2017 alone, US companies spent upwards of $10 billion on third-party audience data. And while people were once generally indifferent or oblivious to the way personal data was collected, the tide has certainly turned.

In the wake of the Facebook/Cambridge Analytica scandal, people are more aware than ever of the tenuous nature of data collection, and more concerned about how securely tech and financial institutions catalog this data.

And with good reason: according to the Identity Theft Resource Center, there were 1,579 data breaches in 2017, an alarming 44.7 percent uptick from the year before.

The world of data is opaque – and it can be unsettling for citizens to hear about personal data infringements in the media. But as someone who’s worked very closely with data, I’ve learnt a fair amount about it over the years:

Nobody has any idea about how their data is being collected and used

According to Pew Research, nine in 10 adults feel that control over personal information collection is ‘very important.’ And yet, only nine percent of people feel they have a strong command over the information being collected about them.

Some data collection is pretty obvious, even to the most casual user. Google keeps a record of search history. Spotify tracks music listening habits. There are no surprises there.

But what about other types of applications – say, location services? Most people aren’t aware of how their location data is being collected and used by companies like Facebook and Google.

Typically, companies are just trying to build better products to survive in a dynamic, data-driven landscape – and, ultimately, to please the end user.

Information about how data is collected is usually available in an application’s ‘Terms and Conditions’ that users agree to when they sign up for an app.

However, only one percent of people actually take the time to read what’s in there. The documents are dense and long-winded.

Especially considering the current data privacy landscape, companies should be working to present this information to users as simply as possible – whether during the app’s onboarding process or through a concise blog post.

And for companies handling the data of EU residents, doing so is now mandatory. The General Data Protection Regulation (GDPR) became enforceable on May 25, 2018, giving residents unprecedented control over their personal data and requiring that they ‘opt in’ before their data is collected.

There is a huge lack of transparency in the entire industry

While some companies make an effort to explain how data is collected, there are many companies that operate mysteriously. More than half of the companies studied in the 2018 Corporate Accountability Index failed to adequately disclose information about the way they collect data.

Facebook was singled out as one of the worst among the offenders when it comes to transparency in data collection practices. This likely comes as little surprise, considering the Cambridge Analytica scandal.

But it turns out this incident was far from the only data and privacy violation involving the social media behemoth. In fact, Facebook recently announced that roughly 200 apps have been suspended from using the site until Facebook has audited the apps’ data privacy practices.

But Facebook is far from the only company that’s been cited as a transparency offender. Several other major internet and mobile companies – including Microsoft, Twitter, Oath (formerly Yahoo!), and Google – were found to disclose little to no information about how they collect information from third-party websites.

And then there’s the accusation leveled by Oracle that Google tracks Android users’ location even when tracking functions have been turned off and when there is no SIM card present in the device. Google has denied these claims.

In short, a lack of transparency has become a hallmark of the tech and data industries, and it’s what makes users so uneasy about logging on.

Now more than ever, it behooves any company collecting data for commercial or product development purposes to explain in clear language what they are doing. Users also deserve the choice of opting out of having their data collected in the first place.

Data is fractured and its availability is limited, making it hard to use effectively

As ubiquitous and relevant as data is, it’s still a sloppy business. Data is simply not as readily available as you might believe — nor is it often ‘whole’ or ‘complete.’

Businesses that want to create new products using data can try to acquire that data from large tech players. But even these data sets can be fractured or porous.

For this reason, some companies seek out data from elsewhere. There are organizations like Kaggle and DrivenData that crowdsource data and data science services, for example.

But once data is collected from varied sources, it has to be cleansed and processed in order to be serviceable in engineering a new product.

As you might imagine, this process tends to be costly and resource-intensive, since no single ‘standard’ exists for how data is collected. In the absence of available public data sets, a company would also need to collect data from just about everywhere in order to have something useful.

This includes social media, invoicing data, content from client chat portals, the weather – and the list goes on. This process is so labor-intensive that even the most prominent, well-endowed institutions lack the data sets needed to research and innovate.
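To make the cleansing step a little more concrete, here is a minimal Python sketch of what reconciling just two of those sources might look like. Everything in it is an assumption made for illustration: the field names, the date formats, the sample rows and the rule of keeping the most recent record per email address are all hypothetical, and a real pipeline would have to handle far messier input.

```python
from datetime import datetime

# Hypothetical rows from two sources that describe the same customers
# but use different field names and date formats.
invoice_rows = [
    {"Email": " Ana@Example.com ", "Invoice Date": "03/14/2018"},
    {"Email": "bob@example.com", "Invoice Date": "04/02/2018"},
]
chat_rows = [
    {"email_address": "ana@example.com", "last_seen": "2018-05-01"},
    {"email_address": "", "last_seen": "2018-05-03"},  # unusable: no email
]


def normalize(row, email_key, date_key, date_format, source):
    """Map one source-specific row onto a shared schema, or return None."""
    email = row.get(email_key, "").strip().lower()
    if not email:
        return None  # drop rows that cannot be tied to a person
    date = datetime.strptime(row[date_key], date_format).date()
    return {"email": email, "date": date.isoformat(), "source": source}


def clean(invoice_rows, chat_rows):
    """Normalize both sources and keep the most recent record per email."""
    candidates = [
        normalize(r, "Email", "Invoice Date", "%m/%d/%Y", "invoicing")
        for r in invoice_rows
    ] + [
        normalize(r, "email_address", "last_seen", "%Y-%m-%d", "chat")
        for r in chat_rows
    ]
    latest = {}
    for record in candidates:
        if record is None:
            continue
        current = latest.get(record["email"])
        # ISO-formatted dates sort correctly as plain strings.
        if current is None or record["date"] > current["date"]:
            latest[record["email"]] = record
    return sorted(latest.values(), key=lambda r: r["email"])


cleaned = clean(invoice_rows, chat_rows)
for record in cleaned:
    print(record)
```

Even in this toy version, each source needs its own mapping and its own date format, which hints at why the work scales so poorly once dozens of sources are involved.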

When companies use incomplete or poor data, the conclusions they draw from it are inaccurate and uneven.

In the case of using analytics to improve content and user experience on a website, an incomplete dataset could lead to changes that are not actually improvements for customers — which could very well lead to losing those customers.

Building an effective navigation app, for example, requires very accurate and almost real-time traffic data, and a lot of it. Without that data, the product simply is not feasible.

The era of mass data collection is still in its infancy, and many ‘unknowns’ persist, even among the most informed experts on the topic.

However, with the advent of GDPR, the hope is that tech companies will collect data more responsibly and be transparent about their practices.

This also means the onus of data collection will shift to the users of social media, apps and, well, every website altogether. They can either choose to opt out of data collection (or stop using the app) or reap the benefits data offers.
