Big data has been branded as – we're throwing up in our mouths as we say this – the "oil" of what has annoyingly become known as the "fourth industrial revolution."* Strip that down, and we're in part talking about the way individuals' data is used to knit new, virtual businesses.

It's the basis of the app economy, and corporations have been getting wise to it.

The app economy has thrived on the basis of "free" platforms like Facebook granting access to their users' data to third parties.

In the corporate world, your social feeds may be sucked into business systems to help financial services firms assess your financial status and habits, and decide whether to offer you a loan or some other product.

With Facebook admitting to having "improperly" shared the data of 87 million users with Cambridge Analytica, the question must be asked whether it is safe for you to continue to rely on what has ostensibly been "free" data. That question has currency as the European Union's General Data Protection Regulation (GDPR) arrives in May, bringing with it weaponised rules over permissions and fines.

Data, data everywhere

Facebook has tried to close the stable door here, while treading a line between the privacy of users and a desire to keep developers onside.

But Facebook is not the only source of personal data online that's open to startups and corporate developers - bazillions of data sets exist online. They're perfectly legal, and indeed many are uploaded and maintained by governments and their representatives. What makes them legal? In most cases you can't use them to identify individuals. And if you can't identify people from the data, you can do what you like, GDPR-wise.

If you're collecting data for major crunching, do you need to identify the people it's about? If not, ask for it with the personally identifiable elements excluded.
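In practice, that means dropping the identifying fields before the data ever reaches your analytics pipeline. A minimal sketch of the idea, assuming a simple list-of-dicts dataset (the field names "name", "email" and "postcode" are invented for illustration):

```python
# Minimal sketch: drop personally identifiable fields before analysis.
# The PII field names here are illustrative assumptions, not a standard.

PII_FIELDS = {"name", "email", "postcode"}

def strip_pii(records):
    """Return copies of the records with identifying fields removed."""
    return [
        {k: v for k, v in record.items() if k not in PII_FIELDS}
        for record in records
    ]

applications = [
    {"name": "A. Subject", "email": "a@example.com", "amount": 5000, "region": "NW"},
    {"name": "B. Subject", "email": "b@example.com", "amount": 12000, "region": "SE"},
]

# Each remaining record holds only the aggregate-friendly fields.
anonymised = strip_pii(applications)
```

One caveat worth noting: removing direct identifiers is not the same as true anonymisation – combinations of remaining fields (so-called quasi-identifiers) can sometimes still single people out, so whether data counts as anonymous needs assessing case by case.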

Is it private?

I've just been digging into some big data sites and browsed data full of people's names. The first entry concerns a Mr Sehwag, a Mr Lee and a Mr Pathan, for example. And you know what? That's perfectly legal, as is the information that a Mr Tendulkar was named "man of the match". It's the results data from a 2008 Australia/India Test match, and it's in the public domain.

Just because data's publicly available, however, doesn't mean it's fair game. It may have been stolen – just think of all those massive breaches of recent times.

But what if the data is personal, identifiable and not public? First, you have to make clear to the subjects where you got the data and what you're doing with it, and – most importantly – why you have the right to do what you're doing. If you're the controller (i.e. defining how the data is processed) it's up to you to ensure that the data subjects are aware. And if you don't have some other lawful reason for having the data and you're relying on the subject's consent to do so, getting and managing that consent is your problem.
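If you are relying on consent, you need a record of when it was given, for what purpose, and whether it has since been withdrawn. A rough sketch of what such a record might look like – the structure and field names are assumptions for illustration, not a GDPR-mandated format:

```python
# Hypothetical consent record a controller might keep, so it can show
# where the data came from, what it's used for, and on what lawful basis.
from dataclasses import dataclass
from datetime import datetime, timezone
from typing import Optional

@dataclass
class ConsentRecord:
    subject_id: str
    purpose: str                 # what you're doing with the data
    source: str                  # where you got the data
    lawful_basis: str            # e.g. "consent", "contract"
    granted_at: datetime
    withdrawn_at: Optional[datetime] = None

    def is_valid(self) -> bool:
        """Consent only covers processing while it hasn't been withdrawn."""
        return self.withdrawn_at is None

record = ConsentRecord(
    subject_id="subj-001",
    purpose="credit scoring",
    source="application form",
    lawful_basis="consent",
    granted_at=datetime.now(timezone.utc),
)

# If the subject withdraws consent, the record reflects it immediately.
record.withdrawn_at = datetime.now(timezone.utc)
```

The point of keeping this per-subject, per-purpose is that consent under GDPR must be specific: consent for one purpose doesn't transfer to another.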

Who's liable? One of the new bits with GDPR is that the controller and the processor are now both liable in cases of data misuse (previously it was just the controller who got the kicking). Just because you're on the receiving end of dodgy data doesn't absolve you of liability any longer.

If you're the controller and providing data to others, you need to dictate to those others – contractually – what they can do with it, and you need to tell the subjects what's been agreed. If nothing else, it helps your case if the other party misuses the data.

In this context, then, let's have a look at the Facebook incident. Did Cambridge Analytica steal data? No. As Zuck posted on 21 March on Facebook, the app that hoovered the data "was installed by around 300,000 people who shared their data as well as some of their friends' data. Given the way our platform worked at the time this meant [the app's author] was able to access tens of millions of their friends' data."


It was working as designed, even if hindsight shows that the design wasn't the greatest – something Facebook acknowledged by changing the way it worked in 2014. When Facebook made this latter design change, it also told the app developer and the now dismantled and rebranded Cambridge Analytica to delete the data they'd previously acquired. "It is against our policies for developers to share data without people's consent, so we immediately banned [the] app from our platform, and demanded that [the app author] and Cambridge Analytica formally certify that they had deleted all improperly acquired data. They provided these certifications."

Facebook allowed too much data to be slurped; it made changes to stop this happening; it told app developers to remove the excess data; and says that at least one app developer fibbed when claiming to have done so.

In the GDPR world we'll all be living in after 25 May, it's simple: if the data controller tells a third-party data processor (to which it has provided data) that consent no longer exists for it to hold that data, and that it must delete it, then the processor has to do it. And as long as the controller made reasonable efforts to ensure this is done, it's the third party that carries the can.

Which means if you're that third party, you need to be damned sure of your ongoing right to use the data.
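As a processor, that implies two habits: honour deletion instructions promptly and auditably, and check you still hold the data lawfully before touching it. A toy sketch of the shape of it – the dataset names, log format and functions are all invented for illustration:

```python
# Hypothetical processor-side handling of a controller's deletion
# instruction, with an audit trail and a pre-use legality check.
datasets = {
    "campaign-2016": [{"subject": "s1"}, {"subject": "s2"}],
    "survey-2018": [{"subject": "s3"}],
}
deletion_log = []  # auditable record of deletions carried out

def handle_deletion_instruction(dataset_id: str) -> bool:
    """Delete the dataset and record that the instruction was honoured."""
    if dataset_id in datasets:
        del datasets[dataset_id]
        deletion_log.append(f"deleted {dataset_id}")
        return True
    return False

def can_process(dataset_id: str) -> bool:
    """Before any processing, confirm the data is still lawfully held."""
    return dataset_id in datasets

# Controller withdraws the basis for one dataset; the other is untouched.
handle_deletion_instruction("campaign-2016")
```

The audit trail matters on both sides: the controller's "reasonable efforts" defence and the processor's own position both rest on being able to show the instruction was given and acted on.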

Breaches are so last year

Now then. Perhaps you've been distracted by all the hype and press excitement around influencing elections and the Senate giving Facebook CEO Mark Zuckerberg a not-so-hard time. So distracted that you've not noticed that I've yet to mention the "b" word.

We're used to "breaches" being all about people stealing data: hackers or malware writers breaking in or deploying malware to cause the exfiltration of data from company systems, thus rendering the system owners liable to punishment. And one's first instinct when reading the definition of "personal data breach" in the GDPR text is to think of a company getting hacked. Understandably so, because it says: "'Personal data breach' means a breach of security leading to the accidental or unlawful destruction, loss, alteration, unauthorised disclosure of, or access to, personal data transmitted, stored or otherwise processed."

Critically, though, you don't need to have data stolen to commit an offence under GDPR. Simply failing to follow the basic principles for processing (lawfulness, fairness and transparency) or failing to tell the data subjects what you're doing is enough – you don't even need to have passed the data on to anyone to be deemed to have broken the law.

Not only that, but look at the definition of "processing" in the text of GDPR and you'll see words like "collection", "structuring" and "storage" in there. Yes, storage. Just having it counts as processing. So having it without good reason counts as unlawful processing.

The bottom line

If you're processing big data, you need to be ever mindful that although data thefts make big news, there are plenty of other ways for data misuse to incur big penalties. And they're summed up simply: if there's no lawful reason for you to have that data, you're not allowed to use it, so get rid of it right now – preferably yesterday. Once GDPR is here I expect more prosecutions and fines for misuse of personal data than for disclosure or theft.

If you have a hack on your repository and can show the ICO you took all reasonable steps to protect it, the ICO will treat you proportionately. But if the ICO SWAT team finds you've ignored the rights and freedoms of the subjects by working diligently, efficiently and securely with terabytes of data that you shouldn't have, things probably won't go in your favour. ®