This shouldn’t surprise you! This post is a reminder and demonstration that:

If you put your data out there, be it on Facebook, Twitter, OkCupid, Reddit or wherever, it will get sliced and diced and analysed to profile you.

As Sandy Parakilas, who was responsible for policing data breaches by third-party software developers at Facebook, stated yesterday:

“The ease with which it was possible for anyone with relatively basic coding skills to create apps and start trawling for data was a particular concern”

So, let’s see how easy it is.

Thought experiment: I want to make you vote for Donald Trump.

I will gather everything I can to profile and influence you. I have no ethics.

Target n°1: Facebook

How did Cambridge Analytica get hold of 50+ million profiles? It wasn’t a “data breach” as some publications keep repeating. They published an app and paid users a few bucks to install it. The app would then ask your permission to share data about yourself and all of your friends.

(So ultimately, if your data ended up in Analytica’s hands, it’s because one of your friends sold it for a few bucks. No hack involved, no stolen passwords: simple human greed.)

But the Facebook API was actually updated in the meantime. The “friend permissions” that let developers suck out all the details about your friends are no longer available. Parakilas states that Cambridge Analytica’s app was “one of the very last to have access to friend permissions”.

So is my evil plan foiled? No!

Instead I will make a simple Chrome Extension pretending to “gather social behavior data for research”, then pay people to use it. Cambridge Analytica used Amazon Mechanical Turk to find willing users: this still sounds like a good plan.

From this Chrome Extension I can now siphon ALL of your friends’ data and even more, no problem!

(relevant: Browser Extensions Are a Privacy Nightmare)

Next: OkCupid

A dating website: is there a better place to find juicy, super personal info about you?

Two years ago OkCupid had a scandal of its own:

A Danish academic harvested and published the profiles of 70,000 users.

The scraping process took him 2 days and his tools are still publicly available.

So I will just create a few fake accounts, unleash the scrapers, and sit back.

Next: Reddit

With Reddit things are almost too easy.

You can just download a full data dump of all user comments since the beginning of time. And everything is already set up on Google’s cloud for in-place analysis. That’s just perfect.

The fun part: Tying it all together

But my accounts are not tied together! You don’t know who I am on Reddit from my Facebook profile.

Well, that is good in principle, but it won’t save you. There are lots of ways to link these records together.

A simple one is to look for reused pictures. Did you use any of your Facebook profile pics on OkCupid? Bam, linked.
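To make the reused-picture idea concrete, here is a minimal sketch using only the Python standard library. It assumes the pictures have already been collected as raw bytes per account; the function names and the dictionary layout are hypothetical, chosen for illustration. It only catches byte-for-byte identical files, so a picture that a site has resized or re-compressed would slip through and would need some kind of perceptual hash instead.

```python
import hashlib


def fingerprint(image_bytes):
    """Return a SHA-256 hex digest identifying an exact image file."""
    return hashlib.sha256(image_bytes).hexdigest()


def link_by_reused_pictures(site_a, site_b):
    """Pair accounts from two sites that share at least one identical picture.

    Both arguments map an account name to a list of image files (as bytes).
    Returns a sorted list of (account_on_a, account_on_b) tuples.
    """
    # Index every picture on site A by its fingerprint.
    index = {}
    for account, pictures in site_a.items():
        for picture in pictures:
            index.setdefault(fingerprint(picture), set()).add(account)

    # Look up every picture on site B in that index.
    links = set()
    for account, pictures in site_b.items():
        for picture in pictures:
            for other in index.get(fingerprint(picture), ()):
                links.add((other, account))
    return sorted(links)
```

The point of the sketch is how little is needed: one hash per picture and one dictionary lookup per candidate.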

If this doesn’t work I can take out the big guns and use machine learning to recognize your face from one site to another:

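import face_recognition

# Load one photo from each site (the file names are placeholders)
person_a = face_recognition.load_image_file("facebook_profile.jpg")
person_b = face_recognition.load_image_file("okcupid_profile.jpg")

# Encode the first face found in each photo, then compare the encodings
face_a = face_recognition.face_encodings(person_a)[0]
face_b = face_recognition.face_encodings(person_b)[0]
results = face_recognition.compare_faces([face_a], face_b)

Here! A handful of lines of Python to run a model with 99.38% accuracy on the Labeled Faces in the Wild benchmark.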
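As far as I understand the library, the comparison step itself is not magic: each face becomes a vector of 128 numbers, and two faces count as a match when the Euclidean distance between their vectors is at most a tolerance (0.6 is the library's documented default). The sketch below restates that idea with the standard library only. It is not the library's actual code, and the function names are mine.

```python
import math

# Assumption: 0.6 mirrors the default tolerance of face_recognition.compare_faces.
DEFAULT_TOLERANCE = 0.6


def face_distance(encoding_a, encoding_b):
    """Euclidean distance between two equal-length face encodings."""
    return math.dist(encoding_a, encoding_b)


def same_person(encoding_a, encoding_b, tolerance=DEFAULT_TOLERANCE):
    """Treat two encodings as the same face when they are close enough."""
    return face_distance(encoding_a, encoding_b) <= tolerance
```

Lowering the tolerance makes the match stricter (fewer false links, more misses); raising it does the opposite.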

Another elegant solution would be stylometry.

Stylometry: The statistical analysis of variations in literary style between one writer or genre and another.

By looking at your Reddit comments, and at the charming prose on your OkCupid profile, I am able to recognize your writing style and therefore to link your accounts together.

A good tool for this would be the open-source framework JGAAP, which was used among other things to identify J.K. Rowling as the author of a book she had published under a pseudonym.
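The core idea behind stylometry can be sketched in a few lines, without any framework. The example below is an assumption-laden toy and not how JGAAP works internally: it profiles a text by its character trigram counts, then uses cosine similarity to pick the known author whose profile is closest. All function names are hypothetical, and a serious attempt would need much more text and better features.

```python
import math
from collections import Counter


def char_ngrams(text, n=3):
    """Count overlapping character n-grams after normalising case and whitespace."""
    text = " ".join(text.lower().split())
    return Counter(text[i:i + n] for i in range(len(text) - n + 1))


def cosine_similarity(profile_a, profile_b):
    """Cosine similarity between two n-gram count profiles (0.0 if either is empty)."""
    dot = sum(profile_a[g] * profile_b[g] for g in profile_a.keys() & profile_b.keys())
    norm_a = math.sqrt(sum(v * v for v in profile_a.values()))
    norm_b = math.sqrt(sum(v * v for v in profile_b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def most_similar_author(unknown_text, known_texts):
    """Return the key of known_texts whose writing profile is closest to unknown_text."""
    profile = char_ngrams(unknown_text)
    return max(
        known_texts,
        key=lambda name: cosine_similarity(profile, char_ngrams(known_texts[name])),
    )
```

Here known_texts would map each candidate account to a sample of its writing (say, Reddit comments), and unknown_text would be the profile text to attribute.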