Prologue

In March 2018, the Cambridge Analytica (2013–2018) scandal was exposed to the public. Information about 50 million Facebook users had been “leaked” to the analytics company Cambridge Analytica, which allegedly used it for election propaganda purposes. Only two months later, my inbox was flooded with messages from all sorts of websites about privacy policy updates made to comply with the GDPR (General Data Protection Regulation, the European Union’s regulation governing individuals’ rights over their own information), which, among other things, lets people access all the information that has been stored about them.

GDPR emails

Due to these events, I wanted to find out precisely what information was being kept on my profile and how it could tell my story.

The Data

I opened my Facebook account and with a single-digit-number-of-clicks on the "Settings" tab, I already had an impressive directory tree on my desktop, full of juicy information about my digital self.

Data downloaded from Facebook

This data includes (taken from the Facebook webpage):

Data Stored on Facebook

At first glance, I saw a file with only one line, stating that I was profiled as “Starting his adult life”. In another file, I saw a list of all the ad keywords that match my interests (they were quite accurate). One way or another, I could already tell that I would find interesting pieces of information.

Now, I’m officially excited.

The Hard Work

The first step was to go over the data manually: about 1,200 files in total, of which only 47 actually required field mapping. (If you wonder why only 47, it is because the other 1,150 or so files are identically formatted; each one stands for a different person or group I have ever chatted with.)

Armed with the programming language “Python” and data analysis software “Tableau,” I began to process all the information. It took a few hours of code writing to standardize all the different pieces of information in a uniform manner (dealing with the Hebrew language, extracting name patterns from large texts, grouping information by topic, creating summary tables, etc.). To those who are not familiar with such tasks - I had to turn this:

Raw Data

Into this:

(Coherent information)
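For readers curious about the “dealing with the Hebrew language” part: a commonly reported quirk of Facebook's JSON export is that non-ASCII text arrives as UTF-8 bytes misread as Latin-1, so Hebrew shows up as gibberish. The sketch below shows one way to repair that, assuming the export has this quirk. It is not the exact code used for this article; `fix_text` and `fix_export` are hypothetical helper names, and the sample message is made up.

```python
import json


def fix_text(value):
    """Re-interpret Latin-1-decoded text as UTF-8; return it unchanged if that fails."""
    try:
        return value.encode("latin-1").decode("utf-8")
    except (UnicodeEncodeError, UnicodeDecodeError):
        return value


def fix_export(node):
    """Walk a parsed JSON structure and repair every string in it."""
    if isinstance(node, str):
        return fix_text(node)
    if isinstance(node, list):
        return [fix_export(item) for item in node]
    if isinstance(node, dict):
        return {fix_text(key): fix_export(value) for key, value in node.items()}
    return node


# Made-up sample: a Hebrew word, garbled the way the quirk would garble it.
garbled = "שלום".encode("utf-8").decode("latin-1")
raw = json.dumps({"sender_name": "Yoav", "content": garbled})

message = fix_export(json.loads(raw))
print(message["content"])
```

Text that is already plain ASCII or already correct Hebrew passes through untouched, so the helper can be applied to a whole file without checking each field first.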

And now that all the information is in its right position, we are all set to have some fun!

Playing Around

Overview

As a proud representative of Generation Y who has been using Facebook extensively for over 10 years, I found it easy to guess (correctly) that Facebook kept a decent amount of information on my profile (ignoring media, there were 21 MB of data on me, which is quite a bit). But what does this information say about me?

First, I wanted to check the extent of my activity on the social network over time, so I created an ordinal rating mechanism that weighs how active I was per year, taking into consideration the number of likes, chat days, responses and posts that I have initiated over the years.

The extent of my activity in Facebook over years
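To make the idea of an ordinal rating more concrete, here is a minimal sketch of one way such a score could be built: rank the years within each metric and add up the ranks. The equal weighting of the four metrics, the function name `ordinal_scores`, and all the numbers are assumptions for illustration, not the actual mechanism or data behind the chart.

```python
def ordinal_scores(counts_by_year):
    """Rank each year within every metric (1 = least active) and sum the ranks."""
    years = sorted(counts_by_year)
    metrics = sorted({m for counts in counts_by_year.values() for m in counts})
    scores = {year: 0 for year in years}
    for metric in metrics:
        # Order years by this metric; ties are broken by year, earlier first.
        ordered = sorted(years, key=lambda y: (counts_by_year[y].get(metric, 0), y))
        for rank, year in enumerate(ordered, start=1):
            scores[year] += rank
    return scores


# Made-up numbers, for illustration only.
sample = {
    2010: {"likes": 120, "chat_days": 200, "comments": 80, "posts": 30},
    2011: {"likes": 300, "chat_days": 310, "comments": 150, "posts": 55},
    2012: {"likes": 90, "chat_days": 150, "comments": 40, "posts": 10},
}

scores = ordinal_scores(sample)
print(scores)
```

Because only the ordering matters, a year with a huge number of likes cannot drown out the other metrics, which is the main appeal of an ordinal scale over a raw sum.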

For those who are wondering: a possible explanation for the increased activity in 2011 is the amount of free time I had during this period, as I’d just finished high school and had a long vacation before being recruited to the army. Additionally, 2010-2011 were Facebook’s best years in Israel, a claim supported by the Google search trend for the term “Facebook” in Israel:

Amount of the search term “פייסבוק” (“Facebook” in Hebrew) on Google over the years in Israel

Anyhow, I can say that I have been fairly active on the social network over the last decade, and therefore I will allow myself to attribute high reliability to long-term statistics drawn from my Facebook data.

Tell me who your friends are, and I’ll tell you who you are

Today, my Facebook account has about 1,500 friends. A quick calculation shows that over the last decade I have added an average of 12.5 friends per month (1,500 friends over 120 months). In order to focus my exploration, I decided (almost arbitrarily) to set the number 23 as the threshold that defines a "highly social month." For every month in which I added 23 or more friends, I tried to explain the spike by major events that occurred in that month (as far as my memory allowed), and I also tried to see whether there was a correlation between my monthly activity on Facebook and my “friend-trend.”

Monthly friend-trend over the past 10 years. The green color indicates a section above the threshold; the gray graph in the background represents my monthly activity on Facebook (according to the same ordinal rating I presented before). Sharp-eyed readers will notice sections where the trends of the two graphs closely match.
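The counting behind a chart like this can be sketched in a few lines. This is a minimal illustration, assuming each friend record carries a Unix timestamp in seconds; the function names and the sample timestamps are made up, and only the threshold of 23 and the 1,500-friends-in-a-decade figure come from the text.

```python
from collections import Counter
from datetime import datetime, timezone

THRESHOLD = 23  # the "highly social month" cutoff chosen in the text


def friends_per_month(timestamps):
    """Count added friends per (year, month) from Unix timestamps in seconds."""
    months = Counter()
    for ts in timestamps:
        added = datetime.fromtimestamp(ts, tz=timezone.utc)
        months[(added.year, added.month)] += 1
    return months


def highly_social_months(months, threshold=THRESHOLD):
    """Return the (year, month) pairs that reach the threshold, in order."""
    return sorted(month for month, count in months.items() if count >= threshold)


# Average from the text: about 1,500 friends over 10 years of 12 months.
average = 1500 / (10 * 12)

# Made-up sample: 25 friends added in August 2011, 3 in September 2011.
august = datetime(2011, 8, 1, tzinfo=timezone.utc).timestamp()
september = datetime(2011, 9, 1, tzinfo=timezone.utc).timestamp()
sample = [august + i * 3600 for i in range(25)]
sample += [september + i * 3600 for i in range(3)]

months = friends_per_month(sample)
print(average, highly_social_months(months))
```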

It's nice to see when I acquired most of my Facebook friends over the past few years. But is there more to it? What if Facebook could characterize milestones in my life based only on friend trends and my connections to those friends? For example, it could show me this information in the form of a funny video with annoying music (as it often does). Or perhaps Facebook could map all users with similar friend trends between the ages of 18 and 21 (military age in Israel) and find cliques of people who are most likely to share the same military background. That’s nicer. But you get my point; the possibilities are endless.

Grade-a-friend

I do not easily give away “likes” on Facebook. I only “like” things that I really love. Speaking of love, the “love” reaction on Facebook is also a rare commodity. To be more specific, I think that a “love” reaction is worth five times more than a “like” reaction, whereas a comment is worth two “likes,” and the number of distinct times I spoke to someone in a chat is equal to one comment interaction in its value. Of course, that’s only how I see things, but hey! It seems like I have unconsciously created a rating model to measure my affection towards Facebook friends!

So, I ran a test using the above-mentioned rating method (disclaimer: reactions other than “like” only came into use in 2016). After clearing some noise from users who had a low score, I had a reasonable amount of data to analyze, and thus I automatically compiled a list of people who could be defined as “close Facebook friends.” This is without even looking at the content itself (response patterns, keywords in chat, number of tagging comments, etc.). This means that I could map my digital social friend group based only on the very existence of the interactions I mentioned above.
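Written down as code, the rating model is just a weighted sum. The weights below are the ones stated above, expressed in "likes": a love is worth five likes, a comment two, and a distinct chat day the same as a comment. Everything else in this sketch is an assumption: the function names, the `min_score` noise cutoff of 10, and the made-up friends and counts.

```python
# Weights from the text, in units of one "like".
WEIGHTS = {"like": 1, "love": 5, "comment": 2, "chat_day": 2}


def friend_score(interactions):
    """Weighted sum of interaction counts for a single friend."""
    return sum(WEIGHTS[kind] * count for kind, count in interactions.items())


def close_friends(all_interactions, min_score):
    """Score every friend, drop those below min_score, sort from highest score down."""
    scored = {name: friend_score(counts) for name, counts in all_interactions.items()}
    kept = [(name, score) for name, score in scored.items() if score >= min_score]
    return sorted(kept, key=lambda item: item[1], reverse=True)


# Made-up friends and counts, for illustration only.
sample = {
    "Friend A": {"like": 10, "love": 2, "comment": 5, "chat_day": 30},
    "Friend B": {"like": 3, "comment": 1},
    "Friend C": {"love": 4, "chat_day": 12},
}

ranking = close_friends(sample, min_score=10)  # hypothetical noise cutoff
print(ranking)
```

Note that nothing here looks at what was actually said; the score depends only on the fact that an interaction happened.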

Top friends according to my rating system, arranged in bubbles. The size of each circle reflects the score; the color of the circle also follows the score (from darker to brighter)

Even more interesting, I can see the change in my (digital) social friend group over the years.

Same as the graph above, per year 2009–2018 (Yes, it looks like an image from a Biology book.)

While I am thrilled by how accurately this information tells my life story, I cannot help thinking about how to take it to the next obvious stage: relationships.

As a proof of concept, I also examined the number of social interactions with my past adult relationships:

For privacy reasons, you will not see names/dates/numbers here. Different colors indicate different relationships on a timeline, and the number of points is the number of times the interaction score exceeded a certain threshold.

Indeed, the information presented above corresponds with the actual relationships. This means that it is very feasible to map out "significant relationships" and how long they lasted. If I considered additional parameters such as gender, age, keywords, message send/receive ratio, tagging, disqualifying profiles that liked the movie “Interstellar” (a must), and so on (be creative, Facebook has all kinds of data), such algorithms could not only offer me potential partners but also approximate sexual orientation, predict the likelihood that a relationship will succeed, and, in fact, do anything... Thinking about it, there is nothing to prevent Facebook from becoming the best dating platform (turns out, I’m pointing out the obvious).

Location, Location, Location

Unfortunately, I did not allow Facebook to access my device location over the years, so this information does not exist in Facebook’s database. But in order not to spoil our fun journey, I went to Google Timeline (for those who don’t know, here is your life on a map) and downloaded all the information that has been collected from my device over the years.

The next thing I did was to map all the Facebook events I had ever responded to, pin them on a timeline, and correlate the information with my Google location history. Here is how it looks:

Timeline bars = Events marked as ‘going’. Points on the map = Location from these very same dates

Specific example — my route on the day G&R performed in Tel Aviv — Welcome to the jungle baby!

If you trust the RSVP status I gave each event, it is fairly easy to pinpoint the location of the event on the map, or vice versa: it is easy to determine whether I was actually present at the event. So what? Well, maybe Facebook will offer me events that I'm likely to be interested in based on my past behavior, or perhaps Facebook will work together with Google to map out which age groups are located in which places and at what times, so they can sell the information to interested businesses. Again, it just depends on how creative Facebook wants to get.
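The "was I really there?" check can be sketched as a date match plus a distance test. This is a minimal illustration under several assumptions: the venue coordinates of each event are known, location history is a list of dated latitude/longitude points, and being within 1 km of the venue counts as attending. The field names, the radius, and all sample values are made up.

```python
from datetime import date
from math import asin, cos, radians, sin, sqrt

EARTH_RADIUS_KM = 6371.0


def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance in kilometers between two latitude/longitude points."""
    lat1, lon1, lat2, lon2 = map(radians, (lat1, lon1, lat2, lon2))
    a = sin((lat2 - lat1) / 2) ** 2 + cos(lat1) * cos(lat2) * sin((lon2 - lon1) / 2) ** 2
    return 2 * EARTH_RADIUS_KM * asin(sqrt(a))


def was_present(event, location_points, radius_km=1.0):
    """True if any point recorded on the event's date lies within radius_km of the venue."""
    return any(
        point["day"] == event["day"]
        and haversine_km(event["lat"], event["lon"], point["lat"], point["lon"]) <= radius_km
        for point in location_points
    )


# Made-up events and location history, for illustration only.
concert = {"name": "Concert", "day": date(2018, 1, 1), "lat": 32.100, "lon": 34.810}
party = {"name": "Party", "day": date(2018, 1, 2), "lat": 32.100, "lon": 34.810}
history = [
    {"day": date(2018, 1, 1), "lat": 32.101, "lon": 34.811},  # a short walk from the venue
    {"day": date(2018, 1, 2), "lat": 31.770, "lon": 35.210},  # a different city
]

print(was_present(concert, history), was_present(party, history))
```

Running the same test in the other direction, from location points to nearby venues, is how one would guess the location of an event that has no address attached.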

Some other stuff

There are many other findings, but at this stage, I feel that the matter has been exhausted and, therefore, I will not devote more than a few quick screenshots to the following data types:

Major activity hours on Facebook (there is nothing like a coffee break at 14:00)

Posts on my wall by month (it’s clearly visible that my friends remember my birthday in January every year)

The most common words I use: mainly Hebrew pronouns, unsurprisingly. Imagine taking all my chat history and feeding it to some machine learning algorithm (did someone say Black Mirror S02E01?)

Unfortunately, I cannot offer you interesting analyses of my search history because I clear it every now and then, but smart readers can just imagine what treasures they might find in their own search history.

Epilogue

In my humble opinion, the user world is roughly divided into three types of people: the indifferent, the anxious, and the permissive.

The first group - if you have come this far down this article, well done.

The second group - I hope that now that you’ve read this review, you at least know exactly what kind of information is stored somewhere deep in Facebook’s databases. Note that you may ask any organization that complies with the GDPR to delete your personal information (subject to a number of exceptions).

The third group - you are simply invited to get excited about the wealth of information and the opportunities it holds, but also to try and think about how it can be used to benefit users even more. If we are already living in a reality where there are ads in every square centimeter, and we barely start conversations with strangers face to face, wouldn’t we prefer to get the ad most relevant to our needs? Wouldn’t we prefer the most promising match?

It is important to emphasize that the information stored on each of us differs greatly in purpose, volume and meaning. Many factors influence the integrity of the data, the most dominant of which are our different patterns of behavior on the Internet, how long we have been active on the social network, and which permissions we have granted our apps. In addition, Facebook cannot be viewed as a single coherent source of information over the years. It is important to consider the rise of other applications such as Instagram, WhatsApp and Twitter over the years and the way they have affected our behavior.

I do not pretend to have any great knowledge of social networks / statistical models / behavioral sciences. All the research presented here was based solely on my intuition and personal point of view and does not rely on previous research that many people better than me may have already done, but I do hope that I have achieved my goal to make this information accessible and illustrate its possible uses.

Written by Yoav Tepper, AKA:

· “Starting his adult life”

· “Interacts with ads about music, technology, art and vacation topics”

· Owning the unique fingerprint: