TL;DR

What you’ll find:

Word similarity analysis of Trump’s tweets using neural networks

Great jokes

An interactive tool to explore the data yourself

A way to run the same analysis on your own Facebook chat history

A comprehensible intro to Google’s Word2Vec

Trump Tweets? Really?

I know, I know — “another analysis of Trump Tweets?”, but maybe it’s not that bad? “Hey, maybe I’m the first to do this type of analysis?”

“Well, yeah, but it doesn’t mean that it’s data science — anyone can just take a simple look and make conclusions.”

“… okay, well what about the EXACT thing that I’m doing?”

You know, now that I’ve spent at least 5 semi-full days doing this (most of which was trying to get the graph to look good enough for my perfectionist tastes), I feel kind of stupid. I even found a great article explaining almost the same kind of analysis that I did. No, seriously, check it out here — it’s great.

… Anyways. Now that we’ve got “this has been done literally 57 million times before and your analysis is rubbish” out of the way, we can get into it.

Word2Vec and Clustering

I really love the idea of clustering — similar things tend to go together, so it’s possible to group, or cluster, them.

For example, if a woman drives an SUV, has children and has that iconic haircut, she’s probably going to belong to the “can I speak to the manager” category.

Can I speak to the manager?

Clustering is also a perfect match for numerical categories: age, salary, job title, etc. Using that, companies can identify high-income, high-spending groups and bombard them with non-stop ads after every single YouTube video (looking at you, Google).

But what about text clustering?

It might seem obvious to you that ‘see you later’ could be followed by ‘alligator’, or ‘for a while’ by ‘crocodile’, but to a computer, this kind of logic is just nonsensical: “IT’S A KILLER ANIMAL AND A GOODBYE WHAT DO YOU MEAN THEY’RE SIMILAR BEEP BOOP” (read in robot voice). This is where Google’s Word2Vec algorithm comes in.

Skipping over almost every single detail, what this algorithm does is look for words that appear in similar contexts and derive their similarity from that.
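To make “similar contexts” concrete, here’s a tiny pure-Python sketch of the intuition behind it. This is not Word2Vec itself (which learns dense vectors with a neural network); it just counts which words appear near each other and compares those raw co-occurrence counts with cosine similarity. The toy corpus and function names are mine, purely for illustration:

```python
from collections import Counter, defaultdict
import math

def cooccurrence_vectors(sentences, window=2):
    """For each word, count which words appear within `window`
    positions of it. This is the raw signal Word2Vec compresses
    into dense vectors; here we keep the sparse counts directly."""
    vectors = defaultdict(Counter)
    for sentence in sentences:
        for i, word in enumerate(sentence):
            for j in range(max(0, i - window), min(len(sentence), i + window + 1)):
                if i != j:
                    vectors[word][sentence[j]] += 1
    return vectors

def cosine(a, b):
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[k] * b[k] for k in a if k in b)
    norm = (math.sqrt(sum(v * v for v in a.values()))
            * math.sqrt(sum(v * v for v in b.values())))
    return dot / norm if norm else 0.0

# Toy corpus: 'cat' and 'dog' share contexts, 'stock' does not.
corpus = [
    "the cat sat on the mat".split(),
    "the dog sat on the mat".split(),
    "the stock market crashed today".split(),
]
vecs = cooccurrence_vectors(corpus)
print(cosine(vecs["cat"], vecs["dog"]))    # high: identical contexts
print(cosine(vecs["cat"], vecs["stock"]))  # lower: different contexts
```

Because ‘cat’ and ‘dog’ occur in the exact same surroundings here, their similarity comes out higher than ‘cat’ vs. ‘stock’ — the same principle, at scale and with learned dense vectors, is what powers the similarity cloud below.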

If you let the computer learn from the words used in a 4th-grade English class, it might reasonably conclude that ‘see you later’ and ‘alligator’ are almost identical phrases; but train it on a news station’s output, and it would instead pair ‘alligator’ with ‘brutally murders a little kid’.

This is where Word2Vec shines — you can train it on any data you want! You can create word similarity clouds of your Facebook chat history, Trump’s tweets or even product reviews, and based on context, it might put the words ‘fake’ and ‘news’ really, really close together.

Trump Meets Word2Vec

The word similarity cloud is fairly simple — the closer two words are together, the more similar they are. Note that not all words are included. Yes, covfefe is missing too.