Representing Words

Computers think in 1s and 0s. With a bit of programming, we can use those 1s and 0s to represent virtually any number. Letters aren't that hard either - you just map each letter to a number, and that's that. But representing words in a way that computers can understand has always posed a challenge. Representing words as a list of individual letters works well to an extent - it's great for humans. However, just reading a list of letters and trying to infer meaning from it is a difficult task for a computer.

Say you are trying to determine if two words mean the same thing. These words are represented by lists of letters, so you don't know anything about what they actually mean. The words are:

eat

and

consume

If you just compare the letters in these words, they seem nothing alike. They only have one letter in common, and that letter isn't even in the same place in both words. As you can see, letters are a less than convenient way to represent the meaning of words (at least in the eyes of a computer). So why don't we replace the letters in these words with something a computer understands much better - numbers!

Normally, when a word is given to a computer, each letter is translated on a one-to-one basis to a number. So a computer sees the word

king

as:

107 105 110 103
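This letter-to-number mapping is easy to see in practice. In Python, `ord` gives each character's code and `chr` reverses it:

```python
# Each character maps one-to-one to a number (its ASCII/Unicode code point).
word = "king"
codes = [ord(c) for c in word]
print(codes)  # [107, 105, 110, 103]

# The mapping is perfectly reversible, but the numbers carry no meaning:
print("".join(chr(n) for n in codes))  # king
```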

The number that each character corresponds to doesn't have any meaning: it's just a number that was chosen by a human back in the days when computers were first emerging. What if, to better represent the word, we replaced the characters with a series of numbers representing characteristics of the word?

Let's say we'll represent some words using two different numbers, each representing different things:

number 1: femininity (1 for something that directly pertains to women, or for someone who is a woman, -1 for something that directly pertains to men, or for someone who is a man)

number 2: power (1 for being the ruler of the world, 0 for having no power whatsoever)

So to represent the word king, you'd use the following numbers:

femininity: -1, power: 0.7 (a king is the ruler of a country, not the world)

Now, you can represent many more words in a way that shows how they all relate to one another (ex. the word queen has the same power number as the king, but its femininity number is 1):

queen - femininity: 1, power: 0.7

duke - femininity: -1, power: 0.4

duchess - femininity: 1, power: 0.4

peasant - femininity: 0 (as a peasant could have any gender), power: 0.05
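The toy vocabulary above can be written directly as two-number vectors. A quick sketch (using the illustrative values from the list, not values from a real model) shows that related words now sit close together:

```python
# Toy 2-feature word vectors: (femininity, power), values from the text above.
vectors = {
    "king":    (-1, 0.7),
    "queen":   ( 1, 0.7),
    "duke":    (-1, 0.4),
    "duchess": ( 1, 0.4),
    "peasant": ( 0, 0.05),
}

def distance(a, b):
    """Euclidean distance between two word vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b)) ** 0.5

# A king is much closer to a duke (same gender, similar power)
# than to a duchess in this tiny 2-D space:
print(distance(vectors["king"], vectors["duke"]))     # ~0.3
print(distance(vectors["king"], vectors["duchess"]))  # ~2.02
```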

Each word translates to a list of two numbers - or a word vector that consists of 2 features. While 2 features might be enough to represent a few dozen words - what about the entire English language? 2 features certainly isn't enough. What about 10? 50? 100? 300? It could be any number, but one of the most popular sets of word vectors uses 300 features.

Choosing Features

When we were using only two features to describe words, it was pretty easy to choose meanings for our features. But what about coming up with 300 features that can describe all the words in the English language? It's a nigh-impossible task - for any human, that is. Google developed a technique called word2vec that determines both what the 300 features represent and the value of each feature for each word.

How can 300 features be assigned to represent meaningful facets of a word? One word: context. A lot of meaning can be derived about a word simply by looking at the surrounding words. Words that mean the same thing will turn up in similar contexts. For example:

I consumed the food.

I ate the food.

If you train an AI to guess a word based on its context, and to guess the context based on a word, you can derive the intrinsic meanings of words (to a certain extent). The AI will learn to create meaningful features that are consistent across words all on its own!
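Real word2vec trains a neural network to do this guessing, but the underlying intuition can be sketched crudely in a few lines: collect the neighbouring words of each target word and compare. In the two example sentences, "consumed" and "ate" turn up in identical contexts:

```python
# A crude sketch of the context idea behind word2vec (not the real algorithm):
# words that share neighbouring words are likely to mean similar things.
sentences = [
    "i consumed the food".split(),
    "i ate the food".split(),
]

def context_words(target, window=1):
    """Collect the words appearing within `window` positions of `target`."""
    contexts = set()
    for sent in sentences:
        for i, w in enumerate(sent):
            if w == target:
                contexts.update(sent[max(0, i - window):i])
                contexts.update(sent[i + 1:i + 1 + window])
    return contexts

# "consumed" and "ate" have exactly the same contexts in this tiny corpus:
print(context_words("consumed") == context_words("ate"))  # True
```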

After analyzing a massive piece of text (called a corpus), the AI will generate three hundred features for each individual word. These word vectors can be used for a number of cool applications, as we'll see in the next two sections.

Uses

The main use of word vectors is, of course, NLP (natural language processing). A word vector can be used as the input to virtually any machine learning algorithm, and it's common for neural networks to use vectors produced by word2vec for sentiment analysis, text comprehension, and text generation. Massive sets of word vectors often contain important relationships between words: if you subtract France from Paris and add Italy, you get Rome.
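This kind of vector arithmetic can even be checked with the toy two-feature vectors from earlier. Using king − man + woman (the "man" and "woman" values here are assumed for illustration, following the same femininity/power scheme), the result lands exactly on queen:

```python
# Analogy arithmetic with the toy 2-feature (femininity, power) vectors.
king  = (-1, 0.7)
man   = (-1, 0.0)   # assumed: masculine, no particular power
woman = ( 1, 0.0)   # assumed: feminine, no particular power
queen = ( 1, 0.7)

# king - man + woman, computed feature by feature:
result = tuple(k - m + w for k, m, w in zip(king, man, woman))
print(result)           # (1, 0.7)
print(result == queen)  # True
```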

Trivial (but fun) Uses

I loaded 1000 different word vectors into this site, and there are many different applications for them. One of the simplest is taking any given word vector and finding the vectors closest to it in 300-dimensional space. This often finds synonyms and related words. Try it out below:

Search for Nearest Words

In addition, if you compress the 300-dimensional data onto a two-dimensional plane (using a method called t-SNE; read more about it here), you get a stunning visualization of the relations between words:
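That compression step can be sketched with scikit-learn's TSNE class, assuming scikit-learn and NumPy are available (the toy vectors below stand in for real 300-dimensional ones):

```python
import numpy as np
from sklearn.manifold import TSNE

# Toy stand-ins for 300-dimensional word vectors.
words = ["king", "queen", "duke", "duchess", "peasant"]
vecs = np.array([[-1, 0.7], [1, 0.7], [-1, 0.4], [1, 0.4], [0, 0.05]])

# perplexity must be smaller than the number of points for this tiny example.
coords = TSNE(n_components=2, perplexity=2, random_state=0).fit_transform(vecs)

# Each word now has an (x, y) position suitable for plotting.
for word, (x, y) in zip(words, coords):
    print(f"{word}: ({x:.2f}, {y:.2f})")
```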

And that's word vectors in a nutshell - a beautiful way to meaningfully represent words as lists of numbers.