

"Word embeddings" (also the foundation of the word2vec, glove and similar algorithms) are a way to capture the meaning of a word by looking at its neighbourhood words.

Consider the following sentences:



I fed my pet.

I fed my dog.

I fed my cat.

There are two ways to extract relationship information from these sentences:

"pet", "dog" and "cat" are somehow similar as they are replacable in an identical context. ( replacability )



"fed" relates to "pet", "dog", "cat". (neighbourhood)

When running this kind of analysis on huge amounts of text (millions of documents), one obtains a (weighted) set of related words for each word analysed. The results of word embedding analysis can be used to construct a vector space that allows for the famous and often-cited word-level algebra such as 'king - man + woman ≈ queen'.
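To see this algebra in practice, here is a short sketch using gensim's pretrained GloVe vectors (this assumes gensim is installed and downloads the "glove-wiki-gigaword-50" vectors on first run; the choice of model is just an example):

```python
import gensim.downloader as api

# Load pretrained 50-dimensional GloVe vectors (downloaded on first use).
vectors = api.load("glove-wiki-gigaword-50")

# Word-level algebra: king - man + woman ≈ queen.
print(vectors.most_similar(positive=["king", "woman"], negative=["man"], topn=3))

# Related words fall out of the same vector space.
print(vectors.most_similar("dog", topn=5))
```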



The issue is that if a word has multiple, context-dependent meanings (for example, "bank" as a financial institution versus the bank of a river), its single word vector blends that ambiguity, resulting in inaccurate similarity and related-word metrics.