Let’s make a sentiment classifier!

Sentiment analysis is a very frequently implemented task in NLP, and it’s no surprise. Recognizing whether people are expressing positive or negative opinions about things has obvious business applications. It’s used in social media monitoring, customer feedback, and even automatic stock trading (leading to bots that buy Berkshire Hathaway when Anne Hathaway gets a good movie review).

It’s simplistic, sometimes too simplistic, but it’s one of the easiest ways to get measurable results from NLP. In a few steps, you can put text in one end and get positive and negative scores out the other, and you never have to figure out what you should do with a parse tree or a graph of entities or any difficult representation like that.

So that’s what we’re going to do here, following the path of least resistance at every step, obtaining a classifier that should look very familiar to anyone involved in current NLP. For example, you can find this model described in the Deep Averaging Networks paper (Iyyer et al., 2015). This model is not the point of that paper, so don’t take this as an attack on their results; it was there as an example of a well-known way to use word vectors.
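To make the shape of that well-known recipe concrete, here is a minimal sketch of it in plain Python. Everything specific in it is an assumption made for illustration: the two-dimensional vectors in `TOY_EMBEDDINGS` are invented (real embeddings have hundreds of dimensions), the tiny lexicon in `TRAIN_LABELS` is hypothetical, and scoring a sentence as the mean of its word scores is one simple choice rather than the only one. It is not the code used in the rest of this walkthrough.

```python
import math

# Invented 2-d "embeddings", for illustration only.
TOY_EMBEDDINGS = {
    "good": [1.0, 0.2],
    "great": [0.9, 0.1],
    "happy": [0.8, 0.3],
    "bad": [-1.0, 0.1],
    "awful": [-0.9, 0.3],
    "sad": [-0.8, 0.2],
    "fine": [0.6, 0.2],
    "terrible": [-0.7, 0.1],
}

# Hypothetical gold-standard lexicon: 1 = positive, 0 = negative.
# "fine" and "terrible" are deliberately left out so they act as unseen words.
TRAIN_LABELS = {"good": 1, "great": 1, "happy": 1, "bad": 0, "awful": 0, "sad": 0}


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def train(embeddings, labels, epochs=200, lr=0.5):
    """Fit logistic-regression weights with plain stochastic gradient descent."""
    dim = len(next(iter(embeddings.values())))
    weights, bias = [0.0] * dim, 0.0
    for _ in range(epochs):
        for word, label in labels.items():
            vec = embeddings[word]
            pred = sigmoid(sum(w * x for w, x in zip(weights, vec)) + bias)
            error = pred - label  # gradient of the log loss w.r.t. the logit
            weights = [w - lr * error * x for w, x in zip(weights, vec)]
            bias -= lr * error
    return weights, bias


def word_score(word, embeddings, weights, bias):
    """Log-odds that a word is positive: above 0 leans positive, below 0 negative."""
    vec = embeddings[word]
    return sum(w * x for w, x in zip(weights, vec)) + bias


def sentence_score(text, embeddings, weights, bias):
    """Mean word score over the words we have vectors for; 0.0 if there are none."""
    tokens = [t for t in text.lower().split() if t in embeddings]
    if not tokens:
        return 0.0
    return sum(word_score(t, embeddings, weights, bias) for t in tokens) / len(tokens)


weights, bias = train(TOY_EMBEDDINGS, TRAIN_LABELS)
print(sentence_score("this is fine", TOY_EMBEDDINGS, weights, bias))
print(sentence_score("this is terrible", TOY_EMBEDDINGS, weights, bias))
```

The classifier only ever sees individual words with labels, yet it will happily score any word that has a vector, including words nobody labeled. That property is what makes this approach so convenient, and it is worth keeping in mind for what follows.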

Here’s the outline of what we’re going to do:

Acquire some typical word embeddings to represent the meanings of words

Acquire training and test data, with gold-standard examples of positive and negative words

Train a classifier, using gradient descent, to recognize other positive and negative words based on their word embeddings

Compute sentiment scores for sentences of text using this classifier

Behold the monstrosity that we have created

And at that point we will have shown “how to make a racist AI without really trying”. Of course that would be a terrible place to leave it, so afterward, we’re going to: