Sentiment Analysis in Node.js

In this tutorial, we'll explore what sentiment analysis is and why it's useful, then build a simple Node.js program that analyzes the sentiment of Reddit comments.

What it is

Sentiment analysis is the process of extracting key phrases and words from text to understand the author's attitude and emotions. So, why is it useful? Companies can use it to make more informed marketing decisions. For example, they can analyze product reviews, feedback, and social media to track their reputation. Additionally, social networks can use sentiment analysis to weed out low-quality content.

How it works

There are two main approaches to sentiment detection: knowledge-based and statistical.

Knowledge-based approaches usually compare the words in a text against a predefined list of negative and positive words. Finn Årup Nielsen of the Technical University of Denmark published AFINN, a list of positive and negative words, each with a score between -5 and 5. For example, "gloom" has a score of -1, while "awful" has a score of -3. The scores of all known words are added up to determine the overall sentiment of the text.
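To make the idea concrete, here's a minimal sketch of a knowledge-based scorer. The word list is a tiny hand-picked stand-in, not the real AFINN list: only "gloom" and "awful" use the scores quoted above, and the other entries and the scoreText name are assumptions made for illustration.

```javascript
// Tiny hand-picked lexicon in the style of AFINN (NOT the real list).
// "gloom" and "awful" use the scores quoted in the text; the rest are illustrative.
const lexicon = {
  gloom: -1,
  awful: -3,
  bad: -3,
  good: 3,
  nice: 3,
  happy: 3,
};

// Split text into lowercase words, dropping punctuation.
function tokenize(text) {
  return text.toLowerCase().match(/[a-z']+/g) || [];
}

// Add up the scores of every known word; unknown words contribute 0.
function scoreText(text) {
  return tokenize(text).reduce(function (total, word) {
    const known = Object.prototype.hasOwnProperty.call(lexicon, word);
    return total + (known ? lexicon[word] : 0);
  }, 0);
}

console.log(scoreText("The weather was awful, but dinner was nice.")); // 0
console.log(scoreText("Gloom everywhere.")); // -1
```

In the first sentence, "awful" (-3) and "nice" (3) cancel out, which shows how a simple sum can hide mixed feelings.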

Statistical approaches use machine learning: a model learns from text whose sentiment is already known, then estimates the sentiment of new text. For example, Amazon could train a model on the text and the 1-to-5 star rating of each product review. It could then predict the star rating of a new review that doesn't have one yet.
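As a rough sketch of the statistical idea, the snippet below trains a very small Naive Bayes classifier (with add-one smoothing) on a handful of made-up reviews and predicts a star rating for new text. The training data, the function names, and the choice of Naive Bayes are all assumptions for illustration; a real system would use far more data and likely a different model.

```javascript
// Hypothetical training data: review text paired with its star rating.
const reviews = [
  { text: "terrible quality broke after one day", stars: 1 },
  { text: "awful product do not buy", stars: 1 },
  { text: "okay for the price nothing special", stars: 3 },
  { text: "great product works perfectly", stars: 5 },
  { text: "love it great quality", stars: 5 },
];

function tokenize(text) {
  return text.toLowerCase().match(/[a-z']+/g) || [];
}

// Count how often each word appears in reviews of each star rating.
function train(examples) {
  const model = { classes: new Map(), vocabulary: new Set(), total: examples.length };
  examples.forEach(function (example) {
    if (!model.classes.has(example.stars)) {
      model.classes.set(example.stars, { docs: 0, words: 0, counts: new Map() });
    }
    const cls = model.classes.get(example.stars);
    cls.docs += 1;
    tokenize(example.text).forEach(function (word) {
      cls.counts.set(word, (cls.counts.get(word) || 0) + 1);
      cls.words += 1;
      model.vocabulary.add(word);
    });
  });
  return model;
}

// Pick the star rating with the highest log-probability for the text.
function predictStars(model, text) {
  const words = tokenize(text);
  let bestStars = null;
  let bestLogProb = -Infinity;
  model.classes.forEach(function (cls, stars) {
    // log P(rating) + sum of log P(word | rating), with add-one smoothing
    let logProb = Math.log(cls.docs / model.total);
    words.forEach(function (word) {
      const count = cls.counts.get(word) || 0;
      logProb += Math.log((count + 1) / (cls.words + model.vocabulary.size));
    });
    if (logProb > bestLogProb) {
      bestLogProb = logProb;
      bestStars = stars;
    }
  });
  return bestStars;
}

const model = train(reviews);
console.log(predictStars(model, "great quality, love it")); // 5
console.log(predictStars(model, "terrible, broke after a day")); // 1
```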

With any approach, a score is typically given to each body of text that is analyzed. A negative score implies the text has a mostly negative attitude, and a positive score implies the text has a mostly positive attitude.

Potential problems

There can be some challenges in analyzing text. Because of this, sentiment analysis will never be completely accurate. Here's a brief list of potential scenarios that can be tricky to analyze:

Double negatives: "I do not dislike running"

Inverted double negatives: "Not going to practice isn't really my thing"

Adverb modifying adjective: "I really hate when people cut me off"

Possible sarcasm: "I love running with a knee injury"

Slang terms: "He ran a sick race!"
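To see why these cases are tricky, here's a small sketch that scores three of the sentences above by simply adding up word scores. The scores and the naiveScore name are illustrative assumptions, not taken from any particular library.

```javascript
// Illustrative scores only; a real lexicon such as AFINN is far larger.
const lexicon = new Map([
  ["dislike", -2],
  ["hate", -3],
  ["love", 3],
  ["injury", -2],
  ["sick", -2],
]);

// Sum the scores of known words, ignoring context entirely.
function naiveScore(text) {
  const words = text.toLowerCase().match(/[a-z']+/g) || [];
  return words.reduce(function (total, word) {
    return total + (lexicon.get(word) || 0);
  }, 0);
}

const tricky = [
  "I do not dislike running", // reads as mildly positive
  "I love running with a knee injury", // likely sarcastic
  "He ran a sick race!", // slang: "sick" is praise here
];

tricky.forEach(function (sentence) {
  console.log(naiveScore(sentence), sentence);
});
// -2 I do not dislike running
// 1 I love running with a knee injury
// -2 He ran a sick race!
```

Each score points the wrong way: the double negative and the slang come out negative, and the sarcastic sentence comes out positive.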

Our project

We'll build a Node.js app that calculates the sentiment of comments from a Reddit post asking how people's days are going, and then displays the results in a webpage.

Make sure you have Node.js installed. Then:

Create an empty folder

cd into that directory (with cd ~/Desktop/folder, for example)

Run npm init to go through the creation wizard

Install the dependencies we need from npm by running npm install express ml-sentiment

Download the comments.json file and put it into the folder you created

File structure

Now that our dependencies are installed, let’s create and open a server.js file in the folder you created.

    var express = require("express");
    var app = express();

    var ml = require("ml-sentiment")();
    var redditComments = require("./comments.json");

    const listener = app.listen(3000, function () {
      console.log("Your app is listening on port " + listener.address().port);
    });

What does this file do right now? The first block sets up Express, a web server library. The second block tells the program to import our sentiment analysis library, and the JSON data file of the Reddit comments. The last block starts our server and tells us which port it is listening on. There is nothing for the server to show though, because we haven't defined any "routes" for Express to use yet.

The Node library we're using for sentiment analysis, ml-sentiment, has documentation that tells us how we can use it:

    var ml = require("ml-sentiment")();
    ml.classify("Rainy day but still in a good mood");

This library uses AFINN-111, which has the ratings of 2477 words and phrases. The library simply looks at the words in the parameter of the .classify function, and compares each to AFINN-111. If a word like "not" or "don't" precedes the word, it uses the absolute value of the score. For example, "anxious" has a score of -2, while "not anxious" has a score of 2.
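Here's a minimal sketch of the negation rule as described above. It is not the library's actual source: the word scores (apart from "anxious"), the list of negators, and the classifyWithNegation name are assumptions for illustration.

```javascript
// Illustrative scores; "anxious" uses the value quoted in the text.
const scores = new Map([
  ["anxious", -2],
  ["bad", -3],
  ["good", 3],
]);

// Words treated as negators in this sketch (an assumption, not the library's list).
const negators = new Set(["not", "don't", "isn't", "never"]);

function classifyWithNegation(text) {
  const words = text.toLowerCase().match(/[a-z']+/g) || [];
  let total = 0;
  words.forEach(function (word, index) {
    const score = scores.get(word) || 0;
    const negated = index > 0 && negators.has(words[index - 1]);
    // As described: a negated word contributes the absolute value of its score.
    total += negated ? Math.abs(score) : score;
  });
  return total;
}

console.log(classifyWithNegation("anxious")); // -2
console.log(classifyWithNegation("not anxious")); // 2
console.log(classifyWithNegation("not good")); // 3
```

Note that under this rule "not good" still scores 3, because the absolute value of a positive score is unchanged. The description doesn't settle whether the real library behaves that way, so it's worth testing a few negated phrases yourself.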

This is by no means a comprehensive library, but it's quick to implement, runs fast and works reliably on simple examples.

Let's create a function that loops through all of the Reddit comments, uses the ml.classify function to get a sentiment score, and saves that value into the redditComments array.

    redditComments.forEach(function (comment) {
      comment.sentiment = ml.classify(comment.body);
      if (comment.sentiment >= 5) {
        comment.emoji = "😃";
      } else if (comment.sentiment > 0) {
        comment.emoji = "🙂";
      } else if (comment.sentiment === 0) {
        comment.emoji = "😐";
      } else {
        comment.emoji = "😕";
      }
    });

Now, our redditComments variable is an array of objects with the link, body, author, emoji, and sentiment keys. For example, here's how one object in the array looks:

    {
      "link": "https://reddit.com/r/AskReddit/comments/6szu5h/reddit_how_was_your_day/dlgtei6/",
      "body": "It was so nice day. it was my memorable day. ",
      "author": "Gemma_Youl",
      "sentiment": 3,
      "emoji": "🙂"
    }
    ...
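If you'd like to check the emoji thresholds without running the server, one option is to pull them into a small pure function. This is just a sketch: the emojiFor name and the sample scores are made up for illustration, while the thresholds mirror the loop above.

```javascript
// Same thresholds as the loop in server.js, extracted into a pure function.
function emojiFor(sentiment) {
  if (sentiment >= 5) {
    return "😃";
  }
  if (sentiment > 0) {
    return "🙂";
  }
  if (sentiment === 0) {
    return "😐";
  }
  return "😕";
}

// Hypothetical scores covering each branch.
[7, 3, 0, -4].forEach(function (score) {
  console.log(score, emojiFor(score));
});
// 7 😃
// 3 🙂
// 0 😐
// -4 😕
```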

Next, we'll define two routes in Express: one that serves the webpage and one that sends our redditComments data. Routes must be defined after app is created, and are conventionally placed before app.listen is called.

    app.get("/", function (req, res) {
      res.sendFile(__dirname + "/index.html");
    });

    app.get("/data", function (req, res) {
      res.json(redditComments);
    });

The first route says that when the / path receives a GET request, Express should send the index.html file. The second route says that when the /data path receives a GET request, Express should send the redditComments variable as a JSON response.

Here's how the server.js file looks now:

    var express = require("express");
    var app = express();

    var ml = require("ml-sentiment")();
    var redditComments = require("./comments.json");

    // Score every comment and attach a matching emoji.
    redditComments.forEach(function (comment) {
      comment.sentiment = ml.classify(comment.body);
      if (comment.sentiment >= 5) {
        comment.emoji = "😃";
      } else if (comment.sentiment > 0) {
        comment.emoji = "🙂";
      } else if (comment.sentiment === 0) {
        comment.emoji = "😐";
      } else {
        comment.emoji = "😕";
      }
    });

    // Serve the page and the scored comments.
    app.get("/", function (req, res) {
      res.sendFile(__dirname + "/index.html");
    });

    app.get("/data", function (req, res) {
      res.json(redditComments);
    });

    // Use the PORT environment variable if it is set; otherwise fall back to 3000.
    const listener = app.listen(process.env.PORT || 3000, function () {
      console.log("Your app is listening on port " + listener.address().port);
    });

It doesn't work just yet, because we haven't created the index.html file. Make a new file called index.html and add the following code to it:

    <head>
      <link href="https://cdnjs.cloudflare.com/ajax/libs/bulma/0.7.4/css/bulma.min.css" rel="stylesheet" />
      <style>
        #main {
          margin: 2rem;
        }
        .big {
          font-size: 1.2rem;
        }
      </style>
    </head>
    <body>
      <section class="hero is-success">
        <div class="hero-body">
          <div class="container">
            <h1 class="title">How was your day?</h1>
            <h2 class="subtitle">Sentiment analysis demo</h2>
          </div>
        </div>
      </section>
      <div id="main">
        <table class="table is-fullwidth">
          <thead>
            <tr>
              <th>Feeling</th>
              <th>Score</th>
              <th>Author</th>
              <th>Comment</th>
            </tr>
          </thead>
          <tbody id="sentimentTable"></tbody>
        </table>
      </div>
      <script>
        var request = new XMLHttpRequest();
        request.open("GET", "/data", true);
        request.onload = function () {
          if (request.status >= 200 && request.status < 400) {
            var table = document.getElementById("sentimentTable");
            var data = JSON.parse(request.responseText);
            data.forEach(function (comment) {
              var newRow = table.insertRow(table.rows.length);
              newRow.insertCell(0).innerHTML = comment.emoji;
              newRow.insertCell(1).innerHTML = comment.sentiment;
              var rowLink = document.createElement("a");
              rowLink.innerHTML = comment.author;
              rowLink.href = comment.link;
              newRow.insertCell(2).appendChild(rowLink);
              newRow.insertCell(3).innerHTML = comment.body;
            });
          } else {
            alert("Could not retrieve data");
          }
        };
        request.onerror = function () {
          alert("Could not retrieve data");
        };
        request.send();
      </script>
    </body>

How does this work? The page defines a script that sends a web request to /data and creates a new table row for each comment we analyzed.

Everything is good to go! To run your program, go back to the terminal, make sure you are still in your project's directory, and run node server.js. Then open http://localhost:3000 in your browser. You should see our new webpage with the sentiment of each Reddit comment!

Notice how some comments contain negations, like "not bad", and still receive a positive sentiment value. This is because the sentiment library we used has basic support for negation.

What's next

Try running your own text through the sentiment analyzer. For example, download your Twitter archive and analyze the sentiment of your tweets. Let us know about your projects in the comments below!