Businesses can better serve customers by determining how a phone call is going in real time, either with a machine learning platform like TensorFlow or with an API. This post will show how to perform entity sentiment analysis in real time on a phone call using Twilio Programmable Voice, Media Streams, and the Google Cloud Speech and Language APIs with Node.js.

Prerequisites

Document-Level versus Entity-Level Analysis

Entities are commonly known phrases in an utterance or text, such as people, organizations, works of art, or locations. They're used in Natural Language Processing (NLP) to extract relevant information. This post will perform both document-level sentiment analysis, where every word detected is considered, and entity-level sentiment analysis, where an utterance or text is parsed for known entities like proper nouns and the analysis determines how positive or negative each entity is.

Given the text "Never gonna give you up, never gonna let you down. Never gonna run around and desert you. - Rick Astley", "Rick Astley" could be parsed out and recognized as a "PERSON" with a score of zero and "desert" could be parsed out as a "LOCATION" with a negative score.

Why would you choose entity-level over document-level or vice versa? Sometimes small words can detract from more important information. Take the sentence "The new phone, though it's setting new records, is causing violence and chaos amongst consumers waiting in long lines" for example. Document-level analysis may not capture a real sense of it because the sentence has both negative and positive components, but entity-level analysis can detect the different polarities toward each entity in the document.

This post will perform both entity- and document-level sentiment analysis so you can compare the results and form your own opinion.
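The comparison can be sketched with the Node.js Cloud Natural Language client, whose `analyzeSentiment` and `analyzeEntitySentiment` methods correspond to the two levels. This is a minimal sketch, assuming the `@google-cloud/language` package is installed and credentials are configured via `GOOGLE_APPLICATION_CREDENTIALS`; the `formatEntitySentiment` helper and `compareSentiment` wrapper are names chosen here for illustration:

```javascript
// Format one line per entity from an entity-level sentiment result.
function formatEntitySentiment(entities) {
  return entities.map((e) => `${e.name} (${e.type}): ${e.sentiment.score}`);
}

// Run both analyses on the same text so the results can be compared.
async function compareSentiment(text) {
  // Required inside the function so the pure helper above works standalone.
  const language = require('@google-cloud/language');
  const client = new language.LanguageServiceClient();
  const document = { content: text, type: 'PLAIN_TEXT' };

  // Document-level: one score/magnitude pair for the whole text.
  const [docResult] = await client.analyzeSentiment({ document });
  console.log('Document score:', docResult.documentSentiment.score);

  // Entity-level: a separate score per detected entity.
  const [entityResult] = await client.analyzeEntitySentiment({ document });
  formatEntitySentiment(entityResult.entities).forEach((line) => console.log(line));
}
```

Calling `compareSentiment('The new phone, though it\'s setting new records, is causing violence and chaos...')` would print one overall document score followed by a per-entity breakdown, making the contrast between the two levels visible on a single input.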

Setup

On the command line run