by Tobias Trelle

In this article I’d like to give you a short introduction to a subset of Google’s machine learning capabilities: the Natural Language API. This API processes text snippets and can apply several analysis algorithms:

analyze-entities: detects entities (proper nouns such as public places, art, etc.), and returns information about those entities.

analyze-sentiment: identifies the prevailing emotional opinion within the text, especially to determine a writer’s attitude as positive, negative, or neutral.

analyze-entity-sentiment: combines both entity analysis and sentiment analysis and attempts to determine the sentiment (positive or negative) expressed about the entities.

analyze-syntax: extracts linguistic information, breaking up the given text into a series of sentences and tokens and providing further analysis on those tokens.

classify-text: analyzes a document and returns a list of content categories that apply to the text found in the document.
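For orientation, each of these gcloud subcommands corresponds to a method of the v1 REST API (documents:analyzeEntities is the one used in the curl example in this article). The following Python sketch just writes that mapping down; the dictionary and the helper name rest_endpoint are hypothetical and not part of any Google client library.

```python
# Hypothetical mapping: gcloud "ml language" subcommand -> v1 REST method name.
BASE_URL = "https://language.googleapis.com/v1/documents"

GCLOUD_TO_REST = {
    "analyze-entities": "analyzeEntities",
    "analyze-sentiment": "analyzeSentiment",
    "analyze-entity-sentiment": "analyzeEntitySentiment",
    "analyze-syntax": "analyzeSyntax",
    "classify-text": "classifyText",
}


def rest_endpoint(subcommand: str) -> str:
    """Return the REST endpoint URL (without API key) for a gcloud subcommand."""
    try:
        method = GCLOUD_TO_REST[subcommand]
    except KeyError:
        raise ValueError(f"unknown subcommand: {subcommand}") from None
    return f"{BASE_URL}:{method}"


if __name__ == "__main__":
    # Print the whole mapping as a small cheat sheet.
    for cmd in GCLOUD_TO_REST:
        print(f"{cmd:26} -> {rest_endpoint(cmd)}")
```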

Natural Language API

The easiest way to get started is the gcloud CLI from the Cloud SDK (I described how to set up the SDK in this article):

gcloud ml \
  language analyze-entities \
  --content "Sentence to be analyzed"

You can also call the REST API over HTTP, but you need to generate an API key for that:

curl -X POST "https://language.googleapis.com/v1/documents:analyzeEntities?key=[YOUR_API_KEY]" \
  -H "Content-Type: application/json" \
  -d @- << 'EOF'
{
  "encodingType": "UTF8",
  "document": {
    "content": "Sentence to be analyzed",
    "type": "PLAIN_TEXT"
  }
}
EOF
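If you would rather call the HTTP API from a program than from curl, the same request can be built with Python's standard library alone. This is a minimal sketch: the function names are hypothetical, the API key is simply passed in as a parameter, and only analyze_entities actually talks to the network.

```python
import json
import urllib.parse
import urllib.request

# Endpoint taken from the curl example above.
API_URL = "https://language.googleapis.com/v1/documents:analyzeEntities"


def build_entities_request(text: str, api_key: str) -> urllib.request.Request:
    """Build (but do not send) the same POST request as the curl example."""
    body = {
        "encodingType": "UTF8",
        "document": {"content": text, "type": "PLAIN_TEXT"},
    }
    return urllib.request.Request(
        url=f"{API_URL}?key={urllib.parse.quote(api_key, safe='')}",
        data=json.dumps(body).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def analyze_entities(text: str, api_key: str) -> dict:
    """Send the request and return the parsed JSON response (needs network access)."""
    with urllib.request.urlopen(build_entities_request(text, api_key)) as response:
        return json.load(response)
```

Calling analyze_entities("Sentence to be analyzed", api_key) with a valid key should return the same JSON structure that curl prints; keeping the request construction separate makes it easy to inspect without sending anything.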

For the following examples I’ll use the gcloud tool for the sake of brevity. I will walk you through two of the analysis methods.

Entity Analysis

We will start with this sentence:

gcloud ml \
  language analyze-entities \
  --content "The Louvre is the home of the beautiful Mona Lisa"

The result looks like this:

{
  "entities": [
    {
      "mentions": [
        {
          "text": {
            "beginOffset": 4,
            "content": "Louvre"
          },
          "type": "PROPER"
        },
        {
          "text": {
            "beginOffset": 18,
            "content": "home"
          },
          "type": "COMMON"
        }
      ],
      "metadata": {
        "mid": "/m/04gdr",
        "wikipedia_url": "https://en.wikipedia.org/wiki/Palais_du_Louvre"
      },
      "name": "Louvre",
      "salience": 0.9340278,
      "type": "LOCATION"
    },
    {
      "mentions": [
        {
          "text": {
            "beginOffset": 40,
            "content": "Mona Lisa"
          },
          "type": "PROPER"
        }
      ],
      "metadata": {
        "mid": "/m/0jbg2",
        "wikipedia_url": "https://en.wikipedia.org/wiki/Mona_Lisa"
      },
      "name": "Mona Lisa",
      "salience": 0.06597222,
      "type": "PERSON"
    }
  ],
  "language": "en"
}

The language is identified as English. We also get a list of recognized entities, in our case these two: Louvre and Mona Lisa.

This example clearly shows that entity detection is more than just scanning for nouns. The algorithm understood that the words “Louvre” and “home” refer to the same thing. Pretty smart, isn’t it?
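To confirm that the two Louvre mentions really point at “Louvre” and “home” in the sentence, the beginOffset values from the result above can be checked against the input. The Python sketch below assumes the offsets count UTF-8 bytes, which is what the "encodingType": "UTF8" field in the HTTP request asks for; the response is trimmed to the mention fields, and the helper name mention_spans is hypothetical.

```python
# Input sentence and a trimmed copy of the analyze-entities result (mentions only).
TEXT = "The Louvre is the home of the beautiful Mona Lisa"

RESPONSE = {
    "entities": [
        {"name": "Louvre", "mentions": [
            {"text": {"beginOffset": 4, "content": "Louvre"}, "type": "PROPER"},
            {"text": {"beginOffset": 18, "content": "home"}, "type": "COMMON"},
        ]},
        {"name": "Mona Lisa", "mentions": [
            {"text": {"beginOffset": 40, "content": "Mona Lisa"}, "type": "PROPER"},
        ]},
    ]
}


def mention_spans(text, response):
    """Yield (entity name, mention type, text found at the mention's offset).

    Assumes beginOffset is a UTF-8 byte offset (encodingType UTF8)."""
    raw = text.encode("utf-8")
    for entity in response["entities"]:
        for mention in entity["mentions"]:
            content = mention["text"]["content"]
            start = mention["text"]["beginOffset"]
            end = start + len(content.encode("utf-8"))
            yield entity["name"], mention["type"], raw[start:end].decode("utf-8")


for name, kind, found in mention_spans(TEXT, RESPONSE):
    print(f"{name}: {kind} mention {found!r}")
```

Slicing the encoded bytes instead of the Python string matters only for non-ASCII text; for this English sentence byte and character offsets coincide.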

For each entity, all of its mentions in the text are listed: two mentions for Louvre and one for Mona Lisa. The salience of an entity (a value in the range [0, 1]) denotes its importance within the text, here a single sentence. So this sentence is mainly about the Louvre, since its salience is close to 1. Entities are also classified by type: Louvre is a LOCATION, and Mona Lisa is a PERSON.

If available, the analysis also provides metadata about entities. At the time of writing, these are the entity ID (mid) used by the Google Knowledge Graph Search API and a link to the Wikipedia article.
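The fields discussed so far (type, salience, number of mentions, metadata) can be condensed into one summary line per entity. The Python sketch below works on the entities from the result above, trimmed to those fields; the helper names summarize and main_topic are hypothetical. Because metadata is only present “if available”, the sketch uses dict.get and falls back to None.

```python
# Entities from the analyze-entities result above, trimmed to the fields used here.
ENTITIES = [
    {"name": "Louvre", "type": "LOCATION", "salience": 0.9340278,
     "mentions": [{"type": "PROPER"}, {"type": "COMMON"}],
     "metadata": {"mid": "/m/04gdr",
                  "wikipedia_url": "https://en.wikipedia.org/wiki/Palais_du_Louvre"}},
    {"name": "Mona Lisa", "type": "PERSON", "salience": 0.06597222,
     "mentions": [{"type": "PROPER"}],
     "metadata": {"mid": "/m/0jbg2",
                  "wikipedia_url": "https://en.wikipedia.org/wiki/Mona_Lisa"}},
]


def summarize(entities):
    """Return (name, type, salience, #mentions, mid, wikipedia_url) tuples,
    most salient entity first. Missing metadata becomes None."""
    rows = []
    for entity in sorted(entities, key=lambda e: e["salience"], reverse=True):
        meta = entity.get("metadata", {})
        rows.append((entity["name"], entity["type"], entity["salience"],
                     len(entity["mentions"]), meta.get("mid"), meta.get("wikipedia_url")))
    return rows


def main_topic(entities):
    """Name of the entity with the highest salience."""
    return max(entities, key=lambda e: e["salience"])["name"]


for name, etype, salience, n, mid, url in summarize(ENTITIES):
    print(f"{name:10} {etype:9} salience={salience:.3f} mentions={n} mid={mid} {url or '-'}")
print("Main topic:", main_topic(ENTITIES))
```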

Entity Sentiment Analysis

When running an entity sentiment analysis on the same sentence …

gcloud ml \
  language analyze-entity-sentiment \
  --content "The Louvre is the home of the beautiful Mona Lisa"

… each entity and each of its mentions get an additional sentiment object that looks like this:

"sentiment": {
  "magnitude": 0.7,
  "score": 0.3
}

The score ranges from -1.0 (negative) to 1.0 (positive). The magnitude ranges from 0.0 to infinity and denotes the overall strength of the emotion, regardless of whether it is positive or negative.
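The entity-level values can be pulled out of an analyze-entity-sentiment response with a few lines of Python. This is a sketch with a hypothetical helper name, run against a response trimmed to the entity-level fields, using the Louvre values shown above and the Mona Lisa values reported for the same sentence in the comparison table of this article. It also checks the documented ranges. Since zero-valued fields may be omitted from the JSON (for example for a neutral statement), the sketch assumes a missing score or magnitude means 0.0; verify this against your own responses.

```python
def entity_sentiments(response):
    """Map entity name -> (score, magnitude) from an entity sentiment response.

    A missing score or magnitude is treated as 0.0 (assumption, see above)."""
    result = {}
    for entity in response.get("entities", []):
        sentiment = entity.get("sentiment", {})
        score = sentiment.get("score", 0.0)
        magnitude = sentiment.get("magnitude", 0.0)
        # score is documented as [-1.0, 1.0], magnitude as [0.0, +inf)
        if not -1.0 <= score <= 1.0:
            raise ValueError(f"score out of range for {entity['name']}: {score}")
        if magnitude < 0.0:
            raise ValueError(f"negative magnitude for {entity['name']}: {magnitude}")
        result[entity["name"]] = (score, magnitude)
    return result


# Trimmed response for "The Louvre is the home of the beautiful Mona Lisa".
RESPONSE = {
    "entities": [
        {"name": "Louvre", "sentiment": {"magnitude": 0.7, "score": 0.3}},
        {"name": "Mona Lisa", "sentiment": {"magnitude": 0.9, "score": 0.9}},
    ]
}

for name, (score, magnitude) in entity_sentiments(RESPONSE).items():
    print(f"{name}: score={score} magnitude={magnitude}")
```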

Now let’s play around a little to make this more understandable. I will list the overall sentiment values for the two entities for several variations of our sentence.

# | Input                                                    | Louvre score / magnitude | Mona Lisa score / magnitude
--|----------------------------------------------------------|--------------------------|----------------------------
1 | The Louvre is the home of the beautiful Mona Lisa        | 0.3 / 0.7                | 0.9 / 0.9
2 | The famous Louvre is the home of the beautiful Mona Lisa | 0.6 / 1.3                | 0.9 / 0.9
3 | The Louvre is the home of the Mona Lisa                  | 0.0 / 0.0                | 0.0 / 0.0
4 | The boring Louvre is the home of the ugly Mona Lisa      | -0.8 / 1.6               | -0.9 / 0.9

Rows 1 and 2 are clearly positive statements; all scores are greater than 0. Note that the Louvre already has a positive score in the first row, although no positive adjective is attached to it.

The third row shows a neutral statement and row 4 has an overall negative sentiment.
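The reading of the table above can be reproduced mechanically by looking only at the sign of each score, as the text does (greater than 0 is positive). The Python sketch below copies the values from the table; the helper name polarity and the “mixed” label for rows with scores of different signs are hypothetical additions.

```python
# (score, magnitude) per entity, copied from the table above.
ROWS = {
    1: {"Louvre": (0.3, 0.7), "Mona Lisa": (0.9, 0.9)},
    2: {"Louvre": (0.6, 1.3), "Mona Lisa": (0.9, 0.9)},
    3: {"Louvre": (0.0, 0.0), "Mona Lisa": (0.0, 0.0)},
    4: {"Louvre": (-0.8, 1.6), "Mona Lisa": (-0.9, 0.9)},
}


def polarity(scores):
    """Classify a row by the signs of its entity scores (magnitude is ignored)."""
    if not scores:
        raise ValueError("no entity scores given")
    signs = {(score > 0) - (score < 0) for score, _ in scores.values()}
    if signs == {1}:
        return "positive"
    if signs == {0}:
        return "neutral"
    if signs == {-1}:
        return "negative"
    return "mixed"


for row, scores in ROWS.items():
    print(f"row {row}: {polarity(scores)}")
```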

Summary

You have learned how to use the Natural Language API and how to interpret the results of the entity analysis and the entity sentiment analysis.

In one of my next articles I will show you how to access this API from a Google Cloud Function.

If you are interested in AI and machine learning, have a look at our codecentric.AI channel on YouTube.
