A Beginner’s Guide To Using Natural Language Processing In Web Development

How to build real software with natural language processing

In 2019, it seemed like every major tech company involved in machine learning released a new natural language processing (NLP) model:

OpenAI released GPT-2

Microsoft released DialoGPT

Facebook released RoBERTa

Google released ALBERT

Salesforce released CTRL

And those are just the most famous. While there are many papers and reports dedicated to explaining how these models work under the hood, this article has a different focus.

In this piece, I’m going to focus exclusively on how you can build software with these models, using a design pattern that will be immediately familiar to any web developer.

Realtime Inference: Models-as-Microservices

The most familiar pattern for using machine learning in production, at least to a web developer who has used a JSON API before, is realtime inference, in which your model is deployed as a microservice.

To illustrate how this works in simple terms, imagine a Netflix-esque recommendation engine (note that this is a purposefully simple example—the real Netflix recommendation engine’s architecture is very complex). When a user logs in, your application might:

Retrieve the user’s profile information from a database

Query a recommendation API—which is your machine learning model—with the user’s profile information (last five movies watched, age, etc.)

Receive a list of recommendations from the API, which your app parses and displays on the frontend

In this pattern, your machine learning model interacts with your application just like any other web API. You can imagine your machine learning API as a simple predict() function, which accepts a query (in this case, a user’s profile information) and returns a prediction.
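To make that concrete, here is a sketch of what the application side of this pattern might look like. The endpoint URL, payload fields, and response shape are made-up assumptions for illustration, not a real service:

```python
import requests

# Hypothetical recommendation microservice; the URL, field names, and
# response shape are assumptions for illustration only.
PREDICT_URL = "https://models.example.com/recommender/predict"

def get_recommendations(user_profile: dict) -> list:
    # The model microservice behaves like any other JSON API: send the
    # user's profile, get back a list of recommended titles.
    response = requests.post(PREDICT_URL, json=user_profile, timeout=5)
    response.raise_for_status()
    return response.json()["recommendations"]

recommendations = get_recommendations({
    "last_watched": ["The Irishman", "Okja", "Roma", "Mudbound", "Klaus"],
    "age": 34,
})
print(recommendations)
```

From the application's point of view, nothing here is specific to machine learning; it's just another HTTP request.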

While it is possible to run your predict() function locally within your application, the usual advantages of microservices apply doubly here. Models often need frequent retraining and updating, something that can be done much more easily when they’re deployed independently from your application. Models are also typically the purview of a dedicated data science team, and decoupling your prediction API from the rest of your application allows them to work independently from your broader engineering team.

In the following examples, we’re going to look at some popular NLP models and demonstrate how you can build real-world software by deploying them as microservices.

1. Autocomplete with OpenAI’s GPT-2

Autocomplete is a fairly ubiquitous feature among applications that accept text input. Traditionally, it’s implemented by comparing the text entered so far against a dictionary of acceptable words and returning the words the user could be typing.

However, with machine learning, we can do better. We can predict a user’s next word, phrase, or even sentence.

Case in point: Hugging Face’s Write With Transformer project, which lets you write in a text editor using ML-powered autocomplete.

What your architecture might look like:

To implement autocomplete as in the example above, all you need is:

An API that accepts a string of text and predicts the next text sequence

A frontend that accepts text input from the user, queries your API, and displays the response

What model you should use:

The best model for this task (and the one powering Write With Transformer) is OpenAI’s GPT-2, the model famously so powerful that OpenAI released it in stages to ensure it wouldn’t be abused by bad actors.
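As a rough sketch of the API half of this architecture, here is what a minimal GPT-2 autocomplete endpoint could look like using Hugging Face’s transformers library and Flask. The route, request shape, and generation settings are illustrative choices, not anything prescribed by either library:

```python
# A minimal autocomplete API built on the pretrained GPT-2 model, served with
# Flask. The /autocomplete route and JSON shape are illustrative assumptions.
from flask import Flask, jsonify, request
from transformers import pipeline

app = Flask(__name__)

# Downloads and caches the pretrained GPT-2 weights on first run.
generator = pipeline("text-generation", model="gpt2")

@app.route("/autocomplete", methods=["POST"])
def autocomplete():
    prompt = request.get_json()["text"]
    # max_length caps prompt plus completion, measured in tokens.
    result = generator(prompt, max_length=60, num_return_sequences=1)
    generated = result[0]["generated_text"]
    # The pipeline returns the prompt plus its continuation; strip the prompt.
    return jsonify({"completion": generated[len(prompt):]})

if __name__ == "__main__":
    app.run(port=8080)
```

Your frontend would then POST the user’s in-progress text to this endpoint and render whatever completion comes back.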

2. Customer support bot with AllenNLP’s ELMo-BiDAF

Similar to autocomplete, support bots traditionally haven’t been built on machine learning. Instead, they’ve operated more like the “automated attendant” at your bank (“Press 8 for more options”).

Now, however, machine learning is powering support bots that can do more than answer pre-selected questions. These new support bots can field a user’s question and, using your FAQs and documentation, respond with a personalized answer.

In an internal study, Intercom found that its ML-powered Answer Bot was able to handle 29% of customer questions without human intervention, which, particularly at large scale, means many hours and salaries saved.

What your architecture might look like:

The architecture of a simple support bot is fairly straightforward. You’ll need:

A database of help documents for your API to reference

An API which accepts a question posed by a user and produces an answer from your documents

A chat interface that accepts questions from a user, queries your API, and displays the response

What model you should use:

AllenNLP’s ELMo-BiDAF.

If you’re interested in how ELMo-BiDAF works under the hood, you can read my full article on it here. At a high level, ELMo-BiDAF is an ensemble model, meaning it combines the outputs of multiple base models. BiDAF is a machine comprehension model, meaning it is designed to answer questions about the text it analyzes. ELMo is an NLP model that uses word vectors (long vectors of floating point numbers that measure a word or character’s statistical relationship to other words) to “map” the meaning of a word or character.

By combining the two, we get a state-of-the-art model for answering questions about a given body of text, one that can be implemented easily using AllenNLP’s library.
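To give a sense of how little code this takes, here is a rough sketch of answering a support question against one of your help documents through AllenNLP’s predictor interface. The model archive path below is a placeholder, not a real URL; you would substitute the BiDAF-ELMo archive published in AllenNLP’s model listings, and the sample document is invented:

```python
# A rough sketch of question answering over a help document with AllenNLP's
# predictor interface. The archive path is a placeholder, not a real URL.
from allennlp.predictors.predictor import Predictor

predictor = Predictor.from_path("path/to/bidaf-elmo-model.tar.gz")

help_doc = (
    "To reset your password, open Settings, choose Security, and click "
    "Reset Password. A reset link will be emailed to you within a few minutes."
)

result = predictor.predict(
    passage=help_doc,
    question="How do I reset my password?",
)

# BiDAF predicts the span of the passage most likely to answer the question.
print(result["best_span_str"])
```

Wrap that call in a web endpoint, pull the passage from your database of help documents, and you have the question-answering API from the architecture above.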

3. Automatic content summarizer with BERT

Imagine for a second that you’re a media outlet tasked with reporting on every single high school football game on the East Coast. The data is all there (schools post their scores), but how do you possibly produce content at that scale?

As it turns out, media companies have already solved this exact problem using machine learning. The Washington Post, for example, uses its robot reporter, Heliograf, to produce reports of exactly this kind.

Let’s imagine you need to build something similar.

What your architecture might look like:

To build a text summarizer that monitors your niche and produces short, relevant summaries, you might need:

A scraper to parse reports from relevant outlets

An API that can analyze a body of text and return a summary

A dashboard or feed to display your summaries

What model you should use:

Google’s BERT.

Google’s BERT is a popular NLP model, and more importantly for this use case, a community project called BERT Extractive Summarizer builds on it to perform text summarization. You can implement the summarizer in a few lines of code and instantly have a text summarization API.
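As a sketch of those few lines, assuming the bert-extractive-summarizer package is installed (pip install bert-extractive-summarizer), summarizing a scraped report might look like this; the sample article text is invented for illustration:

```python
# A minimal summarization sketch using the community bert-extractive-summarizer
# package, which builds on BERT to select the most representative sentences.
from summarizer import Summarizer

article = """
The Falcons beat the Eagles 28-21 on Friday night behind three rushing
touchdowns from their senior running back. The win moves the Falcons to 6-1
on the season and sets up a first-place showdown next week.
"""

model = Summarizer()

# ratio controls roughly what fraction of the original sentences to keep.
summary = model(article, ratio=0.4)
print(summary)
```

Wrapped in a small web framework, as in the earlier examples, that call becomes the summarization API in the architecture above.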

If you can use microservices, you can use machine learning

Machine learning is interesting—but intimidating—to many web developers. It doesn’t need to be.

While designing a new model from scratch will require some advanced data science expertise on your part, getting started with a pre-trained model will not. You can think of it as the difference between using a library or package versus designing one yourself.

The unavoidable challenge in model serving, regardless of whether you use a pre-trained model or not, is scale.

Serving predictions from machine learning models requires significant computing resources, and as traffic scales up, managing a model as a microservice becomes a serious infrastructure challenge. You need to autoscale your instances to handle traffic fluctuations, handle updates and failovers gracefully, and implement at least basic monitoring.

Regardless of what you work on, there’s no need to be intimidated by machine learning as a web developer. If you understand how to build an app using microservices, you can add machine learning functionality to your software.

If you’re interested in deploying a model as a production API, check out Cortex (full disclosure: I’m a contributor).