Have you ever wondered how media organizations are able to produce the raw volume of content they output?

How is it that the Associated Press, in addition to all of its other coverage, is able to cover 4,400 quarterly earnings reports each year? How does The Washington Post run such hyperlocal coverage, like covering every high school football game in a town? How does Bloomberg report on so many companies at once?

The answer, it turns out, is machine learning — and more precisely, text summarization.

The Washington Post has Heliograf. The AP uses Wordsmith. Bloomberg has the creatively-named Cyborg. Instead of assigning reporters to these beats, these outlets hook up their data pipelines—like their queue of quarterly earnings reports or high school football scores—to machine learning models that can summarize the information and produce short, valuable articles.

And the use cases of text summarization go beyond media. Want to auto-generate summaries of meeting notes? Legal contracts? Lectures? All of the above can be done using text summarization, and the technology is surprisingly accessible.

To illustrate this accessibility, we’re going to build a text summarization backend. In this tutorial, I’ll be using my text summarization API to summarize press releases from PR Newswire, but this is just illustrative — you can integrate it with any software you’re working on.

Let’s dive in.

Step 1. Scoping our project

To start our project, we need to do some scoping. This tutorial will be focused on building our backend, so there won’t be any frontend instruction — though I’ll share screenshots of my app for context.

In terms of our technical needs, we need three things:

A simple way to implement a text summarizer model.

A way to deploy our model as an API easily.

A solution for scaling and monitoring our deployment automatically.

There are many ways to approach all of the above. You could build and train your own model. You could use an end-to-end machine learning service like Amazon’s SageMaker. But in this tutorial, we’ll be prioritizing two things: simplicity and cost.

Towards that end, we’ll be using a library that allows us to initialize our model with a single line of code and serve predictions with a simple predict() function. Similarly, we’ll be using a tool that allows us to deploy our model as an API with a single terminal command. That same tool will automate most of the infrastructure work needed to maintain our API.

Let’s start with our model.

Step 2. Using the Bert Extractive Summarizer

In order for our model to summarize text accurately, it needs to “understand” it. Google’s BERT (Bidirectional Encoder Representations from Transformers) is the perfect place to start for natural language understanding.

Without getting too in the weeds, BERT learns from unlabeled text by analyzing the preceding and subsequent text around each word (hence “bidirectional”). From this basis of general language understanding, BERT can be easily fine-tuned for more specific use cases, like summarizing text.

We are not going to fine-tune BERT for text summarization, because someone else has already done it for us. Derek Miller recently released the Bert Extractive Summarizer, which is a library that gives us access to a pre-trained BERT-based text summarization model, as well as some really intuitive functions for using it.

What we’re going to do is write a Python script for our API that will load the model and serve requests. We’ll call the script predictor.py, and it will be just four lines of code:

That is literally all the code we need to write to handle inference, because of how simple the Bert Extractive Summarizer makes things.

Step 3. Deploying our model

As we touched on earlier, there are many ways to serve a model. Our priorities in picking a deployment option are:

Reliability. This is production software, not a toy project, so we need something that won’t break under traffic.

Simplicity. We don’t want to spend hours wrangling config docs just to deploy one model. We want a simple “deploy” command that takes us from model to API.

Cost. We don’t want to foot the bill for a solution as expensive as SageMaker.

Towards that end, we’re going to be using Cortex. For those who are unfamiliar, Cortex is an open source tool that deploys models as web APIs on AWS.

To start, you’ll need to install Cortex. The install process should take roughly 10 minutes and will require you to have/create a free AWS account.

The beauty of Cortex, for our purposes, is that all it needs is a simple configuration file to be set up, and then you can deploy models whenever you want—just run cortex deploy from your command line. It takes care of containerizing your model, it automates Kubernetes to manage your containers, and it handles autoscaling and monitoring.

The configuration file, which serves as a blueprint for your deployment, should be titled cortex.yaml, and it can look like this:
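For illustration, here’s a sketch of a minimal cortex.yaml. The deployment name "text" and API name "summarizer" are assumptions chosen to match the /text/summarizer endpoint used later in this tutorial, and the exact fields may vary by Cortex version, so check the docs for your release:

```yaml
- kind: deployment
  name: text

- kind: api
  name: summarizer
  predictor:
    path: predictor.py
  compute:
    mem: 4G
```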

There are many other configuration options possible with Cortex (you can read more about them here), but for the sake of minimalism, this is all we really need. This file tells Cortex which model serving API we’ll be using (in this case, Cortex’s Predictor API), where our predictor.py script is located, and how much memory our deployment will need.

We’ll also need to add a requirements.txt file, to supply Cortex with the libraries for our predictor.py script. You can just download requirements.txt here.
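If you’d rather write the file by hand, it only needs to name the summarization library — assuming the Bert Extractive Summarizer’s PyPI package name, that’s a single line:

```
bert-extractive-summarizer
```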

Once you have cortex.yaml and requirements.txt set up, go to your command line and run cortex deploy:

$ cortex deploy
deployment started

Give it a minute to spin up, and then check the health of your deployment by running cortex get:

$ cortex get

status   up-to-date   available   requested   last update   avg latency
live     1            1           1           9m            -

Assuming your deployment is live and available, we’re ready to move on to the final step.

Step 4. Serving predictions from our model

Before we connect our backend to our application, we should test our API. Because our API is accessible through a simple HTTP endpoint, we can test it using any method capable of querying and receiving responses. For the sake of simplicity, we’ll use curl.

First, use cortex get to get your endpoint, and then query it with curl:

$ cortex get summarizer
url: http://***.amazonaws.com/text/summarizer

$ curl http://***.amazonaws.com/text/summarizer \
    -X POST \
    -H "Content-Type: application/json" \
    -d '{"text": "Machine learning (ML) is the scientific study of algorithms and statistical models that computer systems use to perform a specific task without using explicit instructions, relying on patterns and inference instead. It is seen as a subset of artificial intelligence."}'

By separating our backend out as a standalone API, we decouple the logic behind our machine learning from the frontend of our application. We can make any changes we want to our app without changing how our model is served.

For example, I have a simple service that scrapes and displays press release headlines from PR Newswire on a single page for me, similar to an RSS feed.

Now, I’ve added an extra step in the display process, where instead of displaying just the headline, my app queries my API and returns a summary of the release:

Any changes I make to the frontend — maybe I want to scrape from a different source — will not affect the text summarizer.

Text summarization in any application

What makes this approach so nice is that you can use it to incorporate text summarization in any piece of software. If there is some piece of text you would like to summarize, you simply ping the API and get a summarized response to use in your app.

The simplicity of this approach leads to a big increase in interoperability. No matter what language or framework you use to build your application — or whatever logic underpins its functionality — as long as you have the ability to send and receive HTTP requests, you can use machine learning-powered text summarization in your application.
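To make that concrete, here’s a sketch of a client in Python using only the standard library. The endpoint URL is the placeholder printed by cortex get above, so substitute your own:

```python
import json
import urllib.request

# Placeholder endpoint from `cortex get summarizer` -- replace with yours
API_URL = "http://***.amazonaws.com/text/summarizer"

def summarize(text: str) -> str:
    # POST the text as JSON, exactly like the curl command above,
    # and return the summary the API sends back
    req = urllib.request.Request(
        API_URL,
        data=json.dumps({"text": text}).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    with urllib.request.urlopen(req) as resp:
        return resp.read().decode("utf-8")
```

The same handful of lines translates directly to fetch in JavaScript, net/http in Go, or any other HTTP client.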

If you have any questions about this tutorial or the tools used in it, feel free to ask in the comments — and if you build anything with text summarization, share it as well!