Integrating a voice or chatbot interface into a product used to require a Natural Language Understanding (NLU) cloud service. Today, we are open sourcing Snips NLU, a Private by Design, GDPR-compliant NLU engine. It can run on the Edge or on a server, with a minimal footprint, while performing as well as or better than cloud solutions.

Natural Language Understanding

2017 was arguably the year of the AI assistant. From the 60 million messages Facebook bots process every day, to the tens of millions of users now talking to an Alexa or Google-powered device, natural language has become a preferred mode of interaction between people and machines. A new skill is being added to the Amazon Alexa skill store every 90 minutes, making voice assistants grow faster than smartphone app stores did.

Behind every chatbot and voice assistant lies a common piece of technology: Natural Language Understanding (NLU). Anytime a user interacts with an AI using natural language, their words need to be translated into a machine-readable description of what they meant. Voice requires an additional step, which is to transcribe the voice of the user into the corresponding text, prior to running the NLU.

The NLU engine first detects what the intention of the user is (a.k.a. intent), then extracts the parameters (called slots) of the query. The developer can then use this to determine the appropriate action or response.

NLU engine output example
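As an illustration, parsing a query like "Turn on the lights in the kitchen" could yield something like the following sketch. The field names here are simplified for clarity and are not necessarily the exact Snips NLU output schema:

```python
# Illustrative sketch of an NLU engine's output for the query
# "Turn on the lights in the kitchen". Field names are simplified
# and not necessarily the exact Snips NLU schema.
parsed = {
    "input": "Turn on the lights in the kitchen",
    "intent": {
        "intentName": "turnLightOn",   # detected intention
        "probability": 0.87,           # classifier confidence
    },
    "slots": [
        {
            "slotName": "room",        # extracted parameter
            "rawValue": "kitchen",     # text span in the query
            "range": {"start": 26, "end": 33},
        }
    ],
}

# A developer would branch on the intent and use the slots:
if parsed["intent"]["intentName"] == "turnLightOn":
    room = parsed["slots"][0]["rawValue"]
    print(f"Turning the lights on in the {room}")
```
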

Most chatbots and voice assistants rely on cloud services for their NLU. The most common ones include Dialogflow (Google, ex API.ai), Amazon Lex, Amazon Alexa, Luis.ai (Microsoft), Wit.ai (Facebook), and IBM Watson.

One thing all these solutions have in common is that they are fully centralized, running on the provider’s servers. This means the provider can access all the data sent to the service, and reuse it at will. Their terms of service say as much, as in this excerpt for Amazon Lex:

Data and Security

Q. Are voice and text inputs processed by Amazon Lex stored, and how are they used by AWS? Amazon Lex may store and use voice and text inputs processed by the service solely to provide and maintain the service and to improve and develop the quality of Amazon Lex and other Amazon machine-learning/artificial-intelligence technologies. Use of your content is necessary for continuous improvement of your Amazon Lex customer experience, including the development and training of related technologies.(…)

Let’s take a step back. The software industry traditionally converged on analytics and crash reports as good practices for the continuous improvement of products, but data collection was largely limited to cases where things failed. The systematic, permanent data collection that has become the norm in AI is a new thing: collection and storage of user content is supposedly deemed necessary for the development of all AI technologies.

This is in fact a false dichotomy. At Snips, we built a voice platform for connected devices that includes Wakeword detection, Speech Recognition (ASR), Natural Language Understanding (NLU), and dialog. The unique thing about Snips is that everything runs on the Edge, with the voice of the user being processed directly on the device they are speaking to. We never touch, process or collect any user data, making our platform the first Private by Design alternative to traditional voice assistants.

In an effort to provide more transparency and privacy, we are progressively open sourcing the core components of our technology, starting today with Snips NLU.

Snips NLU is a Python library for easily training models and using trained models to make predictions on new queries; see the documentation for details. In addition, we are also open sourcing Snips NLU-rs, a Rust implementation focused on the prediction (a.k.a. inference) part. This library runs on most modern architectures: small connected devices, mobile, desktop, or a server. It currently handles five languages (English, French, German, Spanish and Korean), with more added regularly.
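To give a sense of what training data looks like, a dataset pairs each intent with annotated example utterances. Here is a minimal sketch as a Python dict, modeled on the Snips NLU JSON dataset layout; the field names are close to but may not exactly match the official schema, so consult the documentation before relying on them:

```python
# Minimal training-dataset sketch for a single "turnLightOn" intent.
# Modeled on the Snips NLU JSON dataset layout; consult the official
# documentation for the exact, complete schema.
dataset = {
    "language": "en",
    "intents": {
        "turnLightOn": {
            "utterances": [
                {
                    "data": [
                        {"text": "turn on the lights in the "},
                        # Annotated chunk: this span fills the "room" slot
                        {"text": "kitchen", "entity": "room", "slot_name": "room"},
                    ]
                }
            ]
        }
    },
    "entities": {
        "room": {
            "data": [{"value": "kitchen", "synonyms": ["cooking room"]}],
            "use_synonyms": True,
            "automatically_extensible": True,
        }
    },
}
```

With the library installed, such a dataset is passed to the engine's `fit` method, after which `parse` returns the detected intent and slots for a new query.
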

Let’s now take a look at how Snips NLU fares. We will focus on performance and accuracy, showing that our technology performs on par or better than cloud-based solutions.

Inference running time

One typical argument for using machine-learning cloud services is infrastructure cost: the cloud relieves developers of complex and costly operations, so they can concentrate on what they do best.

We understand this is a critical aspect, especially since our platform has to run on tiny devices. We have optimized our Snips NLU-rs inference engine to run virtually anywhere, from a $5 Raspberry Pi Zero to an AWS EC2 free-tier instance. Here is the average running time for processing a query in a typical assistant:

Average time to parse a query with Snips NLU-rs

On average, it takes less than 2 milliseconds to parse a query on a 2015 MacBook Pro with a 2.5 GHz Core i7. Because our platform is optimized for performance and doesn’t require network access, the typical latency gain can be upwards of two orders of magnitude compared to using a cloud service!

Memory has also been optimized, ranging from a few hundred KB of RAM for common cases to a few MB for the most complex assistants. This means the assistant can fit on a Raspberry Pi, in a mobile app, or on an AWS free-tier instance. It also means more powerful servers can handle hundreds of parallel instances of the NLU engine!

To achieve this level of performance, we had to completely rethink how to build an NLU engine, both in terms of engineering and machine learning.

For instance, we implemented in Rust everything that wasn’t already written in a low-level language. If you are not familiar with Rust, it is a brilliant language that offers memory safety while being as fast as C++.

On the machine learning side, we tried dozens of different models, from CNNs to bi-LSTMs, but ended up using a more traditional flat model: a linear-chain Conditional Random Field (CRF). We also replaced heavy word embeddings with a carefully crafted set of features that capture semantic and structural signals from the sentence. We found no significant gain from deep learning over CRFs on Natural Language Understanding tasks. This is not true of ASR, though, where deep learning is mandatory to achieve high accuracy (more on that soon).
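To make this concrete, here is a sketch of the kind of hand-crafted, per-token features that can stand in for word embeddings in a linear-chain CRF slot filler. The feature set below (word shape, affixes, a toy gazetteer, a one-token context window) is illustrative of "semantic and structural signals", not the exact features used in Snips NLU:

```python
# Illustrative per-token feature extraction for a linear-chain CRF
# slot filler. Feature names and the toy "ROOMS" gazetteer are
# hypothetical examples, not the actual Snips NLU feature set.

def word_shape(token: str) -> str:
    """Collapse a token into a coarse shape, e.g. 'Kitchen' -> 'Xx'."""
    shape = []
    for ch in token:
        if ch.isupper():
            shape.append("X")
        elif ch.islower():
            shape.append("x")
        elif ch.isdigit():
            shape.append("d")
        else:
            shape.append(ch)
    # Compress runs of identical symbols: 'Xxxxxxx' -> 'Xx'
    compressed = [shape[0]]
    for c in shape[1:]:
        if c != compressed[-1]:
            compressed.append(c)
    return "".join(compressed)

ROOMS = {"kitchen", "bedroom", "bathroom"}  # toy gazetteer

def token_features(tokens, i):
    """Feature dict for token i, with a one-token context window."""
    tok = tokens[i]
    return {
        "lower": tok.lower(),
        "shape": word_shape(tok),
        "prefix2": tok[:2].lower(),
        "suffix2": tok[-2:].lower(),
        "is_digit": tok.isdigit(),
        "in_room_gazetteer": tok.lower() in ROOMS,
        "prev": tokens[i - 1].lower() if i > 0 else "<BOS>",
        "next": tokens[i + 1].lower() if i < len(tokens) - 1 else "<EOS>",
    }

tokens = "Turn on the lights in the kitchen".split()
features = [token_features(tokens, i) for i in range(len(tokens))]
```

Each token's feature dict would then be fed, together with its slot label, to a linear-chain CRF trainer; libraries such as sklearn-crfsuite accept exactly this list-of-dicts format.
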

Accuracy

To verify that the Snips NLU engine performs well, we benchmarked it against cloud services, including API.ai (now Dialogflow, Google), Wit.ai (Facebook), Luis.ai (Microsoft), and Amazon Alexa. Every solution was trained on the same dataset and tested on the same out-of-sample test set. The results showed that our NLU is as accurate as or more accurate than cloud solutions at slot extraction, regardless of how much training data was used.