In the previous article of this multi-part tutorial, I implemented a very simple Natural Language Classifier (NLC) in Core ML. This NLC is capable of detecting intents from utterances entered as text or captured using Apple’s speech-to-text API (SFSpeechRecognizer).

In the first version of the classifier, I used the Keras/TensorFlow API and the Apple NSLinguisticTagger API to build simple one-hot encoded sentence vectors for both training and inference phrases.

The diagram below summarizes how, starting from a very small training dataset, it was possible to build a word table that assigns a number to every distinct word found in the sample phrases. It also illustrates how these numbers are used as indexes to build the one-hot encoded vector that is fed to the input layer of the neural network.
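As a rough illustration of that pipeline, here is a minimal sketch of the word-table and one-hot encoding step. It is not the code from part 1: the tokenization is deliberately simplified (lowercasing plus whitespace splitting instead of NSLinguisticTagger), and the sample phrases and names are just placeholders.

```python
# Minimal sketch of the word table + one-hot sentence encoding described above.
# Tokenization is deliberately simplified; the tutorial used NSLinguisticTagger.
import numpy as np

sample_phrases = [
    "turn on the lights",
    "turn off the lights",
    "what time is it",
]

# Build the word table: every distinct word gets a unique index.
word_table = {}
for phrase in sample_phrases:
    for word in phrase.lower().split():
        if word not in word_table:
            word_table[word] = len(word_table)

def encode(sentence: str) -> np.ndarray:
    """Return a one-hot encoded sentence vector (1 at each known word's index)."""
    vector = np.zeros(len(word_table), dtype=np.float32)
    for word in sentence.lower().split():
        index = word_table.get(word)
        if index is not None:
            vector[index] = 1.0
    return vector

print(word_table)                     # e.g. {'turn': 0, 'on': 1, 'the': 2, ...}
print(encode("turn on the lights"))   # mostly zeros, with 1s at the word indexes
```

The resulting vector has one position per word in the table, which is exactly why its size tracks the vocabulary size of the training set.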

This model worked in the simple test in part 1, but one-hot encoding has serious limitations in terms of scalability. It is also inefficient: it produces very large, sparse vectors on which a deep learning network can barely apply its usual ability to discover features directly from the input dataset.

Regarding scalability, you can immediately see from the picture above that with a serious training dataset containing many sample phrases, the word table and the input vectors grow considerably. As a consequence, this significantly increases the size of the final model.
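A quick back-of-the-envelope calculation makes the point; the numbers below are illustrative, not taken from the tutorial's dataset.

```python
# Illustrative figures only: with one-hot inputs, the first dense layer alone
# needs vocab_size * hidden_units weights, before counting the rest of the network.
vocab_size = 10_000    # hypothetical vocabulary for a larger training set
hidden_units = 128     # hypothetical width of the first hidden layer
print(vocab_size * hidden_units)  # 1,280,000 parameters in the first layer alone
```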

Even more important than these performance and size issues, these sparse vectors lose the order in which individual words are used, first in the training dataset and later in the sample utterances used for inference. This drastically limits the automatic learning of many natural language regularities, such as synonym detection.
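The loss of order information is easy to demonstrate with a toy example (the words and table below are mine, chosen only to make the point): two utterances with very different meanings collapse to exactly the same one-hot vector.

```python
# A one-hot / bag-of-words encoding records WHICH words appear, not their order.
import numpy as np

word_table = {"the": 0, "dog": 1, "bit": 2, "man": 3}

def encode(sentence: str) -> np.ndarray:
    vector = np.zeros(len(word_table), dtype=np.float32)
    for word in sentence.lower().split():
        if word in word_table:
            vector[word_table[word]] = 1.0
    return vector

a = encode("the dog bit the man")
b = encode("the man bit the dog")
print(np.array_equal(a, b))  # True: the order information is gone
```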