Custom Entity Recognition

If we want our tagger to recognize Apple product names, we need to train our own tagger with Create ML. First, download the JSON file called Products.json from this repository. Then drag the file into the Resources folder in the playground’s left sidebar.

A quick note about JSON files: JSON is a convenient way to present data for ML applications, especially text-based ones. You’ll see JSON files quite frequently because the format is simple to read and widely supported.

If you open the file, you can see an array of tokens and labels. Each token is a single word or punctuation mark from a sample sentence, and each label is the category we assign to that token. You can see that the words in “Apple Music” and “iPad” carry the PROD label.
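To make the token/label pairing concrete, here is a minimal sketch that decodes data of this shape with Codable. The sample sentence is made up for illustration and is not copied from Products.json, and the "NONE" label for non-product tokens as well as the TaggedSentence type are assumptions; only the "tokens" and "labels" keys and the PROD label come from the tutorial.

```swift
import Foundation

// One training example: a tokenized sentence plus one label per token.
struct TaggedSentence: Codable {
    let tokens: [String]
    let labels: [String]
}

// Hypothetical sample in the same shape as Products.json (not the real file).
// "NONE" is an assumed label for tokens that are not product names.
let sampleJSON = """
[
  {
    "tokens": ["I", "listen", "to", "Apple", "Music", "on", "my", "iPad", "."],
    "labels": ["NONE", "NONE", "NONE", "PROD", "PROD", "NONE", "NONE", "PROD", "NONE"]
  }
]
"""

func decodeSentences(from json: String) throws -> [TaggedSentence] {
    try JSONDecoder().decode([TaggedSentence].self, from: Data(json.utf8))
}

let sentences = try decodeSentences(from: sampleJSON)
for sentence in sentences {
    // Every token needs exactly one label, so both arrays must have the same length.
    precondition(sentence.tokens.count == sentence.labels.count)
    for (token, label) in zip(sentence.tokens, sentence.labels) where label == "PROD" {
        print("\(token): \(label)")
    }
}
```

Note that a multi-word name such as “Apple Music” is labeled one token at a time, so both “Apple” and “Music” get PROD.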

A Look at Products.json

Now that we have this, let’s use Create ML to create our tagger. Head back to the playground. Right under import NaturalLanguage, import the packages we need:

import CreateML

import Foundation

Import the packages which we need

Below the code where we analyze our sentence, type the code that trains our product tagger and saves it as a Core ML model.

let trainingData = try MLDataTable(contentsOf: Bundle.main.url(forResource: "Products", withExtension: "json")!)
let model = try MLWordTagger(trainingData: trainingData, tokenColumn: "tokens", labelColumn: "labels")

let metadata = MLModelMetadata(author: "Sai Kambampati", shortDescription: "A custom NLP tagger to recognize Apple products as entities in a chunk of text.", license: "MIT", version: "1.0")
try model.write(to: URL(fileURLWithPath: "/Users/SaiKambampati/Desktop/ProductTagger.mlmodel"), metadata: metadata)

Creating a model using CreateML

Make sure that Products.json is in the Resources folder. In the code, replace the author name in metadata with your own name, and change the path you write the model to so that it points to your own user account. Then run the code.

Watching the creation of the machine learning model

The Core ML model should be saved to your Desktop. Drag it over to the Resources folder in your playground. We’ll now see how the tagger performs.

You may have noticed that our accuracy is quite low: 77%. The most direct way to improve this is to add more training data, which gives the model more examples to learn from and should raise the overall accuracy.
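As a rough illustration of what an accuracy figure like this measures, the sketch below computes per-token accuracy by comparing predicted labels against reference labels. It assumes the reported number is a per-token measure; the tokenAccuracy helper, the sample labels, and the "NONE" label are all hypothetical and not part of Create ML.

```swift
// Fraction of tokens whose predicted label matches the reference label.
func tokenAccuracy(predicted: [String], expected: [String]) -> Double {
    precondition(predicted.count == expected.count, "one prediction per token")
    guard !expected.isEmpty else { return 0 }
    let correct = zip(predicted, expected).filter { $0.0 == $0.1 }.count
    return Double(correct) / Double(expected.count)
}

// Hypothetical labels for a four-token sentence.
// "NONE" is an assumed label for tokens that are not product names.
let expected = ["PROD", "PROD", "NONE", "NONE"]
let predicted = ["PROD", "NONE", "NONE", "NONE"]
let accuracy = tokenAccuracy(predicted: predicted, expected: expected)
print(accuracy) // 0.75
```

Here one of the four tokens is mislabeled, so the accuracy is 0.75. With a small training set, a handful of such mistakes is enough to pull the figure down noticeably.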

Head to the playground and change your code to look like the image below.

Cleaning up the code

Basically, we’re removing the code we won’t use. Now, time for the magic! Type the following code:

// 1
var productTagScheme = NLTagScheme("Product")
var productTag = NLTag("PROD")

// 2
let modelURL = Bundle.main.url(forResource: "ProductTagger", withExtension: "mlmodelc")!
let productTaggerModel = try! NLModel(contentsOf: modelURL)

// 3
let productTagger = NLTagger(tagSchemes: [.nameType, productTagScheme])
productTagger.setModels([productTaggerModel], forTagScheme: productTagScheme)

// 4
productTagger.string = text
productTagger.enumerateTags(in: text.startIndex..<text.endIndex, unit: .word, scheme: .nameType, options: options) { (tag, tokenRange) -> Bool in
    if let tag = tag, tags.contains(tag) {
        print("\(text[tokenRange]): \(tag.rawValue)")
    }
    return true
}

productTagger.enumerateTags(in: text.startIndex..<text.endIndex, unit: .word, scheme: productTagScheme, options: options) { (tag, tokenRange) -> Bool in
    if tag == productTag {
        print("\(text[tokenRange]): PROD")
    }
    return true
}

This code should look very familiar since we’re using most of the same code from the previous section. Here’s a quick explanation.

1. We define productTagScheme and productTag as the tag scheme and tag of our product tagger.
2. We reference the URL where our model exists. Remember to have your Core ML model in the Resources folder of your playground. Also, make sure the model extension reads mlmodelc, as Xcode will produce an error without the ‘c’. We then load this Core ML model as an NLModel, a machine learning model type specifically designed for Natural Language tasks.
3. Just like earlier, we create a tagger, this time called productTagger, with both the nameType scheme and our new productTagScheme. We then assign the model to productTagger for that scheme.
4. We pass text to our tagger and, just like before, enumerate from start to end. This time we have two enumerateTags() calls: one looks up tags under the .nameType tag scheme, and the other looks up tags under productTagScheme.
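Because the enumeration works with unit .word, a multi-word name such as “Apple Music” is reported one word at a time. If you would rather see whole product names, one option is to join adjacent PROD tokens after tagging. The following is only a sketch of that idea: mergeTaggedRuns is a hypothetical helper, not part of NaturalLanguage, and the sample tokens and the "NONE" label are assumptions.

```swift
// Joins runs of adjacent tokens that share the target label into single phrases.
func mergeTaggedRuns(tokens: [String], labels: [String], target: String) -> [String] {
    precondition(tokens.count == labels.count, "one label per token")
    var phrases: [String] = []
    var current: [String] = []
    for (token, label) in zip(tokens, labels) {
        if label == target {
            // Still inside a run of target-labeled tokens.
            current.append(token)
        } else if !current.isEmpty {
            // The run just ended, so emit it as one phrase.
            phrases.append(current.joined(separator: " "))
            current.removeAll()
        }
    }
    // Emit a run that reaches the end of the sentence.
    if !current.isEmpty {
        phrases.append(current.joined(separator: " "))
    }
    return phrases
}

// Hypothetical tagger output; "NONE" is an assumed non-product label.
let words = ["Apple", "Music", "works", "on", "iPad"]
let wordLabels = ["PROD", "PROD", "NONE", "NONE", "PROD"]
let products = mergeTaggedRuns(tokens: words, labels: wordLabels, target: "PROD")
print(products) // ["Apple Music", "iPad"]
```

One limitation of this simple approach is that two different products mentioned back to back, with no other word between them, would be merged into a single phrase.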

Creating our new tagger

That’s it! Build and run your code to get an output like the one below!