This tutorial introduces you to Core ML and Vision, two cutting-edge iOS frameworks, and how to fine-tune a model on the device.

Update note: Christine Abernathy updated this tutorial for Xcode 11, Swift 5 and iOS 13. Audrey Tam wrote the original.

Apple released Core ML and Vision in iOS 11. Core ML gives developers a way to bring machine learning models into their apps. This makes it possible to build intelligent features on-device like object detection.

iOS 13 added on-device training in Core ML 3 and unlocked new ways to personalize the user experience.

In this tutorial, you’ll learn how to fine-tune a model on the device using Core ML and the Vision framework. You’ll start with Vibes, an app that generates quotes based on a given image. After training the model, it also allows you to add your favorite emojis using shortcuts.

Getting Started

To get started, click the Download Materials button at the top or bottom of this tutorial. Inside the zip file, you’ll find two folders: starter and final. Now double-click Vibes.xcodeproj in the starter project to open it in Xcode.

Build and run the project. You’ll see this:

Tap the camera icon and select a photo from the library to view a quote. Next, tap the sticker icon and select a sticker to add to the image. Move the sticker around to any desired location:

There are two things you can improve:

1. The quote is randomly selected. How about displaying a quote that’s related to the selected image?
2. Adding stickers takes too many steps. What if you could create shortcuts for the stickers you use the most?

Your goal in this tutorial is to use machine learning to tackle these two challenges.

What is Machine Learning?

If you’re new to machine learning, it’s time to demystify some common terms.

Artificial Intelligence, or AI, is the power added to a machine programmatically to mimic human actions and thoughts.

Machine Learning, or ML, is a subset of AI that trains machines to perform certain tasks. For example, you can use ML to train a machine to recognize a cat in an image or translate text from one language to another.

Deep Learning is one method of training a machine. This technique mimics the human brain, which consists of neurons organized in a network. Deep Learning trains an artificial neural network from the data provided.

Say you want the machine to recognize a cat in an image. You can feed the machine lots of images that are manually labeled cat and not cat. You then build a model that can make accurate guesses or predictions.

Training With Models

Apple defines a model as “the result of applying a machine-learning algorithm to a set of training data”. Think of a model as a function that takes an input, performs a particular operation on it, such as predicting or classifying, and produces an output.

Training with labeled data is called supervised learning. You want lots of good data to build a good model. What does good mean? It means the data represents the use cases for which you’re building.

If you want your model to recognize all cats but only feed it a specific breed, it may miss others. Training with biased data can lead to undesired outcomes.

Training is compute-intensive and often done on servers. With their parallel computing capabilities, GPUs typically speed things up.

Once training is complete, you can deploy your model to production to run predictions or inferences on real-world data.

Inference isn’t as computationally demanding as training. However, in the past mobile apps had to make remote calls to a server for model inference.

Advances to mobile chip performance have opened the door to on-device inference. The benefits include reduced latency, less network dependency and improved privacy. But you get increases in app size and battery drain due to computational load.

This tutorial showcases Core ML for on-device inference and on-device training.

Apple’s Frameworks and Tools for Machine Learning

Core ML works with domain-specific frameworks such as Vision for image analysis. Vision provides high-level APIs to run computer vision algorithms on images and videos. Vision can classify images using a built-in model that Apple provides or custom Core ML models that you provide.

Core ML is built on top of lower-level primitives: Accelerate with BNNS (Basic Neural Network Subroutines), and Metal Performance Shaders:

Other domain-specific frameworks that Core ML works with include Natural Language for processing text and Sound Analysis for identifying sounds in audio.
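As a taste of one of these sibling frameworks, here’s a minimal, standalone Natural Language sketch, separate from this tutorial’s app, that detects the dominant language of a string:

```swift
import NaturalLanguage

// Feed the recognizer some text, then ask for its best guess.
let recognizer = NLLanguageRecognizer()
recognizer.processString("La vie est belle")
if let language = recognizer.dominantLanguage {
  // language is an NLLanguage value, e.g. French
  print("Detected language: \(language.rawValue)")
}
```

Like Vision, Natural Language can also work with custom Core ML models you provide.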

Integrating a Core ML Model Into Your App

To integrate with Core ML, you need a model in the Core ML Model format. Apple provides pre-trained models you can use for tasks like image classification. If those don’t work for you, you can look for models created by the community or create your own.

For your first enhancement to Vibes, you need a model that does image classification. Models are available with varying degrees of accuracy and model size. You’ll use SqueezeNet, a small model trained to recognize common objects.

Drag SqueezeNet.mlmodel from the starter Models directory into your Xcode project’s Models folder:

Select SqueezeNet.mlmodel and review the model details in Project navigator:

The Prediction section lists the expected inputs and outputs:

The image input expects an image of size 227×227. There are two output types: classLabelProbs returns a dictionary with the probabilities for each category, while classLabel returns the category with the highest probability.

Click the arrow next to the model:

Xcode auto-generates a file for the model that includes classes for the input, output and main class. The main class includes various methods for making predictions.
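To see what those generated classes give you, here’s a hedged sketch of calling the generated SqueezeNet class directly, without Vision. It assumes you’ve already prepared a 227×227 CVPixelBuffer from the user’s photo; the helper name is illustrative:

```swift
import CoreML

// Hypothetical helper: classify a prepared 227x227 pixel buffer
// using the auto-generated SqueezeNet class.
func classifyDirectly(_ pixelBuffer: CVPixelBuffer) {
  do {
    let output = try SqueezeNet().prediction(image: pixelBuffer)
    // classLabel is the most likely category; classLabelProbs maps
    // every category to its probability.
    let confidence = output.classLabelProbs[output.classLabel] ?? 0
    print("\(output.classLabel): \(confidence)")
  } catch {
    print("Prediction failed: \(error)")
  }
}
```

In practice you’ll use Vision instead, because it handles resizing and cropping the input image for you.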

The standard Vision framework workflow is:

1. First, create a Core ML model.
2. Then, create one or more requests.
3. Finally, create and run a request handler.

You’ve already created your model, SqueezeNet.mlmodel. Next, you’ll create a request.

Creating a Request

Go to CreateQuoteViewController.swift and add the following after the UIKit import:

import CoreML
import Vision

Vision helps you work with images, for example by converting them to the format a model expects.

Add the following property:

// 1
private lazy var classificationRequest: VNCoreMLRequest = {
  do {
    // 2
    let model = try VNCoreMLModel(for: SqueezeNet().model)
    // 3
    let request = VNCoreMLRequest(model: model) { request, _ in
      if let classifications =
        request.results as? [VNClassificationObservation] {
        print("Classification results: \(classifications)")
      }
    }
    // 4
    request.imageCropAndScaleOption = .centerCrop
    return request
  } catch {
    // 5
    fatalError("Failed to load Vision ML model: \(error)")
  }
}()

Here’s a breakdown of what’s going on:

1. Define an image analysis request that’s created when first accessed.
2. Create an instance of the model.
3. Instantiate an image analysis request object based on the model. The completion handler receives the classification results and prints them.
4. Use Vision to crop the input image to match what the model expects.
5. Handle model load errors by killing the app. The model is part of the app bundle, so this should never happen.

Integrating the Request

Add the following at the end of the private CreateQuoteViewController extension:

func classifyImage(_ image: UIImage) {
  // 1
  guard let orientation = CGImagePropertyOrientation(
    rawValue: UInt32(image.imageOrientation.rawValue)) else {
    return
  }
  guard let ciImage = CIImage(image: image) else {
    fatalError("Unable to create \(CIImage.self) from \(image).")
  }
  // 2
  DispatchQueue.global(qos: .userInitiated).async {
    let handler =
      VNImageRequestHandler(ciImage: ciImage, orientation: orientation)
    do {
      try handler.perform([self.classificationRequest])
    } catch {
      print("Failed to perform classification.\n\(error.localizedDescription)")
    }
  }
}

Here’s what this classification request method does:

1. Gets the orientation of the image and its CIImage representation.
2. Kicks off an asynchronous classification request on a background queue. You create a handler to perform the Vision request, then schedule the request.

Finally, add the following at the end of imagePickerController(_:didFinishPickingMediaWithInfo:) :

classifyImage(image)

This triggers the classification request when the user selects an image.

Build and run the app. Tap the camera icon and select a photo. Nothing changes visually:

However, the console should list the raw classification results:

In this example, the classifier has a 27.9% confidence that this image is a cliff, drop, drop-off. Find classificationRequest and replace the print statement with the code below to log the results:

let topClassifications = classifications.prefix(2).map {
  (confidence: $0.confidence, identifier: $0.identifier)
}
print("Top classifications: \(topClassifications)")

Build and run the app and go through the steps to select a photo. The console should log the top results:

You can now use the extracted prediction details to show a quote related to the image.

Adding a Related Quote

In imagePickerController(_:didFinishPickingMediaWithInfo:) , remove the following:

if let quote = getQuote() {
  quoteTextView.text = quote.text
}

This displays a random quote and is no longer needed.

Next, you’ll add logic to get a quote using the results of VNClassificationObservation . Add the following to the CreateQuoteViewController extension:

func processClassifications(for request: VNRequest, error: Error?) {
  DispatchQueue.main.async {
    // 1
    if let classifications =
      request.results as? [VNClassificationObservation] {
      // 2
      let topClassifications = classifications.prefix(2).map {
        (confidence: $0.confidence, identifier: $0.identifier)
      }
      print("Top classifications: \(topClassifications)")
      let topIdentifiers = topClassifications.map {
        $0.identifier.lowercased()
      }
      // 3
      if let quote = self.getQuote(for: topIdentifiers) {
        self.quoteTextView.text = quote.text
      }
    }
  }
}

Here’s what’s going on in the code above:

1. This method processes the results from an image classification request.
2. It extracts the top two predictions using code you’ve seen before.
3. The predictions feed into getQuote(for:) to get a matching quote.

The method runs on the main queue to ensure that the quote display update happens on the UI thread.

Finally, call this method from classificationRequest and change request to the following:

let request = VNCoreMLRequest(model: model) { [weak self] request, error in
  guard let self = self else {
    return
  }
  self.processClassifications(for: request, error: error)
}

Here, your completion handler calls your new method to process the results.

Build and run the app. Select a photo with a lemon or lemon tree in it. If necessary, download one from the browser. You should see the lemon quote selected instead of a random quote:

Verify that the console logs a matching classification:

Test the flow a few times to verify the consistency of the results.

Great stuff! You’ve learned how to use Core ML for on-device model inference. :]

Personalizing a Model on the Device

With Core ML 3, you can fine-tune an updatable model on the device during runtime. This means you can personalize the experience for each user.

On-device personalization is the idea behind Face ID. Apple can ship a model down to the device that recognizes generic faces. During Face ID setup, each user can fine-tune the model to recognize their face.

It doesn’t make sense to ship this updated model back up to Apple for deployment to other users. This underscores the advantage of the privacy that on-device personalization brings.

An updatable model is a Core ML model that’s marked as updatable. You also define the training inputs that you’ll use to update the model.

k-Nearest Neighbors

You’ll enhance Vibes using an updatable drawing classifier model. The classifier recognizes new drawings based on k-Nearest Neighbors, or k-NN. K-what?

The k-NN algorithm assumes that similar things are close to each other.

It does this by comparing feature vectors. A feature vector contains important information that describes an object’s characteristics. An example feature vector is RGB color represented by R, G, B.

Comparing the distance between feature vectors is a simple way to see if two objects are similar. k-NN categorizes an input by using its k nearest neighbors.

The example below shows a spread of drawings classified as squares and circles. Let’s say you want to find out what group the new mystery drawing in red belongs to:

Choosing k = 3 predicts that this new drawing is a square:

k-NN models are simple and fast. You don’t need many examples to train them. Performance can slow down, though, if there’s a lot of example data.
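The algorithm described above can be sketched in a few lines of plain Swift. This is an illustrative toy, not the model Vibes uses: the Sample type, the 2-D feature vectors and the labels are all made up for the example.

```swift
// A labeled training example: a feature vector plus its class.
struct Sample {
  let features: [Double]
  let label: String
}

// Euclidean distance between two feature vectors.
func distance(_ a: [Double], _ b: [Double]) -> Double {
  zip(a, b).map { ($0 - $1) * ($0 - $1) }.reduce(0, +).squareRoot()
}

// Classify an input by majority vote among its k nearest samples.
func classify(_ input: [Double], samples: [Sample], k: Int) -> String? {
  let nearest = samples
    .sorted { distance($0.features, input) < distance($1.features, input) }
    .prefix(k)
  let votes = Dictionary(grouping: nearest, by: { $0.label })
  return votes.max { $0.value.count < $1.value.count }?.key
}

let samples = [
  Sample(features: [1.0, 1.0], label: "square"),
  Sample(features: [1.2, 0.9], label: "square"),
  Sample(features: [5.0, 5.0], label: "circle"),
  Sample(features: [5.1, 4.8], label: "circle")
]
// The mystery point sits near the squares, so with k = 3 the two
// square neighbors outvote the one circle neighbor.
print(classify([1.1, 1.0], samples: samples, k: 3) ?? "none") // square
```

Note that there’s no training step at all: a k-NN model simply stores its examples, which is why adding new drawings at runtime is so cheap.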

k-NN is one of the model types that Core ML supports for training. Vibes uses an updatable drawing classifier with:

1. A neural network that acts as a feature extractor. The neural network knows how to recognize drawings; you use it to extract the features for the k-NN model.
2. A k-NN model for on-device drawing personalization.

In Vibes, the user can add a shortcut by selecting an emoji then drawing three examples. You’ll train the model with the emoji as the label and the drawings as the training examples.

Setting Up Training Drawing Flow

First, prepare the screen to accept user input to train your model by:

1. Adding a screen that’s shown when the user selects an emoji.
2. Adding an action for tapping Save.
3. Removing the UIPanGestureRecognizer from stickerLabel.

Open AddStickerViewController.swift and in collectionView(_:didSelectItemAt:) replace the performSegue(withIdentifier:sender:) call with the following:

performSegue(withIdentifier: "AddShortcutSegue", sender: self)

This transitions to the examples drawing view when the user selects an emoji.

Next, open AddShortcutViewController.swift and add the following code to implement savePressed(_:) :

print("Training data ready for label: \(selectedEmoji ?? "")")
performSegue(
  withIdentifier: "AddShortcutUnwindSegue",
  sender: self)

This unwinds the segue to go back to the main screen when the user taps Save.

Finally, open CreateQuoteViewController.swift and in addStickerToCanvas(_:at:) remove the following code:

stickerLabel.isUserInteractionEnabled = true
let panGestureRecognizer = UIPanGestureRecognizer(
  target: self,
  action: #selector(handlePanGesture(_:)))
stickerLabel.addGestureRecognizer(panGestureRecognizer)

This removes the code that allows the user to move stickers around. This was only useful when the user couldn’t control the sticker location.

Build and run the app then select a photo. Tap the sticker icon and select an emoji. You’ll see your selected emoji as well as three drawing canvases:

Now, draw three similar images. Verify that Save is enabled when you complete the third drawing:

Then, tap Save and verify that the selected emoji is logged in the console:

You can now turn your attention to the flow that triggers the shortcut.

Adding the Shortcut Drawing View

It’s time to prepare the drawing view on the image by following these steps:

1. First, declare a DrawingView.
2. Next, add the drawing view to the main view.
3. Then, call addCanvasForDrawing() from viewDidLoad().
4. Finally, clear the canvas when an image is selected.

Open CreateQuoteViewController.swift and add the following property after the IBOutlet declarations:

var drawingView: DrawingView!

This contains the view where the user draws the shortcut.

Next, add the following code to implement addCanvasForDrawing() :

drawingView = DrawingView(frame: stickerView.bounds)
view.addSubview(drawingView)
drawingView.translatesAutoresizingMaskIntoConstraints = false
NSLayoutConstraint.activate([
  drawingView.topAnchor.constraint(equalTo: stickerView.topAnchor),
  drawingView.leftAnchor.constraint(equalTo: stickerView.leftAnchor),
  drawingView.rightAnchor.constraint(equalTo: stickerView.rightAnchor),
  drawingView.bottomAnchor.constraint(equalTo: stickerView.bottomAnchor)
])

Here you create an instance of the drawing view and add it to the main view. You set Auto Layout constraints so that it overlaps only the sticker view.

Then, add the following to the end of viewDidLoad() :

addCanvasForDrawing()
drawingView.isHidden = true

Here you add the drawing view and make sure it’s initially hidden.

Now, in imagePickerController(_:didFinishPickingMediaWithInfo:) add the following right after addStickerButton is enabled:

drawingView.clearCanvas()
drawingView.isHidden = false

Here you clear any previous drawings and unhide the drawing view so the user can add stickers.

Build and run the app and select a photo. Use your mouse, or finger, to verify that you can draw on the selected image:

Progress has been made. Onwards!

Making Model Predictions

Drag UpdatableDrawingClassifier.mlmodel from the starter’s Models directory into your Xcode project’s Models folder:

Now, select UpdatableDrawingClassifier.mlmodel in Project navigator. The Update section lists the two inputs the model expects during training. One represents the drawing and the other the emoji label:

The Prediction section lists the input and outputs. The drawing input format matches that used during training. The label output represents the predicted emoji label.

Select the Model folder in Xcode’s Project navigator. Then, go to File ▸ New ▸ File…, choose the iOS ▸ Source ▸ Swift File template, and click Next. Name the file UpdatableModel.swift and click Create.

Now, replace the Foundation import with the following:

import CoreML

This brings in the machine learning framework.

Now add the following extension to the end of the file:

extension UpdatableDrawingClassifier {
  var imageConstraint: MLImageConstraint {
    return model.modelDescription
      .inputDescriptionsByName["drawing"]!
      .imageConstraint!
  }

  func predictLabelFor(_ value: MLFeatureValue) -> String? {
    guard
      let pixelBuffer = value.imageBufferValue,
      let prediction = try? prediction(drawing: pixelBuffer).label
      else {
        return nil
    }
    if prediction == "unknown" {
      print("No prediction found")
      return nil
    }
    return prediction
  }
}

This extends UpdatableDrawingClassifier which is the generated model class. Your code adds the following:

1. imageConstraint makes sure the image matches what the model expects.
2. predictLabelFor(_:) calls the model’s prediction method with the CVPixelBuffer representation of the drawing. It returns the predicted label, or nil if there’s no prediction.

Updating the Model

Add the following after the import statement:

struct UpdatableModel {
  private static var updatedDrawingClassifier: UpdatableDrawingClassifier?
  private static let appDirectory = FileManager.default.urls(
    for: .applicationSupportDirectory,
    in: .userDomainMask).first!
  private static let defaultModelURL =
    UpdatableDrawingClassifier.urlOfModelInThisBundle
  private static var updatedModelURL =
    appDirectory.appendingPathComponent("personalized.mlmodelc")
  private static var tempUpdatedModelURL =
    appDirectory.appendingPathComponent("personalized_tmp.mlmodelc")

  private init() { }

  static var imageConstraint: MLImageConstraint {
    let model = updatedDrawingClassifier ?? UpdatableDrawingClassifier()
    return model.imageConstraint
  }
}

The struct represents your updatable model. The definition here sets up properties for the model, including the locations of the original compiled model and the saved model.

Note: Core ML uses a compiled model file with an .mlmodelc extension which is actually a folder.
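If you’re curious where an .mlmodelc comes from, Core ML can produce one at runtime by compiling a raw .mlmodel file. This sketch assumes you have a URL to an .mlmodel you’ve bundled or downloaded; the function name and URL are illustrative:

```swift
import CoreML

// Hypothetical helper: compile an .mlmodel into the .mlmodelc folder
// Core ML actually loads, and keep it in Application Support.
func compileAndStore(rawModelURL: URL) {
  do {
    // compileModel(at:) returns a URL to a temporary .mlmodelc directory.
    let compiledURL = try MLModel.compileModel(at: rawModelURL)
    let permanentURL = FileManager.default.urls(
      for: .applicationSupportDirectory, in: .userDomainMask)[0]
      .appendingPathComponent(compiledURL.lastPathComponent)
    // Move it somewhere permanent before the temp location is cleaned up.
    _ = try FileManager.default.replaceItemAt(
      permanentURL, withItemAt: compiledURL)
    print("Compiled model saved to \(permanentURL)")
  } catch {
    print("Compilation failed: \(error)")
  }
}
```

In this tutorial you don’t need to compile anything yourself: Xcode compiles the bundled model for you, which is what urlOfModelInThisBundle points at.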

Loading the Model Into Memory

Now, add the following private extension after the struct definition:

private extension UpdatableModel {
  static func loadModel() {
    let fileManager = FileManager.default
    if !fileManager.fileExists(atPath: updatedModelURL.path) {
      do {
        let updatedModelParentURL =
          updatedModelURL.deletingLastPathComponent()
        try fileManager.createDirectory(
          at: updatedModelParentURL,
          withIntermediateDirectories: true,
          attributes: nil)
        let toTemp = updatedModelParentURL
          .appendingPathComponent(defaultModelURL.lastPathComponent)
        try fileManager.copyItem(
          at: defaultModelURL,
          to: toTemp)
        try fileManager.moveItem(
          at: toTemp,
          to: updatedModelURL)
      } catch {
        print("Error: \(error)")
        return
      }
    }
    guard let model = try? UpdatableDrawingClassifier(
      contentsOf: updatedModelURL) else {
      return
    }
    updatedDrawingClassifier = model
  }
}

This code loads the updated, compiled model into memory. Next, add the following public extension right after the struct definition:

extension UpdatableModel {
  static func predictLabelFor(_ value: MLFeatureValue) -> String? {
    loadModel()
    return updatedDrawingClassifier?.predictLabelFor(value)
  }
}

predictLabelFor(_:) loads the model into memory, then calls the prediction method you added to the UpdatableDrawingClassifier extension.

Now, open Drawing.swift and add the following after the PencilKit import:

import CoreML

You need this to prepare the prediction input.

Preparing the Prediction

Core ML expects you to wrap the input data for a prediction in an MLFeatureValue object. This object includes both the data value and its type.

In Drawing.swift, add the following property to the struct:

var featureValue: MLFeatureValue {
  let imageConstraint = UpdatableModel.imageConstraint
  let preparedImage = whiteTintedImage
  let imageFeatureValue = try? MLFeatureValue(
    cgImage: preparedImage,
    constraint: imageConstraint)
  return imageFeatureValue!
}

This defines a computed property that sets up the drawing’s feature value. The feature value is based on the white-tinted representation of the image and the model’s image constraint.

Now that you’ve prepared the input, you can focus on triggering the prediction.

First, open CreateQuoteViewController.swift and add the DrawingViewDelegate extension to the end of the file:

extension CreateQuoteViewController: DrawingViewDelegate {
  func drawingDidChange(_ drawingView: DrawingView) {
    // 1
    let drawingRect = drawingView.boundingSquare()
    let drawing = Drawing(
      drawing: drawingView.canvasView.drawing,
      rect: drawingRect)
    // 2
    let imageFeatureValue = drawing.featureValue
    // 3
    let drawingLabel =
      UpdatableModel.predictLabelFor(imageFeatureValue)
    // 4
    DispatchQueue.main.async {
      drawingView.clearCanvas()
      guard let emoji = drawingLabel else {
        return
      }
      self.addStickerToCanvas(emoji, at: drawingRect)
    }
  }
}

Recall that you added a DrawingView to draw sticker shortcuts. In this code, you conform to the protocol to get notified whenever the drawing has changed. Your implementation does the following:

1. Creates a Drawing instance with the drawing info and its bounding square.
2. Creates the feature value for the drawing prediction input.
3. Makes a prediction to get the emoji that corresponds to the drawing.
4. Updates the view on the main queue to clear the canvas and add the predicted emoji to the view.

Then, in imagePickerController(_:didFinishPickingMediaWithInfo:) remove the following:

drawingView.clearCanvas()

You don’t need to clear the drawing here. You’ll do this after you make a prediction.

Testing the Prediction

Next, in addCanvasForDrawing() add the following right after assigning drawingView :

drawingView.delegate = self

This makes the view controller the drawing view delegate.

Build and run the app and select a photo. Draw on the canvas and verify that the drawing is cleared and the following is logged in the console:

That’s to be expected. You haven’t added a sticker shortcut yet.

Now walk through the flow of adding a sticker shortcut. After you come back to the view of the selected photo, draw the same shortcut:

Oops, the sticker still isn’t added! You can check the console log for clues:

After a bit of head-scratching, you may realize that your model has no clue about the sticker you’ve added. Time to fix that.

Updating the Model

You update a model by creating an MLUpdateTask . The update task initializer requires the compiled model file, training data and a completion handler. Generally, you want to save your updated model to disk and reload it, so new predictions make use of the latest data.

You’ll start by preparing the training data based on the shortcut drawings.

Recall that you wrapped a model’s prediction input in an MLFeatureValue. Likewise, you train a model by passing in inputs wrapped in an MLFeatureProvider. To make batch predictions, or to train with many inputs, you pass in an MLBatchProvider containing multiple feature providers.

First, open DrawingDataStore.swift and replace the Foundation import with the following:

import CoreML

You need this to set up the Core ML training inputs.

Next, add the following method to the extension:

func prepareTrainingData() throws -> MLBatchProvider {
  // 1
  var featureProviders: [MLFeatureProvider] = []
  // 2
  let inputName = "drawing"
  let outputName = "label"
  // 3
  for drawing in drawings {
    if let drawing = drawing {
      // 4
      let inputValue = drawing.featureValue
      // 5
      let outputValue = MLFeatureValue(string: emoji)
      // 6
      let dataPointFeatures: [String: MLFeatureValue] = [
        inputName: inputValue,
        outputName: outputValue
      ]
      // 7
      if let provider = try? MLDictionaryFeatureProvider(
        dictionary: dataPointFeatures) {
        featureProviders.append(provider)
      }
    }
  }
  // 8
  return MLArrayBatchProvider(array: featureProviders)
}

Here’s a step-by-step breakdown of this code:

1. Initialize an empty array of feature providers.
2. Define the names of the model training inputs.
3. Loop through the drawings in the data store.
4. Wrap the drawing training input in a feature value.
5. Wrap the emoji training input in a feature value.
6. Create a data point for the training input. This is a dictionary of the training input names and feature values.
7. Create a feature provider for the data point and append it to the feature providers array.
8. Finally, create a batch provider from the array of feature providers.

Now, open UpdatableModel.swift and add the following method to the end of the UpdatableDrawingClassifier extension:

static func updateModel(
  at url: URL,
  with trainingData: MLBatchProvider,
  completionHandler: @escaping (MLUpdateContext) -> Void
) {
  do {
    let updateTask = try MLUpdateTask(
      forModelAt: url,
      trainingData: trainingData,
      configuration: nil,
      completionHandler: completionHandler)
    updateTask.resume()
  } catch {
    print("Couldn't create an MLUpdateTask.")
  }
}

The code creates the update task with the compiled model URL. You also pass in a batch provider with the training data. The call to resume() starts the training and the completion handler is called when training finishes.

Saving the Model

Now, add the following method to the private extension for UpdatableModel :

static func saveUpdatedModel(_ updateContext: MLUpdateContext) {
  // 1
  let updatedModel = updateContext.model
  let fileManager = FileManager.default
  do {
    // 2
    try fileManager.createDirectory(
      at: tempUpdatedModelURL,
      withIntermediateDirectories: true,
      attributes: nil)
    // 3
    try updatedModel.write(to: tempUpdatedModelURL)
    // 4
    _ = try fileManager.replaceItemAt(
      updatedModelURL,
      withItemAt: tempUpdatedModelURL)
    print("Updated model saved to:\n\t\(updatedModelURL)")
  } catch let error {
    print("Could not save updated model to the file system: \(error)")
    return
  }
}

This helper method does the work of saving the updated model. It takes in an MLUpdateContext, which has useful info about the training. The method does the following:

1. Gets the updated model from the update context. This is not the same as the original model.
2. Creates an intermediate folder for saving the updated model.
3. Writes the updated model to the temporary folder.
4. Replaces the model folder’s contents. Overwriting the existing mlmodelc folder in place produces errors, so the solution is to save to an intermediate folder, then copy the contents over.

Performing the Update

Add the following method to the public UpdatableModel extension:

static func updateWith(
  trainingData: MLBatchProvider,
  completionHandler: @escaping () -> Void
) {
  loadModel()
  UpdatableDrawingClassifier.updateModel(
    at: updatedModelURL,
    with: trainingData) { context in
      saveUpdatedModel(context)
      DispatchQueue.main.async {
        completionHandler()
      }
  }
}

The code loads the model into memory then calls the update method you defined in its extension. The completion handler saves the updated model then runs this method’s completion handler.

Now, open AddShortcutViewController.swift and replace the savePressed(_:) implementation with the following:

do {
  let trainingData = try drawingDataStore.prepareTrainingData()
  DispatchQueue.global(qos: .userInitiated).async {
    UpdatableModel.updateWith(trainingData: trainingData) {
      DispatchQueue.main.async {
        self.performSegue(
          withIdentifier: "AddShortcutUnwindSegue",
          sender: self)
      }
    }
  }
} catch {
  print("Error updating model", error)
}

Here you’ve put everything together for training. After setting up the training data, you kick off the model update on a background queue. The update completion handler performs the unwind segue to transition back to the main screen.

Build and run the app and go through the steps to create a shortcut.

Verify that when you tap Save the console logs the model update:

Draw the same shortcut on the selected photo and verify that the right emoji shows:

Congratulations, you machine learning ninja!

Where to Go From Here?

Download the completed version of the project using the Download Materials button at the top or bottom of this tutorial.

Check out the Machine Learning in iOS video course to learn more about how to train your own models using Create ML and Turi Create. Beginning Machine Learning with Keras & Core ML walks you through how to train a neural network and convert it to Core ML.

The Create ML app lets you build, train and deploy machine learning models with no machine learning expertise required. You can also check out the official WWDC 2019 sessions What’s New in Machine Learning and Training Object Detection Models in Create ML.

I hope you enjoyed this tutorial! If you have any questions or comments, please join the discussion below.