Apple introduced a number of exciting developments in its Vision framework at WWDC 2019. Besides improving face tracking and image classification, it added new features such as Saliency, built-in animal classification models, and enhanced APIs for working with Core ML classification models. Among these releases, the ability to compare face capture quality across a set of images is one of the most promising.

The introduction of face capture quality has given Vision’s face technology a major boost. It showcases how much Apple has been investing in the field of computer vision to make photo capturing and processing smarter and easier than ever before.

The face capture quality metric uses a model that’s been trained on a wide range of images (different exposures, lighting, facial expressions, etc.). The Vision request analyzes the image in one shot and assigns it a score. The score depends on facial expression (negative ones get a lower score), lighting, focus, and blurriness in the image.
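As a rough illustration of the request described above, here is a minimal sketch that scores a single image. The helper name `faceCaptureQuality(of:)` is hypothetical and not part of Vision; it assumes the image is already available as a `CGImage` and only looks at the first detected face.

```swift
import Vision
import CoreGraphics

/// Hypothetical helper: returns the capture quality of the first face
/// Vision finds in the image, or nil if no face was detected.
func faceCaptureQuality(of image: CGImage) throws -> Float? {
    let request = VNDetectFaceCaptureQualityRequest()
    let handler = VNImageRequestHandler(cgImage: image, options: [:])

    // Runs synchronously; call it off the main thread for large images.
    try handler.perform([request])

    // Each detected face comes back as a VNFaceObservation whose
    // faceCaptureQuality property carries the score.
    let faces = request.results as? [VNFaceObservation]
    return faces?.first?.faceCaptureQuality
}
```

A higher value means a better capture. The score is meant for comparing shots of the same person, not for ranking different people against each other.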

Using these scores, we can compare different images to find the one in which the face looks the best. This is something that’s likely to show up soon in many custom selfie-based applications.
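The comparison itself is plain Swift. The sketch below assumes we have already collected one optional score per image, with `nil` standing for an image in which no face was found; the function name `indexOfBestScore(in:)` is made up for this example.

```swift
/// Hypothetical helper: returns the position of the highest score,
/// skipping images without a detected face (nil entries).
func indexOfBestScore(in scores: [Float?]) -> Int? {
    scores.enumerated()
        // Drop nil scores while remembering each image's original index.
        .compactMap { pair in pair.element.map { (index: pair.offset, score: $0) } }
        // Pick the entry with the largest score.
        .max { $0.score < $1.score }?
        .index
}

// Usage: the third image (index 2) has the highest score.
let best = indexOfBestScore(in: [0.21, nil, 0.64, 0.37])
```

Returning an index rather than the score keeps the result easy to map back to the original image or frame.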

Face capture quality not only helps in building smarter camera-based applications, as demonstrated in the docs, but also brings machine learning intelligence to video processing. The goal of this article is to make Live Photos (more on this later) smarter by leveraging face capture quality in our iOS applications.

Live Photos were introduced in iOS 9 alongside the iPhone 6s and are one of the most loved camera modes. They redefined the way we look at still images by adding a live motion effect.

Scope

The idea is to find the best frame from a Live Photo that has a human face. We’ll use the new VNDetectFaceCaptureQualityRequest class to run Vision requests on a number of Live Photos that were intentionally captured in a bad or blurry state, and extract the best frame from each of them.

You can extend the same code and concept to videos as well. Live Photos essentially contain videos, as we shall see next.

Live Photos: Under the Hood

Live Photos are made up of an image and a short video clip that records the moments around the capture of the image. This brings a sense of being there in the moment while viewing them.

Under the hood, a Live Photo consists of a key photo paired with a video resource file. We can change the key photo by selecting any of the video frames from the preview-editing mode in the Photos application.
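To make the pairing concrete, here is a minimal sketch of how the video half could be pulled out of a Live Photo with the Photos framework. It assumes photo library access has already been granted; the helper names and the `LivePhotoError` type are hypothetical and exist only for this example.

```swift
import Photos

/// Hypothetical error for assets that carry no paired video.
enum LivePhotoError: Error {
    case noPairedVideo
}

/// Hypothetical helper: returns the video resource paired with a Live Photo,
/// or nil if the asset is not a Live Photo.
func pairedVideoResource(for asset: PHAsset) -> PHAssetResource? {
    guard asset.mediaSubtypes.contains(.photoLive) else { return nil }
    return PHAssetResource.assetResources(for: asset)
        .first { $0.type == .pairedVideo }
}

/// Hypothetical helper: writes the paired video to `url` so its frames
/// can be read later, for example with AVFoundation.
func writePairedVideo(of asset: PHAsset,
                      to url: URL,
                      completion: @escaping (Error?) -> Void) {
    guard let resource = pairedVideoResource(for: asset) else {
        completion(LivePhotoError.noPairedVideo)
        return
    }
    let options = PHAssetResourceRequestOptions()
    // Allow a download in case the resource only lives in iCloud.
    options.isNetworkAccessAllowed = true

    PHAssetResourceManager.default().writeData(for: resource,
                                               toFile: url,
                                               options: options,
                                               completionHandler: completion)
}
```

Writing the resource to a temporary file is one straightforward option, since it yields an ordinary movie file that standard video APIs can open.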