face-api.js — JavaScript API for Face Recognition in the Browser with tensorflow.js

A JavaScript API for Face Detection, Face Recognition and Face Landmark Detection

I am excited to say that it is finally possible to run face recognition in the browser! With this article I am introducing face-api.js, a JavaScript module built on top of tensorflow.js core, which implements several CNNs (Convolutional Neural Networks) to solve face detection, face recognition and face landmark detection, optimized for the web and for mobile devices.

As always, we will look into a simple code example that will get you started immediately with the package in just a few lines of code. If you want to play around with some examples first, check out the demo page! But don’t forget to come back to read the article. ;)

Let’s dive into it!

Note that the project is under active development. Make sure to also check out my latest articles to stay up to date with the latest features of face-api.js:

First face-recognition.js, now yet another package?

If you have read my other article about face recognition with Node.js, Node.js + face-recognition.js : Simple and Robust Face Recognition using Deep Learning, you may be aware that some time ago I assembled a similar package, namely face-recognition.js, bringing face recognition to Node.js.

At first, I did not expect there to be such a high demand for a face recognition package in the JavaScript community. For a lot of people, face-recognition.js seems to be a decent free-to-use, open source alternative to paid face recognition services, such as those provided by Microsoft or Amazon. But I have also been asked a lot whether it is possible to run the full face recognition pipeline entirely in the browser.

Finally it is, thanks to tensorflow.js! I managed to implement partially similar tools using tfjs-core, which will get you almost the same results as face-recognition.js, but in the browser! Furthermore, face-api.js provides models which are optimized for the web and for running on mobile devices with limited resources. And the best part about it is that there is no need to set up any external dependencies; it works straight out of the box. As a bonus, it is GPU accelerated, running operations on a WebGL backend.

This was reason enough to convince me that the JavaScript community needs such a package for the browser! I’ll leave it up to your imagination what variety of applications you can build with this. ;)

How to solve Face Recognition with Deep Learning

If you are the type of guy (or girl) who is looking to simply get started as quickly as possible, you can skip this section and jump straight into the code. But to get a better understanding of the approach used in face-api.js to implement face recognition, I would highly recommend that you follow along, since I get asked about this quite often.

To keep it simple, what we actually want to achieve is to identify a person given an image of his / her face, i.e. the input image. The way we do that is to provide one (or more) image(s) for each person we want to recognize, labeled with the person’s name, i.e. the reference data. Now we compare the input image to the reference data and find the most similar reference image. If both images are similar enough we output the person’s name, otherwise we output ‘unknown’.
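The matching step described above can be sketched in a few lines. This is only an illustration and not the face-api.js API: the function name recognize, the shape of the reference entries, the distance function and the maxDistance threshold are all assumptions made for this example. The distance function is passed in from outside, since how to actually compare two face images has not been settled yet at this point.

```javascript
// Illustrative sketch only, NOT part of face-api.js.
// references:  assumed shape [{ label: 'name', image: <anything> }, ...]
// distance:    assumed function (a, b) => number, smaller means more similar
// maxDistance: assumed threshold above which a face counts as unknown
function recognize(input, references, distance, maxDistance) {
  let best = { label: 'unknown', distance: Infinity };

  // Find the reference image that is most similar to the input image.
  for (const ref of references) {
    const d = distance(input, ref.image);
    if (d < best.distance) {
      best = { label: ref.label, distance: d };
    }
  }

  // Only accept the best match if it is similar enough.
  return best.distance <= maxDistance ? best.label : 'unknown';
}
```

Note that the threshold is what turns a plain nearest neighbor search into recognition: without it, every input face would be assigned the name of some reference person, even a face that is not part of the reference data at all.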

Sounds like a plan! However, two problems remain. Firstly, what if we have an image showing multiple persons and we want to recognize all of them? And secondly, we need to be able to obtain some kind of similarity metric for two face images in order to compare them…

Face Detection

The answer to the first problem is face detection. Simply put, we will first locate all the faces in the input image. Face-api.js implements multiple face detectors for different use cases.

The most accurate face detector is an SSD (Single Shot Multibox Detector), which is basically a CNN based on MobileNet V1, with some additional box prediction layers stacked on top of the network.

Furthermore, face-api.js implements an optimized Tiny Face Detector, basically an even tinier version of Tiny Yolo v2 utilizing depthwise separable convolutions instead of regular convolutions, which is a much faster, but slightly less accurate face detector compared to SSD MobileNet V1.

Lastly, there is also an MTCNN (Multi-task Cascaded Convolutional Neural Network) implementation, which is mostly kept around for experimental purposes nowadays.

The networks return the bounding boxes of each face, with their corresponding scores, i.e. the probability of each bounding box showing a face. The scores are used to filter the bounding boxes, as it might be that an image does not contain any face at all. Note that face detection should also be performed even if there is only one person in order to retrieve the bounding box.
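To illustrate how the scores are used, here is a minimal sketch of the filtering step. The detection shape ({ box, score }), the function name filterDetections and the default threshold of 0.5 are assumptions for this example and not necessarily what face-api.js does internally.

```javascript
// Illustrative sketch only, NOT part of face-api.js.
// detections: assumed shape [{ box: { x, y, width, height }, score }, ...]
// minScore:   assumed threshold, 0.5 is an arbitrary default for this example
function filterDetections(detections, minScore = 0.5) {
  // Keep only the boxes the detector is reasonably sure show a face.
  return detections.filter((detection) => detection.score >= minScore);
}
```

For an image without any face, all scores would ideally fall below the threshold and the result is simply an empty array.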

Face Landmark Detection and Face Alignment

First problem solved! However, I want to point out that we want to align the bounding boxes, such that we can extract the images centered at the face for each box before passing them to the face recognition network, as this will make face recognition much more accurate!

For that purpose, face-api.js implements a simple CNN, which returns the 68-point face landmarks of a given face image:

From the landmark positions, the bounding box can be centered on the face. In the following you can see the result of face detection (left) compared to the aligned face image (right):
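The centering step itself can be sketched roughly as follows. This is a simplified assumption of how such an alignment could work, not the actual face-api.js implementation: it takes the landmark points as plain { x, y } objects, and the function name alignedBox and the padding parameter are made up for this example.

```javascript
// Illustrative sketch only, NOT the face-api.js implementation.
// landmarks: assumed shape [{ x, y }, ...], e.g. the 68 landmark points
// padding:   assumed relative margin added on each side (0.1 means 10 %)
function alignedBox(landmarks, padding = 0) {
  if (landmarks.length === 0) {
    throw new Error('at least one landmark point is required');
  }

  const xs = landmarks.map((point) => point.x);
  const ys = landmarks.map((point) => point.y);
  const minX = Math.min(...xs);
  const maxX = Math.max(...xs);
  const minY = Math.min(...ys);
  const maxY = Math.max(...ys);

  // The center of the landmarks becomes the center of the new box.
  const centerX = (minX + maxX) / 2;
  const centerY = (minY + maxY) / 2;

  // Use a square box that is large enough to contain all landmarks.
  const size = Math.max(maxX - minX, maxY - minY) * (1 + 2 * padding);

  return {
    x: centerX - size / 2,
    y: centerY - size / 2,
    width: size,
    height: size,
  };
}
```

The returned box can then be used to crop the face from the original image, so that the face ends up in the middle of the cropped image.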