Source code available at https://github.com/sumantrajoshi/Face-recognition-using-deep-learning

Deep learning has been revolutionizing the face recognition field over the last few years. Thanks to the ever-increasing computational efficiency of GPUs, in 2015 Google researchers published a paper on a new deep learning system called FaceNet, which achieved near-100-percent accuracy on a popular face recognition dataset named “Labeled Faces in the Wild” (LFW). This paper literally paved the way for creating the next generation of facial recognition systems using machine learning.

Difference between face verification and face recognition

Face verification is used to verify whether the input image belongs to the legitimate claimant (a 1-to-1 mapping). Face recognition, on the other hand, is used to determine whether the input face image belongs to any member of an authorized group of individuals (a 1-to-M mapping).

Basic face recognizer using a pre-trained model

Difference between face recognition and face spoofing detection

As shown in the above screen grab of the application, I have only demonstrated basic face recognition, which will also recognize faces presented via digital photos, videos, and even 3D-modeled faces. Preventing attackers from using such non-live faces to access a privileged system is called face spoofing detection, which is out of the scope of this application. Face spoofing detection can be achieved by various techniques such as liveness detection, contextual information, user interaction, and texture analysis.

Training a new deep convolutional neural network (CNN) for face recognition is extremely difficult because of the complexity of the data set and the enormous computing power required. Hence, I will be creating a basic facial recognition application using an open-source pre-trained model.

Using a pre-trained neural network for face recognition

I will be using OpenFace, an open-source deep learning facial recognition model. It is based on the paper FaceNet: A Unified Embedding for Face Recognition and Clustering by Florian Schroff, Dmitry Kalenichenko, and James Philbin at Google. OpenFace is implemented using Python and Torch, which allows the network to be executed on a CPU or with CUDA.

I wanted to implement the application in Keras (using the TensorFlow backend), and to do that I have used a pre-trained model known as Keras-OpenFace by Victor Sy Wang, which is an open-source Keras implementation of OpenFace.

Below is a short video of real-time face recognition using a laptop’s webcam, made with the Keras-OpenFace model and some elementary concepts from the OpenFace and FaceNet architectures.

Demo of the face recognition application

Now let’s understand what has been done to create the above face recognition application.

Basically, I have fed face images to the Keras-OpenFace pre-trained model to generate 128-dimensional embedding vectors. I assume that the readers have knowledge of deep learning and how a Convolutional Neural Network (CNN) works.
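The embedding step can be sketched as below. This is a minimal illustration, not the application’s actual code: `get_embedding` is a hypothetical helper, and `DummyModel` stands in for the real Keras-OpenFace network (any object exposing a Keras-style `predict`).

```python
import numpy as np

def get_embedding(model, face_img):
    """Pass a 96x96 RGB face through the network and L2-normalize
    the resulting 128-dimensional embedding, as FaceNet does."""
    batch = np.expand_dims(face_img.astype(np.float32) / 255.0, axis=0)
    vec = model.predict(batch)[0]       # shape (128,)
    return vec / np.linalg.norm(vec)    # unit length, so distances are bounded

# Stand-in for the real Keras-OpenFace model (illustrative only)
class DummyModel:
    def predict(self, batch):
        rng = np.random.default_rng(0)
        return rng.normal(size=(batch.shape[0], 128))

face = np.zeros((96, 96, 3), dtype=np.uint8)
emb = get_embedding(DummyModel(), face)
print(emb.shape)  # (128,)
```

With the real model, the same `get_embedding` call would produce the 128-dimensional vector used everywhere below.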

Challenge in using CNN as a face recognition classifier

Applying a CNN classifier to face recognition is not a great idea because, as the group of people (like the employees of a company) grows or shrinks, one has to change the Softmax classifier and retrain the network. There are many different ways to create a face recognition system; in this application, I have used facial recognition via one-shot learning with a deep neural network.

What is one-shot learning?

In one-shot learning, only one image per person is stored in the database and passed through the neural network to generate an embedding vector. This embedding vector is compared with the vector generated for the person to be recognized. If the two vectors are sufficiently similar, the system recognizes that person; otherwise, the person is not in the database. This can be understood from the picture below.
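The one-shot lookup can be sketched as follows. The function name `recognize`, the threshold value, and the toy 2-D embeddings (standing in for the real 128-dimensional vectors) are all illustrative assumptions, not the application’s actual code.

```python
import numpy as np

def recognize(query_emb, database, threshold=0.7):
    """One-shot lookup: return the closest stored identity,
    or None if no embedding in the database is within the threshold."""
    best_name, best_dist = None, float("inf")
    for name, emb in database.items():
        dist = np.linalg.norm(query_emb - emb)  # Euclidean (L2) distance
        if dist < best_dist:
            best_name, best_dist = name, dist
    return best_name if best_dist < threshold else None

# Toy 2-D embeddings standing in for the real 128-D vectors
db = {"alice": np.array([1.0, 0.0]), "bob": np.array([0.0, 1.0])}
print(recognize(np.array([0.9, 0.1]), db))  # alice
print(recognize(np.array([5.0, 5.0]), db))  # None
```

Note that only one stored embedding per person is needed; the network itself is never retrained when people are added or removed.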

One-shot learning

How does a neural network learn face recognition (the triplet loss function)?

Here we are using the OpenFace pre-trained model for facial recognition. Without going into much detail on how this neural network identifies two faces as the same, let’s say that the model is trained on a large set of face data with a loss function that groups images of the same identity together and pushes images of different identities apart. It is also known as the triplet loss function.
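For a single triplet of embeddings — an anchor, a positive (same identity), and a negative (different identity) — the loss is max(‖a − p‖² − ‖a − n‖² + α, 0), where α is a margin. A minimal NumPy sketch (the margin value and the toy 2-D vectors are illustrative):

```python
import numpy as np

def triplet_loss(anchor, positive, negative, alpha=0.2):
    """max(||a-p||^2 - ||a-n||^2 + alpha, 0): pull the positive (same
    identity) closer to the anchor than the negative by at least the margin."""
    pos_dist = np.sum((anchor - positive) ** 2)
    neg_dist = np.sum((anchor - negative) ** 2)
    return max(pos_dist - neg_dist + alpha, 0.0)

a = np.array([0.0, 0.0])   # anchor
p = np.array([0.1, 0.0])   # positive: same person, close by
n = np.array([1.0, 0.0])   # negative: different person, far away
print(triplet_loss(a, p, n))  # 0.0 — negative already far enough
print(triplet_loss(a, n, p))  # > 0 — margin violated, gradient pushes apart
```

During training the network’s weights are updated to drive this loss toward zero over many such triplets.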

Triplet loss function

Training the neural network for face recognition is not a “one-shot learning” task

It’s also important to know that while training the neural network, we require multiple images of the same person to optimize the triplet loss function. Hence, training a neural network for face recognition is not a one-shot learning task.

Understanding the basic design

Let’s visualize how to create a basic facial recognition application using a pre-trained deep neural network. Training of the network has already been done, as shown in the diagram below.

OpenFace’s training module

I am using this pre-trained network to compare the embedding vectors of the images stored in the file system with the embedding vector of the image captured from the webcam. This can be explained by the diagram below.

Facial recognition using one-shot learning

As per the above diagram, if the embedding vector of the face captured by the webcam is sufficiently similar to one of the 128-dimensional embedding vectors stored in the database, the system can recognize the person. All the images stored in the file system are converted into a dictionary with names as keys and embedding vectors as values.
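Building that dictionary can be sketched as below. `build_database` and `toy_embed` are hypothetical names; `toy_embed` (a per-channel mean) merely stands in for the Keras-OpenFace forward pass that produces the real embeddings.

```python
import numpy as np

def build_database(images, embed_fn):
    """Turn {name: face_image} into {name: embedding_vector} -- the
    dictionary the application looks up at recognition time."""
    return {name: embed_fn(img) for name, img in images.items()}

# Toy stand-in for the Keras-OpenFace forward pass: mean per channel
def toy_embed(img):
    return img.mean(axis=(0, 1))

images = {"alice": np.ones((96, 96, 3)), "bob": np.zeros((96, 96, 3))}
db = build_database(images, toy_embed)
print(sorted(db))         # ['alice', 'bob']
print(db["alice"].shape)  # (3,)
```

In the real application the keys come from the stored image filenames and the values are the 128-dimensional embeddings.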

When processing an image, face detection is performed first to find bounding boxes around faces. I have used OpenCV’s Haar feature-based cascade classifiers to extract the face area. Before the image is passed to the neural network, it is resized to 96x96 pixels, since the deep neural network expects a fixed (96x96) input size.

Calculating the similarity between two images

To compare two images for similarity, we compute the distance between their embeddings. This can be done by calculating either the Euclidean (L2) distance or the cosine distance between the 128-dimensional vectors. If the distance is less than a threshold (which is a hyperparameter), the faces in the two pictures are of the same person; if not, they are two different persons.

I got pretty decent results by simply comparing Euclidean distances to recognize a face. However, if one wants to scale the application to a production system, one should consider also applying affine transformations before feeding the image to the neural network.

What is an Affine transformation?

Pose and illumination have been long-standing challenges in face recognition. A potential bottleneck in a face recognition system is that the faces could be looking in different directions, which can result in a different embedding vector each time. We can address this issue by applying an affine transformation to the image, as shown in the diagram below.

Affine transformation to normalize the face

An affine transformation rotates and warps the face so that the positions of the eyes, nose, and mouth are consistent across faces. Fixing these landmark positions aids in finding the similarity between two images when applying one-shot learning to face recognition.
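The alignment idea can be sketched with plain NumPy: fit an affine transform that maps detected landmarks (e.g. eye corners and nose tip) onto a fixed template. The template coordinates below are illustrative assumptions; real pipelines such as OpenFace use dlib’s 68-point landmarks, and the resulting 2x3 matrix is what one would pass to `cv2.warpAffine`.

```python
import numpy as np

# Canonical positions (in a 96x96 image) for left eye, right eye,
# and nose tip after alignment -- illustrative values only.
TEMPLATE = np.array([[30.0, 35.0], [66.0, 35.0], [48.0, 60.0]])

def affine_matrix(landmarks):
    """Least-squares fit of a 2x3 affine transform mapping detected
    landmark points (x, y) onto the template."""
    n = len(landmarks)
    A = np.hstack([landmarks, np.ones((n, 1))])      # (n, 3) homogeneous coords
    M, *_ = np.linalg.lstsq(A, TEMPLATE[:n], rcond=None)
    return M.T                                        # (2, 3), warpAffine layout

detected = np.array([[20.0, 40.0], [60.0, 42.0], [40.0, 70.0]])
M = affine_matrix(detected)
aligned = (M @ np.hstack([detected, np.ones((3, 1))]).T).T
print(np.allclose(aligned, TEMPLATE))  # True — 3 points determine an exact fit
```

After warping, every face presents its eyes, nose, and mouth at the same pixel locations, so the network sees a consistently posed input.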

Cheers, and do check out the simple yet scalable application on my GitHub page. Would love to get some suggestions, improvements, and optimizations in and around the application.

References: