There Is a Range of Tasks Your Face Recognition App Can Be Designed to Perform If You Use the Right Face Recognition Methods. We’d Like to Share Our Experience

Facial Recognition has been one of the fastest-growing technologies of recent years, and it is clearly still far from its peak. Invented to virtually enhance, or rather extend, one of the human senses, it keeps finding new and often critically important uses (public security, for example) and is becoming more widespread globally by the day.

According to Marketsandmarkets.com, the global Face Recognition software market was worth an estimated USD 3.2 billion in 2019 and is predicted to reach USD 7 billion in 2024, growing at a compound annual rate of about 16.6%. This can only mean that, while giving those better equipped with Face Recognition apps an edge and an additional means of control, the rapidly developing Facial Recognition technology is also becoming a competitive factor for businesses in various industry sectors. At the same time, the technology is creating more challenges and opportunities for IT companies engaged in developing Face Recognition applications.

In this article, we’ll dwell on the challenges we encountered and dealt with while developing our Face Recognition system. We’ll also tell you about the various opportunities associated with the Facial Recognition technology and share several interesting insights we’ve gleaned. These insights can be of interest both to those planning to build a Face Recognition app and to those already developing them.

So, buckle up: here is our story…

The Task at Hand: Reliably Securing the Office Premises of a Mid-Sized Eastern-European IT Company

Our AI team was created when the rise of the AI technologies we have all witnessed over recent years had just begun. Face Recognition became a natural choice for us here: it was relatively easy to foresee many of the technology’s numerous forthcoming applications.

Although we’ve been involved in Face Recognition app development for close to two years now, our first internal project, a Face Recognition system that secures access to our company premises, has remained a major experimentation ground for our AI development team. This project is still helping us choose and test the recognition approaches best suited to each of the various types of objects (including the human body, physical objects and facial expressions).

Our spacious office is located centrally in Downtown Kharkiv. Due to this, ensuring its security was not our primary goal. However, from the very outset, we wanted to acquire a thorough grasp of the Facial Recognition technology, so we opted for a full-blown, feature-rich, high-performance solution on the cutting edge of technology. More specifically, the objectives of our Face Recognition app development endeavor can be summed up as follows:

Finding out more precisely how the different Face Recognition algorithms work.

Cutting down on our security-related costs.

Obtaining office attendance stats.

Entering a budding business niche and establishing a business presence in it.

Our ultimate goal was to develop a Face Recognition system that would not only identify faces reliably and raise an alarm if a face is not recognized, but would also be integrated with a card system, block access to the company premises under certain circumstances, and, in the longer run, serve as the only means of security. (Statistically, a well-developed Face Recognition app proves to be at least 10% more “observant” than a human guard, who can easily get distracted or become less attentive because of weariness.)

Although we currently still have all the required physical security in place, we’ve envisaged two options here: a system that is intended to reinforce our physical security, and one that is capable of reliably securing the premises all on its own.

We’ve extensively used both Face Recognition (an SVM, more specifically, the multi-class version with probability estimates) and Face Detection (HOG from Dlib) to achieve the objectives under the above goal. It took us quite a while to gauge the efficiency of these tools and techniques in solving the different tasks at hand. Those interested in the technical details of our quest for the right solution can find the insights we’ve managed to dig up in the following tables. Those who are not interested in the technical details can skip straight to the conclusion we’ve made, presented below.

Face Recognition and Detection Methods

Method: Haar Classifier
Usage Area: Face Detection (FD)
Advantages: OpenCV provides pre-trained FD and LCP models that can be trained on one’s own data.
Disadvantages: The model has to be trained on a lot of examples, both negative and positive. Unlike with OpenCV, there are no pre-trained models provided by Dlib. While Dlib makes it possible to train a model, this is a labor-intensive and time-consuming process.
Languages & Libraries Used: OpenCV, Dlib; C++, Python, Java

Method: LBP (Local Binary Patterns)
Usage Area: Face Detection
Advantages: OpenCV provides a pre-trained FD model that can be trained on one’s own data. The required computations are simple and quick to perform. A relatively short training period is required.
Disadvantages: The model has to be trained on a lot of examples, both negative and positive.
Languages & Libraries Used: OpenCV; C++, Python, Java

Method: HOG (Histogram of Oriented Gradients)
Usage Area: Face Detection
Advantages: Dlib provides pre-trained models. The HOG from Dlib seems to be more accurate than Haar Classifier and LBP and works in the following 5 modes: HOG front, HOG left, HOG right, HOG rotate left, and HOG rotate right. The HOG is faster than the OpenCV CNN and Dlib DNN.
Disadvantages: The model has to be trained on a lot of examples, both negative and positive. No pre-trained model is provided by OpenCV. HOG is less accurate than OpenCV CNN and Dlib DNN.
Languages & Libraries Used: OpenCV, Dlib; C++, Python, Java

Method: Deep Neural Network (DNN)
Usage Area: Face Detection, Face Recognition and more. Each of the different DNN models allows one to perform one of the required operations: Face Detection, Face Recognition (the detection of 5 or 68 landmarks, depending on the model variation) and Facial Characteristics’ Detection.
Advantages: Dlib provides several powerful models for Face Detection and the performance of the invariant calculations for a face to be recognized. The Face Detection results are more accurate than those produced by the other methods (Haar Classifier, HOG, LBP).
Disadvantages: The computations are complex and slow. For real-time, a video card with CUDA is required. Training one’s own model requires a very powerful computer.
Languages & Libraries Used: Dlib; C++, Python, Java

Classification Methods

Method: Support Vector Machine (SVMs, also support vector networks)
Usage Area: A Classification and Regression analysis method.
Advantages: The method yields good classification results when applied to large data volumes. Re-training is fast (32 000 faces has taken up to 1 minute to re-train). In our case, the SVM has also proven to be more accurate than Random Forest.
Disadvantages: OpenCV contains the version of Libsvm without probability. Dlib supports only the probability version of the binary classification model and the non-probability version of the svm_multiclass model. Libsvm’s latest version supports probability. No universal parameter selection methods are provided and the selection process can become quite complicated.
Languages & Libraries Used: Libsvm, OpenCV, Dlib; C++, Python, Java

Method: Random Forest
Usage Area: A Classification and Regression analysis method. It is a supervised learning algorithm that creates “Random Forest” (or a collection) of Decision Trees.
Advantages: Supports multi-classification with probability.
Disadvantages: In our specific case, Random Forest has failed to fully train itself on our learning data set (the result was 97%, while the SVM produced a 100% positive result).
Languages & Libraries Used: OpenCV; Python

So, having done quite a bit of R&D, we eventually arrived at the conclusion that the following combination of tools and techniques would be the best fit for the objectives we’ve outlined above (building a robust Face Recognition app to take full control over office premises in a multi-story building, and creating an alternative to physical security):

HOG – preferable for face detection purposes.

Dlib NN – optimal for obtaining a face descriptor (the unique identifier of a face: a vector of 128 numeric facial features).

SVM – preferable for comparing the obtained visual information with the face descriptors stored in the application database.
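As a rough illustration, the combination above could be wired together along these lines. This is a minimal sketch rather than the actual implementation: the function names, the model file arguments and the 0.8 probability threshold are assumptions made for the example, and the per-person probabilities are assumed to come from a probability-enabled multi-class SVM that is not shown here.

```python
def face_descriptors(image_path, landmarks_model, recognition_model):
    """Detect faces with Dlib's HOG detector and return one 128-number descriptor per face."""
    import dlib  # imported lazily so the decision helper below also works without Dlib

    detector = dlib.get_frontal_face_detector()            # HOG-based face detector
    predictor = dlib.shape_predictor(landmarks_model)      # path to a landmarks model file
    encoder = dlib.face_recognition_model_v1(recognition_model)  # path to the NN model file

    image = dlib.load_rgb_image(image_path)
    descriptors = []
    for box in detector(image, 1):                         # 1 = upsample once for small faces
        shape = predictor(image, box)
        descriptors.append(list(encoder.compute_face_descriptor(image, shape)))
    return descriptors


def decide(class_probabilities, min_probability=0.8):
    """Return the most probable name, or None (raise an alarm) if no class is convincing.

    class_probabilities maps a person's name to the probability reported by the
    classifier; min_probability is an assumed threshold, to be tuned per site.
    """
    if not class_probabilities:
        return None
    name, probability = max(class_probabilities.items(), key=lambda item: item[1])
    return name if probability >= min_probability else None
```

For example, `decide({"alice": 0.93, "bob": 0.07})` returns `"alice"`, while an ambiguous result such as `{"alice": 0.55, "bob": 0.45}` returns `None`, which the caller can treat as an unrecognized face.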

However, the efficiency of a Face Recognition solution does not hinge solely upon choosing the best technology for each constituent task. During our system’s implementation, we discovered several strategically important subtleties that developers of Face Recognition apps must take care of. What are they?

Your Face Recognition app must be able to identify not only faces but also objects. This is especially relevant for major companies with a very large number of employees, and for businesses that have heightened security requirements due to the nature and specifics of their production process. For example, someone carrying a chain is unlikely to be welcome at a stadium during a soccer match, and someone carrying a rifle would most probably not be welcome at an airport. A Face Recognition app can be trained to identify such objects and, in theory, can sometimes even be used to prevent pilferage at production facilities.

Most often, a person’s head does not remain immobile as they walk. The quality with which a Face Recognition application can identify a face depends on the camera’s distance to this face, the position of this face and its movements. The bag of tricks that can help you solve the distance-to-the-face problem includes:

Positioning the cameras of one’s Facial Recognition solution so that their resolution is optimal for the distance to the expected object(s). Certainly, there are cameras expensive enough to ensure sufficiently good image quality, but the above approach is far preferable unless you want your bill to skyrocket.

Installing a sufficient number of light sources in the right places.

Using one’s Face Recognition app to enhance a received image so that it fits a certain pattern suitable for recognition purposes.

Having one’s Face Recognition app take into account how much an incoming image has had to be adjusted. For instance, if, as required by the recognition algorithm, a 30x30 face image has had to be scaled up to 150x150, the application should assign a lower score to this image.

As far as the face movement problem is concerned, one can solve it by implementing an object-tracking capability. In this case, the app will track the face for as long as it is captured by the camera.

Often, visitors enter premises in a group, all at once. This can well throw a poorly thought-out Face Recognition system for a loop. Can this be prevented? Yes. It will be necessary to implement an object-tracking capability and the ability to follow multiple photo sets simultaneously.

Face-turn angle matters a whole lot too, and it can often be rather difficult to handle. As mentioned previously, a face is identified by a descriptor of 128 facial features. The further the turn angle of a face is from the front-face view, the harder it is for your Face Recognition app to identify the person correctly. We’ve done a whale of research, experimenting with a variety of turn angles and with images that featured various facial parts, and arrived at the conclusion that no single pure approach works well enough. That is why training your neural network solely on your employees’ facial parts is not going to help. You will still need something closer to the front-face angle, and that angle occurs relatively seldom if your camera isn’t positioned in a well-suited spot. For this reason, one should always try to position one’s cameras so as to capture as many high-quality front-face images as possible.
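Two of the ideas above, scoring down images that had to be enlarged and following several faces at once, can be illustrated with a short sketch. Everything here is an assumption made for the example: the linear weight, the 80-pixel jump limit and the greedy nearest-center matching are deliberately simplistic stand-ins for a real tracker.

```python
from dataclasses import dataclass, field
from math import hypot

TARGET_SIDE_PX = 150  # recognizer input size, taken from the 30x30 -> 150x150 example


def upscale_weight(face_side_px, target_side_px=TARGET_SIDE_PX):
    """Weight in (0, 1]; 1.0 means the face did not need to be enlarged."""
    if face_side_px <= 0:
        raise ValueError("face size must be positive")
    return min(1.0, face_side_px / target_side_px)


@dataclass
class Track:
    track_id: int
    center: tuple
    photo_set: list = field(default_factory=list)  # (face_side_px, weight) per frame


class FaceTracker:
    """Hypothetical greedy tracker: the nearest face center within max_jump_px wins."""

    def __init__(self, max_jump_px=80.0):
        self.max_jump_px = max_jump_px
        self.tracks = []

    def update(self, detections):
        """detections: (center_x, center_y, face_side_px) tuples from one frame."""
        unmatched = list(self.tracks)
        for cx, cy, side in detections:
            def distance(track):
                return hypot(track.center[0] - cx, track.center[1] - cy)

            best = min(unmatched, key=distance, default=None)
            if best is None or distance(best) > self.max_jump_px:
                best = Track(len(self.tracks), (cx, cy))  # a new person entered the frame
                self.tracks.append(best)
            else:
                unmatched.remove(best)  # at most one detection per track per frame
            best.center = (cx, cy)
            best.photo_set.append((side, upscale_weight(side)))
        return self.tracks
```

Feeding the tracker one list of detections per video frame yields a separate photo set for each person in a group, and each snapshot carries a weight that the recognition step can use to trust small, heavily enlarged faces less.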
A much more efficient approach to face recognition is the combined one we have opted for: we take the full set of variously angled images captured by the camera, determine how close each is to the front-face angle (we calculate the face-turn angle for each image), and measure the distance to the object for those images that can be considered front-face ones. The images with the best combination of front-face angle and distance take precedence over the rest. However, the rest of the images are used in the recognition process too. In other words, your Face Recognition app must be designed to use first as many front-face samples as you have in your database, and then proceed to other samples, preferably those that are as close to the front-face view as possible. Ideally, your Face Recognition application should make a decision based solely on the obtained front-face images and the corresponding front-face samples. In addition, we’ve enhanced our app’s recognition ability through the use of the “best available” images, i.e. those snapshots in which the combination of the face-turn angle and the distance to the object most closely matches the training photo set (and which are, consequently, the most suitable for recognition purposes).

Each time a new person is added, a new cluster of no fewer than 50 images (including front-face images, left head-turn images and right head-turn images) must be added to your Face Recognition app’s database. After this, the SVM must be re-trained to accommodate this information and be able to use it.
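The front-face-first prioritization described above can be sketched as a simple sort. This is only an illustration under stated assumptions: the `Snapshot` fields, the 15-degree threshold for what counts as a front-face image, and the rule that a shorter distance is better are all hypothetical choices, not values from the system itself.

```python
from dataclasses import dataclass


@dataclass
class Snapshot:
    yaw_deg: float     # estimated face-turn angle; 0 means facing the camera
    distance_m: float  # estimated distance from the camera to the face
    label: str         # free-form identifier of the captured image


FRONTAL_YAW_DEG = 15.0  # assumed cut-off for treating an image as front-face


def rank_snapshots(snapshots, frontal_yaw_deg=FRONTAL_YAW_DEG):
    """Front-face images first (closest first), then the rest by increasing turn angle."""

    def sort_key(snapshot):
        turn = abs(snapshot.yaw_deg)
        if turn <= frontal_yaw_deg:
            return (0, snapshot.distance_m, turn)  # frontal group: distance decides
        return (1, turn, snapshot.distance_m)      # others: nearest to frontal first

    return sorted(snapshots, key=sort_key)


captured = [
    Snapshot(40.0, 2.0, "side"),
    Snapshot(5.0, 4.0, "front_far"),
    Snapshot(-10.0, 2.0, "front_near"),
    Snapshot(25.0, 3.0, "slight_turn"),
]
ordered = [snapshot.label for snapshot in rank_snapshots(captured)]
```

None of the images is discarded: the ordering only decides which snapshots the recognition step consults first, so the turned faces still contribute when too few front-face images are available.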
If you are an IT entrepreneur eyeing the Facial Recognition business niche, or a software development outfit just probing into this area of Artificial Intelligence with a view to making it one of your near-future specialties, it is imperative that the Administration Module of your first Face Recognition app be flexible enough to cater to the varying needs of your different clients. This means that the parameters the system uses for identification purposes must be easily configurable to fit the environment and conditions in which an individual client uses it. For instance, in our Face Recognition solution, one can configure almost anything, including the classifier used for recognition purposes: you can simply switch from the SVM to another classifier or a neural network that better fits your needs.

A Face Recognition system that can potentially serve a broader range of purposes in the company where it is installed must also be able to perform Body Recognition (and our Face Recognition solution is, incidentally, capable of performing this task). For example, if you run a supermarket chain, this ability will turn your system into a valuable Marketing Intelligence asset that will allow you to count visitors to your different departments. Moreover, if you don’t have too many premises to secure, this can become a great way to help you cover the cost of your Face Recognition app’s purchase and implementation.

Talking of Marketing, there is another potential ability of the Facial Recognition technology that you or your clients may appreciate greatly if you or they are in the brick-and-mortar Retail business. A more advanced Face Recognition solution can allow you to perform what is normally referred to as Basket Analysis. Simply put, you can identify those products that sell well together, and place them accordingly on your merchandise shelves. Similarly, you can identify products that sell poorly when placed next to one another.
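At its simplest, Basket Analysis comes down to counting which products end up in the same basket. The sketch below shows only that counting step; the sample baskets are made-up data, and how a recognition system would actually attribute purchases to a visitor is outside its scope.

```python
from collections import Counter
from itertools import combinations


def pair_counts(baskets):
    """Count how often each unordered product pair appears in the same basket."""
    counts = Counter()
    for basket in baskets:
        # sorted(set(...)) removes duplicates and gives every pair one canonical order
        for pair in combinations(sorted(set(basket)), 2):
            counts[pair] += 1
    return counts


# Hypothetical sample data, one list of products per visit
baskets = [
    ["bread", "butter", "milk"],
    ["bread", "butter"],
    ["milk", "cereal"],
    ["bread", "butter", "cereal"],
]
top_pair, times_bought_together = pair_counts(baskets).most_common(1)[0]
```

Pairs with high counts are candidates for neighboring shelves, while pairs that rarely or never co-occur are the ones worth examining when two adjacent products sell poorly.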