Introduction¶

OpenCV (Open Source Computer Vision) is a popular computer vision library started by Intel in 1999. The cross-platform library focuses on real-time image processing and includes patent-free implementations of the latest computer vision algorithms. In 2008 Willow Garage took over support, and OpenCV 2.3.1 now comes with a programming interface to C, C++, Python and Android. OpenCV is released under a BSD license, so it is used in academic projects and commercial products alike.

OpenCV 2.4 now comes with the very new FaceRecognizer class for face recognition, so you can start experimenting with face recognition right away. This document is the guide I wished for when I was working myself into face recognition. It shows you how to perform face recognition with FaceRecognizer in OpenCV (with full source code listings) and introduces the algorithms behind it. I’ll also show how to create the visualizations you can find in many publications, because a lot of people asked for them.

The currently available algorithms are:

- Eigenfaces (see createEigenFaceRecognizer())
- Fisherfaces (see createFisherFaceRecognizer())
- Local Binary Patterns Histograms (see createLBPHFaceRecognizer())

You don’t need to copy and paste the source code examples from this page, because they are available in the src folder coming with this documentation. If you have built OpenCV with the samples turned on, chances are good you have them compiled already! Although it might be interesting for very advanced users, I’ve decided to leave the implementation details out, as I am afraid they would confuse new users.

All code in this document is released under the BSD license, so feel free to use it for your projects.

Face Recognition¶

Face recognition is an easy task for humans. Experiments in [Tu06] have shown that even one- to three-day-old babies are able to distinguish between known faces. So how hard could it be for a computer? It turns out we know little about human recognition to date. Are inner features (eyes, nose, mouth) or outer features (head shape, hairline) used for successful face recognition? How do we analyze an image, and how does the brain encode it? David Hubel and Torsten Wiesel showed that our brain has specialized nerve cells responding to specific local features of a scene, such as lines, edges, angles or movement. Since we don’t see the world as scattered pieces, our visual cortex must somehow combine the different sources of information into useful patterns. Automatic face recognition is all about extracting those meaningful features from an image, putting them into a useful representation and performing some kind of classification on them.

Face recognition based on the geometric features of a face is probably the most intuitive approach to face recognition. One of the first automated face recognition systems was described in [Kanade73]: marker points (position of eyes, ears, nose, ...) were used to build a feature vector (distance between the points, angle between them, ...). The recognition was performed by calculating the Euclidean distance between the feature vectors of a probe and a reference image. Such a method is by its nature robust against changes in illumination, but it has a huge drawback: the accurate registration of the marker points is complicated, even with state-of-the-art algorithms. Some of the latest work on geometric face recognition was carried out in [Bru92]. A 22-dimensional feature vector was used, and experiments on large datasets have shown that geometrical features alone may not carry enough information for face recognition.
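Here is a minimal C++ sketch of the geometric approach, using only the standard library. It assumes the marker points were already located by some hypothetical landmark detector, and it departs from [Kanade73] by using only the pairwise distances between markers as features (angles are left out). A probe is assigned to the reference with the smallest Euclidean distance. The names Point, FeatureVector, pairwiseDistances, euclideanDistance and nearestReference are illustrative and are not part of OpenCV.

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <limits>
#include <vector>

// A marker point in image coordinates (e.g. the center of an eye).
struct Point {
    double x;
    double y;
};

using FeatureVector = std::vector<double>;

// Builds a feature vector from all pairwise distances between marker
// points. The marker order must be the same for every image.
FeatureVector pairwiseDistances(const std::vector<Point>& markers) {
    FeatureVector features;
    for (std::size_t i = 0; i < markers.size(); ++i) {
        for (std::size_t j = i + 1; j < markers.size(); ++j) {
            features.push_back(std::hypot(markers[i].x - markers[j].x,
                                          markers[i].y - markers[j].y));
        }
    }
    return features;
}

// Euclidean distance between two feature vectors of equal length.
double euclideanDistance(const FeatureVector& a, const FeatureVector& b) {
    assert(a.size() == b.size());
    double sum = 0.0;
    for (std::size_t i = 0; i < a.size(); ++i) {
        const double d = a[i] - b[i];
        sum += d * d;
    }
    return std::sqrt(sum);
}

// Index of the reference feature vector closest to the probe (1-nearest neighbor).
std::size_t nearestReference(const FeatureVector& probe,
                             const std::vector<FeatureVector>& references) {
    assert(!references.empty());
    std::size_t best = 0;
    double bestDistance = std::numeric_limits<double>::infinity();
    for (std::size_t i = 0; i < references.size(); ++i) {
        const double d = euclideanDistance(probe, references[i]);
        if (d < bestDistance) {
            bestDistance = d;
            best = i;
        }
    }
    return best;
}
```

For example, the markers (0, 0), (3, 0) and (0, 4) produce the feature vector (3, 4, 5). Raw distances scale with the image size, so in practice you would probably normalize them first, for instance by the distance between the eyes.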
The Eigenfaces method described in [TP91] took a holistic approach to face recognition: a facial image is a point in a high-dimensional image space, and a lower-dimensional representation is found in which classification becomes easy. The lower-dimensional subspace is found with Principal Component Analysis, which identifies the axes with maximum variance. While this kind of transformation is optimal from a reconstruction standpoint, it doesn’t take any class labels into account. Imagine a situation where the variance is generated by external sources, such as light. The axes with maximum variance do not necessarily contain any discriminative information at all, so classification becomes impossible. So a class-specific projection with a Linear Discriminant Analysis was applied to face recognition in [BHK97]. The basic idea is to minimize the variance within a class while maximizing the variance between the classes at the same time.

More recently, various methods for local feature extraction have emerged. To avoid the high dimensionality of the input data, only local regions of an image are described; the extracted features are (hopefully) more robust against partial occlusion, illumination and small sample size. Algorithms used for local feature extraction are Gabor Wavelets ([Wiskott97]), the Discrete Cosine Transform ([Messer06]) and Local Binary Patterns ([AHP04]). It’s still an open research question how best to preserve spatial information when applying local feature extraction, because spatial information is potentially useful.
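You can check the point about external sources of variance on a tiny hypothetical 2-D example. Two classes spread widely along x (think of x as an external factor such as lighting) but differ only along y. The sketch below is a standard-library C++ illustration, not OpenCV code. It computes the first principal axis in closed form from the symmetric 2×2 scatter matrix and projects the points onto it. Point2, principalAxis and projectOnto are illustrative names.

```cpp
#include <cassert>
#include <cmath>
#include <vector>

struct Point2 {
    double x;
    double y;
};

// Unit vector along the direction of maximum variance (first principal
// axis) of 2-D points. It uses the closed-form leading eigenvector of the
// symmetric 2x2 scatter matrix [[sxx, sxy], [sxy, syy]].
Point2 principalAxis(const std::vector<Point2>& pts) {
    assert(!pts.empty());
    double mx = 0.0, my = 0.0;
    for (const Point2& p : pts) {
        mx += p.x;
        my += p.y;
    }
    mx /= pts.size();
    my /= pts.size();
    double sxx = 0.0, sxy = 0.0, syy = 0.0;
    for (const Point2& p : pts) {
        const double dx = p.x - mx, dy = p.y - my;
        sxx += dx * dx;
        sxy += dx * dy;
        syy += dy * dy;
    }
    // The angle of the leading eigenvector satisfies tan(2*theta) = 2*sxy / (sxx - syy).
    const double theta = 0.5 * std::atan2(2.0 * sxy, sxx - syy);
    return {std::cos(theta), std::sin(theta)};
}

// Scalar coordinate of p along a unit-length axis.
double projectOnto(const Point2& axis, const Point2& p) {
    return axis.x * p.x + axis.y * p.y;
}
```

Take class A = {(-10, 0), (10, 0), (-10, 1), (10, 1)} and class B = the same points shifted by 5 along y. For the pooled data the principal axis comes out as (1, 0), and both classes project onto the same coordinates {-10, 10}. The projection with maximum variance therefore carries no information about the class.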

Face Database¶

Let’s get some data to experiment with first. I don’t want to do a toy example here. We are doing face recognition, so you’ll need some face images! You can either create your own dataset or start with one of the available face databases; http://face-rec.org/databases/ gives you an up-to-date overview. Three interesting databases are (parts of the descriptions are quoted from http://face-rec.org):

AT&T Facedatabase
The AT&T Facedatabase, sometimes also referred to as ORL Database of Faces, contains ten different images of each of 40 distinct subjects. For some subjects, the images were taken at different times, varying the lighting, facial expressions (open / closed eyes, smiling / not smiling) and facial details (glasses / no glasses). All the images were taken against a dark homogeneous background with the subjects in an upright, frontal position (with tolerance for some side movement).

Yale Facedatabase A, also known as Yalefaces
The AT&T Facedatabase is good for initial tests, but it’s a fairly easy database. The Eigenfaces method already has a 97% recognition rate on it, so you won’t see any great improvements with other algorithms. The Yale Facedatabase A (also known as Yalefaces) is a more appropriate dataset for initial experiments, because the recognition problem is harder. The database consists of 15 people (14 male, 1 female), each with 11 grayscale images sized $320 \times 243$ pixels. There are changes in the light conditions (center light, left light, right light), facial expressions (happy, normal, sad, sleepy, surprised, wink) and glasses (glasses, no glasses). The original images are not cropped and aligned. Please look into the Appendix for a Python script that does this for you.

Extended Yale Facedatabase B
The Extended Yale Facedatabase B contains 2414 images of 38 different people in its cropped version. This database focuses on extracting features that are robust to illumination, and the images show almost no variation in emotion, occlusion, etc. I personally think this dataset is too large for the experiments I perform in this document, so you’d better use the AT&T Facedatabase for initial testing. A first version of the Yale Facedatabase B was used in [BHK97] to see how the Eigenfaces and Fisherfaces methods perform under heavy illumination changes. [Lee05] used the same setup to take 16128 images of 28 people. The Extended Yale Facedatabase B is the merge of these two databases.

Preparing the data¶

Once we have acquired some data, we’ll need to read it into our program. In the demo applications I have decided to read the images from a very simple CSV file. Why? Because it’s the simplest platform-independent approach I can think of. However, if you know a simpler solution, please ping me about it. All the CSV file needs to contain are lines made of a filename, followed by a ;, followed by the label (as an integer number). A line looks like this:

```
/path/to/image.ext;0
```

Let’s dissect the line. /path/to/image.ext is the path to an image, probably something like C:/faces/person0/image0.jpg if you are on Windows. Then there is the separator ; and finally we assign the label 0 to the image. Think of the label as the subject (the person) this image belongs to, so images of the same subject (person) should have the same label.

Download the AT&T Facedatabase from AT&T Facedatabase and the corresponding CSV file from at.txt. It looks like this (without the ... of course):

```
./at/s1/1.pgm;0
./at/s1/2.pgm;0
...
./at/s2/1.pgm;1
./at/s2/2.pgm;1
...
./at/s40/1.pgm;39
./at/s40/2.pgm;39
```

Imagine I have extracted the files to D:/data/at and have downloaded the CSV file to D:/data/at.txt. Then you would simply need to search and replace ./ with D:/data/. Any sufficiently advanced editor can do this. Once you have a CSV file with valid filenames and labels, you can run any of the demos by passing the path to the CSV file as a parameter:

```
facerec_demo.exe D:/data/at.txt
```

Creating the CSV File¶

You don’t really want to create the CSV file by hand. I have prepared a little Python script create_csv.py (you find it at src/create_csv.py coming with this tutorial) that creates the CSV file for you. Your images should be in a hierarchy like this (/basepath/<subject>/<image.ext>):

```
philipp@mango:~/facerec/data/at$ tree
.
|-- s1
|   |-- 1.pgm
|   |-- ...
|   |-- 10.pgm
|-- s2
|   |-- 1.pgm
|   |-- ...
|   |-- 10.pgm
...
|-- s40
|   |-- 1.pgm
|   |-- ...
|   |-- 10.pgm
```

Then simply call create_csv.py with the path to the folder, like this, and save the output:

```
philipp@mango:~/facerec/data$ python create_csv.py
at/s13/2.pgm;0
at/s13/7.pgm;0
at/s13/6.pgm;0
at/s13/9.pgm;0
at/s13/5.pgm;0
at/s13/3.pgm;0
at/s13/4.pgm;0
at/s13/10.pgm;0
at/s13/8.pgm;0
at/s13/1.pgm;0
at/s17/2.pgm;1
at/s17/7.pgm;1
at/s17/6.pgm;1
at/s17/9.pgm;1
at/s17/5.pgm;1
at/s17/3.pgm;1
[...]
```

Please see the Appendix for additional information.
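If you want to read the CSV format described above without pulling in OpenCV, the standard library is enough. The following is a minimal sketch: CsvEntry and parseCsv are names made up for this example. It follows the same path/label splitting as read_csv in the demos but only collects paths and labels instead of loading the images. It also strips a trailing '\r', so files with Windows line endings work.

```cpp
#include <cassert>
#include <cstdlib>
#include <istream>
#include <sstream>
#include <string>
#include <vector>

// One CSV entry: the image path and the integer subject label.
struct CsvEntry {
    std::string path;
    int label;
};

// Parses "path;label" lines and skips lines where either part is missing.
std::vector<CsvEntry> parseCsv(std::istream& in, char separator = ';') {
    assert(separator != '\n' && "the separator must differ from the line break");
    std::vector<CsvEntry> entries;
    std::string line;
    while (std::getline(in, line)) {
        // Remove a trailing carriage return left by Windows line endings.
        if (!line.empty() && line.back() == '\r') {
            line.pop_back();
        }
        std::stringstream fields(line);
        std::string path;
        std::string label;
        std::getline(fields, path, separator);  // everything before the first separator
        std::getline(fields, label);            // the rest of the line
        if (!path.empty() && !label.empty()) {
            entries.push_back({path, std::atoi(label.c_str())});
        }
    }
    return entries;
}
```

The path is cut at the first separator, so choose a separator that never occurs in your filenames.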

Eigenfaces¶

The problem with the image representation we are given is its high dimensionality. Two-dimensional $p \times q$ grayscale images span an $m = pq$-dimensional vector space, so an image with $100 \times 100$ pixels already lies in a $10{,}000$-dimensional image space. The question is: are all dimensions equally useful for us? We can only make a decision if there’s any variance in the data, so we are looking for the components that account for most of the information. The Principal Component Analysis (PCA) was independently proposed by Karl Pearson (1901) and Harold Hotelling (1933) to turn a set of possibly correlated variables into a smaller set of uncorrelated variables. The idea is that a high-dimensional dataset is often described by correlated variables, and therefore only a few meaningful dimensions account for most of the information. The PCA method finds the directions with the greatest variance in the data, called principal components.

Algorithmic Description¶

Let $X = \{x_1, x_2, \ldots, x_n\}$ be a random vector with observations $x_i \in \mathbb{R}^d$.

1. Compute the mean $\mu$:
$$\mu = \frac{1}{n} \sum_{i=1}^{n} x_i$$
2. Compute the covariance matrix $S$:
$$S = \frac{1}{n} \sum_{i=1}^{n} (x_i - \mu)(x_i - \mu)^T$$
3. Compute the eigenvalues $\lambda_i$ and eigenvectors $v_i$ of $S$:
$$S v_i = \lambda_i v_i, \quad i = 1, 2, \ldots, n$$
4. Order the eigenvectors in descending order of their eigenvalues. The $k$ principal components are the eigenvectors corresponding to the $k$ largest eigenvalues.

The $k$ principal components of the observed vector $x$ are then given by:
$$y = W^T (x - \mu)$$
where $W = (v_1, v_2, \ldots, v_k)$. The reconstruction from the PCA basis is given by:
$$x = W y + \mu$$
where $W = (v_1, v_2, \ldots, v_k)$.

The Eigenfaces method then performs face recognition by:

- Projecting all training samples into the PCA subspace.

- Projecting the query image into the PCA subspace.

- Finding the nearest neighbor between the projected training images and the projected query image.

Still there’s one problem left to solve. Imagine we are given $400$ images sized $100 \times 100$ pixels. The Principal Component Analysis solves the covariance matrix $S = X X^T$, where $\operatorname{size}(X) = 10000 \times 400$ in our example. You would end up with a $10000 \times 10000$ matrix, roughly 0.8 GB when stored as doubles. Solving this problem isn’t feasible, so we’ll need to apply a trick. From your linear algebra lessons you know that for an $M \times N$ matrix $X$ with $M > N$, the $M \times M$ matrix $X X^T$ has at most $N$ non-zero eigenvalues (at most $N - 1$ once the mean has been subtracted from every sample). So it’s possible to take the eigenvalue decomposition of the $N \times N$ matrix $X^T X$ instead:
$$X^T X v_i = \lambda_i v_i$$
and get the original eigenvectors of $S = X X^T$ with a left multiplication of the data matrix:
$$X X^T (X v_i) = \lambda_i (X v_i)$$
The resulting eigenvectors are orthogonal. To get orthonormal eigenvectors, they need to be normalized to unit length. I don’t want to turn this into a publication, so please look into [Duda01] for the derivation and proof of the equations.

Eigenfaces in OpenCV¶

I’ll go through the first source code example with you. I am first giving you the whole source code listing, and after this we’ll look at the most important lines in detail. Please note: every source code listing is commented in detail, so you should have no problems following it.

```cpp
/*
 * Copyright (c) 2011. Philipp Wagner <bytefish[at]gmx[dot]de>.
 * Released to public domain under terms of the BSD Simplified license.
 *
 * Redistribution and use in source and binary forms, with or without
 * modification, are permitted provided that the following conditions are met:
 *   * Redistributions of source code must retain the above copyright
 *     notice, this list of conditions and the following disclaimer.
 *   * Redistributions in binary form must reproduce the above copyright
 *     notice, this list of conditions and the following disclaimer in the
 *     documentation and/or other materials provided with the distribution.
 *   * Neither the name of the organization nor the names of its contributors
 *     may be used to endorse or promote products derived from this software
 *     without specific prior written permission.
 *
 *   See <http://www.opensource.org/licenses/bsd-license>
 */

#include "opencv2/core/core.hpp"
#include "opencv2/contrib/contrib.hpp"
#include "opencv2/highgui/highgui.hpp"

#include <cstdlib>
#include <iostream>
#include <fstream>
#include <sstream>

using namespace cv;
using namespace std;

static Mat norm_0_255(InputArray _src) {
    Mat src = _src.getMat();
    // Create and return normalized image:
    Mat dst;
    switch (src.channels()) {
    case 1:
        cv::normalize(_src, dst, 0, 255, NORM_MINMAX, CV_8UC1);
        break;
    case 3:
        cv::normalize(_src, dst, 0, 255, NORM_MINMAX, CV_8UC3);
        break;
    default:
        src.copyTo(dst);
        break;
    }
    return dst;
}

static void read_csv(const string& filename, vector<Mat>& images, vector<int>& labels, char separator = ';') {
    std::ifstream file(filename.c_str(), ifstream::in);
    if (!file) {
        string error_message = "No valid input file was given, please check the given filename.";
        CV_Error(CV_StsBadArg, error_message);
    }
    string line, path, classlabel;
    while (getline(file, line)) {
        stringstream liness(line);
        getline(liness, path, separator);
        getline(liness, classlabel);
        if (!path.empty() && !classlabel.empty()) {
            images.push_back(imread(path, 0));
            labels.push_back(atoi(classlabel.c_str()));
        }
    }
}

int main(int argc, const char *argv[]) {
    // Check for valid command line arguments, print usage
    // if no arguments were given.
    if (argc < 2) {
        cout << "usage: " << argv[0] << " <csv.ext> <output_folder> " << endl;
        exit(1);
    }
    string output_folder = ".";
    if (argc == 3) {
        output_folder = string(argv[2]);
    }
    // Get the path to your CSV.
    string fn_csv = string(argv[1]);
    // These vectors hold the images and corresponding labels.
    vector<Mat> images;
    vector<int> labels;
    // Read in the data. This can fail if no valid
    // input filename is given.
    try {
        read_csv(fn_csv, images, labels);
    } catch (cv::Exception& e) {
        cerr << "Error opening file \"" << fn_csv << "\". Reason: " << e.msg << endl;
        // nothing more we can do
        exit(1);
    }
    // Quit if there are not enough images for this demo.
    if (images.size() <= 1) {
        string error_message = "This demo needs at least 2 images to work. Please add more images to your data set!";
        CV_Error(CV_StsError, error_message);
    }
    // Get the height from the first image. We'll need this
    // later in code to reshape the images to their original
    // size:
    int height = images[0].rows;
    // The following lines simply get the last image from
    // your dataset and remove it from the vector. This is
    // done so that the training data (which we learn the
    // cv::FaceRecognizer on) and the test data we test
    // the model with do not overlap.
    Mat testSample = images[images.size() - 1];
    int testLabel = labels[labels.size() - 1];
    images.pop_back();
    labels.pop_back();
    // The following lines create an Eigenfaces model for
    // face recognition and train it with the images and
    // labels read from the given CSV file.
    // This here is a full PCA. If you just want to keep
    // 10 principal components (read Eigenfaces), then call
    // the factory method like this:
    //
    //      cv::createEigenFaceRecognizer(10);
    //
    // If you want to create a FaceRecognizer with a
    // confidence threshold (e.g. 123.0), call it with:
    //
    //      cv::createEigenFaceRecognizer(10, 123.0);
    //
    // If you want to use _all_ Eigenfaces and have a threshold,
    // then call the method like this:
    //
    //      cv::createEigenFaceRecognizer(0, 123.0);
    //
    Ptr<FaceRecognizer> model = createEigenFaceRecognizer();
    model->train(images, labels);
    // The following line predicts the label of a given
    // test image:
    int predictedLabel = model->predict(testSample);
    //
    // To get the confidence of a prediction call the model with:
    //
    //      int predictedLabel = -1;
    //      double confidence = 0.0;
    //      model->predict(testSample, predictedLabel, confidence);
    //
    string result_message = format("Predicted class = %d / Actual class = %d.", predictedLabel, testLabel);
    cout << result_message << endl;
    // Here is how to get the eigenvalues of this Eigenfaces model:
    Mat eigenvalues = model->getMat("eigenvalues");
    // And we can do the same to display the Eigenvectors (read Eigenfaces):
    Mat W = model->getMat("eigenvectors");
    // Get the sample mean from the training data
    Mat mean = model->getMat("mean");
    // Display or save:
    if (argc == 2) {
        imshow("mean", norm_0_255(mean.reshape(1, images[0].rows)));
    } else {
        imwrite(format("%s/mean.png", output_folder.c_str()), norm_0_255(mean.reshape(1, images[0].rows)));
    }
    // Display or save the Eigenfaces:
    for (int i = 0; i < min(10, W.cols); i++) {
        string msg = format("Eigenvalue #%d = %.5f", i, eigenvalues.at<double>(i));
        cout << msg << endl;
        // get eigenvector #i
        Mat ev = W.col(i).clone();
        // Reshape to original size & normalize to [0...255] for imshow.
        Mat grayscale = norm_0_255(ev.reshape(1, height));
        // Show the image & apply a Jet colormap for better sensing.
        Mat cgrayscale;
        applyColorMap(grayscale, cgrayscale, COLORMAP_JET);
        // Display or save:
        if (argc == 2) {
            imshow(format("eigenface_%d", i), cgrayscale);
        } else {
            imwrite(format("%s/eigenface_%d.png", output_folder.c_str(), i), norm_0_255(cgrayscale));
        }
    }
    // Display or save the image reconstruction at some predefined steps:
    for (int num_components = min(W.cols, 10); num_components < min(W.cols, 300); num_components += 15) {
        // slice the eigenvectors from the model
        Mat evs = Mat(W, Range::all(), Range(0, num_components));
        Mat projection = subspaceProject(evs, mean, images[0].reshape(1, 1));
        Mat reconstruction = subspaceReconstruct(evs, mean, projection);
        // Normalize the result:
        reconstruction = norm_0_255(reconstruction.reshape(1, images[0].rows));
        // Display or save:
        if (argc == 2) {
            imshow(format("eigenface_reconstruction_%d", num_components), reconstruction);
        } else {
            imwrite(format("%s/eigenface_reconstruction_%d.png", output_folder.c_str(), num_components), reconstruction);
        }
    }
    // Display if we are not writing to an output folder:
    if (argc == 2) {
        waitKey(0);
    }
    return 0;
}
```

The source code for this demo application is also available in the src folder coming with this documentation: src/facerec_eigenfaces.cpp

I’ve used the jet colormap, so you can see how the grayscale values are distributed within the specific Eigenfaces. You can see that the Eigenfaces encode not only facial features but also the illumination in the images (see the left light in Eigenface #4 and the right light in Eigenface #5).

We’ve already seen that we can reconstruct a face from its lower-dimensional approximation. So let’s see how many Eigenfaces are needed for a good reconstruction.
I’ll do a subplot with the reconstructions from 10, 25, 40, ..., 295 Eigenfaces (the steps used in this loop):

```cpp
// Display or save the image reconstruction at some predefined steps:
for (int num_components = 10; num_components < 300; num_components += 15) {
    // slice the eigenvectors from the model
    Mat evs = Mat(W, Range::all(), Range(0, num_components));
    Mat projection = subspaceProject(evs, mean, images[0].reshape(1, 1));
    Mat reconstruction = subspaceReconstruct(evs, mean, projection);
    // Normalize the result:
    reconstruction = norm_0_255(reconstruction.reshape(1, images[0].rows));
    // Display or save:
    if (argc == 2) {
        imshow(format("eigenface_reconstruction_%d", num_components), reconstruction);
    } else {
        imwrite(format("%s/eigenface_reconstruction_%d.png", output_folder.c_str(), num_components), reconstruction);
    }
}
```

10 Eigenvectors are obviously not sufficient for a good image reconstruction, while 50 Eigenvectors may already be enough to encode important facial features. You’ll get a good reconstruction with approximately 300 Eigenvectors for the AT&T Facedatabase. There are rules of thumb for how many Eigenfaces you should choose for successful face recognition, but it heavily depends on the input data. [Zhao03] is the perfect starting point for researching this.
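The small-sample trick from the algorithmic description can be sketched without OpenCV. The following standard-library C++ code illustrates it under simplifying assumptions; it is not how cv::FaceRecognizer computes its Eigenfaces. It builds the $n \times n$ matrix $X^T X$ from the centered samples and finds its leading eigenvectors by power iteration with deflation. A fixed iteration count and an arbitrary start vector are assumed, which is fine for small, well-separated examples. Each eigenvector is mapped back with $X v$ and normalized. The code then implements $y = W^T (x - \mu)$ and $x = W y + \mu$. PcaModel, computePca, project and reconstruct are illustrative names.

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <vector>

using Vec = std::vector<double>;

// Sample mean plus up to k orthonormal principal components (Eigenfaces).
struct PcaModel {
    Vec mean;
    std::vector<Vec> components;
};

double dot(const Vec& a, const Vec& b) {
    assert(a.size() == b.size());
    double sum = 0.0;
    for (std::size_t i = 0; i < a.size(); ++i) sum += a[i] * b[i];
    return sum;
}

// PCA via the n x n matrix X^T X (X holds the centered samples as columns).
PcaModel computePca(const std::vector<Vec>& samples, std::size_t k, int iterations = 500) {
    assert(!samples.empty());
    const std::size_t n = samples.size();
    const std::size_t d = samples[0].size();
    PcaModel model;
    model.mean.assign(d, 0.0);
    for (const Vec& x : samples)
        for (std::size_t j = 0; j < d; ++j) model.mean[j] += x[j] / n;

    // Centered samples x_i - mu, i.e. the columns of X.
    std::vector<Vec> centered(samples);
    for (Vec& x : centered)
        for (std::size_t j = 0; j < d; ++j) x[j] -= model.mean[j];

    // Gram matrix G = X^T X: only n x n, however large d is.
    std::vector<Vec> gram(n, Vec(n));
    for (std::size_t a = 0; a < n; ++a)
        for (std::size_t b = 0; b < n; ++b) gram[a][b] = dot(centered[a], centered[b]);

    for (std::size_t c = 0; c < k && c < n; ++c) {
        // Power iteration for the leading eigenvector v of G.
        Vec v(n);
        for (std::size_t i = 0; i < n; ++i) v[i] = 1.0 + i;  // arbitrary start vector
        double lambda = 0.0;
        for (int it = 0; it < iterations; ++it) {
            const double len = std::sqrt(dot(v, v));
            for (double& e : v) e /= len;
            Vec gv(n);
            for (std::size_t a = 0; a < n; ++a) gv[a] = dot(gram[a], v);
            lambda = dot(v, gv);  // Rayleigh quotient of the unit vector v
            if (std::sqrt(dot(gv, gv)) < 1e-12) return model;  // no variance left
            v = gv;
        }
        const double vlen = std::sqrt(dot(v, v));
        for (double& e : v) e /= vlen;

        // Map back to image space, u = X v, and normalize to unit length.
        Vec u(d, 0.0);
        for (std::size_t i = 0; i < n; ++i)
            for (std::size_t j = 0; j < d; ++j) u[j] += v[i] * centered[i][j];
        const double ulen = std::sqrt(dot(u, u));
        if (ulen < 1e-12) return model;
        for (double& e : u) e /= ulen;
        model.components.push_back(u);

        // Deflation: G <- G - lambda v v^T removes the component just found.
        for (std::size_t a = 0; a < n; ++a)
            for (std::size_t b = 0; b < n; ++b) gram[a][b] -= lambda * v[a] * v[b];
    }
    return model;
}

// y = W^T (x - mu)
Vec project(const PcaModel& m, const Vec& x) {
    Vec y;
    for (const Vec& u : m.components) {
        double s = 0.0;
        for (std::size_t j = 0; j < x.size(); ++j) s += u[j] * (x[j] - m.mean[j]);
        y.push_back(s);
    }
    return y;
}

// x = W y + mu
Vec reconstruct(const PcaModel& m, const Vec& y) {
    Vec x = m.mean;
    for (std::size_t c = 0; c < y.size(); ++c)
        for (std::size_t j = 0; j < x.size(); ++j) x[j] += y[c] * m.components[c][j];
    return x;
}
```

For the three points (1, 1), (2, 2), (3, 3), the only non-zero component is $\pm(1, 1)/\sqrt{2}$. Projecting and reconstructing (4, 4) gives (4, 4) back, because that point lies on the line spanned by the data. For real image data you would rather use a proper eigensolver, for example cv::PCA or cv::eigen in OpenCV.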

Fisherfaces¶

The Principal Component Analysis (PCA), which is the core of the Eigenfaces method, finds a linear combination of features that maximizes the total variance in the data. While this is clearly a powerful way to represent data, it doesn’t consider any classes, so a lot of discriminative information may be lost when throwing components away. Imagine a situation where the variance in your data is generated by an external source, such as the light. The components identified by a PCA do not necessarily contain any discriminative information at all, so the projected samples are smeared together and classification becomes impossible (see http://www.bytefish.de/wiki/pca_lda_with_gnu_octave for an example).

The Linear Discriminant Analysis performs a class-specific dimensionality reduction and was invented by the great statistician Sir R. A. Fisher. He successfully used it for classifying flowers in his 1936 paper The use of multiple measurements in taxonomic problems [Fisher36]. To find the combination of features that best separates the classes, the Linear Discriminant Analysis maximizes the ratio of between-class to within-class scatter, instead of maximizing the overall scatter. The idea is simple: samples of the same class should cluster tightly together, while different classes should be as far away as possible from each other in the lower-dimensional representation. Belhumeur, Hespanha and Kriegman recognized this as well and applied a Discriminant Analysis to face recognition in [BHK97].
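Fisher’s criterion is easiest to see for two classes in two dimensions. There, the optimal projection direction is proportional to $S_W^{-1}(\mu_1 - \mu_2)$. The sketch below is a hypothetical standard-library C++ example using the closed-form inverse of the 2×2 within-class scatter matrix. Point2, classMean and fisherDirection are illustrative names. It covers only this two-class special case, not the general multi-class algorithm.

```cpp
#include <cassert>
#include <cmath>
#include <vector>

struct Point2 {
    double x;
    double y;
};

Point2 classMean(const std::vector<Point2>& pts) {
    assert(!pts.empty());
    Point2 m{0.0, 0.0};
    for (const Point2& p : pts) {
        m.x += p.x;
        m.y += p.y;
    }
    m.x /= pts.size();
    m.y /= pts.size();
    return m;
}

// Two-class Fisher LDA: w ~ S_W^{-1} (m1 - m2) maximizes the ratio of
// between-class to within-class scatter of the projected samples.
Point2 fisherDirection(const std::vector<Point2>& c1, const std::vector<Point2>& c2) {
    const Point2 m1 = classMean(c1);
    const Point2 m2 = classMean(c2);
    // Within-class scatter S_W = [[sxx, sxy], [sxy, syy]], summed over both classes.
    double sxx = 0.0, sxy = 0.0, syy = 0.0;
    auto addScatter = [&](const std::vector<Point2>& pts, const Point2& m) {
        for (const Point2& p : pts) {
            const double dx = p.x - m.x, dy = p.y - m.y;
            sxx += dx * dx;
            sxy += dx * dy;
            syy += dy * dy;
        }
    };
    addScatter(c1, m1);
    addScatter(c2, m2);
    const double det = sxx * syy - sxy * sxy;
    assert(std::fabs(det) > 1e-12 && "S_W must be invertible");
    const double dx = m1.x - m2.x, dy = m1.y - m2.y;
    // S_W^{-1} = [[syy, -sxy], [-sxy, sxx]] / det, applied to (m1 - m2).
    const double wx = (syy * dx - sxy * dy) / det;
    const double wy = (-sxy * dx + sxx * dy) / det;
    const double len = std::hypot(wx, wy);
    assert(len > 0.0);
    return {wx / len, wy / len};
}
```

Consider two classes that spread widely along x but are separated along y: {(-10, 0), (10, 0), (-10, 1), (10, 1)} and the same points shifted by 5 along y. Here $S_W = \operatorname{diag}(800, 2)$ and the direction comes out as (0, ±1). The LDA ignores the large but non-discriminative spread along x, which is exactly the direction a PCA of the pooled data would pick.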
Algorithmic Description¶

Let $X$ be a random vector with samples drawn from $c$ classes:
$$X = \{X_1, X_2, \ldots, X_c\}$$
$$X_i = \{x_1, x_2, \ldots, x_n\}$$
The scatter matrices $S_B$ and $S_W$ are calculated as:
$$S_B = \sum_{i=1}^{c} N_i (\mu_i - \mu)(\mu_i - \mu)^T$$
$$S_W = \sum_{i=1}^{c} \sum_{x_j \in X_i} (x_j - \mu_i)(x_j - \mu_i)^T$$
where $\mu$ is the total mean:
$$\mu = \frac{1}{N} \sum_{i=1}^{N} x_i$$
and $\mu_i$ is the mean of class $i \in \{1, \ldots, c\}$:
$$\mu_i = \frac{1}{|X_i|} \sum_{x_j \in X_i} x_j$$
Fisher’s classic algorithm now looks for a projection $W$ that maximizes the class separability criterion:
$$W_{opt} = \operatorname{arg\,max}_W \frac{|W^T S_B W|}{|W^T S_W W|}$$
Following [BHK97], a solution for this optimization problem is given by solving the General Eigenvalue Problem:
$$S_B v_i = \lambda_i S_W v_i$$
$$S_W^{-1} S_B v_i = \lambda_i v_i$$
There’s one problem left to solve: the rank of $S_W$ is at most $(N - c)$, with $N$ samples and $c$ classes. In pattern recognition problems the number of samples $N$ is almost always smaller than the dimension of the input data (the number of pixels), so the scatter matrix $S_W$ becomes singular (see [RJ91]). In [BHK97] this was solved by performing a Principal Component Analysis on the data and projecting the samples into the $(N - c)$-dimensional space. A Linear Discriminant Analysis was then performed on the reduced data, because $S_W$ isn’t singular anymore.

The optimization problem can then be rewritten as:
$$W_{pca} = \operatorname{arg\,max}_W |W^T S_T W|$$
$$W_{fld} = \operatorname{arg\,max}_W \frac{|W^T W_{pca}^T S_B W_{pca} W|}{|W^T W_{pca}^T S_W W_{pca} W|}$$
with the total scatter matrix $S_T = S_B + S_W$. The transformation matrix $W$ that projects a sample into the $(c-1)$-dimensional space is then given by:
$$W = W_{fld}^T W_{pca}^T$$

Fisherfaces in OpenCV¶

```cpp
/*
 * Copyright (c) 2011. Philipp Wagner <bytefish[at]gmx[dot]de>.
 * Released to public domain under terms of the BSD Simplified license.
 *
 * Redistribution and use in source and binary forms, with or without
 * modification, are permitted provided that the following conditions are met:
 *   * Redistributions of source code must retain the above copyright
 *     notice, this list of conditions and the following disclaimer.
 *   * Redistributions in binary form must reproduce the above copyright
 *     notice, this list of conditions and the following disclaimer in the
 *     documentation and/or other materials provided with the distribution.
 *   * Neither the name of the organization nor the names of its contributors
 *     may be used to endorse or promote products derived from this software
 *     without specific prior written permission.
 *
 *   See <http://www.opensource.org/licenses/bsd-license>
 */

#include "opencv2/core/core.hpp"
#include "opencv2/contrib/contrib.hpp"
#include "opencv2/highgui/highgui.hpp"

#include <cstdlib>
#include <iostream>
#include <fstream>
#include <sstream>

using namespace cv;
using namespace std;

static Mat norm_0_255(InputArray _src) {
    Mat src = _src.getMat();
    // Create and return normalized image:
    Mat dst;
    switch (src.channels()) {
    case 1:
        cv::normalize(_src, dst, 0, 255, NORM_MINMAX, CV_8UC1);
        break;
    case 3:
        cv::normalize(_src, dst, 0, 255, NORM_MINMAX, CV_8UC3);
        break;
    default:
        src.copyTo(dst);
        break;
    }
    return dst;
}

static void read_csv(const string& filename, vector<Mat>& images, vector<int>& labels, char separator = ';') {
    std::ifstream file(filename.c_str(), ifstream::in);
    if (!file) {
        string error_message = "No valid input file was given, please check the given filename.";
        CV_Error(CV_StsBadArg, error_message);
    }
    string line, path, classlabel;
    while (getline(file, line)) {
        stringstream liness(line);
        getline(liness, path, separator);
        getline(liness, classlabel);
        if (!path.empty() && !classlabel.empty()) {
            images.push_back(imread(path, 0));
            labels.push_back(atoi(classlabel.c_str()));
        }
    }
}

int main(int argc, const char *argv[]) {
    // Check for valid command line arguments, print usage
    // if no arguments were given.
    if (argc < 2) {
        cout << "usage: " << argv[0] << " <csv.ext> <output_folder> " << endl;
        exit(1);
    }
    string output_folder = ".";
    if (argc == 3) {
        output_folder = string(argv[2]);
    }
    // Get the path to your CSV.
    string fn_csv = string(argv[1]);
    // These vectors hold the images and corresponding labels.
    vector<Mat> images;
    vector<int> labels;
    // Read in the data. This can fail if no valid
    // input filename is given.
    try {
        read_csv(fn_csv, images, labels);
    } catch (cv::Exception& e) {
        cerr << "Error opening file \"" << fn_csv << "\". Reason: " << e.msg << endl;
        // nothing more we can do
        exit(1);
    }
    // Quit if there are not enough images for this demo.
    if (images.size() <= 1) {
        string error_message = "This demo needs at least 2 images to work. Please add more images to your data set!";
        CV_Error(CV_StsError, error_message);
    }
    // Get the height from the first image. We'll need this
    // later in code to reshape the images to their original
    // size:
    int height = images[0].rows;
    // The following lines simply get the last image from
    // your dataset and remove it from the vector. This is
    // done so that the training data (which we learn the
    // cv::FaceRecognizer on) and the test data we test
    // the model with do not overlap.
    Mat testSample = images[images.size() - 1];
    int testLabel = labels[labels.size() - 1];
    images.pop_back();
    labels.pop_back();
    // The following lines create a Fisherfaces model for
    // face recognition and train it with the images and
    // labels read from the given CSV file.
    // If you just want to keep 10 Fisherfaces, then call
    // the factory method like this:
    //
    //      cv::createFisherFaceRecognizer(10);
    //
    // However, it is not useful to discard Fisherfaces! Please
    // always try to use _all_ available Fisherfaces for
    // classification.
    //
    // If you want to create a FaceRecognizer with a
    // confidence threshold (e.g. 123.0) and use _all_
    // Fisherfaces, then call it with:
    //
    //      cv::createFisherFaceRecognizer(0, 123.0);
    //
    Ptr<FaceRecognizer> model = createFisherFaceRecognizer();
    model->train(images, labels);
    // The following line predicts the label of a given
    // test image:
    int predictedLabel = model->predict(testSample);
    //
    // To get the confidence of a prediction call the model with:
    //
    //      int predictedLabel = -1;
    //      double confidence = 0.0;
    //      model->predict(testSample, predictedLabel, confidence);
    //
    string result_message = format("Predicted class = %d / Actual class = %d.", predictedLabel, testLabel);
    cout << result_message << endl;
    // Here is how to get the eigenvalues of this Fisherfaces model:
    Mat eigenvalues = model->getMat("eigenvalues");
    // And we can do the same to display the Eigenvectors (read Fisherfaces):
    Mat W = model->getMat("eigenvectors");
    // Get the sample mean from the training data
    Mat mean = model->getMat("mean");
    // Display or save:
    if (argc == 2) {
        imshow("mean", norm_0_255(mean.reshape(1, images[0].rows)));
    } else {
        imwrite(format("%s/mean.png", output_folder.c_str()), norm_0_255(mean.reshape(1, images[0].rows)));
    }
    // Display or save the first, at most 16 Fisherfaces:
    for (int i = 0; i < min(16, W.cols); i++) {
        string msg = format("Eigenvalue #%d = %.5f", i, eigenvalues.at<double>(i));
        cout << msg << endl;
        // get eigenvector #i
        Mat ev = W.col(i).clone();
        // Reshape to original size & normalize to [0...255] for imshow.
        Mat grayscale = norm_0_255(ev.reshape(1, height));
        // Show the image & apply a Bone colormap for better sensing.
        Mat cgrayscale;
        applyColorMap(grayscale, cgrayscale, COLORMAP_BONE);
        // Display or save:
        if (argc == 2) {
            imshow(format("fisherface_%d", i), cgrayscale);
        } else {
            imwrite(format("%s/fisherface_%d.png", output_folder.c_str(), i), norm_0_255(cgrayscale));
        }
    }
    // Display or save the image reconstruction at some predefined steps:
    for (int num_component = 0; num_component < min(16, W.cols); num_component++) {
        // Slice the Fisherface from the model:
        Mat ev = W.col(num_component);
        Mat projection = subspaceProject(ev, mean, images[0].reshape(1, 1));
        Mat reconstruction = subspaceReconstruct(ev, mean, projection);
        // Normalize the result:
        reconstruction = norm_0_255(reconstruction.reshape(1, images[0].rows));
        // Display or save:
        if (argc == 2) {
            imshow(format("fisherface_reconstruction_%d", num_component), reconstruction);
        } else {
            imwrite(format("%s/fisherface_reconstruction_%d.png", output_folder.c_str(), num_component), reconstruction);
        }
    }
    // Display if we are not writing to an output folder:
    if (argc == 2) {
        waitKey(0);
    }
    return 0;
}
```

The source code for this demo application is also available in the src folder coming with this documentation: src/facerec_fisherfaces.cpp

For this example I am going to use the Yale Facedatabase A, just because the plots are nicer. Each Fisherface has the same length as an original image, so it can be displayed as an image. The demo shows (or saves) at most the first 16 Fisherfaces.

The Fisherfaces method learns a class-specific transformation matrix, so Fisherfaces do not capture illumination as obviously as Eigenfaces do. Instead, the Discriminant Analysis finds the facial features that discriminate between the persons. Note that the performance of the Fisherfaces also depends heavily on the input data. In practice, if you learn the Fisherfaces for well-illuminated pictures only and then try to recognize faces in badly illuminated scenes, the method is likely to find the wrong components (simply because those features may not be predominant in badly illuminated images).
This is somewhat logical, since the method had no chance to learn the illumination.

The Fisherfaces allow a reconstruction of the projected image, just like the Eigenfaces did. But since we only identified the features that distinguish between subjects, you can’t expect a nice reconstruction of the original image. For the Fisherfaces method we’ll project the sample image onto each of the Fisherfaces instead. This gives a nice visualization of which feature each of the Fisherfaces describes:

```cpp
// Display or save the image reconstruction at some predefined steps:
for(int num_component = 0; num_component < min(16, W.cols); num_component++) {
    // Slice the Fisherface from the model:
    Mat ev = W.col(num_component);
    Mat projection = subspaceProject(ev, mean, images[0].reshape(1, 1));
    Mat reconstruction = subspaceReconstruct(ev, mean, projection);
    // Normalize the result:
    reconstruction = norm_0_255(reconstruction.reshape(1, images[0].rows));
    // Display or save:
    if(argc == 2) {
        imshow(format("fisherface_reconstruction_%d", num_component), reconstruction);
    } else {
        imwrite(format("%s/fisherface_reconstruction_%d.png", output_folder.c_str(), num_component), reconstruction);
    }
}
```

The differences may be subtle for the human eye, but you should be able to see some differences:
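As I understand `subspaceProject` and `subspaceReconstruct`, for a single column $w$ of $W$ they reduce to $y = w^T (x - \mu)$ and $\hat{x} = \mu + y\,w$. The following standalone sketch spells out that arithmetic on plain `std::vector<double>` images, without OpenCV. The helper names `project_onto_component` and `reconstruct_from_component` are hypothetical and not part of any library.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical helper: coefficient of the flattened image x along one
// column w of the projection matrix, y = w^T (x - mean).
double project_onto_component(const std::vector<double>& w,
                              const std::vector<double>& mean,
                              const std::vector<double>& x) {
    assert(w.size() == mean.size() && mean.size() == x.size());
    double y = 0.0;
    for (std::size_t i = 0; i < x.size(); ++i) {
        y += w[i] * (x[i] - mean[i]);
    }
    return y;
}

// Hypothetical helper: map the single coefficient back to image space,
// x_hat = mean + y * w.
std::vector<double> reconstruct_from_component(const std::vector<double>& w,
                                               const std::vector<double>& mean,
                                               double y) {
    assert(w.size() == mean.size());
    std::vector<double> x_hat(mean.size());
    for (std::size_t i = 0; i < mean.size(); ++i) {
        x_hat[i] = mean[i] + y * w[i];
    }
    return x_hat;
}
```

Take mean {1, 1, 1}, w = {1, 0, 0} and x = {3, 5, 7}. The coefficient is 2 and the reconstruction is {3, 1, 1}. Only the part of the image that lies along w survives, which is why a reconstruction from a single Fisherface looks so different from the original face.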

Local Binary Patterns Histograms¶

Eigenfaces and Fisherfaces take a somewhat holistic approach to face recognition. You treat your data as a vector somewhere in a high-dimensional image space. High dimensionality is bad for classification, so a lower-dimensional subspace is identified, where (probably) useful information is preserved. The Eigenfaces approach maximizes the total scatter. This can lead to problems if the variance is generated by an external source, because components with a maximum variance over all classes aren’t necessarily useful for classification (see http://www.bytefish.de/wiki/pca_lda_with_gnu_octave). So to preserve some discriminative information we applied a Linear Discriminant Analysis and optimized as described in the Fisherfaces method. The Fisherfaces method worked great... at least for the constrained scenario we’ve assumed in our model.

Now real life isn’t perfect. You simply can’t guarantee perfect light settings in your images or 10 different images of a person. So what if there’s only one image for each person? Our covariance estimates for the subspace may be horribly wrong, and so will the recognition. Remember the Eigenfaces method had a 96% recognition rate on the AT&T Facedatabase? How many images do we actually need to get such useful estimates? Here are the Rank-1 recognition rates of the Eigenfaces and Fisherfaces methods on the AT&T Facedatabase, which is a fairly easy image database:

So in order to get good recognition rates you’ll need at least 8 (±1) images for each person, and the Fisherfaces method doesn’t really help here. The above experiment is a 10-fold cross-validated result carried out with the facerec framework at: https://github.com/bytefish/facerec. This is not a publication, so I won’t back these figures with a deep mathematical analysis. Please have a look into [KM01] for a detailed analysis of both methods when it comes to small training datasets.

Some research has therefore concentrated on extracting local features from images.
The idea is not to look at the whole image as a high-dimensional vector, but to describe only local features of an object. The features you extract this way will implicitly have a low dimensionality. A fine idea! But you’ll soon observe that the image representation we are given doesn’t only suffer from illumination variations. Think of things like scale, translation or rotation in images: your local description has to be at least a bit robust against those things. Just like SIFT, the Local Binary Patterns methodology has its roots in 2D texture analysis. The basic idea of Local Binary Patterns is to summarize the local structure in an image by comparing each pixel with its neighborhood. Take a pixel as center and threshold its neighbors against it. If the intensity of a neighbor is greater than or equal to the intensity of the center pixel, denote it with 1, and with 0 if not. You’ll end up with a binary number for each pixel, just like 11001111. So with 8 surrounding pixels you’ll end up with 2^8 = 256 possible combinations, called Local Binary Patterns or sometimes LBP codes. The first LBP operator described in literature actually used a fixed 3 x 3 neighborhood just like this:

Algorithmic Description¶

A more formal description of the LBP operator can be given as:

$$\mathrm{LBP}(x_c, y_c) = \sum_{p=0}^{P-1} 2^p \, s(i_p - i_c)$$

with $(x_c, y_c)$ as central pixel with intensity $i_c$, and $i_p$ being the intensity of the $p$-th neighbor pixel. $s$ is the sign function defined as:

$$s(x) = \begin{cases} 1 & \text{if } x \geq 0 \\ 0 & \text{else} \end{cases}$$

This description enables you to capture very fine-grained details in images. In fact, the authors were able to compete with state-of-the-art results for texture classification. Soon after the operator was published it was noted that a fixed neighborhood fails to encode details differing in scale. So the operator was extended to use a variable neighborhood in [AHP04].
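The original 3 x 3 operator translates almost directly into code. The following is a minimal sketch, not OpenCV’s implementation. It assumes a row-major 8-bit grayscale image stored in a `std::vector<std::uint8_t>` and skips the one-pixel border. It sets bit $p$ when the neighbor is greater than or equal to the center, i.e. $s(i_p - i_c)$. Neighbors are numbered clockwise starting at the top-left pixel. That bit order is my own choice; any fixed order gives a valid, differently numbered set of codes. `lbp_3x3` is a hypothetical helper name.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Original 3 x 3 LBP operator on a row-major 8-bit grayscale image of size
// width x height. Border pixels are skipped, so the result holds
// (width - 2) x (height - 2) codes, also in row-major order.
std::vector<std::uint8_t> lbp_3x3(const std::vector<std::uint8_t>& img,
                                  int width, int height) {
    assert(width >= 3 && height >= 3);
    assert(img.size() == static_cast<std::size_t>(width) * height);
    // Neighbor offsets (dy, dx), clockwise from the top-left corner.
    const int dy[8] = {-1, -1, -1, 0, 1, 1, 1, 0};
    const int dx[8] = {-1, 0, 1, 1, 1, 0, -1, -1};
    std::vector<std::uint8_t> out;
    out.reserve(static_cast<std::size_t>(width - 2) * (height - 2));
    for (int y = 1; y < height - 1; ++y) {
        for (int x = 1; x < width - 1; ++x) {
            const std::uint8_t center = img[y * width + x];
            std::uint8_t code = 0;
            for (int p = 0; p < 8; ++p) {
                const std::uint8_t neighbor = img[(y + dy[p]) * width + (x + dx[p])];
                // s(i_p - i_c) = 1 if neighbor >= center; weight 2^p.
                if (neighbor >= center) {
                    code |= static_cast<std::uint8_t>(1u << p);
                }
            }
            out.push_back(code);
        }
    }
    return out;
}
```

Consider the patch 5 9 1 / 4 6 7 / 2 6 3, whose center is 6. The neighbors 9, 7 and 6 are greater than or equal to the center, which sets bits 1, 3 and 5 and gives the code 2 + 8 + 32 = 42. Applying a strictly increasing gray-level mapping such as $v \mapsto 2v + 3$ leaves the code unchanged, because only the order of the intensities matters.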
The idea of the extended operator is to align an arbitrary number of neighbors on a circle with a variable radius, which makes it possible to capture the following neighborhoods:

For a given point $(x_c, y_c)$ the position of the neighbor $(x_p, y_p)$, $p = 0, \ldots, P-1$, can be calculated by:

$$x_p = x_c + R \cos\left(\frac{2\pi p}{P}\right)$$

$$y_p = y_c - R \sin\left(\frac{2\pi p}{P}\right)$$

where $R$ is the radius of the circle and $P$ is the number of sample points.

The operator is an extension to the original LBP codes, so it’s sometimes called Extended LBP (also referred to as Circular LBP). If a point’s coordinate on the circle doesn’t correspond to image coordinates, the point gets interpolated. Computer science has a bunch of clever interpolation schemes; the OpenCV implementation does a bilinear interpolation. For a point $(x, y)$ inside the unit square spanned by four neighboring pixels:

$$f(x, y) \approx \begin{bmatrix} 1 - x & x \end{bmatrix} \begin{bmatrix} f(0, 0) & f(0, 1) \\ f(1, 0) & f(1, 1) \end{bmatrix} \begin{bmatrix} 1 - y \\ y \end{bmatrix}$$

By definition the LBP operator is robust against monotonic (order-preserving) gray scale transformations. We can easily verify this by looking at the LBP image of an artificially modified image (so you see what an LBP image looks like!):

What’s left to do is to incorporate the spatial information in the face recognition model. The representation proposed by Ahonen et al. [AHP04] is to divide the LBP image into local regions and extract a histogram from each. The spatially enhanced feature vector is then obtained by concatenating the local histograms (not merging them). These histograms are called Local Binary Patterns Histograms.

Local Binary Patterns Histograms in OpenCV¶

```cpp
/*
 * Copyright (c) 2011. Philipp Wagner <bytefish[at]gmx[dot]de>.
 * Released to public domain under terms of the BSD Simplified license.
 *
 * Redistribution and use in source and binary forms, with or without
 * modification, are permitted provided that the following conditions are met:
 *   * Redistributions of source code must retain the above copyright
 *     notice, this list of conditions and the following disclaimer.
 *   * Redistributions in binary form must reproduce the above copyright
 *     notice, this list of conditions and the following disclaimer in the
 *     documentation and/or other materials provided with the distribution.
 *   * Neither the name of the organization nor the names of its contributors
 *     may be used to endorse or promote products derived from this software
 *     without specific prior written permission.
 *
 *   See <http://www.opensource.org/licenses/bsd-license>
 */

#include "opencv2/core/core.hpp"
#include "opencv2/contrib/contrib.hpp"
#include "opencv2/highgui/highgui.hpp"

#include <cstdlib>   // exit, atoi
#include <iostream>
#include <fstream>
#include <sstream>

using namespace cv;
using namespace std;

static void read_csv(const string& filename, vector<Mat>& images, vector<int>& labels, char separator = ';') {
    std::ifstream file(filename.c_str(), ifstream::in);
    if (!file) {
        string error_message = "No valid input file was given, please check the given filename.";
        CV_Error(CV_StsBadArg, error_message);
    }
    string line, path, classlabel;
    while (getline(file, line)) {
        stringstream liness(line);
        getline(liness, path, separator);
        getline(liness, classlabel);
        if(!path.empty() && !classlabel.empty()) {
            images.push_back(imread(path, 0));
            labels.push_back(atoi(classlabel.c_str()));
        }
    }
}

int main(int argc, const char *argv[]) {
    // Check for valid command line arguments, print usage
    // if no arguments were given.
    if (argc != 2) {
        cout << "usage: " << argv[0] << " <csv.ext>" << endl;
        exit(1);
    }
    // Get the path to your CSV.
    string fn_csv = string(argv[1]);
    // These vectors hold the images and corresponding labels.
    vector<Mat> images;
    vector<int> labels;
    // Read in the data. This can fail if no valid
    // input filename is given.
    try {
        read_csv(fn_csv, images, labels);
    } catch (cv::Exception& e) {
        cerr << "Error opening file \"" << fn_csv << "\". Reason: " << e.msg << endl;
        // nothing more we can do
        exit(1);
    }
    // Quit if there are not enough images for this demo.
    if(images.size() <= 1) {
        string error_message = "This demo needs at least 2 images to work. Please add more images to your data set!";
        CV_Error(CV_StsError, error_message);
    }
    // Get the height from the first image. We'll need this
    // later in code to reshape the images to their original
    // size:
    int height = images[0].rows;
    // The following lines simply get the last image from
    // your dataset and remove it from the vector. This is
    // done so that the training data (which we learn the
    // cv::FaceRecognizer on) and the test data we test
    // the model with do not overlap.
    Mat testSample = images[images.size() - 1];
    int testLabel = labels[labels.size() - 1];
    images.pop_back();
    labels.pop_back();
    // The following lines create an LBPH model for
    // face recognition and train it with the images and
    // labels read from the given CSV file.
    //
    // The LBPHFaceRecognizer uses Extended Local Binary Patterns
    // (it's probably configurable with other operators at a later
    // point), and has the following default values
    //
    //      radius = 1
    //      neighbors = 8
    //      grid_x = 8
    //      grid_y = 8
    //
    // So if you want a LBPH FaceRecognizer using a radius of
    // 2 and 16 neighbors, call the factory method with:
    //
    //      cv::createLBPHFaceRecognizer(2, 16);
    //
    // And if you want a threshold (e.g. 123.0) with all other
    // parameters at their default values, call it with:
    //
    //      cv::createLBPHFaceRecognizer(1, 8, 8, 8, 123.0)
    //
    Ptr<FaceRecognizer> model = createLBPHFaceRecognizer();
    model->train(images, labels);
    // The following line predicts the label of a given
    // test image:
    int predictedLabel = model->predict(testSample);
    //
    // To get the confidence of a prediction call the model with:
    //
    //      int predictedLabel = -1;
    //      double confidence = 0.0;
    //      model->predict(testSample, predictedLabel, confidence);
    //
    string result_message = format("Predicted class = %d / Actual class = %d.", predictedLabel, testLabel);
    cout << result_message << endl;
    // Sometimes you'll need to get/set internal model data,
    // which isn't exposed by the public cv::FaceRecognizer.
    // Since each cv::FaceRecognizer is derived from a
    // cv::Algorithm, you can query the data.
    //
    // First we'll use it to set the threshold of the FaceRecognizer
    // to 0.0 without retraining the model. This can be useful if
    // you are evaluating the model:
    //
    model->set("threshold", 0.0);
    // Now the threshold of this model is set to 0.0. A prediction
    // now returns -1, as it's impossible to have a distance below
    // it.
    predictedLabel = model->predict(testSample);
    cout << "Predicted class = " << predictedLabel << endl;
    // Show some information about the model, as there's no cool
    // model data to display as in Eigenfaces/Fisherfaces.
    // For efficiency reasons the LBP images are not stored
    // within the model:
    cout << "Model Information:" << endl;
    string model_info = format("\tLBPH(radius=%i, neighbors=%i, grid_x=%i, grid_y=%i, threshold=%.2f)",
            model->getInt("radius"),
            model->getInt("neighbors"),
            model->getInt("grid_x"),
            model->getInt("grid_y"),
            model->getDouble("threshold"));
    cout << model_info << endl;
    // We could get the histograms for example:
    vector<Mat> histograms = model->getMatVector("histograms");
    // But should I really visualize it? Probably the length is interesting:
    cout << "Size of the histograms: " << histograms[0].total() << endl;
    return 0;
}
```

The source code for this demo application is also available in the src folder coming with this documentation: src/facerec_lbph.cpp
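The circular operator and the spatially enhanced histogram can also be sketched without OpenCV. This is a minimal illustration under my own assumptions, not the code behind `createLBPHFaceRecognizer`:

- Images are row-major `std::vector<double>`.
- Neighbor offsets follow $x_p = x_c + R\cos(2\pi p/P)$ and $y_p = y_c - R\sin(2\pi p/P)$.
- Samples are bilinearly interpolated from the four surrounding pixels.
- A small tolerance (1e-9) keeps interpolation round-off from turning "equal" into "smaller".
- Each cell histogram is normalized by its pixel count.
- Pixels that do not fill a complete grid cell are ignored.

`circular_lbp` and `spatial_histogram` are hypothetical names.

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <cstdint>
#include <vector>

// Extended (circular) LBP on a row-major grayscale image of width x height.
// Only centers whose circle fits inside the image are processed, so the
// result holds (width - 2r) x (height - 2r) codes with r = ceil(radius).
std::vector<std::uint32_t> circular_lbp(const std::vector<double>& img,
                                        int width, int height,
                                        double radius, int neighbors) {
    assert(radius > 0.0 && neighbors >= 1 && neighbors <= 32);
    const int r = static_cast<int>(std::ceil(radius));
    assert(width > 2 * r && height > 2 * r);
    assert(img.size() == static_cast<std::size_t>(width) * height);
    const double pi = std::acos(-1.0);
    const int out_w = width - 2 * r;
    std::vector<std::uint32_t> codes(static_cast<std::size_t>(out_w) * (height - 2 * r), 0u);
    auto at = [&](int y, int x) { return img[static_cast<std::size_t>(y) * width + x]; };
    for (int p = 0; p < neighbors; ++p) {
        // Offset of sample point p relative to the center pixel.
        const double dx = radius * std::cos(2.0 * pi * p / neighbors);
        const double dy = -radius * std::sin(2.0 * pi * p / neighbors);
        // Surrounding integer offsets and fractional parts.
        const int fx = static_cast<int>(std::floor(dx)), cx = static_cast<int>(std::ceil(dx));
        const int fy = static_cast<int>(std::floor(dy)), cy = static_cast<int>(std::ceil(dy));
        const double tx = dx - fx, ty = dy - fy;
        // Bilinear weights of the four surrounding pixels.
        const double w1 = (1 - tx) * (1 - ty), w2 = tx * (1 - ty);
        const double w3 = (1 - tx) * ty, w4 = tx * ty;
        for (int y = r; y < height - r; ++y) {
            for (int x = r; x < width - r; ++x) {
                const double value = w1 * at(y + fy, x + fx) + w2 * at(y + fy, x + cx)
                                   + w3 * at(y + cy, x + fx) + w4 * at(y + cy, x + cx);
                // s(i_p - i_c) with a small tolerance for round-off.
                if (value >= at(y, x) - 1e-9) {
                    codes[static_cast<std::size_t>(y - r) * out_w + (x - r)] |= (1u << p);
                }
            }
        }
    }
    return codes;
}

// Split the (width x height) code image into grid_x x grid_y cells, build a
// histogram with `bins` entries per cell and concatenate (not merge) them.
std::vector<double> spatial_histogram(const std::vector<std::uint32_t>& codes,
                                      int width, int height, std::size_t bins,
                                      int grid_x, int grid_y) {
    assert(grid_x > 0 && grid_y > 0);
    assert(codes.size() == static_cast<std::size_t>(width) * height);
    const int cell_w = width / grid_x, cell_h = height / grid_y;
    assert(cell_w > 0 && cell_h > 0);
    std::vector<double> feature;
    feature.reserve(bins * static_cast<std::size_t>(grid_x * grid_y));
    for (int gy = 0; gy < grid_y; ++gy) {
        for (int gx = 0; gx < grid_x; ++gx) {
            std::vector<double> hist(bins, 0.0);
            for (int y = gy * cell_h; y < (gy + 1) * cell_h; ++y) {
                for (int x = gx * cell_w; x < (gx + 1) * cell_w; ++x) {
                    const std::uint32_t code = codes[static_cast<std::size_t>(y) * width + x];
                    assert(code < bins);
                    hist[code] += 1.0;
                }
            }
            // Normalize by the number of pixels in the cell.
            const double n = static_cast<double>(cell_w) * cell_h;
            for (double& h : hist) h /= n;
            feature.insert(feature.end(), hist.begin(), hist.end());
        }
    }
    return feature;
}
```

With radius 1 and 8 neighbors, a w x h face image would become a feature vector via `spatial_histogram(circular_lbp(img, w, h, 1.0, 8), w - 2, h - 2, 256, 8, 8)`. That vector has 8 · 8 · 256 = 16384 entries. Such vectors would then be compared with some histogram distance against the training samples, in line with the distance and threshold mentioned in the demo above.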

Conclusion¶ You’ve learned how to use the new FaceRecognizer in real applications. After reading the document you also know how the algorithms work, so now it’s time for you to experiment with the available algorithms. Use them, improve them and let the OpenCV community participate!

Credits¶

This document wouldn’t be possible without the kind permission to use the face images of the AT&T Database of Faces and the Yale Facedatabase A/B.

The Database of Faces¶

Important: when using these images, please give credit to “AT&T Laboratories, Cambridge.”

The Database of Faces, formerly The ORL Database of Faces, contains a set of face images taken between April 1992 and April 1994. The database was used in the context of a face recognition project carried out in collaboration with the Speech, Vision and Robotics Group of the Cambridge University Engineering Department. There are ten different images of each of 40 distinct subjects. For some subjects, the images were taken at different times, varying the lighting, facial expressions (open / closed eyes, smiling / not smiling) and facial details (glasses / no glasses). All the images were taken against a dark homogeneous background with the subjects in an upright, frontal position (with tolerance for some side movement). The files are in PGM format. The size of each image is 92x112 pixels, with 256 grey levels per pixel. The images are organised in 40 directories (one for each subject), which have names of the form sX, where X indicates the subject number (between 1 and 40). In each of these directories, there are ten different images of that subject, which have names of the form Y.pgm, where Y is the image number for that subject (between 1 and 10). A copy of the database can be retrieved from: http://www.cl.cam.ac.uk/research/dtg/attarchive/pub/data/att_faces.zip.

Yale Facedatabase A¶

With the permission of the authors I am allowed to show a small number of images (say subject 1 and all the variations) and all images such as Fisherfaces and Eigenfaces from either Yale Facedatabase A or the Yale Facedatabase B.

The Yale Face Database A (size 6.4MB) contains 165 grayscale images in GIF format of 15 individuals.
There are 11 images per subject, one per different facial expression or configuration: center-light, w/glasses, happy, left-light, w/no glasses, normal, right-light, sad, sleepy, surprised, and wink. (Source: http://cvc.yale.edu/projects/yalefaces/yalefaces.html)

Yale Facedatabase B¶

The extended Yale Face Database B contains 16128 images of 28 human subjects under 9 poses and 64 illumination conditions. The data format of this database is the same as the Yale Face Database B. Please refer to the homepage of the Yale Face Database B for more detailed information on the data format.

You are free to use the extended Yale Face Database B for research purposes. All publications which use this database should acknowledge the use of “the Extended Yale Face Database B” and reference Athinodoros Georghiades, Peter Belhumeur, and David Kriegman’s paper, “From Few to Many: Illumination Cone Models for Face Recognition under Variable Lighting and Pose”, PAMI, 2001. The extended database, as opposed to the original Yale Face Database B with 10 subjects, was first reported by Kuang-Chih Lee, Jeffrey Ho, and David Kriegman in “Acquiring Linear Subspaces for Face Recognition under Variable Lighting”, PAMI, May 2005. All test image data used in the experiments are manually aligned, cropped, and then re-sized to 168x192 images. If you publish your experimental results with the cropped images, please reference the PAMI2005 paper as well. (Source: http://vision.ucsd.edu/~leekc/ExtYaleDatabase/ExtYaleB.html)

Literature¶

[AHP04] Ahonen, T., Hadid, A., and Pietikäinen, M. Face Recognition with Local Binary Patterns. Computer Vision - ECCV 2004 (2004), 469–481.

[BHK97] Belhumeur, P. N., Hespanha, J., and Kriegman, D. Eigenfaces vs. Fisherfaces: Recognition Using Class Specific Linear Projection. IEEE Transactions on Pattern Analysis and Machine Intelligence 19, 7 (1997), 711–720.

[Bru92] Brunelli, R., and Poggio, T. Face Recognition through Geometrical Features. European Conference on Computer Vision (ECCV) 1992, 792–800.

[Duda01] Duda, R. O., Hart, P. E., and Stork, D. G. Pattern Classification, 2nd ed. 2001.

[Fisher36] Fisher, R. A. The use of multiple measurements in taxonomic problems. Annals of Eugenics 7 (1936), 179–188.

[GBK01] Georghiades, A. S., Belhumeur, P. N., and Kriegman, D. J. From Few to Many: Illumination Cone Models for Face Recognition under Variable Lighting and Pose. IEEE Transactions on Pattern Analysis and Machine Intelligence 23, 6 (2001), 643–660.

[Kanade73] Kanade, T. Picture processing system by computer complex and recognition of human faces. PhD thesis, Kyoto University, November 1973.

[KM01] Martinez, A., and Kak, A. PCA versus LDA. IEEE Transactions on Pattern Analysis and Machine Intelligence 23, 2 (2001), 228–233.

[Lee05] Lee, K., Ho, J., and Kriegman, D. Acquiring Linear Subspaces for Face Recognition under Variable Lighting. IEEE Transactions on Pattern Analysis and Machine Intelligence (PAMI) 27, 5 (2005).

[Messer06] Messer, K., et al. Performance Characterisation of Face Recognition Algorithms and Their Sensitivity to Severe Illumination Changes. In: ICB 2006, 1–11.

[RJ91] Raudys, S. J., and Jain, A. K. Small sample size effects in statistical pattern recognition: Recommendations for practitioners. IEEE Transactions on Pattern Analysis and Machine Intelligence 13, 3 (1991), 252–264.

[Tan10] Tan, X., and Triggs, B. Enhanced local texture feature sets for face recognition under difficult lighting conditions. IEEE Transactions on Image Processing 19 (2010), 1635–1650.

[TP91] Turk, M., and Pentland, A. Eigenfaces for recognition. Journal of Cognitive Neuroscience 3 (1991), 71–86.

[Tu06] Turati, C., Macchi Cassia, V., Simion, F., and Leo, I. Newborns’ face recognition: Role of inner and outer facial features. Child Development 77, 2 (2006), 297–311.

[Wiskott97] Wiskott, L., Fellous, J., Krüger, N., and von der Malsburg, C. Face Recognition by Elastic Bunch Graph Matching. IEEE Transactions on Pattern Analysis and Machine Intelligence 19 (1997), 775–779.

[Zhao03] Zhao, W., Chellappa, R., Phillips, P., and Rosenfeld, A. Face recognition: A literature survey. ACM Computing Surveys (CSUR) 35, 4 (2003), 399–458.