Now let’s compute a new matrix Y, which is the original data matrix projected onto the first num_components principal components.

Next, let’s plot the first few principal components. From what we computed above, each right singular vector has the same number of entries as there are pixels in an image, so we can plot them as images by reshaping them. What do they appear to capture?

For example, the first (and also the third) principal component captures a male face, the second (and also the fourth) seems to capture a female face, and the fifth captures a face with long hair.
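The reshaping step can be sketched as below. A random matrix stands in for the face data X here (the real data isn’t loaded in this snippet); the point is only that each right singular vector has 64×64 entries and can be shown as an image:

```python
import numpy as np
import matplotlib
matplotlib.use("Agg")  # render off-screen
import matplotlib.pyplot as plt

# Random stand-in for the face data: one flattened 64x64 image per row.
rng = np.random.default_rng(0)
X = rng.standard_normal((100, 64 * 64))
U, Sigma, VT = np.linalg.svd(X, full_matrices=False)

fig, axes = plt.subplots(1, 5, figsize=(15, 3))
for i, ax in enumerate(axes):
    # Each right singular vector has 64*64 entries -- reshape it to an image.
    ax.imshow(VT[i, :].reshape(64, 64), cmap="gray")
    ax.set_title(f"component {i}")
    ax.axis("off")
fig.savefig("principal_components.png")
```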

Does the spectrum of these data decay quickly or slowly? How should that affect our choice of k if we consider a k-truncated SVD?

The question is somewhat ill-defined and the answer is relative. In this case, a reasonable argument is that the spectrum actually decays rather slowly. Why? If we try fitting the singular values {σ_i} to functions of i, we’ll find that 1/√(i+1) is a pretty good fit. That counts as fairly slow decay; there would be significant compressibility if the curve instead dropped off exponentially (or at least superlinearly).

The following table and plot inspect the singular values, i.e., the entries of Σ stored in Sigma. The plot shows the singular values as dots, one at position x=i for the i-th singular value. To give a rough idea of how quickly the singular values decay, the plot includes a solid line showing the curve σ0/√(i+1).
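A minimal sketch of such a plot (again with a random stand-in for the face data matrix, so the actual decay shown is only illustrative):

```python
import numpy as np
import matplotlib
matplotlib.use("Agg")  # render off-screen
import matplotlib.pyplot as plt

# Random stand-in for the face data matrix.
rng = np.random.default_rng(0)
X = rng.standard_normal((100, 64 * 64))
Sigma = np.linalg.svd(X, compute_uv=False)  # singular values, descending

i = np.arange(len(Sigma))
plt.semilogy(i, Sigma, "o", markersize=3, label="singular values")
# Reference curve sigma_0 / sqrt(i + 1): if the dots hug this line,
# the spectrum decays slowly.
plt.semilogy(i, Sigma[0] / np.sqrt(i + 1), "-", label="$\\sigma_0/\\sqrt{i+1}$")
plt.xlabel("$i$")
plt.ylabel("$\\sigma_i$")
plt.legend()
plt.savefig("singular_values.png")
```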

num_components = 5  # number of principal components
Y = np.matmul(X, VT[:num_components, :].T)

The next plot visualizes the face data points projected onto the first two principal components, rendered with bokeh using the face images as thumbnails.

Next, let’s run k-means on the projected data, Y[:m, :num_components], to try to identify up to num_clusters clusters.
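A minimal Lloyd’s-style k-means sketch (synthetic 2-D data stands in for Y[:m, :num_components] here; a library implementation such as sklearn’s KMeans would do the same job):

```python
import numpy as np

def kmeans(Y, num_clusters, num_iters=50, seed=0):
    """Plain Lloyd's iteration: assign points to nearest center, update centers."""
    rng = np.random.default_rng(seed)
    centers = Y[rng.choice(len(Y), size=num_clusters, replace=False)]
    for _ in range(num_iters):
        # Distance from every point to every center.
        dists = np.linalg.norm(Y[:, None, :] - centers[None, :, :], axis=2)
        labels = dists.argmin(axis=1)
        for k in range(num_clusters):
            if np.any(labels == k):  # an empty cluster keeps its old center
                centers[k] = Y[labels == k].mean(axis=0)
    return labels, centers

# Synthetic 2-D stand-in for the projected data.
rng = np.random.default_rng(1)
Y = np.vstack([rng.normal(c, 0.1, size=(50, 2)) for c in (0.0, 5.0, 10.0)])
labels, centers = kmeans(Y, num_clusters=3)
```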

The next animation shows the interactive bokeh plot of the face data points projected on the first two principal components, with the face images as thumbnails.

For the 500 randomly selected faces, the green cluster seems to contain mostly the whiter / brighter faces, the red cluster mostly the darker faces, and the blue cluster mostly the younger faces.

2. Face Reconstruction and a Simple Face Detector

This problem appeared as a project in this CMU machine learning for signal processing course and also in this UW computer vision course. The description of the problem is taken directly from the course projects.

We are given a corpus of facial images [here] from the LFWCrop database. Each image in this corpus is 64 x 64 and grayscale. We need to learn a typical (i.e., eigen) face from them. There are 1071 (.pgm) images in this dataset; a few of them are shown below.

Computing the EigenFaces

As described in the original paper from MIT and this case study from JHU, the following figure shows the theory / algorithm for how to efficiently compute the eigenfaces from the (training) images and then choose top k eigenfaces (eigenvectors).
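The core of that algorithm can be sketched as follows (random data stands in for the training images; the key trick from the eigenfaces literature is to eigendecompose the small n × n matrix AᵀA instead of the huge d × d covariance AAᵀ, then map the eigenvectors back through A):

```python
import numpy as np

# Random stand-in for n training faces of d = 64*64 pixels each.
rng = np.random.default_rng(0)
n, d = 200, 64 * 64
faces = rng.standard_normal((n, d))

mean_face = faces.mean(axis=0)
A = (faces - mean_face).T                 # d x n matrix of centered faces
small = A.T @ A                           # n x n -- cheap, since n << d
eigvals, V = np.linalg.eigh(small)        # eigh returns ascending order
order = np.argsort(eigvals)[::-1]
eigvals, V = eigvals[order], V[:, order]  # sort descending
eigenfaces = A @ V                        # columns are eigenvectors of A A^T
eigenfaces /= np.linalg.norm(eigenfaces, axis=0)  # unit-normalize columns

k = 100
top_k = eigenfaces[:, :k]                 # keep the top-k eigenfaces
```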

The next 4 figures show the mean face, the percentage of variance captured by the eigenfaces, and the top 100 and top 400 eigenfaces computed from the images, respectively.





Projection on the EigenFaces and Reconstruction

Now, let’s project a face onto the face space to generate a vector of k coefficients, one for each of the k eigenfaces (for different values of k). Then let’s reconstruct the same face from the computed vector of coefficients.
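The projection and reconstruction steps can be sketched as below (a random orthonormal basis stands in for the eigenfaces, and all names are illustrative):

```python
import numpy as np

# A random orthonormal basis stands in for the d x k eigenface matrix.
rng = np.random.default_rng(0)
d, k = 64 * 64, 50
eigenfaces, _ = np.linalg.qr(rng.standard_normal((d, k)))  # orthonormal columns
mean_face = rng.standard_normal(d)
face = rng.standard_normal(d)

# Project: one coefficient per eigenface.
coeffs = eigenfaces.T @ (face - mean_face)
# Reconstruct the face from its k coefficients.
reconstruction = mean_face + eigenfaces @ coeffs
```

Because the eigenfaces are orthonormal, the residual face − reconstruction is orthogonal to the face space.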

The following figures and animations show how a given image can be reconstructed by projecting onto the first few dominant eigenfaces, and how the reconstruction error (residual) decreases as more and more eigenfaces are used for the projection. The reconstruction error drops quite fast with the first few eigenfaces.
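A sketch of how the residual norm shrinks as k grows (random orthonormal columns stand in for the eigenfaces, so the rate of decay here is only illustrative; the monotone decrease is the point):

```python
import numpy as np

# Random orthonormal columns stand in for 200 eigenfaces of a 64x64 image.
rng = np.random.default_rng(0)
d = 64 * 64
eigenfaces, _ = np.linalg.qr(rng.standard_normal((d, 200)))
mean_face = rng.standard_normal(d)
face = rng.standard_normal(d)

centered = face - mean_face
errors = []
for k in (1, 5, 10, 50, 100, 200):
    E = eigenfaces[:, :k]                 # first k eigenfaces
    recon = E @ (E.T @ centered)          # projection onto their span
    errors.append(np.linalg.norm(centered - recon))
# The residual norm can only shrink as k grows, since the subspaces are nested.
```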

Original Face Image



Reconstructed Face Image obtained by projecting onto the truncated eigenface space



The Reconstruction Error when the first k eigenfaces are used



The same is shown for yet another image.

The next figure shows the images projected on and reconstructed from the first two eigenfaces. Similar images stay in close proximity, whereas dissimilar ones (w.r.t. their projections on the first 2 dominant eigenvectors) are far apart in the 2D space.