1. Subtract mean The first step in principal component analysis is to subtract the mean of each variable from the data set. The next scatter chart shows how the data are rearranged in our example. As shown, subtracting the mean translates the data, which now have zero mean.
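The mean-subtraction step can be sketched in NumPy; the small 2-variable data set below is a made-up stand-in, not the data from the figures:

```python
import numpy as np

# Hypothetical 2-variable data set (columns are x1 and x2)
X = np.array([[2.5, 2.4],
              [0.5, 0.7],
              [2.2, 2.9],
              [1.9, 2.2],
              [3.1, 3.0]])

# Subtract the mean of each column so every variable has zero mean
X_centered = X - X.mean(axis=0)
```

After centering, the column means of `X_centered` are (numerically) zero, which is exactly the translation shown in the scatter chart.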

2. Calculate the covariance matrix The covariance of two random variables measures how the two variables vary together around their means. The sign of the covariance tells us about the relationship between them: if the covariance is positive, the two variables increase and decrease together.

If the covariance is negative, then when one variable increases, the other decreases, and vice versa. These values capture the linear dependencies between the variables, which will be used to reduce the dimension of the data set. Back to our example, the covariance matrix is shown next.

        x1      x2
x1     0.33    0.25
x2     0.25    0.41

The variance is a measure of how spread out the data are around the mean. The diagonal entries are the covariance of each variable with itself, which equals its variance. The off-diagonal entries are the covariance between the two variables. In this case, these values are positive, which means that both variables increase and decrease together.
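A covariance matrix like the one above can be computed directly with NumPy; the centered data set below is again a hypothetical example, so its entries will differ from the 0.33/0.25/0.41 values in the text:

```python
import numpy as np

# Hypothetical 2-variable data set (columns are x1 and x2), mean-centered
X = np.array([[2.5, 2.4],
              [0.5, 0.7],
              [2.2, 2.9],
              [1.9, 2.2],
              [3.1, 3.0]])
X_centered = X - X.mean(axis=0)

# rowvar=False: each column is a variable, each row an observation
C = np.cov(X_centered, rowvar=False)
# C[0, 0] and C[1, 1] are the variances of x1 and x2;
# C[0, 1] == C[1, 0] is the covariance between them
```

The matrix is symmetric by construction, and its diagonal reproduces the per-variable (sample) variances.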

3. Calculate eigenvectors and eigenvalues Eigenvectors are those vectors whose direction remains unchanged when a given linear transformation is applied. Their length, however, may change: the result of the transformation is the vector multiplied by a scalar. This scalar is called the eigenvalue, and each eigenvector has one associated with it. The number of eigenvectors, or components, that we can calculate for a data set equals its dimension. In this case, we have a 2-dimensional data set, so the number of eigenvectors will be 2. The next image represents the eigenvectors for our example. Since they are calculated from the covariance matrix described before, the eigenvectors represent the directions in which the data have the highest variance, and their respective eigenvalues give the amount of variance that the data set has in each of those directions. Once we have obtained these new directions, we can plot the data in terms of them, as shown in the next image for our example. Note that the data have not changed; we are merely rewriting them in terms of these new directions instead of the previous x1-x2 directions.
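The eigendecomposition of the covariance matrix can be sketched as follows, reusing the same hypothetical data set as before:

```python
import numpy as np

# Hypothetical centered data and its covariance matrix
X = np.array([[2.5, 2.4], [0.5, 0.7], [2.2, 2.9], [1.9, 2.2], [3.1, 3.0]])
X_centered = X - X.mean(axis=0)
C = np.cov(X_centered, rowvar=False)

# eigh is the right routine for symmetric matrices such as C;
# it returns eigenvalues in ascending order, so we sort descending
eigenvalues, eigenvectors = np.linalg.eigh(C)
order = np.argsort(eigenvalues)[::-1]
eigenvalues = eigenvalues[order]
eigenvectors = eigenvectors[:, order]   # column i is the i-th eigenvector
```

The defining property described in the text, that the transformation only scales an eigenvector, corresponds to `C @ v == eigenvalue * v` for each eigenvector `v`.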

4. Select principal components Among the eigenvectors calculated above, we must select those onto which we will project the data. The selected eigenvectors will be called principal components. To establish a selection criterion, we must first define the relative variance of each eigenvector and the total variance of the data set. The relative variance of an eigenvector measures how much of the information in the data can be attributed to it. The total variance of a data set is the sum of the variances of all its variables. Both quantities are determined by the eigenvalues. For our example, the next table shows the relative and the cumulative variance for each eigenvector.

        Relative variance (%)   Cumulative variance (%)
PC 1          84.60                    84.60
PC 2          15.40                   100.00

As we can see, the first eigenvector explains almost 85% of the variance in the data, while the second eigenvector explains around 15% of it. The next graph shows the cumulative variance for the components. A common way to select the components is to fix the amount of information that we want the final data set to retain; the less information we require, the fewer principal components we need to select. In this case, as we want to reduce the 2-dimensional data set into a 1-dimensional one, we select the first eigenvector as the principal component. Consequently, the final reduced data set will explain about 85% of the variance of the original one.
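The relative and cumulative variances are simple functions of the eigenvalues; a minimal sketch, again on the hypothetical data set (so the percentages will not match the 84.60/15.40 split of the example):

```python
import numpy as np

# Hypothetical data set: center, take covariance, get sorted eigenvalues
X = np.array([[2.5, 2.4], [0.5, 0.7], [2.2, 2.9], [1.9, 2.2], [3.1, 3.0]])
X_centered = X - X.mean(axis=0)
C = np.cov(X_centered, rowvar=False)
eigenvalues = np.sort(np.linalg.eigvalsh(C))[::-1]

# Relative variance of each component, as a percentage of the total variance
relative = 100 * eigenvalues / eigenvalues.sum()
# Cumulative variance: how much the first k components explain together
cumulative = np.cumsum(relative)
```

By construction the cumulative variance of all the components is 100%, and the components are ordered from most to least explanatory, which is what makes the selection criterion in the text well defined.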

5. Reduce data dimension Once we have selected the principal components, the data must be projected onto them. The next image shows the result of this projection for our example. Although this projection explains most of the variance of the original data, we have lost the information about the variance along the second component. In general, this process is irreversible: we cannot recover the original data from the projection.
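The final projection step can be sketched as below, carrying the hypothetical data set through the whole pipeline and keeping only the first principal component:

```python
import numpy as np

# Hypothetical data set: center, covariance, eigendecomposition
X = np.array([[2.5, 2.4], [0.5, 0.7], [2.2, 2.9], [1.9, 2.2], [3.1, 3.0]])
X_centered = X - X.mean(axis=0)
C = np.cov(X_centered, rowvar=False)
eigenvalues, eigenvectors = np.linalg.eigh(C)
order = np.argsort(eigenvalues)[::-1]
pc1 = eigenvectors[:, order[0]]     # first principal component (unit vector)

# Project the 2-D data onto the principal component: one number per point
X_reduced = X_centered @ pc1

# Mapping back only recovers the part of each point that lies on PC 1;
# the variance along the second component is lost, so X_approx != X_centered
X_approx = np.outer(X_reduced, pc1)
```

A useful sanity check is that the variance of the projected data equals the first eigenvalue, exactly as the text states that each eigenvalue measures the variance along its eigenvector's direction.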