Let's consider the problem first. I have a system \(Ax=b\), where \(A\in\mathbb{R}^{m\times n}\) is a data matrix holding coefficients, and \(b\in\mathbb{R}^m\) is an observation vector, with \(m\geq n\). Such a system is called overdetermined, and usually it has no exact solution.

Since we can't find an exact solution, we opt for the next best thing, the "nearest" solution. Recall from the Clojure Linear Algebra Refresher series that in linear algebra we use norms to measure distance. Therefore, the nearest solution is the one with the least norm \({\lVert Ax-b \rVert}_p\). But which norm to choose? The 1-norm? The ∞-norm? Different norms give different answers. The matter is, fortunately, settled by the fact that minimizing the 1-norm or the ∞-norm leads to much harder optimization problems (neither is differentiable), so our best buddy the Euclidean norm is the winner.

Minimization of the Euclidean norm, \(\min_x {\lVert Ax-b \rVert}_2\), is called the least squares problem. It is convenient because the squared norm is differentiable, and the 2-norm is preserved under orthogonal transformations (see the Clojure Linear Algebra Refresher series for the basics; the specific details follow here shortly).

When \(A\) is full-rank (\(\operatorname{rank}(A)=n\)), there is a unique least squares solution, and it can be found by solving the symmetric positive definite system \(A^TAx_{LS}=A^Tb\). Remember one of the previous articles in this series? We talked about such systems there.
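Where does that system come from? Setting the gradient of the squared residual to zero gives it directly:

```latex
\begin{aligned}
f(x) &= \lVert Ax-b \rVert_2^2 = x^TA^TAx - 2b^TAx + b^Tb,\\
\nabla f(x) &= 2A^TAx - 2A^Tb = 0
\quad\Longrightarrow\quad A^TAx_{LS} = A^Tb.
\end{aligned}
```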

Let's try a random example in Clojure:

First, I'll construct a (randomly chosen) matrix a, and make sure its rank is 2 (here, by doing SVD):

(def a (dge 3 2 [1 2 3 -1 1 -1]))

(svd a)

#uncomplicate.neanderthal.internal.common.SVDecomposition
{:sigma #RealDiagonalMatrix[double, type:gd mxn:2x2, offset:0]
   ▧                       ┓
   ↘       3.79    1.63
   ┗                       ┛
 :u nil, :vt nil, :master true}

There are two non-zero singular values there, so the rank is 2, which is what I wanted to check.

Next, I'll construct \(A^TA\):

(def ata (mm (trans a) a))

ata

#RealGEMatrix[double, mxn:2x2, layout:column, offset:0]
   ▥       ↓       ↓       ┓
   →      14.00   -2.00
   →      -2.00    3.00
   ┗                       ┛

I created ata as a general matrix, so that we can look at it and see that it is symmetric, as promised by the theory. Now I'll make it symmetric (in the Clojure data structure sense) with view-sy. Of course, in a real project, I'd skip these intermediate steps.

(def b (dge 3 1 [1 -1 -2]))

(def solution (sv (view-sy (mm (trans a) a)) (mm (trans a) b)))

solution

#RealGEMatrix[double, mxn:2x1, layout:column, offset:0]
   ▥       ↓       ┓
   →      -0.55
   →      -0.37
   ┗               ┛

This should be the LS solution; we'll check later whether a more general method gives the same answer. Yes, there are more general methods, so what I've shown you so far is just a learning-oriented warm-up. Usually you do not need to be so creative, and can just go to method X and say "Solve this for me!".

An interesting side note here is that solving these normal equations is connected to solving gradient equations, which is relevant if you're interested in machine learning. I'll have more to say on that once I finish the more general topics in the Linear Algebra series.

When we find the LS solution \(x_{LS}\), we can also compute the minimum residual \(r_{LS} = b-Ax_{LS}\). Its norm tells us how far the system we solved exactly is from the original system.

(axpy -1 (mm a solution) b)

#RealGEMatrix[double, mxn:3x1, layout:column, offset:0]
   ▥       ↓       ┓
   →       1.18
   →       0.47
   →      -0.71
   ┗               ┛

The norm of this minimum residual seems quite high here. We'll check later whether our solution is correct.

(nrm2 (axpy -1 (mm a solution) b))

1.459992790176863
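Even without the more general method, there is a sanity check available right now: at the least squares solution, the residual is orthogonal to the column space of \(A\). That is exactly what the normal equations say, since \(A^TAx_{LS}=A^Tb\) rearranges to \(A^T(b-Ax_{LS})=0\). With the a, b, and solution defined above, we can verify this directly:

```clojure
;; Multiply A^T by the residual b - A*solution; for a least squares
;; solution, both entries of this 2x1 result should be (numerically) zero.
(mm (trans a) (axpy -1 (mm a solution) b))
```

If those entries are not tiny, something went wrong in the solve.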

This method is quite OK for full-rank systems, but even then it is sensitive to the presence of small singular values: forming \(A^TA\) squares the condition number of \(A\) (recall the last article, about Singular Value Decomposition).

Another method is to solve the normal equations via Cholesky factorization, together with a matrix multiplication and a matrix-vector multiplication. I'll skip the details, because there is a better and more general method.
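Still, for the curious, here is roughly what that route could look like. This is a sketch only: it assumes that trf and trs from uncomplicate.neanderthal.linalg perform a Cholesky-type factorization of the symmetric matrix and solve against it; check the Neanderthal API docs before relying on these exact calls.

```clojure
;; Sketch: solve the normal equations via a triangular factorization
;; instead of sv. trf/trs usage here is an assumption, not gospel.
(require '[uncomplicate.neanderthal.core :refer [mm trans view-sy]]
         '[uncomplicate.neanderthal.native :refer [dge]]
         '[uncomplicate.neanderthal.linalg :refer [trf trs]])

(def a (dge 3 2 [1 2 3 -1 1 -1]))
(def b (dge 3 1 [1 -1 -2]))

;; Factor the symmetric A^T A once, then solve against A^T b.
(def fact (trf (view-sy (mm (trans a) a))))
(def x-chol (trs fact (mm (trans a) b)))
```

The benefit of this route is that the factorization can be reused for multiple right-hand sides.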

These better methods usually rely on the QR factorization.
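As a teaser for what's coming, assuming Neanderthal's uncomplicate.neanderthal.linalg exposes a QR-based least squares solver named ls! (treat the name and calling convention as assumptions until you check the API docs), the whole problem would collapse to a single call:

```clojure
;; Hypothetical one-shot least squares solve via QR. ls! is assumed to
;; work destructively, so we hand it copies of a and b; the solution
;; would then sit in the leading rows of the result.
(require '[uncomplicate.neanderthal.core :refer [copy]]
         '[uncomplicate.neanderthal.linalg :refer [ls!]])

(ls! (copy a) (copy b))
```

If all goes well, it should agree with the hand-rolled normal-equations solution above.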