Clojure Linear Algebra Refresher (1) - Vector Spaces

June 3, 2017

Please share: Twitter.

New books are available for subscription.

If you are at least a bit like me, you had learned (some) linear algebra during your university days, had done well at those math courses playing the human calculator role, multiplying matrices with pen and paper, but then buried these experiences deep down in your brain during those many years at typical programming tasks (which are not famous for using much linear algebra).

Now, you want to do some machine learning, or deep learning, or simply some random number crunching, and intuition takes you a long way, but not far enough: as soon as you hit more serious tasks, beyond the cats and dogs deep learning tutorials, there are things such as eigenvalues, or LU factorization, or whatever-the-hell-that-was. You can follow the math formulas, especially when someone else have already implemented them in software, but not everything is clear, and sooner or later you are the one that needs to make working software out of that cryptic formula involving matrices.

In other words, you are not completely illiterate when it comes to maths, but you can not really work out this proof or that out of the blue; your math-fu is way below your Clojure-fu. Fortunately, you are a good programmer, and do not need much hand holding when it comes to Clojure. You just need some dots connecting what you read in the math textbooks and the functions in the Clojure linear algebra library Neanderthal .

This article is a starting article in a series that is going to briefly skim through a good engineering textbook on linear algebra, making notes that help you relate that material to Clojure code. I like this book: Linear Algebra With Applications, Alternate Edition by Gareth Williams. The reasons I chose this book are the following:

It is application oriented; it has more theorem proofs than I need, but it at least does not skip application examples

It is a nice hardcover, printed in color, that can be purchased for whopping $8.78 worldwide second-hand (7th edition at least). EDIT: After this article spent a day at Hacker News frontpage, the price increased and number of available copies dropped. I do not recommend getting a PDF from the well-known Russian online library.

worldwide second-hand (7th edition at least). EDIT: After this article spent a day at Hacker News frontpage, the price increased and number of available copies dropped. I do not recommend getting a PDF from the well-known Russian online library. The alternate edition starts with 100 pages of matrix applications before diving into more abstract theory; good for engineers!

Any decent linear algebra textbook will cover the same topics, but not necessarily in the same order, or with the same grouping. This particular article covers chapter 4 from the book I recommended. A typical (non-alternate) textbook will have this chapter as a starting chapter.

The Vector Space \(R^n\) To work with vectors in a vector space \(R^n\), I use Neanderthal vectors, and specify the dimension \(n\) as a part of the construction of the vector. Sometimes the general vctr function is appropriate, and sometimes it is easier to work with specialized functions such as dv and sv (for native vectors on the CPU), clv (OpenCL vectors on GPU or CPU), or cuv (CUDA GPU vectors). This tutorial will use native, double floating-point precision vectors on the CPU ( dv ). To get access to these constructors, I'll require the namespace that defines them. I'll also require the core namespace, where most functions that operate on vectors and matrices are located. ( require ' [ uncomplicate.neanderthal [ native :refer :all ] [ core :refer :all ] ] ) For example, vector v1 is a vector in \(R^4\), while v2 is a vector in \(R^{22}\). ( def v1 ( dv -1 2 5.2 0 ) ) ( def v2 ( dv ( range 22 ) ) ) If i print these vectors out in the REPL, I can find more details about them, including their components (elements, entries): v1 #RealBlockVector[double, n:4, offset: 0, stride:1] [ -1.00 2.00 5.20 0.00 ] v2 #RealBlockVector[double, n:22, offset: 0, stride:1] [ 0.00 1.00 2.00 ⋯ 20.00 21.00 ] Addition and Scalar Multiplication Math books define vector spaces using surprisingly few (at least to us, programmers) things. These are addition and scalar multiplication. Addition in mathematics is simply and operation \(+\), but in programming, we are doing numerical computations, so we are also concerned with the implementation details that math textbooks do not discuss. One way to add two vectors would be the function xpy (vector x plus vector y): ( xpy v1 v2 ) This won't work, since we cannot add two vectors from different vector spaces (\(R^4\) and \(R^{22}\)), and when you evaluate this in your REPL, you'll get an exception thrown. The vectors have to be compatible, both in the mathematical sense (same vector space), and in an implementation specific way, in which there is no sense to add single-precision CUDA vector to a double-precision OpenCL vector. Let's try with v3 , which is in \(R^4\): ( def v3 ( dv -2 -3 1 0 ) ) ( xpy v1 v3 ) #RealBlockVector[double, n:4, offset: 0, stride:1] [ -3.00 -1.00 6.20 0.00 ] This function is pure; v1 and v2 haven't changed, while the result is a new vector instance (which in this case has not been kept, but went to the garbage). Scalar multiplication is done using the pure function scal . It accepts a scalar and a vector, and returns the scaled vector, while leaving the original as it was before: ( scal 2.5 v1 ) #RealBlockVector[double, n:4, offset: 0, stride:1] [ -2.50 5.00 13.00 0.00 ] v1 #RealBlockVector[double, n:4, offset: 0, stride:1] [ -1.00 2.00 5.20 0.00 ] Pure functions are all fine and dandy, but in the real world of numerical computations, we are constrained with time and space: we need our vectors to fit into available memory, and the results to be available today. With vectors in \(R^4\), computers will achieve that no matter how bad and unoptimized our code is. For the real world applications though, we'll deal with billions of elements and demanding operations. For those cases, we'll want to do two things: minimize memory copying, and fuse multiple operations into one. When it comes to vector addition and scalar multiplication, this means that we can fuse scal and xpy into axpy (scalar a times vector x plus vector y) and avoid memory copying by using destructive functions such as scal! and axpy! (note the ! suffix). For more varieties, please check the documentation of the neanderthal core namespace out. ( axpy! 2.5 v1 v3 ) #RealBlockVector[double, n:4, offset: 0, stride:1] [ -4.50 2.00 14.00 0.00 ] Note that v3 has changed as a result of this operation, and it now contains the result, written over its previous contents: v3 #RealBlockVector[double, n:4, offset: 0, stride:1] [ -4.50 2.00 14.00 0.00 ] Special Vectors Zero vector can be constructed by calling vector constructor with an integer argument that specifies dimension: ( dv 7 ) #RealBlockVector[double, n:7, offset: 0, stride:1] [ 0.00 0.00 0.00 ⋯ 0.00 0.00 ] When we already have a vector, and need a zero vector in the same vector space (with the same dimension), we can call the zero function: ( zero v2 ) #RealBlockVector[double, n:22, offset: 0, stride:1] [ 0.00 0.00 0.00 ⋯ 0.00 0.00 ] Negative of a vector is computed simply by scaling with -1 : ( def v4 ( scal -1 v1 ) ) v4 #'user2/v4#RealBlockVector[double, n:4, offset: 0, stride:1] [ 1.00 -2.00 -5.20 -0.00 ] How to do vector subtraction? As I mentioned earlier, there are only two independent operations in \(R^n\), vector addition and scalar multiplication. Vector subtraction is simply an addition with the negative vector. We can do this with one fused operation, be it axpy or axpby : ( axpby! 1 v1 -1 v3 ) #RealBlockVector[double, n:4, offset: 0, stride:1] [ 3.50 0.00 -8.80 0.00 ] Linear Combinations of Vectors To compute a linear combination such as \(a\mathbf{u} + b\mathbf{v} + c\mathbf{w}\), we can use multiple axpy calls or even let one axpy call sort this out (and waste less memory by not copying intermediate results)! ( let [ u ( dv 2 5 -3 ) v ( dv -4 1 9 ) w ( dv 4 0 2 ) ] ( axpy 2 u -3 v 1 w ) ) #RealBlockVector[double, n:3, offset: 0, stride:1] [ 20.00 7.00 -31.00 ] Column Vectors Both row vectors and column vectors are represented in the same way in Clojure: with Neanderthal dense vectors. Orientation does not matter, and the vector will simply fit into an operation that needs the vector to be horizontal or vertical in mathematical sense.

Dot Product, Norm, Angle, and Distance Dot product of two compatible vectors \(\mathbf{u}\cdot\mathbf{v} = u_1 v_1 + u_2 v_2 + \cdots + u_n v_n\) can be computed using the function with a predictably boring name dot : ( let [ u ( dv 1 -2 4 ) v ( dv 3 0 2 ) ] ( dot u v ) ) 11.0 Norm (aka length, or magnitude, or L2 norm) most often refers to Euclidean distance between two vectors (although other norms can be used): \(\lVert \mathbf{u} \lVert = \sqrt{u_1^2 + u_2^2 + \cdots + u_n^2}\) In Clojure, we use the nrm2 function to compute this norm: ( nrm2 ( dv 1 3 5 ) ) 5.916079783099616 A unit vector is a vector whose norm is 1. Given a vector v , we can construct a unit vector u in the same direction: \(\mathbf{u} = \frac{1}{\lVert \mathbf{v} \lVert} \mathbf{v}\) or, in Clojure code: ( let [ v ( dv 1 3 5 ) ] ( scal ( / ( nrm2 v ) ) v ) ) #RealBlockVector[double, n:3, offset: 0, stride:1] [ 0.17 0.51 0.85 ] This process is called normalizing the vector. Angle Between Vectors Any linear algebra book will teach you how to compute the cosine of an angle between two vectors: \(cos\theta = \frac{\mathbf{u \cdot v}}{\lVert \mathbf{u} \lVert \lVert \mathbf{v} \lVert}\) or, in Clojure: ( let [ u ( dv 1 0 0 ) v ( dv 1 0 1 ) ] ( / ( dot u v ) ( nrm2 u ) ( nrm2 v ) ) ) 0.7071067811865475 You can compute θ out of this cosine, but sometimes you do not even need to do that. For example, two vectors are orthogonal if the angle between them is \(90^\circ\). Since we know that \(cos{\frac{\pi}{4}} = 0\), we can simply test whether the dot product is 0. ( dot ( dv 2 -3 1 ) ( dv 1 2 4 ) ) 0.0 These two vectors seem to be orthogonal. We can determine distance between points in a vector space by calculating the norm of the difference between their direction vectors: \(d(\mathbf{x},\mathbf{y}) = \lVert \mathbf{x} - \mathbf{y} \lVert\) In Clojure, that is as simple to do as: ( let [ x ( dv 4 0 -3 5 ) y ( dv 1 -2 3 0 ) ] ( nrm2 ( axpy -1 y x ) ) ) 8.602325267042627

General Vector Spaces It is important to note that real vectors are not the only "kind" of vector spaces there is. Anything that has operations of addition and scalar multiplication that have certain properties (see the textbooks for details) is a vector space. Some well known vector spaces are real vectors (\(R^n\)), matrices (\(M^{mn}\)), complex vectors (\(C^n\)), and, of course - functions. However, when we do numerical computing, we usually work with real or complex vectors and matrices, so in the following discussion I'll refer only to the parts of the textbook that deal with those. For the abstract theoretical parts, you won't use Clojure anyway, but your trusty pen and paper. Neanderthal currently supports only real vectors and matrices; complex vectors and matrices will be supported in some of the upcoming versions. I have already shown you how to use vectors. With matrices, it's similar, just with more options. The first difference is that we can use different optimized implementations of matrices, to exploit knowledge of their structure. Therefore, there are dense matrices (GE), dense triangular matrices (TR), dense symmetrical matrices (SY), various banded storages for symmetric and triangular matrices, sparse matrices (Sp), etc. Some of those are already implemented and supported in Neanderthal at the time of this writing, some are waiting their turn to be implemented (and may be at the time you're reading this - check Neanderthal API docs out). Anyway, let's construct a matrix and compute its norm: ( def m ( dge 2 3 ( range 6 ) ) ) m #'user2/m#RealGEMatrix[double, mxn:2x3, layout:column, offset:0] ▥ ↓ ↓ ↓ ┓ → 0.00 2.00 4.00 → 1.00 3.00 5.00 ┗ ┛ ( nrm2 m ) 7.416198487095662 Euclidean norm applied to matrices is known as Frobenius norm (FYI). Most of the functions that can be applied to vectors, can also be applied to matrices. Furthermore, there are many more functions in linear algebra that work exclusively with matrices. Neanderthal API docs list what is available in Clojure at the moment when you read this.

Skip Some Theory The following two sections, Subspaces and Linear Combinations are theoretical. We skip them, since I can not show you anything in Clojure code related to this.

Linear Dependence and Independence Lots of theory here, but when it comes to computation, I can note that two vectors are linearly dependent if they are on the same line, implying that the angle between them is 0, meaning the cosine is 1. If cosine is not 1, they are independent. There is a catch, though: floating-point computations are not absolutely precise. For very large vectors, sometime there will be rounding error, so you might get something like 0.99999999 or 1.000001. Equality comparison of floating-point numbers is a tricky business. Keep that in mind, and know that even there Neanderthal can help you, since it offers functions that take some imprecision margin into account. See the Neanderthal API docs and look up for functions such as f= in the math namespace. So, in Clojure: ( let [ x ( dv 2 -1 3 ) y ( dv 4 -2 6 ) ] ( / ( dot x y ) ( nrm2 x ) ( nrm2 y ) ) ) 1.0 These two vectors are linearly dependent indeed. But, what if we have more than two vectors, and are tasked with ckecking whether their set is dependent or not? See example 1 from the page 170 of the book I recommended. There is a set of vectors \({x = (1,2,0), y = (0,1,-1), z = (1,1,2)}\), and we need to find whether there are scalars \(c_1, c_2, \text{and } c_3\) such that \(c_1\mathbf{x} + c_2\mathbf{y} + c_3\mathbf{z} = 0\). This leads to a system of linear equations that, when solved, could have an unique solution, in which case the vectors are linearly independent, or to more than one solution, in which case the vectors are linearly dependent. Luckily for us, Neanderhal offers functions for solving systems of linear equations. Require the linalg namespace to reach for them: ( require ' [ uncomplicate.neanderthal.linalg :refer :all ] ) Now, we can set up and solve the system that will find the answer for us: ( let [ cs ( dge 3 1 ) ] ( sv! ( dge 3 3 [ 1 2 0 0 1 -1 1 1 2 ] ) cs ) cs ) #RealGEMatrix[double, mxn:3x1, layout:column, offset:0] ▥ ↓ ┓ → 0.00 → -0.00 → -0.00 ┗ ┛ Just like in the textbook, the solution is: \(c_1=0, c_2=0, c_3=0\). If we tried with obviously dependent vectors, we would have got an exception complaining that the solution could not be computed. Not bad…

Basis and Dimension Some more theory here that does not relate directly to Clojure code. Note that the dimension of the space of a vector could be inquired with the (dim x) function. The dimension of the space of the matrix is \(m \times n\), so (* (mrows a) (ncols a)) . All the spaces we deal with here have finite dimension, while we leave infinite dimension vectors to pen and paper adventures.

Rank Rank of a matrix is useful since it relates matrices to vectors, and can tell us some useful properties of a matrix at hand. See the textbook (or wikipedia) for the definition and use. What is the question right now is how do we compute it? It is not easy at all, but, fortunately, I can use Neanderthal. When we inquire about the rank, we expect a natural number as a result. We can get rank by doing singular value decomposition by calling function svd , and inspecting one of its results, namely the sorted vector of singular values s . The number of singular values in :sigma that are greater than 0 is the rank of the matrix. As before, take into account that floating point numbers should be compared with equality carefully. Here's example 3 (page 185 from the textbook) done in Clojure: ( let [ a ( dge 4 4 [ 1 0 0 0 2 0 0 0 0 1 0 0 0 0 1 0 ] ) ] ( svd a ) ) #uncomplicate.neanderthal.internal.common.SVDecomposition{:sigma #RealDiagonalMatrix[double, type:gd mxn:4x4, offset:0] ▧ ┓ ↘ 2.24 1.00 1.00 0.00 ┗ ┛ , :u nil, :vt nil, :master true} After inspecting vector :sigma for non-zero values (visually, or by writing a small function that checks its elements) I am a little smarter than before since I know now that \(rank(A) = 3\). Seems so cumbersome. Why isn't there a function rank in Neanderthal that does this for me? Well, the problem is that I rarely need only rank. Often I also need those other results from svd! or similar functions. If rank was so easy to access, it would have promoted wasteful computations of other results that go with it. If you really need rank it is easy to write that function as part of your personal toolbox. Maybe I'll include such functions in Neanderthal later. Who knows. What is important is that the cake is available right now, so you can easily add your own cherries on top of it.

Orthonormal Vectors and Projections This is yet another theoretical section. From the computational point of view, what could be interesting, is computing projection of a vector onto another vector or a subspace. \(proj_u v = \frac{v \cdot u}{u \cdot u} u\) Basically, it boils down to be able to compute dot product and then scale vector u. ( let [ v ( dv 6 7 ) u ( dv 1 4 ) ] ( scal ( / ( dot u v ) ( dot u u ) ) u ) ) #RealBlockVector[double, n:2, offset: 0, stride:1] [ 2.00 8.00 ] It's similar for subspaces, so that might be the obligatory trivial exercise that is being left to the reader.