Vectors and Matrices in Agda James Wood

A recent project of mine has been to formalise and reformalise some basic linear algebra in Agda. This was motivated directly by our recent result that many syntactic properties of usage-tracking type systems benefit from being stated in terms of vectors and matrices. This form of usage tracking appears to be exactly powerful enough to capture intuitionistic linear logic, so involvement of linear algebra is in some sense unsurprising. The similarities go further than the names.

The aim for this post is to describe and justify the method I have settled on. The challenge is to take an arbitrary semiring (rig) R of coefficients and define vectors, matrices, matrix addition, and matrix multiplication, and then prove all of the expected properties of these operations. We will not consider performance, mainly because I neither know nor care at the moment.

For the impatient, my recommendations are the following four points.

Define vectors and matrices as functions from positions to coefficients.

Positions are (single/pairs of) Fin s, with definitional equality as good as equivalence.

s, with definitional equality as good as equivalence. Define the identity matrix via an equality test on the indices of each position, rather than trying an inductive construction.

Do not define this equality test via the decidability of equality of Fin n .

Obligatory literate blog post notes: this post comes from a literate Agda file, of which many parts have been surpressed. Defined names are clickable. The full source code is available at VecMat.lagda.md, and depends just on the Standard Library. I have tried to make the Agda code readable at a high level, particularly by using equational reasoning syntax.

The naïve approach

If you just want to read the right way of doing things, skip to Using what we learnt.

Everyone’s favourite dependent type family is the vectors, defined as follows.

These are lists with a specified length. At a technical level, Vec is an inductive type family being constrained by the inductive structure of a natural number n . This means that any function we write out of a Vec will be by induction on the vector’s structure.

We can define matrices as vectors of row vectors.

We will also need to lift relations pointwise to vectors, mainly to talk about equivalence. For the sake of code reuse, we will take a heterogeneous definition.

Addition and the zero matrix will follow from the applicative structure of λ A → Vec A n (that is, zip and replicate , respectively). With a few more standard list functions, we will be able to define the identity matrix and matrix multiplication.

We will multiply matrices by taking the outer product of each column/row pair, giving a collection of matrices, and then summing these matrices together. This is why we define the function outer .

Accessing rows of matrices is easy, because we can view the first row by pattern matching. However, columns are buried, and to access them we need a function defined by induction on the rows.

lest/rest-map-∷ is a lemma we will need later. It says that adding a constant column and then viewing a column gives back that constant column and the original matrix.

With these done, we can focus on vector/matrix operations. Vectors have a zero, addition, and scaling by a coefficient.

These definitions are convenient for use in matrix operations.

This implementation of _*ₘ_ is by no means canonical. However, I argue that it fits the spirit of inductive vectors well. In particular, this definition is primarily by induction on N , one of the inputs. I have tried several other implementations in the past, but none were particularly nice.

In the Agda standard library, algebraic structures each come with an equivalence relation _≈_ , which is weaker than propositional equality. Algebraic laws only hold up to _≈_ , so we will need to have similar equivalence relations for vectors and matrices.

We want to prove that 1ₘ is the unit of _*ₘ_ . Our approach is to follow the definition of _*ₘ_ and do induction on M . This conveniently tells us something about the dimension m , allowing 1ₘ to compute.

Okay, we did it, but it’s a bit of a mess. If you look at the proof, you notice that a lot of it is about getting round the fact that, for an m × n matrix, we only have indirect access to the n dimension, the columns. The left/rest function is the worst offender for this. It says we want access to a column of a matrix, but the inductive structure only gives us access to the rows. So we have to pick off the first coefficient in each row, and call that a column.

Ultimately, this means that we should be caring more about the inductive structure of the dimensions, rather than any inductive structure on matrices themselves. Dimensions always appear alone, so doing induction over them is always easy, whereas matrices have two dimensions. So, to stop doing induction on matrices, let’s make them not inductive data structures.

Using what we learnt

How can we avoid prioritising one dimension of a matrix over the other? The only way of doing this I can think of is by defining vectors and matrices as functions from positions to values. Vectors and matrices are now no longer inductive structures, but rather functions, and our primary way of defining and using them is via index notation. Let’s do it.

This time, defining our own equivalence is even more critical. Even if the coefficients are up to propositional equality, standard Agda functions don’t support extensionality for propositional equality, so we’d have trouble proving anything.

Next, we can think about matrix operations.

The additive structure is particularly easy. Everything is pointwise in a way that even Agda can see, so the lemmas are easy too.

The multiplicative structure still requires some thought. In particular, there are some careful choices to be made about the identity matrix, which I will explain when we prove things about it.

These definitions should better resemble standard mathematical practice. Rather than our input-driven, parallel definition of _*ₘ_ from before, we have an output-driven definition in terms of lots of little sum s (mathematical Σ). 1ₘ has gone from the rather alien functional programming construction to the Kronecker δ.

Now, how about our test case of proving properties about multiplication?

The structure of this proof is very different from what it was last time, reflecting the differences in our definitions. In line with things being output-driven, the proof is by induction on the index i – a row number in the result. It can also be seen as being by induction on the dimension m , and then by cases on the index i . The steps of reasoning should look familiar to a mathematician. This is a proof by index notation where equations are all between coefficients rather than matrices, and we use both simple algebraic properties and properties of summation to simplify expressions. I invite the reader to actually read the reasoning steps, and to this end I have included some steps which are just computation and tried to move the proof terms (the ones inside ≈⟨ _ ⟩ ) out of the way.

Behind the scenes

There is still some sleight of hand at play in these proofs. The least sleight is the behaviour of 1ₘ . For slick proofs, we require all four cases ( 1ₘ zero zero , 1ₘ zero (suc j) , 1ₘ (suc i) zero , and 1ₘ (suc i) (suc j) ) to compute in a convenient way. The first three should compute to their values, and the final one should compute to 1ₘ i j so that we’re still in the realm of our inductive hypothesis. Without paying attention to this, it is quite easy to make an awkward definition of 1ₘ that will make you miserable.

Perhaps the most obvious thing for an Agda programmer to do would be to think of what the identity matrix looks like, and implement it in such a way that this intuition is visible in the code. The thing we all remember about the identity matrix is that, for the position (i, j) , the coefficient is 1 when i = j , and 0 otherwise. So, let’s use the fact that equality is decidable on Fin n , and just write this down.

This satisfies three of the computation rules, but normalising 1ₘd (suc i) (suc j) gives us 1ₘd (suc i) (suc j) | (suc i ≟ suc j | i ≟ j) . In other words, computation depends on the result of suc i ≟ suc j , which is in turn not going to compute until we know more about i ≟ j . This is a real pain. When we used the boolean-valued equality test _==_ earlier, we could say that suc i == suc j was equal to i == j . But with intrinsic decidable equality, suc i ≟ suc j and i ≟ j have different types, so we have to deconstruct and reconstruct our Dec values at the recursive call, fixing up the proofs depending on which side we’re on. One can go to the effort of proving that 1ₘd (suc i) (suc j) ≡ 1ₘd i j , but I’m sure one would rather not. It would be an afterthought.

The sneakiest trick I pulled (though admittedly the one with least impact) is in the index zero . Notice that, in the case *ₘ-identityˡ M zero k , the right-hand side of our equation is M zero k . The zero here comes about because we’re in the case where i = zero . However, working from the left of the equation, we see that we get a new zero from sum . Specifically, sum {suc n} v = v zero + sum {n} (v ∘ suc) , and the zero in this expression is something we made up from nothing. Thus, we benefited from the fact that all zero es are judgementally equal.

It may seem obvious that there is only one zero , but when reïmplementing Fin as thinnings from 1, this is no longer easy. To recap, here is a definition of thinnings.

The intended semantics is that a value of type Th m n is the instructions for how to take a vector of length n and produce a vector of smaller length m by only keeping some of the original elements. Thus, a value of Th 1 n describes how to pick 1 element out of a vector of length n .

Th 1 n is equivalent to Fin n . Moreover, the induction on them is very similar, as shown by this conversion function.

There is no case for stop , as 1 ≠ 0 . keep means that we’re keeping the first element, or in other words choosing the first element, so this corresponds to the index zero . drop means that we’re not keeping the first element, so the index must be somewhere later.

In the keep case, θ has type Th 0 n . Referring back to the semantics, this means that we’re not going to pick any more elements, so it must be that θ is n -many drop s followed by a stop . However, we cannot see this judgementally, so whenever we come across a case keep θ , we must reason it equal to any other representation of zero . In our proof *ₘ-identityˡ , this means adding a fixup line at the end of the i = zero case that rewrites by a propositional equality proof.

Conclusions

I like this case study because several of the recommendations I make go against one piece of common wisdom or another. One really has to think critically about requirements rather than just doing what has worked before (or, in my case, try out every possible approach and see which one works best).

It tells us something about the Standard Library, or more generally about the requirements of any standard library for Agda. In particular, we can’t replace Fin by Th 1 and expect everything to keep working, and even when we have Dec proofs, sometimes a Bool version of the same thing is necessary. Perhaps we should have a type EDec for extrinsic decidability, complementing intrinsic decidability from Dec .

We can take this all the way back to the fact that Σ, 2, and large eliminations are enough to implement binary sums. It’s interesting to note, though, the differences at the level of judgemental equality. Indeed, proj₁ (suc i ≟ᴱ suc j) computes to proj₁ (i ≟ᴱ j) , as we wanted.

The second definition of Vec is similar to stdlib’s Data.Table . I import it here just so you can click and find the definition.

The only difference between my Vec and Table is that Table is a single-field record rather than just a definition. The technique of creating a single-field record is often a useful one. For example, if we had dependent tables, where elements at different indices could have different types, the following definition would be best.

When we have a term t : DTable n A , unification can easily recover both n and A because they are there in the type. Instead of DTable n A computing to (i : Fin n) → A i , the record declaration intuitively blocks the computation until we explicitly use the get accessor. Unification has trouble with inferring A from (i : Fin n) → A i because it is a higher-order term. However, in our specific case for Vec (as in stdlib’s Table ), A is not a higher-order term, and there are no problems extracting n and A from Fin n → A . Thus stdlib’s definition of Table is overkill.

Questions and comments about all of this post are welcome.