Variations on a theme

Back in 2014 I wrote a post called How to Conquer Tensorphobia that should end up on Math ∩ Programming’s “greatest hits” album. One aspect of tensors I neglected to discuss was the connection between the modern views of tensors and the practical views of linear algebra. I feel I need to write this because every year or two I forget why it makes sense.

The basic question is:

What the hell is going on with the outer product of vectors?

The simple answer, the one that has never satisfied me, is that the outer product of $v, w$ is the matrix whose $(i,j)$ entry is the product $v_i w_j$. This doesn’t satisfy me because it’s an explanation by fiat. It lacks motivation and you’re supposed to trust (or verify) that things magically work out. To me this definition is like the definition of matrix multiplication: having it dictated to you before you understand why it makes sense is a cop-out. Math isn’t magic; it needs to make perfect sense.
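For concreteness, here is a minimal sketch of that by-fiat definition in plain Python. The function name `outer` and the use of lists as vectors are illustrative assumptions, not anything from the original post.

```python
def outer(v, w):
    """Return the matrix (a list of rows) whose (i, j) entry is v[i] * w[j]."""
    return [[vi * wj for wj in w] for vi in v]


v = [1, 2, 3]
w = [4, 5, 6]
M = outer(v, w)
for row in M:
    print(row)
```

Nothing in this code explains why the result deserves to be treated as a linear map, which is exactly the complaint.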

The answer I like, and the one I have to re-derive every few years because I never wrote it down, is a little bit longer.

To borrow a programming term, the basic confusion is a type error. We start with two vectors, $v, w$ in some vector space $V$ (let’s say everything is finite dimensional), and we magically turn them into a matrix. Let me reiterate for dramatic effect: we start with vectors, which I have always thought of as objects, things you can stretch, rotate, or give as a gift to your cousin Mauricio. While a matrix is a mapping, a thing that takes vectors as input and spits out new vectors. Sure, you can play with the mappings, feed them kibble and wait for them to poop or whatever. And sure, sometimes vectors are themselves maps, like in the vector space of real-valued functions (where the word “matrix” is a stretch, since it’s infinite dimensional).

But to take two vectors and *poof* get a mapping of all vectors, it’s a big jump. And part of the discomfort is that it feels random. To be happy we have to understand that the construction is natural, or even better canonical, meaning this should be the only way to turn two vectors into a linear map. Then the definition would make sense.

So let’s see how we can do that. Just to be clear, everything we do in this post will be for finite-dimensional vector spaces over $\mathbb{R}$, but we’ll highlight the caveats when they come up.

Dual vector spaces

The first step is understanding how to associate a vector with a linear map in a “natural” or “canonical” way. There is one obvious candidate: if you give me a vector $v$, I can make a linear map by taking the dot product of $v$ with the input. So I can associate

$$v \mapsto \langle v, - \rangle$$

The dash is a placeholder for the input. Another way to say it is to define $\varphi_v(w) = \langle v, w \rangle$ and say the association takes $v \mapsto \varphi_v$. So this “association,” taking $v$ to the inner product, is itself a mapping from $V$ to “maps from $V$ to $\mathbb{R}$.” Note that $\varphi_v(w)$ is linear in $v$ and $w$ because that’s part of the definition of an inner product.
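In programming terms, the association takes a vector and returns a function. The sketch below assumes vectors are plain Python lists and that the inner product is the usual dot product. The names `inner` and `phi` are made up for illustration.

```python
def inner(v, w):
    """The usual dot product on R^n."""
    return sum(vi * wi for vi, wi in zip(v, w))


def phi(v):
    """Turn the vector v into the linear functional <v, ->."""
    return lambda w: inner(v, w)


v = [1.0, 2.0, 3.0]
phi_v = phi(v)  # a function waiting for its input

x = [1.0, 0.0, -1.0]
y = [2.0, 5.0, 0.5]
combo = [2 * xi + yi for xi, yi in zip(x, y)]  # the vector 2x + y

# Linearity in the input: phi_v(2x + y) equals 2 * phi_v(x) + phi_v(y).
print(phi_v(combo), 2 * phi_v(x) + phi_v(y))
```

Note the types: `phi` eats a vector and returns a map, which is the “vector to linear map” conversion the section is after.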

To avoid saying “maps from $V$ to $\mathbb{R}$” all the time, we’ll introduce some notation.

Definition: Let $V$ be a vector space over a field $F$. The set of $F$-linear maps from $V$ to $F$ is called $\text{Hom}_F(V, F)$. More generally, the set of $F$-linear maps from $V$ to another $F$-vector space $W$ is called $\text{Hom}_F(V, W)$.

“Hom” stands for “homomorphism,” and in general it just means “maps with the structure I care about.” For this post $F = \mathbb{R}$, but most of what we say here will be true for any field. If you go deeply into this topic, it matters whether $F$ is algebraically closed, or has finite characteristic, but for simplicity we’ll ignore all of that. We’ll also ignore the fact that these maps are called linear functionals and this is where the name “functional analysis” comes from. All we really want to do is understand the definition of the outer product.

Another bit of notation for brevity:

Definition: Let $V$ be an $F$-vector space. The dual vector space for $V$, denoted $V^*$, is $\text{Hom}_F(V, F)$.

So the “vector-to-inner-product” association we described above is a map $V \to V^*$. It takes in $v \in V$ and spits out $\varphi_v \in V^*$.

Now here’s where things start to get canonical (interesting). First, $V^*$ is itself a vector space. This is an easy exercise, and the details are not too important for us, but I’ll say the key: if you want to add two functions, you just add their (real number) outputs. In fact we can say more:

Theorem: $V$ and $V^*$ are isomorphic as vector spaces, and the map $v \mapsto \varphi_v$ is the canonical isomorphism.

Confessions of a mathematician: we’re sweeping some complexity under the rug. When we upgraded our vector space to an inner product space, we fixed a specific (but arbitrary) inner product on $V$. For finite dimensional vector spaces it makes no difference, because every finite-dimensional $\mathbb{R}$-inner product space is isomorphic to $\mathbb{R}^n$ with the usual inner product. But the theorem is devastatingly false for infinite-dimensional vector spaces. There are two reasons: (1) there are many (non-canonical) choices of inner products and (2) the mapping for any given inner product need not span $V^*$. Luckily we’re in finite dimensions so we can ignore all that. [Edit: see Emilio’s comments for a more detailed discussion of what’s being swept under the rug, and how we’re ignoring the categorical perspective when we say “natural” and “canonical.”]

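Before we make sense of the isomorphism let’s talk more about $V^*$. First off, it’s not even entirely obvious that $V^*$ is finite-dimensional. On one hand, if $v_1, \dots, v_n$ is a basis of $V$ then we can quickly prove that $\varphi_{v_1}, \dots, \varphi_{v_n}$ are linearly independent in $V^*$. Indeed, if they weren’t then there’d be some nontrivial linear combination $a_1 \varphi_{v_1} + \dots + a_n \varphi_{v_n}$ that is the zero function, meaning that for every vector $w$, the following is zero

$$a_1 \langle v_1, w \rangle + \dots + a_n \langle v_n, w \rangle = 0$$

But since the inner product is linear in both arguments we get that $\langle a_1 v_1 + \dots + a_n v_n, w \rangle = 0$ for every $w$. And this can only happen when $a_1 v_1 + \dots + a_n v_n$ is the zero vector (prove this). Since the $v_i$ form a basis, that forces every $a_i = 0$, contradicting the assumption that the combination was nontrivial.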

One consequence is that the linear map $v \mapsto \varphi_v$ is injective. So we can think of $V$ as “sitting inside” $V^*$. Now here’s a very slick way to show that the $\varphi_{v_i}$ span all of $V^*$. First we can assume our basis $v_1, \dots, v_n$ is actually an orthonormal basis with respect to our inner product (this is without loss of generality). Then we write any linear map $f \in V^*$ as

$$f = f(v_1) \varphi_{v_1} + \dots + f(v_n) \varphi_{v_n}$$

To show these two are actually equal, it’s enough to show they agree on a basis for $V$. That is, if you plug in $v_i$ to the function on the left- and right-hand side of the above, you’ll get the same thing. The orthonormality of the basis makes it work, since all the irrelevant inner products are zero.
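Here is a small numerical sketch of that expansion, assuming $V = \mathbb{R}^2$ with the usual dot product. The rotated orthonormal basis and the functional `f` are arbitrary choices made for illustration.

```python
def inner(v, w):
    """The usual dot product on R^n."""
    return sum(vi * wi for vi, wi in zip(v, w))


def phi(v):
    """Turn the vector v into the linear functional <v, ->."""
    return lambda w: inner(v, w)


# An orthonormal basis of R^2 that is not the standard one.
basis = [[0.6, 0.8], [-0.8, 0.6]]


def f(x):
    """An arbitrary linear functional on R^2, chosen for the example."""
    return 2 * x[0] - 7 * x[1]


# The coefficients of f in the dual basis are just f evaluated on the basis.
coeffs = [f(b) for b in basis]


def f_rebuilt(x):
    """The sum of f(v_i) * phi_{v_i}(x) over the basis vectors v_i."""
    return sum(c * phi(b)(x) for c, b in zip(coeffs, basis))


for x in ([1, 0], [0, 1], [3, -2]):
    print(f(x), f_rebuilt(x))  # agree up to floating point rounding
```

If you swap in a basis that is not orthonormal, the two columns stop agreeing, which is where the orthonormality assumption earns its keep.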

In case you missed it, that completes the proof that $V$ and $V^*$ are isomorphic. Now when I say that the isomorphism is “canonical,” I mean that if you’re willing to change the basis of $V$ and $V^*$, then $v \mapsto \varphi_v$ is the square identity matrix, i.e. the only isomorphism between any two finite-dimensional vector spaces (up to a change of basis).

Tying in tensors

At this point we have a connection between single vectors $v$ and linear maps $V^*$ whose codomain has dimension 1. If we want to understand the outer product, we need a connection between pairs of vectors and matrices, i.e. $\text{Hom}(V,V)$. In other words, we’d like to find a canonical isomorphism between $V \times V$ and $\text{Hom}(V,V)$. But already it’s not possible because the spaces have different dimensions. If $\dim(V) = n$ then the former has dimension $2n$ and the latter has dimension $n^2$. So any “natural” relation between these spaces has to be a way to embed $V \times V \subset \text{Hom}(V,V)$ as a subspace via some injective map.

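There are two gaping problems with this approach. First, the outer product is not linear as a map from $V \times V \to \text{Hom}(V,V)$. To see this, take any $v, w \in V$, pick any scalar $\lambda \in \mathbb{R}$. Scaling the pair $(v, w)$ means scaling both components to $(\lambda v, \lambda w)$, and so the outer product is the matrix $\lambda^2 vw^T$, rather than the $\lambda vw^T$ that linearity would require.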
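The failure of linearity is easy to check numerically. This is a minimal sketch with made-up helper names (`outer`, `scale`) and small integer vectors chosen for the example.

```python
def outer(v, w):
    """Matrix whose (i, j) entry is v[i] * w[j]."""
    return [[vi * wj for wj in w] for vi in v]


def scale(c, M):
    """Multiply every entry of the matrix M by the scalar c."""
    return [[c * entry for entry in row] for row in M]


v = [1, 2]
w = [3, 4]
lam = 3

# Scale the pair (v, w) as a single element of V x V.
scaled_pair = outer([lam * a for a in v], [lam * b for b in w])

print(scaled_pair == scale(lam ** 2, outer(v, w)))  # picks up lambda squared
print(scaled_pair == scale(lam, outer(v, w)))       # what linearity would need
```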

The second problem is that the only way to make $V \times V$ a subspace of $\text{Hom}(V,V)$ (up to a change of basis) is to map $(v, w)$ to the first two rows of a matrix with zeros elsewhere. This is canonical but it doesn’t have the properties that the outer product promises us. Indeed, the outer product lets us uniquely decompose a matrix as a “sum of rank 1 matrices,” but we don’t get a unique decomposition of a matrix as a sum of these two-row things. We also don’t even get a well-defined rank by decomposing into a sum of two-row matrices (you can get cancellation by staggering the sum). This injection is decisively useless.

It would seem like we’re stuck, until we think back to our association between $V$ and $V^*$. If we take one of our two vectors, say $v$, and pair it with $w$, we can ask how $(\varphi_v, w)$ could be turned into a linear map in $\text{Hom}(V,V)$. A few moments of guessing and one easily discovers the map

$$x \mapsto \varphi_v(x) w = \langle v, x \rangle w$$

In words, we’re scaling $w$ by the inner product of $x$ and $v$. In geometric terms, we project onto $v$ and scale $w$ by the signed length of that projection. Let’s call this map $\beta_{v,w}(x)$, so that the association maps $(v, w) \mapsto \beta_{v,w}$. The thought process of “easily discovering” this is to think, “What can you do with a function $\varphi_v$ and an input $x$? Plug it in. Then what can you do with the resulting number and a vector $w$? Scale $w$.”
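The “plug it in, then scale” recipe translates almost word for word into code. In this sketch the names `beta` and `matvec` are illustrative, and the comparison against the matrix with entries $w_i v_j$ is included only as a sanity check.

```python
def inner(v, w):
    """The usual dot product on R^n."""
    return sum(vi * wi for vi, wi in zip(v, w))


def beta(v, w):
    """Return the linear map x -> <v, x> w."""
    return lambda x: [inner(v, x) * wi for wi in w]


def matvec(M, x):
    """Apply the matrix M (a list of rows) to the vector x."""
    return [inner(row, x) for row in M]


v = [1, 2, 3]
w = [4, 5, 6]
x = [1, 0, -1]

# The matrix w v^T, whose (i, j) entry is w[i] * v[j].
W = [[wi * vj for vj in v] for wi in w]

print(beta(v, w)(x))  # plug x into <v, ->, then scale w by the result
print(matvec(W, x))   # the same vector, computed as a matrix-vector product
```

Notice that `beta` never builds a matrix. It is a map from the start, so the type error from the introduction never appears.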

If you look closely you’ll see we’ve just defined the outer product. This is because the outer product works by saying $wv^T$ is a matrix, which acts on a vector $x$ by doing $(wv^T)x = w(v^T x) = \langle v, x \rangle w$. But the important thing is that, because $V$ and $V^*$ are canonically isomorphic, this is a mapping

$$V \times V = V^* \times V \to \text{Hom}(V,V)$$

Now again, this mapping is not linear. In fact, it’s bilinear, and if there’s one thing we know about bilinear maps, it’s that tensors are their gatekeepers. If you recall our previous post on tensorphobia, this means that this bilinear map “factors through” the tensor product in a canonical way. So the true heart of this association is a map $B: V \otimes V \to \text{Hom}(V,V)$ defined by

$$B(v \otimes w) = \beta_{v,w}$$

And now the punchline,

Theorem: $B$ is an isomorphism of vector spaces.

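Proof. If $v_1, \dots, v_n$ is a basis for $V$ then it’s enough to show that $\beta_{v_i, v_j}$ forms a basis for $\text{Hom}(V,V)$. Since we already know $\dim(\text{Hom}(V,V)) = n^2$ and there are $n^2$ of the $\beta_{v_i, v_j}$, all we need to do is show that the $\beta$’s are linearly independent. For brevity let me remove the $v$’s and call $\beta_{i,j} = \beta_{v_i, v_j}$.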

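Suppose they are not linearly independent. Then there is some choice of scalars $a_{i,j}$, not all zero, so that the linear combination below is the identically zero function

$$\sum_{i,j=1}^n a_{i,j} \beta_{i,j} = 0$$

In other words, if I plug in any $v_k$ from my (orthonormal) basis, the result is zero. So let’s plug in $v_1$.

$$0 = \sum_{i,j=1}^n a_{i,j} \beta_{i,j}(v_1) = \sum_{i,j=1}^n a_{i,j} \langle v_i, v_1 \rangle v_j = \sum_{j=1}^n a_{1,j} v_j$$

The orthonormality makes all of the $\langle v_i, v_1 \rangle = 0$ when $i \neq 1$, so we get a linear combination of the $v_j$ being zero. Since the $v_j$ form a basis, it must be that all the $a_{1,j} = 0$. The same thing happens when you plug in $v_2$ or any other $v_k$, and so all the $a_{i,j}$ are zero, proving linear independence.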

This theorem immediately implies some deep facts, such as that every matrix can be uniquely decomposed as a linear combination of the $\beta_{i,j}$’s. Moreover, facts like the $\beta_{i,j}$’s being rank 1 are immediate: by definition the maps scale a single vector by some number. So of course the image will be one-dimensional. Finding a useful basis with which to decompose a matrix is where things get truly fascinating, and we’ll see that next time when we study the singular value decomposition.
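To make the decomposition concrete, here is a sketch using the standard basis of $\mathbb{R}^2$, which is an assumption made to keep the example short. With that basis, the map built from the $i$-th and $j$-th basis vectors sends $x$ to $x_i e_j$, so its coefficient in the expansion of a matrix `M` is the entry in row $j$, column $i$. The helper names are illustrative.

```python
def inner(v, w):
    """The usual dot product on R^n."""
    return sum(vi * wi for vi, wi in zip(v, w))


def beta(v, w):
    """Return the linear map x -> <v, x> w."""
    return lambda x: [inner(v, x) * wi for wi in w]


def matvec(M, x):
    """Apply the matrix M (a list of rows) to the vector x."""
    return [inner(row, x) for row in M]


def unit(n, k):
    """The k-th standard basis vector of R^n (0-indexed)."""
    return [1 if t == k else 0 for t in range(n)]


M = [[1, 2], [3, 4]]
n = len(M)


def M_rebuilt(x):
    """Apply the sum over i, j of M[j][i] * beta_{e_i, e_j} to x."""
    out = [0] * n
    for i in range(n):
        for j in range(n):
            term = beta(unit(n, i), unit(n, j))(x)
            out = [o + M[j][i] * t for o, t in zip(out, term)]
    return out


x = [5, -1]
print(matvec(M, x))
print(M_rebuilt(x))  # same vector, assembled from rank 1 pieces
```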

In the meantime, this understanding generalizes nicely (via induction/recursion) to higher dimensional tensors. And you don’t need to talk about slices or sub-tensors or lose your sanity over $n$-tuples of indices.

Lastly, all of this duality stuff provides a “coordinate-free” way to think about the transpose of a linear map. We can think of the “transpose” operation as a linear map $-^T : \text{Hom}(V,W) \to \text{Hom}(W^*, V^*)$ which (even in infinite dimensions) has the following definition. If $f \in \text{Hom}(V,W)$ then $f^T$ is a linear map taking $h \in W^*$ to $f^T(h) \in V^*$. The latter is a function in $V^*$, so we need to say what it does on inputs $v \in V$. The only definition that doesn’t introduce any type errors is $(f^T(h))(v) = h(f(v))$. A more compact way to say this is that $f^T(h) = h \circ f$.
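As a sanity check that this abstract transpose agrees with the matrix one, here is a sketch with $V = \mathbb{R}^3$, $W = \mathbb{R}^2$ and the usual dot products. The matrix `A` and the vectors are arbitrary, and `transpose_map` is a hypothetical name for the operation described above.

```python
def inner(v, w):
    """The usual dot product on R^n."""
    return sum(vi * wi for vi, wi in zip(v, w))


def phi(v):
    """Turn the vector v into the linear functional <v, ->."""
    return lambda w: inner(v, w)


def matvec(M, x):
    """Apply the matrix M (a list of rows) to the vector x."""
    return [inner(row, x) for row in M]


def transpose_map(f):
    """Given f: V -> W, return the map W* -> V* that sends h to h composed with f."""
    return lambda h: (lambda v: h(f(v)))


A = [[1, 2, 3], [4, 5, 6]]          # a linear map from R^3 to R^2
f = lambda v: matvec(A, v)

u = [1, -1]
h = phi(u)                          # an element of W*
pulled_back = transpose_map(f)(h)   # an element of V*

# The matrix transpose of A, applied to u, should represent the same functional.
A_T = [list(col) for col in zip(*A)]
x = [1, 0, 2]
print(pulled_back(x), phi(matvec(A_T, u))(x))
```

The only operations used in `transpose_map` are “apply `f`” and “apply `h`”, which is the sense in which no other definition type-checks.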

Until next time!