A Visual Introduction to Function Kernels

In the last few posts we’ve focused heavily on matrices and their applications. In this post we’re going to use matrices to learn about Kernels.

The Kernel of a function is the set of points that the function sends to 0 0 0. Amazingly, once we know this set, we can immediately characterize how the matrix (or linear function) maps its inputs to its outputs.

I hope that by the end of this post you will:

Understand what a Kernel of a function is and how it helps us understand a function better. Realize that the inverses of output points are always some translation of the Kernel (for linear functions). See that there are many pretty patterns and coincidences that flow out of the properties of linear functions.

Functions Across Spaces

In previous posts, we noticed how matrices are just linear functions. We found that the matrices we studied just rotate or stretch a vector in some way.

An example of a function that stretches its input. Source: http://developer.apple.com. An example of a function that stretches its input. Source: http://developer.apple.com.

But we only studied square matrices (i.e. 2x2 or 3x3 matrices). What happens when our matrices aren’t square?

Let’s start with a 2x3 matrix that represents a linear function f f f. Let’s study a function f f f defined by the matrix F F F:

F = [ 2 1 0 0 1 3 ] F = \begin{bmatrix} 2 & 1 & 0 \\ 0 & 1 & 3 \end{bmatrix} F=[20​11​03​]

What happens if I apply this matrix on a vector v = [ 3 2 1 ] v = \begin{bmatrix} 3 \\ 2 \\ 1 \end{bmatrix} v=⎣⎡​321​⎦⎤​?

Let’s find out:

F v = [ 2 1 0 0 1 3 ] [ 3 2 1 ] F v = [ 8 5 ] \begin{aligned} F v &= \begin{bmatrix} 2 & 1 & 0 \\ 0 & 1 & 3 \end{bmatrix} \begin{bmatrix} 3 \\ 2 \\ 1 \end{bmatrix} \\ F v &= \begin{bmatrix} 8 \\ 5 \end{bmatrix} \end{aligned} FvFv​=[20​11​03​]⎣⎡​321​⎦⎤​=[85​]​

In other words, f ( [ 3 2 1 ] ) = [ 8 5 ] ) f(\begin{bmatrix} 3 \\ 2 \\ 1 \end{bmatrix}) = \begin{bmatrix} 8 \\ 5 \end{bmatrix}) f(⎣⎡​321​⎦⎤​)=[85​]).

So f f f effectively takes a point in 3 Dimensions, ( [ 3 2 1 ] \begin{bmatrix}3 \\ 2 \\ 1\end{bmatrix} ⎣⎡​321​⎦⎤​) and sends it to a point in 2 Dimensions ( [ 8 5 ] \begin{bmatrix} 8 \\ 5 \end{bmatrix} [85​]). We can see this below:

f f f maps points from R 3 R^3 R 3 to R 2 R^2 R 2 . maps points fromto

We interact with functions that take points from 3D to 2D all the time. For instance, everytime you take a picture with a camera you are taking a 3D space (the world you see), and collapsing it onto a 2D space (the camera sensor).

A camera converts a 3D object (the tree) into a 2d representation (image). Source: Wikipedia. A camera converts a 3D object (the tree) into a 2d representation (image). Source: Wikipedia.

The input space for f f f is R 3 R^3 R3 and the output space is R 2 R^2 R2. More formally, we write this as:

f : R 3 → R 2 f: R^3 \rightarrow R^2 f:R3→R2

Losing Information

Returning to the example of cameras, when we take pictures, we squash a 3D world onto a 2D sensor. In the process, we lose some amount of information primarily related to depth.

Specifically, points that are far away will appear close to each other even though they may be quite distant.

Eventually, points infinitely far away on the horizon all collapse onto the same point. We can see this in the example image below.

Notice how the pillars above appear closer and closer together even though they are staying the same distance apart. We have lost information about the distance between this pillars when converting to 2D. Image source: Unsplash. Notice how the pillars above appear closer and closer together even though they are staying the same distance apart. We have lost information about the distance between this pillars when converting to 2D. Image source: Unsplash.

So just like a camera, will our function also “lose” some information when it moves points from 3D to 2D? Will it collapse multiple points from the input to the same point in the output?

The Kernel - Set of Points that Map to 0

To answer this, let’s start by seeing all the points v = [ v 1 v 2 v 3 ] v = \begin{bmatrix} v_1 \\ v_2 \\ v_3 \end{bmatrix} v=⎣⎡​v1​v2​v3​​⎦⎤​ that map onto the origin of the output space - [ 0 0 ] \begin{bmatrix} 0 \\ 0 \end{bmatrix} [00​]. This gives us a good starting point for understanding which points from our input hit the same point in the output.

Which points map to [ 0 0 ] \begin{bmatrix}0 \\ 0\end{bmatrix} [ 0 0 ​ ] ? Which points map to

We want to solve:

F v = [ 0 0 ] [ 2 1 0 0 1 3 ] [ x y z ] = [ 0 0 ] \begin{aligned} Fv &= \begin{bmatrix} 0 \\ 0 \end{bmatrix} \\ \begin{bmatrix} 2 & 1 & 0 \\ 0 & 1 & 3 \end{bmatrix} \begin{bmatrix} x \\ y \\ z \end{bmatrix} &= \begin{bmatrix} 0 \\ 0 \end{bmatrix} \end{aligned} Fv[20​11​03​]⎣⎡​xyz​⎦⎤​​=[00​]=[00​]​

Carrying out this multiplication, we see this is satisfied when:

2 x + y = 0 2x + y = 0 2x+y=0 y + 3 z = 0 y + 3z = 0 y+3z=0

Solving this for each variable in terms of y y y , we find: x = − y 2 z = − y 3 \begin{aligned} x &= -\frac{y}{2} \\ z &= -\frac{y}{3} \end{aligned} x z ​ = − 2 y ​ = − 3 y ​ ​ So, f − 1 ( [ 0 0 ] ) f^{-1}(\begin{bmatrix} 0 \\ 0\end{bmatrix}) f − 1 ( [ 0 0 ​ ] ) is a line parameterized by: x = − t 2 y = t z = − t 3 \begin{aligned} x &= -\frac{t}{2} \\ y &= t \\ z &= -\frac{t}{3} \\ \end{aligned} x y z ​ = − 2 t ​ = t = − 3 t ​ ​ for some t ∈ R t \in R t ∈ R .

This line is shown below. Some points on this line are: v 1 = [ 0 0 0 ] v_1 = \begin{bmatrix} 0 \\ 0 \\ 0 \end{bmatrix} v1​=⎣⎡​000​⎦⎤​, v 2 = [ − 3 6 − 2 ] v_2 = \begin{bmatrix} -3 \\ 6 \\ -2 \end{bmatrix} v2​=⎣⎡​−36−2​⎦⎤​, v 3 = [ − 4 . 5 9 − 3 ] v_3 = \begin{bmatrix} -4.5 \\ 9 \\ -3 \end{bmatrix} v3​=⎣⎡​−4.59−3​⎦⎤​.

The Kernel of f f f is the set of points that f f f maps to 0 0 0 . In this case, it forms a line. The Kernel ofis the set of points thatmaps to. In this case, it forms a line.

There’s a term for this set - the Kernel. We call it the Kernel because this set of vectors maps to the origin, or the core, of the output space.

In our specific case, the Kernel is a line. When f f f maps this line to the point 0 0 0, we lose information about the line - in the output space, the points on the line are no longer distinguishable.

Returning to our camera analogy, this is similar to how all points on the horizon are no longer distinguishable after the conversion from 3D to 2D. Thus, you can think of the Kernel as a quick way to see how the function compresses or loses information.

Terminology Aside

Let’s get some more quick terminology out of the way before proceeding. We’re going to use the following terms:

Image - the set of outputs of the function (i.e. everything in f ( x ) f(x) f(x)). The image of a point x x x is just f ( x ) f(x) f(x). Pre-Image - the set of inputs for the function (i.e. the x x x in f ( x ) f(x) f(x)). The pre-image of a point y y y is just f − 1 ( y ) f^{-1}(y) f−1(y).

In the above example, the function f f f maps the oval (Pre-Image) on the left to the point (Image) on the right. In the above example, the functionmaps the oval (Pre-Image) on the left to the point (Image) on the right.

Translations of the Kernel - Mapping to [ 1 1 ] \begin{bmatrix} 1 \\ 1 \end{bmatrix} [ 1 1 ​ ]

We found the set of points that map to [ 0 0 ] \begin{bmatrix} 0 \\ 0 \end{bmatrix} [00​] (i.e. the pre-image of the origin). We call this set the Kernel or K K K for short.

Can we now similarly find the set of points that map to [ 1 1 ] \begin{bmatrix} 1 \\ 1 \end{bmatrix} [11​]?

Once you know the pre-image of [ 0 0 ] \begin{bmatrix} 0 \\ 0 \end{bmatrix} [ 0 0 ​ ] , it's super simple to find the pre-image of [ 1 1 ] \begin{bmatrix} 1 \\ 1 \end{bmatrix} [ 1 1 ​ ] or any other point for that matter. We're going to do this to show something really cool:

Finding the pre-image

Let’s start by finding the points that maps to [ 1 1 ] \begin{bmatrix} 1 \\ 1 \end{bmatrix} [11​] as before.

F v = [ 1 1 ] [ 2 1 0 0 1 3 ] [ x y z ] = [ 1 1 ] \begin{aligned} F v &= \begin{bmatrix} 1 \\ 1 \end{bmatrix} \\ \begin{bmatrix} 2 & 1 & 0 \\ 0 & 1 & 3 \end{bmatrix} \begin{bmatrix} x \\ y \\ z \end{bmatrix} &= \begin{bmatrix} 1 \\ 1 \end{bmatrix} \end{aligned} Fv[20​11​03​]⎣⎡​xyz​⎦⎤​​=[11​]=[11​]​

Solving for each variable, we find that this is just the line defined by:

x = 1 − t 2 y = t z = 1 − t 3 \begin{aligned} x &= \frac{1-t}{2} \\ y &= t \\ z &= \frac{1-t}{3} \end{aligned} xyz​=2 1−t​=t=3 1−t​​

for some t ∈ R t \in R t∈R.

f − 1 ( [ 1 1 ] ) f^{-1}(\begin{bmatrix}1 \\ 1 \end{bmatrix}) f − 1 ( [ 1 1 ​ ] ) is the set of points that f f f maps to [ 1 1 ] \begin{bmatrix} 1 \\ 1 \end{bmatrix} [ 1 1 ​ ] . This is also a line. Notice how similar it is to the line for K K K . is the set of points thatmaps to. This is also a line. Notice how similar it is to the line for

Some valid point are:

v = [ 0 1 0 ] v = \begin{bmatrix} 0 \\ 1 \\ 0 \end{bmatrix} v=⎣⎡​010​⎦⎤​, v = [ − 3 7 − 2 ] v = \begin{bmatrix} -3 \\ 7 \\ -2 \end{bmatrix} v=⎣⎡​−37−2​⎦⎤​.

This line looks awfully similar to the line for K K K doesn’t it?

Let’s see them both on the same graph. Notice that they’re parallel to each other!

Notice that f − 1 ( [ 1 1 ] ) f^{-1}(\begin{bmatrix}1 \\ 1\end{bmatrix}) f − 1 ( [ 1 1 ​ ] ) is parallel to K K K . In other words, f − 1 ( [ 1 1 ] ) f^{-1}(\begin{bmatrix}1 \\ 1\end{bmatrix}) f − 1 ( [ 1 1 ​ ] ) is just K K K shifted over. Notice thatis parallel to. In other words,is justshifted over.

Translating the Kernel

So what’s the relation between the two lines we plotted above - f − 1 ( [ 1 1 ] ) f^{-1}(\begin{bmatrix}1 \\ 1\end{bmatrix}) f−1([11​]) and K = f − 1 ( [ 0 0 ] ) K = f^{-1}(\begin{bmatrix} 0 \\ 0 \end{bmatrix}) K=f−1([00​])?

f − 1 ( [ 1 1 ] ) f^{-1}(\begin{bmatrix} 1 \\ 1 \end{bmatrix}) f−1([11​]) is just a translation of K K K.

It is a translation by any vector v ∈ f − 1 ( [ 1 1 ] ) v \in f^{-1}(\begin{bmatrix}1 \\ 1\end{bmatrix}) v∈f−1([11​]).

Adding K K K to v v v gives us the full pre-image, f − 1 ( [ 1 1 ] ) f^{-1}(\begin{bmatrix} 1 \\ 1 \end{bmatrix}) f − 1 ( [ 1 1 ​ ] ) . Addingtogives us the full pre-image,

Or said another way,

f − 1 ( [ 1 1 ] ) = v + K , for any v ∈ f − 1 ( [ 1 1 ] ) \begin{aligned} f^{-1}(\begin{bmatrix}1 \\ 1\end{bmatrix}) = v + K & \text{, for any $v \in f^{-1}(\begin{bmatrix}1 \\ 1\end{bmatrix})$} \\ \end{aligned} f − 1 ( [ 1 1 ​ ] ) = v + K ​ , for any v ∈ f − 1 ( [ 1 1 ​ ] ) ​

This seems kind of too good to be true. Is it? Let’s test it out!

Let’s take a point k ∈ K k \in K k∈K. For instance, k = [ − 3 6 2 ] k = \begin{bmatrix} -3 \\ 6 \\ 2 \end{bmatrix} k=⎣⎡​−362​⎦⎤​. Let’s take a v ∈ f − 1 ( [ 1 1 ] ) v \in f^{-1}(\begin{bmatrix} 1 \\ 1 \end{bmatrix}) v∈f−1([11​]) like v = [ 0 1 0 ] v = \begin{bmatrix} 0 \\ 1 \\ 0 \end{bmatrix} v=⎣⎡​010​⎦⎤​.

What’s f ( k + v ) f(k + v) f(k+v)?

f ( k + v ) = f ( [ − 3 6 2 ] + [ 0 1 0 ] ) = f ( [ − 3 7 2 ] ) = F [ − 3 7 2 ] = [ 2 1 0 0 1 3 ] [ − 3 7 2 ] = [ 1 1 ] \begin{aligned} f(k + v) &= f(\begin{bmatrix} -3 \\ 6 \\ 2 \end{bmatrix} + \begin{bmatrix} 0 \\ 1 \\ 0 \end{bmatrix}) \\ &= f(\begin{bmatrix} -3 \\ 7 \\ 2 \end{bmatrix}) \\ &= F \begin{bmatrix} -3 \\ 7 \\ 2 \end{bmatrix} \\ &= \begin{bmatrix} 2 & 1 & 0 \\ 0 & 1 & 3 \end{bmatrix} \begin{bmatrix} -3 \\ 7 \\ 2 \end{bmatrix} \\ &= \begin{bmatrix} 1 \\ 1 \end{bmatrix} \end{aligned} f(k+v)​=f(⎣⎡​−362​⎦⎤​+⎣⎡​010​⎦⎤​)=f(⎣⎡​−372​⎦⎤​)=F⎣⎡​−372​⎦⎤​=[20​11​03​]⎣⎡​−372​⎦⎤​=[11​]​

So it is indeed the case here that f ( k + v ) f(k+v) f(k+v) is [ 1 1 ] \begin{bmatrix}1 \\1 \end{bmatrix} [11​]!

All Translations of the Kernel are Pre-Images

Ok there’s something kind of mind blowing going on here:

We took one point in f − 1 ( [ 1 1 ] ) f^{-1}(\begin{bmatrix} 1 \\ 1 \end{bmatrix}) f − 1 ( [ 1 1 ​ ] ) . We added K K K to it. And suddenly we got ALL of f − 1 ( [ 1 1 ] ) f^{-1}(\begin{bmatrix} 1 \\ 1 \end{bmatrix}) f − 1 ( [ 1 1 ​ ] ) !

In fact this is true more generally!

If you give me ANY point that maps to some point f ( v ) f(v) f ( v ) , say v v v , then I can find ALL the points that map to f ( v ) f(v) f ( v ) by just adding v + K v+K v + K .

Breaking Down Why

Let’s break the above statement down into two parts.

First, we’re saying that given some v v v , all points in the set v + K v+K v + K will map to the same place as v v v (i.e. f ( v ) = f ( v + K ) f(v) = f(v+K) f ( v ) = f ( v + K ) ). Next, these are ALL the points that map to f ( v ) f(v) f ( v ) . Or, every point that maps to f ( v ) f(v) f ( v ) must be in the set v + K v+K v + K .

The first statement (left hand side) simply states that all points on the line v + K v+K v + K must map to f ( v ) f(v) f ( v ) . The second statement says that if a point maps to f ( v ) f(v) f ( v ) , like A A A , and B B B above, then they must also fall on the line v + K v+K v + K . The first statement (left hand side) simply states that all points on the linemust map to. The second statement says that if a point maps to, like, andabove, then they must also fall on the line

Let’s prove each of the above statements more formally, starting with the first.

1. all points in the set v + K v+K v + K will map to the same place as v v v

A more formal way of saying this is:

f ( v + K ) = f ( v ) , for any v ∈ R 3 f(v+K) = f(v) \text{, for any } v \in R^3 f(v+K)=f(v), for any v∈R3

Let’s break down why this is true. Take any k ∈ K k \in K k∈K (in the kernel). Then,

f ( v + k ) = f ( v ) + f ( k ) Since is a linear function f f ( v + k ) = f ( v ) + 0 Since k ∈ K f ( v + k ) = f ( v ) \begin{aligned} f(v+k) &= f(v) + f(k) & \text{Since $f$ is a linear function} \\ f(v+k) &= f(v) + 0 & \text{Since } k \in K\\ f(v+k) &= f(v) \end{aligned} f(v+k)f(v+k)f(v+k)​=f(v)+f(k)=f(v)+0=f(v)​Since f is a linear functionSince k∈K​

The below video shows this visually.

By decomposing v + k v+k v + k into v v v , and k k k , we see that f ( v + k ) = f ( v ) f(v+k) = f(v) f ( v + k ) = f ( v ) .

Additionally, given this is true for some v + k v+k v+k, this is true for all points on the line v + K v+K v+K. The reason is that the different amounts of K K K all contribute nothing different and it’s only the value of v v v that matters to f f f. This is shown below:

Any point in K K K , such as k k k , k ′ k' k ′ , and k ′ ′ k'' k ′ ′ , does not change the result of f f f . Hence, f ( v + K ) = f ( v ) f(v+K) = f(v) f ( v + K ) = f ( v ) .

Let’s now move to the next statement.

2. Every point that maps to f ( v ) f(v) f ( v ) must be in the set v + K v+K v + K

Essentially, this is saying that there can be no point w w w such that w w w maps to f ( v ) f(v) f(v) but is not in v + K v+K v+K.

Let’s prove this.

Choose any v , w v, w v,w such that f ( v ) = v ′ f(v) = v' f(v)=v′ and f ( w ) = v ′ f(w) = v' f(w)=v′. We wish to show that w ∈ v + K w \in v+K w∈v+K.

Let w ′ = w − v w' = w - v w ′ = w − v . Then f ( w ′ ) = f ( w − v ) = f ( w ) − f ( v ) = 0 f(w') = f(w - v) = f(w) - f(v) = 0 f ( w ′ ) = f ( w − v ) = f ( w ) − f ( v ) = 0 . Hence w ′ ∈ K w' \in K w ′ ∈ K (as all points that map to 0 0 0 are in K K K ). Thus, v + w ′ ∈ v + K v + w' \in v +K v + w ′ ∈ v + K . Since v + w ′ = v + w − v = w v+w' = v + w - v = w v + w ′ = v + w − v = w , w ∈ v + K w \in v +K w ∈ v + K .

So we’ve successfully proved our two points!

The Relation Between Translations of K K K and Points in the Image

We’ve already seen something really cool - every translation of K K K is the full pre-image of a point in f f f.

Now is there any relation between how far apart two translations of K K K are (say v + K v+K v+K and w + K w+K w+K) and how far apart their images are ( f ( v ) f(v) f(v), f ( w ) f(w) f(w))?

Yes!

If v + K v+K v + K and w + K w+K w + K are v ′ v' v ′ apart, their images will be f ( v ′ ) f(v') f ( v ′ ) apart!



If v + K v+K v + K and w + K w+K w + K are v ′ v' v ′ apart, their images will be f ( v ′ ) f(v') f ( v ′ ) apart. Ifandareapart, their images will beapart.

Why is this the case? It again follows pretty simply:

w + K = v + K + v ′ + K Since and are apart w + K v + K v ′ w + K = v + v ′ + K f ( w + K ) = f ( v + v ′ + K ) apply f f ( w ) = f ( v + v ′ ) Since f ( w + K ) = f ( w ) f ( w ) = f ( v ) + f ( v ′ ) Hence f(v’) apart \begin{aligned} w+K &= v+K + v'+K & \text{Since $w+K$ and $v+K$ are $v'$ apart} \\ w+K &= v+v' + K & \text{ } \\ f(w+K) &= f(v+v'+K) & \text{apply $f$} \\ f(w) &= f(v+v') & \text{Since $f(w+K) = f(w)$} \\ f(w) &= f(v) + f(v') & \text{Hence f(v') apart} \end{aligned} w+Kw+Kf(w+K)f(w)f(w)​=v+K+v′+K=v+v′+K=f(v+v′+K)=f(v+v′)=f(v)+f(v′)​Since w+K and v+K are v′ apart apply fSince f(w+K)=f(w)Hence f(v’) apart​

Overall Space

Let’s now take a step back and view what’s happening in the overall space.

Every point in the image can be seen as the image of some translation of K K K. As we move K K K around, we get new points in the image!

Conclusion

We’ve now seen some really cool things that you may not have noticed before:

Every matrix is a linear function and that linear function will have some kernel K K K that maps to 0 0 0 . All pre-images of output points are just going to be translations of K K K . If v ′ v' v ′ is the distance between the translations of K K K , f ( v ′ ) f(v') f ( v ′ ) is the distance between their images.

The last point actually leads us to the first isomorphism theorem of group theory. This broadly states that the relation between the sets of pre-images of a special type of function known as a homomorphism (in our case f f f) is the exact same as the relation between the set of output points (we’ll go into this in the next blog post!).

There are many practical uses of this knowledge but I wanted to share it for simpler reasons. Sometimes math is just pretty - it has all these cool properties that fit together so nicely that you can’t help but enjoy seeing them.

For example:

Who would have thought that all the pre-image sets are just translations of each other? Or that the relation between these pre-image sets mirrors the relation between the points in the image?

I hope you enjoyed getting a taste of some abstract algebra and I’ll see you in the next post!