January 09, 2018 at 05:15 Tags Math

This is a brief article on affine mappings and their relation to linear mappings, with some applications.

Linear vs. Affine

To start discussing affine mappings, we first have to address a common confusion around what it means for a function to be linear. According to Wikipedia, the term linear function can refer to two distinct concepts, based on the context:

1. In Calculus, a linear function is a polynomial function of degree zero or one; in other words, a function of the form f(x)=ax+b for some constants a and b .
2. In Linear Algebra, a linear function is a linear mapping, or linear transformation.

In this article we're going to be using (2) as the definition of linear, and it will soon become obvious why (1) is confusing when talking about transformations. To avoid some of the jumble going forward, I'm going to be using the term mapping instead of function, but in linear algebra the two are interchangeable (transformation is another synonym, which I'm going to be making less effort to avoid since it's not as overloaded).

Linear transformations

Since we're talking about linear algebra, let's use the domain of vector spaces for the definitions. A transformation (or mapping) f is linear when for any two vectors \vec{v} and \vec{w} (assuming the vectors are in the same vector space, say \mathbb{R}^n ):

f(\vec{v}+\vec{w})=f(\vec{v})+f(\vec{w})

f(k\vec{v})=kf(\vec{v}) for any scalar k

For example, the mapping f(\vec{v})=\langle 3v_1-4v_2,v_2 \rangle - where v_1 and v_2 are the components of \vec{v} - is linear. The mapping g(\vec{v})=\langle v_2,2v_{1}v_{2} \rangle is not linear. In fact, it can be shown that for the kind of vector spaces we're mostly interestedted in here (finite-dimensional spaces like \mathbb{R}^n ), any linear mapping can be represented by a matrix that is multiplied by the input vector. This is because we can represent any vector in terms of the standard basis vectors: \vec{v}=v_1\vec{e}_1+...+v_n\vec{e}_n . Then, since f is linear:

\[f(\vec{v})=f(\sum_{i=1}^{n}v_i\vec{e}_i)=\sum_{i=1}^{n}v_if(\vec{e}_i)\]

If we think of f(\vec{e}_i) as column vectors, this is precisely the multiplication of a matrix by \vec{v} :

\[f(\vec{v}) = \begin{pmatrix} \mid & \mid & & \mid \\ f(\vec{e}_1) & f(\vec{e}_2) & \cdots & f(\vec{e}_n) \\ \mid & \mid & & \mid \\ \end{pmatrix}\begin{pmatrix} v_1 \\ v_2 \\ \vdots \\ v_n \end{pmatrix}\]

This multiplication by a matrix can also be seen as a change of basis for \vec{v} from the standard basis to a basis defined by f . If you want a refresher on how changes of basis work, take a look at my older post on this topic.

Let's get back to our earlier example of the mapping f(\vec{v})=\langle 3v_1-4v_2,v_2 \rangle . We can represent this mapping with the following matrix:

\[\begin{pmatrix} 3 & -4 \\ 0 & 1 \end{pmatrix}\]

Meaning that:

\[f(\vec{v})=\begin{pmatrix} 3 & -4 \\ 0 & 1 \end{pmatrix}\begin{pmatrix} v_1 \\ v_2 \end{pmatrix}\]

Representing linear mappings this way gives us a number of interesting tools for working with them. For example, the associativity of matrix multiplication means that we can represent compositions of mappings by simply multiplying the mapping matrices together. Consider the following mapping:

\[S=\begin{pmatrix} 2 & 0\\ 0 & 2 \end{pmatrix}\]

In equational form: S(\vec{v})=\langle 2v_1,2v_2 \rangle . This mapping stretches the input vector 2x in both dimensions.
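The matrix representation can be checked numerically. Here's a minimal sketch with NumPy; the function name f and the test vectors are my own choices for illustration:

```python
import numpy as np

# The linear mapping f(v) = <3*v1 - 4*v2, v2>, written two ways:
# directly in terms of components, and as multiplication by the
# matrix whose columns are f(e1) and f(e2).
def f(v):
    return np.array([3 * v[0] - 4 * v[1], v[1]])

A = np.array([[3.0, -4.0],
              [0.0,  1.0]])

v = np.array([2.0, 5.0])
w = np.array([-1.0, 3.0])

# The matrix form agrees with the component form.
assert np.allclose(f(v), A @ v)

# The two linearity rules hold for these sample vectors.
assert np.allclose(f(v + w), f(v) + f(w))
assert np.allclose(f(7.0 * v), 7.0 * f(v))
```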
To visualize a mapping, it's useful to examine its effects on some standard vectors. Let's use the vectors (0,0) , (0,1) , (1,0) , (1,1) (the "unit square"). In \mathbb{R}^2 they represent four points that can be connected together as follows:

It's easy to see that when transformed with S , we'll get:

It's also well known that rotation (relative to the origin) can be modeled with the following mapping, with \theta in radians:

\[R=\begin{pmatrix} \cos\theta & \sin\theta \\ -\sin\theta & \cos\theta \end{pmatrix}\]

Transforming our unit square with this matrix we get:

Finally, let's say we want to combine these transformations. To stretch and then rotate a vector, we would do: f(\vec{v})=R(S\vec{v}) . Since matrix multiplication is associative, this can also be rewritten as: f(\vec{v})=(RS)\vec{v} . In other words, we can find a matrix A=RS which represents the combined transformation, and we "find" it by simply multiplying R and S together:

\[A=\begin{pmatrix} \cos\theta & \sin\theta \\ -\sin\theta & \cos\theta \end{pmatrix}\begin{pmatrix} 2 & 0 \\ 0 & 2 \end{pmatrix}=\begin{pmatrix} 2\cos\theta & 2\sin\theta \\ -2\sin\theta & 2\cos\theta \end{pmatrix}\]

And when we multiply our unit square by this matrix we get:
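The claim that composing the mappings equals multiplying their matrices is easy to verify numerically; a small sketch (the angle is an arbitrary choice of mine, and R uses the same sign convention as above):

```python
import numpy as np

theta = np.pi / 6  # arbitrary rotation angle for the demonstration

# Stretch by 2 in both dimensions, then rotate.
S = np.array([[2.0, 0.0],
              [0.0, 2.0]])
R = np.array([[np.cos(theta),  np.sin(theta)],
              [-np.sin(theta), np.cos(theta)]])

A = R @ S  # the combined transformation

# Applying A in one step matches applying S, then R, to each
# corner of the unit square.
square = np.array([[0, 0], [0, 1], [1, 0], [1, 1]], dtype=float)
for v in square:
    assert np.allclose(A @ v, R @ (S @ v))
```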

Affine transformations

Now that we have some good context on linear transformations, it's time to get to the main topic of this post - affine transformations. For an affine space (we'll talk about what this is exactly in a later section), every affine transformation is of the form g(\vec{v})=A\vec{v}+\vec{b} where A is a matrix representing a linear transformation and \vec{b} is a vector. In other words, an affine transformation combines a linear transformation with a translation.

Quite obviously, every linear transformation is affine (just set \vec{b} to the zero vector). However, not every affine transformation is linear. For a non-zero \vec{b} , the linearity rules don't check out. Let's say that:

\[\begin{align*} f(\vec{v})&=A\vec{v}+\vec{b} \\ f(\vec{w})&=A\vec{w}+\vec{b} \end{align*}\]

Then if we add the inputs before applying f , we get:

\[f(\vec{v}+\vec{w})=A(\vec{v}+\vec{w})+\vec{b}\]

Whereas:

\[f(\vec{v})+f(\vec{w})=A\vec{v}+\vec{b}+A\vec{w}+\vec{b}=A(\vec{v}+\vec{w})+2\vec{b}\]

The two results differ by \vec{b} , so additivity fails. The violation of the scalar multiplication rule can be checked similarly.

Let's examine the affine transformation that stretches a vector by a factor of two (similarly to the S transformation we've discussed before) and translates it by 0.5 in both dimensions:

\[f(\vec{v})=\begin{pmatrix} 2 & 0 \\ 0 & 2 \end{pmatrix}\vec{v}+\begin{pmatrix} 0.5 \\ 0.5\end{pmatrix}\]

Here is this transformation visualized:

With some clever augmentation, we can represent affine transformations as a multiplication by a single matrix, if we add another dimension to the vectors:

\[f(\vec{v})=T\vec{v}=\begin{pmatrix} 2 & 0 & 0.5 \\ 0 & 2 & 0.5 \\ 0 & 0 & 1 \end{pmatrix} \begin{pmatrix} v_1 \\ v_2 \\ 1 \end{pmatrix}\]

The translation vector is tacked on the right-hand side of the transform matrix, with a 1 for the extra dimension (the matrix gets 0s in that dimension). The result will always have a 1 in the final dimension, which we can ignore. Affine transforms can be composed similarly to linear transforms, using matrix multiplication.
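The augmentation trick can be checked numerically; a minimal sketch (the helper names f and f_augmented are mine):

```python
import numpy as np

# Affine transform: stretch by 2, then translate by (0.5, 0.5).
A = np.array([[2.0, 0.0],
              [0.0, 2.0]])
b = np.array([0.5, 0.5])

# The same transform as a single 3x3 augmented matrix.
T = np.array([[2.0, 0.0, 0.5],
              [0.0, 2.0, 0.5],
              [0.0, 0.0, 1.0]])

def f(v):
    return A @ v + b

def f_augmented(v):
    v_aug = np.append(v, 1.0)  # tack a 1 onto the input vector
    result = T @ v_aug
    return result[:2]          # drop the trailing 1

v = np.array([1.0, 1.0])
assert np.allclose(f(v), f_augmented(v))
```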
This also makes their composition associative. As an example, let's compose the scaling+translation transform discussed most recently with the rotation transform mentioned earlier. This is the augmented matrix for the rotation:

\[R=\begin{pmatrix} \cos\theta & \sin\theta & 0 \\ -\sin\theta & \cos\theta & 0 \\ 0 & 0 & 1 \end{pmatrix}\]

The composed transform will be f(\vec{v})=T(R(\vec{v}))=(TR)\vec{v} . Its matrix is:

\[TR=\begin{pmatrix} 2 & 0 & 0.5 \\ 0 & 2 & 0.5 \\ 0 & 0 & 1 \end{pmatrix}\begin{pmatrix} \cos\theta & \sin\theta & 0 \\ -\sin\theta & \cos\theta & 0 \\ 0 & 0 & 1 \end{pmatrix}=\begin{pmatrix} 2\cos\theta & 2\sin\theta & 0.5 \\ -2\sin\theta & 2\cos\theta & 0.5 \\ 0 & 0 & 1 \end{pmatrix}\]

The visualization is:
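A quick numerical check that composing affine transforms via their augmented matrices behaves as claimed (the angle and the test vector are arbitrary choices of mine):

```python
import numpy as np

theta = np.pi / 4  # arbitrary angle for the demonstration

# Augmented matrices for the scale+translate transform T and the
# rotation transform R discussed in the text.
T = np.array([[2.0, 0.0, 0.5],
              [0.0, 2.0, 0.5],
              [0.0, 0.0, 1.0]])
R = np.array([[np.cos(theta),  np.sin(theta), 0.0],
              [-np.sin(theta), np.cos(theta), 0.0],
              [0.0,            0.0,           1.0]])

TR = T @ R

v = np.array([1.0, 2.0, 1.0])  # augmented input vector
# Composing via a single matrix product matches applying R, then T.
assert np.allclose(TR @ v, T @ (R @ v))
# The bottom row stays (0, 0, 1), so the result keeps its final 1.
assert np.allclose(TR[2], [0.0, 0.0, 1.0])
```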

Affine subspaces

The previous section defined affine transformations w.r.t. the concept of an affine space, and now it's time to pay the rigor debt. According to Wikipedia, an affine space:

... is a geometric structure that generalizes the properties of Euclidean spaces in such a way that these are independent of the concepts of distance and measure of angles, keeping only the properties related to parallelism and ratio of lengths for parallel line segments.

Since we've been using vectors and vector spaces so far in the article, let's see the relation between vector spaces and affine spaces. The best explanation I found online is the following. Consider the vector space \mathbb{R}^2 , with two lines:

The blue line can be seen as a vector subspace (also known as a linear subspace) of \mathbb{R}^2 . On the other hand, the green line is not a vector subspace because it doesn't contain the zero vector. The green line is an affine subspace. This leads us to a definition:

A subset U \subset V of a vector space is an affine subspace if there exists a u \in U such that U - u = \{x-u \mid x \in U\} is a vector subspace of V .

If you recall the definition of affine transformations from earlier on, this should seem familiar - linear and affine subspaces are related by a translation vector. It can also be said that an affine space is a generalization of a linear space, in that it doesn't require a specific origin point. From Wikipedia, again:

Any vector space may be considered as an affine space, and this amounts to forgetting the special role played by the zero vector. In this case, the elements of the vector space may be viewed either as points of the affine space or as displacement vectors or translations. When considered as a point, the zero vector is called the origin. Adding a fixed vector to the elements of a linear subspace of a vector space produces an affine subspace.
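The definition can be exercised with a concrete line. The sketch below uses the line y = 2x + 1 as a stand-in for the green line; the particular line, the point u and the helper names are my own choices, and the "check" is only a spot check on sampled points, not a proof:

```python
import numpy as np

# Points on the affine line y = 2x + 1. It misses the origin,
# so it is not a vector subspace of R^2.
def on_line(x):
    return np.array([x, 2 * x + 1])

u = on_line(3.0)  # any fixed point on the line works as u

# Translating the line by -u yields points of the form (t, 2t),
# i.e. the linear subspace y = 2x, which does contain the origin.
def in_subspace(v):
    return np.isclose(v[1], 2 * v[0])

p = on_line(-1.0) - u
q = on_line(5.0) - u

assert in_subspace(p) and in_subspace(q)
assert in_subspace(p + q)        # closed under addition
assert in_subspace(4.2 * p)      # closed under scaling
assert in_subspace(np.zeros(2))  # contains the zero vector
```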
One commonly says that this affine subspace has been obtained by translating (away from the origin) the linear subspace by the translation vector.

When mathematicians define new algebraic structures, they don't do it just for fun (well, sometimes they do) but because such structures have properties that can lead to useful generalizations. Affine spaces and transformations also have interesting properties, which make them useful. For example, an affine transformation always maps a line to a line (and not to, say, a parabola); any two triangles can be mapped one onto the other using an affine transform; and so on. This leads to interesting applications in computational geometry and 3D graphics.

Affine functions in linear regression and neural networks

Here I want to touch upon the linear vs. affine confusion again, in the context of machine learning. Recall that Linear Regression attempts to fit a line onto data in an optimal way, the line being defined as the function:

\[y(x) = mx + b\]

But as this article explained, y(x) is not actually a linear function; it's an affine function (because of the constant term b ). Should linear regression be renamed to affine regression? It's probably too late for that :-), but it's good to get the terminology right. Similarly, a single fully connected layer in a neural network is often expressed mathematically as:

\[y(\vec{x})=W\vec{x}+\vec{b}\]

Where \vec{x} is the input vector, W is the weight matrix and \vec{b} is the bias vector. This function is also usually referred to as linear although it's actually affine.
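The non-linearity of such a layer is easy to demonstrate numerically; a minimal sketch with randomly chosen weights (the sizes and names here are arbitrary choices of mine):

```python
import numpy as np

rng = np.random.default_rng(0)

# A single fully connected layer y(x) = Wx + b with random weights;
# the dimensions are picked just for the demonstration.
W = rng.normal(size=(3, 4))
b = rng.normal(size=3)

def layer(x):
    return W @ x + b

x1 = rng.normal(size=4)
x2 = rng.normal(size=4)

# Additivity fails: layer(x1 + x2) = W(x1 + x2) + b, while
# layer(x1) + layer(x2) = W(x1 + x2) + 2b - an extra copy of b.
assert not np.allclose(layer(x1 + x2), layer(x1) + layer(x2))
assert np.allclose(layer(x1 + x2) + b, layer(x1) + layer(x2))
```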