
Let $A$ be a real $m \times n$ matrix. I'll assume that $m \geq n$ for simplicity. It's natural to ask in which direction $v$ does $A$ have the most impact (or the most explosiveness, or the most amplifying power). The answer is \begin{align} \tag{1}v_1 = \,\,& \arg \max_{v \in \mathbb R^n} \quad \| A v \|_2 \\ & \text{subject to } \, \|v\|_2 = 1. \end{align} A natural follow-up question is, after $v_1$, what is the next most explosive direction for $A$? The answer is \begin{align} v_2 = \,\,& \arg \max_{v \in \mathbb R^n} \quad \| A v \|_2 \\ & \text{subject to } \,\langle v_1, v \rangle = 0, \\ & \qquad \qquad \, \, \, \, \|v\|_2 = 1. \end{align} Continuing like this, we obtain an orthonormal basis $v_1, \ldots, v_n$ of $\mathbb R^n$. This special basis of $\mathbb R^n$ tells us the directions that are, in some sense, most important for understanding $A$.
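This characterization is easy to check numerically. The sketch below (not part of the original argument; it uses NumPy, whose `np.linalg.svd` returns the right singular vectors as the rows of `Vt`) confirms that no random unit vector beats the first right singular vector at maximizing $\|Av\|_2$:

```python
import numpy as np

rng = np.random.default_rng(0)
A = rng.standard_normal((5, 3))  # an example m x n matrix with m >= n

# The rows of Vt are the right singular vectors v_1, ..., v_n.
_, S, Vt = np.linalg.svd(A, full_matrices=False)
v1 = Vt[0]  # candidate solution of problem (1)

# sigma_1 = ||A v_1||_2
assert np.isclose(np.linalg.norm(A @ v1), S[0])

# No random unit vector v achieves a larger ||A v||_2.
for _ in range(1000):
    v = rng.standard_normal(3)
    v /= np.linalg.norm(v)
    assert np.linalg.norm(A @ v) <= np.linalg.norm(A @ v1) + 1e-12
```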

Let $\sigma_i = \|A v_i \|_2$ (so $\sigma_i$ quantifies the explosive power of $A$ in the direction $v_i$). Suppose that unit vectors $u_i$ are defined so that $$ \tag{2} A v_i = \sigma_i u_i \quad \text{for } i = 1, \ldots, n. $$ The equations (2) can be expressed concisely using matrix notation as $$ \tag{3} A V = U \Sigma, $$ where $V$ is the $n \times n$ matrix whose $i$th column is $v_i$, $U$ is the $m \times n$ matrix whose $i$th column is $u_i$, and $\Sigma$ is the $n \times n$ diagonal matrix whose $i$th diagonal entry is $\sigma_i$. The matrix $V$ is orthogonal, so we can multiply both sides of (3) by $V^T$ to obtain $$ A = U \Sigma V^T. $$ It might appear that we have now derived the SVD of $A$ with almost zero effort. None of the steps so far have been difficult. However, a crucial piece of the picture is missing -- we do not yet know that the columns of $U$ are pairwise orthogonal.
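As a sanity check (again with NumPy, not part of the original answer), equations (3) and $A = U \Sigma V^T$ can be verified directly for a random matrix:

```python
import numpy as np

rng = np.random.default_rng(1)
m, n = 6, 4
A = rng.standard_normal((m, n))

# Thin SVD: U is m x n, Sigma is n x n diagonal, V is n x n orthogonal.
U, s, Vt = np.linalg.svd(A, full_matrices=False)
Sigma = np.diag(s)
V = Vt.T

# Equation (3): A V = U Sigma.
assert np.allclose(A @ V, U @ Sigma)

# Multiplying both sides on the right by V^T recovers A = U Sigma V^T.
assert np.allclose(A, U @ Sigma @ V.T)
```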

Here is the crucial fact, the missing piece: it turns out that $A v_1$ is orthogonal to $A v_2$: $$ \tag{4} \langle A v_1, A v_2 \rangle = 0. $$ I claim that if this were not true, then $v_1$ would not be optimal for problem (1). Indeed, if (4) were not satisfied, then it would be possible to improve $v_1$ by perturbing it a bit in the direction $v_2$.

Suppose (for a contradiction) that (4) is not satisfied. If $v_1$ is perturbed slightly in the orthogonal direction $v_2$, the norm of $v_1$ does not change (or rather, the change in the norm of $v_1$ is second order in the size of the perturbation, hence negligible). When I walk on the surface of the earth, my distance from the center of the earth does not change. However, when $v_1$ is perturbed in the direction $v_2$, the vector $A v_1$ is perturbed in the non-orthogonal direction $A v_2$, so the change in the norm of $A v_1$ is first order, hence non-negligible. The norm of $A v_1$ can therefore be increased by a non-negligible amount. This means that $v_1$ is not optimal for problem (1), which is a contradiction. I love this argument because: 1) the intuition is very clear; 2) the intuition can be converted directly into a rigorous proof.

A similar argument shows that $A v_3$ is orthogonal to both $A v_1$ and $A v_2$, and so on. The vectors $A v_1, \ldots, A v_n$ are pairwise orthogonal. This means that the unit vectors $u_1, \ldots, u_n$ can be chosen to be pairwise orthogonal, which means the matrix $U$ above is an orthogonal matrix. This completes our discovery of the SVD.
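The pairwise orthogonality of $A v_1, \ldots, A v_n$ is also easy to observe numerically (a quick NumPy check, not part of the argument): the Gram matrix of the columns $A v_i$ should be diagonal.

```python
import numpy as np

rng = np.random.default_rng(2)
A = rng.standard_normal((5, 4))
_, _, Vt = np.linalg.svd(A, full_matrices=False)

AV = A @ Vt.T   # columns are A v_1, ..., A v_n
G = AV.T @ AV   # Gram matrix; entry (i, j) is <A v_i, A v_j>

# Off-diagonal entries vanish: the vectors A v_i are pairwise orthogonal.
assert np.allclose(G, np.diag(np.diag(G)))
```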

To convert the above intuitive argument into a rigorous proof, we must confront the fact that if $v_1$ is perturbed in the direction $v_2$, the perturbed vector $$ \tilde v_1 = v_1 + \epsilon v_2 $$ is not truly a unit vector. (Its norm is $\sqrt{1 + \epsilon^2}$.) To obtain a rigorous proof, define $$ \bar v_1(\epsilon) = \sqrt{1 - \epsilon^2} v_1 + \epsilon v_2. $$ The vector $\bar v_1(\epsilon)$ is truly a unit vector. But as you can easily show, if (4) is not satisfied, then for sufficiently small values of $\epsilon$ we have $$ f(\epsilon) = \| A \bar v_1(\epsilon) \|_2^2 > \| A v_1 \|_2^2 $$ (assuming that the sign of $\epsilon$ is chosen correctly). To show this, just check that $f'(0) \neq 0$. This means that $v_1$ is not optimal for problem (1), which is a contradiction.
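For completeness, expanding $f$ makes the derivative explicit (a short computation not spelled out above; it only uses bilinearity of the inner product): \begin{align} f(\epsilon) &= (1 - \epsilon^2) \, \| A v_1 \|_2^2 + 2 \epsilon \sqrt{1 - \epsilon^2} \, \langle A v_1, A v_2 \rangle + \epsilon^2 \| A v_2 \|_2^2, \\ f'(0) &= 2 \, \langle A v_1, A v_2 \rangle, \end{align} which is nonzero precisely when (4) fails, so a small $\epsilon$ of the appropriate sign increases $f$.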

(By the way, I recommend reading Qiaochu Yuan's explanation of the SVD here. In particular, take a look at "Key lemma # 1", which is what we discussed above. As Qiaochu says, key lemma # 1 is "the technical heart of singular value decomposition".)