$\begingroup$

Here is a slightly less ad-hoc approach to deriving the formula.

You look at the polynomial $ax^2+bx+c$ and you think of it as being composed of two kinds of indeterminates: the coefficients $a$, $b$, $c$, and the variable $x$. Given the factorization $ax^2+bx+c=a(x-r_1)(x-r_2)$, you want to find an expression for $r_1$ and $r_2$ in terms of $a,b,c$ involving only the operations $+,-,\times,\div$ and $\sqrt[n]{}$.

But how are $r_1$ and $r_2$ related to $a,b$ and $c$? If you look at the expression $ax^2+bx+c=a(x-r_1)(x-r_2)$, it is easy to compute that $b=-a(r_1+r_2)$ and $c=ar_1r_2$.
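If you want to convince yourself of these two relations on a concrete case, here is a minimal sketch in Python. The function names `coefficients_from_roots` and `evaluate` are my own, chosen just for this illustration; the sample values $a=2$, $r_1=3$, $r_2=-5$ are arbitrary.

```python
def coefficients_from_roots(a, r1, r2):
    """Expand a*(x - r1)*(x - r2) and return the coefficients (a, b, c)."""
    b = -a * (r1 + r2)  # coefficient of x
    c = a * r1 * r2     # constant term
    return a, b, c


def evaluate(a, b, c, x):
    """Evaluate a*x^2 + b*x + c at the point x."""
    return a * x**2 + b * x + c


# Arbitrary sample: leading coefficient 2, roots 3 and -5.
a, b, c = coefficients_from_roots(2, 3, -5)
print(a, b, c)                # 2 4 -30
print(evaluate(a, b, c, 3))   # 0
print(evaluate(a, b, c, -5))  # 0
```

Both roots make the expanded polynomial vanish, as they should.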

Intuitively, because you know that $(r_1+r_2)=-\frac ba$, determining $r_1$ and $r_2$ is the same as determining $(r_1-r_2)$. Let $E=(r_1-r_2)$ and note that $2r_1=(r_1+r_2)+(r_1-r_2)=-\frac ba+E$ and $2r_2=(r_1+r_2)-(r_1-r_2)=-\frac ba-E$, so we already have most of our quadratic formula: $$r_1,r_2=\frac{-b}{2a}\pm\frac{E}2$$

All we need to do, then, is express $E=(r_1-r_2)$ using $+,-,\times,\div,\sqrt[n]{}$ in terms of $a,b,c$. In order to do this, we need to take a small detour to see what expressions in $+,-,\times,\div$ and $a,b,c$ could possibly be.

Note that the coefficients $b=-a(r_1+r_2)$ and $c=ar_1r_2$ are symmetric functions in $r_1$ and $r_2$ in the sense that if you exchange $r_1$ and $r_2$, the values of $b$ and $c$ do not change. Furthermore, $b$ and $c$ are in fact scalar multiples of the so-called elementary symmetric functions, which have the property that any symmetric function (in $2$ variables) can be expressed uniquely as a polynomial (quotient of polynomials for our purposes) in them.

In particular, we can "symmetrize" the quantity $E=(r_1-r_2)$ to obtain the discriminant $D=(r_1-r_2)^2$, which is in some sense "the smallest" symmetric function of $r_1$ and $r_2$ that becomes $0$ if $r_1=r_2$. Technically, though, the above is the discriminant only when $a=1$, because our coefficients $b$ and $c$ are elementary symmetric functions scaled by $a$, so we define the general discriminant to be $D=a^2(r_1-r_2)^2$. Because $D$ is symmetric and $b$ and $c$ are (up to the multiplicative factor $a$) elementary symmetric, we should be able to express $D$ as a polynomial in $a$, $b$ and $c$.

We do so in a somewhat ad-hoc manner (though there are algorithms that will do this procedurally): $$D=a^2(r_1-r_2)^2$$ so $$D=a^2(r_1^2-2r_1r_2+r_2^2)$$ hence $$D=a^2(r_1^2+2r_1r_2+r_2^2-4r_1r_2)$$ and finally $$D=a^2(r_1+r_2)^2-4a^2r_1r_2$$ giving us, since $a(r_1+r_2)=-b$ and $a^2r_1r_2=ac$, $$D=b^2-4ac$$
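The identity can be spot-checked with exact integer arithmetic. The sketch below is only a sanity check on a few arbitrary triples $(a, r_1, r_2)$ that I picked, not a proof; the helper names are hypothetical.

```python
def discriminant_from_roots(a, r1, r2):
    """D defined through the roots: a^2 * (r1 - r2)^2."""
    return a**2 * (r1 - r2) ** 2


def discriminant_from_coefficients(a, b, c):
    """D expressed through the coefficients: b^2 - 4ac."""
    return b**2 - 4 * a * c


# Arbitrary test triples (a, r1, r2); the last one has a repeated root.
for a, r1, r2 in [(1, 2, 5), (2, 3, -5), (-3, 4, 4)]:
    b = -a * (r1 + r2)  # relation between b and the roots
    c = a * r1 * r2     # relation between c and the roots
    assert discriminant_from_roots(a, r1, r2) == discriminant_from_coefficients(a, b, c)
```

Note that the repeated-root case gives $D=0$ on both sides, matching the remark that $D$ vanishes when $r_1=r_2$.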

Evidently, now we have that $\sqrt{D}=a(r_1-r_2)=aE$ and so $E=\frac{\sqrt{D}}a$. This allows us to rewrite our formula so far to get from $$r_1,r_2=\frac{-b}{2a}\pm\frac{E}{2}$$ to $$r_1,r_2=\frac{-b}{2a}\pm\frac{\sqrt{D}}{2a}$$ and finally $$r_1,r_2=\frac{-b\pm\sqrt{b^2-4ac}}{2a}$$
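To see the derivation as a procedure, here is a rough sketch that computes the roots exactly the way the argument does: first the half-sum $-\frac{b}{2a}$, then the difference $E=\frac{\sqrt D}{a}$. The function name `quadratic_roots` is my own, and I use `cmath.sqrt` from the Python standard library so that a negative $D$ still yields a (complex) square root; this is meant for illustration, not as a numerically careful solver.

```python
import cmath


def quadratic_roots(a, b, c):
    """Return the two roots of a*x^2 + b*x + c via the sum and difference of the roots."""
    if a == 0:
        raise ValueError("not a quadratic: a must be nonzero")
    D = b * b - 4 * a * c      # discriminant, D = a^2 (r1 - r2)^2
    E = cmath.sqrt(D) / a      # difference of the roots, up to the choice of sign
    half_sum = -b / (2 * a)    # (r1 + r2) / 2
    return half_sum + E / 2, half_sum - E / 2


r1, r2 = quadratic_roots(2, 4, -30)  # 2(x - 3)(x + 5)
print(r1, r2)
```

Choosing the other square root of $D$ merely swaps the two returned values, which is why the $\pm$ in the formula absorbs the sign ambiguity.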

The only strange question is: why did we only have to take one square root in order to get the formula, i.e. why did the quantity $E=(r_1-r_2)$ turn out to be a square root of a nice polynomial in $a,b,c$? That is where modern Galois theory comes in.

What's really happening is this: the first four paragraphs suggest that you think of the coefficients as living in the field $F$ (a set of expressions such that adding, subtracting, multiplying, or dividing any two of them gives another expression in the set) consisting of $\{\dfrac {p(a,b,c)}{q(a,b,c)}\}$ where $p$ and $q$ are polynomials in three variables (with rational coefficients). Then $r_1$ and $r_2$ will generate an extension field $E$ of $F$, that is, the smallest field $E$ that contains $F$ and also $r_1$ and $r_2$ (here $E$ denotes a field, not the difference $r_1-r_2$ used earlier). Galois theory says that this extension field $E$ will be a $2!=2$-dimensional vector space over $F$ and hence a single square root will be sufficient to generate $E$. Thus we need an expression in the coefficients (a symmetric expression in the roots) whose square root is an expression in the roots, but not symmetric. A natural choice is the most elementary anti-symmetric function, known as the Vandermonde determinant, which is precisely $(r_1-r_2)$ in this case (anti-symmetric means that swapping two variables flips the sign; the square of an anti-symmetric function is therefore a symmetric function).
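As a small concrete illustration (my own choice of example, using the base field $\mathbb{Q}$ and the specific polynomial $x^2-2$ rather than the generic field $F$ above), take $a=1$, $b=0$, $c=-2$, so that $r_1=\sqrt2$ and $r_2=-\sqrt2$. Then $$r_1+r_2=0,\qquad r_1r_2=-2,\qquad r_1-r_2=2\sqrt2,\qquad D=(r_1-r_2)^2=8=b^2-4ac.$$ The field generated by the roots is $$\mathbb{Q}(\sqrt2)=\{p+q\sqrt2 : p,q\in\mathbb{Q}\},$$ which is $2$-dimensional over $\mathbb{Q}$ with basis $1,\sqrt2$. Swapping the two roots sends $\sqrt2\mapsto-\sqrt2$, which flips the sign of $r_1-r_2$ but leaves $D=8$ unchanged, so $D$ lies in the base field while its square root does not.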

For general polynomials, the extension field will be of higher dimension, and so you will need to take possibly several roots of different orders. Galois theory allows us to compute what these roots ought to be and in what order (giving us the cubic and quartic formulas in a way that is not ad-hoc at all), and also shows that the general degree $5$ and above polynomial does not have a formula involving only $+,-,\times,\div,\sqrt[n]{}$. (Some people feel frightened by this, because taking roots should invert the raising of powers, but this is not the case because the order of operations matters...) Now, if the coefficients of the higher degree polynomial satisfy some additional relations (i.e. are not completely independent from each other), then Galois theory also gives procedures for computing formulas for those cases and also for determining what such relations ought to be.