Jensen's Inequality states that for any convex function \(g\), $$E[g(X)] \geq g(E[X])$$

And since \(f(x) = x^2\) is a convex function, this means that: $$E[X^2] \geq E[X]^2$$

Why does this matter? Because \(E[X^2]\) is always greater than or equal to \(E[X]^2\), their difference can never be less than 0! This corresponds very well to our intuitive sense of what we mean by "variance"; after all, what would negative variance mean? It is also conveniently the case that the only time \(E[X^2] = E[X]^2\) is when the Random Variable \(X\) is a constant (i.e. there is literally no variance).
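As a quick sanity check, here is a minimal sketch that computes \(E[X^2] - E[X]^2\) exactly for two small discrete distributions. The fair die and the constant value 4 are illustrative choices, and `expectation` is a helper written just for this example.

```python
def expectation(values, probs, g=lambda x: x):
    """E[g(X)] for a discrete random variable with the given values and probabilities."""
    return sum(p * g(x) for x, p in zip(values, probs))

# Illustrative example: a fair six-sided die.
die_values = [1, 2, 3, 4, 5, 6]
die_probs = [1 / 6] * 6

e_x_squared = expectation(die_values, die_probs, lambda x: x ** 2)  # E[X^2]
e_x = expectation(die_values, die_probs)                            # E[X]
die_gap = e_x_squared - e_x ** 2                                    # E[X^2] - E[X]^2

# A constant "random variable" that always takes the value 4.
const_gap = expectation([4], [1.0], lambda x: x ** 2) - expectation([4], [1.0]) ** 2

print(die_gap)    # positive, since the die's values vary
print(const_gap)  # zero, since a constant does not vary
```

The gap is strictly positive for the die and exactly zero for the constant, which is what Jensen's Inequality predicts for \(g(x) = x^2\).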

Jensen's Inequality provides us with a sort of minimum viable reason for using \(X^2\). The resulting measure, \(E[X^2] - E[X]^2\), can't be less than zero and increases with the degree to which the values of a Random Variable vary. In mathematics it is fairly common for something to be defined by a function merely because the function behaves the way we want it to. But it turns out there is an even deeper reason why we use squaring and not another convex function.

Other measures of a distribution

There are a few other useful measurements of a probability distribution that we're going to look at that should help us understand why we would choose \(x^2\). Before we dive into them, let's review another way we can define variance. We used the definition \(Var(X) = E[X^2] - E[X]^2\) because it is very simple to read, it was useful in building out Covariance and Correlation, and now it has made Variance's relationship to Jensen's Inequality very clear. In a previous post we demonstrated that Variance can also be defined as $$Var(X) = E[(X -\mu)^2]$$ where \(\mu = E[X]\). It turns out that this definition will provide more insight as we explore Skewness and Kurtosis.
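As a brief recap of why the two definitions agree, here is a sketch of the derivation. It uses only linearity of expectation and writes \(\mu\) for \(E[X]\): $$\begin{aligned} E[(X - \mu)^2] &= E[X^2 - 2\mu X + \mu^2] \\ &= E[X^2] - 2\mu E[X] + \mu^2 \\ &= E[X^2] - 2\mu^2 + \mu^2 \\ &= E[X^2] - E[X]^2 \end{aligned}$$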

Skewness

Skewness measures how lopsided a distribution is, and in which direction: a positive value means the longer tail is on the right, a negative value means it is on the left. The mathematical definition of Skewness is $$\text{skewness} = E\left[\left(\frac{X -\mu}{\sigma}\right)^3\right]$$ where \(\sigma\) is our common definition of Standard Deviation, \(\sigma = \sqrt{Var(X)}\).

The Normal Distribution has a Skewness of 0: as we can clearly see, it is symmetric around its mean.
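To make the definition concrete, here is a minimal sketch that evaluates \(E[((X - \mu)/\sigma)^3]\) exactly for small discrete distributions. The three distributions are made up for illustration (the symmetric one stands in for the Normal Distribution, which would need sampling or integration), and `skewness` is a helper written just for this example.

```python
import math

def skewness(values, probs):
    """E[((X - mu) / sigma)^3] for a discrete random variable."""
    mu = sum(p * x for x, p in zip(values, probs))               # E[X]
    var = sum(p * (x - mu) ** 2 for x, p in zip(values, probs))  # E[(X - mu)^2]
    sigma = math.sqrt(var)
    return sum(p * ((x - mu) / sigma) ** 3 for x, p in zip(values, probs))

# Symmetric around its mean, like the Normal Distribution.
symmetric = skewness([-1, 0, 1], [0.25, 0.5, 0.25])

# Most of the mass near 0, with a rare large value on the right.
right_tail = skewness([0, 1, 10], [0.45, 0.45, 0.10])

# The mirror image: a rare large value on the left.
left_tail = skewness([0, -1, -10], [0.45, 0.45, 0.10])

print(symmetric)   # zero: no skew
print(right_tail)  # positive: skewed to the right
print(left_tail)   # negative: skewed to the left
```

Because the deviation from the mean is cubed rather than squared, its sign survives: deviations on either side cancel out when the distribution is symmetric, and the longer tail wins when it is not.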