Typically, we talk about the distribution of an asset's returns. We accept that there are many possible futures and that our asset of choice may take a range of values. To help us understand and summarize the distribution of outcomes, we mostly talk about the expected return of an asset or strategy, and secondarily its variance or standard deviation. These give us a primitive but concise way of talking about the average outcome and how widely outcomes scatter around that average.

The mean return is 0.26%, which is, well, positive expected value. But observe the left tail. Not all return distributions are normal!

In a world where the distribution of returns (or the distribution of log returns) is Gaussian (Normal), we need only talk about the mean and variance of our outcomes. Such an assumption, while clearly not fully true, is powerful because our two summary statistics of choice, mean and variance, fully describe the distribution.

This is great for a few reasons:

We can communicate with each other effectively. If I tell you that a distribution is Normal and give you its mean and standard deviation, you know exactly which distribution I’m describing.

It’s simple. Gaussians are mathematically nice to work with. Very often in quantitative finance, we choose simple even if naive models.

Finally, we make some remarks on why linear systems are so important. The answer is simple: because we can solve them! — Richard Feynman

That said, while the normality assumption isn’t crazy (because of the CLT), one should take special care, especially when trading derivatives, which often show amplified sensitivity to the inadequacies of characterizing outcomes with a Gaussian. Skewness and kurtosis, the third and fourth standardized moments of a distribution, may have significant impact.

We’re going to step through the math of Kelly as it pertains to continuous distributions; while most treatments jump straight to the normally-distributed world, we’ll take a bit of care to point out how things may change in a non-normal world.

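Throughout our discussion we use C³ and C⁴ to refer to skewness and kurtosis, the third and fourth moments of the distribution. By the same convention, C¹ would be the distribution’s mean and C² its variance. (Strictly speaking, the mean is the first raw moment, since the first central moment is always zero, while the variance is the second central moment.)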

As a refresher, the skewness of a sample is

For skewness, we take each sample, subtract the mean, divide by the standard deviation, and cube it.

Similarly, kurtosis of a sample is

As n → ∞, the bottom approximation is quite good
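A minimal sketch of the sample statistics described above, using the simple 1/n (large-sample) form with no small-sample bias correction. The function name `sample_moments` is illustrative and not from the original article.

```python
import math

def sample_moments(xs):
    """Return (mean, std, skewness, kurtosis) using 1/n (population-style) formulas."""
    n = len(xs)
    mean = sum(xs) / n
    # Standard deviation with a 1/n divisor (assumption: no bias correction).
    std = math.sqrt(sum((x - mean) ** 2 for x in xs) / n)
    # Standardize each sample, raise to the 3rd or 4th power, then average.
    skew = sum(((x - mean) / std) ** 3 for x in xs) / n
    kurt = sum(((x - mean) / std) ** 4 for x in xs) / n
    return mean, std, skew, kurt

# A symmetric sample has zero skewness; one large positive outlier makes it positive.
symmetric = sample_moments([1, 2, 3, 4, 5])
right_tailed = sample_moments([0, 0, 0, 0, 10])
```

Note that this returns raw kurtosis (a normal distribution scores 3), not excess kurtosis.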

The following two plots demonstrate the effect of varying skewness and kurtosis levels on outcomes.

Observe that as skewness increases, the bulk of the distribution ‘leans’ left while the long tail stretches out to the right. The distribution would lean right, with a long left tail, if skewness were negative.

Increased kurtosis leads to increased “peakedness” and heavier tails

When evaluating the performance of an asset, and accordingly how much one should invest, it’s often worthwhile to look at these higher moments when considering the risk-reward of the opportunity.

Readers less interested in the math can skip the derivation; it’s a bit more explicit than is typical in the literature but, well, it’s still math.

Derivation

We’ll begin with a brief notation refresher:

Lowercase f is the fraction invested

p and q represent probability of win and probability of loss, respectively

Lowercase g is growth rate

In our first entry in the series, we determined that in an even-odds binary game, the growth rate of our capital is

Extending this to where the return for winning and losing is asymmetrical

If we relabel our variables slightly, where we’re going gets more obvious

This can be expressed as a sum that lets us extend beyond the binary-only game

Expressing as an expectation

This should be intuitive; our growth rate is a function of the expected log gains combined with our fraction invested. To extend to the continuous case, we replace the sum in (4) with an integral
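For reference, the chain of growth-rate expressions described in this section can be written out as below. This is a reconstruction from the surrounding prose, not the original equations. The symbols b and a (fractional gain on a win and fractional loss on a loss) and the indexed r_i and p_i are labels assumed here. The numbering is inferred from the text's references to (4) and (6).

```latex
\begin{align}
g &= p \log(1 + f) + q \log(1 - f) \tag{1} \\
g &= p \log(1 + f b) + q \log(1 - f a) \tag{2} \\
g &= p_1 \log(1 + f r_1) + p_2 \log(1 + f r_2),
     \quad r_1 = b,\; r_2 = -a,\; p_1 = p,\; p_2 = q \tag{3} \\
g &= \sum_i p_i \log(1 + f r_i) \tag{4} \\
g &= \mathbb{E}\left[\log(1 + f r)\right] \tag{5} \\
g(f) &= \int \log(1 + f r)\, p(r)\, dr \tag{6}
\end{align}
```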

One should be a bit careful with the above notation; the growth rate is not benchmarked to any reference rate. The r in the above is the excess return rate of our asset or strategy (the return minus LIBOR or Treasury rates).

We seek to maximize growth, so once again we differentiate and set the result to zero

Note that we differentiate with respect to f

Here, p(r) describes the return distribution of our excess return rate. This is empirical in markets; we do not know the true distribution of an asset or a strategy — we estimate it from data.

The trick to the derivation (thanks Wolfram Alpha)¹ is in the Taylor expansion of the 1 + fr term; we can approximate 1 / (1+fr) with

We can expand the right hand side of (7)
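Written out, the steps just described look roughly like this. It is a sketch under the assumption that g(f) is the integral of log(1 + fr) against p(r), as described earlier. The series is only valid for |fr| < 1, and the equation numbers beyond (7) are inferred rather than taken from the original.

```latex
\begin{align}
\frac{dg}{df} &= \int \frac{r}{1 + f r}\, p(r)\, dr = 0 \tag{7} \\
\frac{1}{1 + f r} &\approx 1 - f r + f^2 r^2 - f^3 r^3 \tag{8} \\
0 &\approx \int r\, p(r)\, dr - f \int r^2 p(r)\, dr
      + f^2 \int r^3 p(r)\, dr - f^3 \int r^4 p(r)\, dr \tag{9}
\end{align}
```

Each integral in the last line is a raw moment of the return distribution, which is what lets the mean, variance, and higher moments enter the result.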

If we assume that the returns r follow a normal distribution, we can substitute in the following

Where C³ and C⁴ are the 3rd and 4th moments of our returns distribution

Afterwards, we’re left with

If we truncate after the first two Taylor terms and solve for the optimal Kelly fraction f

This is the fraction invested that maximizes long term growth if we ignore skewness and kurtosis
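As a sanity check on the two-term result, the sketch below compares it against a brute-force maximization of the empirical growth rate. It assumes the two-term solution is the mean divided by the second raw moment (variance plus squared mean), consistent with the later remark that the approximation holds when the mean is small. The function names and the grid search are illustrative choices, not the article's method.

```python
import math

def kelly_two_term(returns):
    """Kelly fraction from the first two Taylor terms: mean / E[r^2]."""
    n = len(returns)
    mu = sum(returns) / n
    second_moment = sum(r * r for r in returns) / n  # equals variance + mean^2
    return mu / second_moment

def empirical_growth(f, returns):
    """Average log growth per period when a fraction f is invested."""
    return sum(math.log1p(f * r) for r in returns) / len(returns)

def kelly_grid_search(returns, f_max, steps=2000):
    """Brute-force maximizer of empirical_growth over (0, f_max]."""
    best_f, best_g = 0.0, 0.0
    for i in range(1, steps + 1):
        f = f_max * i / steps
        # Stop once any outcome would wipe out the account (log undefined).
        if any(1.0 + f * r <= 0.0 for r in returns):
            break
        g = empirical_growth(f, returns)
        if g > best_g:
            best_f, best_g = f, g
    return best_f

# Even-odds binary game with a 60% win rate, encoded as 6 wins and 4 losses.
binary_game = [1.0] * 6 + [-1.0] * 4
f_closed = kelly_two_term(binary_game)
f_numeric = kelly_grid_search(binary_game, f_max=0.99)
```

For this binary game the two-term formula reproduces the familiar p − q answer, and the grid search lands within one grid step of it. On skewed or fat-tailed samples the two numbers will generally differ, which is the point of tracking the higher moments.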

We can expand (6) above with a Taylor approximation

Plugging f back in from (11), we can solve for the growth rate at f (dropping again here the 3rd and 4th moments)

This growth is ONLY true ignoring skew and kurtosis effects!
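A sketch of that calculation, assuming the second-order expansion of the logarithm and a two-term Kelly fraction of the form f* = μ / (μ² + σ²), where μ and σ² denote the mean and variance of the excess return (notation assumed here):

```latex
\begin{align*}
\log(1 + f r) &\approx f r - \tfrac{1}{2} f^2 r^2 \\
g(f) &\approx f \mu - \tfrac{1}{2} f^2 \left(\mu^2 + \sigma^2\right) \\
g(f^*) &\approx \frac{\mu^2}{\mu^2 + \sigma^2}
          - \frac{1}{2}\,\frac{\mu^2}{\mu^2 + \sigma^2}
        = \frac{\mu^2}{2\left(\mu^2 + \sigma^2\right)}
\end{align*}
```

When μ² is small relative to σ², this reduces to roughly μ² / (2σ²), half the squared ratio of mean to standard deviation.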

Summing it up

The Kelly fraction we arrived at in (11) is often truncated to the approximation

This is generally reasonable because the square of the mean is expected to be << the variance

This approximation is quite close in the reasonable case where the expected mean profit per trade is low
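To get a feel for how close the two expressions are, here is a small sketch comparing them. It assumes the untruncated two-term fraction is μ / (μ² + σ²) and the truncated one is μ / σ². The function names and the example numbers are hypothetical, chosen only for illustration.

```python
def kelly_full(mu, sigma):
    """Two-term Kelly fraction keeping the squared-mean term."""
    return mu / (mu ** 2 + sigma ** 2)

def kelly_approx(mu, sigma):
    """Common truncation that drops the squared mean from the denominator."""
    return mu / sigma ** 2

def relative_gap(mu, sigma):
    """Fractional overstatement of the truncated form; algebraically (mu / sigma)^2."""
    return kelly_approx(mu, sigma) / kelly_full(mu, sigma) - 1.0

# Hypothetical per-period numbers: 0.05% mean excess return, 1% standard deviation.
gap_small_edge = relative_gap(0.0005, 0.01)
# A much larger edge relative to volatility makes the truncation noticeably worse.
gap_large_edge = relative_gap(0.05, 0.10)
```

The gap depends only on the ratio of mean to standard deviation, so the truncation is harmless for low-edge trades and increasingly optimistic as that ratio grows.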