Kurtosis and Skewness are close relatives in the family of standardized ("data normalized") statistical moments: Kurtosis is the fourth moment and Skewness the third. Yet they are often used to detect very different phenomena in data. It is usually advisable to analyse the outputs of both together to gain more insight and better understand the nature of the data.

By Pawel Rzeszucinski, Codewise.com

Descriptive Statistics can provide a great deal of insight into data, but they often set interesting traps for us, sometimes leading to misinterpretation of the results. One way of mitigating such risks is to combine more than one technique to reach an unambiguous conclusion. Today we will see how Kurtosis outputs can be supplemented by Skewness to tackle a very interesting challenge.

Introduction



In one of my previous posts we saw that Kurtosis is a robust metric for detecting impulsive content within data. However, "an impulse" can have many faces, and Kurtosis does not always paint the full picture. In the case study below I will show how to add one more 'brush', called Skewness, to the painting process.

Case Study



The scenario is as follows: a shop that records the number of goods sold as a function of time wants to automatically detect any abnormal demand.

In the previous year (described in the previous post), Kurtosis was used to detect the impulse with great success. Figure 1 shows the data for which Kurtosis was 6.227, clearly above 3, the value expected for Gaussian noise. Impulsive content was detected, great! After some time, however, a curious Business Analyst found a somewhat puzzling case, depicted in Figure 2. Even though the nature of the signal changed quite dramatically (the impulse now has only an upward-facing part), Kurtosis returned precisely the same value, 6.227 (it actually took me half a dozen tries to synthesize such a signal).
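The effect described above can be approximated with a minimal NumPy sketch. This rests on a few assumptions: the sales series is replaced by synthetic standard Gaussian noise, the impulse is a hypothetical five-sample burst, and `kurtosis` is a helper defined here as the plain fourth-moment ratio (so Gaussian noise scores about 3, not the 0 of "excess" kurtosis). It does not reproduce the 6.227 from the figures; it only shows the qualitative jump above 3.

```python
import numpy as np

def kurtosis(x):
    """Fourth central moment divided by the squared second central moment.
    This is not 'excess' kurtosis, so Gaussian noise gives roughly 3."""
    x = np.asarray(x, dtype=float)
    d = x - x.mean()
    return np.mean(d**4) / np.mean(d**2) ** 2

rng = np.random.default_rng(0)
noise = rng.normal(size=2000)           # synthetic stand-in for daily sales deviations

with_impulse = noise.copy()
with_impulse[1000:1005] += 8.0          # hypothetical short burst of abnormal demand

print(round(kurtosis(noise), 2))         # close to 3
print(round(kurtosis(with_impulse), 2))  # well above 3
```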



Figure 1



Figure 2

The amplitude distributions of the signals shown in Figures 1 and 2 can be seen in Figures 3 and 4 respectively, and the difference is noticeable. Figure 3 shows an almost perfectly symmetrical distribution, whereas in Figure 4 the bulk of the distribution leans towards the left-hand side of the plot. Side note: despite the left-side leaning, such a distribution is called right-skewed, because the name follows the long tail and the relative movement of the mean, which is pulled towards that tail. In our case the mean definitely shifted towards the right, due to the presence of the prominent impulse.
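The point about the mean being pulled towards the tail can be checked numerically. This is a small sketch on synthetic data: the Gaussian baseline and the 20-sample upward impulse are assumptions chosen for illustration, not the shop's real figures. It compares the gap between mean and median with and without the one-sided impulse.

```python
import numpy as np

rng = np.random.default_rng(1)
noise = rng.normal(size=2000)          # synthetic symmetric baseline

one_sided = noise.copy()
one_sided[1000:1020] += 8.0            # synthetic upward-only impulse

# Mean minus median: near zero for the symmetric baseline, pushed upward
# once the impulse drags the mean to the right while the median barely moves.
gap_noise = np.mean(noise) - np.median(noise)
gap_impulse = np.mean(one_sided) - np.median(one_sided)
print(f"baseline: {gap_noise:+.3f}  with impulse: {gap_impulse:+.3f}")
```

The median is used as the reference because a handful of extreme samples can shift it only by a few ranks, whereas every one of them contributes its full size to the mean.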

At first shocked, the Business Analyst quickly worked out what was going on. He looked at the formula of Kurtosis and noticed that all the powers in it are even numbers, so Kurtosis cannot distinguish between values above and below the mean. A sufficiently large impulse in just one direction (as in Figure 2) can therefore produce the same result as a symmetrical impulse (Figure 1).
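For reference, here is the standard moment-ratio form of sample Kurtosis, using the symbols defined for the Skewness formula later in this post ($n$, $x_i$, $\bar{x}$). The exact form used in the previous post is not reproduced here, so this textbook definition is an assumption. The second line spells out the analyst's observation: mirroring every deviation about the mean leaves Kurtosis untouched.

```latex
\begin{aligned}
K &= \frac{\frac{1}{n}\sum_{i=1}^{n} d_i^{4}}{\left(\frac{1}{n}\sum_{i=1}^{n} d_i^{2}\right)^{2}},
   \qquad d_i = x_i - \bar{x} \\
d_i \mapsto -d_i &\;\Longrightarrow\; (-d_i)^{4} = d_i^{4},\quad (-d_i)^{2} = d_i^{2}
   \;\Longrightarrow\; K \text{ is unchanged}
\end{aligned}
```

Skewness, with a cubed numerator, changes sign under the same substitution, which is exactly the information Kurtosis discards.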



Figure 3



Figure 4

This is where Skewness lends a helping hand. Its formula is given in Eq. 1 [1]:

$$S = \frac{\frac{1}{n}\sum_{i=1}^{n}(x_i-\bar{x})^3}{\left(\frac{1}{n}\sum_{i=1}^{n}(x_i-\bar{x})^2\right)^{3/2}} \qquad (1)$$

where $n$ is the total number of samples in the data, $x_i$ is the $i$-th sample and $\bar{x}$ is the sample mean of the data.
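Eq. 1 translates almost literally into NumPy. The sketch below is an illustration, not the author's code: `skewness` and `kurtosis` are helper names introduced here, and the single spike is synthetic. It checks the key property numerically: flipping the signal about its mean flips the sign of Skewness and leaves Kurtosis unchanged.

```python
import numpy as np

def skewness(x):
    """Sample skewness: third central moment over the second to the power 3/2."""
    x = np.asarray(x, dtype=float)
    d = x - x.mean()
    return np.mean(d**3) / np.mean(d**2) ** 1.5

def kurtosis(x):
    """Sample kurtosis: fourth central moment over the squared second moment."""
    x = np.asarray(x, dtype=float)
    d = x - x.mean()
    return np.mean(d**4) / np.mean(d**2) ** 2

rng = np.random.default_rng(2)
x = rng.normal(size=500)
x[250] += 10.0                     # synthetic single upward spike

mirrored = -x                      # the same spike, now pointing downward
print(skewness(x), skewness(mirrored))   # same magnitude, opposite sign
print(kurtosis(x), kurtosis(mirrored))   # identical
```

If SciPy is available, `scipy.stats.skew(x)` and `scipy.stats.kurtosis(x, fisher=False)` should give the same numbers, since both default to `bias=True`, i.e. the same population-moment ratios used here.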

The Skewness formula is virtually identical to the Kurtosis formula apart from the powers in the numerator and denominator. Because the numerator now uses an odd power, deviations above and below the mean contribute with opposite signs, so the two can finally be told apart. Skewness returns values close to 0 for symmetrically distributed signals, positive values for right-skewed (aka positively skewed) signals and negative values for left-skewed (aka negatively skewed) signals; the magnitude is not limited to 1 and can exceed it for strongly skewed data. The shapes of such distributions, together with the corresponding ordering of mean, median and mode, are shown in Figure 5 (taken from [2]). Applied to the signals from Figures 1 and 2, Skewness returns 0.06 and 0.58 respectively. From now on, the Business Analyst will always use Kurtosis in conjunction with Skewness, not only to detect the presence of impulses but also to determine the direction of their attack.
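The analyst's combined rule can be sketched as a tiny classifier. Everything here is hypothetical: the function names, the thresholds (Kurtosis above 4 flags an impulse, Skewness beyond ±0.5 gives its direction) and the synthetic signals, where the "symmetric" case has one burst above the mean mirrored by one below it, loosely mimicking Figure 1, and the "upward" case keeps only the upper burst, as in Figure 2.

```python
import numpy as np

def moments(x):
    """Return (kurtosis, skewness) using the moment-ratio definitions."""
    x = np.asarray(x, dtype=float)
    d = x - x.mean()
    m2 = np.mean(d**2)
    return np.mean(d**4) / m2**2, np.mean(d**3) / m2**1.5

def classify(x, kurt_threshold=4.0, skew_threshold=0.5):
    """Hypothetical rule: Kurtosis flags an impulse, the sign of Skewness
    tells which way it points. Both thresholds are illustrative only."""
    k, s = moments(x)
    if k <= kurt_threshold:
        return "no impulse"
    if s > skew_threshold:
        return "upward impulse"
    if s < -skew_threshold:
        return "downward impulse"
    return "symmetric impulse"

rng = np.random.default_rng(3)
noise = rng.normal(size=1000)

symmetric = noise.copy()
symmetric[500:505] = 8.0           # burst above the mean...
symmetric[505:510] = -8.0          # ...mirrored by a burst below it

upward = noise.copy()
upward[500:505] = 8.0              # upward-only burst

for name, sig in [("noise", noise), ("symmetric", symmetric), ("upward", upward)]:
    k, s = moments(sig)
    print(f"{name:9s} kurtosis={k:5.2f} skewness={s:+.2f} -> {classify(sig)}")
```

Kurtosis is checked first because Skewness alone can be non-zero for reasons that have nothing to do with impulses; in practice both thresholds would need tuning on the shop's own history.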



Figure 5 taken from [2]

References:

[1] Ben Klemens, Modeling with Data: Tools and Techniques for Scientific Computing, Princeton University Press, 2008

[2] Ken Black, Business Statistics: Contemporary Decision Making, John Wiley & Sons, 2009



Bio: Pawel Rzeszucinski received an MSc in Computer Science from Cranfield University and an MSc in Electronics from Wroclaw University of Technology. He subsequently moved to The University of Manchester, where he obtained a PhD on a project sponsored by QinetiQ and related to data analytics for helicopter gearbox diagnostics. After returning to Poland he worked as a Senior Scientist at ABB's Corporate Research Center and as a Senior Risk Modeler in Strategic Analytics at HSBC. Currently he is a Data Scientist at Codewise.
