This tutorial will teach you how to extract valuable information from time series, such as your sold copies on Steam or your Google Analytics. The previous part of this series introduced a technique called moving average, which was used to attenuate the effects of noise in a signal. When a signal represents an event that evolves over time, we are dealing with a time series. Classical decomposition is a technique that attempts to find the main trends within time series.

Introduction

Even though it is highly effective, moving average is completely agnostic to the signal it is filtering. We can achieve much better results if we understand the process that is generating our data, and which components contribute to its final shape. Let’s think for a second about a game on Steam. The following chart shows the number of copies activated each day, for a hypothetical game:

Time Series (De)Composition

A time series that represents sold copies naturally contains several components. The trend component, $T$, reflects the long-term progression of the sales. It’s a function of how effective your marketing campaign is, and it indicates how well your game is doing. This trend is perturbed by several other effects which contribute to the buying behaviour of players, but on a shorter time scale. Seasonal sales, monthly discounts and even which day of the week it is: all those events have a quantifiable, cyclic effect on the data and therefore make up the seasonal component, $S$, of the time series. Finally, what’s left is assumed to be caused by random, non-periodic events and is called the irregular (or residual) component, $I$. If a small YouTuber covers your game, it can boost your sales for a day or two, but is unlikely to have any long-term consequence. Those uncorrelated, acyclic and low-impact events add to the “noise” of the time series.

The sales chart shown in the previous section, $Y$, was indeed generated as the sum of these three components, for each day $t$:

$$Y_t = T_t + S_t + I_t$$

The seasonal cycle has a length of 30 days; it is reasonable to assume that every month has a similar influence on the sold copies. Knowing the length of a cycle will be essential to decompose our time series into its basic components.

Trend Component Estimation

As described in the previous part of this tutorial, An Introduction to Signal Smoothing, a first possible step to highlight the true trend of the data is to use moving average. One of our assumptions is that the data contains a 30-day seasonal cycle. If that is the case, we should choose a window that covers those 30 days entirely. Since 30 is an even number, a centred moving average ($2 \times 30$-MA) is the technique that should be used. For odd cycle lengths $m$, a standard $m$-MA is sufficient.
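A minimal Java sketch of both variants, assuming the series is stored in a `float[]` with one entry per day. The class name `MovingAverage` and the method `centred` are hypothetical names chosen for illustration, and leaving the edges as `NaN` (instead of padding or extrapolating) is an assumption of this sketch.

```java
final class MovingAverage {
    // Odd m: standard m-MA, equal weights 1/m over a window centred on t.
    // Even m: 2 x m-MA, a window of m + 1 samples whose two end samples
    // count for half, so the weights still add up to 1.
    // Entries whose window does not fit inside the data are left as NaN.
    static float[] centred(float[] y, int m) {
        int half = m / 2;
        float[] ma = new float[y.length];
        java.util.Arrays.fill(ma, Float.NaN);
        for (int t = half; t < y.length - half; t++) {
            float sum = 0;
            for (int k = -half; k <= half; k++) {
                boolean isEnd = (k == -half || k == half);
                float weight = (m % 2 == 0 && isEnd) ? 0.5f : 1f;
                sum += weight * y[t + k];
            }
            ma[t] = sum / m;
        }
        return ma;
    }
}
```

With `m = 30`, each output value is computed from 31 samples, the first and last of which count for half. Those two samples fall on the same day of the cycle, so every day of the cycle contributes the same total weight of 1/30.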

This produces a new time series, which we call $MA$. In an ideal scenario, $MA = T$, but this is extremely unlikely. Removing $MA$ from the original data produces what is called a detrended series, $D_t = Y_t - MA_t$.

Seasonal Component Estimation

The effectiveness of the moving average on $Y$ depends on how true our assumptions are. If there is indeed a seasonal cycle of 30 days within our data, we can now extract it easily. What we have to do is simply split the detrended data into chunks of 30 days each, and average each day across all 12 months. This produces an average 30-day cycle, called $\bar{S}$. In coding terms:

```java
float[] season = new float[30];
for (int day = 0; day < 30; day++)
{
    // Averages the same day of the cycle across all 12 months
    float sum = 0;
    for (int month = 0; month < 12; month++)
        sum += detrended[month * 30 + day];
    // Divides by the number of months that were summed
    season[day] = sum / 12;
}
```

We can now replicate those ideal 30 days 12 times, to reconstruct the seasonal component of the time series, $\hat{S}$.
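One way to sketch this replication in Java, assuming `cycle` holds the averaged 30-day cycle (the `season` array above). `SeasonalComponent.tile` is a hypothetical helper name, not part of any library.

```java
final class SeasonalComponent {
    // Repeats one averaged cycle n times to cover the whole time series.
    static float[] tile(float[] cycle, int n) {
        float[] seasonal = new float[cycle.length * n];
        for (int t = 0; t < seasonal.length; t++)
            seasonal[t] = cycle[t % cycle.length];
        return seasonal;
    }
}
```

For the toy example this would be called as `SeasonalComponent.tile(season, 12)`, producing 360 entries.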

If you look at our Steam sales toy example, you can see that the seasonal component is a sinusoid. This means that it sums to zero over a full cycle. Moving average removes the seasonal component from the trend only under the assumption that the cycle sums to zero. This is rarely the case in practice, since most seasonal cycles sum to a positive quantity. This is the stage at which we can check whether our assumption is correct or not. What we have to do is to sum up all the days in a month, to see whether or not the zero-sum property holds:

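$$c = \sum_{i=0}^{m-1} w_i \bar{S}_i$$

where $\bar{S}_i$ is day $i$ of the averaged cycle and $w_i$ are the weights used in the moving average pass. If $c$ is not zero, we have injected a constant value into our trend. The next step is to revise the trend estimate, $\hat{T}$, by removing that constant from the moving average series $MA$:

$$\hat{T}_t = MA_t - c$$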
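As a rough Java sketch of this check: the hypothetical helper `cycleOffset` computes the weighted sum of the averaged cycle, and `reviseTrend` subtracts it from the moving average, which is one reading of the revision described above. It assumes the cycle repeats exactly, in which case both the $m$-MA and the $2 \times m$-MA give every day of the cycle a total weight of $1/m$.

```java
final class TrendCorrection {
    // Weighted sum of one averaged cycle, using weights 1/m
    // (an assumption that holds when the cycle is exactly periodic).
    static float cycleOffset(float[] cycle) {
        float sum = 0;
        for (float value : cycle)
            sum += value;
        return sum / cycle.length;
    }

    // Removes the constant offset from the moving average series.
    // NaN entries (edges without a full window) stay NaN.
    static float[] reviseTrend(float[] ma, float offset) {
        float[] trend = new float[ma.length];
        for (int t = 0; t < ma.length; t++)
            trend[t] = ma[t] - offset;
        return trend;
    }
}
```

An offset close to zero suggests the zero-sum assumption is reasonable for the data at hand.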

Irregular Component Estimation

The last step is to subtract the estimated trend and seasonal components from $Y$, which leaves us with an estimation of the irregular component:

$$\hat{I}_t = Y_t - \hat{T}_t - \hat{S}_t$$
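In code this is an element-wise subtraction. The sketch below assumes the three arrays have the same length and one entry per day; `IrregularComponent.estimate` is a hypothetical name used only for illustration.

```java
final class IrregularComponent {
    // What is left of the original series once the estimated trend
    // and seasonal components are removed. Where the trend is NaN
    // (edges of the moving average), the result is NaN as well.
    static float[] estimate(float[] y, float[] trend, float[] seasonal) {
        float[] irregular = new float[y.length];
        for (int t = 0; t < y.length; t++)
            irregular[t] = y[t] - trend[t] - seasonal[t];
        return irregular;
    }
}
```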

The results of this analysis can be seen in the graph below. The original components are shown with a dotted line for reference:

Summary

Assumptions

The time series $Y$ can be decomposed into trend, seasonal and irregular components: $Y_t = T_t + S_t + I_t$.

The seasonal component $S$ has known period $m$. For instance, $m = 30$ if we have a monthly cycle.

The seasonal component is repeated $n$ times, meaning $Y$ is composed of $m \cdot n$ observations. For instance, $n = 12$ if we have data for twelve months.

There is one entry for each day.

Inputs

$Y$: the original time series;

$m$: the length of the seasonal cycle;

$n$: the number of cycles in the data.

Output

$\hat{T}$: estimation for $T$;

$\hat{S}$: estimation for $S$;

$\hat{I}$: estimation for $I$.

Procedure

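1. Smooth $Y$ using moving average to find the first approximation of the trend component, $MA$. If $m$ is odd, use $m$-MA, otherwise $2 \times m$-MA.

2. Calculate the detrended series $D$: $D_t = Y_t - MA_t$

3. Calculate a single seasonal cycle $\bar{S}$ (of length $m$) by averaging out the data across all available repetitions of the cycle: $\bar{S}_i = \frac{1}{n} \sum_{j=0}^{n-1} D_{jm+i}$

4. Calculate the weighted sum $c$ of a single seasonal cycle, using the same weights $w_i$ used in the moving average of step 1: $c = \sum_{i=0}^{m-1} w_i \bar{S}_i$

5. Calculate a better estimation for the trend component: $\hat{T}_t = MA_t - c$

6. Calculate the seasonal component $\hat{S}$ by concatenating $\bar{S}$ for $n$ times: $\hat{S}_t = \bar{S}_{t \bmod m}$

7. Calculate the irregular component: $\hat{I}_t = Y_t - \hat{T}_t - \hat{S}_t$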
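The whole procedure can be sketched end to end in Java as follows. This is an illustrative sketch under stated assumptions, not a reference implementation: the class name `ClassicalDecomposition` is hypothetical, the input is assumed to hold exactly $m \cdot n$ daily values with $n \geq 2$, the weights of step 4 are taken as $1/m$ (valid for an exactly periodic cycle), and the seasonal average skips the days at the edges where the moving average is undefined instead of averaging over all $n$ cycles.

```java
final class ClassicalDecomposition {
    final float[] trend;
    final float[] seasonal;
    final float[] irregular;

    // y: one entry per day, m: cycle length, n: number of cycles
    ClassicalDecomposition(float[] y, int m, int n) {
        int half = m / 2;

        // Step 1: m-MA for odd m, 2 x m-MA for even m.
        // Edges without a full window stay NaN.
        float[] ma = new float[y.length];
        java.util.Arrays.fill(ma, Float.NaN);
        for (int t = half; t < y.length - half; t++) {
            float sum = 0;
            for (int k = -half; k <= half; k++) {
                boolean halfWeight = (m % 2 == 0) && (Math.abs(k) == half);
                sum += (halfWeight ? 0.5f : 1f) * y[t + k];
            }
            ma[t] = sum / m;
        }

        // Steps 2 and 3: detrend, then average each day of the cycle,
        // skipping the cycles where the moving average is undefined.
        float[] cycle = new float[m];
        for (int day = 0; day < m; day++) {
            float sum = 0;
            int count = 0;
            for (int j = 0; j < n; j++) {
                int t = j * m + day;
                if (Float.isNaN(ma[t])) continue;
                sum += y[t] - ma[t];
                count++;
            }
            cycle[day] = sum / count;
        }

        // Step 4: weighted sum of the cycle, with weights 1/m.
        float c = 0;
        for (float value : cycle)
            c += value / m;

        // Steps 5 to 7: revised trend, tiled seasonal cycle, remainder.
        trend = new float[y.length];
        seasonal = new float[y.length];
        irregular = new float[y.length];
        for (int t = 0; t < y.length; t++) {
            trend[t] = ma[t] - c;
            seasonal[t] = cycle[t % m];
            irregular[t] = y[t] - trend[t] - seasonal[t];
        }
    }
}
```

For the Steam example this would be constructed as `new ClassicalDecomposition(sales, 30, 12)`, where `sales` is a hypothetical array of 360 daily values. The first and last 15 entries of `trend` and `irregular` are `NaN`, because the 31-sample window does not fit there.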

Conclusion

This tutorial has shown a powerful approach to decompose time series into their main components. The technique was developed for financial purposes, and it works very well with sales data. The main drawback of the classical time series decomposition is that it does not work well with random events, or multiple cycles. Real sales often exhibit not only monthly but also weekly and quarterly cycles.

Other resources