When trend and seasonality are present in a time series, an alternative to decomposing it manually and fitting an ARMA model with the Box-Jenkins method is the very popular seasonal autoregressive integrated moving average (SARIMA) model, a generalization of the ARMA model. SARIMA models are denoted SARIMA(p,d,q)(P,D,Q)[S], where p and q are the orders of the non-seasonal autoregressive and moving average terms, d is the degree of differencing (the number of times the data have had past values subtracted), S is the number of periods in each season, and the uppercase P, D, and Q are the autoregressive, differencing, and moving average terms for the seasonal part of the model.
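
To connect the notation to code, here is a minimal sketch of a hypothetical `SarimaOrder` helper (not part of any library). It stores the seven orders and arranges them the way statsmodels' `SARIMAX` class takes them, as `order=(p, d, q)` and `seasonal_order=(P, D, Q, s)`.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class SarimaOrder:
    """Orders of a SARIMA(p,d,q)(P,D,Q)[S] model (hypothetical helper)."""

    p: int  # non-seasonal autoregressive order
    d: int  # non-seasonal differencing order
    q: int  # non-seasonal moving average order
    P: int  # seasonal autoregressive order
    D: int  # seasonal differencing order
    Q: int  # seasonal moving average order
    S: int  # periods per season, e.g. 4 for quarterly or 12 for monthly data

    def label(self) -> str:
        """Render the orders in SARIMA(p,d,q)(P,D,Q)[S] notation."""
        return (
            f"SARIMA({self.p},{self.d},{self.q})"
            f"({self.P},{self.D},{self.Q})[{self.S}]"
        )

    def statsmodels_kwargs(self) -> dict:
        """Arguments in the layout statsmodels' SARIMAX expects."""
        return {
            "order": (self.p, self.d, self.q),
            "seasonal_order": (self.P, self.D, self.Q, self.S),
        }


example = SarimaOrder(p=1, d=1, q=1, P=1, D=1, Q=1, S=4)
print(example.label())  # SARIMA(1,1,1)(1,1,1)[4]
print(example.statsmodels_kwargs())
```

Assuming statsmodels is installed and `y` holds your series, these arguments could then be passed on as `SARIMAX(y, **example.statsmodels_kwargs())`.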

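The SARIMA model is awkward to write out directly, so it is usually described with the backward shift (backshift) operator B, a useful notational device when working with time series lags: $B y_t = y_{t-1}$, and more generally $B^k y_t = y_{t-k}$.

For example, SARIMA(1,1,1)(1,1,1)[4] is written as:

$$(1 - \phi_1 B)(1 - \Phi_1 B^4)(1 - B)(1 - B^4)\,y_t = (1 + \theta_1 B)(1 + \Theta_1 B^4)\,\varepsilon_t$$

where $\phi_1$ and $\theta_1$ are the non-seasonal AR and MA coefficients, $\Phi_1$ and $\Theta_1$ are the seasonal AR and MA coefficients, $(1 - B)$ and $(1 - B^4)$ are the non-seasonal and seasonal differences, and $\varepsilon_t$ is white noise.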
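
In the SARIMA(1,1,1)(1,1,1)[4] example, d=1 and D=1 with S=4 correspond to the differencing operator (1 - B)(1 - B^4). To make the backshift notation concrete, the sketch below treats a lag polynomial as a list of coefficients (index k multiplies B^k), multiplies two such polynomials, and applies the result to a series. The helper names `poly_mul` and `apply_lag_poly` are illustrative, and the toy quarterly series is made up.

```python
def poly_mul(a, b):
    """Multiply two lag polynomials given as coefficient lists.

    a[k] is the coefficient of B**k, so [1, -1] represents (1 - B).
    """
    out = [0] * (len(a) + len(b) - 1)
    for i, ai in enumerate(a):
        for j, bj in enumerate(b):
            out[i + j] += ai * bj
    return out


def apply_lag_poly(coeffs, y):
    """Apply sum_k coeffs[k] * B**k to the series y.

    Since B**k y_t = y_{t-k}, the value at time t is sum_k coeffs[k] * y[t-k].
    The first len(coeffs) - 1 points would need observations before the
    start of the series, so they are dropped.
    """
    n_lags = len(coeffs) - 1
    return [
        sum(c * y[t - k] for k, c in enumerate(coeffs))
        for t in range(n_lags, len(y))
    ]


# Differencing part of SARIMA(1,1,1)(1,1,1)[4]: (1 - B)(1 - B^4)
diff_op = poly_mul([1, -1], [1, 0, 0, 0, -1])
print(diff_op)  # [1, -1, 0, 0, -1, 1], i.e. 1 - B - B^4 + B^5

# Made-up quarterly series: each quarter is 2 higher than a year earlier.
toy = [10, 20, 15, 5, 12, 22, 17, 7, 14, 24, 19, 9]
print(apply_lag_poly(diff_op, toy))  # [0, 0, 0, 0, 0, 0, 0]
```

Expanding the product shows that the combined operator uses lags 1, 4 and 5: y_t - y_{t-1} - y_{t-4} + y_{t-5}. In the toy series one seasonal difference leaves the constant 2, and the extra first difference removes it.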

Rules for SARIMA model selection from ACF/PACF plots

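The rules in this section are read off sample ACF and PACF values compared against approximate 95% significance bounds of ±1.96/√n. As a minimal, dependency-free sketch, the functions below compute those quantities and encode a few of the rules. In practice you would more likely call `statsmodels.tsa.stattools.acf`/`pacf` or the `plot_acf`/`plot_pacf` helpers, whose default estimators may give slightly different values. All function names here are hypothetical, and `cutoff_lag` is one reading of "the lag where the ACF/PACF cuts off".

```python
import math


def acf(y, nlags):
    """Sample autocorrelations r_0..r_nlags (r_0 is always 1)."""
    n = len(y)
    mean = sum(y) / n
    dev = [v - mean for v in y]
    c0 = sum(d * d for d in dev) / n
    return [
        sum(dev[t] * dev[t - k] for t in range(k, n)) / n / c0
        for k in range(nlags + 1)
    ]


def pacf(y, nlags):
    """Sample partial autocorrelations via the Durbin-Levinson recursion.

    Index k of the result is the PACF at lag k (index 0 is set to 1.0).
    """
    r = acf(y, nlags)
    result = [1.0]
    phi_prev = []  # AR coefficients of the order k-1 fit
    for k in range(1, nlags + 1):
        num = r[k] - sum(phi_prev[j] * r[k - 1 - j] for j in range(k - 1))
        den = 1 - sum(phi_prev[j] * r[j + 1] for j in range(k - 1))
        phi_kk = num / den
        phi_prev = [
            phi_prev[j] - phi_kk * phi_prev[k - 2 - j] for j in range(k - 1)
        ] + [phi_kk]
        result.append(phi_kk)
    return result


def significance_bound(n):
    """Approximate 95% bound (+/-) for autocorrelations of white noise."""
    return 1.96 / math.sqrt(n)


def cutoff_lag(values, bound):
    """Last lag in the initial run of values outside +/-bound (0 if none).

    values[k] is the ACF or PACF at lag k; index 0 is ignored.
    """
    lag = 0
    for k in range(1, len(values)):
        if abs(values[k]) > bound:
            lag = k
        else:
            break
    return lag


def looks_overdifferenced(y):
    """Rule of thumb: a lag-1 ACF of -0.5 or below after differencing."""
    return acf(y, 1)[1] <= -0.5


def strongest_lag(acf_values, min_lag=2):
    """Lag >= min_lag with the largest ACF value, a candidate for S.

    Apply this to a detrended (e.g. first-differenced) series; otherwise
    the slowly decaying ACF of the trend dominates the low lags.
    """
    return max(range(min_lag, len(acf_values)), key=lambda k: acf_values[k])
```

For a series `y`, `cutoff_lag(pacf(y, 24), significance_bound(len(y)))` would give a candidate p and the same call with `acf` a candidate q; the 24 lags and the `min_lag=2` default are arbitrary choices for illustration.
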
These are all rules of thumb, not an exact science, for choosing each parameter of SARIMA(p,d,q)(P,D,Q)[S]; picking good parameters from the ACF/PACF plots is something of an art. The following rules also apply to ARMA and ARIMA models.

Identifying the order of differencing:
d=0 if the series has no visible trend or if the ACF is low at all lags.
d≥1 if the series has a visible trend or if the ACF stays positive out to a high number of lags.
Note: if, after differencing, the ACF at lag 1 is -0.5 or more negative, the series may be overdifferenced.
Note: a model with d=1 assumes that the original series has a constant average trend, while a model with d=2 assumes that the original series has a time-varying trend.

Identifying the number of AR and MA terms:
p is the lag at which the PACF cuts off, i.e. the last lag in the initial run of PACF values above the significance level.
q is the lag at which the ACF cuts off, i.e. the last lag in the initial run of ACF values above the significance level.

Identifying the seasonal part of the model:
S is the lag with the highest ACF value (typically a high lag).
D=1 if the series has a stable seasonal pattern over time.
D=0 if the series has an unstable seasonal pattern over time.
Rule of thumb: d+D≤2.
P≥1 if the ACF is positive at lag S, else P=0.
Q≥1 if the ACF is negative at lag S, else Q=0.
Rule of thumb: P+Q≤2.

Grid search for SARIMA model selection

Doing a full manual time series analysis can be tedious, especially when you have many data sets to analyze, so it is preferable to automate model selection with grid search. Because SARIMA has many parameters, grid search can take hours on a single data set if the upper limit of each parameter is set too high, and high limits also admit models that are too complex and overfit the training data. To avoid both the long runtime and overfitting, we apply the parsimony principle and only consider parameter combinations with p+d+q+P+D+Q ≤ 6. Another approach is to restrict each parameter to 0, 1, or 2 and compare every combination by its AIC. Grid search over SARIMA is most common in forecasting studies that use SARIMA as a benchmark for more advanced models such as deep neural networks. Keep in mind, though, that grid search does not always find the best model; to get it you may need to experiment manually with different parameters guided by the ACF/PACF plots.
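
The grid search described above can be sketched as follows, assuming the caller supplies a function that fits one candidate model and returns its AIC. The names `parsimonious_orders`, `grid_search` and `fit_aic` are hypothetical. With the defaults, each parameter is limited to 0, 1 or 2 and a combination must satisfy both p+d+q+P+D+Q ≤ 6 and the rule of thumb d+D ≤ 2; the seasonal period of 12 assumes monthly data. The commented-out `fit_aic` shows one possible way to use statsmodels and is an assumption, not the tutorial's own code.

```python
from itertools import product


def parsimonious_orders(max_order=2, max_total=6, seasonal_period=12):
    """Yield (order, seasonal_order) pairs for a SARIMA grid search.

    Each of p, d, q, P, D, Q ranges over 0..max_order, and a combination is
    kept only if p+d+q+P+D+Q <= max_total (parsimony) and d+D <= 2.
    """
    for p, d, q, P, D, Q in product(range(max_order + 1), repeat=6):
        if p + d + q + P + D + Q <= max_total and d + D <= 2:
            yield (p, d, q), (P, D, Q, seasonal_period)


def grid_search(fit_aic, candidates):
    """Return (aic, order, seasonal_order) of the lowest-AIC candidate.

    fit_aic(order, seasonal_order) is a caller-supplied function that fits
    the model and returns its AIC; candidates whose fit raises are skipped.
    Returns None if no candidate could be fitted.
    """
    best = None
    for order, seasonal_order in candidates:
        try:
            aic = fit_aic(order, seasonal_order)
        except Exception:
            continue  # e.g. a numerical error for this parameter combination
        if best is None or aic < best[0]:
            best = (aic, order, seasonal_order)
    return best


# One possible fit_aic using statsmodels (not run here; assumes statsmodels
# is installed and `train` holds the training series):
# from statsmodels.tsa.statespace.sarimax import SARIMAX
# def fit_aic(order, seasonal_order):
#     model = SARIMAX(train, order=order, seasonal_order=seasonal_order)
#     return model.fit(disp=False).aic
```

Setting `max_total` to 12 drops the parsimony constraint and leaves only the "each parameter is 0, 1 or 2" approach, which multiplies the number of fits.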

Python Tutorial

After loading the time series, we plot it. Here we use the classic Air Passengers time series.
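
A minimal, dependency-free sketch of the loading step, assuming the data sit in a CSV file with a `Month` column and a `#Passengers` column (the layout of the commonly shared AirPassengers.csv; adjust the column names if your copy differs). The helper `load_monthly_series` is hypothetical. With pandas, `pd.read_csv(path, parse_dates=["Month"], index_col="Month")` is the more usual route, and the values could then be plotted with matplotlib's `plt.plot`.

```python
import csv
import io


def load_monthly_series(fileobj, date_col="Month", value_col="#Passengers"):
    """Read a date column and a numeric value column from a CSV file object.

    The default column names are an assumption about the file's layout.
    """
    reader = csv.DictReader(fileobj)
    dates, values = [], []
    for row in reader:
        dates.append(row[date_col])
        values.append(float(row[value_col]))
    return dates, values


# Tiny in-memory sample with the same layout (first two months of the data).
sample = io.StringIO("Month,#Passengers\n1949-01,112\n1949-02,118\n")
dates, values = load_monthly_series(sample)
print(dates, values)
```

For the real file you would pass `open(path, newline="")` instead of the in-memory sample, where `path` is wherever you saved the data.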

From inspecting the plot we can conclude that this time series has a positive linear trend, multiplicative seasonal patterns, and possibly some irregular patterns. This strongly suggests using a SARIMA model for forecasting. Let's get to it! First, we split the data chronologically, using the first 70% for training and the remaining 30% for testing.
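
Because the test set must lie after the training set in time, the split is a cut at a single point rather than a random shuffle. A minimal sketch with a hypothetical helper `train_test_split_ts`:

```python
def train_test_split_ts(y, train_frac=0.7):
    """Split a series chronologically: the first train_frac goes to training.

    Unlike a random split, this keeps every test observation strictly after
    the training observations, which is how a forecast is evaluated.
    """
    if not 0 < train_frac < 1:
        raise ValueError("train_frac must be between 0 and 1")
    cut = int(len(y) * train_frac)
    return y[:cut], y[cut:]


train, test = train_test_split_ts(list(range(144)))
print(len(train), len(test))  # 100 44
```

Air Passengers has 144 monthly observations (1949 to 1960), so `int(144 * 0.7)` puts the first 100 months in the training set and the last 44 in the test set.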