Hi! I’m Jose Portilla and I teach Python, Data Science, and Machine Learning online to over 500,000 students! If you’re interested in learning more about how to do the types of analysis and visualization covered in this blog post, you can check out discount coupon links to my courses on Python for Data Science and Machine Learning and using Python for Financial Analysis and Algorithmic Trading.

Alright, on to the discussion of time series!

Time Series Basics

Let’s first discuss what a time series is and what it’s not. We’ll also talk about what kinds of time series are suitable for ARIMA-based forecasting models.

Time series data is a set of observations recorded at different points in time (usually evenly spaced, like once a day). For example, daily airline ticket sales form a time series. However, just because a series of events has a time element does not automatically make it a time series. The dates of major airline disasters, for example, are randomly spaced in time and do not form a time series; these kinds of random processes are known as point processes.
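To make this concrete, here is a minimal sketch of such an evenly spaced series in pandas (the dates and numbers below are made up for illustration — this is not real airline data):

```python
import numpy as np
import pandas as pd

# Hypothetical daily airline ticket sales: evenly spaced observations
# indexed by date, which is what makes this a time series.
rng = np.random.default_rng(42)
dates = pd.date_range("2023-01-01", periods=90, freq="D")
sales = pd.Series(1000 + rng.normal(0, 50, size=90).cumsum(),
                  index=dates, name="tickets_sold")

print(sales.head())
```

The key ingredient is the `DatetimeIndex` with a fixed daily frequency — each observation sits exactly one day after the previous one.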

Time series have several key features, such as trend, seasonality, and noise.

What we’ll be doing in this article is analyzing these features of a time series data set, and then seeing if we can use mathematical models to forecast into the future. We’ll also see how we can split our original time series data set to evaluate how well our model predicts the future.

Not every time series is suitable for forecasting, so don’t expect any get rich quick schemes on forecasting stock prices :)

Forecasting with ARIMA

“Prediction is very difficult, especially about the future.”

Forecasting is the process of making predictions of the future, based on past and present data. One of the most common methods for this is the ARIMA model, which stands for AutoRegressive Integrated Moving Average.

In an ARIMA model there are three parameters that are used to help model the major aspects of a time series: seasonality, trend, and noise. These parameters are labeled p, d, and q.

p is the parameter associated with the autoregressive aspect of the model, which incorporates past values. For example, if it rained a lot over the past few days, you might forecast that it’s likely to rain tomorrow as well.
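To make the autoregressive idea concrete, here is a tiny simulated AR(1) process (phi and the noise scale are made-up values): when phi is positive, each value is pulled toward the previous one, so consecutive observations end up correlated — exactly the “recent rain predicts more rain” intuition.

```python
import numpy as np

# AR(1): today's value = phi * yesterday's value + random noise
rng = np.random.default_rng(7)
phi = 0.8
y = [0.0]
for _ in range(199):
    y.append(phi * y[-1] + rng.normal(0, 1))
y = np.array(y)

# Consecutive values should be positively correlated when phi > 0
corr = np.corrcoef(y[:-1], y[1:])[0, 1]
```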

d is the parameter associated with the integrated part of the model, which affects the amount of differencing to apply to the time series. For example, if the daily amounts of rain have been similar over the past few days, you might forecast that the amount of rain tomorrow will be similar to the amount today.
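A minimal illustration of differencing with pandas (the numbers are made up): a series climbing by a steady amount becomes constant after one round of differencing, which is what d = 1 does inside ARIMA to remove a linear trend.

```python
import pandas as pd

# A made-up series with a steady upward trend
trending = pd.Series([10, 12, 14, 16, 18])

# First difference (d = 1): each value minus the previous one.
# The linear trend disappears, leaving a constant series.
diff1 = trending.diff().dropna()
print(diff1.tolist())  # → [2.0, 2.0, 2.0, 2.0]
```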

q is the parameter associated with the moving average part of the model, which incorporates the model’s past forecast errors. For example, if recent forecasts have consistently overestimated the amount of rain, you might adjust tomorrow’s forecast downward.
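Here is a tiny simulated MA(1) process (theta is a made-up value): each observation is today’s random shock plus a fraction of yesterday’s shock, so neighboring values are correlated at lag 1 but the dependence dies out beyond lag q.

```python
import numpy as np

# MA(1): today's value = today's shock + theta * yesterday's shock
rng = np.random.default_rng(3)
theta = 0.6
eps = rng.normal(0, 1, 300)     # the underlying shocks ("errors")
y = eps[1:] + theta * eps[:-1]

# Neighbors share a shock, so lag-1 correlation is clearly positive,
# while lag-2 correlation is (theoretically) zero.
lag1 = np.corrcoef(y[:-1], y[1:])[0, 1]
lag2 = np.corrcoef(y[:-2], y[2:])[0, 1]
```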

If our model has a seasonal component (we’ll show this in more detail later), we use a seasonal ARIMA model (SARIMA). In that case we have another set of parameters: P, D, and Q, which describe the same associations as p, d, and q but correspond to the seasonal components of the model.

The methods we will employ in this blog example only take in data from a univariate time series. That means we are really only considering the relationship between the y-axis values and the x-axis time points. We’re not considering outside factors that may be affecting the time series.

A common mistake beginners make is to immediately apply ARIMA forecasting models to data with many outside factors, such as stock prices or a sports team’s performance. While ARIMA can be a powerful and relevant tool for time series related to those topics, if you use it by itself and don’t account for outside factors, such as a CEO getting fired or an injury on the team, you won’t get good results. Keep this in mind as you begin to apply these concepts to your own data sets.