In my previous posts, we have discussed stationarity tests on crypto trading data. In this upcoming series, we are going to explore how to implement Autoregressive Integrated Moving Average Model (ARIMA) into our crypto quantitative analysis.

The following material is for educational purposes only. Do NOT use it in production or bid with real money.

ARIMA

ARIMA is a general class of statistical models for analyzing and forecasting time series data. It includes random walk, moving average, seasonal and non-seasonal exponential smoothing and autoregressive models. One of the good places to start learning the fundamental theory is Prof. Nau’s course notes and materials, which provides a deep dive into time series analysis, explaining every aspect in detail.

BTCUSD Trading Data

Again, we are using Catalyst to pull out BTCUSD trading data from 2015-3-3 to 2019-6-25. In your terminal, run this command to ingest the data we need for this post.

(venv) catalyst ingest-exchange -x bitfinex -f daily -i btc_usd

If you haven’t installed Catalyst yet, take a look at this post and come back later.

Great! Let us create a new file named ARIMA.ipynb with JuypterLab or Juypter Notebook. Add the following code and hit Shift + Enter

% matplotlib inline # Increase chart resolution % config InlineBackend . figure_format = 'retina' from catalyst.api import symbol , record from catalyst import run_algorithm from datetime import timedelta from itertools import product import matplotlib.pyplot as plt import matplotlib.patches as mpatches import matplotlib.lines as mlines from matplotlib import style from matplotlib import ticker import numpy as np import pandas as pd from scipy import stats import seaborn as sns from statsmodels.tsa.statespace.sarimax import SARIMAX from statsmodels.tsa.stattools import adfuller import warnings warnings . filterwarnings ( 'ignore' ) trading_pair = 'btc_usd' frequency = 'daily' exchange = 'bitfinex' start = '2015-3-3' end = '2019-6-25' capital_base = 1000 quote_currency = trading_pair . split ( '_' )[ 1 ] def initialize ( context ): context . asset = symbol ( trading_pair ) def handle_data ( context , data ): # The last known price and volume of current date/minute and the day/minute before if frequency == 'daily' : price = data . current ( context . asset , 'price' ) volume = data . current ( context . asset , 'volume' ) elif frequency == 'minute' : price = data . current ( context . asset , 'price' ) volume = data . current ( context . asset , 'volume' ) record ( price = price , volume = volume ) if __name__ == '__main__' : perf = run_algorithm ( capital_base = capital_base , data_frequency = frequency , initialize = initialize , handle_data = handle_data , exchange_name = exchange , quote_currency = quote_currency , start = pd . to_datetime ( start , utc = True ), end = pd . to_datetime ( end , utc = True ))

The daily (close) price and volume are stored in perf (performance). Let us resample our data into daily, weekly, bi-weekly and monthly data sets.

# Daily data d_df = pd . DataFrame ( perf . loc [:, [ 'price' , 'volume' ]], index = perf . index ) . resample ( '1d' ) . mean () . round ( 2 ) . copy () # Weekly data w_df = d_df . resample ( '1w' ) . mean () . round ( 2 ) . copy () # Bi-weekly data bw_df = d_df . resample ( '2w' ) . mean () . round ( 2 ) . copy () # Monthly data m_df = d_df . resample ( '1M' ) . mean () . round ( 2 ) . copy () df_list = [ d_df , w_df , bw_df , m_df ]

Logarithmic Price Chart

You might be curious about what they look like. Remember the logarithmic price scale discussed in my previous post? We can create a similar function and take a quick look at what we have from the resampling.

Further, we are going to use this very same function to compare our forecast with history price at the end of this series. It is totally okay if you can not understand some parts of this function at the moment. I will do my best to explain them in detail.

def make_log_price_chart ( df , frequency = 'Daily' ): """ Plot in logarithmic price scale """ sns . set ( style = "ticks" ) fig_log_price = plt . figure ( figsize = ( 26 , 7 ))

We are going to use seaborn ticks style. I have set up the figure size based on my screen. Feel free to change it to your liking. In case you need to install seaborn, run this command in the terminal

(venv) pip install -U seachborn

Now we need to check the data frequency once the function has taken in the panda dataframe, and we are going to use this frequency information to plot the figure accordingly.

# Data Frequency if 'Day' in str ( df . index . freq ): frequency = 'Daily' elif 'Week: weekday=6' in str ( df . index . freq ): frequency = 'Weekly' elif '2 * Weeks: weekday=6' in str ( df . index . freq ): frequency = 'Bi-weekly' elif 'MonthEnd' in str ( df . index . freq ): frequency = 'Monthly'

Let us create the first subplot for the price data.

# History Price ax1 = plt . subplot2grid (( 1 , 1 ), ( 0 , 0 ), rowspan = 1 , colspan = 1 )

If there is only history price, we are going to use line chart. At the end of this series, the data frame will contain forecast as well. In that case, the function uses scatter chart to plot the history price instead.

if 'forecast' in df . columns : ax1 . semilogy ( df . index , df . price , basey = 10 , color = 'k' , linewidth = 0 , marker = '.' ) lg_price = mlines . Line2D ([], [], color = 'k' , label = 'BTCUSD History' , linewidth = 0 , marker = '.' ) else : ax1 . semilogy ( df . index , df . price , basey = 10 , color = 'k' , linewidth = 0.75 ) lg_price = mlines . Line2D ([], [], color = 'k' , label = 'BTCUSD History' , linewidth = 1.5 )

More format tweeks for the price chart.

ax1 . yaxis . set_major_formatter ( ticker . FormatStrFormatter ( '$ %.2 f' )) ax1 . yaxis . set_minor_formatter ( ticker . FormatStrFormatter ( '$ %.2 f' )) ax1 . set_ylim ([ np . min ( perf . price ) * 0.9 , np . max ( perf . price ) * 1.1 ]) plt . ylabel ( 'Log Price' , fontsize = 12 ) ax1 . tick_params ( axis = 'y' , which = 'both' , labelsize = 8 ) plt . grid ( which = 'minor' ) plt . xticks ( fontsize = 12 , rotation = 0 )

Now, we can work on the second subplot - volume.

# Volume in Log Price Chart ax1v = ax1 . twinx () ax1v . set_ylim ([ 0 , np . max ( perf . volume ) * 2 ]) ax1v . fill_between ( df . index , df . volume , color = '#F89E38' , alpha = 0.4 , label = '2' ) plt . grid ( False ) plt . setp ( ax1v . get_yticklabels (), visible = False ) lg_volume = mpatches . Patch ( color = '#F89E38' , label = 'Volume' , alpha = 0.4 )

Here comes the forecast part. We will show the last forecast price on the right in a annotation green box. In case of history price only, the chart presents the last known price in a white box. In addition, the 95% confidence interval (upper & lower) will be plotted in red.

if 'forecast' in df . columns : # Forecast Price ax1 . semilogy ( df . index , df . forecast , basey = 10 , color = 'g' , linewidth = 0.75 ) # Forecast Price Annotation bbox_props = dict ( boxstyle = 'round' , fc = 'g' , ec = 'k' , lw = 0.25 ) last_forecast_price_date = df . forecast . last_valid_index () last_forecast_price_index = df . index . get_loc ( last_forecast_price_date ) last_forecast_price = df . forecast . iloc [ last_forecast_price_index ] ax1 . annotate ( str ( last_forecast_price ), ( last_forecast_price_date , last_forecast_price ), xytext = ( last_forecast_price_date + pd . Timedelta ( '7d' ), last_forecast_price ), bbox = bbox_props , fontsize = 10 ) # Plot confidence interval first_forecast_price_date = df . forecast . first_valid_index () first_forecast_price_index = df . index . get_loc ( first_forecast_price_date ) ax1 . fill_between ( df . index [ first_forecast_price_index :], df [ 'upper forecast' ] . iloc [ first_forecast_price_index :], df [ 'lower forecast' ] . iloc [ first_forecast_price_index :], color = 'r' , alpha = 0.2 ) ax1 . yaxis . set_major_formatter ( ticker . FormatStrFormatter ( '$ %.2 f' )) ax1 . yaxis . set_minor_formatter ( ticker . FormatStrFormatter ( '$ %.2 f' )) lg_forecast = mlines . Line2D ([], [], color = 'g' , label = 'BTCUSD Forecast' , linewidth = 1.5 ) lg_forecast_ci = mpatches . Patch ( color = 'r' , label = '95 % Confidence Interval' , alpha = 0.2 ) plt . legend ( handles = [ lg_price , lg_forecast , lg_volume , lg_forecast_ci ], loc = 'center right' ) else : # History Price Annotation bbox_props = dict ( boxstyle = 'round' , fc = 'w' , ec = 'k' , lw = 0.25 ) last_known_price_date = df . price . last_valid_index () last_known_price_index = df . index . get_loc ( last_known_price_date ) last_known_price = df . price . iloc [ last_known_price_index ] ax1 . annotate ( str ( last_known_price ), ( last_known_price_date , last_known_price ), xytext = ( last_known_price_date + pd . Timedelta ( '14d' ), last_known_price ), bbox = bbox_props , fontsize = 10 ) plt . legend ( handles = [ lg_price , lg_volume ], loc = 'center right' )

Put everything together for make_log_price_chart() , and we have got the following.

def make_log_price_chart ( df , frequency = 'Daily' ): """ Plot in logarithmic price scale """ sns . set ( style = "ticks" ) fig_log_price = plt . figure ( figsize = ( 26 , 7 )) # Data Frequency if 'Day' in str ( df . index . freq ): frequency = 'Daily' elif 'Week: weekday=6' in str ( df . index . freq ): frequency = 'Weekly' elif '2 * Weeks: weekday=6' in str ( df . index . freq ): frequency = 'Bi-weekly' elif 'MonthEnd' in str ( df . index . freq ): frequency = 'Monthly' # History Price ax1 = plt . subplot2grid (( 1 , 1 ), ( 0 , 0 ), rowspan = 1 , colspan = 1 ) if 'forecast' in df . columns : ax1 . semilogy ( df . index , df . price , basey = 10 , color = 'k' , linewidth = 0 , marker = '.' ) lg_price = mlines . Line2D ([], [], color = 'k' , label = 'BTCUSD History' , linewidth = 0 , marker = '.' ) else : ax1 . semilogy ( df . index , df . price , basey = 10 , color = 'k' , linewidth = 0.75 ) lg_price = mlines . Line2D ([], [], color = 'k' , label = 'BTCUSD History' , linewidth = 1.5 ) ax1 . yaxis . set_major_formatter ( ticker . FormatStrFormatter ( '$ %.2 f' )) ax1 . yaxis . set_minor_formatter ( ticker . FormatStrFormatter ( '$ %.2 f' )) ax1 . set_ylim ([ np . min ( perf . price ) * 0.9 , np . max ( perf . price ) * 1.1 ]) plt . ylabel ( 'Log Price' , fontsize = 12 ) ax1 . tick_params ( axis = 'y' , which = 'both' , labelsize = 8 ) plt . grid ( which = 'minor' ) plt . xticks ( fontsize = 12 , rotation = 0 ) # Volume in Log Price Chart ax1v = ax1 . twinx () ax1v . set_ylim ([ 0 , np . max ( perf . volume ) * 2 ]) ax1v . fill_between ( df . index , df . volume , color = '#F89E38' , alpha = 0.4 , label = '2' ) plt . grid ( False ) plt . setp ( ax1v . get_yticklabels (), visible = False ) lg_volume = mpatches . Patch ( color = '#F89E38' , label = 'Volume' , alpha = 0.4 ) if 'forecast' in df . columns : # Forecast Price ax1 . semilogy ( df . index , df . forecast , basey = 10 , color = 'g' , linewidth = 0.75 ) # Forecast Price Annotation bbox_props = dict ( boxstyle = 'round' , fc = 'g' , ec = 'k' , lw = 0.25 ) last_forecast_price_date = df . forecast . last_valid_index () last_forecast_price_index = df . index . get_loc ( last_forecast_price_date ) last_forecast_price = df . forecast . iloc [ last_forecast_price_index ] ax1 . annotate ( str ( last_forecast_price ), ( last_forecast_price_date , last_forecast_price ), xytext = ( last_forecast_price_date + pd . Timedelta ( '7d' ), last_forecast_price ), bbox = bbox_props , fontsize = 10 ) # Plot confidence interval first_forecast_price_date = df . forecast . first_valid_index () first_forecast_price_index = df . index . get_loc ( first_forecast_price_date ) ax1 . fill_between ( df . index [ first_forecast_price_index :], df [ 'upper forecast' ] . iloc [ first_forecast_price_index :], df [ 'lower forecast' ] . iloc [ first_forecast_price_index :], color = 'r' , alpha = 0.2 ) ax1 . yaxis . set_major_formatter ( ticker . FormatStrFormatter ( '$ %.2 f' )) ax1 . yaxis . set_minor_formatter ( ticker . FormatStrFormatter ( '$ %.2 f' )) lg_forecast = mlines . Line2D ([], [], color = 'g' , label = 'BTCUSD Forecast' , linewidth = 1.5 ) lg_forecast_ci = mpatches . Patch ( color = 'r' , label = '95 % Confidence Interval' , alpha = 0.2 ) plt . legend ( handles = [ lg_price , lg_forecast , lg_volume , lg_forecast_ci ], loc = 'center right' ) else : # History Price Annotation bbox_props = dict ( boxstyle = 'round' , fc = 'w' , ec = 'k' , lw = 0.25 ) last_known_price_date = df . price . last_valid_index () last_known_price_index = df . index . get_loc ( last_known_price_date ) last_known_price = df . price . iloc [ last_known_price_index ] ax1 . annotate ( str ( last_known_price ), ( last_known_price_date , last_known_price ), xytext = ( last_known_price_date + pd . Timedelta ( '14d' ), last_known_price ), bbox = bbox_props , fontsize = 10 ) plt . legend ( handles = [ lg_price , lg_volume ], loc = 'center right' ) title = 'BTCUSD {} Price (Logarithmic Scale)' . format ( frequency ) plt . title ( title , fontsize = 16 ) plt . show ()

Awesome. Let us use this function to run through our data frame list including daily, weekly, bi-weekly and monthly BTCUSD history prices.

for df in df_list : make_log_price_chart ( df )

BTCUSD daily log price

BTCUSD weekly log price

BTCUSD bi-weekly log price

BTCUSD monthly log price

Stationarity Test

Apparently, our price data is not stationary. But before we perform any data transformation and differencing, we can create a simple stationarity test function (You might have noticed this function is a bit similar to the one mentioned in one of my previous posts) for later use.

def run_stationarity_test ( time_series , window = 10 ): """ This window here is a bit arbitrary. After all, rolling average is just a visual technique to verify the stationarity. """ sns . set ( style = "darkgrid" ) #Determing rolling statistics rolmean = time_series . rolling ( window = window ) . mean () rolstd = time_series . rolling ( window = window ) . std () #Plot rolling statistics: fig = plt . figure ( figsize = ( 27 , 5 )) orig = plt . semilogy ( time_series , color = 'blue' , label = 'Original' , lw = 0.75 , alpha = 0.7 ) mean = plt . semilogy ( rolmean , color = 'red' , label = 'Rolling Mean' , lw = 1 ) std = plt . semilogy ( rolstd , color = 'black' , label = 'Rolling Std' , lw = 1 ) plt . legend ( loc = 'best' ) if 'Day' in str ( time_series . index . freq ): freq = 'Daily' elif 'Week: weekday=6' in str ( time_series . index . freq ): freq = 'Weekly' elif '2 * Weeks: weekday=6' in str ( time_series . index . freq ): freq = 'Bi-weekly' elif 'MonthEnd' in str ( time_series . index . freq ): freq = 'Monthly' plt . title ( '{} Data Rolling Mean & Standard Deviation' . format ( freq ), fontsize = 20 ) plt . yticks ( fontsize = 14 ) plt . xticks ( fontsize = 14 , rotation = 0 ) plt . show () #Perform ADF test: first_valid_date = time_series . first_valid_index () print ( 'Results of ADF Test:' ) dftest = adfuller ( time_series . loc [ first_valid_date :], autolag = 'AIC' ) dfoutput = pd . Series ( dftest [ 0 : 4 ], index = [ 'Test Statistic' , 'p-value' , '#Lags Used' , 'Number of Observations Used' ]) for key , value in dftest [ 4 ] . items (): dfoutput [ 'Critical Value ( % s)' % key ] = value print ( dfoutput )

Let us run this test through our data frame list as well.

for df in df_list : run_stationarity_test ( df . price )

BTCUSD daily log price rolling mean and standard deviation

BTCUSD weekly log price rolling mean and standard deviation

BTCUSD bi-weekly log price rolling mean and standard deviation

BTCUSD monthly log price rolling mean and standard deviation

Visually speaking, it is quite obvious that our price data is not stationary.

According to Augmented Dickey-Fuller (ADF) test, when p-value is greater than 0.05, we can not reject the null hypothesis that the series has a unit root. In other words, the price time series is not stationary.

Usually, we can just take the natural log of the price and be done with it, but I am going to show you another general yet powerful tool called Box-Cox Transformation in the upcoming part 2. All source code can be found on my github. Meanwhile, if you have any questions/comments/proposals, feel free to shoot me a message.

I have also created one QUANT channel in one of the most popular discords in cryptoverse. Stop by and say hi to those down-to-earth crypto folks.

Stay calm and happy trading!