0.1 Before we dive deeper:

We’ve decided to document this journey of mutual learning to better understand how the Redis 5 core and its most-starred extensions work, and to discuss the features introduced in this new version, so that readers of this series can make informed decisions about the use cases it fits and the ones it doesn’t.

We will assume you have an overall understanding of Redis 5.X, as described in the previous articles of this series:

— Redis 5.X under the hood: 1 — Redis-Server up and running

— Redis 5.X under the hood: 2 — Intro to Redis Commands and Data Structures — part 1

1 — Downloading and installing RedisTimeSeries module

Using the latest stable version of Redis, whenever possible, is one of the best practices. The latest Redis major release (5.0.0) was delivered on GitHub on 17 Oct 2018. In this series, we’re adopting the latest stable version available (5.0.3), from 12 Dec 2018. If you already have Redis and the RedisTimeSeries module running on your computer, you can jump to section 2.

1.1 — Getting up and running:

We recommend two ways to get Redis and the Redis Time-Series Module up and running on your machine:

1.1.1 — Option 1 — Running with docker ( simplest way ):

If you have docker on your machine this is the simplest way of setting a test environment:

docker run -p 6379:6379 --name redis5-redistimeseries -d redislabs/redistimeseries

sudo apt install redis-tools

redis-cli

127.0.0.1:6379>

1.1.2 — Option 2 — Build Redis and Redis Time-Series Module:

If you want to build Redis and the Redis Time-Series Module execute the following commands on your terminal:



wget http://download.redis.io/releases/redis-5.0.3.tar.gz

tar xvzf redis-5.0.3.tar.gz

cd redis-5.0.3

make

make test

sudo make install

cd .. && git clone https://github.com/RedisLabsModules/RedisTimeSeries.git && cd RedisTimeSeries

git submodule init

git submodule update

cd src

make all

Add the following line to your redis.conf file:

loadmodule /path/to/redistimeseries.so

Finally:

redis-server /path/to/redis.conf &

redis-cli

127.0.0.1:6379>

It is also possible to load a module at runtime using the following command:

MODULE LOAD /path/to/redistimeseries.so

To list all loaded modules, use:

127.0.0.1:6379> module list

1) 1) "name"
   2) "timeseries"
   3) "ver"
   4) (integer) 100
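Client libraries return the MODULE LIST reply as a nested array. A small helper of ours (a sketch, assuming a redis-py-style decoded reply, not part of any Redis client) can turn it into something easier to inspect:

```python
def parse_module_list(reply):
    """Convert a MODULE LIST reply (a list of flat key/value lists)
    into a list of dicts, e.g. [{"name": "timeseries", "ver": 100}]."""
    modules = []
    for entry in reply:
        # each entry alternates field names and values: ["name", x, "ver", y, ...]
        modules.append(dict(zip(entry[::2], entry[1::2])))
    return modules

# The reply shown above, as a client would decode it:
reply = [["name", "timeseries", "ver", 100]]
print(parse_module_list(reply))  # [{'name': 'timeseries', 'ver': 100}]
```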

2 — Picking up a challenging dataset

An excellent source of time series data is the UCI Machine Learning Repository. There you will find good-quality standard datasets to practice on, spanning domains such as meteorology, medicine, and monitoring.

We’ve decided to pick the “Air Quality” dataset for our basic setup. It contains 9358 instances of hourly averaged responses from an array of 5 metal oxide chemical sensors embedded in an Air Quality Chemical Multisensor Device. The data was recorded from March 2004 to February 2005 (one year) and will enable us to produce the following aggregations, using only Redis to do the aggregations and operations on the data:

Day over Day comparisons — comparing sets of 1 day, for example today with yesterday, or today with one week prior. We will refer to this as -DoD on our setup.

Week over Week comparisons — comparing sets of 7 days. We will refer to this as -WoW on our setup.

Month over Month comparisons — comparing sets of 30 days. We will refer to this as -MoM on our setup.
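The comparisons above boil down to timestamp arithmetic: pick a window, then shift it back by its own length. A minimal sketch (the function name and return shape are ours, purely illustrative):

```python
DAY = 24 * 60 * 60  # seconds in a day

def comparison_windows(start_ts, days):
    """Given a window start (Unix seconds) and a length in days, return the
    (start, end) timestamp ranges of the current window and of the window
    immediately before it: days=1 gives DoD, days=7 WoW, days=30 MoM."""
    span = days * DAY
    current = (start_ts, start_ts + span - 1)
    previous = (start_ts - span, start_ts - 1)
    return current, previous

# Day over Day windows starting at 1112587200 (a timestamp from the dataset):
print(comparison_windows(1112587200, 1))
```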

To load the data and produce the graphs we will use Python 3.7 along with visualisation modules.

2.1 — Loading the data into Redis

From the “Air Quality” dataset we’ll extract the “True hourly averaged concentration CO in mg/m³”, the temperature in Celsius (°C), and the relative humidity (%), along with the period of time each measurement relates to.

We’ll create one time series per measurement using TS.CREATE. Once created, all the measurements will be sent using TS.ADD.

The following sample creates a time-series and populates it with three entries.

127.0.0.1:6379> TS.CREATE ts:carbon_monoxide

OK

127.0.0.1:6379> TS.ADD ts:carbon_monoxide 1112587200 2.199

OK

127.0.0.1:6379> TS.ADD ts:carbon_monoxide 1112590800 1.99

OK

127.0.0.1:6379> TS.ADD ts:carbon_monoxide 1112594400 0.4

OK

To ease the process of loading data into RedisTimeSeries we’ve created a script called dataloader.py, available on GitHub. The following code is just a snippet with the most relevant parts of the complete file:
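In case the embedded snippet does not render here, the core of what dataloader.py does can be sketched as follows. Note that the Redis key names and CSV column names below are our assumptions for illustration, so check the actual script and the dataset’s header row before relying on them:

```python
SERIES = {
    "ts:carbon_monoxide": "CO(GT)",   # assumed column names from the
    "ts:temperature": "T",            # Air Quality CSV; verify against
    "ts:relative_humidity": "RH",     # the real file's header row
}

def build_commands(rows):
    """Turn parsed CSV rows (dicts holding a Unix-seconds 'timestamp'
    plus the measurement columns) into the TS.CREATE / TS.ADD command
    tuples a loader would send to Redis."""
    commands = [("TS.CREATE", key) for key in SERIES]
    for row in rows:
        for key, column in SERIES.items():
            commands.append(("TS.ADD", key, row["timestamp"], row[column]))
    return commands

rows = [{"timestamp": 1112587200, "CO(GT)": 2.199, "T": 13.6, "RH": 48.9}]
cmds = build_commands(rows)  # 3 TS.CREATEs followed by 3 TS.ADDs
```

With a real client, each of these tuples would be sent with a generic call such as redis-py’s execute_command.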

To properly load the data into Redis use the next set of commands:

$ python3 -m pip install -r requirements.txt

$ python3 dataloader.py

9471it [00:22, 416.13it/s]

You can see that with the script running on our local machine we’ve achieved around 416 iterations per second, which translates to roughly 1,250 TS.ADD operations per second (about 75K per minute), since we’re doing 3 TS.ADD per iteration. For further details on how the script is run, just type:

$ python3 dataloader.py -h

usage: dataloader.py [-h] [--port PORT] [--password PASSWORD] [--verbose]
                     [--host HOST] [--csv CSV] [--csv_delimiter CSV_DELIMITER]

optional arguments:
  -h, --help            show this help message and exit
  --port PORT           redis instance port
  --password PASSWORD   redis instance password
  --verbose             enable verbose output
  --host HOST           redis instance host
  --csv CSV             csv file containing the dataset
  --csv_delimiter CSV_DELIMITER
                        csv file field delimiter
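As a back-of-the-envelope check of the throughput figure, the arithmetic is just the progress bar’s iteration rate times the number of TS.ADD calls per iteration (assuming the rate stays steady):

```python
iterations_per_second = 416.13  # rate reported by the progress bar
adds_per_iteration = 3          # one TS.ADD per measurement series

ops_per_second = iterations_per_second * adds_per_iteration
ops_per_minute = ops_per_second * 60
print(round(ops_per_second))  # 1248
print(round(ops_per_minute))  # 74903, i.e. roughly 75K TS.ADD per minute
```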

2.2 — Doing the aggregations and operations on data

With time series containing thousands or even hundreds of thousands of data points, it’s not practical to sift through each data point individually in your application and summarise after fetching the data. Aggregated queries allow you to summarise metrics on the data store itself, leading to smaller responses, reduced network bandwidth, and less post-processing in your applications. The RedisTimeSeries module features the following aggregation operators: min, max, avg, sum, range, count, first, and last, for any time bucket. What differs between them is what they do with the grouped data.

The min and max aggregators return the minimum or maximum value within a specified time frame, respectively. The avg aggregator returns the average of the values of the time series in each time frame. The sum aggregator totals up all the values for each time bucket. The range aggregator returns the difference between the maximum and minimum values observed in each time bucket. The count, first, and last aggregators are all self-explanatory.

We’ve created a script called plot.py, available on GitHub. The following code is just a snippet with the most relevant part of the complete file, specifically the usage of the Redis command TS.RANGE, both with and without aggregation.
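In case the embedded snippet does not render here, the following pure-Python sketch illustrates what an avg aggregation in TS.RANGE computes server-side: samples are grouped into fixed-width time buckets and each bucket is reduced to a single value (an illustration of the semantics, not the module’s implementation):

```python
from collections import defaultdict

def avg_aggregation(samples, bucket_size):
    """Group (timestamp, value) samples into fixed-width time buckets and
    average each bucket, mimicking:
    TS.RANGE key from_ts to_ts AGGREGATION avg bucket_size"""
    buckets = defaultdict(list)
    for ts, value in samples:
        buckets[ts - ts % bucket_size].append(value)  # align ts to bucket start
    return sorted((ts, sum(vals) / len(vals)) for ts, vals in buckets.items())

# The three carbon monoxide samples from section 2.1, bucketed per day:
samples = [(1112587200, 2.199), (1112590800, 1.99), (1112594400, 0.4)]
print(avg_aggregation(samples, 86400))  # all three fall into one daily bucket
```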

Using plot.py to enable the visualisation and only Redis to do the aggregations and operations on the data we’ll make the following visual comparisons:

Day over Day comparison — comparing sets of 1 day. We will refer to this as -DoD on our setup. As an example, a comparison between the relative humidity of 11/March/2004 and 11/May/2004. The aggregator we will use will be the average.