This is an example of time series data.



Not the easiest thing to read, at least in this format. You may have such data on your hands or have encountered it before, but it is not always easy to understand what it means or how to make use of it. In this article, we will explain just that: what time series are, how to use them, and what you need from a time series database.

So how do you turn that raw data into, for example, this?

The answer is CrateDB Cloud, a cloud-hosted database-as-a-service based on CrateDB, a time series database optimized for industrial internet of things (IIoT) use cases involving large amounts of machine data.

Well, that is a mouthful. So let's start with the basics. What if you find yourself with an IIoT use case and a lot - and we mean a lot - of sensor data on your hands, but you are unsure how best to store it for analysis? Or you've heard of time series, the internet of things, and other such terms, but are unsure what they mean or how they work?

Let us explain what it's all about. We will explain where time series data come from, the fundamentals of time series databases, and how CrateDB Cloud addresses the need for an IIoT-optimized time series database. Of course, if you're already familiar with IIoT and just want to understand how CrateDB Cloud addresses your needs, feel free to skip to the section about CrateDB Cloud.

Understanding the internet of things and time series data

We live in a world of Big Data. Ever-growing quantities of data are extracted from the world around us, and ever more services, institutions, and companies depend on the flow of large volumes of data for their operation. Often, the sources of these data flows already existed. What is new is that their value as generators of data over time is being recognized.

In other cases, the Internet of Things (IoT) and its associated data generation technology are making it possible to extract and transmit data points where none were available before. New sensors mean new sensor readings, and therefore new data sources.

To illustrate the point, let's look at a small-scale and a large-scale example. A small source of data could be an IoT-enabled fridge in your home. Equipped with sensors, the fridge continually measures certain relevant values - temperature, humidity, energy consumption, perhaps even capacity.

These data points are then continually transmitted to a central database on a server, which gathers and tracks them. (This could be the fridge manufacturer's database, or the IoT company's database, or even the consumer's own.)
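To make this concrete, here is a minimal sketch of what one such reading might look like before it is sent off. The device name, the field names, and the JSON format are assumptions chosen for illustration, not a description of how any particular fridge or database works.

```python
from dataclasses import dataclass, asdict
from datetime import datetime, timezone
import json


@dataclass
class FridgeReading:
    """One hypothetical sensor reading; all field names are illustrative."""
    device_id: str
    recorded_at: str       # ISO 8601 timestamp in UTC
    temperature_c: float
    humidity_pct: float
    energy_w: float


def make_reading(device_id, temperature_c, humidity_pct, energy_w, now=None):
    """Stamp a set of sensor values with the time they were measured."""
    now = now or datetime.now(timezone.utc)
    return FridgeReading(device_id, now.isoformat(),
                         temperature_c, humidity_pct, energy_w)


def to_payload(reading):
    """Serialize a reading to JSON, ready to be sent to a central database."""
    return json.dumps(asdict(reading))


reading = make_reading("fridge-001", 4.2, 38.0, 95.5)
print(to_payload(reading))
```

The important part is the timestamp: every value is recorded together with the moment it was measured, which is what later turns a pile of readings into a time series.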

In the past, fridges would still have had temperatures and energy consumption, but nothing would have tracked such data or stored it. Now, such data is recognized as valuable: among other things, IoT data can help increase productivity, support better equipment monitoring and failure prediction, allow for (further) automation, and even improve customer marketing. Moreover, new IoT technologies make it easy to record and transfer such information.

The large-scale example works along the same lines. Take for instance the New York City Taxi and Limousine Commission, from whom we got the time series data shown at the start of this article. Every taxicab licensed by the Commission digitally records each journey a customer makes in it.

This means a lot of data: the fare, the tip, the distance travelled, the start and end times, and much more. And this for every journey with every cab in a city as large - and as popular with tourists - as New York. You can imagine this adds up quickly.

What is a time series database?

At the heart of all this Big Data collection are time series databases. As the name suggests, a time series database gathers and tracks data in the form of a time series. But what are such time series data?

Essentially, time series data are a way of presenting data points as points on a timeline. Represented in the form of a graph, this means the data points (the values) lie along one axis, and time along the other. Of course, the exact dimensions depend on the nature of the data and the relevant timescale.
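In code, the simplest way to picture this is as a list of (timestamp, value) pairs kept in time order. The sketch below uses invented temperature values taken every ten minutes; the numbers and the helper name `points_between` are assumptions for illustration only.

```python
from datetime import datetime, timedelta

# Invented example values: one temperature reading every ten minutes.
start = datetime(2024, 1, 1, 0, 0)
temperatures = [4.1, 4.3, 4.0, 4.6, 4.2]

# Each point pairs a position on the time axis with a position on the value axis.
series = [(start + timedelta(minutes=10 * i), value)
          for i, value in enumerate(temperatures)]


def points_between(series, begin, end):
    """Return the points whose timestamp falls in the interval [begin, end)."""
    return [(t, v) for t, v in series if begin <= t < end]


window = points_between(series,
                        start + timedelta(minutes=10),
                        start + timedelta(minutes=40))
for t, v in window:
    print(t.isoformat(), v)
```

Selecting a slice of the timeline, as `points_between` does here, is the most basic question you can ask of time series data.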

A simple example of time series data represented in this way is the stock market. The data points are the prices of a particular stock or index. The other axis is time - a day of trading, say. The result is a graph that looks something like this:

The data gathered through IoT are essentially of this nature. Let's return to the NYC taxicab data, as they are a good example: values for different dimensions (cab fare, travel duration, etc.) are gathered by sensors in the cab, then collected and organized sequentially by time (a timestamp for the start of each journey).

For example, from the graphs at the top of this article you can clearly see that while relatively few New Yorkers take taxi trips in the very early morning, the ones that do tend to spend a lot. Perhaps they have a long way to go home after going clubbing in Manhattan.
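An observation like this comes from grouping journeys by the hour in which they start and then summarizing each group. Here is a rough sketch of that idea. The trips are invented sample values, not real Commission data, and `summarize_by_hour` is a hypothetical helper name.

```python
from collections import defaultdict
from datetime import datetime

# Invented sample trips: (pickup time, fare in dollars). Not real taxi data.
trips = [
    ("2024-03-01T03:15:00", 42.50),
    ("2024-03-01T03:40:00", 38.00),
    ("2024-03-01T08:05:00", 12.00),
    ("2024-03-01T08:20:00", 9.50),
    ("2024-03-01T08:45:00", 14.50),
    ("2024-03-01T18:10:00", 16.00),
]


def summarize_by_hour(trips):
    """Map each pickup hour to (number of trips, average fare)."""
    fares_by_hour = defaultdict(list)
    for pickup, fare in trips:
        hour = datetime.fromisoformat(pickup).hour
        fares_by_hour[hour].append(fare)
    return {hour: (len(fares), sum(fares) / len(fares))
            for hour, fares in sorted(fares_by_hour.items())}


summary = summarize_by_hour(trips)
for hour, (count, average_fare) in summary.items():
    print(f"{hour:02d}:00  trips={count}  average fare={average_fare:.2f}")
```

In this made-up sample, the 3 a.m. group has fewer trips but a higher average fare than the 8 a.m. group, which is the same shape of pattern described above. On real data volumes you would let the database do this grouping rather than a script.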

Since this kind of data grows very large very quickly, it calls for databases that are tailored to such volumes and to the way the inputs are continually transmitted over time.

There are many kinds of databases. A traditional type is the relational database, tailored to smaller-volume use cases such as IT monitoring. But some databases are specialized in time series data, and their advantages lie in scale, speed, and ease of use at such large volumes. These are time series databases.

Time series databases and the IIoT use case

You can imagine, then, that such time series databases need to do several things very well in order to cope with modern IoT requirements. Manufacturing and its associated IIoT is the largest source of time series data, and the volume it generates is growing rapidly every year.

It is no surprise then that those responsible for managing IIoT data find themselves looking for time series databases that can speedily and efficiently support storage, analysis, and querying for (very) high volume use cases.

So if you want a time series database that fits your IIoT use case, this means it should:

Quickly process very large amounts of data. Typical IoT use cases add up fast, and IIoT use cases even more so. Take airlines for example: a single flight of a Boeing 787 can generate over half a terabyte of data from its different sensors, and more recent plane designs produce even more.

Show flexibility and scalability. As IoT applications expand, businesses and institutions want to increase their data generation capacity and their data sources. This means it must be easy to smoothly and quickly increase the amount of data processed.

Provide good tracking, visualization, and prediction capabilities. Most of the time, the added value of all this Big Data intelligence lies in prediction based on pattern analysis, in statistical analysis of variance and range, or simply in a clearer overview of relevant values.

Be easy to set up and maintain. The purpose of IoT intelligence generation is to make businesses and institutions more effective and efficient, not to add extra overhead and management problems.

Be secure. Wherever the data is being stored, it needs to combine efficiency with security against data loss in case of hardware failure or other problems.
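To give a feel for the analysis mentioned in the list above (range, variance, and simple pattern-based prediction), here is a small sketch using only Python's standard library. The readings are invented, and using the latest moving average as a forecast is a deliberately naive assumption, not a recommended prediction method.

```python
from statistics import mean, pvariance

# Invented sensor readings, in the order they were recorded.
readings = [20.0, 21.0, 19.0, 22.0, 23.0, 21.0]


def moving_average(values, window):
    """Average each run of `window` consecutive values to smooth out noise."""
    return [mean(values[i - window + 1:i + 1])
            for i in range(window - 1, len(values))]


value_range = max(readings) - min(readings)   # spread between extremes
variance = pvariance(readings)                # population variance
smoothed = moving_average(readings, 3)

# Naive assumption: the next value will be close to the latest moving average.
forecast = smoothed[-1]

print(f"range={value_range}  variance={variance:.2f}  forecast={forecast}")
```

A time series database aimed at IIoT needs to answer questions of this kind over billions of rows rather than six, which is why the processing and scalability requirements above matter so much.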

Meeting all these requirements is not straightforward. It requires a time series database optimized for high-end IIoT needs, one that is flexible, scalable, and lightning fast. Such use cases go well beyond the capabilities of most databases, which are optimized for IT monitoring.

Moreover, if you want to be able to focus on your business intelligence - making the most of the data available to you - rather than on server maintenance and setup, you will want a database-as-a-service. This means you need a time series database fully based in the cloud. But even in the cloud, you still need a service that can live up to the IIoT use case standards mentioned above.

This is not a small ask. Fortunately, there is such a service: CrateDB Cloud.