Microsoft has a great article that provides an Internet of Things (IoT) reference architecture for the Microsoft Azure Platform as a Service (PaaS), including Azure IoT Central and OSS components such as SMACK (Spark, Mesos, Akka, Cassandra, Kafka) deployed on Azure VMs.

Crate.io is a Microsoft partner, and we have a lot of experience implementing IoT solutions for major international companies such as ALPLA, Gantner Instruments, and Roomonitor using our IoT Data Platform.

In this post, I will build on the Azure IoT reference architecture by explaining how CrateDB fits into the Azure ecosystem and can be used to supercharge your Azure tech stack.

IoT and time-series data

The Internet of Things has been gaining adoption momentum, and studies show that this momentum is only likely to increase.

The basic IoT workflow looks like this:

Let's break that down:

Collect: IoT devices collect raw data and transmit that data across the network.

Persist and process: Raw data is ingested by a suitable data store and processed (e.g., cleaned, formatted, and enriched) to transform it into something more useful.

Query, present, and learn: Processed data can then be presented in a way that makes it easy to understand and use. This can include reporting and visualization. Machine learning can also be applied at this stage to generate insights, data models, and predictions.

Act: Operators can use these outputs to inform better decision-making, planning, and execution.
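To make the four stages a little more concrete, here is a toy, in-memory sketch of the workflow in Python. It is not tied to any particular product: the RawReading type, the sensor names, and the threshold are hypothetical, and each function simply stands in for one stage (collect, persist and process, query and present, act).

```python
from dataclasses import dataclass
from statistics import mean
from typing import Iterable


# Collect: a hypothetical raw reading as a device might transmit it.
@dataclass
class RawReading:
    sensor_id: str
    ts: float     # seconds since the epoch
    value: str    # raw payload, e.g. "71.3", or "" on a failed read


def process(raw: Iterable[RawReading]):
    """Persist and process: drop unparseable readings, convert values to floats."""
    cleaned = []
    for r in raw:
        try:
            cleaned.append((r.sensor_id, r.ts, float(r.value)))
        except ValueError:
            continue  # skip malformed payloads instead of storing them
    return cleaned


def summarize(rows):
    """Query and present: average value per sensor."""
    by_sensor = {}
    for sensor_id, _ts, value in rows:
        by_sensor.setdefault(sensor_id, []).append(value)
    return {s: mean(vs) for s, vs in by_sensor.items()}


def act(summary, threshold):
    """Act: return the sensors whose average exceeds the threshold."""
    return sorted(s for s, avg in summary.items() if avg > threshold)


raw = [
    RawReading("press-1", 0.0, "70.0"),
    RawReading("press-1", 1.0, "72.0"),
    RawReading("press-2", 0.0, "95.5"),
    RawReading("press-2", 1.0, ""),  # failed read, dropped during processing
]
summary = summarize(process(raw))
print(summary)             # {'press-1': 71.0, 'press-2': 95.5}
print(act(summary, 90.0))  # ['press-2']
```

In a real deployment each stage would be a separate system (devices, a gateway, a database, dashboards), but the data flows through them in the same order.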

When adopting IoT, there are many important things to consider, for instance: networking, protocol support, security, privacy, cost, high availability, data retention policies, and so on.

Typical industrial use cases (e.g., a factory) generate a massive amount of complex time-series data (i.e., data that tracks changes over time), and this data often requires real-time processing, querying, visualization, and analysis.

Like IoT as a whole, databases that are specialized to handle this sort of data are also experiencing adoption growth.

The Azure IoT reference architecture

So, let's take a closer look at the Azure IoT reference architecture, using the diagram from the original article:

Let's break that down:

Phew! That's a lot. But hopefully, this is a useful summary. If you want more details, check out the original article.

Introducing CrateDB

CrateDB is a new type of SQL database that offers many of the performance and scaling benefits typically associated with NoSQL databases (including horizontal scaling and schemaless objects) without you having to ditch SQL.
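As a rough illustration of "schemaless objects without ditching SQL", here is a sketch of a time-series table definition using CrateDB's OBJECT(DYNAMIC) column type and a table partitioned by month through a generated column. The table and column names are hypothetical, and the sketch only builds JSON request bodies for CrateDB's HTTP /_sql endpoint rather than sending them.

```python
import json

# Hypothetical table for sensor readings. OBJECT(DYNAMIC) lets each device
# send its own set of keys; new sub-columns are added as they appear.
CREATE_TABLE = """
CREATE TABLE IF NOT EXISTS sensor_readings (
    ts TIMESTAMP WITH TIME ZONE NOT NULL,
    sensor_id TEXT NOT NULL,
    reading OBJECT(DYNAMIC),
    month TIMESTAMP WITH TIME ZONE GENERATED ALWAYS AS date_trunc('month', ts)
) PARTITIONED BY (month)
"""

INSERT = "INSERT INTO sensor_readings (ts, sensor_id, reading) VALUES (?, ?, ?)"


def sql_body(stmt, args=None):
    """Build a JSON request body for CrateDB's HTTP /_sql endpoint."""
    body = {"stmt": stmt.strip()}
    if args is not None:
        body["args"] = args
    return json.dumps(body)


# Two devices with differently shaped payloads write to the same table.
bodies = [
    sql_body(CREATE_TABLE),
    sql_body(INSERT, ["2024-01-01T00:00:00Z", "press-1", {"pressure_bar": 6.1}]),
    sql_body(INSERT, ["2024-01-01T00:00:00Z", "therm-7",
                      {"temp_c": 81.2, "humidity_pct": 40}]),
]
for b in bodies:
    print(b)
```

Partitioning by a generated month column is one common way to keep time-series tables manageable, since old partitions can be dropped or moved as a unit; the right granularity depends on your data volume.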

Specifically, CrateDB is an eventually consistent, distributed SQL database that uses a shared-nothing architecture. CrateDB clusters are masterless, and nodes coordinate seamlessly with each other. Query execution is automatically parallelized and distributed across the nodes in the cluster.
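Here is a simplified model of what "parallelized and distributed" means for an aggregation in a shared-nothing cluster. This is a conceptual scatter-gather sketch, not CrateDB's actual execution engine: the node names and rows are hypothetical, and threads stand in for nodes. Each "node" aggregates only its own shard, and a coordinator merges the partial results.

```python
from concurrent.futures import ThreadPoolExecutor

# Each hypothetical node holds its own shard of (sensor_id, value) rows;
# nothing is shared between nodes.
shards = {
    "node-a": [("s1", 10.0), ("s2", 4.0), ("s1", 14.0)],
    "node-b": [("s2", 6.0), ("s1", 12.0)],
    "node-c": [("s3", 1.0)],
}


def partial_aggregate(rows):
    """Runs locally on one node: per-group (sum, count) partial results."""
    partial = {}
    for key, value in rows:
        s, c = partial.get(key, (0.0, 0))
        partial[key] = (s + value, c + 1)
    return partial


def merge(partials):
    """Runs on the coordinating node: combine partials into final averages."""
    totals = {}
    for partial in partials:
        for key, (s, c) in partial.items():
            ts, tc = totals.get(key, (0.0, 0))
            totals[key] = (ts + s, tc + c)
    return {key: s / c for key, (s, c) in sorted(totals.items())}


# Scatter: every node aggregates its own shard in parallel. Gather: merge.
with ThreadPoolExecutor() as pool:
    partials = list(pool.map(partial_aggregate, shards.values()))
result = merge(partials)
print(result)  # {'s1': 12.0, 's2': 5.0, 's3': 1.0}
```

Note that the nodes return sums and counts rather than averages: averaging per-node averages would give the wrong answer whenever shards hold different numbers of rows.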

This architecture is well-suited to containerization, meaning a cluster running on Kubernetes can be scaled up or down as easily as running a kubectl scale command.

Scale up operations can take minutes instead of weeks.

A modest CrateDB cluster can ingest millions of records per second while also offering real-time queries (including joins and aggregations).
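High ingest rates usually depend on batching writes rather than sending one row per request. CrateDB's HTTP endpoint accepts a bulk_args field that runs one statement against many parameter sets in a single request. The sketch below only builds those request bodies; the table, columns, and batch size of 1,000 are hypothetical and should be tuned for your workload.

```python
import json
from itertools import islice

# Hypothetical table and columns; ts is epoch milliseconds.
INSERT = "INSERT INTO sensor_readings (ts, sensor_id, value) VALUES (?, ?, ?)"


def batched(rows, size):
    """Yield lists of at most `size` rows."""
    it = iter(rows)
    while batch := list(islice(it, size)):
        yield batch


def bulk_bodies(rows, batch_size=1000):
    """One /_sql request body per batch, using the bulk_args field."""
    for batch in batched(rows, batch_size):
        yield json.dumps({"stmt": INSERT, "bulk_args": batch})


rows = [[1700000000000 + i, f"s{i % 3}", float(i)] for i in range(2500)]
bodies = list(bulk_bodies(rows, batch_size=1000))
print(len(bodies))  # 3
```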

Some customers report that CrateDB is 20x faster than their previous database while running on 75% less hardware.

All of this means CrateDB excels at handling the velocity, volume, and diversity of huge industrial time-series workloads.

For example, industrial sensors often increase the frequency of their measurements when values exceed configured thresholds. With multiple sensors, cascading failures can result in huge data ingestion spikes. CrateDB is able to handle these spikes without sacrificing query performance, meaning that your reporting and analysis tools won't stop working just when you need them the most.
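A quick back-of-the-envelope calculation shows how large these spikes can get. The numbers below are purely illustrative (10,000 sensors sampling at 1 Hz normally and 100 Hz when a threshold is exceeded), not measurements from a real deployment.

```python
def ingest_rate(n_sensors, base_hz, alarm_hz, fraction_in_alarm):
    """Rows per second when some fraction of sensors sample at the alarm rate."""
    in_alarm = round(n_sensors * fraction_in_alarm)
    normal = n_sensors - in_alarm
    return normal * base_hz + in_alarm * alarm_hz


baseline = ingest_rate(10_000, base_hz=1, alarm_hz=100, fraction_in_alarm=0.0)
spike = ingest_rate(10_000, base_hz=1, alarm_hz=100, fraction_in_alarm=0.2)
print(baseline, spike, spike / baseline)  # 10000 208000 20.8
```

In this hypothetical scenario, a fault affecting one sensor in five raises the ingest rate roughly twentyfold, which is exactly when operators are most likely to be running queries and dashboards.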

Crate.io offers a hosted CrateDB product called CrateDB Cloud which runs on and is integrated with Azure.

CrateDB and Azure IoT

Now that you're familiar with CrateDB and the Azure IoT reference architecture, I can show you where CrateDB Cloud fits in the Azure tech stack to handle hot, warm, and cold path data.

Here's a modified version of the previous diagram:

Let's take a closer look.

CrateDB for hot storage

Hot storage is for data that sees continuous use and needs to be accessed immediately.

For example, if you're collecting sensor data from machines on a factory floor, you want to be able to spot faults as soon as possible and act on that data right away.

That means you need to pick a time-series database that can handle the amount of data your sensors are producing and that also offers real-time querying.
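As an example of the kind of hot-path query this implies, here is a sketch of a parameterized SQL query that finds sensors whose peak value over the last few minutes exceeds a threshold. The table and column names, the five-minute window, and the threshold are hypothetical; the cutoff is computed client-side as epoch milliseconds so the query itself stays simple.

```python
from datetime import datetime, timedelta, timezone

# Hypothetical hot-path query; table and column names are assumptions.
FAULT_QUERY = """
SELECT sensor_id, max(value) AS peak
FROM sensor_readings
WHERE ts >= ?
GROUP BY sensor_id
HAVING max(value) > ?
ORDER BY peak DESC
"""


def fault_query_args(window_minutes, threshold, now=None):
    """Arguments for FAULT_QUERY; the cutoff is epoch milliseconds (UTC)."""
    now = now or datetime.now(timezone.utc)
    cutoff = now - timedelta(minutes=window_minutes)
    return [int(cutoff.timestamp() * 1000), threshold]


fixed_now = datetime(2024, 1, 1, 12, 0, tzinfo=timezone.utc)
print(fault_query_args(5, 90.0, now=fixed_now))  # [1704110100000, 90.0]
```

A dashboard would run a query like this every few seconds, so it has to stay fast even while new readings are streaming in.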

Hot storage is the most expensive sort of storage because it typically involves the most performant hardware, advanced networking setups, redundant copies of the data, multiple geographic availability zones, and so on.

The Azure reference architecture uses Azure Stream Analytics and Azure Functions to handle hot path data ingestion. But those don't offer querying or visualization functionality. If you want to do that, you can add Azure Time Series Insights (TSI) to your tech stack.

Here's an overview of TSI:

Built for storing, visualizing, and querying large amounts of time-series data

Fully integrated with cloud gateways like Azure IoT Hub and Azure Event Hubs

Out-of-the-box visualization (the TSI explorer)

Query service for embedding time series data into custom applications

Azure TSI is good at what it does. But for the demanding workloads of industrial time-series data, CrateDB has a more competitive pricing structure than Azure TSI. (We've seen a 10x better price-performance ratio.) Contact us for more details.

CrateDB for warm storage

Warm storage is for data that is used frequently but where small delays in access can be tolerated. Tolerating those delays means lower infrastructure costs.

The Azure reference architecture uses CosmosDB for warm storage.

CosmosDB is a globally distributed, multi-model database with Service Level Agreements (SLAs) and a focus on throughput, latency, availability, and consistency.

Because CrateDB can serve both hot and warm time-series data, you have only one product to integrate with (as opposed to using both CosmosDB for warm data and Azure TSI for hot data).

Additionally, CrateDB may be a better alternative here because it is more cost efficient for extreme time-series use cases. CosmosDB can become expensive to run with large, non-transactional workloads.

Azure Blobs for cold storage

Cold storage is for data that is used infrequently and where large delays in access can be tolerated.

Cold storage typically holds historical data, which is more often processed in batches than queried in real time.

This is the least expensive sort of data storage.

Azure Blob Storage can archive data indefinitely at low cost, and, per the Azure reference architecture, this is what we recommend for cold storage.
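Putting the three tiers together, a data lifecycle policy often boils down to routing records by age. Here is a minimal sketch of such a rule; the 7-day and 90-day thresholds are hypothetical placeholders, and the right values depend on your query patterns and budget.

```python
from datetime import timedelta

# Hypothetical retention thresholds; tune these for your own workload.
HOT_FOR = timedelta(days=7)
WARM_FOR = timedelta(days=90)


def storage_tier(age: timedelta) -> str:
    """Pick a storage tier for a record of the given age."""
    if age < HOT_FOR:
        return "hot"   # e.g. CrateDB, queried in real time
    if age < WARM_FOR:
        return "warm"  # e.g. CrateDB, small delays tolerated
    return "cold"      # e.g. archived to Azure Blob Storage


for days in (1, 30, 365):
    print(days, storage_tier(timedelta(days=days)))
# 1 hot
# 30 warm
# 365 cold
```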

Additional benefits

Now that you know how CrateDB fits into the Azure IoT reference architecture, let's take a look at some of the additional benefits of pairing CrateDB with Microsoft Azure:

CrateDB connects directly to Azure IoT Hub and Azure Event Hubs.

CrateDB can run on Azure IoT Edge to extend time-series analysis to edge devices.

CrateDB integrates with Microsoft Power BI and other reporting, monitoring, and visualization tools (like Grafana) via JDBC, ODBC, the PostgreSQL wire protocol, and the CrateDB REST API.

Microsoft Azure provides comprehensive security features in the cloud and at the edge.

CrateDB's architecture makes it well suited for high-availability and disaster-recovery setups.

CrateDB and the Azure Machine Learning services make a great combo for applications such as predictive maintenance.
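To make the integration options a bit more concrete, here is a minimal sketch of querying CrateDB over its HTTP endpoint with only Python's standard library. It assumes a node reachable at http://localhost:4200 (the default HTTP port) and posts JSON to the /_sql endpoint, which returns column names and rows; the helper names are hypothetical. Reporting tools such as Power BI or Grafana would typically connect through JDBC, ODBC, or the PostgreSQL wire protocol instead.

```python
import json
import urllib.request


def sql_request(stmt, args=None, host="http://localhost:4200"):
    """Build a POST request for CrateDB's HTTP /_sql endpoint."""
    body = {"stmt": stmt}
    if args is not None:
        body["args"] = args
    return urllib.request.Request(
        f"{host}/_sql",
        data=json.dumps(body).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def rows_as_dicts(response):
    """Turn the endpoint's {"cols": [...], "rows": [[...]]} shape into dicts."""
    return [dict(zip(response["cols"], row)) for row in response["rows"]]


def run(stmt, args=None, host="http://localhost:4200"):
    """Send the request to a running CrateDB node and return rows as dicts."""
    with urllib.request.urlopen(sql_request(stmt, args, host)) as resp:
        return rows_as_dicts(json.load(resp))


# Usage against a running node (not executed here):
# print(run("SELECT name FROM sys.cluster"))
```

The HTTP route is handy for scripts and lightweight services because it needs no driver; for long-lived applications, a PostgreSQL-compatible driver or the JDBC driver is usually the more convenient choice.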

Wrap up

In this post, I introduced you to CrateDB and the Microsoft Azure IoT reference architecture. I then showed how CrateDB fits into the architecture and can help you improve performance, increase capacity, reduce complexity, add features, and reduce costs.

Here are some real-life single-database stats we've seen:

Data flowing in from hundreds of factories and thousands of production lines

Thousands of different sensor data structures

Millions of inserts per day

Hundreds of terabytes of queryable data

Real-time dashboards executing millions of queries per day

If you want to know more, please get in touch!