Hi all! VictoriaMetrics founders here:

We are happy to shed some light on VictoriaMetrics.

A bit of history

We started using Prometheus and Grafana two years ago. It was a breath of fresh air compared to Zabbix. Now developers could scatter arbitrary metrics around their code, build custom dashboards in Grafana and monitor their apps without dedicated sysadmins / DevOps engineers.

The number of unique metrics scraped by our Prometheus instance quickly grew from a few hundred to more than 300K within half a year. We switched to Prometheus 2.0 during that growth, since pre-2.0 Prometheus became too slow for our metric volumes. But the new Prometheus had a few issues:

It was not so fast on query ranges exceeding a few days. We used such ranges for long-term trend and capacity planning dashboards.

It started eating a lot of storage space after we gradually increased retention from the default 15 days to a year.

It was unclear how to prevent Prometheus data loss in the event of a storage crash. We ended up with two distinct Prometheus instances scraping the same set of targets (aka an HA pair). This doubled our monitoring costs.

We started exploring possible solutions and discovered that Prometheus supports remote storage. But all the existing solutions were unsatisfying due to various reasons:

Complex setup and fragile operation.

Reported crashes and data loss.

Slowness.

Zero or sub-optimal scalability.

Around the same time we successfully used ClickHouse for storing and analyzing huge event streams — up to 300 billion events per day. We were amazed by the operational simplicity, the query speed and the compression level of its MergeTree table engine.

Our experience with ClickHouse was so great that we open sourced the following projects for it:

clickhouse-grafana — Grafana datasource for ClickHouse.

chproxy — load balancer and caching proxy for ClickHouse.

chclient — fast Go client for ClickHouse.

We tried using ClickHouse as a remote storage for Prometheus. Initial results were great — ClickHouse was able to scan billions of data points per second on a single server. Unfortunately, we couldn't find a good way to build an efficient index for Prometheus labels on top of it.

Then a crazy idea emerged — let's create our own TSDB with the following requirements:

An efficient index for Prometheus labels (aka Metrics 2.0 tags), which easily stores and queries billions of distinct labels.

Fast queries over big date ranges, large numbers of unique metrics and huge numbers of data points.

Good storage compression, so more data fits in limited disk space.

Easy and fast online backups similar to FREEZE PARTITION in ClickHouse.

The prototype of this TSDB was promising, so I (valyala) left my job at VertaMedia and started working on the TSDB full time. Later the TSDB got a nice name — VictoriaMetrics.

Technical details

VictoriaMetrics is written in Go. Go was chosen for the following reasons:

Many existing TSDB solutions are written in Go — Prometheus, InfluxDB, Thanos, M3, Cortex, etc. This hints that Go is a good fit for writing TSDBs.

I have good experience in Go. See my repos on GitHub.

I’m the author of fasthttp, so I know how to write efficient apps in Go.

VictoriaMetrics' storage is built from scratch using ideas from ClickHouse's MergeTree table engine:

Store time series names, timestamps and values separately (aka columnar storage). This speeds up queries, since only the required columns are scanned.

Store each column in a data structure similar to a log-structured merge-tree (LSM). Compared to B-tree-like data structures, this reduces random I/O when adding or scanning sorted values, which speeds up the storage on HDDs. LSM files are immutable, which simplifies making fast snapshots and backups. LSM has a drawback compared to B-trees — the stored data is rewritten multiple times when smaller files are merged into bigger ones. This wastes disk bandwidth, but ClickHouse practice shows it is a good tradeoff.

Process data in chunks that fit the CPU cache. This maximizes CPU performance, since the CPU doesn't wait for data from slow RAM. See Latency Numbers Every Programmer Should Know for details. A sketch of these ideas in code follows this list.
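
To make this more concrete, here is a minimal Go sketch of the columnar layout and cache-friendly chunked scanning. All names here (block, chunkSize, maxValue) are illustrative assumptions, not actual VictoriaMetrics types:

package main

import "fmt"

// block keeps timestamps and values in separate slices (columnar layout),
// so a query that needs only values never touches the timestamps column.
type block struct {
    timestamps []int64   // sorted Unix timestamps, milliseconds
    values     []float64 // corresponding metric values
}

// chunkSize is picked so one chunk of values fits into the CPU cache.
const chunkSize = 8 * 1024

// maxValue scans the values column in cache-sized chunks.
func maxValue(b *block) float64 {
    max := b.values[0]
    for i := 0; i < len(b.values); i += chunkSize {
        end := i + chunkSize
        if end > len(b.values) {
            end = len(b.values)
        }
        for _, v := range b.values[i:end] {
            if v > max {
                max = v
            }
        }
    }
    return max
}

func main() {
    b := &block{
        timestamps: []int64{1000, 2000, 3000},
        values:     []float64{1.5, 3.5, 2.0},
    }
    fmt.Println(maxValue(b)) // 3.5
}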

Initially the index for Prometheus labels was built on top of a LevelDB port in Go. Later I tried substituting it with a more efficient alternative — RocksDB. That attempt wasn't successful because of the high cgo overhead, which must be paid for each scanned label. Eventually LevelDB was replaced by a custom data structure — mergeset. This data structure is specially optimized for the Prometheus labels index.

mergeset has the following differences compared to LevelDB:

It operates only on keys. It isn’t aware of values.

It has lower write amplification.

It has faster seeks for many ordered keys.

It compresses keys better, so they require less storage space.

It provides instant data snapshots and easy backups.

It uses ideas from ClickHouse’s MergeTree table engine.

We plan to open source mergeset in the near future, so others may benefit from it.
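
Since mergeset isn't published yet, the following Go sketch only illustrates the general shape of such a key-only structure — immutable sorted runs that are merged in the background, similar to MergeTree parts. The names and details are my own assumptions, not the actual mergeset code:

package main

import (
    "fmt"
    "sort"
)

// mergeSet stores only keys (no values) as a list of immutable sorted runs.
type mergeSet struct {
    runs [][]string
}

// add sorts a batch of keys and appends it as a new immutable run.
// Immutability makes snapshots trivial: copy the run list, not the data.
func (ms *mergeSet) add(keys []string) {
    run := append([]string(nil), keys...)
    sort.Strings(run)
    ms.runs = append(ms.runs, run)
}

// has binary-searches every sorted run for the key.
func (ms *mergeSet) has(key string) bool {
    for _, run := range ms.runs {
        i := sort.SearchStrings(run, key)
        if i < len(run) && run[i] == key {
            return true
        }
    }
    return false
}

// mergeAll merges all runs into one sorted, deduplicated run, analogous to
// background merges in ClickHouse's MergeTree. Merging rewrites data (write
// amplification), but keeps lookups fast because fewer runs must be searched.
func (ms *mergeSet) mergeAll() {
    var all []string
    for _, run := range ms.runs {
        all = append(all, run...)
    }
    sort.Strings(all)
    dedup := all[:0]
    for i, k := range all {
        if i == 0 || k != all[i-1] {
            dedup = append(dedup, k)
        }
    }
    ms.runs = [][]string{dedup}
}

func main() {
    var ms mergeSet
    ms.add([]string{`job="node",instance="host1"`, `job="node",instance="host2"`})
    ms.add([]string{`job="node",instance="host1"`})
    ms.mergeAll()
    fmt.Println(ms.has(`job="node",instance="host2"`)) // true
}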

Initially VictoriaMetrics was a single-server solution. Later it was transformed into a clustered solution, with the single service split into three services:

vmstorage - stores metric values received from vminsert, returns raw metric values for queries from vmselect.

vminsert - accepts metric values via the Prometheus remote_write API and sends them to vmstorage (a sharding sketch follows this list).

vmselect - implements the Prometheus querying API and fetches raw data from vmstorage.
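
The post doesn't spell out how vminsert routes data, so the Go sketch below shows only the general idea: shard time series across storage nodes by hash, so that all data points of a series land on the same node. The node addresses, port and function names are hypothetical:

package main

import (
    "fmt"
    "hash/fnv"
)

// storageNodes is a hypothetical list of vmstorage addresses.
var storageNodes = []string{
    "vmstorage-0:8400",
    "vmstorage-1:8400",
    "vmstorage-2:8400",
}

// nodeFor picks the vmstorage node for a series key (metric name + labels).
// Hashing the whole key keeps every data point of a series on one node.
func nodeFor(seriesKey string) string {
    h := fnv.New32a()
    h.Write([]byte(seriesKey))
    return storageNodes[h.Sum32()%uint32(len(storageNodes))]
}

func main() {
    fmt.Println(nodeFor(`node_cpu_seconds_total{instance="host1"}`))
}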

The splitting gives the following benefits:

Each service may scale independently.

Each service may run on hardware optimized for its needs.

Heavy inserts don’t interfere with heavy selects.

Bugs in vminsert don't break vmselect and vice versa.

Better vmstorage durability, since complex querying logic is offloaded to vmselect.

Now VictoriaMetrics runs in Google Cloud. We use an Infrastructure as Code approach for resource management and provisioning via Deployment Manager.

Query engine

Initially vmselect provided the Prometheus remote read API. But this was suboptimal, since the API required transferring all the raw data points to Prometheus on each query. For instance, if Prometheus builds a response over 1K metrics with 10K data points each, then vmselect has to send 1K * 10K = 10M data points to Prometheus for that query. This wastes egress traffic and money. So the remote read API was later replaced by a query engine fully compatible with PromQL.

The query engine supports additional features aimed at simplifying complex queries. Below are a few examples:

WITH templates resembling common table expressions:

WITH (
    commonFilters = {job=~"$job", instance=~"$instance"}
)
node_filesystem_size_bytes{commonFilters} / node_filesystem_avail_bytes{commonFilters}

Read more about WITH templates and play with them on the WITH templates playground.

Many useful functions. For instance, the union function for combining query results:

union(
    node_filesystem_size_bytes,
    node_filesystem_avail_bytes,
)

The full list of additional functions is available here.

Performance facts

Initial tests show that VictoriaMetrics uses 10x less storage space compared to Prometheus 2.0 — 0.4 bytes per data point vs 4 bytes per data point in our case. A data point is a (timestamp, metric_value) tuple. An encoding sketch showing how sub-byte-per-point storage is achievable follows this list.

A single vmstorage service accepts up to 4 million data points per second on a server with 8 CPU cores.

A single vmselect service scans up to 500 million data points per second on a server with 8 CPU cores.

VictoriaMetrics uses 70x less storage space compared to TimescaleDB on test data from the Time Series Benchmark Suite — 250MB vs 18GB. The test data consists of 1B data points — see the TSBS description on GitHub.
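
The post doesn't describe the actual codec, but sub-byte per point is plausible with a columnar layout: regularly scraped series have near-constant timestamp deltas, so second-order deltas are mostly zeros that varint-encode into single bytes and shrink further under generic compression. Here is a generic delta-of-delta sketch in Go — not VictoriaMetrics' actual encoding:

package main

import (
    "encoding/binary"
    "fmt"
)

// encodeTimestamps applies delta-of-delta encoding to sorted timestamps.
// With a fixed scrape interval the second-order deltas are all zeros,
// each occupying a single varint byte before generic compression.
func encodeTimestamps(ts []int64) []byte {
    buf := make([]byte, 0, len(ts))
    var prev, prevDelta int64
    for i, t := range ts {
        v := t // the first timestamp is stored as-is
        if i > 0 {
            delta := t - prev
            v = delta - prevDelta
            prevDelta = delta
        }
        prev = t
        buf = binary.AppendVarint(buf, v)
    }
    return buf
}

func main() {
    ts := []int64{0, 15000, 30000, 45000, 60000} // 15s scrape interval, ms
    fmt.Printf("%d timestamps -> %d bytes\n", len(ts), len(encodeTimestamps(ts)))
}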

There is still room for performance improvements. All VictoriaMetrics services are equipped with a pprof handler, so we are ready to tune their performance on production workloads.
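
For reference, exposing such a handler in Go takes a single blank import of net/http/pprof; the listen address below is arbitrary, and VictoriaMetrics' actual wiring may differ:

package main

import (
    "log"
    "net/http"
    _ "net/http/pprof" // registers /debug/pprof/* on http.DefaultServeMux
)

func main() {
    // Profiles become available at http://localhost:6060/debug/pprof/
    log.Fatal(http.ListenAndServe("localhost:6060", nil))
}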

VictoriaMetrics features

Supports full PromQL plus PromQL extensions such as WITH templates. PromQL extensions may be tried on the Grafana playground.

Easy setup — just copy-and-paste the provided remote_write URL into the Prometheus config.

Reduced operational costs. Prometheus may be converted into a stateless service after enabling remote write to VictoriaMetrics.

A wide range of retention periods is available — from 1 month to 5 years.

Insert performance scales to millions of metric values per second.

Select performance scales to billions of metric values per second.

Storage scales to trillions of metric values and millions of unique metrics (aka high cardinality).

Provides a global querying view across an arbitrary number of distinct Prometheus instances if they use the same remote_write URL.

Who may benefit from VictoriaMetrics?

Anybody who uses Prometheus. Just set up VictoriaMetrics as a remote storage and stop worrying about local storage operational issues such as backups, replication, capacity planning and other maintenance burdens. A sample config follows this list.

Users deploying Prometheus in Kubernetes. Currently, such users have to carefully manage persistent volumes for Prometheus local storage. Usually they set up Prometheus as a stateful app, which may limit Kubernetes' scheduling decisions. Just use VictoriaMetrics as a remote storage and run Prometheus as a stateless app.

Users with many distinct Prometheus instances located in distinct networks/datacenters. VictoriaMetrics provides global querying view across all the Prometheus instances.
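
For illustration, the Prometheus side of such a setup is a single remote_write section in prometheus.yml. The URL below is a placeholder — use the one provided for your VictoriaMetrics instance:

remote_write:
  - url: http://<victoriametrics-host>:8428/api/v1/write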

Future features

We are planning to implement the following features in the future:

Automatic downsampling of old data.

Last values for the given label filters.

Time-windowed counters.

Conclusion

We are sure VictoriaMetrics will become the best remote storage for Prometheus.

Continue exploring it. Read the FAQ. Register at the VictoriaMetrics playground and use it as a test remote storage for your Prometheus instances. It is absolutely safe, since Prometheus keeps writing data into its local storage alongside remote storage, so your local data isn't lost when enabling remote storage. See Quick Start for more details.

Edit dashboards and graphs on Grafana playground. This playground uses VictoriaMetrics datasource pointing to internal metrics of VictoriaMetrics playground, so all the features from PromQL extensions are available there.

Production VictoriaMetrics will be available soon. Stay tuned and spread the word about it!

Update: Docker images with single-server VictoriaMetrics are available here. If you don't like Docker, just use the corresponding static binaries.

Update2: Read our new post — High-cardinality TSDB benchmarks: VictoriaMetrics vs TimescaleDB vs InfluxDB.

Update3: VictoriaMetrics is open source now!

