Elasticsearch is one of the most widely used distributed systems out there: ranked #11 among all database systems and #1 among search engines. At Appbase.io, we offer a hosted streaming Elasticsearch API for building great search, geolocation and feed experiences.

We first wrote on the topic of scaling writes in the real world in 2015, benchmarking over 100,000 writes per second on an 18-node cluster composed of C4.2xlarge EC2 machines.

In the two years since, Elasticsearch has had two major version releases — 2.x and 5.x, with v6.0.0 available today as an alpha release. Elasticsearch now has a comprehensive macro benchmarking suite, the Rally project, for measuring different performance metrics. And Docker containers are emerging as the new standard for distributing software, even stateful software like database systems.

For the first time, we are announcing 📢 a preview version of appbase.io that runs as a container within a server environment of your choosing.

In building this container version, we ended up designing an automated test suite on top of Rally for benchmarking indexing rates, with two goals:

a.) measuring our streaming latency numbers, and

b.) understanding how an appbase.io streaming proxy affects the Elasticsearch cluster throughput.

Benchmarking: 3, 2, 1 .. GO

When we did a similar test in 2015, our calculation indicated that it would require a 150-node cluster to hit a sustained 1 million writes per second. After all, it took Cassandra, a write-optimized database system, some 300 nodes to do the same in 2014.

Boy, we were in for a big surprise!

Image: Median indexing rate for ES v5.4 against the geo points dataset of 180MM records.

Scaling to 1 million writes per second required a modest setup of 15 m4.2xlarge instances running a 15-node Elasticsearch cluster in Docker containers.

We observed near-perfect linear scalability of writes as we scaled the number of cluster nodes. In the rest of the post, we will cover the test setup, the dataset used, and the cluster settings that we used to hit these numbers. We also ran a series of other benchmarks for comparison, and we will go over those as well.

The Test

We use the current production version of Elasticsearch, v5.4, for running the actual indexing cluster.

We use Rally as our benchmarking framework; Elasticsearch itself uses Rally for tracking a variety of performance metrics, and benchmarks.elastic.co is a good place to start exploring them.

Image: Test tracks on esrally, captured from benchmarks.elastic.co.

Rally is set up on a benchmark master node of C4.8xlarge type, which uses 96 client threads to generate requests to the Elasticsearch cluster.
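A run along these lines can be sketched with esrally's standard flags. This is an illustrative invocation, not our literal tooling: the target host addresses are placeholders, and we assume the geopoint track's bulk_indexing_clients parameter for the client thread count.

```shell
# Hypothetical sketch of the benchmark invocation from the benchmark master.
# echo prints the command instead of running it, since it needs a live cluster.
CLIENTS=96
TARGET_HOSTS="10.0.0.1:9200,10.0.0.2:9200"

# --pipeline=benchmark-only points Rally at an existing cluster instead of
# provisioning one itself.
echo esrally --track=geopoint \
  --target-hosts="$TARGET_HOSTS" \
  --pipeline=benchmark-only \
  --track-params="bulk_indexing_clients:$CLIENTS"
```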

The track we picked for benchmarking the setup is Geopoint, which consists of 60.8 million points of interest (POIs) from PlanetOSM.

We triplicated this dataset to get to 180 million points, so that the write throughput stabilizes when running on a 15-node setup.

The average data size is 40 bytes, and the mapping for the data looks like:

```json
{
  "type": {
    "dynamic": "strict",
    "properties": {
      "location": {
        "type": "geo_point"
      }
    }
  }
}
```

and an individual data record to be indexed looks like:

```json
{"location": [-0.1485188, 51.5250666]}
```
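Records like this are sent to Elasticsearch via the _bulk API, which alternates newline-delimited action and source lines. A minimal Python sketch of building such a payload (the index and type names here are illustrative, not the track's exact configuration):

```python
import json

def build_bulk_body(locations, index="geopoint", doc_type="type"):
    """Build an Elasticsearch _bulk request body from [lon, lat] pairs.

    Each document becomes two newline-delimited JSON lines: an action line
    naming the target index, then the document source. The body must end
    with a trailing newline, per the _bulk API contract.
    """
    lines = []
    for lon_lat in locations:
        lines.append(json.dumps({"index": {"_index": index, "_type": doc_type}}))
        lines.append(json.dumps({"location": lon_lat}))
    return "\n".join(lines) + "\n"

body = build_bulk_body([[-0.1485188, 51.5250666], [2.3522, 48.8566]])
```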

We have created an automated suite on top of Rally that:

- uses docker-machine to spin up EC2 instances and run Elasticsearch on them,
- uses docker-machine to set up an EC2 instance for the benchmark master and run Rally against the cluster,
- has a set of switches to customize configurations, repeat the benchmarks, parallelize the tests, and print metadata reports on the dataset.
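The provisioning step can be sketched with docker-machine's EC2 driver. The node name is a placeholder and this is not our suite's literal command, but the flags shown are the driver's standard ones:

```shell
# Hypothetical sketch: provision one cluster node on EC2 with docker-machine.
# Assumes AWS credentials in the environment; echo prints the command
# instead of running it.
INSTANCE_TYPE="m4.2xlarge"
NODE_NAME="es-node-1"

echo docker-machine create \
  --driver amazonec2 \
  --amazonec2-instance-type "$INSTANCE_TYPE" \
  "$NODE_NAME"
```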

As far as tuning Elasticsearch itself goes, we used the default heap size (2 GB in v5.x), set the number of primary shards equal to the number of nodes in the cluster, and used a replication factor of 1, i.e. one replica copy for each primary shard.
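For the 15-node run, those settings correspond to an index created with a body along these lines (a sketch of the settings fragment, sent when creating the index; the index name is whatever the track defines):

```json
{
  "settings": {
    "number_of_shards": 15,
    "number_of_replicas": 1
  }
}
```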

Comparisons

We also ran some comparison benchmarks to find out how we would fare on indexing a dataset of a different size and a more complex structure, or using a different major version of Elasticsearch.

Elasticsearch v5.4 vs. v2.4

We use the same Geopoint dataset for measuring throughput difference between two major versions of Elasticsearch.

Image: Median indexing rate difference between ES v5.4 and ES v2.4

As the number of nodes in the cluster increases, Elasticsearch v5.4 outperforms v2.4 by a 20% margin. But overall, ES v2 does a solid job. If you are already using ES v2 in production, you might just want to wait a bit longer for the ES v6.x general availability release before making the switch ;).

A different dataset: NYC Taxis

Next, we take a very different dataset that is highly structured in comparison: 165M taxi rides performed by Yellow and Green Cabs in NYC in December 2015. The average size of a document in this dataset is 489 bytes (~12x the size of Geopoint).

In the graph below, we compare the NYC Taxis (ES v5.4) benchmarking against the Geopoints (ES v5.4, referencing the original run).

Image: Comparing indexing rates b/w Geopoints dataset and NYC taxis.

When indexing this dataset, we observe 2–3x lower throughput than with Geopoints. However, the linear scalability still holds. It seems that with 36 nodes of m4.2xlarge instances, we would have hit 1M writes per second.
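The 36-node figure is back-of-envelope arithmetic from the linear-scaling assumption. The 2.4x slowdown factor below is an illustrative midpoint of the observed 2–3x range, chosen so the numbers work out cleanly:

```python
# Back-of-envelope extrapolation, assuming linear scaling and a ~2.4x
# per-node slowdown for the larger NYC Taxis documents vs. Geopoint.
geopoint_total = 1_000_000                     # writes/sec observed on 15 nodes
nodes = 15
per_node_geopoint = geopoint_total / nodes     # ~66,667 writes/sec per node

slowdown = 2.4                                 # assumed midpoint of 2-3x
per_node_taxis = per_node_geopoint / slowdown  # ~27,778 writes/sec per node

# Nodes needed for 1M writes/sec on the taxis dataset: 15 * 2.4 = 36
nodes_for_1m = geopoint_total / per_node_taxis
print(round(nodes_for_1m))
```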

We also compared against the Geonames dataset with similar results.