These are the most important Elasticsearch techniques I’ve found for improving write throughput in day-to-day operation.

Use the Bulk API

The bulk API lets you perform multiple index operations in a single API call. The number of documents to include in each bulk request depends on the size and complexity of your JSON documents; 100 is a good number to start at and tune from there.
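As a minimal sketch, the bulk endpoint takes an NDJSON body of alternating action and source lines (the function name, index name, and documents here are illustrative):

```python
import json

def build_bulk_body(index, doc_type, docs):
    """Build an NDJSON body for Elasticsearch's _bulk endpoint.

    Each document becomes two lines: an action line naming the target
    index, then the document source itself. The body must end with a
    trailing newline or Elasticsearch rejects it.
    """
    lines = []
    for doc in docs:
        lines.append(json.dumps({"index": {"_index": index, "_type": doc_type}}))
        lines.append(json.dumps(doc))
    return "\n".join(lines) + "\n"

# 100 small documents in one request as a starting point to tune from.
docs = [{"message": "event %d" % i} for i in range(100)]
body = build_bulk_body("logs", "event", docs)
# POST body to http://localhost:9200/_bulk
# with header Content-Type: application/x-ndjson
```

One request now carries 100 index operations instead of 100 round trips.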

Set index.translog.durability to Async

When a document is indexed in Elasticsearch, it’s first written to a write-ahead log file called the translog. When the translog is flushed (by default after every index, delete, update, or bulk request, or when the translog reaches a certain size, or after a time interval), Elasticsearch persists the data to disk during a Lucene commit, an expensive operation. By setting durability to async you’re telling Elasticsearch not to commit after every request, which gives a tremendous boost to write performance.

The tradeoff is that if there’s a hardware failure you may lose data that hasn’t been committed yet (up to 5s or 512mb worth of data by default; you can decrease index.translog.sync_interval or index.translog.flush_threshold_size respectively).
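A sketch of the settings body you’d send via a PUT to the index’s _settings endpoint (the index name and the decision to restate the defaults are illustrative):

```python
import json

# Body for PUT /my_index/_settings (index name is illustrative).
# With durability "async", the translog is synced in the background on
# sync_interval instead of after every request.
translog_settings = {
    "index": {
        "translog": {
            "durability": "async",
            "sync_interval": "5s",           # default; bounds how much you can lose
            "flush_threshold_size": "512mb"  # default; size-based flush trigger
        }
    }
}
print(json.dumps(translog_settings))
```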

P.S. I found this late one night as I worked to speed-up our write throughput because our Kafka consumer lagged after an unexpected load increase. After I changed the index.translog.durability, I couldn’t help but stay up and watch the lag fall.

Increase index.refresh_interval

Changes made to an index aren’t searchable until Elasticsearch performs a refresh, another expensive operation. Refreshes happen on a 1s interval by default, and even increasing that to 5s can make a huge difference. Beyond 30s you’ll probably start to see diminishing returns.
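For example, these are the settings bodies for a PUT to the index’s _settings endpoint (index name illustrative); "-1" disables refresh entirely, which is an option during a one-off backfill as long as you restore the interval afterwards:

```python
# Raise the interval for steady-state heavy writes...
slower_refresh = {"index": {"refresh_interval": "5s"}}

# ...or disable refresh entirely during a one-off bulk backfill,
# then restore the default ("1s") once the load finishes.
no_refresh = {"index": {"refresh_interval": "-1"}}
```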

Use Nginx as a Proxy for Persistent Connections and Load Balancing

With Nginx in front of your Elasticsearch client nodes, you can have Nginx keep persistent connections rather than hammering Elasticsearch and having it deal with the stress of opening and closing a connection for each request. And with Nginx you can load balance across multiple client nodes.
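A minimal sketch of such an Nginx config, assuming two client nodes at es-client-1 and es-client-2 (hostnames and port 8080 are illustrative):

```nginx
upstream elasticsearch {
    server es-client-1:9200;
    server es-client-2:9200;
    keepalive 32;  # pool of idle upstream connections to reuse
}

server {
    listen 8080;
    location / {
        proxy_pass http://elasticsearch;
        proxy_http_version 1.1;          # keepalive requires HTTP/1.1
        proxy_set_header Connection "";  # clear "close" so connections persist
    }
}
```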

Split Data, Master, and Client Nodes

Elasticsearch nodes can take on any combination of the following 3 roles:

Master: Controls the cluster (which nodes should this shard be stored on, etc.)

Data: Holds data and performs data related operations like CRUD, search and aggregations.

Client: Forwards requests to master and data nodes, handles search reduce phase and distributes bulk indexing.

The data and client nodes will eat up resources under pressure, and it’s important that the master always gets what little resources it needs to remain stable. So consider dedicating each node to a single role. Having dedicated client nodes also makes it easy to add and remove master and data nodes.
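As a sketch, the roles can be pinned per node in elasticsearch.yml (these boolean flags match older, pre-5.x-era Elasticsearch; newer releases configure this differently):

```yaml
# Dedicated master node:
node.master: true
node.data: false

# Dedicated data node:
node.master: false
node.data: true

# Client node: neither master nor data, just coordinates requests:
node.master: false
node.data: false
```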

Use Multiple Disks

You can set multiple paths in each data node’s path.data setting to increase total storage space and I/O performance. It’s a bit safer than RAID 0 as well, since with RAID 0 if any disk fails you’ll lose all shards. With multiple path.data entries, Elasticsearch stores all files for a given shard on one disk, so if a disk fails only the shards on that disk will be lost. Make sure you have at least 1 replica for your indices too, of course.
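For example, in elasticsearch.yml (mount points are illustrative), one entry per physical disk:

```yaml
path.data: ["/mnt/disk1/es", "/mnt/disk2/es", "/mnt/disk3/es"]
```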

Compression

Elasticsearch’s HTTP API doesn’t accept compressed requests by default. Compression lowers write performance slightly, but you’ll probably want it anyway to save bandwidth and prevent network saturation.
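A sketch of compressing a request body on the client side, assuming the cluster has been configured to accept compressed requests (the body here is illustrative):

```python
import gzip
import json

# Gzip the request body before sending it; the Content-Encoding header
# tells Elasticsearch the payload is compressed.
body = json.dumps({"message": "an event worth indexing"}).encode("utf-8")
compressed = gzip.compress(body)
headers = {
    "Content-Type": "application/json",
    "Content-Encoding": "gzip",
}
# Send `compressed` with `headers` using your HTTP client of choice.
```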

Monitoring With Plugins

HQ is the monitoring plugin I find most useful. Its node diagnostics page shows a table of metrics for each node and also uses thresholds to judge and indicate how things are looking.

Thanks for reading, and I hope these help you. Let me know if you’ve heard of or tried anything else, or have any questions.