Elasticsearch is an open source search and analytics engine based on Apache Lucene that allows users to store, search, and analyze data in near real time. Pronto, the platform that hosts Elasticsearch clusters at eBay, makes it easy for eBay internal customers to deploy, operate, and scale Elasticsearch for full-text search, real-time analysis, and log/event monitoring. Today there are 60+ Elasticsearch clusters and 2,000+ nodes managed by Pronto. Daily ingestion reaches 18 billion documents, and daily search requests reach 3.5 billion. The platform offers a full spectrum of value, from provisioning, remediation, and security to monitoring, alerting, and diagnostics.

While Elasticsearch is designed for fast queries, the performance depends largely on the scenarios that apply to your application, the volume of data you are indexing, and the rate at which applications and users query your data. This document summarizes the challenges as well as the process and tools that the Pronto team builds to address the challenges in a strategic way. It also shows certain results of benchmarking various configurations for illustration.

Challenges

The challenges for the Pronto/Elasticsearch use cases observed so far include:

High throughput: Some clusters ingest up to 5TB of data per day, and some clusters serve more than 400 million search requests per day. If Elasticsearch could not handle requests in time, they would accumulate upstream.

Low search latency: For performance-critical clusters, especially site-facing systems, low search latency is mandatory; otherwise the user experience would be impacted.

Optimal settings change over time: Because data and queries are mutable, there is no single optimal setting for all scenarios. For example, splitting an index into more shards can help time-consuming queries but may hurt the performance of other queries.

Solutions

To help our customers address these challenges, the Pronto team built a strategic approach to performance testing, tuning, and monitoring, starting from use case onboarding and continuing throughout the cluster life cycle.

Sizing: Before a new use case comes onboard, collect customer-provided information like throughput, document size, document count, and search type to estimate the initial size of the Elasticsearch cluster.

Optimize index design: Review the index design with the customer.

Tune indexing performance: Tune indexing performance based on the user scenario.

Tune search performance: Run performance tests with the user's real data and queries, then compare and analyze the results across combinations of Elasticsearch configuration parameters.

Run performance tests: After the use case is onboarded, the cluster is monitored, and users are free to re-run performance tests whenever data changes, queries change, or traffic increases.

Sizing

The Pronto team runs benchmarks for every type of machine and every supported Elasticsearch version to gather performance data, and then uses it with customer-provided information to estimate cluster initial size, including:

Indexing throughput

Document size

Search throughput

Query type

Hot index document count

Retention policy

Response time requirement

SLA level

Optimize index design

Let’s think twice before starting to ingest data and run queries. What does an index stand for? The Elastic official answer is “a collection of documents that have somewhat similar characteristics.” So, the next question is “Which characteristics should I use to group my data? Should I put all documents into one index or multiple indices?” The answer is that it depends on the queries you use. Below are some suggestions for organizing indices according to the query that is used most frequently.

Split your data into multiple indices if your query has a filter field and its value is enumerable. For example, suppose you have a lot of global product information ingested into Elasticsearch, most of your queries have a filter clause on “region,” and cross-region queries are rare. A typical query body looks like this:

{
    "query": {
        "bool": {
            "must": {
                "match": {
                    "title": "${title}"
                }
            },
            "filter": {
                "term": {
                    "region": "US"
                }
            }
        }
    }
}

Under this scenario, we can get better performance if the index is split into several smaller indices based on region, like US, Euro, and others. Then the filter clause can be removed from the query. If we need to run a cross-region query, we can just pass multiple indices or wildcards to Elasticsearch.
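The split described above can be sketched as follows. This is a minimal illustration, not Pronto's implementation; the `products_*` index naming scheme and the helper names are assumptions.

```python
# Sketch: region-specific indices let us drop the "region" filter from the
# query body. The "products_<region>" naming scheme is a hypothetical example.

def index_for_region(region: str) -> str:
    """Map a region to its dedicated index (naming scheme is an assumption)."""
    return f"products_{region.lower()}"

def build_query(title: str) -> dict:
    """The query no longer needs a region filter: the index itself scopes it."""
    return {"query": {"bool": {"must": {"match": {"title": title}}}}}

# A single-region search hits one small index...
single = index_for_region("US")  # "products_us"
# ...while a cross-region search passes several indices, or a wildcard.
cross = ",".join(index_for_region(r) for r in ["US", "EU"])
wildcard = "products_*"
```

The query body stays identical for every region, so the same request template can be reused; only the index name in the URL changes.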

Use routing if your query has a filter field whose value is not enumerable. We can split the index into multiple shards by using the filter field value as a routing key and then remove the filter.

For example, there are millions of orders ingested to Elasticsearch, and most queries need to query orders by buyer ID. It’s impossible to create an index for every buyer, so we cannot split data into multiple indices by buyer ID. A proper solution is to use routing to put all orders with the same buyer ID into the same shard. Then nearly all of your queries can be completed within the shard matching the routing key.
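The routing approach above can be sketched as request builders. This is an illustrative sketch only; the `orders` index, `buyer_id` field, and helper names are assumptions, and note that different routing keys can hash to the same shard, so the filter is kept in the query to exclude co-located documents from other buyers.

```python
# Sketch: with a routing key, both index and search requests carry
# routing=<buyer_id>, so a search only touches the shard that key hashes to.

def index_request(order_id: str, buyer_id: str, doc: dict) -> dict:
    # PUT /orders/_doc/{order_id}?routing={buyer_id}
    return {"method": "PUT",
            "path": f"/orders/_doc/{order_id}",
            "params": {"routing": buyer_id},
            "body": doc}

def search_request(buyer_id: str) -> dict:
    # GET /orders/_search?routing={buyer_id} -- only one shard is queried.
    # The term filter stays, because other buyers may hash to the same shard.
    return {"method": "GET",
            "path": "/orders/_search",
            "params": {"routing": buyer_id},
            "body": {"query": {"bool": {
                "filter": {"term": {"buyer_id": buyer_id}}}}}}
```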

Organize data by date if your query has a date range filter. This works for most logging or monitoring scenarios. We can organize indices by daily, weekly, or monthly, and then we can get an index list by a specified date range. Elasticsearch only needs to query on a smaller data set instead of the whole data set. Plus, it would be easy to shrink/delete old indices when data has expired.
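Deriving the index list from a date range can be sketched as below. The daily `logs-YYYY-MM-DD` naming scheme and the function name are assumptions for illustration.

```python
from datetime import date, timedelta

# Sketch: with daily indices named like "logs-2017-11-01" (naming scheme is
# an assumption), a date-range query only needs the indices inside the range,
# so Elasticsearch scans a much smaller data set.

def indices_for_range(start: date, end: date, prefix: str = "logs-") -> list:
    """Return the daily index names covering [start, end], inclusive."""
    days = (end - start).days
    return [f"{prefix}{(start + timedelta(d)).isoformat()}"
            for d in range(days + 1)]

names = indices_for_range(date(2017, 11, 1), date(2017, 11, 3))
# names == ["logs-2017-11-01", "logs-2017-11-02", "logs-2017-11-03"]
```

Expiring old data then becomes a cheap index deletion rather than a costly delete-by-query over one large index.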

Set mappings explicitly. Elasticsearch can create mappings dynamically, but dynamic mappings are not suitable for all scenarios. For example, in Elasticsearch 5.x, a string field is dynamically mapped as both "text" and "keyword" types, which is unnecessary in many scenarios.
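An explicit mapping in this spirit might look like the following sketch. The field names and the pre-6.x `product` mapping type are illustrative assumptions.

```python
# Sketch: an explicit mapping that stores "region" as a plain keyword, instead
# of the 5.x dynamic default that creates both a "text" field and a ".keyword"
# sub-field for every string. Field names are illustrative.

explicit_mapping = {
    "mappings": {
        "product": {                        # mapping type (pre-6.x style)
            "properties": {
                "title":  {"type": "text"},     # analyzed: full-text search
                "region": {"type": "keyword"},  # exact match / filters only
            }
        }
    }
}
# Sent once at index creation time: PUT /products with this body.
```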

Avoid imbalanced sharding if documents are indexed with a user-defined ID or routing. Elasticsearch uses a random ID generator and a hash algorithm to make sure documents are allocated to shards evenly. When you use a user-defined ID or routing, the ID or routing key might not be random enough, and some shards may be obviously bigger than others. In this scenario, a read/write operation on such a shard would be much slower than on the others. We can optimize the ID/routing key or use index.routing_partition_size (available in 5.3 and later).

Make shards distributed evenly across nodes. If one node has more shards than the others, it will take more load than the other nodes and may become the bottleneck of the whole system.

Tune indexing performance

For indexing heavy scenarios like logging and monitoring, indexing performance is the key metric. Here are some suggestions.

Use bulk requests.

Use multiple threads/workers to send requests.

Increase the refresh interval. Every time a refresh happens, Elasticsearch creates a new Lucene segment and merges segments later. Increasing the refresh interval reduces the cost of segment creation and merging. Note that documents only become available for search after a refresh happens.

Relationship between performance and refresh interval

From the above diagram, we can see that the throughput increased and response time decreased as the refresh interval increased. We can use the request below to check how many segments we have and how much time is spent on refresh and merge.

GET index_name/_stats?filter_path=indices.**.refresh,indices.**.segments,indices.**.merges
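Raising the refresh interval is done through the index settings API. A minimal sketch follows; the 30s value is an example, not a recommendation.

```python
# Sketch: body for PUT /index_name/_settings to raise the refresh interval.
# "30s" is an illustrative value, not a recommendation.

refresh_settings = {"index": {"refresh_interval": "30s"}}

# A value of "-1" disables refresh entirely, which can help during a bulk
# (re)load -- but remember to restore it afterwards, since documents only
# become searchable after a refresh.
bulk_load_settings = {"index": {"refresh_interval": "-1"}}
```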

Reduce the replica number. Elasticsearch needs to write documents to the primary shard and all replica shards for every indexing request. Obviously, a large replica number slows down indexing speed, but on the other hand, it improves search performance. We will talk about this later in this article.

Relationship between performance and replica number

From the above diagram, we can see that throughput decreased and response time increased as the replica number increased.

Use auto-generated IDs if possible. An Elasticsearch auto-generated ID is guaranteed to be unique, which avoids a version lookup. If a customer really needs to use a self-defined ID, our suggestion is to pick an ID that is friendly to Lucene, such as a zero-padded sequential ID, UUID-1, or nanotime. These IDs have consistent, sequential patterns that compress well. In contrast, IDs such as UUID-4 are essentially random, which compresses poorly and slows Lucene down.

Tune search performance

A primary reason for using Elasticsearch is to support searches through data. Users should be able to quickly locate the information they are looking for. Search performance depends on quite a few factors.

Use filter context instead of query context if possible. A query clause is used to answer “How well does this document match this clause?” A filter clause is used to answer “Does this document match this clause?” Elasticsearch only needs to answer “Yes” or “No.” It does not need to calculate a relevancy score for a filter clause, and the filter results can be cached. See Query and filter context for details.
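The difference can be seen in the two query bodies below, a minimal sketch with illustrative field names: the same term constraint is expressed once in query context (scored, uncached) and once in filter context (yes/no, cacheable).

```python
# Sketch: the same "region" constraint in query context versus filter context.
# Field names are illustrative.

scored = {"query": {"bool": {"must": [
    {"match": {"title": "fox"}},
    {"term": {"region": "US"}},        # participates in scoring: not cached
]}}}

filtered = {"query": {"bool": {
    "must":   {"match": {"title": "fox"}},
    "filter": {"term": {"region": "US"}},  # no score; result bitset cacheable
}}}
```

Both return the same documents, but the filtered form skips relevance scoring for the term clause and lets Elasticsearch cache its result for later queries.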

Compare between query and filter

Increase the refresh interval. As we mentioned in the tune indexing performance section, Elasticsearch creates a new segment every time a refresh happens. Increasing the refresh interval helps reduce the segment count and the I/O cost of search. Also, the cache is invalidated once a refresh happens and the data has changed, so increasing the refresh interval lets Elasticsearch utilize the cache more efficiently.

Increase replica number. Elasticsearch can perform a search on either a primary or replica shard. The more replicas you have, the more nodes can be involved in your search.

Relationship between performance and replica number

From the above diagram, you can see that search throughput scales nearly linearly with the replica number. Note that in this test, the test cluster had enough data nodes to ensure every shard had an exclusive node. If this condition cannot be satisfied, search throughput will not scale as well.

Try different shard numbers. "How many shards should I set for my index?" This may be the most frequently asked question we've seen. Unfortunately, there is no number that is correct for all scenarios. It fully depends on your case.

Too small a shard number can make searches unable to scale out. For example, if the shard number is set to 1, all documents in your index are stored in one shard, so only one node can be involved in each search. That is time-consuming if you have a lot of documents. On the other hand, creating an index with too many shards also hurts performance, because Elasticsearch needs to run the query on all shards (unless a routing key is specified in the request) and then fetch and merge all of the returned results.

From our experience, if the index is smaller than 1GB, it's fine to set the shard number to 1. For most scenarios, we can leave the shard number at the default value of 5, but if the shard size exceeds 30GB, we should increase the shard number to split the index into more shards. The shard number cannot be changed after an index is created, but we can create a new index and use the reindex API to move the data.
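Resharding via reindex can be sketched as two request bodies. The index names and the shard count of 11 are illustrative assumptions.

```python
# Sketch: the shard count is fixed at index creation, so resharding means
# creating a new index with the desired count and copying the data across
# with the reindex API. Names and shard count are illustrative.

# PUT /orders_v2 -- create the new index with more shards
new_index_settings = {"settings": {"number_of_shards": 11}}

# POST /_reindex -- copy documents from the old index to the new one
reindex_body = {
    "source": {"index": "orders_v1"},
    "dest":   {"index": "orders_v2"},
}
```

After the copy finishes, an index alias can be switched from the old index to the new one so that clients need no change.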

We tested an index that has 100 million documents and is about 150GB. We used 100 threads to send search requests.

Relationship between performance and shard number

From the above diagram, we can see that the optimal shard number is 11. Search throughput increased (and response time decreased) at the beginning, but then throughput decreased (and response time increased) as the shard number kept increasing.

Note that in this test, just like in the replica number test, every shard had an exclusive node. If this condition cannot be satisfied, search throughput will not be as good as in this diagram.

In this case, we recommend trying a shard number smaller than the optimal value, since giving every shard an exclusive data node requires a lot of nodes when the shard number is big.

Node query cache. The node query cache only caches queries that are used in a filter context. Unlike a query clause, a filter clause is a "Yes" or "No" question. Elasticsearch uses a bitset mechanism to cache filter results, so that later queries with the same filter are accelerated. Note that the query cache is only enabled for segments that hold more than 10,000 documents (or 3% of the total documents, whichever is larger). For more details, see All about caching.

We can use the following request to check whether a node query cache is having an effect.

GET index_name/_stats?filter_path=indices.**.query_cache

{
  "indices": {
    "index_name": {
      "primaries": {
        "query_cache": {
          "memory_size_in_bytes": 46004616,
          "total_count": 1588886,
          "hit_count": 515001,
          "miss_count": 1073885,
          "cache_size": 630,
          "cache_count": 630,
          "evictions": 0
        }
      },
      "total": {
        "query_cache": {
          "memory_size_in_bytes": 46004616,
          "total_count": 1588886,
          "hit_count": 515001,
          "miss_count": 1073885,
          "cache_size": 630,
          "cache_count": 630,
          "evictions": 0
        }
      }
    }
  }
}

Shard query cache. If most of the queries are aggregate queries, we should look at the shard query cache, which can cache the aggregate results so that Elasticsearch will serve the request directly with little cost. There are several things to take care of:

Set "size":0. The shard query cache only caches aggregate results and suggestions. It doesn't cache hits, so if you set size to a non-zero value, you cannot benefit from caching.

Keep the payload JSON the same. The shard query cache uses the JSON body as the cache key, so you need to ensure the JSON body doesn't change and that the keys in the JSON body are always in the same order.

Round your date times. Do not use a variable like Date.now in your query directly; round it. Otherwise you will have a different payload body for every request, which makes the cache always invalid. We suggest rounding date times to the hour or day to utilize the cache more efficiently.
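The date-rounding point can be sketched as below. The field names, the one-hour rounding, and the aggregation are illustrative assumptions.

```python
from datetime import datetime

# Sketch: rounding "now" before it goes into the query keeps the JSON body
# identical across requests within the same hour, so the shard query cache
# can actually produce hits. Field and aggregation names are illustrative.

def rounded_range_query(now: datetime) -> dict:
    hour = now.replace(minute=0, second=0, microsecond=0)  # round down to hour
    return {"size": 0,  # required: the cache stores aggregations, not hits
            "query": {"range": {"timestamp": {"gte": "now-1d/h",
                                              "lte": hour.isoformat()}}},
            "aggs": {"per_region": {"terms": {"field": "region"}}}}

a = rounded_range_query(datetime(2017, 11, 1, 10, 5))
b = rounded_range_query(datetime(2017, 11, 1, 10, 59))
# a == b: both requests share one cache entry for the whole hour
```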


We can use the request below to check whether the shard query cache has an effect.

GET index_name/_stats?filter_path=indices.**.request_cache

{
  "indices": {
    "index_name": {
      "primaries": {
        "request_cache": {
          "memory_size_in_bytes": 0,
          "evictions": 0,
          "hit_count": 541,
          "miss_count": 514098
        }
      },
      "total": {
        "request_cache": {
          "memory_size_in_bytes": 0,
          "evictions": 0,
          "hit_count": 982,
          "miss_count": 947321
        }
      }
    }
  }
}

Retrieve only necessary fields. If your documents are large, and you need only a few fields, use stored_fields to retrieve fields you need instead of all fields.

Avoid searching stop words. Stop words like "a" and "the" can cause the query hit count to explode. Imagine you have a million documents. Searching for "fox" may return tens of hits, but searching for "the fox" may return all documents in your index, since "the" appears in nearly all of them. Elasticsearch needs to score and sort all hit results, so a query like "the fox" slows down the whole system. You can use the stop token filter to remove stop words, or use the "and" operator to change the query from "the fox" to "the AND fox" to get a more precise hit result.

If some words are frequently used in your index but are not in the default stop words list, you can use cutoff_frequency to handle them dynamically.
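The "and" operator suggestion above looks like the following sketch; the `body` field name is an illustrative assumption.

```python
# Sketch: forcing the "and" operator so a match query for "the fox" requires
# both terms, instead of returning every document that contains "the".
# The "body" field name is illustrative.

loose = {"query": {"match": {"body": "the fox"}}}  # default "or": huge hit count

strict = {"query": {"match": {"body": {
    "query": "the fox",
    "operator": "and",     # both terms must match: far fewer hits to score
}}}}
```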

Sort by _doc if you don’t care about the order in which documents are returned. Elasticsearch sorts by the “_score” field by default. If you don’t care about the order, you can use “sort”: “_doc” to let Elasticsearch return hits in index order.

Avoid using a script query to calculate hits in flight; store calculated fields at indexing time instead. For example, we have an index with a lot of user information, and we need to query all users whose number starts with "1234." You might want to run a script query like "source": "doc['num'].value.startsWith('1234')." This query is really resource-consuming and slows down the whole system. Instead, consider adding a field named "num_prefix" at indexing time. Then we can just query "num_prefix": "1234."
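The precomputation step might look like this sketch. The prefix length of 4 and the enrichment helper are illustrative assumptions; "num_prefix" follows the article's example.

```python
# Sketch: precompute the prefix at indexing time so the query becomes a cheap
# term lookup instead of a script run against every candidate document.
# The helper name and prefix length are assumptions for illustration.

def enrich(user_doc: dict, prefix_len: int = 4) -> dict:
    """Add a num_prefix field derived from the num field before indexing."""
    doc = dict(user_doc)
    doc["num_prefix"] = doc["num"][:prefix_len]
    return doc

indexed = enrich({"num": "1234567", "name": "alice"})
# indexed["num_prefix"] == "1234"

# The expensive script query is replaced by a simple term query:
prefix_query = {"query": {"term": {"num_prefix": "1234"}}}
```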

Avoid wildcard queries, especially those with a leading wildcard, which force Elasticsearch to scan a huge number of terms.

Run performance tests

For every change, it’s necessary to run performance tests to verify that the change is applicable. Because Elasticsearch is a RESTful service, you can use tools like Rally, Apache JMeter, and Gatling to run performance tests. But because the Pronto team needs to run a lot of benchmark tests on every machine type and Elasticsearch version, and to run performance tests for combinations of Elasticsearch configuration parameters on many Elasticsearch clusters, these tools could not satisfy our requirements.

The Pronto team built an online performance analysis service based on Gatling to help customers and ourselves run performance tests and regressions. The service allows users to:

Easily add/edit tests. Users can generate tests from their input queries or document structure, without any Gatling or Scala knowledge.

Run multiple tests in sequence without human involvement. The service can check status and change Elasticsearch settings before/after every test.

Compare and analyze test results. Test results and cluster statistics gathered during testing are persisted and can be analyzed with predefined Kibana visualizations.

Run tests from the command line or the web UI. A REST API is also provided for easy integration with other systems.

Here is the architecture.

Performance test service architecture (click to enlarge diagram)

Users can view Gatling reports for every test and view Kibana predefined visualizations for further analysis and comparison, as shown below.

Gatling report

Summary

This article summarizes the index/shard/replica design as well as some other configurations that you should consider when designing an Elasticsearch cluster to meet high expectations for ingestion and search performance. It also illustrates how Pronto strategically helps customers with initial sizing, index design and tuning, and performance testing. As of today, the Pronto team has helped a number of customers, including Order Management System (OMS) and Search Engine Optimization (SEO), achieve their demanding performance goals and thus contribute to eBay's key business.

Elasticsearch performance depends on a lot of factors, including document structure, document size, index settings/mappings, request rate, dataset size, query hit count, and so on. A recommendation for one scenario does not necessarily work for another one. It’s important to test performance thoroughly, gather telemetry, tune the configuration based on your workloads, and optimize the clusters to meet your performance requirement.

Editor's note: The Chinese version of this article is posted on InfoQ.