If you’ve spent time trying to ramp up on Elasticsearch and configure a local cluster, you may be wondering if there’s a better way. Perhaps you’re in need of solid advice, and maybe you’d like to find an easier path. In this article, we summarize a number of best practices in managing an ES cluster. Also, we provide many links to specific information that you can find in our extensive blog and knowledge base.

Choose the Right Number of Shards

Specifying too few shards per index will keep you from exploiting the full potential of your cluster. However, you also risk performance issues by increasing the number of shards too much. OK then, you may ask: how many is too many? Read all about it in our hot article, Optimizing Elasticsearch: How Many Shards per Index?
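Keep in mind that the shard count is fixed when an index is created, so it pays to decide deliberately rather than accept the default. A minimal sketch, using the pre-5.x curl style and assuming a local cluster on localhost:9200 and a placeholder index name:

```shell
# Create an index with an explicit shard and replica count.
# The shard count cannot be changed later without reindexing,
# so choose it carefully up front.
curl -XPUT 'http://localhost:9200/my_index' -d '{
  "settings": {
    "number_of_shards": 3,
    "number_of_replicas": 1
  }
}'
```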

A Test Cluster is Essential

If you’re doing anything other than trivial experimentation, you will most definitely need a test cluster. Elasticsearch has an extensive array of configuration options and rich APIs, and we strongly recommend that you experiment with different approaches as you seek to find the best solution for your app stack. You can try different settings and look for variations in behavior. If you’re running locally, try making changes both in the elasticsearch.yml configuration file and via the cluster settings API.
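As a sketch of the two approaches, here is the same (real, dynamically updatable) recovery-throttle setting applied both ways, assuming a local node on localhost:9200:

```shell
# Statically, add a line to elasticsearch.yml before startup:
#   indices.recovery.max_bytes_per_sec: 50mb
#
# Dynamically, use the cluster settings API. "transient" settings
# are lost on a full cluster restart; "persistent" ones survive it.
curl -XPUT 'http://localhost:9200/_cluster/settings' -d '{
  "transient": {
    "indices.recovery.max_bytes_per_sec": "50mb"
  }
}'
```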

Also, don’t forget to test failure modes. Important questions to answer include: How many nodes can you lose and still accept reads and writes? What happens if one of your nodes runs out of disk space? What happens if you set ES_HEAP_SIZE too small?
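While you run these failure experiments, the cluster health API is the quickest way to see the effect (assuming a node reachable on localhost:9200):

```shell
# Watch cluster health while you kill nodes or fill disks.
# status is green/yellow/red: yellow means some replica shards
# are unassigned; red means at least one primary is unassigned.
curl 'http://localhost:9200/_cluster/health?pretty'

# Per-index detail helps pinpoint which indices lost shards:
curl 'http://localhost:9200/_cluster/health?level=indices&pretty'
```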

We also want you to know that there’s another way. A path of less resistance. You could avoid much of this tedium and ease your administration burden by migrating to a hosted Elasticsearch service. We offer a number of tools and support options.

Qbox also offers troubleshooting guidance and free 24/7 support.

Practice Cluster Restarts and Node Outages

Since cluster restarts can be common, you’ll want to do rolling restarts so that you minimize or eliminate downtime. You’ll need to do these for most configuration changes, and whenever you upgrade Elasticsearch versions. This is also an important consideration if your dataset grows and you need to scale. Read more in our article Thoughts on Launching and Scaling Elasticsearch.
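The usual rolling-restart pattern is worth practicing on your test cluster before you need it in production. A sketch, one node at a time, assuming localhost:9200:

```shell
# 1. Disable shard allocation so the cluster doesn't start
#    rebalancing shards the moment the node goes down.
curl -XPUT 'http://localhost:9200/_cluster/settings' -d '{
  "transient": { "cluster.routing.allocation.enable": "none" }
}'

# 2. Stop the node, apply your configuration change or upgrade,
#    then start it again and wait for it to rejoin the cluster.

# 3. Re-enable allocation and wait for the cluster to go green
#    before moving on to the next node.
curl -XPUT 'http://localhost:9200/_cluster/settings' -d '{
  "transient": { "cluster.routing.allocation.enable": "all" }
}'
curl 'http://localhost:9200/_cluster/health?wait_for_status=green&pretty'
```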

Size your Cluster Carefully

Since Elasticsearch is resource intensive, you may not be able to justify as much hardware for your test cluster as for production. Still, we strongly recommend that your test cluster match your production configuration as closely as possible, aiming for the same VM size and type. Remember, you can run multiple nodes on the same machine, if absolutely necessary.

Be Generous with Memory

If you have a large dataset, try to allocate as much memory as you can. The adequate amount of memory varies with application type and load, so it’s important to measure memory usage on your cluster from the very beginning.
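For the Elasticsearch versions this article covers, the heap is set with the ES_HEAP_SIZE environment variable (newer releases moved this into jvm.options). A sketch of the widely cited rule of thumb; the 8g figure is a placeholder for your own measurements:

```shell
# Rule of thumb: give the heap about half of physical RAM, leaving
# the rest for the filesystem cache, and keep it under roughly 31g
# so the JVM can continue using compressed object pointers.
export ES_HEAP_SIZE=8g
./bin/elasticsearch
```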

Our blog offers several articles with thoughtful guidance on sizing your cluster.

Setting Up Visual Tools

For quick troubleshooting and frequent stats, it’s good to have an easy-to-use dashboard to check the current state of your clusters. We highly recommend the Kopf tool.
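For the Elasticsearch 2.x line, Kopf installs as a site plugin; the exact command syntax varies by version, so treat this as a sketch and check the Kopf README for yours:

```shell
# Install the Kopf site plugin (2.x-style command; older 1.x
# releases use a slightly different plugin syntax).
./bin/plugin install lmenezes/elasticsearch-kopf

# Then browse to the plugin UI on any node:
#   http://localhost:9200/_plugin/kopf/
```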

Avoid Routing on Data Nodes

There are several options for routing in Elasticsearch. One popular option is to place a round-robin proxy in front of all your nodes. However, if your cluster will experience intensive use, then it’s better to handle the routing with a dataless node.

The reason non-data node routing is more effective (at scale) than a simple round-robin HTTP proxy is that a dataless node holds a copy of the cluster state: the table of shards and their corresponding nodes. Since it knows the state of the entire cluster, the dataless node can send each request directly to the specific node(s) that should handle it.

When a simple HTTP proxy sits in front of the data nodes instead, each request goes to an arbitrary data node (chosen at random or round-robin). The node that receives the request must consult its copy of the cluster state and then either perform the search locally or forward the request to the appropriate node.
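Setting up a dataless node is a configuration change, not code. A sketch using the pre-5.x settings this article assumes; the config file path is a common default and may differ on your system:

```shell
# Append dataless-node settings to elasticsearch.yml: the node
# joins the cluster and routes requests, but holds no data and
# is never elected master.
cat >> /etc/elasticsearch/elasticsearch.yml <<'EOF'
node.data: false
node.master: false
EOF

# Point your application (or load balancer) at this node's HTTP port.
```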

We hope that you find this article helpful, and we invite you to make comments below.