Unless you are using Elasticsearch for development and testing, creating and maintaining an Elasticsearch cluster will be a task that will occupy quite a lot of your time. Elasticsearch is an extremely powerful search and analysis engine, and part of this power lies in the ability to scale it for better performance and stability.

This Elasticsearch tutorial provides some information on how to set up and run an Elasticsearch cluster, along with operational tips and best practices to help you get started. It should be stressed, though, that every Elasticsearch setup differs depending on multiple factors, including the workload on the servers, the amount of indexed data, hardware specifications, and even the experience of the operators.

What Is an Elasticsearch Cluster?

As the name implies, an Elasticsearch cluster is a group of one or more Elasticsearch node instances that are connected together. The power of an Elasticsearch cluster lies in the distribution of tasks, such as searching and indexing, across all the nodes in the cluster.

The nodes in the Elasticsearch cluster can be assigned different jobs or responsibilities:

Data nodes - store data and execute data-related operations such as search and aggregation.

Master nodes - in charge of cluster-wide management and configuration actions such as adding and removing nodes.

Client nodes - forward cluster requests to the master node and data-related requests to data nodes.

Ingest nodes - for pre-processing documents before indexing.

By default, each node is automatically assigned a unique identifier, or name, that is used for management purposes and becomes even more important in a multi-node, or clustered, environment.

When installed, a single Elasticsearch node will form a new single-node cluster named "elasticsearch," but, as we shall see later on in this article, it can also be configured to join an existing cluster using the cluster name. Needless to say, these nodes need to be able to identify each other to be able to connect.

Installing an Elasticsearch Cluster

As always, there are multiple ways of setting up an Elasticsearch cluster. You can use a configuration management tool such as Puppet or Ansible to automate the process. In this case, though, we will be showing you how to manually set up a cluster consisting of one master node and two data nodes, all on Ubuntu 16.04 instances on AWS EC2 running in the same VPC. The security group was configured to enable access from anywhere using SSH and TCP 5601 (Kibana).

Installing Java

Elasticsearch is built on Java and requires at least Java 8 (1.8.0_131 or later) to run. Our first step, therefore, is to install Java 8 on all the nodes in the cluster. Please note that the same version should be installed on all Elasticsearch nodes in the cluster.

Repeat the following steps on all the servers designated for your cluster.

First, update your system:

sudo apt-get update

Then, install Java with:

sudo apt-get install default-jre

Checking your Java version now (with java -version) should give you the following output or similar:

openjdk version "1.8.0_151"
OpenJDK Runtime Environment (build 1.8.0_151-8u151-b12-0ubuntu0.16.04.2-b12)
OpenJDK 64-Bit Server VM (build 25.151-b12, mixed mode)
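
Since the same Java version should be installed on every node, it can help to compare the major version across servers. The following is a minimal sketch rather than part of the original setup: the function name java_major is hypothetical, and it assumes the version string looks like the output above (or like the newer "11.0.2" style).

```shell
# Hypothetical helper: print the major Java version found in `java -version` output.
java_major() {
  # Pull the quoted version string (e.g. 1.8.0_151) out of the first matching line.
  ver=$(printf '%s\n' "$1" | sed -n 's/.*version "\([^"]*\)".*/\1/p' | head -n 1)
  case "$ver" in
    1.*) printf '%s\n' "$ver" | cut -d. -f2 ;;  # legacy scheme: 1.8.0_151 -> 8
    *)   printf '%s\n' "$ver" | cut -d. -f1 ;;  # newer scheme: 11.0.2 -> 11
  esac
}

# Usage on a node (java -version writes to stderr, hence 2>&1):
#   java_major "$(java -version 2>&1)"
```

Running the usage line on each server and comparing the results is a quick way to spot a node that was provisioned differently.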

Installing Elasticsearch Nodes

Our next step is to install Elasticsearch. As before, repeat the steps in this section on all your servers.

First, you need to add Elastic's signing key so that the downloaded package can be verified (skip this step if you've already installed packages from Elastic):

wget -qO - https://artifacts.elastic.co/GPG-KEY-elasticsearch | sudo apt-key add -

On Debian-based systems, you may also need to install the apt-transport-https package before proceeding:

sudo apt-get install apt-transport-https

The next step is to add the repository definition to your system:

echo "deb https://artifacts.elastic.co/packages/6.x/apt stable main" | sudo tee -a /etc/apt/sources.list.d/elastic-6.x.list

All that's left to do is to update your repositories and install Elasticsearch:

sudo apt-get update
sudo apt-get install elasticsearch

Configuring the Elasticsearch Cluster

Our next step is to set up the cluster so that the nodes can connect and communicate with each other.

For each node, open the Elasticsearch configuration file:

sudo vim /etc/elasticsearch/elasticsearch.yml

This file is quite long and contains multiple settings for different sections. Browse through the file, and enter the following configurations (replace the IPs with your node IPs):

#give your cluster a name.
cluster.name: my-cluster
#give your nodes a name (change node number from node to node).
node.name: "es-node-1"
#define node 1 as master-eligible:
node.master: true
#define nodes 2 and 3 as data nodes:
node.data: true
#enter the private IP and port of your node:
network.host: 172.11.61.27
http.port: 9200
#detail the private IPs of your nodes:
discovery.zen.ping.unicast.hosts: ["172.11.61.27", "172.31.22.131","172.31.32.221"]

Save and exit.
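
Because the configuration file is long and mostly comments, it is easy to leave a setting commented out by mistake. As a rough sketch (the function name active_settings is hypothetical), the following prints only the lines that are actually in effect so you can compare them across nodes:

```shell
# Hypothetical helper: print only the active (non-comment, non-blank) lines of a config file.
active_settings() {
  grep -Ev '^[[:space:]]*(#|$)' "$1"
}

# Usage on a node (may need sudo, depending on file permissions):
#   active_settings /etc/elasticsearch/elasticsearch.yml
```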

Running Your Elasticsearch Cluster

You are now ready to run Elasticsearch and verify that the nodes are communicating with each other as a cluster.

For each instance, run the following command:

sudo service elasticsearch start

If everything was configured correctly, your Elasticsearch cluster should be up and running. To verify everything is working as expected, query Elasticsearch from any of the cluster nodes:

curl -XGET 'http://localhost:9200/_cluster/state?pretty'

The response should detail the cluster and its nodes:

{
  "cluster_name" : "my-cluster",
  "compressed_size_in_bytes" : 351,
  "version" : 4,
  "state_uuid" : "3LSnpinFQbCDHnsFv-Z8nw",
  "master_node" : "IwEK2o1-Ss6mtx50MripkA",
  "blocks" : { },
  "nodes" : {
    "IwEK2o1-Ss6mtx50MripkA" : {
      "name" : "es-node-2",
      "ephemeral_id" : "x9kUrr0yRh--3G0ckESsEA",
      "transport_address" : "172.31.50.123:9300",
      "attributes" : { }
    },
    "txM57a42Q0Ggayo4g7-pSg" : {
      "name" : "es-node-1",
      "ephemeral_id" : "Q370o4FLQ4yKPX4_rOIlYQ",
      "transport_address" : "172.31.62.172:9300",
      "attributes" : { }
    },
    "6YNZvQW6QYO-DX31uIvaBg" : {
      "name" : "es-node-3",
      "ephemeral_id" : "mH034-P0Sku6Vr1DXBOQ5A",
      "transport_address" : "172.31.52.220:9300",
      "attributes" : { }
    }
  },
  …
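
To confirm at a glance that all expected nodes have joined, you can pull just the node names out of the response. This is a minimal sketch, assuming the pretty-printed layout with one key per line; the function name node_names is hypothetical, and the text matching is deliberately simple rather than a real JSON parser.

```shell
# Hypothetical helper: read a pretty-printed cluster state response on stdin
# and print one node name per line.
node_names() {
  sed -n 's/^[[:space:]]*"name"[[:space:]]*:[[:space:]]*"\([^"]*\)".*/\1/p'
}

# Usage on a node (the /nodes suffix limits the response to the nodes section):
#   curl -s 'http://localhost:9200/_cluster/state/nodes?pretty' | node_names
```

For the cluster in this tutorial, three names should be listed; fewer usually means a node could not reach the others.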

Elasticsearch Cluster Configurations for Production

We already defined the different roles for the nodes in our cluster, but there are some additional recommended settings for a cluster running in a production environment.

Avoiding "Split Brain"

A "split-brain" situation is when communication between nodes in the cluster fails due to either a network failure or an internal failure with one of the nodes. In this kind of scenario, more than one node might believe it is the master node, leading to a state of data inconsistency.

To avoid this situation, we can change the discovery.zen.minimum_master_nodes directive in the Elasticsearch configuration file, which determines how many master-eligible nodes need to be in communication (a quorum) to elect a master.

A best practice is to calculate this number with the formula N/2 + 1, where N is the number of master-eligible nodes in the cluster, and then round the result down to the nearest integer.

For a cluster with three master-eligible nodes, 3/2 + 1 = 2.5, which rounds down to 2:

discovery.zen.minimum_master_nodes: 2
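
The quorum formula is easy to get wrong by one, so here is a small sketch of the arithmetic. The function name minimum_master_nodes is just an illustrative choice; the calculation itself is the N/2 + 1 rule from this section.

```shell
# Quorum of master-eligible nodes: N/2 + 1, rounded down.
# Shell integer division already rounds down, so no extra step is needed.
minimum_master_nodes() {
  echo $(( $1 / 2 + 1 ))
}

minimum_master_nodes 3   # prints 2
minimum_master_nodes 5   # prints 3
```

Remember that N counts only master-eligible nodes, not every node in the cluster.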

Adjusting JVM Heap Size

To ensure Elasticsearch has enough operational leeway, the default JVM heap size (min/max 1 GB) should be adjusted.

As a rule of thumb, the maximum heap size should be set to no more than 50% of your RAM, and to no more than 32GB (above that threshold the JVM can no longer use compressed object pointers, so larger heaps use memory less efficiently). Elastic also recommends that the values for maximum and minimum heap size be identical.

These values can be configured using the Xmx and Xms settings in the jvm.options file.

On DEB:

sudo vim /etc/elasticsearch/jvm.options

-Xms2g
-Xmx2g
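
If you would rather derive the heap size from the server's memory than hard-code it, the rule of thumb above can be sketched as follows. The function name suggest_heap_mb is hypothetical, and the 31GB cap is my own conservative assumption for staying under the 32GB threshold, not a value from Elastic.

```shell
# Hypothetical helper: suggest a heap size in MB for a given amount of RAM in MB.
# Rule from the text: half of the RAM, capped near the 32GB threshold.
# The 31GB cap is an assumption chosen to stay safely below that threshold.
suggest_heap_mb() {
  ram_mb=$1
  cap_mb=$(( 31 * 1024 ))
  heap_mb=$(( ram_mb / 2 ))
  if [ "$heap_mb" -gt "$cap_mb" ]; then
    heap_mb=$cap_mb
  fi
  echo "$heap_mb"
}

# Print matching jvm.options lines for a server with 4GB of RAM.
# Both values are identical, as recommended above.
heap=$(suggest_heap_mb 4096)
printf '%s\n' "-Xms${heap}m" "-Xmx${heap}m"
```

On Linux, the total RAM in MB can be read with awk '/MemTotal/ {print int($2/1024)}' /proc/meminfo and passed to the function.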

Disabling Swapping

Swapping out unused memory is normal operating system behavior but, in the context of Elasticsearch, it can result in disconnects, bad performance, and, in general, an unstable cluster.

To avoid swapping, you can either disable all swapping (recommended if Elasticsearch is the only service running on the server), or you can lock the Elasticsearch process memory into RAM (mlockall).

To do this, open the Elasticsearch configuration file on all nodes in the cluster:

sudo vim /etc/elasticsearch/elasticsearch.yml

Uncomment the following line:

bootstrap.memory_lock: true

Next, open the /etc/default/elasticsearch file:

sudo vim /etc/default/elasticsearch

Make the following configurations:

MAX_LOCKED_MEMORY=unlimited

Restart Elasticsearch when you're done.
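
After the restart, it is worth checking that memory locking actually took effect, since Elasticsearch can start without it. Each node reports this as process.mlockall in the nodes info API. The sketch below assumes pretty-printed output with one key per line; the function name unlocked_node_count is hypothetical.

```shell
# Hypothetical helper: read a nodes info response on stdin and print how many
# nodes report that memory locking is NOT in effect ("mlockall" : false).
unlocked_node_count() {
  # grep -c exits non-zero when the count is 0, so normalize the exit status.
  grep -c '"mlockall"[[:space:]]*:[[:space:]]*false' || true
}

# Usage on a node (a result of 0 means every node locked its memory):
#   curl -s 'http://localhost:9200/_nodes?filter_path=**.mlockall&pretty' | unlocked_node_count
```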

Adjusting Virtual Memory

To avoid running out of virtual memory, increase the limit on mmap counts:

sudo vim /etc/sysctl.conf

Update the relevant settings accordingly:

vm.max_map_count=262144

On DEB/RPM, this setting is configured automatically.
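
To check what a node is currently using, you can compare the live kernel value against the number above. This is a small sketch; the function name map_count_ok is hypothetical.

```shell
# Hypothetical helper: succeed if the given mmap count limit meets the
# 262144 value used above.
map_count_ok() {
  [ "$1" -ge 262144 ]
}

# Usage on a Linux node:
#   current=$(sysctl -n vm.max_map_count)
#   map_count_ok "$current" || echo "vm.max_map_count is too low: $current"
#
# Apply the value immediately, without waiting for sysctl.conf to be reloaded:
#   sudo sysctl -w vm.max_map_count=262144
```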

Increasing Open File Descriptor Limit

Another important configuration is the limit of open file descriptors. Since Elasticsearch makes use of a large number of file descriptors, you must ensure the defined limit is high enough. Otherwise, you might end up losing data.

The common recommendation for this setting is 65,536 or higher. On DEB/RPM, the default settings are already configured to suit this requirement, but you can of course fine-tune it.

sudo vim /etc/security/limits.conf

Set the limit for the user that runs Elasticsearch (elasticsearch on DEB/RPM installs):

elasticsearch - nofile 65536
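
Limits set in this file only apply to new sessions, so it is safer to ask the running nodes what they actually got. The nodes stats API reports this as max_file_descriptors. The sketch below assumes pretty-printed output with one key per line; the function name lowest_fd_limit is hypothetical.

```shell
# Hypothetical helper: read a nodes stats response on stdin and print the
# lowest max_file_descriptors value reported by any node.
lowest_fd_limit() {
  sed -n 's/.*"max_file_descriptors"[[:space:]]*:[[:space:]]*\([0-9][0-9]*\).*/\1/p' \
    | sort -n | head -n 1
}

# Usage on a node:
#   curl -s 'http://localhost:9200/_nodes/stats/process?filter_path=**.max_file_descriptors&pretty' | lowest_fd_limit
```

If the printed value is below 65536, at least one node is still running with a lower limit.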

Elasticsearch Cluster APIs

Elasticsearch supports a large number of cluster-specific API operations that allow you to manage and monitor your Elasticsearch cluster. Most of the APIs allow you to define which Elasticsearch node to call using either the internal node ID, its name, or its address.

Below is a list of a few of the more basic API operations you can use. For advanced usage of cluster APIs, read this blog post.

Cluster Health

This API can be used to see general information on the cluster and gauge its health:

curl -XGET 'localhost:9200/_cluster/health?pretty'

Response:

{
  "cluster_name" : "my-cluster",
  "status" : "green",
  "timed_out" : false,
  "number_of_nodes" : 3,
  "number_of_data_nodes" : 3,
  "active_primary_shards" : 0,
  "active_shards" : 0,
  "relocating_shards" : 0,
  "initializing_shards" : 0,
  "unassigned_shards" : 0,
  "delayed_unassigned_shards" : 0,
  "number_of_pending_tasks" : 0,
  "number_of_in_flight_fetch" : 0,
  "task_max_waiting_in_queue_millis" : 0,
  "active_shards_percent_as_number" : 100.0
}
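
In scripts, you usually only care about the status field. The following is a rough sketch of a check you could place in a deployment or monitoring script; the function names health_status and is_green are hypothetical, and the matching is simple text extraction rather than real JSON parsing.

```shell
# Hypothetical helper: read a cluster health response on stdin and print its status.
health_status() {
  sed -n 's/.*"status"[[:space:]]*:[[:space:]]*"\([a-z]*\)".*/\1/p' | head -n 1
}

# Hypothetical helper: succeed only when the status read from stdin is green.
is_green() {
  [ "$(health_status)" = "green" ]
}

# Usage on a node; wait_for_status and timeout are request parameters of the health API:
#   curl -s 'http://localhost:9200/_cluster/health?wait_for_status=green&timeout=30s&pretty' | is_green && echo "cluster is green"
```

With wait_for_status, the call blocks until the cluster reaches the requested status or the timeout expires, which avoids writing a polling loop.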

Cluster State

This API can be used to see a detailed status report on your entire cluster. You can filter results by specifying parameters in the call URL.

curl -XGET 'localhost:9200/_cluster/state?pretty'

Response:

{
  "cluster_name" : "my-cluster",
  "compressed_size_in_bytes" : 347,
  "version" : 4,
  "state_uuid" : "uMi5OBtAS8SSRJ9hw1-gUg",
  "master_node" : "sqT_y5ENQ9SdjHiE0oco_g",
  "blocks" : { },
  "nodes" : {
    "sqT_y5ENQ9SdjHiE0oco_g" : {
      "name" : "node-1",
      "ephemeral_id" : "-HDzovR0S0e-Nn8XJ-GWPA",
      "transport_address" : "172.31.56.131:9300",
      "attributes" : { }
    },
    "mO0d0hYiS1uB--NoWuWyHg" : {
      "name" : "node-3",
      "ephemeral_id" : "LXjx86Q5TrmefDoq06MY1A",
      "transport_address" : "172.31.58.61:9300",
      "attributes" : { }
    },
    "it1V-5bGT9yQh19d8aAO0g" : {
      "name" : "node-2",
      "ephemeral_id" : "lCJja_QtTYauP3xEWg5NBQ",
      "transport_address" : "172.31.62.65:9300",
      "attributes" : { }
    }
  },
  "metadata" : {
    "cluster_uuid" : "8AqSmmKdQgmRVPsVxyxKrw",
    "templates" : { },
    "indices" : { },
    "index-graveyard" : {
      "tombstones" : [ ]
    }
  },
  "routing_table" : {
    "indices" : { }
  },
  "routing_nodes" : {
    "unassigned" : [ ],
    "nodes" : {
      "it1V-5bGT9yQh19d8aAO0g" : [ ],
      "sqT_y5ENQ9SdjHiE0oco_g" : [ ],
      "mO0d0hYiS1uB--NoWuWyHg" : [ ]
    }
  },
  "snapshots" : {
    "snapshots" : [ ]
  },
  "restore" : {
    "snapshots" : [ ]
  },
  "snapshot_deletions" : {
    "snapshot_deletions" : [ ]
  }
}
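
As an example of filtering, the cluster state API accepts a comma-separated list of metrics (such as master_node, nodes, or routing_table) in the URL path and returns only those sections. Here is a minimal sketch that builds such a URL; the function name state_url is hypothetical, and it assumes you pass at least one metric.

```shell
# Hypothetical helper: build a cluster state URL limited to the given metrics.
# $1 = host:port, remaining arguments = metric names (e.g. master_node nodes).
state_url() {
  host=$1
  shift
  # Join the remaining arguments with commas (IFS only changes inside the subshell).
  metrics=$(IFS=,; echo "$*")
  echo "http://$host/_cluster/state/$metrics?pretty"
}

# Usage on a node:
#   curl -s "$(state_url localhost:9200 master_node nodes)"
```

On larger clusters, the full state can be very big, so requesting only the sections you need keeps the response manageable.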

Cluster Stats

This API is extremely useful for monitoring performance metrics on your entire cluster:

curl -XGET 'localhost:9200/_cluster/stats?human&pretty'

Response:

{
  "_nodes" : {
    "total" : 3,
    "successful" : 3,
    "failed" : 0
  },
  "cluster_name" : "my-cluster",
  "timestamp" : 1517224098451,
  "status" : "green",
  "indices" : {
    "count" : 0,
    "shards" : { },
    "docs" : {
      "count" : 0,
      "deleted" : 0
    },
    "store" : {
      "size" : "0b",
      "size_in_bytes" : 0
    },
    "fielddata" : {
      "memory_size" : "0b",
      "memory_size_in_bytes" : 0,
      "evictions" : 0
    },
    "query_cache" : {
      "memory_size" : "0b",
      "memory_size_in_bytes" : 0,
      "total_count" : 0,
      "hit_count" : 0,
      "miss_count" : 0,
      "cache_size" : 0,
      "cache_count" : 0,
      "evictions" : 0
    },
    "completion" : {
      "size" : "0b",
      "size_in_bytes" : 0
    },
    "segments" : {
      "count" : 0,
      "memory" : "0b",
      "memory_in_bytes" : 0,
      "terms_memory" : "0b",
      "terms_memory_in_bytes" : 0,
      "stored_fields_memory" : "0b",
      "stored_fields_memory_in_bytes" : 0,
      "term_vectors_memory" : "0b",
      "term_vectors_memory_in_bytes" : 0,
      "norms_memory" : "0b",
      "norms_memory_in_bytes" : 0,
      "points_memory" : "0b",
      "points_memory_in_bytes" : 0,
      "doc_values_memory" : "0b",
      "doc_values_memory_in_bytes" : 0,
      "index_writer_memory" : "0b",
      "index_writer_memory_in_bytes" : 0,
      "version_map_memory" : "0b",
      "version_map_memory_in_bytes" : 0,
      "fixed_bit_set" : "0b",
      "fixed_bit_set_memory_in_bytes" : 0,
      "max_unsafe_auto_id_timestamp" : -9223372036854775808,
      "file_sizes" : { }
    }
  },
  "nodes" : {
    "count" : {
      "total" : 3,
      "data" : 3,
      "coordinating_only" : 0,
      "master" : 3,
      "ingest" : 3
    },
    "versions" : [
      "6.1.2"
    ],
    "os" : {
      "available_processors" : 3,
      "allocated_processors" : 3,
      "names" : [
        {
          "name" : "Linux",
          "count" : 3
        }
      ],
      "mem" : {
        "total" : "10.4gb",
        "total_in_bytes" : 11247157248,
        "free" : "4.5gb",
        "free_in_bytes" : 4915200000,
        "used" : "5.8gb",
        "used_in_bytes" : 6331957248,
        "free_percent" : 44,
        "used_percent" : 56
      }
    },
    "process" : {
      "cpu" : {
        "percent" : 10
      },
      "open_file_descriptors" : {
        "min" : 177,
        "max" : 178,
        "avg" : 177
      }
    },
    "jvm" : {
      "max_uptime" : "6m",
      "max_uptime_in_millis" : 361766,
      "versions" : [
        {
          "version" : "1.8.0_151",
          "vm_name" : "OpenJDK 64-Bit Server VM",
          "vm_version" : "25.151-b12",
          "vm_vendor" : "Oracle Corporation",
          "count" : 3
        }
      ],
      "mem" : {
        "heap_used" : "252.1mb",
        "heap_used_in_bytes" : 264450008,
        "heap_max" : "2.9gb",
        "heap_max_in_bytes" : 3195076608
      },
      "threads" : 63
    },
    "fs" : {
      "total" : "23.2gb",
      "total_in_bytes" : 24962703360,
      "free" : "19.4gb",
      "free_in_bytes" : 20908818432,
      "available" : "18.2gb",
      "available_in_bytes" : 19570003968
    },
    "plugins" : [ ],
    "network_types" : {
      "transport_types" : {
        "netty4" : 3
      },
      "http_types" : {
        "netty4" : 3
      }
    }
  }
}

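One number worth deriving from the cluster stats response is the overall JVM heap usage, which is heap_used_in_bytes divided by heap_max_in_bytes. The following is a rough sketch, assuming pretty-printed output with one key per line; the function name heap_used_percent is hypothetical.

```shell
# Hypothetical helper: read a cluster stats response on stdin and print the
# cluster-wide JVM heap usage as a percentage with one decimal place.
heap_used_percent() {
  awk -F: '
    /"heap_used_in_bytes"/ { gsub(/[^0-9]/, "", $2); used = $2 + 0 }
    /"heap_max_in_bytes"/  { gsub(/[^0-9]/, "", $2); hmax = $2 + 0 }
    END { if (hmax > 0) printf "%.1f\n", used * 100 / hmax }
  '
}

# Usage on a node:
#   curl -s 'http://localhost:9200/_cluster/stats?pretty' | heap_used_percent
```

For the sample response in this section (264450008 bytes used out of 3195076608), this works out to roughly 8.3 percent.
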
Nodes Stats

If you want to inspect metrics for specific nodes in the cluster, use this API. You can see information for all nodes, a specific node, or ask to see only index or OS/process specific stats.

All nodes:

curl -XGET 'localhost:9200/_nodes/stats?pretty'

A specific node:

curl -XGET 'localhost:9200/_nodes/node-1/stats?pretty'

Index-only stats:

curl -XGET 'localhost:9200/_nodes/stats/indices?pretty'
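
OS and process stats can be requested the same way by listing them in the path (_nodes/stats/os,process). To compare a single metric across nodes, you can combine a metric with the filter_path parameter. Below is a minimal sketch that prints the JVM heap usage per node; the function name node_heap_usage is hypothetical, and it assumes pretty-printed output with one key per line.

```shell
# Hypothetical helper: read a pretty-printed nodes stats response on stdin and
# print "<node name> <heap used %>" for each node.
node_heap_usage() {
  awk -F'"' '
    $2 == "name"              { name = $4 }
    $2 == "heap_used_percent" { pct = $3; gsub(/[^0-9]/, "", pct); print name, pct }
  '
}

# Usage on a node:
#   curl -s 'http://localhost:9200/_nodes/stats/jvm?filter_path=nodes.*.name,nodes.*.jvm.mem.heap_used_percent&pretty' | node_heap_usage
```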

This is just the tip of the iceberg and there are plenty more APIs available. Refer to the official Cluster API docs for reference.

What's Next?

"When all else fails, read the fuc%^&* manual" goes the famous saying. Thing is, the manual in question, and the technology it documents, are not straightforward, to say the least.

This tutorial made a brave attempt to provide users with the basics of setting up and configuring their first Elasticsearch cluster, knowing full well that it is virtually impossible to provide instructions that suit every environment and use case.

Together with this tutorial, I strongly recommend doing additional research. Other than Elastic's official documentation, here are some additional informative resources:

Good luck!