Scaling Elasticsearch

Elasticsearch is a distributed search engine that is designed for high availability. In this post, we will look at two common operations with K8S that you will need to perform in a production deployment.

1. Scaling nodes with almost zero downtime. Elasticsearch throughput scales horizontally, so scaling ES up or down to match production requirements is a very common operation.

2. Changing the underlying hardware, either to vertically scale nodes or to migrate your ES deployment from one IaaS provider to another (or to a bare-metal system).

To prevent any data loss while scaling out, I also recommend ensuring the following two things:

Each K8S agent node should run exactly one Elasticsearch pod; this preserves the HA guarantees of Elasticsearch, since the loss of a single agent node then affects at most one ES node.

All Elasticsearch indices should have at least one replica. This ensures that at least one copy of each shard is present in the cluster during rolling upgrades, thus minimizing the risk of data loss.
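As a sketch of the second point, the replica count can be checked and raised through Elasticsearch's settings APIs. This assumes the cluster is reachable at localhost:9200; the index name my-index is a placeholder, so substitute your own:

```shell
# List all indices with their replica counts ("rep" column)
curl -s 'localhost:9200/_cat/indices?h=index,rep'

# Ensure an index keeps at least one replica per shard
curl -s -XPUT 'localhost:9200/my-index/_settings' \
  -H 'Content-Type: application/json' \
  -d '{"index": {"number_of_replicas": 1}}'
```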

Finally, if you are doing this on a production cluster, I recommend taking a snapshot and backing it up externally (like AWS S3) before starting out.
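For reference, a snapshot to S3 can be taken with the snapshot API. This is only a sketch: it assumes the repository-s3 plugin is installed on the ES nodes, and the bucket name my-es-backups and repository name s3_backup are placeholders:

```shell
# Register an S3 snapshot repository (bucket must already exist)
curl -s -XPUT 'localhost:9200/_snapshot/s3_backup' \
  -H 'Content-Type: application/json' \
  -d '{"type": "s3", "settings": {"bucket": "my-es-backups", "region": "us-east-2"}}'

# Snapshot all indices and wait for completion before scaling
curl -s -XPUT 'localhost:9200/_snapshot/s3_backup/pre-scale-snapshot?wait_for_completion=true'
```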

Image: Things gone wrong!

Growing a 3-Node ES Cluster to a 5-Node Cluster

Before we set out to grow our ES cluster to 5 nodes, we should ensure that our K8S cluster also has 5 agent nodes; otherwise, more than one ES pod would get scheduled on the same node, which is not desired if we want high availability.

Since we started with only 3 nodes for our K8S cluster, we need to first scale the K8S agent nodes to 5 and then scale our ES nodes.

Image: Representational diagram of horizontally scaling nodes in K8S cluster

Kops has the concept of “instance groups”, which are groups of similar machines. To scale the nodes, we need to fetch the name of the instance group our nodes belong to and then increase the number of nodes in that group. The following command lists the instance groups of your cluster.

[ec2-user@ip-172-31-35-145 examples]$ kops get ig
Using cluster from kubectl context: k8s.appbase

NAME     ROLE    MACHINETYPE  MIN  MAX  ZONES
masters  Master  t2.large     1    1    us-east-2c
nodes    Node    t2.medium    3    3    us-east-2c

1. Run the following command to edit the instance group configuration for the agent nodes. Replace <nodes> with your instance group name.

kops edit ig <nodes>

You should see a config file open.

apiVersion: kops/v1alpha2
kind: InstanceGroup
metadata:
  creationTimestamp: 2018-02-14T12:26:43Z
  labels:
    kops.k8s.io/cluster: k8s.appbase
  name: nodes
spec:
  image: kope.io/k8s-1.8-debian-jessie-amd64-hvm-ebs-2018-01-14
  machineType: t2.medium
  maxSize: 3
  minSize: 3
  nodeLabels:
    kops.k8s.io/instancegroup: nodes
  role: Node
  subnets:
  - us-east-2c

In order to scale the nodes we need to edit minSize and maxSize in the config file of the instance group to which our agent nodes belong. The maxSize defines the maximum number of nodes we want up at any given time while minSize defines the minimum number of nodes that must be up at any given time.

2. Here, we will change the maxSize and minSize values to 5.

apiVersion: kops/v1alpha2
kind: InstanceGroup
metadata:
  creationTimestamp: 2018-02-14T12:26:43Z
  labels:
    kops.k8s.io/cluster: k8s.appbase
  name: nodes
spec:
  image: kope.io/k8s-1.8-debian-jessie-amd64-hvm-ebs-2018-01-14
  machineType: t2.medium
  maxSize: 5
  minSize: 5
  nodeLabels:
    kops.k8s.io/instancegroup: nodes
  role: Node
  subnets:
  - us-east-2c

3. Next, we need to update the cluster to apply the configuration change we just made.

kops update cluster --yes

kops rolling-update cluster

It may take a few minutes to add the nodes, as new VMs are provisioned and joined to the cluster. You can verify the new node count by listing the nodes of the cluster.

4. Run: kubectl get nodes

NAME                                         STATUS  ROLES   AGE  VERSION
ip-172-20-44-33.us-east-2.compute.internal   Ready   master  12h  v1.8.6
ip-172-20-52-48.us-east-2.compute.internal   Ready   node    2d   v1.8.6
ip-172-20-62-30.us-east-2.compute.internal   Ready   node    2d   v1.8.6
ip-172-20-64-53.us-east-2.compute.internal   Ready   node    2d   v1.8.6
ip-172-20-67-30.us-east-2.compute.internal   Ready   node    1m   v1.8.6
ip-172-20-87-68.us-east-2.compute.internal   Ready   node    1m   v1.8.6

As you can see, we now have a K8S cluster of 5 agent nodes.

Now let’s scale the Elasticsearch nodes from 3 → 5.

We can scale Elasticsearch horizontally by simply specifying the number of replicas. The same command works for both scaling up and scaling down.

kubectl scale --replicas=5 statefulset es

You can adjust the number of replicas to suit your needs (make sure you have enough K8S nodes for the replicas).

[ec2-user@ip-172-31-35-145 ~]$ kubectl scale --replicas=5 statefulset es

statefulset "es" scaled

You can verify the number of Elasticsearch pods, and the nodes they are running on, with the following command:

kubectl get pods -o wide

You should get a table of your pods along with the nodes on which they are deployed.

Embed: Pod details with the nodes

When you query Elasticsearch's cluster health endpoint, you should now see the number of Elasticsearch nodes increase to five.
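For example, assuming the Elasticsearch HTTP endpoint is reachable at localhost:9200:

```shell
# "number_of_nodes" in the response should now read 5
curl -s 'localhost:9200/_cluster/health?pretty'

# One line per node, with IP, roles, and name
curl -s 'localhost:9200/_cat/nodes?v'
```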

Image: Elasticsearch cluster health

Vertical scaling of Elasticsearch nodes

Image: Scaling ES cluster vertically

Scaling Elasticsearch vertically means letting our ES nodes use more (or less) memory, disk, and computing resources. To achieve this, we first vertically scale our K8S nodes by changing the instance type of the node VMs to suit our needs, and then edit the JAVA_HEAP_SIZE and CPU settings of the Elasticsearch statefulset resource that we deployed on K8S. The statefulset will then roll out the updated resource limits to the Elasticsearch pods.

Image: Vertical scaling of K8S cluster

We start by vertically scaling the K8S nodes. The process is similar to horizontally scaling the K8S nodes.

Get the instance group name of the nodes:

kops get ig

Edit the config of the instance group for the nodes:

kops edit ig <ig name>

Change machineType to the instance type you want to scale up to. Make sure that machine type is available in the AZ where you provisioned the cluster. Our nodes were provisioned with t2.medium VMs; change it to t2.large. Save the file and update the cluster:

kops update cluster --yes

kops rolling-update cluster

Wait a few minutes until all of the old EC2 instances are terminated and the new machines come up. Then check the validity of the cluster.
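While waiting, you can poll the cluster state instead of checking manually; a small sketch:

```shell
# Re-run validation every 60s until kops reports the cluster healthy
until kops validate cluster; do
  echo "Cluster not ready yet; retrying in 60s..."
  sleep 60
done
```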

Run: kops validate cluster

It should show you the cluster with new machine types:

[ec2-user@ip-172-31-35-145 examples]$ kops validate cluster
Using cluster from kubectl context: k8s.appbase
Validating cluster k8s.appbase

INSTANCE GROUPS
NAME     ROLE    MACHINETYPE  MIN  MAX  SUBNETS
masters  Master  t2.large     1    1    us-east-2c
nodes    Node    t2.large     5    5    us-east-2c

NODE STATUS
NAME                                         ROLE    READY
ip-172-20-44-33.us-east-2.compute.internal   master  True
ip-172-20-52-48.us-east-2.compute.internal   node    True
ip-172-20-62-30.us-east-2.compute.internal   node    True
ip-172-20-64-53.us-east-2.compute.internal   node    True
ip-172-20-67-30.us-east-2.compute.internal   node    True
ip-172-20-87-68.us-east-2.compute.internal   node    True

We have just scaled our K8S nodes vertically.

Next, we can scale our ES nodes vertically. For this, we need to update the Elasticsearch statefulset.

kubectl edit statefulset es

Now edit the value of JAVA_HEAP_SIZE in the list of environment variables from -Xms256m -Xmx256m to -Xms1g -Xmx1g.
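Alternatively, the same change can be made non-interactively; a sketch, assuming the heap options are configured via the JAVA_HEAP_SIZE environment variable as in this deployment:

```shell
# Update the heap options; the StatefulSet rolls the pods with the new value
kubectl set env statefulset/es JAVA_HEAP_SIZE="-Xms1g -Xmx1g"

# Watch the rolling update progress
kubectl rollout status statefulset/es
```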

By default, Elasticsearch uses all available processors. Since the t2.large instance type has 2 vCPUs, the ES nodes will use both.

After you save the file, the statefulset will upgrade the ES nodes with the updated configuration, and our ES cluster will scale up vertically. To verify the vertical scaling, you can check the ES _nodes API endpoint on the Elasticsearch cluster's URL.
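For instance, again assuming the cluster is reachable at localhost:9200, the new heap size and processor count should be visible in the node stats:

```shell
# Maximum JVM heap per node (should reflect -Xmx1g)
curl -s 'localhost:9200/_nodes/stats/jvm?pretty' | grep heap_max

# Processors visible to each node
curl -s 'localhost:9200/_nodes/os?pretty' | grep available_processors
```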