Many Kubernetes users, especially those at enterprise level, swiftly come across the need to autoscale environments. Fortunately, the K8s Horizontal Pod Autoscaler (HPA) allows you to configure your deployments to scale horizontally in a myriad of ways to do just that. One of the biggest advantages of Kubernetes autoscaling is that your Cluster can track the load on your existing Pods and calculate whether or not more Pods are required.

The Kubernetes Autoscaling Framework

Leverage efficient Kubernetes Autoscaling by harmonizing the two layers of scalability on offer:

1 – Autoscaling at Pod level: This plane includes the Horizontal Pod Autoscaler (HPA) and the Vertical Pod Autoscaler (VPA), both of which scale your containers' available resources.

2 – Autoscaling at Cluster level: The Cluster Autoscaler (CA) manages this plane of scalability by scaling the number of nodes inside your Cluster up or down as necessary.

The Kubernetes Autoscaling Framework in Detail:

Horizontal Pod Autoscaler (HPA)

HPA scales the number of Pod replicas in your Cluster for you. Scaling up or down is triggered by CPU or memory utilization by default. However, it’s possible to configure HPA to scale Pods according to resource, custom, and external metrics (served through the metrics.k8s.io, custom.metrics.k8s.io, and external.metrics.k8s.io APIs).
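As an illustrative sketch (the deployment name and thresholds here are placeholders, not prescribed by this guide), an HPA resource that keeps a deployment between 2 and 25 replicas at 50% average CPU utilization might look like this:

```yaml
# Hypothetical HPA definition: targets a deployment named php-apache
# and scales it between 2 and 25 replicas, aiming for 50% average CPU.
apiVersion: autoscaling/v1
kind: HorizontalPodAutoscaler
metadata:
  name: php-apache
spec:
  scaleTargetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: php-apache
  minReplicas: 2
  maxReplicas: 25
  targetCPUUtilizationPercentage: 50
```

The autoscaling/v1 API only supports CPU targets; custom and external metrics require the autoscaling/v2 family of APIs.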

Vertical Pod Autoscaler (VPA)

Built predominantly for stateful services, VPA adds CPU or memory to Pods as required, though it works for stateless Pods too. To apply these changes, VPA restarts Pods with updated CPU and memory resources; it can be configured to trigger in reaction to OOM (out of memory) events. When restarting Pods, VPA always ensures the minimum number of Pods remains available according to the Pod Disruption Budget (PDB), which you can set along with minimum and maximum resource allocation limits.
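As a sketch (the names and resource bounds are illustrative, and the apiVersion varies between VPA releases), a VPA resource with minimum and maximum allocation limits might look like:

```yaml
# Hypothetical VPA definition for a deployment called my-app.
apiVersion: autoscaling.k8s.io/v1
kind: VerticalPodAutoscaler
metadata:
  name: my-app-vpa          # placeholder name
spec:
  targetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: my-app            # placeholder deployment
  updatePolicy:
    updateMode: "Auto"      # VPA may restart Pods to apply new requests
  resourcePolicy:
    containerPolicies:
      - containerName: "*"
        minAllowed:         # floor for VPA recommendations
          cpu: 100m
          memory: 128Mi
        maxAllowed:         # ceiling for VPA recommendations
          cpu: "1"
          memory: 1Gi
```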

Cluster Autoscaler (CA)

The second layer of autoscaling involves CA, which automatically adjusts the size of the cluster when:

– Pods fail to run and fall into a pending state due to insufficient capacity in the Cluster (in which case CA will scale up).

– Nodes in the cluster have been underutilized for a certain period of time and there is a chance to relocate their Pods onto the remaining nodes (in which case CA will scale down).

CA makes routine checks to determine whether any Pods are in a pending state waiting for extra resources or whether Cluster nodes are underutilized, and adjusts the number of Cluster nodes accordingly. CA interacts with the cloud provider to request additional nodes or shut down idle ones, and it ensures the scaled-up Cluster remains within the limits set by the user. It works with AWS, Azure, and GCP.

5 Steps to Using HPA and CA with Amazon EKS

This article offers a step-by-step guide to installing and autoscaling through HPA and CA with an Amazon Elastic Container Service for Kubernetes (Amazon EKS) Cluster. The guide is followed by two test use cases that show the features in action:

Cluster Prerequisites:

An Amazon VPC and a dedicated security group that meets the necessary set-up for an Amazon EKS Cluster.

Alternatively, to avoid a manual step-by-step VPC creation, AWS provides a CloudFormation stack that creates a VPC for EKS, highlighted here.

An Amazon EKS service role to apply to your Cluster.

1- Create an AWS EKS Cluster (control plane and workers) in line with the official instructions here. Once you launch an Auto Scaling group of worker nodes, they can register with your Amazon EKS Cluster, and you can begin deploying Kubernetes applications to them.

2- Deploy a Metrics Server so that HPA can scale Pods in a deployment based on CPU/memory data provided by an API (as described above). The metrics.k8s.io API is usually provided by the metrics-server (which collects the CPU and memory metrics from the Summary API, as exposed by Kubelet on each node).
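One common way to deploy the metrics-server at the time of writing was from its repository manifests; the following is a sketch (the repository location and manifest path vary between releases — newer versions ship a single components.yaml instead):

```shell
# Clone the metrics-server repository and apply its deployment manifests
# (path shown is from the 0.2.x-era layout; adjust for your release).
git clone https://github.com/kubernetes-incubator/metrics-server.git
kubectl apply -f metrics-server/deploy/1.8+/

# Verify that the Metrics API is responding
kubectl top nodes
```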

3- Add the following policy to the Role created by EKS for the K8S worker nodes (this is required for the K8S CA to work alongside the AWS Autoscaling Group (AWS AG)).

{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Effect": "Allow",
      "Action": [
        "autoscaling:DescribeAutoScalingGroups",
        "autoscaling:DescribeAutoScalingInstances",
        "autoscaling:DescribeLaunchConfigurations",
        "autoscaling:DescribeTags",
        "autoscaling:SetDesiredCapacity",
        "autoscaling:TerminateInstanceInAutoScalingGroup"
      ],
      "Resource": "*"
    }
  ]
}
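Assuming the policy above is saved as asg-policy.json and the worker node role is called eks-worker-node-role (both names are placeholders for your own), it could be attached as an inline policy with the AWS CLI:

```shell
# Attach the autoscaling permissions as an inline policy on the
# worker node role (role and policy names are placeholders).
aws iam put-role-policy \
  --role-name eks-worker-node-role \
  --policy-name k8s-cluster-autoscaler \
  --policy-document file://asg-policy.json
```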

4- Deploy the K8S CA feature.

Depending on the Linux distribution in use, you might need to update the deployment file and the certificate path. For example, if using Amazon Linux 2, replace /etc/ssl/certs/ca-certificates.crt with /etc/ssl/certs/ca-bundle.crt in the deployment definition.

5- Update the deployment definition for the CA to look for specific tags in the AWS AG (the k8s.io/cluster-autoscaler/<CLUSTER NAME> tag should contain the real Cluster name). Also, update the AWS_REGION environment variable.

Add the following tags to the AWS AG, for the K8S CA to automatically identify the AWS AG:

> k8s.io/cluster-autoscaler/enabled

> k8s.io/cluster-autoscaler/<CLUSTER NAME>
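With those tags in place, the CA container command in the deployment definition can discover the AWS AG automatically. A sketch of the relevant fragment follows (the flags shown are standard CA options, but the exact set depends on your deployment file; the region value is a placeholder):

```yaml
# Fragment of the CA container spec (illustrative, not complete).
command:
  - ./cluster-autoscaler
  - --v=4
  - --stderrthreshold=info
  - --cloud-provider=aws
  - --skip-nodes-with-local-storage=false
  # Discover the AWS AG by the tags added above:
  - --node-group-auto-discovery=asg:tag=k8s.io/cluster-autoscaler/enabled,k8s.io/cluster-autoscaler/<CLUSTER NAME>
env:
  - name: AWS_REGION
    value: us-east-1   # replace with your Cluster's region
```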

Kubernetes Autoscaling Use Test Case #1

Test K8S HPA working in conjunction with the K8S CA feature:

Prerequisites:

– An AWS EKS Cluster is deployed and working

– A Metric Server is installed to feed the Metrics API

– The K8S CA feature installed

1- Deploy a sample App and create an HPA resource for the App deployment.
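As a sketch, the sample App and its HPA resource could be created along the lines of the upstream Kubernetes HPA walkthrough (the image and CPU request come from that walkthrough; the replica bounds are illustrative):

```shell
# Run a sample php-apache app with a CPU request, exposed as a Service
kubectl run php-apache --image=k8s.gcr.io/hpa-example \
  --requests=cpu=200m --expose --port=80

# Create an HPA resource for the deployment: 50% target CPU,
# between 2 and 25 replicas
kubectl autoscale deployment php-apache --cpu-percent=50 --min=2 --max=25
```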

2- Increase the load by hitting the App K8S service from several locations.
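One simple way to generate that load, adapted from the upstream HPA walkthrough, is a busybox Pod running an infinite request loop, started from each terminal (the service name assumes the sample App is called php-apache):

```shell
# Start an interactive busybox Pod
kubectl run -i --tty load-generator --image=busybox /bin/sh

# Then, inside the container, hammer the service endpoint:
while true; do wget -q -O- http://php-apache; done
```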

3- The HPA should now start to scale the number of Pods in the deployment as the load increases, according to what is specified in the HPA resource. At some point, the new Pods fall into a pending state while waiting for extra resources.

$ kubectl get nodes -w
NAME                             STATUS    ROLES     AGE       VERSION
ip-192-168-189-29.ec2.internal   Ready               1h        v1.10.3
ip-192-168-200-20.ec2.internal   Ready               1h        v1.10.3

$ kubectl get pods -o wide -w
NAME                          READY     STATUS    RESTARTS   AGE       IP               NODE
php-apache-8699449574-4mg7w   0/1       Pending   0          17m
php-apache-8699449574-64zkm   1/1       Running   0          1h        192.168.210.90   ip-192-168-200-20
php-apache-8699449574-8nqwk   0/1       Pending   0          17m
php-apache-8699449574-cl8lj   1/1       Running   0          27m       192.168.172.71   ip-192-168-189-29
php-apache-8699449574-cpzdn   1/1       Running   0          17m       192.168.219.71   ip-192-168-200-20
php-apache-8699449574-dn9tb   0/1       Pending   0          17m
...

4- The CA detects pending Pods due to insufficient capacity and adjusts the size of the AWS auto-scaling group. One extra node is added:

$ kubectl get nodes -w
NAME                             STATUS    ROLES     AGE       VERSION
ip-192-168-189-29.ec2.internal   Ready               2h        v1.10.3
ip-192-168-200-20.ec2.internal   Ready               2h        v1.10.3
ip-192-168-92-187.ec2.internal   Ready               34s       v1.10.3

5- The pending Pods can now be scheduled, and they start to run on the new Cluster node. The average CPU utilization drops below the specified target, so there is no need to schedule extra Pods.

$ kubectl get hpa
NAME         REFERENCE               TARGETS   MINPODS   MAXPODS   REPLICAS   AGE
php-apache   Deployment/php-apache   40%/50%   2         25        20         1h

$ kubectl get pods -o wide -w
NAME                          READY     STATUS    RESTARTS   AGE       IP                NODE
php-apache-8699449574-4mg7w   1/1       Running   0          25m       192.168.74.4      ip-192-168-92-187
php-apache-8699449574-64zkm   1/1       Running   0          1h        192.168.210.90    ip-192-168-200-20
php-apache-8699449574-8nqwk   1/1       Running   0          25m       192.168.127.85    ip-192-168-92-187
php-apache-8699449574-cl8lj   1/1       Running   0          35m       192.168.172.71    ip-192-168-189-29
...

6- Now, stop the load started at Point 2 by closing some terminals down (but not all of them). As some are still hitting the app service endpoint, it’s possible to check whether there is an outage when the Cluster scales down.

7- The average CPU utilization decreases, so HPA starts to terminate some Pods by updating the number of replicas in the deployment.

$ kubectl get hpa
NAME         REFERENCE               TARGETS   MINPODS   MAXPODS   REPLICAS   AGE
php-apache   Deployment/php-apache   47%/50%   2         20        7          1h

$ kubectl get pods -o wide -w
NAME                          READY     STATUS        RESTARTS   AGE       IP                NODE
...
php-apache-8699449574-v5kwf   1/1       Running       0          36m       192.168.250.0     ip-192-168-200-20
php-apache-8699449574-vl4zj   1/1       Running       0          36m       192.168.242.153   ip-192-168-200-20
php-apache-8699449574-8nqwk   1/1       Terminating   0          26m       192.168.127.85    ip-192-168-92-187
php-apache-8699449574-dn9tb   1/1       Terminating   0          26m       192.168.124.108   ip-192-168-92-187
php-apache-8699449574-k5ngv   1/1       Terminating   0          26m       192.168.108.58    ip-192-168-92-187
...

8- CA detects that a node is now underutilized and that its running Pods can be placed on the other nodes. The AWS AG is updated accordingly:

$ kubectl get nodes
NAME                             STATUS     ROLES     AGE       VERSION
ip-192-168-189-29.ec2.internal   Ready                2h        v1.10.3
ip-192-168-200-20.ec2.internal   Ready                2h        v1.10.3
ip-192-168-92-187.ec2.internal   NotReady             23m       v1.10.3

$ kubectl get nodes
NAME                             STATUS    ROLES     AGE       VERSION
ip-192-168-189-29.ec2.internal   Ready               2h        v1.10.3
ip-192-168-200-20.ec2.internal   Ready               2h        v1.10.3

9- During the scale down, there should be no visible connection timeout for any of the terminals that were hitting the app service endpoint (at Point 6).

Kubernetes Autoscaling Use Test Case #2

Test if the CA automatically adjusts the Cluster size when there is insufficient capacity to schedule a Pod that requests more CPU than is available.

Prerequisites:

– An AWS EKS Cluster deployed and working

– The K8S CA feature installed

1- Create two deployments, each requesting less than 1 vCPU.

$ kubectl run nginx --image=nginx:latest --requests=cpu=200m
$ kubectl run nginx2 --image=nginx:latest --requests=cpu=200m

2- Create a new deployment that requests more than the available CPU.

$ kubectl run nginx3 --image=nginx:latest --requests=cpu=1

3- The new Pod will remain in a pending state because there are no available resources:

$ kubectl get pods -w
NAME                      READY     STATUS    RESTARTS   AGE
nginx-5fcb54784c-lcfht    1/1       Running   0          13m
nginx2-66667bf959-2fmlr   1/1       Running   0          3m
nginx3-564b575974-xcm5t   0/1       Pending   0          41s

When describing the Pod, it’s possible to see the event indicating that there is insufficient CPU:

$ kubectl describe pod nginx3-564b575974-xcm5t
...
Events:
  Type     Reason            Age               From               Message
  ----     ------            ----              ----               -------
  Warning  FailedScheduling  32s (x7 over 1m)  default-scheduler  0/1 nodes are available: 1 Insufficient cpu

4- Now, the CA automatically adjusts the Cluster size (of the AWS AG). One extra node is added:

$ kubectl get nodes
NAME                              STATUS    ROLES     AGE       VERSION
ip-192-168-142-179.ec2.internal   Ready               1m        v1.10.3   <<
ip-192-168-82-136.ec2.internal    Ready               1h        v1.10.3

5- The Cluster now has enough resources to run the Pod:

$ kubectl get pods
NAME                      READY     STATUS    RESTARTS   AGE
nginx-5fcb54784c-lcfht    1/1       Running   0          48m
nginx2-66667bf959-2fmlr   1/1       Running   0          37m
nginx3-564b575974-xcm5t   1/1       Running   0          35m

6- Delete two of the deployments. After some time, the CA detects an underutilized node in the Cluster whose running Pod/s can be placed on the other existing node. The AWS AG is also updated, reducing the desired capacity by 1.

$ kubectl get nodes
NAME                             STATUS    ROLES     AGE       VERSION
ip-192-168-82-136.ec2.internal   Ready               1h        v1.10.3

$ kubectl get pods -o wide
NAME                     READY     STATUS    RESTARTS   AGE       IP               NODE
nginx-5fcb54784c-lcfht   1/1       Running   0          1h        192.168.98.139   ip-192-168-82-136

Clean Up

Steps to clean up the environment are:

1- Delete the custom policy added into the role created by EKS for the K8S workers and nodes (Step 3 of this guide).

2- Delete the whole cluster (K8s control plane and workers) following the instructions here.

For another great resource on autoscaling, read Stefan Prodan’s article on Kubernetes Horizontal Pod Autoscaler with Prometheus Custom Metrics.

Don’t forget to check out our other great resources on Kubernetes here, here, and here.

Caylent provides a critical DevOps-as-a-Service function to high growth companies looking for expert support with microservices, containers, cloud infrastructure, and CI/CD deployments. Our managed and consulting services are a more cost-effective option than hiring in-house, and we scale as your team and company grow. Check out some of the use cases, learn how we work with clients, and profit from our DevOps-as-a-Service offering too.