One of the challenges I have faced in the last few months is the autoscaling of my Kubernetes cluster. This works out of the box on Google Cloud, but as my cluster is deployed on AWS I had no such luck. Recently, however, autoscaling support for AWS has become possible thanks to this contribution: https://github.com/kubernetes/contrib/tree/master/cluster-autoscaler/cloudprovider/aws

In this post I want to describe how you can autoscale your Kubernetes cluster based on the container workload. This is an important principle because regular AWS autoscaling cannot act on metrics that are only visible inside the cluster; it can only act on instance-level metrics such as memory or CPU utilisation. What we want is that when a container cannot be scheduled on the cluster because there are not enough resources, the cluster gets scaled out.
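In practice an unschedulable container shows up as a pod that is stuck in the Pending state. For illustration only (the pod name is the Cassandra pod used later in this post, the age is made up), that looks something like this:

➜ kubectl get po
NAME                         READY     STATUS    RESTARTS   AGE
cassandra-2599509460-g3jzt   0/1       Pending   0          1m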

Defining resource usage

One of the most important things to do when you define your deployments against Kubernetes is to specify the resource usage of your containers. In each deployment object you should declare the resources the container is expected to need; Kubernetes uses this information to place the pod on a node that has enough capacity. In this exercise we will deploy two containers that each require 1.5GB of memory at their limit, which looks as follows in a fragment of one of the deployment descriptors:

spec:
  containers:
  - name: amq
    image: rmohr/activemq:latest
    resources:
      limits:
        memory: "1500Mi"
        cpu: "500m"
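For context, a minimal sketch of what the full deployment descriptor around this fragment could look like (the image and resource limits come from the fragment above; the apiVersion, replica count and labels are assumptions to make the sketch complete):

apiVersion: extensions/v1beta1
kind: Deployment
metadata:
  name: amq
spec:
  replicas: 1
  template:
    metadata:
      labels:
        app: amq
    spec:
      containers:
      - name: amq
        image: rmohr/activemq:latest
        resources:
          limits:
            memory: "1500Mi"
            cpu: "500m"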

Setting up scaling

Given this, we start out with a cluster of a single node of type m3.medium, which has 3.75GB of memory. We deliberately use such a limited initial cluster so that we can test out our autoscaling.

If we execute kubectl get nodes we see the following response:

NAME                                       STATUS    AGE
ip-10-0-0-236.eu-west-1.compute.internal   Ready     10m
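If you want to see how much of that 3.75GB is actually available for scheduling, you could look at the capacity and allocatable sections of the node description; something along these lines (the grep is just a convenience to trim the output):

kubectl describe node ip-10-0-0-236.eu-west-1.compute.internal | grep -A 4 Capacity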

In order to enable autoscaling we need to deploy a deployment object running a container that checks the Kubernetes cluster for unschedulable workloads and, if needed, increases the size of an AWS auto scaling group. This deployment object looks as follows:

apiVersion: extensions/v1beta1
kind: Deployment
metadata:
  name: cluster-autoscaler
  labels:
    app: cluster-autoscaler
spec:
  replicas: 1
  selector:
    matchLabels:
      app: cluster-autoscaler
  template:
    metadata:
      labels:
        app: cluster-autoscaler
    spec:
      containers:
      - image: gcr.io/google_containers/cluster-autoscaler:v0.4.0
        name: cluster-autoscaler
        resources:
          limits:
            cpu: 100m
            memory: 300Mi
          requests:
            cpu: 100m
            memory: 300Mi
        command:
        - ./cluster-autoscaler
        - --v=4
        - --cloud-provider=aws
        - --skip-nodes-with-local-storage=false
        - --nodes=MIN_SCALE:MAX_SCALE:ASG_NAME
        env:
        - name: AWS_REGION
          value: us-east-1
        volumeMounts:
        - name: ssl-certs
          mountPath: /etc/ssl/certs/ca-certificates.crt
          readOnly: true
        imagePullPolicy: "Always"
      volumes:
      - name: ssl-certs
        hostPath:
          path: "/etc/ssl/certs/ca-certificates.crt"

Note: The image we are using is a Google-supplied image containing the autoscaler; you can check here for the latest version: https://console.cloud.google.com/kubernetes/images/tags/cluster-autoscaler?location=GLOBAL&project=google-containers&pli=1

Settings

In the above deployment object make sure to replace MIN_SCALE and MAX_SCALE with the minimum and maximum number of nodes you want, and set the name of the right auto scaling group (ASG_NAME), as shown in the example below. Please note that the minimum and maximum also need to be allowed by the AWS auto scaling group itself, as the autoscaler cannot modify the group's min/max limits; it only changes the desired capacity.
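As an illustration, using the auto scaling group that appears later in this post, the --nodes flag could look like this (the bounds of 1 and 4 are just an example, pick values that match your own group):

- --nodes=1:4:testcluster-AutoScaleWorker-19KN6Y4AR18Z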

AWS Policy

In AWS we need to ensure there is an IAM policy in place that allows the nodes to query the auto scaling groups and modify the desired capacity of a group. I have used the policy below, which is very broad (it applies to all resources):

{ "Version": "2012-10-17", "Statement": [ { "Effect": "Allow", "Action": [ "autoscaling:DescribeAutoScalingGroups", "autoscaling:DescribeAutoScalingInstances", "autoscaling:SetDesiredCapacity", "autoscaling:TerminateInstanceInAutoScalingGroup" ], "Resource": "*" } ] }

Make sure to attach this to the role that is related to the worker nodes; in my CoreOS kube-aws generated cluster this is something like ‘testcluster-IAMRoleController-GELKOS5QWHRU’ where testcluster is my cluster name.
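If you prefer the command line over the console, attaching the policy as an inline policy could look roughly like this (the role name is the example above, and autoscaling-policy.json is assumed to contain the policy document shown):

aws iam put-role-policy \
  --role-name testcluster-IAMRoleController-GELKOS5QWHRU \
  --policy-name cluster-autoscaler \
  --policy-document file://autoscaling-policy.json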

Deploying the autoscaler

Now let’s deploy the autoscaler just like any other deployment object against the Kubernetes cluster:

kubectl create -f autoscaler.yaml

Let’s check next that the autoscaler is working:

➜ test kubectl get po
NAME                                  READY     STATUS    RESTARTS   AGE
cluster-autoscaler-2111878067-nhr01   1/1       Running   0          2m

We can also check the logs using kubectl logs cluster-autoscaler-2111878067-nhr01:

No unschedulable pods
Scale down status: unneededOnly=true lastScaleUpTime=2017-01-06 08:44:01.400735149 +0000 UTC lastScaleDownFailedTrail=2017-01-06 08:44:01.400735354 +0000 UTC schedulablePodsPresent=false
Calculating unneded nodes
Node ip-10-0-0-135.eu-west-1.compute.internal - utilization 0.780000
Node ip-10-0-0-135.eu-west-1.compute.internal is not suitable for removal - utilization to big (0.780000)
Node ip-10-0-0-236.eu-west-1.compute.internal - utilization 0.954000
Node ip-10-0-0-236.eu-west-1.compute.internal is not suitable for removal - utilization to big (0.954000)

We can see the autoscaler regularly checks the workload on the nodes, whether any of them can be scaled down, and whether additional worker nodes are needed.

Let’s try it out

Now that we have deployed our autoscaling container, let’s start scheduling our workload against the cluster. In this case we deploy two objects, ActiveMQ and Cassandra, which both require a 1.5GB memory footprint. The combined deployments plus the system containers cause the Kubernetes scheduler to determine that there is no capacity available; in this case Cassandra cannot be scheduled, as can be seen in the snippet below from kubectl describe po/cassandra-2599509460-g3jzt:

FailedScheduling pod (cassandra-2599509460-g3jzt) failed to fit in any node
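For completeness, the resource section of the Cassandra deployment mirrors the ActiveMQ fragment shown earlier; a minimal sketch (the image name is an assumption, the memory limit is the 1.5GB footprint mentioned above):

spec:
  containers:
  - name: cassandra
    image: cassandra:latest
    resources:
      limits:
        memory: "1500Mi"
        cpu: "500m"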

When we check the logs of the autoscaler we see the following:

Estimated 1 nodes needed in testcluster-AutoScaleWorker-19KN6Y4AR18Z
Scale-up: setting group testcluster-AutoScaleWorker-19KN6Y4AR18Z size to 2
Event(api.ObjectReference{Kind:"Pod", Namespace:"default", Name:"cassandra-2599509460-mt275", UID:"af53ac7b-d3ec-11e6-bd28-0add02d2d0c1", APIVersion:"v1", ResourceVersion:"2224", FieldPath:""}): type: 'Normal' reason: 'TriggeredScaleUp' pod triggered scale-up, group: testcluster-AutoScaleWorker-19KN6Y4AR18Z, sizes (current/new): 1/2

The autoscaler requests an additional worker by increasing the desired capacity of our auto scaling group in AWS. After a short wait we can see that the additional node has become available:

NAME                                       STATUS    AGE
ip-10-0-0-236.eu-west-1.compute.internal   Ready     27m
ip-10-0-0-68.eu-west-1.compute.internal    Ready     11s
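If you want to verify this from the AWS side as well, you could query the desired capacity of the group directly (assuming the AWS CLI is configured for the right account and region):

aws autoscaling describe-auto-scaling-groups \
  --auto-scaling-group-names testcluster-AutoScaleWorker-19KN6Y4AR18Z \
  --query 'AutoScalingGroups[0].DesiredCapacity'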

And a short while after the node came up we also see that the pod with Cassandra has become active:

NAME                                  READY     STATUS    RESTARTS   AGE
amq-4187240469-hlkhh                  1/1       Running   0          20m
cassandra-2599509460-mt275            1/1       Running   0          8m
cluster-autoscaler-2111878067-nhr01   1/1       Running   0          11m

Conclusion

We have been able to autoscale our AWS-deployed Kubernetes cluster, which is extremely useful. I can use this in production to quickly scale the cluster out and back in. Perhaps even more important for my development setup: during idle moments I can run a minimum-size cluster, and when workloads arrive it scales back up to full capacity, saving me quite some money.