One of the challenges I faced recently was autoscaling the containers on my Kubernetes cluster. I realised I had not yet written about this concept, so I thought I would share how it can be done and what the pitfalls were for me.

If you combine this concept with my previous post about autoscaling your Kubernetes cluster (https://renzedevries.wordpress.com/2017/01/10/autoscaling-your-kubernetes-cluster-on-aws/), you can create a nicely balanced, scalable deployment at lower cost.

Preparing your cluster

In my case I have used kops to create my cluster on AWS. However, by default this does not install some of the add-ons we need for autoscaling our workloads, such as Heapster.

Heapster monitors and analyses the resource usage in our cluster. The metrics it collects are essential for building scaling rules; they allow us, for example, to scale based on a CPU percentage. Heapster records these metrics and offers an API to Kubernetes so it can act on this data.

In order to deploy Heapster I used the following command:

kubectl create -f https://raw.githubusercontent.com/kubernetes/kops/master/addons/monitoring-standalone/v1.3.0.yaml

Please note that in your own Kubernetes setup you might already have Heapster, or you might want to run a different version.

Optional dashboard

I also find it handy to run the Kubernetes dashboard, which you can deploy as follows under kops:

kubectl create -f https://raw.githubusercontent.com/kubernetes/kops/master/addons/kubernetes-dashboard/v1.5.0.yaml

Deploying Workload

In order to get started I will deploy a simple workload; in this case it is the command service for my robotics framework (see previous posts). This is a simple HTTP REST endpoint that takes in JSON data and passes it along to a message queue.

This is the descriptor of the deployment object for Kubernetes:

apiVersion: extensions/v1beta1
kind: Deployment
metadata:
  name: command-svc
spec:
  replicas: 1
  template:
    metadata:
      labels:
        app: command-svc
    spec:
      containers:
      - name: command-svc
        image: ecr.com/command-svc:latest
        ports:
        - containerPort: 8080
        resources:
          requests:
            memory: "256Mi"
            cpu: "250m"
          limits:
            memory: "512Mi"
            cpu: "500m"
        readinessProbe:
          tcpSocket:
            port: 8080
          initialDelaySeconds: 30
          periodSeconds: 10
        livenessProbe:
          tcpSocket:
            port: 8080
          initialDelaySeconds: 60
          periodSeconds: 10
        env:
        - name: amq_host
          value: amq
        - name: SPRING_PROFILES_ACTIVE
          value: production

Readiness and Liveness

I have added liveness and readiness probes to the container, which allow Kubernetes to detect when a container is ready and whether it is still alive. This is important for autoscaling, because otherwise pods that are not actually ready to accept work might already be enabled in your load-balanced service. By default Kubernetes can only detect that a pod has started, not whether the process in the pod is ready to accept workloads.

These probes test whether a certain condition is true, and only then is the pod added to the load-balanced service. In my case I have a probe that checks whether port 8080 of my REST service is available. I am using a simple TCP probe, as the HTTP probe that is also offered gave strange errors, and the TCP probe works just as well for my purpose.
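For reference, the HTTP variant would swap `tcpSocket` for `httpGet`; the `/health` path below is only an illustrative example, any endpoint that returns a 2xx status would do:

```
readinessProbe:
  httpGet:
    path: /health
    port: 8080
  initialDelaySeconds: 30
  periodSeconds: 10
```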

Deploying

Now we are ready to deploy the workload, which we do as follows:

kubectl create -f command-deployment.yaml

Enabling Autoscaling

The next step is to enable autoscaling rules on our workload. As mentioned above, we have deployed Heapster, which monitors resource usage. I have also set resource constraints on the pods to indicate how much CPU they are allowed to consume: for the command-svc each pod has a limit of 500m, which translates to roughly 0.5 CPU core. The scaling percentage is relative to this limit, so a rule that scales at 80% CPU usage triggers at 80% of the 0.5 CPU limit.

The following rule keeps a minimum of 1 pod and a maximum of 3, scaling up once CPU usage exceeds 80% of the pod limit:

kubectl autoscale deployment command-svc --cpu-percent=80 --min=1 --max=3
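The `kubectl autoscale` one-liner is shorthand for creating a HorizontalPodAutoscaler object. A roughly equivalent declarative manifest, assuming the `autoscaling/v1` API available at the time, would look like this:

```
apiVersion: autoscaling/v1
kind: HorizontalPodAutoscaler
metadata:
  name: command-svc
spec:
  scaleTargetRef:
    apiVersion: extensions/v1beta1
    kind: Deployment
    name: command-svc
  minReplicas: 1
  maxReplicas: 3
  targetCPUUtilizationPercentage: 80
```

The declarative form has the advantage that it can live in version control next to the deployment descriptor.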

We can ask for information on the autoscaling with the following command and monitor the scaling changes:

kubectl get hpa -w

Creating a load

I have deployed the command-svc pod and want to simulate a load using a simple tool. For this I simply resorted to Apache JMeter; it is not a perfect tool, but it works well and, most importantly, it is free. I have created a simple thread group with 40 users doing 100k requests against the command-svc from my desktop.
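If you prefer not to set up JMeter, a JMeter-style thread group can also be sketched in a few lines of Python with just the standard library. The URL and JSON payload below are hypothetical placeholders, not the actual command-svc contract:

```python
import json
from concurrent.futures import ThreadPoolExecutor
from urllib import request

def send_command(url, payload):
    """POST a JSON payload and return the HTTP status code."""
    req = request.Request(
        url,
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    with request.urlopen(req) as resp:
        return resp.status

def run_load(url, users=40, requests_per_user=100):
    """Fire users * requests_per_user requests from `users` concurrent workers."""
    payload = {"command": "example"}  # hypothetical command body
    with ThreadPoolExecutor(max_workers=users) as pool:
        futures = [
            pool.submit(send_command, url, payload)
            for _ in range(users * requests_per_user)
        ]
        return [f.result() for f in futures]
```

A thread pool is enough here because the workers spend their time waiting on the network, not on the CPU.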

This is the result when monitoring the autoscaler:

NAME          REFERENCE                TARGETS     MINPODS   MAXPODS   REPLICAS   AGE
command-svc   Deployment/command-svc   1% / 80%    1         3         1          4m
command-svc   Deployment/command-svc   39% / 80%   1         3         1          6m
command-svc   Deployment/command-svc   130% / 80%  1         3         1          7m
command-svc   Deployment/command-svc   130% / 80%  1         3         1          7m
command-svc   Deployment/command-svc   130% / 80%  1         3         2          7m
command-svc   Deployment/command-svc   199% / 80%  1         3         2          8m
command-svc   Deployment/command-svc   183% / 80%  1         3         2          9m
command-svc   Deployment/command-svc   153% / 80%  1         3         2          10m
command-svc   Deployment/command-svc   76% / 80%   1         3         2          11m
command-svc   Deployment/command-svc   64% / 80%   1         3         2          12m
command-svc   Deployment/command-svc   67% / 80%   1         3         2          13m
command-svc   Deployment/command-svc   91% / 80%   1         3         2          14m
command-svc   Deployment/command-svc   91% / 80%   1         3         2          14m
command-svc   Deployment/command-svc   91% / 80%   1         3         3          14m
command-svc   Deployment/command-svc   130% / 80%  1         3         3          15m
command-svc   Deployment/command-svc   133% / 80%  1         3         3          16m
command-svc   Deployment/command-svc   130% / 80%  1         3         3          17m
command-svc   Deployment/command-svc   126% / 80%  1         3         3          18m
command-svc   Deployment/command-svc   118% / 80%  1         3         3          19m
command-svc   Deployment/command-svc   137% / 80%  1         3         3          20m
command-svc   Deployment/command-svc   82% / 80%   1         3         3          21m
command-svc   Deployment/command-svc   0% / 80%    1         3         3          22m
command-svc   Deployment/command-svc   0% / 80%    1         3         3          22m
command-svc   Deployment/command-svc   0% / 80%    1         3         1          22m

You can also see that it neatly scales down at the end once the load goes away again.
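The replica counts in the watch output follow the horizontal pod autoscaler's target-ratio rule: the desired count is roughly the current count multiplied by the ratio of observed to target utilization, rounded up and clamped to the min/max bounds. (The real controller also applies tolerances and cooldown delays, so this is a simplified sketch.)

```python
import math

def desired_replicas(current_replicas, current_utilization, target_utilization,
                     min_replicas=1, max_replicas=3):
    """Replica count the autoscaler aims for, clamped to the configured bounds."""
    desired = math.ceil(current_replicas * current_utilization / target_utilization)
    return max(min_replicas, min(desired, max_replicas))

# Examples mirroring the watch output above:
desired_replicas(1, 130, 80)  # 130% on 1 pod -> 2 (ceil of 1.625)
desired_replicas(2, 91, 80)   # 91% on 2 pods -> 3 (ceil of 2.275)
desired_replicas(3, 0, 80)    # load gone -> clamped back to the minimum, 1
```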

Pitfalls

I have noticed a few things about the autoscaling that are important to take into account:

1. The CPU percentage is based on the resource limits you define in your pods; if you don't define them, autoscaling won't work as expected.

2. Make sure your containers have readiness and liveness probes, otherwise your pods might receive external requests before they are actually ready.

3. On AWS I could only get probes working over TCP; I am unsure why, but HTTP probes failed for me with timeout exceptions.

Conclusion

I hope this post helps people get the ultimate autoscaling setup for both your workloads and your cluster. Combined with the cluster autoscaler, this makes for a very powerful and dynamic setup on AWS, as described in my previous post: https://renzedevries.wordpress.com/2017/01/10/autoscaling-your-kubernetes-cluster-on-aws/