Parts

1. Manually monitor pod resources
2. Automatically set pod resources with Vertical Pod Autoscaling (this article)


TL;DR

VPA (Vertical Pod Autoscaling) suggests, or even automatically sets, values for the resource requests and limits of pods inside the cluster.

Resource requests and limits

What?

What are resource requests and limits? This great blog post and video will get you up to date.

Why?

Kubernetes clusters work best when all containers of all pods have resource requests and limits for CPU and memory assigned. This affects pod scheduling, lifetime, termination and priority.

Often, though, it’s hard to know what resources your application needs. If you set these too low, your application might get throttled or even terminated. If you set these too high, you might waste costly resources. It’s possible to monitor the resource usage of pods manually, as we did in part 1.

But what if your cluster could set the requests and limits automatically for you?

Horizontal vs Vertical scaling

Horizontal scaling means raising the number of instances. For example, adding new nodes to a cluster/pool, or adding new pods by raising the replica count (Horizontal Pod Autoscaler).

Vertical scaling means raising the resources (like CPU or memory) of each node in the cluster (or in a pool). For nodes, this is rarely possible without creating a completely new node pool. For pods, though, vertical scaling means dynamically adjusting the resource requests and limits based on current application needs (Vertical Pod Autoscaler).
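For contrast, a minimal HorizontalPodAutoscaler manifest could be sketched like this (the name compute-hpa and the thresholds are assumptions; it targets the example deployment used later in this article):

```yaml
apiVersion: autoscaling/v1
kind: HorizontalPodAutoscaler
metadata:
  name: compute-hpa          # hypothetical name
spec:
  scaleTargetRef:
    apiVersion: extensions/v1beta1
    kind: Deployment
    name: compute
  minReplicas: 1
  maxReplicas: 5
  targetCPUUtilizationPercentage: 80  # add replicas when average CPU exceeds 80%
```

Note how the HPA changes the replica count, while the VPA (below) changes the resources of each replica.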

VPA components

VerticalPodAutoscaler (VPA) is a Kubernetes resource. It references a specific deployment, plus some more options, in its spec: section. The status: section contains information and recommendations about the ongoing scaling process.

VPA Recommender

The Recommender looks at the metric history, OOM events and the VPA spec of a deployment and suggests fitting values for the resource requests. The limits are raised or lowered proportionally, based on the defined limits-to-requests ratio (more on this further down). Hence the Recommender can be used by itself if one is unsure what resources the application actually needs. Further down we see resource suggestions for our example app.
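As a sketch of how that proportion works (the numbers are purely illustrative):

```yaml
# Hypothetical container resources: the limit is 2x the request (1:2 ratio)
resources:
  requests:
    cpu: 100m   # if the Recommender suggests a request of 300m ...
  limits:
    cpu: 200m   # ... the limit is scaled to 600m, keeping the 1:2 ratio
```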

VPA Auto Adjuster

Whatever the Recommender recommends, the Adjuster (the upstream project calls this component the Updater) implements, as long as updateMode: Auto is defined.

Due to Kubernetes limitations, the only way to modify the resource requests of a running Pod is to recreate the Pod. If you create a VerticalPodAutoscaler with an updateMode of "Auto", the VerticalPodAutoscaler evicts a Pod if it needs to change the Pod's resource requests. (source)

As far as I can see, in-place updating of pod resources is planned. Until then, pods need to be deleted and recreated to achieve auto-adjustment.
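A VerticalPodAutoscaler in auto mode could be sketched like this (same structure as the recommendation-only example further down, just with a different updateMode):

```yaml
apiVersion: autoscaling.k8s.io/v1beta2
kind: VerticalPodAutoscaler
metadata:
  name: vpa
spec:
  targetRef:
    apiVersion: "extensions/v1beta1"
    kind: Deployment
    name: compute
  updatePolicy:
    updateMode: "Auto" # VPA evicts pods to apply new resource requests
```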

Example App

We use the example repo https://github.com/wuestkamp/k8s-example-vpa which comes with Prometheus, Grafana and an example deployment to stress resources.

App Image

The app uses the image gcr.io/kubernetes-e2e-test-images/resource-consumer:1.5. It provides an HTTP endpoint and can receive commands to consume resources:

curl --data "millicores=400&durationSec=600" 10.12.0.11:8080/ConsumeCPU
curl --data "megabytes=300&durationSec=600" 10.12.0.11:8080/ConsumeMem

Use VPA to FIND fitting resource requests

Set VPA recommendation mode YAML

We want to use VPA only in “suggestion” mode. This is a great way to see whether we’d even like to use it:

apiVersion: autoscaling.k8s.io/v1beta2
kind: VerticalPodAutoscaler
metadata:
  name: vpa
spec:
  targetRef:
    apiVersion: "extensions/v1beta1"
    kind: Deployment
    name: compute
  updatePolicy:
    updateMode: "Off" # recommendation mode only
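Assuming the manifest above is saved as vpa.yaml (the filename is an assumption), it could be applied and inspected like this:

```shell
# Apply the VPA object and read back its recommendations
kubectl apply -f vpa.yaml
# Recommendations appear under Status -> Recommendation once metrics exist
kubectl describe vpa vpa
```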

Resource usage in the test app

We created some resource usage and monitored it using Prometheus and Grafana: