Why bother?

Why indeed should you care how much memory and CPU the containers in your Kubernetes cluster consume? Let’s see what practitioners have to say on the topic:

Unfortunately, Kubernetes has not yet implemented dynamic resource management, which is why we have to set resource limits for our containers.

I imagine that at some point Kubernetes will start implementing a less manual way to manage resources, but this is all we have for now.

Ben Visser, 12/2016 in: Kubernetes — Understanding Resources

Kubernetes doesn’t have dynamic resource allocation, which means that requests and limits have to be determined and set by the user. When these numbers are not known precisely for a service, a good approach is to start it with overestimated resources requests and no limit, then let it run under normal production load for a certain time: hours, days, weeks according to the nature of the service.

Antoine Cotten, 05/2016 in: 1 year, lessons learned from a 0 to Kubernetes transition
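In a pod spec, the approach Cotten describes looks roughly like this (the names, image, and numbers below are placeholders, not values from the quote): set a generous request, omit limits, and tighten both once you have observed real usage.

```yaml
apiVersion: v1
kind: Pod
metadata:
  name: myservice            # placeholder name
spec:
  containers:
  - name: app
    image: example/app:1.0   # placeholder image
    resources:
      requests:
        cpu: "500m"          # deliberately overestimated to start with
        memory: "512Mi"
      # no limits yet: observe production load first, then set them
```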

Now, it turns out that Google, in the context of Borg, has such a resource consumption predictor/adjuster called autopilot, which does in fact offer automatic resource consumption management.

Enter the person who brought this topic up at KubeCon 2016: Tim Hockin, one of the earliest Kubernetes committers, from the days when it was still a Google-internal project:

As the wise Tim stated in his talk Everything You Ever Wanted To Know About Resource Scheduling (also, check out the slide deck and the video):

Some 2/3 of the Borg users use autopilot.

This sounds quite convincing; well over half of the Borg users are apparently happy to not set resource consumption explicitly themselves but to delegate this job to autopilot.

Back to Kubernetes.

Already in mid-2015 we find that the Kubernetes community, informed by Google’s autopilot experience, raised an issue to capture the need for automated resource consumption management. In early 2017 the corresponding design proposal was completed, and the initial work on the resulting Vertical Pod Autoscaler (VPA) is now under way.

A demonstrator: resorcerer

To gather experience in the problem domain—assessing container resource consumption and adjusting the containers accordingly—we put together a proof of concept (POC), a demonstrator called resorcerer.

Check out the video walkthrough for resorcerer, where I show you how to use it in a Kubernetes cluster (in a namespace, really), how it works, and what its limitations are:

A video walkthrough of resorcerer, its features and inner workings (~27min).

Under the hood: heavy lifting is done by Prometheus

To come up with recommendations for resource consumption limits, the resorcerer demonstrator leverages Prometheus. For example, to determine the CPU usage of a certain container over the past 30 minutes, we can use the following PromQL query:

sum(rate(container_cpu_usage_seconds_total{pod_name=~"nginx.+", container_name="nginx"}[30m]))

This will result in something like the following:

Screen shot of Prometheus, showing container CPU usage over time.
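Beyond the Prometheus UI, the same query can be issued programmatically via Prometheus’s HTTP API (`/api/v1/query`). A minimal sketch, assuming a Prometheus server reachable at `localhost:9090` (an in-cluster address will differ); the helper names are mine, not resorcerer’s:

```python
import json
import urllib.parse
import urllib.request

PROMETHEUS = "http://localhost:9090"  # assumption: adjust to your setup


def cpu_usage_query(pod_pattern: str, container: str, window: str = "30m") -> str:
    """Build the PromQL query from above for a given pod/container and window."""
    return (
        f'sum(rate(container_cpu_usage_seconds_total{{'
        f'pod_name=~"{pod_pattern}", container_name="{container}"}}[{window}]))'
    )


def instant_query(promql: str):
    """Run an instant query against Prometheus's HTTP API and
    return the result vector from the JSON response."""
    url = PROMETHEUS + "/api/v1/query?" + urllib.parse.urlencode({"query": promql})
    with urllib.request.urlopen(url) as resp:
        return json.load(resp)["data"]["result"]


# Build the query for the nginx example; call instant_query(q) against
# a live Prometheus to get the actual value.
q = cpu_usage_query("nginx.+", "nginx")
```

This is handy for scripting the kind of observations discussed next, since each case is just the same query with a different time window.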

The tricky part here is to pick meaningful PromQL queries as well as the right observation time period. For starters, think of three cases:

idle—no load on the container; this is the minimum amount of CPU/memory resources required.

normal—observing the container for a longer period of time (days or weeks) and figuring out the baseline; this is the typical case of resources required to operate.

spike—think of an external event, for example a half-price sale or a mention in a popular news outlet, driving tons of traffic to your app; this is an exceptional case of resource consumption that you still want to be able to accommodate.
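To turn such observations into concrete numbers, one simple heuristic—an illustration only, not resorcerer’s actual algorithm—is to set the request near the normal baseline and the limit near the observed spike plus some headroom:

```python
import statistics


def recommend(samples, headroom=1.2):
    """Derive a (request, limit) pair from CPU usage samples in cores:
    request = the median, i.e. the 'normal' baseline;
    limit   = the largest observed spike plus a safety margin.
    The 20% headroom is an arbitrary assumption, not a tuned value."""
    request = statistics.median(samples)
    limit = max(samples) * headroom
    return request, limit


# Hypothetical samples: idle ~0.05 cores, normal ~0.2, one spike to 0.8
samples = [0.05, 0.18, 0.2, 0.22, 0.19, 0.8, 0.21]
req, lim = recommend(samples)
```

With these numbers the heuristic recommends a request of 0.2 cores and a limit just under one core; how aggressively to cover spikes versus over-provisioning is exactly the trade-off the observation period is meant to inform.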

In the above case distinction we conveniently considered only the simple case of a stateless application, such as an app server or Web server. Things get considerably more complicated for stateful services, for example those involving databases.

In the context of resorcerer, we’d like to thank Julius Volz for his support around Prometheus scraping and PromQL, as well as Stefan Schimanski, who helped us navigate the tricky parts of the Kubernetes client-go library.