Kubernetes is a fantastic tool for building large containerised software systems in a manner that is both resilient and scalable. But the architecture and design of Kubernetes have evolved over time, and there are some areas that could do with tweaking or rethinking. This post digs into some issues with how image tags are handled in Kubernetes and how that differs from plain Docker.

First, let's take a look at one of the first issues that people can face. I have the following demo video that shows a developer trying to deploy a new version of a Rust webapp to a Kubernetes cluster:

The video starts by building and pushing a version of the pizza webapp that serves quattro formaggi pizza. The developer then tries to deploy the webapp to Kubernetes and ends up in a confusing situation where it's running, yet not serving the kind of pizza we expect. We can see what's going on by doing some more inspection:

It turns out three different versions of our webapp are running inside a single Kubernetes ReplicaSet, as evidenced by the three different image digests.

The reason this can happen comes down to the Kubernetes imagePullPolicy. For any image with a tag other than :latest, the default is IfNotPresent, which means a node will use an image it already has rather than pull a new one. In our demo, each node happened to have a different version of the image left over from previous runs. Personally, I'm disappointed that this is the default behaviour, as it's unexpected and confusing for new users. I understand that it evolved over time and that in some cases it is the desired behaviour, but we should be able to change this default for the sake of usability.

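The simplest mitigation is to set imagePullPolicy to Always on each container, so that the kubelet checks the registry for the current version of the tag every time it starts the container.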
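As a minimal sketch, a Deployment with that policy might look like the following. The Deployment name, labels, and replica count are assumptions for illustration; only the amouat/pizza:today image name is taken from this post.

```yaml
# Sketch of a Deployment that re-checks the registry on every container start.
# The name, labels and replica count are illustrative assumptions.
apiVersion: apps/v1
kind: Deployment
metadata:
  name: pizza
spec:
  replicas: 3
  selector:
    matchLabels:
      app: pizza
  template:
    metadata:
      labels:
        app: pizza
    spec:
      containers:
      - name: pizza
        image: amouat/pizza:today
        # Always: resolve the tag against the registry on each start,
        # instead of reusing whatever image the node already has cached.
        imagePullPolicy: Always
```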

This can even be made the default for all deployments by using the AlwaysPullImages Admission Controller.
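On clusters where you control the API server, this plugin is switched on with the kube-apiserver flag --enable-admission-plugins. The fragment below is only a sketch: it assumes a kubeadm-style static pod manifest (typically /etc/kubernetes/manifests/kube-apiserver.yaml) and assumes NodeRestriction is the plugin already enabled. Managed Kubernetes services may not expose this setting at all.

```yaml
# Fragment of a kube-apiserver static pod manifest (assumed kubeadm layout).
# Only the admission plugin flag is shown; keep all other existing flags.
spec:
  containers:
  - name: kube-apiserver
    command:
    - kube-apiserver
    # Append AlwaysPullImages to whatever plugins are already enabled.
    - --enable-admission-plugins=NodeRestriction,AlwaysPullImages
```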

However, there is still a rather large hole in this solution. Imagine a new deployment occurs at the same time as the image is being updated in the registry. It's quite likely that different nodes will pull different versions of the image, even with the pull policy set to Always. We can see a better solution in the way Docker Swarm Mode works: the Swarm Mode control plane resolves images to a digest before asking nodes to run the image, so all containers are guaranteed to run the same version of the image. There's no reason we can't do something similar in Kubernetes using an Admission Controller, and my understanding is that Docker EE does exactly this when running Kubernetes pods. I haven't been able to find an existing open source Admission Controller that does this, but we're working on one at Container Solutions and I'll update this post when I have something.
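In the meantime, the same effect can be had by hand: refer to the image by digest rather than by tag in the pod spec. The fragment below is a sketch, not a complete manifest. The sha256 value is a placeholder that must be replaced with the real digest reported by the registry (docker push prints it when the image is uploaded), and the container name is an assumption.

```yaml
# Container spec fragment that pins the image to one exact build.
spec:
  containers:
  - name: pizza
    # Placeholder: substitute the 64-character hex digest of the pushed image.
    image: amouat/pizza@sha256:<digest>
```

Because a digest identifies exactly one image, every node runs the same content no matter where the tag points later.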

Going a little deeper, the real reason behind this trouble is a difference between the way image tags are viewed in Kubernetes and Docker. Kubernetes assumes image tags are immutable. That is to say, if I call my image amouat/pizza:today, Kubernetes assumes the tag will only ever refer to that one image; it won't get reused for new versions in the future. This may sound like a pain at first, but immutable tags solve a lot of problems; any potential confusion about which version of an image a tag refers to simply evaporates. It does require an appropriate naming convention. In the case of amouat/pizza:today, a better choice would be to use the date, e.g. amouat/pizza:2018-05-12; in other cases SemVer or git hashes can work well.

In contrast, Docker treats tags as mutable and even trains us to think this way. For example, when building an application that runs in a container, I will repeatedly run docker build -t test . or similar, constantly reusing the tag so that the rest of my workflow doesn't need to change. Also, the official images on Docker Hub typically have tags for major and minor versions that get updated over time; e.g. redis:3 is the same image as redis:3.2.11 at the time of writing, but in the past it would have pointed at redis:3.2.10, and so on.

This split is a real practical problem faced by new users. Solving it seems reasonably straightforward; can't we have both immutable and mutable tags? This would require support from registries and (preferably) the Docker client, but the advantages seem worth it. I am hopeful that the new OCI distribution specification will tackle this issue.

To sum up: be careful when deploying images to Kubernetes and make sure you understand how images actually get deployed to your cluster. And if you happen to have any influence on the direction of Kubernetes or the Distribution spec, can we please try to make the world a bit nicer?

Because of these and some other issues, Container Solutions have started work on Trow, an image management solution for Kubernetes that includes a registry component that runs inside the cluster. Trow will support immutable tags and include admission controllers that pin images to digests. If this sounds useful to you, please head over to trow.io and let us know!

Further Viewing

This blog post was based on my talk, Establishing Image Provenance and Security in Kubernetes, given at KubeCon EU 2018, which goes deeper into some of the issues surrounding images.
