Banzai Cloud’s Pipeline platform is an operating system which allows enterprises to develop, deploy and scale container-based applications. It leverages best-of-breed cloud components, such as Kubernetes, to create a highly productive, yet flexible environment for developers and operation teams alike.

One of the main features of the Pipeline platform is that it allows enterprises to run workloads cost-effectively by mixing spot instances with regular ones, without sacrificing overall reliability. This requires quite a lot of behind-the-scenes magic built on top of core Kubernetes building blocks. In a previous post we discussed how we use taints and tolerations, and pod and node affinities; in this post we’d like to delve into Kubernetes webhooks. Webhooks are widely used across Pipeline, but, in keeping with our spot instance example above, we use them to validate and/or mutate deployments when placing pods on spot or preemptible instances.

Kubernetes provides a lot of ways to extend its built-in functionality. Perhaps the most frequently utilized extension points are custom resource types and custom controllers. However, there are some other very interesting features in Kubernetes, like admission webhooks and initializers. These are also extension points in the API, so they can be used to modify the basic behaviour of some Kubernetes features. This definition is a little vague, so let’s get our hands dirty and take a closer look at dynamic admission control, and specifically at admission webhooks.

Admission controllers

To start, let’s take a look at the definition of admission controllers as it appears in the official Kubernetes documentation. We haven’t arrived at admission webhooks yet, but we’ll be there in a moment.

An admission controller is a piece of code that intercepts requests to the Kubernetes API server prior to persistence of the object, but after the request is authenticated and authorized. […] Admission controllers may be “validating”, “mutating”, or both. Mutating controllers may modify the objects they admit; validating controllers may not. […] If any of the controllers in either phase reject the request, the entire request is rejected immediately and an error is returned to the end-user.

This means that there are special controllers that can intercept Kubernetes API requests, and modify or reject them based on custom logic. Kubernetes ships with a list of built-in controllers, and you can also write your own. While they sound powerful, these controllers need to be compiled into kube-apiserver, and can only be enabled when the apiserver starts up.

That’s where the dynamic part comes in. Admission webhooks and initializers address these limitations and provide a method of dynamic configuration. Of the two, initializers are the new guys on the block. They’re only an alpha feature, and, as of writing, they are seldom used. We may write a blog post about initializers later, but for now let’s turn our attention to admission webhooks.

What is an admission webhook?

There are two special admission controllers in the list included in the Kubernetes apiserver: MutatingAdmissionWebhook and ValidatingAdmissionWebhook. These are special admission controllers that send admission requests to external HTTP callbacks and receive admission responses. If these two admission controllers are enabled, a Kubernetes administrator can create and configure an admission webhook in the cluster.

In broad strokes, the steps for doing that are as follows:

1. Check if the admission webhook controllers are enabled in the cluster, and configure them if needed.
2. Write the HTTP callback that will handle admission requests. The callback can be a simple HTTP server that’s deployed to the cluster, or even a serverless function, just like in Kelsey’s validating webhook demo.
3. Configure the admission webhook through the ValidatingWebhookConfiguration and MutatingWebhookConfiguration resources.

The difference between the two types of admission webhook is pretty self-explanatory: validating webhooks can reject a request, but they cannot modify the object they receive in the admission request, while mutating webhooks can modify objects by creating a patch that is sent back in the admission response. If a webhook rejects a request, an error is returned to the end-user.
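To make the difference concrete, here is a minimal sketch of the two kinds of response. The struct below is a simplified stand-in for the AdmissionResponse type in k8s.io/api/admission/v1beta1, reduced to the fields discussed here; the helper names reject and allowWithPatch are made up for this illustration.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// admissionResponse is a simplified stand-in for the AdmissionResponse type in
// k8s.io/api/admission/v1beta1; only the fields discussed in the text are kept.
type admissionResponse struct {
	UID       string  `json:"uid"`
	Allowed   bool    `json:"allowed"`
	Status    *status `json:"status,omitempty"`
	Patch     []byte  `json:"patch,omitempty"` // encoding/json emits []byte as base64
	PatchType *string `json:"patchType,omitempty"`
}

// status carries the human-readable reason for a rejection.
type status struct {
	Message string `json:"message,omitempty"`
}

// reject builds the response a validating webhook returns to deny a request.
func reject(uid, reason string) admissionResponse {
	return admissionResponse{UID: uid, Allowed: false, Status: &status{Message: reason}}
}

// allowWithPatch builds the response a mutating webhook returns: the request
// is allowed, and a JSON Patch describes the changes to apply to the object.
func allowWithPatch(uid string, patch []byte) admissionResponse {
	patchType := "JSONPatch"
	return admissionResponse{UID: uid, Allowed: true, Patch: patch, PatchType: &patchType}
}

func main() {
	responses := []admissionResponse{
		reject("123", "required labels are not set"),
		allowWithPatch("456", []byte(`[{"op":"add","path":"/metadata/labels","value":{}}]`)),
	}
	for _, r := range responses {
		out, err := json.Marshal(r)
		if err != nil {
			panic(err)
		}
		fmt.Println(string(out))
	}
}
```

A validating webhook only ever sets Allowed and, on rejection, a message; a mutating webhook additionally fills in Patch and PatchType.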

If you’re looking for a real world example of admission webhooks, check out how the Istio service mesh uses mutating webhooks to automatically inject Envoy sidecar containers into pods.

Creating and configuring an admission webhook

Now that we’ve covered theory, let’s jump into the action and try it out in a real cluster. We’ll create a webhook server and deploy it to a cluster first, then create the webhook configuration and see if it works.

If you’d like to follow along, you’ll need a Kubernetes cluster first. You can use Pipeline to create K8s clusters on one of the six supported cloud providers, but you can use any Kubernetes cluster. In the example below, I’ve created a Kubernetes cluster with Pipeline on Amazon EKS.

Make sure that the MutatingAdmissionWebhook and ValidatingAdmissionWebhook controllers are enabled in the apiserver, and check whether the admission registration API is enabled in your cluster by running:

    kubectl api-versions

Then check that admissionregistration.k8s.io/v1beta1 is among the results.

Writing the webhook

We can now write our admission webhook server. In our example, it will serve as both a validating and a mutating webhook by listening on two different HTTP paths: /validate and /mutate. Next, we’ll pick a simple task that can be easily implemented:

The Kubernetes documentation contains a common set of recommended labels that allows tools to work interoperably, describing objects in a common manner that all tools can understand. In addition to supporting tooling, the recommended labels describe applications in a way that can be queried.

In our validating webhook example, we’ll make these labels required on deployments and services, so our webhook will reject every deployment and every service that doesn’t have these labels set. Then we’ll configure our mutating webhook, which will add any of the missing required labels with not_available set as the value.

The complete code for the webhook is available on GitHub. There is a great tutorial about mutating admission webhooks by morvencao, and we’ve used that repo as the basis for our blog post by forking and modifying it.

Our webhook will be a simple HTTP server with TLS that’s deployed to our cluster as a deployment.

The main logic is in two files: main.go and webhook.go. The main.go file contains the parts necessary to create the HTTP server, while webhook.go contains the webhook logic that validates and/or mutates requests. For the sake of keeping this blog post clear, we won’t copy large code snippets here, but feel free to follow the links in the text that point to sources on GitHub.

Most of the code is pretty simple; you should take a look. Start by checking out main.go; note how the HTTP server is started using standard Go packages, and how the paths to the certificate and key for the TLS configuration are read from command line flags.

The next interesting part is the serve function. This is the entry point for handling both mutating and validating HTTP requests. The function unmarshals the AdmissionReview from the request, does some basic content-type validation, calls the corresponding mutate or validate function based on the URL path, and then marshals the AdmissionReview response.

The main admission logic is in the validate and mutate functions. validate first checks whether validation is required: we don’t want to validate resources in the kube-system and kube-public namespaces, and we don’t want to validate a resource if there’s an annotation explicitly telling us to ignore it (admission-webhook-example.banzaicloud.com/validate is set to false). If validation is required, the service or deployment resource is unmarshaled from the request, based on the resource kind, and its labels are compared against the required labels. If any labels are missing, Allowed is set to false in the response, and the reason for the failure is written into the response, so the end-user receives it when trying to create the resource.
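A minimal sketch of this decision logic, under a few stated assumptions: objectMeta is a simplified stand-in for the Kubernetes object metadata, the annotation check only recognizes the literal value false, and the function names are illustrative rather than taken from the repository. The list of required labels is the recommended label set from the Kubernetes documentation.

```go
package main

import "fmt"

// requiredLabels is the recommended label set that the webhook makes mandatory.
var requiredLabels = []string{
	"app.kubernetes.io/name",
	"app.kubernetes.io/instance",
	"app.kubernetes.io/version",
	"app.kubernetes.io/component",
	"app.kubernetes.io/part-of",
	"app.kubernetes.io/managed-by",
}

const validateAnnotation = "admission-webhook-example.banzaicloud.com/validate"

// objectMeta is a simplified stand-in for the metadata of a Kubernetes object.
type objectMeta struct {
	Namespace   string
	Labels      map[string]string
	Annotations map[string]string
}

// validationRequired reports whether the object should be checked at all:
// system namespaces and explicitly opted-out objects are skipped.
func validationRequired(meta objectMeta) bool {
	if meta.Namespace == "kube-system" || meta.Namespace == "kube-public" {
		return false
	}
	return meta.Annotations[validateAnnotation] != "false"
}

// validate returns whether the object is allowed and, if not, the reason.
func validate(meta objectMeta) (allowed bool, reason string) {
	if !validationRequired(meta) {
		return true, ""
	}
	for _, label := range requiredLabels {
		if _, ok := meta.Labels[label]; !ok {
			return false, "required labels are not set"
		}
	}
	return true, ""
}

func main() {
	allowed, reason := validate(objectMeta{Namespace: "default"})
	fmt.Println(allowed, reason)
}
```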

The code for mutate is very similar, but instead of merely comparing the labels and setting Allowed in the response, it creates a patch that adds the missing labels to the resource, with not_available set as their value.
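The patch is a JSON Patch (RFC 6902) document. The sketch below shows one way to build it; the function names are hypothetical and the repository's implementation may differ. One detail worth noting is that label keys contain a slash, which has to be escaped as ~1 when the key is used in a JSON Pointer path (RFC 6901).

```go
package main

import (
	"encoding/json"
	"fmt"
	"strings"
)

var requiredLabels = []string{
	"app.kubernetes.io/name",
	"app.kubernetes.io/instance",
	"app.kubernetes.io/version",
	"app.kubernetes.io/component",
	"app.kubernetes.io/part-of",
	"app.kubernetes.io/managed-by",
}

const notAvailable = "not_available"

// patchOperation is a single RFC 6902 JSON Patch operation.
type patchOperation struct {
	Op    string      `json:"op"`
	Path  string      `json:"path"`
	Value interface{} `json:"value,omitempty"`
}

// escapeJSONPointer escapes a key for use in a JSON Pointer (RFC 6901):
// "~" becomes "~0" and "/" becomes "~1".
func escapeJSONPointer(key string) string {
	return strings.NewReplacer("~", "~0", "/", "~1").Replace(key)
}

// labelPatch returns the operations that add every missing required label.
func labelPatch(existing map[string]string) []patchOperation {
	var missing []string
	for _, label := range requiredLabels {
		if _, ok := existing[label]; !ok {
			missing = append(missing, label)
		}
	}
	// With no labels on the object, /metadata/labels may not exist yet,
	// so the whole map is added in a single operation.
	if len(existing) == 0 {
		value := map[string]string{}
		for _, label := range missing {
			value[label] = notAvailable
		}
		return []patchOperation{{Op: "add", Path: "/metadata/labels", Value: value}}
	}
	ops := make([]patchOperation, 0, len(missing))
	for _, label := range missing {
		ops = append(ops, patchOperation{
			Op:    "add",
			Path:  "/metadata/labels/" + escapeJSONPointer(label),
			Value: notAvailable,
		})
	}
	return ops
}

func main() {
	patch, err := json.MarshalIndent(labelPatch(map[string]string{"app.kubernetes.io/name": "sleep"}), "", "  ")
	if err != nil {
		panic(err)
	}
	fmt.Println(string(patch))
}
```

The marshaled patch is what goes into the Patch field of the admission response.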

Building the project

It is not necessary to build the project to complete the following steps, because a prebuilt Docker image is already available. If you’re comfortable with the codebase and you’d like to modify something, you can build the project, create the Docker image, and push it to Docker Hub. The build script does that for you. Make sure you have go, dep and docker installed, that you are logged into a Docker registry, and that DOCKER_USER is set, then run:

    ./build

Deploying the webhook server to the cluster

To deploy the server, we’ll need to create a service and a deployment in our Kubernetes cluster. It’s pretty straightforward, except for one thing: the server’s TLS configuration. If you’d care to examine the deployment.yaml file, you’ll find that the paths to the certificate and the corresponding private key are passed as command line arguments, and that these paths point into a volume mount backed by a Kubernetes secret:

    args:
      - -tlsCertFile=/etc/webhook/certs/cert.pem
      - -tlsKeyFile=/etc/webhook/certs/key.pem
    [...]
    volumeMounts:
      - name: webhook-certs
        mountPath: /etc/webhook/certs
        readOnly: true
    volumes:
      - name: webhook-certs
        secret:
          secretName: spot-mutator-webhook-certs

In a production cluster it’s important to properly handle your TLS certificates and especially private keys, so you may want to use something like cert-manager, or store your keys in Vault, instead of as plain Kubernetes secrets.

We can use any kind of certificate here. The most important thing to remember is to set the corresponding CA certificate later in the webhook configuration, so the apiserver knows that the webhook server’s certificate should be accepted. For now, we’ll reuse the script originally written by the Istio team to generate a certificate signing request. Then we’ll send the request to the Kubernetes API, fetch the certificate, and create the required secret from the result.

First, run this script and check if the secret holding the certificate and key has been created:

    $ ./deployment/webhook-create-signed-cert.sh
    creating certs in tmpdir /var/folders/3z/_d8d8kl951ggyvw360dkd_y80000gn/T/tmp.xPApwE5H
    Generating RSA private key, 2048 bit long modulus
    ..............................................+++
    ...........+++
    e is 65537 (0x10001)
    certificatesigningrequest.certificates.k8s.io "admission-webhook-example-svc.default" created
    NAME                                    AGE       REQUESTOR               CONDITION
    admission-webhook-example-svc.default   1s        ekscluster-marton-423   Pending
    certificatesigningrequest.certificates.k8s.io "admission-webhook-example-svc.default" approved
    secret "admission-webhook-example-certs" created

    $ kubectl get secret admission-webhook-example-certs
    NAME                              TYPE      DATA      AGE
    admission-webhook-example-certs   Opaque    2         2m

Once the secret is created, we can create the deployment and the service. These are standard Kubernetes deployment and service resources. Up until this point we’ve produced nothing but an HTTP server that’s accepting requests through a service on port 443:

    $ kubectl create -f deployment/deployment.yaml
    deployment.apps "admission-webhook-example-deployment" created

    $ kubectl create -f deployment/service.yaml
    service "admission-webhook-example-svc" created

Configuring the webhook

Now that our webhook server is running, it can accept requests from the apiserver. However, we should create some configuration resources in Kubernetes first. Let’s start with our validating webhook, then we’ll configure the mutating webhook later. If you take a look at the webhook configuration, you’ll notice that it contains a placeholder for CA_BUNDLE:

    clientConfig:
      service:
        name: admission-webhook-example-webhook-svc
        path: "/validate"
      caBundle: ${CA_BUNDLE}

As mentioned earlier, the CA certificate should be provided to the admission webhook configuration, so the apiserver can trust the TLS certificate of the webhook server. Because we’ve signed our certificates with the Kubernetes API, we can use the CA cert from our kubeconfig to simplify things. There is a small script that substitutes the CA_BUNDLE placeholder in the configuration with this CA. Run this command before creating the validating webhook configuration:

    cat ./deployment/validatingwebhook.yaml | ./deployment/webhook-patch-ca-bundle.sh > ./deployment/validatingwebhook-ca-bundle.yaml
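The substitution itself is simple. As a rough Go equivalent of what the patch script does (the function name is hypothetical, and unlike the script, which takes the CA from the kubeconfig, this sketch assumes it is handed the raw PEM bytes and encodes them itself):

```go
package main

import (
	"encoding/base64"
	"fmt"
	"strings"
)

// patchCABundle replaces the ${CA_BUNDLE} placeholder in a manifest with the
// base64 encoding of the given PEM-encoded CA certificate.
func patchCABundle(manifest string, caPEM []byte) string {
	encoded := base64.StdEncoding.EncodeToString(caPEM)
	return strings.ReplaceAll(manifest, "${CA_BUNDLE}", encoded)
}

func main() {
	manifest := "clientConfig:\n  caBundle: ${CA_BUNDLE}\n"
	// Placeholder bytes; a real CA certificate would be read from a file.
	ca := []byte("-----BEGIN CERTIFICATE-----\n...\n-----END CERTIFICATE-----\n")
	fmt.Print(patchCABundle(manifest, ca))
}
```

The caBundle field expects a base64-encoded PEM bundle, which is why the populated value in the generated file starts with LS0, the base64 encoding of the leading dashes of a PEM block.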

Then take a look at validatingwebhook-ca-bundle.yaml. If the script ran properly, the CA_BUNDLE should be populated like so:

    $ cat deployment/validatingwebhook-ca-bundle.yaml
    apiVersion: admissionregistration.k8s.io/v1beta1
    kind: ValidatingWebhookConfiguration
    metadata:
      name: validation-webhook-example-cfg
      labels:
        app: admission-webhook-example
    webhooks:
      - name: required-labels.banzaicloud.com
        clientConfig:
          service:
            name: admission-webhook-example-webhook-svc
            namespace: default
            path: "/validate"
          caBundle: LS0...Qo=
        rules:
          - operations: ["CREATE"]
            apiGroups: ["apps", ""]
            apiVersions: ["v1"]
            resources: ["deployments", "services"]
        namespaceSelector:
          matchLabels:
            admission-webhook-example: enabled

The webhook’s clientConfig is pointing to our previously deployed service, with the path /validate. Remember, we’ve created two different paths in our HTTP server for validation and mutation.

The second section contains the rules: the operations and resources that the webhook will validate. We’d like to intercept API requests when a deployment or a service is created, so operations is set to CREATE, and apiGroups and apiVersions are filled out accordingly (the apps group with version v1 for deployments, and the core group, written as an empty string, with version v1 for services). We can use wildcards (*) for these fields as well.
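To illustrate how such a rule selects requests, here is a simplified sketch of the matching semantics. It is not the apiserver's implementation: the types are made up for this example, and details such as subresources are ignored.

```go
package main

import "fmt"

// rule is a simplified stand-in for a webhook rule.
type rule struct {
	Operations  []string
	APIGroups   []string
	APIVersions []string
	Resources   []string
}

// request carries the attributes of an API request that rules match against.
type request struct {
	Operation, Group, Version, Resource string
}

// contains reports whether list includes value or the "*" wildcard.
func contains(list []string, value string) bool {
	for _, item := range list {
		if item == "*" || item == value {
			return true
		}
	}
	return false
}

// matches reports whether every field of the rule admits the request.
func (r rule) matches(req request) bool {
	return contains(r.Operations, req.Operation) &&
		contains(r.APIGroups, req.Group) &&
		contains(r.APIVersions, req.Version) &&
		contains(r.Resources, req.Resource)
}

func main() {
	// The rule from the validating webhook configuration above.
	r := rule{
		Operations:  []string{"CREATE"},
		APIGroups:   []string{"apps", ""},
		APIVersions: []string{"v1"},
		Resources:   []string{"deployments", "services"},
	}
	fmt.Println(r.matches(request{"CREATE", "apps", "v1", "deployments"})) // true
	fmt.Println(r.matches(request{"CREATE", "", "v1", "services"}))        // true
	fmt.Println(r.matches(request{"UPDATE", "apps", "v1", "deployments"})) // false
}
```

Because the fields are matched independently, an update to an existing deployment is not intercepted by this rule; adding UPDATE to operations would be needed for that.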

The last part of the webhook contains the namespaceSelector. We can define a selector for specific namespaces where our webhook will work. It’s not a required property, but we’ll try it out now. Our webhook will only work in namespaces where the admission-webhook-example: enabled label is set. You can check out the complete layout of this resource configuration in the Kubernetes reference docs.

So let’s label the default namespace first:

    $ kubectl label namespace default admission-webhook-example=enabled
    namespace "default" labeled

    $ kubectl get namespace default -o yaml
    apiVersion: v1
    kind: Namespace
    metadata:
      creationTimestamp: 2018-09-24T07:50:11Z
      labels:
        admission-webhook-example: enabled
      name: default
    ...

Finally, create the configuration for the validating webhook. This will dynamically add the webhook to the chain, so, as soon as the resource is created, requests will be intercepted and our webhook will be called:

    $ kubectl create -f deployment/validatingwebhook-ca-bundle.yaml
    validatingwebhookconfiguration.admissionregistration.k8s.io "validation-webhook-example-cfg" created

Try it out

Now the exciting part: let’s create a deployment and see if our validation works. We’ll take a dummy deployment that contains a container that only sleeps. The command should fail and produce an error like this:

    $ kubectl create -f deployment/sleep.yaml
    Error from server (required labels are not set): error when creating "deployment/sleep.yaml": admission webhook "required-labels.banzaicloud.com" denied the request: required labels are not set

Okay, let’s see if we can make it work. There is another dummy deployment in the repo that contains these labels on the deployment’s metadata:

    $ kubectl create -f deployment/sleep-with-labels.yaml
    deployment.apps "sleep" created

It’s now working, but let’s try one more thing. Delete the deployment and create the last one, in which the required labels are not present but the admission-webhook-example.banzaicloud.com/validate annotation is set to false. It should work as well.

    $ kubectl delete deployment sleep
    $ kubectl create -f deployment/sleep-no-validation.yaml
    deployment.apps "sleep" created

Trying out the mutating webhook

To try out the mutating webhook, first delete the validating webhook’s configuration so it won’t interfere, then deploy the new configuration. The mutating webhook configuration is basically the same as the validating one, but the webhook service path is set to /mutate, so the apiserver will send requests to the other path of our HTTP server. It contains a CA_BUNDLE placeholder as well, so we need to populate that first.

    $ kubectl delete validatingwebhookconfiguration validation-webhook-example-cfg
    validatingwebhookconfiguration.admissionregistration.k8s.io "validation-webhook-example-cfg" deleted

    $ cat ./deployment/mutatingwebhook.yaml | ./deployment/webhook-patch-ca-bundle.sh > ./deployment/mutatingwebhook-ca-bundle.yaml

    $ kubectl create -f deployment/mutatingwebhook-ca-bundle.yaml
    mutatingwebhookconfiguration.admissionregistration.k8s.io "mutating-webhook-example-cfg" created

Now we can deploy our sleep application again, and see if the labels were properly added:

    $ kubectl create -f deployment/sleep.yaml
    deployment.apps "sleep" created

    $ kubectl get deploy sleep -o yaml
    apiVersion: extensions/v1beta1
    kind: Deployment
    metadata:
      annotations:
        admission-webhook-example.banzaicloud.com/status: mutated
        deployment.kubernetes.io/revision: "1"
      creationTimestamp: 2018-09-24T11:35:50Z
      generation: 1
      labels:
        app.kubernetes.io/component: not_available
        app.kubernetes.io/instance: not_available
        app.kubernetes.io/managed-by: not_available
        app.kubernetes.io/name: not_available
        app.kubernetes.io/part-of: not_available
        app.kubernetes.io/version: not_available
    ...

For our last example, recreate the validating webhook so both of them are available. Now, try to create sleep again. It should succeed because, as it’s put in the documentation:

The admission control process proceeds in two phases. In the first phase, mutating admission controllers are run. In the second phase, validating admission controllers are run.

So the mutating webhook adds the missing labels in the first phase, then the validating webhook won’t reject the deployment in the second phase, because the labels are already present, with not_available set as their value.
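The ordering can be modelled in a few lines. This is a toy model of the two phases, not the apiserver's admission chain; the function names and the mutatingEnabled switch are assumptions made for the illustration.

```go
package main

import (
	"errors"
	"fmt"
)

var requiredLabels = []string{
	"app.kubernetes.io/name",
	"app.kubernetes.io/instance",
	"app.kubernetes.io/version",
	"app.kubernetes.io/component",
	"app.kubernetes.io/part-of",
	"app.kubernetes.io/managed-by",
}

// mutate models phase one: it returns a copy of the labels with every
// missing required label filled in.
func mutate(labels map[string]string) map[string]string {
	out := make(map[string]string, len(labels)+len(requiredLabels))
	for k, v := range labels {
		out[k] = v
	}
	for _, label := range requiredLabels {
		if _, ok := out[label]; !ok {
			out[label] = "not_available"
		}
	}
	return out
}

// validate models phase two: it rejects objects that lack a required label.
func validate(labels map[string]string) error {
	for _, label := range requiredLabels {
		if _, ok := labels[label]; !ok {
			return errors.New("required labels are not set")
		}
	}
	return nil
}

// admit runs the phases in order: mutating first, then validating.
func admit(labels map[string]string, mutatingEnabled bool) error {
	if mutatingEnabled {
		labels = mutate(labels)
	}
	return validate(labels)
}

func main() {
	fmt.Println("validating only:", admit(nil, false))
	fmt.Println("mutating + validating:", admit(nil, true))
}
```

With only the validating webhook registered, the unlabeled object is rejected; with both registered, the validating phase sees the already mutated object and lets it through.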