Kubernetes Best Practices - Declarative Deployments

Deployments are higher-level controllers that use ReplicaSets to ensure that no downtime is incurred when you update your Pods. Deployments help you orchestrate your upgrades in different ways for the best reliability and availability of your applications.

In this article, I will discuss how you can do zero-downtime releases using Deployments. We will also dig into more advanced deployment scenarios, such as Blue/Green and Canary workflows, that give you more control over your release process.

Let’s Prepare Our Images

We’ll use a custom web server image with Nginx as the base. To test different versions of the image, we create three images whose only difference is the text displayed on the default page the web server serves.

We create three Dockerfiles as follows:

Dockerfile_1:

FROM nginx:latest
COPY v1.html /usr/share/nginx/html/index.html

Dockerfile_2:

FROM nginx:latest
COPY v2.html /usr/share/nginx/html/index.html

Dockerfile_3:

FROM nginx:latest
COPY v3.html /usr/share/nginx/html/index.html

The HTML files are as follows:

v1.html

<!DOCTYPE html>
<html lang="en">
<head>
    <meta charset="UTF-8">
    <meta name="viewport" content="width=device-width, initial-scale=1.0">
    <meta http-equiv="X-UA-Compatible" content="ie=edge">
    <title>Release 1</title>
</head>
<body>
    <h1>This is release #1 of the application</h1>
</body>
</html>

v2.html

<!DOCTYPE html>
<html lang="en">
<head>
    <meta charset="UTF-8">
    <meta name="viewport" content="width=device-width, initial-scale=1.0">
    <meta http-equiv="X-UA-Compatible" content="ie=edge">
    <title>Release 2</title>
</head>
<body>
    <h1>This is release #2 of the application</h1>
</body>
</html>

v3.html

<!DOCTYPE html>
<html lang="en">
<head>
    <meta charset="UTF-8">
    <meta name="viewport" content="width=device-width, initial-scale=1.0">
    <meta http-equiv="X-UA-Compatible" content="ie=edge">
    <title>Release 3</title>
</head>
<body>
    <h1>This is release #3 of the application</h1>
</body>
</html>

Finally, we need to build and push those images:

docker build -t magalixcorp/mywebserver:1 -f Dockerfile_1 .
docker build -t magalixcorp/mywebserver:2 -f Dockerfile_2 .
docker build -t magalixcorp/mywebserver:3 -f Dockerfile_3 .
docker push magalixcorp/mywebserver:1
docker push magalixcorp/mywebserver:2
docker push magalixcorp/mywebserver:3

Zero Downtime with Rolling Updates

Let’s create a Deployment for running v1 for our web server. Create a YAML file called nginx_deployment.yaml and add the following:

---
apiVersion: v1
kind: Service
metadata:
  name: mywebservice
spec:
  selector:
    app: nginx
  ports:
  - protocol: TCP
    port: 80
    targetPort: 80
  type: NodePort
---
apiVersion: apps/v1
kind: Deployment
metadata:
  name: mywebserver
spec:
  replicas: 4
  strategy:
    type: RollingUpdate
    rollingUpdate:
      maxSurge: 1
      maxUnavailable: 1
  selector:
    matchLabels:
      app: nginx
  template:
    metadata:
      labels:
        app: nginx
    spec:
      containers:
      - image: magalixcorp/mywebserver:1
        name: nginx
        readinessProbe:
          httpGet:
            path: /
            port: 80
            httpHeaders:
            - name: Host
              value: K8sProbe

The above YAML file defines two entities: (1) the Service that enables external access to the Pods, and (2) the Deployment controller.

We are using the RollingUpdate strategy here, which is the default if you don’t explicitly define a deployment strategy. This Deployment uses the Pod template to schedule four Pods, each hosting one container running the magalixcorp/mywebserver:1 image. Let’s deploy the Service and the Deployment to see that in action:

kubectl apply -f nginx_deployment.yaml

Now, let’s ensure that our Pods are in the running state:

$ kubectl get pods
NAME                           READY   STATUS    RESTARTS   AGE
mywebserver-68cd66868f-78jgt   1/1     Running   0          5m29s
mywebserver-68cd66868f-kdxx9   1/1     Running   0          29m
mywebserver-68cd66868f-lh6wz   1/1     Running   0          29m
mywebserver-68cd66868f-vvqrh   1/1     Running   0          5m29s

If we want to actually see the contents of the web page nginx is serving, we need to know the port that the Service is listening at:

$ kubectl get svc
NAME           TYPE        CLUSTER-IP     EXTERNAL-IP   PORT(S)        AGE
kubernetes     ClusterIP   10.96.0.1      <none>        443/TCP        2d10h
mywebservice   NodePort    10.107.1.198   <none>        80:32288/TCP   32m

Our mywebservice Service is listening on node port 32288 and routing traffic to the Pods on port 80. If you navigate to http://node_ip:32288 you should see something similar to the following:

Upgrading Your Deployments

There is more than one way to update a running Deployment, one of them is modifying the definition file to reflect the new changes and applying it using kubectl. Change the .spec.template.spec.containers[].image in the definition file to look as follows:

    spec:
      containers:
      - image: magalixcorp/mywebserver:2

Clearly, the only thing that changed is the image tag: we need to deploy the second version of our application. Apply the file using kubectl:

$ kubectl apply -f nginx_deployment.yaml
service/mywebservice unchanged
deployment.apps/mywebserver configured

Testing the Deployment strategy

Now, quickly run kubectl get pods to see what the Deployment is doing to the Pods it manages:

$ kubectl get pods
NAME                           READY   STATUS              RESTARTS   AGE
mywebserver-68cd66868f-7w4fc   1/1     Terminating         0          83s
mywebserver-68cd66868f-dwknx   1/1     Running             0          94s
mywebserver-68cd66868f-mv9dg   1/1     Terminating         0          94s
mywebserver-68cd66868f-rpr5f   0/1     Terminating         0          84s
mywebserver-77d979dbfb-qt58n   1/1     Running             0          4s
mywebserver-77d979dbfb-sb9s5   1/1     Running             0          4s
mywebserver-77d979dbfb-wxqfj   0/1     ContainerCreating   0          0s
mywebserver-77d979dbfb-ztpc8   0/1     ContainerCreating   0          0s
$ kubectl get pods
NAME                           READY   STATUS        RESTARTS   AGE
mywebserver-68cd66868f-dwknx   0/1     Terminating   0          100s
mywebserver-77d979dbfb-qt58n   1/1     Running       0          10s
mywebserver-77d979dbfb-sb9s5   1/1     Running       0          10s
mywebserver-77d979dbfb-wxqfj   1/1     Running       0          6s
mywebserver-77d979dbfb-ztpc8   0/1     Running       0          6s
$ kubectl get pods
NAME                           READY   STATUS    RESTARTS   AGE
mywebserver-77d979dbfb-qt58n   1/1     Running   0          25s
mywebserver-77d979dbfb-sb9s5   1/1     Running   0          25s
mywebserver-77d979dbfb-wxqfj   1/1     Running   0          21s
mywebserver-77d979dbfb-ztpc8   1/1     Running   0          21s

Running the command quickly several times shows how the Deployment creates new Pods using the new Pod template while, at the same time, terminating the old Pods. However, it’s evident that at no point do we have zero running Pods. The Deployment replaces old Pods with new ones gradually, always keeping a portion of them running. This portion is controlled by the maxSurge and maxUnavailable parameters.

maxSurge: the number of Pods that can be created temporarily in addition to the desired number of replicas. Setting this to 1 means we can have a maximum of five running Pods during the update process (the four replicas + 1).

maxUnavailable: the number of Pods that can be unavailable simultaneously during the update process. In our example, at least three Pods keep running while the update is in progress (4 replicas - 1).
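The interplay between these two parameters can be sketched with a few lines of arithmetic. This is a standalone illustration, not part of the manifests, and it assumes absolute values rather than the percentages Kubernetes also accepts:

```python
# Pod-count bounds during a RollingUpdate, assuming maxSurge and
# maxUnavailable are given as absolute numbers of Pods.

def rolling_update_bounds(replicas, max_surge, max_unavailable):
    """Return (min_available, max_total) Pods during a rolling update."""
    max_total = replicas + max_surge            # old Pods plus surged new Pods
    min_available = replicas - max_unavailable  # Pods that must keep serving
    return min_available, max_total

# With the manifest's values (4 replicas, maxSurge: 1, maxUnavailable: 1):
print(rolling_update_bounds(4, 1, 1))  # (3, 5)
```

Setting maxUnavailable to 0 forces the Deployment to bring a new Pod up before taking any old one down, at the cost of requiring surge capacity.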

While the update is in progress, try refreshing the web page several times; you’ll notice that the web server is always responsive and no HTTP request goes unanswered.

If you navigate to the page now, you should see something like the following:

A note for high-traffic environments

As noted by our Reddit reader, mym6, in a high-traffic environment the Service may not instantly become aware that a Pod is going down and may keep routing traffic to it, causing some connections to drop while the update is in progress. To address this potential problem, we can add a lifecycle hook that pauses the container for a few seconds before shutdown, ensuring that in-flight connections are not dropped:

lifecycle:
  preStop:
    exec:
      command: ["/bin/bash", "-c", "sleep 20"]

Fixed Deployment Strategy

Most of the time, you need zero downtime when deploying a new version of your software. However, in some cases you need to deny access to the old application version entirely, even if that entails displaying an error or an “under maintenance” message to your clients for a brief period. The most common use case for such a scenario is a severe bug or security vulnerability that has not been patched yet (a zero-day vulnerability). In that situation, the rolling update strategy may work against you: clients will still reach the old, vulnerable application version while the deployment of the patched version is in progress. Compromising your users’ security poses a much more significant threat than showing them a “come back later” message.

Your best option here is the fixed deployment strategy. In this workflow, the Deployment does not gradually replace the Pods. Instead, it kills all the Pods running the old application version, then recreates them using the new Pod template. To use this strategy, set the strategy type to Recreate.

Deployment Using The Recreate Strategy Type

Assume that version 2 of our web server contained severe security bugs. Our engineers have done the necessary patching, and we are now ready to deploy the new version. Change the Deployment part of the definition file to look as follows:

---
apiVersion: apps/v1
kind: Deployment
metadata:
  name: mywebserver
spec:
  replicas: 4
  strategy:
    type: Recreate
  selector:
    matchLabels:
      app: nginx
  template:
    metadata:
      labels:
        app: nginx
    spec:
      containers:
      - image: magalixcorp/mywebserver:3
        name: nginx
        readinessProbe:
          httpGet:
            path: /
            port: 80
            httpHeaders:
            - name: Host
              value: K8sProbe

Now, let’s apply the new definition and immediately check the status of the Pods to view what’s happening:

$ kubectl apply -f nginx_deployment.yaml
service/mywebservice unchanged
deployment.apps/mywebserver configured
$ kubectl get pods
NAME                           READY   STATUS        RESTARTS   AGE
mywebserver-77d979dbfb-ztpc8   0/1     Terminating   0          60m
$ kubectl get pods
NAME                           READY   STATUS              RESTARTS   AGE
mywebserver-5f6bbd8587-4qpzt   0/1     ContainerCreating   0          4s
mywebserver-5f6bbd8587-76c4v   0/1     ContainerCreating   0          4s
mywebserver-5f6bbd8587-s6v64   0/1     ContainerCreating   0          4s
mywebserver-5f6bbd8587-tt6nc   0/1     ContainerCreating   0          4s
$ kubectl get pods
NAME                           READY   STATUS    RESTARTS   AGE
mywebserver-5f6bbd8587-4qpzt   0/1     Running   0          15s
mywebserver-5f6bbd8587-76c4v   0/1     Running   0          15s
mywebserver-5f6bbd8587-s6v64   0/1     Running   0          15s
mywebserver-5f6bbd8587-tt6nc   0/1     Running   0          15s

As you can see, the Deployment started by terminating all the running Pods, then created the new ones all at once until the desired number of replicas was reached. If you tried refreshing the web page during the deployment, you might have found it unresponsive, perhaps with a “Website unreachable” or similar message depending on your browser. Again, this is the desired behavior, as we don’t want anyone using v2 of the hypothetical app until v3 is deployed. The web page should look something like this now:

Blue/Green Release Strategy

Sometimes also referred to as A/B deployment, the Blue/Green strategy involves having two sets of identical hardware. The software application is deployed to both environments, but only one of them receives live traffic while the other remains idle. When a new version of the application is ready, it gets deployed to the idle (blue) environment, and traffic is then directed to blue through a router or a similar mechanism. If problems are detected in the new release and a rollback is needed, the only action required is redirecting traffic back to the green environment.

In the next software release iteration, the new code gets deployed to the green environment. The blue environment now can be used as a staging environment or for disaster recovery purposes.

The advantage of this strategy is that, unlike a rolling update, there is zero downtime during the deployment process, and there is never more than one version of the application receiving traffic at the same time.

The drawback, however, is that you need to double the resources hosting the application, which may increase your costs.

Zero downtime and no concurrent versions with Blue/Green deployments

Let’s quickly revisit the Recreate strategy used in the earlier example. While we were able to protect our clients from the compromised application version, we still incurred downtime, which is not acceptable in mission-critical environments. Using the Deployment and Service controllers, however, we can achieve both goals. The idea is to create a second Deployment with the new application version (blue) while the original one (green) is still running. Once all the Pods in the blue Deployment are ready, we instruct the Service to switch to the blue Deployment by changing its Pod selector accordingly. If a rollback is required, we switch the selector back to the green Pods. Let’s see how this can be done in our example.
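The selector switch at the heart of this workflow can be modeled in a few lines. This is a toy sketch of label matching, not the real Kubernetes API, and the Pod names are illustrative:

```python
# Toy model of a Service selector: a Pod receives traffic only if its
# labels contain every key/value pair in the selector.

def select_pods(pods, selector):
    return [p["name"] for p in pods
            if all(p["labels"].get(k) == v for k, v in selector.items())]

pods = [
    {"name": "mywebserver-green-1", "labels": {"app": "nginx_green"}},
    {"name": "mywebserver-blue-1",  "labels": {"app": "nginx_blue"}},
]

# The Service points at the blue Deployment...
print(select_pods(pods, {"app": "nginx_blue"}))   # ['mywebserver-blue-1']
# ...and a rollback is just flipping the selector back to green.
print(select_pods(pods, {"app": "nginx_green"}))  # ['mywebserver-green-1']
```

Because the switch is a single selector change, both sets of Pods stay running throughout, and cutover is effectively instantaneous.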

First, let’s destroy the current Deployment:

kubectl delete deployment mywebserver

Next, let’s split our definition file so that the Service and the Deployment each live in their own file: nginx_deployment.yaml and nginx_service.yaml. Copy nginx_deployment.yaml to nginx_deployment_blue.yaml and rename the original to nginx_deployment_green.yaml. To wrap up, you should have the following three files in your directory:

nginx_deployment_green.yaml

nginx_deployment_blue.yaml

nginx_service.yaml

So far, the first two files are the same. Let’s change nginx_deployment_green.yaml to look as follows:

---
apiVersion: apps/v1
kind: Deployment
metadata:
  name: mywebserver-green
spec:
  replicas: 4
  strategy:
    type: Recreate
  selector:
    matchLabels:
      app: nginx_green
  template:
    metadata:
      labels:
        app: nginx_green
    spec:
      containers:
      - image: magalixcorp/mywebserver:1
        name: nginx

This Deployment uses v1 of our application. We appended “_green” to the Pod labels and the selector, and named the Deployment mywebserver-green, denoting that this is the green deployment.

Notice that we’re using the Recreate deployment strategy. Using Recreate or RollingUpdate is of no significance here as we are not relying on the Deployment controller to perform the update.

Although we are doing this exercise manually (the only deployment strategies currently supported by the Deployment controller are RollingUpdate and Recreate), it is worth noting that tools such as Istio and Knative can automate this process.

Let’s now change the nginx_deployment_blue.yaml file to look as follows:

---
apiVersion: apps/v1
kind: Deployment
metadata:
  name: mywebserver-blue
spec:
  replicas: 4
  strategy:
    type: Recreate
  selector:
    matchLabels:
      app: nginx_blue
  template:
    metadata:
      labels:
        app: nginx_blue
    spec:
      containers:
      - image: magalixcorp/mywebserver:2
        name: nginx

The changes are similar to what we did with the green deployment file except that we are labelling this one as the blue version.

Our Service should be defined as follows:

---
apiVersion: v1
kind: Service
metadata:
  name: mywebservice
spec:
  selector:
    app: nginx_blue
  ports:
  - protocol: TCP
    port: 80
    targetPort: 80
  type: NodePort

The only difference between this Service definition file and the one used earlier is that we changed the Pod Selector to match the blue Pods that are part of our blue deployment.

Testing the Blue/Green Deployment

Let’s create the blue deployment by running:

$ kubectl apply -f nginx_deployment_blue.yaml
deployment.apps/mywebserver-blue created

And the green one:

$ kubectl apply -f nginx_deployment_green.yaml
deployment.apps/mywebserver-green created

And finally the service:

$ kubectl apply -f nginx_service.yaml
service/mywebservice configured

Navigating to http://node_ip:32288 shows that we are using version 2 of our application. If we need to quickly roll back to version 1, we just change the Service definition in nginx_service.yaml to look as follows:

---
apiVersion: v1
kind: Service
metadata:
  name: mywebservice
spec:
  selector:
    app: nginx_green
  ports:
  - protocol: TCP
    port: 80
    targetPort: 80
  type: NodePort

Now, refreshing the web page shows that we have reverted to version 1 of our application. There was no downtime during this process, and only one version of our application was receiving traffic at any particular point in time.

Canary Deployment Strategy

Canary Deployment is a popular release strategy that focuses more on “testing the air” before going with the full deployment.

The Canary deployment strategy takes its name from coal mining. Miners used to carry a cage with canary birds and place it at the mine entrance; if the birds died, that was an indication of toxic carbon monoxide emissions.

So, what does coal mining have to do with software deployment? While the implementation is different (way to go, Canaries!), the concept remains the same. When software is released using a Canary deployment, a small subset of the incoming traffic is directed to the new application version while the majority remains routed to the old, stable version.

The main advantage of this method is that you get customer feedback quickly on any new features your application offers. If things go wrong, you can easily route all the traffic to the stable version. If enough positive feedback is received, you can gradually increase the portion of traffic going to the new version until it reaches 100%.

Canary Testing Using Kubernetes Deployments And Services

Assume that we are currently running version 1 of our application and need to deploy version 2. We want to test metrics such as latency and CPU consumption under different load levels, and we are also collecting feedback from users. If everything looks good, we do a full deployment.

We’re doing this the old-fashioned way for demonstration purposes, yet some tools, such as Istio, can automate this process.

The first thing we need to do is create two Deployment definition files: one uses version 1 of the image, the other version 2. Both Deployments use the same Pod labels and selectors. The files should look something like the following:

nginx_deployment_stable.yaml:

---
apiVersion: apps/v1
kind: Deployment
metadata:
  name: mywebserver-stable
spec:
  replicas: 6
  strategy:
    type: Recreate
  selector:
    matchLabels:
      app: nginx
  template:
    metadata:
      labels:
        app: nginx
    spec:
      containers:
      - image: magalixcorp/mywebserver:1
        name: nginx

nginx_deployment_canary.yaml

---
apiVersion: apps/v1
kind: Deployment
metadata:
  name: mywebserver-canary
spec:
  replicas: 2
  strategy:
    type: Recreate
  selector:
    matchLabels:
      app: nginx
  template:
    metadata:
      labels:
        app: nginx
    spec:
      containers:
      - image: magalixcorp/mywebserver:2
        name: nginx

Both files are identical except for the Deployment name and the Pod image. Notice that we set the number of replicas on the “stable” Deployment to 6 while deploying only 2 Pods in the canary one. This is intentional: we want only 25% of requests to hit version 2 while the remaining 75% continue to be served by version 1.

Unlike the Blue/Green strategy, we don’t make changes to the Service definition:

---
apiVersion: v1
kind: Service
metadata:
  name: mywebservice
spec:
  selector:
    app: nginx
  ports:
  - protocol: TCP
    port: 80
    targetPort: 80
  type: NodePort

The Service routes traffic to any Pod labelled app=nginx. You control which version receives more traffic by increasing or decreasing the number of replicas in each Deployment.
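The traffic split implied by the replica counts can be checked with a quick back-of-the-envelope calculation. This is illustrative only and assumes the Service balances requests evenly across all ready Pods:

```python
# Fraction of traffic the canary receives, assuming even load balancing
# across all Pods matched by the Service selector.

def canary_share(stable_replicas, canary_replicas):
    return canary_replicas / (stable_replicas + canary_replicas)

print(f"{canary_share(6, 2):.0%}")  # 25% of requests hit the canary
```

Note that replica counts only give you coarse-grained percentages; finer control (say, 1% of traffic) requires either many replicas or a traffic-splitting tool such as Istio.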

Testing the Canary deployment

Let’s apply our Deployments and the Service:

$ kubectl apply -f nginx_deployment_stable.yaml
deployment.apps/mywebserver-stable created
$ kubectl apply -f nginx_deployment_canary.yaml
deployment.apps/mywebserver-canary created
$ kubectl apply -f nginx_service.yaml
service/mywebservice configured

If you refresh the web page at http://node_ip:32288 several times, you will occasionally see version 2 of the application appear.

Increasing the percentage of users who see version 2 is as simple as increasing the replica count on the canary Deployment and decreasing it on the stable one. If you need to roll back, set the number of replicas on the stable Deployment to 8 (100%) and delete the canary Deployment. Alternatively, you can go ahead with the full rollout by reversing this operation.
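A gradual ramp under this scheme might look like the following sketch. The schedule is hypothetical and the step sizes are illustrative; each step would be applied by scaling the two Deployments with kubectl:

```python
# Shift replicas from the stable Deployment to the canary while keeping
# the total Pod count constant at 8.

TOTAL = 8
for canary in (2, 4, 6, 8):
    stable = TOTAL - canary
    print(f"stable={stable} canary={canary} -> {canary / TOTAL:.0%} on v2")
```

At the final step the stable Deployment has zero replicas and can be deleted, completing the rollout.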

TL;DR