In this article, I’m going to introduce you to a useful technique for delivering web applications automatically and serenely: the Canary deployment method.

What is Canary deployment?

Canary is a deployment method that reduces the risk of introducing a new software version into production by gradually shifting traffic to the new version while measuring metrics such as HTTP request success rate and latency. It lets you do capacity testing of the new version in a production environment, with a safe rollback strategy if issues are found. By shifting traffic slowly, you can monitor and capture metrics about how the new version impacts the production environment.

How does Canary work with Linkerd and Flagger?

Linkerd is responsible for collecting application metrics, and Flagger is responsible for executing the automated delivery. Flagger’s job is to create a Canary pod running the new release and send a small amount of traffic to it, determining success from the Linkerd metrics. If the Canary pod is working correctly, Flagger triggers the deployment update.

What is Linkerd?

Linkerd is a service mesh for Kubernetes: it gives you monitoring on your services without requiring any changes to your applications. By default, Linkerd stores the monitoring metrics in a Prometheus TSDB and includes a web user interface.

What is Flagger?

Flagger is a progressive delivery operator for Kubernetes, designed to give you confidence in automating releases with progressive delivery techniques. It supports many tools, including Linkerd, and other deployment methods such as A/B testing or classic Blue/Green.

Our use case: hosting a Node.js API

To demonstrate how to implement Canary, I will take a simple, common use case: hosting a Node.js API I wrote myself.

Requirements

A Kubernetes cluster (1.14+)

The Linkerd CLI

kubectl

The test API

The test API is pretty simple: it just returns a Hello World message with the current API version: Hello v1. My Node.js source code and Dockerfiles are stored on my GitHub here if you want to take a look.

For this test, I have built three Docker images with tags reflecting three releases of my API. My Docker images are publicly available on Docker Hub:

cyrilbkr/testapp:1.0

cyrilbkr/testapp:2.0

cyrilbkr/testapp:3.0

The 3.0 release returns a 404 error; it is used to simulate a bug in the application and show you how the automatic rollback works.

Linkerd & Flagger setup

Create a namespace called linkerd, then install Linkerd with the CLI tool and set up Flagger with kubectl:

$ kubectl create ns linkerd
$ linkerd install | kubectl apply -f -
$ kubectl apply -k github.com/weaveworks/flagger/kustomize/linkerd

Look at the documentation for more information, or to customize your Linkerd & Flagger setup if needed. For example, Linkerd ships with a Prometheus server by default, but you can plug in your own existing Prometheus.

Also, don’t forget to expose the web UI yourself with an Ingress definition, or use port-forwarding on your local machine.
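If you go the Ingress route, a minimal sketch could look like the following. Note the assumptions: the hostname is hypothetical, and the backend targets the linkerd-web service on port 8084, which is the default dashboard service in Linkerd 2.x installs of that era (check your own cluster with kubectl get svc -n linkerd):

```yaml
apiVersion: networking.k8s.io/v1beta1
kind: Ingress
metadata:
  name: linkerd-web
  namespace: linkerd
spec:
  rules:
  - host: linkerd.example.com        # hypothetical hostname
    http:
      paths:
      - backend:
          serviceName: linkerd-web   # default dashboard service name (verify on your cluster)
          servicePort: 8084
```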

Deploying our V1 Api on Kubernetes

We will deploy our API on Kubernetes with a traditional Deployment & Service configuration.

Create a namespace called testapi with the linkerd.io/inject: enabled annotation:

annotations:
  linkerd.io/inject: enabled
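The full namespace manifest might look like this minimal sketch; annotating the namespace tells Linkerd to automatically inject its proxy sidecar into every pod created in it:

```yaml
apiVersion: v1
kind: Namespace
metadata:
  name: testapi
  annotations:
    linkerd.io/inject: enabled   # enables automatic proxy injection for the namespace
```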

Create a deployment called api with the linkerd.io/inject: enabled annotation:

annotations:
  linkerd.io/inject: enabled
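A minimal Deployment sketch could look like the following. The replica count and the app: api label are assumptions for illustration; the image and port come from the article:

```yaml
apiVersion: apps/v1
kind: Deployment
metadata:
  name: api
  namespace: testapi
spec:
  replicas: 1                        # assumed replica count
  selector:
    matchLabels:
      app: api                       # assumed label, must match the pod template
  template:
    metadata:
      annotations:
        linkerd.io/inject: enabled   # inject the Linkerd proxy into each pod
      labels:
        app: api
    spec:
      containers:
      - name: api
        image: cyrilbkr/testapp:1.0  # the v1 release of the test API
        ports:
        - containerPort: 3000
```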

Create a service targeting port 3000 with the two Flagger annotations:

annotations:
  app.kubernetes.io/name: loadtester
  app.kubernetes.io/instance: flagger
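As a sketch, the Service could look like this. The app: api selector is an assumption and must match whatever label your Deployment's pods carry; the annotations are the two mentioned above:

```yaml
apiVersion: v1
kind: Service
metadata:
  name: api
  namespace: testapi
  annotations:
    app.kubernetes.io/name: loadtester
    app.kubernetes.io/instance: flagger
spec:
  selector:
    app: api          # assumed pod label, must match the Deployment
  ports:
  - name: http
    port: 3000
    targetPort: 3000  # the Node.js API listens on 3000
```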

Now it’s time to define the parameters of our Canary. In my case, I want to check that my new release keeps an HTTP request success rate above 99% over one minute. You can also use other parameters, such as latency, to define success.

Create a Canary definition with the metrics to analyze:

canaryAnalysis:
  interval: 10s
  threshold: 5
  stepWeight: 10
  metrics:
  - name: request-success-rate
    threshold: 99
    interval: 1m
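Putting this in context, the full Canary resource might look like the sketch below. The apiVersion and surrounding field names follow the Flagger/Linkerd documentation of that era (flagger.app/v1alpha3, where the analysis block was still called canaryAnalysis); the targetRef points at the api Deployment created earlier:

```yaml
apiVersion: flagger.app/v1alpha3
kind: Canary
metadata:
  name: api
  namespace: testapi
spec:
  targetRef:                  # the Deployment Flagger will manage
    apiVersion: apps/v1
    kind: Deployment
    name: api
  service:
    port: 3000                # port exposed by the generated services
  canaryAnalysis:
    interval: 10s             # how often Flagger checks the metrics
    threshold: 5              # failed checks before rollback
    stepWeight: 10            # traffic shifted per step (percent)
    metrics:
    - name: request-success-rate
      threshold: 99           # minimum success rate (percent)
      interval: 1m
```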

Create a Flagger config file with the API’s target service, and set how many queries per second the load generator should send:

- name: QPS
  value: "10"
- name: CONCURRENCY
  value: "1"

Deploy all the YAML files with kubectl and check the status:

$ kubectl get pod -n testapi
NAME                          READY   STATUS    RESTARTS   AGE
api-primary-f6b5df849-6fhgl   2/2     Running   0          2m59s
frontend-66db854c46-592z7     2/2     Running   0          2m52s
load-5b8c87b478-vp2nd         2/2     Running   0          2m52s

$ kubectl get canaries -n testapi
NAME   STATUS        WEIGHT   LASTTRANSITIONTIME
api    Initialized   0        2020-03-08T02:57:04Z

$ kubectl -n testapi get ev --watch | grep -e canary -e Initia
[...]
Synced canary/api Initialization done!

As you can see, our deployment is now called api-primary; this is because the Canary configuration takes over the initial deployment.

On the Linkerd web UI, you can now see real-time HTTP service monitoring for the testapi namespace.

You can check in Grafana that the API is working properly and returns an HTTP request success rate of 100%. The success rate is measured against HTTP requests sent automatically by the load generator to our API, as defined by QPS: "10" in flagger.yaml.

Upgrading to V2 using Canary progressive delivery

Update the Docker image tag to start the delivery:

$ kubectl -n testapi set image deployment/api api=cyrilbkr/testapp:2.0

A single new pod, independent from the production one and running the new Docker image, is deployed; this is the Canary:

$ kubectl get pod -n testapi
[...]
api-865f969b87-9jj8v   0/2   Init:0/1   0   1s

Flagger starts by shifting 10% of the traffic:

$ kubectl -n testapi get ev --watch
0s   Normal   Synced   canary/api   Advance api.testapi canary weight 10

Then, after verifying the success rate, it shifts traffic to the Canary in 10% increments:

$ kubectl get canaries -n testapi
NAME   STATUS        WEIGHT   LASTTRANSITIONTIME
api    Progressing   70       2020-03-08T03:07:24Z

$ kubectl get canaries -n testapi
NAME   STATUS      WEIGHT   LASTTRANSITIONTIME
api    Succeeded   0        2020-03-08T03:08:24Z

After 100% of the traffic has been shifted to the new release, the old deployment and the original Canary pod are terminated.

Upgrading to V3 containing an error

Now we will deploy a new release of our API (3.0) containing a 404 error, to see how the system rejects this release and keeps it out of production.

Update the Docker image tag to start the delivery:

$ kubectl -n testapi set image deployment/api api=cyrilbkr/testapp:3.0

Flagger starts sending 10% of the traffic to the Canary, and the API responds with 404:

$ kubectl get pod -n testapi
NAME                  READY   STATUS            RESTARTS   AGE
api-c4f659496-4pqzt   0/2     PodInitializing   0          4s

$ kubectl get canaries -n testapi
NAME   STATUS        WEIGHT   LASTTRANSITIONTIME
api    Progressing   10       2020-03-08T03:21:54Z

As you can see in Grafana, the HTTP success rate drops to zero, since the Canary returns only 404 errors.

After reaching the threshold set earlier in the configuration, the system puts the delivery into a rollback state, reroutes the 10% of traffic back to the production service, and destroys the Canary.

$ kubectl -n testapi get ev --watch
0s   Warning   Synced   canary/api   Rolling back api.testapi failed checks threshold reached 5

$ kubectl get canaries -n testapi
NAME   STATUS   WEIGHT   LASTTRANSITIONTIME
api    Failed   0        2020-03-08T03:31:04Z

Conclusion

Canary delivery with Linkerd & Flagger is a powerful technique that reduces errors by automatically ensuring your application works before delivering it to production. It also gives you tools to monitor, in real time, what happens during a deployment.

References

Linkerd : https://linkerd.io/2/tasks/canary-release/

Flagger : https://docs.flagger.app/tutorials/linkerd-progressive-delivery