One of the Platform team's core principles is making our platform self-service for engineers, and GitOps is what has made this possible.

At Mettle we fully leverage GitOps to deploy everything into our clusters, and we chose Flux CD (https://github.com/fluxcd/flux) as our GitOps controller.

Flux is a tool that automatically ensures that the state of a cluster matches the config in git. It uses an operator in the cluster to trigger deployments inside Kubernetes, which means you don’t need a separate CD tool. Flux monitors all relevant image repositories, detects new images, triggers deployments and updates the desired running configuration based on that (and a configurable policy).

Overview of how Flux works (ref https://github.com/fluxcd/helm-operator-get-started)

How we use Flux @ Mettle

We run a number of Flux instances within our clusters, each of them reconciling a specific repository within GitHub. For workloads specifically, the whole process works off three main repositories, which I will explain in detail below.

k8s-helm-charts

This is where all our custom Helm charts live. On merging into the master branch, we build a new image tag containing the existing charts plus any new versions created as part of the pull request.

We serve these charts from our own Helm registry running inside the cluster, which is accessible via http://k8s-helm-charts.flux.svc.cluster.local.
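For illustration, consuming it looks like any other chart repository; a minimal sketch, assuming a standard chart repository index is served at the root (the repo alias here is made up):

  # From inside the cluster (or via a port-forward)
  helm repo add mettle http://k8s-helm-charts.flux.svc.cluster.local
  helm repo update
  helm search mettle/prometheus   # Helm 2 syntax; "helm search repo" under Helm 3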

The deployment specification for our Helm registry has the following annotations so that Flux automatically deploys newly created image tags that match the semver filter (here, any 1.1.x release).

annotations:
  flux.weave.works/automated: "true"
  flux.weave.works/tag.k8s-helm-charts: semver:~1.1

So now we have our helm registry inside the cluster, but how do we deploy?

kubernetes-resources

This repository contains a mixture of raw Kubernetes manifests (YAML files) and HelmReleases. Resources include RBAC policies and PodSecurityPolicy manifests, as well as HelmReleases for utilities such as the NGINX ingress controller and Istio.

I want to take a second to dig into the makeup of a HelmRelease that leverages one of our custom charts (see the Prometheus example below):

spec:
  chart:
    name: prometheus
    repository: http://k8s-helm-charts.flux.svc.cluster.local
    version: 2.6.43
  releaseName: prometheus

You can see from the above that the Helm chart version is pinned and the location is the Helm registry running locally inside the cluster. There is obviously a race condition here, which I will cover later in this post.

We would have a directory per environment containing all the resources that environment required to deploy our definition of a “vanilla cluster”. The directory structure used to look something like this:

.
└── environments
    ├── sbx
    ├── dev
    ├── stg
    └── prd

We bootstrap our clusters using bootkube (https://github.com/kubernetes-sigs/bootkube), and as part of our bootkube assets we deploy a Flux instance pointed at this repository at a specific environment path (see the sketch below).
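A minimal sketch of how one of these instances might be configured; the repository URL is illustrative, but the flags are the standard Flux daemon ones:

  # Container args for an environment-specific Flux instance (illustrative)
  args:
    - --git-url=git@github.com:mettle/kubernetes-resources
    - --git-branch=master
    - --git-path=environments/dev
    - --git-poll-interval=1m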

Once a full reconciliation loop has completed the “vanilla cluster” is ready to receive custom workloads.
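If you don't want to wait for the next poll interval, a reconciliation can also be triggered by hand with fluxctl; a minimal example, assuming Flux runs in the flux namespace:

  # Ask Flux to sync the cluster with git immediately
  fluxctl sync --k8s-fwd-ns flux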

k8s-releases-mettle

This repository simply contains the HelmRelease definitions for every microservice that makes up Mettle, one per environment. At present, we deploy ~100 microservices per environment. A snippet from one of the HelmReleases can be seen below:

apiVersion: helm.fluxcd.io/v1
kind: HelmRelease
metadata:
  annotations:
    flux.weave.works/automated: "true"
  name: account-balance
  namespace: eevee
spec:
  chart:
    name: backend
    repository: http://k8s-helm-charts.flux.svc.cluster.local
    version: 2.0.43
  releaseName: account-balance

All our backend microservices are derived from one of two backend-specific Helm charts, which ensures consistency around things such as our labelling taxonomy. As with kubernetes-resources, we would have a directory per environment containing all the HelmRelease definitions.
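As a rough illustration (the real chart isn't shown in this post), the shared chart is what guarantees that every service carries the same labels, such as the app_name and app_version labels that our rollout script selects on later:

  # Hypothetical snippet from the shared backend chart's deployment template
  metadata:
    labels:
      app_name: {{ .Release.Name }}
      app_version: {{ .Values.application.image.tag }}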

Adding Kustomize into the mix

In the early days we had a lot of duplication between environments, both within kubernetes-resources and k8s-releases-mettle, and this is when we started to look closely at Kustomize (https://github.com/kubernetes-sigs/kustomize).

The Platform team started with kubernetes-resources, since we wanted to prove Kustomize out ourselves without impacting the engineering self-service workflow. Additionally, we don't see many changes to the base resources, so we felt comfortable using this repository as the testbed.

We started by creating the following directories:

.
└── kustomize
    ├── base
    ├── dev
    ├── prd
    ├── sbx
    └── stg

The base directory

Everything contained in kustomize/base is non-environment-specific configuration, aligned around areas of the cluster (e.g. ingress or single sign-on). An example of this can be seen below:

cluster
├── helmreleases
├── namespaces
├── priorityclasses
├── psps
├── rbac
└── storageclasses

An example HelmRelease in kustomize/base looks like this:

---
apiVersion: helm.fluxcd.io/v1
kind: HelmRelease
metadata:
  annotations:
    flux.weave.works/automated: "false"
    flux.weave.works/tag.chart-image: semver:~1.17
  name: cluster-autoscaler
  namespace: kube-system
spec:
  chart:
    name: cluster-autoscaler
    repository: https://kubernetes-charts.storage.googleapis.com/
    version: 6.3.0
  releaseName: cluster-autoscaler
  values:
    autoDiscovery:
      tags:
        - k8s.io/cluster-autoscaler/enabled
    awsRegion: eu-west-2
    image:
      repository: k8s.gcr.io/cluster-autoscaler
      tag: "v1.17.0"
    nodeSelector:
      node.kubernetes.io/role: critical
    podAnnotations:
      iam.amazonaws.com/role: cluster-autoscaler
    priorityClassName: system-cluster-critical
    tolerations:
      - key: node.kubernetes.io/role
        value: critical
        operator: Equal
        effect: NoSchedule

Note how no environment-specific ingress annotations are applied here.

The “environment-specific” directories

The environment-specific directory structure mirrors the structure of the base directory; however, it only contains the directories its environment requires. For example, every environment inherits the base/cluster directory, but not necessarily the tools directory. Let's take a deeper look into the makeup of an environment directory…

At the top level we have a single kustomization.yaml file which references the sub-directories within it:

apiVersion: kustomize.config.k8s.io/v1beta1
kind: Kustomization
bases:
  - cert-manager
  - cluster

Inside these sub-directories sit the following directories and files:

cluster
├── helmreleases
└── kustomization.yaml

This kustomization.yaml is slightly different: it inherits from the corresponding directory in base, but this is where we patch in the environment-specific configuration.

apiVersion: kustomize.config.k8s.io/v1beta1
kind: Kustomization
bases:
  - ../../base/cluster
patches:
  - helmreleases/cluster-autoscaler.yaml
  - helmreleases/kiam.yaml

The patch for cluster-autoscaler simply looks like this:

apiVersion: helm.fluxcd.io/v1
kind: HelmRelease
metadata:
  name: cluster-autoscaler
  namespace: kube-system
spec:
  values:
    autoDiscovery:
      clusterName: dev

The use of “kustomize build”

To see what Flux is going to reconcile we can execute the following:

  kustomize build kustomize/dev > dev.yaml

The above will create a file containing all the resources to be deployed to the cluster with the environment-specific patches also applied.

What does this give us?

This keeps roughly 95% of the resource definitions in kustomize/base and allows us to easily see the differences between environments.
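For example, one quick way to inspect the drift between two environments:

  # Compare the fully rendered output of two environments
  diff <(kustomize build kustomize/dev) <(kustomize build kustomize/stg)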

Additionally, it means changes made directly to base are propagated across all environments (e.g. upgrading to a new chart version in a HelmRelease).

Conversely, we can easily make a change to a single environment for testing, such as upgrading the version of a chart it uses.

Importantly, it allows us to validate in CI that the resources to be applied to the cluster are within spec, before they ever reach the cluster. We leverage both Rego policies and strict schema linting using Kubeval.
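A sketch of what the schema-linting step can look like in CI (the exact pipeline configuration is omitted here):

  # Render an environment and strictly validate every manifest;
  # custom resources without published schemas are skipped
  kustomize build kustomize/dev | kubeval --strict --ignore-missing-schemas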

k8s-releases-mettle

The Platform team then worked with the engineering teams to move k8s-releases-mettle to Kustomize, since we had already learned the lessons. They aligned to the same top-level directory structure as us:

.
└── kustomize
    ├── base
    ├── dev
    ├── prd
    ├── sbx
    └── stg

Their sub-directory structure is similar to kubernetes-resources, but is scoped at the namespace level instead. The engineers defined this structure themselves, as they are the ones responsible for the repository, so it needs to make sense to them.
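As a purely hypothetical illustration (the real layout is the engineers' own), a namespace-scoped sub-directory might look like this, reusing the eevee namespace from the snippets in this post:

eevee
├── helmreleases
└── kustomization.yaml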

Application Promotion

I think it's important to talk about how microservice versions are promoted through our environments.

Let’s start by looking at a HelmRelease in the base directory of k8s-releases-mettle:

apiVersion: helm.fluxcd.io/v1
kind: HelmRelease
metadata:
  annotations:
    flux.weave.works/automated: "true"
  name: account-balance
  namespace: eevee
spec:
  chart:
    name: backend
    repository: http://k8s-helm-charts.flux.svc.cluster.local
  releaseName: account-balance
  values:
    application:
      replicaCount: 3
      image:
        repository: quay.io/example/account-balance

The key things to note above are that the chart version and image tag aren't specified, as those are environment-specific pieces of configuration.

Now let’s look at the corresponding HelmRelease in the dev directory:

apiVersion: helm.fluxcd.io/v1
kind: HelmRelease
metadata:
  annotations:
    flux.weave.works/tag.application: 'glob:dev-*'
  name: account-balance
  namespace: eevee
spec:
  values:
    dependsOn:
      schemaRegistry:
        enabled: true
    application:
      image:
        tag: dev-305a3cf56ef6d9505838bdf779e4173f0bad25jg
  chart:
    version: 2.0.34

The important part here is the flux.weave.works/tag.application annotation, which looks specifically for image tags starting with dev-. This is how we promote new images to environments: we re-tag images with the environment prefix before the commit SHA (see the sketch below).
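A sketch of what that promotion step could look like; the image name and SHA are illustrative, and the real pipeline may use different tooling:

  # Re-tag the candidate image with the environment prefix so the
  # glob:dev-* policy picks it up, then push it back to the registry
  docker pull quay.io/example/account-balance:305a3cf
  docker tag quay.io/example/account-balance:305a3cf quay.io/example/account-balance:dev-305a3cf
  docker push quay.io/example/account-balance:dev-305a3cf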

The engineers then have a script in Concourse which uses kubectl to wait until the release is successful:

echo "Testing kubeconfig works against environment";

echo "Checking for replica set creation" attempt_counter=1;

max_attempts=60;



until [[ $(kubectl get replicaset -n "${NAMESPACE}" -l app_name="${SERVICE}",app_version="${APP_VERSION}" | wc -l) -gt 1 ]]; do if [ ${attempt_counter} -eq ${max_attempts} ]; then

echo "Max attempts reached";

exit 1;

fi

attempt_counter=$(($attempt_counter+1));

sleep 10;

done; echo 'Replica OK';



echo 'Waiting for deployment to rollout';

kubectl rollout status -n "$NAMESPACE" deployment/"$SERVICE" --watch=true --timeout=10m;

Summary

In summary, leveraging GitOps has allowed us to create a self-service platform that lets engineers concentrate on delivering business value without needing constant Platform team assistance. They focus on building container images and managing the testing of their microservices, and Flux handles the deployments.

What is left to do?

This journey is not yet complete, and there are a few changes we still need to make:

Out-of-cluster Helm registry

Currently, when making changes to our custom Helm charts we have to wait for the new registry image to be deployed across our clusters. This takes time and ends up with HelmReleases failing because the specified chart version is not yet present inside the registry container's image.

Therefore, we are going to move our Helm registry into a Google Cloud Platform bucket. This has several benefits for us:

- Unlimited storage at a relatively inexpensive cost.
- Immediate availability of new Helm chart versions in all environments.

Finally, on a cluster rebuild it means that HelmReleases will not end up in a pending or failed state because the Helm registry image has not yet been deployed.
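A sketch of what publishing a chart version to a bucket-backed repository could look like; the bucket name and chart version are illustrative:

  # Package the chart, regenerate the repository index, and upload both
  # (helm repo index supports --merge to preserve existing entries)
  helm package charts/backend
  helm repo index . --url https://storage.googleapis.com/mettle-helm-charts
  gsutil cp backend-2.0.44.tgz index.yaml gs://mettle-helm-charts/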

Helm v3

We still need to upgrade to Helm 3, and we will be looking to complete this in the coming week or two. Luckily, the HelmRelease resource makes this trivial, and I will make sure to write another blog post about how we managed it.
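From what I have seen of the Helm operator, the switch can be as small as setting the helmVersion field on each HelmRelease; a sketch:

  spec:
    # Tell the Helm operator to reconcile this release with Helm v3
    helmVersion: v3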

Shoutouts

Firstly, I would like to personally thank Stefan Prodan (https://twitter.com/stefanprodan) for his help, past and continued; he is always willing to listen to my ideas and advise where possible. It's a pleasure, my man!

More widely, I would like to thank the Flux community for their continued assistance throughout this journey. I hope we at Mettle, and I specifically, can continue to give back to the community the lessons we have learned and continue to learn.

Now it's time to try it for yourself 😊

After lots of questions about how I would structure a GitOps repository, I created an example which can be found at https://github.com/swade1987/gitops-with-kustomize.

Additionally, I would highly recommend checking out https://github.com/fluxcd/multi-tenancy.