Every Kubernetes cluster needs tooling to keep everything up and running. Taking backups, monitoring, and keeping everything clean is a must. Security improves when you automatically protect your cluster from inter-namespace communication and control what gets deployed by enforcing admission control. I will focus on a few topics: cleaning, robustness, redundancy, security, and oversight.

Every person needs to have proper tooling to increase efficiency. (Photo by Fleur on Unsplash)

A few of the tools I'm going to discuss I made myself; I'll refer you to my GitHub, where you can explore all of the code involved. This list is obviously not exhaustive, but it will get you started. I'm omitting monitoring and alerting tools like Kibana, Prometheus, Alertmanager, Grafana, etc., and focusing instead on small tools that can be executed as Jobs and recurring CronJobs.

Cleaning

Everyone likes to live in a clean house, so why wouldn't you keep your Kubernetes cluster clean by running some cleanup CronJobs?

Elasticsearch Curator

If you run an EFK stack (Elasticsearch, Fluentd, Kibana), you will need Curator. Curator removes old indices and thus keeps storage space free for the most recent logs. You can, for example, set it to delete indices older than a week, so the most recent logging is always visible in Kibana.
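As a sketch, a Curator action file for the week-long retention described above could look like this (the `logstash-` index prefix and the date format in the index names are assumptions; adjust them to however your EFK stack names its indices):

```yaml
actions:
  1:
    action: delete_indices
    description: Delete indices older than 7 days
    options:
      ignore_empty_list: True
    filters:
      - filtertype: pattern
        kind: prefix
        value: logstash-          # assumed index prefix
      - filtertype: age
        source: name
        direction: older
        timestring: '%Y.%m.%d'    # assumed date pattern in index names
        unit: days
        unit_count: 7
```

Running this from a CronJob once a day keeps the retention window rolling automatically.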

Docker Registry garbage collect (GitHub)

One of the things most people with a Docker registry struggle with is maintenance. It keeps growing, and it's not that straightforward to remove old images. I made two CronJobs that mark images for deletion and afterwards garbage-collect them. They use Deckschrubber to annotate which images are ready for deletion.
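The garbage-collection step itself boils down to the registry's built-in command, run inside the registry container after the marking step has deleted the image manifests (the config path shown is the registry image's default; adjust it to your deployment):

```shell
# Dry-run first to see which blobs would be removed
registry garbage-collect --dry-run /etc/docker/registry/config.yml

# Then actually delete blobs no longer referenced by any manifest
registry garbage-collect /etc/docker/registry/config.yml
```

Note that garbage collection only reclaims space for blobs whose manifests have already been deleted, which is exactly why the marking CronJob has to run first.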

Cleanup completed pods (GitHub)

Once in a while, you'd like a clean namespace with just the currently running pods. If a namespace contains a lot of CronJobs, it will soon get cluttered with completed pods. You can always remove those with the following kubectl command, but you'll want to automate it.

kubectl delete pods --field-selector=status.phase=Succeeded --all-namespaces
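A minimal way to automate that command is to run it from a CronJob itself. This is a sketch: the service account name and kubectl image are assumptions, and the service account needs RBAC permission to list and delete pods cluster-wide; on older clusters the apiVersion may be batch/v1beta1.

```yaml
apiVersion: batch/v1
kind: CronJob
metadata:
  name: cleanup-completed-pods
spec:
  schedule: "0 3 * * *"                    # every night at 03:00
  jobTemplate:
    spec:
      template:
        spec:
          serviceAccountName: pod-cleaner  # assumed; needs list/delete on pods
          restartPolicy: Never
          containers:
            - name: kubectl
              image: bitnami/kubectl:latest   # assumed kubectl image
              args:
                - delete
                - pods
                - --field-selector=status.phase=Succeeded
                - --all-namespaces
```

You could extend the field selector to `status.phase=Failed` in a second run if you also want failed pods gone after inspecting them.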

Robustness

Anything that can happen will happen eventually, so you'd better be sure that your cluster can handle some failures and disruptions.

Chaoskube

Every once in a while you will have to cope with failing pods. Chaoskube is derived from Netflix's Chaos Monkey and specifically designed for Kubernetes clusters. At a specified interval it kills a random pod in your cluster, without telling you which one. If you set this aggressively, at a kill interval of 10 minutes, and don't see any alerts about downtime of your applications, you can rest assured that your cluster can handle some failures. For more extensive failure testing, I would advise Netflix's Simian Army.
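The aggressive 10-minute setup described above maps onto chaoskube's flags roughly like this (chaoskube runs in dry-run mode by default, so it only logs its victims until you arm it; the namespace restriction is an assumption you may want while building confidence):

```shell
chaoskube \
  --interval=10m \
  --no-dry-run \
  --namespaces=default   # assumed: limit the blast radius to one namespace
```

Starting without `--no-dry-run` and reading the logs for a day or two is a sensible way to see what it would have killed before letting it loose.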

Redundancy

Data loss can happen to anyone; having backups for when everything goes south will let you sleep well at night.

Postgres snapshots (GitHub)

Postgres backups are much needed when migrations go sideways. If you have a Postgres cluster, say Stolon, running in your Kubernetes cluster, you will want automated backups. They have saved me from disaster multiple times.
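The core of such a backup job can be a simple pipeline run on a schedule; everything below (connection string, user, database, bucket name) is a placeholder for your own setup:

```shell
# Nightly logical backup, compressed and streamed straight to S3
pg_dump "postgres://backup_user@stolon-proxy:5432/mydb" \
  | gzip \
  | aws s3 cp - "s3://my-backups/postgres/mydb-$(date +%F).sql.gz"
```

Streaming through stdin (`aws s3 cp -`) avoids needing local disk space for the dump, which matters when the job runs in a small container.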

Bitbucket backups (GitHub)

If you host all your projects on, for example, Bitbucket, you'll want a copy under your own control in case Bitbucket experiences unrecoverable data loss (however unlikely). An automated script syncs everything to AWS S3, so if the unthinkable happens, you'll have a fallback.
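Per repository, such a sync boils down to two commands; the repository, team, and bucket names here are placeholders, and in practice you would loop over a repository list fetched from the Bitbucket API:

```shell
# A bare mirror clone captures all branches and tags
git clone --mirror git@bitbucket.org:myteam/myrepo.git myrepo.git

# Push the bare clone to S3; subsequent runs only upload changes
aws s3 sync myrepo.git "s3://my-backups/bitbucket/myrepo.git"
```

Using `--mirror` rather than a plain clone matters: it preserves every ref, so a restore gives you back the complete repository, not just the default branch.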

Zookeeper Burry

Burry is a tool that backs up infrastructure services such as ZooKeeper, which stores the state of your Kafka cluster. Losing the ZooKeeper configuration of your Kafka cluster would be a disaster; regular Burry backups help you recover that configuration and greatly reduce the risk of losing it for good.

Security

Limiting access to certain parts of your cluster and enforcing admission constraints will help you guard its integrity.

Namespace policies (GitHub)

Multi-tenancy is often desired in a shared Kubernetes cluster. By default, each namespace can reach resources in every other namespace; this can be blocked with network policies. I made a Kubernetes event listener that reacts to namespace creation and automatically adds a network policy blocking incoming traffic from other namespaces. You can find useful background on this here. Check out my GitHub repository for all the necessary deployment files.
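The policy such a listener applies can be a standard NetworkPolicy along these lines (the namespace name is a placeholder; the listener would fill it in for each newly created namespace):

```yaml
apiVersion: networking.k8s.io/v1
kind: NetworkPolicy
metadata:
  name: deny-from-other-namespaces
  namespace: my-namespace       # placeholder: set per created namespace
spec:
  podSelector: {}               # selects every pod in this namespace
  ingress:
    - from:
        - podSelector: {}       # only allow traffic from the same namespace
```

Because the `from` clause uses a pod selector without a namespace selector, it matches only pods in the policy's own namespace, which is exactly the cross-namespace isolation described above. Note that this requires a CNI plugin that actually enforces network policies, such as Calico.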

Let's Encrypt certbot (GitHub)

Using HTTPS for your applications is a must nowadays. This can happen automatically by specifying annotations on your Kubernetes Ingress objects. The tool then creates AWS Route53 entries, requests a certificate from Let's Encrypt, and stores it as a Kubernetes Secret, which can be used by an Ingress controller such as Traefik.
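The shape of such an Ingress looks roughly like this. The annotation key shown is purely illustrative, as is every name in the manifest; check the repository for the exact annotation the tool watches:

```yaml
apiVersion: networking.k8s.io/v1
kind: Ingress
metadata:
  name: my-app
  annotations:
    example.com/letsencrypt: "true"   # illustrative; use the tool's real key
spec:
  tls:
    - hosts: [app.example.com]
      secretName: app-example-com-tls # Secret the certbot job populates
  rules:
    - host: app.example.com
      http:
        paths:
          - path: /
            pathType: Prefix
            backend:
              service:
                name: my-app
                port:
                  number: 80
```

The controller serves plain HTTP until the Secret exists, after which the TLS section takes effect without further changes to the Ingress.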

AWS audit

Clusters running on AWS usually carry a lot of configuration you're not keeping track of. Running an AWS audit will expose issues and ensure you're at least aware of every possible flaw and intrusion path. I used Prowler for this and made a custom Dockerfile with aha, which transforms the shell output into an HTML file you can view in your browser.
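The Prowler-to-HTML step is a one-line pipeline, since aha converts ANSI-coloured terminal output into HTML (assuming AWS credentials are already configured in the environment, as they would be inside the custom image):

```shell
# Run the audit and turn the coloured terminal report into a web page
prowler | aha > prowler-report.html
```

Run from a CronJob that uploads the resulting file somewhere browsable, this gives you a recurring, readable audit without anyone having to remember to run it.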

Gatekeeper

Admission control on your cluster is much desired. Feel free to check out my other Medium post, which goes more in-depth on Gatekeeper and how you can enforce constraints. With Gatekeeper you can control admission to your cluster and, for example, block deployments that don't specify resource requests and limits, or that lack certain tags.
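Once a ConstraintTemplate with the matching Rego is installed, enforcing it is a matter of creating a Constraint like the sketch below; the `K8sRequiredResources` kind name is illustrative and comes from the template you define, not from Gatekeeper itself:

```yaml
apiVersion: constraints.gatekeeper.sh/v1beta1
kind: K8sRequiredResources        # illustrative kind from your own template
metadata:
  name: must-set-requests-and-limits
spec:
  match:
    kinds:
      - apiGroups: ["apps"]
        kinds: ["Deployment"]     # only evaluate Deployments
```

The split between template (the policy logic) and constraint (where it applies) is what makes Gatekeeper policies reusable across namespaces and resource kinds.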

Oversight

Sometimes you just need some oversight, nothing more, nothing less: keeping track of what's going on in your Kubernetes cluster.

Komiser

Visualizing your AWS configuration, EC2 instances, billing, AWS Lambda calls, etc., helps you keep track of your expenses and the physical locations of your EC2 instances. Komiser works with AWS or Google Cloud and does exactly this, providing a UI on top of your cloud infrastructure.

Kube resource report

Keeping track of resource usage is extremely important and can reduce your running costs a lot. Kube-resource-report produces a detailed report of each namespace and each pod with their current CPU and memory usage. By fine-tuning your requests you can save a lot on your monthly bill. Feel free to check out my Medium post about things I learned running a Kubernetes cluster and the importance of setting requests and limits on my resources.
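The tuning itself happens in the container spec; the values below are placeholders you'd derive from what the report shows your pods actually use:

```yaml
resources:
  requests:            # what the scheduler reserves; the basis for your costs
    cpu: 100m          # placeholder: set to observed baseline usage
    memory: 128Mi
  limits:              # hard ceiling: CPU is throttled, memory is OOM-killed
    cpu: 500m
    memory: 256Mi
```

Requests that are far above real usage are the usual source of waste the report exposes: the scheduler reserves capacity (and you pay for nodes) based on requests, not on actual consumption.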

Docker registry UI

Getting the last pushed image out of your registry isn't easy if you're not familiar with the Docker Registry API. A dedicated UI helps visualize your Docker images, namespaces, and tags. I used the Docker Registry UI by Quiq, which can be found on this GitHub page and is written in Go. It's the smoothest registry UI I have tried so far.