A few chart tricks I wish I knew when I got started with K8S. They completely changed our infrastructure automation philosophy.

From Basic to Advanced

In most application chart implementations, besides the deployment YAML, we may include an Ingress, ConfigMap or Secret. Most of us will stop there and move on to other challenges. However, a Helm chart is more powerful than we might think, since it opens the door to a whole self-service ecosystem powered by Kubernetes.
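
At this basic level, an application chart might be laid out as below (a hypothetical layout; the chart name and file names are illustrative):

```yaml
# mychart/                 <- chart root (illustrative name)
#   Chart.yaml             <- chart metadata
#   values.yaml            <- default configuration values
#   templates/
#     deployment.yaml      <- the workload itself
#     service.yaml
#     ingress.yaml         <- plus the usual suspects:
#     configmap.yaml       <- Ingress, ConfigMap, Secret
#     secret.yaml
```

The rest of this post is about what becomes possible once we go beyond this layout.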

Background

There is a constant battle between the product development team (Dev) and the DevOps team. The Dev team wants to deliver features as fast as possible without waiting for infrastructure changes; the DevOps team wants to guarantee that the best possible infrastructure solutions are chosen proactively.

I really enjoy this game, since the process encourages healthy competition and forces both Dev and DevOps to up their games.

Goal

“Dev should never wait for DevOps.” Instead, DevOps should foresee and solve upcoming microservice and infrastructure concerns (as below) before the Dev team needs them.

Microservice concerns (image source: Red Hat)

Before Kubernetes, there wasn’t an easy or elegant solution to achieve this.

Luckily with Kubernetes, we have the foundation and the necessary tools for this game.

By adopting Istio, we can take our DevOps experience to the next level. Istio’s features are built from the ground up to solve infrastructure and microservice concerns.

By expanding and customizing our CRD collection with Operators/Controllers, and using Helm charts to unite them all, we can unleash a lot more power.

If you are interested in building a Super Saiyan chart and letting it do all the work for you in a self-service manner, please continue reading to find out how.

Dragon Ball Super Saiyan Goku forms

Evolving Landscape

Looking back over the past 5+ years, the landscape has evolved in several phases:

1. Traditional application servers
2. Dockerized microservices: traditionally Spring Boot + Netflix OSS, with limited microservice capabilities (So yesterday! 😜)
3. Kubernetes/Mesos/Nomad/Swarm for container orchestration
4. Kubernetes with a service mesh: Istio, Linkerd, etc. (Yes, K8S is the winner of the container orchestration war!)

The phases above mirror my own journey. Four years ago, Mesos was more mature and provided more features/frameworks than Kubernetes, such as the Marathon scheduler and the Kafka API framework. A lot of teams adopted this solution; for example, in 2015, Apple rebuilt the Siri backend with Apache Mesos.

However, over the past 2–3 years, Kubernetes has beaten all the other competitors and become the obvious winner of this container orchestration war.

Luckily, since all of our applications and infrastructure components were already container-based inside Mesos, it was very easy to migrate them to Kubernetes, especially with its packaging system: Helm charts.

To be honest, in the beginning, a Helm chart was just another packaging system to me. However, the more I used it, the more power it revealed, which completely changed my philosophy of automation.

A Super Saiyan Chart

Nowadays, everyone is adopting “Pipeline as Code”. Most CI/CD tools support this feature, including Tekton (Knative pipelines), Concourse CI, Jenkins, etc. This is often used together with “Infrastructure as Code”, meaning that inside your pipeline, some tasks may invoke Terraform or Ansible modules for infrastructure-related work.

We tend to use these infrastructure pipelines for various kinds of tasks, such as managing queues, databases, file storage, object storage, secret storage, API routing, message routing, etc. Among them, here are some shared concerns for both application and infrastructure:

Database automation: DDL/DML/DCL managed by Flyway, Liquibase, etc.

API routing: add/update REST API paths, e.g. /api/blogs

Traffic management: canary release, dark launch, zero-downtime deployment, blue-green deployment

Queue: message queue creation for a new app (AWS SQS, GCP Pub/Sub)

Message routing: message filtering with subscription filters (AWS SNS)

Object/file storage: S3 or EFS/NFS folder creation

Authentication and authorization: secure and restrict service-to-service traffic
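
As a sketch, these shared concerns can be surfaced to Dev teams as simple values in the application chart, so each team declares what it needs and the chart does the rest. All keys below are hypothetical, for illustration only, not from any real chart:

```yaml
# values.yaml (hypothetical self-service keys)
api:
  path: /api/blogs          # API routing concern
database:
  migrationTool: flyway     # database automation concern
queue:
  create: true              # queue concern
  provider: sqs
storage:
  s3Bucket: blogs-assets    # object storage concern
```

The sections below show several mechanisms (CRDs, init containers, K8S Jobs, operators) that can turn values like these into real infrastructure.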

Formerly, for microservices inside the same environment, we handled these tasks inside an infrastructure pipeline. The problem is that these pipelines grow quickly over time, especially when you have hundreds of microservices (or thousands of them, like Uber). Eventually, they become a monolith and take a long time to run for each requirement in each environment.

In this microservices world, things evolve a lot faster, changes are needed even more frequently, for example:

New microservices come and join the cluster all the time.

Obsolete microservices get deprecated when no longer needed

Cloud-resource requirements change on the fly (e.g. migrating from S3 to EFS)

Security requirements change dynamically and proactively (e.g. authorization for a new API)

If we have to keep updating the pipeline for each new requirement, testing it, and promoting it across all environments, DevOps can quickly become a bottleneck.

Luckily, with Helm charts, we can combine several powerful solutions to solve these problems.

Multiple Solutions

CRD based solution

A custom resource is an extension of the Kubernetes API that is not necessarily available in a default Kubernetes installation. It represents a customization of a particular Kubernetes installation. However, many core Kubernetes functions are now built using custom resources, making Kubernetes more modular.

In most cases, we leverage built-in K8S resources such as Ingress, ConfigMap or Secret. To make things more powerful and intelligent, we also use optional CRDs that are installed together with powerful tools like Istio, Gloo, Knative Serving, Knative Eventing, etc. Here are some common CRD usage examples:
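
For instance, Istio’s VirtualService CRD can express a canary release declaratively inside the app chart. A minimal sketch, assuming Istio is installed and DestinationRule subsets v1/v2 already exist; the host name and weights are illustrative:

```yaml
apiVersion: networking.istio.io/v1alpha3
kind: VirtualService
metadata:
  name: blogs
spec:
  hosts:
  - blogs                 # in-mesh service host
  http:
  - route:
    - destination:
        host: blogs
        subset: v1        # stable version keeps 90% of traffic
      weight: 90
    - destination:
        host: blogs
        subset: v2        # canary version receives 10%
      weight: 10
```

Because this is just another template in the chart, the Dev team can adjust the canary weights themselves without touching any pipeline.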

Init-Container based solution

A Pod can have multiple containers running apps within it, but it can also have one or more init containers, which run before the app containers are started. During the startup of a Pod, each init container starts in order, after the network and volumes are initialized. Each init container must exit successfully before the next one starts.

Tasks like the ones below can easily fit into an init container. Besides, we can easily manage the sequential order of each step when needed. Examples:

Database automation: DDL and DML changes (flyway/liquibase)

App-specific Pod privileges(AWS IAM+Kube2IAM for permissions).

App-specific Storage creation(AWS S3)

App-specific Message queue subscription(AWS SNS/SQS)
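
A sketch of the database automation case: a Deployment whose pod runs a Flyway init container before the app starts. The image tags, service names and env values are illustrative assumptions:

```yaml
apiVersion: apps/v1
kind: Deployment
metadata:
  name: blogs
spec:
  replicas: 1
  selector:
    matchLabels:
      app: blogs
  template:
    metadata:
      labels:
        app: blogs
    spec:
      initContainers:
      - name: db-migration
        image: flyway/flyway:6     # runs schema migrations before the app container starts
        args: ["migrate"]
        env:
        - name: FLYWAY_URL
          value: jdbc:postgresql://db:5432/blogs
      containers:
      - name: blogs
        image: example/blogs:1.0   # app container only starts after migration succeeds
```

Note that with multiple replicas, each pod runs this init container, which is why the migrations must be idempotent (Flyway handles this via its schema history table).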

Kubernetes-job based solution

Compared to init containers, a Kubernetes Job has its own advantages. Although both run to completion, a K8S Job is guaranteed to run only once. With init containers, if your ReplicaSet has multiple pods, each pod runs its init containers, so we have to be careful to make sure they are always idempotent.

Besides, chart hooks can be used on K8S Jobs to control the execution cycle and the weight of the job execution, as in the code section below.

The execution cycle is straightforward from its name, such as pre-install, post-install and pre-delete.

The weight is helpful to build a deterministic execution order. For each cycle, Tiller sorts hooks by weight and then by name (for hooks with the same weight) in ascending order. For example, weights -5, 0 and 5 are executed in that order.

apiVersion: batch/v1
kind: Job
metadata:
  name: my-job-preinstall-first-to-run
  annotations:
    "helm.sh/hook": pre-install
    "helm.sh/hook-weight": "-5"

Tip: Make sure to clean up the jobs before the next Helm deploy, since duplicate jobs are not allowed. Chart hooks can be used to achieve this as well, such as a pre-install hook that deletes the previous job.
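
One concrete way to handle the cleanup is Helm’s hook delete policy annotation, which removes the previous hook resource before a new one is launched. The annotations below are real Helm hook annotations; which policy value fits best depends on your workflow:

```yaml
metadata:
  annotations:
    "helm.sh/hook": pre-install
    "helm.sh/hook-weight": "-5"
    "helm.sh/hook-delete-policy": before-hook-creation  # delete the old job before creating the new one
```

With this in place, repeated Helm deploys no longer fail on the duplicate-job name.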

Operator and Custom-CRD based solution

To take it to the next level, we define our own CRDs to empower developers’ application charts to manage their own cloud resources, such as the SNS topics and SQS queues mentioned in the shared concerns above. Compared to both a K8S Job and an init container, a CRD inside the app chart is cleaner, more consistent and friendlier to Dev teams.

There are several popular frameworks that are quite handy. They are all opinionated in their design, so pick your favorite.

OperatorHub is an awesome place for the K8S community to share operators; make sure to check it out before you write your own. Possibly someone has already created one.
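
As a sketch, a custom Queue resource inside the app chart could look like the manifest below. The apiVersion, kind and fields are entirely hypothetical; the idea is that an in-house operator watches for these objects and creates the actual SQS queue on the team’s behalf:

```yaml
apiVersion: mycompany.example.com/v1   # hypothetical custom API group
kind: Queue                            # hypothetical custom kind
metadata:
  name: blogs-events
spec:
  provider: sqs                        # the operator maps this to AWS SQS
  visibilityTimeoutSeconds: 30
```

The Dev team just ships this file in their chart; creation, updates and deletion of the cloud resource follow the lifecycle of the release.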

A Chart of Two Gateways

Ingram Pinn’s version

Here is a recent use case: I was migrating our API gateway (Ingress controller) from the Nginx Ingress controller to the Gloo API Gateway, which is powered by Envoy. It gives us a lot of useful features, including canary release, dark launch, lightweight direct Knative support (if Istio is not installed), CloudEvent support (invoking Lambda from Gloo), etc.

If you are also interested in learning more about Ingress controllers, I have a blog post here with more details.

However, I wanted to do this in a smoother and safer way, meaning I only wanted to deploy the Gloo API Gateway in some of the lower environments and test it out thoroughly before promoting it to higher environments.

Instead of adding messy conditional logic in the pipeline or application chart, we keep both our existing Ingress file and the new Gloo VirtualService file, which are both used to define API routing rules. The trick here is that only one of them gets activated, and which one depends on which Ingress controller is running. Only when we finish all the chaos testing and are ready to promote the Gloo gateway will we do so, at our own pace, without affecting any other Dev teams.
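
Concretely, the chart’s templates directory simply ships both routing definitions side by side; the file names below are illustrative:

```yaml
# templates/
#   deployment.yaml
#   service.yaml
#   ingress.yaml         <- honored by the Nginx Ingress controller
#   virtualservice.yaml  <- honored by Gloo, once its CRDs are installed
```

Each controller only acts on the resource kinds it watches, so the inactive file is harmless.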

Tip: Make sure all the Gloo CRDs are deployed to avoid warnings

Show me the code

Built-in K8S Ingress resource for the Nginx Ingress controller

apiVersion: extensions/v1beta1
kind: Ingress
metadata:
  name: blogs
spec:
  rules:
  - host: "example.com"
    http:
      paths:
      - path: /api/blogs
        backend:
          serviceName: blogs
          servicePort: 8080

Gloo VirtualService CRD for Gloo API Gateway

apiVersion: gateway.solo.io/v1
kind: VirtualService
metadata:
  name: blogs
  namespace: dev
spec:
  displayName: blogs
  virtualHost:
    domains:
    - "example.com"
    name: "dev.blogs"
    routes:
    - matcher:
        prefix: "/api/blogs"
      routeAction:
        single:
          upstream:
            name: "dev-blogs-8080"
            namespace: dev
      routePlugins:
        timeout: 100

Summary

Well, that’s it. We went through several usage patterns of Helm charts that empower applications to be more autonomous and intelligent. Some may work for you, some may not fit your use cases. However, I hope this was informative and helps you come up with other creative ideas for your own needs.