Last Updated on August 4, 2019


In this blog we’ll compare a bunch of methods that can be used to manage installing Helm charts onto your Kubernetes cluster(s).

We’ll look at Helmfile, Helmsman, Pulumi, Weave Flux, Faros, Eunomia, Helm Controller, Chart Operator, Keel and the Terraform Helm Provider.

Context and caveats: it’s hard to write these posts concisely unless some limits are set. We’re assuming that you want to use Helm charts for packaging and installing your apps and control plane services. The discussion of whether Helm is worth using at all, and of Helm vs Kustomize vs Kapitan vs other projects, will appear in another blog.

This blog contains some theory explaining some of the shortcomings of Helm, which helped to inform the comparison and recommendation below. If you’d prefer to skip the theory, jump straight to the Comparison section.

The issue of state

Conceptually we can group each deployment mechanism into a distinct bucket depending upon where state lives and what wins when there is a conflict.

I’ll group the solutions I’ve tried loosely into these categories:

Pipeline

A binary or Docker container that runs inside a pipeline using standard Helm commands, or something fancier like Helmfile, Helmsman or Pulumi to manage charts and diffs between current and desired state. A hybrid option is using Helm to install your charts with annotations that Keel watches, handling subsequent updates based on polling a registry or a webhook trigger.

GitOps

Operators like Weave Flux, Faros and Eunomia that watch a Git repo for changes and handle the logic of how to apply them to the cluster. Some GitOps operators use CRDs for some config, but essentially this category treats a Git repo as its source of truth.

CRD

Operators like Helm Controller or Chart Operator that watch CRDs and handle the logic of how to apply them to the cluster. A Custom Resource Definition (CRD) is a resource that lives inside the Kubernetes cluster and extends the Kubernetes API.

A total mess

This covers things like the Terraform Helm Provider, where state is a massive mess that lives between Terraform remote state, what Helm says is installed on the cluster (which may not match actual cluster state) and what you have defined in code.

What state wins?

The source of truth should always be code that is deployed through a pipeline. However, the reality is that the source of truth for Helm chart deployments on a running cluster varies between deployment mechanism.

When you have state in multiple places you need to diff it and somehow decide what wins and how to reconcile. State already exists on the cluster in Etcd, plus in Tiller (by default). For every additional place that state is stored, the complexity and the number of subtle edge cases rise by an order of magnitude.

Kubernetes clusters are reasonably prone to manual tinkering. It isn’t reasonable to assume that resources deployed and managed by whatever chart deployment mechanism you choose will never be modified outside of its purview.

This causes problems ranging from mildly annoying pipeline errors all the way up to total cluster outages when your deployment mechanism decides it needs to remove everything and then fails to reinstall it.

2-way vs 3-way merge patching

Ever wondered why kubectl is so friendly about applying new config on top of old, even in the face of a bit of tinkering in between? It’s because it uses a 3-way merge patch strategy. When you change something it looks at the current live state, the last applied version and the new version, then works out how to merge them in such a way that happiness is ensured. It does this by setting annotations on resources.
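
To see the mechanism in action, you can inspect the annotation kubectl uses to remember the last applied version. A quick sketch (the resource and file names are placeholders):

```shell
# First apply records the full manifest in an annotation on the object.
kubectl apply -f deployment.yaml

# Inspect the stored "last applied" copy that enables the 3-way merge.
kubectl get deployment my-app \
  -o jsonpath='{.metadata.annotations.kubectl\.kubernetes\.io/last-applied-configuration}'

# Subsequent applies diff three things: live state, the annotation above
# (last applied) and the new manifest, then patch only what kubectl owns.
kubectl apply -f deployment.yaml
```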

Helm (current version 2) decided that it didn’t want to touch user-created or user-modified resources, which means only a 2-way merge (old manifest vs new manifest) is possible. This will change in Helm 3. Until then, Helm upgrades will never be as friendly as using kubectl to upgrade manifests directly.

In the meantime some of the deployment mechanisms have implemented more advanced diff and merge logic to cover the gaps.

My personal belief is that the lack of a 3-way merge in Helm is the main reason why people advocate for using it as a templating tool piped to kubectl apply.

State problems in Helm 2.x

By default when you run Helm it connects to a server component called Tiller. Tiller then manages how charts are installed and holds some state in order to allow functionality like rollbacks.

Local tiller (launching an ephemeral tiller on localhost on whatever machine is running Helm) is a reasonable suggestion if you always run Helm from a pipeline or some wrapper script. It’s incredibly annoying if you want to support local development against remote clusters.

Using the helm-tiller plugin is a good option for going ’tillerless’. However, not all of the projects listed here support using Helm plugins.
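
A rough sketch of the tillerless workflow (the release and chart names are just examples):

```shell
# Install the helm-tiller plugin into the local Helm 2 client.
helm plugin install https://github.com/rimusz/helm-tiller

# Run a command against an ephemeral Tiller started on localhost;
# it is torn down again when the command finishes.
helm tiller run -- helm upgrade --install my-release stable/nginx-ingress
```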

Some people recommend simply using Helm as a templating mechanism until Helm 3.0 is out, which removes Tiller. This breaks package-management features like hooks, which many open source charts rely on, so it’s not really a feasible suggestion if you’d like to benefit from community packages.

Comparison

As with all of the Google sheets on Kubedex you can contribute by editing them directly. You can find the sheet for the table below here.

Recommendation

TL;DR: it’s probably simplest and easiest to use helm upgrade --install with the various options below, run from a pipeline.

Basic Helm

In all honesty it isn’t sane to use Helm for managing control plane services on a cluster until Helm 3.0 has been released with better upgrade logic.

Luckily most of us are helping teams of developers deploy CRUD apps that have more serious problems than we could ever hope to compete with. So depending on your appetite for adventure I’d pick from the escalating series of options below.

The simplest way to manage Helm chart installations on a cluster currently is to run helm upgrade. To cut down on some edge cases I’d recommend looking into the following options.

--atomic

--cleanup-on-fail

--install

--force
I’d also go ’tillerless’ by using the helm-tiller plugin. You can achieve a semi-idempotent solution with those options, depending on your charts.
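
Putting the above together, a single pipeline step might look something like this (the release and chart names are illustrative):

```shell
# Install the plugin once, e.g. in the pipeline image or a setup step.
helm plugin install https://github.com/rimusz/helm-tiller

# --install: install the release if it doesn't exist yet
# --atomic: roll back automatically if the upgrade fails
# --cleanup-on-fail: delete resources created during a failed upgrade
# --force: delete and recreate resources when a patch would fail
helm tiller run -- helm upgrade my-release stable/nginx-ingress \
  --install --atomic --cleanup-on-fail --force
```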

Helmsman vs Helmfile vs Pulumi

Beyond using Helm with various switches the next most reasonable option is probably to look into Helmsman vs Helmfile vs Pulumi and decide which fits your use case better. Again, there are various options you can toggle which abstract away some of the lower level Helm upgrade logic problems.

Here’s a quick snapshot of features that could help you decide.

Helmsman has some great features that let you protect namespaces and promote charts between namespaces. For example you can tell Helmsman to promote your charts from the dev namespace into staging. You can also do release ordering which may be important for people who have tightly coupled releases.
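
As a rough sketch, a Helmsman desired state file looks something like this (app names and versions are made up; check the field names against the Helmsman docs for the version you run):

```toml
[namespaces]
  [namespaces.staging]
  [namespaces.production]
  protected = true   # Helmsman refuses destructive changes here

[apps]
  [apps.jenkins]
  namespace = "staging"
  enabled = true
  chart = "stable/jenkins"
  version = "0.35.0"
  priority = -2      # lower priority values are released first
```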

Helmfile is a better option if you’d like templating of config and values passed into charts. It also has support for running processes as part of the deployment lifecycle so you can customise what happens before and after a deployment. The Helm-X support allows for inline patching of upstream charts without maintaining a custom fork. It also lets you use Kustomize templates as if they are Helm charts.
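
For comparison, a minimal helmfile.yaml might look like this (the repo URL and script paths are illustrative):

```yaml
repositories:
  - name: stable
    url: https://kubernetes-charts.storage.googleapis.com

releases:
  - name: my-ingress
    namespace: ingress
    chart: stable/nginx-ingress
    values:
      - values/ingress.yaml.gotmpl   # Go-templated values file
    hooks:
      - events: ["presync"]          # run before the release is synced
        command: "./scripts/pre-deploy.sh"
```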

Pulumi takes a rather sensible approach and avoids most of the Helm problems by not using Tiller at all. It does the API equivalent of piping helm template into kubectl apply. As mentioned above I don’t think this is a good idea from a usability perspective. You’re going to roll the dice on which open source charts work.
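
A minimal sketch of the Pulumi approach in TypeScript (the chart name and version are examples):

```typescript
import * as k8s from "@pulumi/kubernetes";

// Pulumi does the API equivalent of `helm template | kubectl apply`:
// the chart is rendered client-side and the manifests are applied
// directly, with no Tiller involved.
const ingress = new k8s.helm.v2.Chart("my-ingress", {
    chart: "nginx-ingress",
    version: "1.24.0",
    fetchOpts: { repo: "https://kubernetes-charts.storage.googleapis.com" },
    values: { controller: { replicaCount: 2 } },
});
```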

Out of the 3 options I’d personally go with Helmfile. From the testing I’ve done Helmfile handles chart upgrades gracefully and seems more idempotent than Helmsman.

Weave Flux vs Faros vs Eunomia vs Helm Controller vs Chart Operator

The next level of adventure comes down to a choice between Operators. The most popular of these is Weave Flux.

From a purely aesthetic perspective I much prefer the logic to reside in a controller on the cluster. This means your config management or deployment pipeline simply needs to update a CRD or post a Git commit. All of the magic then happens inside the cluster which is a central place to troubleshoot should things go awry.

Weave Flux

Weave Flux has many user friendly features enabled by default such as upgrades that mimic helm upgrade --install --atomic --force. It also handles rollbacks in the event of failure.
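
For reference, a HelmRelease resource for the Flux Helm controller looks roughly like this (names and versions are examples):

```yaml
apiVersion: flux.weave.works/v1beta1
kind: HelmRelease
metadata:
  name: my-ingress
  namespace: ingress
spec:
  releaseName: my-ingress
  chart:
    repository: https://kubernetes-charts.storage.googleapis.com/
    name: nginx-ingress
    version: 1.24.0
  values:
    controller:
      replicaCount: 2
```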

If you do get into a broken situation you can simply delete the resource manually and Weave Flux will notice this on its next sync and do the equivalent of helm delete --purge which quickly gets you back into a working state.

There are a few negatives to Weave Flux. Firstly, it’s not just one controller but two: the main Weave Flux controller plus a second Helm Controller that manages Helm charts. Different options are specified on each, and there are two places to look for logs when trying to determine what’s happening.

The other negative with Weave Flux is the lack of ’tillerless’ support. It is not desirable to run a persistent tiller on the cluster for security reasons.

Regardless, a lot of people use Weave Flux and it shows in terms of usability. Out of the box it will probably just work for you, and if you do run into issues the manual fix-ups are quick and straightforward.

Faros and Eunomia

These both treat Helm as a templating system rather than a package manager. As mentioned multiple times I don’t believe this is a good idea as it breaks compatibility with a lot of charts.

Helm-Controller

I’ve spent quite a long time tinkering around with the Rancher Helm Controller and fell in love with its simplicity. Unfortunately, chart upgrades in the Rancher Helm Controller are currently broken so I can’t recommend it beyond casual playing.

It only seems to be used inside K3s, where the usual lifecycle is to place a CRD manifest into a directory and restart the control plane. Hence upgrades have never really been exercised.
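
For context, a K3s HelmChart manifest looks roughly like this (field names are from the helm.cattle.io/v1 type; verify them against the controller version you run):

```yaml
apiVersion: helm.cattle.io/v1
kind: HelmChart
metadata:
  name: traefik
  namespace: kube-system
spec:
  chart: stable/traefik
  targetNamespace: kube-system
  set:
    rbac.enabled: "true"
    dashboard.enabled: "true"
```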

A few colleagues and I tried to reach out several times on GitHub issues and in the Rancher Slack channel for help with contributing fixes back, but got no response. The controller uses a shared Rancher library that we felt uncomfortable changing.

We’ve therefore forked the Rancher helm-controller and are continuing to work on it. We’ve swapped out the Rancher library with the Operator SDK. Why am I bothering when all the other options exist? Well, nothing seems to tick all of the boxes at once. I’d personally like:

A CRD per chart with all config inside

A simple controller that watches the CRDs for changes

On change a Kubernetes job is run that executes helm upgrade with all the right flags

All Kubernetes jobs use the helm-tiller (tillerless) and helm-diff plugins to verify changes

On CRD delete a Kubernetes job is run that executes helm delete --purge

A configurable image that the Kubernetes job runs so I can swap it out with Helm 3.0 when GA

So it’s a sort of hybrid: the tried and tested helm upgrade --install workflow wrapped by a very simple controller that executes it tillerless in Kubernetes jobs that are easy to debug with kubectl logs.

I’ll do a blog when it’s good enough to start using.

Terraform Helm Provider

The only option I will say to definitely avoid is the Terraform Helm Provider. This takes the deficiencies in Helm and raises them to a whole new level where successfully debugging an issue should result in a Nobel prize nomination.

Before everyone starts saying I’m being unfair to HashiCorp, I should probably explain that I don’t think it’s the fault of the provider that it’s so bad to use. Terraform will do what Terraform does: look at its statefile, and for each chart it finds there, query Helm for the current state, refresh the statefile with any changes found, then diff desired vs current state in the statefile.
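
For illustration, a typical helm_release resource looks like this (the chart, version and value names are examples; check attribute names against the provider version you use):

```hcl
resource "helm_release" "ingress" {
  name       = "my-ingress"
  repository = "https://kubernetes-charts.storage.googleapis.com"
  chart      = "nginx-ingress"
  version    = "1.24.0"

  # Desired state lives here, current state lives in the Terraform
  # statefile, and Helm/Tiller hold their own view on the cluster.
  set {
    name  = "controller.replicaCount"
    value = "2"
  }
}
```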

The real problem is that Helm can’t be trusted. Manual changes to resources on the cluster outside of Helm make the information that Terraform receives incorrect. In addition to Helm often being a liar, the diff between desired state and statefile is pretty arbitrary, and I’ve seen changes triggered for weird reasons. Finally, for Helm to be less infuriating it needs a bunch of options set that aren’t exposed by the provider.

Fixing a broken deployment then becomes the double-hard task of fixing up Helm on the cluster in addition to the Terraform statefile; in many cases it means deleting the state entirely. It’s really not worth using currently, but perhaps Helm 3.0 will solve all of this in future if the provider gets an update to use it.

A happy ending

I’d like to end this on a positive note and say that Helm 3.0 isn’t too far off and the team is acutely aware of all of the shortcomings expressed in this blog. The two items that are desperately needed are the 3 way merge patch and an official CRD controller. The last update in the Helm 3.0 alpha 2 release included info about the accelerated timelines for 3.0 GA.

“As maintainers, our focus has shifted to ensure that the Helm community can implement the features proposed for Helm 3 after the release is out without breaking backwards compatibility. We’re refactoring the internal architecture now to accommodate these enhancements to Helm in future releases. Core re-architectures like the removal of Tiller, improvements to helm upgrade’s update logic, changes to helm test, experimental OCI integration and the Chart.yaml’s apiVersion bump are all part of a minimum set of enhancements required to ship Helm 3, so we’re focusing on getting those out the door before we release Helm 3.”

So it looks like the upgrade issues will soon be a thing of the past. Hopefully all of the projects mentioned here will switch over and reap the benefits.