Ansible, Puppet, Chef: No thanks

May 11, 2017

I’m going to catch a lot of flak for this post, but I think it’s an important statement.

In the most common cases, configuration management tools like Ansible, Puppet, and Chef encourage bad practices and should be avoided instead of celebrated.

The only exception to this is managing an on-premises server fleet that can’t leverage software like Kubernetes, CoreOS, or Apache Mesos, and can’t emulate an Infrastructure-as-a-Service platform like AWS or GCP.

A complicated and unnecessary abstraction

Most configuration management tools attempt to create another layer of abstraction on top of popular cloud providers. Provisioning a load balancer on AWS requires a handful of parameters, some of which depend on each other or on other AWS services. To provision the load balancer with something like Ansible, you need to use the ec2_elb_lb module, match up the required parameters, add parameters that Ansible supports but AWS doesn’t use directly, like purge_listeners, and then build a playbook.

Then, my playbook has to have a state of present to declare that I want the ELB to exist, as opposed to absent, which would terminate the ELB.

Not only does the present / absent terminology just seem strange, I can’t conceive of any cases where this makes sense. Instead, it seems like this is just a general abstraction that all Ansible modules are forced into.

Continuing with the previous ELB example, last year AWS released their Application Load Balancers as a new Layer 7 load balancer capable of more sophisticated functionality. As of this writing there’s still no Ansible module for Application Load Balancers and a GitHub issue is still open.

Am I picking on Ansible a bit here? Sure – let’s see if Chef supports provisioning Application Load Balancers.

Nope.

The solution? Use Ansible / Chef to interact with CloudFormation, which has all of the most recent resources available. So here, we’re writing a 20-line playbook to replace a single API call to aws cloudformation create-stack ... That’s just silly.
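For comparison, the direct call fits in a few lines of shell. This is a minimal sketch rather than a definitive implementation: the function names, the stack name, the template path, and the DRY_RUN switch are all hypothetical, and it assumes the AWS CLI is installed and configured.

```shell
#!/bin/sh
# Hypothetical helper: print the command when DRY_RUN=1, otherwise run it.
run() {
    if [ "${DRY_RUN:-0}" = "1" ]; then
        echo "$*"
    else
        "$@"
    fi
}

# Create a CloudFormation stack directly from a local template file.
# $1 = stack name, $2 = path to the template (both placeholders).
create_stack() {
    stack_name="$1"
    template="$2"
    run aws cloudformation create-stack \
        --stack-name "$stack_name" \
        --template-body "file://$template"
}
```

With DRY_RUN set to 1, calling create_stack my-alb-stack alb.yaml only prints the aws command, which makes the script easy to review before it touches a real account.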

A common language

I understand the draw to these configuration management tools. Having a single language to learn that supports automating all sorts of things like creating resources on AWS, GCP, DigitalOcean, Vultr, and Linode, creating files, running scripts on lots of servers in parallel, etc. seems like a fantastic idea.

Automation is a great thing. It helps to prevent bugs, it removes human error from processes, and overall, automating as much as possible is generally the right thing to do. But, poorly implemented automation can be bad.

If I went to an extreme and created a makeAWSPlatform() function that added a significant level of indirection over all of AWS for my particular use case, that would be very poor automation. These tools make the same mistake on a larger scale.

Why be cloud agnostic anyway?

There seems to be an argument perpetuated through the usage of these tools that celebrates the idea of being cloud agnostic and encourages behavior that enables you to switch from AWS to GCP or Azure very easily. I’ve also come across orgs in the past that required that to be part of their strategy.

But, when asked why, the answer wasn’t clear. Going to painstaking lengths to be cloud-agnostic usually doesn’t make that much sense anyway.

Use different cloud providers for their strengths, and if you absolutely have to switch a service from one provider to another, eat the cost of doing it at the time instead of expending extra effort over time “just in case”.

Shared knowledge

Configuration management tools come with the false promise of Shared Knowledge. Instead of having a single DevOps engineer building custom deployment automation using Python or bash scripts that only he or she knows, having a defined set of “cookbooks” or “playbooks” gives more of a sense that anybody on the development or operations team who knows Ansible / Chef can run things.

In this case, what the configuration management tool has is a defined structure. That defined structure has some value in organization and makes things easier to understand when onboarding a new team member.

However, whatever value this organization provides can be replaced with well-documented shell scripts, Makefiles, and platform-specific templates like AWS CloudFormation, all while providing more flexibility and without forcing you to come up with workarounds for a tool’s shortcomings (e.g. missing ALB support).

Immutability

Work on large systems long enough and immutability becomes a very valuable concept. The typical Ansible / Puppet / Chef strategy revolves around making changes to running systems instead of terminating existing systems and re-launching new systems with the desired state.

Mutating running systems poses several problems:

How do you handle commands that only succeed on some servers?

Installing application dependencies when each server starts up increases the chances of servers running different versions of packages

Many automation tools propose the use of dedicated (and premium) management servers

It’s hard to test the final state of a mutable system

Each server runs the risk of being slightly different than the others

As a previous user of OpsWorks, I can personally attest to the painstaking process of testing Chef deployments. While this may have changed in recent years, at the time it wasn’t rare to experience a 15-minute process for provisioning and deploying.

A better way

To be clear, I don’t think the idea of configuration management itself is bad. I’ve designed many tools to aid in the automation of my own workflows and it’s largely justified. But the popular configuration management tools come at the price of encouraging practices that have long been deemed unsatisfactory. They attempt to solve problems that don’t really exist anymore with most modern platforms and all too often I see users trying to force them into use, blindly adding additional layers of indirection by following the herd.

That said, this post would be counter-productive without describing what I think is a better way to do things.

Are you stuck managing a legacy fleet of EC2 instances that can’t otherwise be controlled by Kubernetes, CoreOS, Apache Mesos, or even ECS? Then sure, Ansible will likely be your best friend for a while.

Are you building the next platform that will rise above Kubernetes and CoreOS? Then, again, by all means use an automation tool to keep your underlying cluster of servers under control.

But – if you don’t fit into either of those categories…

1) Use your providers’ declarative resource templating

For AWS, use CloudFormation. For Google Cloud Platform, use Cloud Deployment Manager. Yes, that means not adding another layer that pretends to abstract AWS and GCP into a common interface. In reality, they’re not the same platform and don’t have identically functioning resources, and that’s OK.

2) Use Docker to build images that can be deployed to your servers

Using a tool like Docker replaces the need for tools like Ansible and Chef.

Building Docker images means that all of the servers in your deployment are guaranteed to be running the same packages and versions. Because the packages were downloaded and built at the time the Docker image was built instead of at runtime, there’s no way for changes to slip in from one apt-get install / yum install / rpm -Uvh / ./configure && make && make install / pip install to the next.
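One way to make that guarantee concrete is to tag each image with the commit it was built from, so every server pulls exactly the same artifact. The sketch below is only an illustration under stated assumptions: the function names, the registry hostname, the GIT_SHA override, and the DRY_RUN switch are hypothetical, and tagging by commit is my suggestion rather than something Docker requires.

```shell
#!/bin/sh
# Hypothetical helper: print the command when DRY_RUN=1, otherwise run it.
run() {
    if [ "${DRY_RUN:-0}" = "1" ]; then
        echo "$*"
    else
        "$@"
    fi
}

# Build an image once, tag it with the current commit, and push it.
# $1 = image name including registry, $2 = build context directory.
build_and_push() {
    image="$1"
    context="$2"
    # GIT_SHA may be supplied by the CI system; otherwise ask git.
    sha="${GIT_SHA:-$(git rev-parse --short HEAD)}"
    # Only push if the build succeeded.
    run docker build -t "$image:$sha" "$context" &&
        run docker push "$image:$sha"
}
```

Every server then runs the image identified by that tag, and installing packages happens exactly once, at build time.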

3) Write well-documented Makefiles or shell scripts to automate builds and Docker commands

While not within their typical use-case, Makefiles can be used to write well organized and composable build tools.

# Project prefix to make cleanup easier
prefix = myproject

# Path to dockerfiles
curdir = $(shell pwd)
dockerfiles = $(curdir)/etc/dockerfiles

# These targets do not produce files, so mark them phony
.PHONY: build-nginx build-mysql build-all

# Build the nginx image
build-nginx:
	docker build -t $(prefix)-nginx -f $(dockerfiles)/nginx/Dockerfile .

# Build the MySQL image
build-mysql:
	docker build -t $(prefix)-mysql -f $(dockerfiles)/mysql/Dockerfile .

# Build all images
build-all: build-nginx build-mysql

With a CI/CD platform that natively supports Docker (don’t be fooled, there are only a few), Makefiles like this run just as well in CircleCI, Shippable, AWS CodeBuild, or Jenkins.

4) Use ChatOps for non-CD deployments and one-off commands

Turn common workflows into commands in a ChatOps service like Hubot, where they can be easily monitored and logged. When a new build completes in your CI platform, it should be automatically deployed to your staging environment and a message should be delivered to a Slack channel with the URL and git commit.

If you’re not doing CD, automate the release process as much as you can with commands like:

@hubot release [SHA1] v1.4.8 – Tag the commit as v1.4.8 and push upstream

@hubot list-releases – List available release tags

@hubot deploy [version] – Start a deployment of a particular version

Moving these commands from a developer’s machine to Slack makes for more accountability and a paper trail.
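Behind a chat command like the release example above, the bot only needs to run a small script. Here is a rough sketch of what that script might look like. The function names and the DRY_RUN switch are hypothetical, and it assumes the upstream remote is named origin.

```shell
#!/bin/sh
# Hypothetical helper: print the command when DRY_RUN=1, otherwise run it.
run() {
    if [ "${DRY_RUN:-0}" = "1" ]; then
        echo "$*"
    else
        "$@"
    fi
}

# Tag a commit with a version and push the tag upstream.
# $1 = commit SHA, $2 = version tag such as v1.4.8.
release() {
    sha="$1"
    version="$2"
    # Create an annotated tag; push only if tagging succeeded.
    run git tag -a -m "Release $version" "$version" "$sha" &&
        run git push origin "$version"
}
```

Because the chat service invokes the script, the channel history records who released what and when.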

5) Avoid patterns and practices that mutate your infrastructure in-place

Instead of sending a command to your cluster to update an installed package, rebuild the Docker images containing that package and redeploy.

Instead of manually scaling server groups, leverage autoscaling and declarative resource templates to adjust scaling properties like minimum / maximum and thresholds.
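As a sketch of what adjusting scaling through a declarative template can look like on AWS, the script below updates two stack parameters instead of touching the server group by hand. It assumes a CloudFormation template that exposes parameters named MinSize and MaxSize. Those names, the function names, and the DRY_RUN switch are hypothetical.

```shell
#!/bin/sh
# Hypothetical helper: print the command when DRY_RUN=1, otherwise run it.
run() {
    if [ "${DRY_RUN:-0}" = "1" ]; then
        echo "$*"
    else
        "$@"
    fi
}

# Change the scaling bounds of an existing stack by updating parameters.
# $1 = stack name, $2 = minimum instances, $3 = maximum instances.
# MinSize and MaxSize are assumed parameter names in the template.
# Any other template parameters would need their own entries here
# (for example with UsePreviousValue=true) to keep their current values.
set_scaling() {
    stack_name="$1"
    min="$2"
    max="$3"
    run aws cloudformation update-stack \
        --stack-name "$stack_name" \
        --use-previous-template \
        --parameters \
        "ParameterKey=MinSize,ParameterValue=$min" \
        "ParameterKey=MaxSize,ParameterValue=$max"
}
```

The change is then recorded in the stack’s history rather than living only in someone’s console session.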

The more containerized your infrastructure becomes, the easier it is to upgrade.

As an org, don’t exclusively look for developers with Ansible, Chef, and Puppet experience

I see lots of orgs with requirements stating they’ll only consider new candidates who are experienced with one of the many automation tools. It’s all too common to see the usage of these tools heralded as a sign of competence and experience, which doesn’t always hold true.

Just because a developer or engineer doesn’t use one of the popular automation tools doesn’t make them any less valuable, and in fact, the exact opposite may hold true.