Opinions on Configuration Management

I want to talk a bit about configuration management. I’ve got some opinions. Some people think configuration management is dead, some think it has a place, and honestly navigating the minefield of which tool to use and why is very difficult without getting egos involved. Many people have built their career on configuration management, so it’s really tough to tell someone that something is wrong with their tool. So leave your ego at the door, grab a beer, and let me share some thoughts on configuration management with you.

So I want to talk about Puppet, Chef, and Ansible. I'm not an authority on Salt, so I won't speak to it. And most importantly, I don't want to talk about these tools like I'm lining up some columns in an Excel spreadsheet — I want to talk about the philosophy behind each tool and what it DOES, and use that as a method of exposing the weaknesses and strengths of ALL configuration management tools. So if your company is having a bake-off between multiple tools comparing feature sets — this blog post ain't that.

The premise of configuration management is this: Servers are important and valuable, and servers become valuable by being configured correctly.

This is just wrong. This argument can be defeated with a simple example.

Let's say you play Fortnite. (If you don't know what Fortnite is, look here) You log in, join your friends, and hop into a game. While playing the game, you say to your friends, "Holy crap, _______ is really fun!"

A. Nginx SSL certificate pinning

B. Riding rockets into my opponent’s face

C. Docker-compose up

D. Unity model scaling

E. sysctl kernel.sysrq=1

Obviously, the answer is B. Riding rockets into my opponent’s face. You don’t log in to play Nginx! That would be really silly! But what if you can’t log in? You hit Error 37? You can’t even see your friends to queue up while you’re bored!

“Crap, the servers are down! This game sucks.”

The conclusion is simple: Your servers themselves, and their configuration, are a liability. The important stuff is the applications you run on those servers. Your configuration management tool? It’s a liability reduction machine.

Puppet

While it is still growing as a tool, Puppet is fundamentally broken because it compiles modules on the Puppet Master server, which means that your application code and infrastructure code are entirely separate. Forever. You can't fix this unless Puppet stops compiling code on the Puppet Master server, full stop. You can't package and deliver application code and infrastructure code together because of this problem. This causes an impedance mismatch between releasing new versions of your application (the thing that's important) and releasing new versions of your liability reduction machine, erm, I mean configuration management. Every configuration management tool has this problem, but the other tools at least have a way to hack around it with some clever design patterns. But Puppet is still interesting as a look at where we've come from and at the philosophy behind these tools. Many people still use Puppet today, and there's nothing wrong with that.

Pros

Puppet has a client on your nodes

HTTPS so you can avoid man-in-the-middle attacks

New CEO since 2016 has moved the company in a positive direction, lots of interesting new features and growth in the Puppet tooling since then

New Puppet Master server improves the Puppet scaling story

Enterprise support is good, with solid premium visualization tools.

Converges resources, so you can maintain state across nodes

Cons

Puppet server is a central point of failure

Puppet server compiles modules, so it’s doubly a central point of failure

Server is authoritative. Command and control model

Modules tend to become mono repositories, and it’s tough to untangle

Lots of complexity

No integration testing or auditing framework

Pinning versions of modules and shipping them with applications is troublesome

You will have to learn Ruby

In practice, Puppet is very much a tool for a system administrator to use to massage their servers. It’s like taking your server to a weird spa and military bootcamp for infrastructure. “Yes, relax in the salt bath over there, now give me 100 pushups!” It is a command and control way of managing your servers.

The main problem with command and control interfaces is that they are unreliable. If you ask your Nginx server (with some code), "Excuse me, I need you to install this SSL certificate", the Nginx server may very well reply, "Well no, I can't do that because I don't know what you're talking about." Then you're left scratching your head at the message

undefined method `[]' for nil:NilClass

wondering what the hell went wrong. All while your production server is on fire.

Or let’s take another, more complex problem with command and control interfaces. I’m making up a name for this because I’ve very rarely heard people talk about it, so you’ll have to excuse the new jargon.

Path Dependency

Path dependency is the idea that your upgrade path accidentally creates a dependency on running a particular version of your software before upgrading to the next version.

Let’s take a look at three environments, all three using VMs (mutable infrastructure) and the versions of configuration management they upgrade with:

Dev

1.0.0 -> 1.0.1 -> 1.0.2 -> 1.1.0 -> 1.2.0 -> 2.0.0

Stage

1.0.2 -> 1.1.0 -> 1.2.0 -> 2.0.0

Prod

1.0.2 -> 1.2.0 -> 2.0.0

But wait! I thought configuration management was idempotent? Shouldn’t each version run the same, no matter what?

Unfortunately, configuration management isn't always idempotent. If it were, it would run really slowly, as each resource would be checked against every possible condition on every run. This kind of checking might be possible with some basic sysctl commands, but good luck configuring every integration in your webapp as some kind of monolithic state machine, or integrating secrets without some kind of uncertainty.

In practice, even if the code in Dev and Stage ran without any problems, there's STILL a chance that Prod can break on the upgrade from version 1.0.2 to version 1.2.0. This can happen because your prod server may be storing different state than your stage or dev servers, because it was exposed to a different upgrade path and different versions of your software. It's insane, not as rare as you might think, and incredibly defeating and frustrating — especially if you put a ton of time and effort into unit and integration tests for your configuration management code.
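To make the failure mode concrete, here's a toy model of path dependency in Python. Everything in it is hypothetical (the versions, the state keys, the migration behavior); it's just the mechanism, boiled down:

```python
# Toy model: every config management run mutates persistent state on the node.
# Version 1.1.0 quietly migrates some on-disk state, and 2.0.0 assumes that
# migration already happened. All versions and keys here are made up.

def apply(version, state):
    if version == "1.1.0":
        state["config_format"] = "v2"  # the quiet migration
    if version == "2.0.0":
        # 2.0.0 only works on nodes that passed through 1.1.0 at some point
        if state.get("config_format") != "v2":
            raise RuntimeError("2.0.0 failed: config is still in the v1 format")

dev, prod = {}, {}

for v in ["1.0.0", "1.0.1", "1.0.2", "1.1.0", "1.2.0", "2.0.0"]:
    apply(v, dev)   # fine: Dev walked through 1.1.0 on its way up

for v in ["1.0.2", "1.2.0", "2.0.0"]:
    apply(v, prod)  # RuntimeError: Prod skipped 1.1.0, so 2.0.0 blows up
```

Same code, same target version, different history. Only the node with the "wrong" history breaks, and no amount of testing in Dev and Stage would have caught it.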

Chef

Chef is often compared with Ansible because it's the clear winner of client-based configuration management, while Ansible is the clear winner of clientless configuration management. It's extremely scalable both from a technical perspective (the server is written in Erlang, the client is written in Ruby) and a practical perspective (Facebook uses it to deploy their primary workloads).

Pros

Chef has a full fledged development kit that includes a huge number of tools (I won’t list all of them, but some deserve mentioning)

Test Kitchen is a first class testing toolkit for testing Chef code

InSpec is a first class integration test framework (and a formidable audit and compliance toolkit on its own)

The community is delightful. chef-client and chef server are both open source and have a formal RFC process and open source board. The majority of people on the board do not work at Chef, Inc.

The server is written in Erlang and easily scalable to 10,000+ nodes. (this might not matter, depending on your use case)

SSL is enabled by default when communicating with the server

Over the years Chef hasn’t been too afraid of releasing new versions and deprecating older stuff that can get you into trouble, so the platform is stable and improving, even now.

There are STILL strong use cases for Chef in 2018 on mutable hardware, particularly for Windows workloads, patching, hardening, tuning, and workloads that still don’t run well on Kubernetes (databases, I’m looking at you.)

Chef is the best configuration tool for Windows. (They heavily wrap PowerShell DSC, and their WinRM connector is quite good.)

Chef easily has the highest quality learning site out of all the tools, learn.chef.io. (This is to mitigate how tough the tool is to learn in the first place, so maybe this isn't a pro)

Everything is Apache 2, and Chef requires DCO on certain repositories. No GPL worries.

Cons

Concept count is high. It’s tough to learn, and there are some landmines to watch out for.

Chef is command and control. (Although it tries to mitigate with other tools, these other tools are out of band and cause some of the complexity mentioned above.)

The Chef Server is TOO powerful. It has many older stateful concepts that live on the server (data bags, roles, environments, attribute overrides, run_lists). It's very easy for a less experienced engineer to go wild, use all of these without thinking of the consequences, and back themselves into a corner.

Just like Puppet, Chef is very much a tool for system administrators, and developers have little patience for using it.

Chef doesn’t start making LOTS of sense until you’re looking at around 300 nodes under management. So it’s tough to make a case when you’re just starting out.

You will have to learn Ruby.

In 2018 the relevance of ALL configuration management tools is decreasing, so Chef isn’t the hot sexy thing anymore.

It's Ruby, so it's not going to run on your toaster. Compared to something written in Rust or C++, it's going to run slower.

Testing

Why does Chef have an integration test framework? I mean, aren’t resources in a configuration management tool idempotent? Let’s hypothetically say your resources ARE perfectly idempotent (they aren’t). You still can’t trust your resource. So who can you trust? You trust the Observer. The Observer is always right. And you need to have a different actor be the Observer than the actor executing the resource.

This is important: If you are running configuration management of any sort, you need integration testing, compliance tooling, and/or an auditing test framework, because the only way to know something is to observe it. Observe in as many different ways as you can! Many people are calling this "policy as code" or "compliance as code", and I think that's right. It's a powerful tool because you get a separation of duties. You can show the compliance as code reports to your auditor. One last benefit: because you are separating out the actors, your teams can develop code in parallel. Your security engineers and compliance officers can develop and run their own compliance as code without any dependencies on configuration code or application code. New CVE in gcc-libs popped up last night at 2am? Hold my beer, I'll write a quick rule that observes our entire infrastructure to check for the bad version, commit it to git, and send you a Pull Request to review. Bam, thanks, I'll be going back to sleep now!
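In the Chef world, that quick 2am rule would be an InSpec control. As a minimal sketch of the same idea (in Python, to match the code samples in this post), here's what it could look like with testinfra; the "bad" version below is made up for illustration:

```python
# test_cve_gcc_libs.py
# Minimal compliance-as-code check using testinfra (pip install testinfra).
# Point it at your fleet with something like:
#   pytest --hosts='ssh://web1,ssh://web2' test_cve_gcc_libs.py

BAD_VERSIONS = {"8.1.0-1"}  # hypothetical CVE'd build

def test_gcc_libs_is_not_a_known_bad_version(host):
    pkg = host.package("gcc-libs")
    if pkg.is_installed:
        assert pkg.version not in BAD_VERSIONS
```

The important part is the separation of actors: this check knows nothing about the cookbook or playbook that installed the package. It just observes.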

You should read more about this topic in Julian Dunn’s blog post Why “Why-Run” Mode Is Considered Harmful https://blog.chef.io/2018/03/14/why-why-run-mode-is-considered-harmful/

As a side note: I think unit testing for configuration management is fairly worthless. Testing out the core resources built into a configuration management tool is useful, but when developers write their own code, the risk you are trying to mitigate is almost always in the runtime of the tool, and not in the compile time of the tool. In fact, writing too many unit tests is an unnecessary burden and a risk in itself. In configuration management, the Observer that matters is NOT the resource, it’s the thing that looks at the runtime. So the moral here is this: If you’re thinking of writing 10 unit tests for your configuration management code, write just 1 integration test instead, and you’ll save yourself lots of heartache. (Yet another reason why developers don’t like configuration management tools.)

Server Side Statefulness Is Harmful

Chef (and every configuration management tool) gives you options to have attributes (key-value pairs that you can override at runtime). But this is extremely problematic because at runtime you have NO IDEA where those overrides could come from. They could come from the code, the node itself, or the node object that's stored on the server. And if you're setting multiple ones, which one wins? And how does it win? WHEN does it win? Debugging this at runtime is by far the biggest weakness of Chef, and of configuration management tools in general. Luckily you can get around this if you're smart — you can simply not use server side statefulness at all. In other words, you disable and regularly blow away the features that the server offers, and use checks in your CI system to prevent developers from checking in code that uses such things. And this holds true for ANY server side stateful data. I wish I didn't have to disable server side stateful data in the first place though: I'd like to just treat my configuration management servers as dumb code repositories with a bit of an API and authorization layer on them.
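If "which one wins?" sounds abstract, here's a tiny Python sketch of the problem. The sources and merge order are hypothetical; real tools have a long precedence table you have to memorize:

```python
# Three places the same attribute can be set. Only the first is visible
# in code review; the other two live on the node and on the server.
from_code   = {"nginx": {"worker_processes": 2}}  # checked into the repo
from_node   = {"nginx": {"worker_processes": 4}}  # set on the node itself
from_server = {"nginx": {"worker_processes": 8}}  # stored server side

def effective(*sources):
    """Later sources win. The whole question is: in what order do they arrive?"""
    merged = {}
    for src in sources:
        for key, val in src.items():
            merged.setdefault(key, {}).update(val)
    return merged

print(effective(from_code, from_node, from_server))  # {'nginx': {'worker_processes': 8}}
print(effective(from_server, from_node, from_code))  # {'nginx': {'worker_processes': 2}}
```

At 2am, staring at a node that's running with the wrong value, you get to reverse engineer which of those orders actually happened. Hence the advice: disable the server side sources entirely.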

In a scenario where you disable server side stateful data, the server becomes an immutable artifact store for your Chef code.

[Image: a clever configuration management server admin, hard at work]

“Hey look at me, you thought I was storing mutable state for your nodes, but I’m just a big box for dumping in some code with a button on it to copy it to your servers!”

Your operating model becomes much simpler in these scenarios — you pin the versions of the code to the nodes, and you only move the pins when you're ready to upgrade your nodes. Still, this isn't perfect, because you're running into the same problem that ALL configuration management has: it takes a different path to production than the application, and has to be run out-of-band. So if something goes wrong with your configuration management deployment, your application is hosed. The result is that even if you are quite skilled with these tools and you go out of your way to have immutable server data, good practices around code repositories, and a few tricks to deliver application code and infrastructure code at (nearly) the same time — doing production deployments is always a nail-biting process, because the failure domain is at runtime.

The Tool Reflects The Language and Community It Is Written In

If you start diving deep into your configuration management tool, it becomes very clear that the opinions baked into the tool follow the opinions of the language that it is written in. The reason for this is obvious: The developers that wrote the tool wrote it in their favorite language!

So even though I've used Puppet and Chef quite a LOT, and they're GOOD tools, I've never quite felt at home. Python was the first language I really fell in love with; I learned Ruby in high school before I learned Python and didn't take to it. Puppet and Chef FEEL like Ruby — they have lots of ways to accomplish the same tasks, so reading code can be more difficult than, say, Python's do_it_one_right_way approach. These communities employ monkey patching, or pushing a module onto the stack at runtime instead of properly `require`ing their code — which is all very Ruby-like. But you also get the delightful parts of Ruby — gems, bundler, great testing frameworks, and so on. Ansible and Salt, which are written in Python, reflect their languages and communities as well, and the troubles and delights of the Python community bubble up.

One observation that follows from this is that many devops tools employ somewhat esoteric languages (but nothing TOO crazy). The (newer) Puppet Master server has parts written in Clojure, Chef Server is in Erlang, Habitat is written in Rust, and Conduit is written in Rust. I'm not exactly sure why this is, to be honest, and I think the reason is probably unique in each case.

Ansible

Ansible is the most popular configuration management tool because it’s easy to use. I’ll say it one more time. Ansible is the most popular configuration management tool because it’s easy to use. It’s die-hard focused on ease-of-use, and uses Python’s batteries-included opinions about how it delivers features to users.

Pros

Ansible is easy to use

ANSIBLE IS EASY TO USE

AHHHH DAVID WHY ARE YOU SCREAMING AT ME, ITS BECAUSE ANSIBLE IS EASY TO USE

A bunch of built-in provisioners make it easy to spin up AND configure infrastructure together, which sometimes lets you forget about using a provisioning tool or a centralized server for running your configuration management

Batteries included philosophy means you get pretty much everything you need out of the box

It’s easy to read the code

You can run it by just doing a `git clone`

Simple use cases can just be written in YAML

Medium complex use cases can just be written in YAML

SOME complex use cases can just be written in YAML

It FEELS like Bash (But, it’s not Bash. Sorry developers)

There’s a VERY clear separation of concerns between Ansible YAML code (Playbooks) and modules (The actual Python resources)

Ansible is REALLY GOOD at configuring API driven devices, like Cisco switches, F5s, and so on. I wouldn’t use anything else except maybe Bash or Powershell (depending on the use case).

Ansible modules CAN be written in ANY language, but for practical purposes, they're almost always written in Python (see the sketch just after this list)
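To show how thin the module contract actually is, here's a minimal sketch of a custom Ansible module in Python. The module name and behavior are hypothetical; you'd drop it in a library/ directory next to your playbook:

```python
#!/usr/bin/python
# line_in_file_lite: a toy module that appends a line to a file if missing.
# Hypothetical and minimal, just to show the AnsibleModule contract.

from ansible.module_utils.basic import AnsibleModule

def main():
    module = AnsibleModule(
        argument_spec=dict(
            path=dict(type='str', required=True),
            line=dict(type='str', required=True),
        ),
        supports_check_mode=True,
    )

    path, line = module.params['path'], module.params['line']

    # Read current state first, so the module can report (and honor) idempotence.
    try:
        with open(path) as f:
            present = line in f.read().splitlines()
    except IOError:
        present = False

    if present or module.check_mode:
        module.exit_json(changed=not present)

    with open(path, 'a') as f:
        f.write(line + '\n')
    module.exit_json(changed=True)

if __name__ == '__main__':
    main()
```

The playbook side stays pure YAML; all the real logic lives in small Python files like this one. That's the separation of concerns mentioned above.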

Cons

Ansible does not have an integration testing framework, and its own documentation actively discourages using one in favor of Why-Run/Dry-Run mode (https://docs.ansible.com/ansible/latest/reference_appendices/test_strategies.html). This is incredibly naive. (But Test Kitchen can work for Ansible too — which is a bit strange: a Chef tool is so good that Ansible users would use it instead of a native Python testing framework.)

Dependency management is a nightmare. Git submodules are heavily used in the Ansible world. Python suffers from terrible dependency management practices in general (although the future is looking bright with pipenv, Ansible does not use pipenv today: https://docs.pipenv.org/)

Ansible heavily encourages mono-repos. Weirdly, Ansible code repositories in practice look more like Puppet repositories than Chef or Salt.

When you start scaling out Ansible (> 400 nodes), many companies end up writing Rube-Goldberg-esque ways of delivering their code. In essence, many folks end up running Ansible in a Chef-like manner, and they get all the same problems (and benefits) that come with it. But Ansible isn't written to be a first-class citizen of this use case.

Everyone manages their SSH key distribution differently, and it drives me crazy. HashiCorp Vault with an SSH backend, hard-coded never-rotated SSH keys, LDAP public key services, or a million other custom things built for this exact problem. This is a direct problem caused by Ansible's clientless opinion.

Ansible is command and control style configuration

Ansible’s community is a free-for-all, and licenses are a headache

Delightful Python

One of the delightful aspects of Ansible is that it deeply draws its opinions from the Python community, and focuses on the "…for Humans" style of creating features. This means that PEP 8, docstrings, and all the wonderful things that Python gives you are delightfully and ruthlessly employed. Yes, I'm a bit of a Python fanboy (although these days Rust is my main jam), and I'm about to nerd out about Python a bit here, so hold on.

Here, I’m just gonna copy some random code inside of core Ansible to illustrate this point.

```python
def all_parents_static(self):
    '''
    Determine if all of the parents of this block
    were statically loaded or not. Since
    Task/TaskInclude objects may be in the chain, they simply
    call their parents all_parents_static() method. Only Block
    objects in the chain check the statically_loaded value
    of the parent.
    '''
    from ansible.playbook.task_include import TaskInclude

    if self._parent:
        if isinstance(self._parent, TaskInclude) and not self._parent.statically_loaded:
            return False
        return self._parent.all_parents_static()

    return True
```

More than likely, you've never read that code before today. But you probably have a good idea of what it's supposed to do and why. Look at that docstring that neatly explains everything. Clear, fully_typed_names for variables. And so on. That is the case with nearly ALL of Ansible, and it bubbles up into how playbooks get written.

[Image: a developer, ready to dark fork some public GPLv3 code into their company's private code repository]

Dark Forks

Ansible does something that I frequently call "Dark Forking". I'm not sure if there's another, more widely used term for this, so again, sorry for introducing new jargon. The idea is this: Somebody else wrote a really, REALLY good Ansible playbook that automates 85% of what you need to do. So instead of putting their code in a dependency management tool, or contributing back to their repository on GitHub, you copy-paste the code into YOUR GitHub repository and change all the copyright notices, the LICENSE, and so on to suit your needs. You put it in a git submodule, change 30 lines of code, and call it a day.

The result of this practice is that the community has LOTS of breadth, but lacks depth. It also means that there can be legal landmines with licenses, and you might not actually know where the code you’re using came from originally. Everyone’s got their little fiefdom of code living under their namespace.

An off-the-cuff example:

The most popular cookbook on the Chef Supermarket is (arguably) MySQL, with 111 contributors on the GitHub repository and support for Fedora, Scientific Linux, CentOS, Red Hat, Amazon Linux, Ubuntu, Debian, and SUSE. https://supermarket.chef.io/cookbooks/mysql

The most popular playbook on Ansible Galaxy is (arguably) DavidWittman.redis, with 30 contributors and support for Ubuntu, Debian, CentOS, and Red Hat. https://galaxy.ansible.com/DavidWittman/redis/

The purpose here isn't to compare tools, but to demonstrate that something in the Ansible ecosystem is sick. I mean, Ansible is wildly more popular than Chef; how the hell does the most popular Ansible playbook (outside of core) have only 30 contributors and support only Debian and CentOS variants?

This Dark Forking problem seems like a small problem, but it means that running Ansible playbooks on esoteric platforms like Solaris or AIX, or even Windows, is going to be a much worse experience because of it. The community simply hasn't pushed hard in those directions — so you're often going to tread new ground, or worse yet, re-tread ground that somebody in the community has already gone down.

A good example of this is Windows support itself — good support for Windows DSC only landed in Ansible 2.4 (09/2017), while Chef has had Windows DSC support since Chef Client 12.2 (03/2015).

Configuration Management Winner Winner, Chicken Dinner

So which tool is the winner?

It depends on your business and use case.

But if you sat me down and forced me to choose one, I think I wouldn’t pick any of them because there is a better way to solve the same problems. I wrote my last code in Chef and Ansible in H1 2017. I haven’t touched it since.

Fundamental Configuration Management Problems

All four major configuration management tools are more alike than not. In fact, as time goes on, they continue to adopt features and ideas from each other. However, even though these tools are powerful, they are fundamentally stuck as liability reduction tools; they cannot teleport your IT organization into building apps, and thus cannot be the primary driver behind creating large amounts of value for your company.

Before moving on I’d like to review some core pillars of why I think configuration management tools are fundamentally busted.

1. Configuration management code is developed and delivered out-of-band with application code. As a developer, it often feels like running the 400 meter dash, only to go play a game of soccer at the end of the race.

2. Configuration management is concerned with building a server and then jamming the application on top of it. It feels like you are crossing your fingers and hoping it all works at the end. Configuration management would have you build a Golden Image™, the One True Way™ of where you run your application. In this model, the application is bound to the server.

3. Configuration management's failure domain is primarily at runtime. Very little failure occurs at compile or build time. The result is long cycle times, and painful, difficult-to-understand failure scenarios (path dependency, discussed earlier, was just one of these).

4. Configuration management tools are server authoritative. As a result, the server is a central point of failure. This is insanity because it exposes your application to additional risk of something going wrong with the configuration management service, messing up an application runtime dependency, and bringing your app to its knees.

5. Configuration management uses command and control interfaces, and not promise-based interfaces. As a result, you end up in scenarios where the configuration management tool is commanding the application how to operate. How the hell does it make sense for your liability reduction machine to command your application to do something? The application is the one that should be in charge, if anything!

So what is the answer to these problems?

I think the future is devops tools specialized in application automation instead of configuration management. Fundamentally, these solve many of the issues with configuration management:

1. Application automation code is developed, compiled, packaged, and built with application code. It's responsible for managing all of the configuration and runtime dependencies of your application. It can also take responsibility for provisioning the infrastructure itself.

2. Application automation doesn't care about building servers. It cares about ensuring that your application is running and manageable.

3. Application automation’s failure domain is primarily at compile / build time.

4. Application automation tools are application authoritative. The node (or workload) that is running the application is the one with all the data about the application, so it talks to other applications and services, and not a centralized point of failure.

5. Application automation uses promise-based interfaces. As a result, the application, or service, can make promises, contracts, and agreements with other services, and you can assert that these exist to ensure the continued health of your application.

Docker is the future?

Docker is a wonderful tool that is woefully misused by newcomers. Instead of thinking of Docker as another deployment target for workloads, new developers often use it as a way to get rid of configuration management. And why wouldn't they? It's painful to configure all that junk anyway!

Let me illustrate: A developer discovering Docker for the first time feels like a mad scientist:

“IT’S IMMUTABLE! BEHOLD MY CREATION! HAHAHAHAHA!!!” — Some developer, circa 2015

I know because that’s how I felt when I watched Solomon Hykes reveal Docker live at PyCon 2013.

From there, it’s only a short trip for developers to build a Docker container, throw it over the wall to operations personnel, and have operators come back and say, “Hey uh, your Docker container is entirely busted.” And somehow it’s the operator’s fault for not understanding Docker or something. It is only then that the developer will realize that Docker isn’t the silver bullet they once thought that it was.

This is because this style of Docker container development has three of the same fundamental problems as configuration management:

1. It’s developed out-of-band from application code

2. It’s concerned with building a server and jamming the app on top of it

3. It is a command and control way of deploying applications

Developer usage of Docker in these scenarios is fueled by pain from using configuration management tools. But you'll never want to go back to configuration management unicorn land, because:

Using Docker eliminates large swaths of complexity (The feeling of, “holy hell, I never have to write a Playbook / Cookbook ever again!”)

It does solve SOME fundamental problems with configuration management, namely mutability around servers and server authoritative data.

They’re tiny, so provisioning is much faster, bringing down cycle time.

Eventually the developers and operators will argue, and come to the conclusion that they need Just Enough™ configuration management. So Docker + Ansible will be implemented and you'll call it a day. I feel like this is how most people discovered Docker in 2013–2016. Things have matured since then.

Application Automation

Ultimately, the thing you care about is your application, and ensuring it can be deployed and managed. This is where tools like Kubernetes and Habitat bring application automation to resolve the fundamental issues with configuration management, and tools like Conduit build out new operations workflows.

Kubernetes

Kubernetes is a tool for deploying and managing applications. It does this by having you create a deployment configuration (a YAML file). This is VERY different than the style of YAML used in Ansible, by the way. Instead of being a list of command-and-control structures to go build a server with, these files are key-value pairs of stuff your application is concerned about. Runtime dependencies, ports, and configuration values are all fed through the deployment configuration, and you can make promises about how your application will scale, or what dependent services it needs at runtime.
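To keep with the Python code samples in this post, here's that same declarative data sketched through the official Kubernetes Python client instead of raw YAML. The app name, image, and values are all made up:

```python
# pip install kubernetes
# A hypothetical deployment: pure data about the application, not commands
# for building a server.
from kubernetes import client, config

config.load_kube_config()  # use your local kubeconfig

deployment = client.V1Deployment(
    metadata=client.V1ObjectMeta(name="my-nodejs-app"),
    spec=client.V1DeploymentSpec(
        replicas=3,  # a promise about scale, not an instruction to a server
        selector=client.V1LabelSelector(match_labels={"app": "my-nodejs-app"}),
        template=client.V1PodTemplateSpec(
            metadata=client.V1ObjectMeta(labels={"app": "my-nodejs-app"}),
            spec=client.V1PodSpec(containers=[
                client.V1Container(
                    name="web",
                    image="example/my-nodejs-app:1.2.0",
                    ports=[client.V1ContainerPort(container_port=8080)],
                    env=[client.V1EnvVar(name="REDIS_HOST", value="redis")],
                ),
            ]),
        ),
    ),
)

client.AppsV1Api().create_namespaced_deployment(namespace="default", body=deployment)
```

Notice there isn't a single command in there about how to build or massage a server. It's all runtime promises about the application: image, ports, environment, replica count.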

The major issue with this is that Kubernetes ONLY works on containerized workloads. This is why every VP of Engineering on the planet right now is rushing to find the magical lever that will allow them to rewrite their application to run on a Docker container, so that they can run it in Kubernetes.

In other words: Running your application on Kubernetes is fantastic because it makes the application the center of the universe. It allows you to focus on building and operating amazing applications instead of building up large monolithic servers. But migrating existing virtual machine based workloads to Kubernetes is a painful journey, fraught with peril, because Kubernetes has no tools available for building your application. Kubernetes simply doesn't care about the build stage of your application; it just assumes you have some Docker containers laying around, ready to go. I'm not sure I'd call this a weakness, but it's something that is frequently overlooked when companies are seeking to move to Kubernetes.

Let’s do a quick checklist to make sure the Kubernetes operating model really solves the problems we want it to solve from configuration management’s problems:

✓ Kubernetes deployments are developed with application code. In practice, I frequently see the Kubernetes deployment YAML checked right into the root of the application code git repository. (It would be weird not to.)

✓ Kubernetes does not concern itself with building servers or building Golden Images™. It only cares about the application and how it operates.

⇎ Kubernetes does not try to build your application, so largely it’s exempt from the runtime failure domain problem because the tool simply assumes you’ve already done your due diligence building solid applications and Docker containers. Kubernetes does provision application infrastructure and make promises about how that infrastructure will run in service to the application. However, Kubernetes simply can’t resolve runtime dependencies during a build because it doesn’t have a build part of the tool, so this must be accomplished in some other tool (like Habitat).

✓ Kubernetes does not require a centralized server to orchestrate deployments or configure applications, so it does not suffer from central points of failure.

✓ Kubernetes uses promise-based interfaces to keep your application running, deploy your application, and make assertions about how it will scale.

Habitat

Habitat is a tool for building, deploying, and managing applications. It does this by having you create a plan.sh or plan.ps1 (a Bash script or PowerShell script). This script builds your application (usually 15 lines of Bash around your dependency management / build tool), declares all your runtime dependencies, and declares configuration files and runtime promises. Habitat will then build your application and give you an atomic package (a signed .tar.xz file) with all of the stuff you need to run your application. Then you can either use that package immediately to run on bare metal or virtual machine workloads, or export it to Docker, Kubernetes, and other formats. Habitat also has a Supervisor service that enables applications to talk to each other over a gossip protocol, which is super powerful.

In a very interesting way, Habitat solves many of the problems that Kubernetes doesn't solve. Kubernetes is primarily concerned with application resources from the point of view outside the container, while Habitat is primarily concerned with application resources inside of the container. Put them together and you've got a delicious devops chocolate cake.

Habitat isn't concerned with provisioning application infrastructure at all. It simply doesn't try to solve this problem. Instead, it leans heavily on exporting to other formats like Kubernetes to solve the issue. Again, I don't know if I would call this a weakness of the tool, but going the next step and figuring out how to operate everything in Kubernetes is an important part of the process.

Let’s hit our configuration management problems checklist:

✓ Habitat plans are developed with application code. In practice, I frequently see a Habitat directory in the root of the application git repository. (It would be weird not to)

✓ Habitat does not concern itself with building servers or building Golden Images™. It only cares about the application and how it operates.

⇎ Habitat builds your application by wrapping your application’s build tools (Make, MSBuild, npm, etc), and packaging runtime dependencies with the application, so it helps incredibly by pushing runtime failure domains into build time. Still, it doesn’t concern itself with building application infrastructure, so this must be accomplished with another tool (like Kubernetes).

✓ Habitat does not require a centralized server to orchestrate deployments or configure applications, so it does not suffer from central points of failure.

✓ Habitat uses promise-based interfaces to keep your application running and to deploy updates to your application.

For further reading on Habitat + Kubernetes, I’d recommend some of these blog posts:

Habitat and Kubernetes: How Does It Work? https://www.habitat.sh/blog/2018/05/Hab-and-k8s/

Habitat + Open Service Broker https://www.habitat.sh/blog/2018/05/Hab-OSB/

Automate Application Updates with Habitat and Kubernetes https://www.habitat.sh/blog/2018/05/Auto-App-Updates-k8s/

Habitat and Helm! https://www.habitat.sh/blog/2018/02/Habitat-Helm/

Conduit

Conduit is a service mesh that runs on top of a Kubernetes cluster. This gives you the power to debug all the traffic going through your application — in other words, instead of grabbing some logs out of Splunk and scratching your head while your production cluster is on fire, you can just do this:

```
$ conduit -n my-nodejs-app stat deploy
NAME            MESHED   SUCCESS   RPS      LATENCY_P50   LATENCY_P95   LATENCY_P99
my-nodejs-app   1/1      100.00%   2.0rps   1ms           4ms           5ms
vote-bot        1/1      -         -        -             -             -
voting          1/1      89.66%    1.0rps   1ms           5ms           5ms
web             1/1      94.92%    2.0rps   5ms           10ms          18ms
```

and then use a bunch of other commands to interrogate what’s going on with your application.

The key here is that Conduit cares about your application. Not the infrastructure or the servers, or what color the server rack happens to be. This enables it to do some really powerful stuff! The future is looking really, really good.

The Future

I think we've all come to our senses and realized that applications are the valuable stuff, not servers, or configurations, or anything like that. Ultimately, what matters is that your customer (or user) is using the application. You play Fortnite to shoot rockets at people, not so you can configure Nginx SSL certificates! It's the job of the infrastructure and the runtime of the application to make promises to the application, so that the application can keep running and working correctly — not the other way around. Configuration management tools are wonderful, but they have fundamental issues that set them at odds with application development. Configuration management tools aren't dead in 2018 either: there are still workloads out there that need to run on bare metal, virtual machines, API driven dedicated hardware, or IoT devices, and those pieces of infrastructure require tuning, hardening, patching, and management at the operating system layer.

Bring on the future of applications!