FOSDEM: Configuration management


The 2011 FOSDEM conference hosted a Configuration and Systems Management developer room on its second day. This first meeting about configuration management and automation with open source tools was organized by Puppet Labs and focused on Puppet, but other tools such as Chef and Cfengine were also discussed.

Configuration management is about establishing and maintaining consistency of a system throughout its life. For software, this means that the system has to track and control all configuration changes, such as the contents of files in /etc, the installation of specific packages, file permissions, users, and so on. A configuration management tool is useful in many ways: you can automatically repair a system's configuration after a failure, easily reproduce a specific configuration on another system, and audit changes. If you pair the configuration management system with a version control system like Git, you can also always return to a known-good configuration when things go wrong. Configuration management systems really shine when you have a large number of networked systems: automating the configuration saves the system administrator's time and ensures that all systems are configured consistently.
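The core idea behind automatic repair, comparing the described state of a system with its actual state and fixing any drift, can be illustrated with a minimal Python sketch. This is not how Puppet, Chef, or Cfengine are implemented; the `FileResource` model and the `converge()` function are hypothetical, and the "system" is an in-memory dictionary rather than a real filesystem, so the example is safe to run.

```python
from dataclasses import dataclass
from typing import Dict, List, Tuple

# Observed state: path -> (content, mode); a missing key means "file absent".
State = Dict[str, Tuple[str, int]]


@dataclass(frozen=True)
class FileResource:
    """Desired state of one managed file (a hypothetical, simplified model)."""
    path: str
    content: str
    mode: int


def converge(desired: List[FileResource], actual: State) -> List[str]:
    """Bring `actual` in line with `desired` and return the changes made.

    Running it a second time returns an empty list: once the system matches
    its description, there is nothing left to repair (idempotence).
    """
    changes: List[str] = []
    for res in desired:
        current = actual.get(res.path)
        if current is None:
            changes.append(f"create {res.path}")
        else:
            content, mode = current
            if content != res.content:
                changes.append(f"rewrite {res.path}")
            if mode != res.mode:
                changes.append(f"chmod {oct(res.mode)} {res.path}")
        # Record the repaired state (stands in for writing the file to disk).
        actual[res.path] = (res.content, res.mode)
    return changes


if __name__ == "__main__":
    desired = [FileResource("/etc/ntp.conf", "server pool.ntp.org\n", 0o644)]
    system: State = {"/etc/ntp.conf": ("server old.example\n", 0o600)}
    print(converge(desired, system))  # drift is detected and repaired
    print(converge(desired, system))  # [] on the second run
```

Because the description, not a sequence of commands, is what gets stored, the same description can be applied again after a failure, or to a fresh machine, with the same result.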

The big three configuration management systems for Linux are Puppet (used by Red Hat, Citrix, and the Los Alamos National Laboratory), Chef (used by Engine Yard, 37signals, and Scribd), and Cfengine 3 (used by Facebook, AMD, and the Joint Australian Tsunami Warning Centre). Puppet and Chef are broadly similar in architecture, but Puppet has a language designed specifically for describing resources, while Chef uses the general-purpose programming language Ruby to configure resources. Chef also seems to be aimed more at developers who want to deploy their web applications, and it doesn't support as many platforms as Puppet. Cfengine is the grandfather of these configuration management systems (Cfengine 3 is a complete rewrite); its advantages include a lower memory footprint and higher performance than Puppet and Chef, but its popularity has declined in recent years. Other configuration management systems present in the developer room were FusionInventory, GLPI, and OPSI.

A meta-distribution

In his case study about Linux system engineering in air traffic control, Stefan Schimanski showed how scalable Puppet really is and how it can guarantee reliable mass deployment of the Linux-based, mission-critical applications needed in air traffic control centers. Air traffic grows every year, so the number of computer systems that have to handle these flights is growing too, as is the workload for the system administrators. Moreover, the systems really need round-the-clock (24/7, 365 days a year) high availability: if they go down for 30 minutes, air traffic control has a serious problem. For example, if a computer in a control center freezes, the operator is essentially blind.

These strict requirements, coupled with the growing number of servers, mean that air traffic control centers need automated installation of every system with minimal downtime and fast rollbacks. Moreover, the informal requirements documents written by non-technical people have to be converted into formal specifications of the system configuration, so that the systems can be standardized and their configuration made reproducible. For these reasons, Schimanski rethought his system engineering approach in 2010 and turned to Puppet.

One thing that Puppet makes easy is separating the abstract requirements from the concrete implementation. For each node, the system administrator can define how it has to be configured in an abstract way, e.g. by including classes for a desktop node, a server node, a webserver node, and so on. By reading these node definitions, you can easily see what a node is supposed to be doing without having to bother with the concrete implementation, which lives in separate files for these classes. For example, the webserver class installs and configures Apache and also includes the configuration of the server class. Moreover, according to Schimanski, a good Puppet configuration introduces traceability, which is essential in that kind of environment: "If someone asks where requirement #91 of the requirements document is implemented, it's easy to point out the Puppet code that implements this."
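The split between abstract node definitions and concrete class implementations can be modeled roughly in Python. This is an analogy, not Puppet code: the class names, resources, hostnames, and the mapping of requirement #91 to the `webserver` class are all hypothetical. As with Puppet's `include`, a class that is pulled in more than once is only expanded once.

```python
from typing import Dict, List, Set, Tuple

# Hypothetical class catalogue: each class names the classes it includes and
# the resources it manages itself (the "concrete implementation").
CLASSES: Dict[str, Dict[str, List[str]]] = {
    "base": {"includes": [], "resources": ["package:openssh-server", "service:sshd"]},
    "server": {"includes": ["base"], "resources": ["package:ntp", "service:ntpd"]},
    "webserver": {
        "includes": ["server"],
        "resources": ["package:httpd", "file:/etc/httpd/conf/httpd.conf", "service:httpd"],
    },
    "desktop": {"includes": ["base"], "resources": ["package:xorg-x11-server-Xorg"]},
}

# Hypothetical traceability table: requirement number -> implementing class.
REQUIREMENTS: Dict[int, str] = {91: "webserver"}

# Abstract node definitions: what each node is for, not how it is built.
NODES: Dict[str, List[str]] = {
    "web01.example.org": ["webserver"],
    "ws03.example.org": ["desktop"],
}


def _expand(cls: str, seen: Set[str], out: List[str]) -> None:
    """Append the resources of `cls` and its includes, each class only once."""
    if cls in seen:
        return
    seen.add(cls)
    for parent in CLASSES[cls]["includes"]:
        _expand(parent, seen, out)
    out.extend(CLASSES[cls]["resources"])


def _compile(node: str) -> Tuple[Set[str], List[str]]:
    """Return the set of classes and the ordered resource list for a node."""
    seen: Set[str] = set()
    out: List[str] = []
    for cls in NODES[node]:
        _expand(cls, seen, out)
    return seen, out


def catalog_for(node: str) -> List[str]:
    """Turn an abstract node definition into its concrete resource list."""
    return _compile(node)[1]


def nodes_for_requirement(req: int) -> List[str]:
    """Answer "where is requirement N implemented, and which nodes get it?"."""
    cls = REQUIREMENTS[req]
    return sorted(n for n in NODES if cls in _compile(n)[0])


if __name__ == "__main__":
    print(catalog_for("web01.example.org"))
    print(nodes_for_requirement(91))
```

Reading `NODES` tells you what each machine is for; only `CLASSES` says how, which mirrors the separation Schimanski described.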

Another interesting idea that Schimanski introduced in his talk was the concept of a meta-distribution: the air traffic control systems run on SUSE Linux Enterprise and Red Hat Enterprise Linux servers, but the Linux distribution itself is completely interchangeable. The AutoYaST or Kickstart installation files are minimal, and almost all configuration is done in Puppet modules, e.g. for NTP and other services. The result is a heavily customized enterprise Linux distribution, but all of these customizations are documented in a completely formal way. Schimanski explained the rationale behind this approach:

We don't want to depend on one operating system, so if, hypothetically, Novell stops the development of SUSE Linux Enterprise, we could migrate our systems to Red Hat Enterprise Linux or even Ubuntu Server in only four days without redoing all the configuration work.

To a certain degree, Puppet modules can be written in an operating-system-independent way. There are always some minor differences, such as where the distribution puts its configuration files, but these can be abstracted away with variables whose values (e.g. a file path) depend on the operating system. Of course, you have to check these details before migrating to another operating system, so it's not effortless, but according to Schimanski, Puppet makes migrating a lot easier.
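The variable-based abstraction can be sketched as a lookup table keyed by OS family. This is a Python analogy rather than a Puppet module; the table keys and helper names are hypothetical, and while the package names and configuration paths reflect the usual defaults of those distributions, they should be verified on your own systems before relying on them.

```python
from typing import Dict, List

# Hypothetical per-OS-family parameters for an Apache module. The values are
# the usual distribution defaults, but check them before a real migration.
APACHE_PARAMS: Dict[str, Dict[str, str]] = {
    "RedHat": {"package": "httpd", "service": "httpd",
               "config": "/etc/httpd/conf/httpd.conf"},
    "Suse": {"package": "apache2", "service": "apache2",
             "config": "/etc/apache2/httpd.conf"},
    "Debian": {"package": "apache2", "service": "apache2",
               "config": "/etc/apache2/apache2.conf"},
}


def apache_params(os_family: str) -> Dict[str, str]:
    """Look up the OS-specific values; fail loudly for an unknown platform."""
    try:
        return APACHE_PARAMS[os_family]
    except KeyError:
        raise ValueError(f"unsupported OS family: {os_family!r}") from None


def apache_module(os_family: str) -> List[str]:
    """The module logic is identical on every OS; only the values differ."""
    p = apache_params(os_family)
    return [f"package:{p['package']}", f"file:{p['config']}", f"service:{p['service']}"]


if __name__ == "__main__":
    for family in ("Suse", "RedHat"):
        print(family, apache_module(family))
```

Migrating a node from one distribution to another then changes which row is used, not the module logic; the remaining work is mostly checking that each table entry is correct for the new platform.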

The Puppet ecosystem

The talks also showed that a healthy ecosystem of tools is developing around Puppet. For example, Henrik Lindberg gave a demo of Geppetto, a new Eclipse-based project that develops tools to simplify authoring and using Puppet manifests and modules. The project's near-term objectives are flattening the learning curve for new Puppet users, supporting best practices, and encouraging the sharing of Puppet modules. Under the hood, Geppetto has a grammar for the Puppet DSL (Domain Specific Language), written with Xtext. Thanks to Xtext, this automatically yields an Eclipse editor that knows the Puppet language and offers syntax coloring, code completion, code folding, and error and warning markers. Moreover, when creating a Puppet module you can enter metadata and choose dependencies, and at the end you can export the module to a zip file that can be uploaded to the Puppet Forge. The Geppetto integrated development environment can be downloaded as a stand-alone product for Linux, Windows, or Mac OS X, or as a separate plug-in for Eclipse.

Another rising star in the Puppet ecosystem is Foreman, presented by its creator Ohad Levy, who joined Red Hat in August 2010 as a principal software engineer on its cloud team. The project is now a year and a half old and has 20 contributors, and according to Levy, Foreman will at some point become part of Red Hat's cloud portfolio. Foreman integrates with Puppet and acts as a web-based dashboard for it, providing real-time information about the status of hosts based on Puppet reports, statistics, and so on. Moreover, Foreman takes care of the low-level details of setting up machines and installing the Puppet client on them, up to the point where Puppet can take over the configuration defined in your Puppet modules. It even supports creating virtual machines using the libvirt API, with RHEV-M and Amazon EC2 support in the works. The largest Foreman installation that Levy knows about manages 4000 active hosts. This is clearly a project to watch: it is backed by Red Hat and has the potential to make managing a Puppet environment a lot easier.

Configuration management is not only useful for system administrators installing servers, but also for developers setting up their development environment. Gareth Rushgrove talked about using configuration management tools to get new employees up and running quickly with a development virtual machine. Especially interesting was his coverage of Vagrant, a tool for automated virtual machine creation with Oracle's VirtualBox. With Puppet or Chef provisioning the virtual machines automatically, developers can get a complete development environment up and running in no time. Users can configure Vagrant to forward ports to the host machine, set up shared folders, and so on. It's also possible to package an environment as a distributable box, and a single command rebuilds a complete environment from scratch or tears it down when you're done. Normally users start by downloading a base box to use with Vagrant (the default one is Ubuntu Lucid Lynx), but they can also build their own base box with a tool like VeeWee.

Lessons for disaster recovery

While Puppet clearly was the most visible configuration management system at FOSDEM, it was not the only one. Joshua Timberman, Sr. Technical Evangelist at Opscode (the creators of Chef), gave a short "Chef 101" talk, followed by an overview of how to use Chef to deploy applications with nothing but the source code repository and data about the application configuration. Traditionally, one deploys applications with tools like tar, rsync, and (in the Ruby world) cap deploy, but what about the server configuration, such as that of web servers, load balancers, and database servers? Timberman showed how you can easily deploy web applications together with their supporting servers using various server roles configured in Chef cookbooks. The Chef server itself is a lightweight Ruby on Rails application, and the largest Chef deployment that Timberman knows about has 5000 nodes checking in to the Chef server every 30 minutes.
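A quick back-of-the-envelope calculation shows what 5000 nodes checking in every 30 minutes means for the server: 5000 check-ins over 1800 seconds is about 2.78 per second on average. The sketch below additionally assumes that each node picks an independent, uniformly random second within the interval (an assumption for illustration, not something reported in the talk) and measures the busiest second, which is what the server actually has to absorb.

```python
import random
from collections import Counter


def average_checkin_rate(nodes: int, interval_s: int) -> float:
    """Mean check-ins per second if they are spread evenly over the interval."""
    return nodes / interval_s


def peak_checkins_per_second(nodes: int, interval_s: int, seed: int = 0) -> int:
    """Busiest one-second bucket when each node picks a random second.

    Assumes uniformly random, independent check-in offsets; real clients
    may be scheduled differently.
    """
    rng = random.Random(seed)
    buckets = Counter(rng.randrange(interval_s) for _ in range(nodes))
    return max(buckets.values())


if __name__ == "__main__":
    interval = 30 * 60  # 30 minutes, in seconds
    print(f"average: {average_checkin_rate(5000, interval):.2f} check-ins/s")
    print(f"peak:    {peak_checkins_per_second(5000, interval)} check-ins in one second")
```

Even with perfectly random spreading, the peak can never be below three per second (5000 check-ins cannot fit into 1800 one-second slots at two each), and if many clients happened to start at the same moment, the server would see a far larger burst.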

The first talk of the day was by Nicolas Charles and Jonathan Clarke, who presented their company Normation's use of Cfengine and focused on their experiences with disaster recovery. All their services (web, email, Git repository, Redmine, ...) ran on a single hosted server. It used a three-disk RAID5 array, with daily backups, separate virtual machines for each service, and all services automatically installed and configured using Cfengine 3.

When two hard drives failed simultaneously, they first thought recovery would be easy, as they had backups and used a configuration management system. However, it turned out they had forgotten some things. For example, they had neither automated nor backed up the configuration of the virtual machines, so these had to be re-created manually. Moreover, after watching all the services come back online with the right configuration thanks to Cfengine 3, they realized that they still had to restore the backups manually, and after doing so they found that a couple of files were missing. The three big lessons here are: don't forget to describe your virtualization setup in your configuration management system, tie your configuration management system in with your backup tool, and always test your backups.
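The last two lessons can be made concrete with a small verification step. The Python sketch below is hypothetical and not tied to Cfengine or any particular backup tool: `coverage_gaps()` checks that every path the configuration depends on (including, say, the virtual machine definitions) is covered by the backup, and `verify_restore()` compares a test restore against checksums recorded at backup time. The paths in the usage example are invented for illustration.

```python
import hashlib
from typing import Dict, List, Set


def sha256(data: bytes) -> str:
    """Hex digest used to fingerprint file contents at backup time."""
    return hashlib.sha256(data).hexdigest()


def coverage_gaps(managed_paths: Set[str], backed_up: Set[str]) -> List[str]:
    """Paths the configuration relies on but the backup job does not include."""
    return sorted(managed_paths - backed_up)


def verify_restore(manifest: Dict[str, str],
                   restored: Dict[str, bytes]) -> Dict[str, List[str]]:
    """Compare a test restore against checksums recorded at backup time.

    manifest: path -> SHA-256 hex digest taken when the backup was made
    restored: path -> file content found after restoring to a scratch area
    """
    missing = sorted(p for p in manifest if p not in restored)
    corrupted = sorted(
        p for p in manifest if p in restored and sha256(restored[p]) != manifest[p]
    )
    return {"missing": missing, "corrupted": corrupted}


if __name__ == "__main__":
    # Hypothetical paths: a VM definition that nobody thought to back up.
    managed = {"/etc/libvirt/qemu/mail.xml", "/srv/git", "/var/lib/redmine"}
    print(coverage_gaps(managed, {"/srv/git", "/var/lib/redmine"}))

    manifest = {"/srv/git/HEAD": sha256(b"ref: refs/heads/master\n"),
                "/var/lib/redmine/db.sql": sha256(b"dump")}
    print(verify_restore(manifest, {"/srv/git/HEAD": b"ref: refs/heads/master\n"}))
```

Run regularly against a scratch restore, a check like this would surface both kinds of surprise described in the talk (an unbacked-up VM configuration and missing files) before a real failure does.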

The system administrator as glue

The best quote summarizing the "don't reinvent the wheel" approach of configuration management came from Levy's talk: "Automate as many processes as possible, using best practices where available, and act as the glue between the gaps." In this regard, it is worth knowing that anyone can share their Chef "cookbooks" (packages of "recipes") on cookbooks.opscode.com, and Puppet users can share their modules on the Puppet Forge. This is great for new users, who can study other users' modules and reuse them in their own infrastructure. Your author had already automated some of the services on his home network with Puppet, and this configuration management track at FOSDEM was inspiring enough to continue this approach and decrease the amount of glue in his network.