18.1. Introduction

Puppet is an open source IT management tool written in Ruby, used for datacenter automation and server management at Google, Twitter, the New York Stock Exchange, and many others. It is primarily maintained by Puppet Labs, which also founded the project. Puppet can manage as few as 2 machines and as many as 50,000, on teams with one system administrator or hundreds.

Puppet is a tool for configuring and maintaining your computers; in its simple configuration language, you explain to Puppet how you want your machines configured, and it changes them as needed to match your specification. As you change that specification over time—such as with package updates, new users, or configuration updates—Puppet will automatically update your machines to match. If they are already configured as desired, then Puppet does nothing.
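The "change only what differs, otherwise do nothing" behavior can be illustrated with a minimal Ruby sketch of idempotent convergence. This is not Puppet's code. The `converge` method and the hash-based state model are hypothetical simplifications: real Puppet inspects each resource's live state through the system, not through a stored hash.

```ruby
# Hypothetical sketch of idempotent convergence (not Puppet's implementation).
# desired and current map resource references to attribute hashes.
def converge(desired, current)
  changes = []
  desired.each do |ref, attrs|
    next if current[ref] == attrs # already in the desired state: do nothing
    changes << ref                # record that this resource had to change
    current[ref] = attrs          # "apply" the change to the simulated system
  end
  changes
end

system_state = { "Package[ssh]" => { ensure: "absent" } }
spec = {
  "Package[ssh]"  => { ensure: "installed" },
  "Service[sshd]" => { ensure: "running" }
}

converge(spec, system_state) # => ["Package[ssh]", "Service[sshd]"]
converge(spec, system_state) # => [] (the second run finds nothing to change)
```

Running the same specification twice is safe. The second run only compares states and makes no changes.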

In general, Puppet does everything it can to use existing system features to do its work; e.g., on Red Hat it will use yum for packages and init.d for services, but on OS X it will use dmg for packages and launchd for services. One of the guiding goals in Puppet is to have the work it does make sense whether you are looking at Puppet code or the system itself, so following system standards is critical.
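Selecting a platform-specific backend for the same abstract resource can be sketched as a small lookup. The table below is hypothetical. Its symbols are descriptive labels that follow the tools named in the text, not Puppet's actual provider names. Real Puppet chooses providers from system facts and provider confinement rules, not from a hard-coded table.

```ruby
# Hypothetical provider table: the same abstract resource type maps to
# different system tools depending on the OS family. Labels are illustrative.
PROVIDERS = {
  package: { "RedHat" => :yum,    "Darwin" => :dmg },
  service: { "RedHat" => :init_d, "Darwin" => :launchd }
}.freeze

# Return the backend to use for a resource type on a given OS family.
def select_provider(type, os_family)
  PROVIDERS.fetch(type).fetch(os_family) do
    raise ArgumentError, "no #{type} provider for #{os_family}"
  end
end

select_provider(:package, "RedHat") # => :yum
select_provider(:service, "Darwin") # => :launchd
```

The manifest only says "this package is installed". The platform decides which tool carries that out.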

Puppet draws on several traditions of earlier tools. In the open source world, it is most influenced by CFEngine, which was the first open source general-purpose configuration tool, and ISconf, whose use of make for all work inspired the focus on explicit dependencies throughout the system. In the commercial world, Puppet is a response to BladeLogic and Opsware (both since acquired by larger companies). Each was successful in the market when work on Puppet began, but each was focused on selling to executives at large companies rather than building great tools directly for system administrators. Puppet is meant to solve problems similar to those these tools address, but it is focused on a very different user.

For a simple example of how to use Puppet, here is a snippet of code that will make sure the secure shell service (SSH) is installed and configured properly:

class ssh {
  package { ssh:
    ensure => installed,
  }

  file { "/etc/ssh/sshd_config":
    source  => 'puppet:///modules/ssh/sshd_config',
    ensure  => present,
    require => Package[ssh],
  }

  service { sshd:
    ensure  => running,
    require => [File["/etc/ssh/sshd_config"], Package[ssh]],
  }
}

This makes sure the package is installed, the file is in place, and the service is running. Note that we've specified dependencies between the resources, so that we always perform any work in the right order. This class could then be associated with any host to apply this configuration to it. Notice that the building blocks of a Puppet configuration are structured objects, in this case package, file, and service. We call these objects resources in Puppet, and everything in a Puppet configuration comes down to these resources and the dependencies between them.
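To make "resources are structured objects" concrete, here is a hypothetical Ruby data model for the `ssh` class above. Each resource is a type, a title, and a hash of parameters, and it can render itself in the `Type[title]` reference notation the manifest uses. `Resource` and its methods are illustrative names, not Puppet's internal classes.

```ruby
# Hypothetical model: a resource is a type, a title, and its parameters.
Resource = Struct.new(:type, :title, :params) do
  # Reference in Type[title] notation, e.g. "Package[ssh]".
  def ref
    "#{type.to_s.capitalize}[#{title}]"
  end

  # References named in this resource's `require` parameter (always an array).
  def requires
    Array(params[:require])
  end
end

ssh_class = [
  Resource.new(:package, "ssh", { ensure: "installed" }),
  Resource.new(:file, "/etc/ssh/sshd_config",
               { source: "puppet:///modules/ssh/sshd_config", ensure: "present",
                 require: "Package[ssh]" }),
  Resource.new(:service, "sshd",
               { ensure: "running",
                 require: ["File[/etc/ssh/sshd_config]", "Package[ssh]"] })
]

ssh_class.map(&:ref) # => ["Package[ssh]", "File[/etc/ssh/sshd_config]", "Service[sshd]"]
```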

A typical Puppet site will have tens or even hundreds of these code snippets, which we call classes. We store these classes on disk in files called manifests, and collect them in related groups called modules. For instance, you might have an ssh module with this ssh class plus any other related classes, along with modules for mysql, apache, and sudo.
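Puppet finds classes and files by naming convention within a module. The class `ssh` lives in the module's `manifests/init.pp`, a class such as `ssh::server` lives in `manifests/server.pp`, and the `puppet:///modules/ssh/sshd_config` source used above is served from the module's `files/` directory. The helpers below are hypothetical and only mirror that convention. The default module path `/etc/puppet/modules` is an assumption, because the actual path depends on the installation.

```ruby
# Hypothetical helpers mirroring Puppet's module layout convention.
# The default modulepath is an assumption; real installs may differ.
DEFAULT_MODULEPATH = "/etc/puppet/modules"

# "ssh" -> ssh/manifests/init.pp, "ssh::server::config" -> ssh/manifests/server/config.pp
def manifest_path(class_name, modulepath = DEFAULT_MODULEPATH)
  mod, *rest = class_name.split("::")
  file = rest.empty? ? "init" : File.join(*rest)
  File.join(modulepath, mod, "manifests", "#{file}.pp")
end

# "puppet:///modules/ssh/sshd_config" -> ssh/files/sshd_config
def file_source_path(uri, modulepath = DEFAULT_MODULEPATH)
  prefix = "puppet:///modules/"
  raise ArgumentError, "not a module file URI: #{uri}" unless uri.start_with?(prefix)
  mod, *rest = uri.delete_prefix(prefix).split("/")
  File.join(modulepath, mod, "files", *rest)
end

manifest_path("ssh")         # => "/etc/puppet/modules/ssh/manifests/init.pp"
manifest_path("ssh::server") # => "/etc/puppet/modules/ssh/manifests/server.pp"
file_source_path("puppet:///modules/ssh/sshd_config")
# => "/etc/puppet/modules/ssh/files/sshd_config"
```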

Most Puppet interactions are via the command line or long-running HTTP services, but there are graphical interfaces for some things such as report processing. Puppet Labs also produces commercial products around Puppet, which tend more toward graphical web-based interfaces.

Puppet's first prototype was written in the summer of 2004, and it became a full-time focus in February of 2005. It was initially designed and written by Luke Kanies, a sysadmin who had a lot of experience writing small tools but none writing tools larger than 10,000 lines of code. In essence, Luke learned to be a programmer while writing Puppet, and that shows in its architecture in both positive and negative ways.

Puppet was first and foremost built to be a tool for sysadmins, to make their lives easier and allow them to work faster, more efficiently, and with fewer errors. The first key innovation meant to deliver on this was the resources mentioned above, which are Puppet's primitives. They would be portable across most operating systems and would also abstract away implementation details, allowing the user to focus on outcomes rather than on how to achieve them. This set of primitives was implemented in Puppet's Resource Abstraction Layer.

Puppet resources must be unique on a given host. You can only have one package named "ssh", one service named "sshd", and one file named "/etc/ssh/sshd_config". This prevents different parts of your configurations from conflicting with each other, and you find out about those conflicts very early in the configuration process. We refer to these resources by their type and title; e.g., Package[ssh] and Service[sshd] . You can have a package and a service with the same name because they are different types, but not two packages or services with the same name.
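The uniqueness rule amounts to keying every resource on its type and title. Here is a hypothetical Ruby sketch of a catalog that enforces this rule as each resource is added. The class names and the error message are illustrative. Puppet itself reports such a conflict as a duplicate declaration while compiling the configuration.

```ruby
# Hypothetical sketch: resources are keyed on [type, title], so a second
# declaration with the same type and title is rejected immediately.
class DuplicateResourceError < StandardError; end

class Catalog
  def initialize
    @resources = {}
  end

  def add(type, title, params = {})
    key = [type, title]
    if @resources.key?(key)
      raise DuplicateResourceError, "duplicate resource #{type.to_s.capitalize}[#{title}]"
    end
    @resources[key] = params
  end

  def size
    @resources.size
  end
end

catalog = Catalog.new
catalog.add(:package, "ssh", { ensure: "installed" })
catalog.add(:service, "ssh", { ensure: "running" }) # allowed: same title, different type

begin
  catalog.add(:package, "ssh", { ensure: "latest" }) # same type and title: rejected
rescue DuplicateResourceError => e
  puts e.message # prints "duplicate resource Package[ssh]"
end
```

Because the check runs on insertion, a conflict surfaces as soon as the second declaration is seen. This happens long before anything on the host is changed.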

The second key innovation in Puppet is the ability to specify dependencies between resources directly. Previous tools focused on the individual work to be done, rather than on how the various bits of work were related. Puppet was the first tool to say explicitly that dependencies are a first-class part of your configurations and must be modeled that way. It builds a graph of resources and their dependencies, called a Catalog, as one of its core data types, and essentially everything in Puppet hangs off of this graph and its vertices and edges.
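A minimal Ruby sketch shows why modeling dependencies as a graph pays off: a topological sort yields an order in which every resource comes after everything it requires. This sketch uses Ruby's standard `TSort` module. It is not Puppet's Catalog class. `DependencyGraph` and its methods are hypothetical names, and the vertices are the reference strings from the `ssh` example.

```ruby
require "tsort"

# Hypothetical miniature of a resource graph: each vertex is a resource
# reference, and its children are the resources it requires.
class DependencyGraph
  include TSort

  def initialize
    @edges = Hash.new { |hash, key| hash[key] = [] }
  end

  def add_resource(ref, requires: [])
    @edges[ref].concat(requires)
    requires.each { |dep| @edges[dep] } # ensure every dependency is a vertex
  end

  # TSort callbacks: enumerate vertices, then each vertex's children.
  def tsort_each_node(&block)
    @edges.each_key(&block)
  end

  def tsort_each_child(node, &block)
    @edges.fetch(node, []).each(&block)
  end

  # Children (requirements) come before parents in TSort's output.
  def apply_order
    tsort
  end
end

graph = DependencyGraph.new
graph.add_resource("Service[sshd]", requires: ["File[/etc/ssh/sshd_config]", "Package[ssh]"])
graph.add_resource("File[/etc/ssh/sshd_config]", requires: ["Package[ssh]"])
graph.add_resource("Package[ssh]")

graph.apply_order # => ["Package[ssh]", "File[/etc/ssh/sshd_config]", "Service[sshd]"]
```

The resources were declared in reverse order, but the sort still installs the package first. If the dependencies form a cycle, `tsort` raises `TSort::Cyclic`, so an unsatisfiable ordering is caught rather than silently applied.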

The last major component in Puppet is its configuration language. This language is declarative, and is meant to be more configuration data than full programming—it most resembles Nagios's configuration format, but is also heavily influenced by CFEngine and Ruby.

Beyond the functional components, Puppet has had two guiding principles throughout its development: it should be as simple as possible, always preferring usability even at the expense of capability; and it should be built as a framework first and application second, so that others could build their own applications on Puppet's internals as desired. It was understood that Puppet's framework needed a killer application to be adopted widely, but the framework was always the focus, not the application. Most people think of Puppet as being that application, rather than the framework behind it.

When Puppet's prototype was first built, Luke was essentially a decent Perl programmer with a lot of shell experience and some C experience, mostly working in CFEngine. Oddly, he did have experience building parsers for simple languages: he had built two as part of smaller tools and had also rewritten CFEngine's parser from scratch in an effort to make it more maintainable (this code was never submitted to the project because of small incompatibilities).

Deciding on a dynamic language for Puppet's implementation was easy, given the much higher developer productivity and faster time to market, but choosing which language proved difficult. Initial prototypes in Perl went nowhere, so other languages were sought for experimentation. Python was tried, but Luke found the language quite at odds with how he thought about the world. Based on what amounted to a rumor of its usefulness heard from a friend, Luke tried Ruby, and in four hours had built a usable prototype. When Puppet became a full-time effort in 2005, Ruby was a complete unknown, so the decision to stick with it was a big risk, but again programmer productivity was deemed the primary driver in language choice. The major distinguishing feature of Ruby, at least as opposed to Perl, was how easy it made building non-hierarchical class relationships, but it also mapped very well to Luke's brain, which turned out to be critical.