2012-02-18

UPDATE: This tutorial is a bit out of date. It is still accurate about the concepts, but rather than using the shell scripts described below you can do everything with knife-solo. Just say knife prepare your.hostname.com to install Chef there, then say knife cook your.hostname.com to apply your runlist. The documentation has more details. You should also use Librarian to manage your cookbook dependencies. These two tools will make your life a lot easier. Good luck!

Chef is a tool to build servers in an automated, repeatable, and self-documenting way. It is especially useful for cloud-based computing (what I like to think of as “disposable computing”), where you can launch a new instance with just a button click or shell script. But Chef is fairly new, and good documentation is hard to find.

This article describes how to use chef-solo to set up a simple webserver on a fresh VPS. First we cover some high-level concepts, then we show how they apply to chef-solo, and finally we give a method to bootstrap a new instance by installing Ruby and Chef. There’s no need to install Ruby by hand; the scripts at the end of this post will do that for you.

Cookbooks, Recipes, and Run Lists

In Chef, everything is a cooking metaphor. You have various “cookbooks,” which store one or more “recipes.” Each cookbook knows how to install/configure one thing. So there is a cookbook for Apache, for Postgres, etc. Typically a cookbook will have one default recipe, plus other recipes for additional things. So with the Apache cookbook, you use the default recipe to install Apache, and you use the PHP5 recipe to add the PHP module. There are also more basic cookbooks for doing things like creating user accounts.

Your cookbooks and recipes are not the dinner. A cookbook is more like a library, and a recipe is like a function call. A recipe is parameterized, so you can use it to install Apache with this DocumentRoot here, and that DocumentRoot over there. Thus, recipes get reused and shared. You can find individual cookbooks on the Opscode community site, but it’s easier to grab a whole set of cookbooks from GitHub. Here are the cookbooks by Opscode, the makers of Chef, and here are the cookbooks from 37Signals. If you find a well-written, good-size cookbook collection, you can just fork it and then tweak it as necessary.

The way you use your recipes to set up an individual server is by specifying a “run list.” You don’t cook every recipe in the book every time you make dinner. Instead you follow that night’s menu. A run list is like your menu. It says which recipes to use and in which order, and the JSON file that holds it also carries the parameters to pass to each one. Your run list might look like this:

"run_list": ["recipe[base]", "recipe[hostname]", "recipe[users]", "recipe[apache2]", "recipe[apache2::mod_php5]"]

Here, names without a double colon are requests to run the cookbook’s default recipe, whereas apache2::mod_php5 will run the mod_php5 recipe in the apache2 cookbook.
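
The naming rule can be made concrete with a small shell sketch. The function name resolve_recipe is hypothetical, and this is not how Chef itself parses a run list; it only shows how one entry maps to a cookbook and a recipe.

```shell
#!/bin/sh
# resolve_recipe is a hypothetical helper, not part of Chef.
# It prints "<cookbook> <recipe>" for one run list entry.
resolve_recipe() {
  name=${1#"recipe["}   # drop the leading "recipe["
  name=${name%"]"}      # drop the trailing "]"
  case "$name" in
    *::*) echo "${name%%::*} ${name#*::}" ;;  # cookbook::recipe form
    *)    echo "$name default" ;;             # bare name means the default recipe
  esac
}

resolve_recipe 'recipe[base]'
resolve_recipe 'recipe[apache2::mod_php5]'
```

The first call prints "base default" and the second prints "apache2 mod_php5".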

Chef Server vs. Chef-Solo

There are two ways of structuring Chef. The standard, approved way is to run a master server with all your cookbooks, where each machine you want to configure is a client. You can set up Chef Server yourself or pay Opscode for a hosted solution (free for up to five nodes). I tried this way at first, and it makes sense if you’re managing a horde of machines, but it’s a lot of work. For learning the ins and outs of Chef, I’d start with chef-solo, then graduate to the client/server setup once you’ve baked a few recipes.

Chef Server also demands powerful resources. It has so many dependencies (Erlang, CouchDB, Java, RabbitMQ, ...) that I couldn’t even keep it running on an AWS Micro instance. That means if you’re on EC2, you’re going to pay $63/month for a Small instance just for your configuration management server. (Or you could power it on and off whenever you need it.)

The alternative is chef-solo. Opscode doesn’t seem to favor this approach, but I think it’s growing in popularity. It’s great if you just want to launch a few servers, and recently it has acquired some of the functionality it used to lack (like Data Bags).

Chef-solo runs solely on the client machine—that is, the machine-to-be-configured. So with this approach, you bootstrap by installing Ruby and Chef, then you upload your cookbooks and run list and tell chef-solo to get to work. There is no network communication, other than your ssh session.

I used chef-solo to set up this blog on a VPS running Ubuntu 10.04. You can find my scripts to run chef-solo on GitHub. The key files here are site-cookbooks, which contains a few custom cookbooks I had to write specifically for this host; solo.rb, which tells chef-solo where to find cookbooks; and solo.json, which has our run list and its parameters. Any time a recipe uses a value like node[:foo][:bar][:baz], Chef checks solo.json to find that value.

We also need a base cookbook collection with standard recipes. I keep my standard cookbooks in a separate GitHub repo, then symlink to it like this:

$ cd ~/src/illuminatedcomputing-chef-solo && ln -s ~/src/cookbooks cookbooks

I suppose I could use git submodules for this, but alas I’m not that fancy. If you look in solo.rb, you can see that it references both cookbooks and site-cookbooks. Later entries override earlier ones, so recipes in site-cookbooks will replace our standard ones if necessary.

The last item to note is the data-bags directory (also referenced from solo.rb). A data bag is another JSON collection of data, which you can access from a recipe. I use it to store users’ public ssh keys, so I don’t have to put those in solo.json and hence in source code control. It doesn’t really matter so much to share a public key, but if you were using passwords, a data bag could be a way of keeping those more restricted. Chef-solo didn’t get data bag support until 0.10.4, so make sure you install a recent version (as in our bootstrapping script below).
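
To show how the three directories fit together, here is a minimal sketch that writes out an example config. It is not the solo.rb from my repo. The output file name solo.rb.example and the /tmp/chef-solo cache directory are assumptions; cookbook_path, data_bag_path, and file_cache_path are the standard Chef config settings.

```shell
#!/bin/sh
# Writes an example chef-solo config file.
# The output name and the cache directory are assumptions for illustration.
out="${1:-solo.rb.example}"

cat > "$out" <<'EOF'
# Directory containing this config file.
root = File.expand_path(File.dirname(__FILE__))

# Where chef-solo keeps its working files (assumed location).
file_cache_path "/tmp/chef-solo"

# Later entries override earlier ones, so site-cookbooks wins.
cookbook_path [File.join(root, "cookbooks"), File.join(root, "site-cookbooks")]

# JSON data bags, such as users' public ssh keys.
data_bag_path File.join(root, "data-bags")
EOF

echo "wrote $out"
```

Listing site-cookbooks last in cookbook_path is what lets a custom cookbook shadow a standard one with the same name.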

We run chef-solo by passing in solo.rb and solo.json. It looks like this (-c for “config”, -j for “json”):

chef-solo -c solo.rb -j solo.json

That command will keep chef busy for a while cooking everything we asked. If something fails, you can fix the problem and run it again. Chef tries hard to keep every operation idempotent, so re-running it won’t cause problems. Nonetheless, once you have things running smoothly, I’d wipe the machine and do it again, to prove that you can run the full script without errors.

Bootstrapping

One part of this process is still annoying: we’ve automated everything except the installation of Chef itself. The other two files in my chef-solo tutorial GitHub project, deploy.sh and install.sh, handle this task. I’ve adapted these scripts, but I’m not the original author. They seem to have evolved over several generations.

We’ll run install.sh on the server. This checks if chef is installed, and if not, it assumes we have a fresh instance. So it updates all the packages, installs RVM, installs Ruby, and finally installs chef. Then, once it knows everything is ready, it runs chef-solo, passing the solo.rb and solo.json files. You can run install.sh multiple times, and after the first time it will just run chef-solo.
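
The flow just described can be sketched roughly as follows. This is not the actual install.sh from the repo; the run helper, the function names, the DRY_RUN switch, and the Ruby version are all assumptions for illustration. DRY_RUN defaults to 1, so the sketch only prints the commands it would run.

```shell
#!/bin/sh
# Sketch of the install.sh flow, not the author's script.
# DRY_RUN defaults to 1, so commands are printed instead of executed.
DRY_RUN="${DRY_RUN:-1}"
CHEF_RUBY="${CHEF_RUBY:-1.9.3}"   # assumed version; pick the Ruby you need

# Print the command in dry-run mode, otherwise execute it.
run() {
  if [ "$DRY_RUN" = "1" ]; then
    echo "+ $*"
  else
    "$@"
  fi
}

# True when chef-solo is already on the PATH.
chef_installed() {
  command -v chef-solo >/dev/null 2>&1
}

# One-time setup for a fresh instance.
bootstrap() {
  run apt-get update
  run apt-get -y upgrade
  run sh -c 'curl -sSL https://get.rvm.io | bash -s stable'
  # A real script must load RVM into this shell before the next steps.
  run rvm install "$CHEF_RUBY"
  run gem install chef
}

# Bootstrap only when needed, then always apply the run list.
install_and_cook() {
  chef_installed || bootstrap
  run chef-solo -c solo.rb -j solo.json
}

install_and_cook
```

Because the bootstrap step is guarded by the chef_installed check, running the script a second time goes straight to chef-solo.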

The last script is deploy.sh. We run this on our laptop (or whatever). It bundles everything into a tarball, copies it to the remote server, and runs install.sh over there. So if everything is working correctly, deploy.sh is the only script you need to run by hand.
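
Here is a rough sketch of those three steps, again not the actual deploy.sh. The tarball name, the remote directory, the default root login, and the DRY_RUN switch are assumptions for illustration. As written it only prints the commands; set DRY_RUN=0 to execute them.

```shell
#!/bin/sh
# Sketch of the deploy.sh flow, not the author's script.
# DRY_RUN defaults to 1, so commands are printed instead of executed.
DRY_RUN="${DRY_RUN:-1}"
REMOTE_DIR="${REMOTE_DIR:-/tmp/chef-solo}"   # assumed upload location

# Print the command in dry-run mode, otherwise execute it.
run() {
  if [ "$DRY_RUN" = "1" ]; then
    echo "+ $*"
  else
    "$@"
  fi
}

deploy() {
  host="$1"
  # -h follows the cookbooks symlink so the real files are bundled.
  run tar -czhf chef-solo.tar.gz cookbooks site-cookbooks data-bags solo.rb solo.json install.sh
  # Copy the bundle to the server.
  run scp chef-solo.tar.gz "$host:/tmp/chef-solo.tar.gz"
  # Unpack it remotely and hand over to install.sh.
  run ssh "$host" "mkdir -p $REMOTE_DIR && tar -xzf /tmp/chef-solo.tar.gz -C $REMOTE_DIR && cd $REMOTE_DIR && sh install.sh"
}

deploy "${1:-root@your.hostname.com}"
```

The sketch assumes you can ssh in as a user allowed to install packages, since install.sh updates the system on its first run.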

One warning, though: because install.sh updates all the machine’s packages, after I run it Ubuntu wants me to restart the machine. It will still go through the whole chef process and set up Apache, but then I do a reboot before deploying my actual blog/application code (using rsync). It’d be nice to do the reboot as soon as install.sh finishes the package update, but then the script would need to wait for the machine to come online again before running chef. In my case, this manual step wasn’t something to worry about, but it might matter to you. One way around it, if you’re on EC2, would be to launch the box using an AMI that already has updated packages.

Conclusion

I hope this helps you get started with Chef. You can find my cookbook code and the chef-solo tutorial code on GitHub. With that code, you should be able to bootstrap Chef on a clean Ubuntu installation, then use Chef to install Apache. From there, you’ll probably want to learn more about the structure of cookbooks and how to write a recipe. The most important part of recipe-writing is knowing what resources are available. Resources are essentially methods you can call from your recipe to do operations on the server, like writing out a file or installing a package. But cookbook internals really deserve their own post. If you have any questions, please feel free to ask them in the comments section below.
