Quantum of Deployment

Posted by Erik Kastner on May 20, 2010

aka. Deployinating the Country Side

UPDATE 2011-07-29:

Deployinator is now Open Source!

Grab it on github: https://github.com/etsy/deployinator

We deploy a lot of code. Deployinator is our creation to make that as easy and painless as possible. Deployinator is a one button web-based deployment app. Hit that button and code goes to our webservers and is serving requests in almost no time. Using Deployinator we’ve brought a typical web push from 3 developers, 1 operations engineer, everyone else on standby and over an hour (when things went smoothly) down to 1 person and under 2 minutes.

At Etsy, we’re doing what’s come to be called Continuous Deployment. However, what we’ve learned is that having a tool like Deployinator is useful for more than just enabling that. This post is about those benefits – for anyone deploying web code.

Why

Our job as engineers (and ops, dev-ops, QA, support, everyone in the company actually) is to enable the business goals. We strongly feel that in order to do that you must have the ability to deploy code quickly and safely. Even if the business goals are to deploy strongly QA’d code once a month at 3am (it’s not for us, we push all the time), having a reliable and easy deployment should be non-negotiable.

It’s a metric I’ve come to call the “quantum of deployment”: what’s the smallest number of steps, with the smallest number of people and the smallest amount of ceremony required to get new code running on your servers? This isn’t a trivial question. Even if you’re on a slow release cycle, and have a push engineer, what happens if there’s an emergency push needed? Does it go through your normal process, or is there a fast-lane? Do your fast-lane deployments get logged? Are they measured for speed? Is everyone aware that it happened the way they would for a normal deployment, or is it your dirty little secret?

It’s not hard to get started. If you currently have a bunch of shell scripts that move everything in place, wrap those up with a single shell script. The most important thing is that it’s ONE easy step. This might require changing your process. Try to remove or replace special cases. The less thought it takes to deploy, the more you can focus on getting stuff done.

Once deploying is No Big Deal, a lot of things can change. Features can go out a piece at a time instead of one all-or-nothing push. Your app configuration options can be in code – and changed quickly. Your hair can grow back. Puppies will lick your face!

What

Custom software? There’s a lot of choice out there, and we generally try to not reinvent the wheel whenever possible (and this wheel has been invented again and again). After comparing our requirements with the available software, we decided to roll our own. Here’s what we were looking for:

Web based

Logged (When, What and Who)

Adaptable to our network

Run from a central location

Announced in our IRC and email

Transparent in regards to its actions

Integrated with our graphing/monitoring tools

A lot of our requirements are inspired by the way Flickr deploys, as documented in Building Scalable Websites (Written by Friend of Etsy, Cal Henderson).

For anyone deploying code (engineers, designers… anyone really), it’s an easy process. Once your code is ready to go, you go to Deployinator and push the button to get it on QA. From there it visits Princess (see the sidebar). Then, when it’s ready to go live, you hit the “Prod” button and soon your code is live, and everyone in IRC knows who pushed what code, complete with a link to the diff. For anyone not on IRC, there’s the email that everyone gets with the same information.

What it looks like

This is the main Deployinator screen. Here is how we deploy the “web stack”.

Sidebar: What the heck is a “Princess”? Princess is our staging environment. So why didn’t we just call it “staging”? Partly, that’s just how we roll. We used to have an environment called “Staging” that was the last stop before code went live to everyone. However, it was not ideally set up; the first time your code would interact with actual production hardware was when you deployed. Princess uses production data stores, our production network and production hardware. In order to make a clean mental break from the old way, one of our rad engineers, Eric Fixler, came up with “Princess”.

Here’s our IRC bot telling everyone that something went out. It also includes a link to the commits that went live.

How

When we first brought Deployinator online, it was just a web frontend to the shell scripts that moved everything in the right place. What we gained by putting a screen in front of it was the ability to iterate the backend without changing the experience for people deploying. Deployinator currently uses svn to update the code, then rsync to move it between environments.

Another important part of Deployinator is that the environments are a one way street. Code going from Princess to production is unaffected by any commits that have happened since getting on princess. This creates something of a “mantrap“, so that we know exactly what we’re deploying. No surprises!

Deployinator isn’t used just for our web stack either. With the simple architecture we’ve built, we can add all kinds of stuff to it easily. It’s currently used for many different things such as the API, Lists service, internal admin-only tools and others. Having a single deployment process has removed a lot of complexity.

When

This isn’t a post about continuous deployment. Having a very simple deployment procedure is something you should do even if the thought of deploying your code 20 times a day scares you. Deployment can be a contentious subject with many stakeholders. Getting it simple and repeatable allows everyone to share a common vocabulary.

For the nerds…

Transporting the bits and bytes

Here’s a rundown of some of the interesting parts of how Deployinator actually moves bits around. As mentioned above, this has changed and will change again. We analyze our entire process and have some low-hanging performance fruit to pick. As of today, an API push takes about 18 seconds, a Princess push takes about the same, and a production web push is 70-150 seconds. Here are the steps that a web push goes through:

From the repo of truth

We’re deploying directly from trunk (that’s a whole other post!). So the first step of deploying is to update the code on the deploy host.

Builda what now?

After the code is updated, we run “builda”, an app we wrote to take our assets and bundle them up with lots of magic. We use a combination of javascript async loading, Google’s closure and versioned directories to make things like css, javascript and sprites as fast as possible.

Rsync

At this point, we have a bunch of directories on our deploy host that represent our web tree. We then rsync to a staging area on each web box.

Fanout

At some number of boxes, rsyncing back to a single push host stops working. We’re employing a strategy called fan out. We rsync in chunks of 10 hosts at a time. This is one area where a lot of speed ups will be happening soon.

First they came for our assets…

Pop quiz, hotshot: Someone visits the site during a deployment and box 1 (the one they randomly get) has the new code. The html they’re returned refers to a new image. When they request that image, they end up on box 451.. which doesn’t have that asset yet. What do you do? WHAT DO YOU DO?

We’ve solved this with two steps. The first (mentioned above) is versioned asset directories. The next is to get those assets on ALL hosts before ANY host has the new code referring to it.

Graceful

We’re using APC user caching, and expect it to have fresh data each deployment. Things like some application settings, database fields and routes are all cached for super fast lookups. However, if someone changes something in some code going out, we need to make sure that it’s fresh. We’re currently issuing a graceful restart to our apaches on each deployment.

Deployinator itself

One of the design goals of Deployinator has been to be as simple as possible. It’s a sinatra web app, but could really be written in anything. The script that does the svn updating (and checking out for new stacks) is in PHP and some of the straight-up simplest code possible.

The commands that Deployinator runs through for each different stack are listed in ruby methods, and are mostly strings (with servers and such interpolated). It’s easy for anyone to come in and change how something works. Simple, understandable software that gets the job done.

The one fancy bit of Deployinator is the streaming rack middleware that powers the live updating code window:

Database DDLs aren’t code

An awesome feature of Capistrano is the ability to run schema migrations as part of your deployment. At a certain scale, however, database changes become more time consuming and dangerous. All of our schema changes go through a stringent process with several checks in place. However, not all schema is defined in the database. Whenever we have schema that’s defined in code, or inside the data itself, it’s just a normal code push.

Conclusion

Our deployment process is a very important part of how we work at Etsy. We treat it just like our web code, databases or other “serious” things. Deployinator has helped us to get more features out faster with less defects and drama. As we triple our engineering team in 2010 (we’re hiring!), tools like this are what make it possible for us to change the world.