JR iOS continuous deployment, continuous integration, facebook, software development, software engineering

This weekend on Hacker News I stumbled across a great video showing how Facebook pushes new code live.

It’s the most advanced push deployment system I have seen.

The video is excellent. If you are into software, looking for advanced build and deploy techniques, or just looking for ammunition to improve your own build and deploy process, watch this video.

Here are some of the highlights.

Culture

We operate at ludicrous speed and massive scale.

Tools are not going to solve your problem.

It’s about culture.

No big fat layer of QA, managers and adult supervision.

500 core engineers.

As a developer you will shepherd your changes out from the time you check them into trunk to the time you release them to your mom.



There is no army of people who are going to vet it and check it.

You are accountable.

Subversion and git are used for version control.

UI tests are Watir and Selenium.

On-call duties are serious. When you are on call, you are the guy.

Branching

Generally don’t branch. All work is done off trunk. You work. You check in. Bam. You’re done.

Release branches are cut once a week.

Your change can go out with the weekly deploy, or you can bump it up into the daily deploy.

Testing

Everyone tests at Facebook all the time.

Anyone can open a bug – there is an internal Facebook group anyone can visit to see the latest bugs and open any new ones they find.

Everything is automated. Here are a few of the tools Facebook uses internally to do pushes.

IRC bots

These bots are there to tell you the state of your push.

Don’t bug a deployment engineer asking where your code is.

Ask the bot.

When your push is going live, the bot will ping you and ask if you are there.

You are to respond, and let support know that you are here to help if needed.

You are on standby.

For a daily push, if you don’t do this, your rev doesn’t go out.
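The standby rule above is easy to picture in code. Here is a minimal sketch of how I imagine the check working, not Facebook’s actual bot: the `Rev` type, the `revs_cleared_for_push` function and the idea of collecting acknowledgements into a set are all my own assumptions for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Rev:
    """A change waiting to go out in the daily push (hypothetical type)."""
    rev_id: str
    author: str


def revs_cleared_for_push(revs, acknowledged_authors):
    """Split revs into those that go out and those that are held back.

    A rev only goes out if its author answered the bot's ping,
    i.e. the author's name is in `acknowledged_authors`.
    """
    going_out = [r for r in revs if r.author in acknowledged_authors]
    held_back = [r for r in revs if r.author not in acknowledged_authors]
    return going_out, held_back


# Usage: alice answered the ping, bob did not, so bob's rev stays behind.
pending = [Rev("r101", "alice"), Rev("r102", "bob")]
going_out, held_back = revs_cleared_for_push(pending, {"alice"})
print([r.rev_id for r in going_out])
print([r.rev_id for r in held_back])
```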

Test Console

They built their own test console to show the state of their tests.

They use Watir and Selenium, plus a big suite of unit tests.

The console not only shows which tests are broken, it also shows when each test broke and whose change broke it.
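To make the "when it broke and who broke it" idea concrete, here is a rough sketch under my own assumptions. The video doesn’t show how the console works internally; the `Run` record and `find_breaking_change` function below are hypothetical names, and the approach is simply to walk back through a test’s history to the start of the current failure streak.

```python
from typing import List, NamedTuple, Optional


class Run(NamedTuple):
    """One recorded result of a single test at a given revision (hypothetical)."""
    rev: int
    author: str
    passed: bool


def find_breaking_change(runs: List[Run]) -> Optional[Run]:
    """Return the run that started the current failure streak.

    `runs` must be ordered oldest to newest. Returns None when there is
    no history or the most recent run passed.
    """
    if not runs or runs[-1].passed:
        return None
    culprit = runs[-1]
    # Walk backwards until we hit the last passing run.
    for run in reversed(runs):
        if run.passed:
            break
        culprit = run
    return culprit


history = [
    Run(101, "alice", True),
    Run(102, "bob", False),
    Run(103, "carol", False),
]
culprit = find_breaking_change(history)
print(culprit.rev, culprit.author)  # 102 bob
```

Note this blames the first failing revision after the last pass, which is only a starting point for investigation if several changes landed between test runs.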

Shadow branch

Production + changes + tests

Shadowing prod.

This is the working copy of prod that changes are merged into.

Error tracking (18min)

PHP errors. Exceptions. Fatals. All the things going wrong.

Can see the call stack for all errors on the site.

Will show subversion blame for that line of code.

Gatekeeper (24min)

This is the tool/process that impressed me most out of the entire presentation.

Gatekeeper allows Facebook to incrementally push changes to the live website, and then turn them on or off in very complicated, selective ways (basically a big conditional).

For example, you could push some new changes live (that you aren’t sure about) and then only expose them to:

– Employees only

– By country

– Age

– IP

– East coast/West coast

– Anyone but TechCrunch 🙂

You can bump public up to 1%. In minutes you will get a million hits.

You can grab the data, turn it back down. Make changes. And then turn it on again.

Super cool feature that lets you ease it out.
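Since Gatekeeper is basically a big conditional, here is a minimal sketch of what such a gate might look like. This is my own illustration, not Facebook’s implementation: the `User` and `Gate` classes, their fields, and the hash-based bucketing are all assumptions. It covers three of the rules above (employees, country, and a public percentage).

```python
import hashlib
from dataclasses import dataclass, field


@dataclass
class User:
    user_id: int
    is_employee: bool = False
    country: str = ""


@dataclass
class Gate:
    """A hypothetical feature gate; every field here is an assumption."""
    name: str
    employees_only: bool = False
    allowed_countries: set = field(default_factory=set)  # empty set = any country
    public_percent: float = 0.0  # 0.0 .. 100.0

    def _bucket(self, user_id: int) -> float:
        # Hash gate name + user id so each user lands in a stable
        # bucket between 0.00 and 99.99 for this particular gate.
        digest = hashlib.sha256(f"{self.name}:{user_id}".encode()).hexdigest()
        return (int(digest[:8], 16) % 10000) / 100.0

    def is_enabled(self, user: User) -> bool:
        if user.is_employee:
            return True  # assumption: employees always see the change
        if self.employees_only:
            return False
        if self.allowed_countries and user.country not in self.allowed_countries:
            return False
        return self._bucket(user.user_id) < self.public_percent


gate = Gate(name="new_profile", public_percent=1.0)
print(gate.is_enabled(User(user_id=1, is_employee=True)))  # True

# Turn it back down without redeploying anything.
gate.public_percent = 0.0
print(gate.is_enabled(User(user_id=2, country="US")))  # False
```

Bucketing on a hash rather than a random number means the same user gets the same answer on every request, so dialing the percentage up and down doesn’t make the feature flicker for any one person.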

Push Karma (25min)

Basically a karma system whereby you are assigned a karma score (4 stars), and every time you screw up a push, you lose karma.

Great way for putting accountability on the engineers to make sure their changes make it live OK, and don’t cause the build engineers any pain.

HipHop for PHP (29min)

PHP compiler.

PHP is crappy and slow. So they make a compiled version of PHP.

Generates highly optimized C++ and compiles it into a giant 1 GB binary – which is Facebook in its entirety.

Takes a couple minutes.

The payoff is a 50% performance boost.

Less hardware required.

Open source.

BitTorrent (31min)



Facebook pushes its 1 GB binary of compiled PHP to its tens of thousands of servers using BitTorrent.

Very cool. Rack affinity – looks locally first before going out to neighbours.

Ridiculous data speeds.

Can roll Facebook.com in about 15min.

Whole site.

Incredible.

Minimal user impact.
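The rack affinity trick is worth a quick sketch. This is only my guess at the idea, not how Facebook’s BitTorrent setup actually works: the `Peer` type, the `cluster` field and the `order_peers` function are hypothetical. The point is simply that a server asks peers in its own rack first and only then looks further out.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Peer:
    host: str
    rack: str
    cluster: str  # assumption: racks are grouped into clusters


def order_peers(me: Peer, peers):
    """Return the other peers sorted nearest first: same rack, same cluster, rest."""
    def distance(p: Peer) -> int:
        if p.rack == me.rack:
            return 0
        if p.cluster == me.cluster:
            return 1
        return 2

    # sorted() is stable, so peers at the same distance keep their input order.
    return sorted((p for p in peers if p.host != me.host), key=distance)


me = Peer("web001", rack="r1", cluster="east")
swarm = [
    Peer("web900", rack="r9", cluster="west"),
    Peer("web050", rack="r5", cluster="east"),
    Peer("web002", rack="r1", cluster="east"),
    me,
]
print([p.host for p in order_peers(me, swarm)])  # ['web002', 'web050', 'web900']
```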

Summary

Tools alone won’t save you.

You also need the right people.

And the right culture.

And the right company.

Watch the video.