Netflix is getting ready to unleash its Simian Army.

The online movie rental company uses a troupe of cloud software programs, which it calls "monkeys," to poke and prod its online applications and keep the website and its services humming along.

There's Chaos Monkey, a program that randomly kills virtual machines to make sure that small outages will not disrupt the overall system. There's Security Monkey, which looks for configuration and security flaws, and Janitor Monkey, which finds system resources that aren't being used and shuts them down.
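To make the Chaos Monkey idea concrete, here is a minimal sketch in Python. It is not Netflix's code: it simulates a fleet in memory instead of calling any cloud API, and the names Group, unleash_chaos, and still_serving are hypothetical. It assumes that each service runs as a group of interchangeable instances, so killing one at random should leave the service up.

```python
import random
from dataclasses import dataclass, field


@dataclass
class Group:
    """A hypothetical group of interchangeable instances serving one service."""
    name: str
    instances: list[str] = field(default_factory=list)


def unleash_chaos(groups, rng, probability=1.0):
    """Terminate at most one randomly chosen instance per group.

    Returns a list of (group name, terminated instance id) pairs.
    """
    killed = []
    for group in groups:
        # Skip empty groups, and skip a group unless the random draw says strike.
        if not group.instances or rng.random() >= probability:
            continue
        victim = rng.choice(group.instances)
        group.instances.remove(victim)  # simulate the instance disappearing
        killed.append((group.name, victim))
    return killed


def still_serving(groups):
    """The service survives only if every group still has at least one instance."""
    return all(group.instances for group in groups)


if __name__ == "__main__":
    fleet = [Group("api", ["i-1", "i-2", "i-3"]), Group("search", ["i-4", "i-5"])]
    print(unleash_chaos(fleet, random.Random(42)))
    print(still_serving(fleet))
```

On this hypothetical two-group fleet, each group loses one instance; because both started with more than one, still_serving stays True. A group of one would fail the check, which is exactly the kind of weakness random termination is meant to expose.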

Over the next few months, Netflix will release the source code for these programs and more, giving cloud developers a look at how it runs its services on Amazon's cloud. The plan is "to release pretty much all of our platform, including the Monkey infrastructure, over the rest of this year," says Adrian Cockcroft, the director of cloud architecture at Netflix. "We will be doing bits and pieces of it through the summer and into the fall."

Every Sunday night, Netflix's servers take a beating as they stream movies to the company's 23 million customers. It's the busiest time of the week, but by 4 a.m. on Monday most of those movie-watchers have gone to bed. That makes for an up-and-down kind of business, and one that's particularly well-suited to cloud computing, where users pay for servers only when they need them.

Other companies might consider Netflix's software a proprietary secret, but over the past year, Netflix has gradually become a big publisher of open-source code. Open-sourcing its code helps Netflix stay in touch with other cloud developers and keeps the company's practices in line with what others are doing. That's important, because Netflix doesn't want to become a strange outlier in the cloud revolution; it wants to be a leader.

But the open-source program is also a pretty good recruiting tool, Cockcroft admits. "The big objective for us in going out and talking about this was, we like to hire the very best people in the industry," he says. "People have to know that you're doing interesting stuff."

From Sun to the Cloud

Adrian Cockcroft hasn't always been a cloud guru. A decade ago, he was a well-respected Sun Microsystems engineer, working hard to make Sun's expensive Unix systems as reliable as the mainframe. But today, Sun is gone, bought by Oracle, and Cockcroft spends his days developing for Amazon's cloud, where he doesn't have to spend much time mucking about with cables and motherboards.

Cockcroft is the guy who would have gotten the blame if Netflix's systems had crashed back in 2010, when several million people started watching movies on Apple's new iPhone. That didn't happen. In fact, Netflix on the iPhone was pretty well received, and nowadays, Cockcroft gets calls from companies wondering how best to move their software to the cloud.

In many ways, Cockcroft's story is a metaphor for the changes sweeping across the corporate technology industry. An author of several well-regarded performance-tuning books, Cockcroft can work pretty much anywhere he likes. But when his big server project was scrapped in 2004, he didn't want to cast his lot with another hardware company. Instead, he went to eBay and helped set up eBay Research Labs. By the time he left Sun after 16 years, the interesting work was no longer in the traditional IT world; it was in the data centers of companies that were running programs for consumers.

Cockcroft – who bears a faint resemblance to a younger, mellower Michael Gambon – says that Netflix puts a premium on engineering, but it's pretty much like any other medium-sized 1,000-person company.

"In many ways we're a relatively traditional enterprise," he says. "We've been around since '97. We had all the fairly traditional constructs in terms of software and legacy applications and things like that, but we've been able to move more quickly than most people."

In 2007, Netflix hosted its website in a cage at a local data center. By 2008, it was tinkering with Amazon Web Services, and a year later it used the cloud to help shrink the backlog of DVDs that were waiting to be encoded for streaming. In 2010, Netflix launched its iPhone app entirely in the cloud, with Amazon providing the web services and content-delivery networks such as Level 3 doing the actual video streaming.


Sure, there are still a few big Oracle databases at Netflix. That's how the company keeps track of DVD rentals, but any time you visit the Netflix website, you're dealing with Amazon's cloud-based servers, which any company can rent by the hour for its own computing needs.

One thing separates Netflix from many other companies, though: its shifting workload. Netflix is quietest early Monday morning, but there are lulls at other times too, such as during big sporting events like the Super Bowl.

Companies with flat data demands might as well run their own data centers, but when traffic goes up and down as much as Netflix's, the cloud makes a lot of sense, says Jason Hoffman, chief technology officer with cloud service provider Joyent.

In 2009, Netflix was in a bit of a Goldilocks situation. It wasn't so big and set in its ways that it couldn't move to the cloud, but it was big enough to build some amazing systems, quickly.

It chose the cloud, and it hasn't looked back.

Cockcroft believes that others could follow his company's example and use the cloud to handle unpredictable workloads. "A lot of the medium-sized enterprises still have enough agility and don't have too much holding them back, so they're the ones adopting cloud a little bit more aggressively," he says. "The really big ones are still trying to figure it out or are too entrenched in what they're doing. And the startups – you can't go to Sand Hill Road and not do cloud now. It's taken as normal unless you have a very good reason for not doing it."

At Netflix, the cloud lets developers take a new algorithm from idea to working website feature in less than a week.

The key? Instant resources for developers. "You click a button and two minutes later your machines are running," he says. "That's the way our developers work and they launch machines directly themselves."

Netflix has developed some pretty sophisticated graphical tools for doing this kind of thing – some of these will also be open-sourced later this year, Cockcroft says.

In conversation, Cockcroft makes his work sound almost unremarkable. But according to Joyent's Hoffman, few people have the analytical skills and the knowledge of firmware, virtual machines, operating systems, and hardware needed to pull off this type of work. "He's a unicorn," Hoffman says. "I can maybe put together a dozen people in a room who are good at this."

This story has been updated to correct the spelling of Cockcroft's name.