Don’t scale: 99.999% uptime is for Wal-Mart
David

Jeremy Wright repeats a common misconception about new companies doing business online: that you need 99.999% uptime or you’re toast. Not so. Basecamp doesn’t have that. I think our uptime is more like 98% or 99%. Guess what, we’re still here!

Wright correctly states that those last few percentage points are incredibly expensive. To go from 98% to 99% can cost thousands of dollars. To go from 99% to 99.9% can cost tens of thousands more. Now contrast that with the value. What kind of service are you providing? Does the world end if you’re down for 30 minutes?

If you’re Wal-Mart and your credit card processing pipeline stops for 30 minutes during prime time, yes, the world does end. Someone might very well be fired. The business loses millions of dollars. Wal-Mart gets in the news and loses millions more on the goodwill account.

Now what if Delicious, Feedster, or Technorati goes down for 30 minutes? How big is the inconvenience of not being able to get to your tagged bookmarks or do yet another ego-search with Feedster or Technorati for 30 minutes? Not that high. The world does not come to an end. Nobody gets fired.
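To put the availability figures above in concrete terms, here is a minimal sketch of the arithmetic in Python. The function names are made up for illustration, and the 30-day month and 365-day year are simplifying assumptions. It turns an uptime percentage into a yearly downtime budget, and a single outage into the uptime figure it leaves you with.

```python
HOURS_PER_YEAR = 365 * 24  # assumption: a non-leap year


def downtime_hours_per_year(availability_percent: float) -> float:
    """Hours per year a service can be down and still hit the given uptime."""
    return HOURS_PER_YEAR * (1 - availability_percent / 100)


def availability_after_outage(outage_minutes: float, period_days: int = 30) -> float:
    """Uptime percentage over a period that contains one outage of the given length."""
    total_minutes = period_days * 24 * 60
    return 100 * (1 - outage_minutes / total_minutes)


if __name__ == "__main__":
    # The availability levels mentioned in the post.
    for level in (98.0, 99.0, 99.9, 99.999):
        hours = downtime_hours_per_year(level)
        print(f"{level}% uptime allows about {hours:.2f} hours of downtime per year")

    # A single 30-minute outage in an otherwise clean 30-day month.
    print(f"One 30-minute outage in a month: {availability_after_outage(30):.2f}% uptime")
```

Under those assumptions, 98% works out to roughly 175 hours of downtime a year, about a week, while 99.999% leaves a little over five minutes. A single 30-minute outage in a month still lands you above 99.9% for that month.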

Alistair Cockburn taught me a great name for this in Agile Software Development: criticality. For your average “Web 2.0” application, the criticality is loss of comfort: that’s the worst that happens when something goes wrong. The credit card processing for Wal-Mart, by contrast, is probably at the level of essential money.

So the short summary is that it’s not a profitable decision to shoot for 99.999% availability for tagging bookmarks. But that’s not nearly as important as the real lesson:

Before you have users, it’s a waste of time ensuring that they can always get to the service

A project that spends a lot of time upfront on scalability is the one that can’t afford to fail. And a project that can’t afford to fail is an inherently uninteresting idea for a new growth business. You can’t carry around the label of Zero Risk (TM) and expect to be the next big thing. It will focus your energy on all the wrong things.

What you need is to embrace the goal of getting someone to care enough about your product that they’ll actually complain when it’s down. Once the first complaints start to trickle in, you know you’re riding something right, and then you start caring about adding another percentage point or two.

Om Malik thinks that the running-with-scissors approach of most start-ups is a sign of a bubble. Awahh? The bubble was when people thought they needed to spend $3 million buying Sun servers and Oracle databases to build a site for wedding invitations.

Business smarts is when you don’t bet the farm before the crapshoot has turned into a sure bet. Fail cheap. Because odds are you’re going to. And you need to have your shirt for the second round.

So. Don’t scale. Don’t worry about five 9’s or even two. Worry about getting something to a point where there’s reason to worry about it.

UPDATE: Dare Obasanjo from Microsoft talks about how even the big guys have these issues from the position of a company that launched MSN Spaces and grew it to 3x LiveJournal in 1 year:

The fact is that everyone has scalability issues, no one can deal with their service going from zero to a few million users without revisiting almost every aspect of their design and architecture.

Spot on.