Amazon's Elastic Block Store ("EBS") service, an underpinning component of Amazon's extremely popular Elastic Compute Cloud ("EC2"), experienced a substantial service interruption this afternoon. Amazon EC2 has become such a ubiquitous feature in the cloud computing landscape that it's difficult to throw a rock without hitting a large company with a public Web offering that uses it. So today's service interruption bit deeply: among the sites knocked partially or totally offline were reddit, Imgur, and developer favorite Heroku.

EC2 is an "infrastructure as a service" offering, quickly providing the computing and network bandwidth necessary to host websites and Web applications of varying sizes (hence the "elastic" part: Amazon can provide as little or as much cloud power as you're willing to pay for). Destinations like reddit use it to host portions of their sites because Amazon can provide infrastructure as a manageable, measurable, forecastable expense that can grow or shrink as dictated by demand or budget. Hosting your website on EC2 is a quick and often inexpensive way to get yourself into "the cloud."

EBS works in concert with EC2 by providing chunks of storage space that EC2 instances can use. If your website needs a lot of storage, a quick way to make that happen is to add EBS space, which is a lot like adding more hard drives to your EC2 cloud instance (except that EBS chunks are of course not hard drives, and they carry extra features like snapshotting and cloning).

EC2 and EBS, as part of the Amazon Web Services suite, are designed with substantial amounts of redundancy and failover capability. In addition to having local redundancy, the services are divided up into partitions called "Availability Zones," and large customers can spread their EC2 instances out across multiple zones.
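
To make the idea of spreading instances across zones concrete, here is a minimal sketch in Python. It is not Amazon's placement logic and makes no calls to any AWS API; the zone names, instance names, and both helper functions are hypothetical and exist only for illustration. It assigns instances to zones in round-robin order and then shows which ones would be left standing if a single zone went dark.

```python
from collections import Counter
from itertools import cycle


def spread_across_zones(instance_ids, zones):
    """Assign each instance to a zone in round-robin order."""
    if not zones:
        raise ValueError("at least one zone is required")
    zone_cycle = cycle(zones)
    return {instance_id: next(zone_cycle) for instance_id in instance_ids}


def surviving_instances(placement, failed_zone):
    """Return the sorted names of instances outside the failed zone."""
    return sorted(i for i, z in placement.items() if z != failed_zone)


# Hypothetical zone and instance names, for illustration only.
zones = ["zone-a", "zone-b", "zone-c"]
instances = [f"web-{n}" for n in range(6)]

placement = spread_across_zones(instances, zones)
per_zone = Counter(placement.values())
survivors = surviving_instances(placement, "zone-a")

print(per_zone)
print(survivors)
```

With six instances spread over three zones, losing one zone takes out two instances and leaves four running. A site that keeps everything in a single zone has no such cushion.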

Today's service disruption was centered on one of the USA's East Coast EBS availability zones, and was acknowledged by Amazon just before 1:00pm CDT on its Web Services status page. reddit became unavailable for many users immediately afterward, followed by several other sites. The problems seem to have mostly subsided, and Amazon is currently advising customers who continue to experience slow performance to manually relocate their EC2 workloads outside of the affected availability zone.

Unfortunately, this is not the first time Amazon's distributed cloud has run into Internet-crippling issues. There was a much more serious EC2 outage in April 2011, which also stemmed from EBS-related issues. These kinds of outages are a jarring reminder of the true nature of "the cloud": it's still just servers in data centers. The basic concept of cloud computing is to abstract the annoying physicality of things away from the user, but it's not turtles all the way down. The service is only as available as the underlying technologies make it.

In this case, in spite of the lessons learned from last year's outage, an EBS failure has again affected a significant number of major websites, and it would appear that the automated procedures built into EC2 still do not protect against some kinds of failures. The takeaway for anyone considering cloud hosting for their website or application is that there is no magic pill, and just because something is "in the cloud" doesn't mean it can't come crashing down to earth when there are problems.