Your Cloud Instance Just Died. We're Sorry.

Why Does AWS Keep Failing?

It seems like in the last few months, Amazon Web Services has suffered a number of massive outages, outages that have affected millions upon millions of people. When you knock Netflix, Pinterest and Instagram offline, you are going to make a lot of people unhappy. But not only that: last night, AWS went down, taking Heroku and DotCloud, two of our competitors, offline. This time, it was due to a massive thunderstorm in Northern Virginia. But one has to wonder, how did this happen? This outage makes Amazon look like an amateur in the game: it's been over 10 years since we've heard of a hosting service provider losing power to the raised floor in the datacenter. When you are that big, and you are the primary service provider for so many services, people notice when you go down, especially those who are doing it right and wondering how you let this happen to you.

As far as early reports show, Amazon was the only datacenter provider to go down in Ashburn, one of the biggest datacenter hubs in the US.

Datacenterknowledge.com has this to say:

The Washington area was hit by powerful storms late Friday that left two people dead and more than 1.5 million residents without power. Dominion Power’s outage map showed that sporadic outages continued to affect the Ashburn area. Although the storm was intense, there were no immediate reports of other data centers in the region losing power. Ashburn is one of the busiest data center hubs in the country, and home to key infrastructure for dozens of providers and hundreds of Internet services.

I am not one to kick someone while they're down, but I do want to take a second to point out a few things. First, it seems that Amazon isn't the most reliable infrastructure out there. Second, it's never smart to put all of your mission-critical stuff in one place, as it seems that Heroku and DotCloud have done. And lastly, when you are looking at providers, it's a good idea to check their reputation, something we make a point of doing here at Jelastic. In fact, from March of 2006 up until now, our US service provider, ServInt, has had 99.99% uptime! That's over 2,200 days straight! In this industry, that's unheard of: even the biggest name in the game, Amazon, doesn't come close.
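To put that uptime figure in perspective, here is a rough back-of-the-envelope sketch of how much downtime a 99.99% record leaves room for over roughly 2,200 days. The percentage and the day count come from the numbers above; the function name is ours and purely illustrative, and this is simple arithmetic, not ServInt's actual measurement method.

```python
def downtime_budget_minutes(uptime_percent: float, days: float) -> float:
    """Minutes of downtime allowed over `days` at the given uptime percentage."""
    total_minutes = days * 24 * 60
    return total_minutes * (1 - uptime_percent / 100)


# Figures taken from the post: 99.99% uptime over about 2,200 days.
budget = downtime_budget_minutes(99.99, 2200)
print(f"{budget:.0f} minutes (~{budget / 60:.1f} hours)")
```

Under those assumptions, the whole six-plus years leave room for only a little over five hours of downtime in total.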

Reliability. It really does matter.

Not even 15 miles from the AWS US-East datacenter that had the outages is ServInt's Reston datacenter that houses Jelastic-US. Even ServInt's COO, Christian Dawson, was without power at his home for 15 hours, but his datacenter was up and running, making sure Jelastic was available through the storm. If that doesn't sum it up, nothing does.

This is the second major outage that Amazon has had in very recent memory. In fact, not two weeks ago, our COO, Dmitry Sotnikov, wrote about the June 14th AWS outage and how it's not smart to put all your eggs into one basket, especially if that basket is prone to crashing to the ground.

Uptime.

Although this outage is specific to the US, it's worth noting that a number of services have been affected elsewhere. TechCrunch was one of the first to notice this:

Worth pointing out that these outages seemed to also affect services in other markets like Europe — meaning that, despite Amazon having more local hubs in Europe, Asia Pacific and South America, these services appear to be routed through only one of them, in North America. We’ll keep checking and updating.

So, if you were trying to post photos to Instagram yesterday... yeah.

Have a great weekend. Sorry if you can't stream movies right now.
