Why you shouldn’t be using S3 or Google App Engine

Recently a new ‘hype’ has been popping up: Amazon’s S3 and Google App Engine. For those who don’t know, these are hosted services. S3 stores your data and App Engine runs your web applications, both remotely on Amazon’s or Google’s infrastructure. This allows you to benefit from the huge, highly scalable architectures that these companies have built.

While this may seem nice, I’ve had my doubts about the usefulness of such services. For one, they foster vendor lock-in. Your application with an Amazon S3 storage back-end won’t be able to run anywhere else. Neither will your Google App Engine application. Sure, some of the software is released as Open Source, but the software is just icing on the cake. It’s the architecture that really counts, and it won’t be easy to reproduce Google’s or Amazon’s architecture. And when you build your application against those architectures, it’s bound to become tied to them, in the same way (and probably even more so) that SQL queries you write for one database will perform badly on a different database.

And I am now seeing the first proof of these problems, as well as an entirely different problem: debugging. You see, beyond the tools provided by the service, you are out of options when it comes to debugging. When hosting your own data storage or application platform, you can sink your teeth into it when problems arise. Even when most of the stuff you’re using isn’t Open Source, you’ll still be able to sic a whole range of diagnostic tools on the problem. No such luck with remote application hosting services. They’re black boxes beyond the most common debugging situations.

That’s what the people on a thread at Amazon’s Web Services Forum are experiencing right now, if you ask me. I’ll quote the gist of the conversation:

all data we store on S3 has gone through the same code path for months. starting a couple days ago a small percentage of the objects we are retrieving are not checksumming to the correct values. we hash and store objects by checksum and rehash the objects when we retrieve to ensure there is no data corruption. all the objects we’re having issues with were uploaded at approximately the same time period a few days ago. we’ve stored 10’s of millions of objects in S3 and never encountered such problems. please let me know ASAP if you have any idea what could be going on here. thanks.

I’m having similiar problems. […] I’ve been investigating our end to find the problem, and it was just suggested that I should check the forums to see if anyone else was having problems. […] This is super-high priority for us (both corporately and personally, since lack of sleep dealing with this is killing me

The first post was made on June 22, 2008 5:05 PM. An Amazon support engineer (I assume) is working on it, but as of June 23, 2008 11:12 AM there is still no answer.
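For readers who haven’t seen this kind of safeguard before, the integrity check the first poster describes (store each object under its own checksum, then rehash it on retrieval) can be sketched roughly as follows. This is only an illustration under my own assumptions: the thread doesn’t say which hash function or client code is used, so SHA-256, the `ChecksummedStore` class and the plain dict standing in for the remote service are all hypothetical.

```python
import hashlib


class CorruptObjectError(Exception):
    """Raised when retrieved bytes no longer match the checksum they were stored under."""


class ChecksummedStore:
    """Hypothetical wrapper that stores objects keyed by their own checksum."""

    def __init__(self, backend=None):
        # 'backend' is any dict-like object. A plain dict stands in for the
        # remote storage service here (assumption, not a real S3 client).
        self.backend = {} if backend is None else backend

    def put(self, data: bytes) -> str:
        # The key is the SHA-256 hex digest of the content itself.
        key = hashlib.sha256(data).hexdigest()
        self.backend[key] = data
        return key

    def get(self, key: str) -> bytes:
        data = self.backend[key]
        # Rehash on retrieval: a mismatch means the bytes changed
        # somewhere between upload and download.
        if hashlib.sha256(data).hexdigest() != key:
            raise CorruptObjectError(key)
        return data
```

Note what a check like this can and can’t do. It tells you that an object came back different from what you uploaded, which is apparently how the posters noticed the problem at all. It can’t tell you where on the remote side things went wrong, and that is exactly the part you have no access to.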

And this shows exactly what my objection to such remotely hosted application services is: they’re out of your control. That’s something I couldn’t live with, and I think most companies shouldn’t want to either. As one user comments: “This is super-high priority for us (both corporately and personally”. Staking your entire business on some black-box remote service seems like a silly idea to me. Most of the server software that I (and the company I work at) use is Free or otherwise Open Source software, which means we’ll always have the source to dive into when we run into problems. And even if the source isn’t there, at least you’ll get to look at the problem from both sides. When it comes to database problems, you’ll be able to view the logs, turn on debugging, and inspect the entire environment the service is running in, from software to hardware. That’s something you can’t do with a remote service.

Yes, there will always be problems for which a fix is hard to find or for which there simply isn’t a fix. If you’re not willing to run that risk, you can pay a company a lot of money for support, and let them handle the hard problems (or, for some, even the easy ones). Naturally, Amazon and Google both offer that support too, but that’s still different. You see, when you run into a really unfixable problem in your own architecture, you can always swap any part of the software or hardware and try again. Or at least, that’s what most developers try to achieve anyway: interchangeable software (databases, application servers, programming libraries) and hardware. But with a remote service such as S3 or App Engine, you’ve already committed your entire application to one huge, non-interchangeable component from which you can never escape. Hence the vendor lock-in.
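To make “interchangeable” a bit more concrete, here is a minimal sketch of what I mean on the software side: the application talks to a small storage interface, and the implementation behind it can be swapped. All the names (`BlobStore`, `MemoryBlobStore`, `DirectoryBlobStore`, `save_report`) are made up for the example, and keys are assumed to be simple file names without slashes.

```python
from abc import ABC, abstractmethod
from pathlib import Path


class BlobStore(ABC):
    """Hypothetical minimal storage interface the application codes against."""

    @abstractmethod
    def put(self, key: str, data: bytes) -> None: ...

    @abstractmethod
    def get(self, key: str) -> bytes: ...


class MemoryBlobStore(BlobStore):
    """Keeps everything in a dict; handy for tests."""

    def __init__(self):
        self._objects = {}

    def put(self, key, data):
        self._objects[key] = data

    def get(self, key):
        return self._objects[key]


class DirectoryBlobStore(BlobStore):
    """Stores each object as a file in a local directory."""

    def __init__(self, root):
        self.root = Path(root)
        self.root.mkdir(parents=True, exist_ok=True)

    def put(self, key, data):
        # Assumes 'key' is a flat file name (no path separators).
        (self.root / key).write_bytes(data)

    def get(self, key):
        return (self.root / key).read_bytes()


def save_report(store: BlobStore, name: str, text: str) -> None:
    # Application code only knows about the interface, not the implementation.
    store.put(name, text.encode("utf-8"))
```

A wrapper like this only makes the API interchangeable. It does nothing about the performance and scaling behaviour your application comes to depend on, and that is the part of the lock-in I’m really worried about.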

But who knows what the future brings? This may all turn out to be a non-issue, and companies and developers may all flock to such general service providers in the future without any difficulties whatsoever. I guess I’m a conservative person when it comes to these kinds of changes. I’d rather wait and see.