The Time I Got Drunk On S3 And What I Learned

One of the most popular services provided by Amazon Web Services is Simple Storage Service (S3). The service has 11 nines of durability. Two nines of availability. Event notifications for pushing messages to SNS and Lambda. Plus many more features to meet a wide range of use cases.

But this post is not about how to use S3. Rather it is about how not to use it.

Instead I want to share some lessons I learned while I was drunk on the idea of using S3 for everything. Not too long ago I envisioned using the service for everything from static websites to running my own NoSQL database. It is safe to say that some use cases are great for the service. Others are downright a waste of your time.

There are limitations and concepts in S3 that even AWS Certified Professional Solutions Architects like myself don't know about until we hit them. Often you won't run into these cases unless you are pushing S3 pretty hard. That said, these mistakes were quite painful to learn, often costing me days or weeks to redesign the system to account for them.

So save yourself some time and learn from the mistakes I have made.

Mistake 1: Non-Random Key Prefixes == Hot Spots

One way to think of the service is as a cluster underneath the hood. A cluster implies that there are one to N instances in it. Data is distributed across the nodes in the cluster via a partition key.

A good partition key, such as a random key prefix, will distribute the keys in a bucket across different nodes, yielding a more balanced cluster where no single node is ever overwhelmed.

A bad partition key will not distribute the keys across different nodes, leading to an unbalanced cluster. It is likely that one or more nodes will fail because they have to handle all the load. For a small number of keys and a small number of requests you will likely never see this. For example, creating random key prefixes for your static website bucket is pointless; you are not going to generate enough traffic there to notice a performance impact.

What about a large number of EC2 instances doing a list operation on one key prefix? Guess what happens to the node in S3 that holds all that data? It dies, and you end up on the phone with an AWS support engineer at 3 AM trying to have them redo the partition.

The lesson here is that a random key prefix balances the load across S3. In distributed systems where lots of S3 requests can happen this is critical. Too many requests to one node in the cluster and S3 must protect itself and return errors back to you.
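One common way to get a random-looking prefix is to prepend a short hash of the key itself, so the spread is deterministic and you can still reconstruct the full key from the original name. A minimal sketch in Python (the `prefixed_key` helper and the log-file key are my own illustration, not anything S3 prescribes):

```python
import hashlib


def prefixed_key(original_key: str, prefix_len: int = 4) -> str:
    """Prepend a short hex hash of the key so objects spread across partitions.

    The prefix is derived from the key itself, so callers can recompute
    the full S3 key from the original name alone.
    """
    digest = hashlib.md5(original_key.encode("utf-8")).hexdigest()
    return f"{digest[:prefix_len]}/{original_key}"


# Sequential keys like these would otherwise pile up under "logs/";
# the hash prefix scatters them across different partitions.
print(prefixed_key("logs/2017/01/host-42.log"))
print(prefixed_key("logs/2017/01/host-43.log"))
```

The trade-off is that hashed prefixes make prefix-based listing by date or category harder, which ties directly into the next mistake.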

Mistake 2: List, List, List Everything In A Bucket

aws s3 ls s3://mistakes-made/

Have you ever listed all the objects under an S3 path and thought about how that works? Probably not.

Let’s think about it for a second.

We already talked about why a bad partition key can get you in trouble. Listing operations are costly even with a good partition key. This isn't hard to wrap your head around if you think about the cluster concept again.

A balanced cluster will have the keys in a listing operation distributed across the nodes. A listing operation for 1,000 keys will need to get N keys from node one, N from node two, and so on. This is time consuming because data has to be collected from across the cluster.

This is why this line is in the AWS S3 listing docs:

Sets the maximum number of keys returned in the response body. If you want to retrieve fewer than the default 1,000 keys, you can add this to your request.

You cannot ask for more than 1,000 keys in a single list operation.

At times you write keys under a key prefix and later want to aggregate across those keys. When you want to do that aggregation, you list all the objects under the prefix. For a small number of keys you will be fine. If you are in the tens of thousands, you are in trouble: the operation will be time consuming, and S3 could throttle you to protect itself once again.
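Because of the 1,000-key cap, listing a large prefix means paging through it one request at a time. A rough sketch of that loop; `list_page` here is a stand-in for S3's ListObjectsV2 call (in boto3, `list_objects_v2` with a `ContinuationToken`), faked below so the cost is easy to see:

```python
def list_all_keys(list_page, prefix):
    """Collect every key under `prefix`, one page (max 1,000 keys) per call.

    `list_page(prefix, token)` stands in for ListObjectsV2 and returns
    (keys, next_token), where next_token is None on the last page.
    50,000 keys under the prefix means 50 sequential round trips to S3.
    """
    keys, token = [], None
    while True:
        page, token = list_page(prefix, token)
        keys.extend(page)
        if token is None:
            return keys


# Fake three pages of results to show how the calls add up.
pages = [(["a", "b"], 1), (["c"], 2), (["d"], None)]


def fake_list_page(prefix, token):
    return pages[token or 0]


print(list_all_keys(fake_list_page, "logs/"))  # ['a', 'b', 'c', 'd']
```

Every page is a separate request that fans out across the cluster, which is why aggregating over tens of thousands of keys this way gets slow and throttle-prone.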

Mistake 3: Getting Bucket Tags Is Not The Same API

aws s3api get-bucket-tagging --bucket your-data-bucket

This one is subtle but will bite you if you are not careful. Notice that this CLI call is not aws s3 but rather aws s3api. No big deal, right? Yes… for most things.

If you are doing things at scale and making many requests a second to get a tag on a bucket, you will be throttled. Imagine you relied on a bucket tag to know where to route data that landed in the bucket. In a concurrent system there are going to be many requests a second for that same tag.

A GET request for a key in the bucket is very fast and can handle hundreds of requests per second. A GET request for a tag on the bucket cannot handle that same load. Luckily the solution is simple: instead of storing that information as a tag on the bucket, store it as an object in the bucket.
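In practice you can go a step further and cache that object on the client, so hot request paths hit S3 once per interval instead of once per request. A sketch of that pattern, where `fetch` is a stand-in for reading a small JSON config object (say, a hypothetical `_routing.json` key) out of the bucket:

```python
import json
import time


class CachedConfig:
    """Cache routing info read from an object in the bucket.

    `fetch` stands in for reading the config object from S3 (e.g. a
    GetObject call); the TTL bounds how often S3 is actually hit.
    """

    def __init__(self, fetch, ttl_seconds=60.0):
        self._fetch = fetch
        self._ttl = ttl_seconds
        self._value = None
        self._loaded_at = 0.0

    def get(self):
        # Refresh only when the cache is empty or the TTL has expired.
        if self._value is None or time.time() - self._loaded_at > self._ttl:
            self._value = json.loads(self._fetch())
            self._loaded_at = time.time()
        return self._value


# Fake the S3 read and count how often it actually happens.
calls = []


def fake_fetch():
    calls.append(1)
    return '{"route": "analytics-queue"}'


config = CachedConfig(fake_fetch, ttl_seconds=60.0)
for _ in range(100):
    config.get()        # 100 lookups...
print(len(calls))       # ...but only one read of the object
```

The same idea does not work for bucket tags, because the tagging API is the bottleneck in the first place; an ordinary object GET plus a cache keeps you comfortably under S3's limits.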

Mistake 4: It Doesn’t Work For Every Use Case

When I was drunk on S3 I considered moving my entire database into a bucket. It wasn't even a NoSQL database, so this was going to require reworking my entire data model. But I thought, how slick would it be to have my entire database in S3? Wrong. Wrong. Wrong.

Not that it isn’t possible. I actually believe it is possible. But worth it? No.

One, moving a relational database to be non-relational is no small feat. Two, there are a lot of things you would have to re-implement that other NoSQL technologies already have. Three, what you might gain in database savings you will likely lose in your own health and sanity.

S3 is flexible enough to meet all kinds of use cases. You can use it to hold data backups, host websites, act as a command channel, or even serve as a data lake for future ETL work. But for all the great things it can do, there are other technologies that can do some of them better. Knowing when to leverage the power of S3 and when to pick a better tool is critical to leveraging any service in AWS.

Conclusion

These are lessons I have learned while using Simple Storage Service (S3) for a variety of use cases. I do not consider any of them to be problems with S3; rather, they are misunderstandings or mistakes in my own thinking. I am proud of learning these lessons. The only reason I learned them was by leveraging S3 to solve problems I was facing. This is the best way to learn any service in Amazon Web Services.

I do not claim that this is an exhaustive list. Please share any lessons you have learned, so we can all learn something from them. I hope that this post helps you in the future when you are considering leveraging S3 for your use case.

Learn AWS By Actually Using It

If you enjoyed this post and are hungry to start learning more about Amazon Web Services, I have created a new course on how to host, secure, and deliver static websites on AWS! It is a book and video course that cuts through the sea of information to accelerate your learning of AWS, giving you a framework that enables you to learn complex things by actually using them.

Head on over to the landing page to get a copy for yourself!

If you enjoyed this, don’t forget to offer claps to show your support!