William Smith

Well, I'll start off by saying that the S3 API is obviously very good. It is ubiquitous, it has excellent tooling, it plugs right into an amazing number of off-the-shelf things, and it's very easy to work with. But when developing this product specifically, I had to work with it at a much lower level. And when you actually start making calls to S3 directly, instead of using some tool or library to do it, the design limitations of the API become very apparent. For instance, it's often not easy to compile all the information you want about an object or a bucket from just one API call. As an example, to fetch the ACLs of a bucket, you need to make a separate call, and to fetch the permissions for other users, you have to make yet another call. So it can be easy to build up a whole pile of calls to S3 just to compile one piece of useful information.

Additionally, it's got some features that, while very powerful, can also be very hidden, like the bucket versioning that you mentioned earlier, which has some very strange bits of behavior. The prior versions won't be obvious when you're listing objects in most ways, because they're not returned from the regular S3 listing; you have to make special calls to find them. But you can't delete a bucket that isn't empty, so if you want to remove a bucket that appears to be empty, you may find you can't because it still has prior versions, and disabling versioning isn't actually enough to get rid of them. In many cases you need to configure a lifecycle policy and then wait until they clear themselves out.

We also had one customer during the beta period who was very confused about the usage in one of their buckets, because we were reporting that they were using a huge amount of data, and from their perspective it looked like they were using very little. After working with the customer, looking at Ceph, and trying to debug it, we found that it was actually all hidden in multipart upload metadata. S3 gives you, I think, five gigabytes per single upload, and if the file you need to upload is bigger than that, you have to upload it in multiple chunks. Most clients handle this for you seamlessly and it works great, but the client library they were using, a MinIO client for Java, was not correctly aborting multipart uploads that failed, so the metadata was staying in there. And because you have to make several fairly obscure API calls to see what multipart upload metadata is sitting in a bucket, it wasn't obvious to them where this extra usage was coming from.

Which I think speaks to the bigger point: the S3 API is fantastic and very widely supported, but it seems to have grown organically as new features were added to S3. That's good in some ways, because you get a very well supported and maintained thing that a lot of people know how to use, but at the same time it means that if you're just coming into the game and looking at it, you might say, I have no idea what's going on here. Do I think it should be standardized and redone? Probably not. At this point it's so widely supported that it would be terribly disruptive to the entire object storage ecosystem if there were a new standard way to access it that everything had to support, and it probably wouldn't work anyway. But it is important, especially if you're working directly with S3, to really read the fine print. There are some gotchas in there.
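
To make the "many calls for one piece of information" point concrete, here is a rough sketch using boto3, the Python SDK. The interview doesn't name a language or SDK, so the SDK choice, the bucket name, and the exact set of calls are illustrative assumptions, not the speaker's actual code:

```python
import boto3
from botocore.exceptions import ClientError

s3 = boto3.client("s3")
bucket = "example-bucket"  # hypothetical bucket name

# Each piece of "bucket information" is its own round trip to the API.
acl = s3.get_bucket_acl(Bucket=bucket)                # owner + per-grantee permissions
versioning = s3.get_bucket_versioning(Bucket=bucket)  # 'Enabled', 'Suspended', or absent
location = s3.get_bucket_location(Bucket=bucket)

# Some calls fail outright if the feature was never configured.
try:
    lifecycle = s3.get_bucket_lifecycle_configuration(Bucket=bucket)
except ClientError:
    lifecycle = None  # no lifecycle rules set on this bucket

summary = {
    "owner": acl["Owner"].get("DisplayName", acl["Owner"]["ID"]),
    "grants": acl["Grants"],
    "versioning": versioning.get("Status", "Disabled"),
    "region": location.get("LocationConstraint") or "us-east-1",
    "lifecycle_rules": lifecycle["Rules"] if lifecycle else [],
}
print(summary)
```

Four or five round trips, plus error handling for features that may not be configured, just to build one summary of a single bucket.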
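The versioning gotcha looks roughly like this in practice: a regular object listing won't show prior versions, you have to ask for them explicitly, and the usual way to get rid of them is a lifecycle rule that expires noncurrent versions. A minimal sketch, again assuming boto3 and a hypothetical bucket; the one-day retention window is arbitrary:

```python
import boto3

s3 = boto3.client("s3")
bucket = "example-bucket"  # hypothetical bucket name

# Prior versions and delete markers only show up via the versions listing,
# not via a regular ListObjects call.
paginator = s3.get_paginator("list_object_versions")
for page in paginator.paginate(Bucket=bucket):
    for v in page.get("Versions", []):
        print("version:", v["Key"], v["VersionId"], "latest:", v["IsLatest"])
    for m in page.get("DeleteMarkers", []):
        print("delete marker:", m["Key"], m["VersionId"])

# Suspending versioning doesn't remove old versions; a lifecycle rule does,
# eventually, once the configured number of days has passed.
# Note: this call replaces the bucket's entire lifecycle configuration.
s3.put_bucket_lifecycle_configuration(
    Bucket=bucket,
    LifecycleConfiguration={
        "Rules": [
            {
                "ID": "expire-old-versions",
                "Status": "Enabled",
                "Filter": {"Prefix": ""},  # apply to the whole bucket
                "NoncurrentVersionExpiration": {"NoncurrentDays": 1},
                "Expiration": {"ExpiredObjectDeleteMarker": True},
            }
        ]
    },
)
```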
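And for the incomplete multipart uploads that hid the customer's usage: in-progress uploads live behind their own listing call, which is why they never showed up in an ordinary object listing. The sketch below (still boto3, with assumed names and an arbitrary seven-day window) shows how you could surface them and either abort them directly or leave cleanup to a lifecycle rule:

```python
import boto3

s3 = boto3.client("s3")
bucket = "example-bucket"  # hypothetical bucket name

# Incomplete multipart uploads don't appear in a normal object listing;
# they have their own API call, and their uploaded parts still consume space.
paginator = s3.get_paginator("list_multipart_uploads")
for page in paginator.paginate(Bucket=bucket):
    for upload in page.get("Uploads", []):
        parts = s3.list_parts(
            Bucket=bucket, Key=upload["Key"], UploadId=upload["UploadId"]
        )
        size = sum(p["Size"] for p in parts.get("Parts", []))
        print(upload["Key"], upload["Initiated"], f"{size} bytes of hidden parts")

        # Option 1: abort the stale upload right away, freeing its parts.
        s3.abort_multipart_upload(
            Bucket=bucket, Key=upload["Key"], UploadId=upload["UploadId"]
        )

# Option 2: a lifecycle rule cleans up anything a client forgets to abort.
# Note: this call replaces the bucket's entire lifecycle configuration.
s3.put_bucket_lifecycle_configuration(
    Bucket=bucket,
    LifecycleConfiguration={
        "Rules": [
            {
                "ID": "abort-stale-multipart-uploads",
                "Status": "Enabled",
                "Filter": {"Prefix": ""},
                "AbortIncompleteMultipartUpload": {"DaysAfterInitiation": 7},
            }
        ]
    },
)
```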