What's in Amazon's buckets?

While catching up on some old Hak5 episodes I found the piece on Amazon's S3 storage. If you don't know what S3 is, I recommend watching the episode; it gives a good introduction and was all the background I had before starting this project. The thing that caught my eye, and Darren's, was when Jason mentioned that each bucket must have a unique name across the whole of S3. As soon as I heard that I was thinking: let's brute-force some bucket names.

So I signed up for the free tier and started investigating. I created a couple of buckets and looked at the options. By default a bucket is private and only accessible by its owner, but you can add permissions that make it publicly accessible. I made one bucket private and one public, then hit their URLs to see what would happen. This is what I got back:

Private bucket

<Error>
  <Code>AccessDenied</Code>
  <Message>Access Denied</Message>
  <RequestId>7F3987394757439B</RequestId>
  <HostId>kyMIhkpoWafjruFFairkfim383jtznAnwiyKSTxv7+/CIHqMBcqrXV2gr+EuALUp</HostId>
</Error>

Public bucket

<ListBucketResult xmlns="http://s3.amazonaws.com/doc/2006-03-01/">
  <Name>digipublic</Name>
  <Prefix></Prefix>
  <Marker></Marker>
  <MaxKeys>1000</MaxKeys>
  <IsTruncated>false</IsTruncated>
</ListBucketResult>
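The two responses differ in their root element: a public bucket returns a namespaced ListBucketResult document, while a private one returns an Error document whose Code is AccessDenied. A minimal Python sketch of that test, assuming the response body has already been fetched as a string (the function name classify_bucket_response is hypothetical and not taken from Bucket Finder):

```python
import xml.etree.ElementTree as ET


def classify_bucket_response(body: str) -> str:
    """Classify an S3 bucket response body by its root element.

    Returns "public" for a ListBucketResult, "private" for an AccessDenied
    error, otherwise the raw error <Code> (or "unknown").
    """
    root = ET.fromstring(body)
    # ListBucketResult carries the S3 namespace ("{...}ListBucketResult"),
    # so compare only the local part of the tag name.
    tag = root.tag.rsplit("}", 1)[-1]
    if tag == "ListBucketResult":
        return "public"
    if tag == "Error":
        code = root.findtext("Code")
        return "private" if code == "AccessDenied" else (code or "unknown")
    return "unknown"
```

Any other error code is passed through unchanged; for example, S3 answers NoSuchBucket for a name that nobody has claimed.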

There is an obvious difference between the two, so that will be easy to test for in a script. The next thing I looked at was the region. When you set up a bucket you can choose which of the five data centres stores its data, so that the data sits closer to your target audience. You get the following options:

US Standard

Ireland

Northern California

Singapore

Tokyo

So I set up a bucket in each region and accessed them all. The only difference when accessing them is the hostname; this is the mapping:

US Standard = http://s3.amazonaws.com

Ireland = http://s3-eu-west-1.amazonaws.com

Northern California = http://s3-us-west-1.amazonaws.com

Singapore = http://s3-ap-southeast-1.amazonaws.com

Tokyo = http://s3-ap-northeast-1.amazonaws.com
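The mapping above translates directly into a lookup table. This sketch assumes virtual-hosted-style URLs, with the bucket name prefixed to the region's hostname, which is the form S3 itself reports in the Endpoint element of its redirect errors; the names REGION_ENDPOINTS and bucket_url are hypothetical.

```python
# Region hostnames as listed in the post (the five regions available in 2011).
REGION_ENDPOINTS = {
    "US Standard": "s3.amazonaws.com",
    "Ireland": "s3-eu-west-1.amazonaws.com",
    "Northern California": "s3-us-west-1.amazonaws.com",
    "Singapore": "s3-ap-southeast-1.amazonaws.com",
    "Tokyo": "s3-ap-northeast-1.amazonaws.com",
}


def bucket_url(bucket: str, region: str = "US Standard") -> str:
    """Build a virtual-hosted-style URL: bucket name prefixed to the region host."""
    return f"http://{bucket}.{REGION_ENDPOINTS[region]}/"
```

For example, bucket_url("digitokyo", "Tokyo") gives http://digitokyo.s3-ap-northeast-1.amazonaws.com/.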

But as bucket names have to be unique across the whole of S3, what happens if you access a bucket in Tokyo using the hostname for Ireland?

<Error>
  <Code>PermanentRedirect</Code>
  <Message>The bucket you are attempting to access must be addressed using the specified endpoint. Please send all future requests to this endpoint.</Message>
  <RequestId>4834475949AFC737</RequestId>
  <Bucket>digitokyo</Bucket>
  <HostId>TC1DCxcxiejfiek33492034AqtEVBxr+1Oj0GJvmCktGVrlcdZz9YjX5wHMbITi2</HostId>
  <Endpoint>digitokyo.s3-ap-northeast-1.amazonaws.com</Endpoint>
</Error>

Rather than the bucket contents you get a PermanentRedirect error, but its Endpoint element kindly tells you the correct hostname to use.
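Following the redirect then just means pulling the host out of the Endpoint element and retrying there. A minimal sketch, assuming the error body is available as a string; redirect_endpoint is a hypothetical helper name.

```python
import xml.etree.ElementTree as ET
from typing import Optional


def redirect_endpoint(body: str) -> Optional[str]:
    """Return the <Endpoint> host from a PermanentRedirect error, else None."""
    root = ET.fromstring(body)
    if root.tag == "Error" and root.findtext("Code") == "PermanentRedirect":
        endpoint = root.findtext("Endpoint")
        return endpoint.strip() if endpoint else None
    return None
```

A caller would then retry the request against "http://" + endpoint + "/".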

With all this info I built a script that takes a word list and tries to access a bucket for each word. It parses the returned XML, follows the redirections and produces a list showing which buckets are public, private or unassigned.
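The loop described above can be sketched in Python as follows. This is not Bucket Finder itself, just an illustration under a few assumptions: every word is first tried against the US Standard hostname, a PermanentRedirect is followed via its Endpoint element, and AccessDenied and NoSuchBucket are mapped to "private" and "unassigned". The names scan_buckets, http_get and LABELS are hypothetical, and the HTTP call is injectable so the logic can be exercised without touching the network.

```python
import urllib.error
import urllib.request
import xml.etree.ElementTree as ET
from typing import Callable, Dict, Iterable, Tuple

# A fetcher takes a URL and returns (HTTP status, response body).
Fetch = Callable[[str], Tuple[int, str]]

# Hypothetical mapping from S3 error codes to the labels used in the post.
LABELS = {"AccessDenied": "private", "NoSuchBucket": "unassigned"}


def http_get(url: str) -> Tuple[int, str]:
    """GET a URL and return (status, body), including 3xx/4xx error bodies."""
    try:
        with urllib.request.urlopen(url, timeout=10) as resp:
            return resp.status, resp.read().decode("utf-8", "replace")
    except urllib.error.HTTPError as err:
        # S3 puts its XML error document in the body of non-2xx responses.
        return err.code, err.read().decode("utf-8", "replace")


def scan_buckets(words: Iterable[str], fetch: Fetch = http_get,
                 max_redirects: int = 3) -> Dict[str, str]:
    """Label each candidate bucket name as public, private, unassigned, etc."""
    results: Dict[str, str] = {}
    for word in words:
        url = f"http://{word}.s3.amazonaws.com/"  # start at US Standard
        label = "unknown"
        for _ in range(max_redirects + 1):
            _status, body = fetch(url)
            try:
                root = ET.fromstring(body)
            except ET.ParseError:
                break  # not an S3 XML document; leave as "unknown"
            if root.tag.rsplit("}", 1)[-1] == "ListBucketResult":
                label = "public"
                break
            code = root.findtext("Code")
            endpoint = root.findtext("Endpoint")
            if code == "PermanentRedirect" and endpoint:
                url = f"http://{endpoint.strip()}/"  # retry at the region's host
                continue
            label = LABELS.get(code, code or "unknown")
            break
        results[word] = label
    return results
```

Usage would be along the lines of scan_buckets(line.strip() for line in open("words.txt") if line.strip()), where words.txt stands in for whatever word list you have. Network failures (urllib.error.URLError) are not caught in this sketch.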

That was good, but what about files? I put some files in my public bucket and hit its URL:

<ListBucketResult xmlns="http://s3.amazonaws.com/doc/2006-03-01/">
  <Name>digipublic</Name>
  <Prefix></Prefix>
  <Marker></Marker>
  <MaxKeys>1000</MaxKeys>
  <IsTruncated>false</IsTruncated>
  <Contents>
    <Key>my_file</Key>
    <LastModified>2011-05-16T10:47:16.000Z</LastModified>
    <ETag>"51fff3c9087648822c0a21212907934a"</ETag>
    <Size>6429</Size>
    <StorageClass>STANDARD</StorageClass>
  </Contents>
</ListBucketResult>

That is a directory listing, which is good!
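Pulling the file names out of the listing needs the S3 namespace, because every element inside ListBucketResult inherits it. A sketch, with parse_listing as a hypothetical name; it also reports IsTruncated, since MaxKeys caps a single response at 1000 keys and a truncated listing would need a follow-up request using the marker parameter, which is not implemented here.

```python
import xml.etree.ElementTree as ET
from typing import List, Tuple

# Namespace declared on ListBucketResult; all child elements inherit it.
S3_NS = {"s3": "http://s3.amazonaws.com/doc/2006-03-01/"}


def parse_listing(body: str) -> Tuple[List[str], bool]:
    """Return (keys, is_truncated) from a ListBucketResult document."""
    root = ET.fromstring(body)
    keys = [k.text or "" for k in root.findall("s3:Contents/s3:Key", S3_NS)]
    truncated = root.findtext("s3:IsTruncated", default="false",
                              namespaces=S3_NS) == "true"
    return keys, truncated
```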

I put some more files in, some private and some public, and they all showed up in the list. Trying to access the private files, though, resulted in a "403 Forbidden" and a chunk of XML similar to that for a private bucket. However, I can use this: by doing a HEAD request on each file in the directory listing I get back either "200 OK" or "403 Forbidden", which means I can enumerate all the files and see whether each one is public or private.
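That HEAD check is a few lines with the standard library. In this sketch check_files and head_status are hypothetical names, keys are URL-quoted because S3 keys may contain spaces and other characters, and the HEAD function can be swapped out so the logic can be tried offline.

```python
import urllib.error
import urllib.parse
import urllib.request
from typing import Callable, Dict, Iterable


def head_status(url: str) -> int:
    """Send a HEAD request and return the HTTP status code (e.g. 200 or 403)."""
    req = urllib.request.Request(url, method="HEAD")
    try:
        with urllib.request.urlopen(req, timeout=10) as resp:
            return resp.status
    except urllib.error.HTTPError as err:
        return err.code


def check_files(bucket_url: str, keys: Iterable[str],
                head: Callable[[str], int] = head_status) -> Dict[str, str]:
    """Label each key "public" (200), "private" (403) or by its raw status."""
    results: Dict[str, str] = {}
    for key in keys:
        # Escape the key for use in a URL path ("/" is left as a separator).
        url = bucket_url.rstrip("/") + "/" + urllib.parse.quote(key)
        code = head(url)
        results[key] = {200: "public", 403: "private"}.get(code, str(code))
    return results
```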

Quick summary: given a word list, I can check which buckets exist and, for those that do, whether they are public or private. For all the public ones I can get a directory listing, and from that listing I can see which files are public and which are private. I think that is pretty good for a morning's work.

I called the script Bucket Finder and you can download it from its project page.

I've run the script a few times with some nice long word lists and got some interesting data back, but as this post is getting a bit long I'll stop here; you can read the analysis in Analysing Amazon's Buckets.