Last March, Facebook caught some flak when hacks circulated showing how to access the private photos of any user. These were enabled by egregiously lazy design: viewing somebody’s private photos simply required determining their user ID (which shows up in search results) and then manually fetching a URL of the form:

www.facebook.com/photo.php?pid=1&view=all&subj=[uid]&id=[uid]

This hack was live for a few weeks in February, exposing some photos of Facebook CEO Mark Zuckerberg and (reportedly) Paris Hilton, before the media picked it up in March and Facebook upgraded the site.

Instead of using properly formatted PHP queries as capabilities to view photos, Facebook now verifies the requesting user against the ACL for each photo request. What could possibly go wrong? Well, as I discovered this week, the photos themselves are served from a separate content-delivery domain, leading to some problems which highlight the difficulty of building access control into an enormous, globally distributed website like Facebook.

Here’s an example “public link” of a photo of me in my office:

http://www.facebook.com/photo.php?pid=34947682&id=210132

Posting this link shouldn’t be a privacy problem, as you shouldn’t be able to see a photo by following this link unless you’re in my network of friends. Facebook promotes this view by telling you at the bottom of the page that you can safely send this link out to friends, and in fact such links are posted all over the web if you search for them. Access rights are enforced on the Facebook page, so knowing this link doesn’t reveal the photo. But, unfortunately, the actual photo file is embedded in the page with the address:

http://photos-c.ak.fbcdn.net/photos-ak-sf2p/v646/41/83/210132/n210132_34947682_4899.jpg

Presumably, ‘fbcdn.net’ stands for ‘Facebook Content Delivery Network,’ and the image is hosted here on a high-performance photo server which doesn’t have to carry the session-management overhead of the larger site. Keep in mind that, as reported in October, Facebook is hosting 10+ billion photos, by some measures more than any other site on the web. The photo server will in fact respond to a request from wget without any cookies at all. It has to, because it is in a different domain than the main Facebook site, and browsers are specifically designed to prevent transferring state between domains. If the URL of a photo were temporary and difficult to guess from the public address, this scheme might be okay.

Unfortunately the link is neither temporary nor difficult to guess. The links appear to work indefinitely, based on trying some months-old ones floating around the web. Worse, most of the apparent randomness in the URL is not needed to access the photo. The following link is just as valid as the one posted above:

http://photos-c.ak.fbcdn.net/photos-ak-sf2p/210132/n210132_34947682_4899.jpg
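To make the redundancy concrete, here is a small Python sketch that turns the long address into the short one. It assumes, based only on the single example above, that the first path segment, the user-ID directory and the filename are the parts that matter; the function name shorten_photo_url is my own and nothing here touches the network.

```python
from urllib.parse import urlsplit, urlunsplit


def shorten_photo_url(url: str) -> str:
    """Keep only the first path segment, the uid directory and the filename.

    Assumption: the layout matches the one example in the post, i.e. the
    uid directory is the segment immediately before the filename.
    """
    parts = urlsplit(url)
    segments = [s for s in parts.path.split("/") if s]
    if len(segments) < 3:
        raise ValueError("unexpected path layout: " + parts.path)
    short_path = "/" + "/".join([segments[0], segments[-2], segments[-1]])
    return urlunsplit(
        (parts.scheme, parts.netloc, short_path, parts.query, parts.fragment)
    )


long_url = (
    "http://photos-c.ak.fbcdn.net/photos-ak-sf2p/v646/41/83/210132/"
    "n210132_34947682_4899.jpg"
)
print(shorten_photo_url(long_url))
```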

All we need is the actual filename of the photo, and I’ve reverse-engineered the filename format as:

[photo-size][uid]_[pid]_[PIN].jpg

Photo-size is just a character in the set {t, s, n} representing the resolution of the image, uid is the user ID of the user who uploaded the photo, pid is a photo ID, and PIN is a four-digit random number. I’m calling it a PIN because it was chosen to be four decimal digits, presumably in a foolish analogy to bank card security. It’s easy to learn everything but the PIN given a public link to the photo. Brute-forcing the PIN is also fairly easy: it’s a space of 9000, which can be searched in about 45 minutes using one script. This is also easily parallelisable: since we can query any of the mirrored photo servers in the set {photos-a.ak.fbcdn.net, photos-b.ak.fbcdn.net … photos-z.ak.fbcdn.net}, we can get this down to under 2 minutes.
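As a rough illustration of the format and the arithmetic, the Python sketch below splits a filename into its four fields and works out the timings quoted above. The regular expression encodes my reverse-engineered guess at the format, not anything Facebook has documented, and the helper names are made up for this post. The timing functions only do division; they assume one worker per mirror (26 for photos-a to photos-z) and perfect scaling.

```python
import re
from typing import NamedTuple


class PhotoName(NamedTuple):
    size: str  # one of 't', 's', 'n'
    uid: int   # uploader's user ID
    pid: int   # photo ID
    pin: str   # four decimal digits, kept as text


# Assumed format: [photo-size][uid]_[pid]_[PIN].jpg
_PATTERN = re.compile(r"^([tsn])(\d+)_(\d+)_(\d{4})\.jpg$")


def parse_photo_filename(name: str) -> PhotoName:
    match = _PATTERN.match(name)
    if match is None:
        raise ValueError("not in the assumed filename format: " + name)
    size, uid, pid, pin = match.groups()
    return PhotoName(size, int(uid), int(pid), pin)


def seconds_per_guess(space: int, total_minutes: float) -> float:
    """Average time per request implied by a full search of the space."""
    return total_minutes * 60 / space


def parallel_minutes(total_minutes: float, workers: int) -> float:
    """Search time if the space is split evenly over independent workers."""
    return total_minutes / workers


print(parse_photo_filename("n210132_34947682_4899.jpg"))
print(round(seconds_per_guess(9000, 45), 2))  # seconds per request
print(round(parallel_minutes(45, 26), 2))     # minutes with 26 mirrors
```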

This is still a lot of work for one photo, but it gets better. Incrementing the photo ID by one reliably gives the next photo that was uploaded as part of the same album. Looking at the next few photos in sequence from the one posted above, the sequence of PINs is {4899,5210,5535,5857,6193,6524,6853}, giving deltas of {311,325,322,336,331,329}. These are almost certainly created by timestamping as the photos are received. So, given the public link to one photo, and doing one brute-force search, we can pretty easily get the rest of the album with 10-20 queries per photo. I’ve coded this up and it works splendidly; the photo servers don’t appear to do any rate-limiting or blocking.
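The delta arithmetic is easy to check. The Python sketch below recomputes the differences from the PINs listed above and derives a window of likely values for the next one, simply by assuming the next delta falls within the range already seen. This is an illustration of the predictability, not the script I actually ran; the function names and the margin parameter are hypothetical, and it ignores whatever happens when the value would exceed four digits.

```python
from typing import List


def deltas(pins: List[int]) -> List[int]:
    """Differences between consecutive PINs."""
    return [b - a for a, b in zip(pins, pins[1:])]


def next_pin_window(pins: List[int], margin: int = 0) -> range:
    """Candidate values for the next PIN.

    Assumption: the next delta lies between the smallest and largest
    delta observed so far, widened by `margin` on each side.
    """
    if len(pins) < 2:
        raise ValueError("need at least two PINs to compute a delta")
    seen = deltas(pins)
    low = pins[-1] + min(seen) - margin
    high = pins[-1] + max(seen) + margin
    return range(low, high + 1)


observed = [4899, 5210, 5535, 5857, 6193, 6524, 6853]
print(deltas(observed))
window = next_pin_window(observed)
print(window[0], window[-1], len(window))
```

With no margin the window holds 26 candidates, so a search starting near the middle of it would typically need far fewer guesses than a full brute force.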

How to fix this problem? Obviously Facebook could check the session cookies for every photo request, but we’ll assume this is impractical given the current setup. If we concede that using the knowledge of an opaque URL as a capability to view a photo is all we have to work with, then there is no reason not to increase the length of the PIN portion to a cryptographically strong 20 digits, since it never needs to be written or stored by a human. Of course, these must be generated randomly as photos are uploaded. It would also be prudent to have the PINs expire after an hour or so, as they aren’t meant to provide a permanent link, and may end up cached in all sorts of places. Finally, multiple requests with invalid PINs should lead to IP blocking to prevent crawling.
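For what it’s worth, the three suggestions fit in a few dozen lines. The Python sketch below is a toy in-memory version, assuming a single server process: PhotoGate, MAX_FAILURES and the other names are invented for illustration and say nothing about how Facebook’s photo servers are actually built. It draws a 20-digit token from the operating system’s secure random source, rejects tokens older than an hour, and stops answering an IP address after a handful of bad guesses.

```python
import hmac
import secrets
import time

TOKEN_DIGITS = 20              # length suggested in the text
TOKEN_LIFETIME_SECONDS = 3600  # "an hour or so"
MAX_FAILURES = 5               # hypothetical blocking threshold


def new_token() -> str:
    """Uniformly random, zero-padded 20-digit decimal string."""
    return f"{secrets.randbelow(10 ** TOKEN_DIGITS):0{TOKEN_DIGITS}d}"


class PhotoGate:
    """Toy access check: random token, expiry, per-IP failure count."""

    def __init__(self, clock=time.time):
        self._clock = clock
        self._tokens = {}    # photo_id -> (token, issued_at)
        self._failures = {}  # ip -> number of invalid requests

    def issue(self, photo_id: int) -> str:
        token = new_token()
        self._tokens[photo_id] = (token, self._clock())
        return token

    def check(self, photo_id: int, token: str, ip: str) -> bool:
        if self._failures.get(ip, 0) >= MAX_FAILURES:
            return False  # this address is blocked
        entry = self._tokens.get(photo_id)
        if entry is not None:
            expected, issued_at = entry
            fresh = self._clock() - issued_at <= TOKEN_LIFETIME_SECONDS
            # Constant-time comparison avoids leaking matching prefixes.
            same = hmac.compare_digest(expected.encode(), token.encode())
            if fresh and same:
                return True
        self._failures[ip] = self._failures.get(ip, 0) + 1
        return False


gate = PhotoGate()
token = gate.issue(34947682)
print(len(token), gate.check(34947682, token, "192.0.2.1"))
```

A real deployment would need the token table and failure counts shared across all the mirrored servers, which is presumably where the engineering difficulty lies.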

This is a smaller hole than the one from last year, as we need to find a public photo link first. As far as I can see, there’s no predictable pattern of photo IDs for given user IDs, so we can’t access photos for our arbitrary choice of user. Still, it is a privacy violation as Facebook promotes the view that public links won’t allow access to photos, when they actually do. Above all, it is an inexcusably sloppy design, especially given the bad press Facebook received for the original problems.