Insecurity in the Jungle (disk)

A few weeks ago, in the wake of stories about Dropbox's poor security, a user of my Tarsnap online backup service mentioned that he had heard Jungle Disk recommended as a secure alternative. This surprised me, since I remembered from the early days on the Amazon Web Services developers forums that JungleDave — as the author called himself — was always far more concerned with ease of use than with security. Had things improved? I decided to investigate, and I wasn't impressed with what I found.

Unlike most online backup / storage companies, Jungle Disk has released source code. They did this because in the early days of Jungle Disk, people wanted some assurance that they could get their data back if Jungle Disk went out of business; since the Jungle Disk client stores data directly to Amazon S3 and Rackspace Cloud Files, it is also possible to read files directly from those services. (Tarsnap users frequently request this feature too, but the design of Tarsnap, including amortizing S3 PUT costs across blocks uploaded from multiple users, makes it impossible to provide such a mechanism.)

Now, this code is not the code used in the actual Jungle Disk client — like most other online backup services all you get is a binary, and you have to trust that it isn't doing anything wrong (either due to intentional mis-features or accidental bugs) — but the fact that the published source code can interoperate with the Jungle Disk client code does at least provide us with some information about what Jungle Disk does cryptographically.

There is one thing I like about Jungle Disk's cryptography: They use AES-256 in CTR mode. As I've mentioned here before, I like CTR mode as a building block because it provides cryptographic indistinguishability against chosen-plaintext attacks; because it avoids passing data to an AES core which could leak information via side channel attacks; and because it avoids the need for complicated (and very frequently buggy) block padding.

Unfortunately Jungle Disk missed one absolutely essential requirement in using CTR mode: They didn't include a Message Authentication Code. CTR mode has a property which cryptographers call "malleability", meaning that an attacker can make a change in the ciphertext and cause a predictable change to be made in the plaintext. In a sense this is like writing on a sheet of paper in a darkened room — even without seeing what was already on the page, you can still add more ink.
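To make the malleability concrete, here is a minimal sketch in Python. It is not Jungle Disk's code: the keystream below is a stand-in built from SHA-256 in counter mode rather than real AES-256-CTR (so that the example needs only the standard library), and the names `keystream`, `ctr_crypt` and `forge` are hypothetical. The attack only relies on the ciphertext being plaintext XOR keystream, which is equally true of AES in CTR mode.

```python
import hashlib


def keystream(key: bytes, nonce: bytes, length: int) -> bytes:
    """Stand-in keystream generator (SHA-256 in counter mode), NOT real AES-CTR."""
    out = bytearray()
    counter = 0
    while len(out) < length:
        out += hashlib.sha256(key + nonce + counter.to_bytes(8, "big")).digest()
        counter += 1
    return bytes(out[:length])


def xor_bytes(a: bytes, b: bytes) -> bytes:
    """XOR two equal-length byte strings."""
    return bytes(x ^ y for x, y in zip(a, b))


def ctr_crypt(key: bytes, nonce: bytes, data: bytes) -> bytes:
    """Encrypt or decrypt: in a CTR-style cipher both are the same XOR."""
    return xor_bytes(data, keystream(key, nonce, len(data)))


def forge(ciphertext: bytes, known_plaintext: bytes, desired_plaintext: bytes) -> bytes:
    """Rewrite a ciphertext without the key, given knowledge of the plaintext."""
    if not (len(ciphertext) == len(known_plaintext) == len(desired_plaintext)):
        raise ValueError("all inputs must have the same length")
    delta = xor_bytes(known_plaintext, desired_plaintext)
    return xor_bytes(ciphertext, delta)


key = b"k" * 32      # known only to the user
nonce = b"n" * 8
original = b"Pay Alice $0100"
ciphertext = ctr_crypt(key, nonce, original)

# The storage provider knows (or guesses) the plaintext but never sees the key.
tampered = forge(ciphertext, original, b"Pay Mallo $9999")
decrypted = ctr_crypt(key, nonce, tampered)
print(decrypted)
```

Decrypting the tampered ciphertext yields the attacker's chosen text, and nothing in the decryption step signals that anything was changed.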

Data integrity matters. If the people running the underlying storage service (Amazon S3 or Rackspace Cloud Files) know the contents of a file stored via Jungle Disk, they could transform it into anything they want — planting files which are dangerous (e.g., viruses) or even illegal (e.g., child pornography). In the very unlikely case that they don't know the contents of any files you have stored, they could still blindly mangle files and Jungle Disk wouldn't notice that they had been corrupted. Maybe you trust Amazon and Rackspace to not do this; but the whole point of cryptography is that you shouldn't need to trust the storage service — and as was demonstrated a few years ago, sometimes data corruption occurs due to hardware failures.
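For comparison, a sketch of what the missing integrity check could look like: an HMAC-SHA256 tag computed over the ciphertext (encrypt-then-MAC) and verified before any decryption is attempted. This is one common construction, not something taken from Jungle Disk or Tarsnap; the function names `seal` and `open_sealed` and the tag-appended layout are assumptions made for illustration.

```python
import hashlib
import hmac

TAG_LEN = 32  # HMAC-SHA256 output size in bytes


def seal(mac_key: bytes, ciphertext: bytes) -> bytes:
    """Append an HMAC-SHA256 tag to an already-encrypted blob."""
    tag = hmac.new(mac_key, ciphertext, hashlib.sha256).digest()
    return ciphertext + tag


def open_sealed(mac_key: bytes, blob: bytes) -> bytes:
    """Verify the tag and return the ciphertext, or raise if it was modified."""
    if len(blob) < TAG_LEN:
        raise ValueError("blob too short to contain a tag")
    ciphertext, tag = blob[:-TAG_LEN], blob[-TAG_LEN:]
    expected = hmac.new(mac_key, ciphertext, hashlib.sha256).digest()
    # Constant-time comparison avoids leaking how many tag bytes matched.
    if not hmac.compare_digest(tag, expected):
        raise ValueError("authentication failed: data was modified")
    return ciphertext


mac_key = b"m" * 32                      # independent of the encryption key
blob = seal(mac_key, b"\x8f\x02\x77\x10 opaque ciphertext bytes")

# Flip a single bit, as a malicious or faulty storage service might.
corrupted = bytes([blob[0] ^ 0x01]) + blob[1:]

try:
    open_sealed(mac_key, corrupted)
    detected = False
except ValueError:
    detected = True
print("corruption detected:", detected)
```

With a check like this, both deliberate tampering and accidental corruption are caught before the data is decrypted, and a wrong key is rejected for the same reason.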

So much for data integrity. What about the other side of the security coin — privacy? There too Jungle Disk has issues. Because of the lack of message authentication codes on files, Jungle Disk needs some other way to recognize if you have entered the correct password; for all that they don't defend against server-side corruption, they would have a mob of pitchfork-wielding customers at their offices if entering the wrong password resulted in files being mangled. To solve this problem, each file has a "salted key hash" sent along with it. This value consists of two parts: First, a 4-byte salt; and second, the 16-byte value MD5(salt || password).

This is bad. Really really bad. MD5 was designed to be a function which can be computed quickly and cheaply. If you give someone the values salt and MD5(salt || password), all they need to do to check if guessed-password is your password is to compute MD5(salt || guessed-password) — a single MD5 operation which is cheap and fast enough that attackers can easily test trillions of potential passwords.
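The check an attacker performs is short enough to write out. The sketch below follows the description above (a 4-byte salt followed by MD5(salt || password)); the exact byte layout, the UTF-8 password encoding, and the function names are assumptions for illustration rather than details taken from the Jungle Disk source.

```python
import hashlib


def salted_key_hash(password: str, salt: bytes) -> bytes:
    """Build a 20-byte value: 4-byte salt followed by MD5(salt || password)."""
    if len(salt) != 4:
        raise ValueError("salt must be 4 bytes")
    return salt + hashlib.md5(salt + password.encode("utf-8")).digest()


def guess_password(stored: bytes, candidates):
    """Try each candidate; every test costs exactly one MD5 computation."""
    salt, digest = stored[:4], stored[4:]
    for candidate in candidates:
        if hashlib.md5(salt + candidate.encode("utf-8")).digest() == digest:
            return candidate
    return None


# What anyone with access to the stored file can read.
stored = salted_key_hash("sfgroy", b"\x01\x02\x03\x04")

# A tiny stand-in for an attacker's candidate list.
wordlist = ["password", "letmein", "sfgroy", "hunter2"]
found = guess_password(stored, wordlist)
print(found)
```

The salt prevents precomputed tables from being reused across files, but it does nothing to slow down guessing against a single file.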

There is an area of cryptography known as "password-based key derivation functions" which covers exactly this problem — how to convert a password into a key in a way which is as expensive as possible for an attacker to perform a "brute-force" attack (i.e., trying passwords until they get a match) against. The current state of the art is the scrypt function, which I developed in 2009 because I wanted to ensure that Tarsnap's passphrase-encrypted key files were as secure as possible. The scrypt key derivation function is over one hundred billion times more expensive to crack than MD5. Now, I'll give Jungle Disk a pass on not using scrypt — after all, their "salted key hash" was around before I developed scrypt — but they should still have used the standard PBKDF2 key derivation function (or the non-standard but slightly stronger bcrypt), which would still have been over a million times stronger than MD5.
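As a rough illustration of the alternative, here is a sketch using PBKDF2 via Python's standard `hashlib.pbkdf2_hmac`. The iteration count, salt size, and function names are assumptions chosen for the example, not recommendations drawn from this post; the point is only that each password guess now costs many hash computations instead of one.

```python
import hashlib
import hmac
import os

ITERATIONS = 200_000  # illustrative assumption; tune to the hardware in use
SALT_LEN = 16


def derive_key(password: str, salt: bytes, iterations: int = ITERATIONS) -> bytes:
    """Derive a 32-byte key from a password using PBKDF2-HMAC-SHA256."""
    return hashlib.pbkdf2_hmac(
        "sha256", password.encode("utf-8"), salt, iterations, dklen=32
    )


def check_password(password: str, salt: bytes, expected_key: bytes) -> bool:
    """Re-derive the key and compare in constant time."""
    return hmac.compare_digest(derive_key(password, salt), expected_key)


salt = os.urandom(SALT_LEN)
key = derive_key("You will never guess this password", salt)

print(check_password("You will never guess this password", salt, key))
print(check_password("a wrong guess", salt, key))
```

An attacker testing guesses has to pay the full iteration count for every candidate, while the legitimate user pays it once per login.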

What does this mean for security? Well, it depends on how strong your password is and what resources an attacker is willing to throw at cracking it. My laptop, a mid-range Dell from a couple of years ago, can test slightly over 10 million passwords per second using its CPU and standard MD5 routines. Using GPUs and optimized MD5-cracking routines, Marc Bevand has reached 33.1 billion passwords per second with about $3000 of hardware. Attackers who can fabricate application-specific integrated circuits (a group which includes most Three Letter Agencies and many large companies) can do even better: on modern process technologies, roughly 5 billion passwords per second can be tested per dollar of silicon.

The following table shows the estimated time to crack various strengths of passwords on three categories of hardware — what most people have available and idle 99% of the time (a $1k laptop), what most interested amateur codebreakers could put together ($10k of GPUs), and what a government or large company could be expected to construct for political or industrial espionage purposes ($1M of ASICs):

| Password strength | $1k laptop | $10k of GPUs | $1M of ASICs |
|---|---|---|---|
| 6 lower-case letters (e.g., "sfgroy") | 15 seconds | < 1 second | < 1 second |
| 8 lower-case letters (e.g., "ksuvnwyf") | 3 hours | 1 second | < 1 second |
| 8 ASCII characters (e.g., "6,uh3y[a") | 10 years | 10 hours | 1 second |
| 10 ASCII characters (e.g., "H.*W8Jz/r3") | 95000 years | 95 years | 2 hours |
| 34-character English text (e.g., "You will never guess this password") | 2 years | 2 hours | < 1 second |
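Estimates of this kind come from simple arithmetic: the number of possible passwords, divided by the guessing rate, halved because on average an attacker finds the password after searching half the space. The sketch below works through the 8-ASCII-character case. The rounded rates (10 million per second for the laptop, roughly 100 billion per second for $10k of GPUs, 5 billion per second per dollar for ASICs), the 95-character printable ASCII alphabet, and the halving are my assumptions about how to reproduce such numbers, so expect order-of-magnitude agreement only.

```python
SECONDS_PER_HOUR = 3600
SECONDS_PER_YEAR = 365.25 * 24 * SECONDS_PER_HOUR

# Assumed MD5 guessing rates in passwords per second.
RATES = {
    "$1k laptop": 1e7,            # about 10 million per second on a CPU
    "$10k of GPUs": 1e11,         # 33.1 billion per $3000, scaled up and rounded
    "$1M of ASICs": 5e9 * 1e6,    # 5 billion per second per dollar of silicon
}


def average_crack_seconds(alphabet_size: int, length: int, rate: float) -> float:
    """Expected time to find a random password: half the keyspace over the rate."""
    keyspace = alphabet_size ** length
    return keyspace / 2 / rate


# 8 characters drawn from the 95 printable ASCII characters.
laptop_years = average_crack_seconds(95, 8, RATES["$1k laptop"]) / SECONDS_PER_YEAR
gpu_hours = average_crack_seconds(95, 8, RATES["$10k of GPUs"]) / SECONDS_PER_HOUR
asic_seconds = average_crack_seconds(95, 8, RATES["$1M of ASICs"])

print(f"laptop: {laptop_years:.1f} years")
print(f"GPUs:   {gpu_hours:.1f} hours")
print(f"ASICs:  {asic_seconds:.2f} seconds")
```

Each extra character multiplies the cost by the alphabet size, which is why length and character variety matter far more than any single clever substitution.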

You might like to look up the password category most similar to your passwords to see how long it would take for them to be cracked; but for reference, a recent study by Microsoft found that most passwords in use online fall somewhere between the 6-letter category and the 8-ASCII-character category in strength.

Now, maybe you don't have any data stored which Joe Cracker would be willing to spend 10 hours decrypting. Maybe you trust Amazon and Rackspace's internal procedures and security measures to ensure that nobody — either breaking in from outside, or working for those companies — will have access to your "encrypted" data. Depending on who you are and what data you have stored (your credit card numbers? bank statements? how about last year's income tax return, complete with your national tax ID number?) you might be justified in such trust. But I would say that this is profoundly missing the point: With good cryptography, you wouldn't need to trust them.

The bottom line: If you use Jungle Disk, Amazon or Rackspace and their employees...

... could mangle your data without Jungle Disk noticing,

... could probably replace your data with their choice of "evil bits",

... and unless you have a very unusually strong password, could read all of your files.

Finally, a personal note: I didn't want to write this blog post. To be quite frank, it's embarrassing to post something like this here only a few months after I fixed a critical security bug in my own backup service. However, I believe strongly in the principles of engineering ethics, foremost among them the principle that having seen a danger, we have a duty to the public to report it. In the case of civil engineering, this means structures which could collapse and cause injury or death; but I see no reason why software engineers who discover security vulnerabilities should not be held to the same standard.

Based on this principle, upon discovering these issues I wrote to Allan Metts, the Director of Software Development at Rackspace responsible for Jungle Disk (which Rackspace acquired in 2008). His response was that "security is not compromised" (a position with which I absolutely disagree) and that these issues will be addressed in the "next major release of our technology, which is currently under development". This is simply not good enough; serious security vulnerabilities such as these should be fixed immediately, not next year.

While I'm willing to work with companies and give them time to fix issues if I think they're making a good faith effort (as I did a few years ago when I discovered a weakness in Amazon's API request signing method), Jungle Disk's response clearly does not fall into this category. As such, the only way I can comply with my ethical obligations is to raise the issue publicly and warn people to avoid using such an insecure product.

And if Jungle Disk's customers complain and the public pressure results in these vulnerabilities being fixed sooner, all the better.
