Hashing it out

As in many password breaches, almost none of the 1.3 million Gawker credentials exposed in December 2010 contained human-readable passcodes. Instead, they had been converted into what are known as "hash values" by passing them through a one-way cryptographic function that creates a fixed-length sequence of characters that is, for practical purposes, unique to each plaintext input. When passed through the MD5 algorithm, for instance, the string "password" (minus the quotes) translates into "5f4dcc3b5aa765d61d8327deb882cf99".

Even minor changes to the plaintext input—say, "password1" or "Password"—result in vastly different hash values ("7c6a180b36896a0a8c02787eeafb0e4c" and "dc647eb65e6711e155375218212b3964" respectively). When processed by the SHA1 algorithm, the inputs "password", "password1", and "Password" result in "5baa61e4c9b93f3f0682250b6cf8331b7ee68fd8", "e38ad214943daad1d64c102faec29de4afe9da3d", and "8be3c943b1609fffbfc51aad666d0a04adf83c9d" respectively.
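
The hash values quoted above can be reproduced with a few lines of Python. This is only a sketch using the standard hashlib module; the helper name hex_digest is made up for illustration, and it assumes the inputs are encoded as UTF-8 before hashing.

```python
import hashlib

def hex_digest(algorithm: str, plaintext: str) -> str:
    """Return the hex digest of plaintext under the named hashlib algorithm."""
    # Assumption: the plaintext is encoded as UTF-8 before it is hashed.
    return hashlib.new(algorithm, plaintext.encode("utf-8")).hexdigest()

# Print the MD5 and SHA1 values for the three inputs discussed in the text.
for candidate in ("password", "password1", "Password"):
    print(candidate, hex_digest("md5", candidate), hex_digest("sha1", candidate))
```

Changing a single character or the case of one letter produces a digest with no visible relationship to the previous one.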

In theory, once a string has been converted into a hash value, it's impossible to revert it to plaintext using cryptographic means. Password cracking, then, is the practice of running plaintext guesses through the same cryptographic function used to generate a compromised hash. When the two hash values match, the password has been identified.
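
In code, that guess-and-compare loop might look something like the sketch below. It assumes the leaked hashes are plain, unsalted MD5 digests; the function name crack and the tiny word list are hypothetical and stand in for the far larger lists and faster tools crackers actually use.

```python
import hashlib

def crack(target_hashes, guesses, algorithm="md5"):
    """Map each target hash to the first guess that produces it, if any."""
    remaining = set(target_hashes)
    recovered = {}
    for guess in guesses:
        # Hash the guess with the same function used on the leaked passwords.
        digest = hashlib.new(algorithm, guess.encode("utf-8")).hexdigest()
        if digest in remaining:
            recovered[digest] = guess
            remaining.discard(digest)
            if not remaining:
                break  # every target hash has been matched
    return recovered

# Illustrative data: two of the MD5 values quoted earlier and a four-word list.
leaked = {"5f4dcc3b5aa765d61d8327deb882cf99", "dc647eb65e6711e155375218212b3964"}
word_list = ["dragon", "princess", "password", "Password"]
print(crack(leaked, word_list))
```

Nothing is ever decrypted here. The cracker only learns a password when one of the guesses happens to hash to a leaked value.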

The RockYou dump was a watershed moment, but it turned out to be only the start of what's become a much larger cracking phenomenon. By putting 14 million of the most common passwords into the public domain, it allowed people attacking cryptographically protected password leaks to almost instantaneously crack the weakest passwords. That made it possible to devote more resources to cracking the stronger ones.

Within days of the Gawker breach, for instance, a large percentage of the password hashes had been converted to plaintext, a feat that gave crackers an even larger corpus of real-world passwords to inform future attacks. That collective body of passwords has only snowballed since then, and it grows ever larger with each passing breach. Just six days after the leak of 6.5 million LinkedIn password hashes in June, more than 90 percent of them were cracked. In the past year alone, Redman said, more than 100 million passwords have been published online, either in plaintext or in ciphertext that can be readily cracked.

"Now, it's like once a quarter you get another RockYou," Redman said.

We will RockYou

In the RockYou aftermath, everything changed. Gone were word lists compiled from Webster's and other dictionaries that were then modified in hopes of mimicking the words people actually used to access their e-mail and other online services. In their place went a single collection of letters, numbers, and symbols—including everything from pet names to cartoon characters—that would seed future password attacks.

"So it's no longer this theoretical word list of Klingon planets and stuff like that," Redman said of the RockYou list. "It's literally 'dragon' and 'princess' and stuff like that, and [the list] may crack 60 percent of a newly compromised website. Now you have 60 percent of the work done and you haven't done any thinking at all. You've just used your previous knowledge."

Almost as important as the precise words used to access millions of online accounts was what the RockYou breach revealed about the strategic thinking people often employed when they chose a passcode. For most people, the goal was to make the password both easy to remember and hard for others to guess. Not surprisingly, the RockYou list confirmed that nearly all capital letters come at the beginning of a password; almost all numbers and punctuation show up at the end. It also revealed a strong tendency to use first names followed by years, such as Julia1984 or Christopher1965.

Password assault figures

6.5: Average number of passwords for a Web user (despite maintaining an average of 25 separate accounts).
100 million-plus: Number of passwords published online in the past year.
47: Years since what is believed to be the first password-database leak, in 1965.
8.2 billion: Average number of password combinations per second that can be tried by a PC running a single AMD Radeon HD7970 GPU.
3,108 terabytes: Disk space needed to store a table of every possible 10-character password with lowercase letters, along with its corresponding MD5 hash.
167 gigabytes: Space needed to store a rainbow table expressing 99.9 percent of the combinations above.



"Sup3rThinkers" wasn't included in the list of RockYou passwords, making it part of the 40 percent of hashes that require Redman to apply cracking techniques that go beyond a simple word-list attack. Fortunately for him, the RockYou corpus included both "sup3r" and "thinkers" as separate passwords. That allowed him to recover the password in question by appending each word in his list to every other word in the list. The technique is simple enough to do, although it increases the number of required guesses dramatically—from about 26 million, assuming the dictionary Redman uses most often, to about 676 trillion.

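The word-pairing technique described above is easy to sketch. The function name combinator and the four-word list here are assumptions made for illustration; a real run would pair every entry of a dictionary with millions of words.

```python
from itertools import product

def combinator(words):
    """Yield every ordered pair of list entries joined into one candidate."""
    for left, right in product(words, repeat=2):
        yield left + right

# A toy list standing in for a real dictionary.
words = ["sup3r", "thinkers", "Sup3r", "Thinkers"]
candidates = list(combinator(words))

print(len(candidates))                 # the list length squared
print("Sup3rThinkers" in candidates)   # the target appears among the pairs
print(26_000_000 ** 2)                 # guesses needed for a 26-million-word list
```

Because the number of candidates is the square of the list length, a list of about 26 million words yields the roughly 676 trillion guesses cited above.
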
Other complex passwords require similar manipulations to be cracked. The RockYou list, and the 100 million-plus passwords that have collectively been exposed in its aftermath, brought to light a plethora of other techniques people employ to protect simple passcodes from traditional dictionary attacks. One is adding numbers or non-alphanumeric characters such as "!!!" to them, usually at the end, but sometimes at the beginning. Another, known as "mangling," transforms words such as "super" or "princess" into "sup34" and "prince$$." Still others append a mirror image of the chosen word, so "book" becomes "bookkoob" and "password" becomes "passworddrowssap."

Passwords such as "mustacheehcatsum" (that's "mustache" spelled forward and then backward) may give the appearance of strong security, but they're easily cracked by isolating their patterns, then writing rules that augment the words contained in the RockYou dump and similar lists. For Redman to crack "Sup3rThinkers", he employed rules that directed his software to try not just "super" but also "Super", "sup3r", "Sup3r", "super!!!" and similar modifications. It then tried each of those words in combination with "thinkers", "Thinkers", "think3rs", and "Think3rs".
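
A rule set of the kind described here can be approximated in a few lines. The sketch below is not the rule syntax of any particular cracking tool; the function name variants and the specific rules (swap "e" for "3", capitalize the first letter, append "!!!", append a mirror image) are assumptions chosen to match the examples in the text.

```python
def variants(word):
    """Return a set of candidate passwords derived from one dictionary word."""
    # Rule: swap "e" for "3" (one simple mangling substitution).
    stems = {word, word.replace("e", "3")}
    # Rule: also try each stem with its first letter capitalized.
    stems |= {s[:1].upper() + s[1:] for s in stems}
    out = set(stems)
    # Rule: append "!!!" to each stem.
    out |= {s + "!!!" for s in stems}
    # Rule: append the mirror image of each stem.
    out |= {s + s[::-1] for s in stems}
    return out

# Combine every variant of one word with every variant of the other.
first = variants("super")
second = variants("thinkers")
candidates = {a + b for a in first for b in second}

print("Sup3rThinkers" in candidates)
print("mustacheehcatsum" in variants("mustache"))
```

Each rule multiplies the number of guesses by only a small factor, which is why patterned passwords fall so much faster than their length suggests.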

Such cracking techniques have existed for a decade, but they work far better now that the crackers possess a more intimate understanding of the ways people choose passwords.

"It's vastly different than it was [before] because of these massive password lists," said Rob Graham, CEO of penetration testing firm Errata Security. "We never had a really large password list to work from. Now that we do, we're learning how to remove the entropy from them. The state of the art of cracking is much more subtle in that before we were guessing in the dark."

A little finesse

That subtlety takes all sorts of forms. One promising technique is to use programs such as the open-source Passpal to reduce cracking time by identifying patterns exhibited in a statistically significant percentage of intercepted passwords. For example, as noted above, many website users have a propensity to append years to proper names, words, or other strings of text that contain a single capital letter at the beginning. Using brute-force techniques to crack the password Julia1984 would require 62^9 possible combinations (roughly 13.5 quadrillion), a "keyspace" that's calculated by adding the number of possible letters (52) to the number of digits (10) and raising the sum to the power of nine (which in this example is the maximum number of password characters a cracker is targeting). Using an AMD Radeon HD7970, it would still take about 19 days to cycle through all the possibilities.
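
The brute-force arithmetic can be checked directly. This sketch assumes the 8.2 billion guesses per second quoted for a single Radeon HD7970 in the figures above, and it counts only passwords that are exactly nine characters long; the constant names are illustrative.

```python
ALPHABET_SIZE = 52 + 10            # upper- and lower-case letters plus digits
PASSWORD_LENGTH = 9                # characters in "Julia1984"
GUESSES_PER_SECOND = 8_200_000_000 # assumed rate for one GPU

# Every position can hold any of the 62 symbols.
keyspace = ALPHABET_SIZE ** PASSWORD_LENGTH
# Time to exhaust the keyspace, converted from seconds to days.
days = keyspace / GUESSES_PER_SECOND / 86_400

print(keyspace)
print(round(days, 1))
```

At that rate, exhausting the keyspace takes a little over 19 days, in line with the estimate in the text.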

Using features built into password-cracking apps such as Hashcat and Extreme GPU Bruteforcer, the same password can be recovered in about 90 seconds by performing what's known as a mask attack. It works by intelligently reducing the keyspace to only those guesses likely to match a given pattern. Rather than trying aaaaa0000, ZZZZZ9999, and every possible combination in between, it tries a lower- or upper-case letter only for the first character, and tries only lower-case characters for the next four characters. It then appends all possible four-digit numbers to the end. The result is a drastically reduced keyspace of about 237.6 billion, or 52 * 26 * 26 * 26 * 26 * 10 * 10 * 10 * 10.
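
The mask described above can be modeled as a list of per-position character sets. The sketch below is written in plain Python rather than in the mask syntax of Hashcat or any other tool, and the names MASK, mask_keyspace, and mask_candidates are made up for illustration.

```python
import math
import string
from itertools import islice, product

# One character set per position: any letter, four lower-case letters, four digits.
MASK = (
    [string.ascii_letters]
    + [string.ascii_lowercase] * 4
    + [string.digits] * 4
)

def mask_keyspace(mask):
    """Number of candidates the mask generates."""
    return math.prod(len(charset) for charset in mask)

def mask_candidates(mask):
    """Lazily yield every string that fits the mask."""
    for combo in product(*mask):
        yield "".join(combo)

print(mask_keyspace(MASK))
print(list(islice(mask_candidates(MASK), 3)))
```

The keyspace works out to 237,627,520,000 candidates, tens of thousands of times smaller than the full nine-character brute-force space.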

An even more powerful technique is a hybrid attack. It combines a word list, like the one used by Redman, with rules to greatly expand the number of passwords those lists can crack. Rather than brute-forcing the five letters in Julia1984, hackers simply compile a list of first names for every single Facebook user and add them to a medium-sized dictionary of, say, 100 million words. The attack requires more combinations than the mask attack above, specifically about 1 trillion (100 million * 10^4) possible strings, but that is still a manageable number that takes only about two minutes using the same AMD 7970 card. The payoff, however, is more than worth the additional effort, since it will quickly crack Christopher2000, thomas1964, and scores of others.
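
A hybrid attack of this kind pairs each dictionary word with a brute-forced suffix. The following sketch assumes a four-digit suffix and a three-name word list; the function name hybrid_candidates is hypothetical, and the guess rate is the 8.2 billion per second quoted earlier for the same GPU.

```python
def hybrid_candidates(words, digits=4):
    """Yield each word followed by every zero-padded number of the given width."""
    for word in words:
        for number in range(10 ** digits):
            yield f"{word}{number:0{digits}d}"

# A toy list standing in for a dictionary of about 100 million words.
names = ["Julia", "Christopher", "thomas"]
candidates = set(hybrid_candidates(names))

print(len(candidates))
print({"Julia1984", "Christopher2000", "thomas1964"} <= candidates)

# Scale of the attack described in the text.
total_guesses = 100_000_000 * 10 ** 4
seconds = total_guesses / 8_200_000_000
print(total_guesses, round(seconds))
```

Unlike the mask attack, the word portion can be any length, which is why the same run recovers both Julia1984 and Christopher2000.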

"The hybrid is my favorite attack," said Atom, the pseudonymous developer of Hashcat, whose team won this year's Crack Me if You Can contest at Defcon. "It's the most efficient. If I get a new hash list, let's say 500,000 hashes, I can crack 50 percent just with hybrid."

With half the passwords in a given breach recovered, cracking experts like Atom can use Passpal and other programs to isolate patterns that are unique to the website from which they came. They then write new rules to crack the remaining unknown passwords. More often than not, however, no amount of sophistication and high-end hardware is enough to quickly crack some hashes exposed in a server breach. To ensure they keep up with changing password choices, crackers will regularly brute-force crack some percentage of the unknown passwords, even when they contain as many as nine or more characters.
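
One simple way to isolate such patterns is to reduce each recovered password to its character-class mask and count how often each mask occurs. The sketch below illustrates that idea only; it is not a description of how Passpal works internally, and the function name to_mask, the class codes, and the sample passwords are assumptions.

```python
from collections import Counter

def to_mask(password):
    """Replace each character with a code for its class: U, L, D, or S."""
    codes = []
    for ch in password:
        if ch.isupper():
            codes.append("U")   # upper-case letter
        elif ch.islower():
            codes.append("L")   # lower-case letter
        elif ch.isdigit():
            codes.append("D")   # digit
        else:
            codes.append("S")   # symbol or anything else
    return "".join(codes)

# Illustrative stand-ins for passwords already recovered from a breach.
recovered = ["Julia1984", "Maria2001", "thomas1964", "super!!!"]
pattern_counts = Counter(to_mask(p) for p in recovered)

# The most frequent masks suggest which rules to write next.
print(pattern_counts.most_common())
```

Masks that show up often in the cracked half of a breach are good candidates for new rules aimed at the half that remains.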

"It's very expensive, but you do it to improve your model and keep up with passwords people are choosing," said Moxie Marlinspike, another cracking expert. "Then, given that knowledge, you can go back and build rules and word lists to effectively crack lists without having to brute force all of them. When you feed your successes back into your process, you just keep learning more and more and more and it does snowball."