When it comes to long phrases used to defeat recent advances in password cracking, bigger isn't necessarily better, particularly when the phrases adhere to grammatical rules.

A team of Ph.D. and grad students at Carnegie Mellon University and the Massachusetts Institute of Technology has developed an algorithm that targets passcodes of at least 16 characters and built it into the freely available John the Ripper cracking program. The result: it was much more efficient at cracking passphrases such as "abiggerbetterpassword" or "thecommunistfairy" because they follow commonly used grammatical rules (in these cases, a determiner followed by one or more adjectives and a noun). When tested against 1,434 passwords containing 16 or more characters, the grammar-aware cracker outperformed other state-of-the-art password crackers on passcodes that had grammatical structures, and 10 percent of the dataset was cracked exclusively by the team's algorithm.

The approach is significant because it comes as security experts are revising password policies to counter modern cracking techniques, whose growing sophistication makes the average password weaker than ever before. A key strategy for making passwords more resilient is to use phrases, which result in longer passcodes. Still, long passcodes must remain memorable to the end user, so people often pick real phrases or sentences. It turns out that grammatical structures dramatically narrow the possible combinations and sequences of words crackers must guess. One surprising outcome of the research is that the passphrase "Th3r3 can only b3 #1!" (with spaces removed) is an order of magnitude weaker than "Hammered asinine requirements," even though it contains more words. Better still is "My passw0rd is $uper str0ng!", which the researchers found to be roughly three orders of magnitude stronger than the "Th3r3" phrase because it requires that many more guesses.

"Underlying structures and not just the number of characters or words determine the strength of a passphrase," the researchers wrote in a paper titled "Effect of Grammar on Security of Long Passwords," which is scheduled to be presented at next month's Conference on Data and Application Security and Privacy. "Passphrase policies that do not consider this may unwittingly allow passphrases such as 'Th3r3 can only be #1!' and 'My passw0rd is $uper str0ng!' that differ in strength by three orders of magnitude."

Decreasing the search space

The scientists' novel cracking attack draws from phrase collections such as the Brown Corpus, which contains 500 samples of English-language text totaling roughly 1 million words. The researchers tagged the parts of speech in the phrases and observed the most common sequences, such as "determiner, adjective, noun" and "determiner, adjective, adjective, noun." By arranging the words in their guesses to fit the most common sequences, crackers can vastly reduce the size of their "search space," which in turn reduces the work required to find the correct phrase.
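To make the idea concrete, here is a minimal Python sketch of grammar-ordered guessing. It assumes a tiny hypothetical lexicon tagged by part of speech and made-up tag-rule counts standing in for what a tagged corpus would supply. It is not the researchers' John the Ripper code; it simply expands the most common tag-rules first and joins the words without spaces.

```python
from itertools import product

# Hypothetical toy lexicon: each part-of-speech tag maps to a few words.
# A real attack would draw tagged words from a large corpus such as the Brown Corpus.
LEXICON = {
    "DET": ["a", "the", "my"],
    "ADJ": ["bigger", "better", "communist", "red"],
    "NOUN": ["password", "fairy", "cat"],
}

# Hypothetical tag-rule frequencies, standing in for counts observed in a tagged corpus.
RULE_COUNTS = {
    ("DET", "ADJ", "NOUN"): 120,
    ("DET", "ADJ", "ADJ", "NOUN"): 40,
    ("ADJ", "NOUN"): 80,
}

def guesses_by_rule(lexicon, rule_counts):
    """Yield passphrase guesses, exhausting the most common tag-rules first."""
    for rule, _count in sorted(rule_counts.items(), key=lambda kv: kv[1], reverse=True):
        word_lists = [lexicon[tag] for tag in rule]
        for words in product(*word_lists):
            yield "".join(words)  # spaces removed, as in the article's examples

def guess_number(target, lexicon, rule_counts):
    """Return the 1-based position of target in the guess stream, or None if never guessed."""
    for i, guess in enumerate(guesses_by_rule(lexicon, rule_counts), start=1):
        if guess == target:
            return i
    return None
```

With these toy tables the generator produces 192 guesses in total; "thecommunistfairy" turns up at guess 20 and "abiggerbetterpassword" at guess 52.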

"When password values have underlying grammatical structures, it is important to understand the role of these structures in decreasing the guessing effort," the researchers wrote. "Guessing effort can be defined as the number of values an attacker has to enumerate to guess a password. Guessing effort is a function of (a) size of the password search space, which is the set of all possible unique password values and (b) distribution of password values, which depends on how users choose password values from the password search space."
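The researchers' definition of guessing effort can be illustrated with a short Python sketch. It assumes the attacker knows the full probability distribution of password values (the candidate strings and probabilities below are hypothetical) and enumerates them from most to least likely. Two distributions over the same four-value search space then yield different average effort, which is the role of factor (b) in the quote.

```python
from fractions import Fraction

def guessing_order(distribution):
    """Sort password values from most to least likely, as an optimal attacker would."""
    return sorted(distribution, key=distribution.get, reverse=True)

def rank_of(password, distribution):
    """Number of values enumerated up to and including `password`."""
    return guessing_order(distribution).index(password) + 1

def expected_guessing_effort(distribution):
    """Average number of guesses over passwords drawn from `distribution`."""
    order = guessing_order(distribution)
    return sum(rank * distribution[pw] for rank, pw in enumerate(order, start=1))

# Hypothetical distributions over the same four-value search space.
UNIFORM = {"bigcat": Fraction(1, 4), "redfairy": Fraction(1, 4),
           "thecat": Fraction(1, 4), "mypassword": Fraction(1, 4)}
SKEWED = {"mypassword": Fraction(7, 10), "bigcat": Fraction(1, 10),
          "redfairy": Fraction(1, 10), "thecat": Fraction(1, 10)}
```

The uniform distribution averages 2.5 guesses, while the skewed one averages 1.6, even though both search spaces hold exactly four values.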

The researchers—Carnegie Mellon software engineering Ph.D. student Ashwini Rao, MIT Ph.D. student Birendra Jha, and CMU graduate student Gananand Kini—wrote elsewhere:

If users are using certain rules more often than the others, an attacker can use this information to reduce her guessing effort. For example, if the password set contains only the tag-rule "Adjective Noun" then the attacker need not enumerate other tag-rules. Specifically, if the users are choosing weaker tag-rules more often than the stronger tag-rules, reduction in guessing effort can be higher.
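The excerpt's point about skewed tag-rule choices can be sketched the same way. The Python below uses hypothetical rule sizes and user shares, assumes every password is equally likely within its rule, and computes the average number of guesses when rules are exhausted one after another. Ordering rules by share per guess (densest first) is a standard scheduling argument, swapped in here for illustration rather than taken from the paper.

```python
from fractions import Fraction

# Hypothetical tag-rules: (search-space size, share of users who pick that rule).
RULES = {
    "Adjective Noun": (10**8, Fraction(6, 10)),
    "Determiner Adjective Noun": (10**9, Fraction(3, 10)),
    "Noun Verb Adjective Noun": (10**14, Fraction(1, 10)),
}

def expected_effort(rule_order, rules):
    """Expected guesses when rules are exhausted in `rule_order`.

    Assumes each password is equally likely within its rule, so on average
    the attacker finds it halfway through that rule's block of guesses.
    """
    effort, start = Fraction(0), 0
    for name in rule_order:
        size, share = rules[name]
        effort += share * (start + Fraction(size + 1, 2))
        start += size
    return effort

def best_order(rules):
    """Exhaust rules with the highest share per guess first."""
    return sorted(rules, key=lambda r: rules[r][1] / rules[r][0], reverse=True)
```

An attacker who knows that only "Adjective Noun" is in use needs about 50 million guesses on average with these numbers, a small fraction of what enumerating every rule would cost.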

Grammatical structures help reduce the search space in other ways, too. There are far fewer pronouns in English than verbs, fewer verbs than adjectives, and fewer adjectives than nouns. That means a "pronoun-verb-adjective-noun" passphrase such as "Shehave3cats" is inherently easier to crack than a "noun-verb-adjective-noun" passphrase such as "Andyhave3cats," because a slot filled by a pronoun offers far fewer choices than one filled by a noun. A password that incorporates more nouns would be even more secure.
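The arithmetic behind that claim is simple: the number of phrases fitting a tag-rule is the product of the vocabulary sizes for each slot. The vocabulary sizes in this Python sketch are invented and only respect the ordering described above.

```python
import math

# Hypothetical vocabulary sizes per part of speech, chosen only to respect
# the ordering pronouns < verbs < adjectives < nouns.
VOCAB = {"pronoun": 100, "verb": 10_000, "adjective": 20_000, "noun": 50_000}

def search_space(tag_rule, vocab=VOCAB):
    """Number of distinct phrases fitting a tag-rule: the product of the slot sizes."""
    return math.prod(vocab[tag] for tag in tag_rule)

def bits(tag_rule, vocab=VOCAB):
    """Size of the rule's search space expressed in bits."""
    return math.log2(search_space(tag_rule, vocab))
```

With these numbers, the pronoun-led rule allows 10^15 phrases, while the noun-led rule allows 5 × 10^17, 500 times as many.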

Interestingly, the experiments showed that John the Ripper and Hashcat, another freely available cracking program, don't natively support combining large numbers of words drawn from dictionaries and lists of previously leaked passwords. However, it's possible to write rules to get around these limitations, and as passphrases become more widely used, it wouldn't be surprising to see the developers of these programs add support for such techniques.
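In its naive form, combining words across dictionaries amounts to a Cartesian product. The Python sketch below is grammar-unaware and hypothetical; it does not reflect how John the Ripper or Hashcat rules are written, but it shows why the candidate count explodes as more words are chained.

```python
import math
from itertools import product

def combine(wordlists, separator=""):
    """Yield every phrase that takes one word from each list, in list order."""
    for words in product(*wordlists):
        yield separator.join(words)

def candidate_count(wordlists):
    """How many phrases combine() will yield: the product of the list lengths."""
    return math.prod(len(words) for words in wordlists)
```

Three lists of 10,000 words each already produce 10^12 candidates, which is why pruning by grammar matters.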

It's also important to remember that the cracking attacks used in the research work best against passwords hashed with fast, computationally undemanding cryptographic algorithms such as SHA-1 and MD5.
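Fast hashes matter because every candidate costs the attacker only one cheap hash computation. This Python sketch uses the standard library's hashlib to test guesses against an unsalted MD5 or SHA-1 digest. The target phrase comes from the article's examples; the function name is illustrative.

```python
import hashlib

def crack_fast_hash(target_hex, candidates, algorithm="md5"):
    """Return the first candidate whose unsalted digest matches target_hex, else None."""
    for guess in candidates:
        # One fast hash per guess is all the attacker pays.
        if hashlib.new(algorithm, guess.encode("utf-8")).hexdigest() == target_hex:
            return guess
    return None

target = hashlib.md5(b"thecommunistfairy").hexdigest()
print(crack_fast_hash(target, ["abiggerbetterpassword", "thecommunistfairy"]))
# prints thecommunistfairy
```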

"So, yes, there are and will be smarter methods to crack passphrases, but in absolute terms their efficiency is relevant in a (large) subset of cases only," Alexander Peslyak, the principal developer behind John the Ripper, wrote in an e-mail to Ars.

As Ars has repeatedly counseled, slower algorithms such as bcrypt, PBKDF2, or SHA512crypt are crucial for adequate password security. So while it may be wise to one day adapt password policies to account for grammatical rules, security professionals would do better to focus on their password storage regimen first.
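By contrast, a deliberately slow, salted scheme multiplies the cost of every guess. This minimal Python sketch uses PBKDF2-HMAC-SHA256 from the standard library (bcrypt and SHA512crypt require other libraries). The 16-byte salt and the iteration count are illustrative assumptions; check current guidance before picking a work factor.

```python
import hashlib
import hmac
import os

# Illustrative work factor only; consult current guidance before choosing one.
DEFAULT_ITERATIONS = 600_000

def hash_password(password, iterations=DEFAULT_ITERATIONS):
    """Return (salt, iterations, digest) for storage using PBKDF2-HMAC-SHA256."""
    salt = os.urandom(16)  # fresh random salt for every password
    digest = hashlib.pbkdf2_hmac("sha256", password.encode("utf-8"), salt, iterations)
    return salt, iterations, digest

def verify_password(password, salt, iterations, expected_digest):
    """Recompute the slow hash and compare the results in constant time."""
    candidate = hashlib.pbkdf2_hmac("sha256", password.encode("utf-8"), salt, iterations)
    return hmac.compare_digest(candidate, expected_digest)
```

At the default setting, each verification performs 600,000 HMAC-SHA256 computations, so a guess that costs one MD5 computation against a fast hash costs hundreds of thousands of times more here.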