Early last year, password security researcher Kevin Young was hitting a brick wall. Over the previous few weeks, he made steady progress decoding cryptographically protected password data leaked from the then-recent hack of intelligence firm Stratfor. But with about 60 percent of the more than 860,000 password hashes cracked, his attempts to decipher the remaining 40 percent were failing.

The so-called dictionary attacks he mounted using lists of more than 20 million passwords culled from previous website hacks had worked well. Augmented with programming rules that substituted letters for numbers or combined two or more words in his lists, his attacks revealed Stratfor passwords such as "pinkyandthebrain," "pithecanthropus," and "moonlightshadow." Brute-force techniques trying every possible combination of letters, numbers, and special characters had also succeeded at cracking all passwords of eight or fewer characters. So the remaining 344,000 passwords, Young concluded, must be longer words or phrases few crackers had seen before.

"I was starting to run out of word lists," he recalled. "I was at a loss for words—literally."

He cracked the first 60 percent of the list using the freely available Hashcat and John the Ripper password-cracking programs, which ran the guesses through the same MD5 algorithm Stratfor and many other sites used to generate the one-way hashes. When the output of a guessed word matched one of the leaked Stratfor hashes, Young would have successfully cracked another password. (Security professionals call the technique an "offline" attack because guesses are never entered directly into a webpage.) Now, with his arsenal of dictionaries exhausted and the exponential increase in the time it would take to brute force passwords greater than eight characters, Young was at a dead end. In the passwords arms race, he was losing. Young knew he needed to compile new lists of words he never tried before. The question was where to find the words.

A free cracking dictionary anyone can compile

Young joined forces with fellow security researcher Josh Dustin, and the cracking duo quickly settled on trying longer strings of words found online. They started small. They took a single article from USA Today, isolated select phrases, and inputted them into their password crackers. Within a few weeks, they expanded their sources to include the entire contents of Wikipedia and the first 15,000 works of Project Gutenberg, which bills itself as the largest single collection of free electronic books. Almost immediately, hashes from Stratfor and other leaks that remained uncracked for months fell. One such password was "crotalus atrox." That's the scientific name for the western diamondback rattlesnake, and it ended up in their word list courtesy of this Wikipedia article. The success was something of an epiphany for Young and Dustin.

"Rather than try a brute force that makes sense to a computer but not to people, let's use human beings because people typically make these long passwords based on things that humans use," Dustin remembered thinking. "I basically utilized the person who wrote the article on Wikipedia to put words together for us."

Almost immediately, a flood of once-stubborn passwords revealed themselves. They included: "Am i ever gonna see your face again?" (36 characters), "in the beginning was the word" (29 characters), "from genesis to revelations" (26), "I cant remember anything" (24), "thereisnofatebutwhatwemake" (26), "givemelibertyorgivemedeath" (26), and "eastofthesunwestofthemoon" (25).