If you've ever been nagged about the weakness of your password while changing account credentials on Google, Facebook, or any number of other sites, you may have wondered: do these things actually make people choose stronger passcodes? A team of scientists has concluded that the meters do work—or at least they have the potential to do so, assuming they're set up correctly.

Researchers from the University of California at Berkeley, the University of British Columbia in Vancouver, and Microsoft are among the first to test the effect that the ubiquitous password meters have on real users choosing passwords. They found that meters grading the strength of passwords had a measurable impact in helping users pick stronger passcodes that weren't used on other accounts. The group also discovered that these new, stronger passwords weren't any harder for users to remember than weaker ones.

The scientists were quick to point out caveats to their findings. For one, the meters provided little benefit when users were choosing passwords while setting up a new account, as opposed to changing passwords for an already established account. And the meters provided no improvement for accounts people considered unimportant.

"Within that context they're much more likely to just enter a password that they already used elsewhere because they either don't care about those accounts or that's just normally what they do when they enroll in a new account," Serge Egelman, a research scientist at UC Berkeley and the lead author of the paper, told Ars. "Whereas we show that in a different context—when changing passwords for high-value accounts—then the meters actually do have an observable effect on behavior in that people do choose stronger passwords. And ironically that's the context where we're least likely to see real meters in real life."

The researchers' paper, titled "Does My Password Go up to Eleven? The Impact of Password Meters on Password Selection," is important because it provides useful guidance to both end users and the security professionals who work to protect them. While more and more sites now offer these meters, Egelman said a surprising number of online banking services and corporate intranets don't yet offer them. Remarkably, neither Microsoft Windows nor Apple's OS X for Macs uses meters for users who are choosing or changing account passwords.

The findings come from an experiment in which affiliates of the University of British Columbia were brought to a laboratory and asked to test the usability of a portal that students, faculty, and staff use to access e-mail, view grades, and check out library books. As soon as they successfully logged into their account, they were presented with a notice requiring them to change their password. While the plaintext was never recorded, the laboratory computer did store a cryptographic hash of the passwords. It also measured other characteristics of both the old and new passwords, including the length and whether they used upper- and lower-case letters, numbers, and special characters. Some of the subjects were presented with one of two types of password meters that rated the strength of the new password, while a control group saw no meter at all.

The password meters presented to the test subjects used "zero-order entropy," a technique many meters use to measure password strength. One set of "existing motivator" meters used the measures to rate passwords as "weak," "medium," or "strong." A second set of "peer-pressure motivator" meters used the same data to present the strength of the new password relative to all the users of the system.

It turns out that the subjects who were presented with either type of meter picked significantly "stronger" passwords than those in the control group. The average zero-order entropy of passwords chosen with guidance from the existing motivator meter increased to 60.8 bits, and the entropy of passwords chosen with the peer-pressure motivator grew to 64.9 bits. This means the total number of combinations required to brute-force crack the passwords would be 2^60.8 and 2^64.9, respectively. Subjects who saw no meter at all chose passwords that on average were 49.3 bits strong, about the same as the old passwords from all three groups.

"Overall, we observed that both password meters yielded statistically significant differences when compared to the control condition," the researchers reported in the paper. (The findings were recently presented at the ACM SIGCHI Conference on Human Factors in Computing Systems in Paris.)

In addition to increasing entropy metrics, the researchers found other indications of improved strength. Passwords generated with the help of meters increased from a median of 9.0 to 10.0 characters, included more special characters, and contained more lower-case letters (from a median of 6.0 to 7.0).

"Thus, the meters motivated participants to create longer passwords through the inclusion of symbols and additional lower-case letters," the researchers said.

The subjects were invited back to the laboratory two weeks later, and another encouraging finding emerged. Those who had chosen stronger passwords with the help of the meter had no more trouble remembering their new passcodes than those who had chosen weaker passwords without using a meter. What's more, those with stronger passwords were no more likely to have reverted to their old one than those who had chosen weaker passwords.

Building a better mousetrap

It's encouraging to know that password meters have a measurable effect on the passwords chosen by end users. But sadly there's no guarantee meters will actually help people choose passcodes that are more resistant to real-world cracking techniques. That's because the widely used zero-order entropy rating system is a poor metric for measuring the strength of passwords. The strength of the passcodes "Pa$$word1" and "$ecretPa$$word1" (minus the quotes) is 59.1 bits and 98.5 bits, respectively. That's much higher than many passwords offer. What the scoring system fails to account for is that both passwords are so widely used that they're inevitably included in wordlists used in cracking attacks. These are among the first passwords to fall in typical cracking attacks. By contrast, the password "lkx8q2pe0" is considerably stronger because it would require time-consuming brute-force techniques to crack it, and yet it offers just 46.5 bits. (Bits are calculated by x * log_2(y), where x is the number of characters in a passcode and y is the number of available letters, numbers, or special characters in the character set the passcode draws from.)
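The x * log_2(y) formula can be checked with a short Python sketch. The article does not say how the study's meters determined y, so the character classes here are an assumption: 26 lower-case letters, 26 upper-case letters, 10 digits, and 33 symbols (ASCII punctuation plus the space), with y taken as the combined size of every class the password uses. Under that assumption the sketch reproduces the three figures quoted above. The function names are illustrative only.

```python
import math
import string

# Assumed character classes (26 + 26 + 10 + 33 = 95 printable ASCII characters).
# The study's actual class definitions are not given in the article.
CHAR_CLASSES = (
    string.ascii_lowercase,
    string.ascii_uppercase,
    string.digits,
    string.punctuation + " ",
)


def pool_size(password: str) -> int:
    """Sum the sizes of every character class the password draws from."""
    return sum(
        len(cls) for cls in CHAR_CLASSES if any(ch in cls for ch in password)
    )


def zero_order_entropy(password: str) -> float:
    """Return x * log2(y): password length times bits per character."""
    size = pool_size(password)
    if size == 0:  # empty password, or only characters outside the classes
        return 0.0
    return len(password) * math.log2(size)


for pw in ("Pa$$word1", "$ecretPa$$word1", "lkx8q2pe0"):
    print(f"{pw}: {zero_order_entropy(pw):.1f} bits")
```

Note that the score depends only on length and character variety, which is why a dictionary staple such as "Pa$$word1" outscores a random-looking nine-character string.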

What this means is that password meters have the ability to help end users choose more crack-resistant passcodes only if the meters are set up correctly. As Ars documented last week, a password advice site from Intel can't be trusted to help users pick passcodes because the methodology it uses is hopelessly flawed. The password meters used in the study and offered on many sites suffer from the same type of weakness, but there's no reason they can't be drastically improved—for instance, by banning the one million most commonly used words.
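As a rough illustration of that kind of improvement, the sketch below checks a candidate against a list of common passwords before falling back to the entropy score. Everything specific in it is an assumption made for illustration: the five-entry COMMON_PASSWORDS set stands in for a real list of a million entries, the 40-bit and 60-bit thresholds are invented, and the character classes are a guess at how a typical meter counts the pool. It is not the algorithm used in the study.

```python
import math
import string

# Hypothetical stand-in for a real list of the most commonly used passwords.
COMMON_PASSWORDS = {
    "password", "password1", "pa$$word1", "$ecretpa$$word1", "123456",
}

# Assumed character classes; real meters may count the pool differently.
CHAR_CLASSES = (
    string.ascii_lowercase,
    string.ascii_uppercase,
    string.digits,
    string.punctuation + " ",
)


def zero_order_entropy(password: str) -> float:
    """Length times log2 of the combined size of the classes in use."""
    size = sum(
        len(cls) for cls in CHAR_CLASSES if any(ch in cls for ch in password)
    )
    return len(password) * math.log2(size) if size else 0.0


def rate(password: str) -> str:
    """Rate a password, rejecting known-common ones regardless of entropy."""
    if password.lower() in COMMON_PASSWORDS:
        return "weak"
    bits = zero_order_entropy(password)
    if bits < 40:  # assumed threshold
        return "weak"
    if bits < 60:  # assumed threshold
        return "medium"
    return "strong"


for pw in ("Pa$$word1", "$ecretPa$$word1", "lkx8q2pe0"):
    print(f"{pw}: {rate(pw)}")
```

With the blocklist in place, the two wordlist staples are rated "weak" despite their high entropy scores, while the random-looking "lkx8q2pe0" is rated "medium" on its 46.5 bits.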

Egelman said there's no evidence to suggest improved meters wouldn't generate the same measurable effect in guiding people's choice of passwords.

"They don't know what algorithm we're using to drive the meter," he said. "They just know that they do some behavior, they get some feedback, and they keep trying until [they get] feedback they're happy with. I suspect that if we changed what the feedback is based on we would still have the impact on them."