An astonishing four out of every 1,000 public keys protecting webmail, online banking, and other sensitive online services provide no cryptographic security, a team of mathematicians has found. The research is the latest to reveal limitations in the tech used by more than a million Internet sites to prevent eavesdropping.

The finding, reported in a paper (PDF) submitted to a cryptography conference in August, is based on the analysis of some 7.1 million 1024-bit RSA keys published online. By subjecting what's known as the "modulus" of each public key to an algorithm first postulated more than 2,000 years ago by the Greek mathematician Euclid, the researchers looked for underlying factors that were used more than once. Almost 27,000 of the keys they examined were cryptographically worthless because one of the factors used to generate them was used by at least one other key.

"The fact is, if these numbers had the entropy that they were supposed to have, the probability of even one of these events happening in 7 million public keys would be vanishingly small," James P. Hughes, an independent cryptographer who participated in the research, told Ars. "We thought that was rather startling."

Following the publication of the paper, and reporting for this article, a separate group of researchers announced a similar finding, but they went on to say that only one of the weak public keys they analyzed was signed by a certificate authority trusted by major browsers. The remainder of the keys were used to secure routers and other embedded devices. More about this second report has been added to the end of this article.

With its discovery in the mid-1970s by Ronald Rivest, Adi Shamir, and Leonard Adleman, RSA cryptography revolutionized secure messaging because it was among the first systems that made it possible for the key needed to decode ciphertext to be held only by the person receiving the private message. RSA is one of the public key cryptographic algorithms used to generate SSL certificates, which are used to encrypt visits to particular websites. For the system to work, however, the underlying RSA modulus must be the product of two very large prime numbers that are unique to each key.

The revelation that such a large proportion of public keys were generated with a prime factor shared by one or more other keys means that such keys are trivial to break by anyone who can identify them. What's more, the percentage of keys known to be generated with non-unique factors is likely to grow as more keys are analyzed. The 0.38 percentage rate of faulty keys found when the researchers looked at 7.1 million total keys compares with a 0.26 percent rate in an earlier analysis that considered only 4.7 million RSA moduli. As a result, the true number of keys that could be broken using the technique may be higher than the current research reveals.

A judgment call

The researchers, led by Dutch mathematician Arjen Lenstra of École Polytechnique Fédérale de Lausanne in Switzerland, said they are releasing their findings ahead of August's conference because they want to alert users of public key cryptography to the presence of so many weak moduli. While it took the team three years to complete the study, they believe it will take peers only a matter of weeks to follow their recipe. Their discovery raised concerns about how to responsibly disclose it without making it easy for others to forge tens of thousands of keys.

"The quagmire of vulnerabilities that we waded into makes it infeasible to properly inform everyone involved, though we made a best effort to inform the larger parties and contacted all e-mail addresses recommended (such as ssl-survey@eff.org) or specified in valid affected certificates," they wrote. "Our decision to make our findings public, despite our inability to directly notify everyone involved, was a judgment call."

They compared their findings to revelations from 2008 that hundreds of thousands, possibly millions, of cryptographic keys generated on systems running Debian Linux were so predictable that an attacker could guess them in a matter of hours.

The Electronic Frontier Foundation's SSL Observatory, which queries every IP address on the Internet for underlying public secure sockets layer certificates, supplied some of the data used in the research. Project leaders don't plan to publish that data until they've had more time to contact parties with weak keys.

"We're currently working around the clock to get notifications to all of the parties that are affected by this," said Peter Eckersley, EFF's technology projects director.

The researchers, however, haven't ruled out the possibility that the large body of weak keys are already known, possibly to nation states or other well organized groups.

"The lack of sophistication of our methods and findings make it hard for us to believe that what we have presented is new, in particular to agencies and parties that are known for their curiosity in such matters," they wrote.

Simplistic or not, the exact method for identifying the keys generated with non-unique factors isn't included in the research paper; neither is the list of affected certificates or key holders. Hughes, who is an independent cryptographer in Palo Alto, California, said the team found a computationally efficient way to flag the keys without having to individually compare each one against all the others.

None of the weak keys they uncovered are used by certificate authorities to sign SSL credentials used by website operators to encrypt traffic and prove that their servers are authentic. That's good news. If a so-called "signing key" was weak, all the certificates it signed would also be trivial to forge.

Crypto problems

The research is the latest to show the limitations of cryptographic systems that websites use to secure communications. In September, researchers unveiled an attack that silently decoded encrypted traffic as it passed between SSL-protected websites and a Web browser. Over the past few years, the much more standard way of defeating SSL has been to compromise one of the 600 or so entities authorized to mint certificates that are trusted by Firefox and other standard browsers. Given the success and ease of that method, the techniques laid out in the research paper would likely not be an attacker's first choice of exploitation.

It remains unclear exactly what is causing large clusters of keys to use duplicated factors. Hughes said that when generation is done correctly for a 1024-bit key, it should theoretically require the generation of 2200 certificates before all possible factors are exhausted. Curiously, the problem of duplicate factors also marred 2048-bit keys, even though they should theoretically provide much more entropy. The researchers searched for similarities among the vulnerable keys for clues about what was causing random number generators to fail during the key generator process, but they were unable to make any determination.

"Our only conclusion is that there is not just one cause for all of these problems," Hughes said. "This leads to our conclusion that unless you can totally trust your random number generator, RSA is not a good algorithm to choose."

He said that other formulas such as Diffie-Hellman and DSA aren't as vulnerable because the duplication of a factor makes a key holder vulnerable only to the person who holds the corresponding certificate. "If you have a collision, you only affect one other person. You can hurt them and they can hurt you, but you haven't made it public to everybody and their mother."

Update

Eric Wustrow, a second-year graduate student in the University of Michigan's Electrical Engineering and Computer Science department, told Ars that research he and other colleagues conducted used a different data set but arrived at similar aggregate findings. However, he said all but one weak key encountered were self-signed. This separate group of researchers decided to publish a blog post summarizing the results out of concern internet users may misinterpret some of the reports about the earlier paper.

"As we say in the title don't panic," Wustrow said. "Not everything over SSL is broken."

He added that the finding that most of weak keys they found were used to protect routers and similar gear suggests the underlying cause may stem from vendors.

"Embedded devices have a history of problems in generating entropy for keys," he said. "We're seeing the same embedded devices from the same manufacturer generating the same primes."

Meanwhile, Hughes, one of the co-writers of the original paper, says he remains convinced that the weak keys represent a threat to people using webmail and e-commerce.

"I hate to say it but this does have implications for web-based commerce because people can mount man-in-the-middle attacks," he said. "People know, for instance, there have been man-in-the-middle attacks mounted against websites by foreign countries. Embedded systems matter to e-commerce because they're the infrastructure that goes between you and the site you're trying to go to."