The cr.yp.to blog

2017.11.05: Reconstructing ROCA

[Authors of this blog post, in alphabetical order: Daniel J. Bernstein and Tanja Lange.]

Nemec, Sýs, Švenda, Klinec, and Matyáš announced on 16 October that millions of supposedly high-security RSA public keys from a popular Infineon software library ("v1.02.013") were actually vulnerable to low-cost attacks. Specifically, Infineon's 1024-bit keys take "less than 3 CPU-months" to factor "on a single core of a common recent CPU", and half as much time on average. Infineon's 2048-bit keys take less than "100 CPU-years" to factor. On an Intel Haswell running at only 3GHz (similar to Amazon's "c4" cores), the 2048-bit keys take "140.8 CPU-years" to factor.

The news said "To give people time to change keys, the paper describing the factorization method isn't being published until it's presented at the conference." The authors of the paper included in their announcement only limited information about their factorization method.

We decided to see whether we could reconstruct the attack from this limited information, rather than waiting for the paper to be released. We figured out the main ideas within a day. Within a week we sent the authors our own attack software, running even faster than theirs. We certainly weren't working full time on this during the week.

We emphasize that this work was not independent of what the paper authors did, and the fact that we were able to reconstruct the attack so quickly should not be viewed as criticism of the merits of the paper. We were starting from some information released by the authors. What we are saying is that, from a security perspective, this information was in fact the critical information in the paper. Here are some questions to think about:

Presumably, after seeing the 16 October announcement, some users managed to replace their RSA signature keys before the paper was released. Were these users safe against serious attackers on 17 October?

Did delaying the release of the paper for two weeks make users think they were safer than they actually were?

they were safer than they actually were? RSA is used not just for signatures but also for encryption. Once a ciphertext has been sent and recorded by an attacker, replacing the key doesn't protect the ciphertext. What was the impact of the delay on the amount of private user data encrypted under weak keys?

Overall, did delaying the release of the paper help security, or hurt security?

security, or security? The authors notified Infineon at the beginning of February 2017. It seems that Infineon negotiated nine months of delay before the paper was posted. Did this much larger delay help security, or hurt security?

security, or security? ISO maintains a standard for prime generation (ISO/IEC 18032:2005) that isn't strict enough to prohibit what Infineon was doing. ISO wasn't told the attack details, and wasn't asked to fix the standard, until this month. Did this delay help security, or hurt security?

An August 2016 paper by Švenda, Nemec, Sekan, Kvašňovský, Formánek, Komárek, and Matyáš had already pointed out that Infineon keys were visibly non-random; see Table 6 in the paper. Attackers could already have figured out the whole attack from these details. Or attackers could have looked at Infineon keys on their own and found the same information. Our best guess is that serious attackers found the Infineon vulnerability years ago and have been quietly exploiting it since then.

Costs of an attack. We are deeply concerned by the statement "Large-scale vote fraud is not conceivable due to the considerable cost and computing power necessary" regarding the Estonian ID cards used for e-voting. This statement appears to be based on the claim that breaking all 750000 ID cards would cost 60 billion Euros, which in turn is based on the claim that breaking one card would cost 80000 Euros. Actual attack costs are thousands of times lower, for several reasons:

10x improvement: "Large-scale vote fraud" does not require breaking all of the ID cards; 10% already buys massive influence.

"Large-scale vote fraud" does not require breaking of the ID cards; 10% already buys massive influence. More than 4x improvement: It seems that 80000 EUR was the cost of a preliminary algorithm reported by the authors. The authors subsequently reported that a faster algorithm uses 40000 EUR worst-case and, more to the point, 20000 EUR on average. We have already beaten this, and we expect further research attention to produce further improvements.

It seems that 80000 EUR was the cost of a preliminary algorithm reported by the authors. The authors subsequently reported that a faster algorithm uses 40000 EUR worst-case and, more to the point, 20000 EUR on average. We have already beaten this, and we expect further research attention to produce further improvements. More than 10x improvement: 20000 EUR is a cost using rented computer equipment. Serious attackers reduce costs by an order of magnitude by purchasing computer equipment. Concretely, 20000 EUR buys 20 computers, each with 8-core CPUs, completing an average factorization in roughly 4 months; and the cost of a computer is amortized across years of calculations.

20000 EUR is a cost using computer equipment. Serious attackers reduce costs by an order of magnitude by computer equipment. Concretely, 20000 EUR buys 20 computers, each with 8-core CPUs, completing an average factorization in roughly 4 months; and the cost of a computer is amortized across years of calculations. More than 10x improvement: 20000 EUR is a cost using general-purpose CPUs. There is extensive literature studying the cost reductions achievable with other types of chips such as GPUs, FPGAs, and ASICs. Our assessment is that this factorization algorithm is reasonably well suited to take advantage of such chips.

We are happy to see Estonia finally blocking the Infineon keys starting 3 November, but tremendous damage could already have been done.

Certification. Infineon says that its "Fast Prime" function was "certified by the BSI (Federal Office for Information Security) in Germany. No mathematical weaknesses were known, nor have been discovered during the certification processes." The Estonian authorities say that compliance with "security requirements" has been "certified by the competent German and French certification bodies". There are several BSI certifications, ANSSI certifications, and so on for security systems that include Infineon's library v1.02.013: see, e.g., the BSI certification report for Infineon Security Controller M7892 B11 based on tests by TÜV Informationstechnik GmbH.

It seems that the prime-generation algorithm used by Infineon was never subjected to public review. It also seems that this failure was encouraged by the certification process. As stated in the paper:

The certification process counter-intuitively "rewards" the secrecy of design by additional certification "points" when an implementation is difficult for potential attackers to obtain—thus favoring security by obscurity. Relevant certification bodies might want to reconsider such an approach in favor of open implementations and specifications. Secrecy may increase the dificulty of spotting a flaw (above the capability of some attackers) but may also increase the impacts of the flaw due to the later discovery thereof.

Did BSI never ask Infineon to justify its prime-generation algorithm? Or did Infineon supply a justification with flaws that BSI failed to detect? Apparently Hanno Böck asked BSI for comments on this incident and didn't receive an answer. What is the value of certification if such weaknesses are not caught?

The reconstruction process

The rest of this blog post dives into the details of (1) what information was released and (2) our process of reconstructing an attack from this information.

Divisors in residue classes. As part of their 16 October announcement, the paper authors posted a tool for checking whether a key is generated by Infineon's library. We inspected the source code of the tool to see that an Infineon key N has a limited set of remainders modulo various small primes. For example, N is guaranteed to be 1 or 10 modulo 11, and 1 or 10 or 26 modulo 37, whereas properly generated RSA keys are equally distributed across 10 different possibilities modulo 11 and 36 different possibilities modulo 37.

The numbers 1, 10, 26 modulo 37 are exactly the powers of 10 modulo 37. How could Infineon be generating primes p and q so that the product N = pq is a power of 10 modulo 37? The simplest explanation is that each of p and q is generated as a power of 10 modulo 37, forcing the product N = pq to be a power of 10 modulo 37. Similar comments apply when 37 is replaced by various other possibilities.

Inspecting examples of Infineon primes p and q, rather than public keys N, would have immediately shown that this explanation is in fact correct. The 2016 paper mentioned above already stated that the primes were limited in the same way as N. We hadn't seen that paper, and we weren't looking at any secret keys, but we were using Occam's Razor to make reasonable guesses.

This type of information about p and q is important because the literature already explains a polynomial-time algorithm to find a prime divisor of N in a specified arithmetic progression u, u+v, u+2v, u+3v, etc., assuming that gcd{v,N} = 1 and that v ≥ N1/4. For v ≥ N1/2 this is an easy exercise. For v ≥ N1/3 this goes back to a classic 1984 paper "Divisors in residue classes" by Lenstra. For v ≥ N1/4 this is a result of Coppersmith, Howgrave-Graham, and Nagaraj, presented in Howgrave-Graham's thesis in 1998 and in a paper "Divisors in residue classes, constructively". The N1/4 is a soft cutoff: one can go down to N1/4/poly(N), although the time goes up correspondingly.

This isn't a complete discussion of the history. The underlying algorithmic idea was developed through a series of papers by many people aiming at various applications; see the survey paper "Reducing lattice bases to find small-height values of univariate polynomials" for more information. Perhaps the idea should be called "extended linearization" ("XL"), recognizing its similarity to a technique introduced by Lazard in 1979 for solving equations in many variables over a field; the name XL for this technique goes back to a 2000 paper by Courtois, Klimov, Patarin, and Shamir. But this name isn't standard outside the multivariate context. People instead typically say that they're using "Coppersmith's method", even if they're actually using, e.g., Lenstra's algorithm.

Internally, these algorithms perform lattice-basis reduction. There are various parameters such as the dimension of the lattice and the largest power of N that is used in creating the lattice. Larger lattice dimensions use more time for reduction but handle smaller values of v.

Nemec, Sýs, Švenda, Klinec, and Matyáš had announced their paper title as "The Return of Coppersmith's Attack: Practical Factorization of Widely Used RSA Moduli". This sounded consistent with the idea that the primes are in fact known or guessed modulo some v.

Random powers modulo primes. It's normal to generate large random primes by generating large random integers and testing them for primality. It's also normal to double the success chance per integer, saving half of the randomness, by generating large odd random integers and testing them for primality. One can go further in this direction, for example saving 77% of the original randomness as follows:

Decide that p mod 2 is 1.

Choose randomly what p mod 3 will be, either 1 or 2.

Choose randomly what p mod 5 will be, either 1 or 2 or 3 or 4.

Choose randomly what p mod 7 will be, either 1 or 2 or 3 or 4 or 5 or 6.

Reconstruct p mod 210 by the "Chinese Remainder Theorem" ("CRT"). There are 2*4*6 = 48 different possibilities at this point. These are exactly the 48 integers between 0 and 210 coprime to 210, i.e., not divisible by 2 or 3 or 5 or 7.

Add a large random multiple of 210 to obtain p.

Check whether p is in fact prime.

One can replace 2*3*5*7 by an even larger product of primes. Beware that at some point this breaks down: for example, if the goal is to generate a 1024-bit prime, then one can't use the product 2*3*5*7*...*997 of all primes below 1000. (This product is nearly 21380, so p mod the product is almost certainly above 21300.)

There is an extensive literature on generating sequences of random numbers. Lehmer in 1949 suggested generating a sequence of 8-digit random numbers by multiplying an initial number repeatedly by 23 modulo 108+1. What would happen if Infineon generated a sequence of possibilities for p mod 11 as, e.g., powers of 2 mod 11? This isn't inherently a problem: 2 is a generator mod 11, meaning that 1, 2, 3, 4, 5, 6, 7, 8, 9, 10 are powers of 2 mod 11.

What would happen if Infineon instead generated a sequence of possibilities for p mod 11 as powers of 10 mod 11? Suddenly there would be only two possibilities: the only powers of 10 mod 11 are 1 and 10, matching what Infineon seemed to be doing. This could even be a deliberate weakness with plausible deniability: "Oh, sorry, nobody warned us that 2 mod 11 was okay and 10 mod 11 wasn't."

We wrote down details of our observations so far, and tweeted a hash of the details:

Had fun reverse engineering https://github.com/crocs-muni/roca/blob/master/roca/detect.py w/ @hashbreaker SHA256: 01463fbab8a8f9e345cd3f2201556a26d2f81b03cf2b8760643148b9a01255a6

Random powers modulo products of primes. The next day we sent email to the paper authors with the details and also with a simpler guess for what Infineon was doing. The simpler guess was that, instead of computing separate powers modulo 3, 5, 7, ... Infineon was computing a power of a single number g modulo the product L = 2*3*5*7*... This would save Infineon the CRT effort, and would have the same basic effect of guaranteeing that the resulting integers are coprime to L, i.e., not divisible by any of the primes dividing L. This would also force all output keys to be 1 or 10 modulo 11 if g happened to be 10 modulo 11.

This guess would also mean that Infineon was limiting the set of keys it could generate. Specifically, a standard number-theory exercise says that, if L is divisible by two different odd primes, then it's impossible to find an integer g such that all integers coprime to L are powers of g. For example, no matter what g you choose, there are at most 12 different powers of g modulo 210; recall that CRT was producing 48 different possibilities.

A related algorithm "GenPrime" was published by Joye and Paillier in 2006. The Joye–Paillier generator, like Lehmer's generator, starts from a new random number r coprime to L and then multiplies repeatedly by a constant g modulo L, obtaining r times a power of g. This can produce any number modulo L, and produces only a slight bias in the resulting primes. The guess was that Infineon was oversimplifying and generating merely a power of g; this produces far fewer numbers modulo L.

The authors wrote back promptly, confirming that this guess was correct but not revealing more details.

First attack. We were busy with other things but came back to this reverse-engineering on 20 October. We downloaded a pile of PGP keys and extracted the 2048-bit keys that survived all the tests announced by the authors, presumably Infineon-generated keys. We tried these keys modulo further small primes, finding, for example, that N mod 331 was always 1 or 330. We then tried these keys modulo products of small primes, for example finding that N mod 2*11*37*97*331 was always 1, 65537, 4878941, 8942297, 14367385, or 24016035. These are exactly the powers of 65537 modulo 2*11*37*97*331. For comparison, there were 2 possibilities for N mod 11, 3 for N mod 37, 6 for N mod 97, and 2 for N mod 331, so if these possibilities had been independent then there would have been 72 possibilities for N mod 2*11*37*97*331.

Further computations were consistent with the guess that p and q were being generated as powers of 65537 modulo L, where L was either the product of all primes through 691, or the product of all primes through 701.

A simple attack strategy at this point is to try all powers of 65537 modulo L, and apply one of the "divisors in residue classes" algorithms to each possibility. L is beyond the N1/4 cutoff mentioned above, even beyond the N1/3 cutoff: the two possibilities for L are roughly 2960 and 2970, whereas N1/4 is roughly 2512.

This comparison of sizes shows that knowing p mod L is much more information than necessary. One can thus replace L by something smaller. If p is a power of 65537 modulo L then it is also a power of 65537 modulo v for any divisor v of L, and the "divisors in residue classes" algorithms efficiently find p from this power if v is above N1/4.

The number of powers to try is the order of 65537 modulo v, and one can reduce v to save time in trying powers. (The analysis of orders is similar to the analysis of a standard factorization method, the "p-1 method".) On the other hand, as v drops down towards N1/4, the required lattice dimension increases steeply. We presumed that L includes 701, checked the orders and basis-reduction time for some possibilities for v, and then took v as the product of all primes r through 701 such that the order of 65537 modulo r divides 27*33*52*7*11*13*17*19*23. We wrote an attack script using the Sage computer-algebra system, and sent the script and a description by email to the authors. We tweeted a hash of this email:

More exploration with @hyperelliptic has now produced working attack code: 3f5ba89d705a1059683c4c406dcda87f8af73f37cf0202cc74b875fcc28b3cb6

The authors confirmed that our attack script worked on real Infineon keys. They also confirmed that L did in fact include 701. We were already assuming 701 in our script, and we would have already been sure about it if we had seen some examples of Infineon secret keys (for example, by putting some computer time into factoring some public keys). Anyway, covering both possibilities in the script wouldn't have been much extra effort.

Second attack. Finally, we put some effort into speeding up our script, stopping when we had something faster than what the paper authors had announced. We sent the details by email to the authors and tweeted a hash:

Yup. Our 2048bit attack using @sagemath is now 5-25% faster than ROCA blog. 3fd6a53a3b6362248ac10de4a8108df3c839a7193a96d0991c6675990599d917

The "Yup" here was in response to a tweet by Graham Steel regarding our first tweet:

I guess that was inevitable... will they have a faster version of the attack before the paper is even released?

The biggest speedup from our first version to our second was "chaining". The authors also helpfully pointed out to us that the 2016 paper had observed a narrower range of p and q for Infineon than what we had been assuming; this provided an additional speedup. As noted above, we see many possibilities for further speedups beyond what we implemented.

Version: This is version 2017.11.05 of the 20171105-infineon.html web page.