A New Provable Factoring Algorithm



Factoring, factoring, the whole day through, keeps on my mind—apologies to Hoagy Carmichael and Stuart Gorrell



Waterloo Mathematics source



Michael Rubinstein is an expert on number theory, who is on the faculty of the University of Waterloo. He is one of the organizers of a 61 -birthday symposium being held December 15–19 in Princeton for my friend and former colleague, Peter Sarnak. I guess it is a matter of taste for a number theorist whether to observe a birthday with a lot of factors (60) or a prime (59 or 61). Rubinstein also does extensive experimental mathematics and lists several code libraries below his publications on his website, which also has interesting articles on the math, history, and practice of musical tuning.

Today Ken and I wish to discuss a paper of his on one of my favorite problems: integer factoring.



The paper appears in the 2013 volume of the electronic journal INTEGERS, one of whose sponsors is the University of West Georgia. It is titled, “The distribution of solutions to with an application to factoring integers.” He studies the structure of solutions to (mod a), and uses this to prove the following result:

Theorem 1 For any , there is a deterministic factoring algorithm that runs in time.

The “ ” hides sub-linear terms including coming from the divisor bound, logarithmic terms from the overhead in nearly-linear time integer multiplication, and related sources.

The Factoring Divide

Factoring algorithms partition into two types: unprovable and provable. The unprovable algorithm usually use randomness and/or rely for their correctness on unproved hypotheses, yet are observed to be the fastest in practice for numbers of substantial size. The provable algorithms are usually deterministic, but their key feature is that their correctness is unconditional.

For those trying to factor numbers, to break codes for example, they use the fastest unprovable algorithms such as the general number field sieve (GNFS). The cool part of factoring is that one can always check the result of any algorithm quickly, so anytime an unprovable algorithm fails, it fails in a visible manner.

Why care then about slower algorithms that are provable? Indeed. The answer is that we would like to know the best provable algorithm for every problem, and that includes factoring. We are also interested in these algorithms because they often use clever tricks that might be useable elsewhere in computational number theory. But for factoring there is a special reason that is sometimes hidden by the notation. The GNFS has postulated runtime

where . This is not a similar order to . To see this more clearly, let be the length of in binary, so . Then the GNFS is roughly and slightly above , but is , which is not only miles bigger, it is a properly exponential function.

Nobody knows a deterministic factoring algorithm that beats properly exponential time.

The unprovable algorithms taunt and tease us, because they hold out the promise of being able to beat exponential time, but nobody knows how to prove it. Indeed, as Rubinstein remarks in his intro, each “teaser” is a conceptual child of one of the exponential algorithms. A reason to care about his new algorithm is its potential to be a new way to attack factoring, even though it currently loses to the best known provable methods. These are all slight variations of the Pollard-Strassen method, all running in -type times; there are also algorithms assuming the generalized Riemann hypothesis. See this for details.

The New Factoring Algorithm

Most factoring algorithms—even Peter Shor’s quantum algorithm—use the idea of choosing a number and working modulo . If and share a factor then we can quickly find it by Euclid’s gcd algorithm, while otherwise the problem of finding such that provides useful information. The starting point of Rubinstein’s algorithm is to do this for values that are right near each other, and compare the values obtained.

In its outermost structure, it uses this fact: Suppose where and are primes of the same length, so . Suppose . Then if we write

with (note they are relatively prime to or else gcd gives us a factor) we get that and are fairly small, about . They are also roughly bounded by and by . Multiples of them will stay small. So now let us work modulo for . We have:

so

Now an issue is that the values giving are far from unique. However, among them as will be the collinear points

The increments are small enough that the points stay close to each other in Euclidean space. If we divide the space into boxes of the right size and offset, they will all stay inside one box. If we can search a box well enough to find them and get all of exactly, then we get and . Moreover—and this is why I like factoring—once we find them we can verify that we have the right ones; if the check fails and we were fooled by -many other points we can move on.

Using , Rubinstein is able to show that the amortized number of “foolers” in a square is . Since there are -many such squares and , we get the target runtime. Note this is amortized, not just expected, so the algorithm is deterministic. The most clever and painstaking aspect is that estimates of the asymptotic convergence of solutions of to uniform distribution on are needed to get the “amortized” part. The details become complicated, but Rubinstein writes in an engaging top-down manner, and he includes a section on how this might—might—be used to break the sub-exponential barrier.

A Related Question

I found this work interesting not just because it is a new approach to factoring. I have tried in the past to prove the following type of result: Is there an asymmetric cryptosystem that is breakable if and only if factoring can be done in polynomial time?

I want to replace systems like AES with ones that uses the hardness of factoring for their security. Systems like AES rely on intuition and experimental testing for their security—there is not even a conditional proof that they are secure.

My goal is really trivial. Any public-key system can be viewed as an asymmetric system. But what I want is that the encryption and decryption should be very fast. This is one of the reasons that modern systems use private-key systems to create symmetric keys: performance. Using public-key systems for all messages is too slow.

My idea is not far in spirit from what Rubinstein does in his factoring algorithm. This is what struck me when I first read his paper. His algorithm is slow because it has no idea which “box” to look at. Can we share some secret that would allow to be factored faster, yet still make it hard for those without the secret?

Open Problems

Can we extend Rubinstein’s algorithm to break the barrier? Can his methods be used as described above to create asymmetric systems that are based on factoring? What is the real cost of factoring?

[fixed sentence about private-key/symmetric]