Using Groups to Factor Integers

At this moment, the fastest general-purpose factoring algorithm for extremely large integers is the general number field sieve (GNFS), developed around 1990 by Buhler, Lenstra, and Pomerance as a generalization of Pollard’s special number field sieve from 1988. The GNFS has a heuristic runtime which is sub-exponential in the size (in bits) of the input. In particular, the GNFS is much, much faster than trial division at finding factors of large integers. (The worst-case runtime for trial division is exponential in the bit size.)

This is not to say that the GNFS is the exclusive tool used in finding factorizations today. The algorithmic overhead required to implement the GNFS is immense, so much so that it only starts to eclipse other algorithms when used to factor integers of 400 bits or more. (This translates to around 120 decimal digits.) For smaller inputs, different methods will outperform the GNFS:

For one concrete example, the computer algebra system Mathematica uses a combination of trial division, Pollard’s p-1 Method, Pollard’s \rho Method, ECM, and the quadratic sieve to implement its built-in function FactorInteger[]. Note that Mathematica doesn’t even bother to implement the GNFS. In practice, the GNFS is only used in dedicated, distributed computing projects.

The quadratic sieve was the major inspiration for the GNFS and operates by searching for congruences of the form a^2 \equiv b^2 \!\! \mod N. Pollard’s \rho Method is a sort of Monte Carlo method, which obtains factorizations from cycles in the iterates of a pseudo-random map.
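As a brief illustration, Pollard’s \rho Method fits in a few lines of Python. This is a toy sketch with Floyd cycle detection; the polynomial x^2 + c and the retry-on-failure behavior are standard choices, not taken from this note:

```python
from math import gcd

def pollard_rho(N, c=1):
    """Pollard's rho method: iterate x -> x^2 + c mod N and detect a cycle
    in the iterates mod a hidden prime factor of N via Floyd's
    tortoise-and-hare; the cycle reveals itself through gcd(x - y, N)."""
    x = y = 2
    d = 1
    while d == 1:
        x = (x*x + c) % N            # tortoise: one step
        y = (y*y + c) % N            # hare: two steps
        y = (y*y + c) % N
        d = gcd(x - y, N)
    return d if d != N else None     # d == N: retry with a different c
```

For example, `pollard_rho(8051)` recovers a factor of 8051 = 83 · 97 after a handful of iterations.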

The other three algorithms Mathematica uses (trial division, Pollard’s p-1 Method, and the elliptic curve method) are all examples of algebraic group law factorization algorithms. Broadly speaking, these algorithms perform computations in an algebraic group G defined mod N for which the group structure decomposes (via Chinese Remainder Theorem) into components corresponding to the unknown prime powers dividing N. If we can find an element of G which projects to the identity in at least one but not all of the components of G (and can detect this!) we can factor N.

This note discusses some of the more successful algebraic group law factorization algorithms, which include trial division, Pollard’s p-1 Method, Williams’ p+1 Method, and the ECM. We start simple, and work our way up in complexity.

— 1. TRIAL DIVISION —

To find a non-trivial factor of N using trial division, we simply check if k \mid N for each k \leq \sqrt{N}. Unless N is prime (which is detectable in polynomial time), we’ll find a factor at or before \sqrt{N}. Now, this isn’t what I’d call an algebraic group law factorization algorithm (AGLFA). To appreciate trial division as an AGLFA, we need to re-interpret some of our steps.

The underlying group in this case is the integers under addition. Viewed mod N, we are looking at the additive group of \mathbb{Z}/N\mathbb{Z}. For simplicity, let’s assume that N has prime factorization N = pq (as in the case of an RSA modulus). By the Chinese Remainder Theorem, we can write

\displaystyle \mathbb{Z}/N\mathbb{Z} \cong \mathbb{Z}/p\mathbb{Z} \times \mathbb{Z}/q\mathbb{Z}.

Recall, as a general goal, that we’re looking for an element g \in G which equals the identity in exactly one of the components \mathbb{Z}/p\mathbb{Z} or \mathbb{Z}/q\mathbb{Z}. One way to do this is to fix an element a \in \mathbb{Z}/N\mathbb{Z} (at random, say) and look at the iterates of a under the group action. If we can find an integer L such that

L \cdot a \equiv 0 \!\! \mod p \qquad \text{but for which} \qquad L \cdot a \not\equiv 0 \!\! \mod q,

we will have found our partial identity. But how can we find L without knowing p or q? The idea is to search for L in the sequence \{n!\}. To identify a successful choice of L, we need a way to determine if L \cdot a \!\! \mod N is a partial identity. We can do this with a GCD computation, since \gcd(L \cdot a \!\! \mod N, N)>1 if and only if L\cdot a is at least a partial identity. So, without further ado, here’s “trial division” as an AGLFA:

  1. Choose a convenient value for a \in \mathbb{Z}/N\mathbb{Z}.
  2. Starting at j=2, replace a with j \cdot a \!\! \mod N.
  3. Compute d = \gcd(a,N). If 1<d<N, return d, a non-trivial factor of N (and stop).
  4. Increment j and return to Step 2.
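The four steps above might be sketched in Python as follows. This is a toy implementation; the function name and the iteration cap are my own:

```python
from math import gcd

def trial_division_aglfa(N, a=1, max_j=10**6):
    """Find a non-trivial factor of N by replacing a with j*a mod N
    for j = 2, 3, ..., so that a accumulates the multiplier j!,
    and testing gcd(a, N) after each step."""
    a %= N
    for j in range(2, max_j):
        a = (j * a) % N          # a is now (j! * a_0) mod N
        d = gcd(a, N)
        if 1 < d < N:
            return d             # Step 3: non-trivial factor found
    return None                  # gave up (or N is prime)
```

With a = 1 this finds the smallest prime factor p of N once j reaches p, exactly as ordinary trial division would, just with extra bookkeeping.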

It’s easy to look at the algorithm above and see the ways in which it might be slower than traditional trial division. The trial division you’re used to takes O(p) mod operations, in which p is the smallest prime factor of N. This takes time O(p \log N \log p) when N and p are both large. On the other hand, this algorithm involves O(p) modular multiplications of size N and O(p) GCD computations. The runtime is therefore O(p \log^2 N). The runtimes are thus comparable when p is large, although the implicit constants certainly favor the traditional version.

Remark — I’m aware just how silly this algorithm looks. The point here is just to create an analogy which we’ll develop in the next few sections. Some of the core ideas, like taking L=n! or finding factors by computing GCD against N, will be seen time and time again.

— 2. POLLARD’S p-1 METHOD —

Pollard’s p-1 Method originates in a paper by Pollard from 1974 and is typically viewed as the archetypal AGLFA. In this case, the underlying group is U_N, the multiplicative group of units in \mathbb{Z}/N\mathbb{Z}. Since the Chinese Remainder Theorem is a ring isomorphism, it restricts to an isomorphism of unit groups: U_N \cong U_p \times U_q.

As in the additive case, we’ll search for a factor by taking a random a \in \mathbb{Z}/N\mathbb{Z} and looking at the iterates of a under the group action. If we can find an integer L such that

\displaystyle a^L \equiv 1 \!\! \mod p \qquad \text{but for which} \qquad a^L \not\equiv 1 \!\! \mod q,

we can recover p as \gcd(a^L-1,N). As before, we search for L in the sequence \{n!\}. This leads to Pollard’s p-1 Method, which — when written in pseudo-code — looks almost exactly like our version of trial division:

  1. Choose a convenient value for a \in \mathbb{Z}/N\mathbb{Z}.
  2. Starting at j=2, replace a with a^j \!\! \mod N.
  3. Compute d = \gcd(a-1,N).
    • If 1<d<N, return d, a non-trivial factor of N (and stop).
    • If d=N, return to Step 1 and choose a new a \in \mathbb{Z}/N\mathbb{Z}.
  4. Increment j and return to Step 2.
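A direct transcription into Python might look like this. It’s a toy version; the iteration cap is my own, and a random base stands in for “convenient”:

```python
from math import gcd
import random

def pollard_p_minus_1(N, max_j=10**6):
    """Pollard's p-1 method: replace a with a^j mod N for j = 2, 3, ...
    (so that a = a_0^(j!) mod N) and test gcd(a - 1, N) after each step."""
    while True:
        a = random.randrange(2, N - 1)
        d = gcd(a, N)
        if d > 1:
            return d                 # lucky: the base itself shares a factor
        for j in range(2, max_j):
            a = pow(a, j, N)         # a is now a_0^(j!) mod N
            d = gcd(a - 1, N)
            if 1 < d < N:
                return d             # Step 3: non-trivial factor found
            if d == N:
                break                # Step 3, d = N: restart with a new base
        else:
            return None              # gave up
```

For instance, with N = 29937601 · 73178713 (one of the examples in this note), the method typically succeeds by j = 12, because 29937600 = 2^6 · 3^5 · 5^2 · 7 · 11 divides 12!.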

Just how fast is Pollard’s p-1 Method? We note that a^L \equiv 1 \!\! \mod p if and only if \mathrm{ord}_p(a) \mid L, and that \mathrm{ord}_p(a) \mid (p-1) by Lagrange’s Theorem. In a worst-case (but not atypical) scenario, we’d have to increase L=n! until (p-1) \mid n!. As a rough heuristic, Pollard’s p-1 Method frequently terminates when n is the largest prime factor of p-1.

The runtime of Pollard’s algorithm depends dramatically on the shape of the hidden factors p and q. For example, Pollard’s algorithm factors

\displaystyle N = 29937601 \cdot 73178713

in just twelve iterations, while the seemingly-related

\displaystyle N= 29937679 \cdot 73178713

takes over a million iterations to complete. When Pollard’s method is at its worst — for example when p and q are both safe primes — the algorithm is no faster than trial division. For this reason, Pollard’s p-1 Method is typically classified as a special-purpose factoring algorithm; it won’t be efficient for most inputs, but there are cases when it is the only tractable approach. This is the role it plays in Mathematica.

The typical runtime of Pollard’s p-1 Method is O(k \log^2 N), in which k is the largest prime factor of p-1 (or q-1, if q is found first). This can be contrasted with our algorithm based on trial division, which ran in time O(k \log^2 N), in which k is the largest prime factor of p (i.e. k=p). The largest prime factor of p-1 is certainly smaller than p itself, but the two can still be comparable in magnitude.

The difference between p-1 (in Pollard’s p-1 Method) and p (in trial division) can be explained in terms of Lagrange’s Theorem. Moving from \mathbb{Z}/p\mathbb{Z} to U_p, we move from a group of order p to one of order p-1. Since a lot of the variety in AGLFAs comes from choosing groups with different orders, expect more of this in the following sections.

— 3. WILLIAMS’ p+1 METHOD —

Williams’ p+1 Method was first presented in a paper by Williams in 1982. This special-purpose factoring algorithm is directly inspired by Pollard’s p-1 Method (the prototypical AGLFA) and, as the name suggests, works best when N has a prime factor p for which p+1 is smooth.

The underlying group in Williams’ p+1 Method is the multiplicative group of the quadratic ring extension

\displaystyle (\mathbb{Z}/N\mathbb{Z})[\sqrt{t}] = \{a+b\sqrt{t} : a,b \in \mathbb{Z}/N\mathbb{Z}\},

in which t is a quadratic non-residue mod N. If N=pq, the Chinese Remainder Theorem implies that (\mathbb{Z}/N\mathbb{Z})[\sqrt{t}] \cong (\mathbb{Z}/p\mathbb{Z})[\sqrt{t}] \times (\mathbb{Z}/q\mathbb{Z})[\sqrt{t}], which also extends to their unit groups.

When N=p is prime, every element of (\mathbb{Z}/p\mathbb{Z})[\sqrt{t}] has a multiplicative inverse. To be specific,

\displaystyle (a+b\sqrt{t})^{-1} \equiv \frac{a}{a^2-tb^2} - \frac{b}{a^2-tb^2} \sqrt{t} \!\! \mod p,

in which the denominators represent multiplication by (a^2-tb^2)^{-1} \!\! \mod p. (If this last inverse fails to exist, a^2 \equiv t b^2 \!\! \mod p, which implies a\equiv b \equiv 0 \!\! \mod p since t was chosen to be a quadratic non-residue.) Note that this implies that \mathrm{ord}_p(a+b\sqrt{t}) \mid (p^2-1) by Lagrange’s Theorem.

If we adopted Pollard’s p-1 Method wholesale, we’d now attempt to find a unit a+b\sqrt{t} and an integer L for which

\displaystyle \mathrm{ord}_p(a+b\sqrt{t}) \mid L \text{ in } (\mathbb{Z}/p\mathbb{Z})[\sqrt{t}]

\text{but for which} \qquad \mathrm{ord}_q(a+b\sqrt{t}) \nmid L \text{ in } (\mathbb{Z}/q\mathbb{Z})[\sqrt{t}].

Unfortunately, it’s not so easy, and we’re not quite done. Since the multiplicative group of a field is cyclic, it’s certainly possible (and not at all unlikely, even) that \mathrm{ord}_p(a+b\sqrt{t}) = p^2-1. In this case, searching for a candidate L in the sequence \{n!\} would require n to be at least as large as the largest prime factor of p^2-1. Since p^2-1 = (p-1)(p+1), we’re impeded by large factors of p+1 and of p-1, which makes our current approach no better than Pollard’s p-1 Method.

In theory, fixing this is simple: we just consider \mathrm{ord}_p((a+b\sqrt{t})^{p-1}) instead. But, lest we forget, we don’t know the value of p ahead of time. And yet, somehow, this doesn’t stop us. To see why, consider

\displaystyle (a+b\sqrt{t})^p \equiv \sum_{k=0}^p \binom{p}{k} a^{p-k} (b\sqrt{t})^k \equiv a^p + b^p t^{\frac{p-1}{2}} \sqrt{t}

\displaystyle \equiv a + \left(\frac{t}{p}\right) b \sqrt{t} \equiv a - b \sqrt{t} \!\! \mod p,

in which we’ve used the binomial theorem, the fact that \binom{p}{k} \equiv 0 \!\!\mod p unless k = 0 or k = p, Euler’s Criterion, Fermat’s Little Theorem, and the fact that t is a quadratic non-residue. It follows that

\displaystyle (a+b\sqrt{t})^{p-1} \equiv (a-b\sqrt{t})(a+b\sqrt{t})^{-1} \equiv \frac{(a-b\sqrt{t})^2}{a^2-b^2 t} \!\! \mod p.
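This identity is easy to sanity-check numerically. The sketch below (with p = 103 and t = 5, my own choices, where t is a quadratic non-residue mod 103) represents elements of (\mathbb{Z}/p\mathbb{Z})[\sqrt{t}] as pairs and verifies the formula, along with the consequence that the order of (a+b\sqrt{t})^{p-1} divides p+1:

```python
p, t = 103, 5   # t = 5 is a quadratic non-residue mod 103

def mul(u, v):
    """Multiply u = (a, b) and v = (c, d), viewed as a + b*sqrt(t), c + d*sqrt(t)."""
    (a, b), (c, d) = u, v
    return ((a*c + b*d*t) % p, (a*d + b*c) % p)

def power(u, n):
    """Square-and-multiply exponentiation in (Z/pZ)[sqrt(t)]."""
    r = (1, 0)
    while n:
        if n & 1:
            r = mul(r, u)
        u = mul(u, u)
        n >>= 1
    return r

a, b = 7, 11
lhs = power((a, b), p - 1)                       # (a + b*sqrt(t))^(p-1)
inv = pow((a*a - t*b*b) % p, -1, p)              # (a^2 - t b^2)^(-1) mod p
num = mul((a, (-b) % p), (a, (-b) % p))          # (a - b*sqrt(t))^2
rhs = (num[0] * inv % p, num[1] * inv % p)
assert lhs == rhs                                # the identity above
assert power(lhs, p + 1) == (1, 0)               # its order divides p + 1
```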

This formula still refers to p, but we can make the visible p-dependence go away by computing the right-hand side as an element A+B\sqrt{t} \in (\mathbb{Z}/N\mathbb{Z})[\sqrt{t}]. Now, we attempt to find an L for which

\displaystyle \mathrm{ord}_p(A+B\sqrt{t}) \mid L \text{ in } (\mathbb{Z}/p\mathbb{Z})[\sqrt{t}]

\displaystyle \text{but for which} \qquad \mathrm{ord}_q(A+B\sqrt{t}) \nmid L \text{ in } (\mathbb{Z}/q\mathbb{Z})[\sqrt{t}].

By construction, \mathrm{ord}_p(A+B\sqrt{t}) is a divisor of p+1. Moreover, we expect (A+B\sqrt{t})^{n!} \equiv 1 \!\! \mod p once n gets as large as the largest prime factor of p+1. But once this occurs, we can detect it by computing a certain GCD. This comes together as Williams’ p+1 algorithm:

  1. Choose some t \in U_N which is a quadratic non-residue mod all primes p \mid N.
  2. Choose a random/convenient value a+b\sqrt{t} \in (\mathbb{Z}/N\mathbb{Z})[\sqrt{t}] and compute
    \displaystyle A + B \sqrt{t} \equiv \frac{(a-b\sqrt{t})^2}{a^2-b^2 t} \!\! \mod N.
  3. Starting at j=2, replace A+B\sqrt{t} with (A+B\sqrt{t})^j \!\! \mod N.
  4. Compute d = \gcd(A-1,N) or d = \gcd(B,N).
    •  If 1<d<N, return d, a non-trivial factor of N (and stop).
    • If d=N, return to Step 1 and choose a new t and a+b\sqrt{t} \in (\mathbb{Z}/N\mathbb{Z})[\sqrt{t}].
  5. Increment j and return to Step 3.
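The steps above can be sketched in Python as follows. This is a toy version: elements A + B\sqrt{t} are stored as pairs mod N, the defaults t = 2, a = 5, b = 7 are my own, and a serious implementation would retry with new parameters on failure:

```python
from math import gcd

def williams_p_plus_1(N, t=2, a=5, b=7, max_j=10**5):
    """Williams' p+1 method: raise the starting element of (Z/NZ)[sqrt(t)]
    to the powers j = 2, 3, ... (accumulating the exponent j!) and test
    gcd(B, N) after each step, where the current element is A + B*sqrt(t)."""

    def mul(u, v):
        (x1, y1), (x2, y2) = u, v
        return ((x1*x2 + y1*y2*t) % N, (x1*y2 + y1*x2) % N)

    def power(u, n):
        r = (1, 0)
        while n:
            if n & 1:
                r = mul(r, u)
            u = mul(u, u)
            n >>= 1
        return r

    # Step 2: the starting element (a - b*sqrt(t))^2 / (a^2 - t b^2) mod N.
    denom = (a*a - t*b*b) % N
    d = gcd(denom, N)
    if 1 < d < N:
        return d                     # lucky: the denominator reveals a factor
    inv = pow(denom, -1, N)
    A, B = mul((a, (-b) % N), (a, (-b) % N))
    u = (A * inv % N, B * inv % N)

    for j in range(2, max_j):        # Steps 3-5
        u = power(u, j)              # u is now (starting element)^(j!)
        d = gcd(u[1], N)
        if 1 < d < N:
            return d                 # Step 4: non-trivial factor found
        if d == N:
            return None              # Step 4, d = N: retry with new t, a, b
    return None                      # gave up
```

For instance, `williams_p_plus_1(25 * 2**40 + 1)` should find the factor 561797 quickly, since 561797 + 1 = 2 · 3^2 · 23^2 · 59 is very smooth (and t = 2 happens to be a quadratic non-residue mod 561797).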

As you might guess, the typical runtime of Williams’ p+1 Method is O(k \log^2 N), in which k is the largest prime factor of p+1. There are plenty of cases, for example

\displaystyle N=25 \cdot 2^{40}+1 = 561797 \cdot 48928333,

in which Williams’ Method outperforms both trial division and Pollard’s p-1 Method. Still, we mustn’t forget that Williams’ Method is a special-purpose factoring algorithm, which is typically no faster than trial division.

Remark — There is no known algorithm to deterministically perform Step 1 without essentially factoring N along the way. In practice, one chooses t \in U_N at random and hopes that it is a quadratic non-residue mod the prime p for which p+1 is smoothest. This happens about 50% of the time, so it’s a good idea to run Williams’ algorithm more than once if it fails the first time. It’s also worth noting that Williams’ p+1 Method degenerates into a less efficient version of Pollard’s p-1 Method when t is a quadratic residue, which is not ideal but also not the worst outcome.

There’s nothing stopping a program like Mathematica from incorporating Williams’ p+1 Method into its factoring suite. That being said, I’ve never seen it put to the test in a serious context. It may be that Williams’ p+1 Method fails too often to bother implementing.

— 4. FACTORING WITH CYCLOTOMIC POLYNOMIALS —

The p+1 which appears in Williams’ p+1 Method arises from the computation (p^2-1)/(p-1). If we replace the quadratic extension (\mathbb{Z}/N\mathbb{Z})[\sqrt{t}] with an extension of degree d, we’d produce a factoring method which relied on the smoothness of p^d-1. With a bit of care, we can eliminate some of the factors of p^d-1 and require only that \Phi_d(p) be smooth, where \Phi_d(x) is the d-th cyclotomic polynomial. This completely subsumes the algorithms due to Pollard and Williams, since \Phi_1(x)=x-1 and \Phi_2(x)=x+1.

This technique was first published in 1989 by Bach and Shallit. Unfortunately, it remains mostly a curiosity, since \Phi_d(p) is far less likely to be smooth as d increases in size. The degree of \Phi_d(x) is \phi(d), which is at least two for d \geq 3. Since the likelihood that X is smooth decreases as X \to \infty, the odds of success with a larger cyclotomic polynomial are negligible. This is a common problem with AGLFAs, one which really restricts the viability of algorithms in this class.

— 5. LENSTRA’S ELLIPTIC CURVE METHOD —

The elliptic curve method (ECM) was first presented in a paper by Lenstra in 1987. Like Williams’ p+1 Method, the ECM was directly inspired by Pollard’s p-1 Method. Here, the group of units U_N is replaced by the group of points on an elliptic curve.

More specifically, one picks a random elliptic curve E: y^2 z = x^3 +axz^2 + bz^3 over \mathbb{Z}/N\mathbb{Z} and a non-identity point P on E. Using the addition law on elliptic curves, we compute L \cdot P for L in the sequence \{n!\}. By the Chinese Remainder Theorem, this addition can be viewed mod p, for each prime p \mid N. If ever L \cdot P = O \in E(\mathbb{Z}/p\mathbb{Z}), the z-coordinate of L \cdot P will be divisible by p. But this can be detected with a GCD computation, which gives us a method for factoring N.

If we restrict to a single curve E, this algorithm functions exactly like Pollard’s p-1 Method, except with p-1 replaced by the order of the group E(\mathbb{Z}/p\mathbb{Z}). By Hasse’s Theorem, \#E(\mathbb{Z}/p\mathbb{Z}) = p+1 - a_{p}, in which a_p depends on E and satisfies \vert a_{p} \vert \leq 2 \sqrt{p}. If we’re lucky and p+1-a_p is particularly smooth, we’ll factor N in short order.

More likely, p+1-a_p will not cooperate. But here lies the advantage of the ECM over Pollard’s p-1 Method: we can throw out E and repeat our process with a new elliptic curve. Under the Sato–Tate Conjecture, a_p varies in the interval [-2\sqrt{p},2\sqrt{p}] according to a well-defined distribution. In particular, by trying many elliptic curves, we expect to find one for which p+1-a_p is sufficiently smooth in a predictable amount of time.

By optimizing our definition of sufficiently smooth (and applying some heuristics for the number of smooth numbers in an interval), we obtain an algorithm which is sub-exponential in the smallest factor p of N. For this reason, the ECM is most commonly used to identify factors of moderate size (i.e. fifty-ish digits) in enormous composites. The largest factor found to date with ECM is an 83-digit divisor of (7^{337}+1)/8, found by Propper in 2013. (The leftovers form a 202-digit composite.)
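Here is a sketch of Lenstra’s method in Python, using the affine model y^2 = x^3 + ax + b rather than the projective one: a failed modular inversion during point addition is exactly the GCD detection described above. The curve count and smoothness bound are my own arbitrary defaults:

```python
import random
from math import gcd

class FactorFound(Exception):
    """Raised when a non-invertible denominator mod N reveals a factor."""
    def __init__(self, d):
        self.d = d

def inv_mod(x, N):
    d = gcd(x % N, N)
    if d != 1:
        raise FactorFound(d)      # d may be a proper factor of N (or N itself)
    return pow(x, -1, N)

def ec_add(P, Q, a, N):
    """Add affine points on y^2 = x^3 + a*x + b mod N (None = identity).
    Note the addition law never uses b, so we never need to compute it."""
    if P is None: return Q
    if Q is None: return P
    x1, y1 = P
    x2, y2 = Q
    if x1 == x2 and (y1 + y2) % N == 0:
        return None               # P + (-P) = identity
    if P == Q:
        s = (3*x1*x1 + a) * inv_mod(2*y1, N) % N
    else:
        s = (y2 - y1) * inv_mod(x2 - x1, N) % N
    x3 = (s*s - x1 - x2) % N
    return (x3, (s*(x1 - x3) - y1) % N)

def ec_mul(k, P, a, N):
    """Double-and-add scalar multiplication."""
    R = None
    while k:
        if k & 1:
            R = ec_add(R, P, a, N)
        P = ec_add(P, P, a, N)
        k >>= 1
    return R

def ecm(N, curves=100, bound=1000):
    """Lenstra's ECM (a sketch): try random curves through random points,
    multiplying the point by j = 2, 3, ... so the scalar accumulates j!."""
    for _ in range(curves):
        x0 = random.randrange(N)
        y0 = random.randrange(N)
        a = random.randrange(N)   # b is implicitly (y0^2 - x0^3 - a*x0) mod N
        P = (x0, y0)
        try:
            for j in range(2, bound):
                P = ec_mul(j, P, a, N)
                if P is None:
                    break         # hit the identity mod N itself; new curve
        except FactorFound as e:
            if 1 < e.d < N:
                return e.d
    return None
```

Each random curve plays the role that the single group U_p played for Pollard: if \#E(\mathbb{Z}/p\mathbb{Z}) happens to be smooth up to the bound for some curve, the inner loop trips the FactorFound exception with a GCD divisible by p.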

The generalization from elliptic curves to curves of higher genus is easy to do but seems, in the end, to be unhelpful. The reason is the same reason that factoring with cyclotomic polynomials is impractical: the odds of finding a group with sufficiently smooth order decrease dramatically as the order increases in size.

 
