The Philosophy of Square-Root Cancellation


Partial sums of the Möbius function appear to grow no faster than a square-root.

A great many problems in analytic number theory concern bounds for finite sums in which some amount of cancellation is to be expected. For example, one might study the partial sums of the Möbius function,

\displaystyle M(x) = \sum_{n \leq x} \mu(n),


which satisfies the trivial bound M(x) = O(x). As the Möbius function changes sign infinitely often, we expect some amount of cancellation in M(x), but such progress is hard won. Indeed, the unspecified improvement M(x)=o(x) is already equivalent to the famous Prime Number Theorem (PNT).

Just how much cancellation do we expect in the sum M(x)? Replacing the PNT with the Riemann hypothesis, we could show M(x) = O(x^{1/2+\epsilon}) for all \epsilon > 0. (In fact, these two statements are equivalent.) Conversely, \Omega_\pm-results (due to Kotnik and te Riele, e.g.) imply that the exponent 1/2 is essentially optimal. We might say that M(x) is suspected to demonstrate square-root cancellation, since M(x) is (conjecturally) no larger than the square-root of the length of its defining sum.

A second example concerns bounds for the Kloosterman sums

\displaystyle S(m,n;x) = \sum_{u (x)^\times} e\!\left(\frac{m u + n u^{-1}}{x} \right),

in which e(z) = e^{2\pi i z}. Here, the trivial bound \vert S(m,n;x) \vert \leq \varphi(x) \leq x was famously improved upon by Weil’s oh-so-non-trivial estimate

\displaystyle \vert S(m,n;x) \vert \leq d(x) \sqrt{\gcd(m,n,x)}\sqrt{x},

which is O(x^{1/2+\epsilon}) for fixed m and n. In other words, Kloosterman sums demonstrate square-root cancellation.

In this note, I’ll discuss why square-root cancellation is so typical in problems in number theory, and give a quick survey of important sums known or widely conjectured to satisfy bounds of this form.


It is widely speculated that the values of the Möbius function behave like a random variable \{\pm 1\} on the square-free integers. Given this, what might we expect out of M(x)?

Let \widetilde{\mu}: [1,x] \to \{\pm 1\} be a “random” function (ie. chosen uniformly at random from the 2^x functions of this form). These are called random walks, and in this particular case, often lumped together as the simple random walk on \mathbb{Z}. The number of random walks that would give \widetilde{M}(x) = k is

\displaystyle \binom{x}{(x-k)/2},

provided of course that x and k have the same parity.  From here, various estimates can give bounds on the probability that \widetilde{M}(x) exceeds O(x^\alpha). To spare you a slog, we might cite Hoeffding’s inequality, which here provides

\displaystyle \mathbb{P}\big(\vert \widetilde{M}(x) \vert \geq t \big) \leq 2 \exp\left( - \frac{x t^2}{2} \right).

Note that this probability becomes vanishingly small as t exceeds O(\sqrt{x}), so we conclude that “most” random walks exhibit a form of square-root cancellation. Considering the partial sums of arithmetic functions as walks leads to the following moral principle, which I’ll call The Philosophy of Square-Root Cancellation:

The Philosophy of Square-Root Cancellation:
We should expect square root cancellation in the partial sum of any arithmetic function that behaves “randomly”.

Returning to the Möbius function, we note that the Riemann hypothesis follows from the conjecture that \mu behaves like a random variable. (This is the probabilistic evidence towards the Riemann hypothesis.)


We have already seen two cases (Kloosterman sums and M(x)) in which square-root cancellation is proven or widely conjectured. In this section, I’d like to fill out the picture with a great many more examples.

I: The Pólya–Vinogradov Inequality.  In 1918 Pólya and Vinagrodov (independently) showed that

\displaystyle \bigg\vert \sum_{n \in [A,B]} \chi(n) \bigg\vert = O\big(\sqrt{q} \log q\big),

in which \chi is any non-prinicipal Dirichlet character of modulus q and A<B are integers. Since the sum of \chi over all of the residues mod q is 0 (by orthogonality), we may assume that the sum above is no longer than q in length, so Pólya–Vinagradov represents square-root cancellation.

Under the assumption of GRH, Montgomery and Vaughn (1977) have given the slight improvement O(\sqrt{q}\log\log q). On the other hand, it is known (Paley, 1932) that these character sums are infinitely often \gg \sqrt{q}\log \log q, so square-root cancellation is the best one can expect.

II: Classical Gauss Sums. The Gauss sum of a Dirichlet character \chi of modulus q is the finite sum

\displaystyle G(\chi) = \sum_{u=1}^q \chi(u) e^{2\pi i u/q}.

These sums were first considered by Gauss under the assumption that \chi was the Legendre symbol. Here, Gauss proved that G(\chi)= i \sqrt{q} or \sqrt{q} depending on the residue of q mod 4, and the generalization \vert G(\chi) \vert = \sqrt{q} holds for any primitive character.

This cancellation is most easily seen as a consequence of the Plancherel formula with respect to Fourier inversion on \mathbb{Z}/q\mathbb{Z}.

III: Counting Points on Elliptic Curves.  Fix a,b \in \mathbb{Z} and consider the elliptic curve

\displaystyle E: \quad y^2 =x^3 +ax +b

over the finite field \mathbb{F}_p, with p prime. The number of points (x,y) on the reduction of E, which we write as N_p, can be written in terms of the Legendre symbol:

\displaystyle N_p = p+ \sum_{x=0}^{p-1} \binom{x^3+ax+b}{p}.

Square-root cancellation suggests that \vert N_p - p \vert = O(\sqrt{p}) may hold. This (proven) result is known as Hasse’s theorem, and essentially follows from a bound on the magnitude of the roots of the local zeta function of E. (That is, Hasse’s theorem represents the analogue of the Riemann hypothesis for the local zeta function of E.)

IV: The Gauss Circle Problem.  The Gauss circle problem concerns estimates for the number S_2(R) of integer lattice points bounded by the circle of radius \sqrt{R}. One expects (and Gauss showed) that S_2(R) \sim \pi R. More precisely, Gauss proved that

S_2(R) = \pi R + P_2(R),

in which P_2(R) is an error term satisfying P_2(R) = O(\sqrt{R}). To see this another way, let r_2(n) denote the number of representations of n as a sum of two squares. Then, since

\displaystyle P_2(R) = \sum_{m \leq R} \left(r_2(m)-\pi\right),

we recognize Gauss’ bound as a form of square-root cancellation after accounting for a main term. Surprisingly, greater-than-squareroot cancellation is conjectured to occur within P_2(R), and we expect P_2(R) = O(R^{1/4+\epsilon}).

This deviation from “random'” behavior can be explained in practice by shortening the sum of length R via Poisson summation. It may also be possible to recognize the additional cancellation as a consequence of the Hecke relations. (Indeed, a general greater-than-squareroot cancellation is known for sums of coefficients of other modular forms.)


Exercise 1. Let \widetilde{\mu}: [1,x] \to \{\pm 1\} be a random function as before, with random walk \widetilde{M}(x). Use Stirling’s approximation to prove that

\displaystyle \mathbb{P}(\vert \widetilde{M}(x) \vert \leq x^\alpha) = \sqrt{\frac{2}{\pi}} \cdot x^{\alpha-\frac{1}{2}}+O\big(x^{3\alpha-\frac{3}{2}}\big).

Conclude that “most” random functions \widetilde{\mu} are \Omega(x^{1/2-\epsilon}) for all \epsilon > 0.

Leave a Reply

Fill in your details below or click an icon to log in: Logo

You are commenting using your account. Log Out / Change )

Twitter picture

You are commenting using your Twitter account. Log Out / Change )

Facebook photo

You are commenting using your Facebook account. Log Out / Change )

Google+ photo

You are commenting using your Google+ account. Log Out / Change )

Connecting to %s