Counting Matrices of Small Trace


(Image credit David Zerba, 2013)


To refresh on a bit of graph theory, we define a k-regular graph to be a graph in which every vertex has k neighbors.  Thus 0-regular graphs are collections of disconnected vertices, while 1-regular graphs consist of disconnected edges.  A 2-regular graph consists of disconnected cycles (or perhaps infinite chains, if the graph is infinite).

For a general graph G, we define the girth of G, written h(G), to be the length of the shortest non-trivial cycle in G.  If no cycles of finite length exist, as is necessarily the case with 0-regular and 1-regular graphs, the girth is said to be infinite.  In the case of 2-regular graphs on n vertices, the trivial upper bound h(G)=n is attained by the cycle graph on n vertices.

The story changes considerably if we consider finite k-regular graphs with k \geq 3. Here, one can show that the girth of a k-regular graph on n vertices satisfies

h(G) \lesssim 2\log_{k-1}(n),

and Erdős and Sachs showed that logarithmic growth (with a constant worse than 2) is actually attainable. The best known constant in these estimates is currently 4/3, as happens for certain trivalent sextet graphs and for the Ramanujan graphs of Lubotzky, Phillips, and Sarnak. (In particular, it’s not known whether or not the constant 2 is sharp.)

A construction due to Brooks and Makover relates graphs of large girth to hyperbolic surfaces of large systole. (Here, the systole of a hyperbolic surface is the shortest homotopically non-trivial (and non-peripheral) loop on the surface.)  This construction is perhaps unsurprising, since the dual graph of a surface triangulation is 3-regular.


In 2016, Bram Petri and I submitted a paper that builds upon the machinery of Erdős and Sachs to construct a family of genus g surfaces for which the length of the systole exceeds

\log g - \log \log g +O(1).

(This lower bound on the size of the systole is sharp up to constants.)

A key ingredient in our paper (and where I, as a number theorist, finally made myself useful) is a count of \mathrm{SL}_2(\mathbb{Z}) matrices with non-negative entries and bounded trace. To be specific, let \mathrm{SL}_2(\mathbb{Z})^+ denote the sub-semigroup of \mathrm{SL}_2(\mathbb{Z}) consisting of matrices with non-negative entries, and let

\displaystyle n(m) = \# \left\{\gamma \in \mathrm{SL}_2(\mathbb{Z})^+ : \mathrm{tr}(\gamma) = m\right\}.

This will actually be infinite for m=2, since our semigroup contains

\displaystyle \left(\begin{matrix} 1 & k \\ 0 & 1 \end{matrix}\right) \quad \text{and} \quad \left(\begin{matrix} 1 & 0 \\ k & 1 \end{matrix}\right)

for all k \geq 0. On the other hand, n(m) is finite for each m > 2, and it admits a nice closed form in terms of the divisor function:

Proposition 1 (PW, Proposition 3.2): For m>2, we have

\displaystyle n(m) = \sum_{a=1}^{m-1} d(a(m-a)-1).
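Proposition 1 is easy to test numerically. The Python sketch below (helper names are my own; num_divisors is simple trial division) compares a direct enumeration of the matrices against the divisor-sum formula:

```python
from math import isqrt

def num_divisors(n):
    """d(n): the number of positive divisors of n, by trial division."""
    count = 0
    for i in range(1, isqrt(n) + 1):
        if n % i == 0:
            count += 1 if i * i == n else 2
    return count

def n_direct(m):
    """Count gamma in SL2(Z)^+ with tr(gamma) = m > 2, directly.
    For gamma = [[a, b], [c, d]]: a + d = m and ad - bc = 1 with nonnegative
    entries force 1 <= a <= m - 1 and bc = a(m - a) - 1 >= 1, so we
    enumerate the factorizations b * c of that product."""
    total = 0
    for a in range(1, m):
        bc = a * (m - a) - 1
        total += sum(1 for b in range(1, bc + 1) if bc % b == 0)
    return total

def n_formula(m):
    """Proposition 1: n(m) = sum_{a=1}^{m-1} d(a(m-a) - 1)."""
    return sum(num_divisors(a * (m - a) - 1) for a in range(1, m))

assert all(n_direct(m) == n_formula(m) for m in range(3, 40))
```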

Here, the standard bound d(m) = O(m^\epsilon) (see Apostol’s Introduction to Analytic Number Theory, Theorem 13.12, e.g.) gives n(m) = O(m^{1+\epsilon}), and it’s reasonable to expect that averaging over several values of the divisor function would show n(m) = O(m (\log m)^C) for some constant C. Unfortunately, I suspect that showing this would be quite hard, as averaging arithmetic functions over the values of a quadratic polynomial is notoriously difficult. (Compare, for example, the problem of finding infinitely many primes in the outputs of a quadratic polynomial, and in particular, Hardy–Littlewood’s Conjecture F.)

For averages of divisor sums over linear polynomials, far more is known. In the absolute simplest case, we have

\displaystyle \sum_{n \leq X} d(n) = X \log X + (2\gamma-1) X + O(X^\alpha)                    (1)

for some \alpha < 1, in which \gamma denotes the Euler–Mascheroni constant. Dirichlet proved that \alpha = 1/2 is admissible, and incremental reduction of \alpha towards the conjecturally-optimal exponent \alpha = 1/4 is known as the Dirichlet divisor problem. The current record is due to Huxley (2003), who gives \alpha = 131/416 \approx 0.3149. (This is also the record exponent in the Gauss circle problem, which reflects the deep similarity of these two problems. In fact, Huxley holds both records.)
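As a quick numerical illustration of (1), the sketch below (the value of \gamma is hardcoded) measures the error in Dirichlet’s asymptotic against his O(\sqrt{X}) bound:

```python
from math import log, sqrt

GAMMA = 0.5772156649015329  # Euler-Mascheroni constant (hardcoded)

def divisor_partial_sum(X):
    """sum_{n <= X} d(n), via the identity sum_{d <= X} floor(X/d)."""
    return sum(X // d for d in range(1, X + 1))

X = 10**4
error = divisor_partial_sum(X) - (X * log(X) + (2 * GAMMA - 1) * X)
assert abs(error) < sqrt(X)  # consistent with Dirichlet's O(sqrt(X)) bound
```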

Fortunately for Bram and me, a bound for the partial sums of n(m) was sufficient for the purposes of our paper. Explicitly, we need only consider

\displaystyle N(m) = \sum_{k = 3}^m n(k) = \sum_{k = 3}^m \sum_{a=1}^{k-1} d(a(k-a)-1).

The trivial bound n(m) = O(m^{1+\epsilon}) implies that N(m) = O(m^{2+\epsilon}), but we expect to remove the \epsilon-dependence via averaging.  Interchanging the order of summation, we write

\displaystyle N(m) = \sum_{a=1}^{m-1} \sum_{k = \max(a+1,3)}^{m} d(a(k-a)-1)=\sum_{k=1}^{m-2} d(k) + \sum_{a=2}^{m-1} \sum_{\substack{n \leq a(m-a)-1 \\ n \equiv -1 \,(a)}} d(n).

The first term is O(m \log m) by the classic Dirichlet divisor estimate, and in the second we’ve reduced our work to the study of the divisor function in averages over arithmetic progressions.  Luckily, this has been studied before; for example, Fouvry and Iwaniec give that

\displaystyle \sum_{\substack{n < X \\ n \equiv -1(a)}} d(n) = \frac{1}{\varphi(a)} \sum_{\substack{n \leq X \\(n,a)=1}} d(n) + O\left(\left(a^\frac{1}{2} + X^\frac{1}{3}\right)X^\epsilon\right),

as a consequence of the fact that the divisor function equidistributes over the (primitive) residue classes.

Tracking the propagation of this error term and adjusting some limits of summation for simplicity, we see that

\displaystyle N(m) = \sum_{a=1}^m \frac{1}{\varphi(a)} \sum_{\substack{n \leq a(m-a) \\ (n,a)=1}} d(n) + O\left(m^{\frac{5}{3}+\epsilon}\right).

The next step in our evaluation is to estimate the inner sum. Opening it up, we see that

\displaystyle N(m) = \sum_{a=1}^m \frac{1}{\varphi(a)} \sum_{\substack{m_1 \leq a(m-a) \\ (m_1,a)=1}}\,\, \sum_{\substack{m_2 \leq a(m-a)/m_1 \\ (m_2,a)=1}} \!\!\!1 + O\left(m^{\frac{5}{3}+\epsilon}\right).         (2)

The innermost sum just counts the number of integers up to a(m-a)/m_1 which are relatively prime to a, which suggests a heuristic main term of the form \varphi(a)(m-a)/m_1 for this sum. The shape of the error is harder to guess, but is nevertheless small:

Proposition 2: We have

\displaystyle \sum_{\substack{n \leq X \\(n,a)=1}} 1 = \frac{\varphi(a)}{a} X + O\left(d(a)\right).

Proof: (See here, as well.) Standard tricks give

\displaystyle \sum_{\substack{n \leq X \\(n,a)=1}} \!\!1 = \sum_{n \leq X} \sum_{d \mid (n,a)}\! \mu(d) = \sum_{d \mid a} \mu(d) \lfloor X/d \rfloor = X \sum_{d \mid a} \frac{\mu(d)}{d} + \sum_{d \mid a} \mu(d) \left\{X/d \right\},

in which \mu denotes the Möbius function, \lfloor x \rfloor denotes the integer part of x, and \{x\}=x-\lfloor x \rfloor denotes the fractional part of x. The Dirichlet convolution identity \mu \star \mathrm{id} = \varphi expresses the first term at right as \varphi(a)X/a. To bound the second term, we need only remark that each of the d(a) terms in the divisor sum is O(1).  \square
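Proposition 2 can be checked directly. In the sketch below (with naive helper implementations of my own), the error stays within d(a) throughout the tested range:

```python
from math import gcd

def num_divisors(n):
    """d(n), naively."""
    return sum(1 for i in range(1, n + 1) if n % i == 0)

def phi(a):
    """Euler's totient, straight from the definition."""
    return sum(1 for n in range(1, a + 1) if gcd(n, a) == 1)

for a in range(1, 60):
    for X in (7, 100, 1000):
        exact = sum(1 for n in range(1, X + 1) if gcd(n, a) == 1)
        # Proposition 2: main term phi(a) X / a, error at most d(a)
        assert abs(exact - phi(a) * X / a) <= num_divisors(a)
```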

If you’re just looking for a slightly-improved upper bound (as was the case in my paper with Bram), it’s enough to bound the inner sum in (2) via Proposition 2 and write

\displaystyle N(m) \ll \sum_{a=1}^{m} \frac{1}{\varphi(a)} \sum_{\substack{m_1 \leq a(m-a) \\ (m_1,a)=1}} \left(\frac{\varphi(a)(m-a)}{m_1} + O(d(a))\right) + O\left(m^{\frac{5}{3}+\epsilon}\right).

Ignoring coprimality in the m_1-sum for simplicity and applying the bound d(a) \ll \varphi(a) gives the following upper bound for N(m):

Theorem 3 (PW, Proposition 3.4): We have N(m) = O(m^2 \log m).


There’s some clear room for improvement here, and while looking over this paper recently I decided to see what could be shown with a little more work. The core inefficiency in the previous section concerns the way we opened up the divisor sum

\displaystyle \frac{1}{\varphi(a)} \sum_{\substack{n \leq X \\ (n,a)=1}} d(n)

in line (2). In essence, our treatment resembles the calculation

\displaystyle \sum_{n \leq X} d(n) = \sum_{m_1 \leq X} \sum_{m_2 \leq X/m_1} 1 = \sum_{m_1 \leq X} \left\lfloor \frac{X}{m_1}\right\rfloor = \sum_{m_1 \leq X} \left(\frac{X}{m_1}+O(1)\right) = X \log X + O(X),

which gives a quick proof of the fact that the main term in (1) is X \log X but introduces a large error and obscures some secondary terms. The same is true in our estimate of N(m) above: we produce a main term (for simplicity, just an upper bound) but can say little about the error.

As it happens, our problem can be solved using the same tool that Dirichlet used to extract refined asymptotics in (1): the hyperbola method. (So-called because divisors of n are in bijection with lattice points on the hyperbola ab=n.)

Very briefly, the hyperbola method exploits the ab-symmetry of the hyperbola ab=X to count the number of lattice points beneath it. Combinatorially, this corresponds to counting each divisor pair by its smaller factor, which is of course no larger than \sqrt{X}. Explicitly, one has

\displaystyle \sum_{n \leq X} d(n) = 2 \sum_{d \leq \sqrt{X}} \bigg(\sum_{q \leq X/d} 1 - \sum_{q\leq d} 1\bigg) + \sum_{d \leq \sqrt{X}} 1.

Here, the inner sum of the double sum counts the number of lattice points (d,q) under the hyperbola dq=X that satisfy d<q. As we vary d \leq \sqrt{X}, this counts about half of the points under the hyperbola — to finish, we double our estimate to account for d>q and add a final sum to treat the d=q case.  (This is presented in greater detail in Apostol, Section 3.5.)
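The hyperbola method is also the standard trick for computing these partial sums quickly. A minimal Python sketch, transcribing the display above into O(\sqrt{X}) arithmetic:

```python
from math import isqrt

def divisor_sum_hyperbola(X):
    """sum_{n <= X} d(n) in O(sqrt(X)) steps: each difference
    floor(X/d) - d counts the lattice points (d, q) with d < q and dq <= X;
    doubling accounts for d > q, and the final +s handles the diagonal d = q."""
    s = isqrt(X)
    return 2 * sum((X // d) - d for d in range(1, s + 1)) + s

def divisor_sum_direct(X):
    """O(X) reference computation: sum_{d <= X} floor(X/d)."""
    return sum(X // d for d in range(1, X + 1))

assert all(divisor_sum_hyperbola(X) == divisor_sum_direct(X)
           for X in (1, 2, 10, 100, 99999))
```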

With some bookkeeping, we realize that a similar statement holds for the sum over (n,a)=1:

\displaystyle \sum_{\substack{n \leq X \\ (n,a)=1}} d(n) = 2 \sum_{\substack{d \leq \sqrt{X} \\ (d,a)=1}} \bigg(\sum_{\substack{q \leq X/d \\ (q,a)=1}} 1 - \sum_{\substack{q\leq d \\(q,a)=1}} 1\bigg) + \sum_{\substack{d \leq \sqrt{X} \\(d,a)=1}} 1.

Applying Proposition 2 gives

\displaystyle \sum_{\substack{n \leq X \\ (n,a)=1}} \!\!\!d(n) = \frac{\varphi(a)}{a} \sqrt{X} + 2\!\sum_{\substack{d \leq \sqrt{X} \\ (d,a)=1}} \left(\frac{\varphi(a) X}{ad} -\frac{\varphi(a) d}{a}\right) + O\left(\frac{\varphi(a)d(a)}{a}\sqrt{X}\right).                 (3)

To handle the two terms that appear in the sum at right in (3), we require two variants of Proposition 2. The first follows from Proposition 2 and partial summation, and we state it without proof:

\displaystyle \sum_{\substack{n \leq X \\ (n,a)=1}} n = \frac{\varphi(a)}{2a} X^2 + O\left(X d(a)\right).

The other fact we’ll need is different enough that we state it as a lemma. Its proof is similar to that of Proposition 2 and is hinted at in the exercises below.

Lemma 4: We have

\displaystyle \sum_{\substack{n \leq X \\ (n,a)=1}} \frac{1}{n} =\frac{\varphi(a)}{a} \log X + \frac{\varphi(a)\gamma}{a} - \sum_{d \mid a} \frac{\mu(d) \log d}{d} + O\left(\frac{d(a)}{X}\right).

Proof: See the Exercises.  \square
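Lemma 4 can be sanity-checked numerically as well. In the sketch below, mobius and phi are naive helpers of my own and the value of \gamma is hardcoded:

```python
from math import gcd, log

GAMMA = 0.5772156649015329  # Euler-Mascheroni constant (hardcoded)

def mobius(n):
    """Möbius function, by trial factorization."""
    result, p = 1, 2
    while p * p <= n:
        if n % p == 0:
            n //= p
            if n % p == 0:
                return 0  # n is not squarefree
            result = -result
        p += 1
    return -result if n > 1 else result

def divisors(a):
    return [d for d in range(1, a + 1) if a % d == 0]

def phi(a):
    return sum(1 for n in range(1, a + 1) if gcd(n, a) == 1)

X = 10**5
for a in (1, 2, 6, 30):
    exact = sum(1.0 / n for n in range(1, X + 1) if gcd(n, a) == 1)
    main = (phi(a) / a) * (log(X) + GAMMA) \
        - sum(mobius(d) * log(d) / d for d in divisors(a))
    assert abs(exact - main) < 1e-3  # the error is O(d(a)/X)
```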

Simplification of the sums in (3) produces an estimate for the restricted partial sums of divisors in the spirit of Dirichlet’s hyperbola method:

\displaystyle \sum_{\substack{n \leq X \\ (n,a)=1}} \!\!d(n) =\frac{\varphi(a)^2}{a^2} X \log X + \frac{(2\gamma-1)\varphi(a)^2}{a^2} X - \frac{2\varphi(a)X}{a} \sum_{d \mid a} \frac{\mu(d) \log d}{d}+ O\left(\frac{d(a)\varphi(a)}{a}\sqrt{X}+ d(a)^2\right).

Heuristically, we consider the factors \varphi(a)/a as marking the proportion of integers that occur in the sum over n coprime to a. The divisor sum that appears can be considered as a correction term, signifying that divisors aren’t perfectly distributed in residue classes. We note that Dirichlet’s estimate (1) follows from taking a=1.
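This estimate can be verified numerically. The sketch below computes the restricted divisor sum exactly (by counting lattice points under the hyperbola, column by column) and compares it against the main terms; here the linear term is taken as (2\gamma-1)\varphi(a)^2 X/a^2, the value forced by consistency with Dirichlet’s (2\gamma-1)X at a=1, and the tolerance constant is a generous guess for the implied O-constant:

```python
from math import gcd, log

GAMMA = 0.5772156649015329  # Euler-Mascheroni constant (hardcoded)

def mobius(n):
    """Möbius function, by trial factorization."""
    result, p = 1, 2
    while p * p <= n:
        if n % p == 0:
            n //= p
            if n % p == 0:
                return 0
            result = -result
        p += 1
    return -result if n > 1 else result

def divisors(a):
    return [d for d in range(1, a + 1) if a % d == 0]

def phi(a):
    return sum(1 for n in range(1, a + 1) if gcd(n, a) == 1)

def coprime_count(Y, a):
    """#{n <= Y : (n,a) = 1} = sum_{e | a} mu(e) floor(Y/e), exactly."""
    return sum(mobius(e) * (Y // e) for e in divisors(a))

def restricted_divisor_sum(X, a):
    """sum_{n <= X, (n,a)=1} d(n): count lattice points (d, q) with
    dq <= X and both coordinates coprime to a, column by column."""
    return sum(coprime_count(X // d, a)
               for d in range(1, X + 1) if gcd(d, a) == 1)

X = 10**5
for a in (1, 2, 6):
    f = phi(a) / a
    corr = sum(mobius(d) * log(d) / d for d in divisors(a))
    main = f * f * X * log(X) + (2 * GAMMA - 1) * f * f * X - 2 * f * X * corr
    # allow a generous constant in the O(sqrt(X)) error term
    assert abs(restricted_divisor_sum(X, a) - main) < 50 * X**0.5
```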


Now that we have a good estimate for sums of divisors over the integers coprime to a, our work is nearly done. Elementary bounds for error terms massage our expression for N(m) into

\displaystyle N(m) = \sum_{a =1}^m \frac{\varphi(a)(m-a)\left(\log(a(m-a)) + 2\gamma-1\right)}{a} -2\sum_{a=1}^m (m-a)\sum_{d \mid a } \frac{\mu(d) \log d}{d} + O \left(m^{\frac{5}{3}+\epsilon}\right).           (4)

The first term at right in (4) represents the dominant contribution towards N(m). By partial summation, we write this term as

\displaystyle \int_1^{m} \sum_{n \leq t} \varphi(n)\frac{m \log (t(m-t)) +2 t -2m + 2 \gamma m}{t^2} dt + O\left(m \log m\right).

To handle the simplified partial sum of \varphi(n), we recall (see Apostol, Theorem 3.7, e.g.) that

\displaystyle \sum_{n \leq X} \varphi(n) = \frac{3}{\pi^2} X^2 + O\left(X \log X\right).

(Better error bounds are known but this is more than sufficient.)  Thus the first term in (4) grows as

\displaystyle \frac{3}{\pi^2} \int_{1}^{m} \left(2t+2m(\gamma-1)+m \log(t(m-t))\right) dt + O\left(m\log m\right) = \frac{6m^2 \log m}{\pi^2} +\left(\frac{6\gamma}{\pi^2} - \frac{9}{\pi^2}\right)m^2 + O\left(m \log m\right).
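The totient average used above is easy to check with a short sieve. The following sketch uses the standard trick of multiplying each running value by (1 - 1/p) for every prime p dividing the index:

```python
from math import log, pi

def totients_up_to(X):
    """phi(1), ..., phi(X) by sieving: for each prime p, adjust every
    multiple's running value by a factor of (1 - 1/p)."""
    phi = list(range(X + 1))
    for p in range(2, X + 1):
        if phi[p] == p:  # phi[p] untouched by smaller primes, so p is prime
            for k in range(p, X + 1, p):
                phi[k] -= phi[k] // p
    return phi

X = 10**4
total = sum(totients_up_to(X)[1:])
# sum_{n <= X} phi(n) = (3/pi^2) X^2 + O(X log X)
assert abs(total - 3 / pi**2 * X * X) < 2 * X * log(X)
```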

The double sum in (4) may be simplified by reversing the order of summation. Splitting (m-a) into two terms and treating each separately gives

\displaystyle -2\sum_{a=2}^{m-1} (m-a)\sum_{d \mid a} \frac{\mu(d)\log d}{d} = -m^2 \sum_{d \leq m} \frac{\mu(d)\log d}{d^2} + O\left(m \log^2(m)\right).

To simplify further, we note that

\displaystyle \sum_{d \leq m} \frac{\mu(d) \log d}{d^2} = \sum_{d =1}^\infty \frac{\mu(d) \log d}{d^2} + O\left(\sum_{d>m} \frac{\log d}{d^2}\right) = \frac{36 \zeta'(2)}{\pi^4}+O\left(\frac{\log m}{m}\right),

in which we’ve used that \sum \mu(d) \log d/d^s = \zeta'(s)/\zeta(s)^2 to obtain the limiting value.  Collecting our estimates for each line in (4), we see at last that

\displaystyle N(m) = \frac{6 m^2 \log m}{\pi^2} + \left(\frac{6\gamma}{\pi^2}-\frac{9}{\pi^2} - \frac{36 \zeta'(2)}{\pi^4}\right) m^2 + O\left(m^{\frac{5}{3}+\epsilon}\right).

It’s unclear what the optimal error should be in this expression. Certainly our error exceeds O(m), and numerical evidence suggests that it can be quite a bit larger. In fact, this numerical evidence suggests the conjecture that the error is O(m^{\frac{5}{4}+\epsilon}), which matches the conjectured exponent in the Dirichlet divisor problem (renormalized, since our main term for N(m) is of size m^2 \log m instead of m \log m).
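As a sanity check on the main terms (though not on the conjectured error exponent), the sketch below compares a direct computation of N(m) against the asymptotic; the numerical value of \zeta'(2) is hardcoded, and the relative tolerance is a loose guess:

```python
from math import isqrt, log, pi

GAMMA = 0.5772156649015329    # Euler-Mascheroni constant (hardcoded)
ZETA_PRIME_2 = -0.9375482543  # numerical value of zeta'(2) (hardcoded)

def num_divisors(n):
    count = 0
    for i in range(1, isqrt(n) + 1):
        if n % i == 0:
            count += 1 if i * i == n else 2
    return count

def n_of(m):
    """n(m) via Proposition 1."""
    return sum(num_divisors(a * (m - a) - 1) for a in range(1, m))

def N(m):
    """N(m) = sum_{k=3}^m n(k)."""
    return sum(n_of(k) for k in range(3, m + 1))

m = 300
c = 6 * GAMMA / pi**2 - 9 / pi**2 - 36 * ZETA_PRIME_2 / pi**4
main = 6 * m * m * log(m) / pi**2 + c * m * m
assert abs(N(m) - main) / main < 0.05  # loose relative tolerance
```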

A plot of the error term for N(m), normalized by division by m^{5/4}, is given in the figure below for m \leq 10000:



Exercise 1: Prove Lemma 4 by modifying Proposition 2 and applying the estimate

\displaystyle \sum_{n \leq X} \frac{1}{n} = \log X + \gamma + O\left(\frac{1}{X}\right).

Exercise 2: By reversing the order of summation, prove that

\displaystyle \sum_{a=2}^{m} a\sum_{d \mid a} \frac{\mu(d)\log d}{d} = \frac{m^2}{2} \sum_{d \leq m} \frac{\mu(d) \log d}{d^2} + O\left(m \log^2 (m)\right).

Exercise 3: Prove that

\displaystyle N(m) \leq \sum_{k \leq m^2/4} d(k)d(k-1)

and apply the Cauchy–Schwarz inequality to prove that N(m) \ll \sum_{k \leq m^2} d(k)^2.

Exercise 4: Prove that

\displaystyle \sum_{m =1}^\infty \frac{d(m)^2}{m^s} = \frac{\zeta(s)^4}{\zeta(2s)}.

Conclude using Exercise 3 that N(m) \ll m^2 \log^3 m.
