In differential calculus, the product rule is both simple in form and high in utility. As such, it is typically presented early on in calculus courses — soon after the linearity of the derivative, in fact. Moreover, the product rule is easy to derive from first principles:
Theorem (Product Rule): Let $f$ and $g$ be differentiable on the open set $U$. Then $fg$ is differentiable on $U$, and we have $(fg)'(x) = f'(x)g(x) + f(x)g'(x)$ for all $x \in U$.
Proof: For $x \in U$, we have (by definition of the derivative)
$$(fg)'(x) = \lim_{h \to 0} \frac{f(x+h)g(x+h) - f(x)g(x)}{h} = \lim_{h \to 0} \frac{\big(f(x+h)-f(x)\big)g(x+h)}{h} + \lim_{h \to 0} \frac{f(x)\big(g(x+h)-g(x)\big)}{h}, \tag{1}$$
under the assumption that each of these last two limits exists. This of course holds, as these limits are $f'(x)g(x)$ and $f(x)g'(x)$, respectively.
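For the CAS-inclined reader, here is a quick sympy check that the numerator manipulation is a purely algebraic identity; the undefined symbols f, g, x, h below are placeholders introduced just for this check.

```python
import sympy as sp

# Check: f(x+h)g(x+h) - f(x)g(x) = (f(x+h) - f(x))g(x+h) + f(x)(g(x+h) - g(x)),
# the "add and subtract f(x)g(x+h)" trick used in the proof above.
x, h = sp.symbols('x h')
f, g = sp.Function('f'), sp.Function('g')

numerator = f(x + h)*g(x + h) - f(x)*g(x)
split = (f(x + h) - f(x))*g(x + h) + f(x)*(g(x + h) - g(x))
print(sp.expand(numerator - split))   # prints 0
```

Everything else in the proof is continuity of $g$ and the definitions of $f'$ and $g'$.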
All in all, then, the product rule is easy to prove and easy to use. But — and this is of utmost pedagogical importance — is the product rule intuitive? By this proof alone, I would argue not; the manipulation of the numerator is weakly-motivated and our result falls out without reference to more general phenomena.
In this post, we’ll explore the merits of a second proof of the product rule, one that I hope presents a motivated and compelling argument as to why the product rule should look the way it does.
— PART I (PRODUCTS AND CHAINS) —
In what sense, if any, should the product rule be natural? As suggested in the introduction, the derivative is — fundamentally — a linear operator. What business, then, does the derivative have in respecting products?
In one sense, very little. To hash this thought out more fully, let $A$ be an algebra over the ring $k$. A $k$-linear operator $D \colon A \to A$ is called a derivation if $D$ satisfies the product rule, i.e.
$$D(ab) = D(a)\,b + a\,D(b)$$
for all $a, b \in A$. Derivations can be thought of as formal counterparts of the derivative, and in this light we make two observations: firstly, that the product rule shines as the characteristic property of derivations; secondly, that this holds because (and only because!) we have prescribed it.
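For a concrete example (my own illustration, not needed for what follows): the usual derivative on the polynomial algebra $\mathbb{R}[x]$ is $\mathbb{R}$-linear and satisfies the product rule, so it is a derivation. A minimal sympy check, with two arbitrarily chosen polynomials:

```python
import sympy as sp

# D = d/dx on R[x] is a derivation: D(pq) = D(p)q + pD(q).
x = sp.symbols('x')
p = 3*x**2 + 1          # arbitrary polynomials, chosen only for illustration
q = x**3 - 2*x

D = lambda u: sp.diff(u, x)
print(sp.expand(D(p*q) - (D(p)*q + p*D(q))))   # prints 0
```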
To put some of this into perspective, I’d like to compare the product rule to a second mainstay of differential calculus: the familiar chain rule. Frequently, this is taught long after the product rule, in part for the following:
- While the quotient rule is perhaps most naturally a corollary of the product and chain rules, it can be (and often is) derived independently. By circumventing the chain rule, one can differentiate the trigonometric functions sooner (i.e. before returning to the chain rule). This is done in Stewart, for example.
- In a curriculum that focuses on differentiating each of the so-called “elementary functions”, the chain rule is only required insofar as it is used to derive the differentiation laws for inverse functions (e.g. the inverse trigonometric functions and either the logarithm or the exponential).
There’s also the question about proof: on a moral level, the chain rule follows from the factorization
$$\frac{f(g(x+h)) - f(g(x))}{h} = \frac{f(g(x+h)) - f(g(x))}{g(x+h) - g(x)} \cdot \frac{g(x+h) - g(x)}{h},$$
in which the first term is recognized as $f'(g(x))$ and the latter as $g'(x)$. Unfortunately, it may be the case that $g$ fails to inject in any neighborhood of $x$, in which case our “moral proof” falls short.
Remark: This is no more than a technical obstruction: for $h$ such that $g(x+h) = g(x)$, we simply replace our left-most difference quotient by $f'(g(x))$. (This all works by continuity of $g$.)
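To see the moral proof run in a case where no such replacement is needed, here is a sympy sketch with the arbitrary choices $f(u) = u^2$ and $g = \exp$ (so that $g$ injects near every point):

```python
import sympy as sp

# The two factors of the chain-rule factorization, in the limit h -> 0.
x, h = sp.symbols('x h')
f = lambda u: u**2        # an arbitrary outer function
g, g_h = sp.exp(x), sp.exp(x + h)

outer = sp.limit((f(g_h) - f(g)) / (g_h - g), h, 0)   # recognized as f'(g(x))
inner = sp.limit((g_h - g) / h, h, 0)                 # recognized as g'(x)
print(sp.simplify(outer*inner - sp.diff(f(g), x)))    # prints 0
```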
Despite this obstruction, our moral proof of the chain rule is elegant in form and obvious in execution. As one might expect, this simplicity has categorical significance: the chain rule encodes precisely the fact that the derivative (and its generalizations) gives a functor from the category of differentiable manifolds to the category of vector bundles, sending each manifold to its tangent bundle and each smooth map to its derivative.
— PART II (LOGARITHMS) —
By now, I hope that this post has made two opinions clear: that the derivative is fundamentally a linear object, and that the chain rule respects this linearity in ways that the product rule does not. This motivates our present interest in logarithms, as a method to turn products into sums. As it turns out, we’ll need just one Lemma:
Lemma: Let $\log$ be defined as usual on the open set $(0, \infty)$. Then $\log'(x) = 1/x$ for all $x \in (0, \infty)$, and $\log(xy) = \log(x) + \log(y)$ for real $x$ and $y$ (provided that $x, y > 0$).
Remark: In most usual definitions of the logarithm, one of these statements will be obvious. If the logarithm is defined as an anti-derivative of $1/x$, for example (making our first assertion tautological), then a result due to Saint-Vincent (1647) implies that $\log(xy) = \log(x) + \log(y)$. On the other hand, it is also common to first define the logarithm as inverse to the exponential (which gives the stated functional equation), and prove that $\exp$ equals its own derivative. (This, in turn, can be used to compute $\log'$.)
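Both halves of the Lemma are easy to confirm in sympy, which treats $\log$ as the usual real logarithm once its argument is known to be positive:

```python
import sympy as sp

# log'(x) = 1/x, and log(xy) = log(x) + log(y) for positive x, y.
x, y = sp.symbols('x y', positive=True)
print(sp.diff(sp.log(x), x))                                              # 1/x
print(sp.simplify(sp.expand_log(sp.log(x*y)) - (sp.log(x) + sp.log(y))))  # 0
```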
We are now primed to present a second proof of the product rule. Regrettably, we must finally break the symmetry we’ve created between the product rules for differentiable (resp. complex-differentiable, i.e. holomorphic) functions defined between subsets of $\mathbb{R}$ (resp. subsets of $\mathbb{C}$).
Proposition: Suppose that $f$ and $g$ are differentiable and non-vanishing on the open set $U$. Then $fg$ is differentiable on $U$, and we have $(fg)'(x) = f'(x)g(x) + f(x)g'(x)$ for all $x \in U$.
Proof: Let $V$ be a connected component of $U$. If $f$ and $g$ are functions of a real variable, we may assume by continuity that $f, g > 0$ on $V$ (negating $f$ or $g$ if necessary). Then $\log(fg) = \log f + \log g$ on $V$, and implicit differentiation gives
$$\frac{(fg)'}{fg} = \frac{f'}{f} + \frac{g'}{g},$$
in which we have used the chain rule and our Lemma. Our result follows by clearing denominators.
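Here is the same computation done by sympy. To be clear, sympy already knows the product rule internally, so this is a consistency check on the logarithmic-differentiation identity rather than an independent proof (and I simply pretend $f, g > 0$, as in the proof):

```python
import sympy as sp

# Differentiate log(fg) = log(f) + log(g) and clear denominators.
x = sp.symbols('x')
f, g = sp.Function('f')(x), sp.Function('g')(x)   # pretend f, g > 0 on V

lhs = sp.diff(sp.log(f) + sp.log(g), x)           # f'/f + g'/g, by the Lemma
rhs = sp.diff(f*g, x) / (f*g)                     # (fg)'/(fg), by the chain rule
print(sp.simplify(lhs - rhs))                     # 0
print(sp.simplify(lhs*(f*g) - sp.diff(f*g, x)))   # 0 after clearing denominators
```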
In the complex case, the fact that $f$ and $g$ are non-vanishing throughout $V$ gives the existence of local branches of the logarithm. With these branches, our proof carries through as in the real case.
As it stands, this version of the product rule has been artificially weakened by the hypothesis that $f$ and $g$ be non-vanishing on $U$
. In this sense, I would compare it to our (somewhat incomplete) proof of the chain rule – an elegant proof with some technical holes pushed under the rug.
On the other hand, this gap is not so hard to fill: borrowing some intuition from perturbation theory, we are led to consider functions of the form $f + \epsilon$ and $g + \epsilon$, in which the constant perturbation $\epsilon$ is chosen such that $f + \epsilon$ and $g + \epsilon$ become locally non-vanishing (about a fixed point in the domain of differentiability of $f$ and $g$). Then
$$\big((f+\epsilon)(g+\epsilon)\big)' = (f+\epsilon)'(g+\epsilon) + (f+\epsilon)(g+\epsilon)' = f'g + fg' + \epsilon(f'+g')$$
by our Proposition. On the other hand, $(f+\epsilon)(g+\epsilon) = fg + \epsilon(f+g) + \epsilon^2$, so linearity of the differential gives that $fg$ is differentiable at our fixed point, with
$$\big((f+\epsilon)(g+\epsilon)\big)' = (fg)' + \epsilon(f'+g').$$
It follows that $(fg)' = f'g + fg'$ after cancellation, i.e. the product rule.
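The algebra behind this cancellation is easy to lose track of, so here is a short sympy check (with $\epsilon$ treated as a constant symbol, as above):

```python
import sympy as sp

# (f + eps)(g + eps) = fg + eps(f + g) + eps^2, so linearity applies; and the
# Proposition's right-hand side for the perturbed product carries the same
# eps(f' + g') term, which cancels.
x, eps = sp.symbols('x epsilon')
f, g = sp.Function('f')(x), sp.Function('g')(x)
df, dg = sp.diff(f, x), sp.diff(g, x)

print(sp.expand((f + eps)*(g + eps) - (f*g + eps*(f + g) + eps**2)))   # 0
rhs = df*(g + eps) + (f + eps)*dg   # Proposition applied to the perturbed pair
print(sp.expand(rhs - (df*g + f*dg + eps*(df + dg))))                  # 0
```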
— PART III (THE PROBLEM WITH RINGS) —
And now, for some last-minute abstract nonsense:
Having seen these two proofs, it’s obvious why our first proof dominates the classroom, despite the haunting simplicity of line (1). Less obvious — and far more troubling — is the inherent difficulty in relating additive and multiplicative constructs (cf. the Goldbach and abc Conjectures), a thorn in the side of number theorists and algebraists the world over.
When multiplication and addition do behave (in some predetermined context), it is frequently because there exist sufficiently well-behaved analogues of the logarithm and exponential functions. In the case at hand (asking how a certain linear operator respects multiplication of functions), it has been enough to know that the logarithm satisfies a characteristic functional equation and has a well-understood derivative.
In the case of formal group laws over a ring of positive characteristic, for example, the non-existence of logarithms/exponentials is central to the field’s depth. In certain cases, these formal group laws give rise to actual group laws, e.g. on the completion of the base ring with respect to the $p$-adic topology. In particular, $p$-adic convergence of the logarithm affords us — in a small but tangible way — a better understanding of the group structure on an elliptic curve.
— EXERCISES —
Exercise: The product rule trivializes if we assume some multivariable calculus. Let $m(u, v) = uv$ and $\gamma(x) = (f(x), g(x))$, and define $fg = m \circ \gamma$. Calculate $(fg)'$ using the multivariate chain rule.
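A hint in sympy form, using the setup above (the names $m$ and $\gamma$ are just one way to package it):

```python
import sympy as sp

# Multivariate chain rule: (m o gamma)'(x) = grad m(f(x), g(x)) . (f'(x), g'(x)).
x, u, v = sp.symbols('x u v')
f, g = sp.Function('f')(x), sp.Function('g')(x)
m = u*v

grad_m = [sp.diff(m, u), sp.diff(m, v)]                      # (v, u)
chain = (grad_m[0].subs({u: f, v: g})*sp.diff(f, x)
         + grad_m[1].subs({u: f, v: g})*sp.diff(g, x))
print(sp.simplify(chain - sp.diff(f*g, x)))                  # 0, i.e. f'g + fg'
```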
Exercise: Given a matrix Lie group $G$, let $\mathfrak{g}$ denote the set of matrices $X$ such that $e^{tX} \in G$ for all $t \in \mathbb{R}$, where $e^{tX}$ denotes the matrix exponential. Then $\mathfrak{g}$ is a Lie algebra, known as the Lie algebra associated to $G$
. Find the Lie algebras associated to $\mathrm{GL}_n(\mathbb{R})$ and $\mathrm{SL}_n(\mathbb{R})$.
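A small sanity check in sympy, under my assumption that the intended groups are $\mathrm{GL}_n(\mathbb{R})$ and $\mathrm{SL}_n(\mathbb{R})$: trace-zero matrices exponentiate into $\mathrm{SL}_2(\mathbb{R})$, in line with the identity $\det(e^X) = e^{\operatorname{tr} X}$.

```python
import sympy as sp

# An arbitrary trace-zero 2x2 matrix exponentiates to a determinant-one matrix.
X = sp.Matrix([[1, 1], [0, -1]])
print(X.trace())                                        # 0
print(sp.simplify(X.exp().det()))                       # 1
print(sp.simplify(X.exp().det() - sp.exp(X.trace())))   # 0: det(exp X) = exp(tr X)
```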
Exercise: If $k$ is a field of characteristic zero, prove that the additive and multiplicative formal group laws are isomorphic over $k$.
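Over $\mathbb{R}$ (a field of characteristic zero), the honest logarithm already hints at the isomorphism, since $1 + F(x, y) = (1 + x)(1 + y)$ for the multiplicative formal group law $F(x, y) = x + y + xy$; in general, the formal series $\log(1 + T)$ and $\exp(T) - 1$ play the same role. A sympy illustration of the first point:

```python
import sympy as sp

# log(1 + F(x, y)) = log(1 + x) + log(1 + y) for F(x, y) = x + y + xy.
x, y = sp.symbols('x y', positive=True)
F_mult = x + y + x*y                               # multiplicative formal group law
print(sp.expand((1 + x)*(1 + y) - (1 + F_mult)))   # 0
lhs = sp.expand_log(sp.log(sp.factor(1 + F_mult)))
print(sp.simplify(lhs - (sp.log(1 + x) + sp.log(1 + y))))   # 0
```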