Re: Inference rules in the Veda system

Vaughan Pratt (pratt@cs.stanford.edu)
Thu, 14 Mar 1996 12:19:22 -0800

I did the proof of Victor Makarov challange:
1^2 + ... + n^2 = n*(n+1)*(2*n+1)/6

Nice.

While it's probably not relevant to the point of the example, which was
to exercise an already given axiomatization, people might be interested
in knowing how Boole would prove this using his difference calculus in
"A Treatise on the Calculus of Finite Differences." The idea is really
cool, in that it redoes calculus (for polynomials like x^3 anyway)
without infinitesimals. h doesn't have to go to zero but can be left
at Planck's constant, or 1 (as in the above example), or the size of
the universe, as convenient. Yet the differentials and integrals are
exact! Moreover they can be used to calculate the above result simply
by multiplying the vector (0,0,1,0) (the coefficients of the polynomial
n^2, low order first) by four 4x4 matrices in turn, to get the above
answer (0,1/6,1/2,1/3).

Here's the main trick. Treat x^i as the limit, as h tends to zero, of
x(x-h)(x-2h)...(x-(n-1)h).

Now consider the standard definition of derivative,

f(x+h) - f(x)
lim -------------
h->oo h

We can calculate the derivative of f(x) = x^3 by setting x^3 =
x(x-h)(x-2h), plugging it into the body of the limit, and then taking
the limit *after* doing this calculation, since lim+ = +lim.

Now f(x+h) = (x+h)x(x-h), while f(x) = x(x-h)(x-2h). The difference is
therefore x(x-h)2h. After division by h, we have 2x(x-h).

Now taking the limit of this is a breeze compared to the usual fiddling
with powers of h---we simply note that x-h tends to x whence 2x(x-h)
tends to 2x^2 as h tends to 0.

BUT we can go one step better. Notice that 2x(x-h) = 2x^2 *even before
h tends to 0*. So if we keep redefining x^2 while we're taking the
limit, the *representation* of the answer as 2x^2 remains fixed!

But this now gives us a calculus. It's not *the* differential calculus
where h is always infinitesimal, instead it's Boole's *difference*
calculus where h is whatever you want it to be.

Boole only treated h=1, but this can be understood simply as Boole
choosing his units to *make* h=1, like particle physicists do when they
choose their units to make \hbar=c=1. The passage to general h is not
a big deal except when you try to make 1 tend to 0 (but that can be
understood as scaling everything *but* the unit of measure up to
infinity). Here we don't have to move h incrementally, though we do
need to transform between the h=0 and h=1 *coordinates*. So henceforth
let's assume h=1 or h=0, nothing in between.

Integrals are just the inverse of differentials. Hence we can
integrate as inverse differentiation in the usual way. Thus the
integral of x^2 is x^3/3 as usual.

But an integral is just a sum over a half-open interval [a,b) (but see
below re (a,b]). The reasoning behind "half-open" is that every point
has measure h, so for h nonzero, endpoints need to be handled
conservatively. The upper endpoint belongs to the next interval up,
and so cannot be reused in integrating over this interval as well.

So to sum k^2 for k from 1 to n *inclusive*, first transform x^2 (which
we are writing here in the h=0 basis) to the h=1 basis, which makes it
x(x-1) + x.

Now take the discrete integral over the interval [1,n] = [1,n+1),
namely x^3/3 + x^2/2. Plugging the endpoints 1,n+1 into x^3/3 and
taking the difference as usual gives

[ (n+1)^3 - 1^3 + ] / 3 + [ (n+1)^2 - 1^2 ] / 2
= [ (n+1)n(n-1) - 0 ] /3 + [ (n+1)n - 0 ] / 2
= (n+1)n ( 2(n-1) + 3 ) / 6
= (n+1)n(2n+1)/6.

which is the desired answer.

==============================

But you don't need Maple or Mathematica to do all this polynomial
manipulation. *All* this work can be done purely by linear algebra,
manipulating coefficients by linear transformations.

The standard notation for polynomials makes them linear combinations of
monomials. The monomials can be shown to be linearly independent and
thus span the infinite-dimensional vector space they generate in this
way, its points being all and only the polynomials, in your favorite
ring (see picky point at end) R of coefficients.

The above process can be carried out by the following four steps.

1. Transform the vector whose h=0 coordinates are (0,0,1,0,...) to the
h=1 basis, where its coordinates become (0,1,1,0,...). This is
accomplished by multiplying the column vector of h=0 coefficients by the
matrix

1 0 0 0 0 .
0 1 1 1 1 .
0 0 1 3 7 .
0 0 0 1 6 .
0 0 0 0 1 .
. . . . . . . .
{i}
These are *Stirling numbers of the second kind*, denoted { } or {i,j}
{j}
for the entry at row j, column i. These count the number of partitions
of i things into j blocks. Thus the sum of column i must be the number
of partitions of i, the *Bell numbers*, which run
1,1,2,5,15,52,203,...

2. Integrate, transforming (0,1,1,0,...) to its indefinite integral
(0,0,1/2,1/3,0,...). Integration of polynomials is a linear
transformation of their vector space which in the h=1 basis is
represented by the matrix

0 0 0 0 .
1 0 0 0 .
0 1/2 0 0 .
0 0 1/3 0 .
. . . . . .

This of course is the inverse of differentation, which is integer
valued, a band 1,2,3,... on the other side of the diagonal.

3. Transform back to h=0 coordinates. This transforms
(0,0,1/2,1/3,0,...) to (0,1/6,-1/2,1/3,0,...).

For this we use the inverse of the matrix for the Stirling numbers of
the second kind, namely

1 0 0 0 .
0 1 -1 2 .
0 0 1 -3 .
0 0 0 1 .
. . . . . . . ,

the *Stirling numbers of the first kind*. The entry at row j column i
[ i ]
is denoted | | or [i,j], and counts the number of permutations of i
[ j ]
things having j cycles, with the sign giving the parity of the
associated permutations. (So the sum of row i>1 must be 0 while its
absolute sum must be i!.)

4. Evaluate. First do this at [0,n) rather than at [1,n+1), which is
easier, being just the identity transform (a nice feature of the
interval [0,n)). So we still have (0,1/6,-1/2,1/3,0,...).

To move the interval up a notch to [1,n+1), note that adding one to the
variable of a polynomial is itself a linear tranformation, given by

1 1 1 1 .
0 1 2 3 .
0 0 1 3 .
0 0 0 1 .
. . . . . . .

These are the *binomial coefficients*, or Pascal's Triangle. When we
multiply by that we get (0,1/6,1/2,1/3,0,...) But these are the
desired coefficients of n(n+1)(2n+1)/6.

One way to think of this last step is that it is the linear
transformation that converts between the two ways of parcelling out the
endpoints, namely to the interval above or below. Combinatorial
mathematicians normally give each endpoint to the interval above,
resulting in half-open intervals of the form [a,b). But *normal*
people don't count from 0 to n-1, they count from 1 to n, and for that
counting system you should give the endpoints to the interval below,
leading to the other kind of half-open interval, (a,b]. The original
problem was to sum k^2 over (0,n]. The Pascal transform (I don't think
it *has* an official name, so why not) describes how to get from the
[a,b) type of domain to the (a,b] kind when a and b themselves are held
fixed. Its inverse gets you back:

1 -1 1 -1 .
0 1 -2 3 .
0 0 1 -3 .
0 0 0 1 .
. . . . . . .

It is noteworthy that all the matrices we used except those for
differentiation and integration had integer coefficients for both
themselves and their inverses.

Vaughan Pratt
---------------------
(Picky point: when R is not a field, e.g. the ring of integers, we
should say R-module rather than vector space. But although polynomials
over rings always have derivatives they don't always have integrals.
When R is a field however they do have integrals, so for the
integration problem at hand the picky point is immaterial since we will
pick R to be a field to make sure we can integrate. The rationals,
reals, or complexes are all equally fine for *polynomial* integration,
including rational polynomials.)