 Actually, I think that almost all mainstream uses of von Neumann
 ordinals can be eliminated in favor of mappings between wellordered
 sets.

 That's fine for logicians, but then your metatheory had better be very
 clever at hiding these locutions, or you'll never sell it to the
 mathematicians.
Well, that's very true. But I feel that if we are going to have a
sophisticated metatheory, we shouldn't be afraid to use it to elide details
of the underlying theory. For example, as I recall, in the Axiom computer
algebra system, N and R are formally different types, but the user almost
never sees the type cast (it will even cast 1 to the unit in a ring of
matrices and other stuff like that). Another example might be the use of
proper classes in ZF, which is normally conceived of as metatheory.
The importance of this sort of elision depends very much on the foundational
system. For example in type theory the notion of function is normally
primary, and function application is primitive. By contrast, in set theory
it is not trivial, but normal users will usually want to pretend that it is.
This might explain the relative scarcity of theorem provers based on set
theory: a substantial effort is required to insulate the user from the
underlying "implementation".
On the other hand, I agree that Ken Kunen's examples may prove awkward to
handle using metatheory.
 Let F be a field, and now form the sequence of
 field extensions,
 F_0 = F , F_1 , F_2, ...
 where for each natural number n, F_{n+1} adds a root of some polynomial.
 Now, take the union of all the F_n.
In simple type theory at least, constructions like this are very awkward. I
guess with sufficient care one can usually find a big enough universe type
in which to embed such ascending chains, but it's messy. A possible
application for the standard theorems about first order logic might be to
show that one can restrict one's attention to domains like the terms of some
first order language. Which leads me on to:
>> For example, I recently wanted to prove in HOL that for any infinite set A,
>> there is a bijection A x A <> A. ............
>> (Of course the cutest proof is to apply the upward LowenheimSkolem theorem
>> to a first order theory of surjective pairing, the assertion being easy for
>> the special case of the natural numbers. In a higher order formalism
>> no metatheory is required to do this as the syntax and interpretation of
>> first order sentences is expressible in the object logic.)
 Interesting idea. Does this really work in HOL? It would seem to me
 that even given the syntax and semantics of firstorder sentences, the
 upward LowenheimSkolem theorem is not a completely trivial result. The
 proofs I know use as a lemma the fact that A and A x A (and the set of
 all finite sequences from A) have the same cardinality.
I think you are right that this application is somewhat circular. One can
obviously force up the cardinality of a model by adding more constants and
appealing to compactness, but then to get exactly the cardinality you want,
you'd need to show that the cardinality of one of these "term model" domains
is the same as the number of constants you've added, and this will require
something like k^2 = k as a lemma. (Anyway at least this circularity shows
that upward LS is equivalent to k^2 = k and hence to AC).
I didn't mean to suggest that the LS proof was the simplest  it certainly
isn't that  merely that it's rather nice to apply logical metatheorems to
get "straight" mathematical results.
I am working (in a desultory way) on doing some proofs like this in HOL. I
don't have a *constructive* proof that it works yet, but I am fairly
confident. The main difficulty is that since types cannot be quantified over
in the HOL logic except at the sequent level (like variables in PRA I guess)
it's not easy to speak about "all domains". I believe one can normally get
round this in practice, and in any case, an extension to the HOL logic in
this direction, proposed by Tom Melham, is likely to be experimented with in
the near future.
John.
P.S. Didn't Cantor first regard ordinals and cardinals as distinct from sets
themselves? I'm not really sure of my history here, but if so, he got a fair
way without an embedding in set theory.