First published Mon Aug 13, 2018

The main part of our article covers these exciting investigations for an expanded Hilbert Program through 1999, with special, detailed attention to results and techniques that by now can be called “classical” and are of continued interest. Newer, but still closely connected developments are sketched in Appendices: the proof theory of set theories in Appendix D, combinatorial independence results in Appendix E, and provably total functions in Appendix F. Here (infinitary) sequent calculi and suitable systems of ordinal notations are crucial proof theoretic tools. However, we also discuss in section 4.2 Gödel’s Dialectica Interpretation and some of its extensions as an alternative for obtaining relative consistency proofs, and we describe in section 5.2.1 the systematic attempt of completing the incomplete through recursive progressions. Both topics are analyzed further in Appendix C.2 and Appendix B, respectively. To complete this bird’s eye view of our article, we mention that the Epilogue, section 6, not only indicates further proof theoretic topics, but also some directions of current research that are connected to proof theory and of deep intrinsic interest. We have tried to convey the vibrancy of a subject that thrives on concrete computational and (meta-) mathematical work, but also invites and is grounded in general philosophical reflection.

In an attempt to mediate between conflicting foundational positions, Hilbert shifted issues, already around 1900, from a mathematical to a vaguely conceived metamathematical level. That approach was rigorously realized in the 1920s, when he took advantage of the possibility of formalizing mathematics in deductive systems and investigated the underlying formal frames from a strictly constructive, “finitist” standpoint. Hilbert’s approach raised fascinating metamathematical questions—from semantic completeness through mechanical decidability to syntactic incompleteness; however, the hoped-for mathematical resolution of the foundational issues was not achieved. The failure of his finitist consistency program raised and deepened equally fascinating methodological questions. A broadened array of problems with only partial solutions has created a vibrant subject that spans computational, mathematical, and philosophical issues—with a rich history.

Proof theory is not an esoteric technical subject that was invented to support a formalist doctrine in the philosophy of mathematics; rather, it has been developed as an attempt to analyze aspects of mathematical experience and to isolate, possibly overcome, methodological problems in the foundations of mathematics. The origins of those problems, forcefully and sometimes contentiously formulated in the 1920s, are traceable to the transformation of mathematics in the nineteenth century: the emergence of abstract mathematics, its reliance on set theoretic notions, and its focus on logic in a broad, foundational sense. Substantive issues came to the fore already in the mathematical work and the foundational essays of Dedekind and Kronecker; they concerned the legitimacy of undecidable concepts, the existence of infinite mathematical objects, and the sense of non-constructive proofs of existential statements.

1. Proof Theory: A New Subject

Hilbert viewed the axiomatic method as the crucial tool for mathematics (and rational discourse in general). In a talk to the Swiss Mathematical Society in 1917, published the following year as Axiomatisches Denken (1918), he articulates his broad perspective on that method and presents it “at work” by considering, in detail, examples from various parts of mathematics and also from physics. Proceeding axiomatically is not just developing a subject in a rigorous way from first principles, but rather requires, for advanced subjects, their deeper conceptual organization and serves, for newer investigations, as a tool for discovery. In his talk Hilbert reflects on his investigations of the arithmetic of real numbers and of Euclidean geometry from before 1900. We emphasize the particular form of his axiomatic formulations; they are not logical formulations, but rather mathematical ones: he defines Euclidean space in a similar way as other abstract notions like group or field; that’s why we call it structural axiomatics.[1] However, Hilbert turns from the past and looks to the future, pointing out a programmatic direction for research in the foundations of mathematics; he writes:

To conquer this field [concerning the foundations of mathematics] we must turn the concept of a specifically mathematical proof itself into an object of investigation, just as the astronomer considers the movement of his position, the physicist studies the theory of his apparatus, and the philosopher criticizes reason itself.

He then asserts, “The execution of this program is at present still an unsolved task”. During the following winter term 1917–18, Hilbert—with the assistance of Paul Bernays—gave a lecture course entitled Prinzipien der Mathematik. Here modern mathematical logic is invented in one fell swoop and completes the shift from structural to formal axiomatics. This dramatic shift allows the constructive, elementary definition of the syntax of theories and, in particular, of the concept of proof in a formal theory. This fundamental insight underpins the articulation of the consistency problem and seems to open a way of proving, meta-mathematically, that no proof in a formal theory establishes a contradiction.

That perspective is formulated first in a talk Bernays presented in the fall of 1921, published as “Über Hilberts Gedanken zur Grundlegung der Mathematik” (1922). Starting with a discussion of structural axiomatics and pointing out its assumption of a system of objects that satisfies the axioms, he asserts this assumption contains “something so-to-speak transcendent for mathematics”. He raises the question, “which principled position should be taken with respect to it?” Bernays believes that it might be perfectly coherent to appeal to an intuitive grasp of the natural number sequence or even of the manifold of real numbers. However, that could not be an intuition in any primitive sense and would conflict with the tendency of the exact sciences to use only restricted means to acquire knowledge.

Under this perspective we are going to try, whether it is not possible to give a foundation to these transcendent assumptions in such a way that only primitive intuitive knowledge is used. (Bernays 1922: 11)

Meaningful mathematics is to be based, Bernays demands, on primitive intuitive knowledge that includes, however, induction concerning natural numbers—both as a proof and definition principle. In the outline for the lectures Grundlagen der Mathematik to be given in the winter term 1921–22, Bernays discusses a few weeks after his talk “constructive arithmetic” and then the “broader formulation of constructive thought”:

Construction of the proofs, by means of which the formalization of the higher inferences is made possible and the consistency problem is becoming accessible in a general way.

Bernays concludes the outline by suggesting, “This would be followed by the development of proof theory”. The outline was sent to Hilbert on 17 October 1921 and it was followed strictly in the lectures of the following term—with only one terminological change: “constructive” in the above formulations is turned into “finitist”.[2]

Bernays’s notes of the 1921/22 lectures reflect the consequence of that change in attitude. They contain a substantial development of “finitist arithmetic” and “finitist logic” in a quantifier-free formalism. Finitist arithmetic involves induction and primitive recursion[3] from the outset, and the central metamathematical arguments all proceed straightforwardly by induction on proof figures. The third part of these lectures is entitled The grounding of the consistency of arithmetic by Hilbert’s new proof theory. Here we find the first significant consistency proof—from a finitist perspective.[4] The proof is sketched in Hilbert’s Leipzig talk (Hilbert 1923: 184) and was presented in a modified form during the winter term of 1922/23; in that form the proof is given in Ackermann 1925: 4–7. Ackermann’s article was submitted for publication in early 1924, and by then the proof had taken on a certain canonical form that is still found in the presentation of Hilbert and Bernays 1934: 220–230. Let us see what was achieved by following Ackermann’s concise discussion.

1.1 Hilbert’s Ansatz and Results

The proof is given in section II of Ackermann’s paper entitled, tellingly, The consistency proof before the addition of the transfinite axioms. Ackermann uses a logical calculus in axiomatic form that is taken over from Hilbert’s lectures and is discussed below in section 2. Here we just note that it involves two logical rules, namely, modus ponens and substitution (for individual, function and statement variables) in axioms. The non-logical axioms concern identity, zero and successor, and recursion equations that define primitive recursive functions. The first step in the argument turns the linear proof into a tree, so that any formula occurrence is used at most once as a premise of an inference (Auflösung in Beweisfäden); this is done in preparation for the second step, namely, the elimination of all necessarily free variables (Ausschaltung der Variablen); in the third step, the numerical value of the closed terms is computed (Reduktion der Funktionale). The resulting syntactic configuration, a Beweisfigur, now contains only numerical formulae that are built up from equations or inequations between numerals and Boolean connectives; these formulae can be effectively determined to be true or false. By induction on the “Beweisfigur” one shows that all its component formulae are true; thus, a formula like \(0\ne 0\) is not provable. The induction principle can be directly incorporated into these considerations when it is formulated as a rule for quantifier-free statements. That was not done in Ackermann’s discussion of the proof, but had been achieved already by Hilbert and Bernays in their 1922/23 lectures.
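To illustrate the third step and the claim that numerical formulae can be effectively determined to be true or false, here is a minimal Python sketch. It computes the value of closed terms from the recursion equations for addition and multiplication and then evaluates Boolean combinations of equations. The tuple encoding of terms and formulae and the names `numeral`, `value` and `holds` are assumptions made for this illustration; they are not taken from Ackermann's paper.

```python
# Closed terms:       ("0",), ("S", t), ("add", t, u), ("mul", t, u)
# Numerical formulae: ("eq", t, u), ("not", f), ("and", f, g),
#                     ("or", f, g), ("imp", f, g)
# This encoding is hypothetical and chosen only for illustration.

def numeral(n):
    """Build the numeral S(S(...S(0)...)) with n successor symbols."""
    term = ("0",)
    for _ in range(n):
        term = ("S", term)
    return term

def value(term):
    """Compute the number denoted by a closed term."""
    tag = term[0]
    if tag == "0":
        return 0
    if tag == "S":
        return value(term[1]) + 1
    if tag == "add":
        # Recursion equations: a + 0 = a,  a + S(b) = S(a + b)
        a, b = value(term[1]), value(term[2])
        result = a
        for _ in range(b):
            result += 1
        return result
    if tag == "mul":
        # Recursion equations: a * 0 = 0,  a * S(b) = (a * b) + a
        a, b = value(term[1]), value(term[2])
        result = 0
        for _ in range(b):
            result += a
        return result
    raise ValueError(f"unknown term constructor: {tag}")

def holds(formula):
    """Decide a variable-free, quantifier-free formula."""
    tag = formula[0]
    if tag == "eq":
        return value(formula[1]) == value(formula[2])
    if tag == "not":
        return not holds(formula[1])
    if tag == "and":
        return holds(formula[1]) and holds(formula[2])
    if tag == "or":
        return holds(formula[1]) or holds(formula[2])
    if tag == "imp":
        return (not holds(formula[1])) or holds(formula[2])
    raise ValueError(f"unknown formula constructor: {tag}")

two = numeral(2)
print(holds(("eq", ("add", two, two), ("mul", two, two))))  # 2 + 2 = 2 * 2
print(holds(("not", ("eq", ("0",), ("0",)))))               # the formula 0 ≠ 0
```

In this sketch the formula \(0\ne 0\) evaluates to false, so it cannot occur in a Beweisfigur all of whose component formulae are true.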

These proof theoretic considerations are striking and important as they involve for the first time genuine transformations of formal derivations. Nevertheless, they are preliminary as they concern a quantifier-free theory that is a part of finitist mathematics and need not be secured by a consistency proof. What has to be secured is “transfinite logic” with its “ideal elements”, as Hilbert put it later. The strategy was direct and started to emerge already in 1921. First, introduce functional terms by the transfinite axiom[5]

\[A(a) \to A(\varepsilon x\ldot A(x))\]

and define quantifiers by

\[ \exists x A(x) \leftrightarrow A(\varepsilon x\ldot A(x)) \]

and

\[ \forall x A(x) \leftrightarrow A(\varepsilon x\ldot\neg A(x)). \]

Using the epsilon terms, quantifiers can now be eliminated from proofs in quantificational logic, thus transforming them into quantifier-free ones. Finally, the given derivation allows one, so it was conjectured, to determine numerical values for the epsilon terms. In his Leipzig talk of September 1922, published in 1923, Hilbert discussed this Ansatz for eliminating quantifiers and reducing the transfinite case to that of the quantifier-free theory. He presented the actual execution of this strategic plan only “for the simplest case” (in Hilbert 1923: 1143–1144). However, the talk was crucial in the development of proof theory and the finitist program: “With the structure of proof theory, presented to us in the Leipzig talk, the principled form of its conception had been reached”. That is how Bernays characterizes its achievement in his essay on Hilbert’s investigations of the foundations of arithmetic (1935: 204).
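The purely syntactic part of this Ansatz, replacing quantifiers by epsilon terms according to the two defining equivalences, can be sketched in Python as follows. This is only a sketch under stated assumptions: the tuple encoding and the names `subst` and `eliminate` are hypothetical, bound variables are assumed to be pairwise distinct so that no variable capture occurs, and the hard part of Hilbert's plan (finding numerical values for the epsilon terms) is not addressed.

```python
# Terms:    ("var", x), ("fn", name, t1, ...), ("eps", x, formula)
# Formulae: ("atom", name, t1, ...), ("not", f), ("and", f, g),
#           ("or", f, g), ("imp", f, g), ("exists", x, f), ("forall", x, f)
# Hypothetical encoding; bound variables are assumed pairwise distinct.

def subst(expr, var, term):
    """Replace the free occurrences of variable `var` in a
    quantifier-free term or formula `expr` by `term`."""
    tag = expr[0]
    if tag == "var":
        return term if expr[1] == var else expr
    if tag == "eps":
        _, bound, body = expr
        if bound == var:          # `var` is bound here, nothing to replace
            return expr
        return ("eps", bound, subst(body, var, term))
    if tag in ("atom", "fn"):
        return (tag, expr[1]) + tuple(subst(a, var, term) for a in expr[2:])
    if tag == "not":
        return ("not", subst(expr[1], var, term))
    if tag in ("and", "or", "imp"):
        return (tag, subst(expr[1], var, term), subst(expr[2], var, term))
    raise ValueError(f"unexpected constructor: {tag}")

def eliminate(formula):
    """Rewrite quantifiers into epsilon terms, innermost first."""
    tag = formula[0]
    if tag == "atom":
        return formula
    if tag == "not":
        return ("not", eliminate(formula[1]))
    if tag in ("and", "or", "imp"):
        return (tag, eliminate(formula[1]), eliminate(formula[2]))
    if tag == "exists":
        _, x, body = formula
        b = eliminate(body)
        return subst(b, x, ("eps", x, b))            # A(eps x. A(x))
    if tag == "forall":
        _, x, body = formula
        b = eliminate(body)
        return subst(b, x, ("eps", x, ("not", b)))   # A(eps x. not A(x))
    raise ValueError(f"unexpected constructor: {tag}")

A = ("atom", "A", ("var", "x"))
print(eliminate(("exists", "x", A)))
print(eliminate(("forall", "x", A)))
```

The two printed results are the encodings of \(A(\varepsilon x\ldot A(x))\) and of \(A\) applied to the epsilon term for the negation of \(A(x)\), matching the right-hand sides of the equivalences above.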

Ackermann continued in section III of his 1925 at the very spot where Hilbert and Bernays had left off. His paper, submitted to Mathematische Annalen in March of 1924, and the corrective work he did in 1925 led to the conviction that the consistency of elementary arithmetic had been established. The corrective work had been done to address difficulties von Neumann had pointed out, but was not published by Ackermann; it was only presented in the second volume of Hilbert and Bernays 1939 (pp. 93–130).[6] Von Neumann’s own proof theoretic investigations, submitted to Mathematische Zeitschrift in July 1925, were published under the title Zur Hilbertschen Beweistheorie in 1927. Hilbert’s 1928 Bologna Lecture prominently took Ackermann’s and von Neumann’s work as having established the consistency of elementary arithmetic, the proof making use only of finitist principles. Let F be a theory containing exclusively such principles, like primitive recursive arithmetic PRA; the principles of PRA consist of the Peano axioms for zero and successor, the defining equations for all primitive recursive functions (defined in note 3), and quantifier-free induction. Now the significance of a consistency proof in F can be articulated as follows:

Theorem 1.1 Let T be a theory that contains a modicum of arithmetic and let A be a \(\Pi^0_1\)-statement, i.e., one of the form \(\forall x_1\ldots\forall x_n\,P(x_1,\ldots,x_n)\) with quantifiers ranging over naturals and P a primitive recursive predicate, i.e., a predicate with a primitive recursive characteristic function. If F proves the consistency of T and T proves A, then F proves A.

This theorem can be expressed and proved in PRA and ensures that a T-proof of a “real”, finitistically meaningful statement A leads to a finitistically valid statement. This point is made clear in Hilbert’s 1927 Hamburg lecture (Hilbert 1927). There he takes A to be the Fermat proposition and argues that if we had a proof of A in a theory containing “ideal” elements, a finitist consistency proof for that theory would allow us to transform that proof into a finitist one.

The belief that Ackermann and von Neumann had established the consistency of elementary arithmetic was expressed as late as December 1930 by Hilbert in his third Hamburg talk (Hilbert 1931a) and by Bernays in April 1931 in a letter to Gödel (see Gödel 2003: 98–103). Bernays asserts there that he has “repeatedly considered [Ackermann’s modified proof] and viewed it as correct”. He continues, referring to Gödel’s incompleteness results,

On the basis of your results one must now conclude, however, that that proof cannot be formalized within the system Z [of elementary number theory]; this must in fact hold true even if the system is restricted so that, of the recursive definitions, only those for addition and multiplication are retained. On the other hand, I don’t see at which place in Ackermann’s proof the formalization within Z should become impossible, …

At the end of his letter, Bernays mentions that Herbrand misunderstood him in a recent conversation on which Herbrand had reported in a letter to Gödel with a copy to Bernays. Not only had Herbrand misunderstood Bernays, but Bernays had also misunderstood Herbrand as to the extent of the latter’s consistency result that was to be published a few months later as Herbrand 1931. Bernays understood Herbrand as having claimed that he had established the consistency of full first-order arithmetic: Herbrand’s system is indeed a first-order theory with a rich collection of finitist functions, but it uses the induction principle only for quantifier-free formulae.[7] Gödel asserted in December 1933 that this theorem of Herbrand’s was even then the strongest result that had been obtained in the pursuit of Hilbert’s finitist program, and he formulated the result in a beautiful informal way as follows:

If we take a theory, which is constructive in the sense that each existence assertion made in the axioms is covered by a construction, and if we add to this theory the non-constructive notion of existence and all the logical rules concerning it, e.g., the law of the excluded middle, we shall never get into any contradiction. (Gödel 1933: 52)

Gödel himself had been much more ambitious in early 1930; his goal was then to prove the consistency of analysis! According to Wang (1981: 654), his idea was “to prove the consistency of analysis by number theory, where one can assume the truth of number theory, not only the consistency”. The plan for establishing the consistency of analysis relative to number theory did not work out; instead, Gödel found that sufficiently strong formal theories like Principia Mathematica and Zermelo-Fraenkel set theory are (syntactically) incomplete.

1.2 Incompleteness and a Reduction

In 1931 Gödel published a paper (1931a) that showed that there are true arithmetic statements that cannot be proved in the formal system of Principia Mathematica, assuming PM to be consistent. His methods applied not only to PM but to any formal system that contains a modicum of arithmetic. A couple of months after Gödel had announced this result at a conference in Königsberg in September 1930, von Neumann and Gödel independently realized that a striking corollary could be drawn from the incompleteness theorem. No consistent and effectively axiomatized theory that allows for the development of basic parts of arithmetic can prove its own consistency. This came to be known as the second incompleteness theorem. (For details on these theorems and their history see appendix A.4.) The second incompleteness theorem refutes the general ambitions of Hilbert’s program under the sole and very plausible assumption that finitist mathematics is contained in one of the formal theories of arithmetic, analysis or set theory. As a matter of fact, contemporary characterizations of finitist mathematics have elementary arithmetic as an upper bound.[8] In response to Gödel’s result, Hilbert attempted in his last published paper (1931b) to formulate a strategy for consistency proofs that is reminiscent of his considerations in the early 1920s (when thinking about the object theories as constructive) and clearly extends the finitist standpoint. He introduced a broad constructive framework that includes full intuitionist arithmetic and suggested extendibility of the new “method” to “the case of function variables and even higher sorts of variables”. He also formulated a new kind of rule that allowed the introduction of a new axiom \(\forall x A(x)\) as soon as all the numerical instances \(A(n)\) had been established by finitist proofs; in 1931 that is done for quantifier-free \(A(x)\), whereas in 1931b that is extended to formulae of arbitrary complexity. 
The semi-formal calculi, which articulate the broader framework, are based on rules that reflect mathematical practice, but also define the meaning of logical connectives. Indeed, Hilbert’s reasons for taking them to be evidently consistent are expressed in a single sentence: “All transfinite rules and inference schemata are consistent; for they amount to definitions”. Adding the tertium non datur in the form

\[{\forall x A(x)} \lor {\exists x \neg A(x)}\]

yields now the classical version of the theory and it is that addition that has to be justified.[9] Hilbert’s problematic considerations for this new metamathematical step inspired Gentzen’s “Urdissertation” when he began working in late 1931 on a consistency proof for elementary arithmetic.[10]

As part of his “Urdissertation”, Gentzen had established by the end of 1932 the reduction of classical to intuitionist arithmetic, a result that had also been obtained by Gödel. Gentzen’s investigations led, finally in 1935, to his first consistency proof for arithmetic. In the background was a normal form theorem for intuitionist logic that will be discussed in the next section together with Gentzen’s actual dissertation and the special calculi he introduced there. Now we just formulate the Gentzen-Gödel result “connecting” classical first-order number theory PA with its intuitionist version HA. The non-logical principles of these theories aim at describing \(\fN\), the arguably most important structure in mathematics, namely, Dedekind’s simply infinite system \(\bbN\) together with zero, successor, addition, multiplication, exponentiation and the less-than relation:

\[{\fN}=(\bbN; 0^{\bbN}, S^{\bbN},+^{\bbN},\times^{\bbN},E^{\bbN},<^{\bbN}).\]

They are formulated in the first-order language that has the relation symbols =, <, the function symbols S, +, \(\times\), E and the constant symbol 0. The axioms comprise the usual equations for zero, successor, addition, multiplication, exponentiation, and the less-than relation. In addition, the induction principle is given by the schema \[\tag{IND} {F(0)} \land {\forall x[F(x)\to F(Sx)]\to \forall x F(x)} \] for all formulae \(F(x)\) of the language. These principles together with classical logic constitute the theory of first order arithmetic or first order number theory, also known as Dedekind-Peano arithmetic, PA; together with intuitionist logic they constitute intuitionistic first order arithmetic commonly known as Heyting-arithmetic, HA.

Now we are considering the syntactic translation \(\tau\) from the common language of PA and HA into its “negative” fragment that replaces disjunctions \(A\lor B\) by their de Morgan equivalent \(\neg(\neg A\land\neg B)\) and existential statements \(\exists x A(x)\) by \(\neg\forall x\neg A(x)\). The reductive result obtained by Gentzen and Gödel in 1933 is now stated quite easily:

Theorem 1.2 PA proves the equivalence of A and \(\tau(A)\) for any formula A. If PA proves A, then HA proves \(\tau(A)\).

For atomic sentences like \(1\ne 1\) the translation \(\tau\) is clearly the identity, and the provability of \(1\ne 1\) in PA would imply its provability in HA. Thus, PA is consistent relative to HA. This result is technically of great interest and had a profound effect on the perspective concerning the relationship between finitism and intuitionism: finitist and intuitionist mathematics were considered as co-extensional; this theorem showed that intuitionist mathematics is actually stronger than finitist mathematics. Thus, if the intuitionist standpoint is taken to guarantee the soundness of HA, then it guarantees the consistency of PA. The corresponding connection between classical and intuitionist logic had been established already by Kolmogorov (1925) who not only formalized intuitionist logic but also observed the translatability of classical into intuitionist logic. His work, though, seems not to have been noticed at the time or even in 1932, when Gentzen and Gödel established their result for classical and intuitionist arithmetic.
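The translation \(\tau\) is a simple recursion on the build-up of formulae, which a short Python sketch may make concrete. The tuple encoding of formulae and the names `tau` and `is_negative` are assumptions made for this illustration; the sketch follows the description given above (only disjunctions and existential quantifiers are rewritten, atomic formulae are left unchanged) and says nothing about provability in PA or HA.

```python
# Formulae: ("atom", name, ...), ("not", f), ("and", f, g), ("imp", f, g),
#           ("or", f, g), ("forall", x, f), ("exists", x, f)
# Hypothetical encoding, chosen only for illustration.

def tau(f):
    """Translate a formula into the negative fragment."""
    tag = f[0]
    if tag == "atom":
        return f                                   # identity on atoms
    if tag == "not":
        return ("not", tau(f[1]))
    if tag in ("and", "imp"):
        return (tag, tau(f[1]), tau(f[2]))
    if tag == "or":
        # A or B  becomes  not(not A and not B)
        return ("not", ("and", ("not", tau(f[1])), ("not", tau(f[2]))))
    if tag == "forall":
        return ("forall", f[1], tau(f[2]))
    if tag == "exists":
        # exists x A(x)  becomes  not forall x not A(x)
        return ("not", ("forall", f[1], ("not", tau(f[2]))))
    raise ValueError(f"unknown constructor: {tag}")

def is_negative(f):
    """Check that a formula contains neither "or" nor "exists"."""
    tag = f[0]
    if tag in ("or", "exists"):
        return False
    if tag == "atom":
        return True
    if tag == "not":
        return is_negative(f[1])
    if tag in ("and", "imp"):
        return is_negative(f[1]) and is_negative(f[2])
    if tag == "forall":
        return is_negative(f[2])
    raise ValueError(f"unknown constructor: {tag}")

A, B = ("atom", "A"), ("atom", "B")
print(tau(("or", A, B)))
print(is_negative(tau(("imp", ("exists", "x", ("or", A, B)), A))))
```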

The foundational discussion concerning extended “constructive” viewpoints is taken up in section 4. There, and throughout our paper the concepts of “proof-theoretic reducibility” and “proof-theoretic equivalence” will play a central role. The connection between PA and HA is paradigmatic and leads to the notion of proof-theoretic reduction. Before we can furnish a precise definition, we should perhaps stress that many concepts can be expressed in the language of PRA (as well as PA) via coding, also known as Gödel numbering. Any finite object such as a string of symbols or an array of symbols can be coded via a single natural number in such a way that the string or array can be retrieved from the number when we know how the coding is done. Typical finite objects include formulae in a given language and also proofs in a theory. Talk about formulae or proofs can then be replaced by talking about predicates of numbers that single out the codes of formulae and proofs, respectively. We then say that the concepts of formula and proof have been arithmetized and thereby rendered expressible in the language of PRA.
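One familiar way of coding strings by single numbers uses exponents of successive primes; the following Python sketch illustrates it. The alphabet `SYMBOLS` and the function names are hypothetical, and this particular coding is only one of many possible choices; nothing here is meant to reproduce the specific coding used in any of the cited works.

```python
import math

# Hypothetical alphabet; the i-th symbol (counting from 1) gets code i.
SYMBOLS = ["0", "S", "+", "*", "=", "(", ")", "x"]

def primes():
    """Yield 2, 3, 5, 7, ... using trial division."""
    n = 2
    while True:
        if all(n % d for d in range(2, math.isqrt(n) + 1)):
            yield n
        n += 1

def encode(string):
    """Code s_1 ... s_n as 2^c(s_1) * 3^c(s_2) * ... * p_n^c(s_n)."""
    code = 1
    for p, symbol in zip(primes(), string):
        code *= p ** (SYMBOLS.index(symbol) + 1)
    return code

def decode(code):
    """Retrieve the string from its code by reading off prime exponents."""
    if code < 1:
        raise ValueError("codes are positive integers")
    symbols = []
    for p in primes():
        if code == 1:
            break
        exponent = 0
        while code % p == 0:
            code //= p
            exponent += 1
        if not 1 <= exponent <= len(SYMBOLS):
            raise ValueError("not the code of a string over SYMBOLS")
        symbols.append(SYMBOLS[exponent - 1])
    return "".join(symbols)

n = encode("0=0")
print(n)          # 2^1 * 3^5 * 5^1 = 2430
print(decode(n))  # 0=0
```

Both `encode` and `decode` are computed by bounded searches, which is the informal reason why predicates such as "x is the code of a formula" can be expressed by primitive recursive predicates of numbers.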

Definition 1.3 Let \(\bT_1\), \(\bT_2\) be a pair of theories with languages \(\cL_1\) and \(\cL_2\), respectively, and let \(\Phi\) be a (primitive recursive) collection of formulae common to both languages. Furthermore, \(\Phi\) should contain the closed equations of the language of PRA. We then say that \(\bT_1\) is proof-theoretically \(\Phi\)-reducible to \(\bT_2\), written \(\bT_1\leq_{\Phi}\bT_2\), if there exists a primitive recursive function f such that \[\tag{1} \PRA\vdash \forall x \forall y\,[ {\rform_{\Phi}(x)} \land {\proof_{\bT_1}( y,x)} \rightarrow {\proof_{\bT_2}(f(y),x)}].\] Here \(\rform_{\Phi}\) and \(\proof_{\bT_i}\) are arithmetized formalizations of \(\Phi\) and the proof relation in \(\bT_i\), respectively, i.e., \(\rform_{\Phi}(x)\) expresses that x is the Gödel number of a formula in \(\Phi\) while \(\proof_{\bT_i}(y,x)\) expresses that y codes a proof in \(\bT_i\) of a formula with Gödel number x. \(\bT_1\) and \(\bT_2\) are said to be proof-theoretically \(\Phi\)-equivalent, written \(\bT_1\equiv_{\Phi}\bT_2\), if \(\bT_1\leq_{\Phi}\bT_2\) and \(\bT_2\leq_{\Phi}\bT_1\). The appropriate class \(\Phi\) is revealed in the process of reduction itself, so that in the statement of theorems we simply say that \(\bT_1\) is proof-theoretically reducible to \(\bT_2\) (written \(\bT_1\leq \bT_2\)) and \(\bT_1\) and \(\bT_2\) are proof-theoretically equivalent (written \(\bT_1 \equiv \bT_2\)), respectively. Alternatively, we shall say that \(\bT_1\) and \(\bT_2\) have the same proof-theoretic strength when \(\bT_1\equiv \bT_2\). In practice, if \(\bT_1\equiv \bT_2\) is shown via proof-theoretic means[11] this always entails that the two theories prove at least the same \(\Pi^0_2\) sentences (those of the complexity of the twin prime conjecture). The complexity of formulae of PRA is stratified as follows. 
The \(\Sigma^0_0\) and \(\Pi^0_0\) formulae are of the form \(R(t_1,\ldots,t_n)\) where R is a predicate symbol for an n-ary primitive recursive predicate. A formula is \(\Sigma^0_{k+1}\) (\(\Pi^0_{k+1}\)) if it is of the form \[ \exists y_1\ldots \exists y_m\,F(y_1,\ldots,y_m) \quad (\forall y_1\ldots \forall y_m\,F(y_1,\ldots,y_m)) \] with \(F(y_1,\ldots,y_m)\) being of complexity \(\Pi^0_k\) (\(\Sigma^0_k\)). Thus the complexity of a formula is measured in terms of quantifier alternations. For instance \(\Pi^0_2\)-formulae have two alternations starting with a block of universal quantifiers or more explicitly they are of the shape \[\forall x_1\ldots\forall x_n \exists y_1\ldots \exists y_m\,R(x_1,\ldots,x_n,y_1,\ldots,y_m)\] with R primitive recursive.
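The stratification just described depends only on the quantifier prefix standing in front of the primitive recursive matrix, so it can be read off mechanically. The following Python sketch does this for a prefix written as a string, with "A" for a universal and "E" for an existential quantifier; this string representation and the name `classify` are assumptions made for the illustration. The function returns the least level determined by the syntactic shape of the prefix; it does not attempt to decide whether the formula is equivalent to one of lower complexity.

```python
from itertools import groupby

def classify(prefix):
    """Return ("Sigma", k) or ("Pi", k) for a quantifier prefix.

    `prefix` is a string over {"A", "E"} in front of a primitive
    recursive matrix; k is the number of maximal blocks of like
    quantifiers. The empty prefix is reported as ("Sigma", 0), which
    coincides with Pi^0_0.
    """
    if set(prefix) - {"A", "E"}:
        raise ValueError("prefix may contain only 'A' and 'E'")
    blocks = [kind for kind, _ in groupby(prefix)]
    if not blocks:
        return ("Sigma", 0)
    return ("Pi" if blocks[0] == "A" else "Sigma", len(blocks))

# The twin prime conjecture: for every n there is a p > n with p and
# p + 2 both prime, i.e. prefix "AE" over a primitive recursive matrix.
print(classify("AE"))      # ('Pi', 2)
print(classify("AAEE"))    # ('Pi', 2)
print(classify("EAE"))     # ('Sigma', 3)
```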

2. New Logical Calculi

For the reduction of classical elementary number theory to its intuitionist version, Gödel and Gentzen used different logical calculi. Gödel used the system Herbrand had investigated in his 1931, whereas Gentzen employed the formalization of intuitionist arithmetic from Heyting 1930. For his further finitist investigations Gentzen introduced new calculi that were to become of utmost importance for proof theory: natural deduction and sequent calculi.

2.1 From Axioms to Rules: Natural Reasoning

As we noted above, Gentzen had already begun in 1931 to be concerned with the consistency of full elementary number theory. As the logical framework he used what we now call natural deduction calculi. They evolved from an axiomatic calculus that had been used by Hilbert and Bernays since 1922 and that introduced an important modification of the calculus for sentential logic. The connectives \(\land \) and \(\lor\) are incorporated, and the axioms for these connectives are as follows:

\[\begin{align} A\land B & \to A \\ A\land B & \to B \\ A & \to (B \to A\land B) \\ A & \to A\lor B \\ B & \to A\lor B \\ (A \to C) & \to ((B \to C) \to (A\lor B \to C)) \end{align}\]

Hilbert and Bernays introduced this new logical formalism for two reasons, (i) to be able to better and more easily formalize mathematics, and (ii) to bring out the understanding of logical connectives in methodological parallel to the treatment of geometric concepts in Foundations of geometry. The methodological advantages of this calculus are discussed in Bernays 1927: 10:

The starting formulae can be chosen in quite different ways. A great deal of effort has been spent, in particular, to get by with a minimal number of axioms, and the limit of what is possible has indeed been reached. However, for the purposes of logical investigations it is better to separate out, in line with the axiomatic procedure for geometry, different axiom groups in such a way that each of them expresses the role of a single logical operation.

Then Bernays lists four groups, namely, axioms for the conditional \(\to\), for \(\land \) and \(\lor\) as above, and for negation \(\neg\). The axioms for the conditional are not only reflecting logical properties, but also structural features as in the later sequent calculus (and in Frege’s Begriffsschrift, 1879).

\[\begin{align} A & \to (B \to A)\\ (A \to (A \to B)) & \to (A \to B)\\ (A \to (B \to C)) & \to (B \to (A \to C))\\ (B \to C) & \to ((A \to B) \to (A \to C)) \end{align}\]

As axioms for negation one can choose:

\[\begin{align} A & \to (\neg A \to B)\\ (A \to B) & \to ((\neg A \to B) \to B) \end{align}\]

Hilbert formulates this logical system in Über das Unendliche and in his second Hamburg talk of 1927, but gives a slightly different formulation of the axioms for negation, calling them the principle of contradiction and the principle of double negation:

\[\begin{align} (A \to (B \land \neg B)) &\to \neg A\\ \neg\neg A & \to A \end{align}\]

Clearly, the axioms correspond directly to the natural deduction rules for these connectives, and one finds here the origin of Gentzen’s natural deduction calculi. Bernays had investigated in his Habilitationsschrift (1918) rule based calculi. However, in the given context, the simplicity of the metamathematical description of calculi seemed paramount, and in Bernays 1927 (p. 17) one finds the programmatic remark: “We want to have as few rules as possible, rather put much into axioms”.
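As a small sanity check on the axiom groups listed above, one can verify by truth tables that each axiom is a classical tautology. The Python sketch below does so; it is our own illustration and not part of the argument of Hilbert and Bernays, and the labels in `AXIOMS` are hypothetical names chosen for readability. A truth-table check concerns classical validity only and says nothing about the correspondence with natural deduction rules.

```python
from itertools import product

def imp(p, q):
    """Classical material implication."""
    return (not p) or q

# Each axiom as a Boolean function of A, B, C (unused arguments ignored).
AXIOMS = {
    "and-left":        lambda a, b, c: imp(a and b, a),
    "and-right":       lambda a, b, c: imp(a and b, b),
    "and-intro":       lambda a, b, c: imp(a, imp(b, a and b)),
    "or-left":         lambda a, b, c: imp(a, a or b),
    "or-right":        lambda a, b, c: imp(b, a or b),
    "or-elim":         lambda a, b, c: imp(imp(a, c), imp(imp(b, c), imp(a or b, c))),
    "imp-weakening":   lambda a, b, c: imp(a, imp(b, a)),
    "imp-contraction": lambda a, b, c: imp(imp(a, imp(a, b)), imp(a, b)),
    "imp-exchange":    lambda a, b, c: imp(imp(a, imp(b, c)), imp(b, imp(a, c))),
    "imp-composition": lambda a, b, c: imp(imp(b, c), imp(imp(a, b), imp(a, c))),
    "neg-bernays-1":   lambda a, b, c: imp(a, imp(not a, b)),
    "neg-bernays-2":   lambda a, b, c: imp(imp(a, b), imp(imp(not a, b), b)),
    "contradiction":   lambda a, b, c: imp(imp(a, b and not b), not a),
    "double-negation": lambda a, b, c: imp(not (not a), a),
}

def is_tautology(f):
    """Check f under all eight assignments to A, B, C."""
    return all(f(a, b, c) for a, b, c in product([False, True], repeat=3))

for name, axiom in AXIOMS.items():
    print(name, is_tautology(axiom))
```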

Gentzen was led to a rule-based calculus with introduction and elimination rules for every logical connective. The truly distinctive feature of this new type of calculus was for Gentzen, however, making and discharging assumptions. This feature, he remarked, most directly reflects a crucial aspect of mathematical argumentation.[12] Here we formulate the distinctive rules that involve contradictions and go beyond minimal logic that has I- and E-rules for all logical connectives. Intuitionist logic is obtained from minimal logic by adding the rule: from \(\perp\) infer any formula A, i.e., ex falso quodlibet. In the case of classical logic, if a proof of \(\perp\) is obtained from the assumption \(\neg A\), then infer A (and cancel the assumption \(\neg A\)). Gentzen discovered a remarkable fact for the intuitionist calculus, having observed that proofs can have peculiar detours of the following form: a formula is obtained by an I-rule and is then the major premise of the corresponding E-rule. For conjunction such a detour is depicted as follows:

\[ \cfrac{\begin{array}{c}\vdots \\ A\end{array} \quad \begin{array}{c}\vdots \\ B\end{array}} {\cfrac{A\land B}{B}} \]

Clearly, a proof of B is already contained in the given derivation. Proofs without detours are called normal, and Gentzen showed that any proof can be effectively transformed via “reduction steps” into a normal one.
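For the conjunction detour depicted above, the reduction step can be sketched in a few lines of Python. The sketch treats only the fragment with assumptions, conjunction introduction and the two conjunction eliminations; the tuple encoding of derivations and the name `normalize` are assumptions made for this illustration, and the general procedure has to deal with all connectives and with discharged assumptions.

```python
# Derivations (hypothetical encoding, conjunction fragment only):
#   ("hyp", A)        assumption of the formula A
#   ("andI", d1, d2)  from derivations of A and B infer A and B
#   ("andE1", d)      from a derivation of A and B infer A
#   ("andE2", d)      from a derivation of A and B infer B

def normalize(d):
    """Remove all detours: an andI immediately followed by andE1/andE2."""
    tag = d[0]
    if tag == "hyp":
        return d
    if tag == "andI":
        return ("andI", normalize(d[1]), normalize(d[2]))
    if tag in ("andE1", "andE2"):
        premise = normalize(d[1])
        if premise[0] == "andI":
            # Reduction step: keep the subderivation of the conjunct
            # that the elimination rule selects. It is already normal
            # because `premise` was normalized first.
            return premise[1] if tag == "andE1" else premise[2]
        return (tag, premise)
    raise ValueError(f"unknown rule: {tag}")

# The detour from the figure: derive A and B, form A and B, then infer B.
detour = ("andE2", ("andI", ("hyp", "A"), ("hyp", "B")))
print(normalize(detour))   # ('hyp', 'B')
```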

Theorem 2.1 (Normalization for intuitionist logic) A proof of A from a set of assumptions \(\Gamma\) can be transformed into a normal proof of A from the same set of assumptions.

Focusing on normal proofs, Gentzen proved then that the complexity of formulae in such proofs can be bounded by that of assumptions and conclusion.

Corollary 2.2 (Subformula property) If \(\cD\) is a normal proof of A from \(\Gamma\), then every formula in \(\cD\) is a subformula either of an element of \(\Gamma\) or of A.

As Gentzen recounts matters at the very beginning of his dissertation (1934/35), he was led by the investigation of the natural calculus to his Hauptsatz[13] when he could not extend the considerations to classical logic.

To be able to formulate it [the Hauptsatz] in a direct way, I had to base it on a particularly suitable logical calculus. The calculus of natural deduction turned out not to be appropriate for that purpose.

So, Gentzen focused his attention on sequent calculi that had been introduced by Paul Hertz and which had been the subject of Gentzen’s first scientific paper (1932).

2.2 Sequent Calculi

In his thesis Gentzen introduced a form of the sequent calculus and his technique of cut elimination. As this is a tool of utmost importance in proof theory, an outline of the underlying ideas will be discussed next. The sequent calculus can be generalized to so-called infinitary logics and is central for ordinal analysis. The Hauptsatz is also called the cut elimination theorem.

We use upper case Greek letters \(\Gamma,\Delta,\Lambda,\Theta,\Xi\ldots\) to range over finite lists of formulae. \(\Gamma\subseteq \Delta\) means that every formula of \(\Gamma\) is also a formula of \(\Delta\). A sequent is an expression \(\Gamma\Rightarrow \Delta\) where \(\Gamma\) and \(\Delta\) are finite sequences of formulae \(A_1,\ldots,A_n\) and \(B_1,\ldots, B_m\), respectively. We also allow for the possibility that \(\Gamma\) or \(\Delta\) (or both) are empty. The empty sequence will be denoted by \(\emptyset\). \(\Gamma \Rightarrow \Delta\) is read, informally, as \(\Gamma\) yields \(\Delta\) or, rather, the conjunction of the \(A_i\) yields the disjunction of the \(B_j\).

The logical axioms of the calculus are of the form

\[A \Rightarrow A \]

where A is any formula. In point of fact, one could limit this axiom to the case of atomic formulae A. We have structural rules of the form

\[ \frac{\Gamma \Rightarrow \Delta}{\Gamma' \Rightarrow \Delta'} \qquad \textrm{if } \Gamma \subseteq \Gamma', \Delta \subseteq \Delta'. \]

A special case of the structural rule, known as contraction, occurs when the lower sequent has fewer occurrences of a formula than the upper sequent. For instance, \(A, \Gamma\Rightarrow\Delta, B\) follows structurally from \(A,A,\Gamma \Rightarrow \Delta,B,B\).
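The side condition of the structural rule is easy to check mechanically. The sketch below assumes that formulae are represented as strings and sequents as pairs of lists; the function name is hypothetical. Because the condition only asks that every formula of the upper sequent reappear in the lower one, contraction, weakening and reordering all pass the same test.

```python
def follows_structurally(upper, lower):
    """Check the side condition: every formula of the upper antecedent
    (succedent) also occurs in the lower antecedent (succedent)."""
    (gamma, delta), (gamma2, delta2) = upper, lower
    return set(gamma) <= set(gamma2) and set(delta) <= set(delta2)

# Contraction: A, A, G => D, B, B  yields  A, G => D, B
print(follows_structurally((["A", "A", "G"], ["D", "B", "B"]),
                           (["A", "G"], ["D", "B"])))        # True
# Weakening: A => A  yields  A, B => A, C
print(follows_structurally((["A"], ["A"]),
                           (["A", "B"], ["A", "C"])))        # True
# Dropping a formula altogether is not allowed.
print(follows_structurally((["A", "B"], ["C"]),
                           (["A"], ["C"])))                  # False
```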

Now we list the rules for the logical connectives.

\[\begin{array}{cc} \textrm{Left} & \textrm{Right} \\ \displaystyle \frac{\Gamma \Rightarrow {\Delta,A}}{{\neg A, \Gamma} \Rightarrow \Delta} & \displaystyle \frac{{B, \Gamma} \Rightarrow {\Delta}}{{\Gamma} \Rightarrow {\Delta,\neg B}} \\[2ex] \displaystyle \frac{{\Gamma} \Rightarrow {\Delta, A} \qquad {B,\Lambda} \Rightarrow {\Theta}} {{{A\rightarrow B},\Gamma, \Lambda} \Rightarrow {\Delta, \Theta}} & \displaystyle \frac{{A, \Gamma} \Rightarrow {\Delta, B}}{{\Gamma} \Rightarrow {\Delta, {A \rightarrow B}}} \\[2ex] \displaystyle \frac{{A, \Gamma} \Rightarrow {\Delta}}{{A\land B,\Gamma} \Rightarrow {\Delta}} \quad \frac{{B, \Gamma} \Rightarrow {\Delta}}{{A\land B,\Gamma} \Rightarrow {\Delta}} & \displaystyle \frac{{\Gamma} \Rightarrow {\Delta, A} \qquad {\Gamma} \Rightarrow {\Delta, B}} {{\Gamma} \Rightarrow {\Delta, {A\land B}}} \\[2ex] \displaystyle \frac{{A,\Gamma} \Rightarrow {\Delta} \qquad {B,\Gamma} \Rightarrow {\Delta}} {{{A\lor B},\Gamma} \Rightarrow {\Delta}} & \displaystyle \frac{{\Gamma} \Rightarrow {\Delta, A}}{{\Gamma} \Rightarrow {\Delta, {A\lor B}}} \quad \frac{{\Gamma} \Rightarrow {\Delta, B}}{{\Gamma} \Rightarrow {\Delta, {A\lor B}}} \end{array} \]

\[\begin{array}{lccclc} \forall L & \displaystyle \frac{F(t),\Gamma \Rightarrow \Delta}{\forall x F(x),\Gamma \Rightarrow \Delta} & && \forall R & \displaystyle \frac{\Gamma \Rightarrow {\Delta, F(a)}}{\Gamma \Rightarrow {\Delta, \forall x F(x)}} \\[2ex] \exists L & \displaystyle \frac{F(a),\Gamma \Rightarrow \Delta}{\exists x F(x),\Gamma \Rightarrow \Delta} & && \exists R & \displaystyle \frac{\Gamma \Rightarrow {\Delta, F(t)}}{\Gamma \Rightarrow {\Delta, \exists x F(x)}} \end{array} \]

In \(\forall L\) and \(\exists R\), t is an arbitrary term. The variable a in \(\forall R\) and \(\exists L\) is an eigenvariable of the respective inference, i.e., a is not to occur in the lower sequent.

Finally, we have the special Cut Rule

\[ \frac{\Gamma \Rightarrow {\Delta, A} \qquad {A, \Lambda} \Rightarrow \Theta} {{\Gamma, \Lambda} \Rightarrow {\Delta, \Theta}} \tag*{Cut} \]

The formula A is called the cut formula of the inference.

In the rules for logical operations, the formulae highlighted in the premises are called the minor formulae of that inference, while the formula highlighted in the conclusion is the principal formula of that inference. The other formulae of an inference are called side formulae. A proof (also known as a deduction or derivation) \(\cD\) is a tree of sequents satisfying the conditions that (i) the topmost sequents of \(\cD\) are logical axioms and (ii) every sequent in \(\cD\) except the lowest one is an upper sequent of an inference whose lower sequent is also in \(\cD\). A sequent \(\Gamma \Rightarrow \Delta\) is deducible if there is a proof having \(\Gamma \Rightarrow \Delta\) as its bottom sequent.

The Cut rule differs from the other rules in an important respect. With the rules for introducing connectives, one sees that every formula that occurs above the line occurs below the line either directly, or as a subformula of a formula below the line. That is also true for the structural rules. (Here \(A(t)\) is counted as a subformula, in a slightly extended sense, of both \(\exists xA(x)\) and \(\forall xA(x)\).) But in the case of the Cut rule, the cut formula A vanishes. Gentzen showed that such “vanishing rules” can be eliminated.

Theorem 2.3 (Gentzen’s Hauptsatz) If a sequent \(\Gamma \Rightarrow \Delta\) is deducible, then it is also deducible without the Cut rule; the resulting proof is called cut-free or normal.

The secret to Gentzen’s Hauptsatz is the symmetry of left and right rules for all the logical connectives including negation. The proof of the cut elimination theorem is rather intricate as the process of removing cuts interferes with the structural rules. It is contraction that accounts for the high cost of eliminating cuts. Let \(\lvert \cD\rvert\) be the height of the deduction \(\cD\) and let \(rank(\cD)\) be the supremum of the lengths of cut formulae occurring in \(\cD\). Turning \(\cD\) into a cut-free deduction of the same end sequent results, in the worst case, in a deduction of height \(\cH(rank(\cD),\lvert \cD\rvert)\) where \(\cH(0,n)=n\) and \(\cH(k+1,n)=4^{\cH(k,n)}\), yielding hyper-exponential growth.
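To get a feeling for the bound, the recursion for \(\cH\) can simply be evaluated. The snippet below is a direct transcription of the two defining equations; the function name is chosen for this illustration only.

```python
def hyper_exp_bound(rank, height):
    """H(rank, height) with H(0, n) = n and H(k + 1, n) = 4 ** H(k, n)."""
    bound = height
    for _ in range(rank):
        bound = 4 ** bound
    return bound

for k in range(3):
    print(k, hyper_exp_bound(k, 2))
# 0 2
# 1 16
# 2 4294967296
```

Already \(\cH(3,2)=4^{4294967296}\) is far too large to print, which is the hyper-exponential growth referred to above.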

The sequent calculus we have been discussing allows the proof of classically, but not intuitionistically correct formulae, for example, the law of excluded middle. An intuitionist version of the sequent calculus can be obtained by a very simple structural restriction: there can be at most one formula on the right hand side of the sequent symbol \(\Rightarrow\). The cut elimination theorem is also provable for this intuitionist variant. In either case, the Hauptsatz has an important corollary that parallels that of the Normalization theorem (for intuitionist logic) and expresses the subformula property.

Corollary 2.4 (Subformula property) If \(\cD\) is a cut-free proof of the sequent \(\Gamma\Rightarrow\Delta\), then all formulae in \(\cD\) are subformulae of elements in either \(\Gamma\) or \(\Delta\).

This Corollary has another direct consequence that explains the crucial role of the Hauptsatz for obtaining consistency proofs.

Corollary 2.5 (Consistency) A contradiction, i.e., the empty sequent \(\emptyset \Rightarrow \emptyset\), is not provable.

Proof: Assume that the empty sequent is provable; then, according to the Hauptsatz it has a cut-free derivation \(\cD\). The previous corollary assures us that only empty sequents can occur in \(\cD\); but such a \(\cD\) does not exist since every proof must contain axioms. \(\qed\)
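The subformula property is also what makes backward proof search feasible: to find a cut-free proof one only has to decompose the formulae of the end sequent. The following sketch does this for the classical propositional fragment. It is an illustration under stated assumptions, not the calculus exactly as listed above: it uses the familiar invertible variants of the rules (for instance, the left rule for conjunction keeps both conjuncts), each of which is derivable from the rules given here with the help of the structural rule, and it encodes formulae as tuples with tags of our own choosing.

```python
# Formulas: ("atom", name), ("not", A), ("and", A, B), ("or", A, B), ("imp", A, B).

def provable(gamma, delta):
    """Backward search for a cut-free proof of the sequent gamma => delta."""
    gamma, delta = list(gamma), list(delta)
    for i, f in enumerate(gamma):              # decompose a compound formula on the left
        if f[0] == "atom":
            continue
        rest = gamma[:i] + gamma[i + 1:]
        if f[0] == "not":
            return provable(rest, delta + [f[1]])
        if f[0] == "and":
            return provable(rest + [f[1], f[2]], delta)
        if f[0] == "or":
            return provable(rest + [f[1]], delta) and provable(rest + [f[2]], delta)
        if f[0] == "imp":
            return provable(rest, delta + [f[1]]) and provable(rest + [f[2]], delta)
    for i, f in enumerate(delta):              # decompose a compound formula on the right
        if f[0] == "atom":
            continue
        rest = delta[:i] + delta[i + 1:]
        if f[0] == "not":
            return provable(gamma + [f[1]], rest)
        if f[0] == "and":
            return provable(gamma, rest + [f[1]]) and provable(gamma, rest + [f[2]])
        if f[0] == "or":
            return provable(gamma, rest + [f[1], f[2]])
        if f[0] == "imp":
            return provable(gamma + [f[1]], rest + [f[2]])
    # Only atoms remain: the sequent follows structurally from an axiom A => A
    # exactly when some atom occurs on both sides.
    return any(f in delta for f in gamma)

p, q = ("atom", "p"), ("atom", "q")
print(provable([], [("or", p, ("not", p))]))   # True: excluded middle
print(provable([], []))                        # False: the empty sequent
print(provable([p], [q]))                      # False
```

The search terminates because every recursive call replaces a formula by its immediate subformulae; with the Cut rule present, no such bound on the candidate premises would be available.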

The foregoing results are solely concerned with pure logic. Formal theories that axiomatize mathematical structures or serve as formal frameworks for developing substantial chunks of mathematics are based on logic but have additional axioms germane to their purpose. If they are of the latter kind, such as first-order arithmetic or Zermelo-Fraenkel set theory, they will assert the existence of mathematical objects and their properties. What happens when we try to apply the procedure of cut elimination to theories? Axioms are usually detrimental to this procedure. It breaks down because the symmetry of the sequent calculus is lost. In general, one cannot remove cuts from deductions in a theory T when the cut formula is an axiom of T. However, sometimes the axioms of a theory are of bounded syntactic complexity. Then the procedure applies partially in that one can remove all cuts that exceed the complexity of the axioms of T. This gives rise to partial cut elimination. It is a very important tool in proof theory. For example, it can be used to analyze theories with restricted induction (such as fragments of PA; cf. Sieg 1985). It also works very well if the axioms of a theory can be presented as atomic intuitionist sequents (also called Horn clauses), yielding the completeness of Robinson’s resolution method (see Harrison 2009).

Using the Hauptsatz and its Corollary, Gentzen was able to capture all of the consistency results that had been obtained prior to 1934 including Herbrand’s, that had been called by Gödel in his 1933 “the most far-reaching” result. They had been obtained at least in principle for fragments of elementary number theory; in practice, Gentzen did not include the quantifier-free induction principle. Having completed his dissertation, Gentzen went back to investigate natural deduction calculi and obtained in 1935 his first consistency proof for full first-order arithmetic. He formulated, however, the natural calculus now in “sequent form”: instead of indicating the assumptions on which a particular claim depended by the undischarged ones in its proof tree, they are attached now to every node of the tree. The “sequents” that are being proved are of the form \(\Gamma \Rightarrow A\), where all the logical inferences are carried out on the right hand side. This proof was published only in 1974; it was subsequently analyzed most carefully in Tait 2015 and Buchholz 2015. It was due to criticism of Bernays and Gödel that Gentzen modified his consistency proof quite dramatically; he made use of transfinite induction, as will be discussed in detail in the next section. Here we just mention that Bernays extensively discussed transfinite induction in Grundlagen der Mathematik II. The main issue for Bernays was the question, is it still a finitist principle?—Bernays did not discuss, however, two other aspects of Gentzen’s work, namely, the use of structural features of formal proofs for consistency proofs and the attempt of gaining a constructive, semantic understanding of intuitionist arithmetic. 
The former became crucial for proof theoretic investigations; the latter influenced Gödel and his functional interpretation via computable functionals of finite type.[14] The two aspects together opened a new era for proof theory and mathematical logic with the goal of proving the consistency of analysis. We will see how far current techniques lead us and what foundational significance one can attribute to them.

3. Gentzen’s Consistency Proof

Cut elimination fails for first-order arithmetic (i.e., PA); not even partial cut elimination is possible, since the induction axioms have unbounded complexity. Gentzen, however, found an ingenious way of dealing with purported contradictions in arithmetic. In Gentzen 1938b he showed how to effectively transform an alleged PA-proof of an inconsistency (the empty sequent) in his sequent calculus into another proof of the empty sequent such that the latter gets assigned a smaller ordinal than the former. Ordinals are a central concept in set theory as well as in proof theory. To present Gentzen’s work we shall first discuss the notion of ordinal from a proof-theoretic point of view.

3.1 Ordinals in Proof Theory

This is the first time we talk about the transfinite and ordinals in proof theory. Ordinals have become very important in advanced proof theory. The concept of an ordinal is a generalization of that of a natural number. The latter are used for counting finitely many things whereas ordinals can also “count” infinitely many things. It is derived from the concept of an ordering \(\prec\) of a set X which arranges the objects of X in order, one after another, in such a way that for every predicate P on X there is always a first element of X with respect to \(\prec\) that satisfies P if there is at least one object in X satisfying P. Such an ordering is called a well-ordering of X. Certainly the usual less-than relation on \(\bbN\) is a well-ordering. Here every number \(\ne 0\) is the successor of another number. If one orders the natural numbers \(>0\) in the usual way but declares that 0 is bigger than every number \(\ne 0\) one arrives at another ordering of \(\bbN\). Let’s call it \(\prec\). \(\prec\) is also a well-ordering of \(\bbN\). This time 1 is the least number with respect to \(\prec\). However, 0 plays a unique role. There are infinitely many numbers \(\prec 0\) and there is no number \(m\prec 0\) such that 0 is the next number after m. Such numbers are called limit numbers (with respect to \(\prec\)).
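The rearranged ordering is itself a computable relation on \(\bbN\), which already hints at how ordinals will later be represented by relations on natural numbers. A minimal sketch follows; the function names are ours, and sorting is used only to display the order on a finite sample.

```python
from functools import cmp_to_key

def prec(m, n):
    """m comes strictly before n in the ordering 1, 2, 3, ..., 0."""
    if m == n:
        return False
    if n == 0:
        return True      # every number other than 0 lies below 0
    if m == 0:
        return False     # 0 lies below nothing
    return m < n

def compare(m, n):
    """Three-way comparison derived from prec, as expected by cmp_to_key."""
    return -1 if prec(m, n) else (1 if prec(n, m) else 0)

print(sorted([3, 0, 1, 2], key=cmp_to_key(compare)))   # [1, 2, 3, 0]
```

In this ordering 0 sits above the infinitely many numbers \(1,2,3,\ldots\), so the ordering has the order type of the ordinal \(\omega+1\), with 0 playing the role of \(\omega\).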

In order to be able to formulate Gentzen’s results from the end of section 3.3, we have to “arithmetize” the treatment of ordinals. Let us first state some precise definitions and a Cantorian theorem.

Definition 3.1 A non-empty set A equipped with a total ordering \(\prec\) (i.e., \(\prec\) is transitive, irreflexive, and \[\forall x,y\in A\,[{x\prec y}\lor {x=y}\lor {y\prec x}])\] is a well-ordering if every non-empty subset X of A contains a \(\prec\)-least element, i.e., \[(\exists u\in X)(\forall y\in X)[{u\prec y}\lor {u=y}].\]

The elements of a well-ordering \((A,\prec)\) can be divided into three types. Since A is non-empty there is a least element with respect to \(\prec\) which is customarily denoted by 0 or \(0_A\). Then there are elements \(a\in A\) such that there exists \(b\prec a\) but there is no c between b and a. These are the successor elements of A, with a being the successor of b. An element \(c\in A\) such that \(0\prec c\) and for all \(b\prec c\) there exists \(d\in A\) with \(b\prec d\prec c\) is said to be a limit element of A. In set theory a set is called transitive just in case all its elements are also subsets. An ordinal in the set-theoretic sense is a transitive set that is well-ordered by the elementhood relation \(\in\). It follows that each ordinal is the set of its predecessors. According to the trichotomy above, there is a least ordinal (which is just the empty set) and all other ordinals are either successor or limit ordinals. The first limit ordinal is denoted by \(\omega\).

Fact 3.2 Every well-ordering \((A,\prec)\) is order isomorphic to a unique ordinal \((\alpha,\in)\).

Ordinals are traditionally denoted by lower case Greek letters \(\alpha,\beta,\gamma,\delta,\ldots\) and the relation \(\in\) on ordinals is notated simply by \(<\). If \(\beta \) is a successor ordinal, i.e., \(\beta\) is the successor of some (necessarily unique) ordinal \(\alpha\), we also denote \(\beta\) by \(\alpha'\). Another important fact is that for any family of ordinals \(\alpha_i\) for \(i\in I\) (I some set) there is a smallest ordinal, denoted by \(\sup_{i\in I}\alpha_i\), that is greater than or equal to every ordinal \(\alpha_i\).

The operations of addition, multiplication, and exponentiation can be defined on all ordinals by using case distinctions and transfinite recursion (on \(\alpha\)). The following states the definitions just to convey the flavor:

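\[\begin{align} \beta+0 &= \beta & \beta+\alpha' &=(\beta+\alpha)' &\displaystyle \beta+\lambda &=\sup_{\xi<\lambda}(\beta+\xi) \\ \beta\cdot 0 &= 0 & \beta\cdot\alpha' &=(\beta\cdot\alpha)+\beta &\displaystyle \beta\cdot\lambda &=\sup_{\xi<\lambda}(\beta\cdot\xi) \\ \beta^{0} &= 0' & \beta^{\alpha'} &=\beta^{\alpha}\cdot\beta &\displaystyle \beta^{\lambda} &=\sup_{\xi<\lambda}\beta^{\xi} \end{align}\]

where \(\lambda\) denotes a limit ordinal (and \(\beta>0\) in the last clause). Many familiar laws of arithmetic carry over to these operations. However, addition and multiplication are in general not commutative.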
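A standard worked instance, added here as an illustration, makes the failure of commutativity concrete. Using the clauses for successor and limit ordinals with \(\lambda=\omega\), and writing 1 for \(0'\) and 2 for \(1'\), one computes:

\[\begin{align} 1+\omega &= \sup_{n<\omega}(1+n) = \omega, & \omega+1 &= \omega' > \omega, \\ 2\cdot\omega &= \sup_{n<\omega}(2\cdot n) = \omega, & \omega\cdot 2 &= \omega\cdot 1+\omega = \omega+\omega > \omega. \end{align}\]

So \(1+\omega\ne\omega+1\) and \(2\cdot\omega\ne\omega\cdot 2\).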

We are interested in representing specific ordinals \(\alpha\) as relations on \(\bbN\). In essence Cantor defined the first ordinal representation system in 1897. Natural ordinal representation systems are frequently derived from structures of the form

\[\tag{2} {\frakA} = \langle\alpha,f_1,\ldots,f_n,<_{\alpha}\rangle \]

where \(\alpha\) is an ordinal, \(<_{\alpha}\) is the ordering of ordinals restricted to elements of \(\alpha\) and the \(f_i\) are functions

\[\tag{3} f_i:\underbrace{\alpha\times\cdots\times\alpha}_{k_i \textrm{ times}} \longrightarrow \alpha \]

for some natural number \(k_i\).

\[ \bbA = \langle A,g_1,\ldots,g_n,\prec\rangle \]

is a computable (or recursive) representation of \(\frakA\) if the following conditions hold:

(i) \(A\subseteq\bbN\) and A is a computable set; (ii) \(\prec\) is a computable total ordering on A and the functions \(g_i\) are computable; (iii) \(\frakA \cong\bbA\), i.e., the two structures are isomorphic.

Theorem 3.3 (Cantor 1897) For every ordinal \(\beta>0\) there exist unique ordinals \(\beta_0\geq\beta_1\geq\dots\geq\beta_n\) such that \[\tag{4}\label{epsilon} \beta = \omega^{\beta_0}+\ldots+\omega^{\beta_n}.\]

The representation of \(\beta\) in (4) is called the Cantor normal form. We shall write \(\beta \mathbin{=_{\sCNF}} \omega^{\beta_0}+\cdots +\omega^{\beta_n}\) to convey that \(\beta_0\geq\beta_1\geq\dots\geq\beta_n\).

The rather famous ordinal that emerged in Gentzen’s consistency proof of PA is denoted by \(\varepsilon_0\). It refers to the first ordinal \(\alpha>0\) such that \((\forall \beta<\alpha)\,\omega^{\beta}<\alpha\). \(\varepsilon_0\) can also be described as the least ordinal \(\alpha\) such that \(\omega^{\alpha}=\alpha\).

Ordinals \(\beta<\varepsilon_0\) have a Cantor normal form with exponents \(\beta_i<\beta\) and these exponents have Cantor normal forms with yet again smaller exponents. As this process must terminate, ordinals \(<\varepsilon_0\) can be coded by natural numbers. For instance a coding function

\[{\Corner{\mathord{\,.\,}}}:\varepsilon_0\longrightarrow \bbN\]

could be defined as follows:

\[{\Corner{\alpha}} = \left\{\begin{array}{ll} 0 & \textrm{if } \alpha=0\\ \langle{\Corner{\alpha_1}},\ldots,{\Corner{\alpha_n}}\rangle & \textrm{if } \alpha \mathbin{=_{\sCNF}} \omega^{\alpha_1}+\cdots+\omega^{\alpha_n} \end{array}\right.\]

where \(\langle k_1,\cdots,k_n\rangle\coloneqq 2^{k_1+1}\cdot\ldots\cdot p_n^{k_n+1}\) with \(p_i\) being the ith prime number (or any other coding of tuples). Further define:

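\[\begin{align} A_0 &{} \coloneqq \textrm{range of }\ {\Corner{\mathord{.} }} \\ {\Corner{\alpha}}\prec{\Corner{\beta}} &{} \mathbin{:\Leftrightarrow} \alpha<\beta \\ {\Corner{\alpha}}\mathbin{\hat{+}}{\Corner{\beta}} &{} \coloneqq {\Corner{\alpha+\beta}} \\ {\Corner{\alpha}}\mathbin{\hat{\cdot}}{\Corner{\beta}} &{} \coloneqq {\Corner{\alpha\cdot\beta}} \\ \hat{\omega}^{{\Corner{\alpha}}} &{} \coloneqq {\Corner{\omega^{\alpha}}} \end{align}\]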

Then

\[ {\langle\varepsilon_0,+,\cdot,\delta\mapsto \omega^{\delta},<\rangle} \cong {\langle A_0,\hat{+},\hat{\cdot},x\mapsto\hat{\omega}^{x},\prec\rangle}. \]

\(A_0,\hat{+},\hat{\cdot},x\mapsto\hat{\omega}^{x},\prec\) are primitive recursive. Finally, we can spell out the scheme PR-TI\((\varepsilon_0)\) in the language of PA:

\[ \forall x\, {\left[\forall y\, \left({y\prec x} \to {P(y)}\right) \to {P(x)}\right]} \to {\forall x\, {P(x)}} \]

for all primitive recursive predicates P.
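The arithmetization just described can be tried out directly. The sketch below is one possible rendering under our own assumptions: an ordinal below \(\varepsilon_0\) is stored as the tuple of the exponents of its Cantor normal form (each exponent again such a tuple), comparison and addition work on these tuples, and `code` computes the prime-power coding given above. All names are hypothetical, and the prime generator uses naive trial division, which is adequate only for small examples.

```python
ZERO = ()            # the ordinal 0 has the empty normal form
ONE = (ZERO,)        # omega^0
OMEGA = (ONE,)       # omega^1

def less(a, b):
    """a < b for ordinals below epsilon_0 given by their normal-form exponents."""
    for x, y in zip(a, b):
        if less(x, y):
            return True
        if less(y, x):
            return False
    return len(a) < len(b)          # a proper initial segment is smaller

def add(a, b):
    """Ordinal sum: terms of a below the leading term of b are absorbed."""
    if not b:
        return a
    return tuple(e for e in a if not less(e, b[0])) + b

def primes():
    """Yield 2, 3, 5, ... by trial division."""
    found, n = [], 2
    while True:
        if all(n % p for p in found):
            found.append(n)
            yield n
        n += 1

def code(a):
    """Goedel number: 0 for the ordinal 0, else the product of p_i ** (code(a_i) + 1)."""
    result = 1
    for p, exponent in zip(primes(), a):
        result *= p ** (code(exponent) + 1)
    return result if a else 0

print(code(OMEGA))                  # 8
print(code(add(OMEGA, ONE)))        # 24
print(add(ONE, OMEGA) == OMEGA)     # True: 1 + omega = omega
```

Since `less` and `add` only recurse into the finitely many exponents of their arguments, they illustrate why the corresponding relations and functions on codes are computable; that they are even primitive recursive, as stated above, is not something this sketch attempts to show.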

Given a natural ordinal representation system \(\langle A,\prec,\ldots\rangle\) of order type \(\tau\) let \(\PRA+\rTI_{\qf}(<\tau)\) be PRA augmented by quantifier-free induction over all initial (externally indexed) segments of \(\prec\). This is perhaps best explained via the representation system for \(\varepsilon_0\) given above. There one can take the initial segments of \(\prec\) to be determined by the Gödel numbers of the ordinals \(\omega_0\coloneqq 1\) and \(\omega_{n+1}\coloneqq \omega^{\omega_n}\) whose limit is \(\varepsilon_0\).

Definition 3.4 We say that a theory T has proof-theoretic ordinal \(\tau\), written \(\lvert T\rvert =\tau\), if T can be proof-theoretically reduced to \(\PRA+\rTI_{\qf}(<\tau)\), i.e., \[ T \mathbin{\equiv_{\Pi^0_2}} \PRA+\rTI_{\qf}(<\tau). \]

Unsurprisingly, the above notion has certain intensional aspects and hinges on the naturality of the representation system (for a discussion see Rathjen 1999a: section 2).

3.2 Infinite Proofs

Gentzen’s consistency proof for PA employs a reduction procedure \(\cR\) on proofs P of the empty sequent together with an assignment ord of representations for ordinals to proofs such that \(\ord(\cR(P))< \ord(P)\). Here \(<\) denotes the ordering on ordinal representations induced by the ordering of the pertinent ordinals. For this purpose he needed representations for ordinals \(<\varepsilon_0\) where \(\varepsilon_0\) is the smallest ordinal \(\tau>0\) such that whenever \(\alpha<\tau\) then also \(\omega^{\alpha}<\tau\) with \(\alpha\mapsto \omega^{\alpha}\) being the function of ordinal exponentiation with base \(\omega\). Moreover, the functions \(\cR\) and ord and the relation \(<\) are computable (when viewed as acting on codes for the syntactic objects); in fact, they can be chosen to be primitive recursive. Given a purported proof P of the empty sequent, let \(g(n)= \ord(\cR^n(P))\), where \(\cR^n(P)\) is the n-fold iteration of \(\cR\) applied to P. Then one has \(g(0)>g(1)> g(2)> \ldots > g(n)\) for all n, which is absurd as the ordinals \(<\varepsilon_0\) are well-founded. Hence PA is consistent.

Gentzen’s proof, though elementary, was very intricate and thus more transparent proofs were sought. As it turned out, the obstacles to cut elimination, inherent to PA, could be overcome by moving to a richer proof system, albeit in a drastic way by going infinite. This richer system allows for proof rules with infinitely many premises.[15] The inference commonly known as the \(\omega\)-rule consists of the two types of infinitary inferences:

\[ \begin{align} \frac{\Gamma \Rightarrow {\Delta, F(0)};\; \Gamma \Rightarrow {\Delta,F(1)};\; \ldots; \Gamma \Rightarrow {\Delta,F(n)};\; \ldots} {\Gamma \Rightarrow {\Delta,\forall x\,F(x)}} \tag*{\(\omega R\)} \\[1ex] \frac{{F(0),\Gamma} \Rightarrow {\Delta};\; {F(1), \Gamma} \Rightarrow {\Delta};\; \ldots; {F(n),\Gamma} \Rightarrow {\Delta}; \ldots} {\exists x\,F(x), {\Gamma \Rightarrow \Delta} } \tag*{\(\omega L\)} \end{align}\]

The price to pay will be that deductions become infinite objects, i.e., infinite well-founded trees.

The sequent-style version of Peano arithmetic with the \(\omega\)-rule will be denoted by \(\PA_{\omega}\). \(\PA_{\omega}\) has no use for free variables. Thus free variables are discarded and all terms will be closed. All formulae of this system are therefore closed, too. The numerals are the terms \(\bar{n}\), where \(\bar{0}=0\) and \(\overline{n+1}=S\bar{n}\). We shall identify \(\bar{n}\) with the natural number n. All terms t of \(\PA_{\omega}\) evaluate to a numeral \(\bar{n}\).

\(\PA_{\omega}\) has all the inference rules of the sequent calculus except for \(\forall R\) and \(\exists L\). In their stead, \(\PA_{\omega}\) has the \(\omega R\) and \(\omega L\) inferences. The Axioms of \(\PA_{\omega}\) are the following: (i) \(\emptyset\Rightarrow A\) if A is a true atomic sentence; (ii) \(B\Rightarrow \emptyset\) if B is a false atomic sentence; (iii) \(F(s_1,\ldots,s_n)\Rightarrow F(t_1,\ldots,t_n)\) if \(F(s_1,\ldots,s_n)\) is an atomic sentence and the \(s_i\) and \(t_i\) evaluate to the same numerals, respectively.
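Since all terms are closed, the axioms of \(\PA_{\omega}\) can be recognized by plain evaluation. The following sketch assumes, purely for illustration, a language with the function symbols 0, S, + and \(\cdot\) and with equations as the only atomic sentences; the encoding of terms as nested tuples and the function names are ours.

```python
def numeral(n):
    """The closed term S(...S(0)...) with n occurrences of S."""
    term = "0"
    for _ in range(n):
        term = ("S", term)
    return term

def evaluate(term):
    """Value of a closed term built from "0", ("S", t), ("+", s, t), ("*", s, t)."""
    if term == "0":
        return 0
    op = term[0]
    if op == "S":
        return evaluate(term[1]) + 1
    if op == "+":
        return evaluate(term[1]) + evaluate(term[2])
    if op == "*":
        return evaluate(term[1]) * evaluate(term[2])
    raise ValueError(f"not a closed term: {term!r}")

def is_true(atom):
    """An atomic sentence is an equation ("=", s, t) between closed terms."""
    _, s, t = atom
    return evaluate(s) == evaluate(t)

def is_axiom(antecedent, succedent):
    """Recognize the three kinds of axioms (i), (ii), (iii) for atomic sequents."""
    if len(antecedent) == 0 and len(succedent) == 1:
        return is_true(succedent[0])                       # (i) true atom on the right
    if len(antecedent) == 1 and len(succedent) == 0:
        return not is_true(antecedent[0])                  # (ii) false atom on the left
    if len(antecedent) == 1 and len(succedent) == 1:       # (iii) same values, termwise
        (_, s1, s2), (_, t1, t2) = antecedent[0], succedent[0]
        return evaluate(s1) == evaluate(t1) and evaluate(s2) == evaluate(t2)
    return False

two_plus_two = ("=", ("+", numeral(2), numeral(2)), numeral(4))
print(is_axiom([], [two_plus_two]))                        # True, by (i)
print(is_axiom([("=", "0", numeral(1))], []))              # True, by (ii)
```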

With the aid of the \(\omega\)-rule, each instance of the induction scheme becomes logically deducible with an infinite proof tree. To describe the cost of cut elimination for \(\PA_{\omega}\), we introduce the measures of height and cut rank of a \(\PA_{\omega}\) deduction \(\cD\). We will notate this by

\[\cD \stile{\alpha}{k} \Gamma \Rightarrow \Delta.\]

The above relation is defined inductively following the buildup of the deduction \(\cD\). For the cut rank we need the definition of the length, \(\lvert A\rvert\) of a formula:

\[ \begin{align} \lvert A\rvert & =0 \textrm{ if } A \textrm{ is atomic; }\\ \lvert\neg A_0\rvert & =\lvert A_0\rvert +1; \\ \lvert A_0\mathbin{\Box} A_1\rvert & =\max(\lvert A_0\rvert,\lvert A_1\rvert)+1 \end{align} \]

where \(\Box=\land,\lor,\to\); \(\lvert \exists x\,F(x)\rvert =\lvert \forall x\,F(x)\rvert =\lvert F(0)\rvert +1\).
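The length of a formula is a simple structural recursion. In the sketch below the tuple encoding of formulae is an assumption of this illustration; since substituting a numeral for the bound variable does not change the length, the quantifier clause may recurse on the body as written.

```python
def length(formula):
    """Length |A| of a formula, following the clauses above.

    Encoding (chosen for this sketch): ("atom", text), ("not", A),
    ("and" | "or" | "imp", A, B), ("forall" | "exists", variable, body).
    """
    tag = formula[0]
    if tag == "atom":
        return 0
    if tag == "not":
        return length(formula[1]) + 1
    if tag in ("and", "or", "imp"):
        return max(length(formula[1]), length(formula[2])) + 1
    if tag in ("forall", "exists"):
        return length(formula[2]) + 1
    raise ValueError(f"unknown connective: {tag!r}")

# |forall x (F(x) -> not G(x))| = 3
example = ("forall", "x", ("imp", ("atom", "F(x)"), ("not", ("atom", "G(x)"))))
print(length(example))   # 3
```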

Now suppose the last inference I of \(\cD\) is of the form

\[ \frac{ \begin{array}{c} \cD_0\\ {\Gamma_0\Rightarrow \Delta_0} \end{array} \ \ldots\ \begin{array}{c} \cD_n\\ {\Gamma_n\Rightarrow \Delta_n} \end{array} \ \ldots\ n<\tau }{\Gamma \Rightarrow \Delta}\; I \]

where \(\tau=1,2,\omega\) and the \(\cD_n\) are the immediate subdeductions of \(\cD\). If

\[{\cD_n \stile{\alpha_n}{k} \Gamma_n} \Rightarrow \Delta_n\]

and \(\alpha_n<\alpha\) for all \(n<\tau\) then

\[{\cD \stile{\alpha}{k} \Gamma} \Rightarrow \Delta\]

provided that in the case of I being a cut with cut formula A we also have \(\lvert A\rvert <k\). We will write \({\PA_{\omega} \stile{\alpha}{k} \Gamma} \Rightarrow \Delta\) to convey that there exists a \(\PA_{\omega}\)-deduction \({\cD\stile{\alpha}{k} \Gamma}\Rightarrow \Delta\). The ordinal analysis of PA proceeds by first unfolding any PA-deduction into a \(\PA_{\omega}\)-deduction:

\[\tag{5}\label{einbett} \textrm{If } {\PA \vdash \Gamma}\Rightarrow \Delta \textrm{ then } {\PA_{\omega} \stile{\omega+m}{k} \Gamma} \Rightarrow \Delta \]

for some \(m,k<\omega\). The next step is to get rid of the cuts. It turns out that the cost of lowering the cut rank from \(k+1\) to k is an exponential with base \(\omega\).

Theorem 3.5 (Cut Elimination for \(\PA_{\omega}\)) If \({\PA_{\omega} \stile{\alpha}{k+1} \Gamma} \Rightarrow \Delta\), then \[ {\PA_{\omega} \stile{\omega^{\alpha}}{k} \Gamma} \Rightarrow \Delta. \]

As a result, if \({\PA_{\omega} \stile{\alpha}{n} \Gamma}\Rightarrow \Delta\), we may apply the previous theorem n times to arrive at a cut-free deduction \({\PA_{\omega} \stile{\rho}{0} \Gamma} \Rightarrow \Delta\) with \(\rho=\omega^{\omega^{\iddots^{\omega^{\alpha}}}}\), where the stack has height n. Combining this with the result from \((\ref{einbett})\), it follows that every sequent \(\Gamma\Rightarrow \Delta\) deducible in PA has a cut-free deduction in \(\PA_{\omega}\) of height \(<\varepsilon_0\). Ruminating on the details of how this result was achieved yields a consistency proof for PA from transfinite induction up to \(\varepsilon_0\) for elementary decidable predicates on the basis of finitist reasoning (as described below).
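The bound can be made explicit by a short calculation. The notation \(e_j\) is introduced only for this illustration: put \(e_0(\alpha)=\alpha\) and \(e_{j+1}(\alpha)=\omega^{e_j(\alpha)}\). By \((\ref{einbett})\) a PA-deduction unfolds into a \(\PA_{\omega}\)-deduction of height \(\omega+m\) and some cut rank k, so k applications of Theorem 3.5 give a cut-free deduction of height \(e_k(\omega+m)\). With \(\omega_0=1\) and \(\omega_{n+1}=\omega^{\omega_n}\) we have \(\omega+m<\omega^{\omega}=\omega_2\), and since \(\alpha\mapsto\omega^{\alpha}\) is strictly increasing,

\[ e_k(\omega+m) \;<\; e_k(\omega_2) \;=\; \omega_{k+2} \;<\; \varepsilon_0 . \]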

Gentzen did not deal explicitly with infinite proof trees in his second published proof of the consistency of PA (Gentzen 1938b). However, in the unpublished first consistency proof of Gentzen 1974 he aims at showing that a proof of a sequent in first-order arithmetic gives rise to a well-founded reduction tree; that tree can be identified with a cut-free proof in the sequent calculus with the \(\omega\)-rule. The infinitary version of PA with the \(\omega\)-rule was investigated by Schütte (1950). There remained the puzzle of how Gentzen’s work, which used an ingenious method of assigning ordinals to purported proofs of the empty sequent, relates to the infinitary approach. Much later work by Buchholz (1997) and others revealed an intrinsic connection between Gentzen’s assignment of ordinals to deductions in PA and the standard one to infinite deductions in \(\PA_{\omega}\). In the 1950s infinitary proof theory flourished in the hands of Schütte. He extended his approach to PA to systems of ramified analysis to be discussed below in section 5.2.

One last remark about the use of ordinals: Gentzen showed that transfinite induction up to the ordinal \(\varepsilon_0\) suffices to prove the consistency of PA. Using the arithmetized formalization of the proof predicate (see above, after Definition 1.3) and taking k as the numeral denoting the Gödel number of the formula \(0=1\), we can express the consistency of PA, \(\Con (\PA)\), by the formula \(\forall x \,\neg\proof_{\PA}(x,k)\). To appreciate Gentzen’s result it is pivotal to note that he applied transfinite induction up to \(\varepsilon_0\) only for primitive recursive predicates (the latter principle was denoted above by PR-TI\((\varepsilon_0)\)). Otherwise, Gentzen’s proof used only finitist means. Hence, a more accurate formulation of Gentzen’s result is

\[\tag{6}\label{picture} \bF+\textrm{PR-TI}(\varepsilon_0) \vdash \Con (\PA),\]

where F, as above, contains only finitistically acceptable means. In his 1943 paper Gentzen also showed that this result is best possible, as PA proves transfinite induction up to any \(\alpha<\varepsilon_0\). So one might argue that the non-finitist part of PA is encapsulated in PR-TI\((\varepsilon_0)\) and therefore “measured” by \(\varepsilon_0\). \(\varepsilon_0\) is also the proof-theoretic ordinal of PA as specified in Definition 3.4. Gentzen hoped that results of this character could also be obtained for stronger mathematical theories, in particular for analysis. Hilbert’s famous second problem asked for a direct consistency proof of that mathematical theory. Gentzen wrote in 1938 that

the most important [consistency] proof of all in practice, that for analysis, is still outstanding. (1938a [Gentzen 1969: 236]).

He actually worked on a consistency proof for analysis, as letters (e.g., one to Bernays on 23.6.1935, translated in von Plato 2017: 240) and stenographic notes from 1945 (e.g., Gentzen 1945) show. Formally, “analysis” was identified already in Hilbert 1917/18 as a form of second order number theory. Hilbert and Bernays developed mathematical analysis in a supplement of the second volume of their “Grundlagen der Mathematik”. We take \(\bZ_2\) as given in the following way. Its language extends that of PA by an additional sort of variables \(X,Y,Z,\ldots\) that range over sets of numbers and the binary membership relation \(t\in X\). Its axioms are those of PA, but the principle of proof by induction is formulated as the second order induction axiom

\[\forall X(0\in X\land\forall x(x\in X\rightarrow S(x)\in X) \rightarrow \forall x(x\in X)).\]

Finally, the axiom schema of comprehension, CA, asserts that for every formula \({F}(u)\) of the language of \(\bZ_2\), there is a set \(X=\{u\mid {F}(u)\}\) having exactly those numbers u as members that satisfy \({F}(u)\). More formally,

\[\label{CA}\tag{\(\bCA\)} \exists X\forall u(u\in X\leftrightarrow{F}(u)) \]

for all formulae \({F}(u)\) in which X does not occur. That \(\bZ_2\) is often called “analysis” is due to the realization (e.g., in Hilbert & Bernays 1939) that, via the coding of real numbers and continuous functions as sets of natural numbers, a good theory of the continuum can be developed from these axioms.

Modern analyses of “finitist mathematics” consider it as situated between PRA and PA. When arguing that Gödel’s second incompleteness theorem refutes Hilbert’s finitist program, von Neumann maintained that finitist mathematics is included in PA and, if not there, undoubtedly in \(\bZ_2\). So it is quite clear that a consistency proof of \(\bZ_2\) would use non-finitist principles or that the pursuit of the consistency program would require an extension of the finitist standpoint. In the next section we discuss briefly a variety of extensions and elaborate two in greater detail.

4. Hilbert’s Program, Extended

According to Bernays, the reductive result due to Gödel and Gentzen, Theorem 1.2, had a dramatic impact on the work concerned with Hilbert’s program. It opened in a very concrete and precise way the finitist perspective to a broader “constructive” one. Hilbert himself had taken such a step in a much vaguer way in his last paper (Hilbert 1931b). Theorem 1.2 showed, after all, that PA is contained in HA via the negative translation. Since HA comprises just a fragment of Brouwer’s intuitionism, the consistency of PA is secured on the basis of the intuitionist standpoint. In a letter to Heyting of 25 February 1933, Gentzen suggested investigating the consistency of HA since a consistency proof for classical arithmetic had not been given so far by finitist means. He then continued:

If on the other hand, one admits the intuitionistic position as a secure basis in itself, i.e., as a consistent one, the consistency of classical arithmetic is secured by my result. If one wished to satisfy Hilbert’s requirements, the task would still remain of showing intuitionistic arithmetic consistent. This, however, is not possible by even the formal apparatus of classical arithmetic, on the basis of Gödel’s result in combination with my proof. Even so, I am inclined to believe that a consistency proof for intuitionistic arithmetic, from an even more evident position, is possible and desirable. (quoted and translated in von Plato 2009: 672)

Gödel took a very similar position in December of 1933 (Gödel 1995: 53). There he broadened the idea of a revised version of Hilbert’s program allowing constructive means that go beyond the finitist ones without accepting fully fledged intuitionism; the latter he considered to be problematic, in particular on account of the impredicative nature of intuitionist implication. As to an extension of Hilbert’s position he wrote:

But there remains the hope that in future one may find other and more satisfactory methods of construction beyond the limits of the system A [capturing finitist methods], which may enable us to found classical arithmetic and analysis upon them. This question promises to be a fruitful field for further investigations.

In section 3.2 we described Gentzen’s considerations; in section 4.2 we discuss Gödel’s as developed in the late 1930s. In section 4.1 we sketch some other constructive positions.

4.1 Constructive Frameworks

A particularly appealing idea is to pursue Hilbert’s program relative to a constructive point of view and determine which parts of classical mathematics are demonstrably consistent relative to that standpoint (see Rathjen 2009 for pursuing this with regard to Martin-Löf type theory). As one would suspect, there are differing “schools” of constructivism and different layers of constructivism. Several frameworks for developing mathematics from such a point of view have been proposed. Some we will refer to in this article (arguably the most important) are:

(a) Arithmetical Predicativism.
(b) Theories of higher type functionals.
(c) Takeuti’s “Hilbert-Gentzen finitist standpoint”.
(d) Feferman’s explicit mathematics.
(e) Martin-Löf’s intuitionistic type theory.
(f) Constructive set theory (Myhill, Friedman, Beeson, Aczel).

At this point we will just give a very rough description of these foundational views. A few more details, especially about their scope on a standard scale of theories and proof-theoretic ordinals, will be provided at the very end of section 5.3.

(a) Arithmetical Predicativism originated in the writings of Poincaré and Russell in response to the set-theoretic paradoxes. It is characterized by a ban on impredicative definitions. Whilst it accepts the completed infinite set of natural numbers, all other sets are required to be constructed out of them via an autonomous process of arithmetical definitions. A first systematic attempt at developing mathematics predicatively was made in Weyl’s 1918 monograph Das Kontinuum (Weyl 1918).

(b) Theories of higher type functionals comprise Gödel’s T and Spector’s extension of T via functionals defined by bar recursion. The basic idea goes back to Gödel’s 1938 lecture at Zilsel’s (Gödel 1995: 94). It was inspired by Hilbert’s 1926 Über das Unendliche, which considered a hierarchy of functionals over the natural numbers, not only of finite but also of transfinite type.

(c) To understand Takeuti’s finitist standpoint it is important to pinpoint the place where in a consistency proof à la Gentzen the means of PRA are exceeded. Gentzen’s proof employs a concrete ordering \(\prec\) of type \(\varepsilon_0\); it uses an assignment of ordinals to proofs and provides a reduction procedure on proofs such that any alleged proof of an inconsistency is reduced to another proof of an inconsistency which gets assigned a smaller element of the ordering. The ordering, the ordinal assignment and the reduction procedure are actually primitive recursive, and the steps described so far can be carried out in a small fragment of PRA. The additional principle needed to infer the consistency of PA is the following:

(*) There are no infinite elementary recursive sequences \(\alpha_0,\alpha_1,\alpha_2,\ldots\) such that \(\alpha_{n+1}\prec \alpha_n\) holds for all n.

Takeuti refers to (*) as the accessibility of \(\prec\). Note that this is a weaker property than the well-foundedness of \(\prec\) which refers to arbitrary sequences. There is nothing special about the case of PA since any ordinal analysis of a theory T in the literature can be made to fit this format. Thus epistemologically (*) is the fulcrum in any such consistency proof. Takeuti’s central idea (1987, 1975) was that we can carry out Gedankenexperimente (thought experiments) on concretely given (elementary) sequences to arrive at the insight that (*) obtains.[16]
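That an ordering of type \(\varepsilon_0\) can be handled by entirely elementary means may be illustrated by a minimal sketch. The encoding below is an assumption made for illustration (it is neither Gentzen’s nor Takeuti’s own notation system): an ordinal below \(\varepsilon_0\) is written in Cantor normal form as a tuple of exponents, and the hypothetical function `less` decides the ordering by a simple recursion on these finite objects.

```python
# Ordinals below epsilon_0 in Cantor normal form (illustrative encoding,
# not taken from the article): the tuple (a1, ..., ak) with a1 >= ... >= ak
# stands for omega^a1 + ... + omega^ak, where each exponent ai is again
# such a tuple; the empty tuple () stands for 0.

ZERO = ()
ONE = (ZERO,)    # omega^0
OMEGA = (ONE,)   # omega^1


def less(a, b):
    """Decide a < b by comparing exponents from left to right."""
    for x, y in zip(a, b):
        if less(x, y):
            return True
        if less(y, x):
            return False
    # All shared positions agree, so the shorter sum is the smaller ordinal.
    return len(a) < len(b)


def is_descending(seq):
    """Check that a finite sequence of notations is strictly descending."""
    return all(less(seq[i + 1], seq[i]) for i in range(len(seq) - 1))
```

The sketch only decides the ordering and checks finite sequences; the principle (*) itself, that no infinite descending sequence exists, is exactly what cannot be established by such elementary computations alone.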

(d) Errett Bishop’s novel (informal) approach to constructive analysis (1967) made a great impression on mathematicians with constructive leanings. In it, he dealt with different kinds of mathematical objects (numbers, functions, sets) as if they were given by explicit presentations, each kind being equipped with its own germane “equality” relation conceived in such a way that operations on them would lead from representations to representations respecting the equality relation. An important ingredient that made Bishop’s constructivism workable is the systematic use of witnessing data as an integral part of what constitutes a mathematical object. For instance, a real number comes with a modulus of convergence while a function of real numbers comes equipped with a modulus of (uniform) continuity. In his explicit mathematics, Feferman (1975, 1979) aims (among other things) at formalizing the core of Bishop’s ontology. Explicit mathematics is a theory that describes a realm of concretely and explicitly given objects (a universe U of symbols) equipped with an operation \(\bullet\) of application in such a way that given two objects \(a,b\in U\), a may be viewed as a program which can be run on input b and may produce an output \(a\bullet b\in U\) or never halt (such structures are known as partial combinatory algebras or Schönfinkel algebras). Moreover, some of the objects of U represent sets of elements of U. The construction of new sets out of given sets is either done explicitly by elementary comprehension or by a process of inductive generation. If one also adds principles to the effect that every internal operation (given as \(\lambda x.a\bullet x\) for some \(a\in U\)) which is monotone on sets possesses a least fixed point, one arrives at a remarkably strong theory (cf. Rathjen 1998, 1999b, 2002).

(e) Martin-Löf type theory is an intuitionist theory of dependent types intended to be a full scale system for formalizing constructive mathematics. Its origins can be traced to Principia Mathematica, Hilbert’s Über das Unendliche, the natural deduction systems of Gentzen, taken in conjunction with Prawitz’s reduction procedures, and to Gödel’s Dialectica system. It incorporates inductively defined data types which together with the vehicle of internal reflection via universes endow it with considerable consistency strength.

(f) Constructive set theory (as do the theories under (d) and (e)) sets out to develop a framework for the style of constructive mathematics of Bishop’s 1967 Foundations of constructive analysis in which he carried out a development of constructive analysis, based on informal notions of constructive function and set, which went substantially further mathematically than anything done before by constructivists. Where Brouwer reveled in differences, Bishop stressed the commonalities with classical mathematics. What was novel about his work was that it could be read as a piece of classical mathematics as well.

The ‘manifesto’ of constructive set theory was most vividly expressed by Myhill:

… the argumentation of [Bishop 1967] looks very smooth and seems to follow directly from a certain conception of what sets, functions, etc. are, and we wish to discover a formalism which isolates the principles underlying this conception in the same way that Zermelo-Fraenkel set-theory isolates the principles underlying classical (nonconstructive) mathematics. We want these principles to be such as to make the process of formalization completely trivial, as it is in the classical case. (Myhill 1975: 347)

Despite first appearances, there are close connections between the approaches of (d)–(f). Constructive set theory can be interpreted in Martin-Löf type theory (due to Aczel 1978) and explicit mathematics can be interpreted in constructive set theory (see Rathjen 1993b, in Other Internet Resources). Perhaps the closest fit between (e) and (f), giving back and forth interpretations, is provided by Rathjen & Tupailo 2006. Some concrete mathematical results are found at the end of section 5.3.

4.2 The Dialectica Interpretation: Gödel and Spector

Among the proposals for extending finitist methods put forward in his 1938 lecture at Zilsel’s, Gödel appears to have favored the route via higher type functions. Details of what came to be known as the Dialectica interpretation were not published until 1958 (Gödel 1958) but the D-interpretation itself was arrived at by 1941. Gödel’s system \({T}\) axiomatizes a class of functions that he called the primitive recursive functionals of finite type. \({T}\) is a largely equational theory whose axioms are equations involving terms for higher type functionals with just a layer of propositional logic on top of that. In this way the quantifiers, problematic for finitists and irksome to intuitionists, are avoided. To explain the benefits of the D-interpretation we need to have a closer look at the syntax of \({T}\).

Definition 4.1 \({T}\) has a many-sorted language in that each term is assigned a type. Types (type symbols) are generated from 0 by the rule: If \(\sigma\) and \(\tau\) are types then so is \(\sigma\to\tau\). Intuitively the ground type 0 is the type of natural numbers. If \(\sigma\) and \(\tau\) are types that are already understood then \(\sigma\to\tau\) is a type whose objects are considered to be functions from objects of type \(\sigma\) to objects of type \(\tau\). In addition to variables \(x^{\tau},y^{\tau},z^{\tau},\ldots\) for each type \(\tau\), the language of \({T}\) has special constants 0, \(\rsuc\), \(\rK_{\sigma,\tau}\), \(\rS_{\rho,\sigma,\tau}\), and \(\rR_{\sigma}\) for all types \(\rho,\sigma,\tau\). The meaning of these constants is explained by their defining equations. \(\rK_{\sigma,\tau}\) and \(\rS_{\rho,\sigma,\tau}\) are familiar from combinatory logic which was introduced by Schönfinkel in 1924 and became more widely known through Curry’s work (1930). 0 plays the role of the first natural number while \(\rsuc\) embodies the successor function on objects of type 0. The constants \(\rR_{\sigma}\), called recursors, provide the main vehicle for defining functionals by recursion on \(\bbN\). Term formation starts with constants and variables, and if s and t are terms of type \(\sigma\to\tau\) and \(\sigma\), respectively, then \(s(t)\) is a term of type \(\tau\). To increase readability we shall write \(t(r,s)\) instead of \((t(r))(s)\) and \(t(r,s,q)\) instead of \((t(r,s))(q)\) etc. Also \(\rsuc(t)\) will be shortened to \(t'\). The defining axioms for the constants are the following:[17]

\(\neg t'=0\)
\(t'=r'\to t=r\)
\(\rK_{\sigma,\tau}(s,t)=s\)
\({\rS}_{\rho,\sigma,\tau}(r,s,t)= (r(t))(s(t))\)
\(\rR_{\sigma}(f,g,0) = f\)
\(\rR_{\sigma}(f,g,n')= g(n,\rR_{\sigma}(f,g,n)).\)

The axioms of \({T}\) consist of the above defining axioms, equality axioms and axioms for propositional logic. Inference rules are modus ponens and the induction rule

\[ \textrm{from } A(0) \textrm{ and } A(x) \to A(x') \textrm{ conclude }A(t) \]

for t of type 0 and x not in \(A(0)\).
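The defining equations for \(\rK\), \(\rS\) and \(\rR\) can be read as a small functional program. The following is a minimal sketch, assuming an untyped rendering in Python (so the type indices are dropped and nothing is enforced by a type checker); the helper names `add`, `mul`, `iterate` and `ack` are hypothetical and introduced only for illustration. It shows how recursion at ground type yields addition and multiplication, and how a recursor whose values are themselves functions (type \(0\to 0\)) yields the Ackermann-Péter function, which is not primitive recursive.

```python
# Untyped illustration of the constants of T (types are not checked here).

def K(s):
    # K(s, t) = s
    return lambda t: s


def S(r):
    # S(r, s, t) = (r(t))(s(t))
    return lambda s: lambda t: r(t)(s(t))


def suc(n):
    return n + 1


def R(f, g, n):
    # R(f, g, 0) = f  and  R(f, g, n') = g(n, R(f, g, n))
    acc = f
    for k in range(n):
        acc = g(k, acc)
    return acc


def add(m, n):
    # m + 0 = m,  m + n' = (m + n)'
    return R(m, lambda _k, prev: suc(prev), n)


def mul(m, n):
    # m * 0 = 0,  m * n' = (m * n) + m
    return R(0, lambda _k, prev: add(prev, m), n)


def iterate(h, n):
    # Recursor at type 0 -> 0: the n-fold iterate of h.
    return R(lambda x: x, lambda _k, prev: (lambda x: h(prev(x))), n)


def ack(m):
    # Ackermann-Peter function: ack(0) = suc, ack(m')(n) = ack(m)^(n+1)(1).
    return R(suc, lambda _k, prev: (lambda n: iterate(prev, n + 1)(1)), m)
```

In this rendering `S(K)(K)` behaves as the identity, in line with the familiar combinatory identity \(\rS\rK\rK = \mathrm{I}\).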

The first step towards the D-interpretation of Heyting arithmetic in \({T}\) consists of associating to each formula A of arithmetic a syntactic translation \(A^D\) which is of the form

\[\tag{7}\label{D-Form} A^D \equiv {\exists x^{\sigma_1} \ldots \exists x^{\sigma_n} \forall y^{\tau_1} \ldots \forall y^{\tau_m} A_D(\vec{x},\vec{y})} \]

with \(A_D(\vec{x},\vec{y})\) being quantifier free. Thus \(A^D\) is not a formula of \({T}\) but of its augmentation via quantifiers \(\forall x^{\tau}\) and \(\exists y^{\tau}\) for all types \(\tau\). The translation proceeds by induction on the buildup of A. The cases where the outermost logical symbol of A is among \(\land\), \(\lor\), \(\exists x\), \(\forall x\) are rather straightforward. The crucial case occurs when A is an implication \(B\to C\). To increase readability we shall suppress the typing of variables. Let \(B^D\equiv \exists \vec{x}\, \forall \vec{y}\, B_D(\vec{x},\vec{y})\) and \(C^D\equiv {\exists \vec{u}\, \forall \vec{v}\, C_D(\vec{u},\vec{v})}\). Then one uses a series of judicious equivalences to bring the quantifiers in \(B^D\to C^D\) to the front and finally employs skolemization of existential variables as follows:

\[\begin{align} \tag{i} \exists \vec{x}\,\forall \vec{y} B_D(\vec{x},\vec{y})&\to \exists \vec{u}\,\forall \vec{v} C_D(\vec{u},\vec{v})\\ \tag{ii} \forall \vec{x}[\,\forall \vec{y} B_D(\vec{x},\vec{y})&\to \exists \vec{u}\,\forall \vec{v} C_D(\vec{u},\vec{v})]\\ \tag{iii} \forall \vec{x}\,\exists\vec{u}[\,\forall \vec{y} B_D(\vec{x},\vec{y})&\to \forall \vec{v} C_D(\vec{u},\vec{v})]\\ \tag{iv} \forall \vec{x}\,\exists\vec{u}\,\forall \vec{v}[\,\forall \vec{y} B_D(\vec{x},\vec{y})&\to C_D(\vec{u},\vec{v})]\\ \tag{v} \forall \vec{x}\,\exists\vec{u}\,\forall \vec{v}\,\exists \vec{y}[B_D(\vec{x},\vec{y})&\to C_D(\vec{u},\vec{v})]\\ \tag{vi} \forall \vec{x}\,\exists\vec{u}\,\exists Y\,\forall \vec{v}[B_D(\vec{x},Y(\vec{v}))&\to C_D(\vec{u},\vec{v})]\\ \tag{vii}\exists U\,\exists Z\,\forall \vec{x}\,\forall \vec{v}[B_D(\vec{x},Z(\vec{x},\vec{v}))&\to C_D(U(\vec{x}),\vec{v})]. \end{align}\]

\(A^D\) is then defined to be the formula in (vii). Note, however, that these equivalences are not necessarily justified constructively. Only (i) \(\Leftrightarrow\) (ii) and (iii) \(\Leftrightarrow\) (iv) hold constructively whereas (v) \(\Leftrightarrow\) (vi) and (vi) \(\Leftrightarrow\) (vii) are justified constructively only if one also accepts the axiom of choice for all finite types (ACft). Equivalences (ii) \(\Leftrightarrow\) (iii) and (iv) \(\Leftrightarrow\) (v) use a certain amount of classical logic known as the principle of independence of premise (IPft) and Markov’s principle (MPft) for all finite types, respectively. At this point \(A\mapsto A^D\) is just a syntactic translation. But amazingly it gives rise to a meaningful interpretation of HA in T.
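As a small worked illustration (not part of Gödel’s presentation as quoted above, and assuming that negation is treated as implication of falsity, \(\neg B \equiv B\to\bot\), with \(\bot\) the quantifier-free formula \(0=1\)), consider the case where C is \(\bot\). Then the tuples \(\vec{u}\) and \(\vec{v}\) are empty, the steps (iii), (iv) and (vi) are vacuous, and the chain (i) to (vii) collapses to

\[\begin{align} \exists \vec{x}\,\forall \vec{y}\, B_D(\vec{x},\vec{y})&\to \bot\\ \forall \vec{x}\,[\,\forall \vec{y}\, B_D(\vec{x},\vec{y})&\to \bot]\\ \forall \vec{x}\,\exists \vec{y}\,[B_D(\vec{x},\vec{y})&\to \bot]\\ \exists Z\,\forall \vec{x}\,[B_D(\vec{x},Z(\vec{x}))&\to \bot], \end{align}\]

so that \((\neg B)^D \equiv \exists Z\,\forall \vec{x}\,\neg B_D(\vec{x},Z(\vec{x}))\). The passage from the second to the third line is the instance of (iv) \(\Leftrightarrow\) (v) and uses Markov’s principle; the passage to the last line uses the axiom of choice.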

Theorem 4.2 (Gödel 1958) Suppose \(\cD\) is a proof of A in HA and \(A^D\) as in \((\ref{D-Form})\). Then one can effectively construct a sequence of terms \(\vec t\) (from \(\cD\)) such that \({T}\) proves \(A_D(\vec t,\vec y\,)\).

If one combines the D-interpretation with the Kolmogorov-Gentzen-Gödel negative translation of PA into HA one also arrives at an interpretation of PA in \({T}\). Some interesting consequences of the latter are that the consistency of PA follows finitistically from the consistency of \({T}\) and that every provably total recursive function of PA is denoted by a term of \({T}\).

The three principles ACft, IPft and MPft which figured in the D-translation actually characterize the D-translation in the sense that over the quantifier extension of \({T}\) with intuitionistic logic, called \(\bHA^{\omega}\), they are equivalent to the schema

\[ C\leftrightarrow C^D \]

for all formulae C of that theory. Principles similar to the three above are also often validated in another type of computational interpretation of intuitionistic theories known as realizability. Thus it appears that they are intrinsically related to computational interpretations of such theories.

A further pleasing aspect of Gödel’s interpretation is that it can be extended to stronger systems such as higher order systems and even to set theory (Burr 2000, Diller 2008). Moreover, it sometimes allows one to extract computational information even from proofs of specific classical theorems (see, e.g., Kohlenbach 2007). It behaves nicely with respect to modus ponens and thus works well for ordinary proofs that are usually structured via a series of lemmata. This is in contrast to cut elimination which often requires a computationally costly transformation of proofs.

Spector (1962) extended Gödel’s functional interpretation, engineering an interpretation of \(\bZ_2\) into T augmented via a scheme of transfinite recursion on higher type orderings. This type of recursion, called bar recursion, is conceptually related to Brouwer’s bar induction principle. (For a definition of bar induction and a presentation of Spector’s result see appendix C.)

5. Beyond Arithmetic: Subsystems of \(\bZ_2\)

We described the system \(\bZ_2\) of second order arithmetic already at the end of section 3.2. It was viewed as the “next” system to be proved consistent, after first-order arithmetic PA had been shown to be. As we mentioned, \(\bZ_2\) is also called “analysis”, because it allows the development of classical mathematical analysis: coding real numbers and continuous functions as sets of natural numbers, a good theory of the continuum can be developed from \(\bZ_2\)’s axioms. Indeed, Hermann Weyl showed in 1918 that a considerable portion of analysis can be developed in small fragments of \(\bZ_2\) that are actually conservative over PA. The idea of singling out the minimal fragment of \(\bZ_2\) required to expose a particular part of ordinary mathematics led in the 1980s to the research program of reverse mathematics. However, before discussing that program, we turn to proof-theoretic investigations of \(\bZ_2\) and its subsystems that were a focal point until the very early 1980s.

5.1 Takeuti’s Fundamental Conjecture

After Gentzen, it was Gaisi Takeuti who worked on a consistency proof for \(\bZ_2\) in the late 1940s. He conjectured that Gentzen’s Hauptsatz not only holds for first order logic but also for higher order logic, also known as simple type theory, STT. This came to be known as Takeuti’s fundamental conjecture.[18] The particular sequent calculus he introduced was called a generalized logic calculus, GLC (Takeuti 1953). \(\bZ_2\) can be viewed as a subtheory of GLC. In the setting of GLC the comprehension principle CA is encapsulated in the right introduction rule for the existential second-order quantifier and the left introduction rule for the universal second-order quantifier. In order to display these rules the following notation is convenient. If \(F(U)\) and \(A(a)\) are formulae then \(F(\{v\mid A(v)\})\) arises from \(F(U)\) by replacing all subformulae \(t\in U\) of \(F(U)\) (with U indicated) by \(A(t)\). The rules for second order quantifiers can then be stated as follows:[19]

\[\begin{array}{lcclc} (\forall_2\,\bL) & \displaystyle \frac {{ F(\{v\mid A(v)\}) },\Gamma\Rightarrow \Delta} {\forall X\,{F(X),\Gamma} \Rightarrow \Delta } & & (\forall_2\,\rR) & \displaystyle\frac {\Gamma\Rightarrow {\Delta, F(U)}} {\Gamma\Rightarrow \Delta, {\forall X\,F(X) }} \\[2ex] (\exists_2\,\bL) & \displaystyle \frac{F(U),\Gamma\Rightarrow \Delta} {{ \exists X\,F(X)},\Gamma\Rightarrow \Delta } & & (\exists_2\,\rR) & \displaystyle \frac{\Gamma\Rightarrow \Delta,F(\{ v\mid A(v)\})} {\Gamma\Rightarrow \Delta,{\exists X\,F(X) }} \end{array} \]

To deduce an instance \(\exists X\,\forall x\,[x\in X \leftrightarrow A(x)]\) of CA just let \(F(U)\) be the formula \(\forall x\,[x\in U \leftrightarrow A(x)]\) and observe that

\[F(\{v\mid A(v)\})\equiv \forall x\,[A(x)\leftrightarrow A(x)], \]

and hence

\[ \begin{gather} \vdots \\ \displaystyle \frac{\Gamma\Rightarrow \Delta, \forall x\,[A(x)\leftrightarrow A(x)]} {\Gamma\Rightarrow \Delta, \exists X\,\forall x\,[x\in X \leftrightarrow A(x)]} \tag{\(\exists_2\,\rR\)} \end{gather} \]

As the deducibility of the empty sequent is ruled out if cut elimination holds for GLC (or just the fragment GLC 2 corresponding to \(\bZ_2\)), Takeuti’s Fundamental Conjecture entails the consistency of \(\bZ_2\). Note, however, that it does not yield the subformula property as in the first-order case, since the minor formula \(F(\{x\mid A(x)\})\) in \((\exists_2\,\rR)\) and \((\forall_2\,\bL)\) may have a much higher (quantifier) complexity than the principal formula \(\exists XF(X)\) and \(\forall XF(X)\), respectively. Indeed, \(\exists XF(X)\) may be a proper subformula of \(A(x)\), which clearly exhibits the impredicative nature of these inferences and shows that they are strikingly different from those in predicative analysis where a proper subformula property obtains.

In 1960a Schütte developed a semantic equivalent to the (syntactic) fundamental conjecture using partial or semi-valuations. He employed the method of search trees (or deduction chains) to show that a formula F that cannot be deduced in the cut-free system has a deduction chain without axioms which then gives rise to a partial valuation V assigning the value “false” to F. From the latter he inferred that the completeness of the cut-free system[20] is equivalent to the semantic property that every partial valuation can be extended to a total valuation (basically a Henkin model of STT). In 1966 Tait succeeded in proving cut-elimination for second order logic using Schütte’s semantic equivalent for that fragment. Soon afterwards, Takahashi (1967) and Prawitz (1968) independently proved for full classical simple type theory that every partial valuation extends to a total one, thereby establishing Takeuti’s fundamental conjecture. These results, though, were somewhat disappointing as they were obtained by highly non-constructive methods that provided no concrete method for eliminating cuts in a derivation. However, Girard showed in 1971 that simple type theory not only allows cut-elimination but that there is also a terminating normalization procedure.[21] These are clearly very interesting results, but as far as instilling trust in the consistency of \(\bZ_2\) or STT is concerned, the cut elimination or termination proofs are just circular since they blatantly use the very comprehension principles formalized in these theories (and a bit more). To quote Takeuti:

My fundamental conjecture itself has been resolved in a sense by Motoo Takahashi and Dag Prawitz independently. However, their proofs rely on set theory, and so it cannot be regarded as an execution of Hilbert’s Program. (Takeuti 2003: 133)

Takeuti’s work on his conjecture instead focused on partial results. A major breakthrough that galvanized research in proof theory, especially ordinal-theoretic investigations, was made by him in 1967. In Takeuti 1967 he gave a consistency proof for \(\Pi^1_1\)-comprehension and thereby for the first time obtained an ordinal analysis of an impredicative theory. For this Takeuti vastly extended Gentzen’s method of assigning ordinals (ordinal diagrams, to be precise) to purported derivations of the empty sequent. It is worth quoting Takeuti’s own assessment of his achievements.

… the subsystems for which I have been able to prove the fundamental conjecture are the system with \(\Pi^1_1\) comprehension axiom and a slightly stronger system, that is, the one with \(\Pi^1_1\) comprehension axiom together with inductive definitions.[…] I tried to resolve the fundamental conjecture for the system with the \(\Delta^1_2\) comprehension axiom within our extended version of the finite standpoint. Ultimately, our success was limited to the system with provably \(\Delta^1_2\) comprehension axiom. This was my last successful result in this area. (Takeuti 2003: 133)

The subsystems of \(\bZ_2\) that are alluded to in the above discussion are now to be described. We consider the axiom schema of \(\cC\)-comprehension for formula classes \(\cC\) which is given by

\[ \tag*{\(\cC\Hy\bCA\)} \Gamma\Rightarrow \exists X\forall u(u\in X\leftrightarrow{F}(u)) \]

for all formulae \({F}\in\cC\) in which X does not occur. Natural formula classes are the arithmetical formulae, consisting of all formulae without second order quantifiers \(\forall X\) and \(\exists X\), and the \(\Pi^1_n\)-formulae, where a \(\Pi^1_n\)-formula is a formula of the form \(\forall X_1\ldots Q X_n\,A(X_1,\ldots,X_n)\) with \(\forall X_1\ldots Q X_n\) being a string of n alternating set quantifiers, commencing with a universal one, followed by an arithmetical formula \(A(X_1,\ldots,X_n)\). Note that in this notation the class of arithmetical formulae is denoted by \(\Pi^1_0\).

Also “mixed” forms of comprehension are of interest, e.g.,

\[\tag*{\(\Delta^1_n{\Hy}\bCA\)} \Gamma\Rightarrow \forall u\,[F(u)\leftrightarrow G(u)] \to \exists X\,\forall u\,[u\in X \leftrightarrow F(u)] \]

where \(F(u)\) is in \(\Pi^1_n\) and \(G(u)\) in \(\Sigma^1_n\).

One also considers \(\Delta^1_n\) comprehension rules:

\[ \frac{\emptyset \Rightarrow \forall u\,[F(u)\leftrightarrow G(u)]} {\Gamma\Rightarrow \exists X\,\forall u\,[u\in X \leftrightarrow F(u)]} \quad \quad\quad \begin{split} \textrm{if } & F(u)\in\Pi^1_n,\\ & G(u)\in\Sigma^1_n \end{split} \tag*{\(\Delta^1_n{\Hy}\bCR\)} \]

For each axiom scheme \(\mathbf{Ax}\) we denote by \((\mathbf{Ax})_0\) the theory consisting of the basic arithmetical axioms plus the scheme \(\mathbf{Ax}\). By contrast, \((\mathbf{Ax})\) stands for the theory \((\mathbf{Ax})_0\) augmented by the scheme of induction for all \(\cL_2\)-formulae. An example of this notation is the theory \((\bPi^1_1{\Hy}\bCA)_0\) which has the comprehension schema for \(\bPi^1_1\)-formulae.

In PA one can define an elementary injective pairing function on numbers, e.g., \((n,m)\coloneqq 2^n\times3^m\). With the help of this function an infinite sequence of sets of natural numbers can be coded as a single set of natural numbers. The \(n^{th}\) section of a set of natural numbers U is defined by \(U_n\,\coloneqq \,\{m:\,(n,m)\in U\}\). Using this coding, we can formulate a form of the axiom of choice for formulae \({F}\) in \(\cC\) by
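The coding of a sequence of sets by a single set can be sketched concretely. The following is only an illustration under the assumption that all sets involved are finite (the sets of the text are in general infinite); the function names `pair`, `unpair`, `code_sequence` and `section` are hypothetical.

```python
def pair(n, m):
    # The pairing function of the text: (n, m) := 2^n * 3^m.
    return 2**n * 3**m


def unpair(c):
    # Inverse of pair on its image; returns None if c is not of the form 2^n * 3^m.
    if c < 1:
        return None
    n = 0
    while c % 2 == 0:
        c //= 2
        n += 1
    m = 0
    while c % 3 == 0:
        c //= 3
        m += 1
    return (n, m) if c == 1 else None


def code_sequence(sets):
    # Code the finite sequence U_0, ..., U_k as one set U of natural numbers.
    return {pair(n, m) for n, s in enumerate(sets) for m in s}


def section(U, n):
    # The n-th section U_n = {m : (n, m) in U}.
    result = set()
    for c in U:
        decoded = unpair(c)
        if decoded is not None and decoded[0] == n:
            result.add(decoded[1])
    return result
```

Injectivity of the pairing function rests on unique prime factorization, which is why `unpair` can recover both components by repeated division.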

\[\tag*{\(\cC{\Hy}\bAC\)} \Gamma\Rightarrow \forall x\exists Y{F}(x,Y)\rightarrow\exists Y\forall x{F}(x,Y_x). \]

The basic relations between the above theories are discussed in Feferman and Sieg 1981a.

5.2 Predicative Theories

A major stumbling block for proving Takeuti’s fundamental conjecture is that in \((\forall_2\bL)\) and \((\exists_2\rR)\) inferences the minor formula \(F(\{v\mid A(v)\})\) can have a much higher complexity than the principal (inferred) formula \(QX\,F(X)\). If, instead, one allowed these inferences only in cases where the ‘abstraction’ term \(\{v\mid A(v)\}\) had (in some sense) a lower complexity than \(QX\,F(X)\), cut elimination could be restored. To implement this idea, one introduces a hierarchy of sets (formally represented by abstraction terms) whose complexity is stratified by ordinal levels \(\alpha\), and a pertaining hierarchy of quantifiers \(\forall X^{\beta}\) and \(\exists X^{\beta}\) conceived to range over sets of levels \(<\beta\). This is the basic idea underlying the ramified analytic hierarchy. The problem of which ordinals could be used for the transfinite iteration led to the concept of autonomous progressions of theories. The general idea of progressions of theories is very natural and we shall consider it first before discussing the autonomous versions.

5.2.1 Progressions of theories: Completing the incomplete

As observed earlier, Hilbert attempted to overcome the incompleteness of first-order arithmetic by introducing as axioms \(\Pi^0_1\)-statements all of whose instances had been finitistically proved (Hilbert 1931a). In a way he modified the concept of a “formal” theory by invoking finitist provability. Bernays, in his letter to Gödel of January 18, 1931 (Gödel 2003: 86–88), proposed a rule of a more general form. He indicated also that it would allow the elimination of the induction principle—in exchange for dealing with infinite proofs.

These considerations among others raised the issue of what constitutes a properly formal theory. Gödel paid very special attention to it when giving his Princeton Lectures in 1934. At the very end he introduced the general recursive functions. This class of number theoretic functions was shown to be co-extensional with Church’s λ-definable ones by Church and Kleene. In Church 1936 an “identification” of effective calculability and general recursiveness was proposed, which is usually called Church’s thesis. Turing, of course, proposed his machine computability for a very similar purpose and proved its equivalence to λ-definability in an appendix to his 1936. Church and Turing used their respective notions to establish the undecidability of first-order logic. For Gödel, this was the background for formulating the incompleteness theorems in “full generality” for all formal theories (containing a modicum of number or set theory); see the Postscriptum to the Princeton Lectures Gödel wrote in 1964:

In consequence of later advances, in particular of the fact that, due to A.M. Turing’s work, a precise and unquestionably adequate definition of the general concept of formal system can now be given, the existence of undecidable arithmetical propositions and the non-demonstrability of the consistency of a system in the same system can now be proved rigorously for every consistent formal system containing a certain amount of finitary number theory. (Gödel 1986: 369).

The first incompleteness theorem is proved for any such theory T by explicitly producing an unprovable yet true statement \(G_\bT\). That formula can then be added to T making \(\bT+G_\bT\) a “less incomplete” theory. Von Neumann had already established the equivalence of \(G_\bT\) with the consistency statement for T, \(\Con (\bT)\); the latter expresses that there is no proof in T of a blatantly false statement such as \(0=1\). This then gives rise to an extension procedure leading from T to \(\bT'\), namely (R1) \(\bT'=\bT+G_\bT\).

Thus one might try to address the incompleteness of T by forming a sequence of theories \(\bT=\bT_0\subset \bT_1\subset\bT_2\subset\ldots\) where \(\bT_{i+1}=\bT_i'\) and to continue this into the transfinite. The latter can be achieved by letting \(\bT_{\lambda}=\bigcup_{\alpha<\lambda}\bT_{\alpha}\) for limit ordinals λ and \(\bT_{\alpha+1}=\bT_{\alpha}'\) for successor ordinals \(\alpha+1\). However, the consistency statement for \(\bT_{\lambda}\), thus the provability predicate for the theory, has to be expressed in the language of \(\bT_{\lambda}\), and one cannot simply use set theoretic ordinals. Furthermore, the extensions of T are all supposed to be formal theories, i.e., the axioms have to be enumerable by recursive functions. To deal with both issues at once, one has to deal with ordinals in an effective way.

That is what Turing did in his Princeton dissertation (1939) concerning what he called ordinal logics. There he considers two ways of achieving the effective representation of ordinals. The first way is via the set \(\rW\) of numbers e for recursive well-orderings \(\leq_e\); the second is provided by the class of Church-Kleene notations for ordinals (Church and Kleene 1936) that used expressions in the λ-calculus to describe ordinals. The latter approach was then modified in Kleene 1938 to an equivalent recursion-theoretic definition that uses numerical codes to denote countable ordinals and is known as Kleene’s \({\cO}\).

Definition 5.1 A computable or recursive function on the naturals is one that can be computed by a Turing machine. The program of a Turing machine M can be assigned a Gödel number \({\Corner{M}}\). For natural numbers \(e,n\), to convey that the Turing machine with Gödel number e computes a number m on input n, we use the notation \(\{e\}(n)\) for m. Kleene uses \({\rsuc}(a)\coloneqq 2^a\) as notations for successor ordinals and \({\rlim}(e)\coloneqq 3\cdot 5^e\) for limit ordinals. The class \({\cO}\) of ordinal notations, the partial ordering relation \(\relLTcO\) between such notations, and the ordinal \({\mid}{a}{\mid}\) denoted by \(a\in {\cO}\) are defined simultaneously as follows: \(0\in{\cO}\), and \({\mid}{0}{\mid} =0\). If \(a\in{\cO}\) then \({\rsuc}(a)\in{\cO}\), \(a \relLTcO {\rsuc}(a)\) and \({\lvert {\rsuc}(a)\rvert}={\lvert a\rvert}+1\). If e is an index of a total recursive function and \({\{e\}(n)} \relLTcO{\{e\}(n+1)}\) holds for all \(n\in{\bbN}\), then \({\rlim}(e)\in{\cO}\), and \({\lvert {\rlim}(e)\rvert}=\sup\{{\lvert {\{e\}(n)}\rvert}\mid n\in{\bbN}\}\). If \(a \relLTcO b\) and \(b \relLTcO c\) then \(a \relLTcO c\).
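The successor clause of the definition can be made concrete for finite ordinals. The sketch below is an illustration only: it computes the notation obtained from 0 by repeated application of \({\rsuc}(a)=2^a\) and decodes such notations again. Limit notations \(3\cdot 5^e\) are only formed, not evaluated, since evaluating them would require a fixed enumeration of Turing machines, and membership in \({\cO}\) is in any case not decidable. The function names are hypothetical.

```python
def suc(a):
    # Notation for the successor of the ordinal denoted by a.
    return 2**a


def lim(e):
    # Code of a limit notation; whether it lies in O depends on the machine with index e.
    return 3 * 5**e


def numeral_notation(n):
    # Notation for the finite ordinal n: apply suc n times to 0.
    a = 0
    for _ in range(n):
        a = suc(a)
    return a


def finite_ordinal(a):
    # Return |a| if a is built from 0 by suc alone, and None otherwise.
    if a < 0:
        return None
    count = 0
    while a != 0:
        if a & (a - 1) != 0:      # a is not a power of 2
            return None
        a = a.bit_length() - 1    # a = 2^b, continue with b
        count += 1
    return count
```

The notations for 0, 1, 2, 3, 4, 5 obtained in this way are 0, 1, 2, 4, 16, 65536, so notations grow very quickly even for small finite ordinals.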

The first ordinal \(\tau\) such that there is no recursive well-ordering of order type \(\tau\) is usually denoted by \(\omega_1^{CK}\) in honor of Church and Kleene. It can be shown for the above definition of \({\cO}\) that the recursive ordinals, i.e., the order types of recursive well-orderings, are exactly those that have a notation in \({\cO}\).

When it comes to theories T, quite unlike other areas of logic (e.g., model theory), results such as those presented in this section depend not only on the set of axioms of T, but also on the way they are presented. When talking about a theory T we assume that T is given by a \(\Sigma^0_1\)-formula \(\psi(v_0)\) such that F is an axiom of T just in case \(\psi({\Corner{F}})\) holds; a \(\Sigma^0_1\)-formula is of the form \(\exists y_1\ldots\exists y_n\,R(y_1,\ldots, y_n)\) with R primitive recursive. This consideration together with Kleene’s \({\cO}\) allows us to build a transfinite hierarchy of theories based on any suitable theory T. A consistency progression based on T is a primitive recursive function \(n\mapsto \psi_n\) that associates with every natural number n a \(\Sigma^0_1\)-formula \(\psi_n(v_0)\) that defines \(\bT_n\) such that PA proves: (i) \(\bT_0=\bT\); (ii) \(\bT_{{\rsuc}(n)}=\bT_n'\), and (iii) \(\bT_{{\rlim}(n)}=\bigcup_x\bT_{\{n\}(x)}\). With this in place, we can finally formulate Turing’s completeness result.
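
The three clauses of a consistency progression can be mirrored by a toy enumeration of axioms. This is a rough sketch under several assumptions: \(\bT'\) is taken to be \(\bT\) plus \(\Con(\bT)\), the axioms of the base theory are collapsed into the single label `"Ax(T)"`, subscripts in labels are notation codes (so `T_4` is the theory at the notation 4), and the table `TABLE` of Python functions is a hypothetical stand-in for Turing machine indices. The formal notion works with \(\Sigma^0_1\)-formulas \(\psi_n\) and provability in PA, which the sketch does not model; it only illustrates why the union at limit stages remains effectively enumerable.

```python
from itertools import count, islice
from typing import Callable, Dict, Iterator, Tuple

def suc(a: int) -> int:
    return 2 ** a

def lim(e: int) -> int:
    return 3 * 5 ** e

def decode(a: int) -> Tuple[str, int]:
    """Return ('zero', 0), ('suc', b) or ('lim', e) according to the shape of a."""
    if a == 0:
        return ("zero", 0)
    if a > 0 and a & (a - 1) == 0:       # a = 2**b
        return ("suc", a.bit_length() - 1)
    if a > 0 and a % 3 == 0:             # candidate for 3 * 5**e
        q, e = a // 3, 0
        while q % 5 == 0:
            q //= 5
            e += 1
        if q == 1:
            return ("lim", e)
    raise ValueError(f"{a} has none of the three shapes")

def finite(n: int) -> int:
    """The notation obtained from 0 by applying suc n times."""
    a = 0
    for _ in range(n):
        a = suc(a)
    return a

# Hypothetical stand-in for indices: toy index 0 enumerates the finite notations.
TABLE: Dict[int, Callable[[int], int]] = {0: finite}

def axioms(a: int, table: Dict[int, Callable[[int], int]]) -> Iterator[str]:
    """Lazily enumerate labels for the axioms of T_a."""
    kind, arg = decode(a)
    if kind == "zero":                   # clause (i): T_0 = T
        yield "Ax(T)"
    elif kind == "suc":                  # clause (ii): T_suc(b) = T_b + Con(T_b)
        yield f"Con(T_{arg})"            # emitted first, since T_b may be infinite
        yield from axioms(arg, table)
    else:                                # clause (iii): union over all {e}(x)
        seen, active = set(), []
        for x in count():                # dovetail: one new theory per round,
            active.append(axioms(table[arg](x), table))
            for gen in list(active):     # then one step of every active theory
                item = next(gen, None)
                if item is None:
                    active.remove(gen)   # that enumeration is exhausted
                elif item not in seen:
                    seen.add(item)
                    yield item

# First few axioms at stage omega and at stage omega + 1 in this toy setting.
print(list(islice(axioms(lim(0), TABLE), 5)))
print(list(islice(axioms(suc(lim(0)), TABLE), 3)))
```

Dovetailing is used at limit stages so that no single component theory with infinitely many axioms can block the enumeration of the others. Only a short prefix is printed because the finite notations grow too quickly to be computed far.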

Theorem 5.2 For any true \(\Pi^0_1\) sentence F a number \(a_{F}\in{\cO}\) can be constructed such that \({{\mid}\,{a_{F}}\,{\mid}}=\omega+1\) and \(\bT_{a_{F}}\vdash F\). Moreover, the function \(F\mapsto a_{F}\) is given by a primitive recursive function.

At first glance Turing’s theorem seems to provide some insight into the nature of true \(\Pi^0_1\)-statements. That this is an “illusion” is revealed by the analysis of its simple proof which is just based on the trick of coding the truth of F as a member of \({\cO}\). The proof also shows that the infinitely many iterated consistency axioms \(\Con (\bT_0),\Con (\bT_1),\ldots\) of \(\bT_{{{\rsuc}}({\rlim}(e))}\) are irrelevant for proving F. As it turns out, the reason why one has to go to stage \(\omega+1\) is simply that only at stage \(\omega\) a non-standard definition of the axioms of \(\bigcup_{n<\omega}\bT_n\) can be introduced. More details and other results on recursive progressions are discussed in Appendix B. Here let us just mention that one has considered other progressions based on various extension procedures \(\bT \m