Groklaw member PolR has written an explanation for lawyers of computation theory to try to fill a gap in their knowledge that he has observed from reading legal briefs.



A brief extract:

Lawyers and judges know about modern electronics in general and computers in particular. They know about code and the compilation process that turns source code into binary executable code. Computation theory is something different. It is an area of mathematics that overlaps with philosophy. Computation theory provides the mathematical foundations that make it possible to build computers and write programs. Without this knowledge several of the fundamental principles of computer science are simply off the radar and never taken into consideration. The fundamentals of computation theory are not obvious. A group of the greatest mathematicians of the twentieth century needed decades to figure them out. If this information is not communicated to lawyers and judges they have no chance to understand what is going on and mistakes are sure to happen. All the errors I have found result from this omission. The purpose of this text is to help fill the gap. I try to explain all the key concepts in a language most everyone will understand. I will underline the key legal questions they raise or help answer as they are encountered. I will conclude the article with examples of common mistakes and how the knowledge of computation theory helps avoid them.

Feel free, then, those of you who are also programmers and engineers, to amplify, correct, explain, or if there is another topic you feel lawyers and judges need to understand, email me about it, and let's see if we can help them out.

Similarly, if you are a lawyer or judge, feel free to ask questions or ask that further information be provided.

What I get from the paper is that there isn't really a difference between what a human does with a paper and pencil and what a computer does, except speed, and that neither method of computation should be patentable subject matter. Also, it's now clear that just as you wouldn't go into litigation without a lawyer, if you are wise, because that is their area of expertise, lawyers also shouldn't assume they understand software and patents and how they relate unless they get help from technology experts who know how software and computers, and the underlying math, really work.

Update: If you'd like to print the article without the introduction, I've placed a copy here. And here's a PDF.

And for those of you who are interested in digging a bit more into computational theory, with an emphasis on the theory as opposed to the consequences with regard to patents, here's a reference for you, A Problem Course in Mathematical Logic, by Stefan Bilaniuk. Here's a description of what you'll find:

Parts I and II, Propositional Logic and First-Order Logic respectively, cover the basics of these topics through the Soundness, Completeness, and Compactness Theorems, plus a little on applications of the Compactness Theorem. [...] Part III, Computability, covers the basics of computability using Turing machines and recursive functions; it could be used as the basis of a one-term course. Part IV, Incompleteness, is concerned with proving the Gödel Incompleteness Theorems.

*****************************

An Explanation of Computation Theory for Lawyers

By PolR

[This article is licensed under

a Creative Commons License.]

I am a computer professional with over 25 years of experience. I have a Master's degree in computer science and no legal expertise beyond what I acquired reading Groklaw. Yet I have developed an interest in understanding US patent law and how it applies to software. I have read a few court cases and legal briefs. Based on my professional knowledge I can say that in some precedent-setting cases the legal understanding of computers and software is not technologically correct.

I see the situation like this. The authors of legal briefs and court rulings have enough of an understanding to feel confident they can write meaningful arguments on the topic. But yet they do not understand computers and software well enough to reach technically correct conclusions. The unfortunate result is legal precedents that do not connect with reality. Computers don't work the way some legal documents and court precedents say they do1.

I think I can identify the root cause of this. It seems that nobody has ever taught the legal community the basic concepts of computation theory. At the very least I don't find any evidence that computation theory is known in the cases and legal briefs I have read so far. Lawyers and judges know about modern electronics in general and computers in particular. They know about code and the compilation process that turns source code into binary executable code. Computation theory is something different. It is an area of mathematics that overlaps with philosophy2. Computation theory provides the mathematical foundations that make it possible to build computers and write programs. Without this knowledge several of the fundamental principles of computer science are simply off the radar and never taken into consideration.

The fundamentals of computation theory are not obvious. A group of the greatest mathematicians of the twentieth century needed decades to figure them out. If this information is not communicated to lawyers and judges they have no chance to understand what is going on and mistakes are sure to happen. All the errors I have found result from this omission.

The purpose of this text is to help fill the gap. I try to explain all the key concepts in a language most everyone will understand. I will underline the key legal questions they raise or help answer as they are encountered. I will conclude the article with examples of common mistakes and how the knowledge of computation theory helps avoid them.

Let's illustrate the importance of computation theory with one of the legal issues computation theory will help resolve. Consider the following list of statements.

All software is data.

All software is discovered and not invented.

All software is abstract.

All software is mathematics.

If my understanding of the US patent law is correct, whether or not any of these statements is true could determine whether or not software is patentable subject matter. The resolution of this issue has serious consequences to the computer electronics and software industries.

When you know computation theory, you know without a shred of a doubt that each of these statements states a fact that is grounded in well-established mathematics. If you don't know computation theory, these statements will probably look to you like debatable issues.

Effective methods

Historically, modern computation theory started in the 1920s and 1930s with research on "effective methods". In a totally untypical manner, the definition of this mathematical concept is written in plain English as opposed to some mathematical notation that would be impenetrable to laymen. Here it is3:

A method, or procedure, M, for achieving some desired result is called 'effective' or 'mechanical' just in case M is set out in terms of a finite number of exact instructions (each instruction being expressed by means of a finite number of symbols); M will, if carried out without error, always produce the desired result in a finite number of steps; M can (in practice or in principle) be carried out by a human being unaided by any machinery save paper and pencil; M demands no insight or ingenuity on the part of the human being carrying it out.

The terms 'effective' or 'mechanical' are terms of art in mathematics and philosophy. They are not used in their ordinary, everyday meaning. The phrase "effective method" is a term of art. This term has nothing to do with the legal meaning of "effective" and "method". The fact that these two words also have a meaning in patent law is a coincidence.

An effective method is a procedure to perform a computation that is entirely defined by its rules. The human that carries out the computation must follow the rules strictly; otherwise the answer he gets is not the one that is intended. Examples of effective methods were taught to us in grade school when we learned how to perform ordinary arithmetic operations like additions, subtractions, multiplications and divisions. We were taught procedures that are to be carried out with pencil and paper. You have to perform these operations exactly how they were taught, otherwise the result of the calculation is wrong.

This definition was written in the early part of the 20th century, before the invention of the electronic computer. This was before the word "algorithm" came to be used in computer science. Effective methods are precursors to computer algorithms. The definition works from the assumption that you need a human to do a computation4. Clause 3 explicitly mentions that the computation must be one that it is possible for a human to carry out with pencil and paper. Aid from machinery is explicitly forbidden. In effect this definition spells out what kind of human activity would qualify as an algorithm, but without using that terminology. The term "effective method" was used instead.

Of course this concept has since been superseded with the modern concept of computer algorithm. Mathematical discoveries from the 1930s have shown the human requirement may be dropped from the definition without changing the mathematical meaning. Effective methods and computer algorithms are the same thing, except that computers are allowed to execute algorithms without running afoul of the definition. This is known as the Church-Turing thesis. This is one of the most important results of computation theory and theoretical computer science. A big part of this text will be devoted at explaining exactly what this thesis is, how mathematicians discovered it and what is the evidence we have of its truth.

There are a few fine points in this definition that deserve further explanations. Clause number 3 states that an effective method can be carried out "in practice or in principle" by a human being. This language is there because mathematics is about abstract concepts. You don't want an effective method to stop being an effective method when a human runs out of pencils or paper. You don't want the method to stop being an effective method when a human dies of old age before being done with the calculation. This is the sort of thing that may happen if, for example, you try to calculate a ridiculously large number of decimals of π5. By stating the method may be carried out "in principle" the definition makes abstraction of these practical limitations.

Clause number 4 states that no demands are made on insight or ingenuity. This clause implies that all the decisions that are required to perform the calculation are spelled out explicitly by the rules. The human must make no judgment call and use no skill beyond his ability to apply the rules with pencil and paper. The implication is that the human is used as a biological computing machine and serves as a reference for defining what a computation is.

Effective methods and the Church-Turing thesis are relevant to an issue that has legal implications. What is the relationship between an algorithm executed by a computer and a human working with pencil and paper? Abstract methods made of mental steps are not patentable subject matter6. When is software such a method? When is it a physical process? With knowledge of computation theory and the Church-Turing thesis, lawyers and courts will be able to tap on the discoveries of mathematics to answer these questions with greater accuracy.

Formal Systems

Why were mathematicians so interested in effective methods in the 1920s? They wanted to solve a problem that was nagging them at the time. Sometimes someone discovered a theorem and published a proof. Then some time passes, months, years or decades, and then someone discovered another theorem that contradicted the first and published a proof. When this happens, mathematicians are left scratching their heads. Both theorems cannot be true. There must be a mistake somewhere. If the proof of one of the theorems is wrong then the problem is solved. They revoke the theorem with a faulty proof and move on. But what if both proofs are sound and stand scrutiny?

These kinds of discoveries are called paradoxes. They are indicative that there is something rotten in the foundations that define how you prove theorems. Mathematical proofs are the only tool that can uncover mathematical truth. If you never know when the opposite theorems will be proven, you can't trust your proofs. If you can't trust your proofs, you can't trust mathematics. This is not acceptable. The solution is to look at the foundations, find the error and correct it so paradoxes don't occur any more. In the early twentieth century, mathematicians were struggling with pretty nasty paradoxes. They felt the need to investigate and fix the foundations they were working on7.

The relationship between algorithms and mathematical proofs is an important part of computation theory. It is part of the evidence that software is abstract and software is mathematics. The efforts to fix the foundations of mathematics in the early twentieth century are an important part of this story.

One of the most prominent mathematicians of the time, David Hilbert, proposed a program that, if implemented, would solve the problem of paradoxes. He argued that mathematical proofs should use formal methods of manipulating the mathematical text, an approach known as formalism. He proposed to base mathematics on formal systems that are made of three components.

A synthetic language with a defined syntax An explicit list of logical inference rules An explicit list of axioms

Let's review all three components one by one to see how they work. Mathematics uses special symbols to write propositions like a+b=c or E=mc2. There are rules on how you use these symbols. You can't write garbage like +=%5/ and expect it to make sense. The symbols together with the rules make a synthetic language. The rules define the syntax of the language. Computer programmers use this kind of synthetic language every day when they program the computer. The idea of such a special language to express what can't be easily expressed in English did not originate with computer programmers. Mathematicians thought of it first8.

How do you test if your formula complies with the syntax? Hilbert required that you use an effective method that verifies that the rules are obeyed. If you can't use such a method to test your syntax then the language is unfit to be used as a component of a formal system9.

Inference rules are how you make deductions10. A deduction is a sequence of propositions written in the language of mathematics that logically follows. At each step in the sequence there must be a rule that tells you why you are allowed to get there given the propositions that were previously deduced. For example suppose you have a proof of A and separately you have a proof of B. Then the logical deduction is you can put A and B together to make "A and B". This example is so obvious that it goes without saying. In a formal system you are not allowed to let obvious things be untold. All the rules must be spelled out before you even start making deductions. You can't use anything that has not been spelled out no matter how obvious it is. This list of rules is the second component of a formal system.

It was known since the works of philosopher Gottlob Frege, Bertrand Russell, and their successors that all logical inference rules can be expressed as syntactic manipulations. For example, take the previous example where we turn separate proofs of A and B into a proof of "A and B" together. The inference consists of taking A, taking B, put an "and" in the middle and we have "A and B". You don't need to know what A and B stand for to do this. You don't need to know how they are proved. You just manipulate the text according to the rule. All the inferences that are logically valid can be expressed by rules that work in this manner11. This opens an interesting possibility. You don't use a human judgment call to determine if a mathematical proof is valid. You check the syntax and verify that all inferences are made according to the rules that have been listed12. This check is made by applying an effective method13. You may find a computerized language for writing and verifying proofs in this manner at the Metamath Home Page.

If there is no human judgment call in verifying proofs and syntax, where is human judgment to be found? It is when you find a practical application to the mathematical language. The mathematical symbols can be read and interpreted according to their meanings. You don't need to understand the meaning to verify the proof is carried out according to the rules of logic, but you need to understand the meaning to make some real-world use of the knowledge you have so gained. For example when you prove a theorem about geometry, you don't need to know what the geometric language means to verify the correctness of the proof, but you will need to understand what your theorem means when you use it to survey the land.

Another place where human judgment is required is in the choice of the axioms. Any intuitive element that is required in mathematics must be expressed as an axiom. Each axiom must be written using the mathematical language. The rules of logic say you can always quote an axiom without having to prove it. This is because the axioms are supposed to be the embodiment of human intuition. They are the starting point of deductions. If some axiom doesn't correspond to some intuitive truth, it has no business in the formal system. If some intuitive truth is not captured by any axiom, you need to add another axiom14.

The list of axioms are the third component of a formal system. Together the syntax, the rules of inferences and the axioms include everything you need to write mathematical proofs. You start with some axioms and elaborate the inferences until you reach the desired conclusion. Once you have proven a theorem, you add it in the list of propositions you can quote without proof. The assumption is that the proof of the theorem is included by reference to your proof. And because all the rules and axioms have been explicitly specified from the start and meticulously followed, there is no dark corner in your logic where something unexpected can hide and eventually spring out to undermine your proof.

To recapitulate, effective methods play a big role in this program. They are used (1) to verify the correctness of the syntax of the mathematical language and (2) to verify that the mathematical proofs comply with the rules of inferences. The implication is that there is a tie between formal methods and abstract mathematical thinking. If you consider the Church-Turing thesis, there is a further implication of a tie between computer algorithms and abstract mathematical thinking. This article will elaborate on the nature of this tie.

Let's return to Hilbert's program. When you make explicit all part of the mathematical reasoning in a formal system, the very concept of mathematical proof is amenable to mathematical analysis. You can write proof about how proofs are written and analyze what properties various formal systems have15. Hilbert intended to use this property of formal systems to bring the paradoxes to a permanent end. Hilbert's program was to find a formal system suitable to be the foundations of all of mathematics that would meet these requirements:

The system must be consistent. This means there must be no proposition that could be proven both true and false. Consistency ensures there is no paradoxes. The system must be complete. This means every proposition that could possibly be written in the system could be either proven true or proven false. Completeness ensures that every mathematical question that could be formulated with mathematical language could be solved within the confine of the system. The system must be decidable. This means that there must be an effective method to find at least one actual proof of a proposition when it is true or at least one actual refutation of the proposition when it is false16. Decidability ensures that when confronted with a mathematical problem we always know how to solve it.

If the formal system is amenable to mathematical analysis then, according to Hilbert, it would be possible to find a mathematical proof that the system meet the requirements of consistency, completeness and decidability17. Completing such a program would be a formidable intellectual accomplishment. All of mathematics would be contained in a system based on axioms, inference rules and effective methods. Every mathematical question would be solved by carrying out a decision method where human judgment plays no part. In this approach, the role of human intuition is limited to the selection of the axioms.

From Effective Methods to Computable Functions

Hilbert's program was successful in part and failed in part. Useful formal systems that appear to be free of paradoxes were developed18. This is the successful part. The failed part is there is no formal system that can be used as the foundation of the entirety of mathematics and is also consistent, complete and decidable. Such formal systems are not possible. Theorems that show that these properties are mutually incompatible have been found. But before the mathematicians reached this conclusion, substantial research was done. It is in the course of this research that the fundamental concepts of computation theory were developed. Ultimately these findings have led to the development of the modern programmable electronic computer19.

When working on Hilbert's program, mathematicians were struggling with another nagging issue. Hilbert's program called for every mathematical concept to be defined using a special mathematical language. How do we write the definition of effective method in such a language? The reference to a human being calculating with pencil and paper is not easily amenable to this kind of treatment. This is why the original definition is written in plain English. Some specific effective methods could be defined mathematically. Effective methods in general were something a mathematician would know when he sees one but couldn't be defined with mathematical formulas.

The work on Hilbert's program is what led to the discovery of how to define mathematically the concept of effective methods. Several different but equivalent definitions were found. To reflect the definitions being used, effective methods are called "computable functions" when we refer to the mathematical definitions. When effective methods are considered for use with a computer instead of being carried out by a human, they are called "computer algorithms". When translated in a programming language for use on an actual computer, they are called "computer programs".

These multiple definitions of computable functions led to very different methods of how to actually make the calculations. A human being that would use one of these system would write on paper something very different from someone using the other system. But still they would all do the same calculations. The reason is that algorithms are abstractions different from the specific steps of how the calculation is done20.

For example, consider a sorting algorithm written in a programming language like C. You give the program a list of names in random order and the program prints them sorted in alphabetic order. The program may be compiled for Intel x86, for Sun SPARC, for ARM or any other make and brand of CPUs. The binary instructions generated by the compiler for one CPU will be very different from the instructions for the other CPUs because this is how CPUs are made. If you trace the hardware instructions that are actually executed with a debugger, you will verify that the execution of the code is different from CPU to CPU because of the difference in the instructions. But still it is the same C program that executes. The names will be printed out in sorted order on all CPUs. The abstract sorting algorithm is the same on all CPUs.

This difference between specific instructions and abstract algorithm has its basis in computation theory. By the time you will be done reading this article you will know what that basis is.

There are three of the definitions of computable functions that stand as the most important ones. The Church-Turing thesis states that computable functions as defined by these definitions are the same functions as those one may compute with effective methods.

Kurt Gödel's recursive functions Alonzo Church's lambda-calculus Alan Turing's Turing machines

One needs to know all three definitions to understand computation theory. All three definitions played a role in the discovery that Hilbert's program is impossible to accomplish. All three definitions bring considerable insights as to what an algorithm is. All three definitions are equivalent to both effective methods and computer algorithms. Together they explain why the work of a human with pencil and paper and the work of a computer manipulating bits follow the exact same abstract processes.

Now let's look each of these three definitions one by one and flesh out the most legally relevant concepts of computation theory.

Kurt Gödel's Recursive Functions

In mathematics the word "function" is a term of art. A function is a correspondence between a pair of mathematical objects that is such that the second object is determined by the first. For example the area of a circle is a function. When you know the circle, you can find out the area. Addition is a function. You know a pair of values21 and you add them up to get the result. Often functions are associated with a method or formula that state how to calculate the value of the function.

This terminology issue being settled, let's go back to the main topic. Kurt Gödel wasn't looking to define computable functions. This happened almost by accident. He was trying to do something very different and a few years later other mathematicians found out the implications of what he did.

Kurt Gödel was working on the Hilbert program. He was working on a formal system called "Peano arithmetic" that defines the arithmetic of positive integers. This is the numbers 0, 1, 2, 3 .... to infinity. The basic principle of Peano arithmetic is to base the entirety of arithmetic on the ability to count. This is a foundation made of two basic components. The first one is the number zero (0). The other one is the successor function that increases a number by one. The successor of 0 is 1, the successor of 1 is 2 etc. But the language of Peano arithmetic is too rudimentary to write numbers this way. It can only write the successor by appending an apostrophe. Zero is written 0, but 1 is written 0' and two is written 0'' and you continue like this appending one apostrophe every time you increment by one. You can identify a number by counting the apostrophes that are appended to 0.

Why are mathematicians bothering to do this? It is because this is research on the foundations. The goal is to find out what are the most minimal concepts that could serve as the basis of arithmetic. Do we need addition? Do we need subtraction? Do we need Arabic numbers? Or can we do everything with zero and successors alone? If the foundational concepts are kept very minimal, it reduces the risks that some complexity hides paradoxes that will pop up after a decade or two of research. It also makes proofs about the formal system simpler.

Remember the Hilbert program. There is a requirement to analyze the formal system to prove consistency, completeness and decidability. If the formal system is complex, these proofs will be hard to write. The hope is to make these proofs easier by keeping the formal system simple.

Gödel was studying a method of defining arithmetic functions called recursion. When you write a recursive definition22 you must specify two rules.

You define the value of the function for the number zero Assuming you know the value of the function for an arbitrary number n, you define how you compute the function for n'.

With these two rules you know how to compute the value of the function for all integers. Rule 1 tells you the value of the function for 0. Then you use rule 2 to compute the value of the function for 1 given that you know the value for 0. Then you proceed to compute the value for 2 and then 3 etc. As an example here is how you define addition using recursion.

m + 0 = m m + n' = (m + n)'

For the sake of example assume you want to compute 5+3. Starting with 5=0''''' and 3=0''' the steps go as follow:

5 + 3 = 0''''' + 0''' Translation into the Peano arithmetic language 0''''' + 0''' = (0''''' + 0'')' by rule no 2 (0''''' + 0'')' = (0''''' + 0')'' by rule no 2 (0''''' + 0')'' = (0''''' + 0)''' by rule no 2 (0''''' + 0)''' = (0''''')''' by rule no1 (0''''')''' = 0'''''''' because the parentheses can be removed at this point 0'''''''' = 8 Translate back into Arabic numbers

This illustrates how recursion works as an effective method. Every time you define something using recursion, you actually define an effective method to perform the calculation.

This example also shows why no one will want to actually make additions in this manner. The process is way too cumbersome. The goal is not to provide an alternative to the usual methods of doing arithmetic. The goal is to provide a foundation that is sound and free of paradoxes.

The purpose of the recursive definition of addition is to provide a reference. Other methods to add numbers are allowed but only if we can prove a theorem that proves these methods will yield the same answer as the recursive definition. Without the theorem we have no guarantee that the alternative calculation will yield the correct answer. The point of the theorem is to provide the guarantee that the procedures we use to calculate additions rest on the foundations.

Formal systems work like an edifice. You have a small number of concepts and associated axioms and rules of inference that provide a foundation. But the foundation is not used directly because it is too minimal to be usable in a practical situation. You must build theorems and theorems over theorems until you have a body of knowledge that is usable. Part of the task required by the Hilbert program was to build the edifice to show the proposed foundations are suitable for the intended purpose. This is what Gödel's concept of recursion is contributing to.

Gödel Numbers

When working on recursion, Gödel got a revolutionary insight. He noticed something useful about prime numbers.

I suppose you remember prime numbers from your high school arithmetic. But if you don't let's recapitulate. A prime number is a number greater than 1 that can be divided evenly only by 1 or itself. There is an infinite sequence of prime numbers: 2, 3, 5, 7, 11, 13 ... etc. Every number can be factored into the product of prime numbers. Here are a few examples.

12 = 2 2 3 1

3 1125 = 203253

When writing a prime number factorization it is customary to omit the multiplication symbols. They are implicit. 2231 actually means 22 x 31.

Look at the exponents. Sometimes you need to multiply the same prime number multiple times to get the result. 2231 actually means 2 x 2 x 3. An exponent of 1 means you multiply one times. An exponent of 0 means you don't need to multiply the prime number. 203253 actually means 3 x 3 x 5 x 5 x 5.

What if you assign numbers to an alphabet? For instance suppose you assign a=1, b=2 until z=26? Then you can use prime numbers to encode text into numbers. You can turn "Pamela" into something like 2p3a5m7e11l13a. Replace the letters with their corresponding numbers and the text becomes a number. You can also do the reverse. When you want to read the text you factor the number into primes and then turn the exponents back into letters. This works because there is only one way to factor a number into prime numbers; therefore a number can only translate into one text. There can be no ambiguity.

Numbers that are used to encode text in this manner are called Gödel numbers.

Ordinary text editing operations people can do with pencil and paper can also be defined using recursion. For example you can use arithmetic to put 2g3r5o7k and 2l3a5w together to make 2g3r5o7k11l13a17w. This corresponds to the use of pencil and paper to put "grok" and "law" together to make "groklaw". Arithmetic operations that manipulate the exponents of prime numbers are the mirror images of the symbolic manipulations on text.

This is time to go back to formal systems. Put this idea together with what we have just said on prime numbers. We get that formal systems themselves have a mirror image in arithmetic. This is the case for Peano arithmetic. There the logic goes:

All the symbols can be translated into numbers.

The effective methods that test when a mathematical proposition made of symbol is syntactically correct can be translated into a recursive function that makes the same test on exponents of prime numbers.

The effective method that tests when a proof is made according to the inference rules can also be translated into a recursive function that makes the same test on exponents on prime numbers.

All the axioms can be translated into exponents on prime numbers by translating their symbols.

Then the entire formal system has been translated into arithmetic23. This is called the "arithmetization of syntax". It is one of the greatest mathematical discoveries of the twentieth century. It has many far reaching consequences24, some of them are germane to patent law.

Symbols and Reducibility

Gödel numbers are evidence that the symbols of the mathematical language need not be visual patterns written on paper. They can also be something abstract like the exponents of prime numbers. Or if we take a more modern look, they can be 0s and 1s stored as electromagnetic signals in electronic devices. In all cases the symbols have the same meaning and when we use them to write mathematical statements they allow to write the same logical inferences. This observation raise more issues. What do we mean, "have the same meaning"? What do we mean "allow to write the same logical inferences"? This asks for a human intelligence to look at the symbols and make such determination. This is in apparent conflict with the principle of a formal system where the validity of mathematical proofs is verified without using human judgment calls.

Gödel's work brings a solution to this apparent conflict. Syntactic translations between the different representations of the formal system can be defined mathematically. Do you remember when Hilbert pointed out that formal systems may be subject to mathematical analysis? Here we are. Syntactic translations are such mathematical analysis. Once you have shown mathematically that there is a translation you can conclude that whatever the meaning is in the original it will be preserved in the translation. This is called reducibility25. We say syntax is reduced to arithmetic when a translation from the syntax to arithmetic is available.

Reducibility is a concept that should be important to patent law. Abstract ideas are not patentable subject matter. In order to prevent abstract ideas from being indirectly patented, there are exceptions in the law that will not allow patenting an "invention" made of printed text based on the content of the text. The implication of Gödel's discovery is that the symbols that represent an abstract idea need not be printed text. The symbols can be something abstract like exponents of prime numbers. They can be something physical like electromagnetic signals or photonic signals. Symbols can take a variety of forms26. Does the law exempt abstract ideas from patenting only when they are written in text or does it exempt all physical representations? In the latter case what is the test that can tell a physical representation of an abstract idea from a patentable physical invention? Mathematicians know of such a test and it is reducibility.

Enumerability

Gödel numbers are also used to define computable enumerability. This is a term of art for a mathematical concept of computation theory. Imagine a list of all numbers that have a specific properties. For example these lists are computable enumerable:

The even numbers: 0, 2, 4, 6 ...

The odd numbers: 1, 3, 5 ,7 ...

The prime numbers: 2, 3, 5, 7, 11 ...

Computable enumerability occurs when you can test with a recursive function whether the number has the property or not. You build your list by running all the numbers 0, 1, 2 in sequence and run them against the test. The first number that passes the test is the first on the list, The next number that passes the test is the next one etc.

Thanks to Gödel numbers, it is possible to define an arithmetical test for the syntax of the definition of a recursive functions. Then we can enumerate all the possible definitions. More specifically each recursive function corresponds to the Gödel number of its definition. What we enumerate is the Gödel numbers and this is the same thing as enumerating the definitions themselves because the Gödel numbers can be translated back into the corresponding text.

Now consider the Church-Turing thesis that computer programs are the same thing as recursive functions. This is not obvious now, but by the time we are done with this article all the mathematical evidence will have been presented to you. Would this be of consequence to the patentability of software? Developing a computer program that fulfills a specific purpose is mathematically the same thing as finding a number that has some specific properties. Can you patent the discovery of a number?

Another way to look at it is that there is a list of all the possible programs. We can make one by listing all possible programs in alphabetic order27. Any program that one may write is sitting in this list waiting to be written. In such a case, could a program be considered an invention in the sense of patent law? Or is it a discovery because the program is a preexisting mathematical abstraction that is merely discovered?

This idea that programs are enumerable numbers is supported by the hardware architecture of computers. Programs are bits in memory. Strings of 0s and 1s are numbers written in binary notations. Every program reads as a number when the bits are read in this manner. Programming a computer amounts to discovering a number that suits the programmer's purpose.

Alonzo Church's Lambda-Calculus

Lambda-calculus is the second definition of effective method that uses mathematical language, recursive functions being the first. I follow the plan of explaining each of these definitions one by one and now is the turn of lambda-calculus.

It is the brainchild of Alonzo Church. Like Gödel he was doing research in the foundations of mathematics but he used a very different angle. Instead of working on the arithmetic of positive integers he was trying to use the abstract concept of mathematical function as his foundation.

Functions occur in all areas of mathematics: arithmetic, geometry, boolean algebra, set theory etc. Church was concerned with the properties of functions that belonged to all types of functions independently of the branch of mathematic. When Peano arithmetic uses zero and the successor function as the most elementary concept, lambda-calculus gives this role to the operations of abstraction and application28.

What is abstraction? Do you remember in high school how shocking it was when the teacher showed you for the first time an algebraic formula that mixed up letters and numbers? How could you add a letter and a number like in x+7? For most students this concept is hard to grasp. People can understand 3+7 or 9+7 but x+7? What kind of number would be the result of such addition? The point that was hard to learn is abstraction. The point of writing x+7 is not to calculate an answer. The point is to define a pattern, the act of adding 7 to something. The pattern must be applicable to any number whatever it is. This is the purpose of using a letter. The formula doesn't want to refer to a specific number. The letter marks the place where a number belongs but this number is unknown and must not be specified. This finding of a pattern is abstraction.

Application is the reverse of abstraction. It is what you do when you know that x is the number 5 and that at last you can turn x+7 into 5+7.

To put it simply, abstractions are when you make functions by stating how they are calculated. Application is the actual use of the rule of the function.

The task Church set to himself is to develop a formal system based on abstraction and application. These two operations require the exercise of human intelligence. A formal system requires rules that can be followed without the use of human ingenuity. How do you reconcile the two? Church found a solution that involved breaking down abstraction and application into more elementary steps.

He handled abstraction with a syntactic trick. He required that the variable being abstracted is explicitly identified with the Greek letter λ. The expression x+7 would have to be written λx.x+7. Then you no longer need to use intelligence to know that it is an abstraction and that x is the marker for what is being abstracted. This information is written in the formula.

Application is expressed by juxtaposition. If you decide that x is 5 you would write (λx.x+7)5. Parentheses are used to group symbols when how they read may otherwise be ambiguous. This example means the abstraction x+7 is applied to the value 5. At this point the formal system would want you to look at the variable after the λ, in this example it is x, and you would replace it with the expression it is applied to, 5 in this example. Then (λx.x+7)5 is transformed into 5+7. This operation is called beta-reduction29. This is a fancy name Church has cooked up to mean what a modern word processor would call a search and replace operation.

This technique allows multiple levels of abstraction. For example consider the concept of addition. It can be expressed as an abstraction of both arguments λyλx.x+y. An expression like this would capture the very concept of addition independently of the values being added. Also each value can be applied separately. For example (λyλx.x+y)7 becomes after beta reduction λx.x+7. Functions like λyλx.x+y that produce another abstraction are called high order functions. A complete computation would require to apply beta reduction repeatedly until no abstractions are left. For example (λyλx.x+y)(7)(5) would turn into 5+7 after two beta reductions.

The combination of the λ notation and the beta reduction operation allowed Church to define his lambda-calculus. That is a formal system that captures the properties of abstraction and application. The repeated use of beta reduction until no abstraction is left is an effective method that is used to perform all computations in this system.

There is a twist. My examples used arithmetic operations because every one understands elementary arithmetic. This helps make the concepts clear. But the goal of Church was to study functions independently of any branch of mathematics. He did not include arithmetic operations in lambda-calculus30. Surprisingly this doesn't matter. Abstraction and application alone are sufficient to let you count. And then you can use recursion to reconstruct arithmetic from this basis.

Here is how you may represent numbers in lambda-calculus.

λsλx.x λsλx.s(x) λsλx.s(s(x)) λsλx.s(s(s(x)))

Remember Peano arithmetic when we identified the numbers by counting the apostrophes? Now we count the occurrences of s. The writing system is different but the idea is the same.

If you find yourself scratching your head at stuff like λsλx.s(x) wondering what it actually means this is normal. You need to be skilled at lambda calculus to make sense out of this kind of formula. I am not giving you a course in lambda calculus. There is no way you can learn how to read this language from the explanations I am giving you.

The point I am making is there is a translation from the concepts of 0 and successor numbers to other concepts that belong to lambda-calculus. The result of the translation is less important than the fact that there is a translation at all. The reason is that if there is a translation then mathematicians may invoke the principle of reducibility. The implication is that any computation31 that can be expressed in the language of Peano arithmetic can also be expressed with the language of lambda-calculus.

The existence of a translation is why mathematicians consider that recursive functions and lambda-calculus are equivalent. All computations that can be done by means of recursion can be translated into an equivalent computation made with lambda calculus. And conversely the language of lambda-calculus can be transformed into arithmetic by means of Gödel numbers. There is no computation that one of the languages can do that the other can't do.

There is a fundamental difference between Gödel numbers and this translation. Gödel numbers are translating the symbols of the formal language one by one. Therefore you have a mirror image of the text of the language hidden in the exponents of prime numbers. The translation into lambda-calculus is different. It translates not the symbols but the numbers themselves into functions. You have a mirror image of the numbers hidden into an enumeration of functions.

This technique of translation is called modeling. The mirror image in lambda-calculus of Peano arithmetic is called a model.

Imagine a geometric pattern like a circle. It remains a circle whether it is a wheel, a plate or a jar cover. You can see the pattern is the same by comparing the shapes. But how do you define "the same shape" mathematically? You don't eyeball the object and call them the same. You verify that all of these objects have a boundary where every point is at the same distance from the center. This is the mathematical definition of a circle and everything that meets the definition is a circle.

Now imaging the whole formal system is an abstract pattern made of relationships between formulas. The axioms and inference rules of the formal system are the equivalent of the definition of the circle. Everything that obeys these same rules is matching the definition of the pattern. The translation is proof that the model obeys the rules of Peano arithmetic. Therefore the abstract pattern that is characteristic of the arithmetic of positive integers is found within lambda-calculus32.

Models work a bit like a mock up. In the olden times, engineers designing a car would build a mock up before building a real car. The examination of the mock up helped them finding faults in their design before incurring the expense of making the real thing. The model of arithmetic is a mathematical mock up except that unlike the car mock up, the model has the full expressive power of the original.

The Church-Turing Thesis

Alonzo Church was so impressed by the power of Gödel numbers that he argued every effective method would be reproducible by some recursive function based on a Gödel number encoding of the effective method. This idea implies that recursive functions, lambda-calculus and effective methods are equivalent concepts.

This is known as the Church Thesis. It didn't convince everyone when it was first formulated. A more convincing formulation had to wait until the discovery of Turing machines. Nowadays a revised version of the Church Thesis that adds Turing machines to this list of equivalent concepts is accepted by mathematicians under the name of the Church-Turing thesis.

A Universal Algorithm

In Peano arithmetic each recursive function makes a different effective method. The rules to carry out the computation are provided by the definition. In lambda-calculus the situation is different. There is only one effective method that is used for all computations. This effective method is to keep doing beta reduction until there are no applications left that could be reduced.

Since lambda-calculus has the capability to perform every possible computation (in the extensional sense) then one algorithm is all one needs to perform every possible computation33. The difference between one computation and another is in the data that is supplied to the algorithm. This is one of the reasons we argue that software is data.

Lambda-calculus can be and has been implemented as a programming language. The language LISP is a prominent example.

This raises an interesting possibility. Suppose I am sued for infringement on a software patent. I may try defending myself arguing I did not implement the method that is covered by the patent. I implemented the lambda-calculus algorithm which is not covered by the patent. And even if it did, lambda-calculus is old. There is prior art. What happens to software patents then?

Anyone that argues in favor of method patents on software must be aware that the current state of technology permits this possibility.

Extensionality

Consider two of these idealized humans that are required to carry out effective methods. One of them is carrying out, say, a multiplication using Peano arithmetic. The other does the same multiplication using lambda-calculus. If we would compare their writings on paper, they will be different. The formal languages are not the same. the symbols are not the same and one follows the rules of recursion directly while the other follows them indirectly through repeated beta reduction. How can it be said the two computations are the same?

This is a very important question because it speaks to what constitutes "is the same" from a mathematician's perspective. For example when one argues software is mathematics, the argument is that the computation done by the computer is the same thing as a mathematical computation, therefore this is mathematics that is being done by the machine. Such argument can be understood only when one knows the mathematical meaning of "being the same" computation.

This question can be answered two different ways. There is the "intensional" definition and the "extensional" definition34.

The intensional definition considers the details of how the computation is done. If you change the details of the computation, from the intensional perspective you get a different computation even though the two computations always produce the same answer. By this perspective recursive functions and lambda-calculus are different.

The extensional definition ignores the details of the calculation and considers only the equivalence by means of reducibility. If the two calculations can be matched with a translation from one language to another, then they are the same from the extensional perspective. From the extensional perspective recursive functions and lambda-calculus are equivalent because every calculation made in one system is matched by a calculation made in the other system..

The law has a similar35 concept when it makes a difference between ideas and expressions of ideas. For example consider I write a program. Some outside party has written a similar program. This party sees my program and thinks I infringe on his proprietary rights. There are two scenarios depending on whether he makes his claim according to copyright law or patent law.

For claims of copyright infringement the text of the source code matters. If I wrote my program independently and I can prove the texts are different, I don't infringe on his rights. This is an intensional point of view.

For claims of patent infringement then differences in the text of the source code won't matter. It won't even matter if the code is written in a different programming language. If my program uses the same method that is covered by the patent according to whatever legal test of "same method" is applicable, I will infringe. This is an extensional perspective.

Which of the two perspectives is correct? It depends on the degree of abstraction that is wanted. The intensional perspective is the less abstract. It is the perspective you need when you study the properties of the written expressions of the symbols. It may be used to answer questions like how many steps are required to produce an answer or how much space in memory is required to store the symbols.

The extensional perspective is more abstract. It is the one you need when you are concerned with the expressive power of the language. It answers questions like: whether you can say the same things in both languages or how to tell when two expressions in different languages have the same meaning.

The intensional vs extensional dichotomy explains the situation of the C sorting program that is the same when run on different hardware architectures. When you compile the code, you produce different instructions depending on which hardware architecture is targeted. These difference are significant only when you look at the code from the intensional perspective. When you take the extensional perspective, all executables produce the same output. Therefore they are the same. The consequence is that software is abstract. The details of the instructions don't matter36.

Denotational Semantics

C is a programming language, not a mathematical language. How do we define its meaning mathematically? This is done using a technique called denotational semantics37.

For example consider the statement i=i+1 that could be written in many programming languages such as C. It looks like an equation but it is not an equation. Here the letter i is called a variable. Imagine a box with a label. The box contains information. The variable is the label for such a box. The statement i=i+1 means open the box labeled i, get the number that is in the box, add 1 to it and put it back in the same box. If there is no number in the box your program has a bug and it will fail. The question is what is the mathematical meaning of this operation.

Suppose you run your program in your favorite debugger38. You can stop the execution of the program just before the i=i+1 statement. Then you can use the debugger to inspect the memory of the computer. What do you see? There are many variables in use. Each of them is a labeled box containing some information. You may have a variable called "FirstName", another called "LastName", another one called "Salary" and one named "HiringDate". The memory is assigning each of these names a value. This is a mathematical function. It doesn't happen to be one you calculate because the values are explicitly stored and not computed with a formula or anything. It is a mathematical function nonetheless.

Now use the debugger to execute the i=i+1 statement and stop the execution again. What do you see in the memory? None of the variables have seen their values changed except i which has been incremented by one. The memory is a new mathematical function that is identical to the previous one except for the change in the value of i. This is the meaning of the statement. When you consider the content of the memory as a function, the statement is a high order function that changes this function into another function.

This is how denotational semantics works. Lambda-calculus is good at handling high order functions. You build a mathematical model of the memory content of the computer using lambda-calculus. Then each statement in the language can be translated into a high order function that manipulates this model. The result is a definition of the mathematical meaning of the program.

This programming statement is discussed in a brief to the Supreme Court of the United States by Professor Hollaar and IEEE-USA. When arguing that software is not mathematics, the brief states39:

But in most instances, the correspondence between computer programs and mathematics is merely cosmetic. For example, the equation E = MC2 expresses a relationship between energy and matter first noted by Einstein, while the computer program statement E = M * C ** 2 represents the calculation of M time C raised to the second power and then assigning the result to a storage location named E. It is unfortunate for purposes here that the early developers of programming language made their calculation-and-assignment statements look like mathematical equations so that they would seem familiar to scientists and engineers. But the common programming statement I = I + 1, which increments the value stored in location I, is essentially nonsense as a mathematical equation. Similarly, a computer program is a series of calculation-and-assignment statements that are processed sequentially, not a set of simultaneous mathematical equations that are solved for their variables.

This statement from the brief is incorrect. The correspondence between a statement in a high level language such as C and mathematics is not defined by syntactic similarity. It is defined by denotational semantic. There is another correspondence that involves machine instructions. This correspondence will be discussed when we reach the topic of Turing Machines.

Programming Directly from Mathematical Languages

If programming languages have a mathematical meaning what stops programmers from writing their programs using a mathematical language in the first place? The answer is it can be done. Languages such as LISP in the functional family of languages implement lambda-calculus and can be used to write programs using a mathematically defined notation. Programmers prefer languages in the imperative family40 of languages because the resulting programs is closer to the hardware architecture of the CPU and gives them more control on how the algorithm will execute. This situation may change. The trend is to increase the number of CPUs that are put on a chip. In presence of multiple CPUs imperative language becomes more difficult to use. The more CPUs, the harder it gets write a program that runs well on the computer. Functional languages are more amenable to work well in this kind of environment.

There is another approach. One may take advantage of the relationship between algorithms and logical proofs. Consider this parable.

This is 1963. It is the heart of the Cold War. The head of CIA rushes into the office of his main scientific advisor. CIA Head: We have received a report from James Bond. The Soviets have developed a new formula that makes our strategic defense obsolete. It will cost us billions to upgrade. But the report says it could be disinformation, that the new theory may be something phony to make us spend billions uselessly. We cannot afford to guess. We need to know. Can you use a computer to find out if the formula is true of false? Chief Scientist: Sure I do. I can write a program that prints "True" whatever the input is and another one that prints "False" whatever the input is. One of the two programs is bound to provide the correct answer. Therefore I can use a computer to print the solution you seek.

There are two major schools of mathematicians. The classical school would say the answer of the chief scientist is logically correct although inappropriate. One of the two programs does indeed print the correct answer and the chief scientist can indeed write it. Therefore the chief scientist has literally answered the question that has been asked.

The intuitionistic school will protest that the chief scientist's answer is illogical. You can't argue you can write the requested program until you can tell which program is the one you seek.

The parable has been designed to make it sounds like the intuitionistic point of view is the correct one. In this context the whole point of the question is to know which of the two programs provides the answer. I wanted to show that intuitionistic logic has merits. Such logic requires building a tie between a proof that a problem is solvable and the construction of an effective method that actually solves the problem. Their point is that if you can't actually solve the problem, how can you argue a solution exists?41

If you work out this tie using the language of lambda-calculus you get something called the Curry-Howard correspondence. It is a mathematical translation that takes a proof and turn it into an algorithm in the language of lambda-calculus or vice-versa. The implication is you can write a mathematical proof and use an automatic process to extract the built-in algorithm and compile it into an executable program. Or conversely you write a program and the compiler verifies that it is bug-free by extracting a proof of its correctness from the code or reporting an error if the proof doesn't work out. This is a topic for research. The mathematical principles are known but their application is not ready for use outside of the research laboratory yet.42 Despite this limitation some specific programs that have been verified in this manner have been developed.43

A question for patent law is what do you do when this research comes to fruition? At this point there will be no actual difference between the work of computer programmer and the work of a mathematician. How would you patent software without directly patenting mathematics?

The merging of programming and mathematical proofs has industrial applications. [See Campbell for an example.] If you can prove a program is mathematically correct you verify that its logic will perform according to the intent of the programmer. This eliminates a large quantity of bugs.44

Several formal verification methods exist outside of the Curry-Howard correspondence. One of the most popular is called Hoare logic45. It is a formal system where every statement in a programming language corresponds to a rule of inference. Programmers annotate their code with mathematical propositions and verify that the rules of inferences are obeyed. If they do, the program is proven mathematically correct.

Alan Turing's Turing Machines

Turing machines are the third of the major mathematical definitions of computable functions. They were proposed specifically to resolve an open question with the first two definitions. Alonzo Church put forth the thesis that Gödel numbers were powerful enough to reproduce every effective method that could be. But at the time the evidence was intuition. It looked like Gödel numbers have that power but how do you prove it? Alan Turing's work brought convincing evidence.

The definition of effective method requires a human being doing the work with pencil and paper. Turing proceeded from an analysis of what such a person is able to do. He eliminated all redundant possibilities until he was left with what is essential.46 Let's look at this elimination process.

Do we need to write things in columns or otherwise arrange symbols in two dimensions? This is not necessary. One can do the same work by writing everything on the same line and visualizing how it fits two-dimensionally. It is inconvenient, but since the task is to find out the essential minimum, writing things in two dimensions can be done away with. This suggests that regular sheet paper is not necessary. The paper may be in the form of a long tape where symbols are written sequentially. The human winds the tape back and forth as he works with the information.

How long the tape must be? Our idealized human who carries out the effective method never runs out of time and paper. The tape has to be infinitely long to reflect this capacity.

A human often needs to refer to information already written. How fast does he need to shuffle paper to find out what he wants? Moving the tape one symbol at the time is sufficient. If you need to go farther away you can repeat moving the tape one symbol until you get where you want to be.

When reading the information, how many symbols do you need to look at at the time? The answer is one. If you need to look at more symbols you can examine them one by one.

When faced with making a choice, on what basis would our person make his decision? Choices are made on the basis of the symbols being read and on his state of mind. He must track where he is in the execution of the effective method. This tracking is done by changing his state of mind. Then, when faced with a choice, the human will consider his state of mind and the symbol being read and make the next move as is required by the method. This implies we don't need to delve into human psychology or the biology of the brain. All we need to know is (1) that there are states of mind and (2) that the human changes his states as he tracks where he is in the execution of the effective methods.

How many states of mind are required? Only a finite number is required. They can be inventoried by analyzing the rules of the method and finding the junctures where choices are made. This means that if we write down the inventory of states of mind we may represent them with a finite number of symbols.

These considerations are summarized in the list of capabilities below. They are the minimal requirements to perform any effective method.

There is an infinitely long tape where symbols can be recorded sequentially.

The tape has a "current" position where a symbol can be read. This is the current symbol.

A symbol may be written at the current position overwriting any symbol already there.

The tape may be moved one symbol position either to the left or to the right. It may also stand still.

The human carrying out the method has a number of distinct possible states of mind.

The states of mind are finite in number and may be represented with a finite number of symbols.

The human changes his state of mind based on the previous state and the current symbol on the tape.

The implication is that it is possible to describe every effective method in these terms. The rules for carrying out the method are written as a list of quintuples. This is a list of line items. Each of them is made of five elements of information.

The current state of mind that is required for the rule to apply. The current symbol that is required for the rule to apply. The symbol that is written on the type when the rule applies. If this is the same symbol as the one mentioned in item 2, then this is a do-nothing operation. Whether the tape moves right, left or stands still after writing the symbol if the rule applies. The state of mind one must adopt after the application of the rule.

This process is transforming the broad scope of effective methods into something that has a standardized form. You analyze the original method rules, translate them into a suitable list of quintuples, and you are all set.

Now imagine that our idealized human carrying out the method chooses to use an idealized typewriter instead of pencil and paper. The typewriter writes on an infinite tape that can be moved left or right one symbol at a time at the press of a key. The user may see the current symbol or overwrite it with a new symbol by the press of a key. There is a blackboard on the wall where the list of quintuples is written. The user uses his current state of mind and the current symbol to select the applicable quintuple, and then he performs the actions dictated by the quintuple. He repeats this process until it is over. This human is doing the same calculation as if he were carrying out the original effective method.

Now imagine you could endow the typewriter with the ability to read the symbols and track its internal state by itself. The quintuples could be mechanized, allowing the typewriter to carry out the calculation all by itself, without the help of an actual human. What you get when you do this is a Turing machine.

The Evidence for the Church-Turing Thesis

Now we have closed the loop. Effective methods are reduced to Turing machines. But by definition a Turing machine is an effective method because the human typing on the keyboard of the idealized typewriter might just as well use pencil and paper. Therefore the equivalence between effective methods and Turing machines is a two-way street.

Turing machines can be translated into recursive arithmetic with Gödel numbers. Recursive arithmetic can be translated into lambda-calculus with the method we already saw. Turing found a set of rules for a Turing machine that can do the beta reduction algorithm of lambda-calculus. The cycle is complete. All three definitions of computable functions may be reduced to each other. The argument in favor of the Church-Turing thesis is now complete.

Quite a lot of evidence of the Church-Turing Thesis has been found beyond what has been presented here. An article on the Church-Turing Thesis on the Turing Archive summarizes it as follow:

Much evidence has been amassed for the 'working hypothesis' proposed by Church and Turing in 1936. Perhaps the fullest survey is to be found in chapters 12 and 13 of Kleene (1952). In summary: (1) Every effectively calculable function that has been investigated in this respect has turned out to be computable by Turing machine. (2) All known methods or operations for obtaining new effectively calculable functions from given effectively calculable functions are paralleled by methods for constructing new Turing machines from given Turing machines. (3) All attempts to give an exact analysis of the intuitive notion of an effectively calculable function have turned out to be equivalent in the sense that each analysis offered has been proved to pick out the same class of functions, namely those that are computable by Turing machine. Because of the diversity of the various analyses, (3) is generally considered to be particularly strong evidence. Apart from the analyses already mentioned in terms of lambda-definability and recursiveness, there are analyses in terms of register machines (Shepherdson and Sturgis 1963), Post's canonical and normal systems (Post 1943, 1946), combinatory definability (Schonfinkel 1924, Curry 1929, 1930, 1932), Markov algorithms (Markov 1960), and Godel's notion of reckonability (Godel 1936, Kleene 1952).

This same article also makes a point of what the Church-Turing thesis does not say. The thesis says Turing machines are equivalent to computations made by human using effective methods. It doesn't say they are equivalent to any possible computation made by a machine. The reason is that the idealized typewriter argument doesn't discuss equivalence with other machines, and therefore we don't have a basis to reach such a conclusion.

The article provides a list of published articles where mathematicians have constructed abstract computing machines that are not equivalent to Turing machines. These calculations cannot be done by a human and are not effective methods. Nobody has constructed a physical device that implements any of these machines. It is unknown whether you can build a physical computing device that can perform a computation a Turing machine can't do.

The Working Principle of the Modern Computer

What if we design an effective method that directs the idealized human to read the quintuples of a Turing machine and do as they tell him to do? What if we make a Turing machine out of this method? You get a universal Turing machine. Here "universal" means that it can do the work of all other Turing machines.

There are two lists of quintuples here. The universal Turing machine uses a list of quintuples to define its activity like any other Turing machine. If you supply this machine with a tape where the quintuples of another machine are written, the universal machine will emulate the behavior of the other machine.

This is a major discovery in the history of computing as the Stanford Encyclopedia of Computing points out:

Turing's construction of a universal machine gives the most fundamental insight into computation: one machine can run any program whatsoever. No matter what computational tasks we may need to perform in the future, a single machine can perform them all. This is the insight that makes it feasible to build and sell computers. One computer can run any program. We don't need to buy a new computer every time we have a new problem to solve. Of course, in the age of personal computers, this fact is such a basic assumption that it may be difficult to step back and appreciate it.

This is the point where computation theory stopped being the exclusive province of mathematicians and became relevant to engineers. The question was how do you build an actual universal Turing machine?47 Today we know the answer. We use electronics. The tape corresponds to the computer memory. The table of quintuples corresponds to the CPU instruction set. The CPU itself has the ability to change states. The Turing machine you want to emulate is programming data that you load in memory for execution. When we assemble the pieces, we have a modern digital computer.

When making an actual computer, engineers had to cross the line that separates the abstract world of the mathematician from reality. The Turing machine is an abstraction suitable to provide a foundation. Do you remember what we said about foundations when discussion Peano arithmetic? They are not meant to be used directly. You need to build an edifice on top of them to obtain a usable result. The engineers had to build the edifice.

Moving a tape one symbol at the time is inefficient. You need to be able to get the information directly. You need to access memory randomly.

There are many types of storage devices with different characteristics: RAM, ROM, Flash, magnetic disks. optical disks etc. You need to mix and match them as required. Where the Turing machine needed one storage device, the tape, in real life computer need many.

The Turing machine can do only one operation: it writes one symbol on the tape based on the appropriate quintuple. This is too rudimentary for real-life computing. An actual computer needs an elaborate instruction set.

You get the idea. The Turing machines provided the starting point but engineers had to elaborate on the concept. The result is what is now known as the Von Neuman architecture.48

Why do we still think computers are Turing machines after these engineering changes? The answer depends on if you look at an intensional definition or an extensional definition. Mathematicians didn't stop working and reducibility didn't go away. They studied the mathematical effect of changes in the architecture of the Turing machine.49 What if the machine has more than one tape? What if the tapes are replaced with random access memory? The answer is these machines can be reduced to Turing machines. You don't bring into existence new computations by doing these changes. From an extensional perspective the modified machines perform the same computations as a plain Turing machine.

There is one difference that has an effect. The idealized human never runs out of time and paper. This is why the Turing machine has an infinite tape. But in the real world, there are physical limits. Memory is not infinite, and time is limited. This restricts the computations that are possible. A real-life computer is an approximation of a Turing machine. Whether or not the calculations it carries out corresponds to one performed by a Turing machine depends on whether or not this calculation will fit within the physical constraints. If it doesn't fit, the calculation will not be carried out to completion even though the Turing machine has the theoretical ability to do it. But when a computation fits within the constraints, it is always the same as one that is done by a Turing machine.

The Nature of Software

Let's recapitulate. A universal Turing machine is one that, when given the quintuples of another Turing machine, will do as instructed. This principle is implemented in an actual computer as follows:

The tape of a Turing machine corresponds to the memory of a computer.

The read/write capability of a Turing machine corresponds to the memory bus of the computer.

The changing states of the Turing machine corresponds to the changing states in the CPU electronics.

The quintuples of a universal Turing machine correspond to the CPU electronics.

The quintuples of the other Turing machine correspond to the computer program.

The last bullet is an important one for patent law. It says software is the information you need to give to the universal Turing machine to make it perform the desired computation. This simple fact has consequences.50

Software is data. This is because information is synonymous with data. When you look at a physical computer, software is stored in a data file like any other form of data. It is loaded in memory and stored in exactly the same chips of RAM as data. The CPU manipulates the software with the same instructions as it uses to manipulate other forms of data. The fact that software is data is the mathematical principle that makes it possible to build and sell modern digital computers.

Software is abstract. This is a consequence of how Turing machines are constructed. A computer program is equivalent to the work of an idealized human being carrying out an effective method. By definition this work is an abstraction written down on paper. Another way to look at it is that a computer program is equivalent to some Turing machine. If we apply the discoveries of Alan Turing, we never need to physically build this Turing machine. It is sufficient to find a description of it and pass it to a universal Turing machine. The only device we need to actually build is the universal Turing machine. The computer is the universal Turing machine. The software is a description of an abstract machine that is never physically implemented.

Software is mathematics. This is because all the instructions the CPU is able to execute are mathematical operations. Physical computers are implementations of universal Turing machines. Universal Turing machines are part of mathematics. Therefore, whatever they do is mathematics. Another way to look at it is software is the description of some abstract Turing machine. Since all Turing machines are part of mathematics software is mathematics

Software is discovered as opposed to invented. This is a consequence of the fact that software is abstract and software is mathematics. You can't invent abstract ideas and mathematics. You can only discover them.

What is Not Computable?

Computable functions are defined through an exercise in reducibility. Whatever is reducible to a Turing machine is computable. The flip side of the question is, what cannot be reduced to a Turing machine?

Alan Turing discussed this question in his doctoral thesis.51 He studied what he called an oracle machine, which is a Turing machine augmented with the capability to ask a question to an oracle when the computation reaches a certain point. Then the oracle answers the question and the computation resumes, taking this information into consideration.

The oracle could be anything.52 For example, imagine our idealized human typing on his idealized typewriter doing calculations about sunspots. Every now and then, he picks up the phone and asks an astronomer the percentage of the surface of the sun that is currently covered with sunspots. The astronomer looks into the telescope and provides the answer. Then our idealized human resumes typing on the idealized typewriter carrying out the rest of the computation. The astronomer is the oracle. Observing the sun is not a computation even if the result of the measurement is a number. Therefore the whole process including the astronomical observation is not something that could be done by a Turing machine alone.

A Turing oracle machine brings the concept of computations relative to the oracle. This means part of the process is a computation and part is not. It is possible to draw a line between the two by sorting out what is the oracle and what is the Turing machine.

Consider that instead of a human being typing at the typewriter, we have a computer that interfaces directly to the telescope. It takes photographs of the sun and computes the surface of the sunspots from the photos. This computer is doing mathematical calculations relative to the photographs, but taking the picture is not a mathematical operation. This kind of integration with non-computational devices is where the boundaries of computation theory lie.

This very issue has been contemplated by the Supreme Court of the United States in two celebrated cases, Parker v. Flook and Diamond v. Diehr.

In Parker v. Flook the court ruled on how mathematical algorithms should be treated by patent law:

Respondent's process is unpatentable under § 101 not because it contains a mathematical algorithm as one component, but because once that algorithm is assumed to be within the prior art, the application, considered as a whole, contains no patentable invention. Even though a phenomenon of nature or mathematical formula may be well known, an inventive application of the principle may be patented. Conversely, the discovery of such a phenomenon cannot support a patent unless there is some other inventive concept in its application.

In Diamond v. Diehr the court looked at the flip side of the situation:

In contrast, the respondents here do not seek to patent a mathematical formula. Instead, they seek patent protection for a process of curing synthetic rubber. Their process admittedly employs a well-known mathematical equation, but they do not seek to preempt the use of that equation. Rather, they seek only to foreclose from others the use of that equation in conjunction with all of the other steps in their claimed process. These include installing rubber in a press, closing the mold, constantly determining the temperature of the mold, constantly recalculating the appropriate cure time through the use of the formula and a digital computer, and automatically opening the press at the proper time. Obviously, one does not need a "computer" to cure natural or synthetic rubber, but if the computer use incorporated in the process patent significantly lessens the possibility of "overcuring" or "undercuring," the process as a whole does not thereby become unpatentable subject matter.

Why Lawyers Need to Know Computation Theory

What is a mathematical formula when implemented in digital electronics? What is an application of the formula as opposed to the formula itself? These questions have confounded lawyers for decades for good reason. These are hard questions. But they are the questions they have to answer when software patents are discussed.

For this type of question, guidance from mathematicians is appropriate. This is why patent lawyers should learn computation theory. This is especially important in software because the relevant mathematical concepts are hard. It took decades for the best mathematicians of the first half of the twentieth century to figure them out. Their findings are among the greatest intellectual achievements of their time. It is appropriate to stand on the shoulders of giants because this is not the sort of thing anyone can figure out by himself merely by pondering dictionary definitions and thinking things through. But many courts have been trying to do just that in software patents cases. For example consider this sentence from In re Alappat:

We have held that such programming creates a new machine, because a general purpose computer in effect becomes a special purpose computer once it is programmed to perform particular functions pursuant to instructions from program software.

In a single sentence the court tosses out the fundamental principle that makes it possible to build and sell digital computers. You don't need to create a new machine every time you perform a different computation; a single machine has the capability to perform all computations. This is what universal Turing machines are doing.

In the 1940s, the ENIAC used to be programmed by physically reconnecting circuits. This design was awkward. Engineers took pains to eliminate this need and designed the next generations of computers by using the ideas of universal Turing machines as guidance. The explicit goal was to make sure you don't need to make new circuitry, much less make a new machine, when you program the computer. This history is documented on the Stanford Encyclopedia of Philosophy.53 See also this article on Groklaw.

How could the court make such a holding? It is because it didn't know about computation theory. And I suspect it didn't know because no one ever put an explanation of computation theory into the court record.

Let's look at another example. This one is from Application of Walter D. BERNHART and William A. Fetter. It discusses a patent on a software algorithm to plot lines based on Cartesian coordinates. In this case, several issues where computation theory is relevant show very clearly. The first one is about determining what is a method made of mental steps:

Looking first at the apparatus claims, we see no recitation therein of mental steps, nor of any element requiring or even permitting the incorporation of human faculties in the apparatus. These claims recite, and can be infringed only by, a digital computer in a certain physical condition, i. e., electro-mechanically set or programmed to carry out the recited routine. The claims also define the invention as having plotting means for drawing lines or for illustrating an object. When such functional language is used in a claim, 35 U.S.C. § 112 states that "such claim shall be construed to cover the corresponding structure, material, or acts described in the specification and equivalents thereof." The specification here mentions only mechanical drafting machines. The claims therefore cover, under section 112, only such mechanical drafting machines and their equivalents. We know of no authority for holding that a human being, such as a draftsman, could ever be the equivalent of a machine disclosed in a patent application, and we are not prepared to so hold in this case. Accordingly, we think it clear that applicants have not defined as their invention anything in which the human mind could be used as a component. ... In the case now before us, the disclosure shows only machinery for carrying out the portrayal process. In fact it is the chief object of the invention to eliminate the drudgery involved in a draftsman's making the desired portrayals. Accordingly, a statutory process is here disclosed. Looking then to method claim 13, we find that it in no way covers any mental steps but requires both a "digital computer" and a "planar plotting apparatus" to carry it out. To find that the claimed process could be done mentally would require us to hold that a human mind is a digital computer or its equivalent, and that a draftsman is a planar plotting apparatus or its equivalent. On the facts of this case we are unwilling so to hold. We conclude that the method defined by claim 13 is statutory, and its patentability must be judged in light of the prior art.

This Bernhart case is really fascinating. It is a concentrate of some of the hardest issues where computation theory is relevant. Here is another extract that speaks to what constitutes a symbol:

Nor are the "printed matter" cases, cited by the board, supra, controlling as to these apparatus claims either on the facts or in principle. On their facts, those cases dealt with claims defining as the invention certain novel arrangements of printed lines or characters, useful and intelligible only to the human mind. Here the invention as defined by the claims requires that the information be processed not by the mind but by a machine, the computer, and that the drawing be done not by a draftsman but by a plotting machine. Those "printed matter" cases therefore have no factual relevance here.

Let's take this from another angle. What kind of files contain instructions that cause the computer to perform calculations? Here are a few examples. This list is by no means complete:

Executable binaries: these files contain instructions that are executed directly by the CPU. These files are the result of the compilation of source code. When legal cases discuss the programming of a computer, they think almost exclusively of these files. I never see an analysis of how the case reasoning would extend to other kinds of instructions.

Interpretable code: these files contain programs in human-readable form that may also be executed directly by the computer without an intermediate compilation step. This requires a special program called an interpreter that reads the code and executes the instruction. Several programming languages are interpreted. This court didn't consider the possibility that the patented algorithm could be implemented in such a language. Would such code infringe? But in this case it is an invention made of a particular arrangement of characters that is intelligible to the human mind. Would this change the assessment of the court regarding the factual relevance of the "printed matter" cases?

PDF files: these files contain instructions to be executed by a virtual machine that are specialized to the visual rendering of written documents. How do the printed matter court cases apply to patents written to algorithms made of the execution of such instructions?

The distinction between symbols readable by humans and machine-readable information is artificial. Computation theory teaches us that symbols remain symbols whether they are ink on paper or recorded in electronic form. In both cases they can be used to implement the same abstract processes.

Let's take this same idea from still another angle. Suppose a court determines that software in its executable binary form is patentable because it is turned into something physical. This is an argument we see frequently. For purpose of this discussion it doesn't matter if the court thinks the physical something is a machine, a process or whatnot. The question is what happens to the patentability of code in forms other than executable binaries?

The court doesn't want to patent data like a novel or music. Therefore the physical something that makes binaries patentable will not be sufficient to make data patentable.

The court doesn't want to patent textual data in a PDF file no matter how novel and nonobvious the text may be. Such files are made of executable instructions. Therefore it must take more than the presence of novel and non obvious instructions that are executed to make something patentable.

How about patenting source code in textual form? This will raise free speech issues. How can you discuss code if you can't write the text without infringing? The text of source code should not be patentable.

How about code written in an interpreted language? This code is never compiled into executable binaries. the only executable binary is the interpreter. This is the same code that runs for all programs. It does not implement the patented algorithm and by itself it doesn't infringe. The interpreter implements the language. The text of the source code (which is not patentable) is given to the interpreter. The interpreter reads the text and does what it is told. All data is manipulated by the interpreter on behalf of the program. On what basis should this code be patentable? The physical something that is the basis of patentability never comes into existence. The interpreter doesn't infringe. The data is not patentable. The source code is not patentable. Executing instructions doesn't make patentable something that isn't.

This analysis is meant to communicate that software is about manipulating symbols. It is not about physical circuitry or processes. Whatever perceived circuitry is used to claim software is patentable, programmers have the means to make an end-run around this alleged circuitry. It is possible to arrange the manipulation of the symbols in such a way that the circuitry that is the alleged basis of patenting software never comes into existence.

We have already encountered a theme like this when we discussed the beta reduction algorithm of lambda-calculus. One algorithm can do the work of every other algorithm. We can make sure the method patents are never infringed because the claimed methods are never physically implemented. Interpreted languages are built on this basis.

From the same Benrhart case, there is a section that is reminiscent of the "software makes a new machine" from Alappat:

There is one further rationale used by both the board and the examiner, namely, that the provision of new signals to be stored by the computer does not make it a new machine, i. e. it is structurally the same, no matter how new, useful and unobvious the result. This rationale really goes more to novelty than to statutory subject matter but it appears to be at the heart of the present controversy. To this question we say that if a machine is programmed in a certain new and unobvious way, it is physically different from the machine without that program; its memory elements are differently arranged. The fact that these physical changes are invisible to the eye should not tempt us to conclude that the machine has not been changed. If a new machine has not been invented, certainly a "new and useful improvement" of the unprogrammed machine has been, and Congress has said in 35 U.S.C. § 101 that such improvements are statutory subject matter for a patent. It may well be that the vast majority of newly programmed machines are obvious to those skilled in the art and hence unpatentable under 35 U.S.C. § 103. We are concluding here that such machines are statutory under 35 U.S.C. § 101, and that claims defining them must be judged for patentability in light of the prior art.

Machines have moving parts. Moving the moving parts as they are intended to be moved is not an invention. What are the moving parts of a computer? This information is in the definition of the Turing machine. The moving parts include the ability to read and write symbols in memory. More, the operating principle of the computer requires that programs are stored in memory and are modifiable by the computer as the execution of the program progresses. The changes to the memory are not an improvement of the computer. They are the operation of the computer as it is intended to be operated. The knowledge of the Turing machine enables one to make this determination.

What if we hold the other view? Suppose we agree with this court that writing something in memory is a patentable improvement of the computer? Then we get absurdities. Modern computers change the state of their memory billions of times per second. Do we have billions of improvements per second, each of them being patentable subject matter? How does one perform due diligence to avoid patent infringement then?

Looking only at memory changes that concern programming instructions doesn't help much. Such instruction is data held in memory and may be changed as fast as the speed of memory permits.

Let's take a look at some of the instances where the actions of a user may cause a new program to be loaded in memory.

You click on a menu item.

You visit a web site and Javascript is embedded in the page.

You visit a web site and a Flash or Java applet is downloaded.

Your operating system downloads a security patch.

You use the add/remove application menu option of your Ubuntu system to get application from the web.

You open a PDF file. The PDF document is a series of instructions written for execution by a virtual machine.

You open a text document that contains macros.

You open a spreadsheet that contains formulas. Solving formulas in a spreadsheet counts as an algorithm, doesn't it?

You use a program that generates and executes (or interprets) code on the fly. This includes many circumstances when your program accesses a relational database using SQL, because this language does not specify the algorithm in the code. The algorithm is determined dynamically by the language interpreter. Since SQL is the dominant standard for databases this situation covers most programs that use databases.

From a user perspective these activities are the normal operation of the computer. It is not reasonable to ask the users to conduct a patent search for due diligence against infringement every time they perform one of these actions. This cannot be the correct interpretation of patent law.

Software is Mathematics

The preceding section was a series of case studies explaining why lawyers should know computation theory. Now it is time to return to the questions that started this section. What is a mathematical formula when implemented in digital electronics? What is an application of the formula as opposed to the formula itself? We already know the answer. Software is mathematics. Everything that is executed by a CPU is mathematics. The boundaries of mathematics lie in the interaction of the computer with the outside world.

This idea is a hard one to understand unless you know computation theory. When you don't know computation theory, all sorts of misconceptions occur. For example patent attorney Gene Quinn has written an argument against software being mathematics. Computation theory is conspicuous by its absence in this article. He is an eloquent man. Someone who doesn't know computation theory will likely be convinced. If you bring computation theory into the picture, however, his argument falls apart.

I think this quote summarizes the core of his position:

I think the disagreement between me and my critics is largely due to the way patent law views mathematics, which seems admittedly quite different from the way that many computer programmers view mathematics. It is impossible to argue that software code does not employ mathematical influences, because it does. I think Dijkstra's explanation that a "programmer applies mathematical techniques" is completely true and accurate. Having said this, the fact that mathematical techniques are employed does not as a matter of fact mean that software is mathematical. Under the US patent laws you cannot receive a patent that covers a mathematical equation or a law of nature. You can certainly use mathematical equations and laws of nature as the building blocks to create something that is new and nonobvious that is patentable. So even if software used mathematical equations there would be no prohibition against the patenting of software under a true and correct reading of the US patent laws.

The error is that the mathematics of computation theory is not the mathematics Quinn is pointing to. You can't understand what is meant by "software is mathematics" unless you understand computation theory.

What does it take for the law to find as a matter of fact that software is mathematical? Do the opinions of mathematicians and computer scientists trained in mathematics counts? Or only lawyers?

Since Quinn mentioned Dijkstra, let's explain what are the mathematical methods he advocated.54 He wanted the programmers to use Hoare logic to formally prove the programs before they are compiled and run for the first time. As we have said, Hoare logic is a formal system wrapped around an imperative programming language. Here is what Hoare himself says in the introduction of the seminal paper where he first proposed his logic:

Computer programming is an exact science in that all the properties of a program and all the consequences of executing it in any given environment can, in principle, be found 