Ocean of Awareness

Mon, 28 May 2018

But to my mind, though I am native here

And to the manner born, it is a custom

More honor’d in the breach than the observance.

Chomsky's "Three Models" paper

Important papers produce important mistakes. A paper can contain a great many errors, and they will have no effect if the paper is ignored. On the other hand, even the good methods of a great paper can go badly wrong when its methods outlive the reasons for using them.

Chomsky's "Three Models" paper is about as influential as a paper can get. Just 12 pages, it's the paper in which the most-cited scholar of our time first outlined his ideas. Even at the time, linguists described its effect on their field as "Copernician". Bringing new rigor into what had been seen as a "soft" science, it turned lots of heads outside linguistics. It belongs on anyone's list of the most important scientific papers ever.

Given its significance, it is almost incidental that "Three models" is also the foundation paper of computer Parsing Theory, the subject of these blog posts. Chomsky does not consider himself a computer scientist and, after founding our field, has paid little attention to it. But in fact, the Chomskyan model has been even more dominant in computer parsing than in Chomsky's own field of linguistics.

"Three Models" places Chomksy among the great mathematicians of all time. True, the elegance and rigor of Chomsky's proofs better befit a slumming linguist than they would a professional mathematician. But at its heart, mathematics is not a technical field, or even about problem-solving -- at its most fundamental, it is about framing problems so that they can be solved. And Chomsky's skill at framing problems is astonishing.

A brilliant simplification

In 1956, Chomsky had a new approach to linguistics, and wanted to prove that his approach to language did things that the previous approach, based on finite-state models, could not. ("Finite-state" models, also known as Markov chains, are the predecessors of the regular expressions of today.) Brilliantly, Chomsky sets out to do this with extremely minimal definition of what a language is.

By a language then, we shall mean a set (finite or infinite) of sentences, each of finite length, all constructed from a finite alphabet of sysbols. If A is an alphabet, we shall say that anything formed by concatenating the symbols of A is a string in A. By a grammar of the language L we mean a device of some sort that produces all of the strings that are sentences of L and only these.

Yes, you read that right -- Chomsky uses a definition of language which has nothing to do with language actually meaning anything. A language, for the purposes of the math in "Three Models", is nothing but a list of strings. Similarly, a grammar is just something that enumerates those strings. The grammar does not have to provide any clue as to what the strings might mean.

For example, Chomsky would require of a French grammar that one of the strings that it lists be

(42) Ceci n'est pas une phrase vraie.

But for the purposes of his demonstration, Chomsky does not require of his "grammar" that it give us any guidance as to what sentence (42) might mean.

Chomsky shows that there are English sentences that his "grammar" would list, and which a finite-state "grammar" would not list. Clearly if the finite-state grammar cannot even produce a sentence as one of a list, it is not adequate as a model of that language, at least as far as that sentence goes.

Other considerations

Chomsky shows that there is, in fact, a large, significant class of sentences that his "grammars" can list, but which the finite-state grammars cannot list. Chomsky presents this as very strong evidence that his grammars will make better models of language than finite-state grammars can.

In addition to simplifying the math, Chomsky has two other good reasons to avoid dealing with meaning. A second reason is that semantics is a treacherously dangerous field of study. If you can make your point, and don't have to drag in semantics, you are crazy to do otherwise. Sentence (42), above, is just one example of the pitfalls that await those tackle who semantic issues. It echoes a famous Magritte and translates to "This is not a true sentence".

A third reason is that most linguists of Chomsky's time were Bloomfieldians. Bloomfield defined language as follows:

The totality of utterances that can be made in a speech community is the language of that speech-community.

Bloomfield says "totality" instead of "set" and "utterances" instead of "strings", but for our purposes in this post the idea is the same -- the definition is without regard to the meaning of the members of the set.

Bloomfield's omission of semantics is not accidental. Bloomfield wanted to establish linguistics as a science, and for Bloomfield claiming to know the meaning of a sentence was dangerously close to claiming to be able to read minds. You cannot base your work on mind-reading and expect people to believe that you are doing science. Bloomfield therefore suggested avoiding, totally if possible, any discussion of semantics. Most readers of Chomsky's paper in 1956 were Bloomfieldians -- Chomsky has studied under a Bloomfieldian, and originally was seen as one. By excluding semantics from his own model of language, Chomsky was making his paper maximally acceptable to his readership.

Semantics sneaks back in

But you did not have to read Chomsky's mind, or predict the future, to see that Chomsky was a lot more interested in semantics than Bloomfield was. Already in "Three Models", he is suggesting that his model is superior to its predecessors, because his model, when an utterance is ambiguous, produces multiple derivations to reflect that. Even better, these multiple derivations "look" like natural representations of the difference between meanings. These insights, which dropped effortlessly out of Chomsky's grammars, were well beyond what the finite-state models were providing.

By itself, Chomsky's argument, that his grammars were better because they could list more sentences, might have carried the day. With the demonstration that his grammars could do more than list sentences, but also could proivde insight into the structure and semantics of sentences, Chomsky's case was compelling. Young linguists wanted theoretical tools with this kind of power and those few older linguists not convinced struggled to find reasons why the young linguists could not have what they wanted.

In later years, Chomsky made it quite clear what his position was:

[...] it would be absurd to develop a general syntactic theory without assigning an absolutely crucial role to semantic considerations, since obviously the necessity to support semantic interpretation is one of the primary requirements that the structures generated by the syntactic component of a grammar must meet.

The statement of meanings is therefore the weak point in language-study, and will remain so until human knowledge advances very far beyond its present state. In practice, we define the meaning of a linguistic form, wherever we can, in terms of some other science.

Compare this to Bloomfield:It is easy to see why linguists found Chomsky's expansion of their horizons irresistable.

The tradition

Given the immense prestige of "Three models", it is unsurprising that it was closely studied by the pioneers of parsing theory. Unfortunately, what they picked up was not Chomsky's emphasis on the overriding importance of semantics, but the narrow definition of language that Chomsky had adopted from Bloomfield for tactical purposes. In the classic Aho and Ullman 1972 textbook, we have

A language over an alphabet Σ is a set of strings over an alphabet Σ. This definition encompasses almost everyone's notion of a language.

If this "encompasses" my notion of a language, it does so only in the sense that an avalanche encompasses a skier.

From 1988, thirty years after Chomsky, here is another authoritative textbook of Parsing Theory defining "language":

A set V is an alphabet (or a vocabulary) if it is finite and nonempty. The elements of an alphabet V are called the symbols (or letters or characters) of V. A language L over V is any subset of the free monoid V*. The elements of a language L are called sentences of L.

The language is now that of abstract algebra, but the idea is the same -- pure Bloomfield.

The problem

Interesting, you might be saying, that some textbook definitions are not everything they could be, but is there any effect on the daily practice of programming?

The languages human beings use with each other are powerful, varied, flexible and endlessly retargetable. The parsers we use to communicate with computers are restrictive, repetitive in form, difficult to reprogram, and prohibitively hard to retarget. Is this because humans have a preternatural language ability?

Or is there something wrong with the way we go about talking to computers? How the Theory of Parsing literature defines the term "language" may seem of only pedantic interest. But I will argue that it is a mistake which has everything to do with the limits of modern computer languages.

What is the problem with defining a language as a set of strings? Here is one example of how the textbook definition affects daily practice. Call one grammar SENSE :

SENSE ::= E E ::= E + T E ::= T T ::= T * P T ::= P P ::= number

And call another grammar STRING :

STRING ::= E E ::= P OP E OP ::= '*' OP ::= '+' P ::= number

If you define a language as a set of strings, both SENSE and STRING recognize the same language. But it's a very different story if you take the intended meaning as that of traditional arithmetic expressions, and consider the meaning of the two grammars.

SENSE recognizes the associativity and precedence of the two operators -- the parse tree it produces could be used directly to evaluate an arithmetic expression and the answer would always be correct. The parse tree that STRING produces, if evaluated directly, will very often produce a wrong answer -- it does not capture the structure of an arithmetic expression. In order to produce correct results, the output of STRING could be put through a second phase, but that is the point -- STRING left crucial parts of the job of parsing undone, and either some other logic does the job STRING did not do, or a wrong answer results.

It is much easier to write a parser for STRING than it is for SENSE . Encouraged by a theory that minimizes the difference, many implementations attempt to make do with STRING .

But that is not the worst of it. The idea that a language is a set of strings has guided research, and steered it away from the most promising lines. How, I hope to explain in the future.

Comments, etc.

The background material for this post is in my Parsing: a timeline 3.0, and this post may be considered a supplement to "Timelime". To learn about Marpa, my Earley/Leo-based parsing project, there is the semi-official web site, maintained by Ron Savage. The official, but more limited, Marpa website is my personal one. Comments on this post can be made in Marpa's Google group, or on our IRC channel: #marpa at freenode.net.

Footnotes

posted at: 19:59 | direct link to this entry

§ § §