PRINTED FROM the OXFORD RESEARCH ENCYCLOPEDIA, LINGUISTICS (oxfordre.com/linguistics). (c) Oxford University Press USA, 2020. All Rights Reserved. Personal use only; commercial use is strictly prohibited (for details see Privacy Policy and Legal Notice). date: 25 September 2020

Noam Chomsky

Summary and Keywords

Noam Avram Chomsky is one of the central figures of modern linguistics. He was born in Philadelphia, Pennsylvania, on December 7, 1928. In 1945, Chomsky enrolled in the University of Pennsylvania, where he met Zellig Harris (1909–1992), a leading Structuralist, through their shared political interests. His first encounter with Harris’s work was when he proofread Harris’s book Methods in Structural Linguistics, published in 1951 but already completed in 1947. Chomsky grew dissatisfied with Structuralism and started to develop his own major idea that syntax and phonology are in part matters of abstract representations. This was soon combined with a psychobiological view of language as a unique part of the mind/brain. Chomsky spent 1951–1955 as a Junior Fellow of the Harvard Society of Fellows, after which he joined the faculty at MIT under the sponsorship of Morris Halle. He was promoted to full professor of Foreign Languages and Linguistics in 1961, appointed Ferrari Ward Professor of Linguistics in 1966 and Institute Professor in 1976, and retired in 2002. Chomsky is still remarkably active, publishing, teaching, and lecturing across the world. In 1967, both the University of Chicago and the University of London awarded him honorary degrees, and since then he has been the recipient of scores of honors and awards. In 1988, he was awarded the Kyoto Prize in basic science, created in 1984 in order to recognize work in areas not included among the Nobel Prizes. These honors are all a testimony to Chomsky’s influence and impact in linguistics and cognitive science more generally over the past 60 years. His contributions have of course also been heavily criticized, but they nevertheless remain crucial to investigations of language. Chomsky’s work has always centered on the same basic questions and assumptions, especially that human language is an inherent property of the human mind.
The technical part of his research has been continuously revised and updated. In the 1960s, phrase structure grammars were developed into what is known as the Standard Theory, which was transformed into the Extended Standard Theory and X-bar theory in the 1970s. A major transition occurred at the end of the 1970s, when the Principles and Parameters Theory emerged. This theory provides a new understanding of the human language faculty, focusing on the invariant principles common to all human languages and the points of variation known as parameters. Its most recent variant, the Minimalist Program, pushes the approach even further by asking why grammars are structured the way they are.

Keywords: competence, grammar, nativism, philosophy of language, phrase structure

1. Introduction

This article presents an overview of some of Noam Chomsky’s most important contributions to linguistics. The presentation will mostly focus on a set of themes suitable for organizing Chomsky’s ideas and scholarly impact. We will also provide a bit of history and briefly touch on ways in which his ideas have developed over time. The history of Chomsky’s intellectual contributions is in large part the intellectual history of the field of generative grammar. Obviously, many scholars have contributed to this field, making it a collective enterprise and not the work of a single individual. Nevertheless, Chomsky has had a unique impact, as his ideas and work have shaped the field’s development far more than those of any other single individual. For that reason, and given that the topic of this article is Noam Chomsky, our focus will be on him in what follows, although the reader should bear in mind that many ideas have been initiated, developed, or modified by a large cohort of scholars. The focus in this essay will be on Chomsky’s contributions to the study of syntax. Early on he also did work on the sound systems of human language, most notably a groundbreaking book coauthored with Morris Halle (Chomsky & Halle, 1968), and Chomsky’s MA thesis was on the morphophonemics of Modern Hebrew (Chomsky, 1951). One caveat is in order: We will not explore Chomsky’s political views or any connection that there may or may not be between his linguistics and politics. For extensive discussion of this, see Smith and Allott (2015). This article is structured as follows. Section 2 provides some biographical information about Chomsky. In Section 3, we focus on Chomsky’s earliest work, namely his work on formal/mathematical models of natural language. Foundational issues regarding Chomsky’s approach to language are presented in Section 4. Section 5 traces the development of the grammatical architecture from 1965 to 1980, and Section 6 presents the Principles and Parameters Theory, including the Minimalist Program.

2. Biographical Sketch

Noam Avram Chomsky was born in Philadelphia, Pennsylvania, on December 7, 1928. In 1945, Chomsky enrolled in the University of Pennsylvania, where he met Zellig Harris (1909–1992), a leading Structuralist, through their shared political interests. His first encounter with Harris’s work was when he proofread Harris’s book Methods in Structural Linguistics, published in 1951 but already completed in 1947. Chomsky grew dissatisfied with Structuralism and started to develop his own major idea that syntax and phonology are in part matters of abstract representations. This was soon combined with a psychobiological view of language as a unique part of the mind/brain. Chomsky spent 1951–1955 as a Junior Fellow of the Harvard Society of Fellows, after which he joined the faculty at MIT (Massachusetts Institute of Technology) under the sponsorship of Morris Halle. Since then, MIT has been his intellectual home. He was promoted to full professor of Foreign Languages and Linguistics in 1961, appointed Ferrari Ward Professor of Linguistics in 1966, and appointed Institute Professor in 1976. Although he has officially retired and become an Institute Professor Emeritus, Chomsky is still remarkably active, publishing, teaching, and lecturing across the world. In 1967, both the University of Chicago and the University of London awarded him honorary degrees, and since then he has been the recipient of countless honors and awards. In 1988, he was awarded the Kyoto Prize in basic science, created in 1984 in order to recognize work in areas not included among the Nobel Prizes. These honors are all a testimony to Chomsky’s influence and impact in linguistics, analytic philosophy, and cognitive science more generally over the past 70 years. See Chomsky’s public lecture on analytic philosophy in Oslo, Norway, in 2011.

3. The Early Years: Formal Grammars

As mentioned, Chomsky was Zellig Harris’s student and thus knew the details of structural linguistics. His own first works, e.g., Chomsky (1951), were also attempts to extend Harris (1951). Harris introduced the concept of a transformation, but for Harris, transformations were relations between sentences: an active sentence would be transformed into a passive one, to give one example. Chomsky soon discovered that there are data that such a method cannot capture. Chomsky (1957, 1963) demonstrates this and presents an alternative: sentences have an abstract hierarchical structure that is generated via phrase structure grammars, and transformations are relations between abstract structures. This alternative is the main topic of Chomsky’s two most famous and groundbreaking works: The Logical Structure of Linguistic Theory (LSLT) (Chomsky, 1955) and Syntactic Structures (Chomsky, 1957). LSLT was completed in 1955, while Chomsky was a junior fellow of the Society of Fellows at Harvard University. The 1975 version contains a comprehensive introduction that also explains how the manuscript developed. Both LSLT and Syntactic Structures contain very little explicit discussion of what Chomsky later became famous for and which we will discuss below, namely an innate language faculty. Rather, they are concerned with developing a formal framework for describing the syntactic structure of human languages. Chomsky (1956, 1963) describes various classes of formal grammars and organizes them into a hierarchy, today known as the Chomsky hierarchy or sometimes the Chomsky–Schützenberger hierarchy (Chomsky & Schützenberger, 1963). Research since, including Chomsky (1955, 1957), has mostly been devoted to developing the class that is suitable for human languages. In his work, Chomsky demonstrated how context-free phrase-structure (PS) grammars can be applied to language.
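The jump in expressive power between levels of the hierarchy can be made concrete with a small sketch. The two toy recognizers below are our own illustrations, not drawn from Chomsky’s texts: the first handles a regular (Type 3) language with a finite automaton, while the second handles the context-free (Type 2) language a^n b^n, which requires unbounded counting and therefore exceeds finite-state power.

```python
def is_regular_ab_star(s):
    """Type 3 (regular): accept strings matching (ab)* with a
    two-state finite automaton, i.e., no memory beyond the state."""
    state = 0
    for ch in s:
        if state == 0 and ch == "a":
            state = 1
        elif state == 1 and ch == "b":
            state = 0
        else:
            return False
    return state == 0

def is_context_free_anbn(s):
    """Type 2 (context-free): accept a^n b^n, which no finite
    automaton can recognize, since it must count the a's."""
    half, rem = divmod(len(s), 2)
    if rem:
        return False
    return s[:half] == "a" * half and s[half:] == "b" * half
```

Nested dependencies of the a^n b^n kind are the sort of pattern Chomsky appealed to in arguing that natural-language syntax exceeds finite-state power.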
PS grammars consist of a designated initial symbol and a finite set F of rewrite rules, as in (1). (1) A derivation, the procedure by which a sentence is generated, then consists of a series of lines. The first line has to start with the designated initial symbol, followed by lines in which symbols are rewritten according to F. The procedure/derivation stops when there are no more symbols that can be rewritten. An illustration is given in (2). (2) These rules give us the derivation in (3), among several other “equivalent” derivations. (3) Constituent structure is captured in PS grammars by introducing nonterminal (i.e., unpronounced) symbols, a novelty in Chomsky’s work. Later, in Chomsky (1965), rules such as the last two in (2) were called lexical insertion rules, as they inserted lexical material into the resulting phrase marker. Chomsky presented a range of evidence that a sentence has more than just a superficial structure closely resembling the way in which it is pronounced: there is also an abstract representation, which can potentially be very different from the superficial one, and there can be intermediate structures between the two. This aspect, concerning levels of representation, is fundamental throughout Chomsky’s work.
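A derivation of this kind can be sketched in a few lines of code. The rule set below is an illustrative stand-in for a set F, echoing the well-known “the man hit the ball” example from Syntactic Structures; the implementation is our own toy, not Chomsky’s formalism.

```python
# Toy rewrite rules: each nonterminal maps to its alternative expansions.
RULES = {
    "S":  [["NP", "VP"]],
    "NP": [["T", "N"]],
    "VP": [["V", "NP"]],
    "T":  [["the"]],
    "N":  [["man"], ["ball"]],
    "V":  [["hit"]],
}

def derive(line, choices):
    """Rewrite the leftmost nonterminal at each step, recording every
    line of the derivation; `choices` picks among alternative rules."""
    lines = [line[:]]
    while any(sym in RULES for sym in line):
        i = next(j for j, sym in enumerate(line) if sym in RULES)
        options = RULES[line[i]]
        pick = choices.pop(0) if len(options) > 1 else 0
        line = line[:i] + options[pick] + line[i + 1:]
        lines.append(line[:])
    return lines

# Start from the designated initial symbol S; choose "man" for the
# first N and "ball" for the second.
derivation = derive(["S"], [0, 1])
```

Each element of the returned list is one line of the derivation, from the initial symbol down to the terminal string. The function always rewrites the leftmost nonterminal; choosing a different rewriting order would yield one of the “equivalent” derivations mentioned above, all corresponding to the same constituent structure.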

4. Foundational Work and Ideas

Whereas Chomsky’s earliest work was concerned with the formal nature of grammars, he soon turned towards more general issues. Chomsky (1959), a review of Verbal Behavior by B. F. Skinner, focuses on issues regarding language use and the creative ability all humans have when it comes to language. The review attracted significant attention, not least because it pointed out fundamental problems with behaviorism. Chomsky argues that language acquisition happens so quickly that there is simply no way a stimulus–response mechanism can account for the knowledge that a young child has. Furthermore, such a mechanism does not do justice to the linguistic creativity that children display, namely that we can use our language ability to create new words and sentences that we have not heard before. Rather, what is needed is a nativist perspective on language, whereby humans have a biological blueprint for developing language. The task for the linguist is then to investigate this ability from a linguistic point of view. Questions concerning language acquisition and the nature of humans’ linguistic competence quickly became Chomsky’s main interest. 1965 and 1966 saw the appearance of two very important publications in Chomsky’s scholarship. Aspects of the Theory of Syntax (henceforth, Aspects) was published in 1965, and in 1966 he published Cartesian Linguistics (recently reissued as Chomsky, 2009). Whereas Aspects mainly presents an overall framework within which to think about language, Cartesian Linguistics is arguably the best nontechnical presentation of Chomsky’s overall philosophy of language. In this latter book, Chomsky traces aspects of the history of his approach to language, drawing connections to Descartes and the Port-Royal tradition. He puts forward a strong defense of a nativist approach to language, that is, arguing that humans are born with a special ability to acquire language.
This accounts for the great speed with which humans come to possess language, it accounts for their linguistic creativity (making “infinite use of finite means,” to use a much-cited phrase from Wilhelm von Humboldt that Chomsky has often emphasized), and it accounts for certain aspects of the structure of human languages that children immediately latch onto. Chomsky also makes the point that whereas we can seek to understand the system underlying human language, we probably will never be able to fully understand why we come to say the things we do, as the latter relates to issues of free will that we still do not understand. Bracken (1984) and McGilvray’s introduction to Chomsky (2009) provide discussions of the significance of Cartesian Linguistics, whereas Salmon (1969) offers an important critical discussion. Returning to Aspects, chapter 1 in this book introduces a number of important concepts in Chomsky’s approach to language. The general goal of the chapter is to define a distinct, scientific project for linguistics. It is “scientific” because its goal is to explain what underlies the linguistic abilities of an individual, and it is “distinct” because human language appears to have special properties. In developing this project, a number of notions are proposed. Let us review them briefly. One distinction is that between competence and performance. Chomsky argues that linguists need to study competence, i.e., the tacit grammatical knowledge that any native speaker has of his/her language(s). Competence can only be studied through its outputs, i.e., performance, which can be any expression, be it spoken, written, or signed, as well as experimental data. The latter are used to probe more subtly and precisely for specific aspects of competence while controlling for as many outside factors as possible. One such method is to ask a native speaker to judge sentences via what is now called acceptability judgments.
Much later, in Chomsky (1986a), the distinction is refined: Chomsky now distinguishes between E-language and I-language, E for external and I for internal, individual, and intensional. I-language is the object of study in linguistics according to Chomsky, whereas E-language is the totality of externally manifested language, i.e., all performances of linguistic knowledge regardless of the individual speaker who produced them. The intensional part of I-language highlights the fact that the goal is to investigate the nature of the computational mental system making it possible for humans to speak, sign, and understand an unlimited number of new sentences. An important methodological issue was also introduced in Aspects: the distinction between acceptability and grammaticality (and correspondingly unacceptability and ungrammaticality). Acceptability involves a judgment made by a native speaker concerning how natural a given set of sentences seems. Typically, a speaker will be presented with two contrasting sentences and the job is to rate them. For example, a native speaker of English will, when comparing Norbert likes cookies and Norbert cookies likes, say that the former is acceptable whereas the latter is unacceptable. Grammaticality, on the other hand, involves a claim made by the linguist as to whether or not the grammar allows a given structure. In the present example, the linguist will conclude that the structure underlying Norbert likes cookies is grammatical in English, whereas the structure underlying Norbert cookies likes is ungrammatical in English. Linguists often speak of “grammaticality judgments,” although, strictly speaking, this usage is incorrect given Chomsky (1965): speakers judge acceptability, while grammaticality is a theoretical claim. Adequacy is a crucial notion in Aspects. Chomsky separates it into descriptive adequacy and explanatory adequacy. A grammar that is descriptively adequate is one that correctly describes the set of grammatical sentences and correctly rules out the ungrammatical sentences.
As such, descriptive adequacy is a basic requirement for any grammatical analysis. Even scholars who do not adopt the generative approach, but who, for instance, seek to analyze linguistic production as witnessed in corpora, need to account for the fact that certain patterns do not occur and that the grammar of English is different from that of Japanese. Chomsky, however, sets the bar higher by emphasizing that the goal of linguistic theory should be to achieve explanatory adequacy. This is defined as follows: To the extent that a linguistic theory succeeds in selecting a descriptively adequate grammar on the basis of primary linguistic data, we can say that it meets the condition of explanatory adequacy. That is, to this extent, it offers an explanation for the intuition of the native speaker on the basis of an empirical hypothesis concerning the innate predisposition of the child to develop a certain kind of theory to deal with the evidence presented to him. (Chomsky, 1965, pp. 25–26) This means that the analysis should also account for how a child could acquire the given grammatical system within the short time span that he or she does. Aspects also introduces a revised formalism for the description of natural language, to which we turn next.

5. Grammatical Architecture, 1965–1980

In Chomsky (1955, 1957), PS grammars only construct monoclausal structures. These structures can then be combined, e.g., into embedded clauses, by way of a mechanism called generalized transformations. The recursive component is thus to be found in the transformations. In Chomsky (1965), this is changed, and recursion is incorporated into “the base.” A rule such as (4) was added to analyze sentences such as (5). (4) (5) With a rule such as (4), the PS component now has a recursive character, and, in this model, generalized transformations are eliminated. A related innovation in Chomsky (1965) is the notion of Deep Structure (later called D-structure). D-structure and recursion in the base serve two purposes in the theory: (i) they make the overall theory simpler, and (ii) in connection with a principle of cyclic application of transformations, they rule out certain derivations that do not appear to occur. The earlier 1955 model had no constraints on the interaction between the generalized transformations that combine separate phrase markers and the singulary transformations that manipulate both simple phrase markers and the complex ones that result from generalized transformations. Thus, there could be operations on embedded sentences after they had been embedded. But no such derivations seem to be needed for the description of human languages. In Chomsky (1965), such derivations are excluded by the elimination of generalized transformations and the imposition of cyclicity on (singulary) transformational derivations. Importantly, D-structure also played a role in Chomsky’s approach to how syntax relates to semantics. He develops the following model: The syntactic component consists of a base that generates deep structures and a transformational part that maps them into surface structures.
The deep structure of a sentence is submitted to the semantic component for semantic interpretation, and its surface structure enters the phonological component and undergoes phonetic interpretation. The final effect of a grammar, then, is to relate a semantic interpretation to a phonetic representation—that is, to state how a sentence is interpreted. (Chomsky, 1965, pp. 135–136) Chomsky follows Katz and Postal (1964) in severely restricting the phrase structural information available for interpretation. Their slogan was that “transformations do not change meaning.” The model can be depicted as in (6), where Surface Structure is typically abbreviated as S-structure. (6) The framework was soon challenged by what became known as Generative Semantics. This approach built on Katz and Postal (1964) in arguing that meaning is represented by a more abstract representation than Chomsky’s D-structure (Lakoff, 1971) and that very powerful transformations work to derive surface representations. Even within the Chomskyan approach, there were questions concerning D-structure being the sole locus of semantic interpretation. Chomsky (1957) had already observed that sentences containing quantifiers are interpreted partly on the basis of the surface position of the quantifiers. Consider the examples in (7). (7) (7a) may be true while (7b) is false, for example in a case where one person in the room knows Japanese and Chinese, and another Norwegian and Spanish. Chomsky (1965) acknowledges that (7) is problematic in a framework where D-structure is the input to semantic interpretation. He speculates that the difference may be due to discourse effects. However, it was soon shown that the problem is far more general, leading to a revised framework in which both D-structure and S-structure contribute to semantic interpretation (Jackendoff, 1969; Chomsky, 1970b). This framework is known as the Extended Standard Theory (see also Chomsky, 1970a).
Here D-structure contributed only information about grammatical relations, such as subject and object, whereas more or less all other aspects of meaning (scope, anaphora, focus, presupposition, etc.) were derived from S-structure. Another innovation in the Extended Standard Theory concerns a new encoding of transformations. For movement transformations leaving a gap, it was now suggested that this gap actually consists of a trace (Wasow, 1972; Chomsky, 1973). For all intents and purposes, this trace acts like a placeholder for the lexical content. Given traces, the motivation for D-structure as a level of representation is reduced, but it took some time until the level was eventually eliminated (Chomsky, 1995). The labels semantic and phonetic interpretation in (6) were replaced: the former level was relabeled LF, for “Logical Form,” and the latter PF, for “Phonetic Form.” Crucially, both are grammatical levels of representation and not the actual semantic logical forms or the phonetic encoding. (8) This grammatical architecture became the cornerstone of what is known as Government and Binding, to which we turn next.
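Before moving on, the recursive character that Chomsky (1965) built into the base can be sketched concretely. In the toy grammar below (the rules and lexical items are our own illustrative assumptions, not Chomsky’s examples), the VP rule reintroduces S, so the PS component alone generates unboundedly deep clausal embedding, with no generalized transformations needed.

```python
# Toy base component: the second VP expansion reintroduces S,
# making the rule system recursive.
RULES = {
    "S":  ["NP VP"],
    "NP": ["Mary", "John"],
    "VP": ["left", "thinks that S"],
}

def expand(symbol, depth):
    """Expand `symbol`, choosing the embedding VP rule `depth` times;
    each level of `depth` adds one more embedded clause."""
    if symbol not in RULES:
        return symbol  # terminal word
    if symbol == "VP":
        rule = RULES["VP"][1] if depth > 0 else RULES["VP"][0]
    elif symbol == "NP":
        rule = RULES["NP"][depth % 2]  # alternate subjects for readability
    else:
        rule = RULES[symbol][0]
    return " ".join(expand(part, depth - 1 if part == "S" else depth)
                    for part in rule.split())
```

Increasing `depth` yields "Mary left", then "John thinks that Mary left", and so on without bound: the recursion resides in the base itself, exactly the shift from the 1955/1957 model described above.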

6. Principles and Parameters Theory, 1980–Today

Chomsky and Lasnik (1977) were concerned with restricting the grammar so that it would rule out options that should not be available. A major problem with earlier models was that they let in far too many structures and rules that did not occur. Constraining the grammar is important in order to get closer to the goal of Aspects, namely to provide explanations rather than just descriptions. Only in that way is it possible to account for language acquisition and for how grammatical competence develops and reaches its target state. Following some ideas in Chomsky and Halle (1968), Chomsky and Lasnik argued that something along the lines of a theory of markedness should also apply to syntax, not just phonology. Concretely, they suggested a theory of core grammar with highly restricted options and a few choice points (parameters). Filters were the mechanism that accounted for constraints, and most of them applied to surface structures. However, some filters have to be language-specific or even dialect-specific, such as the filter blocking for–to constructions in most dialects of English. (9) (10) (10) illustrates the surface filter in question. Chomsky (1981) improves on this framework by replacing language/dialect-specific and construction-specific rules with rules that are highly general and constrained by universal principles. This is the Principles and Parameters model. It represents “a radical break from the rich tradition of thousands of years of linguistic inquiry” (Lasnik & Otero, 2004, p. 207). This model offers a solution to the fundamental problem of language acquisition by proposing that the language faculty consists of universal principles together with parameters that encode grammatical variation. The child, then, has to set the parameters for the language in question; in the early days these were argued to be binary options—much like a “switchboard,” to use James Higginbotham’s metaphor.
The assumption was that parameters linked several properties together, at least one of which had to be easily observable. This way, by observing something easy (say, whether or not a language has null subjects, like Spanish or Italian), the child can set some other property that is harder to observe (say, whether or not the language obeys the that-trace filter; cf. Perlmutter, 1968; Chomsky & Lasnik, 1977; Rizzi, 1982). The principles were assumed to be universal, and much work has gone into investigating their nature and format. Principles and Parameters Theory consists of two different models (Freidin, 2007; Lasnik & Lohndal, 2010, 2013). The first is Government and Binding (GB; Chomsky, 1981, 1986b; Chomsky & Lasnik, 1993) and the second is the Minimalist Program (MP; Chomsky, 1995, 2000a, 2005, 2007). We will briefly describe both of them. A fundamental aspect of GB, in addition to the incorporation of principles and parameters, is its modular architecture: Modules governing various parts of the grammar were postulated, and phenomena such as the passive were analyzed by recourse to interacting modules that work together to derive the properties of the passive. The modules were binding (largely concerned with anaphora), case, theta (argument structure), control (the construal of the missing embedded subject in, e.g., Mary tried __ to win), and bounding (locality of movement), with the relation of “government” applying across these modules (see Lasnik & Lohndal, 2010, for an accessible presentation). Notably, this approach denied the theoretical relevance of constructions; rather, constructions are epiphenomenal, as they follow from more basic and abstract properties of grammar. The basic architecture of GB is as depicted in (8) at the end of the previous section. During the late 1980s, questions started emerging concerning the levels in this model, as D- and S-structure became less and less prominent in the theory.
This suggested that only two levels of representation are actually required. What is required in order for language to relate sound to meaning is an interface with the articulatory-perceptual system (PF) and one with the conceptual-intentional system (LF). Conceptually, PF and LF thus enjoy a more privileged status than D- and S-structure in the theory. As such, there would have to be overwhelming empirical evidence justifying the latter two levels, and research concluded that this was no longer the case. Chomsky then returned to his original proposal from the 1950s, with no D-structure and with structure building also being done by generalized transformations. A derivation starts out with a numeration, which is a selection of items from the lexicon. These lexical items are then inserted as the derivation proceeds, starting from the bottom with argument structure and adding functional layers as needed. This, then, became the approach to grammar in the Minimalist Program, or just Minimalism, outlined in great detail in Chomsky (1995). The Minimalist Program pursues the hypothesis that language meets the requirements imposed by the external systems in a “perfect” way. The goal is to provide explanations for why the grammar has the structure and organization that it has, which Chomsky (2004) later dubbed going “beyond explanatory adequacy.” Essentially this is an extremely challenging why-question, seeking a more fundamental understanding of the computational system for language. In the 2000s, this was contextualized in an important paper (Chomsky, 2005) in which Chomsky says that there are three factors involved in understanding language: (i) the genetic component, (ii) experience from input, and (iii) principles not specific to the language system. The latter have become known as “third factors,” and much research is going into understanding their properties (see Lohndal & Uriagereka, 2016).
This research again connects to some of Chomsky’s earliest work, namely Aspects, where he says that many properties of the language faculty may follow from “principles of neural organization that may be even more deeply grounded in physical law” (Chomsky, 1965, p. 59). It should be noted that with Minimalism, the concept of a parameter has changed quite significantly. Chomsky (1995) endorsed what Baker (2008) has labeled the Borer–Chomsky conjecture (due to Borer, 1984), whereby parameters are reduced to features on lexical and functional elements. Acquiring variation thereby becomes part of acquiring the elements of the lexicon. This shift was also triggered by the empirical inadequacy of the view of parameters developed in GB (see Newmeyer, 2005, and Biberauer, 2008, for much discussion). More recently, a different view of parameters has emerged, one in which there are hierarchies of different types of parameters (see Biberauer & Roberts, 2012, 2016). Chomsky is still contributing to the theoretical development of Minimalism. His recent ideas revolve around the importance of the labeling of phrases—as NP, VP, etc.—and its place in the architecture of the language faculty (Chomsky, 2013, 2015). Remarkably, even after more than 70 years, he is still setting the agenda in terms of defining important research questions and problems.