
I've heard many interesting papers here at AACL 2009. Here's one of them: Bridget Jankowski, from the University of Toronto, "Grammatical and register variation and change: A multi-corpora perspective on the English genitive". She was kind enough to send me a copy of her slides, from which I've taken (most of) the graphs below.

In order to study the history of choices like "Ontario's government" (s-genitive) vs. "the government of Ontario" (of-genitive), she created two small historical corpora, sampling Maclean's magazine and the Hansard transcripts of debates of the Ontario Provincial Legislature at three time points: 1906, 1956, and 2006. She picked three authors or three speakers from each source at each time point. All of the speakers and authors were men aged 30-60 at the time of the sample.

Her first result is a replication of the observation that the s-genitive has been gaining ground:

Compare, for example, this figure from Hinrichs and Szmrecsanyi, "Recent changes in the function and frequency of standard English genitive constructions: a multivariate analysis of tagged corpora", English Language and Linguistics 11(3): 437–474, 2007:

Jankowski then broke the trends down further by coding the possessors as

1. Human: a student’s schoolwork, Mrs. Hale’s reaction
2. Organizations (animate “collectivities of humans which display some degree of group identity”): the local school board’s ruling; the federal government’s plan
3. Places: Canada’s foreign language press, Ontario’s roads, the streets of Rome, the raw edge of the world, the people of this American continent
4. Inanimate objects, activities, units of time, states

This made it clear that the increase in use of s-genitives has been especially strong in the case of organizations, and even stronger in the case of places:

Her category 4 ("Inanimate objects, activities, units of time, states") was realized with the of-genitive 96% of the time in Maclean’s and 99% of the time in Hansard. So her results are generally consistent with Otto Jespersen's observation in A Modern English Grammar on Historical Principles: Part VII (1949) that

In poetry and in higher literary style, the genitive of lifeless things is used in many cases where of would be used in ordinary speech. […] During the last few years the genitive of lifeless things has been gaining ground, (especially among journalists)…

but only if "lifeless things" is taken to include organizations and places, and not "inanimate objects, activities, units of time, states".
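The breakdown behind these comparisons is, at bottom, a cross-tabulation of hand-coded tokens by time point and possessor category. Here is a minimal sketch of that tally in Python. The records in `TOKENS` are toy values that I made up purely for illustration (they are not Jankowski's data), and the names `TOKENS` and `s_genitive_rates` are my own.

```python
from collections import defaultdict

# Invented toy records: (year, possessor category, variant chosen).
# These are NOT Jankowski's data; they only illustrate the bookkeeping.
TOKENS = [
    (1906, "place", "of"),
    (1906, "place", "of"),
    (1906, "place", "s"),
    (1906, "human", "s"),
    (2006, "place", "s"),
    (2006, "place", "s"),
    (2006, "place", "of"),
    (2006, "human", "s"),
]

def s_genitive_rates(tokens):
    """Map each (year, category) cell to its proportion of s-genitives."""
    counts = defaultdict(lambda: [0, 0])  # cell -> [s-genitive count, total]
    for year, category, variant in tokens:
        cell = counts[(year, category)]
        if variant == "s":
            cell[0] += 1
        cell[1] += 1
    return {key: s / total for key, (s, total) in counts.items()}

rates = s_genitive_rates(TOKENS)
for (year, category), rate in sorted(rates.items()):
    print(year, category, f"{rate:.0%}")
```

With real annotations in place of the toy records, the same table could be split further by source (Maclean's vs. Hansard) by adding one more field to the key.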

She also compared her results to data from a corpus of conversational speech collected recently in Toronto, using speaker age to create two "apparent time" collections comparable to the 1956 and 2006 samples. This suggests that in the spoken language, human possessors have almost always gotten the s-genitive, consistently across time, while inanimate possessors (in this graph including her categories 3 and 4) have consistently gotten the of-genitive:

On this analysis, the increase in s-genitives for human possessors in Maclean's magazine makes the journalistic prose more and more like the spoken language; but the parallel increase in s-genitives for inanimate possessors makes the journalistic prose less and less speech-like.

Her presentation also considered the effect of the length of the possessor (a shorter possessor is more likely to take an s-genitive) and the possessum ("shorter possessum will be more likely to take an of-genitive and so appear first in the construction, while a longer possessum is more likely to take the s-genitive"), as well as other relevant features such as "lexical density":

and "thematicity":

and did a multivariate analysis of all the various factors taken together.

You'll have to read her (I trust forthcoming) paper to learn how it all comes out — I have a 6:40 a.m. plane to catch — but I hope that this much is enough to convince you that there's a rich and interesting pattern of variation to be untangled here. It certainly convinced me.

And it also increased my general feeling that the time is right for the application of automatic or semi-automatic methods of analysis (here in assigning her four categories of possessors, in determining the lengths of the possessor and possessum constituents, in counting local phrases co-referential with the possessor, etc.) to the study of syntactic variation across time, genre, register and so on. Because she had to annotate everything by hand, Jankowski's sample was fairly small: 50K words of Maclean's and 100K words of Hansard. With automatic or semi-automatic annotation, she could look at larger collections with denser time samples of more sources, and easily add other features, like various word and phrase frequencies, grammatical role and phrasal position of the whole genitive construction, etc.
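To give a sense of what the "semi-automatic" end of that range might look like, here is a rough Python sketch that proposes s-genitive candidates and measures possessor length in words. It is a crude heuristic of my own, not anything from Jankowski's study: the function name, the determiner list, and the look-back window are all assumptions. It will over-generate (for example, "it's" looks like a genitive to it), it misses plural genitives like "students'", and it ignores of-genitives entirely, which really need a parser. A human would still have to check its output.

```python
import re

# Assumed determiner list and tokenizer; both are deliberately simplistic.
DETERMINERS = {"the", "a", "an", "this", "that", "these", "those"}
TOKEN = re.compile(r"\w+(?:['’]s)?")

def s_genitive_candidates(text, max_back=4):
    """Return (possessor, possessum_head, possessor_length) for each 's token."""
    tokens = TOKEN.findall(text)
    results = []
    for i, tok in enumerate(tokens):
        if not tok.endswith(("'s", "’s")) or i + 1 >= len(tokens):
            continue
        head = tok[:-2]
        start = i
        if head[0].isupper():
            # Proper name: extend left over a run of capitalized tokens.
            while start > 0 and tokens[start - 1][0].isupper() and i - start < max_back:
                start -= 1
        else:
            # Common noun: look left for a determiner within the window.
            for j in range(i - 1, max(i - 1 - max_back, -1), -1):
                if tokens[j].lower() in DETERMINERS:
                    start = j
                    break
        possessor = " ".join(tokens[start:i] + [head])
        results.append((possessor, tokens[i + 1], len(possessor.split())))
    return results

sample = ("Mrs. Hale's reaction surprised us, and the local school "
          "board's ruling affected Ontario's roads.")
for possessor, possessum, length in s_genitive_candidates(sample):
    print(f"{possessor!r} -> {possessum!r} (possessor length {length})")
```

The tokenizer drops the period in "Mrs.", which does not change the word count. Candidates like these could then be passed to a human annotator for the category coding, which is where most of the judgment lies.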
