
In today's newspapers and magazines:

"Newborns cry in their native language".

"Babies cry with an accent within the first week of life".

"Babies cry with the same 'prosody' or melody used in their native language by the second day of life".

"Newborn babies mimic the intonation of their native tongue when they cry".

"French babies cry in French, German babies cry in German and, no doubt, the wail of an English infant betrays the distinct tones of a soon-to-be English speaker".

The science behind these statements is in a paper released yesterday: Birgit Mampe, Angela D. Friederici, Anne Christophe and Kathleen Wermke, "Newborns' Cry Melody Is Shaped by Their Native Language", Current Biology, in press. Does it support these journalistic generalizations? Before reading the paper, I give ten-to-one odds against, on the general principle that journalistic statements involving generic plurals are almost never true. Mesdames et messieurs, faites vos jeux. Let's spin the wheel.

Mampe et al. recorded and analyzed the crying of 30 French babies (11 female, 19 male; mean age 3.1 days, range 2–5 days) and 30 German babies (15 female, 15 male; mean age 3.8 days, range 3–5 days). They recorded 2500 cries, of which they selected 1254 "simple cries containing single rising-and-then-falling melody arcs" — an average of 21 per French baby, range 3–54; an average of 18 per German baby, range 10–38.

Cries of the type selected had pitch contours that the authors schematize like this:

They found the location of the pitch peak in each cry, and expressed it in terms of normalized time, where the overall duration of the cry is set to 1; and the same analysis was performed for amplitude.
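To make the normalization concrete, here is a minimal sketch of how a normalized peak time could be computed from a sampled contour. It is not the authors' procedure (they marked peaks by hand on spectrograms); the function name and the sample values are invented for illustration.

```python
def normalized_peak_time(times, values):
    """Return the time of the maximum as a fraction of total duration.

    0 corresponds to cry onset and 1 to cry offset. If several samples
    tie for the maximum, the earliest one is used.
    """
    if len(times) != len(values) or len(times) < 2:
        raise ValueError("need at least two paired samples")
    peak_index = max(range(len(values)), key=values.__getitem__)
    onset, offset = times[0], times[-1]
    return (times[peak_index] - onset) / (offset - onset)


# Hypothetical contour: times in seconds, F0 in Hz (made-up numbers).
times = [0.0, 0.2, 0.4, 0.6, 0.8, 1.0]
f0_hz = [350, 420, 470, 520, 480, 380]

t_norm_f0 = normalized_peak_time(times, f0_hz)
print(t_norm_f0)
```

The same function applies unchanged to an intensity contour, which is what "the same analysis was performed for amplitude" amounts to.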

It was by no means true that French babies cried in one consistent way, and German babies in another. As the experimenters write:

Only simple cries containing single rising-and-then-falling melody arcs were analyzed. These cry types were selected because they predominate in the crying of healthy newborns. These melody arcs can be assigned to four basic melody types: (1) quickly rising and slowly decreasing melody: left-accentuated type—“falling contour”; (2) slowly rising and quickly decreasing melody: right-accentuated type—“rising contour”; (3) symmetrical rising-and-then-falling melody: symmetric type; and (4) a relatively stable fundamental frequency with a rising or falling trend: plateau type. […]

Newborns of both groups generated all four basic melody types typical at that age. This observation reflects a general aptitude for generating melodies with varying contours and explains the observed partial data overlap in Figure 1.

They don't actually give the proportions of French and German cries in their four categories, but they do quantify their results as follows:

[A] marked difference in the median values of t_norm(F0max) points to group-specific preferences for produced melody contours (French group, 0.60 s; German group, 0.45 s). The arithmetic means of t_norm(F0max) were significantly different in French (0.58 ± 0.13 s) and German (0.44 ± 0.15 s) newborns (Mann-Whitney test, p < 0.0001). Whereas French newborns preferred to produce rising melody contours, German newborns more often produced falling contours. These results show a tendency for infants to utter melody contours similar to those perceived prenatally. A significant difference was also found for the intensity maxima of melody arcs [t_norm(Imax): mean 0.59 ± 0.12 versus 0.47 ± 0.12 for French group versus German group; Mann-Whitney test, p < 0.001].

Regular readers of Language Log, and other sensible people, will now be estimating the effect size — and this is a fairly large one. A difference of 0.58-0.44 = 0.14 in time-normalized F0 peak location, with a pooled standard deviation of 0.14, is an effect size of d = 1.0. That is, the mean values for the two groups are separated by one standard deviation. And the same is true for the difference in mean amplitude peak locations.
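For anyone who wants to check the arithmetic, here is a short sketch of the effect-size calculation from the reported means and standard deviations. It assumes the simple equal-group-size form of the pooled standard deviation (both groups have 30 babies); the function name is mine, not the paper's.

```python
import math


def cohens_d(mean_a, sd_a, mean_b, sd_b):
    """Cohen's d with the pooled SD for two equal-sized groups."""
    pooled_sd = math.sqrt((sd_a ** 2 + sd_b ** 2) / 2)
    return (mean_a - mean_b) / pooled_sd


# Reported values: French first, German second.
d_pitch = cohens_d(0.58, 0.13, 0.44, 0.15)      # F0 peak location
d_intensity = cohens_d(0.59, 0.12, 0.47, 0.12)  # intensity peak location

print(round(d_pitch, 2), round(d_intensity, 2))
```

Both values come out at essentially 1.0, which is the "one standard deviation" separation described above.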

Despite this large and impressive difference, the authors indulge in a bit of rhetorical exaggeration by presenting the following picture of "typical" French and German cries:

The F0 peak of the "typical" German cry is at about 0.25 seconds, in a vocalization whose expiration phase lasts about 0.95 seconds, for a normalized peak time of about 0.26. This is more than a standard deviation below the reported German mean value of 0.44, and thus is hardly "typical". The F0 peak of the French cry is at 0.68 seconds, in a cry whose expiration lasts about 0.9 seconds, for a normalized time of about 0.75. Again, this is more than a standard deviation greater than the French average of 0.58.
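The comparison can be spelled out as a small sketch. The peak times and durations are my eyeball readings from the figure, so treat them as approximate; the means and standard deviations are the ones reported in the paper, and the helper names are invented.

```python
def normalize_time(peak_s, duration_s):
    """Express a peak time as a fraction of the expiration duration."""
    return peak_s / duration_s


def z_score(value, mean, sd):
    """Distance of a value from the group mean, in standard deviations."""
    return (value - mean) / sd


# Approximate readings from the "typical" cries in the figure.
german_norm = normalize_time(0.25, 0.95)
french_norm = normalize_time(0.68, 0.90)

# Reported group statistics for normalized F0 peak time.
german_z = z_score(german_norm, mean=0.44, sd=0.15)
french_z = z_score(french_norm, mean=0.58, sd=0.13)

print(round(german_z, 2), round(french_z, 2))
```

Under these readings the German example sits more than one standard deviation below its group mean and the French example more than one above, each in the direction that exaggerates the contrast.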

This technique of cherry-picking atypical "typical" values for rhetorical effect is fairly common in scientific writing, but it should alert us to the fact that the authors are perhaps not trying quite as hard as they should to disprove their own hypothesis.

Though I'd bet that their findings are basically valid, there are a few reasons to wonder whether equally large differences would be found in a replication.

For one thing, the babies were recorded in different places, apparently under different circumstances, by different experimenters. Specifically, the French babies were recorded in the Cochin Hospital in Paris. The authors write that "For the German newborns, existing digital sound files of cries that were recorded as part of the German Language Development Study (http://glad-study.cbs.mpg.de) were used", but they don't say whether these were also hospital recordings or were recorded at home. (So it's possible that the headlines should have read "Babies in hospital cry differently from babies at home"…) In any case, the German recordings were apparently made by a different set of people.

Given the procedures described in the paper, there are also several ways that unconscious experimenter bias might creep into the analysis phase. The process of choosing roughly half of the recorded cries as suitable for analysis is one place that this might happen. In doing this sort of selection, there are always a lot of marginal cases, where the decision as to whether an observation fits the criterion (here, whether the cry is a "single rising-and-then-falling melody arc") could go either way. These marginal cases can be influenced by the experimenter's perception that a particular exemplar is "typical" or "not typical" of expected patterns.

And another place where experimenter expectations could affect the data is in marking the time-points of amplitude and pitch maxima. This was apparently done by hand on narrow-band spectrograms, and again was apparently not done "blind": that is, the experimenters knew whether they were coding French babies or German babies. And again, there would have been many cases with flat-topped peaks, or with multiple peaks, where choosing the location of the maximum point would have been a somewhat ambiguous task.

Finally, the choice about where the cry begins and (especially) ends is often ambiguous, and the national difference in mean F0-peak location was only about 140 milliseconds.

As a matter of intellectual hygiene, it's a good idea either to make all such decisions "blind" to the independent variables of interest, or else to use automated techniques of measurement with parameters chosen before the data is analyzed. (It's a good idea to publish the data as well, but that's another story.)
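One way to do the "blind" version, sketched under the assumption that each recording is a file tagged with its group: shuffle the recordings, hand the coder only opaque IDs, and keep the key locked away until the measurements are done. The file names and the function are hypothetical.

```python
import random


def blind_recordings(recordings, seed=0):
    """Assign opaque IDs to (group, filename) pairs in shuffled order.

    Returns the list of IDs to give the coder and the key that maps
    each ID back to its recording, to be consulted only after coding.
    """
    rng = random.Random(seed)
    shuffled = list(recordings)
    rng.shuffle(shuffled)
    key = {f"cry_{i:04d}": rec for i, rec in enumerate(shuffled)}
    return sorted(key), key


# Hypothetical input: group labels and file names are made up.
recordings = [
    ("French", "fr_baby01_cry03.wav"),
    ("French", "fr_baby02_cry11.wav"),
    ("German", "de_baby01_cry07.wav"),
    ("German", "de_baby05_cry02.wav"),
]

coder_ids, key = blind_recordings(recordings)
print(coder_ids)
```

The coder would work from copies of the audio renamed to these IDs, so that neither the label nor the file name gives the group away.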

So let's sum up. This is a really interesting and suggestive study, which needs to be replicated to be entirely convincing. It finds a fairly large difference in the distribution of pitch and amplitude profiles of French and German neonates, with the French babies tending to produce cries with later peaks than the German babies. The effect size in the reported data is large (d = 1.0): large enough that (if the estimates generalize to new data) a randomly selected French or German baby would be correctly classified as French or German, on the basis of one cry, about 2/3 of the time.
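The "about 2/3" figure follows from a back-of-the-envelope model, sketched here. The assumptions are mine: both groups' peak times are normally distributed with equal spread, the two nationalities are equally likely, and the classifier puts its cutoff halfway between the two means. Under those assumptions the accuracy is the standard normal CDF evaluated at d/2.

```python
from statistics import NormalDist


def midpoint_accuracy(d):
    """Accuracy of a midpoint-threshold classifier for two equal-variance
    normal distributions whose means are d standard deviations apart."""
    return NormalDist().cdf(d / 2)


accuracy = midpoint_accuracy(1.0)
print(round(accuracy, 3))
```

This gives roughly 69%, a bit better than 2/3 and a long way from the certainty implied by "French babies cry in French".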

The authors attribute this difference to the typical differences between French and German intonation patterns, through exposure in the womb. It's certainly true that the proportion of final rises in French speech is much greater than in German. But French non-terminal intonational patterns — the ones that generally involve final rises — are not at all like the pitch contour of the "typical" French neonate shown in this paper. The adult phrases will typically involve a large rise on the first or second syllable — to a peak that will often be the highest point in the phrase — with a subsequent fall and then a rise at the very end of the very last syllable, so that there is no final fall. (Today's breakfast hour is over for me, but I promise to give more information on this question within the next few days.)

So if the differences in this experiment's data are caused by pre-natal experience with adult intonations, the explanation must be a somewhat indirect one. And it's not clear to me why the authors reject without explicit consideration the hypothesis that the babies were responding to a few days of French vs. German "motherese".

Oh, and the journalistic generalizations were false as an expression of the authors' findings. Of course.
