
According to Jennifer Viegas, "New Written Language of Ancient Scotland Discovered", Discovery News, 3/31/2010:

Once thought to be rock art, carved depictions of soldiers, horses and other figures are in fact part of a written language dating back to the Iron Age.

The ancestors of modern Scottish people left behind mysterious, carved stones that new research has just determined contain the written language of the Picts, an Iron Age society that existed in Scotland from 300 to 843.

The "new research" is described in Rob Lee, Philip Jonathan, and Pauline Ziman, "Pictish symbols revealed as a written language through application of Shannon entropy", Proceedings of the Royal Society A, in press.

The authors use an argument of the same general shape as the one used by Rao et al. in arguing for linguistic structure in inscriptions from the Indus Valley civilization ("Conditional entropy and the Indus Script", 4/26/2009). They calculate certain statistical measures for some known writing systems, for things that are clearly not writing, and for the inscriptions in question, and they find that in terms of these measures, the inscriptions look more like the writing systems than like the non-writing sets.

The trouble with this form of argument is that it's heavily dependent on the particular combination of statistical measure and comparison sets that we choose. And the argument becomes especially unconvincing when there's an obvious alternative choice of comparison set — generated by a simple random process — that would fall squarely on the side of the line that allegedly identifies "written language".

That's what Cosma Shalizi, Richard Sproat and I (independently) argued in the case of the Rao et al. article (see here for details). And it looks to me as if the Lee et al. article on Pictish has got similar problems.

Let's take the first part of their argument, summarized in their Figure 2:

This shows convincingly that the Pictish petroglyph symbols are not drawn randomly from a uniform distribution. But symbols in writing systems are hardly the only phenomena whose statistical distribution is non-uniform. For example, if we plot the outcome of rolling seven six-sided dice on the same graph, we get the red x shown below:

There are 36 possible outcomes (sums from 7 to 42), so that the x-axis value for the dice will be log2(36), or about 5.17. And these outcomes are not equally likely, since there's only one way to roll 7, but 7 ways to roll 8, 28 ways to roll 9, etc. — so if we calculate the entropy of the 36 probabilities, we get about 4.22.
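For anyone who wants to check the dice arithmetic, here is a minimal Python sketch. It is my own illustration, not anything from Lee et al., and the function names are made up for the purpose. It counts the ways to reach each total by adding one die at a time, then computes the two quantities discussed above.

```python
from collections import Counter
from math import log2


def ways_to_roll(n_dice=7, sides=6):
    """Count how many ordered rolls produce each total, one die at a time."""
    ways = Counter({0: 1})  # zero dice: one way to get a total of 0
    for _ in range(n_dice):
        step = Counter()
        for total, count in ways.items():
            for face in range(1, sides + 1):
                step[total + face] += count
        ways = step
    return ways


def entropy_bits(probs):
    """Shannon entropy in bits of a discrete probability distribution."""
    return -sum(p * log2(p) for p in probs if p > 0)


ways = ways_to_roll()
total_rolls = sum(ways.values())           # 6**7 equally likely ordered rolls
probs = [count / total_rolls for count in ways.values()]

max_entropy = log2(len(ways))              # x-axis value: log2 of the number of outcomes
unigram_entropy = entropy_bits(probs)      # entropy of the actual distribution of sums

print("distinct sums:", len(ways))
print("ways to roll 7, 8, 9:", ways[7], ways[8], ways[9])
print("log2(outcomes): %.2f" % max_entropy)
print("entropy of sums: %.2f" % unigram_entropy)
```

Changing `n_dice` or `sides` shows how easily a different dice game lands elsewhere on the same plot.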

I certainly don't mean to suggest that the ancient Picts generated their petroglyphs using throws of 7d6. The point is just that any process that is (in effect) sampling from a distribution with the right number of alternative outcomes (about 35 to 40) and the right amount of non-uniformity (around 20% relative redundancy for unigrams) will look similar on this measure. And we don't need to look very far to find a (non-writing-related) random process with these characteristics.
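To spell out the "relative redundancy" figure for the dice: assuming the term is meant in the usual Shannon sense (one minus the ratio of the observed entropy H to the maximum possible entropy for N outcomes), the numbers quoted above give roughly

```latex
R \;=\; 1 - \frac{H}{\log_2 N} \;\approx\; 1 - \frac{4.22}{5.17} \;\approx\; 0.18
```

which is in the neighborhood of the 20% mentioned. Lee et al. may define their measure slightly differently, so treat this as a sanity check rather than a reproduction of their calculation.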

Lee et al. go on to repeat the same form of argument using a number of more sophisticated (or at least more complicated) measures. I haven't evaluated these in detail. But the way that they present Fig. 2 is not a good sign, in my opinion.
