It is sad, but inevitable, that the TIOBE index of programming language “popularity” (sic) would be gamed.

Once you start measuring something, and advertising the results, people with an interest in particular outcomes naturally start to look for ways to influence those results. (It’s the Observer Effect writ large.)

The fact that TIOBE’s methodology, which I’ve discussed previously here and here, is simplistic makes it particularly open to gaming. Anyone, or any community, with access to many web pages can simply add the magic phrase “foo programming”, where foo is their language of choice, to get counted.
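To make that concrete, here is a toy sketch of phrase counting. It is not TIOBE's actual pipeline (they query search engines, not a list of strings); the function names and the sample pages are my own hypothetical stand-ins, used only to show how a template change moves the count.

```python
# Toy model: a "page" is just a string, and a "hit" is any page that
# contains the exact phrase "<language> programming" (case-insensitive).
# All names and sample pages here are hypothetical illustrations.

def count_hits(pages, language):
    """Count pages containing the exact phrase '<language> programming'."""
    phrase = f"{language} programming".lower()
    return sum(1 for text in pages if phrase in text.lower())


def add_footer(pages, language):
    """Simulate a template update that appends the phrase to every page."""
    footer = f"\n{language} programming"
    return [text + footer for text in pages]


blog_pages = [
    "Notes on the VCL component model in Delphi.",
    "Why I still ship Delphi apps in 2008.",
    "Delphi programming tips for beginners.",
]

before = count_hits(blog_pages, "Delphi")
after = count_hits(add_footer(blog_pages, "Delphi"), "Delphi")
print(before, after)
```

All three pages are about Delphi, but only one counts until the footer is added. Nothing about the content changed, only the template.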

And it seems that’s exactly what the Delphi community did at the end of 2008 [1]. They made a concerted effort and it seems to have paid off. (I’d be very interested in hearing about similar behaviour in other language communities.)

Is that behaviour gaming? The author of the post, who exhorted his readers to “Update your Delphi related blog or site to say Delphi programming on every page in visible text (update the template). Stand up and be counted. You can make a difference!”, doesn’t seem to think so, as he also said “I am not suggesting we game the system, just that we help TCPI get an accurate count.”

An accurate count of what, exactly? That’s always been the fundamental question with TIOBE. It should be obvious that most web pages that talk about “delphi programming” wouldn’t actually contain the phrase “delphi programming”. The same applies to every other language. That’s the paradox at the heart of the TIOBE Index. And yet, somehow, TIOBE seem to think that counting pages containing the phrase “delphi programming” lets them claim that:

The ratings are based on the number of skilled engineers world-wide, courses and third party vendors.

Eh? How can they possibly defend that claim? Certainly their documented definition doesn’t support it, or even mention it.

I presume they’re thinking that CVs, job postings, and adverts are most likely to contain the magic phrase. It should be obvious, again, that the number of CVs, job postings, and adverts referring to a given programming language would naturally only be a small fraction of the total web pages referring to the language. (And only distantly related to the “popularity” of a language.) Yet that “small fraction” is what TIOBE measure and make bold claims about.

The fact that TIOBE is making a comparison based on a small fraction makes it even more troubling that TIOBE CEO Paul Jansen appears to support language communities changing their pages to include the magic “foo programming” phrase. In an email quoted on delphi.org he says:

For your information, I think your action has already some effect. Tonight’s run shows that Delphi is #8 at this moment. There is a realistic chance that Delphi will become “TIOBE’s Language of the Year 2008”

He’s endorsing the artificial insertion of the magic phrase. Clearly this distorts the TIOBE index in favour of language communities that infect as many pages as possible with the magic phrase.
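A rough sketch of why that matters for everyone else, assuming a rating is simply a language's share of all counted hits (a simplification on my part, not TIOBE's documented formula). The languages and hit counts below are invented for illustration.

```python
# Assumption: rating = this language's hits / total hits, as a percentage.
# The languages "A", "B", "C" and their counts are made-up numbers.

def ratings(hits):
    """Return each language's percentage share of all counted hits."""
    total = sum(hits.values())
    return {lang: 100.0 * n / total for lang, n in hits.items()}


def ranking(hits):
    """Return languages ordered from most hits to fewest."""
    return sorted(hits, key=hits.get, reverse=True)


hits = {"A": 500, "B": 300, "C": 200}

# Community C adds the magic phrase to its templates and doubles its hits.
boosted = dict(hits, C=400)

print(ranking(hits), ratings(hits)["B"])
print(ranking(boosted), ratings(boosted)["B"])
```

Language B did nothing, yet under this model it drops a place and its rating falls from 30% to 25%. Any share-based index rewards whoever inserts the phrase most aggressively.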

That sure seems like an invitation to game the system! It’s likely to lead to other language communities doing the same, and so to further devaluation of the TIOBE Index.

(For alternatives to TIOBE you could look at sites like http://www.langpop.com/, James Robson’s Language Usage Indicators, or my popular comparison of job trends blog post with ‘live’ graphs.)

I have, on a couple of occasions, used the phrase “perl programming” in blog posts for my own amusement, and linked it to my original TIOBE or not TIOBE – “Lies, damned lies, and statistics” post. I haven’t suggested that others do the same. TIOBE’s endorsement of artificial insertion changes that. Now it seems like we’re going to get a dumb “race to the bottom” to see which language community controls the most web pages.

If, as a result, the TIOBE Index is affected significantly, then I simply hope they’ll drop their pretentious claims and state clearly exactly what they’re counting, how they’re doing it, and what it means: not much.

1. Many thanks to Barry Walsh for his blog post that alerted me to this.