Is TIOBE Fatally Flawed?

Update: As Bogdy mentions in the comments, my reasoning here was based on false assumptions. It still seems clear that ranking APL above Haskell, along with other anomalies, disqualifies TIOBE for any serious purpose, at least past the top ten or so languages. My own rankings below should be ignored, though.

During a debate at work about using Haskell for a project, a coworker pointed out that Haskell is ranked #41 on the TIOBE index. On further investigation, things looked really fishy. TIOBE is commonly interpreted as measuring the amount of “community”, “buzz”, or “excitement” around a language. By none of these standards can APL reasonably edge out Haskell, so I dug further.

Summary of findings: TIOBE is severely broken. It falls victim to the fact that search engines grossly overestimate their number of results. For example, if I search Google for “haskell programming”, as TIOBE does, the results page proudly estimates 44,500 results. However, if I click through the results, I hit the end of the list after only 652. Nice for marketing Google, perhaps, but the estimate seems to have been rather poor. Similar things happen with other languages.
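To put a number on that gap, here is a minimal sketch using only the two Haskell figures quoted above. The function name `overestimate_factor` is my own invention for illustration; nothing here reflects how TIOBE or Google actually compute anything.

```python
def overestimate_factor(estimated: int, reachable: int) -> float:
    """Return how many times larger the estimated hit count is than
    the number of results one can actually page through."""
    if reachable <= 0:
        raise ValueError("reachable result count must be positive")
    return estimated / reachable


# Figures from the "haskell programming" search described above.
haskell_factor = overestimate_factor(estimated=44_500, reachable=652)
print(f"estimate is about {haskell_factor:.0f} times the reachable count")
```

If the factor were roughly the same for every language, the estimates would still rank languages consistently. The trouble only starts if it varies a lot from one language to the next.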

TIOBE, despite using several search engines, seems to correlate well with Google’s estimated (i.e., phony) number of results. It correlates very badly with the actual number of results. Here’s my corrected TIOBE list, built only from the top 50 languages in the original list. In order to comply with Google’s terms of service, I painstakingly did this by hand, so I didn’t go any further.
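For anyone who wants to check a claim like this rather than eyeball it, one option is a Spearman rank correlation, which compares only the ordering of two lists. This is a rough sketch, not what I did for this post (I compared by hand). The numbers in it are made-up placeholders, not real TIOBE scores or search counts, and the helper names are hypothetical.

```python
from math import sqrt


def ranks(values):
    """Rank values from largest (rank 1) downward; ties share the average rank."""
    order = sorted(range(len(values)), key=lambda i: -values[i])
    result = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over any run of equal values.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        average_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            result[order[k]] = average_rank
        i = j + 1
    return result


def pearson(xs, ys):
    """Plain Pearson correlation coefficient of two equal-length lists."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    spread_x = sqrt(sum((x - mean_x) ** 2 for x in xs))
    spread_y = sqrt(sum((y - mean_y) ** 2 for y in ys))
    return cov / (spread_x * spread_y)


def spearman(xs, ys):
    """Spearman correlation: Pearson correlation of the ranks."""
    return pearson(ranks(xs), ranks(ys))


# PLACEHOLDER data for five imaginary languages, NOT real measurements.
index_score = [5.0, 4.0, 3.0, 2.0, 1.0]
estimated_hits = [900, 700, 500, 300, 100]
reachable_hits = [200, 600, 100, 800, 400]

print("index vs. estimated:", spearman(index_score, estimated_hits))
print("index vs. reachable:", spearman(index_score, reachable_hits))
```

A value near 1 means the two orderings agree, a value near 0 means they are unrelated, and a negative value means they tend to run in opposite directions.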

Some of the results are initially surprising, but on reflection they are reasonable. Languages near the top tend to be those that are somewhat old (more time to write about them) or commonly used, past or present, in business or the academic world. These languages have a reason to have a lot of web pages written about them. One example: Prolog clearly isn’t a commonly used language, nor one with a lot of community, but it’s taught in the intro “programming languages” course of just about every computer science department in the world, because they feel better including something besides imperative and functional languages. Hence, it’s been written about a lot. One can still see the “big community” effect, though, if only in languages that appear higher than you’d otherwise expect.

I also split Lisp/Scheme into Lisp and Scheme separately, and dropped Natural because Googling for “natural programming” turned up more irrelevant results than relevant ones.

Without further delay, the “Chris” update to the TIOBE list.

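1. Fortran
2. COBOL
3. C
4. Logo
5. JavaScript
6. MATLAB
7. Prolog
8. RPG
9. ML
10. Pascal
11. Lingo
12. Scheme
13. LISP
14. REXX
15. C++
16. Forth
17. Smalltalk
18. Icon
19. SAS
20. ABAP
21. Tcl
22. IDL
23. FoxPro
24. Haskell
25. Bash
26. Java
27. CL
28. APL
29. ColdFusion
30. Delphi
31. Perl
32. BASIC
33. Objective C
34. Erlang
35. Lua
36. Ada
37. Awk
38. ActionScript
39. VBScript
40. Ocaml
41. D
42. Dylan
43. C#
44. Python
45. Ruby
46. Transact-SQL
47. PHP
48. LabView
49. S-lang
50. PL/SQL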