The Linked In Languages (Popularity?) Index

Abstract

I have never been a fan of programming language popularity indexes based on search engine queries, so I created my own without using any search data.

Programmers are competitive people. In the programming languages world there are many religious wars (such as the one about braces) and many competitions. One of the best-known forms of competition is about the popularity of programming languages: the satisfaction comes from loudly stating that my favourite language is "bigger" (in popularity!) than those of my fellow programmers.

The most famous popularity index is the TIOBE index, created by TIOBE, a company working on software quality tools and processes. According to their web page, the rankings and scores are computed by analysing the number of results returned for the query «+"language" programming», as described here.

Along the same lines, and to address some of the weaknesses of the approach above, the PYPL Index was created. It is based on a different query from TIOBE's, as described in their FAQ:

“C programming” is used much more than “PHP programming,” because PHP does not need the qualifier. Tutorial is a word used frequently by developers learning any new language: it makes a good leading indicator. What is a “python tutorial,” if not a tutorial on the programming language?

Someone has already pointed out that some language names are tricky to search for with a "tutorial" query: "go tutorial", for example, could just as well refer to the board game of the same name.

My more general concern about query-based methodologies is that they mostly track:

1. The amount of hot air around something: many people want to learn about some new language, and copycat pages pop up (do search engines deduplicate them?)

2. The difficulty of finding resources for the language: e.g., if I type "perl" in the Italian version of Google, positions 2 and 4 are two Italian tutorials about Perl, and I do not need to specify "tutorial"

3. How well known the existing resources already are: if everybody already knows, e.g., http://www.cplusplus.com and https://en.cppreference.com/w/ for C++, there is little need to write (or search for) new ones

In my opinion, a better way to understand which languages programmers use (or, if you prefer, which are popular among them) is to analyse what people state they know in their CVs. While it is true that many people "inflate" their CV, most of them (speaking from personal experience here) write what they really know and what they could be interviewed about. As it is impossible to sift through millions of CVs, I decided to use the Linked In People Search. For this reason, I refer to this as the Linked In Languages Index.

Other methodologies have been proposed too, e.g., counting languages on GitHub, job offers, etc., but in my opinion none of them captures what people actually know because, for instance, ① counting lines of code favours verbose languages, ② projects on GitHub measure the free time of people who can code a lot outside working hours, and ③ job postings measure only the demand for something rather than its popularity, and even that demand is blurred by similarity (if you are looking for a Java programmer, a C++ or C# one may need little training to be OK). As usual, a good list of them can be found on Wikipedia.

Methodology (and Caveats)

These are the steps I used to obtain the numbers in the next section:

1. Open the people search at https://www.linkedin.com/search/results/people/.

2. Use the «Filters» to get results only from the following Industries: «Information Technology and Services», «Computer Software», «Internet». After closing the «Filters» panel, this step gives me a baseline of 28,843,942 people (I will refer to this as the "Empty Query", or EQ, from now on).

3. Search for each one of the following terms in the search box and keep track of the total number of results: «Ada»♠, «AWK», «Bash», «C»♠, «C#», «C++», «Clojure», «COBOL», «D»♠, «Delphi», «Erlang», «Fortran», «Go», «Java», «Javascript», «Kotlin», «Ksh», «LabView», «Lisp», «Matlab», «Objective-C», «Pascal»♠, «Perl», «Perl6», «PHP», «Prolog», «Python», «R»♠, «REXX», «Ruby»♠, «Rust», «Scala», «SQL», «Swift»♠, «VBA», «Visual Basic», and «Zsh».

As for the caveats:

• The languages marked with ♠ match quite a few middle names, family names, etc., so their number of results could be inflated by those false positives.

• A better approach would only be possible from inside Linked In: matching only the «Skills» field of the profiles and applying some form of normalization.

• Perl6 was included only because of the write-up that prompted me to write this one: How Viable is Perl?

• If some language is missing, that is a disgrace, but I don't think I care that much 😊
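Here is a rough Python sketch of what the «Skills»-only approach from the caveats could look like. It is entirely hypothetical: the profiles are made up, the `ALIASES` table is an assumption, and nothing here talks to Linked In. Matching whole skill entries instead of free text ignores names, so a first name like «Ruby» no longer counts as a hit. Mapping spellings such as «VB.net» and «VB6» to a single language means each person is counted once.

```python
# Hypothetical sketch of counting languages from the «Skills» field only.
# The profiles and the alias table below are made up for illustration.

from collections import Counter
from typing import Optional

# Lower-cased spellings mapped to one canonical name (assumed, not exhaustive).
ALIASES = {
    "visual basic": "Visual Basic",
    "vb.net": "Visual Basic",
    "vb6": "Visual Basic",
    "javascript": "Javascript",
    "python": "Python",
    "perl": "Perl",
    "ruby": "Ruby",
    "sql": "SQL",
    "r": "R",
}


def normalize(skill: str) -> Optional[str]:
    """Return the canonical language for one skill entry, or None if unknown."""
    return ALIASES.get(skill.strip().lower())


def count_languages(profiles: list[dict]) -> Counter:
    """Count each language at most once per profile, ignoring names entirely."""
    counts: Counter = Counter()
    for profile in profiles:
        languages = {normalize(skill) for skill in profile["skills"]}
        languages.discard(None)  # drop skills that are not known languages
        counts.update(languages)
    return counts


# Made-up profiles: the "name" field is never looked at.
PROFILES = [
    {"name": "Ruby Smith", "skills": ["Python", "SQL"]},
    {"name": "John R. Doe", "skills": ["VB.net", "VB6", "R"]},
    {"name": "Ana Perl", "skills": ["JavaScript", "javascript "]},
]

if __name__ == "__main__":
    for language, n in count_languages(PROFILES).most_common():
        print(language, n)
```

In this toy run, «Visual Basic» is counted once for the second profile despite its two spellings, and the names produce no hits for «Ruby» or «Perl».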

Index for July 2018

The results are sorted by "popularity". Each percentage is relative to the Empty Query, and the percentages are not supposed to sum to 100%, since a single profile can mention several languages.

Language      Number of Results   % of EQ
SQL                   5,526,715  19.1607%
Java                  4,273,672  14.8165%
Javascript            3,392,551  11.7617%
C                     2,629,608   9.1167%
C++                   2,291,531   7.9446%
PHP                   1,934,397   6.7064%
C#                    1,899,394   6.5851%
Python                1,355,091   4.6980%
Go                      758,820   2.6308%
R                       656,641   2.2765%
Visual Basic            542,413   1.8805%
Perl                    438,712   1.5210%
D                       377,904   1.3102%
Matlab                  368,685   1.2782%
Ruby                    289,329   1.0031%
Objective-C             234,419   0.8127%
Bash                    214,957   0.7452%
COBOL                   202,698   0.7027%
VBA                     175,719   0.6092%
Swift                   160,846   0.5576%
Delphi                  122,093   0.4233%
Scala                    97,916   0.3395%
Pascal                   79,763   0.2765%
Fortran                  45,299   0.1570%
LabView                  40,671   0.1410%
Ada                      34,866   0.1209%
Prolog                   28,837   0.1000%
Kotlin                   25,159   0.0872%
Lisp                     22,525   0.0781%
REXX                     21,493   0.0745%
AWK                      18,584   0.0644%
Ksh                      14,279   0.0495%
Erlang                   13,755   0.0477%
Clojure                  12,967   0.0450%
Rust                      9,018   0.0313%
Zsh                       1,966   0.0068%
Perl6                        48   0.0002%
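The percentages are plain ratios against the EQ. The Python sketch below recomputes them, together with the ranking, from a hand-copied subset of the table. The `RESULTS` dictionary and the function names are just for illustration, and nothing here queries Linked In. Because one profile can list several languages, each share is computed on its own and nothing normalizes them to 100%.

```python
# A minimal sketch: recompute "% of EQ" and the ranking from raw result counts.
# The counts are a hand-copied subset of the July 2018 table; nothing here
# talks to Linked In.

EMPTY_QUERY = 28_843_942  # people in the three selected industries (the EQ)

# People Search results per term (subset of the table above).
RESULTS = {
    "SQL": 5_526_715,
    "Java": 4_273_672,
    "Javascript": 3_392_551,
    "C": 2_629_608,
    "C++": 2_291_531,
    "PHP": 1_934_397,
    "C#": 1_899_394,
    "Python": 1_355_091,
    "Go": 758_820,
    "Perl6": 48,
}


def percent_of_eq(count: int, eq: int = EMPTY_QUERY) -> float:
    """Share of the Empty Query matching a term, in percent."""
    return 100.0 * count / eq


def ranking(results: dict[str, int]) -> list[tuple[int, str, float]]:
    """Return (position, language, % of EQ) tuples, most popular first."""
    ordered = sorted(results.items(), key=lambda item: item[1], reverse=True)
    return [
        (position, language, percent_of_eq(count))
        for position, (language, count) in enumerate(ordered, start=1)
    ]


if __name__ == "__main__":
    # Each share is computed independently: one profile can list several
    # languages, so nothing forces the shares to add up to 100%.
    for position, language, pct in ranking(RESULTS):
        print(f"{position:>2}. {language:<12}{pct:8.4f}%")
```

With these numbers, «SQL» comes first with about 19.16% of the EQ, and «Python» lands in position 8.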

Some of my considerations follow:

• For «Visual Basic» and «VBA» I would have expected more, but it is possible that ① many people outside the selected industries use them more than programmers do, and ② they could be referred to in other ways, e.g., «VB.net», «VB6», etc., all of which return results.

• «SQL» is at the top, which is little surprise, as it is used in conjunction with almost all the other languages, or even alone. I am surprised by how low the number is, but that could be explained by other ways of formulating it, e.g., «PL/SQL» with 824,222 results, «SQL Server» with 3,066,329 results, or «Oracle» with 2,497,223 (which should be closer to «PL/SQL»!).
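These alternative spellings cannot simply be added to «SQL», because a single profile may mention several of them. «Oracle» also matches people who have nothing to do with SQL. The Python sketch below assumes each search counts every matching profile once. Under that assumption, it bounds the share of people mentioning any of the four terms. The share can be no lower than the largest single count and no higher than the sum of all of them.

```python
# Sketch: bound the share of profiles matching at least one SQL-related term.
# Counts are the Linked In People Search totals quoted in the text (July 2018).

EMPTY_QUERY = 28_843_942  # baseline: the three selected industries

SQL_VARIANTS = {
    "SQL": 5_526_715,
    "SQL Server": 3_066_329,
    "Oracle": 2_497_223,  # also matches non-SQL profiles, inflating the bound
    "PL/SQL": 824_222,
}


def union_bounds(counts: dict[str, int], eq: int = EMPTY_QUERY) -> tuple[int, int]:
    """Bounds on the number of distinct profiles matching at least one term.

    Lower bound: the largest single count (all other matches could overlap it).
    Upper bound: the sum of all counts (no overlap at all), capped at the EQ.
    """
    values = counts.values()
    return max(values), min(sum(values), eq)


if __name__ == "__main__":
    low, high = union_bounds(SQL_VARIANTS)
    print(f"between {100 * low / EMPTY_QUERY:.1f}% and "
          f"{100 * high / EMPTY_QUERY:.1f}% of the EQ")
```

With the numbers above, that share lies somewhere between about 19.2% and 41.3% of the EQ. For «SQL», then, the way the term is formulated matters a lot.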

• Aside from «Java» being in a very high position (first for TIOBE and second for PYPL, at the time of writing), this index seems to have its own ranking, different from the other two. PYPL puts «Python» in the top spot, while it is in position 8 on Linked In, and TIOBE places «Javascript» in position 9, much lower than I would have expected and than what I found on Linked In.

• Given my circle of friends, «Python» seems a little too low to me. Could it be referred to in other ways? I am not really a «Python» person. The same goes for «Objective-C» and «Swift», as I thought iOS development would be more popular.

• «Perl6» is the only language whose entire community you can easily get to know 😊

• I expected «COBOL» to be higher in the list, given the amount of code written in it that is still around.

Conclusions

The method proposed above is just yet-another-programming-languages-popularity index based on a different metric: the number of people claiming a certain piece of knowledge on Linked In.

The methodology is far from perfect, but it could be refined, perhaps with the help of someone at Linked In.

The ranking feels intuitively right, but the numbers seem, in general, a little low compared with the EQ (this needs more investigation).

It was fun and I enjoyed it 😊