Update 8/21: I’ve gotten a lot of feedback in the comments about issues with these rankings, and have tried to address some of them here. The data there has been updated to include confidence intervals.

———————————————————————————————————

A few weeks ago I described how I used Git commit metadata plus the Rapleaf API to build aggregate demographic profiles for popular GitHub organizations (blog post here, per-organization data available here).

I was also interested in slicing the data somewhat differently, breaking down demographics per programming language instead of per organization. Stereotypes about developers of various languages abound, but I was curious how these lined up with reality. The easiest place to start was age, income, and gender breakdowns per language. Given the data I’d already collected, this wasn’t too challenging:

For each repository, I used GitHub’s estimate of its language composition. For example, GitHub estimates this project at 75% Java.

For each language, I aggregated the incomes of all developers who have contributed to a project that is at least 50% that language (by the above measure).

I kept only languages with more than 100 available income data points.
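The three steps above can be sketched as a small Python pipeline. This is a minimal sketch under stated assumptions, not the code behind these numbers. It assumes each repository's composition arrives as per-language byte counts (the shape returned by GitHub's `GET /repos/{owner}/{repo}/languages` endpoint, which reports bytes of code per language), that contributors are identified by plain strings, and that `incomes` maps a contributor to a household income estimate. The post doesn't say whether a developer who contributed to several qualifying repositories is counted once or several times; this sketch counts each developer once per language. All function names, dict keys, and data are hypothetical.

```python
from collections import defaultdict
from statistics import mean
from typing import Dict, List, Set, Tuple

def language_shares(byte_counts: Dict[str, int]) -> Dict[str, float]:
    """Turn per-language byte counts into fractions of the repository total.

    `byte_counts` is assumed to have the shape returned by GitHub's
    GET /repos/{owner}/{repo}/languages endpoint, e.g. {"Java": 7500}.
    """
    total = sum(byte_counts.values())
    if total == 0:
        return {}  # no recognized code, so no meaningful composition
    return {lang: n / total for lang, n in byte_counts.items()}

def incomes_by_language(
    repos: List[dict], incomes: Dict[str, float], threshold: float = 0.5
) -> Dict[str, List[float]]:
    """Collect incomes of contributors to repos that are >= `threshold` one language.

    Each repo is a hypothetical dict with "bytes" ({language: byte count}) and
    "contributors" ([contributor id, ...]). Each contributor is counted at most
    once per language; contributors without an income estimate are skipped.
    """
    people: Dict[str, Set[str]] = defaultdict(set)
    for repo in repos:
        for lang, share in language_shares(repo["bytes"]).items():
            if share >= threshold:
                people[lang].update(repo["contributors"])
    return {
        lang: [incomes[c] for c in sorted(ids) if c in incomes]
        for lang, ids in people.items()
    }

def rank_languages(
    by_lang: Dict[str, List[float]], min_points: int = 100
) -> List[Tuple[str, float, int]]:
    """Return (language, average income, data points), lowest average first.

    Languages with `min_points` or fewer income values are dropped.
    """
    rows = [
        (lang, mean(vals), len(vals))
        for lang, vals in by_lang.items()
        if len(vals) > min_points
    ]
    return sorted(rows, key=lambda row: row[1])

# Toy data (entirely made up); min_points is lowered so the result isn't empty.
repos = [
    {"bytes": {"Java": 7500, "Shell": 2500}, "contributors": ["alice", "bob"]},
    {"bytes": {"Python": 5000, "C": 5000}, "contributors": ["bob", "carol"]},
]
incomes = {"alice": 100000.0, "bob": 80000.0, "carol": 90000.0}
by_lang = incomes_by_language(repos, incomes)
print(rank_languages(by_lang, min_points=1))
# [('Python', 85000.0, 2), ('C', 85000.0, 2), ('Java', 90000.0, 2)]
```

Under the "at least 50%" rule, a repository split exactly 50/50 counts toward both languages, which is why Python and C tie in the toy output.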

Here are the results for income, sorted from lowest average household income to highest:

| Language | Average Household Income ($) | Data Points |
|---|---|---|
| Puppet | 87,589.29 | 112 |
| Haskell | 89,973.82 | 191 |
| PHP | 94,031.19 | 978 |
| CoffeeScript | 94,890.80 | 435 |
| VimL | 94,967.11 | 532 |
| Shell | 96,930.54 | 979 |
| Lua | 96,930.69 | 101 |
| Erlang | 97,306.55 | 168 |
| Clojure | 97,500.00 | 269 |
| Python | 97,578.87 | 2314 |
| JavaScript | 97,598.75 | 3443 |
| Emacs Lisp | 97,774.65 | 355 |
| C# | 97,823.31 | 665 |
| Ruby | 98,238.74 | 3242 |
| C++ | 99,147.93 | 845 |
| CSS | 99,881.40 | 527 |
| Perl | 100,295.45 | 990 |
| C | 100,766.51 | 2120 |
| Go | 101,158.01 | 231 |
| Scala | 101,460.91 | 243 |
| ColdFusion | 101,536.70 | 109 |
| Objective-C | 101,801.60 | 562 |
| Groovy | 102,650.86 | 116 |
| Java | 103,179.39 | 1402 |
| XSLT | 106,199.19 | 123 |
| ActionScript | 108,119.47 | 113 |

Here’s the same data in chart form:

Most of the language rankings were roughly in line with my expectations, to the extent I had any:

Haskell is a very academic language, and academia is not known for generous salaries

PHP is a very accessible language, and it makes sense that casual, younger, or lower-paid programmers can easily contribute

On the high end of the spectrum, Java and ActionScript are used heavily in enterprise software, and enterprise software is certainly known to pay well

On the other hand, I’m unfamiliar with some of the other languages on the high/low ends like XSLT, Puppet, and CoffeeScript. Any ideas on why these languages ranked higher or lower than average?

Caveats before drawing too many conclusions from the data here:

These are all open-source projects, which may not accurately represent compensation among closed-source developers

Rapleaf data does not have complete income coverage, and the sample may be biased

I have not corrected for any other skew (age, gender, etc.)

I haven’t crawled all repositories on GitHub, so the users for whom I have data may not be a representative sample

That said, even though the absolute numbers may be biased, I think this is a good starting point when comparing relative compensation between languages.
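Relative comparisons are only meaningful when the gap between two languages is large compared with the sampling noise; Python and JavaScript, for example, differ by about $20 on average. One common check is a confidence interval on each language's mean. Below is a minimal sketch using a normal approximation (mean ± 1.96 standard errors, roughly 95% for large samples). This is an assumption about how such intervals might be computed, not necessarily how the updated data was produced, and it does nothing to correct the sampling biases listed above. The function names and sample incomes are hypothetical.

```python
import math
from statistics import mean, stdev
from typing import List, Tuple

def mean_ci(values: List[float], z: float = 1.96) -> Tuple[float, float, float]:
    """Return (mean, lower, upper) for a normal-approximation confidence interval.

    With z = 1.96 this is roughly a 95% interval for large samples. At least
    two values are needed because the sample standard deviation is otherwise
    undefined.
    """
    if len(values) < 2:
        raise ValueError("need at least two values")
    m = mean(values)
    half_width = z * stdev(values) / math.sqrt(len(values))
    return m, m - half_width, m + half_width

def intervals_overlap(a: List[float], b: List[float], z: float = 1.96) -> bool:
    """True if the confidence intervals for samples `a` and `b` overlap."""
    _, lo_a, hi_a = mean_ci(a, z)
    _, lo_b, hi_b = mean_ci(b, z)
    return lo_a <= hi_b and lo_b <= hi_a

# Made-up incomes: the means differ by 2,000, but the intervals overlap widely.
x = [90000.0, 110000.0, 95000.0, 105000.0]
y = [92000.0, 112000.0, 97000.0, 107000.0]
print(intervals_overlap(x, y))  # True
```

Overlap is a conservative signal: two 95% intervals can overlap slightly even when a direct test of the difference would call it significant, so non-overlapping intervals are the stronger statement.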

Let me know any thoughts or suggestions about the methodology or the results. I’ll follow up soon with age and gender breakdowns per language in a similar fashion.