I just discovered the GitHub Archive, a dataset of GitHub events that can be queried with Google BigQuery. What fun! So I decided to count, by language, how many repositories have been created this year.

SELECT repository_language, COUNT(repository_language) AS repos_by_lang
FROM [githubarchive:github.timeline]
WHERE repository_fork == "false"
  AND type == "CreateEvent"
  AND PARSE_UTC_USEC(repository_created_at) >= PARSE_UTC_USEC('2013-01-01 00:00:00')
  AND PARSE_UTC_USEC(repository_created_at) < PARSE_UTC_USEC('2013-08-30 00:00:00')
GROUP BY repository_language
ORDER BY repos_by_lang DESC
LIMIT 100

The results:

Top 20 Languages for 2013

By # of repositories created on Github so far this year:

Rank  Language            # Repositories Created
1     JavaScript          264131
2     Ruby                218812
3     Java                157618
4     PHP                 114384
5     Python              95002
6     C++                 78327
7     C                   67706
8     Objective-C         36344
9     C#                  32170
10    Shell               28561
11    CSS                 17813
12    Perl                15412
13    CoffeeScript        11133
14    VimL                7857
15    Scala               6918
16    Go                  6884
17    Prolog              5829
18    Clojure             4904
19    Haskell             4681
20    Lua                 4048

Commentary

Hey, Clojure cracked the top 20! It’s neck-and-neck with Haskell, too.

The top 10 are no surprise at all, although the ordering clearly reflects GitHub’s early popularity with the Ruby crowd and a general skew towards web languages.

The high positions of Shell and VimL look odd, but can be explained by people putting their dotfiles on GitHub.

Prolog is a big surprise here. If anyone can explain that, I’d be interested.

Maybe we could learn more if we had the 2012 rankings for the same period (Jan 1 - Aug. 30). So here are those:

SELECT repository_language, COUNT(repository_language) AS repos_by_lang
FROM [githubarchive:github.timeline]
WHERE repository_fork == "false"
  AND type == "CreateEvent"
  AND PARSE_UTC_USEC(repository_created_at) >= PARSE_UTC_USEC('2012-01-01 00:00:00')
  AND PARSE_UTC_USEC(repository_created_at) < PARSE_UTC_USEC('2012-08-30 00:00:00')
GROUP BY repository_language
ORDER BY repos_by_lang DESC
LIMIT 100
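Since the 2012 and 2013 queries differ only in the year, here is a small Python sketch that builds either one from a single template. The names `QUERY_TEMPLATE` and `build_query` are made up for illustration, and the snippet only produces the query text; it does not talk to BigQuery.

```python
# Hypothetical helper: the template mirrors the query used in this post,
# with the year as the only parameter.
QUERY_TEMPLATE = """\
SELECT repository_language, COUNT(repository_language) AS repos_by_lang
FROM [githubarchive:github.timeline]
WHERE repository_fork == "false"
  AND type == "CreateEvent"
  AND PARSE_UTC_USEC(repository_created_at) >= PARSE_UTC_USEC('{year}-01-01 00:00:00')
  AND PARSE_UTC_USEC(repository_created_at) < PARSE_UTC_USEC('{year}-08-30 00:00:00')
GROUP BY repository_language
ORDER BY repos_by_lang DESC
LIMIT 100"""


def build_query(year: int) -> str:
    """Return the repository-count query for Jan 1 up to Aug 30 of `year`."""
    return QUERY_TEMPLATE.format(year=year)


if __name__ == "__main__":
    # Print the 2012 variant so it can be pasted into the BigQuery console.
    print(build_query(2012))
```

Keeping one template makes it harder for the two date windows to drift apart when comparing years.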

Top 20 in 2012

By # of repositories created on Github from Jan. 1 through Aug. 30, 2012

Rank  Language            # Repositories Created
1     Ruby                344825
2     JavaScript          296564
3     Java                265223
4     C                   212393
5     PHP                 173938
6     Python              173727
7     C++                 93764
8     Shell               72006
9     Perl                48620
10    C#                  43665
11    Objective-C         41536
12    VimL                18077
13    Go                  16224
14    CoffeeScript        15722
15    Scala               14262
16    Haskell             10402
17    Clojure             9748
18    Tcl                 9633
19    Emacs Lisp          8567
20    Groovy              6973

I’m not sure I trust the raw numbers here, which are so much higher than in 2013, but the rankings are hopefully accurate.
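One way to sidestep the raw-number problem is to compare each language's share of the top-20 total instead of its absolute count. Here is a rough Python sketch of that idea. The counts are copied from the two top-20 tables; using the top-20 sum as the denominator is my own assumption (the full totals would be better), and `shares` is a hypothetical helper name.

```python
# Counts copied from the top-20 tables in this post.
TOP20_2012 = {
    "Ruby": 344825, "JavaScript": 296564, "Java": 265223, "C": 212393,
    "PHP": 173938, "Python": 173727, "C++": 93764, "Shell": 72006,
    "Perl": 48620, "C#": 43665, "Objective-C": 41536, "VimL": 18077,
    "Go": 16224, "CoffeeScript": 15722, "Scala": 14262, "Haskell": 10402,
    "Clojure": 9748, "Tcl": 9633, "Emacs Lisp": 8567, "Groovy": 6973,
}
TOP20_2013 = {
    "JavaScript": 264131, "Ruby": 218812, "Java": 157618, "PHP": 114384,
    "Python": 95002, "C++": 78327, "C": 67706, "Objective-C": 36344,
    "C#": 32170, "Shell": 28561, "CSS": 17813, "Perl": 15412,
    "CoffeeScript": 11133, "VimL": 7857, "Scala": 6918, "Go": 6884,
    "Prolog": 5829, "Clojure": 4904, "Haskell": 4681, "Lua": 4048,
}


def shares(counts):
    """Convert raw counts into fractions of the given table's total."""
    total = sum(counts.values())
    return {lang: n / total for lang, n in counts.items()}


if __name__ == "__main__":
    s12, s13 = shares(TOP20_2012), shares(TOP20_2013)
    # Only languages present in both top 20s can be compared directly.
    common = set(s12) & set(s13)
    for lang in sorted(common, key=s13.get, reverse=True):
        print(f"{lang:<14} {s12[lang]:6.1%} -> {s13[lang]:6.1%}")
```

Shares are only comparable within the same cut-off, so a language that falls out of the top 20 simply disappears from this view.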

Some highlights:

Perl appears to have suffered a drop in 2013 compared to 2012 (from 9th to 12th)

Tcl appears out of nowhere in 2012. Maybe a quirk of the language recognition Github applies?

Groovy went away in 2013 (actually, dropped to 22)

Go was more popular than Scala in 2012, but less in 2013. I compare those two because I think people are using them to solve similar problems.

CSS was nowhere near the top 20 in 2012 (it ranked 46th), but it is 11th in 2013
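The rank movements in these highlights can be double-checked with a few lines of Python. This is just a sketch: the two lists are the top-20 language names in rank order, copied from the tables, and `rank_changes` and `turnover` are names I made up for the example.

```python
# Top-20 language names in rank order, copied from the tables in this post.
RANKED_2012 = [
    "Ruby", "JavaScript", "Java", "C", "PHP", "Python", "C++", "Shell",
    "Perl", "C#", "Objective-C", "VimL", "Go", "CoffeeScript", "Scala",
    "Haskell", "Clojure", "Tcl", "Emacs Lisp", "Groovy",
]
RANKED_2013 = [
    "JavaScript", "Ruby", "Java", "PHP", "Python", "C++", "C",
    "Objective-C", "C#", "Shell", "CSS", "Perl", "CoffeeScript", "VimL",
    "Scala", "Go", "Prolog", "Clojure", "Haskell", "Lua",
]


def rank_changes(old, new):
    """Map each language found in both lists to (old_rank, new_rank)."""
    old_rank = {lang: i for i, lang in enumerate(old, start=1)}
    new_rank = {lang: i for i, lang in enumerate(new, start=1)}
    return {lang: (old_rank[lang], new_rank[lang])
            for lang in old if lang in new_rank}


def turnover(old, new):
    """Return (entered, dropped): languages new to the list and gone from it."""
    entered = [lang for lang in new if lang not in old]
    dropped = [lang for lang in old if lang not in new]
    return entered, dropped


if __name__ == "__main__":
    for lang, (before, after) in rank_changes(RANKED_2012, RANKED_2013).items():
        if before != after:
            print(f"{lang}: {before} -> {after}")
    print(turnover(RANKED_2012, RANKED_2013))
```

With these lists, Perl comes out as (9, 12) and Go as (13, 16), while Scala stays at 15. CSS, Prolog and Lua enter the top 20 and Tcl, Emacs Lisp and Groovy leave it.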

Well, that’s all the analysis I care to do today, but I submit this data for discussion. Who else has opinions?

Oh, before I go:

The Full Results (i.e. the top 100)

2013

Rank  Language            # Repositories Created
1     JavaScript          264131
2     Ruby                218812
3     Java                157618
4     PHP                 114384
5     Python              95002
6     C++                 78327
7     C                   67706
8     Objective-C         36344
9     C#                  32170
10    Shell               28561
11    CSS                 17813
12    Perl                15412
13    CoffeeScript        11133
14    VimL                7857
15    Scala               6918
16    Go                  6884
17    Prolog              5829
18    Clojure             4904
19    Haskell             4681
20    Lua                 4048
21    Puppet              3437
22    Groovy              3372
23    R                   2980
24    Emacs Lisp          2919
25    ActionScript        2413
26    Matlab              2395
27    Arduino             2238
28    Erlang              2061
29    OCaml               2049
30    Visual Basic        1854
31    ASP                 1268
32    Processing          1207
33    Common Lisp         1153
34    Assembly            1051
35    Logos               1027
36    TypeScript          972
37    Dart                950
38    D                   936
39    Delphi              901
40    Scheme              882
41    FORTRAN             794
42    PowerShell          771
43    XML                 632
44    Racket              610
45    Elixir              573
46    ColdFusion          507
47    XSLT                496
48    Apex                484
49    F#                  473
50    Haxe                455
51    Verilog             444
52    Julia               387
53    Tcl                 338
54    AutoHotkey          338
55    Vala                321
56    VHDL                313
57    Rust                282
58    LiveScript          192
59    SuperCollider       151
60    Standard ML         139
61    AppleScript         121
62    DOT                 118
63    Ada                 109
64    Coq                 99
65    OpenEdge ABL        86
66    Gosu                76
67    Pure Data           73
68    Smalltalk           63
69    Kotlin              61
70    Lasso               57
71    Eiffel              55
72    Io                  53
73    M                   53
74    XQuery              52
75    Nemerle             49
76    Scilab              44
77    Objective-J         43
78    Awk                 42
79    Slash               38
80    XProc               35
81    Xtend               33
82    Nimrod              31
83    CLIPS               24
84    Boo                 24
85    Ceylon              23
86    ooc                 22
87    MoonScript          22
88    DCPU-16 ASM         19
89    Rebol               17
90    Factor              17
91    Ragel in Ruby Host  15
92    Bro                 14
93    Dylan               13
94    Monkey              12
95    Nu                  11
96    Arc                 10
97    Augeas              9
98    PogoScript          8
99    Turing              6
100   XC                  5

2012

Rank  Language            # Repositories Created
1     Ruby                344825
2     JavaScript          296564
3     Java                265223
4     C                   212393
5     PHP                 173938
6     Python              173727
7     C++                 93764
8     Shell               72006
9     Perl                48620
10    C#                  43665
11    Objective-C         41536
12    VimL                18077
13    Go                  16224
14    CoffeeScript        15722
15    Scala               14262
16    Haskell             10402
17    Clojure             9748
18    Tcl                 9633
19    Emacs Lisp          8567
20    Groovy              6973
21    Lua                 6474
22    Erlang              5784
23    ActionScript        4777
24    Puppet              3926
25    R                   3386
26    Matlab              2828
27    D                   2740
28    Common Lisp         2529
29    Arduino             2459
30    Assembly            1882
31    Visual Basic        1821
32    Vala                1614
33    Scheme              1565
34    Delphi              1370
35    OCaml               1330
36    Smalltalk           1313
37    FORTRAN             1269
38    Dart                1174
39    ASP                 1042
40    HaXe                983
41    ColdFusion          966
42    Prolog              956
43    F#                  670
44    PowerShell          652
45    Racket              614
46    CSS                 530
47    Verilog             523
48    VHDL                473
49    Eiffel              406
50    Parrot              270
51    Apex                265
52    AutoHotkey          258
53    Rust                234
54    Scilab              230
55    DCPU-16 ASM         229
56    XML                 206
57    Elixir              189
58    Ada                 182
59    Coq                 174
60    XQuery              155
61    Julia               151
62    Pure Data           147
63    SuperCollider       131
64    Standard ML         127
65    XSLT                102
66    Kotlin              98
67    Powershell          93
68    Io                  92
69    Objective-J         84
70    TypeScript          81
71    OpenEdge ABL        76
72    Nemerle             61
73    AppleScript         57
74    Haxe                54
75    Gosu                47
76    Factor              44
77    Logos               43
78    Processing          40
79    Logtalk             34
80    Dylan               34
81    Nimrod              32
82    Ceylon              32
83    ooc                 30
84    Opa                 30
85    Boo                 27
86    Fancy               26
87    Turing              26
88    Mirah               22
89    Max/MSP             21
90    Bro                 17
91    Xtend               14
92    Rebol               13
93    LiveScript          12
94    Lasso               11
95    Arc                 11
96    Augeas              8
97    DOT                 6
98    Fantom              5
99    Awk                 5
100   Max                 4

Disclaimer

Here are a lot of reasons why analysing Github data might not be accurate: