Open Source .NET – 1 year later

A little over a year ago Microsoft announced that they were open sourcing large parts of the .NET framework. At the time Scott Hanselman did a nice analysis of the source, using Microsoft Power BI. Inspired by this and now that a year has passed, I wanted to try and answer the question:

How much Community involvement has there been since Microsoft open sourced large parts of the .NET framework?

I will be looking at the following 3 projects, as they are all highly significant parts of the .NET ecosystem and are also some of the most active/starred/forked projects within the .NET Foundation:

Roslyn - The .NET Compiler Platform (“Roslyn”) provides open-source C# and Visual Basic compilers with rich code analysis APIs.

CoreCLR - The .NET Core runtime, called CoreCLR, and the base library, called mscorlib. It includes the garbage collector, JIT compiler, base .NET data types and many low-level classes.

CoreFX - The .NET Core foundational libraries, called CoreFX. It includes classes for collections, file systems, console, XML, async and many others.

Available Data

GitHub itself has some nice built-in graphs; for instance, you can see the Commits per Month over an entire year:

You can also get a nice dashboard showing the Monthly Pulse.

However, to answer the question above, I needed more data. Fortunately GitHub provides a really comprehensive API which, combined with the excellent Octokit.net library and the brilliant LINQPad, meant I was able to easily get all the data I needed. Here’s a sample LINQPad script if you want to start playing around with the API yourself.

However, knowing the “# of Issues” or “Merged Pull Requests” per month on its own isn’t that useful; it doesn’t tell us anything about who created the issue or submitted the PR. Fortunately GitHub classifies users into categories. For instance, in the image below from Roslyn Issue #670 we can see what type of user posted each comment: an “Owner”, a “Collaborator”, or blank, which signifies a “Community” member, i.e. someone who (AFAICT) doesn’t work at Microsoft.
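The classification described above can be sketched in a few lines. This is only an illustration, not the script used for this post (which was written with Octokit.net in LINQPad). It assumes each issue is a dictionary shaped like the JSON returned by the GitHub REST API, where the `author_association` field holds values such as `OWNER`, `COLLABORATOR` or `NONE`. Lumping every other value into “Community” is a simplifying assumption, and the sample records are made up.

```python
from collections import Counter


def submitter_category(issue):
    """Map a GitHub issue record to Owner / Collaborator / Community.

    Assumption: anything that is not explicitly an owner or collaborator
    (including a missing field) is treated as a Community member.
    """
    association = issue.get("author_association", "NONE")
    if association == "OWNER":
        return "Owner"
    if association == "COLLABORATOR":
        return "Collaborator"
    return "Community"


def count_by_submitter(issues):
    """Tally how many issues fall into each submitter category."""
    return Counter(submitter_category(issue) for issue in issues)


# Hypothetical sample data, not real issues from any repository.
sample_issues = [
    {"number": 1, "author_association": "OWNER"},
    {"number": 2, "author_association": "COLLABORATOR"},
    {"number": 3, "author_association": "NONE"},
    {"number": 4, "author_association": "CONTRIBUTOR"},
    {"number": 5},
]

counts = count_by_submitter(sample_issues)
print(dict(counts))
```

The same function works for pull requests, since the API reports the same field for them.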

Results

So now that we can get the data we need, what results do we get?

Total Issues - By Submitter

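| Project | Owner | Collaborator | Community | Total |
|---------|-------|--------------|-----------|-------|
| Roslyn  | 481   | 1867         | 1596      | 3944  |
| CoreCLR | 86    | 298          | 487       | 871   |
| CoreFX  | 334   | 911          | 735       | 1980  |
| Total   | 901   | 3076         | 2818      | 6795  |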

Here you can see that the Owners and Collaborators do in some cases dominate, e.g. in Roslyn, where almost 60% of the issues were opened by them. But in other cases the Community is very active, especially in CoreCLR, where Community members are opening more issues than Owners and Collaborators combined. Part of the reason for this is the nature of the different repositories. CoreCLR is the most visible part of the .NET framework, as it encompasses most of the libraries that .NET developers would use on a day-to-day basis, so it’s not surprising that the Community has lots of suggestions for improvements or bug fixes. In addition, the CoreCLR has been around for a much longer time, so the Community has had more time to use it and find out the parts it doesn’t like. Roslyn, by contrast, is a much newer project, so there has been less time to use it, and finding bugs in a compiler is by its nature harder to do.

Total Merged Pull Requests - By Submitter

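| Project | Owner | Collaborator | Community | Total |
|---------|-------|--------------|-----------|-------|
| Roslyn  | 465   | 2093         | 118       | 2676  |
| CoreCLR | 378   | 567          | 201       | 1146  |
| CoreFX  | 516   | 1409         | 464       | 2389  |
| Total   | 1359  | 4069         | 783       | 6211  |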

However, if we look at Merged Pull Requests, we can see that the overall amount of Community contributions across the 3 projects is much lower, accounting for only about 12.6% (783 of 6211). This isn’t that surprising, as there’s a much higher bar for getting a pull request accepted. Firstly, if the project is using this mechanism, you have to pick an issue that is “up for grabs”; then you have to get any API changes through a review; finally you have to resolve any compatibility/performance/correctness issues that come up during the code review itself. So 12.6% is actually a pretty good result, as there is a non-trivial amount of work involved in getting your PR merged, especially considering most Community members will be working in their spare time.
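The Community share quoted above can be recomputed directly from the Merged Pull Requests table. The sketch below copies the numbers from that table; the variable and function names are mine and purely illustrative.

```python
# Merged pull requests per project and submitter type (from the table above).
merged_prs = {
    "Roslyn": {"Owner": 465, "Collaborator": 2093, "Community": 118},
    "CoreCLR": {"Owner": 378, "Collaborator": 567, "Community": 201},
    "CoreFX": {"Owner": 516, "Collaborator": 1409, "Community": 464},
}


def community_share(table):
    """Percentage of all rows' contributions that came from the Community."""
    community = sum(row["Community"] for row in table.values())
    total = sum(sum(row.values()) for row in table.values())
    return 100 * community / total


overall = community_share(merged_prs)

# The same percentage, broken down per project.
per_project = {
    name: 100 * row["Community"] / sum(row.values())
    for name, row in merged_prs.items()
}

print(f"Overall: {overall:.1f}%")
for name, share in per_project.items():
    print(f"{name}: {share:.1f}%")
```

Broken down this way, the overall figure hides a fair amount of variation between the projects: the Community share of merged PRs is much smaller in Roslyn than in CoreCLR or CoreFX.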

Update: I was wrong about the “up for grabs” requirement, see this comment from David Kean and this tweet for more information. “Up for grabs” is a guideline and meant to help new users, but it is not a requirement; you can submit PRs for issues that don’t have that label.

Finally, if you look at the amount per month (see the 2 graphs below), it’s hard to pick up any definite trends or say if the Community is definitely contributing more or less over time. But you can say that over a year the Community has consistently contributed, and it doesn’t look like that contribution is going to end. It is not just an initial burst that only happened straight after the projects were open sourced; it is a sustained level of contributions over an entire year.

Issues Per Month - By Submitter

Merged Pull Request Per Month - By Submitter
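A per-month breakdown like the one behind these two graphs could be produced along the following lines. This is a minimal sketch, assuming each record is a pair of a creation timestamp in the ISO 8601 format the GitHub API returns (e.g. `2015-01-15T10:00:00Z`) and a submitter category; the sample records are invented for illustration.

```python
from collections import defaultdict
from datetime import datetime


def per_month(records):
    """Count records per month ("YYYY-MM") and submitter category."""
    buckets = defaultdict(lambda: defaultdict(int))
    for created_at, category in records:
        # Parse the timestamp and keep only the year and month.
        created = datetime.strptime(created_at, "%Y-%m-%dT%H:%M:%SZ")
        buckets[created.strftime("%Y-%m")][category] += 1
    # Convert to plain dicts, ordered chronologically by month.
    return {month: dict(counts) for month, counts in sorted(buckets.items())}


# Hypothetical sample data, not taken from the real repositories.
sample_records = [
    ("2015-02-03T09:30:00Z", "Community"),
    ("2015-01-15T10:00:00Z", "Community"),
    ("2015-01-20T16:45:00Z", "Owner"),
]

monthly = per_month(sample_records)
for month, counts in monthly.items():
    print(month, counts)
```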

Top 20 Issue Labels

The last thing that I want to do whilst I have the data is to take a look at the most popular Issue Labels and see what they tell us about the type of work that has been going on since the 3 projects were open sourced.
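Counting the most popular labels is a small job once the issues have been downloaded. The sketch below assumes each issue is a dictionary with a `labels` list whose entries carry a `name`, as in the GitHub REST API's issue JSON; the sample issues and the function name are hypothetical.

```python
from collections import Counter


def top_labels(issues, n=20):
    """Return the n most common label names with their counts."""
    counts = Counter(
        label["name"]
        for issue in issues
        for label in issue.get("labels", [])
    )
    return counts.most_common(n)


# Hypothetical sample data for illustration only.
sample_issues = [
    {"labels": [{"name": "bug"}, {"name": "up for grabs"}]},
    {"labels": [{"name": "bug"}]},
    {"labels": [{"name": "bug"}, {"name": "enhancement"}]},
    {"labels": [{"name": "up for grabs"}]},
    {"labels": []},
]

top_two = top_labels(sample_issues, n=2)
print(top_two)
```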

Here are a few observations about the results:
