Forking instead of fighting

Bradley Kuhn is widely known for his GPL-enforcement efforts. He has spoken about them at many different conferences along the way, but his talk at LinuxCon North America in Chicago was on a different tack entirely. Instead of trying to enforce the GPL, he and others routed around a violation of the license by writing code—forking the project rather than fighting the violation.

"Sometimes it makes sense not to enforce the GPL", Kuhn said, which many may find a surprising thing for him to say. He normally talks about violations in the embedded Linux space, since they are the most "prevalent and insidious", but violations come in all shapes and sizes. This one was different than many others, which led to a different kind of resolution.

Kuhn then took a short detour to ensure that the audience was up to speed on the GPL. For the purposes of the talk, there were just a few things that the audience should be willing to agree to, at least for 45 minutes or so. The GPL requires that a covered work (the "whole work") be released with the "complete corresponding source" (CCS) for that work. There are differences of opinion about what, exactly, goes into the whole work and he could give an entire talk on just that topic. There is limited guidance from either the laws or courts on what constitutes the whole work, but he believes that will change in his lifetime: we will see a court case that makes a judgment about the reach of "whole work".

There is a fundamental assumption in the software industry, he said, that proprietary software makes more money than free software. The veracity of the assumption is immaterial, as that is the perception which causes companies to try to keep as much of their code proprietary as they can. Developers, on the other hand, would share their code, "all things being equal".

Developers will share code when it is convenient to do so, but even sometimes when it is not so convenient. A Japanese developer once told him about early code-sharing in that country by way of floppies placed on a sushi counter when meeting for lunch. Developers are not all "freedom zealots" like he is, but they tend to err on the side of code-sharing. That is the backdrop for this tale, he said.

The birth of hg-app

In January 2010, a developer noticed that there were no free-software code-sharing sites supporting the Mercurial revision control system. There were a variety of proprietary solutions (e.g. GitHub), but most of those only supported Git. So, the developer scratched an itch and created something called "hg-app" that was released in June 2010 under the MIT license. That immediately led to a flame war about the license, since Mercurial is released under the GPL and hg-app is based on Mercurial (in a GPL sense). Some pointed out that the MIT license is GPL-compatible, thus it might not make any practical difference. But in the end, hg-app's developer switched to the GPLv2+.

Sometime later, the project was renamed to RhodeCode; Mercurial developers also started to contribute to it under the GPL. Some developers were also paid to work on improvements to RhodeCode. This is all pretty standard fare for a free software project, he said, but things were about to change.

The company

The primary author of RhodeCode formed a company, RhodeCode GmbH (which Kuhn said he would refer to as "the company" to avoid confusion with the software of the same name). The company announced a license change and added a 20-user maximum into the Python code for RhodeCode. That led to complaints, threats, and ultimately a patch to remove the 20-user restriction. The company then threatened the author of that patch.

Mercurial is a member of the Software Freedom Conservancy (SFC), of which Kuhn is the president, so SFC got involved in the dispute at that point as a mediator, more or less. Some Mercurial developers and other community members sought aggressive action against the company. But, Kuhn said, SFC's goal was to have a calm conversation about the issue with the company.

That conversation broke down quickly. The company claimed to have 100% of the copyright in RhodeCode, even though patches had been accepted from others under the GPL. There is also the question of whether the whole work includes Mercurial itself, which would then also require a GPL release for RhodeCode.

The company cannot revoke the GPL on earlier releases of the code under that license (and, he stressed, it has not disputed that). It is a question of the future copyrights: can those be licensed under non-GPL terms? Kuhn believes the company does have an obligation to release under the GPL going forward, both because of the GPL patches accepted and due to the whole work question.

But the company did not agree. When friendly negotiations break down, he said, you have to look at the options. In most cases, the only option is litigation. For example, in the embedded space, there is typically no code at all, or it is so far from the CCS that it is not useful. So, a lawsuit has to be filed to force the release of the CCS.

But this case is a bit different. The company is not a completely bad actor; in fact, it spent a "long time as a good actor", he said. The reason that this situation exists at all is that the company did a good thing in the past (released its code). It won't be doing that going forward, which is lamentable, but we can take the code that was released and move on, he said.

The company will still be violating the GPL, he and others believe, but a lawsuit to pursue that will take far too long. The famous USL v. BSDi lawsuit essentially shut down development on free BSDs for 18 months, he said. That is actually a fairly short time frame; it could have been much longer.

Fork

Rather than do that, the developers decided to fork the code base and move forward. The original code is under the GPL, and some of the newer additions are as well. Done carefully, some of that newer code could be pulled into the fork.

The company's license is complex and self-contradictory, Kuhn said. The code is split into two parts, with Python and HTML code being licensed under GPLv3 and everything else, including CSS, images, and design, released under a separate proprietary license.

Under GPLv3, that second part could be considered "non-permissive additional terms" that could be removed based on section 7 of GPLv3. But the company would likely fight that interpretation, so to avoid conflict, SFC used a conservative reading of what the license said and followed it, Kuhn said.

SFC decided not to take any action on behalf of the Mercurial project. It had also gotten the ball rolling on the idea of a fork, but any fork would not automatically be a member of SFC. There was a big debate in the membership committee, Kuhn said, about whether to take on the fork as a member. As part of the decision to do so, the committee came up with a set of conditions that would clarify the provenance of the fork's code, so that the risk to SFC and its other member projects was low.

That led to a four-step process that would be followed before releasing the fork. First, find the last version of the code base without the new license. Second, extract useful patches of Python and HTML code from the post-license-change versions. Third, rebrand the project to a new name. And, finally, ensure "beyond reproach" compliance with the license.

The work was largely done by Kuhn, as part of his SFC work, and by Mads Kiilerich, who is a volunteer, with the assistance of a few others. The first step was fairly easy. Using the Mercurial repository for the project, identifying the changeset where the license change was made was straightforward.

Even with a "hyper-conservative reading" of the license, Python and HTML files are still clearly released under GPLv3. Pulling those kinds of changes out of the post-license-change versions was a bit tricky. In many cases, the changesets also touched other kinds of files. They came up with Mercurial commands to pull out what they wanted and Kuhn vetted all of the changes. Any edge cases were "discussed carefully with legal counsel", he said.
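The kind of triage described above can be sketched in Python. This is only an illustration, not the Mercurial commands the Kallithea developers actually used: the function names are hypothetical, and treating "Python and HTML code" as "files ending in .py or .html" is an assumption made here for simplicity.

```python
from pathlib import PurePosixPath

# Assumption: "Python and HTML code" is approximated by file extension.
GPL_SUFFIXES = {".py", ".html"}


def split_changeset(paths):
    """Partition the files touched by one changeset into (gpl, other)."""
    gpl, other = [], []
    for path in paths:
        suffix = PurePosixPath(path).suffix.lower()
        (gpl if suffix in GPL_SUFFIXES else other).append(path)
    return gpl, other


def classify(paths):
    """Return 'take', 'review', or 'skip' for one changeset's file list."""
    gpl, other = split_changeset(paths)
    if gpl and not other:
        return "take"      # only Python/HTML files were touched
    if gpl and other:
        return "review"    # mixed changeset: needs manual extraction
    return "skip"          # nothing clearly under GPLv3


print(classify(["rhodecode/model/user.py", "rhodecode/public/css/style.css"]))
```

A mixed changeset is flagged for review rather than split automatically, which mirrors the article's point that each change was vetted by hand and edge cases went to legal counsel.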

Rebranding was rather painful, overall. RhodeCode is the company's name and trademark, so the fork could not use that name except in the usual ways that anyone can (i.e. nominative use). They came up with the name "Kallithea", which is a location on the Greek island of Rhodes. But there was more to it than just renaming the project, as the string rhodecode_ was used throughout the code. While it was probably unnecessary to do so, he said, they wrote 300 lines of sed and Perl to replace all of the uses.
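A rough idea of what such a bulk rename involves, written in Python rather than the sed and Perl the developers used. The replacement prefix kallithea_ and the function name are assumptions for illustration; the article does not say what the string was replaced with.

```python
import re

OLD_PREFIX = "rhodecode_"
NEW_PREFIX = "kallithea_"  # assumption: the article does not name the replacement

_PATTERN = re.compile(re.escape(OLD_PREFIX))


def rebrand(text):
    """Replace every occurrence of OLD_PREFIX; return (new_text, count)."""
    return _PATTERN.subn(NEW_PREFIX, text)


new_text, count = rebrand("rhodecode_foo = get('rhodecode_bar')")
print(new_text, count)
```

Returning the replacement count makes it easy to log which files changed. The match is case-sensitive, so the trademark spelled "RhodeCode" in prose (for example, in nominative uses) is left alone by this sketch.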

JavaScript and the GPL

In order to be "beyond reproach" in its license compliance, Kallithea needed to ensure that it was providing the CCS for its code. Even if the company was violating GPLv3, that doesn't give the project (which is using a large chunk of code that the company holds copyright to) the right to do so. The biggest problem for providing the CCS turned out to be JavaScript under the GPL.

It is a problem that other projects have, he said, but they may not know it. Typically, you publish a .js file at some URL and it gets downloaded as part of an HTTP request. Under the GPLv3, you have distributed the code at that point, so you must provide the CCS. For RhodeCode/Kallithea, though, there is a bunch of JavaScript code from all over; some of it was written for RhodeCode, but lots of it was from elsewhere.

The first problem was tracking down what version of the external code is being used (and what license it is under) so that the license text accompanying Kallithea could be kept up to date. That part was fairly straightforward (if tedious), but the real problem came from "minified" JavaScript. Under the GPL, that is considered to be "object code", so the source JavaScript had to be tracked down to be added to the CCS of Kallithea.
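Finding the minified files in the first place is a step that can be partly automated. The following Python sketch is a heuristic, not anything the article describes the project using: the function name, the .min.js naming convention, and the 200-character threshold are all assumptions.

```python
def looks_minified(name, source, max_avg_line=200):
    """Heuristically flag JavaScript that is probably minified.

    Assumptions: files named *.min.js are minified, and hand-written
    source rarely averages more than max_avg_line characters per line.
    """
    if name.endswith(".min.js"):
        return True
    lines = [line for line in source.splitlines() if line.strip()]
    if not lines:
        return False
    average = sum(len(line) for line in lines) / len(lines)
    return average > max_avg_line


print(looks_minified("app.js", "var a = 1;\nvar b = 2;\n"))
```

A flagged file is only a candidate: someone still has to locate the matching upstream source and confirm its version and license, which is the tedious part Kuhn described.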

For example, YUI 2.9 is a deprecated Yahoo user interface library written in JavaScript that can be found in many places in minified form. That's fine, since the library is BSD licensed, but it is not fine for a GPL-licensed package to release it that way. It would be hypocritical for Kuhn to release code without the CCS, he said, given that he has spent many years fighting for the CCS to various other programs in court. They were eventually able to figure out how to get the source and to build it into the minified version, so the instructions to do so are now part of the license file for Kallithea.

There are some minutiae to the GPL, he said, but they are normally easily met. The first release of Kallithea was done on the same day (August 22) as Kuhn's talk. A late-breaking problem that they ran into before the release was the license notification that appeared in the HTML of each page of the interface. RhodeCode has an incorrect one, he said, but getting that right is not necessarily easy. He would rather maintain a single page (like "About") than something on each page. It is, he said, the first time he has felt burdened by the GPL; it is something he may try to get Richard Stallman to change down the road.

Kuhn cited three lessons for developers that resulted from this episode. Don't just grab JavaScript from anywhere and incorporate it into a web application, he said. Part of the problem is that Python programmers (in this case) don't really take JavaScript very seriously, which can lead to problems as it did here.

When contributing to a new project, immediately check to see who holds the domain name and trademark; if it is only one person, start talking to an organization like SFC. Kuhn "pre-announced" a new "Conservancy-lite" program that SFC is offering to projects that are just looking for a place to park their domain name and trademark.

Lastly, he suggested that developers keep their own copyrights to their code. The code can be contributed to a project under the same license the project releases its code under. Developers should make it clear they expect the license that the contribution was made under to be upheld.

Meanwhile, Kallithea is an early success. It has released version 0.1, which supports both Mercurial and Git, and anyone can run their own instance. And all of it is developed in the open under GPLv3.

[I would like to thank the Linux Foundation for travel assistance to Chicago for LinuxCon North America.]

