On branching

Back in 2008 I posted some rambling thoughts on distributed version-control systems, largely in response to the huge amount of hype surrounding such tools (particularly git). Then at PyCon last year, amid even more hype as it was announced that Python would switch from Subversion to a distributed tool, I gave a lightning talk reflecting on what I’d picked up and seen during the intervening eight months.

It’s now been a year and a half since that original blog post; in that time I’ve switched all of my personal projects from using Subversion and Google Code Hosting to using Mercurial and Bitbucket, and I’ve started using Mercurial to interact with other projects — Django and things I deal with at work — which still use Subversion (or, in one case, git). I’m pretty happy overall: Mercurial’s simple and straightforward, has good merging for when I need it and decent enough interop with other tools. And Bitbucket’s far nicer than Google Code ever was.

So as far as the DVCS wars are concerned, I’ve made my choice and it works for me. I do, however, see a lot of questions from people who don’t know me or my workflow very well (people who do know me know better than to bait me into this argument), wondering how I can live without “git-style branches”. So I’d like to take a moment to explain, first, why I don’t much care for git’s branching model and, second, how the various branching options in Mercurial work, since a lot of git users don’t seem to actually know.

My problem with git

Well, two problems actually. One is that I really don’t like “git-style” branches very much, and whenever possible I tend to prefer a separate clone of a repository over a branch. Partly this is a workflow issue: I often have multiple lines of development going simultaneously, and I like being able to pluck out the relevant files and see them side-by-side in Emacs without doing gymnastics in my version-control tool. And partly this is just a trust issue: I’ve worked with enough different tools over enough time to know that they all fail sooner or later, and using one which can fail and leave my code hidden away in places only the tool knows how to reach is a bit of a scary thought for me.

Of course, this isn’t quite as much of an issue as it might sound, since you can easily keep multiple logically separate repositories in git (and anyone you’re working with can just use remotes to deal with that). Having a bunch of branches in the same repository is just the standard workflow, not a requirement. But if the standard workflow is something I don’t like, I tend to stay away and look for another tool.

The other problem I have with git’s branching is that it overloads the word “branch” with multiple meanings in a way that isn’t particularly useful. There are two big reasons to branch that I can see:

- Quick, self-contained pieces of work: bugfixes, small features and minor refactorings. Even though you only work on them for a short period and then merge back, you want them isolated from mainline development during that time so you can properly write and test them without any unrelated bits getting in the way.
- Much larger pieces of work: major features and big rewrites/refactorings. These will be long-lived and need to stay isolated, because the best way to handle such work is to merge mainline development only occasionally, at points where you’re prepared to deal with the effort of reconciling the two divergent branches of work.

Git’s lightweight branching style is good for the first case, because you’re just looking for a simple way to isolate a few changes for a while, then merge them back. At that point you delete the temporary branch and get on with your life. The problem is that this doesn’t seem to fit the second case nearly so well: a major branch which touches a lot of code should probably be a permanent part of your repository’s history, if only so you can go back later and try to figure out what the hell you were thinking when you were working on it.

Once again I’ll point out that you certainly can have a branch that you keep around for historical purposes once you’ve finished with it and merged the work, but that’s not the mindset encouraged by git’s documentation or by most of its advocates. Branches in git are, we’re always told, ephemeral things to be used and thrown away, and so far as I know git doesn’t have a way to indicate to your colleagues that you’re done with a branch; the only way to do this is to delete it, or to hope they see the final merge commit and understand that the branch is closed to further development (and if there is a way to do this, it’d be worth adding to the git-branch man page, preferably with clear and obvious language explaining how it works).

To be perfectly fair, though, at least it’s no worse than Subversion, which also lacks a way to mark a branch as closed. Every so often this results in someone popping up on one of the Django mailing lists and wondering why they’re having such trouble working with a checkout of an ancient dead branch (we’ve lately adopted the seemingly-standard SVN convention of moving such branches into a directory named attic to try to discourage this).

Mercurial’s three branching models

So let’s talk about Mercurial, which has three major ways to do branching (though really there are more workflows you could have — these are just the ones which seem to be actually used). One of these it shares with git: you can clone an existing repository into some other location, work with it from there and push and pull between the clones as needed. This is what I tend to do most of the time, and I’ve seen others do it as well, but it’s largely a matter of personal preference.
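To make the clone-based workflow concrete, here is a toy model of it in Python. It is only an illustration under simplifying assumptions, not how Mercurial actually stores or transfers history: each clone is treated as its own set of changesets, and push and pull simply copy over whatever the other side is missing. The `Repo` and `Changeset` classes and the short node names are hypothetical.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass(frozen=True)
class Changeset:
    node: str                 # hypothetical short identifier
    parents: tuple[str, ...]  # identifiers of the parent changesets
    message: str


@dataclass
class Repo:
    # Toy model: a repository is just a mapping of node id -> changeset.
    changesets: dict[str, Changeset] = field(default_factory=dict)

    def commit(self, node: str, parents: tuple[str, ...], message: str) -> Changeset:
        cs = Changeset(node, parents, message)
        self.changesets[node] = cs
        return cs

    def clone(self) -> Repo:
        # A clone starts out with a full, independent copy of the history.
        return Repo(dict(self.changesets))

    def pull(self, other: Repo) -> list[str]:
        # Copy every changeset we don't have yet; history only ever grows.
        missing = [n for n in other.changesets if n not in self.changesets]
        for n in missing:
            self.changesets[n] = other.changesets[n]
        return missing

    def push(self, other: Repo) -> list[str]:
        # Pushing is just the other side pulling from us.
        return other.pull(self)


main = Repo()
main.commit("a1", (), "initial import")
feature = main.clone()                 # separate line of development
feature.commit("b1", ("a1",), "work on a feature in its own clone")
print(main.pull(feature))              # ['b1']
```

The appeal of this style, in the toy as in real life, is that each line of development is an ordinary directory you can open side by side with the others.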

The other two are bookmarks and branches, and the fact that both exist is one reason (among many) why I prefer Mercurial’s approach to version control.

Bookmarks are provided by the bookmarks extension, which is shipped with Mercurial itself. Bookmarks provide a workflow quite close to git’s lightweight branches: a bookmark is simply a named head in the repository, and the bookmark moves as you add commits. A bookmark name is legal in almost any command which takes a changeset identifier, so you can hg update to get to your bookmark, merge based on it, etc. And when you’re done with it and your work’s merged back into mainline, you can delete it. The only thing you can’t do with a bookmark is clone; bookmarks aren’t part of Mercurial’s wire protocol, and hg clone doesn’t know how to look them up and work with them.
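A rough way to picture a bookmark is as a movable name that points at a changeset. The following Python sketch is a simplified, hypothetical model (not Mercurial’s internals, and it leaves merging out): only the active bookmark follows new commits, a bookmark name can be used wherever a changeset id could, and deleting a bookmark removes the name but none of the commits.

```python
from __future__ import annotations


class Repo:
    """Toy repository illustrating bookmark-style movable names."""

    def __init__(self) -> None:
        self.parents: dict[int, int | None] = {}  # changeset id -> parent id
        self.bookmarks: dict[str, int | None] = {}  # bookmark name -> changeset id
        self.current: int | None = None  # parent of the working directory
        self.active: str | None = None   # bookmark that follows new commits
        self._next = 0

    def commit(self) -> int:
        node = self._next
        self._next += 1
        self.parents[node] = self.current
        self.current = node
        if self.active is not None:
            # The active bookmark moves forward along with the new commit.
            self.bookmarks[self.active] = node
        return node

    def bookmark(self, name: str) -> None:
        # Create a bookmark at the current changeset and make it active.
        self.bookmarks[name] = self.current
        self.active = name

    def update(self, target: str | int) -> None:
        # A bookmark name works anywhere a changeset id does.
        if isinstance(target, str):
            self.current = self.bookmarks[target]
            self.active = target
        else:
            self.current = target
            self.active = None

    def delete_bookmark(self, name: str) -> None:
        # Deleting a bookmark removes only the name; the changesets remain.
        del self.bookmarks[name]
        if self.active == name:
            self.active = None


repo = Repo()
root = repo.commit()          # changeset 0 on the mainline
repo.bookmark("fix-typo")
repo.commit()                 # changeset 1; "fix-typo" moves to it
repo.commit()                 # changeset 2; "fix-typo" moves again
print(repo.bookmarks)         # {'fix-typo': 2}
repo.update(root)
repo.delete_bookmark("fix-typo")
print(sorted(repo.parents))   # [0, 1, 2]
```

The point of the model is the asymmetry with branches: the bookmark is bookkeeping on top of the history, so throwing it away leaves no trace in the changesets themselves.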

Branches, meanwhile, are somewhat heavier and more like the branches of traditional centralized systems; they’re also built into Mercurial rather than being an extension. Just as with a bookmark, a branch has a name; you can create it (using hg branch <name>), switch to it (using hg update <name>), merge it, and do all the other things you’d expect. You can also choose the branch or branches you want when cloning a repository. When you’re done with a branch, however, you cannot delete it from the repository; instead, you issue a commit which closes the branch (hg commit --close-branch), and Mercurial notes that the branch is closed. It’ll remain a permanent part of your repository history.
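By contrast, a named branch lives inside the history itself. The Python sketch below is a simplified, hypothetical model of that idea: every changeset permanently records the branch it was committed on, so the branch can’t be deleted, only closed by a further commit. The method names loosely mirror hg branch, hg update and hg commit --close-branch, but nothing here is Mercurial’s real implementation; in particular, it treats a branch as closed when its newest changeset is a closing commit, which is a simplification of how Mercurial handles branches with several heads.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass(frozen=True)
class Changeset:
    node: int
    parent: int | None
    branch: str                 # every changeset permanently records its branch
    closes_branch: bool = False


class Repo:
    def __init__(self) -> None:
        self.history: list[Changeset] = []
        self.current: int | None = None
        self.next_branch = "default"   # branch name the next commit will carry

    def branch(self, name: str) -> None:
        # Loosely like `hg branch NAME`: only sets the name for the next commit.
        self.next_branch = name

    def update(self, name: str) -> None:
        # Switch to the newest changeset on the named branch.
        for cs in reversed(self.history):
            if cs.branch == name:
                self.current = cs.node
                self.next_branch = name
                return
        raise KeyError(name)

    def commit(self, close_branch: bool = False) -> Changeset:
        cs = Changeset(len(self.history), self.current, self.next_branch, close_branch)
        self.history.append(cs)
        self.current = cs.node
        return cs

    def branches(self) -> dict[str, str]:
        # Every branch name ever used; the newest changeset on each decides
        # whether it is reported as open or closed.
        status: dict[str, str] = {}
        for cs in self.history:
            status[cs.branch] = "closed" if cs.closes_branch else "open"
        return status


repo = Repo()
repo.commit()                    # 0 on "default"
repo.branch("new-admin")
repo.commit()                    # 1 on "new-admin"
repo.commit(close_branch=True)   # 2 closes "new-admin"
repo.update("default")
repo.commit()                    # 3 on "default"
print(repo.branches())           # {'default': 'open', 'new-admin': 'closed'}
```

Closing the branch adds to the history rather than removing anything, which is exactly what makes it useful for going back later and working out what you were thinking.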

By now you can probably see where I’m going with this; Mercurial’s bookmarks and branches map pretty much perfectly onto the two big use cases I outlined above:

- Bookmarks provide the lightweight temporary isolation you’d want for small bugfixes and other minor work.
- Branches provide the larger (collaborative and permanent) structure ideal for major lines of development.

There are, again, other ways you can work with Mercurial, but they tend to be quite a bit more specialized. Patch queues, for example, can be used to emulate branching but shouldn’t; they’re a godsend for cases when you really need them (and occasionally you will really need them), but it’s important to resist the temptation to use them for everything.

The take-away

So often in discussion of distributed version-control tools, the majority of comments are people going on and on (and on, and on, and on…) about “git-style branching”. But while git’s encouraged branching model is good for certain use cases, it has some shortcomings for other important and common cases. Mercurial’s model of distinguishing lightweight, temporary lines of development (bookmarks) from larger extended branches is, to me at least, a more useful approach to branching, and one that git and git users might want to take a more thorough look at.

Meanwhile I think I’m still somewhat on the fence regarding distributed version control, or at least the particular implementations of it we have right now, but most of my reasons are related to my secret superhero identity as Django’s release manager and have to do with the difficulties of tracking and coordinating development across larger numbers of developers, repositories and workflows. One of these days I’ll probably write them up properly, but for now the PyCon talk I linked above has some useful pointers.

Also, comments are closed here and likely will remain so as part of a little experiment; if there’s something you’d really like to say about this article, I’m sure it’ll end up on some aggregation site where I’ll see it.