Almost every discussion about distributed version control systems (DVCS) on the Internet includes at least one post along the lines of “DVCS can’t merge binary files and my project has binary files.” There is a good reason why you might not want to use DVCS for binaries, but contrary to popular belief, DVCS not being able to merge them isn’t it. My purpose here is to try to convince you why, with some exercises you can try yourself at home.

Most of the confusion arises from not understanding the difference between merging binary files and merging branches containing changed binary files. The former no version control software can do, and the latter DVCS can do just as well as any other VCS, if not better.

I can already feel the heat from people preparing their replies. This is one of those things that people will argue about forever even when they’re wrong, because it seems so obvious that they don’t need to bother trying it for themselves. In anticipation of this, I’ve prepared a short demonstration.

Here’s the scenario: Bob is working on updating the documentation on his company’s website for their upcoming software release. Bob branches from the staging branch so he can work on updating the screenshots to move the minimize/maximize/close buttons to the left. It takes Bob longer than the original estimate, because he didn’t realize you could just take new screenshots instead of editing the old ones in Gimp, but he eventually merges his branch back into the staging branch.

At this point we have one branch that says buttons go on the left and one that says buttons go on the right. Obviously a merge conflict, right? Wrong. The merge algorithm knows the screenshots haven’t changed in the staging branch since Bob branched off and does the right thing. Don’t believe me? Try it for yourself. I’ll wait.

Some people get it in their head that this will work for text files, but not binary files, because binary files aren’t “mergeable.” Note that in this scenario, the merge algorithm doesn’t care if the files are mergeable or not.

That’s not to say there aren’t scenarios where mergeability matters, but with binary files you hope you never get into that situation, because no version control can get you out of it. If Alice is changing the same screenshots at the same time as Bob, there’s no way to merge them automatically.

To help out people who don’t like scary solutions like communicating with your coworkers, most centralized version control software lets you place a lock on a file. Because none of the major distributed software has this lock feature yet, people claim it’s because it’s fundamentally impossible with DVCS.

While it’s true locking requires communication with a central lock authority, there’s no need for that to be in the same place as everything else, nor is there a need to be in constant contact with that central authority. If people spent as much time implementing the feature as they do whining about the lack of it, every DVCS implementation would have had locking years ago.

As I mentioned, there is one good reason why you might not want to adopt DVCS for your binary files. Binary diffs tend to be larger than text diffs, and with DVCS all the history gets stored on every developer’s computer. However, you shouldn’t assume that every change will increase your repository size by 100% of the binary file size. In my test for the Bob scenario, it only increased about 36% for Bazaar. You also shouldn’t assume that all that history is being copied every time you make a new branch. All the major software lets you share the diffs between local branches, and although the initial checkout may take a while, after that only the changes are communicated.

In conclusion, if you have been avoiding evaluating DVCS because of the binary myth, you might want to give it a second look and actually try it out on your own files. You may still find CVCS to be a better fit, but at least that decision will be based on evidence. On the other hand, I think you have a good chance of being pleasantly surprised.