The state of the art in merge technology

Wednesday, June 26, 2013, by Pablo Santos

Software development is a team sport. Each team member develops their part, and all the parts are frequently glued together. Unless you code alone in a dark cave, you'll be running some sort of merge operation to put all the different parts together.

My goal with this post is to showcase what can be achieved today with modern merge technology.

The reason is that, as developers, we are normally constrained by the tools we use on a daily basis, and sometimes it is good to raise our expectations by looking at what can be achieved, and then deciding whether we are interested or not.

My team has been working on source code merge for years and we'd like to share what we have learned. But instead of a deep catalog covering all possible cases, I'm going to describe one scenario, a fairly complex one, and then let you extrapolate what modern merge tech can achieve and how it can help your team.

The Ages of Merge (inspired by the "Age of Empires" game)

Before starting with the story, I think it is worth highlighting how source code merge technology has evolved during the last decade or two.

The following graphic tries to depict the different ages. The funny thing about it is that technologies from different ages (and the teams using them) coexist... so you'd better watch where your team is :P

Dark Age – or living in the cave. Believe it or not, there are teams out there still not using any version control. Merging is sort of a pain when that happens :-P Zip-file-based version control and "you've overwritten my changes" are two great examples.

Ancient Age – systems that rely on locking or some other arcane practice. Some teams are true believers and really think merging is some sort of crazy thing they had better avoid… Visual SourceSafe, CVS and some other relics are the perfect examples.

Feudal Age – systems can merge but they are not good at it… so merging is perceived as something evil. Subversion and others are still here.

Imperial Age – systems excel at merging, finding the common ancestors and making developers' lives easier. Git, Mercurial and Plastic SCM are clearly here.

The next frontier – this is precisely what I'll be talking about: OK, good merging is becoming mainstream, but it is not truly new anymore. What can we expect next?

A merge nightmare

I’ll be using the following scenario to explain the kind of things that cutting edge merge systems can do.

There are two branches, and the developers working on them will make conflicting changes.

The developer in branch1 will modify and move a file while the developer in branch2 will also move it (to a different location) and modify it.

Most version control systems out there fail dramatically at this (check the merging table here), but it would be a great help for coders to have something able to deal with it.

But the scenario is going to be a little bit more difficult since the developer at branch1:

He will not only move the SocketHelper.cs file but will also sort all the methods based on visibility (public first, then protected, internal and finally private).

The developer at branch1 will also modify the method “SetSocketOption()”.

Meanwhile the coder at branch2:

He will move the file too.

He will modify the same method “SetSocketOption()” but in the original location.
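To make the branch1 reordering concrete, here is a minimal Python sketch of the "sort by visibility" rule. The class is modeled as a hypothetical list of (visibility, name) pairs; apart from SetSocketOption, the method names are invented for illustration.

```python
# Rank used for ordering: public first, then protected, internal, private.
VISIBILITY_ORDER = {"public": 0, "protected": 1, "internal": 2, "private": 3}


def sort_by_visibility(methods):
    """Return the methods ordered public, protected, internal, private.

    sorted() is stable, so methods with the same visibility keep
    their original relative order.
    """
    return sorted(methods, key=lambda m: VISIBILITY_ORDER[m[0]])


# Hypothetical original layout of the class (only SetSocketOption is real).
original = [
    ("private", "CreateSocket"),
    ("public", "Connect"),
    ("internal", "ResetBuffers"),
    ("public", "SetSocketOption"),
    ("protected", "OnError"),
]

for visibility, name in sort_by_visibility(original):
    print(visibility, name)
```

No method body changes here, yet a line-based diff of the resulting file would report large parts of it as deleted and re-added, which is exactly what makes the later merge hard for text-based tools.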

The following picture shows how the class was rearranged by the developer at branch1:

Then inside the method “SetSocketOption()” both developers will make changes:

It really looks like a nightmare of a merge... and it would be really easy to spend a long time getting it merged and even lose changes in the process.

Resolution – combining existing technologies

Current merge technology is able to deal with the case above and turn it into an almost trivial one.

The steps are as follows:

Basically, the merge system will need to deal with the "divergent move" scenario first, and then solve the file merge (in the end, it is the same file that has been modified on both branches, so the system must not be fooled by the fact that the file ends up with different names and different paths).

Solving the file merge can also be an issue since the methods have been rearranged. So the tool will need to deal with the code at the structure level (which means “understanding the specific source code constructs”).

And finally even if this is possible, we still have a “nice issue” inside the “SetSocketOption()” method.

Step 1 – divergent move

However obvious it might seem, most version control systems are unable to deal correctly with a divergent move. Check the table here.

Git, for instance, despite being one of the best merge systems out there, will keep both files on disk, and you'll have to delete one of them manually (with git rm). You're in trouble if the file was also modified in parallel, because Git doesn't guide you through the resolution.
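Here is a minimal Python sketch of how a merge system could detect the divergent move, assuming each file carries a stable item ID that survives moves and renames (the IDs, paths and contents below are hypothetical). Git, by contrast, does not store such an ID and infers renames from content similarity.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Entry:
    path: str
    content: str


def find_divergent_moves(base, branch1, branch2):
    """Return (item_id, base_path, path1, path2, modified_on_both) tuples.

    Each tree is a dict mapping a (hypothetical) stable item ID to an Entry.
    A divergent move is an item moved on both branches to different paths.
    """
    conflicts = []
    for item_id, original in base.items():
        e1 = branch1.get(item_id)
        e2 = branch2.get(item_id)
        if e1 is None or e2 is None:
            continue  # deleted on one side: a different kind of conflict
        moved1 = e1.path != original.path
        moved2 = e2.path != original.path
        if moved1 and moved2 and e1.path != e2.path:
            modified_on_both = (e1.content != original.content
                                and e2.content != original.content)
            conflicts.append((item_id, original.path, e1.path, e2.path,
                              modified_on_both))
    return conflicts


base = {42: Entry("src/SocketHelper.cs", "v0")}
branch1 = {42: Entry("src/net/SocketHelper.cs", "v1")}
branch2 = {42: Entry("lib/SocketUtils.cs", "v2")}

print(find_divergent_moves(base, branch1, branch2))
```

Once the conflict is detected, the system can ask the user which path to keep and then merge the contents of item 42 as a single file, instead of leaving two copies on disk.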

The following table shows how the different version control systems handle the “divergent move” merge scenario:

I defined the simple scenario as one involving only files, and the complex one as also involving the divergent move of directories (check this for more information).

My goal is to describe in more detail how each SCM behaves in upcoming blog posts.

Step 2 – merge the file

As I described before, this is where some version control systems fail: they don't help you run the merge tool with the two contributors and the common ancestor.
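The ancestor matters because of the classic three-way merge rule: a change made on only one side can be taken automatically, and only changes made differently on both sides are conflicts. Without the ancestor, a tool cannot tell which side changed a region. A minimal Python sketch of that rule, applied to a single region of text:

```python
def merge3(base, ours, theirs):
    """Three-way merge of one region. Returns (result, is_conflict)."""
    if ours == theirs:
        return ours, False      # both sides agree (or neither changed it)
    if ours == base:
        return theirs, False    # only "theirs" changed it
    if theirs == base:
        return ours, False      # only "ours" changed it
    return None, True           # both changed it differently: conflict


print(merge3("a", "a", "b"))
print(merge3("a", "b", "c"))
```

A real merge tool first aligns the three files (usually by diffing each contributor against the base) and applies this rule to each aligned region; the alignment step is omitted here.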

2013/06/27 - Edit: This is what Plastic SCM does when dealing with the divergent move conflict:

And then you can solve the divergent move this way:

Step 3 – running a "semantic" merge

If you remember, the scenario was rather complex to solve with a traditional, text-based merge tool: it tries to match each block of text, and once the methods have been rearranged that becomes a nightmare.

But considering that all the methods have just been moved and only one of them was modified in parallel, the operation should be trivial.

This is what a semantic merge tool can do. And this is what my team has been developing: www.semanticmerge.com. It is fully usable from any version control system.

As you can see in the picture below, even after all the changes, SemanticMerge only asks you to solve one conflict. Everything else is automatic.
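To illustrate why the reordering stops being a problem, here is a minimal Python sketch of the idea behind a structure-level merge, assuming each version of the class has already been parsed into a dictionary from method name to body. This is not SemanticMerge's implementation or API: real tools parse the C# source and match declarations more carefully (overloads, renames), while this sketch matches by name only, and the bodies are placeholders.

```python
def merge3(base, ours, theirs):
    """Three-way merge of one region. Returns (result, is_conflict)."""
    if ours == theirs:
        return ours, False
    if ours == base:
        return theirs, False
    if theirs == base:
        return ours, False
    return None, True


def semantic_merge(base, ours, theirs):
    """Merge classes modeled as dicts: method name -> body text.

    Methods are matched by name, not by position, so reordering them on
    one branch creates no conflicts. A missing name means "deleted".
    The result follows the method order of `ours`, then methods that
    only exist in `theirs`. Conflicting methods are reported, not merged.
    """
    order = list(ours) + [name for name in theirs if name not in ours]
    merged, conflicts = {}, []
    for name in order:
        body, conflict = merge3(base.get(name), ours.get(name), theirs.get(name))
        if conflict:
            conflicts.append(name)
        elif body is not None:  # None means the method ended up deleted
            merged[name] = body
    return merged, conflicts


base = {"CreateSocket": "k0", "Connect": "c0", "SetSocketOption": "s0"}
# branch1: methods reordered by visibility, SetSocketOption modified
branch1 = {"Connect": "c0", "SetSocketOption": "s1", "CreateSocket": "k0"}
# branch2: original order kept, SetSocketOption modified in place
branch2 = {"CreateSocket": "k0", "Connect": "c0", "SetSocketOption": "s2"}

merged, conflicts = semantic_merge(base, branch1, branch2)
print(merged)
print(conflicts)
```

Despite the reordering, only SetSocketOption is reported, which mirrors the single conflict described above.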

Step 4 – solving the "SetSocketOption()" conflict

To make things even worse, I also made the merge of the conflicting method rather difficult.

As you can see, part of the code has been moved to a different location and modified, while the other developer changed the same code block in its original location...

Before developing SemanticMerge we developed Xmerge, which is capable of dealing with scenarios like this one, as you can see below:

The tool only deals with the text of the method in conflict (so it stays focused on where the problem is), and you can see there is an "Xmerge" button. This button only shows up when the tool detects an Xmerge scenario: basically, it has found that some code was modified in one contributor but has disappeared from the other, so it tries to find the matching code in some "added block". In this case it is able to do so, and when you press Xmerge the conflict becomes the following:
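The detection step can be approximated with Python's standard difflib module: take the fragment that was modified in one contributor but disappeared from the other, and look for the most similar block among the blocks added in that other contributor. This is a hedged sketch of the general idea, not Xmerge's algorithm; the similarity measure, the 0.6 threshold and the C#-like snippets are all assumptions.

```python
from difflib import SequenceMatcher


def find_moved_fragment(base_fragment, added_blocks, threshold=0.6):
    """Return the index of the added block most similar to base_fragment,
    or None if no block reaches `threshold` (an illustrative value).
    """
    best_index, best_ratio = None, 0.0
    for i, block in enumerate(added_blocks):
        ratio = SequenceMatcher(None, base_fragment, block).ratio()
        if ratio > best_ratio:
            best_index, best_ratio = i, ratio
    return best_index if best_ratio >= threshold else None


# Hypothetical fragment that "disappeared" from its original position...
base_fragment = "if (timeout > 0)\n    socket.ReceiveTimeout = timeout;"
# ...and the blocks added elsewhere in the method by the other contributor.
added_blocks = [
    'Log("configuring");',
    "if (timeout > 0)\n    socket.ReceiveTimeout = timeout * 1000;",
]

print(find_moved_fragment(base_fragment, added_blocks))
```

Matching by similarity rather than exact equality is what lets the moved fragment be found even though it was also edited after the move.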

It runs a "sub-merge" focusing only on the conflicting piece of code, and then you can see how the developers at branch1 and branch2 made conflicting changes on the same line.
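Once the fragment has been paired, the "sub-merge" is an ordinary three-way merge restricted to it: the base version of the fragment, branch1's moved copy and branch2's in-place copy. A minimal Python sketch, assuming the three fragments have the same number of lines (a real tool would align them with a diff first); the snippets are hypothetical, not the code from the screenshots.

```python
def merge3(base, ours, theirs):
    """Three-way merge of one region. Returns (result, is_conflict)."""
    if ours == theirs:
        return ours, False
    if ours == base:
        return theirs, False
    if theirs == base:
        return ours, False
    return None, True


def sub_merge(base_lines, lines1, lines2):
    """Line-by-line three-way merge of a paired fragment.

    Simplification: line i in one version corresponds to line i in the
    others. Returns (merged_lines, conflicting_line_indexes); conflicting
    lines are left as None for the user to resolve.
    """
    if not (len(base_lines) == len(lines1) == len(lines2)):
        raise ValueError("this sketch only handles fragments of equal length")
    merged, conflicts = [], []
    for i, (b, o, t) in enumerate(zip(base_lines, lines1, lines2)):
        line, conflict = merge3(b, o, t)
        if conflict:
            conflicts.append(i)
        merged.append(line)
    return merged, conflicts


base_lines = ["if (timeout > 0)", "    socket.ReceiveTimeout = timeout;"]
branch1 = ["if (timeout > 0)", "    socket.ReceiveTimeout = timeout * 1000;"]
branch2 = ["if (timeout >= 0)", "    socket.ReceiveTimeout = timeout * 2;"]

merged, conflicts = sub_merge(base_lines, branch1, branch2)
print(merged)
print(conflicts)
```

The first line is resolved automatically from branch2, and only the line changed on both branches is left for the user.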

The great things about this entire “divide and conquer” approach are:

Changes were made to files with different names in different directories. The merge system was still able to figure out that they were the same file and merge the two contributors.

The conflicting method was located at different positions in the two files. SemanticMerge was able to figure it out and only proposed a conflict on the method modified in parallel (even though it was in two different locations).

Inside the method, the code was moved, but Xmerge was able to find the moved fragment and merge it correctly.

Conclusion

This is what modern merge technology can achieve TODAY: take an almost impossible scenario, split it into parts and solve each one seamlessly, something that is simply not possible with many toolsets.

By the way, you won't face contrived scenarios like the one I described on a daily basis, but it shows "what can be done", and now it is up to you to decide how it fits your team!

Edit: Remember you can download SemanticMerge from www.semanticmerge.com and give it a try with your own version control system, or you can also give Plastic SCM a try.