Semantically diffing Java code

Wednesday, October 16, 2013
Pablo Santos

Knuth stated that source code is meant to be read by humans, not machines. Code is going to be read by our peers more often than it is going to be modified and that’s why keeping it clean and well organized is key.

But cleaning up code very often means restructuring the sources in such a way that reviewing the changes becomes a nightmare with conventional diff tools (not to mention merging). At the end of the day, cleanup gets postponed or is only performed during “refactor-specific” tasks, which tend to be scheduled much less frequently, and that increases technical debt.

Uncle Bob to the rescue

Uncle Bob’s famous Clean Code is all about cleaning up and organizing the sources to make them easy to understand.

One of my favorite chapters covers code formatting and it states:

Code formatting is about communication, and communication is the professional developer’s first order of business.

And:

The functionality that you create today has a good chance of changing in the next release, but the readability of your code will have a profound effect on all the changes that will ever be made.

There are a couple of rules of thumb that I consider worth mentioning:

Vertical Distance. Concepts that are closely related should be kept vertically close to each other.

Dependent Functions. If one function calls another, they should be vertically close, and the caller should be above the callee, if at all possible.
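As a small illustration of the “dependent functions” rule, here is a minimal sketch in Java. The class `LineReport` and all of its method names are made up for this example and are not taken from Clean Code; the only point is the ordering, where each caller sits directly above the methods it calls.

```java
import java.util.List;

// Hypothetical example class: names are invented to illustrate method ordering.
final class LineReport {

    // The top-level story comes first...
    static String summarize(List<String> lines) {
        int widest = widestLine(lines);
        return format(lines.size(), widest);
    }

    // ...followed by the first method it calls...
    private static int widestLine(List<String> lines) {
        int max = 0;
        for (String line : lines) {
            max = Math.max(max, width(line));
        }
        return max;
    }

    // ...then that method's own helper...
    private static int width(String line) {
        return line.length();
    }

    // ...and finally the second method called from summarize().
    private static String format(int count, int widest) {
        return count + " lines, widest is " + widest + " characters";
    }
}
```

Reading from top to bottom, you never have to jump upwards to find out what a called method does.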

CodeAnalyzer.java as a code rearrangement example

Chapter 5 of Clean Code shows the well-written CodeAnalyzer.java code with an easy-to-follow “newspaper-like” vertical organization: every method is followed by the methods it calls, which makes the class easier to understand.

But let’s start with a slightly modified CodeAnalyzer.java where methods have been sorted by visibility instead of following the rules introduced above. The code structure will be like this:

As you can see, public methods go first, followed by the private ones. While this can be an acceptable way of organizing the code, it certainly doesn’t help you “follow” the story.

Fixing a bug and repairing a window

Remember the “broken window” from The Pragmatic Programmer? We should always prevent the code base from deteriorating by improving it whenever we get the chance.

So, suppose we have to fix a bug within CodeAnalyzer.java. We spend some time understanding the code and then we see it would be much better to organize the class in a different way, with methods vertically arranged following the “dependent functions” recommendation from above.

Suppose the final organization we follow (as described in Clean Code) is as follows:

We still haven’t modified a single line of code to fix the bug, but the file is already quite different.

Enter the diff hell

Suppose you check in your code rearrangement right now (remember, you haven’t even fixed the bug yet).

Suppose that someone else from your team now reviews your change.

This is what a regular side-by-side diff tool will show:

It reports a few changes, with the text-based tool trying to match unrelated blocks (like trying to match “findJavaFiles()” with “measureLine()” as shown above).

Obviously, the tools are not helping us embrace best practices here...

So you don’t repair the broken windows

Confronted with the previous situation, most teams will:

Forbid “code rearrangement” during bug fixing to avoid “code review hell”.

As an outcome, the code will deteriorate, since many opportunities to make it better will be lost.

There must be a better way

I often talk about how important “code merge” is, but sometimes simple daily operations like diffing changes can greatly benefit from slightly more clever tools. You’re going to be diffing much more often than merging at the end of the day and, as we’re going to see now, good diffing can help you enforce good practices like cleaning up code whenever possible.

Suppose your tool, instead of comparing text blocks, does something slightly more intelligent: what if it parses the code first and then compares based on the code structure instead of trying to do line-by-line text matching?

Well, the tool would be able to show you something like the following:

At first glance, you’ll see that methods were just moved (watch the “m” icons) and that basically nothing has been modified yet from a functionality point of view. Quite easy to understand.
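To make the idea of comparing by structure more concrete, here is a minimal sketch in Java. It is not how SemanticMerge works internally; it only illustrates the principle. It assumes that a parser has already turned each version of the class into a map from method name to method body, kept in declaration order, and that method names are unique (no overloads). The class `StructuralDiffSketch`, the `Kind` enum and the sample method bodies are all invented for the example.

```java
import java.util.ArrayList;
import java.util.HashSet;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.Set;

// Hypothetical sketch of a structure-based diff; not SemanticMerge's algorithm.
final class StructuralDiffSketch {

    enum Kind { UNCHANGED, MOVED, CHANGED, MOVED_AND_CHANGED, ADDED, REMOVED }

    // Each map goes from method name to method body, in declaration order.
    static Map<String, Kind> diff(LinkedHashMap<String, String> before,
                                  LinkedHashMap<String, String> after) {
        // Methods present in both versions, in their old and in their new order.
        List<String> oldOrder = new ArrayList<>(before.keySet());
        oldOrder.retainAll(after.keySet());
        List<String> newOrder = new ArrayList<>(after.keySet());
        newOrder.retainAll(before.keySet());

        // Methods on a longest common subsequence kept their relative order;
        // every other shared method is reported as moved.
        Set<String> inPlace = commonSubsequence(oldOrder, newOrder);

        Map<String, Kind> result = new LinkedHashMap<>();
        for (Map.Entry<String, String> method : after.entrySet()) {
            String name = method.getKey();
            if (!before.containsKey(name)) {
                result.put(name, Kind.ADDED);
                continue;
            }
            boolean moved = !inPlace.contains(name);
            boolean changed = !before.get(name).equals(method.getValue());
            if (moved && changed) result.put(name, Kind.MOVED_AND_CHANGED);
            else if (moved) result.put(name, Kind.MOVED);
            else if (changed) result.put(name, Kind.CHANGED);
            else result.put(name, Kind.UNCHANGED);
        }
        for (String name : before.keySet()) {
            if (!after.containsKey(name)) result.put(name, Kind.REMOVED);
        }
        return result;
    }

    // Classic dynamic-programming LCS, returning the names that are on it.
    static Set<String> commonSubsequence(List<String> a, List<String> b) {
        int[][] len = new int[a.size() + 1][b.size() + 1];
        for (int i = a.size() - 1; i >= 0; i--) {
            for (int j = b.size() - 1; j >= 0; j--) {
                len[i][j] = a.get(i).equals(b.get(j))
                        ? len[i + 1][j + 1] + 1
                        : Math.max(len[i + 1][j], len[i][j + 1]);
            }
        }
        Set<String> common = new HashSet<>();
        int i = 0;
        int j = 0;
        while (i < a.size() && j < b.size()) {
            if (a.get(i).equals(b.get(j))) {
                common.add(a.get(i));
                i++;
                j++;
            } else if (len[i + 1][j] >= len[i][j + 1]) {
                i++;
            } else {
                j++;
            }
        }
        return common;
    }

    public static void main(String[] args) {
        // Made-up bodies; only the names findJavaFiles and measureLine appear in the post.
        LinkedHashMap<String, String> before = new LinkedHashMap<>();
        before.put("findJavaFiles", "walk the directory");
        before.put("measureLine", "record the line width");
        before.put("analyzeFile", "read and measure each line");

        LinkedHashMap<String, String> after = new LinkedHashMap<>();
        after.put("analyzeFile", "read and measure each line");
        after.put("findJavaFiles", "walk the directory");
        after.put("measureLine", "record the line width, skipping blanks");

        System.out.println(diff(before, after));
    }
}
```

In this sample, `analyzeFile` is reported as moved, `findJavaFiles` as unchanged and `measureLine` as changed, which is the kind of summary a line-based diff cannot give you. A real tool would also need an actual parser, support for overloads and renames, and nested structures such as inner classes.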

Better diffing means more focus on the problem to solve

Now suppose one of the moved methods was also modified. It would take a while to find that out with a traditional text-based diff tool, but check what SemanticDiff can do:

Again, it is very easy to find where the “real” change is, and also to select the method and diff just its contents in a “subdiff” that lets you focus on the right part of the code:

In my case it was like this:

Conclusion

Our focus with SemanticMerge is not only merging but also improving the toolset we have available as developers on a daily basis. Rendering changes in an understandable way can save precious time and, as we’ve seen, help you adopt well-known best practices.