Diff. Patch. Change. Delta. Δ.

Whatever you call it, diffs are pretty much the aggregate output of any developer’s day. A bunch of deletions and additions from a set of files that can be codified in a .patch file as a bunch of pluses, minuses and context indicators.

--- a/todo.md
+++ b/todo.md
@@ -1,5 +1,4 @@
 # TODO
 buy anniversary present
 build arduino-based vaucanson’s duck
-blog on git diffs
 fork webkit
--

I’m still undecided as to whether it’s depressing or amazing that our life’s work can be so simply encoded.

Regardless, git has given us a bunch of neat ways of looking at these diffs. The following post is a quick primer on the various incantations of git diff. Note that I’m going to be focusing on the different output formats for viewing diffs. There’s also a whole range of ways you can specify different commits and paths to diff, but that’s a topic for another day.

git diff

You’re probably already familiar with the built-in git diff command. By default, git diff shows you changes to tracked files in your work tree that haven’t yet been added to the index. For example:

$ cat 2cities.txt
It was the best of times,
it was the blurst of times.
$ sed -i '.tmp' 's/blurst/worst/' 2cities.txt
$ git diff
 It was the best of times,
-it was the blurst of times.
+it was the worst of times.

This works fine, but it’s unfortunate that it shows up as one removed line followed by one added line. All I did was change one word!
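As an aside on that default behaviour, here is a minimal sketch of what "not yet added to the index" means in practice. It builds a throwaway repository in a temporary directory (the location and the demo identity are my own placeholders, not part of the original example) and compares plain git diff with git diff --cached before and after staging.

```shell
#!/bin/sh
# Sketch: `git diff` compares the work tree with the index;
# `git diff --cached` compares the index with HEAD.
set -e

repo=$(mktemp -d)                          # throwaway repository
cd "$repo"
git init -q
git config user.email "demo@example.com"   # placeholder identity for the demo commit
git config user.name "Demo"

printf 'It was the best of times,\nit was the blurst of times.\n' > 2cities.txt
git add 2cities.txt
git commit -q -m "first draft"

# Edit the tracked file without staging it.
printf 'It was the best of times,\nit was the worst of times.\n' > 2cities.txt
unstaged=$(git diff --no-color)            # work tree vs index: shows the edit

# Stage the edit, so the work tree and the index agree again.
git add 2cities.txt
after_add=$(git diff --no-color)           # now empty
staged=$(git diff --no-color --cached)     # index vs HEAD: shows the same edit

printf 'before staging:\n%s\n\n' "$unstaged"
printf 'after staging, plain diff is empty: [%s]\n\n' "$after_add"
printf 'after staging, --cached:\n%s\n' "$staged"
```

Once the change is staged, plain git diff goes quiet and the same hunk turns up under --cached instead.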

git diff --color-words

git diff also has a special mode for highlighting changes with much better granularity: --color-words. This mode tokenizes added and removed lines by whitespace and then diffs those.

$ git diff --color-words
It was the best of times,
it was the blurst worst of times.

That’s an improvement! Now we’re down to the individual words that have changed. But if you look closely, we’ve really only changed three characters: “blu” has become “wo”. We can do even better.
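If you are reading the output somewhere that colour is not available, git's --word-diff option does the same word-level tokenization but marks changes with [-removed-] and {+added+} instead. The sketch below also tries --word-diff-regex=. which, per the git diff documentation, treats every character as a word. The temporary repository and demo identity are assumptions made so the example is self-contained.

```shell
#!/bin/sh
# Sketch: word-level and character-level diffs with textual markers.
set -e

repo=$(mktemp -d)                          # throwaway repository
cd "$repo"
git init -q
git config user.email "demo@example.com"   # placeholder identity
git config user.name "Demo"

printf 'It was the best of times,\nit was the blurst of times.\n' > 2cities.txt
git add 2cities.txt
git commit -q -m "first draft"
printf 'It was the best of times,\nit was the worst of times.\n' > 2cities.txt

# Whitespace-delimited words, as with --color-words, but without colour.
words=$(git diff --no-color --word-diff=plain)

# Every character counts as its own "word".
chars=$(git diff --no-color --word-diff=plain --word-diff-regex=.)

# The changed line is the last line of each diff.
printf '%s\n' "$words" | tail -n 1
printf '%s\n' "$chars" | tail -n 1
```

The first line printed should mark the swap as [-blurst-]{+worst+}, and the second should narrow it down to the characters that actually differ.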

git diff-highlight

If you clone the git source you’ll find a sub-directory called contrib. It contains a bunch of git-related tools and other interesting bits and pieces that haven’t yet been promoted to git core. One of these is a Perl script called diff-highlight. diff-highlight pairs up matching lines of diff output and highlights the sub-word fragments that have changed.

$ git diff | ~/src/git/contrib/diff-highlight/diff-highlight
 It was the best of times,
-it was the blurst of times.
+it was the worst of times.

Sweet! Now we’ve pared down our diff to the smallest possible change.

If you like this fine-grained way of looking at diffs, you can add a shell function alias to ~/.gitconfig to handle the piping to diff-highlight for you:



[alias]
    # assumes git.git clone at ~/src/git
    diff-highlight = "!f() { git diff \"$@\" | ~/src/git/contrib/diff-highlight/diff-highlight; }; f"



Then, you can get a nicely highlighted diff by simply invoking git diff-highlight.
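If the "!f() { ...; }; f" idiom looks opaque, here is a rough sketch of how it behaves. So that it runs without a git.git checkout, it uses tr as a stand-in filter (it simply upper-cases the diff) where the real alias would pipe to diff-highlight. The alias name diff-shout, the temporary repository and the demo identity are all hypothetical.

```shell
#!/bin/sh
# Sketch: a shell-function alias that forwards its arguments to `git diff`
# and pipes the result through a filter.
set -e

repo=$(mktemp -d)                          # throwaway repository
cd "$repo"
git init -q
git config user.email "demo@example.com"   # placeholder identity
git config user.name "Demo"

printf 'It was the best of times,\nit was the blurst of times.\n' > 2cities.txt
git add 2cities.txt
git commit -q -m "first draft"
printf 'It was the best of times,\nit was the worst of times.\n' > 2cities.txt

# The leading "!" tells git to run the alias through the shell.
# "$@" hands every argument given to the alias on to `git diff`.
# `tr a-z A-Z` is only a stand-in for the diff-highlight script.
git config alias.diff-shout '!f() { git diff "$@" | tr a-z A-Z; }; f'

# Arguments after the alias name reach `git diff` unchanged.
out=$(git diff-shout --no-color -- 2cities.txt)
printf '%s\n' "$out"
```

Wrapping the pipeline in a function is what lets options and paths land on git diff rather than on the filter at the end of the pipe.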

Diffing binary files

If you make changes to a binary file (such as an image, pdf or archive) the default output from git diff is pretty uninteresting.

$ git diff
Binary files a/script.pdf and b/script.pdf differ

However, git does have a nifty feature that allows you to specify a shell command to transform the content of your binary files into text prior to performing the diff. It does require a little set up though.

First you need to specify a textconv filter describing how to convert a certain type of binary to text. I’m using a simple utility called pdftohtml (available via Homebrew) to convert my PDFs into human-readable HTML. You can set this up for a single repository by editing your .git/config file, or globally by editing ~/.gitconfig.

.git/config

[diff "pdfconv"]
    textconv=pdftohtml -stdout



Then all you need to do is associate one or more file patterns with our pdfconv filter. You can do this by creating a .gitattributes file in the root of your repository.

.gitattributes

*.pdf diff=pdfconv
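If you would rather not hand-edit the files, the same two settings can be written from the command line, and git check-attr will tell you whether a path picks up the driver. This is a sketch in a throwaway repository (the temporary directory is my assumption). It only records and checks the configuration, so pdftohtml does not need to be installed to run it.

```shell
#!/bin/sh
# Sketch: set up the pdfconv driver from the command line and verify it.
set -e

repo=$(mktemp -d)                          # throwaway repository
cd "$repo"
git init -q

# Equivalent to the [diff "pdfconv"] section in .git/config.
git config diff.pdfconv.textconv "pdftohtml -stdout"

# Equivalent to the .gitattributes entry.
printf '*.pdf diff=pdfconv\n' > .gitattributes

# Ask git which diff driver applies to a given path
# (the file does not need to exist for the lookup).
attr=$(git check-attr diff -- script.pdf)
conv=$(git config --get diff.pdfconv.textconv)

echo "$attr"
echo "textconv command: $conv"
```

If check-attr reports "unspecified" instead of pdfconv, the pattern in .gitattributes is not matching your file.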



Then, the next time we attempt to do a diff of a binary file, we get much more meaningful output:

$ git diff
 <HTML>
 <BODY>
 Interior. Room filled with monkeys bashing on typewriters. Mr. Burns tears a sheet of paper from a typewriter.<br>
-Burns: It was the best of times.. it was the blurst of times?<br>
+Burns: It was the best of times.. it was the blurst of times? You stupid monkey! <br>
 Burns balls up the sheet and throws it at the monkey.<br>
 </BODY>
 </HTML>

And remember, this is just regular diff output, so we can redo our diff-highlight trick to get something a bit more palatable:

$ git diff-highlight
 Interior. Room filled with monkeys bashing on typewriters. Mr. Burns tears a sheet of paper from a typewriter.<br>
-Burns: It was the best of times.. it was the blurst of times?<br>
+Burns: It was the best of times.. it was the blurst of times? You stupid monkey! <br>
 Burns balls up the sheet and throws it at the monkey.<br>

The same technique can be applied to get useful diffs from all sorts of binary files, for example:

zips, jars and other archives: using unzip -l (or similar) in place of pdftohtml will show you paths that have been added or removed between commits

images: exiv2 can be used to show metadata changes such as image dimensions

documents: conversion tools exist for transforming .odf, .doc and other document formats to plain text. In a pinch, strings will often work for binary files where no formal converter exists.
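To try the whole textconv round trip without installing any converters, here is a sketch that uses a gzip-compressed text file as the "binary" and gzip -dc as the converter. The driver name gzconv, the file name, the temporary repository and the demo identity are all made up for illustration. The mechanics are meant to mirror the pdfconv setup above.

```shell
#!/bin/sh
# Sketch: end-to-end textconv diff, with gzip standing in for a real converter.
set -e

repo=$(mktemp -d)                          # throwaway repository
cd "$repo"
git init -q
git config user.email "demo@example.com"   # placeholder identity
git config user.name "Demo"

# Commit a compressed file, then change its contents.
printf 'it was the blurst of times\n' | gzip -c > script.gz
git add script.gz
git commit -q -m "add compressed script"
printf 'it was the worst of times\n' | gzip -c > script.gz

# Without a driver, git only reports that the binary differs.
before=$(git diff --no-color -- script.gz)

# Register a textconv driver and attach it to *.gz files.
# git appends the file to convert as the last argument of the command.
git config diff.gzconv.textconv "gzip -dc"
printf '*.gz diff=gzconv\n' > .gitattributes

# With the driver in place, git diffs the decompressed text instead.
after=$(git diff --no-color -- script.gz)

printf 'before:\n%s\n\nafter:\n%s\n' "$before" "$after"
```

Swapping gzip -dc for unzip -l, exiv2 or strings should follow the same pattern, as long as the tool accepts a file path as its final argument and writes text to standard output.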

Want to talk more about git or Atlassian Stash? Leave a comment or hit me up on twitter @kannonboy