As I wrote in a previous post, I have moved away from bzr to git for most of my software projects (I still prefer bzr for documents, like my research papers). Many, if not most, comparisons of git with other tools focus on speed. True, git is quite fast for source code management, but I think this kind of misses the point of git. It took me time to appreciate it, but one of git's killer features for source code control is the notion of content tracking. Bzr (and I believe hg, although I could not find good information on that point) uses file ids, i.e. it tracks files, and a tree is a set of files. Git, on the contrary, tracks content, not files. In other words, it does not treat files individually, but internally always considers the whole tree.
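To make "tracks content, not files" a bit more concrete, here is a minimal Python sketch of how git names a file's content in a default (SHA-1) repository: the object id is a hash of a small header plus the bytes, and the file name plays no part in it. The file names and contents below are made up for illustration, and this only covers blob ids, not how trees or the index are stored.

```python
import hashlib


def git_blob_id(content: bytes) -> str:
    """Return the id git gives a blob with this content (SHA-1 repositories)."""
    # Git hashes "blob <size>\0" followed by the raw bytes; the path is not included.
    header = b"blob " + str(len(content)).encode("ascii") + b"\0"
    return hashlib.sha1(header + content).hexdigest()


# Hypothetical working tree: path -> content (names chosen for illustration only).
tree = {
    "sum.c": b"double acc = 0;\n",
    "copy_of_sum.c": b"double acc = 0;\n",
    "prod.c": b"double acc = 1;\n",
}

ids = {path: git_blob_id(data) for path, data in tree.items()}

# Identical content gets the same id whatever the file is called.
print(ids["sum.c"] == ids["copy_of_sum.c"])
# Different content gets a different id.
print(ids["sum.c"] == ids["prod.c"])
```

If you have a repository at hand, the result of git_blob_id can be compared with what git hash-object prints for the same bytes.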

This may seem like an internal detail, and an annoyance because it leaks into the UI quite a lot (the so-called index is linked to this). But it means that git can record the history of code, rather than of files, quite accurately. This is especially visible with git blame. One example: I recently started a massive surgery on the numpy C source code. Because of some C limitations, the numpy core C code was in a couple of gigantic source files, and I split them into more logical units. But this breaks svn blame heavily. If you just rename a file, svn blame can follow the rename. But if you split one file into two, it becomes useless. Because git tracks the whole tree, the blame command can be asked to detect code moves across files. For example, git blame with rename detection gives me the following on one file in numpy:

dc35f24e numpy/core/src/arrayobject.c 1) #define PY_SSIZE_T_CLEAN
dc35f24e numpy/core/src/arrayobject.c 2) #include <Python.h>
dc35f24e numpy/core/src/arrayobject.c 3) #include "structmember.h"
dc35f24e numpy/core/src/arrayobject.c 4)
65d13826 numpy/core/src/arrayobject.c 5) /*#include <stdio.h>*/
5568f288 scipy/base/src/multiarraymodule.c 6) #define _MULTIARRAYMODULE
2f91f91e numpy/core/src/multiarraymodule.c 7) #define NPY_NO_PREFIX
2f91f91e numpy/core/src/multiarraymodule.c 8) #include "numpy/arrayobject.h"
dc35f24e numpy/core/src/arrayobject.c 9) #include "numpy/arrayscalars.h"
38f46d90 numpy/core/src/multiarray/common.c 10)
38f46d90 numpy/core/src/multiarray/common.c 11) #include "config.h"
0f81da6f numpy/core/src/multiarray/common.c 12)
71875d5c numpy/core/src/multiarray/common.c 13) #include "usertypes.h"
71875d5c numpy/core/src/multiarray/common.c 14)
0f81da6f numpy/core/src/multiarray/common.c 15) #include "common.h"
5568f288 scipy/base/src/arrayobject.c 16)
65d13826 numpy/core/src/arrayobject.c 17) /*
65d13826 numpy/core/src/arrayobject.c 18) * new reference
65d13826 numpy/core/src/arrayobject.c 19) * doesn't alter refcount of chktype or mintype ---
65d13826 numpy/core/src/arrayobject.c 20) * unless one of them is returned
65d13826 numpy/core/src/arrayobject.c 21) */

Notice that the original file is found for every line of code in the new file. The original author and date are available as well; I just removed them for the blog post.
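For readers who want to get this kind of output on their own repository, here is a small Python sketch that builds a git blame command line with move and copy detection turned on. The exact options used for the output above are not shown in this post, so treat the defaults here as an assumption; -M and -C themselves are standard git blame options, and the helper names (blame_command, run_blame) are made up for this example.

```python
import subprocess


def blame_command(path, copy_level=2):
    """Build a `git blame` command with move/copy detection (helper name is hypothetical).

    -M detects lines moved or copied within the file.
    -C also looks at other files modified in the same commit; given twice it
    also looks at the commit that created the file, and given three times it
    looks at any commit.
    """
    if not 0 <= copy_level <= 3:
        raise ValueError("copy_level must be between 0 and 3")
    cmd = ["git", "blame", "-M"]
    cmd += ["-C"] * copy_level
    cmd += ["--", path]
    return cmd


def run_blame(path, copy_level=2, repo="."):
    """Run the command inside `repo` and return the blame output as text."""
    result = subprocess.run(
        blame_command(path, copy_level),
        cwd=repo,
        capture_output=True,
        text=True,
        check=True,
    )
    return result.stdout


print(" ".join(blame_command("numpy/core/src/multiarray/common.c")))
```

Higher copy levels make git look in more places for the origin of each line, so they are slower on large histories; two is a reasonable starting point for a file that was created by splitting older ones.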

This is truly impressive, and is one of the reasons why git is so far ahead of the competition IMHO. This kind of feature is extremely useful for open source projects, much more than rename support. I am ready to deal with quite a few (real) Git UI annoyances for this.

Edit

It looks like my example was not very clear. I am not interested in following renames of the file: in the example above, the file was not first arrayobject.c, then renamed to multiarraymodule.c, and later to common.c. The file was created from scratch, with content taken from those files at some point. You can try the following simplified example. First, create two files prod.c and sum.c:

#include

double sum(const double* in, int n)

{

int i;

double acc = 0;