Git: Grafting repositories

We recently evaluated replacements for our VSS-workalike source control system at work. We have about 14 years of history in our current database, though, and it seemed like a good idea to preserve it.

The problem is that all that history takes time to import, and shutting all development down for a week while we get the data into the new system was just not an option. I knew there had to be a way to get this done right, and it turns out git can do exactly what we want.

Moving to git

The first step is to fetch a clean snapshot of the current source tree and stuff that into a git repo as the root commit. All the engineers can then start working from that repository with no effective downtime.
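As a rough sketch of that first step, the snapshot can be copied into an empty repository and committed in one go. The directory layout, file contents, identity, and commit message below are placeholders made up for the example, not our real tree.

```shell
set -e
# Placeholder identity so the example runs on a machine with no git config.
export GIT_AUTHOR_NAME="Example Dev" GIT_AUTHOR_EMAIL="dev@example.com"
export GIT_COMMITTER_NAME="Example Dev" GIT_COMMITTER_EMAIL="dev@example.com"

work=$(mktemp -d)

# Stand-in for a clean export of the current source tree (hypothetical content).
mkdir -p "$work/snapshot/src"
echo 'int main(void) { return 0; }' > "$work/snapshot/src/main.c"

# New repository; force the branch name so the example does not depend on
# the local init.defaultBranch setting.
git init -q "$work/new"
git -C "$work/new" symbolic-ref HEAD refs/heads/master

# Copy the snapshot in and record it as the root commit.
cp -R "$work/snapshot/." "$work/new/"
git -C "$work/new" add -A
git -C "$work/new" commit -q -m "Snapshot of current source tree"

# The root commit has no parents, so this prints an empty line.
git -C "$work/new" log -1 --format=%P master
```

From here engineers can clone the repository and carry on while the history import happens separately.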

I’m going to handwave the data conversion, but suffice it to say we’d need a fast-import script. The interesting bit is how to get all of this historical data back into git.
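For a feel of what such a script has to emit, here is a minimal sketch of a fast-import stream with two commits. This is not our conversion script: the committer, timestamps, file name, and contents are invented for illustration. The number after each `data` line is the exact byte count of the payload that follows.

```shell
set -e
old=$(mktemp -d)
git init -q "$old"
cd "$old"

# Two commits on master. The second one has no "from" line, so fast-import
# uses the current tip of the branch as its parent.
git fast-import --quiet <<'EOF'
commit refs/heads/master
committer Old Dev <old@example.com> 946684800 +0000
data 6
Old #1
M 100644 inline hello.txt
data 6
hello

commit refs/heads/master
committer Old Dev <old@example.com> 946771200 +0000
data 6
Old #2
M 100644 inline hello.txt
data 12
hello again

EOF

# Show the imported history, oldest first.
git log --reverse --format='%s' master
```

A real converter would generate one such `commit` block per changeset in the old system, which is where the import time goes.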

Pull it in

So now our “new” repository has some work done on it, and it looks like this:

And we’ve imported all the old history into an “old” repository, which looks like this:

Now what we want to do is change the first commit in the “new” repo (“New commit #1”) so that its parent is the last commit in the “old” repo (“Old #3”). Time for some voodoo:

git fetch ../old master:ancient_history

Git lets you fetch from any other git repository, whether this repo is related to it or not! Brilliant! This leaves us with this:

Note how we fetched the old master branch under a new name, ancient_history. If we hadn’t, git would have tried to update our own master with a completely unrelated history, and refused.
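To see this on a small scale, the sketch below builds two throwaway repositories that share nothing, fetches one into the other, and checks that the branches really have no common ancestor. The `make_repo` helper, the commit counts, and the identity are assumptions made up for the demo.

```shell
set -e
export GIT_AUTHOR_NAME="Example Dev" GIT_AUTHOR_EMAIL="dev@example.com"
export GIT_COMMITTER_NAME="Example Dev" GIT_COMMITTER_EMAIL="dev@example.com"

# make_repo <dir> <label> <count>: hypothetical helper that creates a repo
# with <count> linear commits on master.
make_repo() {
  git init -q "$1"
  git -C "$1" symbolic-ref HEAD refs/heads/master
  i=1
  while [ "$i" -le "$3" ]; do
    echo "$2 $i" > "$1/file.txt"
    git -C "$1" add file.txt
    git -C "$1" commit -q -m "$2 #$i"
    i=$((i + 1))
  done
}

work=$(mktemp -d)
make_repo "$work/old" Old 3
make_repo "$work/new" New 2

cd "$work/new"
git fetch -q ../old master:ancient_history

# merge-base exits non-zero when two commits share no ancestor.
if git merge-base master ancient_history >/dev/null; then
  related=yes
else
  related=no
fi
echo "related: $related"
```

Both branches now live in the same object database, but nothing links them yet.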

Now we still have a problem. The two trees aren’t connected, and in fact a git pull won’t even get the ancient_history branch at all. We need a way to make a connection between the two.

Grafts

Disclaimer: I know there must be an easier way.

Git has a facility called a graft, which basically fakes up a parent link between two commits. To make one, just insert a line into the .git/info/grafts file in this format:

[commit] [parent]

Both of these need to be the full hash of the commits in question. So let’s find them:

$ git rev-list master | tail -n 1
d7737bffdad86dc05bbade271a9c16f8f912d3c6
$ git rev-parse ancient_history
463d0401a3f34bd381c456c6166e514564289ab2
$ echo d7737bffdad86dc05bbade271a9c16f8f912d3c6 \
       463d0401a3f34bd381c456c6166e514564289ab2 \
       > .git/info/grafts
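Newer versions of Git also offer `git replace --graft`, which records the same kind of fake parent link as a replacement ref instead of a line in the grafts file. This is an alternative to what I did above, not the same mechanism. A sketch on throwaway repositories, with the `make_repo` helper and identity invented for the demo:

```shell
set -e
export GIT_AUTHOR_NAME="Example Dev" GIT_AUTHOR_EMAIL="dev@example.com"
export GIT_COMMITTER_NAME="Example Dev" GIT_COMMITTER_EMAIL="dev@example.com"

# make_repo <dir> <label> <count>: hypothetical helper, <count> commits on master.
make_repo() {
  git init -q "$1"
  git -C "$1" symbolic-ref HEAD refs/heads/master
  i=1
  while [ "$i" -le "$3" ]; do
    echo "$2 $i" > "$1/file.txt"
    git -C "$1" add file.txt
    git -C "$1" commit -q -m "$2 #$i"
    i=$((i + 1))
  done
}

work=$(mktemp -d)
make_repo "$work/old" Old 3
make_repo "$work/new" New 2
cd "$work/new"
git fetch -q ../old master:ancient_history

root=$(git rev-list master | tail -n 1)   # first commit of the new history
tip=$(git rev-parse ancient_history)      # last commit of the old history

# Pretend that $root has $tip as its parent.
git replace --graft "$root" "$tip"

# History walks now continue from the new root into the old commits.
total=$(git rev-list --count master)
echo "commits reachable from master: $total"
```

Like a graft, the replacement is local bookkeeping: `git replace -d "$root"` removes it again, and a plain clone does not pick it up.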

There. Now our history looks like this:

Perfect! What could go wrong?

What went wrong

Cloning this repo results in this:

Whoops. It turns out that grafts only take effect in the local repository; the .git/info/grafts file is not transferred by a clone. We can fix this with judicious application of fast-export and fast-import:

$ git fast-export --all > ../export
$ mkdir ../nuevo-complete
$ cd ../nuevo-complete
$ git init
$ git fast-import < ../export
git-fast-import statistics: [...]

This effectively converts our “fake” history link into a real one. All the engineers will have to re-clone from this new repository, since the hashes will all be different, but that’s a small price to pay for no downtime and a complete history.
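The whole round trip can be checked on throwaway repositories. In this sketch the fake link is created with `git replace --graft` rather than the grafts file (my substitution, to keep the demo independent of grafts-file support), and only master is exported. The `make_repo` helper, directory names, and commit counts are made up for the example.

```shell
set -e
export GIT_AUTHOR_NAME="Example Dev" GIT_AUTHOR_EMAIL="dev@example.com"
export GIT_COMMITTER_NAME="Example Dev" GIT_COMMITTER_EMAIL="dev@example.com"

# make_repo <dir> <label> <count>: hypothetical helper, <count> commits on master.
make_repo() {
  git init -q "$1"
  git -C "$1" symbolic-ref HEAD refs/heads/master
  i=1
  while [ "$i" -le "$3" ]; do
    echo "$2 $i" > "$1/file.txt"
    git -C "$1" add file.txt
    git -C "$1" commit -q -m "$2 #$i"
    i=$((i + 1))
  done
}

work=$(mktemp -d)
make_repo "$work/old" Old 3
make_repo "$work/nuevo" New 2
cd "$work/nuevo"
git fetch -q ../old master:ancient_history
git replace --graft "$(git rev-list master | tail -n 1)" \
                    "$(git rev-parse ancient_history)"

# A plain clone does not carry the fake link: only the new commits show up.
git clone -q "$work/nuevo" "$work/naive"
naive_count=$(git -C "$work/naive" rev-list --count master)

# Re-import the history so the parent link becomes a real one.
git init -q "$work/nuevo-complete"
git -C "$work/nuevo-complete" symbolic-ref HEAD refs/heads/master
git fast-export master | git -C "$work/nuevo-complete" fast-import --quiet

# Clones of the rewritten repository see the full history.
git clone -q "$work/nuevo-complete" "$work/clone"
clone_count=$(git -C "$work/clone" rev-list --count master)
echo "naive clone: $naive_count, clone of rewritten repo: $clone_count"
```

The commit hashes in the rewritten repository differ from the originals from the grafted commit onward, which is why everyone has to re-clone.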