First, I did some research on what tools are available that can convert an Hg repository to use Git while preserving history. GitHub actually provides a repository importer that can import from Hg when creating a new repo, and there's an open source project called fast-export that can do the same locally. Later on in the process, I also found the Mercurial ConvertExtension which advertises similar capabilities.

My first efforts were less than encouraging. Transforming 140k history entries takes a while, so it wasn't until about an hour and a half later that I got word from the GitHub importer that the repository was too large to import, and it took another 45 minutes for `fast-export` to fail with an error about an invalid date on a commit. Drat. Trying `fast-export` again with `--force` got it past the first error, only to fail on another invalid date later in the conversion. I later learned that the date wasn't the cause of these errors; it was just a symptom.

Given that these errors occurred on revisions that were several months old and that I wanted to reduce the size of the repository anyway, I started looking for a way to prune the history before converting to Git. I found the Mercurial ConvertExtension, which purported to do Hg-to-Hg conversions (and a lot more) with options for starting at a specific revision, filtering files out of history, and more. However, it consistently failed about three quarters of the way through the history with an obscure error about an invalid compression type (both `Abort: invalid compression type 'k'` and `compression type '6'` were seen; not helpful). IRC, StackOverflow, and years-old mailing lists were unable to help me get it working, and after a day without progress I had to abandon this tool that seemed so promising.

After these defeats, I talked to a coworker who had done a similar conversion in the past. He pointed me back to `fast-export` and to a pair of tools for mass-modifying history: `git filter-branch` and BFG Repo-Cleaner. He also mentioned the same invalid date issue I saw, and pointed me to a solution: the error wasn't the date, it was a null author causing misaligned columns, and an `authors.txt` file could work around it. Running `fast-export` with that `authors.txt` file worked! About an hour later, I had an equivalent Git repository, with a backup so I could screw with history all I wanted without having to re-export from scratch. To keep it up to date with changes my coworkers were still making, I could rely on `fast-export` doing incremental exports, which work as long as nothing new has been committed to the exported Git repo.

Now that I had our complete history in Git, I could start trying to pare it down to a manageable size.

Some trial runs with BFG Repo-Cleaner showed it was very good at its job, reducing the size of the `.git` folder from ~1.6GB post-export down to ~100MB in a minute or two. But it still left me with the same number of commits, implying there were tens of thousands of empty revisions clogging up the history. `git filter-branch` supports pruning empty commits with `--prune-empty`, but some experimentation showed it would take over 2 hours to completely process the history.

Luckily, not only had somebody opened a PR adding `--prune-empty-commits` to BFG Repo-Cleaner, but when it hadn't been merged after a few months, they published a .jar! With this .jar and the new flag, the Git history went from almost 140k commits to about 30k, and the `.git` folder was down to ~70MB. Still a large repo, but a vast improvement. To be thorough, I ran `git filter-branch --prune-empty` afterwards to see if there were any more empty commits to find. It got the history down to ~27k commits and took about 25 minutes: a small improvement that takes quite a while, but still worth it for a one-time migration.
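To make "empty commit" concrete: once files are stripped out of history, any commit that only touched those files no longer changes anything, so its file tree is identical to its parent's. The following is a toy model of that idea, not how BFG Repo-Cleaner or `git filter-branch` are implemented. The `Commit` class, its fields, and the sample history are all hypothetical, and the model assumes a linear history with no merges.

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class Commit:
    """Hypothetical, simplified commit: an id plus a hash of its file tree."""
    commit_id: str
    tree: str


def prune_empty(history: List[Commit]) -> List[Commit]:
    """Drop commits whose tree matches the previous surviving commit's tree.

    `history` is ordered oldest first and is assumed to be linear.
    """
    kept: List[Commit] = []
    for commit in history:
        if kept and kept[-1].tree == commit.tree:
            # Nothing changed relative to the parent: an "empty" commit.
            continue
        kept.append(commit)
    return kept


# Made-up history: b2 and d4 only touched files that were later removed,
# so after cleaning their trees equal their parents' trees.
history = [
    Commit("a1", "tree-1"),
    Commit("b2", "tree-1"),
    Commit("c3", "tree-2"),
    Commit("d4", "tree-2"),
    Commit("e5", "tree-3"),
]

pruned = prune_empty(history)
print([c.commit_id for c in pruned])  # ['a1', 'c3', 'e5']
```

In this sketch two of five commits disappear; in the real repository the same effect accounted for the drop from almost 140k commits to under 30k.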

Ultimately I had to write two scripts to automate the process: one that ran on each branch we needed to keep, deleting unnecessary files and creating a `.gitignore`, and one "coordinating" script that did repo-wide tasks like renaming and deleting branches, running the other script on each branch, removing deleted files from history, and garbage collecting the Git repo.
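The actual scripts aren't reproduced here, but the shape of the coordinating one can be sketched roughly as follows. This is a minimal illustration that only builds and prints the list of commands rather than running them. The function name `plan_migration`, the per-branch script path `./clean_branch.sh`, the branch names, and the BFG arguments are all hypothetical stand-ins; only the `git` subcommands themselves are standard.

```python
import shlex
from typing import Dict, List


def plan_migration(
    keep: List[str],
    rename: Dict[str, str],
    delete: List[str],
    rewrite_history: List[str],
    per_branch_script: str = "./clean_branch.sh",  # hypothetical script name
) -> List[List[str]]:
    """Return the commands a coordinating script might run, in order."""
    commands: List[List[str]] = []

    # Repo-wide branch housekeeping first.
    for old, new in rename.items():
        commands.append(["git", "branch", "-m", old, new])
    for name in delete:
        commands.append(["git", "branch", "-D", name])

    # Run the per-branch cleanup on every branch we keep.
    for branch in keep:
        commands.append(["git", "checkout", branch])
        commands.append([per_branch_script, branch])

    # Strip the now-deleted files out of history (tool invocation supplied by caller).
    commands.append(rewrite_history)

    # Drop unreachable objects so the .git folder actually shrinks.
    commands.append(["git", "reflog", "expire", "--expire=now", "--all"])
    commands.append(["git", "gc", "--prune=now", "--aggressive"])
    return commands


plan = plan_migration(
    keep=["develop", "release"],
    rename={"develop-2015": "develop"},
    delete=["old-experiment"],
    # Illustrative BFG call; the folder name is made up.
    rewrite_history=["java", "-jar", "bfg.jar", "--delete-folders", "UnusedFolder", "."],
)
for command in plan:
    print(shlex.join(command))
```

Building the plan as data before executing anything makes it easy to review the sequence against a backup copy first, which matters when every step rewrites or deletes history.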

To recap, before the migration, we had:

- 6 years of history
- 1.2GB in .hg
- 800MB of current files
- ~50 Visual Studio solutions
- ~200 Visual Studio projects
- 140,000 commits
- ~180 top level folders my team doesn't use

After the migration, we’re left with:

- Still 6 years of history
- 70MB in .git
- 15MB of current files
- 0 solutions
- 0 projects
- 27,000 commits
- No files that we don't use
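For a sense of scale, the before and after figures can be turned into percentage reductions. This is a quick back-of-the-envelope sketch; it assumes 1.2GB is 1200MB and compares the `.hg` folder directly against the `.git` folder even though the two formats store history differently.

```python
def percent_reduction(before: float, after: float) -> float:
    """Percentage by which `after` is smaller than `before`."""
    return (before - after) / before * 100


# (before, after) pairs taken from the recap lists.
figures = {
    "commits": (140_000, 27_000),
    "repository metadata (MB)": (1200, 70),  # 1.2GB in .hg vs 70MB in .git
    "current files (MB)": (800, 15),
}

for label, (before, after) in figures.items():
    print(f"{label}: {percent_reduction(before, after):.1f}% smaller")
```

That works out to roughly 81% fewer commits, about 94% less repository metadata, and about 98% less in the working tree.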

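Tools successfully used:

- `fast-export`, for the Hg-to-Git conversion
- BFG Repo-Cleaner, using the published .jar that adds `--prune-empty-commits`
- `git filter-branch --prune-empty`, for a final pass over empty commits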

My lessons learned:

The process of changing SCM systems and mass-modifying history was actually a lot easier than I expected. I came away with a negative impression of Hg; it allowed corrupt commits into the history that caused multiple tools to fail. The tools for manipulating Git history were a lot more straightforward than the ConvertExtension: I spent a lot of time reading its docs and asking questions about it on IRC, and never felt I was getting closer to understanding how it was supposed to be configured. And the whole experience gave me yet another demonstration that, no matter how odd and unique you think your problem is, somebody else has had to do the same thing.