There has been a recent kerfuffle over a pull request to libuv: it was rejected, applied, reverted, and re-applied. There was some question as to the authorship of that commit, and I wanted to show you why that was, because it illustrates how git handles history.

At first, the pull request was closed. Then it was committed in 47d98b6. Then 804d40ee reverted 47d98b6. But when you look at 804d40ee on GitHub, you’ll see no indication of which branch it’s on. That’s because it’s not on any branch. If you clone libuv and try to find the commit, you’ll see it’s not there:

    ~/libuv(master)$ git log 804d40e
    fatal: ambiguous argument '804d40e': unknown revision or path not in the working tree.

What gives?

Let’s make a test repository:

    $ mkdir test
    $ cd test
    $ git init
    Initialized empty Git repository in /home/action/test/.git/
    $ touch HELLO.md
    $ git add HELLO.md
    $ git commit -m "initial commit"
    [master (root-commit) 646567b] initial commit
     1 file changed, 0 insertions(+), 0 deletions(-)
     create mode 100644 HELLO.md

Okay, now let’s make a second commit:

    $ touch WHATEVER.md
    $ git add WHATEVER.md
    $ git commit -m "adding whatever"
    [master 7c232cc] adding whatever
     1 file changed, 0 insertions(+), 0 deletions(-)
     create mode 100644 WHATEVER.md

Actually, that commit was a mistake. Since we haven’t pushed anywhere yet, let’s use git reset --hard to throw it out.

    $ git reset --hard HEAD~1
    HEAD is now at 646567b initial commit

But what about that commit? Where is it?

    $ git show 7c232cc
    commit 7c232cceb94a2e7cdd95c526de785efe08da2325
    Author: Steve Klabnik <steve@steveklabnik.com>
    Date:   Sat Nov 30 20:19:26 2013 +0000

        adding whatever

    diff --git a/WHATEVER.md b/WHATEVER.md
    new file mode 100644
    index 0000000..e69de29

It’s still in the repository. We can use git reflog to see our changes:

    $ git reflog
    646567b HEAD@{0}: reset: moving to HEAD~1
    7c232cc HEAD@{1}: commit: adding whatever
    646567b HEAD@{2}: commit (initial): initial commit

Git calls these kinds of commits ‘unreachable’ because, while they exist in your repository, you can’t find them unless you know their SHA. They’re not connected to any tag or branch. We can use git fsck to find these for us automatically:

    $ git fsck --no-reflogs --unreachable
    Checking object directories: 100% (256/256), done.
    unreachable tree 1536f028d8051a63f7f39951f06b7180a96faff5
    unreachable commit 7c232cceb94a2e7cdd95c526de785efe08da2325
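If you would rather script this check than read the output by eye, here is a minimal sketch. It rebuilds the same two-commit history in a throwaway directory, confirms that no branch or tag contains the discarded commit, and then filters the fsck output down to commit SHAs. The helper `g`, the variable names, and the example identity are made up for illustration; they are not part of Git.

```shell
# Throwaway repository so nothing real is touched.
repo=$(mktemp -d) && cd "$repo" || exit 1
git init -q .
# Hypothetical helper: passes an identity inline so global config is not needed.
g() { git -c user.name="Example" -c user.email="example@example.com" -c commit.gpgsign=false "$@"; }

touch HELLO.md && git add HELLO.md && g commit -q -m "initial commit"
touch WHATEVER.md && git add WHATEVER.md && g commit -q -m "adding whatever"
dropped=$(git rev-parse HEAD)   # SHA of the commit we are about to discard
git reset -q --hard HEAD~1

# No branch or tag contains the discarded commit, so this comes back empty.
refs=$(git branch --all --contains "$dropped"; git tag --contains "$dropped")

# fsck prints lines like "unreachable commit <sha>"; keep only the commit SHAs.
found=$(git fsck --no-reflogs --unreachable 2>/dev/null |
  awk '$1 == "unreachable" && $2 == "commit" { print $3 }')

echo "refs containing it: '${refs}'"
echo "unreachable commits: ${found}"
```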

Some Git commands will run git gc, the Git garbage collector, as part of their normal operation. However, if we run git gc ourselves, it doesn’t look like anything happens:

    $ git gc
    Counting objects: 5, done.
    Delta compression using up to 8 threads.
    Compressing objects: 100% (3/3), done.
    Writing objects: 100% (5/5), done.
    Total 5 (delta 0), reused 0 (delta 0)
    $ git fsck --no-reflogs --unreachable
    Checking object directories: 100% (256/256), done.
    Checking objects: 100% (5/5), done.
    unreachable tree 1536f028d8051a63f7f39951f06b7180a96faff5
    unreachable commit 7c232cceb94a2e7cdd95c526de785efe08da2325

What gives? Well, git gc has some settings that control how long it will let an unreachable commit lie around in your repository. The gc.pruneExpire configuration variable controls this behavior, and it defaults to two weeks. So even if we try to throw away a commit, and even if we manually run the garbage collector, we’ll still have the commit for two weeks.
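To see or change that grace period, you can read and write gc.pruneExpire with git config. The sketch below does so in a throwaway repository; the value 3.days.ago is picked purely for illustration and is not a recommendation.

```shell
repo=$(mktemp -d) && cd "$repo" || exit 1
git init -q .

# If nothing is configured, Git falls back to its built-in two-week default.
before=$(git config --get gc.pruneExpire || echo "unset")

# Shorten the grace period for this repository only (illustrative value).
git config gc.pruneExpire "3.days.ago"
after=$(git config --get gc.pruneExpire)

echo "before: ${before}, after: ${after}"
```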

We can, of course, tell git gc to ignore the setting:

    $ git gc --prune=now
    Counting objects: 5, done.
    Delta compression using up to 8 threads.
    Compressing objects: 100% (3/3), done.
    Writing objects: 100% (5/5), done.
    Total 5 (delta 0), reused 5 (delta 0)
    $ git fsck --no-reflogs --unreachable
    Checking object directories: 100% (256/256), done.
    Checking objects: 100% (5/5), done.
    unreachable tree 1536f028d8051a63f7f39951f06b7180a96faff5
    unreachable commit 7c232cceb94a2e7cdd95c526de785efe08da2325

Uhhhh what? It turns out that git gc won’t touch commits that are still in our reflog. So let’s clear that:

    $ git reflog expire --expire=now --all
    $ git reflog
    $

Good. And now, let’s take out the garbage:

    $ git gc --prune=now
    Counting objects: 3, done.
    Writing objects: 100% (3/3), done.
    Total 3 (delta 0), reused 3 (delta 0)
    $ git fsck --no-reflogs --unreachable
    Checking object directories: 100% (256/256), done.
    Checking objects: 100% (3/3), done.
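Put together, the whole sequence can be sketched as one script. It uses git cat-file -e, which exits non-zero when an object does not exist, to check for the discarded commit before and after pruning. As before, the helper `g` and the variable names are invented for this example.

```shell
repo=$(mktemp -d) && cd "$repo" || exit 1
git init -q .
# Hypothetical helper: passes an identity inline so global config is not needed.
g() { git -c user.name="Example" -c user.email="example@example.com" -c commit.gpgsign=false "$@"; }

touch HELLO.md && git add HELLO.md && g commit -q -m "initial commit"
touch WHATEVER.md && git add WHATEVER.md && g commit -q -m "adding whatever"
dropped=$(git rev-parse HEAD)
git reset -q --hard HEAD~1

# Hypothetical helper: reports whether the commit object still exists.
exists() { if git cat-file -e "$1^{commit}" 2>/dev/null; then echo present; else echo gone; fi; }

before=$(exists "$dropped")

# Drop every reflog entry, then prune unreachable objects with no grace period.
git reflog expire --expire=now --all
git gc -q --prune=now

after=$(exists "$dropped")
echo "before: ${before}, after: ${after}"
```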

Easy! When you fetch from a remote repository, Git does not include unreachable commits. That’s why when we cloned down libuv earlier, we didn’t get the orphaned commit.
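You can check this locally, with one caveat: cloning from a plain filesystem path lets Git copy or hardlink the whole object store, unreachable objects included, so the sketch below uses a file:// URL to force the regular fetch transport. The directory names src and copy and the helper `g` are made up for the example.

```shell
work=$(mktemp -d) && cd "$work" || exit 1
git init -q src
# Hypothetical helper: runs git inside src with an inline identity.
g() { git -C src -c user.name="Example" -c user.email="example@example.com" -c commit.gpgsign=false "$@"; }

touch src/HELLO.md && g add HELLO.md && g commit -q -m "initial commit"
touch src/WHATEVER.md && g add WHATEVER.md && g commit -q -m "adding whatever"
dropped=$(g rev-parse HEAD)
g reset -q --hard HEAD~1

# file:// makes the clone go through the normal transport instead of a local copy.
git clone -q "file://$work/src" copy

# The discarded commit is still in src but never made it into the clone.
if git -C src cat-file -e "$dropped^{commit}" 2>/dev/null; then in_src=yes; else in_src=no; fi
if git -C copy cat-file -e "$dropped^{commit}" 2>/dev/null; then in_copy=yes; else in_copy=no; fi
echo "in src: ${in_src}, in copy: ${in_copy}"
```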

Anyway, as you can see, it’s actually really hard to lose committed work with Git, even if you explicitly try to throw it away. This is why you can rebase with… well, maybe not reckless abandon, but you don’t have to worry about your commits. All it takes is a quick git reset --hard HEAD@{1} (read: reset me to where HEAD was one reflog entry ago) and you’re back to where you were before that operation you screwed up. If the operation moved HEAD more than once, as a rebase usually does, run git reflog first and pick the entry from before it started.
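Here is a small sketch of that recovery, using the same throwaway two-commit history as the walkthrough. The "mistake" is a single git reset --hard, so HEAD@{1} points at exactly the commit we want back. The helper `g` and the variable names are assumptions made for the example.

```shell
repo=$(mktemp -d) && cd "$repo" || exit 1
git init -q .
# Hypothetical helper: passes an identity inline so global config is not needed.
g() { git -c user.name="Example" -c user.email="example@example.com" -c commit.gpgsign=false "$@"; }

touch HELLO.md && git add HELLO.md && g commit -q -m "initial commit"
touch WHATEVER.md && git add WHATEVER.md && g commit -q -m "adding whatever"
good=$(git rev-parse HEAD)      # the state we will want back

git reset -q --hard HEAD~1      # the mistake: WHATEVER.md and its commit disappear
git reset -q --hard 'HEAD@{1}'  # undo it: go back to where HEAD was one entry ago

restored=$(git rev-parse HEAD)
echo "good: ${good}"
echo "restored: ${restored}"
```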