János Kubisch Full-Stack Engineer at RisingStack

In this post, I'd like to highlight some git features that might be less used/known, but can end up saving your @$$ when things go south in the codebase. Fortunately, it is really hard to irrevocably mess something up with git, as long as you have the .git hidden folder in your project intact!

Let’s discuss...

amending,

reverting multiple commits (or even merges),

and proper housekeeping.

Git Amend & Force-Push Shenanigans

Sometimes you end up in a situation when you need to amend a commit to add further changes to it, either because you forgot some important things, or due to company git policies.

$ git commit --amend --no-edit

Amending is usually simple, but if you aren’t careful, it can be a source of some headache, as this operation alters the git history.

If you amend a commit that has already been pushed to the remote, a regular push will be rejected, so you can only submit your changes with a force-push (git push -f). This way you can potentially overwrite other people's work, and even push directly to a branch, skipping pull request and code review.
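To see why the force-push becomes necessary, here is a minimal sketch in a throwaway repository with a local bare repo standing in for the remote. The paths, the main branch name, the file name and the demo identity are all made up for illustration. I also use --force-with-lease instead of -f here: it is a more cautious variant that refuses to overwrite the remote branch if it has moved since your last fetch.

```shell
set -e
tmp=$(mktemp -d)
git init -q --bare "$tmp/remote.git"      # stand-in for the real remote
git init -q "$tmp/work"
cd "$tmp/work"
git symbolic-ref HEAD refs/heads/main     # hypothetical branch name
git config user.name "Demo User"
git config user.email "demo@example.com"
git remote add origin "$tmp/remote.git"

echo "first draft" > fix.txt
git add fix.txt
git commit -q -m "Add fix"
git push -q origin main
before=$(git rev-parse HEAD)

# Amend the already-pushed commit with a forgotten change
echo "forgotten change" >> fix.txt
git add fix.txt
git commit -q --amend --no-edit
after=$(git rev-parse HEAD)

# The amended commit is a different commit, so a plain push is rejected
if git push -q origin main 2>/dev/null; then
    plain_push=accepted
else
    plain_push=rejected
fi

# Force-push, but only if the remote is still where we last saw it
git push -q --force-with-lease origin main
remote_head=$(git --git-dir="$tmp/remote.git" rev-parse refs/heads/main)

echo "before amend: $before"
echo "after amend:  $after"
echo "plain push:   $plain_push"
```

The two hashes differ even though the commit message stayed the same, which is exactly the history rewrite that makes the remote refuse a regular push.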

Let me share my horror-story about amending:

One evening, I was sitting in the office completely alone, as all of my colleagues had already left for home. We had done some code reviews earlier, during which I switched branches back and forth and kind of lost track of where I had ended up. I un-stashed my changes with git stash apply to continue with an important fix I was working on for the next day. I already had a commit on my fix branch pushed to the remote, and we had a policy to only submit one commit per pull request, so I was looking at a history rewrite anyway. As it was pretty late and I was eager to head home, I chose to just amend what I assumed was my own commit and force-push it to origin. But it turned out I was on our development branch, so I amended the last successfully merged commit on it. I sat there scratching my head for a while.

How can one fix such a mistake, and have the merge commit restored?

It turns out it's not as complicated as it may sound at first. First of all, we should forget about git log, which does not contain enough information for us to go on, and check git reflog instead. It tells us exactly what happened in our local repo.

The reflog contains way more useful information than git log. A new entry is created whenever HEAD or a branch tip is updated, which includes switching branches, merges, resets, and commits, even amends; the stash keeps a reflog of its own. We can thus easily pick where to go back to:
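As a small illustration of what the reflog records, the sketch below amends a commit in a scratch repository and then asks the reflog where HEAD was one step earlier. The repository, file name and identity are placeholders I made up for the example.

```shell
set -e
repo=$(mktemp -d)
cd "$repo"
git init -q .
git symbolic-ref HEAD refs/heads/main     # hypothetical branch name
git config user.name "Demo User"
git config user.email "demo@example.com"

echo one > a.txt
git add a.txt
git commit -q -m "First commit"
original=$(git rev-parse HEAD)

# Rewrite the commit; git log will only show the new version from now on
echo two >> a.txt
git add a.txt
git commit -q --amend --no-edit

# The reflog still lists both states, newest first
git reflog

# HEAD@{1} is "where HEAD pointed one update ago"
previous=$(git rev-parse 'HEAD@{1}')
echo "commit before the amend: $previous"
```

git log no longer shows the original commit, but HEAD@{1} still resolves to it.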

$ git reset --soft d1b3026

d1b3026 is the short hash reference to the state before the amend happened. I chose soft reset, to preserve the changes I made, as I would like to commit them later properly to my fix branch.

Instead of the hash, I could have also used the head position (HEAD@{1}) and the result would have been the same. Soft reset allows me to keep the changes staged for commit, a minor convenience compared to the default mixed mode, which retains changes as un-staged, so I'd have to git add <file names> them again. Hard reset would have done me no good of course, as that discards all the changes.
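Here is a hedged re-enactment of that recovery in a scratch repository. The development branch name, the file and the commit messages are invented for the demo; the point is only that a soft reset to HEAD@{1} restores the old commit and leaves the amended-in changes staged.

```shell
set -e
repo=$(mktemp -d)
cd "$repo"
git init -q .
git symbolic-ref HEAD refs/heads/development   # hypothetical branch name
git config user.name "Demo User"
git config user.email "demo@example.com"

echo "merged feature" > app.txt
git add app.txt
git commit -q -m "Merged work"
good=$(git rev-parse HEAD)

# The mistake: amend the wrong commit with a half-finished fix
echo "half-finished fix" >> app.txt
git add app.txt
git commit -q --amend --no-edit

# Move the branch back to the state before the amend, keeping the changes
git reset -q --soft 'HEAD@{1}'
restored=$(git rev-parse HEAD)
staged=$(git diff --cached --name-only)

echo "restored commit: $restored"
echo "still staged:    $staged"
```

With --mixed the same file would show up as modified but not staged, and with --hard the fix would be gone from the working tree.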

Now we can just git stash the changes, force-push the restored history to the remote, switch to the fix branch to un-stash and commit the changes.
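The stash-and-switch part of this can be sketched as follows. The branch names development and fix, the file and the commit messages are assumptions for the demo, and the force-push step is left out because this scratch repository has no remote.

```shell
set -e
repo=$(mktemp -d)
cd "$repo"
git init -q .
git symbolic-ref HEAD refs/heads/development   # hypothetical branch name
git config user.name "Demo User"
git config user.email "demo@example.com"

echo base > app.txt
git add app.txt
git commit -q -m "Base"
base=$(git rev-parse HEAD)
git branch fix                                 # hypothetical fix branch

# Changes left staged on the wrong branch after the soft reset
echo "fix in progress" >> app.txt
git add app.txt

# Park the changes, move to the right branch, and bring them back
git stash push -q
git checkout -q fix
git stash pop -q

# pop restores the changes to the working tree; -a stages them again
git commit -q -a -m "Finish fix"
fix_subject=$(git log -1 --format=%s fix)
```

After this, development still points at the untouched base commit and the fix lives only on the fix branch.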

Reverting multiple git commits, including merges

It’s story time again!

One of my colleagues discovered the other day that there was an issue with a rather important feature on his project. He quickly ruled out trivial things like typos and such, and as time was tight, extensive digging in the code was not really an option. The last commit known to be clean had been created quite some time before, and everything had been pushed to the remote already. This meant that the other developers probably already had the faulty piece of code checked out. Fortunately, most of our work was separate from the other teams, but we wanted to make sure we resolved the situation as painlessly as possible. We took a look at our options.

Using git reset HEAD@{34} could take care of the problem for us - it points the HEAD to the commit we specified and discards or keeps the changes since then as desired, but it would also alter the git history by actually removing the commits.

History changes would then result in a mismatch with the remote, meaning we could only use force-push when submitting. Force pushing to the working branch is rather rude, and such an alteration would probably have caused a bunch of conflicts and some confusion for the other team.

We settled on reverting the suspected commits instead.

When reverting, git creates a new commit undoing the original commit's changes, and then we can add it to the history. This keeps the normal flow and usually results in fewer conflicts when another contributor pulls the code later.
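A minimal sketch of that behaviour, in a scratch repository with made-up file and commit names: the faulty commit stays in the history, and a third commit is added on top that undoes it.

```shell
set -e
repo=$(mktemp -d)
cd "$repo"
git init -q .
git symbolic-ref HEAD refs/heads/main     # hypothetical branch name
git config user.name "Demo User"
git config user.email "demo@example.com"

echo good > feature.txt
git add feature.txt
git commit -q -m "Good state"

echo broken > feature.txt
git commit -q -a -m "Faulty change"

# Create a new commit that applies the reverse of the faulty one
git revert --no-edit HEAD >/dev/null

commit_count=$(git rev-list --count HEAD)
content=$(cat feature.txt)
git log --oneline
```

Nothing is removed from the history, so a plain push works and nobody else has to deal with rewritten commits.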

Reverting is pretty straightforward in itself. We had to choose whether to create a separate revert commit for each commit by simply calling git revert <hash>, or to add the --no-commit option, which applies the reverse changes to the working tree and the index without committing them, so that we could commit everything in one go later. Should you go with the latter, be aware that you will probably have to resolve some conflicts along the way, as git can have difficulties merging the reverse changes! We tried --no-commit first, but it turned out to be quite ugly, so after a git revert --abort, I opted for a separate revert commit for each commit.
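For comparison, this is roughly what the --no-commit variant looks like when it goes smoothly. It is a sketch under friendly assumptions: the three suspect commits each touch their own (made-up) file, so no conflicts come up, which was not the case in our real repository.

```shell
set -e
repo=$(mktemp -d)
cd "$repo"
git init -q .
git symbolic-ref HEAD refs/heads/main     # hypothetical branch name
git config user.name "Demo User"
git config user.email "demo@example.com"

echo base > base.txt
git add base.txt
git commit -q -m "Known-good state"
good=$(git rev-parse HEAD)

# Three suspect commits, each adding its own file
for n in 1 2 3; do
    echo "change $n" > "file$n.txt"
    git add "file$n.txt"
    git commit -q -m "Suspect change $n"
done

# Apply all reverse changes to the working tree and index, without committing
git revert --no-commit "$good"..HEAD

# Record them as one single commit
git commit -q -m "Revert suspect changes 1-3"

good_tree=$(git rev-parse "$good^{tree}")
head_tree=$(git rev-parse 'HEAD^{tree}')
commit_count=$(git rev-list --count HEAD)
```

The files are back to the known-good state, but the history only gains one commit instead of three.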

Dealing with merge commits

We soon hit another issue - there were merge commits nestled in-between 'regular' commits. The problem with these, as it turned out, is that git doesn't know which branch to follow backwards.

$ git revert 1586b43..4767fcd
error: Commit 32f2e08 is a merge but no -m option was given.
fatal: revert failed

To deal with this, we need to call git revert -m 1 32f2e08 , where -m 1 specifies the parent number of the branch to take, but if you try to use this with a range of commits, git will assume that the first commit in the range is the merge commit.

Parent number 1 belongs to the branch into which the other one - with parent number 2 - has been merged. This wasn't optimal for us, since in our case, the merge commits were scattered across the branch. You also need to be aware that the side branch cannot simply be re-merged after reverting its merge commit: git still considers its commits merged, so merging it again will not bring the changes back. This applies to all of the commits of the merged branch. If you later decide that you need those changes after all, you could revert the previous revert commit, or just move the changes to a different branch and re-commit them. Generally, you should avoid the need for such an operation if possible.
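The sketch below walks through these points in a scratch repository: which commit is parent 1, what reverting the merge with -m 1 does, why re-merging does nothing, and how reverting the revert brings the changes back. Branch names, files and messages are invented for the demo.

```shell
set -e
repo=$(mktemp -d)
cd "$repo"
git init -q .
git symbolic-ref HEAD refs/heads/main     # hypothetical branch name
git config user.name "Demo User"
git config user.email "demo@example.com"

echo base > base.txt
git add base.txt
git commit -q -m "Base"

git checkout -q -b side
echo side > side.txt
git add side.txt
git commit -q -m "Side work"

git checkout -q main
echo main > main.txt
git add main.txt
git commit -q -m "Main work"
main_tip=$(git rev-parse HEAD)

git merge -q --no-ff -m "Merge side" side
merge=$(git rev-parse HEAD)

# Parent 1 is the branch we merged into, parent 2 is the merged side branch
first_parent=$(git rev-parse "$merge^1")

# Undo the merge relative to parent 1: the side branch's changes disappear
git revert --no-edit -m 1 "$merge" >/dev/null
if [ -e side.txt ]; then after_revert=present; else after_revert=gone; fi

# Merging the side branch again does nothing, its commits count as merged
git merge side >/dev/null
if [ -e side.txt ]; then after_remerge=present; else after_remerge=gone; fi

# Reverting the revert commit brings the changes back
git revert --no-edit HEAD >/dev/null
if [ -e side.txt ]; then after_rerevert=present; else after_rerevert=gone; fi

echo "$after_revert / $after_remerge / $after_rerevert"
```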

This doesn't sound very pleasant now, does it? What could be the least painful way to solve our issue then?

Unfortunately, there is no interactive revert in git yet, where one could specify the whole range to be reverted and have it prompt for merge revert decisions. So we decided to call revert with a range of commits up to just before a merge commit, revert the merge commit separately, then revert the next range up to the next merge, and repeat, something like this:

# First range of commits to revert
$ git revert 1586b43..e33f9a0
# A merge commit
$ git revert 32f2e08 -m 1
# Next commit range
$ git revert 04e4703..4767fcd
# Next merge commit
$ git revert 58a1c10 -m 1
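If the list of ranges gets long, the same idea could be scripted. The loop below is not what we ran back then, just a sketch of one possible automation: it walks the first-parent history from the newest commit back to a known-good one and reverts each commit, adding -m 1 whenever the commit has a second parent. The scratch repository, branch names and files are assumptions for the demo.

```shell
set -e
repo=$(mktemp -d)
cd "$repo"
git init -q .
git symbolic-ref HEAD refs/heads/main     # hypothetical branch name
git config user.name "Demo User"
git config user.email "demo@example.com"

echo base > base.txt
git add base.txt
git commit -q -m "Known-good state"
good=$(git rev-parse HEAD)

# History to undo: a regular commit, a merge, another regular commit
echo a > a.txt
git add a.txt
git commit -q -m "Regular commit A"
git checkout -q -b side "$good"
echo s > side.txt
git add side.txt
git commit -q -m "Side work"
git checkout -q main
git merge -q --no-ff -m "Merge side" side
echo b > b.txt
git add b.txt
git commit -q -m "Regular commit B"

# Newest first, following only the first parent of each merge
for commit in $(git rev-list --first-parent "$good"..HEAD); do
    if git rev-parse -q --verify "$commit^2" >/dev/null; then
        # Merge commit: revert relative to parent 1
        git revert --no-edit -m 1 "$commit" >/dev/null
    else
        git revert --no-edit "$commit" >/dev/null
    fi
done

good_tree=$(git rev-parse "$good^{tree}")
head_tree=$(git rev-parse 'HEAD^{tree}')
remaining_files=$(ls)
```

The --first-parent option matters here: without it the side branch's own commits would be listed too, and reverting them after their merge has already been reverted would most likely conflict.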

This turned out to be the easiest way to get the job done, while keeping the git history and hopefully the sanity of the other development team intact.

Git Built In Housekeeping

Git keeps all its content organized in a simple key-value store, called the object database. Whenever you commit, git stores new objects in the object db, for example file revisions, using the hash of each object's content as the key. The database can get relatively large in active repositories as time goes on, even though older objects are rarely used.
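You can poke at this key-value behaviour directly with git's plumbing commands. A minimal sketch in a scratch repository (the stored text is arbitrary): write a value, get its key back, then look the value up by that key.

```shell
set -e
repo=$(mktemp -d)
cd "$repo"
git init -q .

# Store a value; git answers with the key, the hash of the content
key=$(echo "hello" | git hash-object -w --stdin)

# Look the value and its object type up by key
value=$(git cat-file -p "$key")
type=$(git cat-file -t "$key")

# Hashing the same content again yields the same key
same_key=$(echo "hello" | git hash-object --stdin)

echo "key:   $key"
echo "type:  $type"
echo "value: $value"
```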

Something that is quite useful for dealing with an expanding object database is git's built-in housekeeping tool, the git gc command. Running it regularly lets you save some disk space by compressing these objects: file revisions, which usually take up most of the space, as well as commits and trees. git gc also prunes orphaned (unreachable) objects, by default only those older than two weeks; with git gc --prune=now they are removed right away. Pruned objects are removed permanently, and cannot be restored later. You can check what could potentially be removed by running git fsck --unreachable beforehand, to make sure you're okay with the results.
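Here is a small sketch of that check-then-clean routine in a scratch repository. The orphaned object is created artificially with git hash-object, and the file names and identity are placeholders; in a real repository the orphans would come from things like abandoned branches once their reflog entries have expired.

```shell
set -e
repo=$(mktemp -d)
cd "$repo"
git init -q .
git symbolic-ref HEAD refs/heads/main     # hypothetical branch name
git config user.name "Demo User"
git config user.email "demo@example.com"

echo content > file.txt
git add file.txt
git commit -q -m "Initial commit"

# Create an object that no commit, branch or reflog entry refers to
orphan=$(echo "scratch data" | git hash-object -w --stdin)

# Check what garbage collection could remove
if git fsck --unreachable 2>/dev/null | grep -q "unreachable blob $orphan"; then
    reported=yes
else
    reported=no
fi

# Compress the database and drop unreachable objects immediately
git gc -q --prune=now

if git cat-file -e "$orphan" 2>/dev/null; then
    after_gc=present
else
    after_gc=gone
fi

# The commit itself is still intact
git count-objects -v
echo "reported by fsck: $reported, after gc: $after_gc"
```

Without --prune=now, the freshly created orphan would survive this gc run, since it is younger than the default grace period.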

It doesn't matter if you have just a few repos cloned locally, or a lot, large ones or small, it can save you a surprising amount of space if you've been working on them for some time. I'd encourage you to run it on your local repositories on a regular basis, and occasionally supply the --aggressive option to initiate a slower but more thorough cleanup.

More tips to git good

If you work with git on a daily basis and haven't come across it yet, I recommend checking out Scott Chacon's presentation, available on YouTube. It covers the inner workings of git in detail, and is definitely worth your time.