Here at Adslot we use a whole bunch of great tools for development including Node.js, DynamoDB, AngularJS, Browserify and LevelDB, and to keep our development process running smoothly, Trello, HipChat, Jenkins, GitHub and specifically the GitHub Flow style of git workflow. A pretty standard NODABLTREHIJENGIF stack really.

So master is always deployable, everyone works on small (hopefully) feature branches and creates pull requests for them, which get code reviewed, automatically tested, and usually pummeled by our brilliant QA team before getting the green light to be merged (literally).

All good right? Weeeell, not always. Here’s an embarrassing snapshot from 2012 - and keep in mind this was the result of only 3 of us working on this codebase at that stage.

Branch Madness 2012

Gross

Yep. Doesn’t really leave you with that nice clean-code feeling.

Nor do things like this:

In some cases we ended up with far more commits than actual lines of code changed.

For many the commit log is sacrosanct - it’s there to tell a story of how the code got to where it was - and in the case of master I certainly don’t disagree. But do you really need to know when you’re reviewing my pull request that I committed a random work-in-progress, afraid the code bogeyman would attack when I stepped away from my desk? Do I really need to see you were having a bad day fixing you’re grammar?

Here’s what we really want our pull requests to look like:

What we really wanna do is rewrite history.

One point twenty-one jigawtf?

Just to be clear, I’m not talking about master. Don’t mess with the master. Unless you really have to. But feature branches aren’t really “done” until they’re merged, so cleaning them up is perfectly acceptable in my book.

Staying safe

Rewriting history in git involves a fair dose of rebase, amend and push -f with dashes of reset and cherry-pick - all of which can make the unseasoned dev a little nervous they’re gonna somehow mess things up irrecoverably.

Thankfully, with a little knowledge of git reflog (and regular pushes to GitHub), it’s pretty damn hard to mess things up so bad that you can’t undo them. Just think of reflog as a history of all the branch-changing commands you’ve performed locally. Messed something up in a rebase or merge? No probs, just reset to the state you were in before you even started it.

Which brings me to a broader point about git - if you’re struggling to understand it, it might help to think of every branch name (or reference like HEAD) as just a pointer to a commit. You can shift these pointers around with commands like git reset, and that might update the current state of files on your system and where that branch name points to, but it doesn’t make the original commits suddenly go away. (Unreferenced commits can eventually be garbage collected, but that really shouldn’t happen for anything that’s still in your reflog.) They still live in your .git directory and you can just update your branch to “point” back to them again:

    # Switch to a branch we've been working on, "new-blog-post"
    (master) $ git checkout new-blog-post
    Switched to branch 'new-blog-post'
    Your branch is up-to-date with 'origin/new-blog-post'.

    # Show our most recent commit (note the commit hash)
    (new-blog-post) $ git log -1
    commit a5bbf9188581ae0fce1f29eebedc518837c1f43f
    Author: Michael Hart <mhart@nodabltrehijengif.com>
    Date:   Tue Mar 25 17:16:51 2014 +1100

        New post: Rewriting Git History

    # Hard reset to master (!)
    # Which means change where "new-blog-post" (current branch),
    # and files point to - exactly the same commit as "master"
    (new-blog-post) $ git reset --hard master
    HEAD is now at 61603bb Site updated at 2011-12-20 03:05:27 UTC

    # We could have also used the commit hash, or part of it
    (new-blog-post) $ git reset --hard 61603bb
    HEAD is now at 61603bb Site updated at 2011-12-20 03:05:27 UTC

    # If we check what file differences now exist between
    # our current branch "new-blog-post", and "master",
    # it will be empty - we're in the same state as "master"!
    (new-blog-post) $ git diff master

    # Run reflog to see our command history (most recent first)
    # We're looking for the state just before we reset to master
    # (ie, the second line - same hash as in our initial log)
    (new-blog-post) $ git reflog
    61603bb HEAD@{0}: reset: moving to master
    a5bbf91 HEAD@{1}: checkout: moving from master to new-blog-post
    61603bb HEAD@{2}: pull: Fast-forward
    # ...

    # Now we can get back to our original state at any stage
    # Just reset to our original commit hash
    # and "new-blog-post" is back to where it started
    (new-blog-post) $ git reset --hard a5bbf91
    HEAD is now at a5bbf91 New post: Rewriting Git History
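If copying hashes around feels fiddly, the reflog entries themselves work as references. Here's a minimal sketch in a throwaway repo (the temp directory, file name, identity and commit messages are all made up for illustration), using HEAD@{1} to mean "where HEAD was one move ago":

```shell
# Throwaway repo so nothing real gets hurt
cd "$(mktemp -d)" && git init -q .
git config user.name "Time Lord" && git config user.email "timelord@example.com"

echo "draft one" > post.md && git add post.md && git commit -q -m "First draft"
echo "draft two" > post.md && git commit -q -a -m "Second draft"
before=$(git rev-parse HEAD)

# "Accidentally" throw away the latest commit
git reset -q --hard HEAD~1

# HEAD@{1} is the reflog entry for where HEAD pointed before that reset
git reset -q --hard 'HEAD@{1}'
after=$(git rev-parse HEAD)

[ "$before" = "$after" ] && echo "back where we started"
```

The quotes around HEAD@{1} are just there to stop some shells from treating the braces specially.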

Let me amend that

By far the easiest way to keep your commit logs clean is just to amend the last commit you made. So simple I’ll just dotpoint it:

- Make whatever changes you want to your files, and save them
- Do a git commit -a --amend - this will stage any changes you made, and bring up your commit editor with the details of the last commit plus your new changes (add --no-edit to bypass the editor).
- Save the commit log and you’re done - you now have a clean commit (with a new hash). YOU HAVE REWRITTEN HISTORY… A LITTLE BIT.

If you had already pushed your branch to a remote (like GitHub) with the commit before you amended it, then you’ll need to do a force push - more on that later.

You can git add the changes manually if you want, removing the -a flag - you can also change the author of a commit with the --author flag. And a handy tip to just update the time of a commit is to use --reset-author (which also sets the author to you, the committer).
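To see that an amend really does swap the old commit for a new one, here's a small sketch in a scratch repo (the file, identity and messages are placeholders, not anything from a real project):

```shell
cd "$(mktemp -d)" && git init -q .
git config user.name "Time Lord" && git config user.email "timelord@example.com"

echo "Rewritting history" > post.md && git add post.md
git commit -q -m "New post"
old_hash=$(git rev-parse HEAD)

# Fix the typo and fold it into the commit we just made
echo "Rewriting history" > post.md
git commit -q -a --amend --no-edit
new_hash=$(git rev-parse HEAD)

git log --oneline                # still only one commit...
echo "$old_hash -> $new_hash"    # ...but it now has a different hash
```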

I want to fix that thing, like, 3 commits ago

So a few options here, the first one I’ll walk through is probably the easiest conceptually:

1. Hard reset to the commit you want to amend
2. Make changes, then amend the commit
3. git cherry-pick commits back to where we started

So essentially we’re going back to fix a commit, then replaying all the commits after that one again (which will technically create new commits because one of their ancestors was modified).

You can do the cherry-pick step a few ways, but the easiest is to use a range:

git cherry-pick <commit-we-reset-to>..<commit-we-reset-from>

    (new-blog-post) $ git log -1
    commit a5bbf9188581ae0fce1f29eebedc518837c1f43f

    # We know we want to edit commit 10af827, go back in time
    (new-blog-post) $ git reset --hard 10af827

    # Make our edits and then:
    (new-blog-post) $ git commit -a --amend --no-edit
    (new-blog-post) $ git cherry-pick 10af827..a5bbf91
    # replayed commits output here...

(even though we specify the commit we reset to, the start of a range is exclusive, so that commit won’t be cherry-picked again)

Because creating a branch is such a lightweight operation in git (I like to think of it as “Save As” or ln -s), you can also save a branch before you start so you don’t need to remember or find the commit you have to replay back to.

    # `git branch` creates a new branch in the background,
    # so we stay in our current branch
    (new-blog-post) $ git branch pre-rewrite
    (new-blog-post) $ git reset --hard 10af827
    (new-blog-post) $ git commit -a --amend --no-edit
    (new-blog-post) $ git cherry-pick 10af827..pre-rewrite

    # Can delete it now we're done
    (new-blog-post) $ git branch -D pre-rewrite

Remember you can always do a git diff <pre-rewrite-commit> as a bit of a sanity check to make sure you’ve only changed the things you wanted to - especially useful if you’ve had some conflicts along the way and you’re not sure if you resolved them correctly or not.
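Here's the whole reset, amend and replay dance end to end, with the sanity check done before the bookmark branch gets deleted. It's only a sketch in a scratch repo: the files, identity and commit messages are invented, and the typo in body.md is the "thing from a couple of commits ago" being fixed.

```shell
cd "$(mktemp -d)" && git init -q .
git config user.name "Time Lord" && git config user.email "timelord@example.com"

echo "intro" > intro.md && git add intro.md && git commit -q -m "Add intro"
echo "bodyy" > body.md  && git add body.md  && git commit -q -m "Add body"
target=$(git rev-parse HEAD)           # the commit we want to fix
echo "outro" > outro.md && git add outro.md && git commit -q -m "Add outro"

git branch pre-rewrite                 # bookmark where we started
git reset -q --hard "$target"          # go back in time
echo "body" > body.md                  # make the fix
git commit -q -a --amend --no-edit
git cherry-pick "$target"..pre-rewrite > /dev/null   # replay what came after

# Sanity check: the only difference from before should be the fix itself
git diff --stat pre-rewrite
changed=$(git diff --name-only pre-rewrite)

git branch -q -D pre-rewrite           # can delete it now we're done
```

If that diff shows files you didn't mean to touch, the bookmark branch is still there to hard reset back to.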

No, I really wanna mess with time

For more complex operations - like squashing, reordering and deleting commits - you’ll want to learn how to rebase. I like to think of rebasing as “resetting and replaying” - similar to what we did manually in the last section.

OK, ease me into it

Firstly, git pull --rebase is a useful way to ensure your logs don’t fill up with Merge branch 'master' into my-feature-branch commits as you’re getting the latest changes into your branch - it also eliminates the crazy merge trees we saw in the first diagram.

If you’re in a branch and you do a git pull --rebase origin master, here’s what is essentially happening:

1. Git fetches the changes from the master branch on origin (eg, GitHub) into a branch called origin/master on your local machine - the same thing happens when you do a git fetch
2. Git hard resets your current branch to origin/master
3. Git tries to replay all the commits you had made in your branch on top of this point - a bit as if you had just created a branch from the master on GitHub and added all your commits to it
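Those steps can be sketched with a fetch followed by a plain rebase. This is an illustration rather than exactly what Git does internally, and everything in it is made up: a bare repo in a temp directory stands in for GitHub, and the branch, files and identity are placeholders.

```shell
# Scratch setup: a bare repo standing in for GitHub, plus our working repo
work=$(mktemp -d)
git init -q --bare "$work/origin.git"
git init -q "$work/me" && cd "$work/me"
git config user.name "Time Lord" && git config user.email "timelord@example.com"
git symbolic-ref HEAD refs/heads/master   # make sure the first branch is "master"
git remote add origin "$work/origin.git"
echo "v1" > app.txt && git add app.txt && git commit -q -m "Initial commit"
git push -q origin master

# Our feature branch, with one commit
git checkout -q -b my-feature
echo "feature" > feature.txt && git add feature.txt && git commit -q -m "Add feature"

# Meanwhile master moves on at the remote. For brevity we push that from
# this same repo (which also updates origin/master straight away); in real
# life it would be someone else's merged pull request arriving via the fetch.
git checkout -q master
echo "v2" > app.txt && git commit -q -a -m "Update app"
git push -q origin master
git checkout -q my-feature

# Roughly what `git pull --rebase origin master` does:
git fetch -q origin            # step 1: bring origin/master up to date
git rebase -q origin/master    # steps 2 and 3: reset onto it, replay our commits
git log --format=%s
```

Our "Add feature" commit ends up sitting on top of the latest master commit, with no merge commit in sight.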

Why is this useful? Well when we’re reviewing pull requests, the easiest way to reason about them is to say “ok, when this gets merged, these commits will appear on master”. By continually rebasing as other pull requests are getting merged into master, we ensure that the commits will cleanly apply, exactly as they appear in the pull request diff.

When fixing conflicts in the middle of a rebase operation, you just fix up the conflicting files, save them, git add them, and then do a git rebase --continue (no explicit committing necessary). Again git diff is your friend here if you’re unsure what you just did.
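A conflict mid-rebase looks scarier than it is. Here's a sketch that deliberately creates one in a scratch repo and works through it (file contents, identity and branch names are invented; GIT_EDITOR=true is only there so the script never waits on an editor):

```shell
cd "$(mktemp -d)" && git init -q .
git config user.name "Time Lord" && git config user.email "timelord@example.com"
git symbolic-ref HEAD refs/heads/master   # make sure the first branch is "master"
echo "hello" > greeting.txt && git add greeting.txt && git commit -q -m "Add greeting"

# Both branches change the same line, so a conflict is guaranteed
git checkout -q -b my-feature
echo "hello world" > greeting.txt && git commit -q -a -m "Greet the world"
git checkout -q master
echo "hello there" > greeting.txt && git commit -q -a -m "Greet someone"

git checkout -q my-feature
# The rebase stops part-way, leaving conflict markers in greeting.txt
git rebase master > /dev/null 2>&1 || echo "conflict - rebase paused"

# Resolve by hand (here we just write the final content), then stage it
echo "hello there, world" > greeting.txt
git add greeting.txt

# Carry on: no `git commit` needed, the rebase makes the commit itself
GIT_EDITOR=true git rebase --continue > /dev/null
git log --format=%s
```

If it all goes pear-shaped part-way through, git rebase --abort puts the branch back where it was before the rebase started.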

Note that the branch called “origin/master” is technically a different branch to “master”. origin/<branchname> branches are updated whenever you git fetch (and git pull) - and they’re not branches that you commit onto directly - which actually makes them perfect to get changes from.

I usually specify origin/master over master because my local master could really be in any state - I actually rarely use it - whereas I know that origin/master has all the changes from when I last did a git fetch or git pull.

I want more control

An interactive rebase, git rebase -i, will let you specify exactly how commits should be replayed - including being able to squash them together, reword them, remove them and even insert new ones. It’s really the ultimate tool for rewriting history.

So its invocation is git rebase -i <commit> where <commit> is the branch/commit you want to “replay” your commits onto. Git will ignore the commits you have in common with the destination and only try to replay the ones that don’t already exist.

So, an interactive version of git pull --rebase origin master would be something like git fetch && git rebase -i origin/master - ie, fetch from GitHub, and then rebase interactively onto that point.

Equally, doing git rebase -i HEAD~3 will rebase the last 3 commits.

It’s “interactive” because it will bring up your editor with the commits to be replayed and a handy guide to the available commands:

The time lord's command centre

So when you save this file, Git will execute each “command” (in orange) for each commit (in yellow). The green text is just the commit message and will be ignored (even if you tell it to reword a commit, changing the text here won’t have any effect, it will prompt you to change during the rebase - I’ve fallen for this soooo many times)

So by default you can see the commands are all pick, so saving as-is will have the same effect as a non-interactive rebase - it will just replay all the commits.

reword and edit are great for modifying the commits along the way and squash and fixup will join commits together.

fixup (or you can just use f) is one of the commands I use the most - I prefer it over squash because I usually don’t need to keep the commit message of the commit I’m squashing - they’re usually “fix” commits of some sort and the messages are mostly unnecessary noise.

So let’s say in this case I want to:

- Reword the first commit and apply it after the others
- Squash the last commit with the previous one
- Add a completely new commit, 234ab43, from another branch

Mixing it up

So I changed the command from pick to r (for reword) and moved the whole line down two, I changed the other command to f (for fixup) so that it will be squashed into the commit above it (and the commit message will be ditched) and I added a new commit at the end there with p for pick. I left the 2e2b92e commit as is, so it’s still pick.

So when we save this and rebase does its thing we’ll end up with three new commits - the first two in the list here squashed together with the first commit’s message, we’ll have been able to reword the other commit’s message in our editor, and the final commit now appears on our branch.

There’s also an --autosquash flag that you can pass which will look for commit messages that start with fixup! (or squash!), so instead of pick it will have fixup (or squash) prefilled for that commit, and the line will already be moved to sit right after the commit it refers to.
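A handy partner for --autosquash is git commit --fixup, which writes the fixup! message for you. Here's a sketch in a scratch repo (the two "table" files, the identity and the commit messages are invented; GIT_SEQUENCE_EDITOR=true just accepts the prefilled todo list so the script doesn't stop to open an editor):

```shell
cd "$(mktemp -d)" && git init -q .
git config user.name "Time Lord" && git config user.email "timelord@example.com"
echo "base" > base.txt && git add base.txt && git commit -q -m "Base"
base=$(git rev-parse HEAD)

echo "users v1" > users.sql && git add users.sql && git commit -q -m "Refactor users table"
users=$(git rev-parse HEAD)
echo "orders v1" > orders.sql && git add orders.sql && git commit -q -m "Refactor orders table"

# A fix that belongs with the users commit: --fixup names the new commit
# "fixup! Refactor users table" so autosquash can pair them up later
echo "users v2" > users.sql
git commit -q -a --fixup="$users"

# The todo list comes prefilled with the fixup line moved under the
# commit it fixes; we accept it as-is
GIT_SEQUENCE_EDITOR=true git rebase -q -i --autosquash "$base"
git log --format=%s
```

Run it without the GIT_SEQUENCE_EDITOR override to see the prefilled list in your own editor before anything is replayed.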

One time you may find yourself needing to delete commits in an interactive rebase (ie, remove the line altogether), is if you’ve branched from another branch and a commit on that branch has been modified - if you then rebase onto that branch, it will appear as though you have added the commit in your branch - because Git sees a “different” commit in your branch. Usually you’ll just want to delete it, because it exists in a modified form in the branch you’re rebasing onto.

The upshot

So this is extremely powerful - and it really encourages us to keep our commits separated into functionally similar chunks of code. Say, one commit for refactoring one DB table and its queries, another commit for another DB table and its queries. If ever we need to fix something up, perhaps as a result of testing, or code review - we add a commit for one table fixup, and another commit for the other table fixup.

Once those fixes are approved, we can interactively rebase them to squash them in with the relevant commits they “belong” to. This way we reduce the number of commits we have on our branches (especially irrelevant commits), which makes it easier to rebase against, or git bisect later if we’re trying to track down bugs.

It also means that we could potentially pull each commit out into its own pull request if we’re starting to get too many lines/files to review or test in just one.

By grouping our commits together by functionality, it really makes working collaboratively a lot easier.

None of this works for me

If you’re still pulling your hair out - and this may happen especially if you’re in the unfortunate situation of trying to cherry-pick or rebase with merge commits, then there’s another way.

Let’s say your code’s in exactly the state that you want it - it’s perfect. It’s just your git log that looks all messed up - fixes here and there, merges all over the place.

This tip also applies if you want to “split” a commit - something git rebase has no single command for (AFAIK); even there you’d mark the commit as edit and then do this same kind of reset part-way through.

Basically, if you do a non-hard reset (ie, not passing --hard), git will keep your files in the same state, but it will just move where the branch points to - so you can then commit all your changes as granularly as you like.

It’s a bit like just copying all your updated files over onto a freshly created branch and then committing from there.

1. Make sure you’re in your feature branch
2. git fetch && git reset origin/master
3. git commit however you like

(note the lack of --hard in the git reset)

A GUI tool is especially useful if you want to commit different groups of lines within a file (called hunks) in separate commits - which you may want to do to try and keep the commits separated by functionality.
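Here's a sketch of that reset-and-recommit trick in a scratch repo. To keep it self-contained there's no remote, so a local master stands in for origin/master, and the files, identity and messages are invented:

```shell
cd "$(mktemp -d)" && git init -q .
git config user.name "Time Lord" && git config user.email "timelord@example.com"
git symbolic-ref HEAD refs/heads/master   # make sure the first branch is "master"
echo "base" > base.txt && git add base.txt && git commit -q -m "Base"

# A feature branch with a messy history
git checkout -q -b my-feature
echo "users" > users.sql && git add users.sql && git commit -q -m "wip"
echo "orders" > orders.sql && git add orders.sql && git commit -q -m "more wip"
echo "users, fixed" > users.sql && git commit -q -a -m "fix"

# Non-hard reset: the branch pointer moves back to master,
# but the files on disk stay exactly as they are
git reset -q master
git status --short     # users.sql and orders.sql now show up as untracked

# Recommit as granularly as we like
git add users.sql  && git commit -q -m "Refactor users table"
git add orders.sql && git commit -q -m "Refactor orders table"
git log --format=%s
```

Three messy commits go in, two tidy ones come out, and the files never changed along the way.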

GitHub for Mac committing hunks at a time

Forced pushes - not so scary

When it comes time to push our changes, if the remote branch we’re pushing to has commits that aren’t on our branch, then a normal git push won’t succeed, because git is safe by default and won’t overwrite them.

By amending or replaying a commit, we’re really replacing it with a new one, and also changing where our branch currently points to - but the commits on GitHub will still be unchanged, and they no longer exist on our branch.

So we have to force push (git push -f) to overwrite those commits. Now, BEFORE you run that little command, just check a couple of things:

- Are you on master? If so, don’t do it, don’t mess with the master.
- Has anyone else pushed commits onto that branch? If so, do a git fetch or similar to get them locally and resolve any issues.
- Are you using Git < 2.0? If so, run git config --global push.default - if it says simple you’re fine - if not, update it! simple is the default starting with Git 2.0.
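The "has anyone else pushed?" check can also be handed to Git itself. This isn't part of the workflow described above, just a related option: git push --force-with-lease refuses to overwrite the remote branch if it has moved since you last fetched it. A sketch with two scratch repos and a bare repo standing in for GitHub (all names, identities and paths are made up):

```shell
# Scratch setup: a bare repo standing in for GitHub
work=$(mktemp -d)
git init -q --bare "$work/origin.git"

# Our repo: push a first version of the branch
git init -q "$work/me" && cd "$work/me"
git config user.name "Time Lord" && git config user.email "timelord@example.com"
git remote add origin "$work/origin.git"
git checkout -q -b new-blog-post
echo "v1" > post.md && git add post.md && git commit -q -m "New post"
git push -q origin new-blog-post

# A teammate grabs the branch and pushes a commit we haven't fetched
git init -q "$work/teammate" && cd "$work/teammate"
git config user.name "Dingo Jones" && git config user.email "dingo@example.com"
git fetch -q "$work/origin.git" new-blog-post
git checkout -q -b new-blog-post FETCH_HEAD
echo "v1, reviewed" > post.md && git commit -q -a -m "Review tweaks"
git push -q "$work/origin.git" new-blog-post

# Back in our repo we rewrite history, unaware of the teammate's commit
cd "$work/me"
git commit -q --amend -m "New post: Rewriting Git History"

# --force-with-lease only overwrites if the remote branch is still where
# our origin/new-blog-post says it is - here it isn't, so the push is refused
if git push -q --force-with-lease origin new-blog-post 2> /dev/null; then
  result="overwrote"
else
  result="rejected"
fi
echo "$result"
```

A plain git push -f at that last step would have quietly wiped out the teammate's commit on the remote.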

Because force pushing will overwrite the remote branch, people tend to steer clear of it for fear of deleting important commits. However, as long as you have (or someone on your team has) local access to the commit that got overwritten (ie, it’s lying around in your history somewhere), then you can just hard reset your branch to that and force push again.

Argh, I don’t have that commit locally!

    (new-blog-post) $ git reset --hard 61603bb
    fatal: ambiguous argument '61603bb': unknown revision or path not in the working tree.

Fair enough, a little scary - but if you’re using GitHub, then know this: GitHub stores every single commit that you or anyone else pushes - it has that old commit.

Here’s something that actually happens every now and then at Adslot:

git push -f%$^!

So that’s a HipChat transcript of good ol’ Dingo Jones doing a git push -f - he meant to only push the branch he was working on, but his push.default config was accidentally set to matching so he happened to push all the other branches he had checked out, including master. Doh.

But! No biggie. See the link to the commit that master used to be at, a539e0? Even if I don’t have that commit locally, I can:

1. Follow that link to the commit on GitHub
2. Create a new branch from that commit on GitHub
3. Fetch that branch locally
4. Reset master to that branch/commit and force push

Browse code -> tree dropdown

You're welcome Dingo Jones, you're welcome

Wait, why are we doing this again?

That was a lot of ground to cover.

So let me get back to the original premise - why are we rewriting history again? What do we get?

Basically it boils down to commits in our pull requests being:

easier to understand

easier to cherry-pick/rebase

easier to split up pull-requests

easier to git bisect and find bugs

it just looks nice and feels good

Once you’ve mastered git time travel, you won’t look back.

Well, you will, hmmm, probably constantly actually.