At Smarkets we use GitLab to manage our codebase. As our team doubled in size several times, our merge-request workflow started to suffer, so we wrote a bot to make it scale. In this post, we will explain what it does, and why you may also want to use it inside your team. Check out the project page to learn more.

Meet Marge-bot

On a normal day, my workflow looks something like this: code some feature on a new branch until it looks ready, push it, create a new merge request and assign it to a colleague for review. Some back-and-forth later, the change is good to be merged into master, so we assign the merge request to the user @marge-bot, and go work on something else.

Rebasing a branch is boring, but Marge-bot is always happy to do it for you, as many times as needed.

At some point, Marge-bot, our merge-bot, will process the merge-request. That means:

She’ll check out the branch, rebase it onto the current state of master, and push it back (with --force), so the merge request will now have an up-to-date version of the code.

Because we’ve pushed a new version of the branch, GitLab will launch a new CI job to verify that the new, rebased version of the branch is still fine. In our setup, this usually takes around 10 minutes, and Marge-bot will patiently wait until it finishes.

Once the build succeeds, she will attempt to merge the request. Now, here comes the crucial part: if someone else merged their branch while our CI job was running, GitLab will refuse to merge ours (for a good reason: otherwise, who knows if CI would still be green after merging!), so Marge-bot will repeat the process: rebase, push, wait for CI to pass and attempt to merge.

The merge will eventually succeed or, at some point, something will fail (e.g. rebase conflicts or a test that breaks). In the latter case, Marge-bot will leave a message and assign the request back to me. Either way, I will get a notification from GitLab so I never lose track of what happens.
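The loop described above can be sketched in a few lines of Python. This is only an illustration, not Marge-bot's actual code: `FakeRepo`, its methods, the `interruptions` counter and the `max_attempts` limit are all hypothetical stand-ins for the real git and GitLab operations.

```python
from dataclasses import dataclass, field


@dataclass
class FakeRepo:
    """In-memory stand-in for a GitLab project (hypothetical, for illustration)."""
    master_head: int = 0      # bumps every time something is merged
    interruptions: int = 0    # how many times a colleague merges before us
    log: list = field(default_factory=list)

    def rebase_and_push(self, branch):
        # Stands in for: git rebase master && git push --force
        self.log.append(f"rebase {branch} onto master@{self.master_head}")
        return self.master_head  # the commit the branch is now based on

    def wait_for_ci(self, branch):
        # Stands in for polling the pipeline. CI always passes here, but a
        # colleague may merge their own branch while we are waiting.
        if self.interruptions > 0:
            self.interruptions -= 1
            self.master_head += 1
        return True

    def try_merge(self, branch, based_on):
        # A fast-forward merge is only possible if master has not moved.
        if based_on != self.master_head:
            return False
        self.master_head += 1
        self.log.append(f"merged {branch}")
        return True


def process_merge_request(repo, branch, max_attempts=5):
    """Rebase, push, wait for CI, try to merge; repeat if master moved."""
    for attempt in range(1, max_attempts + 1):
        based_on = repo.rebase_and_push(branch)
        if not repo.wait_for_ci(branch):
            return "ci failed"  # a real bot would comment and reassign here
        if repo.try_merge(branch, based_on):
            return f"merged after {attempt} attempt(s)"
    return "gave up"
```

With `FakeRepo(interruptions=2)`, two colleagues sneak in ahead of us, so the branch is rebased three times before the merge finally goes through.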

There are a couple more things that we let Marge do while merging our commits, and we’ll discuss these below, but her main job is to implement the logic above. What we get from this is a nice, simple and well-integrated workflow that produces a tidy, linear history and, above all, a master branch that is always green. (NB: our main development branch is called master and we’ll call it that throughout this post; of course, the actual name of this branch is irrelevant.)

Not rocket science

At this point, you might be thinking: ‘Well, this bot doesn’t seem to do anything too sophisticated’. And you’d be right; as Graydon Hoare (of Rust fame) puts it:

To automatically maintain a repository of code that always passes all the tests is not rocket science.

Which makes it so frustrating when it is 2017 and you can’t easily find an implementation of this simple rule that just works, out of the box.

You can almost get there using GitLab Enterprise Edition. First, you configure your project so that merge requests can be accepted only if CI passes. Then you set it to use either “Fast-forward” or “Semi-linear history” as the merge method, so that your request can be merged only if a fast-forward merge is possible. Together, these two options ensure by construction that if a change is merged to master, all tests must have passed on exactly its head commit.

These settings ensure a green master branch on a GitLab project… if you manage to get your stuff merged!

With this setup, you don’t even need to rebase the changes after the review: GitLab can rebase them for you and, after the CI job is started, it will even offer to merge the request once the tests pass. Except that if someone else was in the process of merging their changes, your merge will fail… silently… 10 minutes later (or however long your CI pipeline takes). So you have to click on “Rebase” and “Merge when pipeline succeeds” again and hope this time around it does pass. Which, if your team is large enough, it likely won’t. This is boring, repetitive, and over time just maddening.

GitLab can wait for the pipeline to pass, but will just silently fail and give up if someone merges in the meantime.

Basically, the above configuration works, but only if your CI pipeline is extremely fast or the chance of two people merging requests at around the same time is extremely low.

Instead, by using Marge-bot, we have all our accepted merge-requests effectively going through a queue (rather a priority queue, with the oldest always on top) and processed one by one. This is simply what all the GitLabs, GitHubs and BitBuckets of the world ought to implement to enable automatically enforced green branches, for everyone. Seriously, it is not rocket science. It is 2017; it should be ubiquitous already!
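As a rough sketch of that queue, here is a minimal oldest-first priority queue built on Python's `heapq`. The `MergeQueue` class, the `drain` helper and the choice of the assignment time as the ordering key are assumptions made for illustration; the post only says that the oldest request is always on top.

```python
import heapq


class MergeQueue:
    """Priority queue of merge requests, oldest assignment first."""

    def __init__(self):
        self._heap = []

    def assign(self, assigned_at, title):
        # Tuples compare element by element, so the earliest timestamp wins.
        heapq.heappush(self._heap, (assigned_at, title))

    def next_request(self):
        # Return the title of the oldest request, or None if the queue is empty.
        if not self._heap:
            return None
        return heapq.heappop(self._heap)[1]


def drain(queue):
    """Process requests one by one and return the order they were handled in."""
    order = []
    while (title := queue.next_request()) is not None:
        order.append(title)
    return order
```

Requests assigned at times 30, 10 and 20 come out in the order 10, 20, 30, regardless of the order in which they were pushed.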

Aside: is your CI pipeline up to the task?

All that said, an organisation’s ability to enforce a branch that is always green depends crucially on the time their CI pipeline takes to run. A 10-minute pipeline means that you can expect six jobs to be merged per hour, for an overall capacity of 144 merge requests per day. A two-hour pipeline, on the other hand, yields a capacity of only 12 merge requests per day. That means, say, that around a hundred developers could work on a shared project in the former case, but only a few in the latter.
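The arithmetic behind those figures is simple enough to write down. The sketch below assumes, as the numbers above do, that merges are strictly sequential and that the queue runs around the clock; the function name and the `hours_per_day` parameter are ours.

```python
def merges_per_day(pipeline_minutes, hours_per_day=24):
    """Upper bound on sequential merges when each one needs a full CI run."""
    return (hours_per_day * 60) // pipeline_minutes


print(merges_per_day(10))   # 10-minute pipeline
print(merges_per_day(120))  # two-hour pipeline
```

This prints 144 and 12. Passing a smaller `hours_per_day` (say, 8 for a single working day) shows how quickly the budget shrinks if merges only happen during office hours.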

A smarter system that speculatively runs CI jobs in the queue under the assumption all previous merge requests will succeed could improve the throughput if there are on average few failed CI runs. But you have to be realistic: with a 2 hour long CI pipeline there is no way to enforce a repository that is always free of broken tests while having a team of medium-large size.

Now, turn the problem around. You have a team of 300 developers working on a shared repository. At this point, the cost of having a broken main trunk is quite high. But you should expect something on the order of 500 merges per day, and to allow for this capacity, each CI run should take, on average… 20 seconds? Now you have an interesting problem to solve: caching of builds, very fine-grained dependencies between test cases and code, caching of test results, lots of parallelism. This is no longer not-rocket-science territory.

The moral here is: ensure your CI runtime is appropriate for your team size! And also: take into account the impact on your CI pipeline if you are planning to migrate to a mono-repository model.

Other goodies

There are a few other things that Marge can also do for you while merging, if you tell her to:

Add a Tested-by: Marge Bot <link to merge request> tag to the top-commit message of the branch to be merged. We keep a linear history and this helps us easily identify which commits were merged-in together. Moreover, it makes using git-bisect more convenient, since you can just skip those commits that weren’t run through CI. E.g., to find the first tested commit where some property stopped holding you can do:

$ git bisect run /bin/sh -c 'if ! git log -1 --format=%B | grep -qF "Tested-by:"; then exit 125; else ./check-property; fi'

GitLab lets you set a required number of explicit approvals for a merge request to be accepted. Moreover, you can set it to reset approvals if someone pushes new changes to the branch. If you use this and Marge-bot rebases the branch, then GitLab will naturally expect these changes to be re-approved. If her user is given admin rights, then she can be instructed to re-approve the request on behalf of the original approvers, so that the request can be accepted.

In addition to that, she can optionally tag the commits with a Reviewed-by: A. P. Prover <their@email> entry per approver. As we handle financial data and assets from our users, we are subject to strict auditing requirements and need to keep a permanent record of our code vetting process. Besides that, it is rather useful for finding people other than the author who may know about a certain portion of the code, just by looking at the relevant commit.

Enforce merge embargoes. It’s common for teams to have policies like “no unshipped commits during the weekend”, since this makes it easier to diagnose and fix sudden issues. Of course, it is easy to forget that it is Friday afternoon and you shouldn’t merge the request under review. Marge-bot will not forget, and will merge it on Monday morning instead.
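To make the commit tagging and the embargo check more concrete, here is a small sketch of both. It is not taken from Marge-bot: the function names, the Friday 17:00 to Monday 09:00 window and the example URL are assumptions chosen for illustration.

```python
from datetime import datetime


def add_trailers(message, trailers):
    """Append 'Key: value' trailer lines to a commit message."""
    body = message.rstrip("\n")
    lines = [f"{key}: {value}" for key, value in trailers]
    # Trailers go in their own final paragraph, separated by a blank line.
    return body + "\n\n" + "\n".join(lines) + "\n"


def is_embargoed(now, start=(4, 17), end=(0, 9)):
    """True between Friday 17:00 and Monday 09:00 (hypothetical window).

    `start` and `end` are (weekday, hour) pairs, with Monday == 0.
    Only windows that wrap around the end of the week are handled.
    """
    position = (now.weekday(), now.hour)
    # The window is everything at or after `start` plus everything before `end`.
    return position >= start or position < end


message = add_trailers(
    "Fix rounding in fee calculation\n",
    [
        ("Reviewed-by", "A. P. Prover <their@email>"),
        ("Tested-by", "Marge Bot <https://gitlab.example.com/group/project/merge_requests/1>"),
    ],
)
print(message)
print(is_embargoed(datetime(2017, 4, 7, 18, 0)))  # a Friday evening
```

Note that `add_trailers` always starts a new paragraph, so a message that already ends in trailers would need extra care; `git interpret-trailers` is one existing tool that handles that case.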

Try it out

If your team is using GitLab, give Marge-bot a try. You only need to create a new user for the bot and add it as a Developer to your project. Then launch it, providing a private token for its user and a suitable SSH private key, and point it at your GitLab instance. If you are using Docker, this is as simple as:

We are using it on the Enterprise Edition of GitLab, but in principle it should run on the Community Edition as well. Any problems or ideas, just open an issue, or even better, leave us a pull request.