As you may know from my previous posts I work on language-agnostic real-time messaging server called Centrifugo which is now built on top of library for Go language called Centrifuge.

Recently while testing how Centrifuge library behaves on Go 1.12 I found a massive regression in one performance benchmark — the one that measured message write performance. The regression was almost 2x times.

This is a write up on how I bisected a commit in Go language source code repo that caused such a massive regression. I believe that many developers can come across a similar problem with their code and projects so the flow described here can be useful for future explorers.

Charles Darwin — just a nice first picture for a post and a hint for an impatient reader.

Actually what that benchmark does is calling many write syscalls. Let’s write a simplified version of it which is not Centrifuge specific — the benchmark just writes data to /dev/null :

func BenchmarkWrite(b *testing.B) {

f, err := os.Create("/dev/null")

if err != nil {

panic(err)

}

defer f.Close() buf := make([]byte, 512) b.ResetTimer()

for i := 0; i < b.N; i++ {

f.Write(buf)

}

b.StopTimer()

}

We don’t use io.Discard here because we need syscall.

Let’s first install different Go versions. The one where benchmark behaves well:

go get golang.org/dl/go1.11

go1.11 download

And the one where benchmark behaves bad:

go get golang.org/dl/go1.12

go1.12 download

Here we used a cool feature of Go to download binary distributions of language at specific versions — and we now have go1.11 and go1.12 executables ready to use in our terminal.

Let’s run our simple benchmark using Go 1.11 first:

$ go1.11 test -test.run=XXX -bench=BenchmarkWrite -benchmem -memprofile mem.old -cpuprofile cpu.old | tee old.bench goos: darwin

goarch: amd64

pkg: github.com/centrifugal/centrifuge BenchmarkWrite-8 1000000 1167 ns/op 0 B/op 0 allocs/op PASS

ok github.com/centrifugal/centrifuge 1.201s

And then using Go 1.12:

$ go1.12 test -test.run=XXX -bench=BenchmarkWrite -benchmem -memprofile mem.new -cpuprofile cpu.new | tee new.bench goos: darwin

goarch: amd64

pkg: github.com/centrifugal/centrifuge BenchmarkWrite-8 1000000 2184 ns/op 0 B/op 0 allocs/op PASS

ok github.com/centrifugal/centrifuge 2.226s

Now with a help of benchcmp tool let’s look at performance difference:

$ benchcmp old.bench new.bench benchmark old ns/op new ns/op delta

BenchmarkWrite-8 1167 2184 +87.15% benchmark old allocs new allocs delta

BenchmarkWrite-8 0 0 +0.00% benchmark old bytes new bytes delta

BenchmarkWrite-8 0 0 +0.00%

So we really see almost 2 times performance degradation between two Go versions.

As you may noticed we saved memory and CPU profiles while running our benchmarks. As we don’t allocate at all here we will only look at CPU profile. Here is one for Go 1.11:

$ go tool pprof cpu.old

> web

Pretty straightforward right? Now let’s look at one for Go 1.12:

$ go tool pprof cpu.new

> web

Wow! Here we see that a lot of additional runtime staff happens in new version of language.

At this moment I realised that something is most probably wrong with Go 1.12 but before raising an issue in Go issue tracker I decided to find a commit where this regression was introduced.

What we want here is run our custom benchmark which is located inside our project against various Go compiler throughout Go repository commit history. We know that regression happened somewhere between Go 1.11 and Go 1.12.

Looking at number of different commits in Go git history:

$ git rev-list --no-walk go1.11..go1.12 | wc -l

1684

– we see 1684 commits! It’s pretty hard to find that commit manually. Also building a Go compiler toolchain from certain commit takes a significant amount of time so we need a more clever way than just sequentially running our bench with Go built from each commit in history.

Turns out that the way exists. You most probably know about bisecting technique. Maybe you already used this to find bug in you Git repository using git bisect command. Here we will use a tool called benchmany to find out where a regression had happened.

First let’s clone Go language repo:

As soon as repo cloned we can change directory to our project’s dir (where our target benchmark located). We will run all the following commands from our project’s directory (in my case this is $GOPATH/src/github.com/centrifugal/centrifuge — just use yours).

We have to create a folder for benchmany logs:

mkdir regression_search

Of course you can provide your own name for that dir. And then we can launch benchmany tool:

benchmany -d regression_search -order metric -benchflags '-test.run=XXX -test.bench=BenchmarkWrite' -buildcmd='/Users/ayuemelin/go.dev/bin/go test -c' -C /Users/ayuemelin/go.dev/ go1.11..go1.12

We point benchmany to regression_search directory we just created — it will be used for logs.

The -order argument takes three possible values: seq , spread and metric . The metric value says benchmany to select the commit half way between the pair of commits with the biggest difference in the metric. The metric is ns/op by default which is what we actually need. As said in benchmany docs:

This is like “git bisect”, but for performance.

We also set custom dir where Git operations will be run using -C option and point it to Go source code repo — in this case benchmany is smart enough to recognise that we run benchmark over Go source code repository and will automatically call make.bash script on every iteration. As a result on every bisect iteration we have Go compiler toolchain built from specific commit at /Users/ayuemelin/go.dev/bin/go (just adapt for your username).

The buildcmd option allows to define custom test command so we change it to use our custom Go compiler built from certain commit: /Users/ayuemelin/go.dev/bin/go test -c .

As soon as you run command above you should see sth like this in your terminal:

commit 5e9ad4a, iteration 1/5: building...

commit 5e9ad4a, iteration 1/5: running...

commit 5e9ad4a, iteration 2/5: running...

commit 5e9ad4a, iteration 3/5: running...

commit 5e9ad4a, iteration 4/5: running...

commit 5e9ad4a, iteration 5/5: running...

commit ef21689, iteration 1/5: building...

45/360 runs, 63 unstarted+0 partial+9 done+0 failed commits, ETA 1h28m36s

benchmany tool selects commit, builds Go toolchain using that commit, and then runs benchmark several times. Then selects next commit…

As this tool runs you can look inside regression_search folder and watch how benchmark information appears in bench.log file. You may also notice how it step by step approaches a place where regression happened.

In my case I had to stop benchmany at some point (after about 15 build iterations), take 2 commits which were much closer in time than original range go1.11..go1.12 (but still one commit had a bad performance result and another one had a good performance result) and run benchmany again with a smaller commit scope.

And finally I did one more benchmany run with -order seq option with commit range happened in several hours interval to sequentially go through them and find out the one that caused regression.

Of course the first time I did this it took me several hours of work because I had to do everything from scratch. Now I suppose I can repeat the same in much more reasonable time. And hopefully you will too.

So here is a link to commit that caused such a big regression:

And here is an issue it relates:

It turns out that a way Go used syscalls on Darwin before did not follow a Apple recommendations which state:

Apple does not support statically linked binaries on Mac OS X. A statically linked binary assumes binary compatibility at the kernel system call interface, and we do not make any guarantees on that front. Rather, we strive to ensure binary compatibility in each dynamically linked system library and framework.

Now Go is linking against libSystem.dylib which includes things like libc and libpthread for example and exists on all macOS and iOS boxes.

The sad news is that there is nothing to do actually with this regression. It’s unfortunate but at least Darwin is mostly used for development in Go these days (until some rare scenarios when your main target is Darwin).

The good news for me (and most of Gophers I suppose) is that this does not affect Linux which is the main target system for Centrifugo.

Takeaways: