
When we shared a post a few weeks back about why we chose Go for CockroachDB, we received a number of questions about how we deal with some of Go’s known issues, specifically those related to performance, garbage collection, and deadlocks.

In this post, we’ll share a few powerful optimizations that mitigate many of the performance problems common to Go’s garbage collection (we will cover “fun with deadlocks” in a follow-up). In particular, we’ll share how embedding structs, using sync.Pool, and reusing backing arrays can minimize memory allocations and reduce garbage collection overhead.

Minimizing memory allocation & optimizing garbage collection

Something that sets Go apart from, say, Java, is that Go gives you control over your memory layout. With Go, you can combine things that would be separate allocations in other garbage-collected languages.

Take a look at the snippet below, which is a bit of code from CockroachDB that reads data from disk and decodes it:

    metaKey := mvccEncodeMetaKey(key)
    var meta MVCCMetadata
    if err := db.GetProto(metaKey, &meta); err != nil {
        // Handle err
    }
    ...
    valueKey := makeEncodeValueKey(meta)
    var value MVCCValue
    if err := db.GetProto(valueKey, &value); err != nil {
        // Handle err
    }

In order to read the data, we’ve performed four allocations: the MVCCMetadata structure, the MVCCValue structure, and two keys. Go gives us the ability to reduce this to a single allocation by bundling the structures together and preallocating space for the keys.

    type getBuffer struct {
        meta  MVCCMetadata
        value MVCCValue
        key   [1024]byte
    }

    var buf getBuffer
    // Both keys are encoded into the same preallocated array.
    metaKey := mvccEncodeMetaKey(buf.key[:0], key)
    if err := db.GetProto(metaKey, &buf.meta); err != nil {
        // Handle err
    }
    ...
    valueKey := makeEncodeValueKey(buf.key[:0], buf.meta)
    if err := db.GetProto(valueKey, &buf.value); err != nil {
        // Handle err
    }

Here we declare a type getBuffer, which includes two different structs inside it: MVCCMetadata and MVCCValue (both protobuf objects). The third member is an array, which you don’t see in Go as often as you see slices.

An array of a fixed size (here, 1024 bytes) can be stored directly in the struct without requiring an extra allocation. This allows us to embed three objects in the getBuffer struct, reducing our allocations from four to one. Note that we reuse the array for both keys, which is fine here because the keys are not used simultaneously. We’ll return to the array later.
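A minimal, self-contained sketch of the same idea follows. The metadata and value types are hypothetical stand-ins for MVCCMetadata and MVCCValue (their fields are invented), and testing.AllocsPerRun from the standard library is used only to count heap allocations. None of this is CockroachDB code.

```go
package main

import (
	"fmt"
	"testing"
)

// metadata and value are hypothetical stand-ins for the protobuf structs;
// the fields are invented for illustration.
type metadata struct {
	Timestamp int64
	Raw       []byte
}

type value struct {
	Checksum uint32
	Raw      []byte
}

// bundle mirrors the shape of getBuffer: two structs and a fixed-size
// array stored inline, so a single allocation covers all of them.
type bundle struct {
	meta  metadata
	value value
	key   [1024]byte
}

// Package-level sinks force the values to escape to the heap, so the
// compiler cannot keep them on the stack and hide the allocations.
var (
	sinkMeta   *metadata
	sinkValue  *value
	sinkKey1   []byte
	sinkKey2   []byte
	sinkBundle *bundle
)

// separateAllocs counts allocations when each piece is allocated on its own.
func separateAllocs() float64 {
	return testing.AllocsPerRun(100, func() {
		sinkMeta = new(metadata)
		sinkValue = new(value)
		sinkKey1 = make([]byte, 0, 1024)
		sinkKey2 = make([]byte, 0, 1024)
	})
}

// bundledAllocs counts allocations when everything lives in one struct
// and both keys are sliced out of the embedded array.
func bundledAllocs() float64 {
	return testing.AllocsPerRun(100, func() {
		sinkBundle = new(bundle)
		sinkKey1 = sinkBundle.key[:0]
		sinkKey2 = sinkBundle.key[:0]
	})
}

func main() {
	fmt.Println("separate allocations per run:", separateAllocs())
	fmt.Println("bundled allocations per run: ", bundledAllocs())
}
```

The two key slices in the bundled version alias the same array, so, as in the snippet above, they must not be needed at the same time.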

sync.Pool

    var getBufferPool = sync.Pool{
        New: func() interface{} {
            return &getBuffer{}
        },
    }

Truth be told, it took us a while to figure out what sync.Pool was actually for and why we would want to use it. It’s a free list that reuses allocations between garbage collection cycles, so that you don’t have to allocate another object that the garbage collector will later have to collect. The pool is only a cache, though: the runtime may clear items out of it during garbage collection, so nothing stored there is guaranteed to stay.

An example of how to use sync.Pool:

    buf := getBufferPool.Get().(*getBuffer)
    defer getBufferPool.Put(buf)
    key := append(buf.key[0:0], …)

First you declare a global sync.Pool object with a factory function, which in this case allocates a getBuffer struct and returns a pointer to it. Instead of making a new getBuffer, we can get one from the pool. Pool.Get returns an empty interface, which we then type assert to the correct pointer type. When we’re done with it, we put it back in the pool. The end result is that we usually don’t have to do even the one allocation to get the getBuffer struct.
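Put together as a runnable program, the pattern might look like the sketch below. The scratch type and the encodeKey helper are hypothetical and exist only for this example; the sync.Pool calls are the standard library’s.

```go
package main

import (
	"fmt"
	"sync"
)

// scratch is a hypothetical stand-in for getBuffer.
type scratch struct {
	key [1024]byte
}

// scratchPool hands out *scratch values, allocating one only when the
// pool has nothing to reuse.
var scratchPool = sync.Pool{
	New: func() interface{} { return &scratch{} },
}

// encodeKey is a hypothetical helper: it borrows a scratch buffer, builds
// prefix+id inside the buffer's array, and returns the result as a string.
func encodeKey(prefix string, id byte) string {
	buf := scratchPool.Get().(*scratch)
	defer scratchPool.Put(buf)

	key := append(buf.key[:0], prefix...)
	key = append(key, id)

	// string(key) copies the bytes, so the caller never holds a
	// reference into a buffer that has gone back to the pool.
	return string(key)
}

func main() {
	fmt.Println(encodeKey("user/", 'a'))
	fmt.Println(encodeKey("k/", 'b'))
}
```

The one rule to keep in mind is that nothing may keep using a buffer after Put, since another goroutine can pick it up right away. That is why this sketch returns a copy rather than the slice itself.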

Arrays & Slices

One thing worth noting is that arrays and slices are distinct types in Go, and that nearly everything deals in slices rather than arrays. You can get a slice from an array with a slice expression in square brackets, such as [:0] (equivalently, [0:0]):

    key := append(buf.key[0:0], …)

This creates a zero-length slice backed by the array. Because the slice already has backing storage behind it, any appends go into that array instead of creating a new allocation. So when we are decoding a key, we can append it to a slice created out of this buffer. As long as that key is no larger than 1 KB, we don’t have to allocate anything. It just reuses the array that we already allocated.

In the case of a key over 1 KB, which is possible but less common, append transparently allocates a new backing array, and our code doesn’t have to be aware of it.
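Both cases can be observed directly. In the sketch below, appendKey is a hypothetical helper and the array is only 8 bytes long to keep the example readable. It checks whether the appended slice still starts at the array’s first element.

```go
package main

import "fmt"

// appendKey appends data to a zero-length slice over backing and reports
// whether the result still lives in backing (true) or in a freshly
// allocated array (false). It expects non-empty data.
func appendKey(backing *[8]byte, data []byte) (key []byte, reused bool) {
	key = append(backing[:0], data...)
	reused = len(key) > 0 && &key[0] == &backing[0]
	return key, reused
}

func main() {
	var backing [8]byte // tiny stand-in for the 1024-byte array

	_, reused := appendKey(&backing, []byte("short"))
	fmt.Println("fits in array, reused:", reused)

	_, reused = appendKey(&backing, []byte("much too long"))
	fmt.Println("overflows array, reused:", reused)
}
```

The calling code is identical in both cases. Only the returned slice differs in where its bytes live, which is why the result of append must always be assigned back.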

Gogoprotobuf vs Google protobuf

Finally, we use protocol buffers to store everything on disk. However, instead of using Google’s own protobuf library, we use a fork of it called gogoprotobuf, which we highly recommend.

Gogoprotobuf follows many of the principles we outlined above to avoid unnecessary allocations. In particular, it allows marshalling into a caller-supplied byte slice, which can be backed by an array to avoid allocations. Further, the non-nullable annotation allows embedding a message by value rather than by pointer, without an allocation, which is useful when the embedded message is known to always be present.
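To illustrate marshalling into a caller-supplied slice without pulling in gogoprotobuf itself, here is a hand-written sketch. The record type, its appendTo method, and the wire format (varint ID, varint length, name bytes) are all invented for this example. They are not gogoprotobuf’s generated API and not the protobuf encoding.

```go
package main

import (
	"encoding/binary"
	"fmt"
)

// record is a hypothetical message type.
type record struct {
	ID   uint64
	Name string
}

// appendTo encodes r onto the end of dst and returns the extended slice.
// If dst has enough spare capacity, no new memory is allocated.
func (r *record) appendTo(dst []byte) []byte {
	var tmp [binary.MaxVarintLen64]byte

	n := binary.PutUvarint(tmp[:], r.ID)
	dst = append(dst, tmp[:n]...)

	n = binary.PutUvarint(tmp[:], uint64(len(r.Name)))
	dst = append(dst, tmp[:n]...)

	return append(dst, r.Name...)
}

func main() {
	var scratch [64]byte // array-backed destination for small messages
	r := record{ID: 300, Name: "abc"}

	out := r.appendTo(scratch[:0])
	fmt.Printf("%d bytes: %x\n", len(out), out)
	fmt.Println("encoded in place:", &out[0] == &scratch[0])
}
```

The caller decides where the bytes go, so the destination can come from an embedded array or a pooled buffer like the ones described earlier.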

A final bit of optimization with gogoprotobuf is to use the generated marshalling and unmarshalling routines, which provide a nice performance boost over the reflection-based marshalling and unmarshalling that are present in the standard Google protobuf library.

Wrapping Up

By combining the above techniques, we have been able to minimize the performance overhead of Go’s garbage collection and optimize for better performance. As we approach beta and focus more heavily on memory profiling, we’ll share our results in a follow-up post. And of course if you’ve learned other performance optimizations for Go, we’re all ears.

Illustration by Mei-Li Nieuwland