As my twitter followers probably know, the last couple days I've been having fun learning Google's Go language. My goal is to build something in the FlockDB universe, but for now I'm just getting familiar with things. For those who aren't familiar with it, Go is Google's attempt to build a modern system programming language. A modern day C, if you will.

Go has some pretty neat features. And it has some annoying limitations. I'll blog about those when I get more familiar with it. One thing that I can definitively say is that it is both young and there appears to be a strong will to keep it slim and focused. The result is a comparatively lacking framework (or set of built-in packages as they are called). For example, there's no built-in Set .

Please keep in mind that I don't really know what I'm doing.

My goal tonight was to benchmark various Set implementation, in the hope that I'd not only find a good implementation but also familiarize myself with Go. I want to be very clear: I have specific requirements for my Set. I'm therefore focusing on testing against my use-case. What is my use-case? Basically I want to populate a collection with integers and explicitly remove all duplicates on demand. In other words, I only care about unique values when I care about unique values - the rest of the time, duplicates are ok. Ordering isn't important. Here's an example of how I might use it:

func Load(users []User) ([]int) { ids := new(MySpecialCollection) for user := range users { for id := range.user.friendIds { ids.Add(id) } } ids.RemoveDuplicates() return ids; }

So, I want to get all the unique ids of all the user's friends. Ids are removed once at the end and then the "set" is returned.

We'll begin by defining an interface:

type Set interface { Add(value int) Contains(value int) (bool) Length() (int) RemoveDuplicates() }

We only really care about Add and RemoveDuplicates . However, the Contains and Length methods will help us test our implementation (kinda pointless to see which runs faster if we don't also test that they actually work!)

The first implementation uses a map (Go's built-in hashtable type). This is our baseline because it's the simplest most obvious solution. Here's the implementation:

type HashSet struct { data map[int]bool } func (this *HashSet) Add(value int) { this.data[value] = true } func (this *HashSet) Contains(value int) (exists bool) { _, exists = this.data[value] return } func (this *HashSet) Length() (int) { return len(this.data) } func (this *HashSet) RemoveDuplicates() {} func NewSet() (Set) { return &HashSet{make(map[int] bool)} }

If this is your first time looking at Go, you might be a little lost. First we define our type called HashSet and define a private member data (it's private because the name starts with a lowercase letter) which is a map. One really cool thing about Go is that you don't have to explicitly implement an interface. Simply implement its methods and it can safely be cast to it (it's awesome!) Methods are defined as functions with a receiver (this *HashSet) . All of the implementations are pretty straighforward. RemoveDuplicates does nothing because duplicates are never added (that's just a basic property of a hash/map).

Two things about the above code worth exploring. First, the Contains method is somewhat funky. In Go, methods can return multiple values. Getting a value from a map ( this.data[value] ) returns two values. The actual value at the key, and true/false on whether the value existed. We discard the actual value by assigning it to the magic variable _ and assign whether the value existed or not to the exists variable - which is actually a named returned parameter. The other thing is the NewSet function. Functions like this is how you do a constructor. I think it kinda sucks. Within our "constructor" we actually initialize our map using the special make . make is something else which I think sucks about Go. Three built-in types need to be instantiated with make , all the other types are instantiated with new (or the &TYPE{} syntax we are using here, which is syntactical sugar for new and is similar to C#'s constructor initializers).

We'll create two tests to make sure our implementation works:

import( "testing" "rand" ) func TestCanAddUniqueValues(t *testing.T) { values := []int{5,4,3,9,1,2,8} set := NewSet() for _,v := range values { set.Add(v) } set.RemoveDuplicates() Assert(t, set.Length() == len(values), "Wrong length") for _,v := range values { Assert(t, set.Contains(v), "Missing value") } } func TestUniqueRemovesDuplicates(t *testing.T) { values := []int{5,4,3,5,2,1,2,3,4,5,1,2,2} uniques := []int{5,4,3,2,1} set := NewSet() for _,v := range values { set.Add(v) } set.RemoveDuplicates() Assert(t, set.Length() == len(uniques), "Wrong length") for _,v := range uniques { Assert(t, set.Contains(v), "Missing value") } } func Assert(t *testing.T, condition bool, message string) { if (!condition) { t.Fatal(message) } }

Testing in Go isn't particularly pretty. But, at least we'll be able to make sure that whenever RemoveDuplicates is called on our set, that duplicates are, in fact, removed.

Now comes the interesting part. Go has built-in benchmarking capabilities, and it's pretty neat. Rather than define a Test function, we define a Benchmark method, like so:

func BenchmarkSmallSetWithFewCollisions(b *testing.B){ for i := 0; i < b.N; i++ { set := NewSet() for j := 0; j < 100; j++ { set.Add(rand.Int() %700) } } } }

If you look at the first for loop, you'll notice that we are looping b.N times. b is the parameter which is passed into our function. The way it works is that the engine will loop through our code as many times as it needs to until it feels comfortable that it has an accurate measurement for the cost of a single iteration. This'll become more clear briefly.

My goal was to test the performance of a small, medium and large set (100, 5000 and 100000 items) and for each of those, with a small, medium and large number of duplicates. The above code will generate a set of 100 values, where a given value can be between 0 and 699 - so duplicates are pretty rare. Here's what all 9 benchmarks look like:

func benchmark(b *testing.B, size int, fill int) { for i := 0; i < b.N; i++ { set := NewSet() for j := 0; j < size; j++ { set.Add(rand.Int() % fill) } } } func BenchmarkSmallSetWithFewCollisions(b *testing.B){ benchmark(b, 100, 700) } func BenchmarkSmallSetWithMoreCollisions(b *testing.B){ benchmark(b, 100, 100) } func BenchmarkSmallSetWithManyCollisions(b *testing.B){ benchmark(b, 100, 25) } func BenchmarkMediumSetWithFewCollisions(b *testing.B){ benchmark(b, 5000, 35000) } func BenchmarkMediumSetWithMoreCollisions(b *testing.B){ benchmark(b, 5000, 5000) } func BenchmarkMediumSetWithManyCollisions(b *testing.B){ benchmark(b, 5000, 1250) } func BenchmarkLargeSetWithFewCollisions(b *testing.B){ benchmark(b, 100000, 700000) } func BenchmarkLargeSetWithMoreCollisions(b *testing.B){ benchmark(b, 100000, 100000) } func BenchmarkLargeSetWithManyCollisions(b *testing.B){ benchmark(b, 100000, 25000) }

And, here's the output from running this against our simple HashSet implementation:

set.BenchmarkSmallSetWithFewCollisions 50000 52600 ns/op set.BenchmarkSmallSetWithMoreCollisions 50000 42991 ns/op set.BenchmarkSmallSetWithManyCollisions 50000 31406 ns/op set.BenchmarkMediumSetWithFewCollisions 1000 2663531 ns/op set.BenchmarkMediumSetWithMoreCollisions 1000 1703337 ns/op set.BenchmarkMediumSetWithManyCollisions 1000 1411807 ns/op set.BenchmarkLargeSetWithFewCollisions 50 54807860 ns/op set.BenchmarkLargeSetWithMoreCollisions 50 48436380 ns/op set.BenchmarkLargeSetWithManyCollisions 50 34149020 ns/op

The first column is the benchmark that was run. The 2nd column is how many iterations it ran (or the value of b.N ). The last column is the time it took (in nanoseconds) per loop. From the above, we can tell that the more collisions there are, the faster the code (I'd guess that's because it has to insert less values into the map).

For my second implementation, I decided to use an array and scan the array for a duplicate before adding it. The interesting method here is Add and Contains :

func (this *RealtimeArraySet) Add(value int) { if this.Contains(value) == false { this.data = append(this.data, value) } } func (this *RealtimeArraySet) Contains(value int) (exists bool) { for _, v := range this.data { if v == value { return true } } return false }

And, here are the results:

set.BenchmarkSmallSetWithFewCollisions 100000 22390 ns/op set.BenchmarkSmallSetWithMoreCollisions 100000 19421 ns/op set.BenchmarkSmallSetWithManyCollisions 100000 17037 ns/op set.BenchmarkMediumSetWithFewCollisions 100 19248020 ns/op set.BenchmarkMediumSetWithMoreCollisions 100 12210990 ns/op set.BenchmarkMediumSetWithManyCollisions 500 5158366 ns/op set.BenchmarkLargeSetWithFewCollisions 1 7482466000 ns/op set.BenchmarkLargeSetWithMoreCollisions 1 4660109000 ns/op set.BenchmarkLargeSetWithManyCollisions 1 1785185000 ns/op

For small sets, this is much faster. For large sets, it's much slower. This isn't really a surprise. Doing a linear search in the array just can't scale with large sets.

The last implementation I did was to allow duplicates to get added to the array, but to remove them when explicitly told to:

func (this *OnDemandArraySet) Add(value int) { //the append method will automatically grow our array if needed this.data = append(this.data, value) } func (this *OnDemandArraySet) RemoveDuplicates() { length := len(this.data) - 1 for i := 0; i < length; i++ { for j := i + 1; j <= length; j++ { if (this.data[i] == this.data[j]) { this.data[j] = this.data[length] this.data = this.data[0:length] length-- j-- } } } }

We remove duplicates by comparing every value to every other value. This leans heavily on Go slices. You can think of a Go slice as a movable window above an array. Slices are really efficient. When we find a duplicate, we replace it with the last element and shrink by 1. So, if we have: [1,2,3,2,5,6] it'll become [1,2,3,6,5] (the 6 overwrites the duplicate 2 and the slice is shrank). Again, all of this is efficient - we aren't creating copies of array, just playing with pointers. The results:

set.BenchmarkSmallSetWithFewCollisions 200000 12362 ns/op set.BenchmarkSmallSetWithMoreCollisions 200000 12425 ns/op set.BenchmarkSmallSetWithManyCollisions 200000 12430 ns/op set.BenchmarkMediumSetWithFewCollisions 5000 682792 ns/op set.BenchmarkMediumSetWithMoreCollisions 2000 720831 ns/op set.BenchmarkMediumSetWithManyCollisions 2000 706491 ns/op set.BenchmarkLargeSetWithFewCollisions 100 12645380 ns/op set.BenchmarkLargeSetWithMoreCollisions 100 12682960 ns/op set.BenchmarkLargeSetWithManyCollisions 100 12863000 ns/op

Under all circumstances, this is a clear winner. In the simplest case it's 4.25x faster than the map and 1.8x faster than the original array implementation. In the most complex case it's 2.6x faster than the map and 138x faster than the original array implementation. (I did play with pre-allocating the underlying structures with different sizes, it didn't change anything.)

Beyond raw speed, we can also measure how much memory each approach takes. For this, I simply enabled Go's memoryprofile and looped through each set a fixed number of times. To be honest, I'm not positive what all the values mean, but I extracted the three which seemed the most relevant:

HashSet Alloc = 1227039448 TotalAlloc = 2249302288 HeapInuse = 1241645056 RealTimeArraySet Alloc = 1033523080 TotalAlloc = 1033574400 HeapInuse = 1047220224 OnDemandArraySet Alloc = 3631414928 TotalAlloc = 3631466248 HeapInuse = 3659673600

No huge surprise here. Since the fastest implementation doesn't discard duplicates until we tell it to, it takes the most amount of memory. The RealTimeArraySet takes the least amount of memory because duplicates are discarded, and there's little overhead to the underlying structure (an array). I'm quite surprised at how close the memory footprint of the map is to the array.

The source code for this is available at https://github.com/karlseguin/golang-set-fun.