(Updated: 9/8/2013 Added “Stateful Iterators” thanks to ryszard.)

I’ve been using Go to implement some performance-sensitive services recently and, so far, it’s been a very nice. When writing Go code, it often feels like a (slightly verbose) scripting language, but has nice type-safety and good performance.

But I’ve realized that this feeling is really only true on a line-to-line basis. What I’ve noticed recently is that a lot of my code ends up being “collection munging”: creating, modifying, and transforming collections. Python excels at this – the built-in core functional working set ( map , reduce , and friends) combined with iterators/generators and list / dict comprehensions allows for some very concise, readable code. And often it even has decent performance.

But it isn’t always good enough performance, and that was a reason for moving to Go (or any higher performance, more systems-y language). But I found myself even missing C++98 because collections are actually pretty nice there. Iterators and templates allow code nearly as concise as the Python version, even with all those type declarations.

Unfortunately generics aren’t coming to Go soon . But iterators are just interfaces. They abstract away the underlying container to make it look like a sequence of values of some type with a standard interface (e.g. Next() , Prev() , etc). This allows you to write at least partially generic code, e.g. a sum function that assumes an iterator producing a specific type. If each container can generate an IntIterator , we can write one Sum() for all of them. Without generics we still need SumInt() vs. SumFloat() , but at least we don’t need SumIntSlice() , SumIntDictValues() , SumFloatSlice() , and SumFloatDictValues() .

Once I realized that this is what I was missing, I wanted to know what the canonical iterator pattern in Go is: what’s the interface and the implementation? I ended up in this thread on the golang-nuts mailing list which covers the basic options for a forward-only iterator (which admittedly simplifies the problem, mainly focusing on implementation rather than the appropriate interface, but this is also the most common type of iterator we need).

But this discussion only covered the basics of the different options and some pitfalls, but didn’t address performance issues and focused primarily on the iterator’s implementation without much consideration for the caller’s code, which I think is the more important part. The whole idea is to provide a good abstraction to the caller and hopefully not compromise performance. So which of the iterator patterns gives the best balance? Or, a more realistic question – when I’m writing code, which version should I use in each case: a) I just want to get the code working b) I want to extract maximum performance or c) I want a balance between readability and performance?

Iterator Patterns

To start, let’s review both the implementer and calling code for each of the options, hopefully giving a clearer summary than that original mailing list thread. None of the patterns explicitly define an iterator interface. They either use a channel, which is a very natural stand-in for a forward-only iterator, or a function which produces values for the caller. The two key aspects I want to look at are a) how does the iterator implementation work and b) compared to ideal caller code that looks something like

for item := range Foo.Iterator() { do_ops_with(item) }

what does the calling code look like? What, if any, hoops do they need to jump through to get things working with that pattern, and how does it affect readability/speed of coding?

In all the examples, I’ll use iterating over a []int to make the code as simple as possible. To show what the caller code looks like, we’ll implement sum() for each type of iterator.

Pattern 1: Iterator Function with Callback

The first approach looks the least like an iterator. Instead, we create a function with the body of the loop and the “iterator” gives a callback for each element:

func IntCallbackIterator(cb func(int)) { for _, val := range int_data { cb(val) } }

This is clearly very easy to implement. The calling code needs to define the callback and pass it in, using closures to track data used across iterations:

var sum int = 0 cb := func(val int) { sum += val } IntCallbackIterator(cb)

I’ve written it out in steps to make it clear, but we can write it more “loop-like” by writing the function directly in the call:

var sum int = 0 IntCallbackIterator(func(val int) { sum += val })

Because Go has anonymous functions and with closures, this doesn’t look too bad. It feels a lot like writing JavaScript. To me, one of the major drawbacks is that we’ve lost type declaration elision: we have to explicitly define the full type of the function, including the type you’re iterating over.

You’ll notice there’s no return value on the callback. We can add control, e.g. break , by allowing the function to return a value. Unfortunately, that also then means it always must return a value, or you need two iterators if you want to have concise code when you don’t need break .

Pattern 2: Closures

The second pattern also uses anonymous functions with closures, but in the iterator implementation itself. The basic idea is that the returned iterator is a generator function that you call repeatedly:

func IntClosureIterator() (func() (int, bool), bool) { var idx int = 0 var data_len = len(int_data) return func() (int, bool) { prev_idx := idx idx++ return int_data[prev_idx], (idx < data_len) }, (idx < data_len) }

The return value has two components: the iterated value followed by a bool indicating if there is another value, i.e. has_next . The initial call to get the iterator also returns this has_next so you can tell if there are any elements.

I’d hope there’s a better way of implementing that, but this is what I was able to came up with. There are two reasons it’s ugly. First, you can’t use a regular for loop because the iteration needs to happen during successive calls to the returned function. Second, you need to also indicate whether there is a next value so the caller can terminate the loop. We could also do this by returning a *int to indicate the validity of the current value, but then we need to dereference in the caller.

The caller for this style doesn’t look too pretty either:

var sum, val int = 0, 0 for it, has_next := IntClosureIterator(); has_next; val, has_next = it() { sum += val }

Again, there might be a better way here. It looks similar to iterator interfaces in, e.g., Java. One major annoyance is that the value can’t be declared in the third part of the for so we have to declare it earlier. The entire for line is muddied up with extra control info – has_next needs to be stored and checked. This is downright ugly compared to a nice range for loop.

Finally, some people like to wrap this type of iterator function into an interface, i.e. they return an object with an interface like this:

type IntIterator interface { Next() (int,bool) }

which directly wraps the function, similar to the style common in other languages like Java. I don’t think it much matters, but some people find the it.Next() call to help clarify what the loop is doing.

Pattern 3: Channels

The final pattern uses channels and fits very nicely into Go. The basic idea is to fire off a goroutine which generates values and pushes them into a channel:

func IntChannelIterator() <-chan int { ch := make(chan int) go func() { for _, val := range int_data { ch <- val } close(ch) // Remember to close or the loop never ends! }() return ch }

Control is returned to the caller who can then just read from the channel until it’s exhausted, which range does automatically:

var sum int = 0 for val := range IntChannelIterator() { sum += val }

This is very Go-ish (or whatever the equivalent of “Pythonic” is for golang…). Because channels have direct support in the language through range , the calling code is nice, concise, and clear. And in theory shouldn’t be that expensive because goroutines are cheap (they aren’t “real” threads).

This is as good as it’s going to get in terms of simplicity from the caller’s side, barring the introduction of something like list comprehensions.

However, there is one issue with this approach. If we do something like this:

has_zero := false for val := range IntChannelIterator() { if val == 0 { has_zero = true break } }

we leave uncollectible garbage. The goroutine continues to “run” in the background, but is blocked from doing any work after it generates one value after the calling loop exited since it can’t write to the channel anymore. I haven’t yet thought much about fixing this problem. I suspect using a bidirectional channel (or two channels) as a request/response mechanism would work, but you’d still need to make sure either side close() d the channel at the right time. Maybe a defer statement could handle that nicely even with break statements. Handling all the cases properly might not be trivial and probably requires the calling code to be more complicated than the simple for - range loop.

Buffered Channels

Although goroutines are cheaper than threads, context switches still cost something. If you’re only doing a small amount of work to move to the next item (as the iterator) and a small amount of work consuming the item (as the caller), the cost of dealing with the channel and switching between goroutines could be expensive.

Conveniently, Go, provides buffered channels. They have a buffer of data instead of just one value. This means that the Go runtime can choose to fill the entire buffer before switching to a different goroutine, avoiding the overhead of switching so frequently. The only implementation change is choosing the buffer size and specifying it in the make() call for the channel. Instead of

ch := make(chan int)

we can use

ch := make(chan int, 10)

The trick is deciding how large that buffer should be. Unfortunately, this is completely dependent on the context.

Pattern 4: Stateful Iterators

A common pattern in other languages, as mentioned in the section about closure-based iterators, is to use an interface with some variant of HasNext() and Next() functions. ryszard contributed this implementation, which he refers to as Stateful because they hold the iteration state in the iterator struct itself (as opposed to the implicit storage in the closure-based implementation). The iterator implementation:

type intStatefulIterator struct { current int data []int } func (it *intStatefulIterator) Value() int { return it.data[it.current] } func (it *intStatefulIterator) Next() bool { it.current++ if it.current >= len(it.data) { return false } return true } func NewIntStatefulIterator(data []int) *intStatefulIterator { return &intStatefulIterator{data: data, current: -1} }

And here’s how it’s used:

var sum int = 0 it := NewIntStatefulIterator(int_data) for it.Next() { sum += it.Value() }

In Go, just adding an interface declaration let’s us specify a generic interface to an IntIterator:

type StatefulIterator interface { Value() int Next() bool }

which would allow use to write generic functions for any data structure that could generate an iterator with that interface.

I didn’t include this approach initially (beyond noting you could get it easily from the closure based approach) because this was part of what I set out to avoid. It may seem superficial, but I would much prefer an iteration technique that can make the actual iterator not appear in the calling code if possible – I just find that more readable. This is a place where I think C++ seriously fails. Not only do iterators show up, but often the meaning of expressions is very unclear. For example, any time you use a std::map , your code ends up littered with it.first and it.second . On the other hand, if you want more than a simple forward-only iterator, this is the right way to handle it since, e.g., bidirectional iteration can’t clearly be specified via a single function.

Finally, the other reason I don’t like some of the approaches that require additional variables is that I think it’s important to scope properly where possible. I really dislike holding on to the iterator outside the for loop it’s used in. I would probably change the calling code to this:

var sum int = 0 for it := NewIntStatefulIterator(int_data); it.Next(); { sum += it.Value() }

although I think the for loop with missing post statement is awkward.

Evaluation

I wrote up versions of each of these with simple benchmarks courtesy the lovely Go testing framework. The code is available here.

I wrote two versions of each. One directly iterates over a []int , the other introduces a struct and uses a []*struct which it needs to extract the data from. The goal was to hopefully avoid benefits from locality. The results don’t vary much, so either we weren’t benefiting anyway or the allocated data is still laid out nicely. In either case, having both makes me feel more confident these are somewhat reflective of real world performance.

Here are the results, slightly modified for readability:

Callback Ints 500 3836683 ns/op Callback Struct 500 4104532 ns/op Closure Ints 500 5391391 ns/op Closure Struct 500 6420672 ns/op Channel Ints 10 207776217 ns/op Channel Struct 10 212255270 ns/op Buffered Channel Ints 20 96576062 ns/op Buffered Channel Struct 20 99274603 ns/op Stateful Ints 1000 2558344 ns/op Stateful Struct 500 4706915 ns/op Stateful Interface Ints 500 7887930 ns/op Stateful Interface Structs 200 9286961 ns/op

The results aren’t too surprising, but the scale of the performance difference is interesting. The simple but noisier callback based approach wins on performance. The closure-based almost-an-iterator approach isn’t too slow in comparison, a bit under 50% performance hit. Using this approach instead is probably only an issue if it’s in an inner loop of performance critical code. Finally, the most natural channel based implementation is… slow. By a factor of nearly 50x. Buffering does help though, reducing that to a factor of 25x.

I’d probably choose the channel approach for most code, then use callbacks for performance sensitive code. I think the callback code is cleaner than the closure-based iterator and it only gets too ugly when you have too many nested “loops”.

Updated: The stateful iterator results are new. Not surprisingly, since all the state management is done manually, they are on par with the basic callback version. The two are essentially equivalent when you use the struct directly – the iterator struct takes the place of the closure, and both require one function call per iteration. The interface version is more expensive because it requires extra runtime checks and indirection to find the correct function to call.

Apparently, based on some examples from the standard packages, this style (or a version using a pointer return value to indicate both availability and value at once) are the “Gothic” way of implementing this. I went looking for other patterns because I assumed such a verbose approach wouldn’t be preferred.

Notes

Apparently channel performance has improved significantly in Go over the past couple of years, but clearly the overhead is still too much. My guess would be that the overhead of the goroutine scheduler is the source of the cost, although with only 2 goroutines it’s not clear why it’s as high as it is.

To me, the biggest drawback to all of these options is that they don’t all provide the same type, so they aren’t interchangeable. We can’t modify the iterator implementation independently of the calling code to get better performance.

The channel approach fits perfectly with the language because range directly supports iterating over channels. It would be great to be able to unify these all with a channel interface, but there’s no way to adapt the more efficient approaches without involving a goroutine. I’d guess that the overhead is with scheduling, not with the channel itself (which is effectively a 1-element buffer). It looks like the best option for unifying them for now is to write boilerplate wrapper code, i.e. define something like this:

type IntIterator interface { Next() (int,bool) }

and expose a function that returns that interface. Unfortunately, this generator-like function also produces the ugliest, hardest to read (in my opinion) caller code. So for now, I just use the channel interface first for ease of implementation and assume I’ll have to pay the refactoring cost to uglify the calling code if I need better performance.

What I’d really like to see is some sort of range adapter: provide an interface in the builtin package that lets you use any iterable in a range expression. That should be the unifying interface and allow you to use any of these interchangeably and get really clean calling code. It could just be the usual HasNext() / Next() interface seen in other languages. I assume the issue with this is that there is no way to specify those generically.

Thanks

Thanks to Faolan Cheslack-Postava for editing this post. Thanks to ryszard for the Stateful Iterator implementations.