Philosophy and "for" loops — more from Go and Rust


Once upon a time, a new programming language could be interesting because of some new mechanism for structured flow control. An if statement that could guard a collection of statements would be so much easier than one which just guarded a goto. Or a for statement which took control of the loop variable could simplify matrix multiplication significantly. An illuminating insight into this earlier age can be found in Knuth's "Structured Programming with go to statements" [PDF]. Many of the issues that seemed important in 1974 seem very dated today, but some are still fresh and relevant.

The work of these early pioneers has left us with five basic forms that appear to be common to most if not all procedural languages: two conditional constructs, if and switch/case; two looping constructs, while and for; and one encapsulation construct: the function or procedure.

While interesting new control flow is unlikely to be a headline item on a newly developed language these days, each language must embody concrete choices concerning these structures and it is quite clear that, while there is similarity, we are far from uniformity. Exploring how a language handles control flow can provide interesting insights into the philosophy behind the language. In this article, we will continue our explorations of Go and Rust by looking at various control-flow structures, but particularly focusing on the "for" loop.

The background of for loops

The for loop first appeared in programming languages as an easy way to step through a fixed list of values. We can see this in Fortran, which used the word do rather than for (here 10 is the label of the statement that ends the loop):

do 10 i = 1, 100, 2

and in Algol58:

for i := 1(2)100 do

Algol60 adds some syntactic sugar:

for i := 1 step 2 until 100 do

while Pascal dropped the step clause so you would need:

for j := 0 to 49 do

and then set i := j * 2 + 1 inside the loop.

The Algol60 for loop was actually quite rich as can be seen by the examples here. It is a richness that probably seems excessive by today's standards. In C, which came a decade later, several of the ideas in Algol were generalized and simplified to encapsulate all the interesting possibilities in just three expressions: initializer, test, and step, thus:

for (i = 1; i < 100; i += 2)

As the three expressions can be almost arbitrarily complex, very rich looping constructs can be created from this simple form. The effect is that the head of the for forms a coroutine that is executed in concert with the body of the for loop. Control alternates between one and the other, so that together they achieve the desired result.
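The alternation between head and body can be made visible with a small trace. The following Go sketch (Go is used for the added examples in this article) writes the three parts of the head out by hand and records the order in which they run. The name traceLoop is invented for illustration.

```go
package main

import "fmt"

// traceLoop runs the equivalent of `for i = start; i < stop; i += step`
// with the three head pieces written out by hand, recording each step.
func traceLoop(start, stop, step int) []string {
	var log []string
	i := start // initializer: runs once
	log = append(log, "init")
	for {
		log = append(log, "test") // test: runs before every pass
		if !(i < stop) {
			break
		}
		log = append(log, fmt.Sprintf("body(%d)", i)) // the loop body
		i += step // step: runs after every pass
		log = append(log, "step")
	}
	return log
}

func main() {
	fmt.Println(traceLoop(1, 6, 2))
}
```

The trace shows the test running once more than the body, which is the point at which the head decides that the loop is over.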

The coroutine nature of the for loop's head is made particularly obvious by the many (over 150) for_each macros that appear in the Linux kernel. With these, the code for one routine is physically quite separate from the other, emphasizing the separate roles of the two pieces of code. An example of such a for_each macro, from include/linux/radix-tree.h, is:

#define radix_tree_for_each_slot(slot, root, iter, start)          \
    for (slot = radix_tree_iter_init(iter, start) ;                \
         slot || (slot = radix_tree_next_chunk(root, iter, 0)) ;   \
         slot = radix_tree_next_slot(slot, iter, 0))

This example is interesting for a couple of reasons.

First, the middle expression (the loop-continue condition) is not simply a condition: it contains an assignment and is sometimes used to find the next value. This makes it clear that the three parts of the head aren't simply expressions with fixed purposes, but rather three separate entry points into a coroutine.

Secondly, it contains two variables that change throughout the loop: slot and iter. The slot variable is the regular loop variable that any for loop would have, while iter contains extra state for tracking the path through the list and is largely of internal interest.

While iter is primarily internal, it needs to be visible externally, and, in fact, needs to be declared externally. The for statement has some properties of a coroutine, but cannot define local variables for use throughout the loop.

So we see in the C for loop, particularly when combined with other features of C such as the rich expressions and the macro preprocessor, a very powerful, though not completely satisfactory, for loop mechanism. One that will serve as a basis for examining others.

Go for — broke or beautiful?

The for loop in Go comes in three different forms — not quite the range of Algol60, but seemingly more than C. One form is superficially very similar to that in C: the parentheses are not required, and the loop body must be a "block" rather than a simple statement. But these are syntactic differences which don't affect expressiveness. The earlier iterative example looks much the same in Go as in C:

for i := 1; i < 100 ; i += 2 { .... }

The parallel ends there, however. Simple for loops will look much the same, but complex for loops will have to look quite different. This is partly because Go has no macro preprocessor and partly because Go expressions are not as rich as C expressions. While the C for loop simply contains three expressions, the Go for loop contains a "simple statement", an "expression", and another "simple statement", where "simple statement" specifically includes assignments and increments/decrements.
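Because the first and last parts of the head are simple statements rather than expressions, a Go loop can use a parallel assignment where C would reach for the comma operator. A minimal sketch follows; the reverse helper is invented here for illustration.

```go
package main

import "fmt"

// reverse reverses s in place. The head declares two loop variables with
// one short variable declaration and advances both with one assignment.
func reverse(s []int) {
	for i, j := 0, len(s)-1; i < j; i, j = i+1, j-1 {
		s[i], s[j] = s[j], s[i]
	}
}

func main() {
	s := []int{1, 2, 3, 4, 5}
	reverse(s)
	fmt.Println(s) // [5 4 3 2 1]
}
```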

Were we to try a literal translation of the radix tree for_each loop into Go, we would have mixed success. Go allows the declaration of local variables inside a for loop head, so there would be no need to declare slot and iter separately. However, as the condition in a Go for statement cannot contain assignments, we find a complete literal translation is impossible. Of course measuring a language by how literal translations from another language fare is far from reasonable — we may not be using the best tool for the job and, as already noted, there are other forms of the for loop in Go.

The second form is really a reduced version of the first, with the two simple statements missing and, thus, their semicolons discarded:

for i < 100 { ... }

That form is essentially what many other languages would call a while loop.

This leaves the final form — the for/range loop.

for x := range expression { ... }

will iterate through members of the result of the expression in various ways depending on the type of the result. This makes explicit a difference from the for loops in the earlier languages. For Fortran, ALGOL, and Pascal, the for loop dealt with sequences of numbers, or possibly "enumerated constants" which are very number-like. As we have seen, C can work with arbitrary values and the Go range clause makes it clear that this loop is for much more than just numbers.

The value can be an array, a slice (part of an array), a string (of Unicode characters), a map (also known as a "hash", "associative array", or "dictionary" in other languages), or a "channel" (used for IPC). In the first four cases the for loop steps through the components of the value in a fairly obvious way. Channels are a bit different and will be examined shortly. As range does not work with user-defined types at all, we cannot translate our "radix_tree" loop directly into for/range and so must look elsewhere.
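As a quick illustration of the built-in cases, the following sketch ranges over a slice, a string, a map, and a channel. The function name rangeDemo and the sample values are made up for the example.

```go
package main

import "fmt"

// rangeDemo ranges over a slice, a string, a map, and a channel.
func rangeDemo() (sum int, runes int, mapTotal int, received []int) {
	for _, v := range []int{1, 3, 5} { // slice: index and element
		sum += v
	}
	for range "h\u00e9llo" { // string: one iteration per Unicode code point
		runes++
	}
	for _, v := range map[string]int{"a": 1, "b": 2} { // map: order unspecified
		mapTotal += v
	}
	ch := make(chan int, 3)
	ch <- 10
	ch <- 20
	ch <- 30
	close(ch)
	for v := range ch { // channel: runs until the channel is closed and drained
		received = append(received, v)
	}
	return
}

func main() {
	fmt.Println(rangeDemo()) // 9 5 3 [10 20 30]
}
```

The string holds six bytes but the loop runs five times, since range decodes one code point per iteration.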

A reasonable place to look might be some existing body of Go code to see how such things are done. Though the Go compiler is not written in Go, the Go language source distribution includes many tests, libraries, examples, and tools written in Go, with a total of 2418 .go source files, all of which were presumably written by people quite familiar with the language. Altogether, there are over 7000 for loops to consider.

Of these, 1200 are of the while loop form, nearly 2800 are for/range loops, and the remaining 3000 are in the three-part form, the vast majority of which have a numeric loop variable (demonstrating that the numeric loops of yesteryear are very much alive and well). So there are not a lot of examples of iterating user-defined data structures — a fact which itself might be significant.

One example of interest is in src/pkg/container/list/list_test.go :

for e := l.Front(); e != nil; e = e.Next() {
    le := e.Value.(int)
    ....

This example is not vastly unlike the for_each macros we saw written in C. The syntax is clearly different, but the idea of having a very simple "head" on the for loop, with the actual code for the coroutine being off in a different file, is represented quite clearly. The for loop fragment given could easily be for almost any data structure. If there was a desire to keep the value ( le above) more distinct from the iterator ( e above), a construct like:

for slot, iter, ok := l.Front(); ok; slot, ok = iter.Next() {

could return a sequence of slots using an iterator much like the radix_tree_for_each_slot loop we saw earlier. This construct is really quite elegant and extremely general.
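To see that such a head does work with a user-defined type, here is a minimal sketch. The intList and listIter types and their methods are hypothetical, written only to fit the loop head shown above; they are not part of container/list.

```go
package main

import "fmt"

// intList is a hypothetical user-defined container.
type intList struct{ items []int }

// listIter holds the traversal state, kept separate from the values.
type listIter struct {
	l   *intList
	pos int
}

// Front returns the first value, an iterator, and whether a value exists.
func (l *intList) Front() (int, *listIter, bool) {
	it := &listIter{l: l, pos: -1}
	v, ok := it.Next()
	return v, it, ok
}

// Next advances the iterator and returns the next value, if any.
func (it *listIter) Next() (int, bool) {
	it.pos++
	if it.pos >= len(it.l.items) {
		return 0, false
	}
	return it.l.items[it.pos], true
}

// sum walks the list using the slot/iter/ok loop head.
func sum(l *intList) int {
	total := 0
	for slot, iter, ok := l.Front(); ok; slot, ok = iter.Next() {
		total += slot
	}
	return total
}

func main() {
	fmt.Println(sum(&intList{items: []int{1, 2, 3}})) // 6
}
```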

Another interesting example occurs in various files in src/pkg/net , such as src/pkg/net/hosts.go and takes the form:

for line, ok := file.readLine(); ok; line, ok = file.readLine() {

This is very similar to the Front / Next example, except that the call that fetches the first line and the call that fetches each later line are identical. This could be considered to violate the DRY principle: Don't Repeat Yourself.

In C, this sort of loop is regularly written as:

while ((line = fgets(buf, sizeof(buf), file)) != NULL) {

but that cannot be used in Go, as expressions do not include assignments.

This issue of expressions excluding assignments has clearly not gone unnoticed by the Go designers. If we look at the if and switch statements, we see that, while they can be given a simple expression, they can also be given a simple statement as well, such as:

if i := strings.Index(s, ":"); i >= 0 {

which includes both an assignment and a test. This would work quite nicely for the readLine loop:

while line, ok := file.readLine(); ok {

except that Go does not provide a while loop, only a for loop. Though the for loop does include two simple statements, neither is executed at a convenient place to make this loop work as expected. So if we are to remove the repetition of the readLine call, we must look elsewhere.
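The if form with a leading simple statement is easy to try in isolation. In the sketch below, only strings.Index comes from the example above; splitKV is a name invented for illustration.

```go
package main

import (
	"fmt"
	"strings"
)

// splitKV splits "key:value" at the first colon. The variable i is
// declared in the if statement's leading simple statement and is only
// in scope inside the if.
func splitKV(s string) (key, value string, ok bool) {
	if i := strings.Index(s, ":"); i >= 0 {
		return s[:i], s[i+1:], true
	}
	return s, "", false
}

func main() {
	fmt.Println(splitKV("host:example"))
	fmt.Println(splitKV("no-colon"))
}
```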

One possibility is to explore the fact that while expressions do not include assignments, they do include function calls and functions can include assignments. Go supports function literals. This means that the body of a function can be given anywhere the name of a function can be used. The body of a function may be assigned to a variable, or it may be called in place. Further, the function so defined can access any variables that are in the same scope as the function. So:

for line := ""; func() (ok bool) { line, ok = file.readLine(); return }(); {

is a for loop in the three-part form which behaves much the same as the example above from hosts.go but without repetition.

The "initialize" part of the for loop ( line := "" ) declares a new variable, line which is initialized to the empty string (it syntactically needs to be initialized to something, though the value won't be used).

The "condition" part of the loop is an immediate call to a function literal which calls file.readLine() , returns the ok part of the result and has a side effect of assigning the line part of the result to the line variable. The = form of assignment is needed in the function, rather than the := form, so that it does not declare a new line variable, which is local to the function, but instead uses the one local to the for loop.

The "next" part of the loop is empty, and appears between the second ; and the { .

While this does remove the unfortunate repetition of the readLine call, the cure turns out to be much worse than the disease, as the loop is close to unreadable. While function literals certainly have their place, this is not that place.
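For readers who want to try both forms side by side, here is a runnable sketch. The lineFile type and its readLine method are hypothetical stand-ins for the unexported type in the net package, and the function literal carries an explicit return, which Go requires of any function with a result.

```go
package main

import "fmt"

// lineFile is a hypothetical stand-in for the net package's file type.
type lineFile struct {
	lines []string
	pos   int
}

// readLine returns the next line and true, or "" and false at the end.
func (f *lineFile) readLine() (string, bool) {
	if f.pos >= len(f.lines) {
		return "", false
	}
	f.pos++
	return f.lines[f.pos-1], true
}

// collectRepeated uses the form from hosts.go, with readLine written twice.
func collectRepeated(f *lineFile) []string {
	var out []string
	for line, ok := f.readLine(); ok; line, ok = f.readLine() {
		out = append(out, line)
	}
	return out
}

// collectClosure writes readLine once, inside a function literal that is
// called as the loop condition and assigns to the loop's line variable.
func collectClosure(f *lineFile) []string {
	var out []string
	for line := ""; func() (ok bool) { line, ok = f.readLine(); return }(); {
		out = append(out, line)
	}
	return out
}

func main() {
	fmt.Println(collectRepeated(&lineFile{lines: []string{"a", "b"}}))
	fmt.Println(collectClosure(&lineFile{lines: []string{"a", "b"}}))
}
```

Both functions collect the same lines; the difference is purely in how much the reader has to untangle.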

This leaves one more possibility to explore — it is time to examine that " range channel" construct hinted at earlier.

Channels

Concurrency and multiple threads (known as goroutines) are deeply embedded in Go, and the preferred mechanism for communicating between goroutines is the "channel". A channel is somewhat like a Unix pipe. It conceptually has two ends, and data written to one end can be read from the other. While a pipe can only pass characters or strings of characters, a channel can pass any type known to Go, including other channels.

for i := range my_channel {

will repeatedly assign to i each value received from my_channel and then run the body of the for loop. This is a lot like our readLine example — if only we could make lines appear on a channel. And, of course, we can.

func lines (file *file) (<- chan string) {
    ch := make(chan string)
    go func () {
        for {
            line, ok := file.readLine()
            if !ok {
                break
            }
            ch <- line
        }
        close(ch)
    }()
    return ch
}

This lines function creates a channel (the make function) and starts a goroutine (the function literal after the go keyword) that sends lines back over the channel. This could be called as:

for line := range lines(file) {

which will very cleanly iterate over all the lines in the file with no violation of the DRY principle.
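A self-contained version of this generator pattern might look like the following. The lineFile type and the collect helper are hypothetical; only the shape of lines follows the function above.

```go
package main

import "fmt"

// lineFile is a hypothetical stand-in for the net package's file type.
type lineFile struct {
	lines []string
	pos   int
}

// readLine returns the next line and true, or "" and false at the end.
func (f *lineFile) readLine() (string, bool) {
	if f.pos >= len(f.lines) {
		return "", false
	}
	f.pos++
	return f.lines[f.pos-1], true
}

// lines starts a goroutine that sends each line on a channel and closes
// the channel at end of input, which ends the consumer's range loop.
func lines(f *lineFile) <-chan string {
	ch := make(chan string)
	go func() {
		defer close(ch)
		for {
			line, ok := f.readLine()
			if !ok {
				return
			}
			ch <- line
		}
	}()
	return ch
}

// collect drains the channel completely, so the goroutine always exits.
func collect(f *lineFile) []string {
	var out []string
	for line := range lines(f) {
		out = append(out, line)
	}
	return out
}

func main() {
	fmt.Println(collect(&lineFile{lines: []string{"one", "two"}})) // [one two]
}
```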

However, further examination shows that this isn't really ideal. It certainly works in the simple case, but problems arise when you break or return out of the for loop. When you do that, the channel is not destroyed and the goroutine remains in existence trying to write to it, though no one will ever read it again. Go has built in garbage collection that will reclaim unreferenced memory, but not unreferenced goroutines.

In order to clean up properly here, we would need to close the channel after breaking out of the for loop. Strangely only the write end of a channel can be closed and, since the return value of our lines function is currently the read end ( <- chan string ), we need to change it to return the double-ended channel. We also need to declare a variable to hold the channel:

func lines (file *file) (chan string) {
    ch := make(chan string)
    go func () {
        for {
            line, ok := file.readLine()
            if !ok {
                break
            }
            ch <- line
        }
        close(ch)
    }()
    return ch
}
...
c := lines(file)
defer close(c)
for line := range c {
    ...
}

Now we have a for loop that iterates over lines in a file, but that we can break out of without leaking channels or goroutines. However it isn't really elegant any more. Needing to return both ends of the channel, needing to declare a separate variable to hold that channel, and the explicit defer close are all warts which tarnish the elegant:

for line := range lines(file)
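A caveat for anyone trying the cleanup in a current Go release: a send on a closed channel panics, as does closing a channel a second time, so closing the channel from the consuming side is fragile. A pattern often used instead passes a separate "done" channel to the producer. The sketch below is a swapped-in technique, not the code discussed above; lineFile, readLine, and firstLine are hypothetical names.

```go
package main

import "fmt"

// lineFile is a hypothetical stand-in for the net package's file type.
type lineFile struct {
	lines []string
	pos   int
}

// readLine returns the next line and true, or "" and false at the end.
func (f *lineFile) readLine() (string, bool) {
	if f.pos >= len(f.lines) {
		return "", false
	}
	f.pos++
	return f.lines[f.pos-1], true
}

// lines sends lines until input ends or done is closed. Only the producer
// closes ch, so a send never hits a closed channel.
func lines(f *lineFile, done <-chan struct{}) <-chan string {
	ch := make(chan string)
	go func() {
		defer close(ch)
		for {
			line, ok := f.readLine()
			if !ok {
				return
			}
			select {
			case ch <- line:
			case <-done: // the consumer has stopped listening
				return
			}
		}
	}()
	return ch
}

// firstLine leaves the loop early; closing done on return lets the
// producer goroutine exit instead of blocking forever on its send.
func firstLine(f *lineFile) string {
	done := make(chan struct{})
	defer close(done)
	for line := range lines(f, done) {
		return line
	}
	return ""
}

func main() {
	fmt.Println(firstLine(&lineFile{lines: []string{"x", "y"}})) // x
}
```

This variant still needs an extra variable and a defer at every call site, so it does nothing to restore the elegance of the simple range loop.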

The conclusion is that despite the repetition, the form used in the net package of:

for line, ok := file.readLine(); ok; line, ok = file.readLine() {

does seem to be the best way to implement the task. All of the alternatives fall short.

From loops to philosophy

It is in that last observation that part of the philosophy of Go seems to show itself. While Go offers a lot of functionality, it often seems quite restrictive in how this functionality is accessed. This is reminiscent of the 13th aphorism from the Zen of Python:

There should be one — and preferably only one — obvious way to do it.

We see this restrictiveness in for loops where the range syntax is only available for built-in types, and where the first/next structure is really the only way to do other for loops, even if it involves repeating yourself.

We can see a similar pattern with inter-goroutine communication, where channels have a privileged status. There are several language facilities that only work with raw channels, much like for/range only works with built-in data types. Send ( ch <- v ), receive ( v = <-ch ), and the select statement (which is a bit like switch but chooses which of several blocking operations is ready to run) are completely unavailable to user-defined types.
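For readers who have not met select, a minimal sketch follows. The firstReady function is invented for illustration; it uses buffered channels and a default case so that the outcome does not depend on goroutine scheduling.

```go
package main

import "fmt"

// firstReady reports which of two channels has a value ready to receive,
// or "none" if neither does (the default case keeps select from blocking).
func firstReady(a, b <-chan int) string {
	select {
	case v := <-a:
		return fmt.Sprintf("a:%d", v)
	case v := <-b:
		return fmt.Sprintf("b:%d", v)
	default:
		return "none"
	}
}

func main() {
	a := make(chan int, 1)
	b := make(chan int, 1)
	fmt.Println(firstReady(a, b)) // none
	b <- 7                        // send: only b now has a value waiting
	fmt.Println(firstReady(a, b)) // b:7
}
```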

Where Python provides a default implementation for "maps", but allows a class to provide an independent implementation using the same syntax, Go provides a built in "map" data type and permits no substitutes. The Go FAQ makes it clear that this is a conscious decision and not an oversight:

We believe that Go's implementation of maps is strong enough that it will serve for the vast majority of uses.

This is probably why we found so few examples of iterating user-defined data structures in the Go code — maps are used instead.

Finally, even the syntax has an element of restrictiveness. We saw this briefly in a previous article where the handling of semicolons imposes certain style choices on the programmer. We can see it also in the go fmt command, which will reformat the code in a .go file to follow a particular standard. While this is not imposed on programmers, the language designers recommend the use of go fmt to ensure that code follows the one true layout.

This philosophy certainly has a lot to recommend it. By removing options from the programmer, the language removes the need to make choices and so frees the programmer to focus on the actual functionality that they need. It is a philosophy that also imposes heavy requirements on the language and support environment. If there is only one way to do something, then that one way had better work extremely well. Given the vibrant community that has been built up around Go, and the strong emphasis on performance shown in the recent release of Go 1.1, it seems likely that Go does live up to this requirement.

Rusty loops

Turning to Rust we see a very different style of for loop. The example loop we started with which iterates over odd values from 1 to 99 would look like:

for uint::range_step(1, 100, 2) |i| { ... }

Here the:

|i| { ... }

piece is a function literal, similar to those we saw when exploring Go, though with a very different syntax and a different name. Rust, like many other languages, calls it a lambda expression. It consists of a list of formal parameters between vertical bars, and a statement block.

The

uint::range_step(1, 100, 2)

is a reference to a function called range_step in the uint module. The uint::range_step() function actually takes four arguments: start, stop, step, and function. The behavior of range_step() is to call function repeatedly, passing values from start up to stop, incrementing by step each time. Consequently our for loop could be realized simply by:

uint::range_step(1, 100, 2, |i| { ... })

There are two problems with this. A minor point is that the syntax is arguably less pleasing than the first version. More importantly, constructs like break and continue don't have any meaning inside a function literal, so they could not affect the flow of this second loop.

The for statement addresses both of these. It provides syntax for writing the function literal outside the normal list of function parameters and it gives meaning to break, loop (the Rust equivalent of continue), and return.

By convention, the function in the head of for should stop looping when the function argument that it calls returns false. The for statement uses this by effectively translating break to return false and loop to return true. If any return statement appears in the body of the for loop, it is also translated to something that will "do the right thing".
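The convention can be imitated by hand in Go, which makes the translation explicit. The sketch below is a Go analogue, not Rust code: rangeStep and oddsUntil are names invented here, and the return statements in the function literal play the roles that the Rust for statement would generate from break and loop.

```go
package main

import "fmt"

// rangeStep calls body for start, start+step, ... while the value is
// below stop, and stops early as soon as body returns false.
func rangeStep(start, stop, step int, body func(i int) bool) {
	for i := start; i < stop; i += step {
		if !body(i) {
			return
		}
	}
}

// oddsUntil collects odd numbers up to limit, skipping the value 5.
func oddsUntil(limit int) []int {
	var out []int
	rangeStep(1, 100, 2, func(i int) bool {
		if i == 5 {
			return true // what Rust's loop (continue) becomes
		}
		if i > limit {
			return false // what break becomes
		}
		out = append(out, i)
		return true // end of the body: keep iterating
	})
	return out
}

func main() {
	fmt.Println(oddsUntil(9)) // [1 3 7 9]
}
```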

This seems like a fairly complex set of transformations, but the end result is extremely flexible. It allows a very clear separation of the two coroutines that make up a for loop, with the head routine having the full power of a regular function that is able to declare local variables and to communicate in arbitrary ways with the body routine.

Both the "iterate over all the lines in a file" loop which we struggled with in Go, and the radix tree loop from the Linux kernel, would be trivial to implement as an iterator routine in Rust. The first of these would look like:

pub fn every_line(f: @io::Reader, it: &fn(&str) -> bool) {
    while !f.eof() {
        let line = f.read_line();
        if !it(line) { break }
    }
}

and could be called as:

let f = io::file_reader(&Path("/etc/motd")).get();
for every_line(f) |line| {
    io::println(fmt!("Line is %s", line));
}

This power to write elegant iterators is not without its cost. While Rust allows an arbitrary function to provide the head of the for loop, it also requires the head of the for loop to be some function. The simple initialize, test, increment form of C and Go cannot be used.

If we go back and look at the nearly 3000 for loops in the Go source code that use a numeric loop variable, we find that the vast majority of them could be implemented using uint::range_step() or even the simple uint::range() . But not all. Some examples include:

for ; i > 0; i /= 10 {
for (mid = (bot+top)/2; mid < top; mid = (bot+top)/2) {
for n := 1; n <= 256; n *= 2 {
for rate := 0.05; rate < 10; rate *= 2 {
for parent := ".."; ; parent = "../" + parent {

(the last one does not have a numeric variable of course, but is still a useful example).

Several of these could be supported by adding a very small number of extra iterators to the standard library; the rest could just as easily be implemented with a while loop. So this limitation doesn't really restrict Rust to any significant degree.
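Both options can be sketched, again in Go rather than Rust. rangeMul is a hypothetical "extra iterator" covering the n *= 2 style of loop, and countDigits shows the i /= 10 loop written in the condition-only (while) form; neither name comes from any library.

```go
package main

import "fmt"

// rangeMul calls body for start, start*factor, ... while the value is at
// most limit, stopping early if body returns false. It assumes start > 0
// and factor > 1, otherwise the loop would not terminate.
func rangeMul(start, limit, factor int, body func(n int) bool) {
	for n := start; n <= limit; n *= factor {
		if !body(n) {
			return
		}
	}
}

// powersOfTwo collects 1, 2, 4, ... up to and including limit.
func powersOfTwo(limit int) []int {
	var out []int
	rangeMul(1, limit, 2, func(n int) bool {
		out = append(out, n)
		return true
	})
	return out
}

// countDigits counts decimal digits using only a loop condition.
func countDigits(i int) int {
	n := 0
	for i > 0 {
		n++
		i /= 10
	}
	return n
}

func main() {
	fmt.Println(powersOfTwo(256)) // [1 2 4 8 16 32 64 128 256]
	fmt.Println(countDigits(12345)) // 5
}
```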

A Rusty philosophy?

We see, in the for loops of Rust, a very different philosophy to that of Go. While Go forces you into a particular mold, Rust lets you build your own mold with enormous freedom. You could even modify the exact behavior of break inside your for loops if that seems like a useful thing to do.

This freedom and flexibility extends to other parts of Rust too. In last month's article, we saw that Rust does not draw a distinction between expressions and statements, so it allows if and match constructs (the latter being similar to switch ) deeply inside expressions, whereas Go does not permit such things.

Rust goes even further with a rich macro language that can declare which syntactic elements (e.g. identifier, expression, type) may replace each macro parameter, and can repeat the body of the macro if the parameter is a list. This leaning towards extreme flexibility seems to pervade Rust and is reminiscent of the Perl programming motto: There is more than one way to do it.

Summary

There will always be a tension in language design between allowing the programmer freedom of expression and guiding the programmer toward clarity of expression. In a previous article, we saw how the type system of Rust prefers clarity over freedom. Go is not such a stickler, and is satisfied with run-time type checks in places where Rust would insist on compile-time checks. Here, when we look at the structuring of statements and expressions, we find Rust prefers freedom while Go seems more focused on clarity by eliminating unnecessary flexibility.

Which of these is to be preferred is almost certainly a very personal choice. Some people rebel against a constraining environment, others relish the focus it allows them. Both provide room for creativity and productivity. Go and Rust provide very different points in the spectrum of possibilities and it is good to have that choice ... except that it does mean that you have to choose.