In early June, Googler Robert Hundt published a paper comparing the performance of four programming languages: C++, Java, Scala, and a rather new addition to the world of systems programming, Google's own Go. Go is designed to provide the performance of a compiled language like C++ and the "feel" of a dynamic language like Python, but under Hundt's tests, its performance lagged well behind that of Java and Scala as well as C++.

At the time, one Google Go team member indicated the language didn't exactly receive a fair shake. And now, the Go braintrust has taken another crack, using Go's profiling tools to better optimize Hundt's test program. The results are quite different. If nothing else, they show that comparing the performance of various programming languages is not an exact science – and that there's always room for debate.

Identifying and correcting bottlenecks in Hundt's Go program, Googler Russ Cox and team improved its performance by an order of magnitude while using less than one-sixth of the memory. Then, they made the same changes to Hundt's C++ program, and in the end, the two programs ran at similar speeds.

"This shows that, despite its youth, Go is competitive with the other languages presented in the paper," Google Go man Andrew Gerrand tells The Register." And we have barely started optimizing our compiler..."

For their tests, the team used a snapshot of the 6g Go compiler and the GNU C++ compiler that ships with Ubuntu Natty Narwhal. It's unclear which versions Hundt used. The Go team did not test Scala or Java, Cox says, because "we are not skilled at writing efficient programs in either of those languages, so the comparison would be unfair".

Hundt's tests allowed for optimization, but after his paper was published, Go team member Ian Lance Taylor said that very little work went into the Go optimization. "Despite the name, the [ostensibly optimized version of the Go] code was never intended to be an example of idiomatic or efficient Go. Robert [Hundt] asked me to take a look at his code and I hacked on it for an hour to make [it] a little bit nicer. If I had realized that he was going to publish it externally, I would have put a lot more time into making it nicer," Taylor said on the Go mailing list.

Hundt's original (unoptimized) C++ and Go benchmark programs run a particular loop-finding algorithm. In revisiting the tests, Cox and his team ran these programs on a 2.13GHz Core i7 machine with 4GB of RAM running Ubuntu. The C++ code ran in 27.47 seconds and used 700MB of memory, while the Go program ran in 56.92 seconds and used 1604MB of memory. But then they optimized each piece of code.

After fine-tuning Hundt's Go program using the language's profiling tool, gopprof, they dropped its runtime to 3.84 seconds, and the program used only 257MB of memory. Then they translated the optimized code into C++, and according to Cox, the Go program ran slightly faster – though the C++ program was slightly shorter and easier to write because the C++ code relies on automatic allocation and deletion rather than an explicit cache.

"Benchmarks are only as good as the programs they measure," Cox said. "We used gopprof to study an inefficient Go program and then to improve its performance by an order of magnitude and to reduce its memory usage by a factor of six. A subsequent comparison with an equivalently optimized C++ program shows that Go can be competitive with C++ when programmers are careful about how much garbage is generated by inner loops." ®