As a curious Go practitioner, our engineer Maarten Bezemer curates a weekly containing the latest quality Go publications. Recently he was the very first to discover an interesting paper: Understanding Real-World Concurrency Bugs in Go. Its results intrigued me, leading to an interview with the authors in our #live-interviews channel at gophers.slack.com.

As a statically typed programming language, Golang aims to provide an easy, efficient and secure way to develop multithreaded software. Its creators advocate message passing as the means of inter-thread communication and provide several concurrency mechanisms and libraries to ease multithreading. Could this advocacy for message passing have implications for the number of bugs in software built in Go? Assistant professors Yiying Zhang and Linhai Song assessed this by analyzing concurrency bugs in sophisticated Go software such as Docker, Kubernetes and gRPC. Their main conclusion?

"Surprisingly, our study shows that it is as easy to make concurrency bugs with message passing as with shared memory, sometimes even more. For example, around 58% of blocking bugs are caused by message passing. In addition to the violation of Go’s channel usage rules (e.g., waiting on a channel that no one sends data to or close), many concurrency bugs are caused by the mixed usage of message passing and other new semantics and new libraries in Go, which can easily be overlooked but hard to detect."

Their findings eventually intrigued quite a few software developers, leading to debates on Reddit and Hacker News, which also include very critical notes, as we should expect from a smart community. Last month, Zhang and Song also presented their findings at the ASPLOS conference. Their talk, which you can review by checking its slides, is summarized in the lightning talk above. This week I interviewed them to find out how they view message passing nowadays and which potential improvements they identify. Read on!

Yiying Zhang:

This is joint work between me and Linhai (Purdue University and Penn State University). I think our motivations were a bit different. On my side, I was looking into message passing vs. shared memory at the systems level (e.g., where and how to build message passing and shared memory mechanisms). I then had the question of how much modern programming languages and applications actually use message passing and shared memory, and what their mechanisms are. Go is a relatively new and popular language that supports both. So that’s how I chose Go and the focus of this study.

Linhai Song:

I have worked on how to combat different types of bugs since my PhD. In order to better understand real-world problems, my PhD advisor always asked us to start with an empirical study. Go is a new and widely used language. It is designed for concurrency. Therefore, we thought it would be interesting to see whether there are concurrency bugs in Go and what they are like.

Yiying Zhang:

Personally I wasn’t. I had heard a lot about it but hadn’t actually used it before the study. I’m a Linux kernel programmer by training. Good old C 🙂.

Linhai Song:

I learned Go when I worked at FireEye. I had some experience before our research project, but I was not an expert. I learned a lot of new things about Go when studying the bugs. I also wrote a lot of toy programs to validate my understanding through the study. I am an expert in concurrency bugs in C.

Linhai Song: I thought message passing was better. I have seen a lot of concurrency bugs in C, and most of them are caused by failing to protect shared memory. I have always known that shared memory is error-prone.

Yiying Zhang: I was leaning more towards message passing as the mechanism that could be less error-prone. I have worked with distributed systems a lot. There, I often see message passing as a more explicit and easier way of reasoning about ordering.

Yiying Zhang: Let me nuance the statements in the video a little bit. These were based on the absolute number of bugs. But we found that message passing is used a lot less often than shared memory. If you compare bugs in a relative sense, then there are more bugs caused by message passing, for both blocking and non-blocking bugs.

Linhai Song:

I agree with Yiying. We did not mention the relative numbers in our paper because we could only compute estimates, and we felt these were not precise enough.

Yiying Zhang: Here you see that roughly speaking, 80% of concurrency usages are carried out through shared memory.

Yiying Zhang: Yes. For Go, and for our studied software, at least.

Linhai Song: Yes. The number of bugs caused by message passing should be multiplied by four if we take the percentage of usages into account.
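To make this weighting concrete, here is a minimal sketch of the arithmetic. It assumes the rough 80/20 usage split mentioned above and the 58% share of blocking bugs quoted from the paper. The function names are hypothetical, and treating the remaining 42% of blocking bugs as shared-memory bugs is an assumption of this sketch, not a figure from the interview.

```go
package main

import "fmt"

// usageWeight returns the factor by which bug counts of the less-used
// mechanism must be scaled to compare them per unit of usage.
func usageWeight(sharedUsage, messageUsage float64) float64 {
	return sharedUsage / messageUsage
}

// perUsageRatio compares bugs per unit of usage: message passing relative to
// shared memory. Both arguments are fractions between 0 and 1, and the
// shared-memory shares are assumed to be the complements.
func perUsageRatio(messageBugShare, messageUsageShare float64) float64 {
	sharedBugShare := 1 - messageBugShare
	sharedUsageShare := 1 - messageUsageShare
	return (messageBugShare / messageUsageShare) / (sharedBugShare / sharedUsageShare)
}

func main() {
	// Rough usage split from the interview: 80% shared memory, 20% message passing.
	fmt.Printf("weight for message-passing bugs: %.1f\n", usageWeight(0.80, 0.20))
	// 58% of blocking bugs are attributed to message passing in the paper.
	fmt.Printf("blocking bugs per usage, message passing vs. shared memory: %.2f\n",
		perUsageRatio(0.58, 0.20))
}
```

Under these assumptions the weight is four, and blocking bugs come out roughly 5.5 times as frequent per usage for message passing. As Linhai Song notes, such figures are estimates only.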

Linhai Song: Developers may simply not be familiar enough with message passing. Additionally, the tool support for message passing may not be good enough. I think these two are currently the main reasons why message passing is more error-prone in practice in Go, since much more development and research effort is spent on shared memory now.

Yiying Zhang: We found that many bugs are caused by misuse of Go’s message passing primitives (e.g., buffered and unbuffered channels) and by combining message passing with other (misunderstood) Go semantics (e.g., `select`). Some of these are more specific to Go than others. For example, Go’s `select` semantics are quite unique, but buffered and unbuffered channel semantics also exist in other languages and can cause concurrency bugs there too. I was recently asked whether a better type system can solve concurrency bugs caused by buffered vs. unbuffered channels. This also made me feel that more tool support is needed for message passing.

Yiying Zhang: But still, intuitively and maybe theoretically too, message passing should actually be less error-prone, because fundamentally it forces programmers to think about what, how, and when data is communicated across threads.
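The buffered versus unbuffered channel misuse mentioned above can be illustrated with a minimal sketch. This is not code from the studied projects; the function names, durations and payload are hypothetical. A caller waits for a worker with `select` and a timeout. If the result channel is unbuffered and the caller gives up first, the worker blocks on its send forever. Giving the channel a capacity of one lets the worker finish.

```go
package main

import (
	"fmt"
	"time"
)

// requestWithTimeout starts a slow worker and waits at most timeout for its
// result. bufSize is the capacity of the result channel: 0 gives an
// unbuffered channel, 1 lets the worker send without a waiting receiver.
// The returned channel is closed once the worker's send has completed.
func requestWithTimeout(timeout, workTime time.Duration, bufSize int) (got bool, sent <-chan struct{}) {
	result := make(chan string, bufSize)
	done := make(chan struct{})
	go func() {
		// Simulated slow work.
		time.Sleep(workTime)
		// On an unbuffered channel this send blocks until someone receives.
		result <- "payload"
		close(done)
	}()
	select {
	case <-result:
		return true, done
	case <-time.After(timeout):
		// The caller gives up. With an unbuffered channel nobody will ever
		// receive, so the worker goroutine stays blocked on its send.
		return false, done
	}
}

// workerFinished runs one timed-out request and reports whether the worker
// goroutine managed to complete its send afterwards.
func workerFinished(bufSize int) bool {
	_, sent := requestWithTimeout(10*time.Millisecond, 50*time.Millisecond, bufSize)
	select {
	case <-sent:
		return true
	case <-time.After(300 * time.Millisecond):
		return false
	}
}

func main() {
	fmt.Println("unbuffered, worker finished:", workerFinished(0))
	fmt.Println("buffered(1), worker finished:", workerFinished(1))
}
```

In the unbuffered run the worker goroutine is deliberately left blocked to show the problem. In a long-running service, each such request would leave another stuck goroutine behind.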

Linhai Song: This is really a tough question. Maybe the whole practice should start from the design of a concurrent system. We should think more about how threads / goroutines communicate, instead of how they share. If we have a message-based design, the implementation becomes easier.

Yiying Zhang: In a sense, when used correctly, there won’t be concurrency bugs whether you use message passing or shared memory, and when used incorrectly, you could always end up with bugs. The question to me, from a programming language and software engineering point of view, is how to help programmers avoid concurrency bugs as much as we can, and which mechanism makes that easier to achieve. For that, I still think message passing has a better future, simply because it is a more explicit way. It could also make bug detection easier.

Linhai Song: I agree. It is also easier to reason about what happened when a concurrency bug is triggered. We can check the trace of messages.

A member of the #live-interviews channel and spectator of the interview, Banzai Cloud engineer Márk Sági-Kazár, at some point joined the conversation with a complementary question:

Linhai Song: Yes, I think so. But I also consider the lack of tool support for message passing problematic. We know, for example, that Go programmers widely use the Go race detector in testing, but this race detector is known mainly for helping to capture bugs caused by shared memory.
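For context, the sketch below shows the kind of shared-memory access the race detector is designed to check. The function name is hypothetical. Removing the mutex would leave an unsynchronized write to `total`, a data race of the sort the detector (enabled with the `-race` flag of `go run` or `go test`) is built to report. A goroutine stuck on a channel operation, as in the blocking bugs discussed in this interview, is not a data race.

```go
package main

import (
	"fmt"
	"sync"
)

// countWithMutex increments a shared counter from n goroutines. The mutex
// serializes the increments, so the read-modify-write on total is not a race.
func countWithMutex(n int) int {
	var (
		mu    sync.Mutex
		wg    sync.WaitGroup
		total int
	)
	for i := 0; i < n; i++ {
		wg.Add(1)
		go func() {
			defer wg.Done()
			mu.Lock()
			// Without the lock, this would be an unsynchronized shared write.
			total++
			mu.Unlock()
		}()
	}
	// Wait for all goroutines before reading the result.
	wg.Wait()
	return total
}

func main() {
	fmt.Println(countWithMutex(100))
}
```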

Yiying Zhang: Well, reading our paper could be a good start 🙂.

Linhai Song: For shared memory, developers should always be careful about what is shared across threads and where the reads and writes are. For message passing, my feeling is that developers should always check whether there is a corresponding sender or receiver.

Yiying Zhang: The other thing that’s more specific to Go is when you are using message passing with other new semantics/primitives/libraries in Go. Citing our paper:

"Rules of channel and complexity of using channel with other Go-specific semantics and libraries are the reasons why these non-blocking bugs happen (...) When used correctly, message passing can be less prone to non-blocking bugs than shared memory accesses. However, the intricate design of message passing in a language can cause these bugs to be especially hard to find when combining with other language-specific features."

Yiying Zhang: Shared memory bugs, on the other hand, have similar causes and similar fixes as in traditional languages. There are exceptions of course, which are more specific to Go. For example, misusing anonymous functions together with shared memory could cause bugs, and anonymous functions do not exist in traditional languages. Another interesting finding is that when using message passing together with shared memory (people do that sometimes), it’s easy to have bugs.
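Linhai Song’s advice to check for a corresponding sender or receiver can be illustrated with a minimal sketch. The names `produce` and `sum` are hypothetical and not taken from the paper. The receiver uses `range`, which only terminates once the channel is closed, so the sender is responsible for closing it when it has nothing more to send.

```go
package main

import "fmt"

// produce sends the values 1..n on a new channel and then closes it.
// Closing is what tells a receiver using range that no more data will come.
func produce(n int) <-chan int {
	ch := make(chan int)
	go func() {
		// Without this close, the range loop in sum would block forever
		// after the last value has been received.
		defer close(ch)
		for i := 1; i <= n; i++ {
			ch <- i
		}
	}()
	return ch
}

// sum receives until the channel is closed and returns the total.
func sum(ch <-chan int) int {
	total := 0
	for v := range ch {
		total += v
	}
	return total
}

func main() {
	fmt.Println(sum(produce(10)))
}
```

Every send here has a matching receive in `sum`, and the final `close` stands in for the "no more senders" signal that the receiver would otherwise wait for indefinitely.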