New: 25July2013, updated 28Jan2018 (13Sep2020: typo)

This page is in group Technology.

Intro

This blog page is meant to be a reflection based on these two lectures and my own views:

“Go concurrency patterns” by Rob Pike (Google) [1] and “C++ Concurrency, 2012 State of the Art (and Standard)” by Herb Sutter (C++ leading authority) [2].

The lectures are 4178 km and 39 days apart – provided they are on the same planet. I will investigate. Disclaimer: to learn the lectures’ contents, see them. This page will not sum them up.

At the bottom are Comments. Rob Pike has mailed me, and Herb Sutter has added comments.

Looking back

I have had the opportunity to talk with Bjarne Stroustrup once. It was at the Simula-67 25 years anniversary at the University of Oslo 22June1992 [3]. I asked Stroustrup why he hadn’t built concurrency into C++. I wrote down the reply immediately, and have kept it to this day.

He replied that there were at least “forty different solutions” for it, and that he did not want to use any one of them. The user should then rather rely on a library. Dag Brück [4] at that time coordinated C++ real-time programming. One had to look after “everything”, so that it matched what they had been making. I asked about occam [5], and he said that it was especially targeted towards a certain model (CSP [6], C.A.R. Hoare was also there) and thus it could be “better” than a certain library. I also asked about aliasing and side effects (which occam is free of, except for the side effect that communication constitutes), and Stroustrup said that both of these certainly easily appear in C++ (some are by design, as doubly linked lists). Stroustrup also said that how a process maps to an object (part of the “process model”) therefore depends on the model chosen (for a C++ library). I asked him why he had built on C, and he replied it was because it was available. He said that efficiency and generality were the only two factors he related to, and that these were not necessarily mutually exclusive. End of summary (more here). Except we spoke in our native languages, Danish and Norwegian. By the way, we’re of the same age.

(31Jul2013: I deleted a paragraph here after Rob Pike mailed that “Holmdel was a different location than Murray Hill. The transistor, C, C++, Unix, and the laser all came out of Murray Hill. Holmdel had more to do with radio: things like discovering radio astronomy and the big bang background radiation.” I certainly had recollected that wrongly!) (Holmdel, Murray Hill)

Later on, in the late nineties (when the internet was available), I discovered Limbo [7], a language designed at Bell Labs by, among others, Rob Pike. When I saw its syntax with “<chan of protocol>” without any mention of ‘occam’, ‘CSP’ or ‘Hoare’ in the document, I, somewhat offended on behalf of my favourite language, sent them an email. Of course, it stopped there. (After I wrote this note I discovered a paper by Rob Pike that predated Limbo, where he certainly acknowledges CSP’s influence on Newsqueak, see [29]. Following that, I have even found the very paper I read. In my private archive! A revised version (2005) is still available, see [30].)

Until Go appeared in 2009 [8]. Rob Pike was again on the language design team.

Of Herb Sutter I know nothing more than the fact that I have mentioned his seminal paper “The free lunch is over” in my Nondeterminism blog note, see [21].

In August 2010 I looked briefly at the concurrency parts of the C1X and C++0x Working Drafts for C and C++ in my blog note 022.

“Bell Labs and CSP Threads”

Russ Cox has written a very interesting note with lots of references entitled “Bell Labs and CSP Threads“, see [18]. Here are some sentences, taken out of their context and just jotted down, as a teaser for you to read the full note. Text in gray has been added by me:

The power (of CSP channels) has been forcefully demonstrated by the success of the filter-and-pipeline approach for which the Unix operating system is well known [2]. Indeed, pipelines predate Hoare’s paper. In an internal Bell Labs memo dated October 11, 1964, Doug McIlroy was toying with ideas that would become Unix pipelines: “We should have some ways of coupling programs like garden hose–screw in another segment when it becomes necessary to massage data in another way.” Of course, the Unix pipe mechanism doesn’t require the linear layout; only the shell syntax does. McIlroy reports toying with syntax for a shell with general plumbing early on but not liking the syntax enough to implement it (personal communication, 2011). Later shells did support some restricted forms of non-linear pipelines. Rochkind’s 2dsh supports dags; Tom Duff’s rc supports trees. In 1980, barely two years after Hoare’s paper, Gerard Holzmann and Rob Pike created a protocol analyzer called pan that takes a CSP dialect as input. Holzmann’s protocol analyzer developed into the Spin model checker and its Promela language, which features first-class channels in the same way as Newsqueak. Luca Cardelli and Rob Pike developed the ideas in CSP into the Squeak mini-language. Pike later expanded Squeak into the fully-fledged programming language Newsqueak [5][6] which begat Plan 9’s Alef [7] [8], Inferno’s Limbo [9], and Google’s Go [13]. In a similar vein, Rob Pike demonstrated how the communication facilities can be employed to break out of the common event-based programming model, writing a concurrent window system. Plan 9 has no select call, and even on Unix you need multiple procs if you want to overlap computation with non-network I/O. It is interesting that despite this, the language (Limbo) provides no real support for locking. Instead, the channel communication typically provides enough synchronization and encourages programmers to arrange that there is always a clear owner for any piece of data. Explicit locking is unnecessary. Rob Pike’s half of his 2010 Google I/O talk with Russ Cox shows how to use channels and Go’s concurrency to implement a load balancing work management system.

I would like to understand “..even on Unix you need multiple procs if you want to overlap computation with non-network I/O.”

My true colours

I will warn you, to spare you from reading any further. I think that Sutter’s statements that everything blocking is “bad” or even “evil”, and that everything non-blocking is “good”, are wrong. I think it’s more balanced: both blocking and non-blocking may be bad, evil or good. Pike does not use such one-sided arguments. Personally I have come up with a published idea that merges the two disciplines, the xchan. I would have considered anybody who even thinks he has a suggestion for a solution to be a charlatan! So, who am I, then?

I am Øyvind or Oyvind Teig, having worked 35++ years with safety-critical fire detection real-time systems at Autronica [10] in Trondheim in Norway. I have published some in magazines and done some peer reviewed papers [11], and I write blogs like this that some read. Of course, this page is not endorsed by Autronica. I just work there, and this is how I see concurrency matters. I am allowed to say it, just as any who would have contrary views would have been allowed to. The discussion is welcomed. As long as we never disclose product-sensitive matters. Of course I don’t.

So, the xchan or XCHAN leans some on the Linux/Posix operating systems, where a write to a full non-blocking pipe may return EAGAIN or EWOULDBLOCK as an error, which is one way to handle flow control. It leans some on the CSP blocking channel, and some on the CSP-composed buffered channel. It has not been implemented in any language (I hope: “yet”!).

XCHAN send will never block, not even in its zero-buffered version; it will enable overflow to be handled at application level; it will make it possible to flush earlier low priority messages when overflow occurs (so a message doesn’t have to be “lost” in a pipe or queue); it is buffered (if that is what you really need) or unbuffered; it breaks any deadlock cycles; and it should not have much overhead. It is send and ..you are not allowed to really.. forget. If you are still with me, you may read the paper XCHANs: Notes on a New Channel Type at [12]. (Another follow-up paper Selective Choice “Feathering” with XCHANs is to be presented at CPA-2013, see [23])
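To make the “send that never blocks” idea a little more concrete, here is a minimal Go sketch. It is not the XCHAN of the paper [12]: the helper name trySend is made up for this example, the one-position buffer is chosen only for illustration, and the ready signal back to the sender is left out. It only shows a send that reports “not taken” instead of blocking, so that the application can decide what to do on overflow.

```go
package main

import "fmt"

// trySend is a hypothetical helper (not part of Go or of XCHAN): it attempts
// a send and reports whether the value was taken, instead of blocking when
// the channel cannot accept it.
func trySend(ch chan<- int, v int) bool {
	select {
	case ch <- v:
		return true
	default:
		return false // comparable in spirit to EAGAIN / EWOULDBLOCK
	}
}

func main() {
	ch := make(chan int, 1) // one-position buffer, for illustration only

	fmt.Println(trySend(ch, 1)) // true: the buffer had room
	fmt.Println(trySend(ch, 2)) // false: full, the application decides what to do

	// Application-level overflow handling: flush the older value and
	// send the newer one instead.
	<-ch
	fmt.Println(trySend(ch, 2)) // true
	fmt.Println(<-ch)           // 2
}
```

The sender here never waits, but it does have to look at the return value, which is the “not allowed to forget” part.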

I have also tried to blog about SDL (Specification and Description Language) which has an asynchronous, buffered, non-blocking foundation, see my blog 056, “Some questions about SDL” [13]. The need for infinite buffer sizes and missing WYSIWYG semantics certainly in my opinion make asynchronous non-blocking systems problematic for potentially life saving software. That blog is commented point for point by a named university professor, who I deliberately asked to comment because I knew we disagreed. If you want to comment, please do (there or here), or send me a mail.

By the way, I have not programmed a single C++ line in my life. So I was enlightened when I discovered the C++ concurrency cheatsheet, see [22].

Main differences

The best thing you can do right now is to listen to Pike and Sutter’s lectures. Don’t skip anything. With a break in between you need 2.5 hours. I hope you will come back for my ponderings here,.. tomorrow! Their messages are just as important as mine.

The most evident differences I see here:

Rob Pike (transcript):

“Now for those experts in the room who know about buffered channels in Go – which exist – you can create a channel with a buffer. And buffered channels have the property that they don’t synchronize when you send, because you can just drop a value in the buffer and keep going. So they have different properties. And they’re kind of subtle. They’re very useful for certain problems, but you don’t need them. And we’re not going to use them at all in our examples today, because I don’t want to complicate life by explaining them.”

Herb Sutter (collection of my scribblings):

Queue is way more scalable because you don’t wait. Don’t stop! We hate to block! Blocking is almost always harmful. You can always turn blocking “the bad thing” into non-blocking “the good thing” via async() at the cost of occupying a thread. Anything shared is evil. Blocking code is not good for the universe.

I don’t really know what to say. I was perplexed to hear Sutter’s comments. Even though he explains his opinion and argues for it, I am just as bewildered.

“You can always turn blocking “the bad thing” into non-blocking “the good thing” via async() at the cost of occupying a thread” has been commented by Sutter and me below.

Futures

Sutter’s world is C++. It did not have support for concurrency. C++11 has some, but he’s really talking about what lies beyond even that, so most of what he talks about isn’t there yet, like “future.then, when_any, when_all, concurrent_queue and concurrent<T>”. (Read Sutter’s comments #2 where he says that future.then already exists.) Each of them is nice and he argues well for them and explains why they are needed. Pike’s world is a language that from day one has had support for concurrency. This means that the compiler has had to know about scheduling and rescheduling. Go had to come with a scheduler and a library, where C++ came with a library. Go knows what a process is. Both in addition know what their object is.
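For readers who think in Go, the async() plus future pattern can be approximated with a goroutine and a channel. The sketch below is only an analogy and not the C++ API: the helper names async and slowSquare are made up for this example, and the cost is a goroutine rather than an operating system thread.

```go
package main

import (
	"fmt"
	"time"
)

// async is a hypothetical helper: it runs f in its own goroutine and returns
// a channel that will carry the single result, playing the role of a future.
func async(f func() int) <-chan int {
	result := make(chan int, 1) // room for one value, so the goroutine never waits for a reader
	go func() { result <- f() }()
	return result
}

// slowSquare stands in for some call that blocks for a while.
func slowSquare(n int) int {
	time.Sleep(10 * time.Millisecond)
	return n * n
}

func main() {
	future := async(func() int { return slowSquare(7) })
	fmt.Println("the caller is free to do other work here")
	fmt.Println(<-future) // blocks only now, until the result is ready: 49
}
```

Note that the caller still blocks in the end, at the receive. The blocking has been moved to a place where the caller chose to wait.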

“Waiting faster” (Hoare)

Sutter argues that to get responsiveness one can’t block. This is both right and misleading. He certainly indicates that this is dependent on the level of the software, “you don’t want everything to be asynchronous”. There is a layer in his talk that tells when blocking is ok. I like the bus metaphor quite well. Hoare (then at Microsoft Research in Cambridge, UK) has a paper called “Concurrent programs wait faster” [14].

The mess: waiting for one (sending, receiving, work), waiting for a set of (sending, receiving, work), blocking and synchronizing.

Hoare talks about “waiting faster” meaning that waiting is ok if you do it right! Like Sutter I have nothing to “wait with” when I press a button in a dialogue box. I want “immediate” response. However, this is not so much a matter of blocking on the sending of the button press as it is a matter of scheduling and having ready cycles, and of granularity – or “parallel slackness” (enough and small processes/threads). It is also an architectural matter. If the button press is sent to a standard (=big) event handler loop I would tend to agree with Sutter. Also, as he points out, it’s not necessarily easy to spot in the code if the code would (pathologically or not) block. My comment: a good naming convention with link level cues always helps. What this event handler is handling is difficult to handle.

Specification-driven

But I am used to specifications. I would see specified that the key press should give immediate response, and not that code down there must never block! Ok, so I use the tools and methodologies I have. I remember the transputer days in the early nineties. There was a plug-in board with a transputer on (IMS B008), and there was a GUI library for Windows (yes: 3.0, later 3.1). The library sent non-buffered key press events to occam programs that ran on the transputer. Every part of the message path was synchronous, i.e. blocking. But there was nothing wrong with the response. How come? Because the code was blocking, but it didn’t block in that situation, by design. My that-dialogue-box-event-handler in occam was always ready to run, in parallel (concurrently) with a bunch of other-dialogue-box-event-handlers that were also always ready to run. I could have started a big job from the previous dialogue box; like getting a combustion pressure curve from a five-floor-high diesel engine on a large ship, that rotated at something like 62.5 RPM, and I needed several runs to build the curve; it took seconds [15]. But the immediately following dialogue box’s buttons I was of course still able to serve with immediate response, by another process. The two were not in the same event loop. They each had their own.
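The point about each dialogue box having its own handler can be sketched in Go as follows. This is not the occam program described above: the names handler and run, the event strings and the release channel that stands in for the long job are all assumptions made for the example. Every channel is unbuffered, so every communication blocks, yet handler B answers at once while handler A is busy.

```go
package main

import "fmt"

// handler is a hypothetical dialogue-box process: it owns its own event
// channel, does the work for each event and then reports back.
func handler(name string, events <-chan string, replies chan<- string, work func()) {
	for ev := range events {
		work()
		replies <- name + " handled " + ev
	}
}

// run starts two handlers, keeps the first one busy with a long job and
// returns the replies in the order they arrive.
func run() []string {
	replies := make(chan string)
	release := make(chan struct{})
	boxA := make(chan string)
	boxB := make(chan string)

	go handler("A", boxA, replies, func() { <-release }) // long job, ends when released
	go handler("B", boxB, replies, func() {})            // nothing slow to do

	boxA <- "start"    // A accepts the event and becomes busy
	boxB <- "click"    // B is still ready to run, although A is busy
	first := <-replies // can only come from B at this point
	close(release)     // let the long job finish
	second := <-replies
	return []string{first, second}
}

func main() {
	for _, r := range run() {
		fmt.Println(r)
	}
}
```

This prints “B handled click” before “A handled start”: the code is blocking, but it does not block in this situation, by design.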

But Hoare really discusses waiting smarter as waiting for something to happen. Don’t wait for bus 43 only, if bus 34 also takes you there. Wait for the first of 43 or 34, but not for 42 or 21! But then, this means that if I had only waited for bus 43, then bus 34 would painfully have to block to have its arrival announced. Have you seen the bus at the airport, when you know it’s coming for you, but why on earth is it then standing still? So, sending is just another side of receiving.
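Waiting for the first of bus 43 or bus 34 is what Go’s select statement expresses. The following is a small sketch under assumptions: the helper names bus and firstOf and the delays are invented for the example. Each arrival channel is given room for one value so that the bus that loses is not left blocked trying to announce itself; with unbuffered channels it would be, which is the point made above.

```go
package main

import (
	"fmt"
	"time"
)

// bus returns a channel on which the given line announces its arrival after
// the given delay. The line numbers and delays are made up.
func bus(line int, after time.Duration) <-chan int {
	arrived := make(chan int, 1)
	go func() {
		time.Sleep(after)
		arrived <- line
	}()
	return arrived
}

// firstOf waits for whichever of the two buses arrives first.
func firstOf(a, b <-chan int) int {
	select {
	case line := <-a:
		return line
	case line := <-b:
		return line
	}
}

func main() {
	bus43 := bus(43, 50*time.Millisecond)
	bus34 := bus(34, 5*time.Millisecond)
	fmt.Println("took bus", firstOf(bus43, bus34))
}
```

Buses 42 and 21 are simply not mentioned in the select, so their arrivals never wake the waiting passenger.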

This is where synchronizing comes in. In CSP, with non-buffered channels, as for occam and Go, communication and synchronizing are the same events. It could move data from one core to another when both sender and receiver are ready, it could move data from one machine to another. Go moves pointers to the channels and keeps track of them, which limits its use to shared memory. A channel is only a key to share data during the communication. After data has been built, put its pointer and length in the channel. If the receiver is ready, let the channel do the memcpy and make ready to schedule the receiver. Then let the sender go on. It will not block, and the pointer is ready afterwards for new data. If the receiver is not ready, let the pointer and data be untouched (invariant) by returning to the scheduler. This is “blocking”. When the receiver is ready, data is moved over the channel and the sender is rescheduled. The “first” or “second” on the channel is the driving force here, not “sender” or “receiver”.
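At the language level, the “communication and synchronizing are the same event” property looks like this in Go. It is a minimal sketch that says nothing about how the runtime implements channels; the function name rendezvous and the message text are assumptions for the example.

```go
package main

import "fmt"

// rendezvous passes v over an unbuffered channel to a receiving goroutine
// and returns what the receiver reported back.
func rendezvous(v string) string {
	data := make(chan string) // unbuffered: send and receive complete together
	got := make(chan string)

	go func() {
		msg := <-data // whoever arrives first waits for the other
		got <- "received " + msg
	}()

	data <- v // returns only once the receiver has taken the value
	return <-got
}

func main() {
	fmt.Println(rendezvous("pressure sample")) // received pressure sample
}
```

Whether the sender or the receiver reaches the channel first does not matter for the result; the one that is “first” waits, the one that is “second” completes the communication.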

There is nothing that the blocked process should have done that it didn’t get done while being blocked. Simply because, by design, in that case the schedule was empty! Since nobody asked it to do anything then there is nothing wrong with not doing it!

But if the requirement was to do something after data was ready to be sent, then design for it! Add buffer processes or use a buffered channel, or use my more elegant XCHAN.
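One way to read “add buffer processes” in Go terms is a small goroutine placed between two unbuffered channels. The sketch below assumes a one-place buffer and uses the made-up name buffer; it lets the producer hand over one item and go on before the consumer is ready.

```go
package main

import "fmt"

// buffer is a hypothetical one-place buffer process: it sits between two
// unbuffered channels, holds at most one item and passes it on.
func buffer(in <-chan int, out chan<- int) {
	for v := range in {
		out <- v
	}
	close(out)
}

func main() {
	in := make(chan int)
	out := make(chan int)
	go buffer(in, out)

	in <- 1 // taken by the buffer process, although nobody reads from out yet
	fmt.Println("producer continues after handing over 1")
	fmt.Println(<-out) // 1

	close(in)
	_, ok := <-out
	fmt.Println(ok) // false: the buffer process has closed out
}
```

A second send before the first item has been taken would still block, so the buffering is bounded and visible in the design. Several such processes in a row give a longer buffer.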

So we could say that what is natural is waiting for one (sending, receiving, work), waiting for a set of (sending, receiving, work), blocking or synchronizing.

None of these are untouchables! They are all tools in a chest.

In occam-pi (occam-π) [16] a mobile channel could even send data by passing over a pointer and then the language saw to it that that pointer went out of scope: zero-copying.

Building concurrency on CSP in Go

Much of this is what Rob Pike also talked about. Go’s concurrency is based on CSP. In “Why build concurrency on the ideas of CSP?” [17] they write:

Concurrency and multi-threaded programming have a reputation for difficulty. We believe this is due partly to complex designs such as pthreads and partly to overemphasis on low-level details such as mutexes, condition variables, and memory barriers. Higher-level interfaces enable much simpler code, even if there are still mutexes and such under the covers. One of the most successful models for providing high-level linguistic support for concurrency comes from Hoare’s Communicating Sequential Processes, or CSP. Occam and Erlang are two well known languages that stem from CSP. Go’s concurrency primitives derive from a different part of the family tree whose main contribution is the powerful notion of channels as first class objects. Experience with several earlier languages has shown that the CSP model fits well into a procedural language framework.

Misunderstanding?

Around 08:45 Sutter says that asynchronous / non-blocking code is needed in order to reach the goal of composability. He says that off-the-shelf components need to be asynchronous in order to be connected. Is he then saying that using RPC (Remote Procedure Call, where remote is on any machine) is also “bad”? When I try to open a file on my network disk I can see it fine (the directory is often cached), but sometimes I have to wait for the disk to spin up before the picture appears. Sometimes I, like Sutter, lament that an application couldn’t do this or that while what I just asked for is being picked up. My XCHAN would be a good component here, but it’s the same as doing a non-blocking call to the requested service and, if it’s not available, just returning that not-ready info. Then wait for a signal, event or channel to say ready, and when it arrives, read the data. Back to Hoare: “concurrent programs wait faster”. What we need is a mechanism where my server could in fact handle this or that while it’s waiting. The ALT or selective choice is that mechanism. When Sutter doesn’t use that mechanism (Unix select is in the right direction), then he would understandably state that everything needs to be asynchronous. But would he perhaps then make busy-polling a little too tempting for the programmer? Remember, the async call was for a service that was not ready. Good, so programming with futures makes this possible! But the more Sutter is going to use futures (and result.then), won’t the requirement for everything to be async fall? My bet is that Sutter in his laments isn’t really thinking concurrency (like Hoare).

Many futures?

By the way, is waiting for N-futures the same as a selective choice? I have discussed select in Go in a blog note, see [19]. I’d certainly like to write about N-future thinking as well.
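As a starting point for that comparison, here is how waits in the style of when_any and when_all might look in Go when each future is modelled as a channel that delivers one value. This is a sketch under assumptions, not the C++ proposals: whenAny, whenAll and ready are names invented for the example, and the forwarding goroutines in whenAny are left running for futures that never deliver.

```go
package main

import "fmt"

// whenAny returns the first value delivered by any of the futures.
func whenAny(futures ...<-chan int) int {
	first := make(chan int, len(futures)) // room for all, so no forwarder is blocked on sending
	for _, f := range futures {
		go func(f <-chan int) { first <- <-f }(f)
	}
	return <-first
}

// whenAll waits for every future and returns the values in argument order.
func whenAll(futures ...<-chan int) []int {
	results := make([]int, len(futures))
	for i, f := range futures {
		results[i] = <-f
	}
	return results
}

// ready returns a future whose value is already available.
func ready(v int) <-chan int {
	ch := make(chan int, 1)
	ch <- v
	return ch
}

func main() {
	fmt.Println(whenAll(ready(1), ready(2), ready(3))) // [1 2 3]

	pending := make(chan int)                // a future that never delivers
	fmt.Println(whenAny(pending, ready(42))) // 42
}
```

With a fixed, known set of futures, whenAny could be written directly as a select with one receive case per future, which suggests that the two ideas are at least close relatives.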

Queuing

Around 22:46 Sutter states something like you can always turn blocking “the bad thing” into non-blocking “the good thing” via async() at the cost of occupying a thread. With my XCHAN you won’t need that extra thread, and the bad thing is never present, only the good thing. But do bear in mind that XCHAN simply makes a language primitive out of a software pattern. There is nothing bad or good here; what matters is mostly whether it fulfills the requirement, even if I have used “a life” to try to share my experience that starting with synchronous at the bottom is best. It will never require infinite buffers, not even one buffer, to start with. Ada started with synchronicity (but then its concurrency is based on CSP, just like occam and Go), and the Integrity operating system from Green Hills Software does it (but then it started its life as a runtime system for Ada). The Ada implementation of rendezvous uses the queues that Sutter acclaims so much. The same queues that introduced “bad nondeterminism” (determinism may also be “good”) and caused Ada to have to define a subset. It wouldn’t be like a MISRA subset, which I think only covers single-threaded matters. It’s called the Ada Ravenscar profile for safety-critical systems. I have blogged about it, see [20].

But Go’s selective choice is also built on queuing, just like the mentioned Ada rendezvous, but unlike occam’s ALT. The Go designers say it is not designed for safety critical systems. Pike does not cover this in his lecture, but it’s stated somewhere in the golang-nuts group, in one of the threads I have started. The mentioned blog about nondeterminism also tries to discuss this, see 049.

Buffered channel

Do observe that Go’s buffered channel has a dimension. When an N-position buffered channel is full, blocking semantics kick in. The sender has to catch this behaviour in a proper way, to avoid spilling the wine all over; that takes more than just saying that treating something as natural as a full bucket is bad for the universe.

Nondeterminism

After 42:32 Rob Pike mentions non-determinism. Although he doesn’t cover it much, here is a collection of sentences after that time:

Are you worried about the non-determinism?

It’s actually kind of a non-issue because of the way the language works.

..Nowhere does this function here know what that channel has behind it.

..that channel is, in effect, the capability to a service,..

..In fact, this might be a mock for a service that returns values at arbitrary intervals.

..the whole idea of a channel is that it hides what’s behind it.

..It’s a first-class citizen in the language

As mentioned, I have tried to write some about nondeterminism in a blog note, see [21].

Conclusion

As I see it, future is a pattern hardened into C++, a basically non-concurrent language: forged into C++11, with more functionality to come in C++future. To me the last point, that C++ is non-concurrent (ref. Stroustrup’s statement to me in 1992), helps explain why Sutter seems so “blocked on non-blocking”. About the first point (futures cannot produce results before.. some time in the future.): allowing the same process that uses a future to also do something in the meantime is then “non-blocking”, but having partitioned the architecture so that this particular process has nothing to do in the meantime is “blocking without harm, per design, the other processes are able to get the world going”. The interplay between the architectural design and the use of futures, I think, will cause C++11 programmers to become less concerned about blocking. A channel is often built into a language as a “first-class citizen”. The software threads (=processes) that use channels either block (zero-buffered or full channel buffer, making communication, synchronization and even scheduling the same thing) or do not block (still space in the channel buffer, or an inserted process that forms an “overflow buffer composite process”). This makes that tradition concerned about blocking as a function of the specification. XCHAN is a suggestion to merge the synchronous and safe asynchronous (not infinite buffer, and layered non-determinism) schemes.

It seems like I could add important quotations here forever. Have a look at Pike’s personal blog note “Less is exponentially more“, see [31], where he describes how Go evolved, partially triggered by the new thoughts of C++(11), going back to C, then even to square one, then the designers borrowed and added what they wanted. And “concurrency, too, naturally.” Read his thoughts on why so few are moving from C++ to Go. It seems to me that Go’s potential success will not depend on C++ programmers. Maybe these worlds are not on the same planet after all.. (But there are others on this planet as well, see the Welch and Martin presentation, essentially explaining why objects are considered harmful, see [32])

I don’t consider this a “religious war”. We all have different backgrounds and we see things differently. Of course we will try to persuade each other. We will know more about the fate of OO and/or POP in ten years. But even then, we won’t know about the coming ten years!

Update 21Sept2014: A new blog note that puts so much of this in context, “Not so blocking after all”.

Aside: Concurrent web server

(Is this a good case? In any case, this is a stretch for me..:) The nginx web server (reverse/mail proxy server) seems to outpace Apache these days (2013). Where Apache uses a separate thread for each connection (reused from a thread pool), the nginx does it differently:

And.. “What resulted is a modular, event-driven, asynchronous, single-threaded, non-blocking architecture which became the foundation of nginx code.” [24]

I wonder if it is possible, in Go, to implement an efficient, highly concurrent web server for one million connections – built with an architecture almost the opposite of nginx; using concurrency to implement the concurrent behaviour of each connection – and synchronous communication “in the bottom” (or non-blocking xchan)? One million, as opposed to ten thousand, also called “C10k”, see [27]. At least the one million number may be reached (concurrency used or not) by for example modifying the Linux kernel, seemingly tested on Last.fm [26]. What has been done already in Go on such a theme? If it’s true that YouTube is written in Go [28], then Go most certainly would handle a huge number of connections?

And what about this in C++11 with concurrency?

The above being said, nginx of course isn’t that single-process an architecture:

“nginx runs several processes in memory; there is a single master process and several worker processes. There are also a couple of special purpose processes, specifically a cache loader and cache manager. All processes are single-threaded in version 1.x of nginx. All processes primarily use shared-memory mechanisms for inter-process communication.” [24]

It seems like nginx makes its own “lightweight threads” (workers, each may hold thousands of connections) and an internal “scheduler” (launching, multiplexing), and uses callbacks as the basic engine mechanism? Anyhow, the nginx author seems to be as phobic about blocking as many of you out there.

There is a paper about a web server in occam [25]. However, although it had some architectural virtues, its speed at the time did not seem to outpace Apache.

Aside: “Golang in web development”

In June 2014 a very interesting thread appeared on golang-nuts. I stole its title above, and you may read it at https://groups.google.com/forum/#!topic/golang-nuts/xHVc5K0e6vw.

As one commenter writes: Go is a very capable web service & web application development language, but the philosophy is typically one of “modularity” instead of “kitchen sink”.

Aside: Google already has a platform called App Engine (appengine) [33], which I believe makes web frameworks unnecessary. “You also get the choice of using Appengine or not, the go bindings for appengine are excellent, although not complete yet.” (Diego Duclos). Here’s something I like, copied from Go’s App Engine documentation: “The Channel API creates a persistent connection between your application and JavaScript clients, allowing it to send and receive messages in real time without the use of polling.” However, a channel here is not a standard Go chan, and appengine’s task is not a goroutine; they are different from the standard concepts. Also, this platform runs on Google servers and is not quite free. Wikipedia’s App Engine article shows that App Engine serves a huge number of languages. But I have a feeling that the Go bindings are much different from the other bindings?

Go ahead and read the thread. To try to tempt you further, here’s a summary of the name dropping in the thread: Revel, Beego, Martini, Traffic, PHP / Java (Codeigniter, Yii / Spring), Django, jQuery, JavaScript, Dart, Flask, Sinatra, gorilla/schema, Goji, gocraft/web…

July 2014: I just discovered in IEEE Spectrum’s 2014 ranking of the most popular programming languages ([34]) that Go already seems to have reached rank 10 of programming languages for the web (behind Java, Python, C#, PHP, Javascript, Ruby, PERL, HTML, Scala). However, the text “Created by Google, Go has built-in support for programs that share information while running concurrently on different computers” is somewhat misleading. As far as I know the CSP implementation of Go is based on shared memory, as opposed to the rather original occam language which runs on both shared and distributed memory. But maybe the web-guys mostly think shared memory?

Personally, after writing this blog note (but much before this chapter), I have tried to find out about web development. There are some blog notes which may be relevant, see Technology.

19April2016: I just listened to Rob Pike’s Simplicity is Complicated at dotGo 2015. His point is that the simplicity of Go hides a lot of complexity. And he’s almost fully satisfied with the solutions that the language designers came up with. It’s only 23 minutes long, well spent! ([35])

5Sept2017: Aside (humour): “The depressed dev discovers goroutines, making multi-thread programming easy and our depressed dev ever the more depressed—his multi-thread specialty now defunct. I concur… it’s a hard thing finding out your skills are obsolete” at Depressed Developer 10 [Comic], by Daniel Stori. See https://dzone.com/articles/depressed-developer-10-comic (Thanks Edvar, for amusing me (I’ve known this since I started with occam around 1990, so I never understood the “feeling less special” part..) and.. and Edvar, probably enlightening the others?)

References

This is the first time I use the academic style reference list CSS script, see note 061 Academic style reference list .