1. Rob, what have you been doing for the last 40 years? Forty years is a long time. I did a lot of graphics work when I was a student at the University of Toronto and they had a really interesting lab there with a very early release of UNIX. We actually had UNIX V5 in that lab. That got me hooked on C and UNIX and those sorts of things. I did grad work as a physicist, but I ended up back in computing in Bell labs and went back to New Jersey and worked at Bell Laboratories in Computer Science Research Department with the UNIX team basically for 20 something years and developed various tools with UNIX. But then we became a little dissatisfied with the UNIX and Ken Thomson and I started on Plan 9 and then some subsequent spinoffs from that like Inferno and things like that. In 2002 the telecom industry had changed and they didn't have a place for me anymore and so I went to Google, where I am today.

2. You mentioned your work with UNIX. What did you do with UNIX? I think, at least for the branch of UNIX that we were using, I helped bring graphics into the UNIX environment, and I did some work in the early ‘80s with Bart Locanthi. We built some hardware that was a bitmap graphics terminal. This was before the Macintosh. We had windows on the screen, multiple windows updating, mouse interaction menus, a lot of the sort of basic building blocks that have been popular for the last 25 years. They are mostly borrowed from the Smalltalk environment at Xerox, but I saw them as an opportunity to put the user interface that UNIX needed because UNIX had the ability to run multiple programs simultaneously and no decent user interface for that.



Putting windows on the screen, meant that this window could run one program or this window could do another thing and that combination of parallelism and graphics was very important to me and we made it work. It was very exciting at that time. I have videos on YouTube of early demos of that stuff.

4. So server style client? Not exactly - the terminal was on a regular 9600 baud UART. to the main frame, which was like a mini-computer, it was like a main frame and you had that machine in your room. It had a computer and a screen but it had no disk, all of its data came over the wire and so it was analogous to X11 but not the same architecture at all. Most of work was done in 1981 and 1982 and at that time it was wonderful stuff. I had to literally fly around the world to find someone to give us a mouse, because mice were very hard to come by then. We got one, actually I got two of them from professor Nicou at Lausanne who donated them to us because he developed some mice for Niklaus Wirth at ETH Zurich.



It was early days for this stuff and now of course everyone knows it, it is very familiar, it was fun to work in that stuff at the time. Macintosh came out in 1984. I think Macintosh was a beautiful thing very different style, and I wouldn't say that they got anything from our work, but we definitely have a common ancestor in the work in Smalltalk at Xerox in the ‘70s.

5. It all traces back to Xerox, essentially. You mentioned in your graphics work, you had this work on parallel updating windows. When we talk about parallel, what are we talking about? Are we talking about threads, about the processes that updated things? How did it work back then? The first system I did that behaved like that, I didn't realize it was a big deal, but it was the stuff I did as an undergraduate at the University of Toronto, on the graphic system that they had there, which ran on a PDP1145. What it literally meant was that there were multiple processes running on UNIX machines each of which had an independent connection to the terminal window running on your desk. Within the terminal there was, "operating system" is too fancy a term, it was really just a fancy library, but essentially in the terminal they were coroutines that were scheduled by IO events. The stuff we are going to be talking about here, you'll see that there is a common thread, so to speak, through all of this work.



The thing that made UNIX interesting for me in many ways was this idea that you can run multiple things at once, because I used other systems where you basically had one session and that was your single thing you did at the time. The idea that you can spin something up in the background, go back to it later was actually a nice idea but it needed a user interface. Bitmap graphics provided a way to implement the interface for UNIX. Today Apple laptops are essentially the same idea just much prettier.

6. A question I always had is: where did the idea of concurrent threads come from? Did that exist in UNIX, because UNIX always had concurrency? UNIX always had processes. It certainly goes back well before then. I couldn't say where the idea of processes in the operating system truly originated. There are some things that are really before my time. Threads within a process - the term "thread" is relatively recent. David Cheriton did his thesis on something he called "lightweight processes", which we would today probably just call "threads". I’ve never liked the terminology. I think that it’s a little misleading, because to me a process is anything that can execute independently and a process might share any number of properties with another process.



It might share memory or it might not; it might share file descriptors, or it might not; it might share an environment, or it might not - they are just processes. The thing I don't like about this terminology for threads is that it makes thread one kind of thing and process another kind of thing and they are not two things, they are points on a continuum and I prefer the old terminology where all of that continuum was called the "process", but that is not what people use. I tend use "process" as a word in many places where someone would say "thread" today, but to me they are just processes.

7. That's also the terminology of Erlang, for instance. They also use lightweight processes. It also seems that UNIX stayed single-threaded for a long time and didn't become thread-safe for a long time. You mean the libraries didn't become thread safe. That's true, because UNIX fairly late in its life got multiple processes in the same address space. That didn't come until quite late. In fact, the version of UNIX I used at Bell Lab didn't have that facility, Plan 9 did, but UNIX did not. Nowadays the thread concept has gotten into the system but the libraries pre-dated all that work, which is why it took a long time to get those bugs shaken out.

8. You mentioned that processes, the operating system process kind, can share a lot of properties, like shared memory, things like that. It seems that, recently, threads have gotten a bad name. Do you think there will be a path back towards the UNIX model of really having shared nothing and processes that might share something by IPC? Where do you see it going? I think that is an open question. We haven't yet figured out what model we want to use to talk about what I would call generally "concurrency" in a programming environment and obviously I have an opinion about that, but what matters more to me is that we think about programming in a more general way than a single sequential execution. I think that in the modern world with all the devices we are using, networks, servers talking to clients, clients talking servers, all these application running, there is clearly a lot of things running at once. And to program those multiple things at once with languages that only have a single sequential execution, makes it very hard. As you know, I care very much about building languages that let you express those sorts of things much better.

9. You mentioned Plan 9, and I think you created Plan 9 because you saw some deficiencies in the UNIX model. Which could those be? What happened to UNIX by the late 1980's was that the networks had come, graphics had come, but UNIX hadn't really absorbed them very well, and even today they're really add-ons, they don't fit the fundamental model of the operating system as originally designed. They work well but they are not coherent in the ideas of the operating system. To pick one example, if you have a cluster of Linux boxes, each of those Linux boxes got its own root, its own local disk, its own password file. You might have some of that stuff shared on network, but they don't tend to have a global security model for all those machines with a single user identifier, that they all understand, with single authentication. So security is done machine by machine instead by the set of machines, that is kind of the issue that bothered us.



Plan 9 was conceived as a way to build a network of computers that cooperatively would let users build interesting applications or share services. Even today it’s still a much nicer model that Plan 9 had in this respect than UNIX does today or Linux (to me they are the same thing). Security model had this notion that your network was a security domain, everything in that environment shared that security which is very different from having SSH keys you push around to let you talk between two computers. Graphics, I just had a very different idea about how to do graphics, than what X11 was doing.



We just decided mostly for the networking argument, to start from scratch and having taken what we learned in UNIX, the best ideas in UNIX, generalized them more, which we did a great deal, because Plan 9 had this notion of a file as the central notion for everything and the generality that gave us was really tremendous. Plan 9 for us is a very nice system but it hasn't really adapted to today, so it doesn't have a very good web browser or graphics toolkits are not very interesting and stuff. I haven’t worked on Plan 9 for many years but for a while it was a very nice system and I think quite a bit ahead of UNIX in terms of understanding the modern computing architecture.

11. Essentially you can open a file, read and write and that's it? There are a couple of other operations. The network protocol had 14 methods, I remember. It is actually the most object oriented system I think I’ve even seen because every single thing had exactly the same interface, it was totally uniform and that meant certain combinations of things worked really well. Remote debugging just fell out of the system. It wasn't a special trick because you just would reach across the network with the protocol that you used to talk to everything and grabbed the process on the remote machine, which you did with the file IO kind of operation. Then you could access memory and debug the remote process that way.



It was very natural, network gateways, the graphic system, remote graphics, remote networking, remote debugging, shared file systems, all that kind of stuff, they all worked because there was a single model for everything, communicated and it made a really clean system.

12. What did these files look like inside? If you access a device or something, do they have a special format inside? That's up to them. The key point is it's the interface that's the same, the representation that is used to implement that interface is up to the device. Obviously there was a network file system as you would know it and it could store regular files and directories and so on. But the was only one thing and there were dozens of dozens of other things: the network, graphics, the environment, printers, scanners, and all those things. They took that device idea from UNIX and generalized into a much more straight forward file API. The point was that you didn't know how it was implemented, all you knew was if you have a name and you open that, then you got permission to access the contents with "read" and "write" calls.

13. Then you had to know what to do with the contents. It didn't tell you what to do with the contents but it meant that, for instance, since graphics used "open", "close" and "rewrite" and network file systems worked, network graphics worked automatically, network debugging just fell out, it just worked. It was a pretty neat system. A lot people did think that we were saying that everything was a file, but that wasn't the point, it was that everything behaved like a file, and that uniformity made it really general.

15. I guess the difference is there are some semantics in REST that are different like immutability and things like that. Right, it’s different semantics, but the key point is the uniformity of the protocol. It means that things can be done, tools can transfer operations without knowing their meaning, because the interface is totally transparent. So it is easy to do things like gateways, because you just forward the messages and it works.

16. Just to give a more concrete example, you said that the GUI was essentially also represented as a file? The GUI wasn't, the Bitmap device was and the first layer of windowing was. There was some playing around with GUIs to the next level, but the frame buffer was a file. The way it worked, if you think of the frame buffer with this file interface and you think it’s "read" and "write", it's a little more specific than that, because it was a sort of messaging protocol, using "read" and "write" to make it more efficient, but think of it as just reading and writing the display. The way that the window system worked was it was just a multiplexer, all it was just a multiplexer and it read and wrote to the display in order to provide windows the functionality they needed. Then to the clients of a window it provides exactly the same protocol.



So you have multiple windows running talking to the display, the window system sits here and just directs the messages for the appropriate window down to the frame buffer underneath, handles clipping, gives you refresh events up. The point is that a window, in the same exact model as all the other things, the semantics of the window, not only its interface was exactly the same as of the frame buffer underneath it. Their window system was a trivial program: all it did was multiplex. It was very simple, the whole window system was a couple of thousands lines of code – it was tiny.

17. What was the protocol to write, because you mentioned the special protocol for writing these bitmap files. It was just like a little RPC representation of graphics operations, fill in a rectangle, draw a text string - that kind a thing. But it was represented in bytes that were then written to the device and the device interpreted them.

18. Every "write" would yield an update of the whole Bitmap? No, one of the messages was "Refresh the screen." It was fairly high level, but it worked very well.

19. Over the years you’ve implemented an experimented with a lot of languages. Why is that? Languages are a lot of fun. It is really "notational" fun to try to express things. The first language I did that was interesting, as opposed to just a toy, although in a way it was a toy as well, was the language I did with Luca Cardelli, I think around 1985, called Squeak, which unrelated to the thing called Squeak that’s Smalltalk. It was a quite a while ago. This language we wrote as a research idea to explore how you would express concurrent behaviors in the user interface. It was a really interesting thing for me to think about and Luca is very smart and he taught me quite a bit about this stuff. A few years later, there came a point where actually I wanted to write a window system using some of these ideas at a higher level, with concurrency as a model.



Squeak was just a toy, you couldn't write actually code in it, it didn't have variables. But I decided to write a full language although it was still just an experiment. So I wrote an interpreted language called Newsqueak, which looked a lot like C, but had pure value semantics - everything was by value. But more importantly it had concurrency and the by value stuff was to make concurrency work well. So it was a garbage collected, concurrent language, lots of what we would today call threads but were really just coroutines implemented by the interpreter and did very fine-grained threading of the various things running.



You could have hundreds of hundreds of parallel or "concurrently" executing things. With that I wrote a really fun toy, it wasn't something you want to use for real life, but as an architectural experiment it was a really interesting study of how a window system could be written with a concurrent language as opposed to an event driven model in a sequential language. I learned a tremendous amount about how to think about these things and how to write concurrent systems of fairly large numbers of communicating things, some independent, some cooperative. That led into several other things along the way. Phil Winterbottom at Bell Labs took a very similar approach to concurrency and he built a truly compiled language called Alef.



It was a part of the Plan 9 environment for a while, but it was too hard to maintain the C compiler plus the Alef compiler, so eventually we took the ideas of Alef the best we could and made them libraries in C. but one of the problems with concurrency is knowing who owns something. If you are passing data around between different, let's call them threads, it is not obviously a priori when it is safe to free something. The owner creates it and gives it to somebody else, who is responsible for cleaning that up. It is very important that you have garbage collection in a concurrent environment.



It is much easier to write concurrent programs when you don't have to do all the bookkeeping, to know when something needs to be released or reference counted or whatever. Alef was not garbage collected and of course neither was C, so it wasn’t really as nice as it might have been. But still with that library I wrote a couple of other window systems and various user interface tools to sort of continue to explore that and found it was a nice way to think about these things. Languages for me became a way to explore concurrency. There is a language I did at Google called Sawzall which I did with Robert Griesemer who also works with me on Go, and it was very different because it was extremely parallel but not concurrent.



It was written for analysis of very large data sets with many millions, or even billions of records.

20. Was that the MapReduce thing? It is related to MapReduce, it is a language that uses MapReduce, but you can write very small Sawzall programs, that will execute in ten thousand machines at once, but there is no concurrency in the language. Instead each program, you write the program and that defines what to do on one record. And there is no inter-record processing in the language, but it has data aggregators called tables that let you do things like count, or pick the most frequent elements, or do histograms and things like that. You write this query in this very procedural language, it is this pretty simple language, it is procedural it is not like SQL, then you devise some variables appropriate to this record and then you store them in a table which gets updated and instead updating it, it merges all the results from all the records.



It was very successful for doing analysis of these large datasets, because we could get programmers who were not comfortable with parallel programming to nonetheless be able to make thousands of machines work on the problem efficiently together. That was kind of interesting because it was really parallel but not concurrent. And then of course, GO, which is more recent.

21. Back to coroutines which are a quite old idea, but many programmers today wouldn't know them if they saw them. You mentioned that Newsqueak was the language that used coroutines. Newsqueak was a single process in the most general sense and the independently executing - I don't really know what word to say, they are not really threads because they are all running within what we would today call one thread. But the idea is that you run the instructions of one for a while and then you run the instructions for another one for a while, and then you might do this one, and this one and there is a little tiny scheduler that switches between them. They were coroutines because only one of them was ever running at once, in effect you switched from one to the other and back or over here and back. They never ran together. That made it very simple to implement but it also nonetheless made it possible to think about concurrent programming.



One of the misconceptions about concurrency is people often confuse concurrency and parallelism. Concurrency is a programming model that lets you express things that are independent, as independent executions and parallelism is about running two things at the same time. They are not the same idea and a lot of people confuse them, they think they are the same thing, but concurrency and parallelism are best thought of as a different idea. You can write beautiful concurrent programs with no parallelism whatsoever and you can also write extremely parallel programs that are not remotely concurrent. Concurrency is a model, parallelism is a result - maybe that's the best way to say it.

22. How did your coroutines, you mentioned you had a scheduler. Was that scheduler preemptive ? It was preemptive. It was a threaded interpreter [Editor's note: take a look at: http://en.wikipedia.org/wiki/Threaded_code for an explanation of threaded code which is a compiler/interpreter implementation technique with a somewhat unfortunate name; it has nothing to do with multithreading or concurrency.], and so every program was a sequence of things to do, there were function pointers but you can think of them as bytecodes. Every little piece of code that you were running was one of these somewhere in memory and each executing process (I’m going to use the word "process" because I’m going to use it even though they are all running in one UNIX process) would run for while, because the interpreter would just run it and then it might decide to give ten timeslots to this.



So you run ten of his instructions and then switch to another one and run 15 of his instructions, and then switch to another one and run 36 of his. It was pseudo-random, it just switched at pseudo-random times between them. So the programmers were really running at totally different speeds and different rates to make it truly concurrent in the sense that they were independently executing. Of course if you reached an IO event in one of them, then it would force a scheduling. If one of them needed to block for IO, then you would stop it and start another one that could continue to run. If nobody could run, you would wait for some IO event to trigger the wake up. It was a tiny little interpreter, 10 lines long or something - it was kind of cute.



Actually I wrote a paper about Newsqueak and it talks about the implementation of that model. It was very simple, but it made it easy to think about concurrency without worrying about locking and mutexes and memory barriers and all that kind of stuff.

24. Have you also done work with coroutines that talk to each other in some way – the CSP idea? Newsqueak was about trying to make CSP work in a programming language that you could actually write code in and so Newsqueak had channels which were not in the original CSP and they were mentioned in the CSP book that Tony Hoare wrote, but they are not really in the same idea. The original CSP much like Erlang, when you send a message you send it to a process or what you might think of as an identifier, a process ID. A channel is a different idea, a channel is a thing that actually you hold one end and you send a message on it and whoever is at the other end can get that.



If that isn't clear, think of the difference between a file and a file descriptor. If I want to write a file, I can say "Put the data in this file", and that is pretty much like delivering a message to a process so you can write to the file. But if I have a file descriptor, then I can open the file, give you the file descriptor and you don't know which file you are writing to, in fact that file descriptor you can give to somebody else and he could write to the same file. It is an indirection between the file and the handle used to talk to the file. There is a lot of power in that. Channels are the same idea for processes, instead of having a process, you have a channel to a process. In fact a channel to anything, any process could do it that has the other end of the channel to talk to.



In Newsqueak and all the other things we’re talking about, these channels are first class values. So, for instance, you can have a channel whose value that you send is itself a channel. Channels are typed – you can use a channel of integer, a channel of string, whatever. You also have a channel of channel and that means if we are sharing a channel then we’re talking over that channel, I can send you a channel and you can receive the ability to talk to the person who is at the other end of that channel. It’s like I’m sending you a phone call and you can pick up the phone and talk to whoever is on the other end. That higher order idea is really interesting and a huge part of what makes this model work nicely.



A typical example of this is a multiplexer. If you are some service that’s doing something and I want to send requests to you, if I have many requesters, we can all talk to the same scheduler deciding who is going to distribute the work on the other side, but all the guys sending requests can include a channel that goes only to them. Then, the worker, when it gets their answer, can send it directly back to the original guy, without having to come back through the same path. That’s a very simple example of that, but they act a little bit like capabilities, because if you have a channel in your hand, it has the power to communicate to whoever is on the other end. I don’t really work in security, but the same idea applies: let’s say we have authenticated the right to use this channel.



I can give that to somebody else and now you can speak for me on that channel. In that respect it’s very much like capabilities in security. A file descriptor has the same property. I can open a file descriptor or give it to somebody else to write it on.

25. Have you done any work with capabilities? No, I haven’t, but as I said, file descriptors and channels have some of the feeling of capabilities in this respect. And it’s a very different model from the Erlang process id model. I wouldn’t say one is better than the other, they’re just different and the patterns you write are very different.

26. Channels are synchronous, right? The channels in Newsqueak were synchronous, in Go, which was much later you have the ability to decide when you create a channel whether it’s synchronous or buffered. It’s not a property of the type, it’s a property of the value.

27. Writing to a channel in Newsqueak, for instance, would trigger the scheduler? Yes. These two things are running and this guy wants to communicate, so he is the one who is going to send the value. He blocks until somebody else reaches the point where they want to receive from that channel and then the messages exchange and then they go. Similarly, if this guy gets there first and wants to receive, he obviously has to wait for a sender to get there. So it’s actually just symmetric. Whoever gets there first waits for the other guy. Once they’re both there, you exchange, which is typically a scheduling event itself and then they go on.



If you have a buffered channel, which Newsqueak but Go does, then the sender might be able to put something in the buffer and just keep going. But the receiver obviously has to wait until there is something available and if the buffer has got values it can go directly. If there are no values in the buffer, it’s just like reading from a synchronous channel.

28. Was Occam in any way an influence? Occam was not an influence, but they shared the common origin in CSP. Squeak and Occam were roughly at the same time very different languages. Occam was also designed for the Transputer a very specific piece of hardware they had in mind. And it is again this idea of talking to a process rather than a channel, although I believe one of the later Occams had channels.