1. I'm Sadek Drobi. I'm here at Erlang Factory with Simon Peyton Jones and Joe Armstrong. Can you please tell us about yourselves and what you've been busy with lately? JA: I'm Joe Armstrong and I'm at Erlang Factory. I've just been to a very nice talk where Simon told us about the birth of Haskell and Erlang and how they proceeded along parallel routes, solving the same problems. I think we can talk a bit about that, because in your lecture you said things like "We tried that and it was an absolute disaster". We also tried - you know, this making everything parallel - we did that as well.



SPJ: Was it a disaster?



JA: Yes. You know this bit about you deleted all the people's code? We've done that as well.



SPJ: I'm Simon Peyton Jones; I work at Microsoft Research in Cambridge at the moment. Previously I was a professor at Glasgow University. I've been doing functional programming now for 30-odd years, because I got addicted to it when I was at university and have never been able to give it up. It's like a drug, but a good drug. Haskell evolved because a bunch of different academics around the world were working on lazy functional programming languages, but they were all different languages, and we realized that if we just agreed on a common language, we would be able to share a lot more work. So we got together and formed a committee, which wasn't necessarily a very promising way to go about designing an elegant and beautiful language. But actually the committee worked very well, because we had enough common goals. That was in the very early '90s. Haskell has come a long way since then. Many languages die fairly quickly after they've been designed, but Haskell has lasted 20 years, as has Erlang.



JA: Erlang had a different start from Haskell. It started in an industrial environment - in the computer science lab at Ericsson - and the original goal was to find a better way of programming. It wasn't actually to make a programming language at all; it was somewhat after the event that we discovered we had created a programming language. That was kind of accidental.



SPJ: But you also had a very specific application at least, didn't you? You really were telecoms.



JA: We wanted to program a small telephone exchange in the best way possible, and we wanted to steal our ideas from different programming languages, put them together, try them out and see if they worked.



SPJ: You knew the application you needed and you knew that concurrency was key - concurrency and robustness were key.



JA: I remember back in the '80s going to conferences where people talked about shared memory systems, and I always put up my hand and asked the same question over and over - "Well, what happens if one of these shared memory things fails?" - and they would say "Well, it all fails". That's what we didn't want, you know. We make products with... a standard Ericsson product should have a downtime of four minutes per year or something like that, or better. Our key problem is handling failure. Right from the early days we thought, "Well, the only way to handle failure is no shared memory and no transactional memory or anything that locks things", and that's why we copy everything. That's why we have sophisticated error handling. So, here we were back in the '80s, about 1984-1985, just thinking about how we could put these things together, and it was only later that we realized "Oh, we've made a programming language; we'd better give it a name", and people started using it and so on.



SPJ: It never struck me quite as forcibly as this before, but Erlang was born very much out of demand pull. You had a telephone exchange, you wanted to program it, and the language happened to come out of that. Haskell was the exact reverse. We were on a mission from God to explain why laziness was fantastic for the world, so it was very much technology push. We wanted to explain functional programming - lazy functional programming in particular - through the medium of a language. You were at one end and we were at the other.



JA: But at the same point in time we were doing the same things. I remember one of the things we did in Erlang... First of all we implemented it in Prolog, and I did that. Later, Robert Virding came along, and Robert was very into parallel, concurrent languages. We made one of our first big mistakes, which was announcing what the performance of something would be before we'd done it. You've made that one!



SPJ: No, no, no! We'd never do that!



JA: We cross-compiled Erlang to Strand, which was a parallel language, and we implemented the language like that, and we happily told everybody "It will be 6 times faster" before we'd actually done it.



SPJ: Was it?



JA: No. It didn't work.



SPJ: Didn't work at all! Naught times faster!



JA: The first thing we did, we wrote an Erlang program and cross-compiled it to all this parallel stuff. We had 3 Erlang processes, it was as parallel as possible, and suddenly there were 6 million processes in the machine - 6 million threads. We got far too much parallelism. That was at the same time you said, in your talk, "Well, we tried to make everything parallel and it was just far too much parallelism." We did the same stuff at the same time.



SPJ: A nice remark I remember you making about Robert was that he came to you one day and asked to make a few small changes in the compiler. Then, the next line in your paper was "Robert is incapable of making a small change to anything, so he completely rewrote the whole thing."



JA: He left a little comment at the top, "Joe first wrote this", and then everything is completely different. We used to rewrite each others' code.

2. Haskell and Erlang have 2 distinct models of concurrency, right? Haskell is side-effect free, Erlang is about messaging. Can you contrast the two models of concurrency? SPJ: I suppose Haskell initially wasn't a concurrent language at all. It was a purely functional language, and we had the idea from the beginning that a purely functional language was a good substrate for doing concurrency. But it turned out to be a lot harder to turn that into reality than I think we really expected, because we thought "If you've got e1+e2, then you can evaluate e1 at the same time as e2 and everything will be wonderful, because they can't affect each other, because it's pure."



But it turned out to be very hard to turn that into actual wall-clock speedups on processors, leaving aside all issues of robustness and that kind of stuff, because if you do that style of concurrency you get lots of very tiny fine-grained processes, you get no locality, and you get very difficult scheduling problems. The overheads overwhelm the benefits, in short. From there, we've evolved generally towards giving the programmer some control over that kind of concurrency. That was the first form of concurrency that Haskell grew: a controlled form of this implicit concurrency. We added other things afterwards that we'll talk about later, but it was very different to Erlang's model of concurrency.
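The overhead problem Simon describes can be sketched in Python (purely illustrative - Haskell's actual runtime uses sparks and work stealing, not OS threads): evaluating `e1 + e2` with one worker per subexpression is safe because both sides are pure, but for fine-grained expressions the spawn/join cost swamps any speedup.

```python
# Illustrative sketch of naive "evaluate both sides in parallel".
# Purity makes this safe; the problem is that each task is far too
# small to pay for its own scheduling overhead.
from concurrent.futures import ThreadPoolExecutor

def eval_seq(e1, e2):
    # Ordinary sequential evaluation of e1 + e2.
    return e1() + e2()

def eval_par(e1, e2):
    # One task per subexpression: correct, but the spawn/join cost
    # dominates for tiny pure computations like this one.
    with ThreadPoolExecutor(max_workers=2) as pool:
        f1, f2 = pool.submit(e1), pool.submit(e2)
        return f1.result() + f2.result()

tiny = lambda: sum(range(100))  # a "fine-grained" pure subexpression
assert eval_seq(tiny, tiny) == eval_par(tiny, tiny) == 9900
```

Both versions give the same answer; the parallel one is simply slower at this granularity, which is why Haskell moved toward programmer-controlled parallelism.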



JA: Erlang started with just this pure process and copying everything. The reason for copying was error handling - error handling was central. The view of the world I've always had is all these little processes talking to each other in a global namespace.



SPJ: But that already, as you know, was completely alien to Haskell, which is just a functional program which is evaluated, so a totally different model.



JA: In the first incarnation of Erlang, it was just little black boxes communicating, copying their messages - it's a mailbox model, copying into a mailbox. What we were doing was relational: we had Prolog processes inside the black boxes.
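Joe's mailbox model can be sketched in Python (the class and names are invented for illustration, with threads standing in for Erlang's lightweight processes): each black box owns a queue, and `send` copies the message in, so mutating data after sending cannot affect the receiver.

```python
# Minimal sketch of the copy-everything mailbox model.
import copy
import queue
import threading

class Process:
    """A 'black box' with a private mailbox and its own thread."""
    def __init__(self, handler):
        self.mailbox = queue.Queue()
        self.thread = threading.Thread(target=self._run, args=(handler,))
        self.thread.start()

    def send(self, msg):
        # Copy everything, share nothing: receiver gets its own data.
        self.mailbox.put(copy.deepcopy(msg))

    def _run(self, handler):
        while True:
            msg = self.mailbox.get()
            if msg == "stop":
                break
            handler(msg)

received = []
p = Process(lambda m: received.append(m))
msg = {"x": 1}
p.send(msg)
msg["x"] = 99          # mutating the sender's data after sending...
p.send("stop")
p.thread.join()
assert received == [{"x": 1}]   # ...cannot affect what the receiver saw
```

The copy at the `send` boundary is exactly what lets a failing receiver be thrown away and restarted without corrupting anyone else's state.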



SPJ: But you saw the light.



JA: We had to. Say you are running a rocket-launching program: from a black box perspective, if you send a certain sequence of messages in, you want the same sequence of messages to come out. You want that to be reproducible and deterministic, and Prolog isn't like that - it backtracks and things like that. It just became more natural to make it functional. We never made a decision about having types or not having types; that wasn't an issue. We started with Prolog, Prolog didn't have types, so we got the dynamic type system that Prolog had. The issues we were interested in were limiting errors, propagation of errors, restarting things - restarting bits of the system without taking down the whole system - having things which may appear inconsistent while you are upgrading them, continuously evolving systems, not systems you stop and restart.



We can't stop our systems, globally check they are consistent and then relaunch them. We incrementally change bits, and we recognize that they are inconsistent over short time periods and we live with that. Finding ways of living with failure, making systems that work despite the fact that they are inconsistent, despite the fact that failures occur. Our error model is very sophisticated. When I see things like Scala, or I see on the 'net this kind of "Erlang-like semantics", that usually means mailboxes and message passing; it doesn't mean all the error handling, it doesn't mean the live code upgrade. Live upgrade of code while you are running a system needs a lot of deep plumbing under the covers - it's not easy.
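The "restart bits of the system" idea can be sketched, very loosely, in Python (this is a toy of my own invention, not OTP's supervisor behaviour): a supervisor notices a worker crash, counts the restart, and the remaining jobs keep flowing.

```python
# Toy sketch of supervised restart: a crash kills one job's worker,
# not the whole system.
class Supervisor:
    def __init__(self, make_worker, max_restarts=3):
        self.make_worker = make_worker
        self.max_restarts = max_restarts
        self.restarts = 0

    def run(self, jobs):
        results = []
        pending = list(jobs)
        while pending:
            job = pending[0]
            try:
                results.append(self.make_worker()(job))
                pending.pop(0)
            except Exception:
                self.restarts += 1          # worker died: restart a fresh one
                if self.restarts > self.max_restarts:
                    raise                   # give up only after repeated failure
                pending.pop(0)              # drop the poisoned job, carry on
        return results

def worker():
    def handle(x):
        if x < 0:
            raise ValueError("crash")      # simulated failure
        return x * 2
    return handle

sup = Supervisor(worker)
assert sup.run([1, -1, 2]) == [2, 4]   # the bad job crashed; the rest ran
assert sup.restarts == 1
```

The real thing is much deeper - linked processes, exit signals, restart strategies - but the shape is the same: failure is contained and handled, not propagated everywhere.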



SPJ: I still think it's amazing you can make that work at all.



JA: Phil Wadler saw it... You know, you can have two versions of the same module with all the same names - he didn't believe it was possible until we showed it to him.

3. You mentioned that other languages got inspired by Erlang and by Haskell, also about concurrency. For example, Scala implements the dynamic part of Erlang's actors, in the same way that languages like F# and C# implement features coming from functional programming, more specifically from Haskell. Can you talk a bit about this and how things get implemented in the mainstream or in other languages? JA: I don't know. People say "What will happen to Erlang?" and I have no idea. There are certain languages which are very influential but don't get widely used - like Smalltalk, for example. Smalltalk is the ultimate object-oriented language and it has influenced Objective-C and all sorts of languages. It has this core band of pure Smalltalk programmers who don't believe in anything else, but it wasn't destined to take over the world. Erlang is probably... I don't know, maybe it's in that category - it influences other languages. It comes from a funny place: it comes from Ericsson, it comes from a telecoms company, and making languages is really not our main business.



Microsoft's main business includes making languages, and you have the sort of muscle to support a language. There are languages supported by big companies - Microsoft and Sun, with C# and Java and things like that. Then you have languages not supported by big companies, like Ruby and Perl, which have their own communities. Erlang is in between - it's not supported on the big scale that Microsoft or Sun provide, but it's not living all by itself in an open source community either. It has the financial resources to do the compiler and keep the core clean, which is very good. Where it goes from there, I don't know.



SPJ: But nevertheless, there is this company that supports it - there isn't for Haskell, curiously. Microsoft is generously supporting myself and Simon Marlow, but they are not supporting us to work on Haskell. I'm hired as a researcher and I happen to work on Haskell.



JA: You could always change to working on C#.



SPJ: In principle I could. I don't think it's going to happen any time soon. I suppose Microsoft could turn around and tell me "Please don't work on Haskell any more."

4. Like they did with Erik Meijer, right? SPJ: Erik is a whole different ball of wax. Erik works in the developer division in Microsoft and that's much more product focused, but amazingly Erik has brilliantly found a way to meld ideas from functional programming in quite a product-focused kind of way and build a little research bubble in the developer division that they perceive as being actively valuable to them. I think that's amazing! Whereas I'm a mere parasite. I'm a researcher who isn't required on a month-by-month basis in the way that Erik is to produce immediate value to the company.



But I think Haskell is also a bit different to Erlang.



I think of Haskell as... Its principal value is as a kind of ideas factory. It's a laboratory in which ideas can grow and flow. It has a very active user community and a very active research community. Lots of research papers use Haskell as their substrate, and indeed use GHC, the compiler that Simon and I built, as a substrate for their work. It's a laboratory in which ideas can grow that may then get transferred elsewhere. We've seen lots of examples of that, not perhaps specifically from Haskell but from functional programming in general, going right back to garbage collection, which is now taken for granted - "Of course you have garbage collection" - but it was a long time before that was taken as read, and for a long time people thought it never would be; yet it grew up in the functional programming community.



Generics in Java and C# grew up in the functional programming type system world and are now taken for granted. LINQ, the language integrated query system in C#, is heavily informed by ideas from functional programming. I'm actually quite happy if Haskell serves the role of generating ideas that then move into the mainstream. Whether Haskell will itself ever become a truly mainstream language - it has hundreds of thousands of users, but not hundreds of millions of users, so it's on a completely different scale from Microsoft's product languages.



Whether it will ever become a language on that scale, I don't know, and I wouldn't want to give up being an ideas factory in order to get that. I'm just delighted that my colleague Don Syme, who is 3 doors down the corridor, has developed F#, and now there's a kind of pipeline: he can get ideas out of Haskell into F# and out of F# into C#. And I'm also delighted that he successfully made the case to Microsoft to turn it into a product that the company supports in the way that Erlang is supported and Haskell is not. F# really is a Microsoft product. That's a huge breakthrough, for a big mainstream company like Microsoft to support a functional language. But the ideas factory is the bit that is the most important thing, the high-order bit.



JA: That's fun, isn't it? It's the same with Erlang. You know your graph: you have this initial enthusiasm, you get up to there, and then there's this sort of plateau where not much happens. I think that happened with Erlang as well. I don't understand why; it seemed to be very stop-go. You work on things, then suddenly something happens. I think it's the time taken to ingest the ideas - these ideas ferment in people's brains for a long time.



When people started being permanently connected to the Internet with broadband connections - that's a quantitative change. You saw the file-sharing networks. It's only about 4-5 years ago that people started deploying distributed algorithms. In the first few years of that they said "What shall we do?" Nobody knew, so they invented BitTorrent and things like that. You have social networking programs, you have things like that.



Google Wave will come along and replace email in a few years' time. Suddenly people want to know how to write distributed programs, which they never wanted to do before. So, how do you do distributed transactions? How do you do consistency? These distributed algorithms were always a pretty obscure part of computer science. Then we said "We've got these parallel machines. How the heck do we program these things?" In Erlang or Haskell, these algorithms are merely difficult, but in other languages they are...



SPJ: Downright impossible!



JA: Yeah, right! I mean, even if you have a very clean, pure programming language, if you take Paxos algorithms or something like that, they are complicated things. They make your head hurt. There is a whole branch of mathematics, a whole branch of computer science, for understanding distributed algorithms, and they live at the bottom of these social networks and things, ticking along. So I don't think you are going to see Erlang replacing Javascript or anything like that in browsers, but where you are seeing it deployed is in the infrastructure, in the clouds and things like that, to glue the bits together properly, because that's where you need the complex algorithms. We are seeing it in infrastructure building - cloud computing, and inside modern massive multicores, organizing computations. That's where it's being used a lot.



SPJ: Whenever you've got concurrency and multiple processes working, you need to be very careful about side effects. Otherwise it just does your head in. Something that Haskell and Erlang share is being careful about effects. Haskell is sort of super-careful and Erlang is merely careful, but in both cases we don't have unrestricted side effects all the time, with the whole computational fabric being effectful. It seems to me that makes it jolly hard to write programs that exploit multiple threads.



JA: I didn't really know what thread safety was in Java, so I asked a Java friend. I had written a little Java Swing thing and it worked fine. I could create one window; then I created 2 windows in a graphical program, I drew a rectangle in one and I drew a rectangle in the other, and it crashed. And I said "Why did it crash?" And he said "Well, the Swing library's not threadsafe". Now, what does that mean? It means if you've got one thing that works and you do 2 of them in parallel, they interact in strange ways. I thought "How can you program like that? It's impossible to program!"



If you've got this non-thread-safety, maybe you've got this code that's threadsafe that reads stuff and this code that's threadsafe that writes stuff, you try to compose them into a program that reads and writes, and then you go "Oh, it didn't work!" Then you are scratching your head and thinking "Maybe something is wrong. Let's put a mutex or a lock around the whole thing", and it works for a while, and then somebody else forgets to lock it. What happens when it crashes? How is the failure model integrated with the locking model? Because if you take a lock and then fail, does the lock ever get released? And then you hide all that in the libraries where nobody can see it, and you inherit stuff from there, and you've got a mess.
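Joe's composition problem is easy to reproduce in any imperative language; a Python sketch: a read and a write that are each fine in isolation form a read-modify-write that can lose updates, unless one lock spans the whole compound operation.

```python
# Two individually "safe" steps do not compose into a safe operation.
import threading

counter = 0

def unsafe_inc(n):
    global counter
    for _ in range(n):
        tmp = counter      # a thread-safe read...
        counter = tmp + 1  # ...and a thread-safe write, but not atomic together

lock = threading.Lock()

def safe_inc(n):
    global counter
    for _ in range(n):
        with lock:         # the lock spans the whole read-modify-write
            counter += 1

def run(worker, n=100_000):
    global counter
    counter = 0
    threads = [threading.Thread(target=worker, args=(n,)) for _ in range(4)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return counter

assert run(safe_inc) == 400_000     # always correct
print("unsafe total:", run(unsafe_inc))  # may be below 400000: lost updates
```

Whether the unsafe version actually loses updates depends on scheduling, which is exactly Joe's point: it works for a while, then it doesn't.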



SPJ: One of the ways I speculate that functional languages may end up influencing the mainstream is that mainstream languages will gain a larger and larger purely functional subset, which will make it easier to write functional programs that don't use side effects a lot. So you can get more and more computation done - useful and complicated computation, like the algorithms you were describing - without using side effects. F# is an example of this, because it's built on a substrate that is completely imperative - the .NET system is an imperative system - but F# reduces the barrier to entry for writing functionally. I think that's a trend we will see continue.



JA: What I've noticed is that the Erlang libraries that are pure libraries - the pure functions - are incredibly reusable. They just work forever. You write them, you test them, you forget about them, and you can reuse them in many different contexts that you hadn't thought about. The goal in system design is to divide your system in such a way that as much of it as possible is pure and the messy stuff is on the side.



SPJ: That's why you need those types, Joe!



JA: No, that's why you need a good error recovery mechanism, because even with your types, you need to test it. Say I want to write a factorial program, and my factorial program says "factorial of n is 42". Type inference will say "this has type int to int", so it's OK, it's correct. You have to check that factorial 3 is 6.
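Joe's point can be made concrete with Python type hints standing in for type inference (illustrative only): the wrong factorial has exactly the type a checker expects, so only a test distinguishes it from the right one.

```python
# Both functions have type int -> int; only a test tells them apart.
def bad_factorial(n: int) -> int:
    return 42              # type-checks fine, yet wrong

def factorial(n: int) -> int:
    return 1 if n <= 1 else n * factorial(n - 1)

assert factorial(3) == 6       # the test Joe asks for
assert bad_factorial(3) != 6   # same type, wrong answer
```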



SPJ: You are misrepresenting me! I never said that types guarantee your program is correct, but you were talking about segregating effects, and types are a very good way to segregate things.

5. On the same topic, in Scala, which we said got inspired by Erlang for the actor model, they actually don't copy data structures; rather they share them, and there are no constraints on the data structures, so they could be mutable and shared. Martin Odersky, the father of Scala, wrote a paper about using types - inferred or not, type annotations - to guarantee that a reference is not used concurrently. So you can use mutable data structures, which is very nice for performance reasons, yet you get some guarantee of not having race conditions. Can you talk a little bit about that from the two perspectives? SPJ: The first thing to say is that Scala is not just a knock-off of Erlang. It's a whole sophisticated language which rather elegantly combines functional and object-oriented systems in a rather innovative and unusual way. But I think you are right - one thing you can do with Scala is implement an Erlang-like model. Once you've got an imperative system in which you can mutate things, it is possible to use type systems to constrain exactly how you mutate them; ownership types in object-oriented languages are an example of doing this.



I don't know the paper of Martin's that you are describing, but systems that allow mutable objects to flow around, be used once and be owned by different threads can work, but they tend to be rather complicated. My gut feel is that you need type systems that tread this narrow line between being expressive enough to say what you want and not being so complicated that nobody can understand them anymore.



Ownership types for a conventional object-oriented language are a way of controlling some of these things, but that feels to me like taking a system that's already using too many effects and trying to retrospectively patch it up. It would be better - my preference would be - to start with a system in which effects are (a) few, because the functional subset is very big, and (b) rather constrained. Now, Haskell has a rather crude distinction between effectful programs and not, and just saying "it has some effects" might not be enough. That gets you back into the land of controlling effects more precisely, but at least it's a smaller piece of your program.



At the moment I'm agnostic about whether types are the right mechanism for trying to do this much more fine-grained control of effects in which you are saying "This reference is owned by that thread for this period and then the ownership moves to that." That's a rather sophisticated thing to do. It's very clever, I'd be very interested in papers written about it.



JA: I don't like the idea of mixing things too much.



SPJ: What sort of mixing?



JA: If you look at Haskell, Erlang, Scala and F#, what do you see? If you look at Haskell, you see something which, within its context, within the little framework it sets up for its sandbox, it's very consistent, it's very beautiful. You look at Erlang, it kind of fits, the bits fit together nicely. Of course, they don't fit together nicely with the JVM or with .NET or anything like that. If you want to use all the nice things that are there, you can't use them, or you can use them, but it's difficult. So the other approach is, you say "Let's use the JVM and target lots of different languages, so that the different languages can use each other" or you can do that within the .NET framework, you get Scala and you get F#.



The benefit there is that you can use all these other things that are available, but in order to do that, you have to massively corrupt and break those abstraction boundaries, and I don't like that. I think you are breaking abstraction boundaries in the wrong place. How I would like to see systems built is as communicating black boxes. And I would like to see type systems applied to the definition of the protocols themselves, and I haven't seen that done.



SPJ: There's been lots of work on that kind of stuff in type systems for contracts.



JA: Yes, I've done some stuff myself, but it seems to have very little impact on how protocols are designed. The typical way protocols are designed is in RFCs and things like that: they are defined in English, with text. A lot of people think they are defining protocols, but they are in fact defining message structures, and they don't say anything about ordering...



SPJ: You must send an A and then a B, and they must all be replied to.



JA: Yes. Classically, an API for files, say, you can open a file and you can close a file and you can read a file and you can write it. They don't tell you that you have to open it before you can read it, do they?



SPJ: The very idea that there is a thing that you do things to it is part of what gives rise to that problem.



JA: In a lot of programming languages, if you have a file with open and read and write, the programmer is supposed to know, just by magic, that you should open it before you read it. That makes sense on one processor - what does it mean in a parallel program? What happens when you have 2 things that try to open the file at the same time? Do you get 2 copies and one of them fails? Is it like a transaction? Is the file put back to the old state? If you try reading the APIs, you can't find the answer. John Hughes was saying "Did you know that if you open a file and write it and rewind it and do this and do this, then this happens, which is very counterintuitive", and you look in the POSIX specs and you can't find any reference to what they should do. QuickCheck discovers things like this. If you were doing I/O the way you supposedly do it - you don't do it at all, you save it all up to the end and then do it - a lot of those problems would go away.
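The ordering gap Joe describes can be sketched in Python (the class and method names here are invented for illustration): encode the legal call orderings as a small state machine, so "read before open" becomes an explicit error rather than undefined behaviour.

```python
# A file-like API whose protocol (legal call orderings) is explicit.
class ProtocolFile:
    # (current state, operation) -> next state; anything else is illegal.
    LEGAL = {("closed", "open"): "opened",
             ("opened", "read"): "opened",
             ("opened", "write"): "opened",
             ("opened", "close"): "closed"}

    def __init__(self):
        self.state = "closed"
        self.data = ""

    def _step(self, op):
        nxt = self.LEGAL.get((self.state, op))
        if nxt is None:
            raise RuntimeError(f"protocol violation: {op} while {self.state}")
        self.state = nxt

    def open(self):  self._step("open")
    def close(self): self._step("close")
    def write(self, s): self._step("write"); self.data += s
    def read(self):  self._step("read"); return self.data

f = ProtocolFile()
raised = False
try:
    f.read()                  # read before open: caught explicitly
except RuntimeError:
    raised = True
assert raised
f.open(); f.write("hi")
assert f.read() == "hi"
f.close()
```

This is the sort of thing Joe wants type systems to do for real protocols: state the orderings, not just the message structures.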



SPJ: We don't actually. It just looks like it. We certainly don't save it all up to the end!



JA: You're fibbing!



SPJ: We are tackling the awkward squad. There is a very detailed operational semantics that explains that I/O happens as you go. It's no good if you've got to print a message and then read the response and then print something else - you can't save that up to the end!

8. With any feature of the language, you define a mental model for the programmer, so that the programmer can reason about the feature. The implementation tries to stay in sync with this mental model, but that's not always true; in some cases it gets far from it. It's the same thing with processes, when you are sending messages: it's not exactly the same thing, because you need to copy or you don't need to copy. Sometimes it's not performant to do it that way, so you do it another way. What do you think of this problem? The same with I/O: as a programmer I think it is happening at the end, but it is actually happening on the fly, right away. SPJ: It is happening right away, and by design. The presentation that I gave today, which said the main program is an I/O action that you get by stitching together other actions, is a first approximation. I've taken quite a lot of trouble to write a tutorial that explains it in rather more explicit detail. I mean, that model doesn't work at all for a concurrent program, for example: when you are spawning threads that have got to talk to each other, of course you aren't going to save all that up to the end.



So no, the I/O monad is an action that runs in an interactive fashion with other I/O threads that are going on at the same time, and it's possible to give a much more precise model for what's happening than "you save up one big thing that's performed at the end". The one-big-thing-at-the-end story is just an oversimplification you might use for your first program that prints 3 things; after that you have to get a slightly more sophisticated understanding of what's going on.



Or, alternatively, just bail out into "well it looks like a C program and actually it runs like a C program", you say "Do: print this, read that, send this message to this other process" and that's exactly what happens, just as if it were a C program. You can look at a monadic I/O Haskell program with 2 sets of spectacles. You can either think of it very much like an imperative program and that's a very accurate model actually, or you can think of it as the compiler does in this more functional way and that's what means that the compiler's optimizations are actually robust. I suppose that's the way I deal with the dissonance, is to try to say "that was an oversimplification to start with".
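The "two pairs of spectacles" can be sketched in Python (a toy writer-monad-style model of my own, not GHC's implementation): actions are ordinary values glued together with a bind function; nothing happens while the program is being built, and running the composed value performs the effects in exactly the imperative order.

```python
# An "action" is a function: log -> (result, new_log). Building the
# program composes values (the functional view); calling it runs the
# effects in order (the imperative view).
def pure(x):
    return lambda log: (x, log)            # action with no effect

def put(msg):
    return lambda log: (None, log + [msg]) # "print" as a value

def bind(action, f):
    # Sequence: run `action`, feed its result to f, run the next action.
    def combined(log):
        result, log2 = action(log)
        return f(result)(log2)
    return combined

program = bind(put("hello"), lambda _:
          bind(put("world"), lambda _:
          pure(0)))                        # nothing has "happened" yet

result, log = program([])                  # running it performs the effects
assert (result, log) == (0, ["hello", "world"])
```

Read top to bottom, `program` looks like straight-line imperative code; read as values, it is just function composition, which is what keeps the compiler's optimizations honest.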



JA: You don't tell all the truth.



SPJ: That's right, the truth in bits.



JA: For example, when you send messages between Erlang processes, your mental model of what's happening is that you copy everything. And indeed we do copy everything, apart from - now there's small print - if it's a big binary and you are on the same node, then it's in a shared heap with a reference-counting garbage collector.



SPJ: And it's immutable, of course.



JA: Yes, but it's like... what's it called in operating systems? Copy-on-write. When you fork a process, if you don't write to the memory, you share the same memory; when you do write to it, it splits into 2. The binaries behave like that and they are in a shared heap.
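The copy-on-write behaviour Joe describes for large binaries can be sketched in Python (hypothetical classes; real reference counting is much more involved): handles share one buffer until someone writes, at which point the writer splits off a private copy.

```python
# Sketch of copy-on-write sharing with a toy reference count.
class Shared:
    """A buffer plus a count of how many handles reference it."""
    def __init__(self, data):
        self.data = bytearray(data)
        self.refs = 1

class Handle:
    def __init__(self, shared):
        self.shared = shared

    def clone(self):
        # "Sending" on the same node: share the buffer, copy nothing.
        self.shared.refs += 1
        return Handle(self.shared)

    def write(self, i, b):
        # Copy-on-write: if anyone else still sees this buffer,
        # split off a private copy before mutating.
        if self.shared.refs > 1:
            self.shared.refs -= 1
            self.shared = Shared(bytes(self.shared.data))
        self.shared.data[i] = b

a = Handle(Shared(b"abc"))
b = a.clone()                         # shared, nothing copied yet
b.write(0, ord("X"))                  # b splits off its own copy
assert bytes(a.shared.data) == b"abc" # a's view is untouched
assert bytes(b.shared.data) == b"Xbc"
assert a.shared is not b.shared       # they no longer share a buffer
```

Because Erlang binaries are immutable, the write step never actually happens there; the sketch just shows why sharing is safe until someone could observe a change.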



SPJ: You have to be careful about your cost model. If somebody suddenly redistributed that, so it suddenly was on a different processor...



JA: If you distributed, then it is copied.



SPJ: That's what I mean - suddenly the program would run at a very different speed.



JA: Even that statement, that it's on a shared heap and binaries are not copied, is not true for small binaries because small binaries are copied, because it's just not worth sharing them and doing things. Even that statement is not quite true, because sometimes you keep fragmented ones - there is an awful lot of small print.



SPJ: It's the business of a runtime system to provide a simple abstraction for complicated stuff.



JA: Should a programmer know about that? No.



SPJ: No. But now we are talking about the cost model, right? Whereas before you were talking about understanding what the semantics of the program is.