1. We are here at TechMesh 2012 in London, I’m sitting here with Duncan Coutts, so Duncan, who are you? Well I’m a Haskell enthusiast, I guess and also a Haskell consultant, I run a small company and we are called Well-Typed and we help companies that use Haskell. So we do training and consulting and development and also other cool stuff related to Haskell.

2. Why do companies need help with Haskell? Well, they might be using it heavily in which case they might want help with the tool chain, support, improvements to the compiler , integration between Haskell stuff and whatever they are doing or they might be just thinking about getting into Haskell. So they might heard of this great idea and want advice or help or they might have quite a few of our clients are small start ups and they might have one Haskell programmer or two and their manager just wants someone to build a core as backup, so we do that sort of thing and we also do quite a lot of work with a sort of R & D technology transfer kind of things, so working with Academics, University or Microsoft Research and getting technologies from Academia into the real world, so a variety of things related to Haskell.

3. Your talk yesterday was about a rather interesting topic nowadays, which is parallelism and concurrency, so can you tell us what is the difference between parallelism and concurrency, aren’t they the same? Yes, so people often use them synonymously but it helps to clarify your thinking by describing what the difference is. Parallelism is all about trying to make your programs run faster by using more hardware like multiple CPU cores or GPU cores. Everyone wants their programs to be faster and parallelism is one way to do that using more hardware. Concurrency is about how your program is structured into multiple threads of control like threads or processes and those threads or processes can synchronize by locks or mutable shared variables or message passing, so concurrency is really about how you structure your program whereas parallelism is about making it run faster, so those are orthogonal things, you can clearly do both at the same time, if you have multiple operating system processes running on multiple cores, then you’re doing concurrency and you’re getting parallelism, You can also do concurrency without parallelism of course, that’s what we did before we had multiple cores, we still had multiple processes running on the same core, just being multiplexed, so that’s concurrency without parallelism. And then of course most programs are not concurrent and not parallel , but is also possible to do parallelism without doing concurrency and that’s the really interesting one and that’s what the focus of my talk is about.

4. How do you do parallelism without concurrency, can I do that as a Haskell programmer or do the Haskell implementers have to give me something to do it? Both. You get to write programs which do not express any concurrency themselves, which means you can’t make any of the concurrency mistakes, you can’t introduce deadlocks, there are no locks, you're not writing anything which has locks, you are not describing any threads. There’s no non-determinism. So all these nice things, you just write a declarative program and somehow the underlying machinery evaluates it in parallel. You have to give some hints or descriptions about how it should be evaluated in parallel or what should be evaluated in parallel but you’re not explicitly writing anything that is concurrent.



The people who do the implementation they have to do a lot of hard work to make that happen but it means that you get to write a nice program. Let me give you an example, to make it a bit more concrete: SQL and the database engine, a database execution engine. When you write a SQL query, that’s very clearly you are not writing a concurrent program. It’s declarative, pure and you expect the answer, there’s no non-determinism in a SQL query, it should always give you the same answer given the current state of database, it’s a kind of pure declarative thing but when it’s executed by the database, the database has the opportunity to execute that query in parallel, not all the standard open source ones do that but maybe Oracle does, it’s possible.



So that gives the distinction between what the implementation is doing, the evaluation mechanism, may be using threads and locks, it may be using concurrency, but it’s achieving parallelism, but the program that you wrote was not concurrent, which means you couldn’t make any of those concurrency bugs, you couldn’t introduce any of those mistakes. So that’s an example, SQL is not a full program language obviously, but that kind of idea of the program you write, having no concurrency in it, but the way it’s executed, runs on multiple cores. That idea can extend into programming languages. And so, Haskell has several different ways of doing this, libraries for doing parallelism without concurrency.

5. In a way, you have to give Haskell or these libraries your problem in a high level declarative way, is that how you achieve that? Yes, one example is describing algorithms that work on arrays, like bulk operations on arrays and this is something, for example, that you do in Mathematica or these sorts of systems, you describe your algorithm in terms of operations on vectors, arrays, matrices and then the library and execution engine takes care of splitting that work up and doing it on multiple cores, so you write a program which just looks ordinary, sequential, purely functional and yet it gets translated into a program that runs on multiple cores.

6. So essentially in Haskell you have parallel map or things like that, is that how it is? Parallel map is a nice simple example of that sort of thing, that’s part of this idea of “parallel evaluation strategies”. So you’ve got pure expressions; because we're working in Haskell, these are expressions that don’t have side effects, so it’s always possible to evaluate them sort of more or less in any order, it doesn’t matter, which means you can evaluate it in parallel and you’ll still get the same answer. If you’re working in a hybrid language that’s mixed, functional, imperative, then you’ve got side effects to worry about and you don’t always have that guarantee that if you evaluate it in this order, the side effects would happen one way around or the other way around in a pure functional language or in a hybrid language where you’re very careful then you could evaluate multiple parts of the expression in parallel and get the same answer and not have to worry about scheduling giving you different answers depending on… you know, run it one time it gives you one answer, it will always guarantee you’ll get the same answer and that’s why parallel map is a good example of that, you’ve got a big long list of items and you have a function you want to apply to each one, you can do that in parallel and there’s libraries in Haskell which give you strategies for building things like parallel map or more complicated strategies.

7. What would be a more complicated example? You might have a strategy over a tree or you might have a strategy where there is parallelism within work items, so you might have maybe a short list of complicated structures, so there might be parallelism available within each of those structures, so you might have… so we have a function in the strategies library called parList and it takes as an argument another strategy, which is the strategy to use for each of those elements, so that might be whether you can combine strategies or another good one is… ok, granularity is always a problem with parallelism, no matter what language you’re dealing in. You always have to make sure that the amount of work you’re doing in parallel, each sort of chunk or task of parallel work, is big enough to overcome overheads.



There's always going to be overheads in doing stuff in parallel and you’ve got to make sure that the amount of work that you are doing overcomes overheads. So sometimes what you need to do is split things up to get more work items or combine things together to get fewer work items to get a reasonable number. So you can use strategies like I just mentioned parList, but there is a strategy called parListChunk where it divides… if you’ve got a very long list with lots of tiny work items, you might take a hundred of them together in a batch and say let’s do that in parallel and then take the next hundred and do that in parallel… and that’s reducing the overheads, you’re batching things up and that’s another strategy. So you can take these strategy combinators and combine them together to get something and tune the parameters to work well between two and sixteen cores or something. So that gives different strategies, it depends on what sort of data structures, what sort of algorithm you’ve got.

8. And so essentially you have to avoid sequential dependencies between tasks? There was a talk by Guy Steele “Foldl considered harmful” so essentially that’s what he is saying, you have to…? Yes, he wants you to write in terms of collections and then associative reductions rather than linear reductions because you have to think very carefully about the data dependencies within your language, within your program. That’s always easier when you’re in a functional setting. You don’t have to worry about side effects having hidden dependencies due to the side effects. You don’t have to worry about the explicit dependencies. So you have to stop thinking about lists which are fundamentally sequential and start thinking about arrays or sequences or trees for example and then think about not summing up your list or your tree sequentially with a foldl but doing it in a tree like associative reduction, so that’s what Map/Reduce is all about, you do a map across all the elements and then you do a tree like reduction and you could do the whole thing in parallel. So you still have to rewrite this program to achieve good parallelism, there’s no free lunch, doing parallelism without concurrency, gets rid of the concurrency bugs but achieving good parallel speedups is hard, you still have to think about how your program is structured.

9. You spend your time on the right problems, instead of messing around with locks? Yes, you think about the issues of granularity and data dependencies and profiling your program, you’ve got profiling tools to help diagnose some of these problems. So you expect it to get a speed up because you rearranged the code and you don’t and you scratch your head and look at the profiler and then you change some things and you… often it's down to this issue of granularity, that’s the standard problem.

10. Let’s move on to concurrency. How do you do concurrency in Haskell? Concurrency in Haskell is very traditional, it’s got “forkIO” which is a function that forks off a new thread, so you’ve got threads and this all happens in the IO monad, so if you know anything about Haskell, you know that all the imperative programming that we do in Haskell goes on within the IO monad, so concurrency in Haskell is part of that IO world, so you fork off multiple threads of control and they can communicate via shared mutable variables.



Now of course in Haskell we don’t really have this notion of mutable variables, by default everything is pure and immutable but for the concurrency we introduce these things called “MVars”, or there’s also transactional variables and you explicitly identify the mutable variables that would be shared between threads and those come with a lock, it's a combination of a shared mutable variable and a lock so you can never accidentally read or write the mutable variable without taking a lock. But it’s basically very traditional, still threads and locks and you can still get deadlocks and all those traditional things are still there. On top of that, you can build things like actor libraries and async, futures, all that sort of stuff is built on top of threads and these MVars mutable variables.

11. So these MVars, do they behave like single locations? Yes, this MVar is a single mutable variable, there are mutable arrays as well which you can use and obviously you can build structures of these things.

12. Can you also listen to changes to them? Yes, when you read from an MVar, it’s called “taking an MVar”, so MVars have a sort of empty or full notion. When they’re empty, if I’m trying to read from an empty MVar, I block until someone else fills it, so in a way it acts like a one place queue and that gives you the synchronization between threads, as well as the shared variable aspect. That gives you a slightly cleaner way of doing the same sort of things you do ordinarily with condition variables and mutexes and locks and shared variables. But it’s still fundamentally ordinary, concurrency with locks and you can still get deadlocks by taking locks out of order.



Additionally to that we’ve got software transactional memory, which was sort of originally prototyped in Haskell and it works really nicely in Haskell where it doesn’t work quite so nicely in other languages and that gives you composable concurrency, so instead of MVars, which are locks, you’ve got transactional variables and you perform atomic transactions, transactions consist of multiple reads and writes on a whole set of transactional variables and then you could perform that atomically and the runtime sometimes takes care of doing it and if there was a conflict, rolling it back and trying it again. That gives you composable concurrency whereas normal lock based programming, its just fundamentally noncomposable.

13. How do you compose the transactions, what kind of syntactic means do you use to combine them together? A transaction looks just like any other ordinary IO code, but it’s in a different monad, it’s in the STM monad, so for the duration of your transaction, you’re in a STM monad and the atomically function wraps that up and performs it atomically as an IO action, so inside, syntactically it looks just the same as ordinary IO code in Haskell, you’re using 'do' and you are reading from variables and writing to variables and that sort of thing and then you say atomically do that whole thing and that’s an atomic transaction. So you can write one bit of code in STM and another bit of code in STM and combine them together into a single atomic transaction and that’s the sort of composability that you can get with STM and you can’t get with MVars and with locks.

14. Do people have a preference of MVars versus STM? Do we know the downsides of locks…? So we’ve had MVars around for much longer, so people, I think initially were… it’s taken people a little while, I think, to get into the idea of STM and also people initially worried that they have a big performance overhead but it turns out that’s not really the case. Once you take into account doing everything safely with MVars and that means taking account exceptions, async exceptions, you have to have… you have to be quite careful with your try/catch/finally, all that sort of stuff when you’re using MVars. So once you’re doing that properly, the costs end up being rather similar, whereas with STM, you don’t have to really worry about exceptions, if you get an exception during a transaction, it's just rolled back and you don’t have to set up extra exception handlers on a stack in the implementation. So it means that in the end they come up pretty similar, which is quite nice, I hadn’t really realized that until recently, Simon Marlow has been going around telling people that STM is not as slow as you think it is, it’s actually a pretty reasonable default and the properties are just much nicer.



Werner: Ultimately it doesn’t have to be slower than anything else, because you have to do the same thing anyway, you have to do the same tasks anyway.



Yes, there is a slight overhead to this validating of the transaction log, that’s where the overhead comes from, but when you balance that out against all the other things you want to do with MVars then the costs end up being quite similar in the end and it’s so much nicer from a programming point of view using STM than MVars, so people are gradually moving towards that as the sort of “go to de-facto tool for that sort of thing”.



Werner: And it's not transparent STM, as you say, you have to mark out these things.



Again you're explicitly identifying your transactional variables, because by default all your variables in Haskell are immutable so you are specifying exactly which are your transactional variables and that’s one reason it's cheap in Haskell because you only have to record in your transaction log the modifications to the transactional variables and there may be very few of those whereas in an ordinary imperative language you might have to record… you doing mutation everywhere for everything, that’s how things work, so you’re having to record in your transaction log lots of accesses reads and writes to all the mutable variables that you are using so in the end that’s one of the reasons that’s actually cheaper in Haskell than it is in other languages because we can do it with just the transactional variables that are really essential for doing the synchronization or whatever.



Werner: It seem to be similar to the way Clojure’s STM works, it also uses explicit variables for that.



Yes, again by default they’re pure so you’ve got to explicitly indicate when you want to have a mutable transactional variable.



Werner: It turns out that there are in large Clojure programs, there are only a tiny number of these mutable places that are necessary, so Rich Hickey mentioned in Datomic, which is a database, they need two or three variables.



Yes, that’s very often the case, if you’re going to look at some of the abstractions built on top of STM or MVars like semaphores and queues and all these other sort of things, you’ll find they use a very small number of transactional variables and very often, yes. I was going to say another cool thing about concurrency in Haskell, is about IO, how we do file or network IO in Haskell with concurrency. Because I think this is one of the really good stories about IO and concurrency in Haskell as compared to many other mainstream languages, is that we have proper lightweight threads and traditional blocking IO.



So you know there is the debate about events and threads, performance and how nice it is to program and that all sort of stuff. Basically in Haskell I think we’re doing it right, we have very lightweight threads, you can have tens of thousands of threads, no problem, so following the model of one thread per client is perfectly ok, even if you’ve got tens of thousands of clients and then IO in a sense of file network, input output, is the way you’d want it to happen, you block just that thread, just that thread blocks, all the other threads carry on and then you get ordinary sequential actions in your thread, whereas if you look at a Node.js program, you write something out to the network and then on the call back you chain off to the next thing and then when that one completes, you chain to the next thing and what are you doing? You are just doing ordinary sequential IO in the sense of “Do this, do that, do the next thing…” but each of those things could block, but if you’re using lightweight threads and blocking IO, you could just write it “Do this, do that…”and each of them will block at the appropriate points, so that gives you simple model, this is just simple imperative program.



Werner: The way it’s supposed to work



The way it’s supposed to be, exactly, and even in languages like .NET languages, where they do have threads, so it’s not nearly as bad as in Node.js where you have got to chain of these callbacks, but you still have this async business, because their threads are really expensive, they don’t have proper library threads, they have to, again, they’re chaining these async things together and yes, ok, you can use monads to make that slightly better and blah, blah, blah, but it’s not as simple as just classical, simple, imperative programming , reading from this file writing to a network just in a single block and if you have lightweight threads and blocking IO, you get a nice programming model and you get the performances advantages of events, right? So, in the runtime system it really is using epoll and all that sorts. It’s running , the GHC runtime system is using one OS thread per core and multiplexing your tens of thousands of library Haskell threads on top of that and then it’s using epoll on the next and other similar things on other operating systems to do the event style handling. So you get the performance advantages of that event stuff but the nice programming model of classical, imperative programming, I would admit it's a slightly funny thing to say. In Haskell we have classical, imperative programming, but that is a very sensible model.



Werner: It’s basically an m:n sort of scheduler



Exactly, it’s m:n threading, yes, whereas all the other systems go for this 1:1, which there’s advantages in that too but it doesn’t mean that you can’t do… it means you have to play all these tricks like using async for everything. So we have async in Haskell as well, you can build async and futures on top of… and do that when that’s the right model for what your program is doing. If you really are doing multiple things at once simultaneously, than async and futures is great, but if you look at most examples of async and futures in sort of .NET languages, what they’re really doing is sequential things they're saying “fork this off” do this asynchronously and when this one completes do the next one, and when that one completes, do the next one… That’s not async really at all, that’s just sequential programming, so, we can do the sequential when it’s sequential and asynchronous one when you want it to be asynchronous.



Werner: Basically let the runtime do all the nasty things for you, as it’s supposed to be, let the solved problem be done by the runtime or the compiler.



And it’s very efficient, so the webserver frameworks in Haskell are able to achieve extremely good performance dealing with tens of thousands of clients and scales across multiple cores, and then benchmark it throws Node.js out of the water, it's competitive against things like nginx and others.



Werner: You just made a lot of friends in the Node.js community. You might want to wait for those, but it’s clearly an advantage here, over systems like Java or .NET which have the heavy weight threads and also have to these tricks in async to become competitive again.



Yes, exactly because they want to avoid the model of having ten thousand Java threads and because they're so a heavy weight and that would be a bad idea.

15. Moving on from all the concurrency stuff, you are working on something called “Cloud Haskell”, what’s Cloud Haskell, does it involve a Cloud and Haskell? Well, the cloud part of that is not my fault. What Cloud Haskell really is, it is distributed concurrency in Haskell. So distributed across multiple physical machines, so distributed memory, that’s what it really means, distributed memory and of course these days, Cloud is the buzzword to use, so that’s why it got called, that’s just the way it ended up. So what it’s doing, it is taking the Erlang model of distributed concurrency, that is lightweight processes that communicate by message passing and monitoring and linking for error handling all that sort of stuff. It’s basically a complete copy of the Erlang model because the ErlangErlang model is, by demonstration extremely successful at this.



So it's just taking that model and implementing it as a library in Haskell so that we get the benefits of that model for distributed concurrency and we can use it in Haskell along with the other advantages that we have in Haskell so we can use it with… So we only have to use that for the distributed aspect, you don’t have to structure your entire program as actors, you might structure it using… pure parallelism within a single machine and only using the distributed actors for cross machine communication. So it means that you have a substrate for writing reliable distributed applications. So you have these light weight processes, they communicate with message passing, they can monitor each other, so that if one goes down the other one gets notified of it and that’s the right underlying infrastructure the Erlang guys have demonstrated that, that’s the right underlying infrastructure for building reliable applications. So on top of that you can libraries like… what Erlang's OTP give you on top of that basic substrate that Erlang has, is like supervision trees and also sort of higher level patterns for doing reliable distributed concurrency.



This was an idea developed by Simon P.J. at Microsoft research and we were implementing for him, with him, a new implementation of this, which is reliable and robust and cleaner code and better architected. It was a prototype, and we were doing a new version of it, and we released that about a month ago or so. Now there are people working on OTP style libraries for the supervision trees and that sort of thing. We think it’s pretty good, we’re quite happy with it. We think it’s a very promising idea.

16. Do you see any uptake of it, have people started using it, have you had some experience? People just started to use it because we started telling people to use it about a month ago [Editor's note: the interview was recorded in December 2012], so there are some people using it, experimenting with it and people looking at writing these higher level libraries on top of it… so far so good. We’ve also got, one of the nice things is that we have multiple back ends for the system. Because there are lots of different environments in which you want to do distributed concurrency, you’ve got the Cloud like Azure and Amazon so they’re cloud environments and then you’ve got your local cluster and these are rather different environments: one uses like a cluster job scheduler and the other uses the Azure web interface for managing instances, and the way things are configured, the way they start up are different in these different environments.



So we have these different backends to allow you to… which is sort of customized to these different environments to make it easier to write code for those environments. So we have an Azure backend and we have a very simple backend that just broadcasts on the local network to find peers. So that’s one of the key things that are different between different environments is how do you discover your peers, and you might discover them or might be told about them or you might create them if you’re in a VM environment you create your peers by starting up new instances. That aspect gets dealt with by the backend and then the main part of your program is independent of the backend. At that level you just got nodes and you spawn process on nodes and processes talk to each other, and you have a reasonable separation between the generic distributed parts of the code and a sort of initialization set up environment parts.

17. The backend is basically abstracted over the nitty gritty API calls, how to fire up a VM,...? And so the backends are quite small, that’s quite nice, it’s quite easy to write a new backend. It’s harder to write a new transport layer. We also have a design which allows multiple transport layers, we already implemented a TCP layer but that allows for maybe a TCP with SSL or… People here… Yesterday we were talking about OpenFlow and you could integrate perhaps OpenFlow into that transport layer and your high level Cloud Haskell program doesn’t have to worry about that, it’s just thinks in terms of abstract nodes, ids for the nodes and it’s network that it knows about and it’s separate from the physical real underllying network.



Werner: And the backend is just a configuration…?



The backend just sort of sets things up so it maps between the real peers on the network, the real nodes and these abstract node ids that you have within the system.



Werner: So it’s Cloud Haskell, I guess…?



Yes, you google for Cloud Haskell, you go to the wiki page, there’s a whole bunch of resources and links to the packages on Hackage and videos and other documentation [Editor's note: we googled it for you, here's the link: http://www.haskell.org/haskellwiki/Cloud_Haskell ]

18. Let’s get back to some functional stuff to wrap up. What’s your favorite monad? My favorite monad, well I have to admit I do quite a lot of programming in the IO monad but I don’t think I could say that’s my favorite one. It wouldn’t be really politically correct for me as a Haskell programmer to say IO monad. Tricky, there’s so many.

19. What about the STM monad? STM is a pretty cool one, I don’t use it very much, I use it quite selectively. Ok, there is one I had to write the other day, I had to write a new monad from scratch, which doesn’t happen very often. Normally I just combine existing ones like State or others So I had to write one the other day which was to do with matching and a confidence of matches… you’re trying to look at a bunch of strings and match against various things. So that was quite nice. That’s not a standard one.



Werner: What does it do, it matches… so monads sequentialize operations, I think, is that fair to say?



That’s one thing that it can do, so certainly for State and Error and IO and STM, you’re thinking like in an imperative way, but there are also monads which are like logic backtracking monads and my matching one was actually one of those types, it is able consider many different possibilities.



Werner: Like a parsing monad?



A little bit like a parsing monad or the list monad. The list monad is rather unlike, sort of imperative style monads. It’s more like logic programming that you can select from… take the current value of this variable which might be one of many possibilities and that’s represented by the list of all these possibilities and so the matching thing I was doing it was, in way, kind of a special case of this sort of logic stuff that you’re trying all the possible matches and you need to be able to combine those things, because once you’ve matched this… all the possibilities matching that, and that… and then you just combine them, and that combining turns out to have a monadic structure. So it’s an interesting monad in that way, like the list or logic backtracking monads…



Werner: Have you named it?



It was called the Match monad. It was just used in internally in one program I’ve been working on.