I've been trying to learn Haskell lately, and the hardest thing seems to be learning about monads. "Monads in Ruby" and "Monads for Functional Programming" helped me finally to break the ice, but more importantly, they gave me a glimpse behind the veil. I finally began to understand something that is trivial to people such as Philip Wadler, but something that never had been explained to me so succinctly. In computer science, we enjoy using mathematic models, but the science still works if you violate the math. And, much to the dismay of purely functional programming enthusiasts, we almost always do. However, when we embrace the math, sometimes the loss in flexibility is more than compensated for by the increase in functionality.

Monads as a Design Pattern Somewhere, somebody is going to hate me for saying this, but if I were to try to explain monads to a Java programmer unfamiliar with functional programming, I would say: "Monad is a design pattern that is useful in purely functional languages such as Haskell. Although the concept of a monad has been taken from mathematics, this design pattern can be applied in Haskell to implement a wide range of features." Philip Wadler says, "Monads provide a convenient framework for simulating effects found in other languages, such as global state, exception handling, output, or non-determinism." The Java programmer could argue fairly, "Hey, Java has all those things, and monads aren't even mentioned in Design Patterns. What's the deal?" I'd respond: "Design patterns fluctuate by language. Just as each language has its own set of native features, it also has its own set of both idioms and design patterns that make sense within that language; naturally, there's a lot of crossover between similar languages. So although I don't expect monads will be the next great thing in Java, they're a necessary part of understanding Haskell." [see Footnote 1] The Java programmer might continue, "By why would I care about Haskell, especially if I have to learn about this whole monad thing, simply to do the things I already know how to do in Java?" I'd slyly say: Have you been paying attention to the The International Conference on Functional Programming Contest over the last several years? It's one of the largest programming contests of the year, and it's often mentioned on Slashdot. Users of any language may enter, and a wide variety of languages are routinely used by the contestants. During the last several years, the winners of the ICFP Programming Contest were mostly Haskell, OCaml and Dylan programmers, all of which are functional programming languages. Learning new things occasionally can be painful. Someone once said, "If a programming language doesn't teach you a new way to think about problems, it's not worth learning." I've found that learning about monads and Haskell in general has both been painful and rewarding.

A Trivial Introduction to Monads Both Wikipedia and "Monads for Functional Programming" offer introductions to monads from a category-theoretic perspective, but they aren't for the faint of heart. Despite having a degree in math, I found that it took a while for me to grasp what was going on. Perhaps I'm rusty, but I think a simpler explanation is possible. "Monad" is an interface. For any type "<InnerValueType>" (notice the template-ish syntax), a class "MonadExample" that implements the "Monad" interface has the following two methods: It has a constructor with a parameter named "value" of type "InnerValueType". It has a method named "pass" that takes a callback. The callback must accept a parameter of type "InnerValueType", also named "value". It must return an instance of "MonadExample". [see Footnote 2] The "pass" method invokes the callback, passes "value" and immediately returns the result. Whacky interface, eh? It may help if you start by thinking of monads as a special type of container. Just as "List <Integer>" makes sense, so should "MonadExample <InnerValueType>". Some classes that implement the Monad interface always contain exactly one value. Others might contain zero or more values, as List <Integer> does. Continuing, let's imagine a simple function "f()" that accepts a value and returns another. Now, instead of simply passing around values, let's wrap the values in monads and pass around the monads. To use the function f() on that value, you must call the monad's "pass" method. Doing so, you suddenly get two benefits. First, you can put a lot of other stuff in the monad besides the value. Second, you can do whatever you want when someone calls the pass method. For example, you can keep track of how many times it was called. What's even more interesting is you can pass different objects to f(); each implements the Monad interface. Per polymorphism, they can do totally different things, and f() doesn't have to know or care. In Java, this behavior is not novel--any method may have side effects. But, in a language such as Haskell, side effects are strictly forbidden; functions should take values and return values. Use of the monad, in such a language, enables you to make use of side effects where otherwise impossible.

An Example Philip Wadler has written several papers detailing how monads can be used to implement a variety of language features in a purely functional programming language, such as Haskell. This article barely scratches at the surface. However, I'd like to offer one example that is somewhat interesting. The "Maybe" class implements the Monad interface. The Maybe class has two states: it either can contain a real value or it can contain nothing. It's a fairly common cause of bugs in a lot of languages to end up with NULL in a place you weren't expecting it. Sometimes NULL is a valid value in some parts of the code, and sometimes that NULL escapes into code not expecting NULL. The Maybe monad "captures" this behavior in a way that is useful, easy to understand and verifiable at compile time in Haskell. This may be of interest to anyone who's stumbled across such a bug. [see Footnote 3] At this point, the reader may be begging for some code. One of the hardest things about learning Haskell is you can't understand Haskell without understanding monads, and you can't understand most of the monad examples without understanding Haskell. Hence, here's the Maybe example in Python: class Maybe: def __init__(self, value=None): self.value = value def __repr__(self): return "Maybe(%s)" % self.value def pass_to(self, callback): # "pass" is a keyword, so I've used the name "pass_to". if self.value is None: return Maybe(None) return callback(self.value) def add(left_monad, right_monad): monad_class = left_monad.__class__ # This strange indentation is perhaps more readable. return left_monad.pass_to(lambda left_value: ( right_monad.pass_to(lambda right_value: ( monad_class(left_value + right_value))))) n1 = Maybe(5) n2 = Maybe(6) n3 = Maybe(None) print n1, "+", n2, "=", add(n1, n2) # => Maybe(5) + Maybe(6) = Maybe(11) print n2, "+", n3, "=", add(n2, n3) # => Maybe(6) + Maybe(None) = Maybe(None) Here the same example in JavaScript: function Maybe(value) { this.value = value; } Maybe.prototype.constructor = Maybe; Maybe.prototype.pass = function(maybe, callback) { if (maybe.value == null) { return new Maybe(null); } return callback(maybe.value); } Maybe.prototype.toString = function() { return "Maybe(" + this.value + ")"; } function add(leftMonad, rightMonad) { // Pass leftMonad and function taking leftValue. return leftMonad.pass(leftMonad, function(leftValue) { // Pass rightMonad and function taking rightValue. return rightMonad.pass(rightMonad, function(rightValue) { // Return the resulting monad. leftValue and // rightValue are accessible via a closure. return new leftMonad.constructor( leftValue + rightValue); }) }); } n1 = new Maybe(5); n2 = new Maybe(6); n3 = new Maybe(null); document.write(n1 + " + " + n2 + " = " + add(n1, n2) + "<br>"); // => Maybe(5) + Maybe(6) = Maybe(11) document.write(n2 + " + " + n3 + " = " + add(n2, n3) + "<br>"); // => Maybe(6) + Maybe(null) = Maybe(null) A few things deserve note: To implement this, your language must support closures. The "add" function works with any monad. That's part of the beauty of monads. All the knowledge about Maybe was encapsulated in the Maybe class (Python) or prototype (JavaScript). The successively nested anonymous functions used in the add function are perhaps a bit difficult to understand, but this is a literal translation of how it would work in Haskell. Fortunately, Haskell has syntactic sugar that makes this style programming very easy to read. It's fair to say that the translation into either Python or JavaScript is a bit awkward in comparison.

What's Up with All the Math? If you read either Wikipedia's or Philip Wadler's introduction to monads for functional programming, you may wonder, "what's up with all the math?" Why is it that computer scientists feel compelled to justify things mathematically before they feel comfortable using them as techniques in computer science? Sure, early computer scientists were mostly mathematicians, but is the connection really so crucial? I can think of three reasons: Mathematicians are trained to base their work on previous work. Euclid taught us how to create chaos out of definitions, postulates and common notions. If you base your work on an existing body of work that already has been scrutinized via proof, you reasonably can assume that things probably will turn out better. If you can prove that your work meets certain criteria, you can let the math do a lot of the work for you. For instance, if you can prove that your code implements a true monad, everything else you know about monads will "just work". At the risk of stating the obvious, mathematical models aren't perfect. If you add two cups of one substance to two cups of another substance, you might end up with four cups because 2+2=4--or you might end up with an explosion. Obviously, the model must fit the facts, not the other way around. Only then can you use the model to try to ascertain additional facts.

What Happens If You Start Ignoring the Math? The idea of a function in computer science comes from the Lambda Calculus. What happens if you start ignoring the math? C already fails to comply with Lambda Calculus because of the side effects, but let's take it further. Imagine a version of C called C' that passes all arguments by reference instead of by value and has no return statement. Hence, when you call a function, you can pass a bunch of arguments (note, some of those arguments might contain only junk values). The function itself uses some of those parameters to do something. Then it may, as a side effect, change some of the parameters. Because the arguments are passed by reference, the function is really changing the caller's variables: /* Increment x. */ void inc(int *x) { (*x)++; } /* ... */ int x = 5; inc(&x); We suddenly have a language with no return values and only side effects. Nonetheless, it's easy to see that any program in C easily can be translated into C'. Furthermore, it'd be easy to write a pure Lambda Calculus programming language in C'. C' still would be a useful language if you had nothing else. Next, let's consider the associative nature of integers: (a + b) + c = a + (b + c) In languages that permit operator overloading, such as C++, Python and Haskell, you can implement the "+" operator for objects. There's nothing in the compiler that forces you to make + associative. Of course, it's usually foolish to implement + in a way that isn't associative, but there's nothing stopping you. Similarly, in Haskell, you can declare that a type is a member of the type class "monad"--that is, it implements the interface--even if your type doesn't mathematically qualify as a monad. I'm sure that you can expect bad behavior, in some manner or another. But as intelligent as Haskell compilers are, they won't stop you (which is mentioned here).

Why the Universe Won't Come Crashing Down If You Violate the Identity Principle To belabor the point in the interest of humor, consider the identity principle: a thing must equal itself. My buddy, Leon Atkinson, use to jokingly argue with me that "Nothing makes sense if you violate the identity principle". Well, observe: $ python >>> class A: ... def __eq__(self, other): ... return False ... >>> a = A() >>> a == a False Notice that the universe didn't cease to exist. I didn't get a core dump or even an exception. I bet if I embedded this naughty class in some dark corner of my Python program, and never made use of it, my program would run. Now consider that in SQL, "NULL" can best be translated as "nothing"; it's not FALSE, 0 or "". Observe the identity principle in SQL: mysql> SELECT FALSE = FALSE; +---------------+ | FALSE = FALSE | +---------------+ | 1 | +---------------+ 1 row in set (0.06 sec) mysql> SELECT NULL = NULL; +-------------+ | NULL = NULL | +-------------+ | NULL | +-------------+ 1 row in set (0.00 sec) Notice that NULL does not equal NULL. This leads to my point: In computer science, nothing [still] makes sense [even] if you violate the identity principle.

Why Less Is More Okay, so what? Should we forsake our math background? Is purely functional programming a lame holdover from a bunch of bitter mathematicians? If it works, it works. Right? No. Sometimes less is more. In math, additional constraints on a set of elements may lead to more theorems that you can prove about that set of elements. Sometimes being more restrictive about what code is allowed to do allows you to make assumptions that lead to optimizations. Sometimes, entire new features can result. Consider the bad old days of assembly programming. As programs got larger, and as more pieces relied on one another, assembly programmers eventually discovered that you have to be pretty stringent about who gets to modify what. These days, we've learned to be rather controlling about which subroutines are allowed to modify which registers. You can push and pop all you want, but by the time you leave a subroutine, the registers that aren't supposed to be changed better still have their original values intact. Similarly, as we moved away from structs in C to objects in C++, we started being a lot more strict about who could modify or even know about the internal structure of an object. As soon as you break encapsulation and start poking around the inside of an object in code using that object, your own code becomes tied to the implementation of that object. What seemed like freedom is really slavery. Now, consider the order of evaluation of C function parameters. Expert C programmers will know where I'm going with this. What does the following program print? #include <stdio.h> int *inc(int *x) { printf("%d

", *x); (*x)++; return x; } void printboth(int x, int y) { printf("x: %d, y: %d

", x, y); } int main(int argc, char *argv[]) { int x = 0; printboth(*inc(&x) + 10, *inc(&x) + 100); return 0; } The question comes down to which inc(&x) comes first. I believe it's actually part of the ANSI standard, although I could be wrong, that arguments are evaluated right to left: $ ./test 0 1 x: 12, y: 101 Notice that y ends in 1, whereas x ends in 2. Indeed, the y argument was evaluated first. Can you guess what would get printed if you translated this code into Java? [see Footnote 4] It makes me wonder, if a function's side effects can't influence anything beyond the monad in a purely functional programming language such as Haskell, and if the programmer is completely in control of the monad, does that make the program easier to understand? If you don't have to worry about the side effects of some random function that you didn't even write, do you get fewer bugs? As a further point, consider what "Joel on Software" reminds us of when explaining MapReduce, the algorithm that makes Google so massively scalable [see Footnote 5]: if a list of functions [isn't] allowed to have any side effects, you can safely call them all in parallel. Considering how hard it is to make something thread safe, suddenly parallel programming in a functional programming language such as Haskell, Erlang or Alice seems a lot nicer!

Summary Purely functional programming languages such as Haskell are very strict in their approach to side effects. Monads can be used to compensate for this strictness in order to implement a lot of the features we have come to take for granted in non-strict languages. They can do this without compromising the strictness of the language as a whole. Although staying true to an underlying mathematical model isn't fundamental to the continued existence of the universe, sometimes doing so can have substantial benefits. Sometimes the ideas that are hard to understand and that seem overly restrictive can open up whole new worlds in the universe of computer science.

Acknowledgments I'd like to thank Brandon L. Golm and Brennan Evans for their substantial feedback on this article.