Abstract for Haskellers:

This is a long, involved defense of purity, both in terms of the power it affords us and the "good" programming practices it enforces.

Haskell is often defined in terms of what it cannot do: variables can’t be changed once declared, the results of IO computations can’t directly be used in pure functions, execution flow can’t be controlled, etc. While this characterization is strictly correct, it paints a rather ascetic, if not outright negative, picture of the language — one that practitioners know to be misleading, since these very restrictions are the key to the great power latent in functional programming.

We discuss how and why certain “restrictive” features have become successful in computer programming while others haven’t, and proceed to explain referential transparency and what it affords us. Finally, we conclude that adopting this particular set of “restrictive” features is useful and positive, even in contrast with the experience of non-useful “restrictions” that seem to share common characteristics.

Abstract for non-Haskellers:

This essay explains why Haskell is cool.

Inconvenience-oriented languages and “proper” programming practices

A few days ago, concatenative programming advocates in #factor argued that the main benefit of stack-based languages is that they enforce constant refactoring, because managing large stacks is rather inconvenient. This rather zen-like approach to self-restraint as the path to enlightenment might still offend those enamored with the real-programmer mythos of wizardly power, but self-restraint tactics have been popular in the pursuit of better programming at least since the days of “Goto considered harmful” and structured-block programming.

While there isn’t a formal calculus of while/for/break structured-block programming (at least to my knowledge), its benefits seem intuitive: by abstracting away common if-p-then-goto control structures, less repetitive, fluffy flow control code is written and it’s easier to spot the actual program logic in the raw unordered source code. In other words, while/for/break structured-block patterns enforce separating the concerns of flow control from program logic: irrelevant ‘counter’ variables are often hidden, automatically created or declared in the context of flow control statements, and program logic is sandwiched between these collapsed flow control pattern notations.

This is different from the “gotta-factor-or-will-go-nuts” approach concatenative programming advocates seem to be emphasizing. Stack-based functions (“words”) are merely lego pieces of code referenced by a short name; that they always work in any context, in spite of the fragile appearance of this method, is a virtue of the RPN approach to computing, one that has long been popular in hand calculators. It intuitively feels “weaker” than the previous example. But what precisely makes it inferior to the structured-block approach?

Structured-block patterns allow for an approach to inconvenience-oriented programming we could call the “Pascal approach”. Basically, Pascal removes gotos and one is forced to use structured-block statements. What that entails is that programmers are inconvenienced away from “spaghetti” flow control (the Bad Practice the Pascal approach seeks to remove) and into adopting the standard while/for/break “calculus” — IIRC, Pascal doesn’t even have “break”, a tell-tale sign that Niklaus Wirth might have fantasized about 18th-century mathematicians wearing stiletto heels and a riding crop at some point. The fact that modern so-called structured languages don’t bother to remove the goto statement — and yet it is seldom used — is witness to the fact that inconvenience is really more of a culture shock meant to induce rapid change away from the Bad Practice than an essential part of not indulging in “guilty productive pleasures”!

What happens in concatenative languages is different. In a sense, the complexity of running long stack-based code in your head might be regarded, in the terms defined for the Pascal approach, as a strategy to inconvenience programmers away from the Bad Practice (entire programs written as one long subroutine) and into dividing their code into glance-sized pieces that can be individually grokked. The problem is that there seems to be no proper “calculus” of correct code-writing strategies given in exchange for the great inconvenience of writing stack-based code. That is, stack-based programs tend to be highly factored not because programmers were persuaded, maybe with a little use of force, that it’s a better approach, but because it’s just impossible to do it any other way. Programmers are inconvenienced away from the Bad Practice of one-subroutine code and into, well, whatever gets them through the night. This might have been important in the days of Forth, when Real Programmers roamed the land chewing tobacco and stealing horses, but what sense does it make nowadays, when programmers are educated from the get-go to do things in a somewhat structured fashion, not only with structured-block patterns but also objects and classes?

These two case studies seem to indicate that inconvenience-oriented programming is a Bad Practice in language design: it may well indicate that you don’t really have a solution for the Bad Practice you’re trying to inconvenience people away from, while good Alternative Practices don’t need to be imposed by language design to be successful.

Haskell’s purity and BDSM

You give me the reason

you give me control

I gave you my purity

and my purity you stole

Did you think I wouldn’t recognize

this compromise?

Am I just too stupid to realize

this compromise?

Grey would be the colour

if I had a heart

(Nine Inch Nails, “Ringfinger”)

The result of this case-study comparison does seem to spell bad news for Haskell, which seems to enforce ‘proper’ programming practices, as suggested by a much better theoretical calculus, basically by removing the ability to write code outside that calculus. This particular point is raised often by advocates of quasipure languages like Dylan, Lua and Erlang. It could be right, at least to a point, if it weren’t for the sheer depth and strength of the formal calculus behind Haskell — and how important that calculus is to why functional programming matters in the first place.

Let us spell this out for non-Haskellers, in the hope of reaching a wider audience. The principal sense in which we can call Haskell “pure” is that it’s referentially transparent: any reference to a given object always means the same thing regardless of context. This is true of all Haskell objects: types, type classes, functions and modules. It is even true — for a suitably wide range of meanings of “mean” and “the same” — of functions that by definition are supposed to return different values every time, like getChar or a random number generator.
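A minimal sketch of what this buys us (the function name area is ours, purely for illustration): because a pure function’s result depends only on its arguments, any call can be replaced by its value, and repeated calls can be shared, without changing the program’s meaning.

```haskell
-- A pure function: the same argument always yields the same result.
area :: Double -> Double
area r = pi * r * r

-- Any call can therefore be replaced by its value, and vice versa:
twoAreas :: Double
twoAreas = area 1 + area 1   -- identical in meaning to: let a = area 1 in a + a
```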

Haskellers and interested outsiders alike know, at least on a name basis, the ‘magic’ that allows referential transparency to be strictly kept in all corners of Haskell even in the face of ‘awkward’ (from a pure standpoint) needs like I/O, concurrency, exceptions and foreign-language calls. One of the key researchers in the Haskell community has even written a tutorial on how this is managed in a purely functional setting with monads; beginners interested in a more superficial view of how monads model the I/O problem can consult the excellent step-by-step construction of the IO monad by Bulatz.

I shall not spend time explaining how monads solve the problem of having mutable objects of various forms in a referentially transparent context. One should note, as an important sidenote, that monads are not a rigged-up hack enabling this, but an abstract mathematical concept that happens to apply to this problem in Haskell, as well as to a few other Haskell concepts — like lists and Maybe types; recent research on comonads (the categorical dual of monads) by both high academics and rapid-fire math hackers shows that this business of importing abstract mathematics is useful in many other contexts, like generalized cellular automata and signal processing, in a jaw-droppingly general fashion.
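As a small sketch of the non-IO side of this, the same monadic bind that sequences IO also sequences Maybe computations that may fail (safeDiv and twoSteps are hypothetical helper names of our own):

```haskell
-- Maybe as a monad: each step may fail, and (>>=) short-circuits on Nothing.
safeDiv :: Int -> Int -> Maybe Int
safeDiv _ 0 = Nothing
safeDiv x y = Just (x `div` y)

-- Chain two divisions; if either fails, the whole result is Nothing.
twoSteps :: Int -> Int -> Int -> Maybe Int
twoSteps a b c = safeDiv a b >>= \r -> safeDiv r c
```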

There are two facts to be contrasted here, one formally provable and one merely perceived.

The hard fact is that even the awkward corners of programming that make other functional languages (like the MLs and Erlang) bend over and accept some impurity are in fact managed in a ‘pure’ (read: referentially transparent) environment in Haskell by the use of monads. The perceived fact is that this amounts, in practice, to making code-jockeys jump through hoops to get the tasks in the ‘awkward squad’ accomplished in the programming model enforced by Haskell’s lack of certain features like destructive updates.

If one is to accept the ‘perceived’ fact, the hard fact seems to place Haskell solidly in the “RPN languages enforce code refactoring” territory of bondage-and-discipline practices programmers would rather not use if a way out were available. The perception that monadic IO entails “jumping through hoops” is arguably an artifact of the sheer culture shock between programmers trying to get things done quickly and computer scientists trying to prove formal theories of program calculation. We shall dispute this notion later; for now, I want to spend some time reviewing what referential transparency gives us in exchange for the removed features.

What purity affords us, part 1: Declarativeness

First of all, there’s lazy evaluation. While laziness in an impure context is conceptually possible, its results would be “mighty interestin’”, as humorously put by Matthew Daniels in the #haskell IRC channel. Lazy evaluation in a nutshell means that computations are only done as needed: an expression like let bignum = product [1..10^9] in 1+1, if strictly evaluated, would take a very long time to run; under lazy evaluation, the useless computation of bignum is never done, since its result is never needed. Evidently, lazy evaluation in a non-referentially transparent environment would be virtually chaotic: programs could yield different, unpredictable results.

The fact is that there aren’t any lazy, non-referentially transparent languages, and for the reasons outlined above there couldn’t be any useful ones. My canonical example of why lazy evaluation is critical to writing very complex code in a manageable fashion is the CA comonad, but simpler examples can be built around infinite lists. Basically, what lazy evaluation brings to the table is that data-generating functions can be written abstractly, manipulated implicitly and computed only as actual results are requested.
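A quick sketch of this with an ordinary infinite list (the names nats and firstSquares are ours): the whole infinite list is defined abstractly, transformed with map, and only the five demanded elements are ever computed.

```haskell
-- An infinite list; only the demanded prefix is ever computed.
nats :: [Integer]
nats = [0..]

-- Transform the whole infinite list, then demand just five elements.
firstSquares :: [Integer]
firstSquares = take 5 (map (^2) nats)
```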

I’ve been trying to eschew specific examples so far, but it becomes useful here. Take the Fibonacci sequence. As explained in the popular tutorials, a simple one-liner that returns it is

fibs = 1: 1: zipWith (+) fibs (tail fibs)



If one attempts to call fibs directly, by typing “fibs” into an interactive environment or writing a main function that’s simply main = print fibs , the computer will simply print the Fibonacci numbers indefinitely, until someone presses Ctrl-C or the computer dies after its years of useful service:

Prelude> fibs

[1,1,2,3,5,8,13,21,34,55,89,144,233,377,610,987,1597,2584,4181,6765,10946,17711,28657,46368,75025^C

Interrupted.



An infinite loop could have been written in an imperative, strict environment almost as simply:

x := 1
y := 1
print "[1,1,"
while (True) {
    x1 := y
    y := x + y
    x := x1
    print y
    print ","
}



The fibs function we have defined is nevertheless much more manipulable. Let’s say we want new structures that respectively return the squares of Fibonacci numbers, the offset-by-one Fibonacci numbers and even the difference between these two:

squarefibs = map (^2) fibs

nextfibs = tail fibs

diff_fibs = zipWith (-) squarefibs nextfibs



Each one of these examples becomes progressively more complicated in the imperative, strict pseudocode above. An infinite diff_fibs becomes unmanageable in the absence of sophisticated coroutine control-flow structures that are both rare in mainstream languages and hard to use. The cellular automata comonad takes this to the next level by defining comonadic data structures that are infinite in two different directions (think Conway’s Game of Life, where each cell’s state depends on the state of its neighbours) and which can nevertheless be transformed as simply as above, by defining their mathematical structure in a way not too different from the way we have defined the Fibonacci sequence. As a cherry on top of the cake, we can pick and choose just the elements of the infinite data structures we want:

five_diffs = take 5 diff_fibs



This simple example, hacked up in five lines of Haskell code (defining respectively fibs, squarefibs, nextfibs, diff_fibs and five_diffs), would be incredibly more involved in an imperative language — there’s a Perl golf competition waiting to happen here — and moreover, reusing code from fibs through squarefibs and on to diff_fibs would just not happen. Just imagine what can be done with more involved mathematical structures and an actual computer programmer at the helm.
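For the record, the five definitions above assemble into a complete, runnable program like this (type signatures added for clarity):

```haskell
-- The whole Fibonacci pipeline from the preceding sections, in one module.
fibs, squarefibs, nextfibs, diff_fibs :: [Integer]
fibs       = 1 : 1 : zipWith (+) fibs (tail fibs)
squarefibs = map (^2) fibs
nextfibs   = tail fibs
diff_fibs  = zipWith (-) squarefibs nextfibs

-- Demand only five elements of the infinite structure.
five_diffs :: [Integer]
five_diffs = take 5 diff_fibs

main :: IO ()
main = print five_diffs
```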

This style of programming (defining data in abstract terms and manipulating it as mathematical objects) is often called “declarative programming”; I like to call it DWIM (do what I mean) programming, but this is a little misleading, since Haskell understands only very strict mathematical definitions and hasn’t (yet) evolved to the point where it understands the vast, dark, little-understood space of human desire.

Haskell isn’t a psychoanalyst yet, but I’m sure it won’t take us long.

What purity affords us, part 2: Program Transformation

We’ve seen that referential transparency affords us writing blocks of code as mathematical structures that can later be transformed. The second thing that purity affords us is that entire programs are mathematical structures themselves, and can be manipulated not only in the sense above but also in that theorems satisfied by specific functions can be employed to optimize code — no longer in the heuristic, working-in-the-dark sense of ordinary optimizing compilers, but in the sense of understanding the actual structure of the computation to be performed.

This is probably what motivated the research program of program-construction calculi in the first place. There’s a good review of program-construction calculi in the first pages of Functional Programming with Bananas, Lenses, Envelopes and Barbed Wire, followed by a calculus that’s quite close to what Haskell is today.

Basically, a program-construction calculus (of which the while/for/break “calculus” could be considered an informal instance, if some requirements are dropped) is a theory of how a program can be built out of well-formed “lego pieces” with well-understood mathematical properties. For example, where an imperative language would apply functions to every element of a list-like data structure as follows,

function double (list) {
    newlist = list.new
    for (i in 1:length(list)) {
        newlist.append(2*list[i]);
    }
    return (newlist)
}

function square (list) {
    newlist = list.new
    for (i in 1:length(list)) {
        newlist.append(list[i]^2);
    }
    return (newlist)
}



a functional language can employ higher-order functions and define a “map” function that allows one to say just

double = map (2*)

square = map (^2)



Years of research have improved the theory of such program-construction ‘blocks’ — map, reduce/foldr, etc. — to the point where there’s a mature calculus with many known theorems (invariant properties that can always be applied) about these blocks that can be employed in practice. For example, a simple theorem about map is the fusion law:

map f . map g = map (f . g)

Since Haskell compilers know about this theorem — and theorems like it have been developed for a class of program-construction blocks large enough to build anything over a large class of data structures — code like

doublesquares list = double (square list)



is automatically “fused” so as not to construct the intermediate data structure and iterate through everything twice, composing just the mapped functions as god intended. This kind of optimization is pervasive in modern Haskell compilers. Translating this to an imperative setting, this means a compiler knows how to transform code like

x = double (square (list))



which would be evaluated into

newlist = list.new
for (i in 1:length(list)) {
    newlist.append(list[i]^2);
}
newnewlist = list.new
for (i in 1:length(newlist)) {
    newnewlist.append(2*newlist[i]);
}



into code like

x = doublesquares (list)



where

function doublesquares (list) {
    newlist = list.new
    for (i in 1:length(list)) {
        newlist.append(2*list[i]^2);
    }
    return (newlist)
}



All this is done automagically!
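The fusion the compiler performs can also be checked by hand; here is a small sketch of the map fusion law on sample data (the names fused and doublesquares are ours):

```haskell
double, square :: [Int] -> [Int]
double = map (2*)
square = map (^2)

-- Two traversals of the list...
doublesquares :: [Int] -> [Int]
doublesquares = double . square

-- ...fused into one, by the law  map f . map g = map (f . g)
fused :: [Int] -> [Int]
fused = map ((2*) . (^2))
```

On any input list the two definitions agree, but fused walks the list only once.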

Again, as the proverbial cherry on top of the cake, Philip Wadler has proven that

From the type of a polymorphic function we can derive a theorem that it satisfies.
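As a small, hand-checkable instance (the function f below is our own arbitrary choice): for any f :: [a] -> [a], the theorem derived from the type alone is map g . f = f . map g, since such a function can only rearrange, drop or duplicate elements.

```haskell
-- An arbitrary function of type [a] -> [a]; any other choice would do.
f :: [a] -> [a]
f = reverse . take 3

-- The free theorem for the type [a] -> [a]:  map g . f = f . map g
theoremHolds :: Bool
theoremHolds = (map (+1) . f) [1 .. 10 :: Int] == (f . map (+1)) [1 .. 10]
```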

How far we have come from the “programming is giving a robot a detailed list of instructions” paradigm!

The whip-cracking sound: what purity enforces

We have, for the two preceding sections, engaged in a starry-eyed tour of what purity affords us that we can’t have in its absence. That alone should place the restrictions referential transparency requires squarely outside the “dominatrix” approach to computer programming, where the user’s toys are taken away so he does things the right way; giving up destructive updates becomes closer to a doctor advising you to give up trans fats in the name of your health and all the other fun things you can do with a healthy life.

We have, nevertheless, promised to compare the restrictions of the purely functional model with those of the two other restrictive models mentioned in the first section of this essay — the structured-block and stack-based/word-substitution paradigms. What we intend to show here is that the programming style enforced by purity falls well within the ‘good’ case we’ve witnessed before: better practices arising from prohibiting bad ones, as in the rise of structured-block flow control.

Stack-based languages enforce code “refactoring” — the structuring of code in self-contained separate blocks — by making managing code too complicated otherwise. This could also be argued of Haskell, to a point. Much like Forth and Factor programs are expressed like one long unmanageable stack tower in the absence of words, a Haskell program could be expressed as a long unmanageable chain of lambda forms in the absence of named functions.

That Haskell functions are something much more powerful than Factor words follows straight from the simple examples we’ve seen before, but let’s set this aside for a moment. Haskell might seem to enforce the kind of modularization Factor enforces — and it would, if we didn’t have the where keyword. “Where” blocks hold local function definitions and type signatures, and enable you to shoot yourself in the foot all you want. A stupid Haskell programmer (or maybe a very smart AI learning how to program by itself!) might rewrite the very simple code block

double list = map (*2) list

square list = map (^2) list



as

double list = act 2 list where {
    act n [] = [];
    act n (x:xs) = (n * x) : act n xs;
}

square list = sprinkle 2 list where {
    sprinkle n [] = [];
    sprinkle n (x:xs) = (x ^ n) : sprinkle n xs;
}



and it’s easy to see how they could screw up (x:xs) as well by defining their own “head” and “tail” functions. There’s no stopping human stupidity.

This example shows how a programmer can go out of his way and eschew everything that’s good and saintly about functional programming if he wants to. What he can’t do, and this is by virtue of purity, is mix stateful code with pure code. That is, while in the imperative version of print_fibs I/O and actual program logic are completely intermingled,

function print_fibs() {
    x := 1
    y := 1
    print "[1,1,"
    while (True) {
        x1 := y
        y := x + y
        x := x1
        print y
        print ","
    }
}



in the Haskell version the I/O logic is almost forcefully separated:

fibs = 1: 1: zipWith (+) fibs (tail fibs)

print_fibs = print fibs



Part of what forces this problem to be separated in two is the recursive nature of the definition of “fibs”; for a number of other, simpler problems, the mindless programmer can indulge in as much intermingling as he wants

print_doubled num = print (num * 2)



and get away with it. Input is also not that difficult; a function that reads a number and prints it doubled would be

print_doubled = getLine >>= print . (*2) . read



which is as intermingled as it gets. This would not work so simply if we needed a recursive function, for example — unless we appeal to recursive definitions elsewhere, which is forced refactoring happening already. It also doesn’t leave us with a numeric trail; while in an imperative language

input num

print 2*num



leaves us with a num variable that can be used in later calculations, print_doubled as defined above kills the doubled number. Complications seem to arise exponentially.
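Keeping the value around in Haskell is, in fact, a matter of naming it in a do-block — a sketch with hypothetical names (doubleIt, printDoubled), where the doubling logic stays a separate pure function:

```haskell
-- The doubling logic stays a separate pure function...
doubleIt :: Int -> Int
doubleIt = (* 2)

-- ...while the do-block names the read value, keeping it available
-- for later computations instead of losing it after printing.
printDoubled :: IO Int
printDoubled = do
  num <- readLn
  print (doubleIt num)
  return num
```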

From these code snippets, the kind of “forced refactoring” brought up by the whip-cracking sound of purity seems clear: what Haskell is encouraging is separation of concerns, more by the convenience of working in “straight Haskell” without intermingling concerns than by the sheer inconvenience of doing otherwise.

In other words, a very interesting form of refactoring comes up as an artifact of purity. Pieces of program logic are easily separated — and can be combined in clever and novel ways, as with the monadic combinators. Flow control is a non-issue because of laziness, but the kind of computation brought about by flow-control patterns is also easily abstracted away. (That the recursive patterns that solve the flow-control problems of structured procedural programming are mostly defined in the standard Prelude is testimony to the fact that the Haskell designers set out to make things easier for us.) In one sentence, the whip cracks not for perverted pleasure but for personal growth in learning the practices that make for powerful computer programming.

Acknowledgements

The interesting parts of this essay are research done by People With Big Brains, Wadlermen and other funny alien races with thought processes that shame us ordinary people. The uninteresting parts are random musings trying to connect some dots by yours truly, a nosy economist trying to make his way into the fascinating world of functional programming. By now I know more computer programming than I ever would have if I had learned it in other languages; the fact that ordinary people like me can grok how to unleash such great power is a testimony to the long-term positive effects of that cracking whip.

I probably couldn’t have written it, either, without the help of too many denizens of the #haskell IRC channel to mention individually. They have helped me with my English grammar, confirmed my shaky intuition on many points and encouraged me to go on. Hey, this weepy acknowledgement list is beginning to sound like a freaking PhD thesis. It’s just a blog post, and it owes more to the amazing community around Haskell than to any minor thought-plumbing achievement of the author. Ok, this is the end. This is my last sentence.

Important notes and retractions

Apparently a long essay by a nonexpert can’t come out without mistakes and/or blunders. This is a growing list of them as they’re pointed out.

Apparently there actually are formal methods for for/while structured-block programming. Thanks!

Maybe I’m giving concatenative languages a bum rap, maybe I’m not. The fact is that when I tried to learn Factor, that was the key advantage given. Addressing Slava specifically: I’m sure all the above code snippets could be written in Unlambda or Malbolge, but newbies like me don’t want to. I invoke the Colbert-like concept of “mathiness”: Factor, Java, etc. are probably amenable to formal analysis like I’m told structured-block programming is, but Haskell’s mathematical underpinnings are closer to what simpletons like me grok. Oh. Turns out Slava Pestov, who commented before, is the creator of Factor, and is already distilling bile in his own blog. Maybe if I diss C++ I can get Bjarne Stroustrup visiting here. Some people seem to be encouraging me to go into full flame mode with him, and I can think of one or two humorous petards, but I think silent enmity better suits his overinflated ego and my own overinflated ego. Still, I’m making enemies already — I must be doing something right!

This thread at reddit elaborates a little further on lazy evaluation, call-by-need evaluation and other nomenclature details. This has been said