In June 2012, I promised myself that I’d learn Clojure “as a mind expander”. As a long-time Python programmer who has been using Python full-time in my work at Parse.ly, I wanted to explore. I wrote then:

I don’t know whether Clojure programs will be better or worse than equivalent Python programs. But I know they will be different.

It took me a while, but in January of this year, I started teaching myself the language.

Rich Hickey, and the “Cult of Personality”

My approach was to first learn the underpinnings of the language from books and online videos. If you embark on this for Clojure, you will inevitably run into the copious publicly available material from the language’s creator, Rich Hickey.

In stark contrast to Guido van Rossum in the Python community, Rich Hickey is undeniably not just the Clojure language’s creator, but also a kind of spokesperson for a functional programming renaissance. Guido van Rossum generally lies low, lets the Python language and community speak for themselves, and tries to avoid controversy. To him, Python is just a popular tool he happened to create, and it doesn’t represent any major paradigm shift in programming. It’s a positive evolutionary improvement supported by a great open source ecosystem and community. To Hickey, however, “traditional” programming languages, and especially popular ones with an object-oriented focus such as Java and C++, are just plain wrong. He proposes Clojure as an antidote of sorts.

You can get the gist of this from his motivating videos, such as Hammock-Driven Development, Are We There Yet?, and Simple Made Easy. For a thorough overview of Clojure as a language, you can also get a walkthrough by Hickey, given to a room full of Java developers, in Clojure for Java Programmers Part I and Part II.

Here is a summary of the viewpoint. Most languages are missing some important attributes that can help us tackle the most complex issues in programming projects:

True Immutability: Data structures should be immutable, and the details of maintaining a revision history for data structures should be an abstracted detail, like memory management is in most modern languages with garbage collection / reference counting.

True Composability: OO languages purport to offer a way to re-use code, but the mechanisms for doing so often rely upon type inheritance; in functional languages, composability falls naturally out of having simple functions with immutable inputs and outputs and higher-order functions for laying out their execution order.

True Scalability: Most traditional languages either assume operation on a single core, or provide low-level mechanisms for working with threads and locks. The free lunch is over for single core, and threads and locks are too complicated to get right. Clojure bundles a Software Transactional Memory (STM) implementation that, when combined with composable functions and immutable data structures, can simplify parallelism and concurrency.

True Productivity: Some languages attempt to solve the above problems with a restrictive or verbose layer of static typing. Others try to solve the problems with complicated toolchains and compilation. A truly productive language has a small core, an interactive development flow (typically oriented around a REPL), and declarative, concise code forms.

Notice that I put the word “True” in front of each of these attributes. This is because if I were to reflect on Python as a language, I’d say it has all these attributes. You can build programs in Python that center around immutable data structures and composable functions. You can write Python programs that scale up, and you can do so with a high degree of developer productivity. But Clojure tries to make these attributes fall naturally out of using the language, through a slew of built-in facilities, rather than making their enforcement a conscious design decision of the programmer (as is often the case in Python).

Hickey’s forceful arguments in his presentations are that the above attributes matter more than you might think. Defaults matter. Whatever is default is widespread.

Consider immutable data structures. In Python, we can code defensively using the copy module. Some languages, like Java, have immutable data structures as a third-party library, such as Guava Collections and its Immutable Collections support. But neither of these is widely used. You might even get negative code reviews from your colleagues for using them excessively. But in Clojure, immutable data structures are the default, thus they are widely used. Mutability is the opt-in behavior that draws strange looks from your programming colleagues.

Likewise, you can write composable functional programs in Python, but it’s probably just as common to write object-oriented programs and class hierarchies. I recently wrote a module in a functional style for my team, and a few of my colleagues thought it was pretty weird because the code had “so many different entry points”. One even proposed rewriting it in terms of classes for clarity. In Clojure, these debates (class vs function) never happen, because functions are seen as the superior unit of composition. The debate centers around how to name, organize, and structure functions, not whether to use them at all.

You can scale most dynamic languages beyond a single core using distributed computation frameworks like Hadoop and Storm, but doing so doesn’t rely on built-in language facilities and generally involves complex machinery. In Clojure, going from one core to multi-core is usually just a matter of using a different higher-order mapping function (for example, pmap instead of map). Going from single-node to multi-node also becomes easier to reason about, because shared state is rare in Clojure programs.
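For a Python programmer, the closest everyday analog to swapping Clojure's map for pmap is swapping the built-in map for the map method of an executor from concurrent.futures. This sketch uses a thread pool to stay self-contained; for CPU-bound work in CPython you would need a ProcessPoolExecutor instead because of the global interpreter lock, which is itself a hint that the switch is less of a drop-in change than in Clojure. The square function and the worker count are illustrative assumptions, not part of any program in this post.

```python
from concurrent.futures import ThreadPoolExecutor


def square(n):
    """A stand-in for real per-element work."""
    return n * n


nums = [45, 23, 51, 32, 5]

# Sequential: the counterpart of Clojure's (map square nums).
sequential = list(map(square, nums))

# Pooled: same shape, different mapping function (compare Clojure's pmap).
with ThreadPoolExecutor(max_workers=4) as pool:
    pooled = list(pool.map(square, nums))

print(sequential)  # [2025, 529, 2601, 1024, 25]
print(pooled == sequential)  # True: Executor.map returns results in input order
```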

So, all of this is to say, Clojure is not a revolutionary language. But it is different — that’s for sure. If you are used to Python, there are lessons to learn from Clojure, its community, and its code. This post, walking through some “Clojonic” examples of “Pythonic Clojure”, could serve as a good starting point.

Clojonic iteration

This is my favorite starting Python program example:

```python
nums = [45, 23, 51, 32, 5]
for idx, num in enumerate(nums):
    print idx, num
# 0 45
# 1 23
# 2 51
# 3 32
# 4 5
```

The equivalent code in Clojure could be written:

```clojure
(let [nums [45 23 51 32 5]]
  (for [[idx num] (map-indexed vector nums)]
    (println idx num)))
; 0 45
; 1 23
; 2 51
; 3 32
; 4 5
```

If you squint, this code looks pretty similar, but there are some important differences. The main aesthetic difference is a few more parentheses and brackets. Beyond that, the Python programmer will observe that in the Clojure program, many aspects of the program are implied, rather than annotated by special syntax. Many Clojure proponents will say that Clojure has a “simple syntax”, but I think this is misleading. They don’t mean simple as in “easy to read without prior context”. They mean simple as in “unceremonious”. Perhaps they mean simple as a contrast to “baroque” (how you might describe Java or C++’s syntax). Clojure does have a syntax, but it is implicit in how the code is laid out as list data structures. I’ve decoded the above code visually below:

So, in Python, the code was a combination of lightweight data structures ([], aka list), functions (enumerate()), statements (for, print), and explicit syntax (the nums = [...] binding, the for idx, num unpacking). In Clojure, the code is a combination of lightweight data structures ([], aka vector), special forms (let), macros (for), functions (println), and implied syntax (the let [nums [...]] binding, the for [idx num] destructuring). This may seem like just a bunch of different names for the same concepts, and indeed, in this simple program, Python and Clojure have a lot of language facilities in common. But we can start to dig deeper into the Clojure version to make it diverge more from Python.

So, firstly, what’s the difference between the Python list and the Clojure vector? Immutability. They have a similar syntax, but once Clojure’s list-like data structure is created, it cannot change. Other functions can accept the list as input and produce a new list as output.

So, you might ask the question, “How do I add an element to the nums list in Clojure?” The answer is that you can’t. You must, instead, pipe the list through a function that might generate a new list. For example, the conj function will construct a new immutable list with the additional element, but leave the original list intact. In other words, there is no equivalent to the .append() method.

```clojure
(let [nums [45 23 51 32 5]
      bigger (conj nums 75)]
  (println nums)
  (println bigger))
; [45 23 51 32 5]
; [45 23 51 32 5 75]
```

It would be as if we wrote a conj function in Python like this:

```python
import copy

def conj(a_list, elm):
    # Shallow-copy every element of a_list, then append to the copy only.
    new_list = copy.copy(a_list)
    new_list.append(elm)
    return new_list
```

You might then ask, isn’t that very expensive? If a_list is large, the function has to create a full in-memory copy of all its elements. If you wrote a function like this in Python, your colleagues would give you strange looks.

But, in Clojure, these details are handled by the language itself, in an optimized way. It uses a trick called Persistent Data Structures, which leverage a technique called “structural sharing”. Essentially, vector values are modeled as a shallow tree (each node holding up to 32 children) that is maintained internally by the language and run-time. When conj produces a new vector, it copies only the small part of the tree that actually changes and shares every other node with the original, so the immutable vectors you see as a programmer are different “views” over largely the same in-memory elements.
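Clojure's vectors use a wide tree, which takes more code than fits here, but the core idea of structural sharing appears in its simplest form in a persistent singly linked list (roughly how Clojure's own list type behaves, with conj adding to the front rather than the end). In this minimal Python sketch, Node, conj and to_pylist are illustrative names, not a library API: "adding" an element builds one new node that points at the old list, so nothing is copied and the original stays intact.

```python
from typing import Any, NamedTuple, Optional


class Node(NamedTuple):
    """An immutable cons cell: one value plus a pointer to the rest of the list."""
    head: Any
    tail: Optional["Node"]


def conj(lst, value):
    """Return a new list with value at the front; lst itself is never modified."""
    return Node(value, lst)


def to_pylist(lst):
    """Walk the nodes and collect their values into an ordinary Python list."""
    out = []
    while lst is not None:
        out.append(lst.head)
        lst = lst.tail
    return out


nums = None  # the empty list
for n in reversed([45, 23, 51, 32, 5]):
    nums = conj(nums, n)

bigger = conj(nums, 75)

print(to_pylist(nums))      # [45, 23, 51, 32, 5]
print(to_pylist(bigger))    # [75, 45, 23, 51, 32, 5]
print(bigger.tail is nums)  # True: the new list reuses every node of the old one
```

The cost of conj here is one allocation regardless of list length, which is the property that makes immutable-by-default affordable.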

Let’s now look closely at the replacement for the enumerate() built-in we used in the Clojure code above. We had to use the more verbose function call to map-indexed below:

```clojure
(map-indexed vector nums)
; ([0 45] [1 23] [2 51] [3 32] [4 5])
```

This illustrates a Clojure higher-order function. The docs for map-indexed describe its operation:

Returns a lazy sequence consisting of the result of applying f to 0 and the first item of coll, followed by applying f to 1 and the second item in coll, etc, until coll is exhausted. Thus function f should accept 2 arguments, index and item.
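In Python terms, the behavior the docs describe can be sketched as a generator function that takes another function as its first argument. Both map_indexed and vector here are hypothetical stand-ins written for illustration (vector just packs its arguments into a list, mimicking Clojure's vector), and, like the Clojure original, the result is lazy until something consumes it.

```python
def vector(*items):
    """Rough stand-in for Clojure's vector: collect the arguments into a list."""
    return list(items)


def map_indexed(f, coll):
    """Lazily yield f(0, first_item), f(1, second_item), ... until coll is exhausted."""
    for idx, item in enumerate(coll):
        yield f(idx, item)


nums = [45, 23, 51, 32, 5]
pairs = map_indexed(vector, nums)  # a generator: nothing has been computed yet
print(list(pairs))  # [[0, 45], [1, 23], [2, 51], [3, 32], [4, 5]]
```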

The function we passed as f is vector, which is similar to list in Python: it simply creates a vector from its list of arguments. So, this higher-order function will make repeated calls of the following form to our nums elements:

```clojure
(vector 0 45) ; => [0 45]
(vector 1 23) ; => [1 23]
(vector 2 51) ; => [2 51]
(vector 3 32) ; => [3 32]
(vector 4 5)  ; => [4 5]
```

And this gives us our pairs of “indexed” elements. We could convert this higher-order function call into another function that operates more like the Python enumerate() counterpart.

```clojure
(defn enumerate [coll]
  (map-indexed vector coll))
```

And we could then use (enumerate) in our Clojure code as follows:

```clojure
(let [nums [45 23 51 32 5]]
  (for [[idx num] (enumerate nums)]
    (println idx num)))
```

There, that’s looking a bit closer to the Python. Now, the astute Python programmer will observe that the enumerate() built-in in the Python language is actually an iterator over the indexed values. It lazily returns the enumerated (indexed) values from the passed-in sequence. You could write your own version of enumerate by using a generator.

```python
type(enumerate(nums))  # enumerate
next(enumerate(nums))  # (0, 45)
list(enumerate(nums))  # [(0, 45), (1, 23), (2, 51), (3, 32), (4, 5)]
```
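A minimal generator-based version of enumerate might look like this. It is named my_enumerate here only to avoid shadowing the built-in; the Python documentation describes the built-in as roughly equivalent to a generator of this shape.

```python
def my_enumerate(iterable, start=0):
    """Yield (index, item) pairs one at a time, like the built-in enumerate."""
    idx = start
    for item in iterable:
        yield idx, item
        idx += 1


nums = [45, 23, 51, 32, 5]
print(next(my_enumerate(nums)))  # (0, 45)
print(list(my_enumerate(nums)))  # [(0, 45), (1, 23), (2, 51), (3, 32), (4, 5)]
```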

How does our Clojure version compare?

Well, here comes the next surprise. In Python, you opt in to lazy iteration by writing your own iterators, or by letting the language write iterators for you through generator functions that use the yield keyword. In Clojure, most of the core functions that operate on sequences are lazy by default.

It turns out our Clojure enumerate function is more like a Python generator than it appears. This is because map-indexed “returns a lazy sequence”. Only once the values in enumerate’s sequence are fetched are they evaluated as index pairs (one element at a time, conceptually; in practice, lazy sequences built over vectors are realized in chunks of 32). That fetching is done by the for macro, which is itself lazy. From the docs, the for macro “yields a lazy sequence of evaluations”. The for macro in Clojure is actually closer to a generator expression in Python than it is to the imperative for looping statement.
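One practical consequence is easy to see in Python: a generator expression whose body has a side effect does nothing until it is consumed. The same applies to a (for ...) with println inside: in a compiled Clojure program it may print nothing, while at the REPL the printing of the returned sequence forces it. In this sketch, record is a hypothetical stand-in for println that logs its calls instead of printing, so the effect can be checked.

```python
calls = []


def record(idx, num):
    """Stand-in for println: remember the call and return None, as println returns nil."""
    calls.append((idx, num))


nums = [45, 23, 51, 32, 5]

# Like Clojure's (for ...), this only describes the work; the body has not run yet.
lazy = (record(idx, num) for idx, num in enumerate(nums))
print(calls)  # []

# Consuming it (like doall, or the REPL printing the result) runs the body.
results = list(lazy)
print(calls)    # [(0, 45), (1, 23), (2, 51), (3, 32), (4, 5)]
print(results)  # [None, None, None, None, None]
```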

Indeed, some Clojure programmers will question my reasoning for using the for macro as an iteration construct. There is another construct, doseq, that has a similar interface but is meant to be used for iteration performed for its side effects. From the docs:

Repeatedly executes body (presumably for side-effects) with bindings and filtering as provided by “for”. Does not retain the head of the sequence. Returns nil.

So, the following code would be a more idiomatic form of looping:

```clojure
(doseq [[idx num] (enumerate nums)]
  (println idx num))
```

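Another Clojure programmer may look upon the doseq and suggest it is a bit too verbose, given that iteration in Clojure is often achieved via higher-order functions. The following code is shorter and behaves the same at the REPL, with one caveat: map is lazy, so outside the REPL nothing is printed until the resulting sequence is forced (for example, with dorun), which is why doseq is usually preferred when you only want side effects: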

```clojure
(map println (enumerate nums))
```

Or, even the following, which may look cryptic now but will be explained shortly:

```clojure
(->> nums enumerate (map println))
```

Beyond Python constructs with macros

I mentioned that for is a macro in Clojure. Another macro I used briefly was defn, which is a macro that composes the def (global variable declaration) and fn (function) forms. The cryptic ->> symbol in the last example above is also a macro, called “thread-last”.

Macros may be the most fascinating feature of the Clojure language to advanced Python practitioners who have gotten a lot out of Python’s metaprogramming facilities, such as decorators, context managers, and metaclasses.

Macros are like a generalization of these features. They give you, the programmer, a generic compiler hook to transform code. In short, you can make constructs that look like function calls, but operate as code transformations. They are both powerful and dangerous.

Let’s go back to our enumerate example. I could actually rewrite the enumerate function as a macro, as follows:

```clojure
(defmacro enumerate [coll]
  `(map-indexed vector ~coll))
```

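The backtick (syntax-quote) indicates that what follows is a literal list: a piece of code returned as data. It is not evaluated as a function call when the macro runs, so the macro itself does not call map-indexed; it returns code that will call map-indexed later, at run time. Syntax-quote also namespace-qualifies the symbols inside it, which is why clojure.core/ prefixes appear in the macroexpand output shown below.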

The ~coll is telling Clojure that the input to enumerate should be placed, untouched, in the location at ~coll. So, in plain English, this says, “Define a macro, enumerate, that looks like a function, but actually gets replaced, at compile-time, with a function call to (map-indexed vector coll), but where coll is taken from the argument list of enumerate at compile-time.”

In other words, the body of the macro is like a template for a code replacement operation, and doesn’t have the overhead of a function call.

This is a pretty trivial example, but let’s explore it a little. Clojure provides macroexpand, which will show what the given macro call actually expands into at compile-time.

```clojure
=> (macroexpand '(enumerate [1 2 3]))
(clojure.core/map-indexed clojure.core/vector [1 2 3])
; Translation: the code above is *replaced* with:
(map-indexed vector [1 2 3])
```

In this case, the overhead of the function call is probably negligible, so the complexity of the macro probably isn’t worth it. But anytime you think, “this group of functions has a lot of code repetition” — where in Python you might reach for decorators or context managers — you can use Clojure’s macros to achieve similar feats of code re-use. This gets at Clojure’s notion of True Productivity.
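Python has no macros, but its standard ast module gives a feel for the same "code as a template" idea: parse source into a tree, rewrite the tree, then compile the result. The sketch below is only an analogy, not a Clojure feature. The names ExpandEnumerate, enumerate_, map_indexed and vector are all made up for the example, and ast.unparse requires Python 3.9 or later. It rewrites every call enumerate_(x) into map_indexed(vector, x) before the code runs, which is roughly what macroexpand showed for the Clojure macro.

```python
import ast


class ExpandEnumerate(ast.NodeTransformer):
    """Rewrite enumerate_(x) into map_indexed(vector, x) in a syntax tree."""

    def visit_Call(self, node):
        self.generic_visit(node)  # expand any nested calls first
        if (isinstance(node.func, ast.Name) and node.func.id == "enumerate_"
                and len(node.args) == 1 and not node.keywords):
            expanded = ast.Call(
                func=ast.Name(id="map_indexed", ctx=ast.Load()),
                args=[ast.Name(id="vector", ctx=ast.Load()), node.args[0]],
                keywords=[],
            )
            return ast.copy_location(expanded, node)
        return node


def vector(*items):
    return list(items)


def map_indexed(f, coll):
    # Eager here for brevity; only the rewriting step matters in this example.
    return [f(i, x) for i, x in enumerate(coll)]


tree = ast.parse("enumerate_([1, 2, 3])", mode="eval")
tree = ast.fix_missing_locations(ExpandEnumerate().visit(tree))
print(ast.unparse(tree.body))  # map_indexed(vector, [1, 2, 3])
print(eval(compile(tree, "<expanded>", "eval")))  # [[0, 1], [1, 2], [2, 3]]
```

Unlike a Clojure macro, this rewrite has to be wired in by hand (here, by parsing a string), which is part of why Python programmers reach for decorators and context managers instead.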

Let’s go back to the ->> thread-last macro from before. This macro does nothing more than rewrite the code that comes after it in “pipeline form”. Instead of thinking of “mapping the println function over the lazy sequence of indexed pairs generated by enumerate”…

```clojure
(map println (enumerate nums))
```

… it might make more sense to think of it as “pipe the numbers through a function that generates index pairs, then pipe those pairs through a function that prints the results”:

```clojure
(->> nums enumerate (map println))
```

And indeed, these are equivalent formulations of the same code, thanks to the thread-last macro!
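Python has no thread-last macro, since ->> rewrites code at compile time, but a small runtime helper can give a similar top-to-bottom reading order. In this sketch, thread_last is a hypothetical name: each step is either a function called with the running value, or a tuple (f, *args) called as f(*args, value), so the value lands in the last argument position just as ->> places it. To keep the result checkable, the mapped function formats strings instead of printing them.

```python
from functools import reduce


def thread_last(value, *steps):
    """Feed value through each step in order, inserting it as the last argument."""
    def apply_step(acc, step):
        if isinstance(step, tuple):
            f, *args = step
            return f(*args, acc)
        return step(acc)

    return reduce(apply_step, steps, value)


nums = [45, 23, 51, 32, 5]

# Pipeline form, read top to bottom, like (->> nums enumerate (map ...)).
piped = thread_last(
    nums,
    enumerate,
    (map, lambda pair: f"{pair[0]} {pair[1]}"),
    list,
)

# Nested form, read inside out, like (map ... (enumerate nums)).
nested = list(map(lambda pair: f"{pair[0]} {pair[1]}", enumerate(nums)))

print(piped)            # ['0 45', '1 23', '2 51', '3 32', '4 5']
print(piped == nested)  # True
```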

Working through a bigger example

So, given this introduction, Clojure probably feels very different, but also somewhat familiar. Let’s look at a bigger example program in Python:

```python
# in twitter.py
import json

def with_twitter_data(filename, rdr_fn):
    with open(filename) as rdr:
        return list(rdr_fn(rdr))

def read_tweets(rdr):
    for line in rdr:
        apikey, timestamp, entry = line.split("|", 2)
        yield apikey, timestamp, json.loads(entry)

with_twitter_data("data/tweets.log", read_tweets)
```

This example defines two Python functions, with_twitter_data and read_tweets. The read_tweets function takes an iterator of lines that are formatted as follows:

1|2014-10-31|{"user": "amontalenti", "tweet": "some text"}

It splits each line to get the strings apikey and timestamp, plus entry, the JSON text that json.loads parses into a Python dictionary. It lazily yields the apikey, the timestamp, and that parsed dictionary as a 3-tuple by using a generator function.

Meanwhile, with_twitter_data takes a function as an argument that knows how to parse those log lines, and eagerly evaluates the parsed results into an in-memory list. It then closes the file.

This code can be ported to Clojure very easily:

```clojure
;; in twitter.clj
;; clojure.data.json ships separately from Clojure itself (org.clojure/data.json)
(ns twitter
  (:require [clojure.data.json :as json]
            [clojure.java.io :as io]
            [clojure.string :as str]))

(defn with-twitter-data [filename rdr-fn]
  (with-open [rdr (io/reader filename)]
    ;; force the lazy sequence before with-open closes the reader
    (doall (rdr-fn rdr))))

(defn read-tweets [rdr]
  (for [line (line-seq rdr)]
    (let [[apikey timestamp entry] (str/split line #"\|" 3)]
      [apikey timestamp (json/read-str entry)])))

(with-twitter-data "data/tweets.log" read-tweets)
```

What’s different? Well, Clojure’s import facility is a little different. We use the ns macro to declare a namespace, which is similar to a Python module that is implicitly defined using module files. The :require [namespace :as alias] clause is equivalent to Python’s import module as alias syntax.

We can see that read-tweets likewise lazily evaluates the lines of the input rdr, and uses str/split and json/read-str to parse the log lines. Vectors are yielded back lazily thanks to the for macro.

The with-twitter-data function uses with-open, which works similarly to Python’s combination of the with keyword and the open() context manager. It automatically closes the file when it’s done processing. But it implements this using a Clojure macro that surrounds the body of your code with a (try ... (finally ...)) exception handler and a call to close.
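Written out by hand, the try/finally shape that with-open expands into (and that Python's with statement performs for you) looks like this in Python. It is a sketch for illustration: with_data_explicit is a hypothetical name, and the demo writes a small temporary file so the code runs without the original data/tweets.log.

```python
import os
import tempfile


def with_data_explicit(filename, rdr_fn):
    """Hand-written version of: with open(filename) as rdr: return list(rdr_fn(rdr))."""
    rdr = open(filename)
    try:
        # Materialize the lazy result while the file is still open.
        return list(rdr_fn(rdr))
    finally:
        # Runs after a normal return and after an exception alike.
        rdr.close()


# Demo: a throwaway two-line file.
fd, path = tempfile.mkstemp(text=True)
with os.fdopen(fd, "w") as out:
    out.write("first\nsecond\n")

try:
    lines = with_data_explicit(path, lambda rdr: (line.rstrip("\n") for line in rdr))
finally:
    os.remove(path)

print(lines)  # ['first', 'second']
```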

The call to (doall (rdr-fn rdr)) may seem curious. This is forcing eager evaluation, similar to Python’s typical use of list() to materialize all the values of a lazy sequence. It’s saying, “walk the entire lazy sequence returned by rdr-fn right now, reading every line, before with-open closes the file.”
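The forcing step matters for the same reason in both languages: a lazy result that outlives its open file fails once the file is closed. This minimal Python sketch uses an in-memory io.StringIO in place of a real file, plus a hypothetical lazy_with_data helper that forgets to call list(), which mirrors what happens in Clojure if you drop the doall.

```python
import io
import json


def read_tweets(rdr):
    """Same parser as in twitter.py: lazily yield (apikey, timestamp, parsed JSON)."""
    for line in rdr:
        apikey, timestamp, entry = line.split("|", 2)
        yield apikey, timestamp, json.loads(entry)


def eager_with_data(rdr, rdr_fn):
    with rdr:
        return list(rdr_fn(rdr))  # forced while open, like (doall ...)


def lazy_with_data(rdr, rdr_fn):
    with rdr:
        return rdr_fn(rdr)  # an unconsumed generator; rdr is closed on return


LOG = '1|2014-10-31|{"user": "amontalenti", "tweet": "some text"}\n'

tweets = eager_with_data(io.StringIO(LOG), read_tweets)
print(tweets)  # [('1', '2014-10-31', {'user': 'amontalenti', 'tweet': 'some text'})]

try:
    list(lazy_with_data(io.StringIO(LOG), read_tweets))
    lazy_failed = False
except ValueError:  # reading from a closed StringIO raises ValueError
    lazy_failed = True

print(lazy_failed)  # True
```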

Similarities and differences

My exploration of Clojure so far has made me realize that the languages have surprisingly more in common than I originally thought as an outside observer. Indeed, I think Clojure may be the most “Pythonic” language running on the JVM today (short of Jython, of course). Let’s look at the similarities:

However, the Clojure language also brings many new ideas to the programming community, while also improving upon ideas found in prior languages (like Common Lisp). I tried to summarize many of the core differences I’ve observed here:

I didn’t cover all of these differences, but we did take a look at Macros, Immutable Data Structures, Lazy Evaluation, and Code as Data very briefly. You can probably already see how a language with true immutability, composability, scalability, and productivity can emerge out of these building blocks.

If you are interested in going further down the Clojure rabbit hole, you’ll probably enjoy some of the additional resources I’ve curated below. I found these particularly helpful to me as a Pythonista exploring the world of functional programming through Clojure.

Are you a Pythonista who enjoyed this article? If that’s the case, you’d probably also enjoy working on cutting-edge web analytics problems at Parse.ly. We are looking to hire software engineers for our real-time analytics platform (Python, Storm, Kafka) and our elegant visualization dashboards (JavaScript, AngularJS, d3.js). We are also looking for people interested in engineering / product management. To apply right now, email [email protected] with a link to Github, your CV and/or other relevant background! If you reach out about a job opening, be sure to mention this post.

Articles

Videos

Books

The two best books out there seem to be:

There is also a free book out that explores Clojure from a beginner’s perspective, called Clojure for the Brave and True.

More materials related to this post