I’m starting to wind down my Clojure research, but I’m feeling a little guilty about having exposed people to my klunky Lisp-newbie code, perhaps giving a false impression of how the language feels. So I’d like to show you what it looks like when it’s created by someone who’s actually part of the tribe and thinks in it more natively than I probably ever will.

[This is part of the Concur.next series.]

Technomancy · That’s the online handle Phil Hagelberg goes by, and I like it too much to resist a chance to use it. He reacted to my first Wide-Finder-related Clojure fumblings with a piece in which things are mapped, but also reduced, including code which may be perused here.

I think it’d be worth your time to pause for a minute and think about it.

John from Milo · That would be John Evans of Milo, which looks like an interesting site. His first reaction to Phil’s code was this:

(ns my-wide-finder
  "A basic map/reduce approach to the wide finder using agents.
   Optimized for being idiomatic and readable rather than speed.
   NOTE: Originally from: http://technomancy.us/130 but updated to use pmap."
  (:use [clojure.contrib.duck-streams :only [reader]]))

(def re #"GET /(\d+) ")

(defn count-line
  "Increment the relevant entry in the counts map."
  [line]
  (if-let [[_ hit] (re-find re line)]
    {hit 1}
    {}))

(defn my-find-widely
  "Return a map of pages to hit counts in filename."
  [filename]
  (apply merge-with +
         (pmap count-line (line-seq (reader filename)))))
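To see the shape of the pipeline, here’s a minimal sketch of what count-line produces and how merge-with + folds the results together. The sample log lines are invented for illustration; everything else is straight out of John’s code:

```clojure
;; Each line maps to a tiny one-entry count map ({} if no match),
;; and merge-with + sums the per-key counts across all of them.
(def re #"GET /(\d+) ")

(defn count-line [line]
  (if-let [[_ hit] (re-find re line)]
    {hit 1}
    {}))

(def sample-lines
  ["GET /7 HTTP/1.1"
   "GET /7 HTTP/1.1"
   "GET /9 HTTP/1.1"
   "POST /7 nothing-to-see-here"])

(apply merge-with + (map count-line sample-lines))
;; => {"7" 2, "9" 1}
```

The POST line contributes an empty map, which merge-with simply ignores; the two hits on page 7 collapse into a single count of 2.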

I grabbed that but for some reason couldn’t get it to run against the actual Wide Finder dataset. I pinged John and he provided me with this revised version:

(ns batch-pmap-wide-finder
  "A basic map/reduce approach to the wide finder using agents.
   Optimized for being idiomatic and readable rather than speed.
   Updated to deal with batches of lines instead of individual lines."
  (:use [clojure.contrib.duck-streams :only [reader]]
        [clojure.contrib.seq-utils :only [partition-all]]))

(def *batch-size* 50)

(def re #"GET /ongoing/When/\d\d\dx/(\d\d\d\d/\d\d/\d\d/[^ .]+) ")

(defn tally [line]
  (if-let [[_ hit] (re-find re line)]
    {hit 1}
    {}))

(defn count-lines [lines]
  (apply merge-with + (map tally lines)))

(defn find-widely
  "Return a map of pages to hit counts in filename."
  [filename]
  (apply merge-with +
         (pmap count-lines
               (partition-all *batch-size* (line-seq (reader filename))))))

Processing the big dataset, it ran in 1h28m, while burning about 7h25m of CPU. On impulse, I changed his *batch-size* to 100 and this had no effect on the elapsed time but cranked the CPU to just over 8h. Concurrency is weird.
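Part of the weirdness is down to how pmap works: it evaluates elements in parallel but stays only a small window of tasks ahead of the consumer. Here’s a toy sketch of the map-versus-pmap tradeoff; slow-inc is an invented stand-in for per-batch work like count-lines, and the timings are illustrative, not measured:

```clojure
;; slow-inc fakes a unit of real work with a 100 ms sleep.
(defn slow-inc [x]
  (Thread/sleep 100)
  (inc x))

;; Sequential: roughly 8 x 100 ms of wall-clock time.
(time (doall (map slow-inc (range 8))))

;; Parallel: pmap overlaps the sleeps, so on a multi-core box the
;; wall-clock time drops while the total work done stays the same.
(time (doall (pmap slow-inc (range 8))))
```

Bigger batches mean fewer, fatter tasks handed to pmap; once the machine is saturated, growing them further may mostly shift CPU overhead around rather than move the elapsed time.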

Once again, if you’re not already a Lisper, take a minute to look at and think about this code.

I Look At This Code · And what do I see? First, these guys have internalized the APIs and libraries, and in particular the list- and sequence-processing functions, just as a seasoned Perlmonger or Java-head has internalized those languages’ key APIs. And you’re not really a Clojure programmer until you’ve done that.

It’s remarkable the degree to which you can push all your boring arithmetic and book-keeping down into the guts of declarative/functional calls like partition-all and merge-with. In particular, you have to admire the elegance of how John’s tally function flows into merge-with.
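A quick REPL check makes that division of labor concrete. These calls exercise partition-all and merge-with directly; the page keys are made up, not drawn from the real dataset:

```clojure
;; partition-all chops a seq into fixed-size batches,
;; keeping the ragged tail as a short final batch:
(partition-all 3 (range 7))
;; => ((0 1 2) (3 4 5) (6))

;; merge-with + combines per-batch count maps,
;; summing the values at any shared keys:
(merge-with + {"2006/01/01/x" 2, "2006/01/02/y" 1}
              {"2006/01/01/x" 1})
;; => {"2006/01/01/x" 3, "2006/01/02/y" 1}
```

All the looping, accumulating, and edge-case handling you’d write by hand in Perl or Java lives inside those two calls.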

The compactness of this code compared to, for example, mine, is remarkable.

Is It Expressive and Readable? · Which is to say, maintainable? Until there are some measurements in a controlled-experiment kind of setting, the answer to that has to be personal and anecdotal. So I’m not going to offer mine right now; I’d like to hear others’ opinions.