Experiments with parallel genetic programming in Clojure

I’ve been experimenting with genetic programming, learning Clojure as I go. I came to the point where I wanted to make my program parallel.

First of all, I am amazed at how readable and concise the code turns out to be. As an example, take a look at this function:

(defn choose-reproducing-parents [individuals] (take 2 (sort-by :fitness > (select-randomly 5 individuals))))

It doesn’t get more readable than this!

But the real joy came when I started to parallelize my code. Normally, the process would involve extra libraries, lots of fussing around with locks, and hours spent debugging deadlocks. So let’s look at an example function I needed to parallelize. Generating random code takes quite a bit of CPU time, especially if one needs to generate code for thousands of individuals. There is a function for that:

(defn create-random-individuals [number code-generator] (map create-individual-from-code (take number (repeatedly code-generator))))

And here’s the reworked parallel version:

(defn create-random-individuals [number code-generator] (pmap create-individual-from-code (take number (repeatedly code-generator))))

Can you spot the difference? Yes, that’s it. The little letter ‘p’ is all it takes for the work to be spread among all of my CPU cores.

Other functions required more work (again, notice how concise and readable the code is):

(defn produce-offspring [population number] (take number (repeatedly #(reproduce (choose-reproducing-parents population)))))

For those unfamiliar with Clojure, repeatedly produces a lazy sequence whose elements are produced by the function (in this case an anonymous function) supplied as the argument. take simply takes the first number elements of a sequence.

And now for the parallel version:

(defn produce-offspring [population number] (pmap reproduce (take number (repeatedly #(choose-reproducing-parents population)))))

Encouraged by this, I then moved on to parallelize the most time-consuming step of all GP programs: fitness evaluations. I’ll spare you the boring details of extended parallelization work I did on the function (but see examples above). The result was:

(defn test-generation [population fitness-function] (pmap #(set-fitness % (fitness-function %)) population))

After fitness was evaluated for all individuals, the next generation was produced and the fitness evaluations were run in parallel again.

This worked fine, but I thought there should be a better way. pmap limits me to a single multicore machine. This is fine for now, but in the future I plan to move to a distributed cluster, where the synchronous nature of map would be limiting. So I tried to write an asynchronous implementation.

First, I defined my pool: a collection where individuals are gathered once their fitness is evaluated:

(def pool (agent (vector)))

The pool is a Clojure agent. Agents are a synchronization primitive: you can send them actions (functions), which will be queued and executed in order.

As you can see, the pool initially starts as an empty vector. So how do individuals get to the pool? Their fitness needs to be evaluated, and then they need to be added to the pool. It all starts with this function:

(defn run-individuals [individuals] (dorun (map #(send % test-individual *fitness-function*) (map #(add-watcher (agent %) :send pool fitness-tested) individuals))))

First, we make each individual an agent, so that we can add a watcher to it. Watchers are a cool Clojure feature — they let you watch for state changes. Using add-watcher we add a watcher to each individual, telling it to send a fitness-tested action to the pool (which is also an agent, remember?). Then, once we’ve set up watching for state changes, we send a test-individual action to each individual, giving it the fitness function as a parameter. test-individual is a really simple function, all it does is call the fitness evaluation function and return the new state of the individual.

The dorun is necessary, because we’re dealing with lazy sequences and discarding the result (sending agent actions is a side effect). If the dorun wasn’t there, the entire sequence would never get evaluated and actions would never get sent.

Let’s see what happens once the pool is notified of a state change in an individual:

(defn fitness-tested [watched-population individual] (let [population (conj watched-population @individual)] (if (>= (count population) target-population-size) (let [new-population (prune-population population)] (run-individuals (produce-offspring new-population (- target-population-size (count new-population)))) new-population) population)))

First, we add the new individual to the pool. If we haven’t gathered enough individuals, we simply return the pool with the individual added — this updates the global pool.

The fun part begins when we have enough individuals to produce the next generation. Then, we prune the population, deleting the poorest individuals and produce a new batch of individuals, letting them go using the previously described run-individuals.

If you’ve done any parallel programming, you’ll probably worry about multiple threads modifying (pruning) the population simultaneously. Not to worry — Clojure agents are monitors, you are guaranteed that only one action will execute at a time.

We now have a fully asynchronous parallel GP implementation. Notice how there aren’t any queues, locks, thread pools to manage. All we have is a single global variable and a couple of simple functions. We don’t need any new data structures! The beauty of this solution is that because we’re using agents and watchers, Clojure does the queueing for us. Look, ma, no queues!

I am very happy with how easy and clean this solution turned out to be. I can now see why people keep raving about Clojure. Somebody finally did some serious thinking and implemented a new approach to parallel programming, not just a rehash of old ideas.

In less than an hour I went from a sequential implementation to a parallel asynchronous one. I’d say that’s impressive. And most importantly, the same code still runs on a single-cpu machine, with minimal performance impact. I am very impressed.