When it comes to optimization, there are generally two prevailing camps: Optimize Early and Optimize Late, with the latter being by far the larger group, and both having good arguments for their case. The main arguments used by the Optimize Late crowd are that optimized code is harder to read, harder to maintain, less flexible, often contains bugs and, above all, that it’s often only 10% of a code base which drastically impacts performance. On the other hand, the Optimize Early crowd argues that these slow 10% in reality never exist in isolation, are scattered around, hard to find, and hence optimizing them is usually limited to piecewise micro-optimizations and therefore requires a large amount of refactoring and re-testing, all of which can be avoided by simply being more aware of performance-critical sections during design and implementation. For them, it’s a matter of better understanding language constructs, algorithms and how the machine actually operates, and therefore writing more efficient (rather than just functional/working) code in the first place. System response times are (or should be) part of the design spec and given time budgets, as is often done e.g. in game development and embedded software with hard real-time limitations. We can’t argue that this is a bad thing, can we? (Just for the record, I’m trying not to be ignorant of either way and unconsciously aim for a happy compromise between these polar extremes.)

With this in mind, as part of this first exercise we looked at:

Awareness & understanding overheads of idiomatic language patterns

The textbook approach to encoding a 2D data grid in Clojure/script is using a nested vector, which can then easily be processed using map / reduce to produce the next generation in the simulation. Accessing individual grid cells is also straightforward using (get-in grid [x y]). However, in the GOL simulation we need to access 9 cells (1 cell + 8 neighbors) in order to compute the new state of each cell. So in a 1024 x 1024 grid this use of get-in will result in the creation of 9,437,184 temporary Clojurescript vector objects (the vectors for the lookup coordinates) per frame, exerting huge pressure on the garbage collector. In addition, since get-in can take lookup paths of any length and works polymorphically using protocol methods, each invocation also incurs a call to reduce, resulting in even more temp objects, an iteration loop and a load of protocol dispatch functions for its internal use of get — altogether a lot of (way too much!) work for a simple 2D index lookup.
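To make the allocation cost concrete, here is a minimal sketch of that idiomatic nested-vector approach (the names alive-neighbors, neighbor-offsets and the toroidal wrap-around are illustrative, not the workshop’s actual code):

```clojure
;; all 8 neighbor offsets around a cell
(def neighbor-offsets
  (for [dx [-1 0 1], dy [-1 0 1]
        :when (not= [0 0] [dx dy])]
    [dx dy]))

(defn alive-neighbors
  "Counts live neighbors of cell [x y] in a nested 0/1 vector grid
  (toroidal wrap-around). Note: each get-in call allocates a fresh
  [x y] lookup path vector — 8 per cell, per frame."
  [grid x y]
  (let [size (count grid)]
    (reduce
     (fn [acc [dx dy]]
       (+ acc (get-in grid [(mod (+ x dx) size) (mod (+ y dy) size)])))
     0
     neighbor-offsets)))

(def grid
  [[0 1 0]
   [1 1 0]
   [0 0 0]])

(alive-neighbors grid 1 1) ;; => 2
```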

In some situations (only if the lookup path is static, as in our case), we could write a macro version of get-in, expanding the lookup calls at compile time and thereby removing at least the overhead of a vector allocation and the use of reduce at runtime:
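A possible sketch of such a macro (not necessarily the workshop’s exact version): since the path is a compile-time literal, we can unroll it into nested get calls.

```clojure
(defmacro get-in*
  "Like get-in, but requires a literal path vector and expands into
  nested get calls at compile time, avoiding the runtime allocation
  of the path vector and the internal reduce."
  [coll path]
  (assert (vector? path) "path must be a literal vector")
  (reduce (fn [acc k] `(get ~acc ~k)) coll path))

;; (get-in* grid [x y]) expands to (get (get grid x) y)
```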

Benchmarking this example with criterium under Clojure (which has somewhat different/faster protocol dispatch than in Clojurescript), the macro version results in 43.61ns vs 205.18ns for the default get-in (~5x faster).

Often these things are dismissed as micro-optimizations and in some ways they are, but considering that core functions like get-in are heavily used throughout most Clojurescript applications, being more aware of the inherent costs is useful and can help us look into alternative solutions when needed.

Btw. one of the intermediate steps taken to speed up our simulation was using transduce instead of map & reduce to compute the number of alive neighbor cells; however, this actually ended up being ~15–20% slower in this case. We have not looked into the reasons for this (yet)…
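For reference, a sketch of what that transduce variant might look like (illustrative; the workshop code differs): instead of reducing over a lazily mapped sequence, the mapping step is fused into the reduction via a transducer.

```clojure
(defn alive-neighbors-xf
  "Counts live neighbors of [x y] using transduce: the (map ...)
  transducer fuses the lookup step directly into the + reduction,
  avoiding an intermediate lazy sequence."
  [grid x y size]
  (transduce
   (map (fn [[dx dy]]
          (get-in grid [(mod (+ x dx) size) (mod (+ y dy) size)])))
   + 0
   [[-1 -1] [-1 0] [-1 1] [0 -1] [0 1] [1 -1] [1 0] [1 1]]))
```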

Persistent vs mutable datastructures

The more obvious improvement to speed up the simulation was using a flat 1D vector to encode the grid and calculating cell indices from the 2D coordinates, much like in a pixel buffer. This not only gives us better cache locality; instead of get-in we can now simply use nth, gaining a ~6x speedup and somewhat simpler code.
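The flat encoding can be sketched like this (names and the wrap-around idx helper are illustrative): cell (x, y) lives at index x + y * width, so every neighbor lookup is a single nth with no temporary path vector.

```clojure
(def width 4)
(def height 4)

(defn idx
  "Maps 2D coordinates to a flat index, with toroidal wrap-around."
  [x y]
  (+ (* (mod y height) width) (mod x width)))

(defn alive-neighbors-flat
  "Counts live neighbors using nth on a flat 0/1 vector."
  [grid x y]
  (+ (nth grid (idx (dec x) (dec y)))
     (nth grid (idx x       (dec y)))
     (nth grid (idx (inc x) (dec y)))
     (nth grid (idx (dec x) y))
     (nth grid (idx (inc x) y))
     (nth grid (idx (dec x) (inc y)))
     (nth grid (idx x       (inc y)))
     (nth grid (idx (inc x) (inc y)))))
```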

The final step (leaving out some other stages) of this exercise was an introduction to JS Typed Arrays, creating typed views over byte buffers and updating the canvas not via its 2D drawing API, but via direct pixel manipulation through the canvas context’s ImageData. Since all our data (both the simulation grid and pixels) is stored in typed arrays, we switched to using only loop instead of map / reduce (thereby removing millions of internal function calls) and altogether gained a ~650x speedup compared to the original version.
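A JVM-Clojure sketch of such a loop-based generation update (in the browser the grid would be a js/Uint8Array; here an int-array stands in, and next-gen! is an illustrative name, not the workshop’s code):

```clojure
(defn next-gen!
  "Computes one GOL generation from src into dest (both flat int
  arrays of w*h 0/1 cells, toroidal wrap-around). No map/reduce,
  no per-cell allocations — just explicit loops over indices."
  [^ints src ^ints dest w h]
  (dotimes [y h]
    (dotimes [x w]
      (let [n (loop [dy -1, dx -1, acc 0]
                (cond
                  (> dy 1) acc
                  (> dx 1) (recur (inc dy) -1 acc)
                  (and (zero? dx) (zero? dy)) (recur dy (inc dx) acc)
                  :else
                  (recur dy (inc dx)
                         (+ acc (aget src (+ (mod (+ x dx) w)
                                             (* (mod (+ y dy) h) w)))))))
            alive (aget src (+ x (* y w)))]
        (aset dest (+ x (* y w))
              (if (or (= n 3) (and (= n 2) (= alive 1))) 1 0)))))
  dest)

;; quick check with a 5x5 blinker (three cells in a row at y=2)
(def src  (int-array 25))
(def dest (int-array 25))
(doseq [i [11 12 13]] (aset src i 1))
(next-gen! src dest 5 5)
;; dest now holds the vertical blinker at indices 7, 12, 17
```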

A live version of the exercise is here: http://demo.thi.ng/ws-ldn-8/gol/ (Please be aware that the UI for the “naive” mode and largest grid size will completely freeze for ~10 seconds)

Some of the other things we talked about:

avoid keywords or collections as functions (use get instead)

use named functions instead of closures for map/reduce fns

protocol function dispatch overhead

loop vs doseq

deftype vs. defrecord (code size, memory efficiency, protocols)

controlled use of set! and volatile! to achieve mutability
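To give a flavor of the last point, here is a tiny sketch (illustrative, not workshop code) of volatile! for controlled local mutability:

```clojure
(defn count-alive
  "Counts live cells using a volatile! as a cheap local mutable
  counter. Unlike an atom, volatile! performs no CAS loop, making
  it suitable for uncoordinated state in hot code paths."
  [cells]
  (let [n (volatile! 0)]
    (doseq [c cells]
      (when (= 1 c) (vswap! n inc)))
    @n))
```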

WebGL

To anyone interested in directly utilizing the GPU in the browser, WebGL is a huge & fascinating topic, but it can also be very daunting for newcomers to graphics programming, since efficient use of it requires a multitude of prerequisite knowledge and terminology: 2D/3D geometry, linear algebra, spatial thinking in multiple spaces (coordinate systems), low-level data organization, the OpenGL state machine (with hundreds of options), GPU processing pipelines, the GLSL shading language, color theory etc. Not all of it has to do with actual coding and it’s often in the theory moments when A-level maths knowledge comes back knocking on our door. It’s a lot to take in, especially in a 3-day workshop, but we tried to cover most of the core topics (and altogether probably spent most of the time on that) and we put theory to practical use with the help of various thi.ng/geom examples. Later on we walked through an early prototype for a WebGL game written in Clojurescript, going into more advanced topics, including creating mesh geometries from scratch and creating a path-following camera etc.