But first, let’s measure the performance characteristics of the original code. I have made a few simplifications from the code above: I use the vanilla (i.e. clojure.core) version of transduce, not the one from core.async, and to check for errors we simply look for an :error key on the chunk.

Let’s build some chunks to try it out:

I used a batch of 10 chunks, each containing 999 “datoms”. Here the datoms are simply maps with a single :a key associated with the value 0, 1 or 2 (so a chunk is an alternating sequence that looks like this: ... {:a 0}, {:a 1}, {:a 2}, {:a 0}, {:a 1} ...). Each intermediary frequency map built during the transduction will be {0 333, 1 333, 2 333}, and the final result of the transformation should be {0 3330, 1 3330, 2 3330}.
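A minimal sketch of that setup, under my assumptions about the original shape of the code (the names `chunks` and the merge step are mine, and the :error check is elided for brevity):

```clojure
;; Build the test data: 10 chunks of 999 datoms each,
;; with :a cycling over 0, 1, 2.
(def chunks
  (repeat 10 (into [] (map (fn [i] {:a (mod i 3)})) (range 999))))

;; Original-style, two-pass transformation: each chunk is reduced to an
;; intermediary frequency map, and the maps are merged together with +.
;; (merge-with + m) returns m unchanged, so it also works as the
;; completing arity that transduce requires.
(transduce (map (fn [chunk] (frequencies (map :a chunk))))
           (partial merge-with +)
           {}
           chunks)
;; => {0 3330, 1 3330, 2 3330}
```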

(If you want to play with it yourself, the code that inspired this post is here: https://github.com/chpill/transducers-deep-dive/; there is a fair bit of noise, but you should easily find the parts discussed here.) Okay, we now have our point of comparison. Let’s rewrite the transformation using xforms:
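A sketch of the xforms version, assuming `chunks` holds the batch of chunks described above:

```clojure
(require '[net.cgrand.xforms :as x])

;; One pass over the whole batch: cat unpacks each chunk into a stream of
;; datoms, and x/by-key counts the occurrences of each distinct :a value.
(into {}
      (comp cat (x/by-key :a x/count))
      chunks)
;; => {0 3330, 1 3330, 2 3330}
```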

The main difference from the original implementation is that we no longer build intermediary frequency maps. The cat transducer unpacks the chunks into a stream of maps that is then consumed by the (x/by-key :a x/count) transducer.

If you really want to understand what is going on with x/by-key, you should check out this page of xforms’ wiki. In our particular case, x/by-key calls :a on the maps, and for each distinct value it sees, it spawns a new transduction context in which it applies the x/count transducer. So basically, it counts the number of occurrences of each distinct value returned by :a. When x/by-key has consumed the whole batch (that is to say, its input is reduced), it terminates the spawned transduction contexts and starts passing the pairs of [distinct value, number of occurrences] downstream. If you look at the inner workings of into, you will see that it adds conj as the final step of the transduction. So the pairs produced by x/by-key are incrementally added to our destination empty hash-map {}, producing the frequency map!
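You can see this pair-producing behavior on a tiny input (a sketch; the input vector is made up for illustration):

```clojure
(require '[net.cgrand.xforms :as x])

;; x/by-key emits [value count] pairs, which conj (added by into as the
;; final step) accumulates into the destination map.
(into {} (x/by-key :a x/count) [{:a 0} {:a 1} {:a 0}])
;; => {0 2, 1 1}
```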

If you just read the wiki page and that last paragraph, and still have no idea what is going on, it’s okay. Take your time; it sure took me quite a while to grasp… Try to play with it a bit to get a feel for what x/by-key does. For example, consider that (x/by-key :a x/count) achieves the same as (comp (map :a) (x/by-key identity x/count)).
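That equivalence is easy to check at the REPL (the sample input is mine):

```clojure
(require '[net.cgrand.xforms :as x])

(def sample [{:a 0} {:a 1} {:a 0}])

;; Extracting the key up front with (map :a), then grouping by identity,
;; produces the same frequency map as keying directly on :a.
(= (into {} (x/by-key :a x/count) sample)
   (into {} (comp (map :a) (x/by-key identity x/count)) sample))
;; => true
```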

I find that the computation is pretty cleanly expressed that way, and because it does the frequency calculation in one pass, it is also a good deal faster than the original version. Sadly, there is something I must confess to you now… it does not work when there are errors!