What are the different concurrency models, and what does Clojure do differently?

In this post I hope to give an introduction to some of the different concurrency models: what they are and what characteristics separate them.



Then we’ll take a look at the model Clojure has opted for, how we can use it, and what the advantages and disadvantages are when it comes to implementing it.



Now that sounds like a lot to go through in one article, and it is! So let’s jump into it!

Say I’m getting ready for work: I need to get dressed and make breakfast. Parts of my routine involve making an omelette and putting a shirt on. Sure, I can crack an egg into the bowl, put an arm through one sleeve, crack another egg and slip into the other sleeve, but I can’t do both at the same time. I have to swap from one task to the other. It doesn’t really matter what order I go about things; I can stop, start and resume each task when necessary, but ultimately I don’t have four arms to do both at once.



So we can say ‘concurrency’ is where we can stop, start and complete tasks whose lifetimes overlap. Parallelism is when both tasks are worked on at the exact same time, hence the need for four arms.



We could say concurrency is when the set of instructions (a program) dictates the order in which tasks are executed, with both tasks making progress but not at the exact same time. Bringing this back to our example, we can plan our egg cracking and getting dressed whatever way we choose, but to be parallel requires a fundamental change in our infrastructure so that tasks can be handled at the exact same time. For us, this “change in infrastructure” would be gaining another pair of arms; in computing terms it would be something like a multi-core processor that can execute several tasks simultaneously.



Alright, now with that out of the way, let’s look at some concurrency models. One popular way of programming concurrently is the actor model, made famous by the Erlang runtime and its reputation for reliability and fault tolerance.



While this is an over-simplification, you can think of an actor as similar to an object in OOP: it accepts an incoming message and executes some kind of computation based on it.



Moreover, just as in OOP there isn’t one object for all messages but many objects for different kinds of messages, the same is true for actors. Many actors are defined, and together they form a system in which they can send messages to each other.



But.



The thing that separates the actor model from other models is that there is no shared memory. Each actor is given its own independent slice of state to work with and handle messages. Therefore, if one actor would like a different actor to process a given message, it has to copy it (as is the case in functional languages, where data is immutable) and store it in the other actor’s mailbox. The mailbox behaves like a queue: the actor takes the first message out and works on it, then moves on to the next message, one after another.



While an actor works on messages sequentially, this would not be considered synchronous. This is because the actor may choose to delegate the problem to others by placing a message in another actor’s mailbox. The actor then moves on to the next message in its own mailbox and doesn’t sit around waiting to find out what happened to the one it sent, which is what allows for asynchronous message passing.
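To make the mailbox idea concrete, here is a rough sketch of an actor-like worker written in Clojure on top of a plain Java queue. This is only an illustration and not how Erlang or Akka implement actors; `spawn-actor` and `tell` are hypothetical helper names invented for this example.

```clojure
(import '(java.util.concurrent LinkedBlockingQueue))

;; Hypothetical helper (not a library function): starts an "actor" with
;; private state and its own mailbox, and returns the mailbox as its address.
(defn spawn-actor [initial-state handler]
  (let [mailbox (LinkedBlockingQueue.)
        worker  (Thread.
                 (fn []
                   (loop [state initial-state]
                     (let [msg (.take mailbox)]          ; wait for the next message
                       (when-not (= msg ::stop)
                         ;; handle one message at a time, in arrival order
                         (recur (handler state msg)))))))]
    (.setDaemon worker true)
    (.start worker)
    mailbox))

;; Hypothetical helper: drop a message in the mailbox and return immediately.
(defn tell [mailbox msg]
  (.put mailbox msg)
  nil)

;; An actor that keeps a running total nobody else can touch directly.
(def counter
  (spawn-actor 0
               (fn [total msg]
                 (case (:type msg)
                   :add    (+ total (:n msg))
                   :report (do (deliver (:reply-to msg) total) total)))))

(def reply (promise))
(tell counter {:type :add :n 2})
(tell counter {:type :add :n 3})
(tell counter {:type :report :reply-to reply})

(def total (deref reply 1000 :timeout)) ;; => 5
(tell counter ::stop)
```

The sender never waits for the additions to happen; it only blocks at the end because it explicitly asks for a reply.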



The last point I’ll make here, before weighing the practicality of this model, is to consider the ease of parallelism. If you think of actors as separate from each other, only aware of each other’s addresses, then it doesn’t matter if they’re local or on completely different computers. This is how languages like Erlang employ the “let it crash” philosophy: because actors don’t share any state, if one fails it can be restarted without the system as a whole being affected.



Now let’s look at this model from a pragmatic standpoint with our new-found knowledge.



The argument for this model is that it is less costly for data to be copied than for a thread to be suspended (locked). The Akka docs, Akka being one implementation of the model, put it this way → “locks seriously limit concurrency, requiring heavy lifting from the operating system to suspend the thread and restore it later”. While this is true, it can also be argued that complexity rises because it is so easy to fall into the mindset of implementing everything as actors. What I mean by this is breaking all concurrent processes down into many actors which each handle very few (or just one) kinds of message. A proliferation of actors risks a large volume of messages being sent and received, so it is crucial that the design guards against this to avoid a performance hit.



As some Clojurists may have noticed, this goes against one of the language’s core philosophies: that data should reside collectively in large data structures, with many functions defined to operate on them. The actor model instead emphasises small, isolated pieces of state kept away from other parts of the program.



One more thing I’ll mention on this topic is that these units of computation (actors) may need to operate on many different types of messages. One solution is to have many actors of different types, but this can lead to a lot of code being written just to communicate with other actors. The other is to leave actors untyped, which makes them, in my opinion, unpredictable. Moreover, because state is handled internally and hidden from other actors, it is much harder to test and reason about the side effects that are produced.



Now don’t get me wrong, this model can fit some problems very nicely. Take the WhatsApp architecture for example. They needed to be able to process millions of messages per second with no real downtime. Actors fit well here, as data doesn’t need to be collectively shared, just processed and stored. In 2014 it was reported that they had scaled this up to 70 million messages per second.



So if actors don’t fit the Clojure methodology, what could we use instead?



Well, one way would be to utilise the standard multi-threading model already made available by the Java runtime.



I really like this answer from Quora, as it explains very nicely the pros and cons of such a model:



"

When lots of data is shared, especially when that data is read-only, a threading model may be faster. However it's probably not easier to program.



However a threading model doesn't work well when critical sections (a more general word for "transaction") might be nested. If you call someone else's function how do you know it doesn't use a lock or monitor? And if it tries to use the same lock as you're using you may create a deadlock.



"

So while the threading model allows us to keep data central, it suffers from the same complexity issues as the actor model.



However, Clojure has the innate advantage of being functional, so the deadlock issue could be mitigated with composable functions…



What if there was a model that could leverage threads and provide an alternative to locking?



What if we could also keep the database-style storage of data for reads and writes, and also allow for composability?



I introduce: Software Transactional Memory!



Could this be the thing we’re looking for?



Let’s start by looking at one of the differences between STM and standard multi-threading. It is not so much a technical difference as a difference in mindset: threading with locks assumes the worst, while STM is rather optimistic. What I mean by this is that with locks, if you’re updating one of the variables in the shared store, it is up to the programmer to lock that variable away from all other parts of the program until the update is finished. With this assumption we now need to think about all the overlapping updates and operations that could cause conflicts. Not only is this very difficult to predict and implement, it also takes a lot of time and is error prone.



Instead, how about we just assume nobody else is working on the variable we are, and only if they are, work out how to protect ourselves from race conditions (where the outcome depends on the uncontrollable timing of events)?



STM will instead make a copy of the global variable you are referring to and let you perform whatever operations on it that you desire. After we have made any changes we like, we then want to commit our changes (sort of like a push to the master branch in Git). But if we find out that the value has actually changed while we were operating on our copy, then we go back and start over.



So let’s call a piece of state that can be changed an atom. Changes to an atom have to be made atomically.



And how do I make changes ‘atomically’? I hear you ask.



We must signal that our function should be run again if our first attempt doesn’t result in the atom being changed. It is important to remember that these functions should not include side effects, because they may be called more than once! They also should not depend on any global state outside of their parameter list, because that state may be different on each call. This is talked about in the clojure.org reference on atoms.
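Here is a small sketch of why side effects in an update function are a problem, using Clojure’s `swap!` (covered in more detail further down). The `calls` counter and `slow-inc` are names made up for this illustration, and the side effect is there on purpose so that retries become visible.

```clojure
(def counter (atom 0))
(def calls   (atom 0))   ; counts how often the update function actually ran

(defn slow-inc [n]
  (swap! calls inc)      ; deliberate side effect, only to make retries visible
  (Thread/sleep 1)       ; widen the window in which another thread can interfere
  (inc n))

;; Twenty threads each try to increment the counter once.
(let [workers (doall (repeatedly 20 #(future (swap! counter slow-inc))))]
  (run! deref workers))  ; wait for every worker to finish

@counter ;; => 20, every increment was applied exactly once
@calls   ;; at least 20, and typically more, because conflicting attempts were re-run
```

The value in the atom ends up correct, but anything the function did on the side happened once per attempt rather than once per successful update.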



And with that I think I’ve given a clue about which of the models Clojure has implemented...



When looking at the characteristics of Clojure, it just makes sense →



1. A Lisp-style, data-first philosophy: have one data structure, our main abstraction, and let many functions build upon it.



2. Immutability and persistence, which give us peace of mind that data cannot be changed directly or without a record of the previous state.



3. With the functional paradigm emphasising composable functions, it becomes natural to think about and write concurrent programs that also compose. What this means is that different functions which make changes can easily be layered, as in theory the functions only work on their inputs.



E.g. one function that takes the value of an atom (a number in our case) and doubles it could be layered with a function that takes the value and finds its square root, pipelining the state from one function to the next.



4. Being functional may also mean that the cost of copying in more distributed models becomes too high.
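The doubling and square-root example from point 3 could look something like this. It is only a sketch; `double-it` and `square-root` are names chosen for the illustration.

```clojure
;; Two small pure functions that only work on their input.
(defn double-it   [x] (* 2 x))
(defn square-root [x] (Math/sqrt x))

;; Applied one after the other, each as its own atomic update.
(def n (atom 8))
(swap! n double-it)    ;; => 16
(swap! n square-root)  ;; => 4.0

;; Or composed into a single function and applied as one atomic update.
(def m (atom 8))
(swap! m (comp square-root double-it)) ;; => 4.0
```

Because neither function knows anything about atoms or threads, they can be tested on plain numbers and combined freely.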



Another issue with the STM model is that transactions must be reversible: when a transaction fails, its changes to the variable have to be undone before other processes see them. What this means is that the kinds of operations we can validly perform inside a transaction are more limited.



The solution to handling these types of problems is to add operations of this type to a queue to be performed later. They will be performed when no other transactions are taking place. Sadly, it is for this reason that STM implementations are often considered inefficient and impractical for the real world.



So what does the Clojure STM implementation look like?



What Clojure does is leverage the threading capabilities of the JVM and run tasks that can be executed concurrently on those threads. Some STM implementations do lock resources, but in this implementation MVCC (Multi Version Concurrency Control) is used instead of locking. A simplified explanation is that a snapshot of the data is taken at a point in time. Readers read from that snapshot without fear of inconsistency or of being overwritten by other processes. Likewise, writers receive a snapshot to work with, and when the transaction completes their version becomes the new one that gets used. Much more on this at Wikipedia →



There are reference types that Clojure uses to manage access to state, some of which are atoms, refs and agents. Strictly speaking, only refs are coordinated by the STM; atoms and agents each have their own update mechanism, as we’ll see.



Note: vars will not be mentioned in this post.

We mentioned atoms a little while ago; now let’s explore how atoms are used in Clojure.



As mentioned by Clojure.org → “Atoms provide a way to manage shared, synchronous, independent state”.



Independent state refers to state that is not linked to any other state. For example, if an item in a shop is sold, it does not affect the current state of all the other items. We shall be working with dependent state with refs later on.



So let’s define some state with an atom →



(def jumper (atom {:sold false}))

Now over time, someone may want to purchase this item. How could we accommodate this change?



We can do so by using the swap! function.



(swap! jumper assoc :sold true)

But what if someone is in the middle of purchasing the jumper while someone else is also trying to buy it?



One of the reasons atoms are useful is that even when multiple threads could change the :sold property of our jumper, each update is applied synchronously and atomically: no thread ever sees a half-finished change.
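As a rough illustration of several buyers racing for the same jumper, the sketch below has three threads attempt the purchase at once. The `buy` function and the `:buyer` key are additions made up for this example; the atom is redefined here so the snippet runs on its own.

```clojure
(def jumper (atom {:sold false}))

;; Pure update function: only the first buyer to get through marks it as sold.
(defn buy [item buyer]
  (if (:sold item)
    item                                   ; already sold, leave it untouched
    (assoc item :sold true :buyer buyer)))

;; Three buyers try to purchase at the same time, each on their own thread.
(let [attempts (mapv #(future (swap! jumper buy %)) ["alice" "bob" "carol"])]
  (run! deref attempts))                   ; wait for all attempts to finish

@jumper ;; {:sold true, :buyer ...} with exactly one of the three names
```

Which buyer wins depends on timing, but the jumper is never sold twice.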



What makes this process relatively straightforward is that swap! itself implements “compare-and-set” semantics →



1. Read the state of the atom at that point in time and apply the update function to it



2. Compare the current state of the atom with the value that was read, once the update function has run



3. Set the atom to the new data if the atom did not change during this time; otherwise try again with the latest value
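The three steps above can be sketched by hand using `compare-and-set!`, which is a real function in clojure.core. The name `my-swap!` is made up here, and this is a simplified illustration of the idea rather than the actual source of `swap!`.

```clojure
;; A hand-rolled version of the read / compare / set loop.
(defn my-swap! [a f & args]
  (loop []
    (let [old-val @a                          ; 1. read the current state
          new-val (apply f old-val args)]     ;    and compute the new value from it
      (if (compare-and-set! a old-val new-val) ; 2. compare, 3. set if unchanged
        new-val
        (recur)))))                           ; someone else got in first: try again

(def stock (atom 10))
(my-swap! stock - 3) ;; => 7
@stock               ;; => 7
```

Note that the update function is applied again on every retry, which is exactly why it needs to be free of side effects.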



In conclusion, atoms provide a nice way of dealing with independent pieces of state, synchronously.



Before we move on to agents, I’ll leave a few links here describing the other functions and characteristics of atoms:



→ Stack Overflow



→Clojure Reference on Atoms



Agents



This is another reference type available to us for mutating state in a concurrent setting.



But what separates it from an atom?



While agents and atoms both provide uncoordinated access to their respective state, the difference is that atoms are synchronous while agents are asynchronous.



Remember that:

Synchronous → Execution of a task is started and completed without switching to another task



Asynchronous → Execution of a task is started and then set aside for another task, not waiting for it to be completed.



When we send work to an agent, the change is made on a different thread from the one that requested it (meaning that the task is executed asynchronously). So we would use agents when we have a value that could be read or written by multiple parties that you do not have control over.



Moreover, agents have the unique ability to perform IO and other functions with side effects in a safe manner.



But how?



Agents have a similar characteristic to actors, in that they have a queue of ‘actions’ that get sent to them to be performed. The actions an agent receives are serialised, which here means they are run one at a time, in order, never concurrently with each other. So what we can do is set the state of an agent to be something like a database connection, and then we can have peace of mind that only one action is using that connection at any moment while it is evaluated.



Now how do we send actions to agents, you ask?



Say we have a given agent, a:



(def a (agent 5))

Now to send an action to the agent



(send a #(* % %)) ;; once the action has run: #object[clojure.lang.Agent 0xd2c768 {:status :ready, :val 25}]

We can see that the action changed the data in the agent to 25.
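Because `send` returns straight away, the new value may not be visible the instant the call returns. A small sketch of waiting for queued actions with `await`, and of actions from one thread being applied in the order they were sent (the `events` agent is a made-up example):

```clojure
(def events (agent []))

;; Queue five actions; each call to send returns immediately.
(doseq [n (range 5)]
  (send events conj n))

;; Block until every action sent so far from this thread has been applied.
(await events)

@events ;; => [0 1 2 3 4]
```

Without the `await`, dereferencing the agent right after sending could show any intermediate state.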



Another function we can use is send-off. We should use it for actions that may block, such as IO operations, while send is for actions that are CPU bound. For a nice explanation of this subtle difference, check out this extract on agents here:



Now let's move on to Refs!
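As a minimal sketch of `send-off`, the action below pretends to be a slow IO call by sleeping. `fetch-length` and `page-length` are hypothetical names for this example, and the sleep stands in for real blocking work such as a network request.

```clojure
(def page-length (agent 0))

;; Stand-in for a blocking operation: ignores the old state and
;; returns the length of the "downloaded" text after a delay.
(defn fetch-length [_old-state text]
  (Thread/sleep 100)     ; simulate waiting on IO
  (count text))

;; send-off is intended for actions that may block like this one.
(send-off page-length fetch-length "hello world")

(await page-length)      ; wait for the action to finish
@page-length ;; => 11
```

Using `send` here would also work for a one-off, but many blocking actions could then tie up the pool that is meant for CPU-bound work.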



Refs



The standard example for explaining refs is the bank account transfer.



If I’m sending £50 to a friend, then I must update both of our accounts: me being £50 poorer and the friend being £50 richer.



So refs deal with many identities; let’s see how.



Refs, or references, are bound to a single storage location for their lifetime. Unlike atoms, refs only allow mutation of their state to occur within a transaction (atoms are independent, so the timing of a change relative to other elements doesn’t really need to be considered).



So for refs to rely on transactions, we need the STM to do some of the work for us.



The dosync macro in the clojure.core namespace is going to help us write some concurrent Clojure.

If we do



(doc dosync)

We see a nice short explanation of what this macro offers us →



user=> (doc dosync)
-------------------------
clojure.core/dosync
([& exprs])
Macro
  Runs the exprs (in an implicit do) in a transaction that encompasses
  exprs and any nested calls.  Starts a transaction if none is already
  running on this thread. Any uncaught exception will abort the
  transaction and flow out of dosync. The exprs may be run more than
  once, but any effects on Refs will be atomic.

So dosync starts up a transaction for us, which means we can make changes to our refs through it.
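A quick sketch of what that means in practice: changing a ref outside of a transaction is rejected, while the same change inside `dosync` goes through. The `balance` ref and the `:no-transaction` keyword are made up for this example.

```clojure
(def balance (ref 100))

;; Outside a transaction, alter throws an IllegalStateException.
(def outside-result
  (try
    (alter balance + 10)
    (catch IllegalStateException _
      :no-transaction)))

;; Inside dosync the same change is allowed and returns the new value.
(def inside-result
  (dosync
    (alter balance + 10)))

outside-result ;; => :no-transaction
inside-result  ;; => 110
```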



So let’s start by defining a ref for each bank account



(def alex-account (ref {:name "Alex" :balance 500}))
(def jake-account (ref {:name "Jake" :balance 500}))

To give £50 to Jake →



;; call with the refs themselves, not their dereferenced values:
;; (transfer 50 alex-account jake-account)
(defn transfer [deposit sender receiver]
  (dosync
    ;; alter applies the function to the ref's current value (the account map)
    (alter sender update :balance - deposit)
    (alter receiver update :balance + deposit)))

Mutations like alter need to be done in a transaction, so that if the value of the state changes during the computation, the transaction can be retried cleanly.
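To see the coordination at work, here is a self-contained sketch that runs several transfers in both directions at once and then reads both balances in a single transaction. The names `alex`, `jake` and `transfer!` are chosen just for this example, and the numbers of transfers are arbitrary.

```clojure
(def alex (ref {:name "Alex" :balance 500}))
(def jake (ref {:name "Jake" :balance 500}))

;; Both balances change together or not at all.
(defn transfer! [amount from to]
  (dosync
    (alter from update :balance - amount)
    (alter to   update :balance + amount)))

;; Ten transfers of 50 from Alex to Jake and four of 25 back, all concurrent.
(let [transfers (doall (concat (repeatedly 10 #(future (transfer! 50 alex jake)))
                               (repeatedly 4  #(future (transfer! 25 jake alex)))))]
  (run! deref transfers))   ; wait for every transfer to commit

;; Reading both refs inside one transaction gives a consistent view.
(def balances (dosync [(:balance @alex) (:balance @jake)]))

balances ;; => [100 900]
```

However the threads interleave, the total across both accounts stays at 1000, because a conflicting transaction is retried rather than applied halfway.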



Check out other ways of mutating ref data here →



And with that, we have just hit the tip of the iceberg of this concurrent and crazy world! To further your reading and understanding of the topics covered here, as well as things like futures, delays, promises and the rest, I'll leave some recommendations below.



If I’ve piqued your interest, I definitely recommend these:



For the Brave and True:

O'Reilly Clojure Programming Book

