Consistency models

Why this post?

I have found somehow difficult to find an authoritative source which describes what is the contract that a certain consistency model defines with the programmer. Moreover, some resources seem to contradict each other.

The idea of this post is to document my findings around some of the most common consistency models that can be found in distributed systems which we interact with. When possible I'll add the resource where I found the definition.

Consistency model as abstraction

In distributed systems, a consistency model is a contract between the system and the developer who uses it. A system is said to support a certain consistency model if operations on memory respect the rules defined by the model.

They are a powerful abstraction which helps to describe a system in terms of its observable properties.

From a practical point of view, a consistency model answers to questions like: If Client 1 adds an object to a distributed data store at 1:00 pm and Client 2 tries to read the same object at 1:01 pm, will Client 2 be able to get the new object? Or will it get a 404 because the resource was not found?

The answer is that...it depends. There are many factors to consider to answer such a question. One of them is the consistency model adopted by the distributed data store.

Sequential consistency

The result of any execution is the same as if the operations of all processors were executed in some sequential order and the operations of each individual processor appear in this sequence in the order specified by its program.



Conceptually there is one global memory and a switch that connects an arbitrary processor to the memory at any time. Each processor issues memory operations in program order and the switch provides the global serialization among all the processors.

Therefore: Sequential consistency is not deterministic because multiple execution of the distributed program might lead to a different order of operations.



In the example above, four possible execution orders are possible no matter what is the actual order in which the operations were issued.

Important: no matter what the order of the operations are, it is guaranteed that is the same among all the processors. This also means that each process will observe the same history of operations. For example, in the first case b.1 will read the state which is the result of all the operations completed before b.1 : so b.1 is going to read a value for the account which is '150'.

Strict consistency

Also called Strong consistency or Linearizability.

Like sequential consistency, but the execution order of programs between processors must be the order in which those operations were issued.

As analogy, this is pretty much what we would get from a multi threaded program in which there is no cache between the cores and the main memory, and neither the compiler nor the CPU is allowed to reorder instructions.

Therefore: If each program in each processor is deterministic, then the distributed program is deterministic.

Causal consistency

Relax sequential consistency allowing also reordering of the operations in the same processors when they are not 'causally related'.

PRAM consistency

The write operations (and only the writes) of each processor appear in the order specified by the program.

Serializability

Unlike the models described above, serializability defines guarantees on group of operations (called transactions). It guaranties the execution of a set of transactions (usually read and writes operations) over multiple items is equivalent to some serial execution of the transactions.

Unlike linearizability, serializability does not impose a deterministic order, in this sense is similar to sequential consistency but for a group of operations.

Strict serializability, instead, guaranties the same constraint expressed by linearizability but for transactions: transactions must be executed in the order they were issued.

The definition looks a bit different from the ones above because it comes from database people, while sequential consistency, etc. come from distributed system people.

Read after write

The model guaranties immediate visibility of updates to an item to all the clients. The concept is similar to the one of Memory Barrier present in modern processors.

Note: Sometimes what is called Read after write in some systems, it's actually Read after create (for example in S3).

Read after create

The model guaranties immediate visibility of newly created data to all the clients. Updates to an existing object might not be immediately visible to all clients.

Conclusions

Those were my two cents about one of the many aspects around distributed systems. I'll keep adding more to the list as I go. Of course, any feedback would be greatly appreciated!

Additional resources