Distributed computing had been a research area for quite some time and now the industry is embracing it with both hands. New buzz words in the industry like NoSQL, NServiceBus, Hadoop, Bigdata etc. are all one or other form of distributed computing. There are multiple algorithms/principles which evolved before what distributed computing became what it is today. One of approach is eventual consistency across distributed systems.

So what is it?

Eventual consistency is a way of maintaining the consistency of system/application state across machines with acceptable delays. This means that data in the distributed systems will not be the same for some period of time before it achieves consistency. Eventual constancy is not related only to database systems but also applies to other systems. Below are few mentions where we can see eventual consistency happens

NoSQL databases like (Cassandra, Gemfire, RavenDB etc.)

Messaging based systems (including NServiceBus)

Domain Naming System (DNS)

SQL Server Replication (of course it’s been there for some time)

Amazon’s Dynamo

Why eventual consistency:

Any general purpose relational database like SQL Server, Oracle, MySQL etc. follow the ACID (Atomicity, Consistency, Isolation and Durability) properties. In order to follow these rules, they are heavy dependent on resource locking (rows, pages, table, and database). Let’s take an example,

UPDATE dbo.employee SET salary=salary + 500 WHERE employee_id=1508

Here we are updating the salary information of an employee. In order to execute this query, the database server needs to lock the row/page to ensure that no one else is viewing stale data (depending on Isolation level) or updating it while the query execution is in progress. The DB server needs to do this for each of the query executed against the database. If there are 2 million rows updated, the database server will have to create locks for each of the 2 million rows to ensure data consistency (of course lock escalation will take care of this). This is too much to handle for any database engine. The result of all this locking is poor performance.

Data analytics reveals that 80% of the applications does not require immediate consistency but requires very good performance. So in distributed systems, the data is scattered across multiple systems and there are replicas maintained to ensure good read performance. Since the data is distributed, traditional locking mechanisms cannot be used. Few good engineers decided to move the logic of locking out of the database systems and place them in the application so that they have more control on data consistency. The result of this approach is called eventual consistency.

In eventually consistent systems, the data/application state is not updated in all the replicas at the same time. But the primary data owner is updated and the replicas will get the update eventually. The UPDATE can be triggered either based on READ or WRITE or asynchronously. Eventually, consistent systems need to maintain a history of transactions so that they can make application state consistent or tolerate the inconsistency and take corrective actions.

Can we tolerate inconsistency?

This purely depends on the application. Let’s take few examples where NoSQL (or any eventually consistent implementation is done).

In a sales application, a user (a sales person) has updated a consumer’s date of birth/age, address, and phone number. Let’s assume the user information is updated in only one of the copies. What if another user (say billing person) reads the same consumer information from another data replica which is not in sync with the latest update? Of course, the data is inconsistent and the second user will read stale data (unless we have read triggered consistency). How important is this information? What action does the second user going to take based on this information? With a casual glance, we may say the date of birth or age can be inconsistent for some time. What about the address and phone number? Can we tolerate this inconstancy? Yes and No. Yes, when there is no action taken by the billing system. When the billing system is going to use the consumer address for shipment we will not be able to use stale data.

Let’s take another example. In ticket booking system, two users are trying to book a ticket to a cinema hall. Let’s assume both of them are trying to block the same seat (say A21) by connecting to two different data replicas. Can both the users be allowed to block the same seat? Yes, both the users can block the same seat until one of them makes the payment and confirms the ticket. When the second person makes the payment, we need to confirm if the same seat has been already used. If the seat is already booked, we need to intimate the user and refund the amount back to the user.

Critical section:

Every workflow has some portion of logic inside them where data in-consistency cannot be tolerated. We can call this portion of code as a critical section. In the billing system above, using the consumer address for shipment is a critical section for address data. In the ticketing system, the seat availability data enters the critical section when we receive the payment confirmation. The critical section of the workflow is a place where it should know if the data is consistent or not (dirty flag) so that the application can decide to live with the inconsistency or to enforce consistency.

Few points to think about, before we implement eventual consistency in our applications.

We need to identify the data specific critical sections in our application where data inconstancies cannot be tolerated.

If there are inconsistencies, what are the rollback or corrective workflows that will solve the problems of inconsistency?

What are the policies around the corrective workflow? They should be clearly documented and made available to the consumers.

End Note:

In SQL Server, changing the isolation levels can make significant changes to the way locks are handled. I remember SQL Server version 7.5 had only 4 isolation levels and now there are 7 isolation levels in SQL Server 2012. This is also a sign of how developers are increasingly taking control of parallelism in their applications rather than relying on the DBMS to handle it. With distributed databases, the control is more in the hands of developers. The world is moving towards faster and more distributed systems and eventual consistency is one of the things that are fueling this. I know this is a hot topic to discuss for everyone so don’t restrain yourself, please feel free to provide your comments/thoughts on the topic.

Read about Micro-Services at below:-