The system model

WriteRegion == 1

ReadRegions == 2..NumRegions

chan = [n \in 1..NumRegions |-> <<>>];

The write region actions

Commit the write in the write region

Multicast the write to the read regions

Receive/process an ack message from a read region
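The text below walks through these three actions in detail; as a rough PlusCal sketch of how they can be put together (the label and variable names here, such as SentLSN, are my assumptions; the linked model may differ):

```tla
\* Sketch of the write region process. The either/or inside the loop lets
\* the model checker try the three actions in every possible order.
\* CommittedLSN is local to this process (the translator renames it
\* CommittedLSN_ to distinguish it from the read regions' copies).
fair process ( Writer = WriteRegion )
variables CommittedLSN = 0, SentLSN = 0;
begin
W: while ( CommittedLSN < MAXN ) do
     either
       \* 1. commit the next write in the write region
       CommittedLSN := CommittedLSN + 1;
     or
       \* 2. forward the next committed-but-unsent write to all read regions
       await SentLSN < CommittedLSN;
       SentLSN := SentLSN + 1;
       chan := [n \in 1..NumRegions |->
                  IF n \in ReadRegions THEN Append(chan[n], SentLSN)
                  ELSE chan[n]];
     or
       \* 3. receive and discard an ack (assuming acks arrive on chan[1])
       await chan[WriteRegion] /= <<>>;
       chan[WriteRegion] := Tail(chan[WriteRegion]);
     end either;
   end while;
end process
```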


The read region actions
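The two read-region actions are described in the text below; a rough PlusCal sketch could look like this (AckLSN is a name I made up; the linked model may differ):

```tla
\* Sketch of a read region process: receive the next replicated write in
\* FIFO order from its channel, or ack the highest LSN seen so far
\* (acks here are assumed to go back on the write region's channel).
fair process ( Reader \in ReadRegions )
variables CommittedLSN = 0, AckLSN = 0;
begin
R: while ( TRUE ) do
     either
       \* 1. receive the next replicated write; no gap check is needed,
       \*    because the channel is FIFO and sends are gap-free
       await chan[self] /= <<>>;
       CommittedLSN := Head(chan[self]);
       chan[self] := Tail(chan[self]);
     or
       \* 2. send a cumulative ack for the highest LSN seen so far
       await AckLSN < CommittedLSN;
       AckLSN := CommittedLSN;
       chan[WriteRegion] := Append(chan[WriteRegion], AckLSN);
     end either;
   end while;
end process
```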

Invariant checking and testing

CP == [][\A i \in ReadRegions:
            CommittedLSN'[i] = CommittedLSN[i]
            \/ CommittedLSN'[i] = CommittedLSN[i] + 1]_vars

What are some other properties we can test here?

SyncStep == \A i \in ReadRegions :
               CommittedLSN[i] > CommittedLSN_[1] - 3

SyncStep2 == \A i,j \in ReadRegions :
               CommittedLSN[i] # CommittedLSN[j] - 3
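To have TLC check these definitions, one typical setup (my assumption of a configuration, not necessarily the author's) is worth noting: CP is an action property (it has the `[][...]_vars` form), so it goes under PROPERTY, while the fake invariants are plain state predicates and go under INVARIANT in the TLC configuration file:

```tla
SPECIFICATION Spec
CONSTANT NumRegions = 3
CONSTANT MAXN = 5
PROPERTY CP
INVARIANT SyncStep
INVARIANT SyncStep2
```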


Things I tried to speed up the TLA model checking

Perfection is achieved, not when there is nothing more to add, but when there is nothing left to take away. --Antoine de Saint-Exupery.


We covered consistency properties in recent posts. Now, to keep things more concrete and to give you a taste of TLA+, I present a model of a distributed data store that provides the consistent prefix property. In my next post, I will extend this model slightly to present the bounded staleness and strong consistency properties. And, in the post after that, I will add a client model to show how this data store can be extended to provide session consistency. The TLA+ (well, PlusCal, to be more accurate) code for this model is available here.

We assume there is a write region (with ID=1) and read regions (with IDs 2 through NumRegions). The write region performs the writes and copies them to the read regions. There are FIFO communication channels between the write region and the read regions.

The write region has three actions to perform: committing a write, multicasting it to the read regions, and receiving/processing ack messages from the read regions. These three actions can be performed in any arbitrary order inside the process loop, and the model checker will ruthlessly try all possible combinations of these actions, interleaved with the read region actions, to check if there is a violation of the invariant properties entered.

The first action is to commit a write in the write region. (We don't get into how that is implemented inside the region; maybe the commit is done by replicating the update to a write quorum within the region.) As a result, CommittedLSN is incremented. CommittedLSN keeps track of the write operations: we are not modeling the content of a write, we model each write only by its CommittedLSN and try to get the writes replicated to the read regions in the same order.

The second action is to multicast a committed write to the read regions. To send the writes in order through the FIFO channels, the write region maintains a SentLSN variable denoting the last committed write that has been forwarded to the read regions. If there are committed writes that are yet to be forwarded, they are multicast in increasing LSN order, without any gaps.
However, note that this is asynchronous replication. The messages may be sent whenever this action is chosen, and the send action is not blocked waiting for any acknowledgments from the read regions.

The final action is to receive an Ack message from a read region. This model does not use the Ack messages to update anything, but they can be used for tracking the progress of the read regions. This will become important for the bounded staleness and strong consistency properties.

The loop is bounded by a constant MAXN so that the model checking space is limited. MAXN=5 is an acceptable number. If you have worked with a model checker, you know that checking 5 repeats of the operation without the invariant breaking is a very good indication that it works. Yes, this is not an induction proof, but since the model checker tries everything that could possibly go wrong, it would have found a counterexample (e.g., a race condition that violates an invariant) even with 3 repeats or so.

The read regions only react to the replicated messages from the write region, so they have only two actions.

The first action is to receive a message pending in the channel from the write region. The CommittedLSN variable is then updated to match the LSN in the message. We don't check for gaps here, but since the channels are FIFO and the write region sends the messages in increasing LSN order without gaps, the consistent prefix property holds, as we later confirm when we model-check with the invariant.

The second action is to send back an Ack message for any replication message that has not been acknowledged yet. Since this action is also chosen asynchronously, the acknowledgment can be cumulative: it can skip over LSNs and acknowledge the highest LSN seen, and that's OK.

Our invariant is short and sweet. It checks that at any read region, CommittedLSN, if updated, is always incremented by 1 over its previous value.
That is, there are no gaps in the commit sequence.

An alternative would be to insert a PrefixConsistency boolean variable in the specification, which gets set to FALSE at a read region if the replica receives an out-of-order commit request. But that makes the model ugly; it is not nice to entangle the model with the property to be checked. Another alternative would be to write a client and check for operation consistency among the client reads. But that is cumbersome, because you need to introduce a client and a read operation at the read regions. Furthermore, it would also make the model checker take longer to complete.

Under the consistent prefix guarantee, the write region can commit asynchronously and freely, without waiting for the read regions to catch up. The read regions can catch up at their own pace. Let's write some conditions, "fake invariants", to test that this is the case.

The model checker is quick to find counterexamples for these conditions. For SyncStep, the error trace provides a short counterexample in which the write region raced ahead to commit 3 updates without broadcasting any updates to the read regions. (Since I used the CommittedLSN variable for both the read and write regions, the TLA+ translator assigned CommittedLSN_ "with underscore" to the write region's version to distinguish it from that of the read regions.) For SyncStep2, the error trace is longer because it needs some setup. Here the write region performed 3 writes and broadcast them to the read regions, but only read region 2 performed the updates and got to CommittedLSN=3, while read region 3 has not performed any receive action from its channel.

I believe it is important to share the mistakes committed along the way as well as the end result, so others can learn better. My first model was large; my improvements involved trimming and simplifying that model.

To keep model checking feasible I used MAXN. But I found that the model checker would still be crunching scenarios after ten minutes.
I knew that this is a simple protocol and it shouldn't take so long to model check. Then I noticed that the problem was bounding the run on the wrong thing: I was bounding the CompletedLSN variable with MAXN, but CommittedLSN was still able to race unbounded above MAXN. After I noticed my mistake, the model checker took only a couple of seconds for MAXN=5.

What is CompletedLSN? In an earlier version of the model, the write region used CompletedLSN to keep track of the progress of the read regions, but I realized this variable was unnecessary for checking the consistent prefix property, so I took it out entirely.

I also performed other simplifications to reduce the model checking state space. Instead of sending messages to the regions one by one, I modified the write region to queue the messages all at once via the multicast operation.

For eventual consistency, we don't need to maintain CommittedLSN and a sequencing of writes. The write region can maintain a set of writes/updates and multicast any write/update from that set at random. The read region just receives an update and performs it. There is also no need for an acknowledgment from the read region. The repeated multicasts will eventually establish that every write/update is replicated.

In my next post, I will extend this model to specify the bounded staleness and strong consistency properties. As part of this extension, the write region will need to keep track of the updates performed at the read regions and slow down to make sure it is not racing too far ahead of them. Since session consistency is specific to a client, we will write a client model and let the client share/copy LSN numbers and get served using that LSN number. The LSN will be our session state.

As I wrote above, I had several blunders before I could simplify this model to what I presented. And I learned a lot from developing this simple model. So this was definitely worth modeling and working on.
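As a footnote to the eventual-consistency variant described above, the write region's loop could reduce to a fragment like this (Updates is an assumed constant set of pending writes; the name is mine):

```tla
\* Eventual consistency sketch: no LSNs, no ordering, no acks.
\* The write region repeatedly multicasts an arbitrary element of
\* Updates; repetition eventually delivers every update, and the
\* read regions just receive and apply whatever arrives.
EC: while ( TRUE ) do
      with ( u \in Updates ) do
        chan := [n \in 1..NumRegions |->
                   IF n \in ReadRegions THEN Append(chan[n], u)
                   ELSE chan[n]];
      end with;
    end while;
```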