Network Semantics for Verifying Distributed Systems James Wilcox

This is the first post in a series on Verdi. In this post, we'll get our feet wet by defining a formal model of how distributed systems execute on the network.

Distributed systems are hard to implement correctly. Because of the interactions between many potentially failing components, it is essentially impossible to perform exhaustive testing on any real-world implementation. Most known techniques for checking concurrent and distributed algorithms work only on models of the system, which results in a formality gap between the model and its implementation. For example, classical model checking has been shown to be extremely effective in finding concurrency and fault-tolerance bugs at the design level, but it does not scale to the complex data structures used in real systems. Formal verification with human-guided, semi-automated, machine-checked proofs is the only technique that can provide the desired guarantees for such systems.

Verdi is a Coq-based framework for implementing and verifying distributed systems. At the core of Verdi are a set of network semantics. Each network semantics encodes a set of assumptions about what kinds of things can happen to a distributed system. Network semantics can range from the idealistic to the pessimistic. For example, one might assume that all messages are eventually delivered and that nodes never fail. On the other hand, one might assume that packets can be dropped and duplicated and that some nodes behave arbitrarily or maliciously. Different systems are designed under different sets of assumptions, and network semantics capture those assumptions.

The main point of defining a network semantics in Verdi is to verify distributed systems against that semantics. After verifying a safety property \(\Phi\), Verdi enables programmers to extract executable code and run their systems on real networks. Assuming the network semantics correctly describes all possible behaviors of the system's environment, Verdi guarantees that \(\Phi\) holds on all executions of the system.

The rest of this post describes Verdi's network semantics by developing a slightly simplified version of them in Coq. For the real deal, see the file Net.v on github.

Assumptions common to all network semantics

Verdi systems consist of a set of nodes operating in an asynchronous network environment. Each node keeps some local state and can exchange messages with other nodes. Messages can be arbitrarily delayed and reordered. Nodes also interact with the external environment through input and output. Verdi systems are written as a set of event handlers. For example, when a message is delivered to a node, that node's message delivery event handler is run. An event handler can read and write the local state of the node, send messages to other nodes, and send output to the external environment. A special handler is run at the beginning of time to initialize the each node's state.

Verdi's network semantics are defined by describing the possible "steps" that a system can take during execution. For example, one step might be that a message in the network can be delivered to its destination and the message delivery event handler run. The possible steps are defined formally as a relation on "worlds". A world is snapshot of a system's execution, and consists of the local state of every node of the system, the set of packets that are in flight, and the trace of external events.

Section Verdi . Variable node : Type . Variable state : Type . Variable msg : Type . Variable input : Type . Variable output : Type . Definition externalEvent : Type := node * (input + output). Record packet : Type := { dest : node; payload : msg }. Record world : Type := { localState : node -> state; inFlightMsgs : list packet; trace : list externalEvent }.

Here we declare five variables that describe the system and its interaction with the environment. First, node describes the set of nodes (or node names, if you prefer) in the system. For example, this might be some set consisting of high level names such as "Database" and "Client". Second, state is the type of the local state that each node maintains. For example, in an in-memory key-value store, this might consist of some sort of map data structure. Third, msg describes the messages sent between nodes. For example, in a key-value store this might consist of read and write requests intended to be sent from the client to the database, as well as responses and acknowledgments, intended to be sent in the other direction. Fourth, input describes the events that nodes respond to from the external environment. For example, input can be used to describe requests that come from clients whose code we did not write or do not want to verify. Finally, output describes the events that nodes send to the external environment. For example, output can be used to respond to requests that were made via input .

An externalEvent is either an input or an output , together with the node at which the event occurs. A packet contains a message and a destination. The world consists of a function from nodes to their local state and a list of in flight messages together with their destination. Although we use a list of packets here, we are careful to treat it as a multiset; that is, the order of its elements does not matter. Future versions of Verdi may use a more abstract multiset type here instead. Finally, the world also contains trace , which is a list of externalEvent s. The trace records the history of all external events that any node has ever sent or received; it allows us to reason about the external behavior of the system over time.

Every semantics also relies on the system providing a set of event handlers.

Variable initState : node -> state. Variable processMsg : node -> msg -> state -> (state * list packet * list output). Variable processInput : node -> input -> state -> (state * list packet * list output).

When a node starts, the initState computes its initial state. When a message is delivered, the processMsg handler is run. Its first argument is the node on which the handler is running (that is, the destination of the message). Its second argument is the message that was delivered. The third argument is the current value of the node's local state. The handler returns the new value of the local state, a list of packets to send, and a list of outputs to produce. The processInput handler is very similar, except it takes an input instead of a msg .

The reliable semantics

It isn't too hard to hook up all these parts into a semantics that delivers messages from the network and allows arbitrary input from the external environment. We call this semantics the reliable semantics because no messages are dropped or duplicated, and no nodes ever fail.

Inductive reliable_step : world -> world -> Prop := | step_input : forall w i n st' ms outs , processInput n i (localState w n) = (st', ms, outs) -> reliable_step w (Build_world (update (localState w) n st') (ms ++ inFlightMsgs w) (trace w ++ [(n, inl i)] ++ record_outputs n outs)) | step_msg : forall w p st' ms outs , In p (inFlightMsgs w) -> processMsg (dest p) (payload p) (localState w (dest p)) = (st', ms, outs) -> reliable_step w (Build_world (update (localState w) (dest p) st') (ms ++ remove_one p (inFlightMsgs w)) (trace w ++ record_outputs (dest p) outs)).

Definition reliable_step_star := clos_refl_trans_n1 _ reliable_step. Definition initWorld : world := Build_world initState [] []. Definition reachable (w : world) : Prop := reliable_step_star initWorld w. End Verdi .

The reliable semantics is implemented as the relation reliable_step , which is defined in terms of two rules. The first rule delivers an arbitrary input i to an arbitrary node n by running its processInput handler, passing in i and the current local state. The results of processInput are used to build a new world where the local state is updated, new messages are added to the in flight list, and the trace is updated to record the input and any outputs.

The second rule delivers a packet p from the network by running the processMsg at dest p with the p 's payload message and the current local state as arguments. The results are used to update the world, similar to before, except that one copy of p is also removed from the network.

Other network semantics

Small-step operational semantics, like we've just seen in reliable_step , are a powerful and flexible way of modeling the behavior of systems. As an exercise, the reader (or at least, the reader who hasn't read our paper...) should try to design rules for dropping and duplicating packets, being careful not to depend on the order of the list of packets. Verdi also includes semantics modeling node failure, but it requires changing the world type to keep track of the nodes that have failed. For more details, see the file Net.v in the Verdi implementation.

What's next

In our next post, we'll show how to actually build and verify a system in Verdi using the reliable network semantics. Future posts will also discuss Verdi's verified systems transformers, which provably hide faults, so that application programmers don't have to reason about them.

Last updated: February 2, 2016