Use Chaos to Test Distributed System Linearizability

Prelude

We usually use Jepsen to test distributed system linearizability, and we have already supported Jepsen for TiDB. But Jepsen has some problems for us:

Clojure: Jepsen is written in Clojure, a functional programming language running on the JVM. Although Clojure is powerful, I am not familiar with it, and the same is true for most of my colleagues. So it is hard for us to maintain the TiDB tests in Jepsen.

OOM: if we run Jepsen for a long time, it can easily run out of memory.

So I wanted to use Go to build another linearizability testing framework (nearly all of my colleagues are very familiar with Go). But it is hard to write a linearizability checker, so I didn't start this work. Fortunately, I met Porcupine, and then I wrote a simple linearizability testing framework named Chaos.

Hello Porcupine

Before I introduce Chaos, I will first show you how to use Porcupine. If you want to learn more about Porcupine, see the great article http://www.anishathalye.com/2017/06/04/testing-distributed-systems-for-linearizability/.

Porcupine is based on the paper Faster linearizability checking via P-compositionality. Sadly, I can't fully understand the algorithm; if you master it, please teach me. :-)

Using Porcupine is easy. Porcupine has two similar verification functions, CheckOperations and CheckEvents. I usually use events, so I will use CheckEvents in the explanation below.

Event

The definition of Event is as below:

```go
type Event struct {
	Kind  EventKind
	Value interface{}
	Id    uint
}
```

Kind is the kind of the event. It has two types: Call and Return. In every operation, we send a request first (the Call event), then receive the response (the Return event).

Value is an interface{} corresponding to the real request or response value of the operation.

Id is the unique identifier of the event. The request and response of one operation must use the same ID.

Assume we build a simple Key-Value store, only operating for one key. The definition of request and response is as follows:

```go
type Request struct {
	// 0 for read, 1 for put
	Op int
	// put value
	Value int
}

type Response struct {
	// for read value
	Value int
	// for put ok
	Ok bool
}
```

So an event history may look like:

```
Event {EventKind: CallEvent, Value: Request { Op: 0 }, Id: 1 }
Event {EventKind: CallEvent, Value: Request { Op: 1, Value: 10 }, Id: 2 }
Event {EventKind: ReturnEvent, Value: Response { Value: 5 }, Id: 1 }
Event {EventKind: ReturnEvent, Value: Response { Ok: true }, Id: 2 }
```

At last, we define a check model for our case and use CheckEvents to check whether the event history is linearizable.

Model

The definition of Model is as below:

```go
type Model struct {
	// Initial state of the system.
	Init func() interface{}

	// Step function for the system. Returns whether or not the system
	// could take this step with the given inputs and outputs and also
	// returns the new state. This should not mutate the existing state.
	Step func(state interface{}, input interface{}, output interface{}) (bool, interface{})
}
```

We ignore some irrelevant fields and focus on two functions: Init and Step.

Init returns the initial state. E.g., we can return 5 as the initial value for the key.

Step transfers the state machine from one state to the next. Returning true means the state transition is valid.

For our Key-Value store, Step may be as below:

```go
func Step(state interface{}, input interface{}, output interface{}) (bool, interface{}) {
	st := state.(int)
	inp := input.(Request)
	out := output.(Response)

	// for read
	if inp.Op == 0 {
		return st == out.Value, st
	}

	// for write
	if !out.Ok {
		return true, st
	}

	return out.Ok, inp.Value
}
```

For a read request, we return true only when we read the same value as the last state. For a write request, any successful write produces the new state.
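To make this concrete, here is a self-contained sketch of the Step logic with plain int state (no interface{} casts), plus a replay of the sample history; the types and the replay loop are illustrative assumptions, not Chaos or Porcupine APIs.

```go
package main

import "fmt"

// Illustrative copies of the article's request/response types.
type Request struct {
	Op    int // 0 for read, 1 for put
	Value int // put value
}

type Response struct {
	Value int  // for read value
	Ok    bool // for put ok
}

// Step reports whether the state machine can take this request/response
// pair from state, and returns the resulting state.
func Step(state int, input Request, output Response) (bool, int) {
	if input.Op == 0 { // read: must observe the current state
		return state == output.Value, state
	}
	if !output.Ok { // failed write: state unchanged
		return true, state
	}
	return true, input.Value // successful write: new state
}

func main() {
	// Replay one sequential order of the sample history: read 5, then put 10.
	state := 5
	ok1, state := Step(state, Request{Op: 0}, Response{Value: 5})
	ok2, state := Step(state, Request{Op: 1, Value: 10}, Response{Ok: true})
	fmt.Println(ok1, ok2, state) // true true 10
}
```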

So for the previous event history, it is obviously linearizable. But if we have following event history:

```
Event {EventKind: CallEvent, Value: Request { Op: 0 }, Id: 1 }
Event {EventKind: CallEvent, Value: Request { Op: 1, Value: 10 }, Id: 2 }
Event {EventKind: ReturnEvent, Value: Response { Value: 5 }, Id: 1 }
Event {EventKind: ReturnEvent, Value: Response { Ok: true }, Id: 2 }
Event {EventKind: CallEvent, Value: Request { Op: 0 }, Id: 3 }
Event {EventKind: ReturnEvent, Value: Response { Value: 5 }, Id: 3 }
```

This history is not linearizable: the read operation with ID 3 still returns 5, even though the write operation with ID 2 had already finished writing the new value 10 before that read started.
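To see why, we can brute-force the check ourselves: try every sequential order that respects real-time precedence and replay it through the model's step function. This is only an illustrative sketch (Porcupine's actual algorithm is far more efficient); the op type and its timestamps are assumptions:

```go
package main

import "fmt"

// One completed operation: its model semantics plus call/return timestamps.
type op struct {
	id        uint
	readOp    bool
	value     int // put value, or the value a read returned
	ok        bool
	call, ret int // logical timestamps of the call and return events
}

// step is the KV model from the text, specialized to int state.
func step(state int, o op) (bool, int) {
	if o.readOp {
		return state == o.value, state
	}
	if !o.ok {
		return true, state
	}
	return true, o.value
}

// linearizable tries every order consistent with real-time precedence:
// an op may go next only if no other remaining op returned before it was invoked.
func linearizable(init int, ops []op) bool {
	var try func(state int, remaining []op) bool
	try = func(state int, remaining []op) bool {
		if len(remaining) == 0 {
			return true
		}
		for i, o := range remaining {
			minimal := true
			for j, p := range remaining {
				if j != i && p.ret < o.call {
					minimal = false // p finished before o started, so p must go first
					break
				}
			}
			if !minimal {
				continue
			}
			if ok, next := step(state, o); ok {
				rest := append(append([]op{}, remaining[:i]...), remaining[i+1:]...)
				if try(next, rest) {
					return true
				}
			}
		}
		return false
	}
	return try(init, ops)
}

func main() {
	// The failing history above: read 5, put 10, then another read of 5.
	history := []op{
		{id: 1, readOp: true, value: 5, call: 0, ret: 2},
		{id: 2, readOp: false, value: 10, ok: true, call: 1, ret: 3},
		{id: 3, readOp: true, value: 5, call: 4, ret: 5},
	}
	fmt.Println(linearizable(5, history)) // false
}
```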

Timeout

All the examples above are easy, but in a real distributed environment an operation can have three statuses: success, failure, or timeout.

E.g., if a write operation times out, we can't know whether it succeeded or not. For such an operation, we have to let Porcupine assume it may take effect at any point between its invocation time and infinity. When we meet a timeout return event, we should drop it from its original position and append a matching return event at the end of the history; only in this way can Porcupine handle it correctly. For more information, see issue 1.

In the above example, we can add an Unknown field to the response. For an unknown response, Step should return true. So for write, it becomes return out.Ok || out.Unknown, inp.Value.

We will add the unknown response to the end of the event history, like:

```
Event {EventKind: CallEvent, Value: Request { Op: 0 }, Id: 1 }
Event {EventKind: CallEvent, Value: Request { Op: 1, Value: 10 }, Id: 2 }
Event {EventKind: ReturnEvent, Value: Response { Value: 5 }, Id: 1 }
Event {EventKind: ReturnEvent, Value: Response { Unknown: true }, Id: 2 }
```

If we don't do this, Porcupine will consider the history not linearizable.
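This history completion can be sketched as follows, with assumed event types (not Porcupine's actual API): any call that never got a return, such as a timed-out write, gets a return event appended at the very end of the history.

```go
package main

import "fmt"

type EventKind int

const (
	CallEvent EventKind = iota
	ReturnEvent
)

type Event struct {
	Kind  EventKind
	Value interface{}
	Id    uint
}

// completeHistory appends a return event at the end of the history for every
// call that has no matching return, so the checker may linearize the pending
// operation at any point after its invocation.
func completeHistory(events []Event) []Event {
	returned := make(map[uint]bool)
	for _, e := range events {
		if e.Kind == ReturnEvent {
			returned[e.Id] = true
		}
	}
	out := append([]Event{}, events...)
	for _, e := range events {
		if e.Kind == CallEvent && !returned[e.Id] {
			// An "unknown" outcome: the operation stays pending until the end.
			out = append(out, Event{Kind: ReturnEvent, Value: nil, Id: e.Id})
		}
	}
	return out
}

func main() {
	h := []Event{
		{Kind: CallEvent, Id: 1},
		{Kind: CallEvent, Id: 2},
		{Kind: ReturnEvent, Id: 1},
		// operation 2 timed out: no return was recorded
	}
	h = completeHistory(h)
	fmt.Println(len(h), h[len(h)-1].Id) // 4 events; the appended return has Id 2
}
```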

Chaos Architecture

I have talked about how to use Porcupine above; now I will introduce the linearizability testing framework itself: Chaos.

Similar to Jepsen, Chaos runs the DB service on 5 nodes. We name the nodes n1 to n5, and we can connect to a node by its name directly. E.g., we can use ssh n1 to log in to node n1.

Chaos has a control node named controller to manage the whole cluster: initializing the testing DB, creating the associated clients to run the real test, starting nemeses to disturb the cluster, and verifying whether the event history is linearizable.

The architecture is as below:

Unlike Jepsen, which uses SSH to execute remote commands on the nodes, Chaos runs an agent on every node and lets the controller communicate with the agents through an HTTP API to execute remote commands. The reason is that I want to implement the DB and nemesis with Go functions, not through Linux commands like Jepsen does.

But we have to start the agent on each node first, which is a little inconvenient. Luckily, we can use scripts to make it easy.

Because Go is a statically typed, compiled language, we need to implement the interfaces and register them to Chaos if we want to test our own DB's linearizability. I will use TiDB as an example.

DB

If we want to test our own DB, we should first implement the DB interface:

```go
type DB interface {
	SetUp(ctx context.Context, nodes []string, node string) error
	TearDown(ctx context.Context, nodes []string, node string) error
	Start(ctx context.Context, node string) error
	Stop(ctx context.Context, node string) error
	Kill(ctx context.Context, node string) error
	IsRunning(ctx context.Context, node string) bool
	Name() string
}
```

We use SetUp to set up the DB cluster and TearDown to tear it down. The names of the other functions are self-explanatory, so I won't explain them. Name returns the name of the DB; the name must be unique because we use it to register the DB to Chaos. E.g., we use "tidb" here.

The argument nodes contains all the nodes, usually ["n1", ... "n5"]. node is the name of the node the DB runs on.

In TiDB's SetUp, we download the TiDB binary package, extract it to the specified directory, update the configuration, and then start the TiDB cluster. In TearDown, we send a kill signal to stop the cluster.

After we implement our own DB, we use RegisterDB to register it, and then the agent can get the DB instance by name.

Client

Client is used to communicate with the DB on the controller. The definition of client interface is as below:

```go
type Client interface {
	SetUp(ctx context.Context, nodes []string, node string) error
	TearDown(ctx context.Context, nodes []string, node string) error
	Invoke(ctx context.Context, node string, request interface{}) interface{}
	NextRequest() interface{}
}
```

The client uses Invoke to send commands to the DB. The request argument and the response are both interface{} types because the framework doesn't know the real types here.

NextRequest returns the next request to be invoked; only the client knows how to construct its requests.

In the TiDB bank test case, we define a bank client, and in NextRequest we choose either reading all accounts' balances or transferring some money from one account to another.

If NextRequest returns a read request, the response contains the read balances. If it returns a transfer request, the response is OK or not. We also need to consider the unknown status in the response.

The controller creates an independent client for each node, so Chaos defines a client creator interface to create multiple clients:

```go
type ClientCreator interface {
	Create(node string) Client
}
```

Linearizability check

For every client invocation, the controller records the request and the response to an event history file. The request and response events belong to one operation.

For simplicity, we encode the requests and responses in JSON and write them to the history file line by line. After the test finishes, we need to implement a record parser to parse the history file and verify its linearizability.

```go
type RecordParser interface {
	OnRequest(data json.RawMessage) (interface{}, error)
	OnResponse(data json.RawMessage) (interface{}, error)
	OnNoopResponse() interface{}
}
```

When the parser reads one record, it first detects whether the record is a request or a response, then calls the corresponding function to decode it.

Please notice OnNoopResponse here. If we meet an unknown response, we need to return nil in the OnResponse function to let Chaos ignore this event, and use OnNoopResponse to generate a response that completes the previous operation.

We also need to define our own Porcupine Model and use VerifyHistory(historyFile string, m porcupine.Model, p RecordParser) to verify the history.

In the TiDB bank test case, the most important Step function of the Porcupine Model is defined as follows:

```go
Step: func(state interface{}, input interface{}, output interface{}) (bool, interface{}) {
	st := state.([]int64)
	inp := input.(bankRequest)
	out := output.(bankResponse)

	if inp.Op == 0 {
		// read
		ok := out.Unknown || reflect.DeepEqual(st, out.Balances)
		return ok, state
	}

	// for transfer
	if !out.Ok && !out.Unknown {
		return true, state
	}

	newSt := append([]int64{}, st...)
	newSt[inp.From] -= inp.Amount
	newSt[inp.To] += inp.Amount
	return out.Ok || out.Unknown, newSt
}
```

For a read operation, we just check whether the balances are the same as the last state's. For a transfer operation, we apply the transfer to the last state and return the new state.

Nemesis

When the controller executes the tests, it can also run some nemeses to disturb the cluster, e.g., killing all the DB processes or dropping network packets from one node.

The definition of the nemesis interface is as below:

```go
type Nemesis interface {
	Invoke(ctx context.Context, node string, args ...string) error
	Recover(ctx context.Context, node string, args ...string) error
	Name() string
}
```

The name of the nemesis must be unique because we use it to register the nemesis to Chaos.

We use Invoke to disturb the cluster and Recover to recover it. We also need to use RegisterNemesis to register our own nemesis, so that the agent can use it by name.

On the controller side, we need to implement a NemesisGenerator:

```go
type NemesisGenerator interface {
	Generate(nodes []string) []*NemesisOperation
	Name() string
}
```

The Generate function generates a nemesis operation for each node. A nemesis operation contains the name and arguments of the nemesis, plus the running time. The controller sends the nemesis operations to the agents and lets each agent execute the associated nemesis.

Agent and Controller

After we define our own DB, client, and nemesis, we need to combine them. We register the DB and nemesis in the agent. E.g., for TiDB, the registration code in the cmd/agent/main.go file is:

```go
import (
	// register nemesis
	_ "github.com/siddontang/chaos/pkg/nemesis"
	// register tidb
	_ "github.com/siddontang/chaos/tidb"
)
```

Then we start the agent and use NewController to create a controller.

The controller needs a ClientCreator and a list of nemesis generators. The configuration contains the DB name, the total number of requests, the total running time, etc.

Then we can start the controller, run the test and verify the history file.

Epilogue

Chaos is in a very early stage now and needs to be improved gradually. For now I only supply a TiDB bank test case, and I will introduce more test cases for TiDB. Help with adding more test cases is very much appreciated.

chaos: https://github.com/siddontang/chaos