An odd number of nodes is recommended for a high-availability etcd cluster. But what happens as the nodes start to fail, one after another? It is well known that once quorum is lost, that is, fewer than (n/2+1) nodes remain, the cluster can no longer accept any write request, per the Raft consensus algorithm. Can the cluster still serve read requests?
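The quorum arithmetic is simple enough to sketch in shell (my own illustration; n=5 matches the cluster built below):

```shell
# Quorum for an n-member cluster is floor(n/2) + 1.
# With n=5, quorum is 3, so the cluster tolerates 2 failed members.
n=5
quorum=$(( n / 2 + 1 ))
tolerated=$(( n - quorum ))
echo "quorum=${quorum} tolerated=${tolerated}"
```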

Testing is the best way to find out. With tools like Multipass, setting up an etcd cluster on a laptop is no longer a luxury.

Set up a five-node etcd cluster

I am not going to test this on K3s with the etcd operator, as I need to start and stop the etcd nodes manually for testing purposes. Let's create five VMs.

multipass launch --name etcd0 --cpus 1 --mem 1G --disk 5G

multipass launch --name etcd1 --cpus 1 --mem 1G --disk 5G

multipass launch --name etcd2 --cpus 1 --mem 1G --disk 5G

multipass launch --name etcd3 --cpus 1 --mem 1G --disk 5G

multipass launch --name etcd4 --cpus 1 --mem 1G --disk 5G

Then, on each VM, download and install etcd:

curl -LO https://github.com/etcd-io/etcd/releases/download/v3.3.13/etcd-v3.3.13-linux-amd64.tar.gz

tar zxvf etcd-v3.3.13-linux-amd64.tar.gz

cd etcd-v3.3.13-linux-amd64

sudo cp etcd etcdctl /usr/local/bin/

Create a systemd service file as below.

Replace the IP address and name accordingly for each node. (I have automated this as a Magefile task.)

The {{ .members }} placeholder is the list of etcd cluster members, such as etcd0=http://192.168.64.8:2380,etcd1=http://192.168.64.9:2380,etcd2=http://192.168.64.10:2380,etcd3=http://192.168.64.11:2380,etcd4=http://192.168.64.12:2380
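For reference, a minimal unit file for etcd0 might look like the sketch below. The node name and URLs follow the member list above and must be adjusted on each node; the data directory, cluster token, and restart policy are my assumptions, not taken from the original setup.

```ini
# /etc/systemd/system/etcd.service -- a sketch for etcd0; adjust per node
[Unit]
Description=etcd key-value store
After=network-online.target

[Service]
ExecStart=/usr/local/bin/etcd \
  --name etcd0 \
  --data-dir /var/lib/etcd \
  --initial-advertise-peer-urls http://192.168.64.8:2380 \
  --listen-peer-urls http://192.168.64.8:2380 \
  --listen-client-urls http://192.168.64.8:2379,http://127.0.0.1:2379 \
  --advertise-client-urls http://192.168.64.8:2379 \
  --initial-cluster etcd0=http://192.168.64.8:2380,etcd1=http://192.168.64.9:2380,etcd2=http://192.168.64.10:2380,etcd3=http://192.168.64.11:2380,etcd4=http://192.168.64.12:2380 \
  --initial-cluster-state new \
  --initial-cluster-token etcd-cluster-1
Restart=on-failure

[Install]
WantedBy=multi-user.target
```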

Start the service

sudo systemctl daemon-reload

sudo systemctl enable etcd

sudo systemctl start etcd

Check the etcd status

multipass shell etcd0

multipass@etcd0:~$ export ETCDCTL_API=3
multipass@etcd0:~$ export ETCDCTL_ENDPOINTS=http://192.168.64.8:2379,http://192.168.64.9:2379,http://192.168.64.10:2379,http://192.168.64.11:2379,http://192.168.64.12:2379
multipass@etcd0:~$ etcdctl endpoint status -w table

The result shows that the five-member cluster is healthy, with the first node as the leader.

Testing node failure

Before we start testing, let's add a key to etcd:

etcdctl put /clock "$(date)"

etcdctl get /clock

/clock

Sun May 12 21:38:58 +08 2019

Stop etcd on the last node, etcd4:

multipass exec etcd4 -- sudo systemctl stop etcd

Check the status; the leader is still there.

Continue to stop the etcd service on nodes etcd3 and etcd2, the same way as for etcd4.

When only two of the five nodes are left, the quorum (three) is lost and no leader is present. The write then fails as expected:

multipass@etcd0:~$ etcdctl put /clock "$(date)"
Error: context deadline exceeded

multipass@etcd0:~$ etcdctl get /clock
Error: context deadline exceeded

The read could not proceed either.

Linearizable Read of etcd Raft

Based on the Raft documentation, all reads are linearizable: the implementation interacts with the leader to make sure the data retrieved is the most recent.

Therefore, when quorum is lost, neither read nor write activity in etcd can be performed properly.
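One caveat worth noting, based on etcd's documentation rather than the test above: etcdctl's get defaults to linearizable consistency (--consistency=l), but it also accepts --consistency=s for serializable reads, which are answered from a single member's local store without a quorum round. Such a read may still succeed against a surviving member while quorum is lost, at the risk of returning stale data. A sketch, run against the surviving endpoints:

```shell
# Serializable read: answered locally by the contacted member, no quorum needed.
# The data may be stale relative to the last committed write.
etcdctl get /clock --consistency=s
```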