etcd 3.2 now with massive watch scaling and easy locks

By Anthony Romano

The etcd team is pleased to announce etcd 3.2.0, the latest feature release in the 3.x series. This edition has proxy improvements, boosted backend concurrency, distributed coordination services, a slimmer Go client, JWT authentication, and more.

This post showcases a few new etcd capabilities, calling out relevant bits and pieces along the way. First, there’s multi-tenancy: the new namespacing client and proxy provide isolated complete keyspaces. Next, for scaling, the gRPC proxy can now serve a million events per second. Finally, to better coordinate all kinds of systems, this release introduces new concurrency RPC services with built-in distributed locks and elections that are easily accessible through any gRPC client.

Namespaces

Applications sharing an etcd cluster avoid corrupting each other by keeping to their own keys. Although etcd’s authentication mechanism protects separate users’ keys with permissions, it does nothing to stop potential collisions between keys sharing the same name. Instead, applications often take a prefix argument which is then prepended to all the application’s keys’ names, effectively namespacing the keys. Since correctly implementing this prefixing can be tedious and error-prone, etcd now includes both client namespacing and proxy namespacing as off-the-shelf reusable components.

Namespaces organize etcd keys into separate complete keyspaces. A namespace is a view of all keys under a prefix. A namespaced key name is prefixed when viewed through the base etcd keyspace, but unprefixed when viewed through the namespace. Likewise, any etcd client connecting to a namespaced etcd proxy can only access keys under the proxy’s namespace prefix; such proxies can optionally masquerade as separate clusters to avoid exposing core endpoints. For example, in the etcd topology illustrated below, a client accesses etcd through a proxy “/app-A/”, and the proxy translates a request for a key “abc” into “/app-A/abc”. Similarly, “app-B” only accesses its “abc” through “/app-B/abc”; the namespaces “/app-A/” and “/app-B/” are isolated.

An example etcd topology with two namespace proxies

Namespace configuration is a breeze. The example’s topology can be tested with the following commands:

# start etcd
$ etcd &
# launch namespace proxies for /app-A/ and /app-B/
$ etcd grpc-proxy start --endpoints=http://localhost:2379 \
    --listen-addr=127.0.0.1:23790 \
    --namespace=/app-A/ &
$ etcd grpc-proxy start --endpoints=http://localhost:2379 \
    --listen-addr=127.0.0.1:23791 \
    --namespace=/app-B/ &
# write to /app-A/ and /app-B/
$ ETCDCTL_API=3 etcdctl --endpoints=http://localhost:23790 put abc a
$ ETCDCTL_API=3 etcdctl --endpoints=http://localhost:23791 put abc z
# confirm the different keys were written
$ ETCDCTL_API=3 etcdctl --endpoints=http://localhost:2379 get --prefix /app
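The prefix translation a namespace performs can be sketched as two small shell functions. This is only an illustration of the mapping described above, not how the proxy or the namespacing client is implemented; `ns_to_base` and `ns_from_base` are hypothetical helper names, not part of etcd or etcdctl.

```shell
# Hypothetical helpers illustrating namespace key translation.
# They only manipulate strings and never talk to etcd.

# Map a key as seen inside a namespace to its name in the base keyspace.
# $1: namespace prefix, $2: key as the namespaced client sees it
ns_to_base() {
  printf '%s%s\n' "$1" "$2"
}

# Map a base-keyspace key back into the namespace view by stripping the prefix.
# Returns non-zero for keys outside the namespace, which a namespaced
# client cannot see.
# $1: namespace prefix, $2: key as stored in the base keyspace
ns_from_base() {
  case "$2" in
    "$1"*) printf '%s\n' "${2#"$1"}" ;;
    *) return 1 ;;
  esac
}

ns_to_base /app-A/ abc            # prints /app-A/abc
ns_to_base /app-B/ abc            # prints /app-B/abc
ns_from_base /app-A/ /app-A/abc   # prints abc
```

The same client-side name, “abc”, lands on two different base keys, which is why the two applications cannot collide.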

A million events per second

Any change to a key causes etcd to immediately broadcast the event to all interested watchers. For a large etcd deployment, tens of thousands of concurrent watchers should be no surprise. Unfortunately, these watchers aren’t free.

Too many watchers can overload etcd. The graph below illustrates watch overloading with a tiny etcd instance serving a rising number of watchers on a single shared key that is continuously updated. The etcd server is a weak n1-standard-1 (1vCPU + 3.75GB memory) machine, while the clients used a more powerful n1-standard-32 (32vCPU + 120GB memory) machine to issue 500 writes/second and saturate the server with watches. Adding more watchers eventually causes the write rate to drop and the event rate to fall short of the ideal.

Overloading an etcd server with many watchers on one key

etcd’s gRPC proxy rebroadcasts events from one server watcher to many client watchers. Each proxy coalesces related incoming client watchers into a single etcd server watcher. The proxy fans out the coalesced watcher events to many clients. These clients share one server watcher; the proxy effectively offloads resource pressure from the core cluster.
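To make the offloading concrete, here is a minimal counting sketch. It assumes, as a simplification, that client watches coalesce exactly when they name the same key; the proxy's actual coalescing rules are not modeled, and `coalesced_watchers` is a hypothetical helper rather than an etcd command.

```shell
# Hypothetical helper: count the server watchers a coalescing proxy would
# hold for a set of client watches.
# Simplifying assumption: watches coalesce only when they name the same key.
# Each argument is the key that one client is watching.
coalesced_watchers() {
  if [ "$#" -eq 0 ]; then
    echo 0
    return
  fi
  # One server watcher per distinct key, however many clients watch it.
  printf '%s\n' "$@" | sort -u | wc -l | tr -d ' '
}

# Five client watchers spread over two keys need two server watchers.
coalesced_watchers foo foo foo bar bar   # prints 2
```

Under this model the load on the core cluster grows with the number of distinct keys watched through a proxy, not with the number of clients.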

By adding proxies, etcd can serve one million events per second, as shown below. Using the same testbed as before, each proxy added to the cluster brings 100 new watches that attach to that proxy. Each proxy ran on its own n1-standard-1, like the etcd instance. With 20 proxies, watch throughput scales linearly to a million events per second, as expected, without dragging the etcd cluster below 500 writes per second.

Event throughput increases by increasing watch proxies
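The million-event figure follows from multiplying the numbers in the experiment: every watcher sees every write to the shared key, so the ideal event rate is proxies × watches per proxy × writes per second. A quick check in the shell, where `events_per_second` is just a hypothetical helper for the arithmetic:

```shell
# Ideal watch event rate when every watcher observes every write.
# $1: number of proxies, $2: watches per proxy, $3: writes per second
events_per_second() {
  echo $(( $1 * $2 * $3 ))
}

events_per_second 20 100 500   # prints 1000000
```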

You can run similar experiments locally with the etcd benchmark tool. Here’s an example of measuring watch latency for 100 connections distributed over three proxies:

$ etcd &
$ etcd grpc-proxy start --endpoints=http://localhost:2379 --listen-addr=127.0.0.1:23790 &
$ etcd grpc-proxy start --endpoints=http://localhost:2379 --listen-addr=127.0.0.1:23791 &
$ etcd grpc-proxy start --endpoints=http://localhost:2379 --listen-addr=127.0.0.1:23792 &
$ benchmark watch-latency \
    --clients=100 --conns=100 --endpoints=http://localhost:23790,http://localhost:23791,http://localhost:23792

Distributed coordination services

One advantage of a consistent distributed key value store like etcd is that it can coordinate and synchronize distributed systems. Typical primitives for this coordination tend to be distributed shared locks and leadership elections. In etcd 3.2, both distributed shared locks and elections are exported as RPC services, greatly simplifying distributed coordination while also improving performance in high latency environments.

Early development of the etcd v3 API involved writing distributed recipes to “kick the tires.” Judging by that experience, efficient coordination algorithms are non-trivial; expecting every etcd language binding to implement a good locking protocol is asking too much. Usually a third-party etcd binding, if it has locks at all, supplies a simple custom lock that spin-waits (with sleep calls to rate limit requests), making the contended path both unfair and slower than necessary. Since etcd’s gRPC protocol promises easy portability, providing efficient locks as a service is the next logical step.

A nice benefit of server-side locks is less network chatter. If another process already holds the lock, a client-side lock must open a watch and wait for a response to know when it acquires ownership. With a lock RPC, on the other hand, the request completes with a single round trip and therefore suffers less from poor network latency; if the lock is already held, etcd internally dispatches the watch, and the RPC only returns after acquiring the lock. The graph below shows the effect of network latency on client locks and RPC locks. The RPC locks have lower average latency, with the difference closing as network round-trip time drops.

Lock latency with increasing contention and latencies (lower is better)

The RPCs reuse etcd’s client-side coordination code for cross-compatibility. To wire the etcd server back to itself as a client, the services use a new embedded client that maps the etcd server’s internals to an etcd client. For the server to claim client sessions, there’s a new session resuming feature for building sessions with pre-existing leases. As a result of this careful code reuse, both server-side RPC locks and elections behave the same as etcd’s client-side locks and client-side elections.

The etcdctl command line utility offers a quick way to get started with etcd’s locks. Here’s a simple example that concurrently increments a file f in the shell; the locking ensures everything adds up:

$ echo 0 >f
$ for i in `seq 1 100`; do
    ETCDCTL_API=3 etcdctl lock mylock -- bash -c 'expr 1 + $(cat f) > f' &
    pids="$pids $!"
  done
$ wait $pids
$ cat f

The lock service endpoint accepts JSON requests through etcd’s gRPC gateway. Here’s an example that acquires a lock with JSON and then waits for the lock to be released:

$ lid=$(ETCDCTL_API=3 etcdctl -w fields lease grant 5 | grep '"ID"' | awk ' { print $3 } ')
# acquire lock by name "mylock" (base64 encoded) using JSON
$ curl localhost:2379/v3alpha/lock/lock -XPOST -d"{\"name\":\"bXlsb2Nr\", \"lease\" : $lid }" >/dev/null
$ date
# lease expires in five seconds, unlocking "mylock" and letting etcdctl acquire the lock
$ ETCDCTL_API=3 etcdctl lock mylock date
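The name field in the JSON request is the lock name in base64, so “bXlsb2Nr” is simply “mylock” encoded. For other lock names, the request body can be assembled with a small helper like the one below. `lock_request_body` is a hypothetical function, not an etcd tool; it assumes a `base64` command is available and that the lease ID is passed as a plain number, as in the curl example.

```shell
# Hypothetical helper: build the JSON body for a lock request.
# $1: lock name in plain text, $2: numeric lease ID
lock_request_body() {
  # base64 may wrap long output across lines, so strip any newlines.
  printf '{"name":"%s", "lease": %s}\n' \
    "$(printf '%s' "$1" | base64 | tr -d '\n')" "$2"
}

lock_request_body mylock 1234   # prints {"name":"bXlsb2Nr", "lease": 1234}
```

The output can be passed to curl with -d in place of the hand-escaped string.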

Learn more

The latest and greatest etcd developments can be found in the etcd GitHub repository. The project also hosts signed binaries for 3.2.0 and historical releases on the etcd release page. The GitHub repository also has the most up-to-date etcd documentation for operating etcd clusters and developing etcd applications.

As always, the etcd team is committed to building the best distributed consistent key value store; feel free to report any bugs, ask questions, or make suggestions on the etcd issue tracker.

This release of etcd will of course be included in future versions of Tectonic. If you are interested in learning about the power of distributed computing based on etcd, Kubernetes, and other technologies from CoreOS, you can try Tectonic for free.