zetcd: running ZooKeeper apps without ZooKeeper

By Anthony Romano

Distributed systems commonly rely on distributed consensus to coordinate work. The systems providing distributed consensus typically guarantee that information is delivered in order and never suffer split-brain conflicts. Both the usefulness and the rich design space of such systems are evident from the proliferation of implementations; projects such as Chubby, ZooKeeper, etcd, and Consul, despite differing in philosophy and protocol, all focus on serving similar basic key-value primitives for distributed consensus. As part of making etcd the most appealing foundation for distributed systems, the etcd team developed a new proxy, zetcd, to serve ZooKeeper requests with an unmodified etcd cluster.

ZooKeeper was the first popular open source software in this vein, making it the preferred backend for many distributed systems. These systems would conceptually work with etcd as well, but for historical reasons they don’t in practice. An etcd cluster can’t serve as a drop-in replacement for ZooKeeper; etcd’s data model and client protocol are incompatible with ZooKeeper applications. Nor can ZooKeeper applications be expected to natively support etcd; if the system already works, there’s little motivation to complicate it further with new backends. Fortunately, the etcd v3 API is expressive enough to emulate ZooKeeper’s data model client-side with an ordinary proxy: zetcd, a new open source project developed by the etcd team. Today marks zetcd’s first beta release, v0.0.1, setting the stage for managing and deploying zetcd in production systems.

The zetcd proxy sits in front of an etcd cluster and serves an emulated ZooKeeper client port, letting unmodified ZooKeeper applications run on top of etcd. At a high level, zetcd ingests ZooKeeper client requests, fits them to etcd’s data model and API, issues the requests to etcd, then returns translated responses back to the client. The proxy’s performance is competitive with ZooKeeper proper, and the proxy simplifies ZooKeeper cluster management with etcd features and tooling. This post shows how to use zetcd, explains how zetcd works, and shares some performance benchmarks.

Getting started with zetcd

All zetcd needs to get running is a Go compiler, an internet connection to fetch the source code, and a system that can run etcd. The following example builds zetcd from source and runs a few ZooKeeper commands against it. Since it builds etcd and zetcd from development branches, this approach is not recommended for serious deployments, but it’s the simplest way to give zetcd a try.

First, get the source and build the binaries for etcd and zetcd:

go get github.com/coreos/etcd/cmd/etcd
go get github.com/coreos/zetcd/cmd/zetcd

Next, run etcd and connect zetcd to the etcd client endpoint:

# etcd uses localhost:2379 by default
etcd &
zetcd -zkaddr localhost:2181 -endpoints localhost:2379 &

Try zetcd by watching and creating a key:

go install github.com/coreos/zetcd/cmd/zkctl
zkctl watch / &
zkctl create /abc "foo"

Conceptually, the example is organized as a zetcd layer on top of a single etcd instance:

A simple zetcd server topology

So what is the zetcd layer doing?

ZooKeeper into etcd3

Under the hood, zetcd translates ZooKeeper’s data model to fit etcd APIs. For key lookup, zetcd converts ZooKeeper’s hierarchical directories to etcd’s flat binary keyspace. For managing metadata, zetcd uses software transactional memory to safely and atomically update ZooKeeper znode information when writing to the etcd backend.

ZooKeeper lists keys by directory (getChildren), whereas etcd lists keys by interval (Range). The figure below illustrates how zetcd encodes keys in etcd to support efficient directory listing. All zetcd keys in etcd have a prefix that includes the directory depth (e.g., “/” and “/abc/” have depths of 0 and 1, respectively). To list a directory, zetcd issues a prefix range request matching all keys one level deeper than the directory and under its path; for example, listing /abc/ uses the range [“/zk/key/002/abc/”, “/zk/key/002/abc0”). The depth restricts the results to the directory’s immediate children: if zetcd used the path alone as the prefix, etcd would return every key under the directory, and the proxy would have to drop all but the immediate children.

Organization of a ZooKeeper key hierarchy in etcd

Each ZooKeeper key carries metadata in its ZNode about the key’s revision, version, and permissions. Although etcd also has per-key metadata, that metadata is simpler than a ZNode: there’s no children versioning since there are no directories, no ACLs since etcd uses role-based authentication, and no timestamps since real clocks are out of scope. This extra metadata maps to a bundle of keys (see figure above) that describes a full ZNode. To adjust the metadata, zetcd updates subsets of the keys atomically with software transactional memory, keeping the ZNodes consistent without expensive locking.
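The Go sketch below models the optimistic idea behind software transactional memory: a transaction records the revision of every key it reads, buffers its writes, and commits only if none of those keys changed in the meantime, retrying otherwise. It is a toy in-memory store, not etcd’s concurrency package, and the metadata key names (/zk/cver/<path>, /zk/data/<path>, /zk/ver/<path>) are hypothetical stand-ins for the ZNode key bundle in the figure.

```go
package main

import (
	"fmt"
	"strconv"
	"sync"
)

// store is a toy versioned key-value store standing in for etcd: each
// committed transaction stamps the keys it writes with a new revision.
type store struct {
	mu   sync.Mutex
	rev  int64
	data map[string]string
	mod  map[string]int64 // key -> revision of its last write (0 if never written)
}

// stm records the revision of every key it reads and buffers every write.
type stm struct {
	s      *store
	reads  map[string]int64
	writes map[string]string
}

func (t *stm) Get(k string) string {
	if v, ok := t.writes[k]; ok {
		return v // read your own buffered write
	}
	t.s.mu.Lock()
	defer t.s.mu.Unlock()
	if _, seen := t.reads[k]; !seen {
		t.reads[k] = t.s.mod[k]
	}
	return t.s.data[k]
}

func (t *stm) Put(k, v string) { t.writes[k] = v }

// commit applies the buffered writes only if no key in the read set has been
// modified since it was read; otherwise it applies nothing.
func (t *stm) commit() bool {
	t.s.mu.Lock()
	defer t.s.mu.Unlock()
	for k, rev := range t.reads {
		if t.s.mod[k] != rev {
			return false
		}
	}
	t.s.rev++
	for k, v := range t.writes {
		t.s.data[k] = v
		t.s.mod[k] = t.s.rev
	}
	return true
}

// runSTM retries fn until its transaction commits, returning the attempt count.
func runSTM(s *store, fn func(*stm)) int {
	for attempt := 1; ; attempt++ {
		t := &stm{s: s, reads: map[string]int64{}, writes: map[string]string{}}
		fn(t)
		if t.commit() {
			return attempt
		}
	}
}

// createZNode writes a child's data and version keys and bumps the parent's
// child version as one atomic unit. The key names are hypothetical.
func createZNode(t *stm, parent, child, data string) {
	cver, _ := strconv.Atoi(t.Get("/zk/cver" + parent)) // missing key reads as 0
	t.Put("/zk/cver"+parent, strconv.Itoa(cver+1))
	t.Put("/zk/data"+child, data)
	t.Put("/zk/ver"+child, "0")
}

// demo creates /abc while a simulated concurrent client creates /xyz between
// the first attempt's read and its commit, forcing one retry.
func demo() (*store, int) {
	s := &store{data: map[string]string{}, mod: map[string]int64{}}
	interfered := false
	attempts := runSTM(s, func(t *stm) {
		createZNode(t, "/", "/abc", "foo")
		if !interfered {
			interfered = true
			runSTM(s, func(u *stm) { createZNode(u, "/", "/xyz", "bar") })
		}
	})
	return s, attempts
}

func main() {
	s, attempts := demo()
	fmt.Println("attempts:", attempts, "child version of /:", s.data["/zk/cver/"])
}
```

In the demo, the simulated concurrent create bumps the root’s child version between the first attempt’s read and its commit, so that commit is rejected and the retry builds on the new value; neither update is lost, and no lock is held while the transaction body runs.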

Additionally, zetcd can dynamically validate its behavior against an authentic ZooKeeper server. For this comparison, zetcd connects to both etcd and an external ZooKeeper server. When a client issues a request to zetcd in this mode, the request is dispatched both to zetcd’s etcd-backed emulation and to the ZooKeeper server. If the two responses semantically disagree, zetcd flags them with a cross-checking warning.
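Cross-checking can be pictured as a tee in front of two backends. The Go sketch below is illustrative only: the Request/Response types, the Backend interface, and the choice to answer the client with the zetcd response are assumptions, and “semantic” equality is approximated by comparing error codes and payloads while ignoring a field expected to differ between independent servers (here the transaction ID, Zxid).

```go
package main

import "fmt"

// Request and Response are simplified, hypothetical stand-ins for ZooKeeper
// protocol messages.
type Request struct {
	Op, Path string
}

type Response struct {
	Err  string // "" for success, e.g. "ZNONODE" for a missing node
	Data string
	Zxid int64 // server-assigned transaction ID; differs between servers
}

// Backend serves ZooKeeper requests.
type Backend interface {
	Serve(Request) Response
}

// BackendFunc adapts an ordinary function to the Backend interface.
type BackendFunc func(Request) Response

func (f BackendFunc) Serve(r Request) Response { return f(r) }

// semanticEqual compares what a client can rely on and ignores Zxid.
func semanticEqual(a, b Response) bool {
	return a.Err == b.Err && a.Data == b.Data
}

// crossChecker sends every request to both backends, answers the client with
// the candidate's response, and records a warning on disagreement.
type crossChecker struct {
	candidate, oracle Backend
	warnings          []string
}

func (c *crossChecker) Serve(r Request) Response {
	got, want := c.candidate.Serve(r), c.oracle.Serve(r)
	if !semanticEqual(got, want) {
		c.warnings = append(c.warnings, fmt.Sprintf(
			"cross-check mismatch on %s %s: zetcd=%+v zookeeper=%+v", r.Op, r.Path, got, want))
	}
	return got
}

func main() {
	zetcd := BackendFunc(func(r Request) Response {
		if r.Path == "/abc" {
			return Response{Data: "foo", Zxid: 7}
		}
		return Response{} // simulated bug: reports success for a missing node
	})
	zookeeper := BackendFunc(func(r Request) Response {
		if r.Path == "/abc" {
			return Response{Data: "foo", Zxid: 42}
		}
		return Response{Err: "ZNONODE"}
	})
	c := &crossChecker{candidate: zetcd, oracle: zookeeper}
	c.Serve(Request{Op: "get", Path: "/abc"})     // agrees despite different Zxid
	c.Serve(Request{Op: "get", Path: "/missing"}) // disagrees: flagged
	for _, w := range c.warnings {
		fmt.Println(w)
	}
}
```

Because the checker itself satisfies Backend, it can wrap any backend transparently; the interesting design work in practice lies in deciding which response fields count as semantically meaningful.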

Microbenchmarks

With all the data translation and the additional network hop, it may be tempting to dismiss the emulation as impractical. Although there is some additional cost over a pure ZooKeeper or etcd cluster, zetcd holds an advantage when an etcd installation is available but an application expects ZooKeeper for coordination. For example, early user reports claim that encrypting traffic in zetcd through etcd’s TLS is simpler than encrypting a comparable classic ZooKeeper configuration. In these cases, performance matters less than simply having a reliable cluster that speaks the ZooKeeper protocol.

Benchmarking with zetcd’s command-line zkboom utility can help judge whether a zetcd installation’s performance is adequate. Its interface and reports are similar to those of etcd’s benchmark tool. Other ZooKeeper benchmarking tools should work with zetcd as well; zkboom is provided for convenience. To try it out, run zkboom to test key creation:

go get github.com/coreos/zetcd/cmd/zkboom
zkboom --conns=50 --total=10000 --endpoints=localhost:2181 create

zetcd should provide adequate performance for small workloads. Latency microbenchmarks over a simple two-node configuration indicate zetcd’s emulation is acceptable for modest request rates. The setup included two modern Linux machines connected through a gigabit switch, with one machine running the proxy and server software over a spinning-disk RAID and the other machine generating client requests. Latencies were measured with zkboom by creating and reading 128-byte key-value pairs from an initially empty key store, rate limiting to 2500 requests per second, and increasing the number of concurrent clients. ZooKeeper 3.4.10 and etcd development branch results are included as a basis for comparison.

The graph below shows zetcd’s average key creation latency over client concurrency. Since etcd’s creation latency is 5ms to 35ms lower than ZooKeeper’s in this benchmark, zetcd has some headroom to accommodate the proxy hop and processing. The zetcd proxy still underperforms ZooKeeper by a margin of about 20ms, but judging from throughput data, it is not queuing, since it sustains the target rate of 2500 requests per second. One explanation for zetcd’s slower writes is that, due to data model differences, it must both read keys from etcd and write several keys into etcd for each new ZooKeeper key.

zetcd’s average key creation latency over client concurrency (lower is better)

The graph below shows zetcd’s average key fetch latency over client concurrency. ZooKeeper’s fetch latency is about 2ms lower than etcd’s, so zetcd would need further etcd improvements before it could serve data faster than ZooKeeper. However, zetcd adds only about 1.5ms of latency over etcd key fetches, despite requesting extra keys from etcd to emulate ZooKeeper znode metadata. A zetcd key fetch costs only a single round trip, since the read requests are bundled into one etcd transaction.

zetcd’s average key fetch latency over client concurrency (lower is better)

Toward v0.0.1 and beyond

So far zetcd has shown promising results. The performance is reasonable, easily sustaining more than a thousand operations per second with acceptable latency. Its emulation is close enough to ZooKeeper to be a drop-in replacement for Mesos, Kafka, and Drill. There’s still room to tune zetcd for performance gains. Likewise, testing more ZooKeeper applications will further establish zetcd as a replacement for ZooKeeper servers.

zetcd has been available to the open source community since October and has just pushed its first tagged release, zetcd v0.0.1. With this first beta release, zetcd is ready for stable management and deployment in future production systems. When paired with the etcd operator, systems running zetcd will effectively have a self-driving “ZooKeeper” cluster with automated backend upgrades, backups, and TLS management. To learn more, ask questions, or request improvements, visit the zetcd GitHub repository at https://github.com/coreos/zetcd/.

Join us in person at CoreOS Fest, the Kubernetes and distributed systems conference, on May 31 and June 1 in San Francisco, for two days of talks from the community on the latest developments in the open source container ecosystem.