Highly available reads

Multiple racks and multiple data centers provide high availability. A client can connect to any node to read the data. Similar to writes, a node serves the read request if it owns the data, otherwise it forwards the read request to the data owning node in the same rack. Dynomite clients can fail over to replicas in remote racks and/or data centers in case of node, rack, or data center failures.
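The routing and failover behavior above can be sketched as follows. This is an illustrative model, not Dynomite's actual implementation: the function names, topology, and node names are hypothetical, and only the decision logic (prefer the local rack's owner, fall back to replicas in other racks) mirrors the text.

```python
def route_read(key, rack_preference, owner_of, alive):
    """Return the node to serve a read for `key`.

    rack_preference: racks to try, local rack first
    owner_of:        maps (rack, key) -> owning node
    alive:           set of currently healthy nodes
    """
    for rack in rack_preference:
        node = owner_of[(rack, key)]
        if node in alive:
            return node
    raise RuntimeError("no healthy replica for key %r" % key)

# Hypothetical 2-rack topology where each rack holds a full copy of the data.
owner_of = {
    ("rack-a", "user:42"): "a1",
    ("rack-b", "user:42"): "b1",
}

# Healthy cluster: the local rack's owner serves the read.
print(route_read("user:42", ["rack-a", "rack-b"], owner_of, {"a1", "b1"}))  # a1
# Local owner down: the client fails over to the remote rack's replica.
print(route_read("user:42", ["rack-a", "rack-b"], owner_of, {"b1"}))        # b1
```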

Pluggable Datastores

Dynomite currently supports Redis and Memcached, thanks to the Twitter OSS Twemproxy project. For each of these datastores, based on our usage experience, a pragmatic subset of the most useful Redis/Memcached APIs is supported. Support for additional APIs will be added as needed in the near future.

Standard open source Memcached/Redis ASCII protocol support

Any client that can talk to Memcached or Redis can talk to Dynomite — no change needed. However, a few things will be missing, including failover strategy, request throttling, connection pooling, etc., unless our Dyno client is used (more details in the Client Architecture section).
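Because Dynomite speaks the standard Redis wire protocol (RESP), an off-the-shelf client works simply by pointing it at a Dynomite node's port. As a rough sketch of what "protocol compatibility" means here, this is how a RESP client encodes a `SET` command on the wire (the encoder below is a minimal hand-rolled illustration, not any particular client's code):

```python
def encode_resp(*args):
    """Encode a command as a RESP array of bulk strings,
    the format Redis (and thus Dynomite) reads off the socket."""
    out = ["*%d\r\n" % len(args)]          # array header: element count
    for a in args:
        out.append("$%d\r\n%s\r\n" % (len(a), a))  # bulk string: length + bytes
    return "".join(out)

print(repr(encode_resp("SET", "key", "value")))
# '*3\r\n$3\r\nSET\r\n$3\r\nkey\r\n$5\r\nvalue\r\n'
```

Any client that produces this byte stream is indifferent to whether a Redis server or a Dynomite node is on the other end of the connection.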

Scalable I/O event notification server

All incoming/outgoing data traffic is processed by a single threaded I/O event loop. There are additional threads for background or administrative tasks. All thread communications are based on lock-free circular queue message passing, and asynchronous message processing.

This style of implementation enables each Dynomite node to handle a very large number of client connections while still processing many non-client facing tasks in parallel.
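The threading model described above can be illustrated with a toy sketch: one "event loop" thread handles the client-facing work, while a background thread receives tasks over a queue and replies asynchronously, with no shared mutable state. Dynomite's actual implementation uses lock-free circular queues in C; Python's `queue.Queue` (which does lock internally) stands in here purely to show the message-passing shape.

```python
import queue
import threading

tasks, replies = queue.Queue(), queue.Queue()

def background_worker():
    # Handles background/administrative work off the main loop,
    # communicating only through message passing.
    while True:
        msg = tasks.get()
        if msg is None:                  # shutdown sentinel
            return
        replies.put(("done", msg))

worker = threading.Thread(target=background_worker)
worker.start()

# The "event loop" thread enqueues work without ever blocking on the worker...
for i in range(3):
    tasks.put(i)
tasks.put(None)
worker.join()

# ...and drains replies as they arrive.
results = []
while not replies.empty():
    results.append(replies.get())
print(results)   # [('done', 0), ('done', 1), ('done', 2)]
```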

Peer-to-peer, and linearly scalable

Every Dynomite node in a cluster has the same role and responsibility. Hence, there is no single point of failure in a cluster. With this advantage, one can simply add more nodes to a Dynomite cluster to meet traffic demands or loads.
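Why does adding nodes scale linearly? Because data placement is token-based: each node owns a token on a hash ring, and adding nodes splits existing token ranges instead of reshuffling everything. The sketch below is a simplified model (one token per node, CRC32 hashing, "smallest token >= hash" ownership) chosen for illustration; Dynomite's actual token assignment details may differ.

```python
import bisect
import zlib

RING = 2 ** 32

class TokenRing:
    """A key belongs to the node with the smallest token >= the key's hash,
    wrapping around the ring."""

    def __init__(self, node_tokens):               # {node: token}
        self.ring = sorted((t, n) for n, t in node_tokens.items())

    def owner(self, key):
        h = zlib.crc32(key.encode()) % RING
        i = bisect.bisect_left([t for t, _ in self.ring], h) % len(self.ring)
        return self.ring[i][1]

three = TokenRing({"n1": 0, "n2": RING // 3, "n3": 2 * RING // 3})
keys = ["user:%d" % i for i in range(100)]

# Doubling capacity by interleaving new tokens: every key either stays put
# or moves to a *new* node -- no data shuffles between the old nodes.
six = TokenRing({"n1": 0, "n2": RING // 3, "n3": 2 * RING // 3,
                 "n4": RING // 6, "n5": RING // 2, "n6": 5 * RING // 6})
moved = [k for k in keys if six.owner(k) != three.owner(k)]
print(all(six.owner(k) in {"n4", "n5", "n6"} for k in moved))  # True
```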

Cold cache warm-up

Currently, this feature is available for Dynomite with the Redis datastore. Dynomite reduces the performance impact of a cold start by filling an empty node (or nodes) with data from its peers.

Asymmetric multi-datacenter replications

As seen earlier, a write can be replicated to multiple datacenters. In different datacenters, Dynomite can be configured with a different number of racks, each with a different number of nodes. This helps greatly when traffic is unbalanced across datacenters.
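The replication fan-out under such an asymmetric topology can be sketched as follows: a write goes to one owner per rack in every datacenter, so a heavily loaded region can run more racks (replicas) than a lightly loaded one. The topology below is hypothetical.

```python
# Hypothetical asymmetric topology: 3 racks in one region, 1 in another.
topology = {
    "us-east-1": ["rack-a", "rack-b", "rack-c"],  # heavier traffic: 3 replicas
    "eu-west-1": ["rack-a"],                      # lighter traffic: 1 replica
}

def replica_racks(topology):
    """Every (datacenter, rack) pair receives a copy of each write."""
    return [(dc, rack) for dc, racks in topology.items() for rack in racks]

rr = replica_racks(topology)
print(len(rr))   # 4 copies in total: 3 in us-east-1, 1 in eu-west-1
```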

Internode communication and Gossip

Dynomite's built-in gossip maintains cluster membership and handles failure detection and recovery. This simplifies maintenance operations on Dynomite clusters.

Functional in AWS and physical datacenter

In an AWS environment, a datacenter is equivalent to an AWS region and a rack corresponds to an AWS availability zone. At Netflix, we have additional tooling to support running Dynomite clusters within AWS, but in general, deployments in these two environments should be similar.

Client Architecture

Dynomite server implements the underlying datastore protocol and presents it as its public interface. Hence, one can use popular Java clients like Jedis, Redisson, and SpyMemcached to talk directly to Dynomite.

At Netflix, we see the benefit in encapsulating client side complexity and best practices in one place instead of having every application repeat the same engineering effort, e.g., topology-aware routing, effective failover, load shedding with exponential backoff, etc.

Dynomite ships with a Netflix homegrown client called Dyno. Dyno implements patterns inspired by Astyanax (the Cassandra client at Netflix), on top of popular clients like Jedis, Redisson and SpyMemcached, to ease the migration to Dyno and Dynomite.

Dyno Client Features

Connection pooling of persistent connections — this helps reduce connection churn on the Dynomite server with client connection reuse.

Topology-aware load balancing (token-aware), which avoids intermediate hops through a Dynomite coordinator node that does not own the requested data.

Application-specific request routing to Dynomite nodes based on local-rack affinity.

Application resilience by intelligently failing over to remote racks when local Dynomite rack nodes fail.

Application resilience against network glitches by constantly monitoring connection health and recycling unhealthy connections.

Capability of surgically routing traffic away from any nodes that need to be taken offline for maintenance.

Flexible retry policies, such as exponential backoff.

Insight into connection pool metrics

Highly configurable and pluggable connection pool components for implementing advanced features.
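As an illustration of the flexible retry policies listed above, here is a minimal sketch of exponential backoff with optional jitter. The function name and signature are hypothetical, not Dyno's actual API; only the schedule shape (delay doubles per attempt, jitter to avoid thundering herds) reflects the technique.

```python
import random

def backoff_delays(base_ms, max_attempts, jitter=False):
    """Delays (in ms) before each retry under exponential backoff."""
    delays = [base_ms * (2 ** i) for i in range(max_attempts)]
    if jitter:
        # Full jitter: spread each delay uniformly over [0, delay) so that
        # clients retrying after a shared outage don't all fire at once.
        delays = [random.uniform(0, d) for d in delays]
    return delays

print(backoff_delays(10, 4))   # [10, 20, 40, 80]
```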

Here is an example of how Dyno does failover to improve app resilience against individual node problems.

Fun facts

Dyno client strives to maintain compatibility with client interfaces like Jedis, which greatly reduces the barrier for apps that are already using Jedis when performing a switch to Dynomite.

Also, since Dynomite implements both the Redis and Memcached protocols, one can use Dyno to connect directly to Redis/Memcached itself and bypass Dynomite (if needed). Just switch the connection port from the Dynomite server port to the Redis server port.

Having a layer of indirection with our own homegrown client gives Netflix the flexibility to do other cool things such as

Request interception — you should be able to plug in your own interceptor to do things such as implementing query tracing or slow-query logging, or injecting faults to test application resilience when things go south server side.

Micro batching — submitting a batch of requests to a distributed database gets tricky, since different keys map to different servers as per the sharding/hashing strategy. Dyno has the capability to take a user-submitted batch, split it into shard-aware micro-batches under the covers, execute them individually, and then stitch the results back together before returning to the user. Obviously one has to deal with partial failure here, and Dyno has the intelligence to retry just the failed micro-batch against the remote rack replica responsible for that hash partition.

Load shedding — Dyno's interceptor model for every request gives it the ability to do quota management and rate limiting in order to protect the backend Dynomite servers.
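The micro-batching idea described above can be sketched as follows. This is an illustrative model, not Dyno's real code: the sharding function, node names, and `run`/`fallback` callbacks are all hypothetical, and only the flow (group by shard, execute per shard, retry only the failed micro-batch against a replica) mirrors the text.

```python
def micro_batches(keys, owner):
    """Group a user batch into per-shard micro-batches."""
    groups = {}
    for k in keys:
        groups.setdefault(owner(k), []).append(k)
    return groups

def execute_batch(keys, owner, run, fallback):
    """Run each micro-batch; on partial failure, retry only the
    failed shard's keys against a replica, not the whole batch."""
    results, failed = {}, []
    for node, group in micro_batches(keys, owner).items():
        try:
            results.update(run(node, group))
        except ConnectionError:
            failed.append((node, group))
    for node, group in failed:
        results.update(run(fallback(node), group))
    return results

# Hypothetical 2-shard topology; `run` simulates node-0 being down.
owner = lambda k: "node-%d" % (len(k) % 2)
def run(node, group):
    if node == "node-0":
        raise ConnectionError(node)
    return {k: "ok@" + node for k in group}

out = execute_batch(["a", "bb", "cc", "ddd"], owner, run,
                    lambda n: "replica-of-" + n)
print(sorted(out))   # ['a', 'bb', 'cc', 'ddd']
```

Every key gets a result even though one shard was down: the keys owned by the failed node were re-executed, as a group, against its stand-in replica.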

Linear scale test

We wanted to ensure that Dynomite could scale horizontally to meet traffic demands from hundreds of micro-services at Netflix as the company expands its global footprint.

We conducted a simple test with a static Dynomite cluster of size 6 and a load test harness that uses the Dyno client. The cluster was configured with a replication factor of 3, i.e., it was a single datacenter with 3 racks.

We ramped up requests against the cluster while ensuring that 99th percentile latencies remained in the single-digit millisecond range.

We then scaled up both the server fleet and the client fleet proportionally and repeated the test. We went through a few cycles of scaling, i.e., 6 -> 12 -> 24, and at each stage we recorded the sustained throughput where the average and 99th percentile latencies were within acceptable range, i.e., < 1 ms for average latency and 3–6 ms for 99th percentile latency.

We saw that Dynomite scales linearly as we add more nodes to the cluster. This is critical for a datastore at Netflix where we want surgical control on throughput and latency with a predictable cost model. Dynomite enables just that.

Long Term Vision & Roadmap

Dynomite has the potential to offer server-based sharding and replication for any datastore, as long as a proxy is created to intercept the desired API calls.

This initial version of Dynomite supports Redis and Memcached sharding and replication in clear text, plus backup and restore. In the next few weeks, we will be implementing encrypted inter-datacenter communication. We also plan to implement reconciliation (repair) of the cluster's data and to support different read/write consistency settings, making Dynomite an eventually consistent datastore.

On the Dyno client side we plan on adding other cool features such as load shedding, distributed pipelining and micro-batching. We are also looking at integrating with RxJava to provide a reactive API to Redis/Memcached which will enable apps to observe sequences of data and events.

— Minh Do, Puneet Oberai, Monal Daxini & Christos Kalantzis

See Also: