In these series of articles, I would be implementing a few famous papers related to distributed systems, primarily in Golang. (Read previous post on Consistent Hashing, using a Red-Black Tree)

This article is a simple implementation of the CHORD DHT paper, in Golang.

Introduction to DHT

Distributed Hash Table

A distributed hash table (DHT) is a class of a decentralized distributed system that provides a lookup service similar to a hash table: (key, value) pairs are stored in a DHT, and any participating node can efficiently retrieve the value associated with a given key. Responsibility for maintaining the mapping from keys to values is distributed among the nodes, in such a way that a change in the set of participants causes a minimal amount of disruption. This allows a DHT to scale to extremely large numbers of nodes and to handle continual node arrivals, departures, and failures.

DHTs form an infrastructure that can be used to build more complex services, such as anycast, cooperative Web caching, distributed file systems, domain name services, instant messaging, multicast, and also peer-to-peer file sharing and content distribution systems. Notable distributed networks that use DHTs include BitTorrent’s distributed tracker, the Coral Content Distribution Network, the Kad network, the Storm botnet, the Tox instant messenger, Freenet, the YaCy search engine, and the InterPlanetary File System.

Properties of a DHT

DHTs characteristically emphasize the following properties:

Autonomy and decentralization: the nodes collectively form the system without any central coordination.

Fault tolerance: the system should be reliable (in some sense) even with nodes continuously joining, leaving, and failing.

Scalability: the system should function efficiently even with thousands or millions of nodes.

Examples of DHT protocol and Implementation: Apache Cassandra,BATON Overlay, CAN (Content Addressable Network), Pastry, Riak, Voldemort

What is CHORD?

In computing, Chord is a protocol and algorithm for a peer-to-peer distributed hash table. A distributed hash table stores key-value pairs by assigning keys to different computers (known as “nodes”); a node will store the values for all the keys for which it is responsible. Chord specifies how keys are assigned to nodes, and how a node can discover the value for a given key by first locating the node responsible for that key.

Chord is one of the four original distributed hash table protocols, along with CAN, Tapestry, and Pastry. It was introduced in 2001 by Ion Stoica, Robert Morris, David Karger, Frans Kaashoek, and Hari Balakrishnan, and was developed at MIT. For more information, once can refer to the original paper.

Chord is based on consistent hashing, something which I’ve implemented in a previous article.

Implementation Details

The Chord protocol supports just one operation: given a key, it will determine the node responsible for storing the key’s value. Chord does not itself store keys and values, but provides primitives that allow higher-layer software to build a wide variety of storage system; CFS is one such use of the Chord primitive.

I’ll try to explain the basic implementation using the wiki article.

1. Basic Query

The core usage of the Chord protocol is to query a key from a client (generally a node as well), i.e. to find successor(k). The basic approach is to pass the query to a node’s successor, if it cannot find the key locally. This will lead to a O(N) query time where N is the number of machines in the ring. To avoid the linear search above, Chord implements a faster search method by requiring each node to keep a finger table containing up to m entries, recall that m is the number of bits in the hash key.

The i{th} entry of node n will contain successor((n+2^{i-1}) mod 2^m). The first entry of finger table is actually the node’s immediate successor (and therefore an extra successor field is not needed). Every time a node wants to look up a key k, it will pass the query to the closest successor or predecessor (depending on the finger table) of k in its finger table (the “largest” one on the circle whose ID is smaller than k), until a node finds out the key is stored in its immediate successor. With such a finger table, the number of nodes that must be contacted to find a successor in an N-node network is O(log N)

2. Node Join

Whenever a new node joins, three invariants should be maintained (the first two ensure correctness and the last one keeps querying fast):

Each node’s successor points to its immediate successor correctly. Each key is stored in successor(k) Each node’s finger table should be correct.

To satisfy these invariants, a predecessor field is maintained for each node.

As the successor is the first entry of the finger table, we do not need to maintain this field separately any more. The following tasks should be done for a newly joined node n:

Initialize node n (the predecessor and the finger table). Notify other nodes to update their predecessors and finger tables. The new node takes over its responsible keys from its successor.

The predecessor of n can be easily obtained from the predecessor of successor(n) (in the previous circle). As for its finger table, there are various initialization methods. The simplest one is to execute find successor queries for all m entries, resulting in O(M\log N) initialization time. A better method is to check whether i{th} entry in the finger table is still correct for the (i+1){th} entry. This will lead to O(log² N).

3. Stabilization

To ensure correct lookups, all successor pointers must be up to date. Therefore, a stabilization protocol is running periodically in the background which updates finger tables and successor pointers.

The stabilization protocol works as follows:

Stabilize(): n asks its successor for its predecessor p and decides whether p should be n‘s successor instead (this is the case if p recently joined the system).

Notify(): notifies n‘s successor of its existence, so it can change its predecessor to n

Fix_fingers(): updates finger tables/*

check_predecessor(): Periodically checks in predecessor is alive

Much of the details can be found in the paper. Here is a simple gif to demonstrate the working of the chord DHT.

As you can see, when one node is down, the keys from that node get transferred to the successor node.

Here is the code:

Feel free to comment and let me know if there were any mistakes/corrections.

References: