Distributed Hash Tables — Demystified

A Mini Guide to Distributed Hash Tables (DHT)

Steffi Keran
4 min readMar 27, 2021
DHT
Image courtesy: Wikipedia

What is a Distributed Hash Table?

The core of a DHT is a hash table. Key-value pairs are stored in DHT and a value can be looked up with a key. The keys are unique identifiers to values that can range from blocks in a blockchain to addresses and to documents.

What differentiates a DHT from a normal hash table is the fact that storage and lookup on DHT are distributed across multiple (can be millions) nodes or machines. This very characteristic of DHT makes it look like distributed databases used for storage and retrieval. There is no master-slave hierarchy or a centralized control among the participating nodes. All the nodes are treated as peers.

DHT provides freedom to the participating nodes such that the nodes can join or leave the network anytime. Due to this reason, DHTs are widely used in Peer-to-Peer (P2P) networks. In fact, part of the motivation behind the research of DHT stems from its usage in P2P networks.

Characteristics of DHT

  1. Decentralized: Since there is no central authority or coordination
  2. Scalable: The system can easily scale up to millions of nodes
  3. Fault-tolerant: DHT replicates the data storage on all the nodes. Therefore, even if one node leaves the network, it should not affect other nodes in the network.

Functionalities of a DHT

A normal hash table supports the following 2 functions:

  1. put(key, value)
  2. get(key) : returns the value

DHT contrasts from a normal hash table in its storage, lookup, and retrieval strategies. Following are the additional options one gets with DHT:

  1. lookup(key) : returns IP address (Chord lookup service)
  2. send-RPC(IP address, put, key, data)
  3. send-RPC(IP address, get, key): returns data

How does lookup happen inside a DHT?

Circular Doubly linked list
A circular doubly-linked list

Let’s see how lookup happens in a popular DHT protocol like Chord. Consider a circular doubly-linked list of nodes. Each node has a reference pointer to the node previous as well as next to it. The node next to the node in question is called the successor. The node that is previous to the node in question is called the predecessor. Speaking in terms of a DHT, each node has a unique node ID of k bits and these nodes are arranged in the increasing order of their node IDs.

Assume these nodes are arranged in a ring structure called identifier ring. For each node, the successor has the shortest distance clockwise away. For most nodes, this is the node whose ID is closest to but still greater than the current node’s ID.

To find out the node appropriate for a particular key, first hash the key K and all the nodes to exactly k bits using consistent hashing techniques like SHA-1.

SHA-1
SHA-1 (Image courtesy: crypto.stackexchange.com)

Start at any point in the ring and traverse clockwise till you catch the node whose node ID is closer to the key K, but can be greater than K. This node is the one responsible for storage and lookup for that particular key.

lookup in DHT
Basic Lookup in DHT

In an iterative style of lookup, each node Q queries its successor node for KV (key-value) pair. If the queried node does not have the target key, it will return a set of nodes S that can be closer to the target. The querying node Q then queries the nodes in S which are closer to itself. This continues until either the target KV pair is returned or when there are no more nodes to query.

This lookup is very suitable for an ideal scenario where all the nodes have a perfect uptime. But how to handle scenarios when nodes leave the network either intentionally or by failure? This calls for the need for a robust join/leave protocol.

Popular DHT protocols and implementations

  1. Chord
  2. Kademlia
  3. Apache Cassandra
  4. Koorde
  5. TomP2P
  6. Voldemort

--

--