Consistent Hashing: A Powerful Technique for Distributed Systems
Consistent Hashing is a technique used in distributed systems to distribute data across multiple servers while ensuring high availability and scalability. It works by mapping keys or requests to a fixed set of virtual nodes, which are then mapped to the actual nodes in the system.
The basic idea behind Consistent Hashing is to use a hash function to map keys to a fixed set of virtual nodes on a ring, also known as a hash ring. The hash function outputs a value that is used to determine the position of the virtual node on the ring. Each virtual node is assigned a portion of the ring, which represents a portion of the key space.
In the diagram below, I have designed a simple example of a hash ring with six virtual nodes and their corresponding positions on the ring.
Once the virtual nodes are mapped to the ring, keys or requests are also mapped to a position on the ring using the same hash function. The node responsible for serving the key or request is the node that comes after the virtual node on the ring. For example, if a key map to position 35 on the ring, the node responsible for serving that key is Node R2.
The beauty of Consistent Hashing is that it allows the system to scale horizontally without incurring a significant amount of re-mapping overhead. When a new node is added to the system, only the keys assigned to the virtual nodes closest to the new node’s position on the ring need to be remapped. Similarly, when a node is removed from the system, only the keys assigned to the virtual nodes that were owned by that node need to be remapped.
Suppose I have two requests, Req1 and Req2, that need to be served by a load balancer. Using consistent hashing, I can map these requests to positions on the ring and route them to the appropriate server.
In the diagram below, you can see that Req1 hashes to position 70 on the ring, which corresponds to server C. Similarly, Req2 hashes to position 35 on the ring, which corresponds to server B.
One of the key benefits of Consistent Hashing is that it can handle server failures and additions gracefully. When a server fails or is removed from the system, only the keys assigned to the virtual nodes that were owned by that server need to be remapped. When a server is added to the system, only the keys assigned to the virtual nodes closest to the new server’s position on the ring need to be remapped.
The diagram below shows an example of what happens when server B fails. As a result, the keys assigned to virtual nodes R2 and R3 have to be remapped. Req1, which was originally mapped to server C, now maps to server D, as this is the next server in the ring after virtual node R3. Similarly, Req2, which was originally mapped to server B, now maps to server A because it is the next server on the ring after virtual node R1.
Consistent Hashing is widely used in distributed systems, particularly in large-scale web applications, distributed caches, and databases. By ensuring that keys or requests are evenly distributed across servers, Consistent Hashing can improve the availability, reliability, and performance of the system.
In summary, Consistent Hashing is a powerful technique for distributing data across multiple servers while ensuring high availability and scalability. By mapping keys or requests to a fixed set of virtual nodes on a ring, Consistent Hashing can distribute requests evenly across servers, handle server failures gracefully, and scale horizontally without incurring a significant amount of re-mapping overhead.