
Commit 3793059

Merge pull request #13 from Ja7ad/feat/consistent_hashing
feat: add consistent hashing algorithm
2 parents c074aab + 6704e70 commit 3793059

5 files changed, +359 −0 lines changed

README.md

Lines changed: 1 addition & 0 deletions
```diff
@@ -24,6 +24,7 @@ been validated in terms of functionality and testing.
 | [Reservoir Sampling Algorithm L](./rs/README.md) | Optimized reservoir sampling for large `N`, reduces unnecessary replacements using skipping. |
 | [Weighted Reservoir Sampling](./rs/README.md) | Selects items with probability proportional to their weights using a heap-based approach. Used in recommendation systems and A/B testing. |
 | [Random Sort Reservoir Sampling](./rs/README.md) | Uses a min-heap and random priorities to maintain the top `k` elements in a streaming dataset. |
+| [Consistent Hashing](./ch/README.md) | Used by distributed systems (CDNs, databases) to evenly distribute requests across servers. |

 ## 🚀 Installation >= go 1.19
```

ch/README.md

Lines changed: 84 additions & 0 deletions
# Consistent Hashing

In computer science, consistent hashing is a special kind of hashing technique such that, when a hash table is resized, only $\displaystyle n/m$ keys need to be remapped on average, where $\displaystyle n$ is the number of keys and $\displaystyle m$ is the number of slots. In contrast, in most traditional hash tables a change in the number of array slots causes nearly all keys to be remapped, because the mapping between the keys and the slots is defined by a modular operation.
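
To make the contrast concrete, here is a minimal, self-contained Go sketch (illustrative only; the keys and slot counts are invented) of how plain `hash % N` placement moves most keys when the number of slots grows from 3 to 4:

```go
package main

import (
    "fmt"
    "hash/crc32"
)

// modSlot places a key with a plain modular hash; changing n changes
// the slot of nearly every key, which is the remapping problem that
// consistent hashing avoids.
func modSlot(key string, n int) int {
    return int(crc32.ChecksumIEEE([]byte(key))) % n
}

func main() {
    keys := []string{"k1", "k2", "k3", "k4", "k5", "k6"}
    moved := 0
    for _, k := range keys {
        if modSlot(k, 3) != modSlot(k, 4) {
            moved++
        }
    }
    // With modular hashing, roughly 3/4 of keys are expected to move.
    fmt.Printf("%d of %d keys moved after growing from 3 to 4 slots\n", moved, len(keys))
}
```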

## Projects that use this algorithm

- Couchbase automated data partitioning
- OpenStack's Object Storage Service Swift
- Partitioning component of Amazon's storage system Dynamo
- Data partitioning in Apache Cassandra
- Data partitioning in ScyllaDB
- Data partitioning in Voldemort
- Akka's consistent hashing router
- Riak, a distributed key-value database
- Gluster, a network-attached storage file system
- Akamai content delivery network
- Discord chat application
- Load balancing gRPC requests to a distributed cache in SpiceDB
- Chord algorithm
- MinIO object storage system

## 📊 **Mathematical Formula for Consistent Hashing**

### **Problem Definition**
Given a set of `N` nodes and `K` keys, we need to distribute the keys among the nodes **such that minimal data movement is required** when nodes are added or removed.

### **Hash Ring Representation**
1. We define a **circular space** from `0` to `M-1`, where `M = 2^m` for an `m`-bit hash function.
2. Each **node** `n_i` is hashed using function `H(n_i)`, assigning it a position on the ring: $P(n_i) = H(n_i) \mod M$
3. Each **key** `k_j` is hashed onto the ring using the same function: $P(k_j) = H(k_j) \mod M$
4. A **key is assigned to the first node encountered in the clockwise direction** from its position, as sketched below.
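
The clockwise rule in step 4 reduces to a binary search over the sorted ring positions. Here is a minimal, self-contained sketch (a condensed version of the lookup in `ch/ch.go` below; the node positions reuse the worked example at the end of this README):

```go
package main

import (
    "fmt"
    "sort"
)

// ringLookup returns the position of the first node clockwise from hash.
// positions must be sorted ascending; past the last node the ring wraps
// back to the first position.
func ringLookup(positions []int, hash int) int {
    idx := sort.Search(len(positions), func(i int) bool {
        return positions[i] >= hash
    })
    if idx == len(positions) { // wrap around the ring
        idx = 0
    }
    return positions[idx]
}

func main() {
    nodes := []int{15, 45, 90}         // H(A), H(B), H(C)
    fmt.Println(ringLookup(nodes, 10)) // 15 -> node A
    fmt.Println(ringLookup(nodes, 95)) // 15 -> wraps around to node A
}
```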

### **Mathematical Proof of Load Balancing**
The expected number of keys per node is given by: $E[\text{keys per node}] = \frac{K}{N}$
where:
- `K` is the total number of keys.
- `N` is the total number of nodes.

If a node **joins**, it takes responsibility for keys previously mapped to the **next node**, meaning only about $\frac{K}{N+1}$ keys are affected, significantly reducing data movement compared to traditional hashing (`O(K)` movement).
If a node **leaves**, its keys are reassigned to the **next available node**; only the roughly $\frac{K}{N}$ keys the departing node held are affected, instead of `O(K)`.
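
The `K/(N+1)` figure can be checked empirically. The following self-contained Go simulation (illustrative only: node and key names are invented, and a single virtual node per real node is used, so the count fluctuates around the expectation) counts how many keys change owners when a ring of `N` nodes gains one:

```go
package main

import (
    "fmt"
    "hash/crc32"
    "sort"
    "strconv"
)

func h(s string) int { return int(crc32.ChecksumIEEE([]byte(s))) }

// owner returns the ring position of the node owning key,
// i.e. the first node position clockwise from the key's hash.
func owner(ring []int, key string) int {
    idx := sort.Search(len(ring), func(i int) bool { return ring[i] >= h(key) })
    if idx == len(ring) {
        idx = 0 // wrap
    }
    return ring[idx]
}

func main() {
    const K, N = 10000, 10
    ring := make([]int, 0, N)
    for i := 0; i < N; i++ {
        ring = append(ring, h("node"+strconv.Itoa(i)))
    }
    sort.Ints(ring)

    before := make(map[string]int, K)
    for i := 0; i < K; i++ {
        k := "key" + strconv.Itoa(i)
        before[k] = owner(ring, k)
    }

    // Add one node, re-sort, and count keys whose owner changed.
    ring = append(ring, h("node"+strconv.Itoa(N)))
    sort.Ints(ring)
    moved := 0
    for k, pos := range before {
        if owner(ring, k) != pos {
            moved++
        }
    }
    // Theory predicts about K/(N+1) moved keys; the exact count
    // depends on where the hashes happen to land.
    fmt.Printf("moved %d of %d keys (expected about %d)\n", moved, K, K/(N+1))
}
```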

### **Time Complexity**
| Operation | Complexity |
|-------------------|------------|
| **Node Addition** | `O(K/N + log N)` |
| **Node Removal** | `O(K/N + log N)` |
| **Key Lookup** | `O(log N)` (binary search) |
| **Add a key** | `O(log N)` |
| **Remove a key** | `O(log N)` |

## 🧪 **Mathematical Test Case for Consistent Hashing**
### **Test Case Design**
To validate **Consistent Hashing**, we check:
1. **Keys are evenly distributed** across nodes (`K/N` per node).
2. **Minimal keys move on node addition/removal** (about `K/(N+1)` on addition, `K/N` on removal).
3. **Lookups are efficient (`O(log N)`)** using binary search.

### **Example**
#### **Initial Nodes (`N = 3`)**
| Node | Hash Value (Position on Ring) |
|------|-------------------------------|
| `A`  | `H(A) = 15` |
| `B`  | `H(B) = 45` |
| `C`  | `H(C) = 90` |

#### **Keys (`K = 6`)**
| Key  | Hash Value   | Assigned Node |
|------|--------------|---------------|
| `k1` | `H(k1) = 10` | `A` |
| `k2` | `H(k2) = 30` | `B` |
| `k3` | `H(k3) = 55` | `C` |
| `k4` | `H(k4) = 70` | `C` |
| `k5` | `H(k5) = 85` | `C` |
| `k6` | `H(k6) = 95` | `A` |

#### **After Adding `Node D (H(D) = 60)`**
Only **`k3`** (hash `55`, which falls in the new arc `(45, 60]`) moves to `D`; `k4` (hash `70`) still maps to `C` at position `90`, and all other keys remain unaffected.
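
A quick check of this step with the same binary-search lookup (self-contained; the ring positions and key hashes are copied from the tables above):

```go
package main

import (
    "fmt"
    "sort"
)

// firstClockwise returns the node position owning the given key hash.
func firstClockwise(ring []int, keyHash int) int {
    idx := sort.Search(len(ring), func(i int) bool { return ring[i] >= keyHash })
    if idx == len(ring) {
        idx = 0 // wrap
    }
    return ring[idx]
}

func main() {
    before := []int{15, 45, 90}    // A, B, C
    after := []int{15, 45, 60, 90} // A, B, D, C (sorted)
    for _, k := range []int{10, 30, 55, 70, 85, 95} {
        if firstClockwise(before, k) != firstClockwise(after, k) {
            fmt.Printf("key with hash %d moved to node at %d\n", k, firstClockwise(after, k))
        }
    }
    // Prints only: key with hash 55 moved to node at 60
}
```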

ch/ch.go

Lines changed: 113 additions & 0 deletions
```go
package ch

import (
    "hash/crc32"
    "sort"
    "strconv"
    "sync"
)

// Hash function type
type Hash func(data []byte) uint32

// Map represents the consistent hash ring with generics
type Map[T any] struct {
    mu       sync.RWMutex
    hash     Hash
    replicas int
    keys     []int          // Sorted virtual node positions
    hashMap  map[int]string // Virtual node hash -> Real node
    data     map[string]T
}

// New creates a new Consistent Hashing instance. replicas is the number
// of virtual nodes placed per real node; fn may be nil, in which case
// CRC-32 (IEEE) is used.
func New[T any](replicas int, fn Hash) *Map[T] {
    m := &Map[T]{
        replicas: replicas,
        hash:     fn,
        hashMap:  make(map[int]string),
        data:     make(map[string]T),
    }
    if m.hash == nil {
        m.hash = crc32.ChecksumIEEE
    }
    return m
}

// AddNode adds a node to the hash ring, placing replicas virtual nodes
// for it and re-sorting the ring positions.
func (m *Map[T]) AddNode(node string) {
    m.mu.Lock()
    defer m.mu.Unlock()

    for i := 0; i < m.replicas; i++ {
        hash := int(m.hash([]byte(strconv.Itoa(i) + node)))
        m.keys = append(m.keys, hash)
        m.hashMap[hash] = node
    }

    sort.Ints(m.keys)
}

// RemoveNode removes a node (all of its virtual nodes) from the hash ring
func (m *Map[T]) RemoveNode(node string) {
    m.mu.Lock()
    defer m.mu.Unlock()

    var newKeys []int
    for _, hash := range m.keys {
        if m.hashMap[hash] != node {
            newKeys = append(newKeys, hash)
        } else {
            delete(m.hashMap, hash)
        }
    }
    m.keys = newKeys
}

// GetNode returns the closest node for the provided key, i.e. the real
// node behind the first virtual node clockwise from the key's hash.
func (m *Map[T]) GetNode(key string) string {
    m.mu.RLock()
    defer m.mu.RUnlock()

    if len(m.keys) == 0 {
        return ""
    }

    hash := int(m.hash([]byte(key)))
    idx := sort.Search(len(m.keys), func(i int) bool {
        return m.keys[i] >= hash
    })
    if idx == len(m.keys) {
        idx = 0 // wrap around the ring
    }
    return m.hashMap[m.keys[idx]]
}

// AddKey stores a key-value pair in the correct node. Values live in a
// single local map; GetNode only determines which node owns the key.
func (m *Map[T]) AddKey(key string, value T) {
    node := m.GetNode(key)

    // If no node found, no need to store the value
    if node == "" {
        return
    }

    m.mu.Lock()
    defer m.mu.Unlock()
    m.data[key] = value
}

// RemoveKey deletes a key from the system
func (m *Map[T]) RemoveKey(key string) {
    m.mu.Lock()
    defer m.mu.Unlock()
    delete(m.data, key)
}

// GetKey retrieves a value stored in the system
func (m *Map[T]) GetKey(key string) (T, bool) {
    m.mu.RLock()
    defer m.mu.RUnlock()
    value, exists := m.data[key]
    return value, exists
}
```

ch/ch_example_test.go

Lines changed: 20 additions & 0 deletions
```go
package ch

import "fmt"

func ExampleNew() {
    type UserData struct {
        Name  string
        Email string
    }

    chStruct := New[UserData](3, nil)
    chStruct.AddNode("NodeA")
    chStruct.AddNode("NodeB")

    chStruct.AddKey("user123", UserData{Name: "Alice", Email: "[email protected]"})
    user, exists := chStruct.GetKey("user123")
    if exists {
        fmt.Println("User Data:", user.Name, user.Email)
    }
}
```

ch/ch_test.go

Lines changed: 141 additions & 0 deletions
```go
package ch

import (
    "strconv"
    "testing"
)

func TestConsistentHashing_NodeAddition(t *testing.T) {
    ch := New[string](3, nil)
    ch.AddNode("NodeA")
    ch.AddNode("NodeB")
    ch.AddNode("NodeC")

    key := "my-key"
    node := ch.GetNode(key)

    if node == "" {
        t.Errorf("Expected a valid node, but got an empty string")
    }

    sameNode := ch.GetNode(key)
    if node != sameNode {
        t.Errorf("Expected consistent mapping, but got different results")
    }
}

func TestConsistentHashing_AddGetKey(t *testing.T) {
    ch := New[int](3, nil)
    ch.AddNode("NodeA")
    ch.AddNode("NodeB")

    ch.AddKey("user123", 99)
    ch.AddKey("user456", 42)

    value, exists := ch.GetKey("user123")
    if !exists || value != 99 {
        t.Errorf("Expected 99, but got %d", value)
    }

    value, exists = ch.GetKey("user456")
    if !exists || value != 42 {
        t.Errorf("Expected 42, but got %d", value)
    }
}

func TestConsistentHashing_RemoveKey(t *testing.T) {
    ch := New[string](3, nil)
    ch.AddNode("NodeA")
    ch.AddKey("user123", "Data1")

    ch.RemoveKey("user123")

    _, exists := ch.GetKey("user123")
    if exists {
        t.Errorf("Expected key to be removed, but it still exists")
    }
}

type TestStruct struct {
    Name  string
    Score int
}

func TestConsistentHashing_WithStruct(t *testing.T) {
    ch := New[TestStruct](3, nil)
    ch.AddNode("NodeA")
    ch.AddNode("NodeB")

    data := TestStruct{Name: "Alice", Score: 100}
    ch.AddKey("user123", data)

    retrieved, exists := ch.GetKey("user123")
    if !exists || retrieved.Name != "Alice" || retrieved.Score != 100 {
        t.Errorf("Expected Alice with score 100, but got %+v", retrieved)
    }
}

func BenchmarkConsistentHashing_AddNode(b *testing.B) {
    ch := New[string](100, nil)

    b.ReportAllocs()
    b.ResetTimer()
    for i := 0; i < b.N; i++ {
        ch.AddNode("Node" + strconv.Itoa(i))
    }
}

func BenchmarkConsistentHashing_RemoveNode(b *testing.B) {
    ch := New[string](100, nil)
    for i := 0; i < 1000; i++ {
        ch.AddNode("Node" + strconv.Itoa(i))
    }

    b.ReportAllocs()
    b.ResetTimer()
    for i := 0; i < b.N; i++ {
        ch.RemoveNode("Node" + strconv.Itoa(i%1000))
    }
}

func BenchmarkConsistentHashing_GetNode(b *testing.B) {
    ch := New[string](100, nil)
    for i := 0; i < 1000; i++ {
        ch.AddNode("Node" + strconv.Itoa(i))
    }

    b.ReportAllocs()
    b.ResetTimer()
    for i := 0; i < b.N; i++ {
        _ = ch.GetNode("key" + strconv.Itoa(i))
    }
}

func BenchmarkConsistentHashing_AddKey(b *testing.B) {
    ch := New[string](100, nil)
    for i := 0; i < 1000; i++ {
        ch.AddNode("Node" + strconv.Itoa(i))
    }

    b.ReportAllocs()
    b.ResetTimer()
    for i := 0; i < b.N; i++ {
        ch.AddKey("key"+strconv.Itoa(i), "value"+strconv.Itoa(i))
    }
}

func BenchmarkConsistentHashing_RemoveKey(b *testing.B) {
    ch := New[string](100, nil)
    for i := 0; i < 1000; i++ {
        ch.AddNode("Node" + strconv.Itoa(i))
    }
    for i := 0; i < 10000; i++ {
        ch.AddKey("key"+strconv.Itoa(i), "value"+strconv.Itoa(i))
    }

    b.ReportAllocs()
    b.ResetTimer()
    for i := 0; i < b.N; i++ {
        ch.RemoveKey("key" + strconv.Itoa(i%10000))
    }
}
```
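
The test-case design in ch/README.md also calls for verifying that few keys move when a node joins; the commit itself does not include such a test. A hedged sketch of one (not part of this PR; the 50% bound is illustrative slack around the theoretical `K/(N+1)` expectation, not a guarantee) could be appended to this file:

```go
// Sketch only -- not part of this commit. Checks that adding one node
// to a 10-node ring remaps well under half of the keys.
func TestConsistentHashing_MinimalMovement(t *testing.T) {
    ch := New[string](100, nil)
    for i := 0; i < 10; i++ {
        ch.AddNode("Node" + strconv.Itoa(i))
    }

    const K = 1000
    before := make(map[string]string, K)
    for i := 0; i < K; i++ {
        k := "key" + strconv.Itoa(i)
        before[k] = ch.GetNode(k)
    }

    ch.AddNode("Node10")

    moved := 0
    for k, node := range before {
        if ch.GetNode(k) != node {
            moved++
        }
    }
    if moved > K/2 {
        t.Errorf("Expected minimal movement, but %d of %d keys moved", moved, K)
    }
}
```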
