
Commit 53adbc8

Add reservoir sampling to selection sampling chapter
1 parent 2e058bd commit 53adbc8


2 files changed (+52, -1 lines)


Selection Sampling/README.markdown

Lines changed: 29 additions & 1 deletion
@@ -50,7 +50,35 @@ One more random number to pick, let's say it is 4 again. We swap `"c"` with `"a"
 
 And that's it. Easy peasy. The performance of this function is **O(k)** because as soon as we've selected *k* elements, we're done.
 
-However, there is one downside: this algorithm does not keep the elements in the original order. In the input array `"a"` came before `"e"` but now it's the other way around. If that is an issue for your app, you can't use this particular method.
+Here is an alternative algorithm, called "reservoir sampling":
+
+```swift
+func reservoirSample<T>(from a: [T], count k: Int) -> [T] {
+  precondition(a.count >= k)
+
+  var result = [T]()      // 1
+  for i in 0..<k {
+    result.append(a[i])
+  }
+
+  for i in k..<a.count {  // 2
+    let j = random(min: 0, max: i)
+    if j < k {
+      result[j] = a[i]
+    }
+  }
+  return result
+}
+```
+
+This works in two steps:
+
+1. Fill the `result` array with the first `k` elements from the original array. This is called the "reservoir".
+2. For each remaining element `a[i]`, pick a random index `j` between 0 and `i` (inclusive). If `j` is less than `k`, `a[i]` replaces the element at `result[j]`; otherwise it is skipped. In the end, every element of the input has the same chance, `k/n` (where `n` is `a.count`), of being in the reservoir.
+
+The performance of this algorithm is **O(n)** because it has to look at every element, so it's slower than the first algorithm, which can stop after **O(k)** steps. However, its big advantage is that it makes a single pass over the input and only keeps the `k` reservoir elements in memory, so it can be used for inputs that are too large to fit in memory, even if you don't know in advance how many elements there are. The version above takes an array for simplicity, but the same loop works on any sequence (in Swift this might be a lazy sequence that reads the elements from a file).
+
+There is one downside to the previous two algorithms: they do not keep the elements in the original order. In the input array `"a"` came before `"e"` but now it's the other way around. If that is an issue for your app, you can't use either of these methods.
 
 Here is an alternative approach that does keep the original order intact, but is a little more involved:
 
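The README says reservoir sampling also works when the size of the input isn't known up front. A minimal sketch of that idea, assuming Swift 4.2 or later: the hypothetical `streamingReservoirSample(from:count:)` below accepts any `Sequence` instead of an array, and uses the standard library's `Int.random(in:)` in place of the repository's `random(min:max:)` helper. Unlike the version in this commit, it does not require at least `k` elements; a shorter input is simply returned whole.

```swift
// Hypothetical streaming variant of reservoirSample (not part of this commit).
// Works on any Sequence, so the total number of elements never has to be known.
func streamingReservoirSample<S: Sequence>(from source: S, count k: Int) -> [S.Element] {
    precondition(k >= 0, "k must be non-negative")
    var reservoir: [S.Element] = []
    reservoir.reserveCapacity(k)
    var seen = 0  // number of elements consumed so far

    for element in source {
        if seen < k {
            // Step 1: the first k elements fill the reservoir.
            reservoir.append(element)
        } else {
            // Step 2: j is uniform in 0...seen, so this element replaces
            // a reservoir slot with probability k / (seen + 1).
            let j = Int.random(in: 0...seen)
            if j < k {
                reservoir[j] = element
            }
        }
        seen += 1
    }
    // If the source had fewer than k elements, all of them are returned.
    return reservoir
}

// Example: a sequence whose length is only discovered by iterating it.
let numbers = sequence(first: 1) { $0 < 1_000 ? $0 + 1 : nil }
print(streamingReservoirSample(from: numbers, count: 5))
```

Because `seen` is only incremented as elements arrive, the function never reads a `count` property, which is what makes this shape suitable for lazily produced input.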
Selection Sampling/SelectionSampling.swift

Lines changed: 23 additions & 0 deletions
@@ -23,6 +23,29 @@ func select<T>(from a: [T], count k: Int) -> [T] {
   return Array(a[0..<k])
 }
 
+/*
+Pick k random elements from an array. Performance: O(n).
+*/
+func reservoirSample<T>(from a: [T], count k: Int) -> [T] {
+  precondition(a.count >= k)
+
+  var result = [T]()
+
+  // Fill the result array (the "reservoir") with the first k elements.
+  for i in 0..<k {
+    result.append(a[i])
+  }
+
+  // Randomly replace reservoir elements with elements from the remaining pool.
+  for i in k..<a.count {
+    let j = random(min: 0, max: i)
+    if j < k {
+      result[j] = a[i]
+    }
+  }
+  return result
+}
+
 /*
 Selects `count` items at random from an array. Respects the original order of
 the elements. Performance: O(n).
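One way to sanity-check `reservoirSample(from:count:)` is to run it many times and count how often each element is chosen: with `n` inputs and `k` picks, every element should come out roughly `k/n` of the time. The standalone sketch below copies the commit's algorithm and, as an assumption, defines its own `random(min:max:)` with `Int.random(in:)` (Swift 4.2+), since that helper is not part of this diff. `selectionFrequencies(n:k:trials:)` is a hypothetical helper for this check only; these definitions are meant for a separate script or playground, not for `SelectionSampling.swift`, where `reservoirSample` already exists.

```swift
// Stand-in for the repository's random(min:max:) helper (assumed to return
// a uniformly distributed Int in min...max, inclusive).
func random(min: Int, max: Int) -> Int {
    return Int.random(in: min...max)
}

// Same algorithm as reservoirSample(from:count:) in this commit.
func reservoirSample<T>(from a: [T], count k: Int) -> [T] {
    precondition(a.count >= k)
    var result = Array(a[0..<k])  // the reservoir
    for i in k..<a.count {
        let j = random(min: 0, max: i)
        if j < k {
            result[j] = a[i]
        }
    }
    return result
}

// Hypothetical helper: samples k of the integers 0..<n, `trials` times,
// and returns the fraction of trials in which each integer was picked.
func selectionFrequencies(n: Int, k: Int, trials: Int) -> [Double] {
    let input = Array(0..<n)
    var counts = [Int](repeating: 0, count: n)
    for _ in 0..<trials {
        for x in reservoirSample(from: input, count: k) {
            counts[x] += 1
        }
    }
    return counts.map { Double($0) / Double(trials) }
}

let freqs = selectionFrequencies(n: 10, k: 3, trials: 50_000)
print(freqs)  // each value should be close to 3/10 = 0.3
```

The frequencies always add up to exactly `k`, because every trial returns `k` distinct elements; only their spread across the inputs is random.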
