-
Notifications
You must be signed in to change notification settings - Fork 39
/
Copy pathsolution_15.py
48 lines (35 loc) · 1.65 KB
/
solution_15.py
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
import random
def coding_problem_15(sample_generator):
"""
Given a stream of elements too large to store in memory, pick a random element from the stream with
uniform probability.
Example:
>>> import random
>>> random.seed(0xBADC0FFE)
>>> def sample_gen_fun(n):
... for x in range(n):
... yield x
>>> num = 10
>>> hist = [0] * num
>>> for trials in range(5000):
... hist[coding_problem_15(sample_gen_fun(num))] += 1
>>> all(abs(float(h) / sum(hist) - 1.0 / len(hist)) < 0.01 for h in hist)
True
Note: this problem only makes sense if you don't know the total number in advance. Otherwise, you would pick a
random integer between [0..len[ and record the element corresponding to that index when processed. Setting a seed
for the random number generator to avoid random failures due to unlucky distributions of samples.
Strategy: assuming that at round n we have already selected between the available samples with uniform
probability, accept the incoming one as the new selected sample with a probability that keeps everyone's chances
fair. Example: at n==1, the first sample as 100% chance of being selected. n==2, second sample 50% chance, n==3
1/3 chance etc. Proof: for n==5, the prob. of choosing first sample is 1 * 0.5 * 0.666.. * 0.75 * 0.8 = 0.2 = 1/5
"""
sample_count = 0
selected_sample = None
for sample in sample_generator:
sample_count += 1
if random.random() <= 1.0 / sample_count:
selected_sample = sample
return selected_sample
if __name__ == '__main__':
import doctest
doctest.testmod(verbose=True)