
Commit 5a4fac9

Merge pull request kodecocodes#62 from Jamil/master
Add Bloom Filter Implementation & Description
2 parents 5721ee7 + 11e3d64 commit 5a4fac9

4 files changed: 195 additions, 1 deletion

Bloom Filter/BloomFilter.swift

Lines changed: 58 additions & 0 deletions
@@ -0,0 +1,58 @@

```swift
import Foundation

public class BloomFilter<T> {
  private(set) private var arr: [Bool]
  private(set) private var hashFunctions: [T -> Int]

  public init(size: Int = 1024, hashFunctions: [T -> Int]) {
    self.arr = Array<Bool>(count: size, repeatedValue: false)
    self.hashFunctions = hashFunctions
  }

  private func computeHashes(value: T) -> [Int] {
    return hashFunctions.map() { hashFunc in
      abs(hashFunc(value) % self.arr.count)
    }
  }

  public func insert(toInsert: T) {
    let hashValues: [Int] = self.computeHashes(toInsert)

    for hashValue in hashValues {
      self.arr[hashValue] = true
    }
  }

  public func insert(values: [T]) {
    for value in values {
      self.insert(value)
    }
  }

  public func query(value: T) -> Bool {
    let hashValues = self.computeHashes(value)

    // Map hashes to indices in the Bloom filter
    let results = hashValues.map() { hashValue in
      self.arr[hashValue]
    }

    // All values must be 'true' for the query to return true

    // This does NOT imply that the value is in the Bloom filter,
    // only that it may be. If the query returns false, however,
    // you can be certain that the value was not added.

    let exists = results.reduce(true, combine: { $0 && $1 })

    return exists
  }

  public func isEmpty() -> Bool {
    // Reduce list; as soon as the reduction hits a 'true' value, the && condition will fail
    return arr.reduce(true) { prev, next in
      prev && !next
    }
  }

}
```
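The file above uses Swift 2 syntax (`T -> Int` function types without parentheses, `Array(count:repeatedValue:)`, `reduce(_:combine:)`) that current Swift compilers no longer accept, and `private(set) private` is redundant. As a sketch only, assuming Swift 5, the same class could be written as below. The logic is unchanged except that `query` uses `allSatisfy`, which stops at the first unset bit, and `isEmpty` uses `contains`. The two string hash closures in the usage lines are hypothetical placeholders, not the djb2/sdbm functions from the tests.

```swift
import Foundation

// Swift 5 sketch of the BloomFilter class from this commit.
public class BloomFilter<T> {
  private var arr: [Bool]
  private var hashFunctions: [(T) -> Int]

  public init(size: Int = 1024, hashFunctions: [(T) -> Int]) {
    self.arr = Array(repeating: false, count: size)
    self.hashFunctions = hashFunctions
  }

  // Maps each hash function's output to a valid index into `arr`.
  private func computeHashes(_ value: T) -> [Int] {
    return hashFunctions.map { hashFunc in abs(hashFunc(value) % arr.count) }
  }

  public func insert(_ element: T) {
    for index in computeHashes(element) {
      arr[index] = true
    }
  }

  public func insert(_ values: [T]) {
    for value in values {
      insert(value)
    }
  }

  // `true` means "possibly inserted"; `false` means "definitely not inserted".
  public func query(_ value: T) -> Bool {
    return computeHashes(value).allSatisfy { arr[$0] }
  }

  // The filter is empty when no bit has been set yet.
  public func isEmpty() -> Bool {
    return !arr.contains(true)
  }
}

// Usage, with two simple hypothetical hash functions for strings.
let filter = BloomFilter<String>(size: 64, hashFunctions: [
  { $0.count },
  { s in s.unicodeScalars.reduce(0) { $0 &+ Int($1.value) } },
])
filter.insert("Hello world!")
let maybePresent = filter.query("Hello world!")  // true: no false negatives
```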
Lines changed: 78 additions & 0 deletions
@@ -0,0 +1,78 @@

```swift
import XCTest
import BloomFilter

/* Two hash functions, adapted from
   http://www.cse.yorku.ca/~oz/hash.html */

func djb2(x: String) -> Int {
  var hash = 5381

  for char in x.characters {
    hash = ((hash << 5) &+ hash) &+ char.hashValue
  }

  return Int(hash)
}

func sdbm(x: String) -> Int {
  var hash = 0

  for char in x.characters {
    hash = char.hashValue &+ (hash << 6) &+ (hash << 16) &- hash
  }

  return Int(hash)
}

class BloomFilterTests: XCTestCase {

  func testSingleHashFunction() {
    let bloom = BloomFilter<String>(hashFunctions: [djb2])

    bloom.insert("Hello world!")

    let result_good = bloom.query("Hello world!")
    let result_bad = bloom.query("Hello world")

    XCTAssertTrue(result_good)
    XCTAssertFalse(result_bad)
  }

  func testEmptyFilter() {
    let bloom = BloomFilter<String>(hashFunctions: [djb2])

    let empty = bloom.isEmpty()

    XCTAssertTrue(empty)
  }

  func testMultipleHashFunctions() {
    let bloom = BloomFilter<String>(hashFunctions: [djb2, sdbm])

    bloom.insert("Hello world!")

    let result_good = bloom.query("Hello world!")
    let result_bad = bloom.query("Hello world")

    XCTAssertTrue(result_good)
    XCTAssertFalse(result_bad)
  }

  func testFalsePositive() {
    let bloom = BloomFilter<String>(size: 5, hashFunctions: [djb2, sdbm])

    bloom.insert(["hello", "elloh", "llohe", "lohel", "ohell"])

    print("Inserted")

    let query = bloom.query("This wasn't inserted!")

    // This is true even though we did not insert the value in the Bloom filter;
    // the Bloom filter is capable of producing false positives but NOT
    // false negatives.

    XCTAssertTrue(query)
  }
}
```
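The test hash functions feed `Character.hashValue` into the djb2 and sdbm recurrences. Since Swift 4.2, `hashValue` is randomly seeded for each process execution, so on current toolchains these hashes change from run to run, and tests that assert a specific result for a value that was never inserted (such as `testFalsePositive`) are no longer reproducible. A sketch of deterministic variants, assuming Swift 5 and assuming it is acceptable to use each Unicode scalar's code point, which is closer to the C originals on the yorku.ca page (they add character codes). The resulting values differ from the ones the Swift 2 versions produced.

```swift
// Deterministic djb2 and sdbm string hashes (Swift 5 sketch).
// Code points replace `Character.hashValue`, which is randomly seeded per
// process in Swift 4.2+. The wrapping operators (&+, &-) let the value wrap
// on overflow instead of trapping, and `<<` discards shifted-out bits.
func djb2(_ x: String) -> Int {
  var hash = 5381
  for scalar in x.unicodeScalars {
    hash = ((hash << 5) &+ hash) &+ Int(scalar.value)  // hash * 33 + c
  }
  return hash
}

func sdbm(_ x: String) -> Int {
  var hash = 0
  for scalar in x.unicodeScalars {
    hash = Int(scalar.value) &+ (hash << 6) &+ (hash << 16) &- hash
  }
  return hash
}
```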

Bloom Filter/README.markdown

Lines changed: 58 additions & 0 deletions
@@ -0,0 +1,58 @@

# Bloom Filter

## Introduction

A Bloom Filter is a space-efficient data structure for checking whether an element is in a set. It guarantees that queries never produce false negatives: a query either returns `false`, meaning the element is definitely not in the set, or `true`, meaning the element might be in the set. At first this may not seem very useful, but it is important in applications such as cache filtering and data synchronization.

An advantage of the Bloom Filter over a hash table is that it uses a fixed amount of memory no matter how many elements are inserted, and insertion and search take time proportional only to the number of hash functions, not to the number of stored elements. Because the filter never stores the elements themselves, the memory savings over a hash table are significant for large sets, which makes the Bloom Filter a viable option when you do not need a guarantee against false positives. The trade-off is that the false-positive rate grows as more elements are inserted into the fixed-size array.
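To put a number on that trade-off: a commonly used approximation for the false-positive probability of a filter with `bits` bits, `hashCount` hash functions and `insertedCount` inserted elements is (1 - e^(-hashCount * insertedCount / bits))^hashCount, and the `hashCount` that minimizes it is about (bits / insertedCount) * ln 2. The sketch below evaluates both; the function names are hypothetical (not part of this commit), and the formula assumes the hash functions behave like independent, uniformly random functions.

```swift
import Foundation

// Approximate false-positive probability of a Bloom filter, assuming the
// hash functions behave like independent, uniformly random functions.
func estimatedFalsePositiveRate(bits: Int, hashCount: Int, insertedCount: Int) -> Double {
  // Probability that a particular bit is still `false` after all insertions.
  let pBitUnset = exp(-Double(hashCount * insertedCount) / Double(bits))
  // A false positive requires every probed bit to be `true`.
  return pow(1 - pBitUnset, Double(hashCount))
}

// Number of hash functions that minimizes the estimate: (bits / insertedCount) * ln 2.
func optimalHashCount(bits: Int, insertedCount: Int) -> Int {
  let k = Double(bits) / Double(insertedCount) * log(2.0)
  return max(1, Int(k.rounded()))
}

// Example: the default 1024-bit filter holding 100 elements.
let bestK = optimalHashCount(bits: 1024, insertedCount: 100)
let rate = estimatedFalsePositiveRate(bits: 1024, hashCount: bestK, insertedCount: 100)
print(bestK, rate)
```

Halving the number of bits for the same 100 elements raises the estimate, which is the fixed-memory trade-off described above.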
## Implementation

A Bloom Filter is essentially a fixed-length bit vector. To insert an element, it is hashed with *k* different hash functions, each of which maps it to an index in the array, and the bits at those indices are set to `1` (`true`).

Querying works similarly: the queried value is hashed with the same functions, and the filter checks whether all of the bits at the resulting indices are `true`. If even one of the bits is not `true`, the element cannot have been inserted, and the query returns `false`. If all the bits are `true`, the query returns `true`. Because other elements may have set those same bits (a collision), the query can return `true` even though the element was never inserted; this is the source of the false positives mentioned earlier.

Deletion is not possible with a Bloom Filter, since any given bit might have been set by more than one inserted element, and clearing it to remove one element could make a query for another element return a false negative. Once you add an element, it's in there for good.
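The collision and deletion arguments can be made concrete with a bare bit array. In this sketch the keys `"cat"`, `"dog"` and `"owl"` and their bit indices are hypothetical, hand-picked stand-ins for real hash outputs so that the overlaps are easy to see; it is not the commit's `BloomFilter` class.

```swift
// An 8-bit array plus hand-picked "hash" indices (hypothetical values chosen
// so that the keys overlap), standing in for real hash functions.
var bits = [Bool](repeating: false, count: 8)
let hashIndices: [String: [Int]] = [
  "cat": [1, 4],
  "dog": [4, 6],
  "owl": [1, 6],  // never inserted; its bits are covered by "cat" and "dog"
]

func insert(_ key: String) {
  for i in hashIndices[key]! { bits[i] = true }
}

func query(_ key: String) -> Bool {
  // Every probed bit must be set for a possible match.
  return hashIndices[key]!.allSatisfy { bits[$0] }
}

insert("cat")
insert("dog")
let owlFalsePositive = query("owl")  // true, although "owl" was never inserted

// "Deleting" cat by clearing its bits also clears bit 4, which dog relies on.
for i in hashIndices["cat"]! { bits[i] = false }
let dogFalseNegative = !query("dog")  // true: "dog" now looks absent
```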
## The Code

The code is straightforward. The internal bit array is given a fixed length at initialization, and that length never changes afterwards. The hash functions are also supplied at initialization; which ones to use depends on the element type. The tests show two examples for strings: the djb2 and sdbm hash functions.
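Because the filter is generic over `T`, any element type works as long as you supply functions that map a `T` to an `Int`. As an illustration for `Int` elements, here are two hypothetical hash functions in Swift 5 syntax (not part of this commit): a multiplication by the odd constant 2654435761 from Knuth's multiplicative hashing, and the 64-bit finalizer (`fmix64`) from MurmurHash3. The last lines reduce each hash modulo the filter size the same way `computeHashes` does.

```swift
// Two illustrative hash functions for Int elements (hypothetical, not part
// of this commit). Wrapping multiplication (&*) keeps overflow from trapping.
func multiplicativeHash(_ x: Int) -> Int {
  return x &* 2_654_435_761  // odd constant from Knuth's multiplicative hashing
}

func fmix64Hash(_ x: Int) -> Int {
  // MurmurHash3's 64-bit finalizer, applied to the integer's bit pattern.
  var h = UInt64(bitPattern: Int64(x))
  h ^= h >> 33
  h = h &* 0xff51afd7ed558ccd
  h ^= h >> 33
  h = h &* 0xc4ceb9fe1a85ec53
  h ^= h >> 33
  return Int(truncatingIfNeeded: h)
}

// Reduce each hash to an index, as the filter's computeHashes does.
let size = 1024
let intHashes: [(Int) -> Int] = [multiplicativeHash, fmix64Hash]
let positions = intHashes.map { hash in abs(hash(42) % size) }
```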
```swift
public init(size: Int = 1024, hashFunctions: [T -> Int]) {
  self.arr = Array<Bool>(count: size, repeatedValue: false)
  self.hashFunctions = hashFunctions
}
```
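The initializer takes a separate function for every index the filter should set. A common alternative, not used in this commit, is double hashing (described by Kirsch and Mitzenmacher): derive every index from just two base hash values, with index number `i` computed as `(h1 + i * h2) mod size`. The helper below is a hypothetical sketch of that idea in Swift 5 syntax.

```swift
// Hypothetical helper: derives `count` bit indices from two base hash values
// using double hashing, index_i = (h1 + i * h2) mod size.
// If h2 is a multiple of size, every index is the same, so real
// implementations usually constrain h2 (for example, odd h2 with a
// power-of-two size).
func doubleHashIndices(h1: Int, h2: Int, count: Int, size: Int) -> [Int] {
  return (0..<count).map { i in
    // Wrapping arithmetic avoids overflow traps; abs(... % size) keeps the
    // result in 0..<size, as computeHashes does.
    abs((h1 &+ i &* h2) % size)
  }
}

let idx = doubleHashIndices(h1: 17, h2: 5, count: 4, size: 16)  // [1, 6, 11, 0]
```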
Insertion just flips the required bits to `true`:

```swift
public func insert(toInsert: T) {
  let hashValues: [Int] = self.computeHashes(toInsert)

  for hashValue in hashValues {
    self.arr[hashValue] = true
  }
}
```

And querying checks that the bits at all of the hashed indices are `true`:

```swift
public func query(value: T) -> Bool {
  let hashValues = self.computeHashes(value)

  let results = hashValues.map() { hashValue in
    self.arr[hashValue]
  }

  let exists = results.reduce(true, combine: { $0 && $1 })

  return exists
}
```

If you're coming from another imperative language, you might notice the unusual syntax in the assignment to the `exists` constant. Swift supports functional idioms where they make code more concise and readable; here, `reduce` is a more concise way than a `for` loop to check that all of the required bits are `true`.
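For comparison, here is the same all-bits-set check written three ways in Swift 5 syntax: with `reduce` (the `combine:` label was dropped in Swift 3), with an explicit `for` loop, and with `allSatisfy` (available since Swift 4.2). Note that `reduce` always visits every element, while the loop and `allSatisfy` stop at the first `false`.

```swift
let results = [true, true, false, true]

// 1. reduce, as in the README (Swift 3+ spelling without `combine:`).
let viaReduce = results.reduce(true) { $0 && $1 }

// 2. An explicit loop that returns as soon as it finds an unset bit.
func allTrue(_ bits: [Bool]) -> Bool {
  for bit in bits where !bit {
    return false
  }
  return true
}
let viaLoop = allTrue(results)

// 3. allSatisfy, which also stops at the first false element.
let viaAllSatisfy = results.allSatisfy { $0 }
```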
*Written for Swift Algorithm Club by Jamil Dhanani*

README.markdown

Lines changed: 1 addition & 1 deletion
```diff
@@ -154,7 +154,7 @@ Most of the time using just the built-in `Array`, `Dictionary`, and `Set` types
 
 ### Sets
 
-- Bloom Filter
+- [Bloom Filter](Bloom Filter/). A constant-memory data structure that probabilistically tests whether an element is in a set.
 - [Hash Set](Hash Set/). A set implemented using a hash table.
 - Multiset
 - Ordered Set
```
