|
3 | 3 |
|
4 | 4 | # Set
|
5 | 5 |
|
6 |
| -## Module Set |
7 |
| -To make a set of strings: |
| 6 | +`Set.Make` is a functor, which means that it is a module that is parameterized |
| 7 | +by another module. More concretely, this means you cannot directly create |
| 8 | +a set; instead, you must first specify what type of elements your set will |
| 9 | +contain. |
| 10 | + |
| 11 | +The `Set` module provides the functor `Make`, which accepts a module as a |
| 12 | +parameter and returns a new module representing a set whose elements have |
| 13 | +the type that you passed in. For example, if you want to work with sets of |
| 14 | +strings, you can invoke `Set.Make(String)`, which returns a new module |
| 15 | +that you can bind to the name `SS` (short for "String Set"). |
| 16 | + |
| 17 | +Doing this in OCaml's top level will yield a lot of output: |
8 | 18 |
|
9 | 19 | ```ocamltop
|
10 | 20 | module SS = Set.Make(String);;
|
11 | 21 | ```
|
12 |
| -To create a set you need to start somewhere so here is the empty set: |
| 22 | + |
| 23 | +What happened here is that after assigning your newly created module to the name |
| 24 | +`SS`, OCaml's top level then displayed the module, which in this case contains |
| 25 | +a large number of convenience functions for working with sets (for example, `is_empty` |
| 26 | +for checking if your set is empty, `add` to add an element to your set, `remove` to |
| 27 | +remove an element from your set, and so on). |
| 28 | + |
| 29 | +Note also that this module defines two types: `type elt = String.t` representing |
| 30 | +the type of the elements, and `type t = Set.Make(String).t` representing the type of |
| 31 | +the set itself. It's important to note this, because these types are used in the |
| 32 | +signatures of many of the functions defined in this module. |
| 33 | + |
| 34 | +For example, the `add` function has the signature `elt -> t -> t`, which means |
| 35 | +that it expects an element (a string) and a set of strings, and returns |
| 36 | +a set of strings. As you gain more experience in OCaml and other functional languages, |
| 37 | +you will find that type signatures are often the most convenient form of documentation |
| 38 | +on how to use those functions. |
| 39 | + |
| 40 | +## Creating a Set |
| 41 | + |
| 42 | +You've created your module representing a set of strings, but now you actually want |
| 43 | +to create an instance of a set of strings. So how do we go about doing this? Well, you |
| 44 | +could search through the documentation for the `Set` module to try and |
| 45 | +find what function or value you should use to do this, but this is an excellent |
| 46 | +opportunity to practice reading the type signatures and inferring the answer from them. |
| 47 | + |
| 48 | +You want to create a new set (as opposed to modifying an existing set). So you should |
| 49 | +look for functions whose return result has type `t` (the type representing the set), |
| 50 | +and which *do not* require a parameter of type `t`. |
| 51 | + |
| 52 | +Skimming through the list of functions in the module, only a handful of them |
| 53 | +match those criteria: `empty : t`, `singleton : elt -> t`, `of_list : elt list -> t` |
| 54 | +and `of_seq : elt Seq.t -> t`. |
| 55 | + |
| 56 | +Perhaps you already know how to work with lists and sequences in OCaml or |
| 57 | +perhaps you don't. For now, let's assume you don't know, and so we'll focus |
| 58 | +our attention on the first two functions in that list: `empty` and `singleton`. |
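
If you are already comfortable with lists, `of_list` can be tried right away. The following is only a minimal sketch (the name `from_list` is made up for illustration), and the rest of this section does not depend on it:

```ocaml
module SS = Set.Make (String)

(* Build a set from a list. "hello" appears twice in the list,
   but a set stores each distinct element only once. *)
let from_list = SS.of_list [ "hello"; "world"; "hello" ]

let () =
  (* Print the number of distinct elements in the set. *)
  Printf.printf "%d\n" (SS.cardinal from_list)
```

Because sets ignore duplicates, the resulting set holds two elements even though the list has three.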
| 59 | + |
| 60 | +The type signature for `empty` says that it is simply a value of type `t`, i.e. an instance |
| 61 | +of our set, without requiring any parameters at all. By intuition, you might |
| 62 | +guess that the only reasonable set that a library could provide when |
| 63 | +given zero parameters is the empty set. And the fact that the value is named |
| 64 | +`empty` reinforces this theory. |
| 65 | + |
| 66 | +Is there a way to test this theory? Perhaps if we had a function which |
| 67 | +could tell us the size of a set, then we could check if the set we get |
| 68 | +from `empty` has a size of zero. In other words, we want a function which |
| 69 | +receives a set as a parameter, and returns an integer as a result. Again, |
| 70 | +skimming through the list of functions in the module, we see there is a |
| 71 | +function which matches this signature: `cardinal : t -> int`. If you're |
| 72 | +not familiar with the word "cardinal", you can look it up on Wikipedia |
| 73 | +and notice that it basically refers to the size of sets, so this reinforces |
| 74 | +the idea that this is exactly the function we want. |
| 75 | + |
| 76 | +So let's test our hypothesis: |
13 | 77 |
|
14 | 78 | ```ocamltop
|
15 | 79 | let s = SS.empty;;
|
| 80 | +SS.cardinal s;; |
16 | 81 | ```
|
17 |
| -Alternatively if we know an element to start with we can create a set |
18 |
| -like |
| 82 | + |
| 83 | +Excellent, it looks like `SS.empty` is indeed an empty set, |
| 84 | +and `SS.cardinal` does indeed return the size of a set. |
| 85 | + |
| 86 | +What about that other function we saw, `singleton : elt -> t`? Again, |
| 87 | +using our intuition, if we provide the function with a single element, |
| 88 | +and the function returns a set, then probably the function will return |
| 89 | +a set containing that element (or else what else would it do with the |
| 90 | +parameter we gave it?). The name of the function is `singleton`, and |
| 91 | +again if you're unfamiliar with that word, you can look it up on |
| 92 | +Wikipedia and see that the word means "a set with exactly one element". |
| 93 | +It sounds like we're on the right track again. Let's test our theory. |
19 | 94 |
|
20 | 95 | ```ocamltop
|
21 | 96 | let s = SS.singleton "hello";;
|
| 97 | +SS.cardinal s;; |
22 | 98 | ```
|
23 |
| -To add some elements to the set we can do. |
24 | 99 |
|
25 |
| -```ocamltop |
26 |
| -let s = |
27 |
| - List.fold_right SS.add ["hello"; "world"; "community"; "manager"; |
28 |
| - "stuff"; "blue"; "green"] s;; |
29 |
| -``` |
30 |
| -Now if we are playing around with sets we will probably want to see what |
31 |
| -is in the set that we have created. To do this we can write a function |
32 |
| -that will print the set out. |
| 100 | +It looks like we were right again! |
| 101 | + |
| 102 | +## Working with Sets |
| 103 | + |
| 104 | +Now let's say we want to build bigger and more complex sets. Specifically, |
| 105 | +let's say we want to add another element to our existing set. So we're |
| 106 | +looking for a function with two parameters: One of the parameters should |
| 107 | +be the element we wish to add, and the other parameter should be the set |
| 108 | +that we're adding to. For the return value, we would expect it to either |
| 109 | +return unit (if the function modifies the set in place), or return a |
| 110 | +new set representing the result of adding the new element. So we're |
| 111 | +looking for signatures that look something like `elt -> t -> unit` or |
| 112 | +`t -> elt -> unit` (since we don't know what order the two parameters |
| 113 | +should appear in), or `elt -> t -> t` or `t -> elt -> t`. |
| 114 | + |
| 115 | +Skimming through the list, we see 2 functions with matching signatures: |
| 116 | +`add : elt -> t -> t` and `remove : elt -> t -> t`. Based on their names, |
| 117 | +`add` is probably the function we're looking for. `remove` probably removes |
| 118 | +an element from a set, and using our intuition again, it does seem like |
| 119 | +the type signature makes sense: To remove an element from a set, you need |
| 120 | +to tell it what set you want to perform the removal on and what element |
| 121 | +you want to remove; and the return result will be the resulting set after |
| 122 | +the removal. |
| 123 | + |
| 124 | +Furthermore, because we see that these functions return `t` and not `unit`, |
| 125 | +we can infer that these functions do not modify the set in place, but |
| 126 | +instead return a new set. Again, we can test this theory: |
33 | 127 |
|
34 | 128 | ```ocamltop
|
35 |
| -(* Prints a new line "\n" after each string is printed *) |
36 |
| -let print_set s = |
37 |
| - SS.iter print_endline s;; |
| 129 | +let firstSet = SS.singleton "hello";; |
| 130 | +let secondSet = SS.add "world" firstSet;; |
| 131 | +SS.cardinal firstSet;; |
| 132 | +SS.cardinal secondSet;; |
38 | 133 | ```
|
39 |
| -If we want to remove a specific element of a set there is a remove |
40 |
| -function. However if we want to remove several elements at once we could |
41 |
| -think of it as doing a 'filter'. Let's filter out all words that are |
42 |
| -longer than 5 characters. |
43 | 134 |
|
44 |
| -This can be written as: |
| 135 | +It looks like our theories were correct! |
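
The same kind of check can be sketched for `remove`. This is a small illustration rather than part of the walkthrough above; the names `both` and `only_hello` are made up for the example:

```ocaml
module SS = Set.Make (String)

(* A set with two elements. *)
let both = SS.add "world" (SS.singleton "hello")

(* remove returns a new set; `both` itself is left untouched. *)
let only_hello = SS.remove "world" both

let () =
  (* Print the size of the original set and of the set after removal. *)
  Printf.printf "%d %d\n" (SS.cardinal both) (SS.cardinal only_hello)
```

As with `add`, the original set keeps its size, which is what we would expect from a function that returns `t` rather than `unit`.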
| 136 | + |
| 137 | +## Sets With Custom Comparators |
| 138 | + |
| 139 | +The `SS` module we created uses the built-in comparison function provided |
| 140 | +by the `String` module, which performs a case-sensitive comparison. We |
| 141 | +can test that with the following code: |
45 | 142 |
|
46 | 143 | ```ocamltop
|
47 |
| -let my_filter str = |
48 |
| - String.length str <= 5;; |
49 |
| -let s2 = SS.filter my_filter s;; |
| 144 | +let firstSet = SS.singleton "hello";; |
| 145 | +let secondSet = SS.add "HELLO" firstSet;; |
| 146 | +SS.cardinal firstSet;; |
| 147 | +SS.cardinal secondSet;; |
50 | 148 | ```
|
51 |
| -or using an anonymous function: |
| 149 | + |
| 150 | +As we can see, `secondSet` has a cardinality of 2, indicating that |
| 151 | +`"hello"` and `"HELLO"` are considered two distinct elements. |
| 152 | + |
| 153 | +Let's say we want to create a set which performs a case-insensitive |
| 154 | +comparison instead. To do this, we simply have to change the parameter |
| 155 | +that we pass to the `Set.Make` functor. |
| 156 | + |
| 157 | +The `Set.Make` functor expects a module with two items: a type `t` |
| 158 | +that represents the type of the element, and a function `compare` |
| 159 | +of type `t -> t -> int` that defines a total order: it returns 0 if two |
| 160 | +values are equal, a negative integer if the first is smaller, and a positive |
| 161 | +integer if the first is larger. The `String` module matches that structure, which is why we could |
| 162 | +directly pass `String` as a parameter to `Set.Make`. Incidentally, many |
| 163 | +other modules also have that structure, including `Int` and `Float`, |
| 164 | +and so they too can be directly passed into `Set.Make` to construct a |
| 165 | +set of integers, or a set of floating point numbers. |
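
As a minimal sketch of that last point, here is a set of integers built by passing `Int` directly. The module name `IS` and the value `numbers` are made up for this example, and it assumes an OCaml version whose standard library includes the `Int` module:

```ocaml
(* Int provides both `type t = int` and `compare : t -> t -> int`,
   so it can be passed to Set.Make as-is. *)
module IS = Set.Make (Int)

(* 3 is added twice, but the set keeps a single copy. *)
let numbers = IS.add 3 (IS.add 1 (IS.add 3 IS.empty))

let () =
  (* Print the number of distinct integers in the set. *)
  Printf.printf "%d\n" (IS.cardinal numbers);
  (* The sign of compare's result encodes the ordering. *)
  assert (Int.compare 1 2 < 0);
  assert (Int.compare 2 2 = 0);
  assert (Int.compare 2 1 > 0)
```

The set uses `compare` both to keep its elements ordered and to decide whether two elements are the same, which is why the second `3` is not stored again.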
| 166 | + |
| 167 | +For our use case, we still want our elements to be of type string, but |
| 168 | +we want to change the comparison function to ignore the case of the |
| 169 | +strings. We can accomplish this by directly passing a literal `struct ... end` |
| 170 | +to the `Set.Make` functor: |
52 | 171 |
|
53 | 172 | ```ocamltop
|
54 |
| -let s2 = SS.filter (fun str -> String.length str <= 5) s;; |
| 173 | +module CISS = Set.Make(struct |
| 174 | + type t = string |
| 175 | + let compare a b = String.compare (String.lowercase_ascii a) (String.lowercase_ascii b) |
| 176 | +end);; |
55 | 177 | ```
|
56 |
| -If we want to check and see if an element is in the set it might look |
57 |
| -like this. |
| 178 | + |
| 179 | +We name the resulting module CISS (short for "Case Insensitive String Set"). |
| 180 | +We can now test whether this module has the desired behavior: |
| 181 | + |
58 | 182 |
|
59 | 183 | ```ocamltop
|
60 |
| -SS.mem "hello" s2;; |
| 184 | +let firstSet = CISS.singleton "hello";; |
| 185 | +let secondSet = CISS.add "HELLO" firstSet;; |
| 186 | +CISS.cardinal firstSet;; |
| 187 | +CISS.cardinal secondSet;; |
61 | 188 | ```
|
62 | 189 |
|
63 |
| -The Set module also provides the set theoretic operations union, |
64 |
| -intersection and difference. For example, the difference of the original |
65 |
| -set and the set with short strings (≤ 5 characters) is the set of long |
66 |
| -strings: |
| 190 | +Success! `secondSet` has a cardinality of 1, showing that `"hello"` |
| 191 | +and `"HELLO"` are now considered to be the same element in this set. |
| 192 | +We now have a set of strings whose compare function performs a |
| 193 | +case-insensitive comparison. |
| 194 | + |
| 195 | +Note that this technique can also be used to allow arbitrary types |
| 196 | +to be used as the element type for a set, as long as you can define a |
| 197 | +meaningful compare operation: |
67 | 198 |
|
68 | 199 | ```ocamltop
|
69 |
| -print_set (SS.diff s s2);; |
70 |
| -``` |
71 |
| -Note that the Set module provides a purely functional data structure: |
72 |
| -removing an element from a set does not alter that set but, rather, |
73 |
| -returns a new set that is very similar to (and shares much of its |
74 |
| -internals with) the original set. |
| 200 | +type color = Red | Green | Blue;; |
75 | 201 |
|
| 202 | +module SC = Set.Make(struct |
| 203 | + type t = color |
| 204 | + let compare a b = |
| 205 | + match (a, b) with |
| 206 | + | (Red, Red) -> 0 |
| 207 | + | (Red, Green) -> 1 |
| 208 | + | (Red, Blue) -> 1 |
| 209 | + | (Green, Red) -> -1 |
| 210 | + | (Green, Green) -> 0 |
| 211 | + | (Green, Blue) -> 1 |
| 212 | + | (Blue, Red) -> -1 |
| 213 | + | (Blue, Green) -> -1 |
| 214 | + | (Blue, Blue) -> 0 |
| 215 | +end);; |
| 216 | +``` |
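
Writing out every pair by hand gets tedious as the number of constructors grows. One possible shortcut, shown here only as a sketch, is to reuse the polymorphic `Stdlib.compare`. Note the assumption: for constructors without arguments it orders values by declaration order, so the resulting order differs from the hand-written one above. That is fine for a set, because any consistent total order will do. The value `palette` is made up for the example:

```ocaml
type color = Red | Green | Blue

module SC = Set.Make (struct
  type t = color

  (* Assumption: the polymorphic comparison is an acceptable order here.
     It yields Red < Green < Blue, following the declaration order. *)
  let compare = Stdlib.compare
end)

(* Blue is listed twice, but is stored only once. *)
let palette = SC.of_list [ Blue; Red; Blue ]

let () =
  (* Print the number of distinct colors in the set. *)
  Printf.printf "%d\n" (SC.cardinal palette)
```

A hand-written `compare` remains the better choice when the order needs to be something other than declaration order.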
76 | 217 |
|