@@ -6,140 +6,253 @@ description: >
6
6
category : " Data Structures"
7
7
---
8
8
9
- # Sets
10
-
11
- ## Module Set
12
- To make a set of strings:
13
-
14
- ``` ocaml
15
- # module SS = Set.Make(String);;
16
- module SS :
17
- sig
18
- type elt = string
19
- type t = Set.Make(String).t
20
- val empty : t
21
- val is_empty : t -> bool
22
- val mem : elt -> t -> bool
23
- val add : elt -> t -> t
24
- val singleton : elt -> t
25
- val remove : elt -> t -> t
26
- val union : t -> t -> t
27
- val inter : t -> t -> t
28
- val disjoint : t -> t -> bool
29
- val diff : t -> t -> t
30
- val compare : t -> t -> int
31
- val equal : t -> t -> bool
32
- val subset : t -> t -> bool
33
- val iter : (elt -> unit) -> t -> unit
34
- val map : (elt -> elt) -> t -> t
35
- val fold : (elt -> 'a -> 'a) -> t -> 'a -> 'a
36
- val for_all : (elt -> bool) -> t -> bool
37
- val exists : (elt -> bool) -> t -> bool
38
- val filter : (elt -> bool) -> t -> t
39
- val filter_map : (elt -> elt option) -> t -> t
40
- val partition : (elt -> bool) -> t -> t * t
41
- val cardinal : t -> int
42
- val elements : t -> elt list
43
- val min_elt : t -> elt
44
- val min_elt_opt : t -> elt option
45
- val max_elt : t -> elt
46
- val max_elt_opt : t -> elt option
47
- val choose : t -> elt
48
- val choose_opt : t -> elt option
49
- val split : elt -> t -> t * bool * t
50
- val find : elt -> t -> elt
51
- val find_opt : elt -> t -> elt option
52
- val find_first : (elt -> bool) -> t -> elt
53
- val find_first_opt : (elt -> bool) -> t -> elt option
54
- val find_last : (elt -> bool) -> t -> elt
55
- val find_last_opt : (elt -> bool) -> t -> elt option
56
- val of_list : elt list -> t
57
- val to_seq_from : elt -> t -> elt Seq.t
58
- val to_seq : t -> elt Seq.t
59
- val to_rev_seq : t -> elt Seq.t
60
- val add_seq : elt Seq.t -> t -> t
61
- val of_seq : elt Seq.t -> t
62
- end
63
- ```
9
+ # Set
10
+
11
+ ` Set ` is a functor, which means that it is a module that is parameterized
12
+ by another module. More concretely, this means you cannot directly create
13
+ a set; instead, you must first specify what type of elements your set will
14
+ contain.
64
15
65
- To create a set you need to start somewhere so here is the empty set:
16
+ The ` Set ` functor provides a function ` Make ` which accepts a module as a
17
+ parameter, and returns a new module representing a set whose elements have
18
+ the type that you passed in. For example, if you want to work with sets of
19
+ strings, you can invoke ` Set.Make(String) ` which will return you a new module
20
+ which you can assign the name ` SS ` (short for "String Set"). Note: Be sure to
21
+ pay attention to the case; you need to type ` Set.Make(String) ` and not
22
+ ` Set.Make(string) ` . The reason behind this is explained in the
23
+ "Technical Details" section at the bottom.
66
24
67
- ``` ocaml
68
- # let s = SS.empty;;
69
- val s : SS.t = <abstr>
25
+ Doing this in the OCaml's top level will yield a lot of output:
26
+
27
+ ``` ocamltop
28
+ module SS = Set.Make(String);;
70
29
```
71
30
72
- Alternatively if we know an element to start with we can create a set
73
- like
31
+ What happened here is that after assigning your newly created module to the name
32
+ ` SS ` , OCaml's top level then displayed the module, which in this case contains
33
+ a large number of convenience functions for working with sets (for example ` is_empty `
34
+ for checking if you set is empty, ` add ` to add an element to your set, ` remove ` to
35
+ remove an element from your set, and so on).
36
+
37
+ Note also that this module defines two types: ` type elt = String.t ` representing
38
+ the type of the elements, and ` type t = Set.Make(String).t ` representing the type of
39
+ the set itself. It's important to note this, because these types are used in the
40
+ signatures of many of the functions defined in this module.
41
+
42
+ For example, the ` add ` function has the signature ` elt -> t -> t ` , which means
43
+ that it expects an element (a String), and a set of strings, and will return to you
44
+ a set of strings. As you gain more experience in OCaml and other function languages,
45
+ the type signature of functions are often the most convenient form of documentation
46
+ on how to use those functions.
47
+
48
+ ## Creating a Set
49
+
50
+ You've created your module representing a set of strings, but now you actually want
51
+ to create an instance of a set of strings. So how do we go about doing this? Well, you
52
+ could search through the documentation for the original ` Set ` functor to try and
53
+ find what function or value you should use to do this, but this is an excellent
54
+ opportunity to practice reading the type signatures and inferring the answer from them.
55
+
56
+ You want to create a new set (as opposed to modifying an existing set). So you should
57
+ look for functions whose return result has type ` t ` (the type representing the set),
58
+ and which * does not* require a parameter of type ` t ` .
74
59
75
- ``` ocaml
76
- # let s = SS.singleton "hello";;
77
- val s : SS.t = <abstr>
60
+ Skimming through the list of functions in the module, there's only a handful of functions
61
+ that match that criteria: ` empty: t ` , ` singleton : elt -> t ` , ` of_list : elt list -> t `
62
+ and ` of_seq : elt Seq.t -> t ` .
63
+
64
+ Perhaps you already know how to work with lists and sequences in OCaml or
65
+ perhaps you don't. For now, let's assume you don't know, and so we'll focus
66
+ our attention on the first two functions in that list: ` empty ` and ` singleton ` .
67
+
68
+ The type signature for ` empty ` says that it simply returns ` t ` , i.e. an instance
69
+ of our set, without requiring any parameters at all. By intuition, you might
70
+ guess that the only reasonable set that a library function could return when
71
+ given zero parameters is the empty set. And the fact that the function is named
72
+ ` empty ` reinforces this theory.
73
+
74
+ Is there a way to test this theory? Perhaps if we had a function which
75
+ could print out the size of a set, then we could check if the set we get
76
+ from ` empty ` has a size of zero. In other words, we want a function which
77
+ receives a set as a parameter, and returns an integer as a result. Again,
78
+ skimming through the list of functions in the module, we see there is a
79
+ function which matches this signature: ` cardinal : t -> int ` . If you're
80
+ not familiar with the word "cardinal", you can look it up on Wikipedia
81
+ and notice that it basically refers to the size of sets, so this reinforces
82
+ the idea that this is exactly the function we want.
83
+
84
+ So let's test our hypothesis:
85
+
86
+ ``` ocamltop
87
+ let s = SS.empty;;
88
+ SS.cardinal s;;
78
89
```
79
90
80
- To add some elements to the set we can do.
91
+ Excellent, it looks like ` SS.empty ` does indeed create an empty set,
92
+ and ` SS.cardinal ` does indeed print out the size of a set.
93
+
94
+ What about that other function we saw, ` singleton : elt -> t ` ? Again,
95
+ using our intuition, if we provide the function with a single element,
96
+ and the function returns a set, then probably the function will return
97
+ a set containing that element (or else what else would it do with the
98
+ parameter we gave it?). The name of the function is ` singleton ` , and
99
+ again if you're unfamiliar with what word, you can look it up on
100
+ Wikipedia and see that the word means "a set with exactly one element".
101
+ It sounds like we're on the right track again. Let's test our theory.
81
102
82
- ``` ocaml
83
- # let s =
84
- List.fold_right SS.add ["hello"; "world"; "community"; "manager";
85
- "stuff"; "blue"; "green"] s;;
86
- val s : SS.t = <abstr>
103
+ ``` ocamltop
104
+ let s = SS.singleton "hello";;
105
+ SS.cardinal s;;
87
106
```
88
107
89
- Now if we are playing around with sets we will probably want to see what
90
- is in the set that we have created. To do this we can write a function
91
- that will print the set out.
108
+ It looks like we were right again!
109
+
110
+ ## Working with Sets
111
+
112
+ Now let's say we want to build bigger and more complex sets. Specifically,
113
+ let's say we want to add another element to our existing set. So we're
114
+ looking for a function with two parameters: One of the parameters should
115
+ be the element we wish to add, and the other parameter should be the set
116
+ that we're adding to. For the return value, we would expect it to either
117
+ return unit (if the function modifies the set in place), or it returns a
118
+ new set representing the result of adding the new element. So we're
119
+ looking for signatures that look something like ` elt -> t -> unit ` or
120
+ ` t -> elt -> unit ` (since we don't know what order the two parameters
121
+ should appear in), or ` elt -> t -> t ` or ` t -> elt -> t ` .
92
122
93
- ``` ocaml
94
- # let print_set s =
95
- SS.iter print_endline s;;
96
- val print_set : SS.t -> unit = <fun>
123
+ Skimming through the list, we see 2 functions with matching signatures:
124
+ ` add : elt -> t -> t ` and ` remove : elt -> t -> t ` . Based on their names,
125
+ ` add ` is probably the function we're looking for. ` remove ` probably removes
126
+ an element from a set, and using our intuition again, it does seem like
127
+ the type signature makes sense: To remove an element from a set, you need
128
+ to tell it what set you want to perform the removal on and what element
129
+ you want to remove; and the return result will be the resulting set after
130
+ the removal.
131
+
132
+ Furthermore, because we see that these functions return ` t ` and not ` unit ` ,
133
+ we can infer that these functions do not modify the set in place, but
134
+ instead return a new set. Again, we can test this theory:
135
+
136
+ ``` ocamltop
137
+ let firstSet = SS.singleton "hello";;
138
+ let secondSet = SS.add "world" firstSet;;
139
+ SS.cardinal firstSet;;
140
+ SS.cardinal secondSet;;
97
141
```
98
142
99
- If we want to remove a specific element of a set there is a remove
100
- function. However if we want to remove several elements at once we could
101
- think of it as doing a 'filter'. Let's filter out all words that are
102
- longer than 5 characters.
143
+ It looks like our theories were correct!
103
144
104
- This can be written as:
145
+ ## Sets of With Custom Comparators
105
146
106
- ``` ocaml
107
- # let my_filter str =
108
- String.length str <= 5;;
109
- val my_filter : string -> bool = <fun>
110
- # let s2 = SS.filter my_filter s;;
111
- val s2 : SS.t = <abstr>
147
+ The ` SS ` module we created uses the built-in comparison function provided
148
+ by the ` String ` module, which performs a case-sensitive comparison. We
149
+ can test that with the following code:
150
+
151
+ ``` ocamltop
152
+ let firstSet = SS.singleton "hello";;
153
+ let secondSet = SS.add "HELLO" firstSet;;
154
+ SS.cardinal firstSet;;
155
+ SS.cardinal secondSet;;
112
156
```
113
157
114
- or using an anonymous function:
158
+ As we can see, the ` secondSet ` has a cardinality of 2, indicating that
159
+ ` "hello" ` and ` "HELLO" ` are considered two distinct elements.
160
+
161
+ Let's say we want to create a set which performs a case-insensitive
162
+ comparison instead. To do this, we simply have to change the parameter
163
+ that we pass to the ` Set.Make ` function.
115
164
116
- ``` ocaml
117
- # let s2 = SS.filter (fun str -> String.length str <= 5) s;;
118
- val s2 : SS.t = <abstr>
165
+ The ` Set.Make ` function expects a struct with two fields: a type ` t `
166
+ that represents the type of the element, and a function ` compare `
167
+ whose signature is ` t -> t -> int ` and essentially returns 0 if two
168
+ values are equal, and non-zero if they are non-equal. It just so happens
169
+ that the ` String ` module matches that structure, which is why we could
170
+ directly pass ` String ` as a parameter to ` Set.Make ` . Incidentally, many
171
+ other modules also have that structure, including ` Int ` and ` Float ` ,
172
+ and so they too can be directly passed into ` Set.Make ` to construct a
173
+ set of integers, or a set of floating point numbers.
174
+
175
+ For our use case, we still want our elements to be of type string, but
176
+ we want to change the comparison function to ignore the case of the
177
+ strings. We can accomplish this by directly passing in a literal struct
178
+ to the ` Set.Make ` function:
179
+
180
+ ``` ocamltop
181
+ module CISS = Set.Make(struct
182
+ type t = string
183
+ let compare a b = compare (String.lowercase_ascii a) (String.lowercase_ascii b)
184
+ end);;
119
185
```
120
186
121
- If we want to check and see if an element is in the set it might look
122
- like this.
187
+ We name the resulting module CISS (short for "Case Insensitive String Set").
188
+ We can now test whether this module has the desired behavior:
123
189
124
- ``` ocaml
125
- # SS.mem "hello" s2;;
126
- - : bool = true
190
+
191
+ ``` ocamltop
192
+ let firstSet = CISS.singleton "hello";;
193
+ let secondSet = CISS.add "HELLO" firstSet;;
194
+ CISS.cardinal firstSet;;
195
+ CISS.cardinal secondSet;;
127
196
```
128
197
129
- The Set module also provides the set theoretic operations union,
130
- intersection and difference. For example, the difference of the original
131
- set and the set with short strings (≤ 5 characters) is the set of long
132
- strings:
198
+ Success! ` secondSet ` has a cardinality of 1, showing that ` "hello" `
199
+ and ` "HELLO" ` are now considered to be the same element in this set.
200
+ We now have a set of strings whose compare function performs a case
201
+ insensitive comparison.
202
+
203
+ Note that this technique can also be used to allow arbitrary types
204
+ to be used as the element type for set, as long as you can define a
205
+ meaningful compare operation:
133
206
134
- ``` ocaml
135
- # print_set (SS.diff s s2);;
136
- community
137
- manager
138
- - : unit = ()
207
+ ``` ocamltop
208
+ type color = Red | Green | Blue;;
209
+
210
+ module SC = Set.Make(struct
211
+ type t = color
212
+ let compare a b =
213
+ match (a, b) with
214
+ | (Red, Red) -> 0
215
+ | (Red, Green) -> 1
216
+ | (Red, Blue) -> 1
217
+ | (Green, Red) -> -1
218
+ | (Green, Green) -> 0
219
+ | (Green, Blue) -> 1
220
+ | (Blue, Red) -> -1
221
+ | (Blue, Green) -> -1
222
+ | (Blue, Blue) -> 0
223
+ end);;
139
224
```
140
225
141
- Note that the Set module provides a purely functional data structure:
142
- removing an element from a set does not alter that set but, rather,
143
- returns a new set that is very similar to (and shares much of its
144
- internals with) the original set.
226
+ ## Technical Details
227
+
228
+ ### Set.Make, types and modules
229
+
230
+ As mentioned in a previous section, the ` Set.Make ` function accepts a structure
231
+ with two specific fields, ` t ` and ` compare ` . Modules have structure, and thus
232
+ it's possible (but not guaranteed) for a module to have the structure that
233
+ ` Set.Make ` expects. On the other hand, types do not have structure, and so you
234
+ can never pass a type to the ` Set.Make ` function. In OCaml, modules start with
235
+ an upper case letter and types start with a lower case letter. This is why
236
+ when creating a set of strings, you have to use ` Set.Make(String) ` (passing in
237
+ the module named ` String ` ), and not ` Set.Make(string) ` (which would be attempting
238
+ to pass in the type named ` string ` , which will not work).
239
+
240
+ ### Purely Functional Data Structures
241
+
242
+ The data structure implemented by the Set functor is a purely functional one.
243
+ What exactly that means is a big topic in itself (feel free to search for
244
+ "Purely Functional Data Structure" in Google or Wikipedia to learn more). As a
245
+ short oversimplification, this means that all instances of the data structure
246
+ that you create are immutable. The functions like ` add ` and ` remove ` do not
247
+ actually modify the set you pass in, but instead return a new set representing
248
+ the results of having performed the corresponding operation.
249
+
250
+ ### Full API documentation
251
+
252
+ This tutorial focused on teaching how to quickly find a function that does what
253
+ you want by looking at the type signature. This is often the quickest and most
254
+ convenient way to discover useful functions. However, sometimes you do want to
255
+ see the formal documentation for the API provided by a module. For sets, the
256
+ API documentation you probably want to look at is at
257
+ https://ocaml.org/api/Set.Make.html
145
258
0 commit comments