Commit a2322fc

Rewrote set tutorial from scratch.
This is intended to address the issue mentioned in ocaml/ocaml.org#824. This version of the tutorial now demonstrates how to use Set with arbitrary types. It also demonstrates how to reason about the behavior of functions based on their type signatures.
1 parent 1ce0c62 commit a2322fc

1 file changed

+181
-40
lines changed

site/learn/tutorials/set.md

+181-40
@@ -3,74 +3,215 @@
33

44
# Set
55

6-
## Module Set
7-
To make a set of strings:
6+
`Set.Make` is a functor, which means that it is a module that is parameterized
7+
by another module. More concretely, this means you cannot directly create
8+
a set; instead, you must first specify what type of elements your set will
9+
contain.
10+
11+
The `Set` module provides the functor `Make`, which accepts a module as a
12+
parameter, and returns a new module representing a set whose elements have
13+
the type that you passed in. For example, if you want to work with sets of
14+
strings, you can invoke `Set.Make(String)` which will return you a new module
15+
which you can assign the name `SS` (short for "String Set").
16+
17+
Doing this in OCaml's top level will yield a lot of output:
818

919
```ocamltop
1020
module SS = Set.Make(String);;
1121
```
12-
To create a set you need to start somewhere so here is the empty set:
22+
23+
What happened here is that after assigning your newly created module to the name
24+
`SS`, OCaml's top level then displayed the module, which in this case contains
25+
a large number of convenience functions for working with sets (for example `is_empty`
26+
for checking if your set is empty, `add` to add an element to your set, `remove` to
27+
remove an element from your set, and so on).
28+
29+
Note also that this module defines two types: `type elt = String.t` representing
30+
the type of the elements, and `type t = Set.Make(String).t` representing the type of
31+
the set itself. It's important to note this, because these types are used in the
32+
signatures of many of the functions defined in this module.
33+
34+
For example, the `add` function has the signature `elt -> t -> t`, which means
35+
that it expects an element (a String), and a set of strings, and will return to you
36+
a set of strings. As you gain more experience in OCaml and other functional languages,
37+
you will find that type signatures are often the most convenient form of documentation
38+
on how to use those functions.
39+
40+
## Creating a Set
41+
42+
You've created your module representing a set of strings, but now you actually want
43+
to create an instance of a set of strings. So how do we go about doing this? Well, you
44+
could search through the documentation for the original `Set` module to try and
45+
find what function or value you should use to do this, but this is an excellent
46+
opportunity to practice reading the type signatures and inferring the answer from them.
47+
48+
You want to create a new set (as opposed to modifying an existing set). So you should
49+
look for functions whose return result has type `t` (the type representing the set),
50+
and which *do not* require a parameter of type `t`.
51+
52+
Skimming through the list of functions in the module, there are only a handful of functions
53+
that match those criteria: `empty : t`, `singleton : elt -> t`, `of_list : elt list -> t`
54+
and `of_seq : elt Seq.t -> t`.
55+
56+
Perhaps you already know how to work with lists and sequences in OCaml or
57+
perhaps you don't. For now, let's assume you don't know, and so we'll focus
58+
our attention on the first two functions in that list: `empty` and `singleton`.
59+
60+
The type signature for `empty` says that it is simply a value of type `t`, i.e. an instance
61+
of our set, without requiring any parameters at all. By intuition, you might
62+
guess that the only reasonable set that a library function could return when
63+
given zero parameters is the empty set. And the fact that it is named
64+
`empty` reinforces this theory.
65+
66+
Is there a way to test this theory? Perhaps if we had a function which
67+
could print out the size of a set, then we could check if the set we get
68+
from `empty` has a size of zero. In other words, we want a function which
69+
receives a set as a parameter, and returns an integer as a result. Again,
70+
skimming through the list of functions in the module, we see there is a
71+
function which matches this signature: `cardinal : t -> int`. If you're
72+
not familiar with the word "cardinal", you can look it up on Wikipedia
73+
and notice that it basically refers to the size of sets, so this reinforces
74+
the idea that this is exactly the function we want.
75+
76+
So let's test our hypothesis:
1377

1478
```ocamltop
1579
let s = SS.empty;;
80+
SS.cardinal s;;
1681
```
17-
Alternatively if we know an element to start with we can create a set
18-
like
82+
83+
Excellent, it looks like `SS.empty` does indeed create an empty set,
84+
and `SS.cardinal` does indeed return the size of a set.
85+
86+
What about that other function we saw, `singleton : elt -> t`? Again,
87+
using our intuition, if we provide the function with a single element,
88+
and the function returns a set, then probably the function will return
89+
a set containing that element (or else what else would it do with the
90+
parameter we gave it?). The name of the function is `singleton`, and
91+
again if you're unfamiliar with that word, you can look it up on
92+
Wikipedia and see that the word means "a set with exactly one element".
93+
It sounds like we're on the right track again. Let's test our theory.
1994

2095
```ocamltop
2196
let s = SS.singleton "hello";;
97+
SS.cardinal s;;
2298
```
23-
To add some elements to the set we can do.
2499

25-
```ocamltop
26-
let s =
27-
List.fold_right SS.add ["hello"; "world"; "community"; "manager";
28-
"stuff"; "blue"; "green"] s;;
29-
```
30-
Now if we are playing around with sets we will probably want to see what
31-
is in the set that we have created. To do this we can write a function
32-
that will print the set out.
100+
It looks like we were right again!
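
The same two hypotheses can be checked from a different angle. The sketch below assumes the `is_empty : t -> bool` and `mem : elt -> t -> bool` functions from the module listing; the names `empty_is_empty`, `one`, `has_hello` and `has_world` are only illustrative.

```ocaml
module SS = Set.Make(String);;

(* The empty set should report that it has no elements. *)
let empty_is_empty = SS.is_empty SS.empty;;

(* A singleton set should contain the element it was built from,
   and nothing else. *)
let one = SS.singleton "hello";;
let has_hello = SS.mem "hello" one;;
let has_world = SS.mem "world" one;;
```

Here `empty_is_empty` and `has_hello` evaluate to `true`, while `has_world` evaluates to `false`.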
101+
102+
## Working with Sets
103+
104+
Now let's say we want to build bigger and more complex sets. Specifically,
105+
let's say we want to add another element to our existing set. So we're
106+
looking for a function with two parameters: One of the parameters should
107+
be the element we wish to add, and the other parameter should be the set
108+
that we're adding to. For the return value, we would expect it to either
109+
return unit (if the function modifies the set in place), or return a
110+
new set representing the result of adding the new element. So we're
111+
looking for signatures that look something like `elt -> t -> unit` or
112+
`t -> elt -> unit` (since we don't know what order the two parameters
113+
should appear in), or `elt -> t -> t` or `t -> elt -> t`.
114+
115+
Skimming through the list, we see 2 functions with matching signatures:
116+
`add : elt -> t -> t` and `remove : elt -> t -> t`. Based on their names,
117+
`add` is probably the function we're looking for. `remove` probably removes
118+
an element from a set, and using our intuition again, it does seem like
119+
the type signature makes sense: To remove an element from a set, you need
120+
to tell it what set you want to perform the removal on and what element
121+
you want to remove; and the return result will be the resulting set after
122+
the removal.
123+
124+
Furthermore, because we see that these functions return `t` and not `unit`,
125+
we can infer that these functions do not modify the set in place, but
126+
instead return a new set. Again, we can test this theory:
33127

34128
```ocamltop
35-
(* Prints a new line "\n" after each string is printed *)
36-
let print_set s =
37-
SS.iter print_endline s;;
129+
let firstSet = SS.singleton "hello";;
130+
let secondSet = SS.add "world" firstSet;;
131+
SS.cardinal firstSet;;
132+
SS.cardinal secondSet;;
38133
```
39-
If we want to remove a specific element of a set there is a remove
40-
function. However if we want to remove several elements at once we could
41-
think of it as doing a 'filter'. Let's filter out all words that are
42-
longer than 5 characters.
43134

44-
This can be written as:
135+
It looks like our theories were correct!
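
As a further check, here is a small sketch that applies the same reasoning to `remove` and inspects the contents with `elements : t -> elt list`, which returns the elements in increasing order of the comparison function. The names `first_set`, `second_set` and `third_set` are only illustrative.

```ocaml
module SS = Set.Make(String);;

let first_set = SS.singleton "hello";;
let second_set = SS.add "world" first_set;;

(* remove also returns a new set; second_set is left untouched. *)
let third_set = SS.remove "hello" second_set;;

(* elements lists the contents of each set as a sorted list. *)
let contents_first = SS.elements first_set;;    (* ["hello"] *)
let contents_second = SS.elements second_set;;  (* ["hello"; "world"] *)
let contents_third = SS.elements third_set;;    (* ["world"] *)
```

Each operation produces a new set, and the earlier sets keep their original contents.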
136+
137+
## Sets With Custom Comparators
138+
139+
The `SS` module we created uses the built-in comparison function provided
140+
by the `String` module, which performs a case-sensitive comparison. We
141+
can test that with the following code:
45142

46143
```ocamltop
47-
let my_filter str =
48-
String.length str <= 5;;
49-
let s2 = SS.filter my_filter s;;
144+
let firstSet = SS.singleton "hello";;
145+
let secondSet = SS.add "HELLO" firstSet;;
146+
SS.cardinal firstSet;;
147+
SS.cardinal secondSet;;
50148
```
51-
or using an anonymous function:
149+
150+
As we can see, `secondSet` has a cardinality of 2, indicating that
151+
`"hello"` and `"HELLO"` are considered two distinct elements.
152+
153+
Let's say we want to create a set which performs a case-insensitive
154+
comparison instead. To do this, we simply have to change the parameter
155+
that we pass to the `Set.Make` function.
156+
157+
The `Set.Make` function expects a struct with two fields: a type `t`
158+
that represents the type of the element, and a function `compare`
159+
whose signature is `t -> t -> int` and essentially returns 0 if two
160+
values are equal, a negative integer if the first is smaller, and a positive integer if the first is larger. It just so happens
161+
that the `String` module matches that structure, which is why we could
162+
directly pass `String` as a parameter to `Set.Make`. Incidentally, many
163+
other modules also have that structure, including `Int` and `Float`,
164+
and so they too can be directly passed into `Set.Make` to construct a
165+
set of integers, or a set of floating point numbers.
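
As a minimal sketch of that last point, the following builds a set of integers by passing `Int` to `Set.Make`. It assumes an OCaml version that ships the `Int` module (4.08 or later); the names `IS`, `s`, `size` and `sorted` are only illustrative.

```ocaml
(* Int provides both a type t and a compare function,
   so it can be passed to Set.Make directly. *)
module IS = Set.Make(Int);;

(* The sign of compare's result gives the ordering of its arguments. *)
let one_before_two = Int.compare 1 2 < 0;;
let same = Int.compare 2 2 = 0;;

(* Duplicates collapse, and elements come back in increasing order. *)
let s = IS.of_list [3; 1; 2; 3];;
let size = IS.cardinal s;;    (* 3 *)
let sorted = IS.elements s;;  (* [1; 2; 3] *)
```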
166+
167+
For our use case, we still want our elements to be of type string, but
168+
we want to change the comparison function to ignore the case of the
169+
strings. We can accomplish this by directly passing in a literal struct
170+
to the `Set.Make` function:
52171

53172
```ocamltop
54-
let s2 = SS.filter (fun str -> String.length str <= 5) s;;
173+
module CISS = Set.Make(struct
174+
type t = string
175+
let compare a b = String.compare (String.lowercase_ascii a) (String.lowercase_ascii b)
176+
end);;
55177
```
56-
If we want to check and see if an element is in the set it might look
57-
like this.
178+
179+
We name the resulting module `CISS` (short for "Case Insensitive String Set").
180+
We can now test whether this module has the desired behavior:
181+
58182

59183
```ocamltop
60-
SS.mem "hello" s2;;
184+
let firstSet = CISS.singleton "hello";;
185+
let secondSet = CISS.add "HELLO" firstSet;;
186+
CISS.cardinal firstSet;;
187+
CISS.cardinal secondSet;;
61188
```
62189

63-
The Set module also provides the set theoretic operations union,
64-
intersection and difference. For example, the difference of the original
65-
set and the set with short strings (≤ 5 characters) is the set of long
66-
strings:
190+
Success! `secondSet` has a cardinality of 1, showing that `"hello"`
191+
and `"HELLO"` are now considered to be the same element in this set.
192+
We now have a set of strings whose compare function performs a
193+
case-insensitive comparison.
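
One consequence worth seeing in a short sketch: membership tests ignore case too, and when an "equal" element is added, the set is returned unchanged, so the element that was inserted first is the one that is kept. The module is redefined here so the block stands on its own; the names `s`, `found` and `kept` are only illustrative.

```ocaml
module CISS = Set.Make(struct
  type t = string
  (* Compare the lowercased forms so that case is ignored. *)
  let compare a b =
    String.compare (String.lowercase_ascii a) (String.lowercase_ascii b)
end);;

let s = CISS.add "HELLO" (CISS.singleton "hello");;

(* Any capitalization of the word is found in the set. *)
let found = CISS.mem "HeLLo" s;;  (* true *)

(* "HELLO" was considered already present, so "hello" is kept. *)
let kept = CISS.elements s;;      (* ["hello"] *)
```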
194+
195+
Note that this technique can also be used to allow arbitrary types
196+
to be used as the element type for a set, as long as you can define a
197+
meaningful compare operation:
67198

68199
```ocamltop
69-
print_set (SS.diff s s2);;
70-
```
71-
Note that the Set module provides a purely functional data structure:
72-
removing an element from a set does not alter that set but, rather,
73-
returns a new set that is very similar to (and shares much of its
74-
internals with) the original set.
200+
type color = Red | Green | Blue;;
75201
202+
module SC = Set.Make(struct
203+
type t = color
204+
let compare a b =
205+
match (a, b) with
206+
| (Red, Red) -> 0
207+
| (Red, Green) -> 1
208+
| (Red, Blue) -> 1
209+
| (Green, Red) -> -1
210+
| (Green, Green) -> 0
211+
| (Green, Blue) -> 1
212+
| (Blue, Red) -> -1
213+
| (Blue, Green) -> -1
214+
| (Blue, Blue) -> 0
215+
end);;
216+
```
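
Enumerating every pair grows quickly as constructors are added. One possible alternative, shown here only as a sketch, maps each color to an integer and compares the integers. The helper `rank` is hypothetical (it is not part of the tutorial or the standard library), and its values are chosen to reproduce the same ordering as the match above (Blue < Green < Red). It assumes the `Int` module is available (OCaml 4.08 or later).

```ocaml
type color = Red | Green | Blue;;

(* Hypothetical helper: a larger rank means a larger color. *)
let rank = function
  | Red -> 2
  | Green -> 1
  | Blue -> 0;;

module SC = Set.Make(struct
  type t = color
  (* Delegate to integer comparison on the ranks. *)
  let compare a b = Int.compare (rank a) (rank b)
end);;

let s = SC.of_list [Red; Blue; Red; Green];;
let size = SC.cardinal s;;       (* 3 *)
let in_order = SC.elements s;;   (* [Blue; Green; Red] *)
```

The duplicate `Red` collapses into one element, and `elements` returns the colors in increasing order of the comparison function.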
76217
