Rewrote set tutorial from scratch. #1596

base: master

@@ -3,74 +3,252 @@
# Set

## Module Set

In OCaml, sets are provided by the `Set` module through a functor, `Set.Make`.
A functor is a module that is parameterized by another module. More concretely,
this means you cannot directly create a set; instead, you must first specify
what type of elements your set will contain.

`Set.Make` accepts a module as a parameter and returns a new module
representing sets whose elements have the type described by that parameter.
For example, if you want to work with sets of strings, you can invoke
`Set.Make(String)`, which returns a new module that you can name `SS`
(short for "String Set"). Note: be sure to pay attention to the case; you
need to type `Set.Make(String)` and not `Set.Make(string)`. `String` is a
module, whereas `string` is a type, and a functor can only be applied to a
module. The "Technical Details" section at the bottom explains this further.

> Review comment: Delaying a simple explanation "

Doing this in the OCaml toplevel will yield a lot of output:

> Review comment: If you're not displaying all the output, you may want to add ellipses here or some such.

> Review comment: Not sure I understand this comment. The original MD file just contains the one line you'd type into OCaml's top level:
>
> But the output on https://ocaml.org/learn/tutorials/set.html contains not just the line you'd enter in, but also the output produced by OCaml's top level (I'm assuming whatever static site generator you're using is responsible for doing this). So my MD is doing something similar: I'm putting just the line that the user would enter in, and I'm expecting the static site generator to produce a huge amount of output. The "Doing this in the OCaml's top level will yield a lot of output:" line I put there was intended to prepare new users to not panic in response to the huge block of text they're about to see (my reaction to seeing that initial huge block was panic).

```ocamltop
module SS = Set.Make(String);;
```
What happened here is that after binding your newly created module to the name
`SS`, the OCaml toplevel displayed the module's signature, which lists a large
number of convenience functions for working with sets (for example `is_empty`
for checking whether your set is empty, `add` to add an element to your set,
`remove` to remove an element from your set, and so on).

Note also that this module defines two types: `type elt = String.t`, representing
the type of the elements, and `type t = Set.Make(String).t`, representing the type
of the set itself. These types are worth noticing because they appear in the
signatures of many of the functions defined in this module.

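To make the two types concrete, here is a small sketch showing that `SS.elt` is just `string`, while a `SS.t` value has to be built with the module's own functions. The value names `greeting`, `greetings` and `first_length` are illustrative and do not come from the tutorial.

```ocaml
module SS = Set.Make (String)

(* SS.elt is String.t, i.e. string, so a string literal has type SS.elt. *)
let greeting : SS.elt = "hello"

(* SS.t is abstract: values of this type come from functions such as
   SS.singleton rather than from a literal syntax. *)
let greetings : SS.t = SS.singleton greeting

(* Elements taken back out of the set are ordinary strings again. *)
let first_length = String.length (SS.min_elt greetings)
```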
For example, the `add` function has the signature `elt -> t -> t`, which means
that it expects an element (a string) and a set of strings, and returns
a set of strings. Type signatures like this one are often the most convenient
form of documentation on how to use a function.

> Review comment: function languages ⇒ functional languages
>
> I am not that fond of meta comments that discourse about the expected proficiency of the reader.

> Review comment: I skipped that part before, but no

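Because `add` takes the element first and the set second, it can also be partially applied. The following is a minimal sketch; `add_hello` is a hypothetical name chosen for illustration.

```ocaml
module SS = Set.Make (String)

(* SS.add : SS.elt -> SS.t -> SS.t. Supplying only the element gives a
   function of type SS.t -> SS.t that adds "hello" to any set. *)
let add_hello : SS.t -> SS.t = SS.add "hello"

let s1 = add_hello SS.empty (* contains "hello" *)
let s2 = SS.add "world" s1 (* contains "hello" and "world" *)
```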
## Creating a Set

> Review comment: The name of the section doesn't match the content. This section is describing how to analyze an unknown module, not how to create a set. I would propose to move this section later, and simply start by creating a set in the tutorial.

You've created your module representing a set of strings, but now you actually want
to create an instance of a set of strings. So how do we go about doing this? You
could search through the documentation for the `Set.Make` functor to find which
function or value to use, but this is an excellent opportunity to practice
reading type signatures and inferring the answer from them.

You want to create a new set (as opposed to modifying an existing set), so you
should look for values or functions whose result has type `t` (the type
representing the set) and which *do not* require a parameter of type `t`.

Skimming through the list of functions in the module, only a handful meet that
criterion: `empty : t`, `singleton : elt -> t`, `of_list : elt list -> t`,
and `of_seq : elt Seq.t -> t`.

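For readers who already know lists and sequences, here is a sketch that uses all four of these values and functions side by side. It assumes OCaml 4.07 or later, where `of_seq` and `List.to_seq` are available. The `from_*` names are illustrative.

```ocaml
module SS = Set.Make (String)

let from_empty = SS.empty (* empty : t *)
let from_singleton = SS.singleton "hello" (* singleton : elt -> t *)

(* of_list : elt list -> t. Duplicates collapse into a single element. *)
let from_list = SS.of_list [ "hello"; "world"; "hello" ]

(* of_seq : elt Seq.t -> t *)
let from_seq = SS.of_seq (List.to_seq [ "a"; "b"; "c" ])
```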
Perhaps you already know how to work with lists and sequences in OCaml, or
perhaps you don't. For now, let's assume you don't, and focus on the first
two entries in that list: `empty` and `singleton`.

The type signature for `empty` says that it is simply a `t`, i.e. a set, and it
requires no parameters at all. Strictly speaking, `empty` is a value rather than
a function. Intuitively, the only reasonable set that a library could provide
without any input is the empty set, and the name `empty` reinforces this theory.

Is there a way to test this theory? If we had a function that returns the size
of a set, we could check whether the set we get from `empty` has a size of zero.
In other words, we want a function which receives a set as a parameter and
returns an integer as a result. Again, skimming through the list of functions in
the module, we see a function which matches this signature:
`cardinal : t -> int`. The cardinality of a set is the number of elements it
contains, so this is exactly the function we want.

> Review comment: I don't think that sending the reader to a Wikipedia page is a good idea in general: either trust the reader to know the term or explain it. This is particularly true for the mathematical section of Wikipedia which tends to be quite technical. Typically, the cardinal page of wikipedia cites the axiom of choice in its introduction.

So let's test our hypothesis:
```ocamltop
let s = SS.empty;;
SS.cardinal s;;
```
Excellent, it looks like `SS.empty` does indeed create an empty set,
and `SS.cardinal` does indeed return the size of a set.

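The module also offers a more direct test, `is_empty : t -> bool`, which was visible in the module's output earlier. The sketch below compares both checks. One design note: in the standard library's implementation, `cardinal` counts the elements one by one, whereas `is_empty` only looks at the top of the tree. That makes `is_empty` the cheaper choice when you only need a yes/no answer.

```ocaml
module SS = Set.Make (String)

let s = SS.empty

(* Two ways to ask whether a set is empty. *)
let empty_by_size = SS.cardinal s = 0
let empty_directly = SS.is_empty s

let non_empty = SS.is_empty (SS.singleton "hello")
```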
What about the other function we saw, `singleton : elt -> t`? Again,
using our intuition: if we provide the function with a single element
and the function returns a set, then the result is probably a set
containing that element (what else would it do with the parameter we
gave it?). The name of the function supports this: in mathematics, a
singleton is a set with exactly one element. It sounds like we're on
the right track again. Let's test our theory:
```ocamltop
let s = SS.singleton "hello";;
SS.cardinal s;;
```
It looks like we were right again!

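Counting elements only tells part of the story. While experimenting, you will probably also want to see what a set contains. In this minimal sketch, `print_set` is a hypothetical helper name. `SS.iter` calls a function on every element in increasing order, and `SS.elements` returns the elements as a sorted list.

```ocaml
module SS = Set.Make (String)

(* Prints each element followed by a newline, in increasing order. *)
let print_set s = SS.iter print_endline s

let s = SS.add "world" (SS.singleton "hello")

(* The same elements as a list sorted with String.compare. *)
let contents = SS.elements s

let () = print_set s
```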
## Working with Sets

Now let's say we want to build bigger and more complex sets. Specifically,
let's say we want to add another element to our existing set. So we're
looking for a function with two parameters: one should be the element we
wish to add, and the other should be the set that we're adding to. For the
return value, we would expect it either to return `unit` (if the function
modifies the set in place) or to return a new set representing the result
of adding the new element. So we're looking for signatures that look
something like `elt -> t -> unit` or `t -> elt -> unit` (since we don't
know what order the two parameters should appear in), or `elt -> t -> t`
or `t -> elt -> t`.

Skimming through the list, we see two functions with matching signatures:
`add : elt -> t -> t` and `remove : elt -> t -> t`. Based on their names,
`add` is probably the function we're looking for. `remove` probably removes
an element from a set, and using our intuition again, its type signature
makes sense: to remove an element from a set, you need to say which set
you want to perform the removal on and which element you want to remove,
and the return value will be the set that results from the removal.

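Here is a quick sketch that checks the guess about `remove`; the value names are illustrative. Removing an element that is not in the set is not an error, and the result simply has the same elements.

```ocaml
module SS = Set.Make (String)

let s = SS.of_list [ "hello"; "world" ]

(* remove : elt -> t -> t, with the same argument order as add. *)
let without_hello = SS.remove "hello" s

(* Removing an absent element leaves the contents unchanged. *)
let unchanged = SS.remove "absent" s
```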
Furthermore, because we see that these functions return `t` and not `unit`,
we can infer that these functions do not modify the set in place, but
instead return a new set. Again, we can test this theory:
```ocamltop
let firstSet = SS.singleton "hello";;
let secondSet = SS.add "world" firstSet;;
SS.cardinal firstSet;;
SS.cardinal secondSet;;
```
It looks like our theories were correct!

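Adding elements one at a time gets tedious. The sketch below shows a few bulk operations; the word list and value names are only examples. It builds a set from a list by folding `SS.add` over it with `List.fold_right`. It then keeps the words of at most five characters with `SS.filter : (elt -> bool) -> t -> t`, and recovers the longer words with the set difference `SS.diff`.

```ocaml
module SS = Set.Make (String)

let words =
  [ "hello"; "world"; "community"; "manager"; "stuff"; "blue"; "green" ]

(* Add every word to an initially empty set by folding SS.add over the list. *)
let s = List.fold_right SS.add words SS.empty

(* Keep only the words of at most 5 characters. *)
let short = SS.filter (fun w -> String.length w <= 5) s

(* Set difference: the words that are not short, i.e. longer than 5 characters. *)
let long = SS.diff s short
```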
## Sets with Custom Comparators

> Review comment: Sets with custom comparators?

The `SS` module we created uses the comparison function provided by the
`String` module, `String.compare`, which is case-sensitive. We can test
that with the following code:
```ocamltop
let firstSet = SS.singleton "hello";;
let secondSet = SS.add "HELLO" firstSet;;
SS.cardinal firstSet;;
SS.cardinal secondSet;;
```
As we can see, `secondSet` has a cardinality of 2, indicating that
`"hello"` and `"HELLO"` are considered two distinct elements.

Let's say we want to create a set which performs a case-insensitive
comparison instead. To do this, we simply have to change the module
that we pass to the `Set.Make` functor.

The `Set.Make` functor expects a module with two components, described by
the signature `Set.OrderedType`. The first is a type `t` that represents the
type of the elements. The second is a function `compare : t -> t -> int` that
defines a total ordering on them. `compare x y` must return 0 if `x` and `y`
are equal, a negative integer if `x` is smaller than `y`, and a positive
integer if `x` is greater than `y`. The set uses this ordering to organize
its elements, not just to test equality. It just so happens that the
`String` module matches that structure, which is why we could directly pass
`String` as a parameter to `Set.Make`. Incidentally, many other modules also
have that structure, including `Int` and `Float`, and so they too can be
passed directly to `Set.Make` to construct a set of integers or a set of
floating-point numbers.

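One way to check that a module fits what `Set.Make` expects is to ascribe it the signature `Set.OrderedType` explicitly. The compiler rejects the ascription if `t` or `compare` is missing. This sketch assumes OCaml 4.08 or later (for the `Int` module); the `_ord` module names and `IS`/`FS` are illustrative.

```ocaml
(* Each ascription type-checks only if the module provides a type t and
   a function compare : t -> t -> int. *)
module String_ord : Set.OrderedType with type t = string = String
module Int_ord : Set.OrderedType with type t = int = Int
module Float_ord : Set.OrderedType with type t = float = Float

(* Int and Float can be passed to Set.Make directly, just like String. *)
module IS = Set.Make (Int)
module FS = Set.Make (Float)

(* compare defines an ordering, so elements come back sorted. *)
let ints = IS.elements (IS.of_list [ 3; 1; 2; 1 ])
let floats = FS.elements (FS.of_list [ 2.5; 0.5 ])
```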
For our use case, we still want our elements to be of type `string`, but
we want the comparison function to ignore the case of the strings. We can
accomplish this by passing an anonymous structure, written inline as
`struct ... end`, directly to `Set.Make`:
```ocamltop
module CISS = Set.Make(struct
  type t = string
  (* Compare lowercased copies, so that "hello" and "HELLO" are equal. *)
  let compare a b = String.compare (String.lowercase_ascii a) (String.lowercase_ascii b)
end);;
```
We name the resulting module `CISS` (short for "Case Insensitive String Set").
We can now test whether this module has the desired behavior:
```ocamltop
let firstSet = CISS.singleton "hello";;
let secondSet = CISS.add "HELLO" firstSet;;
CISS.cardinal firstSet;;
CISS.cardinal secondSet;;
```
Success! `secondSet` has a cardinality of 1, showing that `"hello"`
and `"HELLO"` are now considered to be the same element in this set.
We now have a set of strings whose compare function performs a
case-insensitive comparison.

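Here are a couple more checks on `CISS`, as a sketch. The module is redefined so the block is self-contained, and the snake_case names `first_set` and `second_set` stand in for the tutorial's. Membership tests use the same case-insensitive comparison. Since OCaml 4.03, `add` is documented to return the set unchanged when an equal element is already present. As a result, the set keeps the spelling that was inserted first.

```ocaml
module CISS = Set.Make (struct
  type t = string

  (* Compare lowercased copies, so "hello" and "HELLO" are equal. *)
  let compare a b =
    String.compare (String.lowercase_ascii a) (String.lowercase_ascii b)
end)

let first_set = CISS.singleton "hello"
let second_set = CISS.add "HELLO" first_set

(* Any capitalization of "hello" is found. *)
let found = CISS.mem "HeLLo" second_set

(* The stored element is still the original spelling. *)
let stored = CISS.elements second_set
```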
Note that this technique can also be used to allow arbitrary types
to be used as the element type of a set, as long as you can define a
meaningful compare operation:
```ocamltop
type color = Red | Green | Blue;;

module SC = Set.Make(struct
  type t = color
  (* A total order on colors: Blue < Green < Red. *)
  let compare a b =
    match (a, b) with
    | (Red, Red) -> 0
    | (Red, Green) -> 1
    | (Red, Blue) -> 1
    | (Green, Red) -> -1
    | (Green, Green) -> 0
    | (Green, Blue) -> 1
    | (Blue, Red) -> -1
    | (Blue, Green) -> -1
    | (Blue, Blue) -> 0
end);;
```
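Writing out all nine cases gets unwieldy as the number of constructors grows. An alternative is to map each color to an integer and compare those. The sketch below assumes OCaml 4.08 or later for `Int.compare`, and the helper name `rank` is hypothetical. The ranks reproduce the ordering of the match above, Blue < Green < Red.

```ocaml
type color = Red | Green | Blue

(* Hypothetical helper: a numeric rank per constructor. *)
let rank = function Blue -> 0 | Green -> 1 | Red -> 2

module SC = Set.Make (struct
  type t = color

  let compare a b = Int.compare (rank a) (rank b)
end)

(* Duplicates are dropped and elements come back in rank order. *)
let ordered = SC.elements (SC.of_list [ Red; Blue; Green; Red ])
```

For a simple variant like this, `Stdlib.compare` would also be a valid total order. It sorts constant constructors by declaration order (Red < Green < Blue) rather than by the order chosen above.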
## Technical Details

### Set.Make, types and modules

As mentioned in a previous section, the `Set.Make` functor accepts a module
with two specific components, a type `t` and a function `compare`. A module
may or may not provide these components, so some modules can be passed to
`Set.Make` and others cannot. A type, on the other hand, is not a module at
all, so you can never pass a type to `Set.Make`. In OCaml, module names start
with an uppercase letter and type names start with a lowercase letter. This
is why, when creating a set of strings, you have to use `Set.Make(String)`
(passing in the module named `String`) and not `Set.Make(string)`, which
would attempt to pass in the type named `string` and is rejected by the
compiler.

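The difference can be seen side by side in the sketch below, where the module names `SS` and `SS2` are illustrative. Passing the module `String` works directly. To start from the type `string` instead, you must wrap it in a module that also supplies `compare`. Writing `Set.Make(string)` itself does not compile, so it only appears in a comment.

```ocaml
(* String is a module with a type t and a compare function. *)
module SS = Set.Make (String)

(* string is only a type: wrap it in a structure that adds compare.
   Set.Make (string) would not compile. *)
module SS2 = Set.Make (struct
  type t = string

  let compare = String.compare
end)

let a = SS.elements (SS.of_list [ "b"; "a" ])
let b = SS2.elements (SS2.of_list [ "b"; "a" ])
```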
### Purely Functional Data Structures

The data structure implemented by the `Set.Make` functor is a purely
functional one. What exactly that means is a big topic in itself (searching
for "purely functional data structure" is a good starting point). As a short
oversimplification, it means that every set you create is immutable.
Functions like `add` and `remove` do not modify the set you pass in; instead,
they return a new set representing the result of the corresponding operation.
That new set shares much of its internal structure with the original, so
producing it does not require copying the whole set.

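This small sketch shows what immutability looks like in practice; the value names are illustrative. After `add` and `remove`, the original set still has its two elements. Adding an element that is already present hands back the very same set, as documented for `add` since OCaml 4.03.

```ocaml
module SS = Set.Make (String)

let original = SS.of_list [ "hello"; "world" ]
let bigger = SS.add "again" original
let smaller = SS.remove "hello" original

(* original is untouched by the two operations above. *)
let original_size = SS.cardinal original

(* Physical equality: no new set is allocated for a redundant add. *)
let same = SS.add "hello" original == original
```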
### Full API documentation

This tutorial focused on teaching how to quickly find a function that does what
you want by looking at its type signature. This is often the quickest and most
convenient way to discover useful functions. However, sometimes you do want to
see the formal documentation for the API provided by a module. For sets, the
API documentation you probably want to look at is at
https://ocaml.org/api/Set.Make.html

> Review comment: This tutorial needs some introduction: it is reachable from the Data structures section of the tutorials. Thus when landing on the page the reader is probably expecting to learn about `Set` as a data structure. Starting with "`Set` is a functor" is thus doubly confusing: "What is a functor? Wasn't I promised a tutorial about a data structure?"

> Review comment: Maybe the right thing then is to split this into two tutorials, one about sets-as-functors and another about how to use sets for their own sake?

> Review comment: The introduction should start by explaining what is a set from OCaml point of view: a collection of ordered elements. Then it would make sense to the reader that the `Set.Make` functor is building a set module from a module of ordered elements.