Rewrote set tutorial from scratch. #1596

Open
wants to merge 1 commit into
base: master
256 changes: 217 additions & 39 deletions site/learn/tutorials/set.md
@@ -3,74 +3,252 @@

# Set

## Module Set
To make a set of strings:
`Set` is a functor, which means that it is a module that is parameterized
Member @Octachron, Jul 18, 2021

This tutorial needs some introduction: it is reachable from the Data structures section of the tutorials. Thus when landing on the page the reader is probably expecting to learn about Set as a data structure. Starting with "Set is a functor" is thus doubly confusing: "What is a functor? Wasn't I promised a tutorial about a data structure?"

Member

Maybe the right thing then is to split this into two tutorials, one about sets-as-functors and another about how to use sets for their own sake?

Member

The introduction should start by explaining what a set is from OCaml's point of view: a collection of ordered elements. Then it would make sense to the reader that the Set.Make functor builds a set module from a module of ordered elements.

by another module. More concretely, this means you cannot directly create
a set; instead, you must first specify what type of elements your set will
contain.

The `Set` functor provides a function `Make` which accepts a module as a
parameter, and returns a new module representing a set whose elements have
the type that you passed in. For example, if you want to work with sets of
strings, you can invoke `Set.Make(String)` which will return you a new module
which you can assign the name `SS` (short for "String Set"). Note: Be sure to
pay attention to the case; you need to type `Set.Make(String)` and not
`Set.Make(string)`. The reason behind this is explained in the
"Technical Details" section at the bottom.
Member

Delaying a simple explanation "String is a module" (?) to a technical section seems unnecessary?


Doing this in the OCaml's top level will yield a lot of output:
Member

If you're not displaying all the output, you may want to add ellipses here or some such.

Author

Not sure I understand this comment.

The original MD file just contains the one line you'd type into OCaml's top level:

   To make a set of strings:
   
   ```ocamltop
   module SS = Set.Make(String);;
   ```

But the output on https://ocaml.org/learn/tutorials/set.html contains not just the line you'd enter, but also the output produced by OCaml's top level (I'm assuming whatever static site generator you're using is responsible for doing this). So my MD is doing something similar: I'm putting just the line that the user would enter, and I'm expecting the static site generator to produce a huge amount of output.

The "Doing this in the OCaml's top level will yield a lot of output:" line I put there was intended to prepare new users to not panic in response to the huge block of text they're about to see (my reaction to seeing that initial huge block was panic).


```ocamltop
module SS = Set.Make(String);;
```
To create a set you need to start somewhere so here is the empty set:

What happened here is that after assigning your newly created module to the name
`SS`, OCaml's top level then displayed the module, which in this case contains
a large number of convenience functions for working with sets (for example `is_empty`
for checking if your set is empty, `add` to add an element to your set, `remove` to
remove an element from your set, and so on).

Note also that this module defines two types: `type elt = String.t` representing
the type of the elements, and `type t = Set.Make(String).t` representing the type of
the set itself. It's important to note this, because these types are used in the
signatures of many of the functions defined in this module.
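
To make the roles of these two types concrete, here is a minimal sketch that annotates values with `SS.elt` and `SS.t` explicitly. The names `greeting` and `greetings` are illustrative only, and `empty`, `add`, `mem` and `cardinal` are taken from the module listing printed by the top level.

```ocaml
module SS = Set.Make (String)

(* SS.elt is the element type; for this module it is the same type as string. *)
let greeting : SS.elt = "hello"

(* SS.t is the type of the set itself. *)
let greetings : SS.t = SS.add greeting SS.empty

let () =
  (* An ordinary string literal can be used wherever an SS.elt is expected. *)
  assert (SS.mem "hello" greetings);
  assert (SS.cardinal greetings = 1)
```

The annotations are optional; they are written out here only to show which type plays which role.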

For example, the `add` function has the signature `elt -> t -> t`, which means
that it expects an element (a String), and a set of strings, and will return to you
a set of strings. As you gain more experience in OCaml and other function languages,
Member @Octachron, Jul 18, 2021

function languages ⇒ functional languages

I am not that fond of meta comments that discourse about the expected proficiency of the reader.

the type signatures of functions are often the most convenient form of documentation
on how to use those functions.
Member

I skipped that part before, but no, types are not the most convenient form of documentation. They are the most basic form of documentation.


## Creating a Set
Member

The name of the section doesn't match the content. This section is describing how to analyze an unknown module, not how to create a set. I would propose to move this section later, and simply start by creating a set in the tutorial.


You've created your module representing a set of strings, but now you actually want
to create an instance of a set of strings. So how do we go about doing this? Well, you
could search through the documentation for the original `Set` functor to try and
find what function or value you should use to do this, but this is an excellent
opportunity to practice reading the type signatures and inferring the answer from them.

You want to create a new set (as opposed to modifying an existing set). So you should
look for functions whose return result has type `t` (the type representing the set),
and which *do not* require a parameter of type `t`.

Skimming through the list of functions in the module, there are only a handful
that match those criteria: `empty : t`, `singleton : elt -> t`, `of_list : elt list -> t`
and `of_seq : elt Seq.t -> t`.
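
As a rough sketch of how the four candidates behave, the snippet below builds one set with each of them. It assumes an OCaml version whose standard library provides `Seq` and `List.to_seq`; the binding names are made up for illustration.

```ocaml
module SS = Set.Make (String)

(* One set per constructor-like entry found in the signature. *)
let none = SS.empty
let one = SS.singleton "hello"
let from_list = SS.of_list [ "hello"; "world"; "hello" ]
let from_seq = SS.of_seq (List.to_seq [ "x"; "y" ])

let () =
  assert (SS.cardinal none = 0);
  assert (SS.cardinal one = 1);
  (* The duplicate "hello" is stored only once. *)
  assert (SS.cardinal from_list = 2);
  assert (SS.cardinal from_seq = 2)
```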

Perhaps you already know how to work with lists and sequences in OCaml or
perhaps you don't. For now, let's assume you don't know, and so we'll focus
our attention on the first two functions in that list: `empty` and `singleton`.

The type signature for `empty` says that it is simply a value of type `t`, i.e.
an instance of our set, rather than a function that takes parameters. By
intuition, you might guess that the only reasonable set that a library could
offer without being given any elements is the empty set. And the fact that the
value is named `empty` reinforces this theory.

Is there a way to test this theory? Perhaps if we had a function which
could print out the size of a set, then we could check if the set we get
from `empty` has a size of zero. In other words, we want a function which
receives a set as a parameter, and returns an integer as a result. Again,
skimming through the list of functions in the module, we see there is a
function which matches this signature: `cardinal : t -> int`. If you're
not familiar with the word "cardinal", you can look it up on Wikipedia
Member

I don't think that sending the reader to a Wikipedia page is a good idea in general: either trust the reader to know the term or explain it. This is particularly true for the mathematical section of Wikipedia, which tends to be quite technical. Typically, the cardinal page of Wikipedia cites the axiom of choice in its introduction.

and notice that it basically refers to the size of sets, so this reinforces
the idea that this is exactly the function we want.

So let's test our hypothesis:

```ocamltop
let s = SS.empty;;
SS.cardinal s;;
```
Alternatively if we know an element to start with we can create a set
like

Excellent, it looks like `SS.empty` does indeed create an empty set,
and `SS.cardinal` does indeed return the size of a set.

What about that other function we saw, `singleton : elt -> t`? Again,
using our intuition, if we provide the function with a single element,
and the function returns a set, then probably the function will return
a set containing that element (or else what else would it do with the
parameter we gave it?). The name of the function is `singleton`, and
again if you're unfamiliar with that word, you can look it up on
Wikipedia and see that the word means "a set with exactly one element".
It sounds like we're on the right track again. Let's test our theory.

```ocamltop
let s = SS.singleton "hello";;
SS.cardinal s;;
```
To add some elements to the set we can do.

```ocamltop
let s =
List.fold_right SS.add ["hello"; "world"; "community"; "manager";
"stuff"; "blue"; "green"] s;;
```
Now if we are playing around with sets we will probably want to see what
is in the set that we have created. To do this we can write a function
that will print the set out.
It looks like we were right again!

## Working with Sets

Now let's say we want to build bigger and more complex sets. Specifically,
let's say we want to add another element to our existing set. So we're
looking for a function with two parameters: One of the parameters should
be the element we wish to add, and the other parameter should be the set
that we're adding to. For the return value, we would expect it to either
return unit (if the function modifies the set in place), or return a
new set representing the result of adding the new element. So we're
looking for signatures that look something like `elt -> t -> unit` or
`t -> elt -> unit` (since we don't know what order the two parameters
should appear in), or `elt -> t -> t` or `t -> elt -> t`.

Skimming through the list, we see 2 functions with matching signatures:
`add : elt -> t -> t` and `remove : elt -> t -> t`. Based on their names,
`add` is probably the function we're looking for. `remove` probably removes
an element from a set, and using our intuition again, it does seem like
the type signature makes sense: To remove an element from a set, you need
to tell it what set you want to perform the removal on and what element
you want to remove; and the return result will be the resulting set after
the removal.

Furthermore, because we see that these functions return `t` and not `unit`,
we can infer that these functions do not modify the set in place, but
instead return a new set. Again, we can test this theory:

```ocamltop
(* Prints a new line "\n" after each string is printed *)
let print_set s =
SS.iter print_endline s;;
let firstSet = SS.singleton "hello";;
let secondSet = SS.add "world" firstSet;;
SS.cardinal firstSet;;
SS.cardinal secondSet;;
```
If we want to remove a specific element of a set there is a remove
function. However if we want to remove several elements at once we could
think of it as doing a 'filter'. Let's filter out all words that are
longer than 5 characters.

This can be written as:
It looks like our theories were correct!
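
The same check can be sketched for `remove`, which the signature listing gives the type `elt -> t -> t`. The binding names below are illustrative; the point is only that the set passed in still contains the removed element afterwards.

```ocaml
module SS = Set.Make (String)

let both = SS.add "world" (SS.singleton "hello")

(* remove returns a new set; it does not change `both`. *)
let without_world = SS.remove "world" both

let () =
  assert (SS.cardinal both = 2);
  assert (SS.cardinal without_world = 1);
  assert (SS.mem "world" both);
  assert (not (SS.mem "world" without_world))
```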

## Sets of With Custom Comparators
Member

Sets with custom comparators?


The `SS` module we created uses the built-in comparison function provided
by the `String` module, which performs a case-sensitive comparison. We
can test that with the following code:

```ocamltop
let my_filter str =
String.length str <= 5;;
let s2 = SS.filter my_filter s;;
let firstSet = SS.singleton "hello";;
let secondSet = SS.add "HELLO" firstSet;;
SS.cardinal firstSet;;
SS.cardinal secondSet;;
```
or using an anonymous function:

As we can see, the `secondSet` has a cardinality of 2, indicating that
`"hello"` and `"HELLO"` are considered two distinct elements.

Let's say we want to create a set which performs a case-insensitive
comparison instead. To do this, we simply have to change the parameter
that we pass to the `Set.Make` function.

The `Set.Make` function expects a struct with two items: a type `t`
that represents the type of the element, and a function `compare`
whose signature is `t -> t -> int`. This function must define a total
ordering: it returns 0 if the two values are equal, a negative integer
if the first is smaller than the second, and a positive integer if the
first is larger. It just so happens that the `String` module matches that
structure, which is why we could directly pass `String` as a parameter
to `Set.Make`. Incidentally, many other modules also have that structure,
including `Int` and `Float`, and so they too can be directly passed into
`Set.Make` to construct a set of integers, or a set of floating point
numbers.
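
For instance, a set of integers can be sketched as follows, assuming a recent OCaml in which the standard library provides the `Int` module. The module name `IS` and the value `numbers` are chosen here for illustration.

```ocaml
module IS = Set.Make (Int)

let numbers = IS.of_list [ 3; 1; 2; 3 ]

let () =
  (* The duplicate 3 is stored once. *)
  assert (IS.cardinal numbers = 3);
  (* elements lists the set in increasing order according to Int.compare. *)
  assert (IS.elements numbers = [ 1; 2; 3 ])
```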

For our use case, we still want our elements to be of type string, but
we want to change the comparison function to ignore the case of the
strings. We can accomplish this by directly passing in a literal struct
to the `Set.Make` function:

```ocamltop
let s2 = SS.filter (fun str -> String.length str <= 5) s;;
module CISS = Set.Make(struct
type t = string
let compare a b = compare (String.lowercase_ascii a) (String.lowercase_ascii b)
end);;
```
If we want to check and see if an element is in the set it might look
like this.

We name the resulting module CISS (short for "Case Insensitive String Set").
We can now test whether this module has the desired behavior:


```ocamltop
SS.mem "hello" s2;;
let firstSet = CISS.singleton "hello";;
let secondSet = CISS.add "HELLO" firstSet;;
CISS.cardinal firstSet;;
CISS.cardinal secondSet;;
```

The Set module also provides the set theoretic operations union,
intersection and difference. For example, the difference of the original
set and the set with short strings (≤ 5 characters) is the set of long
strings:
Success! `secondSet` has a cardinality of 1, showing that `"hello"`
and `"HELLO"` are now considered to be the same element in this set.
We now have a set of strings whose compare function performs a
case-insensitive comparison.
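
One consequence worth seeing in a sketch: when two strings compare as equal, the set keeps the one that was already present. The snippet below redefines `CISS` so that it is self-contained, and uses `String.compare` in place of the polymorphic `compare`, which should behave the same on strings.

```ocaml
module CISS = Set.Make (struct
  type t = string

  (* Compare the lowercased forms so that case is ignored. *)
  let compare a b =
    String.compare (String.lowercase_ascii a) (String.lowercase_ascii b)
end)

let s = CISS.add "HELLO" (CISS.singleton "hello")

let () =
  assert (CISS.cardinal s = 1);
  (* Membership tests ignore case as well. *)
  assert (CISS.mem "Hello" s);
  (* The element inserted first is the one that is stored. *)
  assert (CISS.elements s = [ "hello" ])
```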

Note that this technique can also be used to allow arbitrary types
to be used as the element type for a set, as long as you can define a
meaningful compare operation:

```ocamltop
print_set (SS.diff s s2);;
type color = Red | Green | Blue;;

module SC = Set.Make(struct
type t = color
let compare a b =
match (a, b) with
| (Red, Red) -> 0
| (Red, Green) -> 1
| (Red, Blue) -> 1
| (Green, Red) -> -1
| (Green, Green) -> 0
| (Green, Blue) -> 1
| (Blue, Red) -> -1
| (Blue, Green) -> -1
| (Blue, Blue) -> 0
end);;
```
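
Writing out all nine cases by hand gets tedious as the type grows. One possible alternative, sketched here and not part of the tutorial, maps each constructor to an integer and compares the integers. The helper `rank` is hypothetical, and its values are chosen so that the ordering agrees with the table above (Red is greatest, Blue is smallest).

```ocaml
type color = Red | Green | Blue

(* Hypothetical helper: a larger rank means a greater color. *)
let rank = function
  | Red -> 2
  | Green -> 1
  | Blue -> 0

module SC = Set.Make (struct
  type t = color

  let compare a b = Int.compare (rank a) (rank b)
end)

let () =
  let colors = SC.of_list [ Red; Blue; Green; Red ] in
  assert (SC.cardinal colors = 3);
  (* elements lists the set from smallest to greatest. *)
  assert (SC.elements colors = [ Blue; Green; Red ])
```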
Note that the Set module provides a purely functional data structure:
removing an element from a set does not alter that set but, rather,
returns a new set that is very similar to (and shares much of its
internals with) the original set.

## Technical Details

### Set.Make, types and modules

As mentioned in a previous section, the `Set.Make` function accepts a structure
with two specific items, a type `t` and a function `compare`. Modules have
structure, and thus it's possible (but not guaranteed) for a module to have the
structure that `Set.Make` expects. A type, on the other hand, is not a module,
and so you can never pass a type to the `Set.Make` function. In OCaml, module
names start with an upper case letter and type names start with a lower case
letter. This is why when creating a set of strings, you have to use
`Set.Make(String)` (passing in the module named `String`), and not
`Set.Make(string)` (which would be attempting to pass in the type named
`string`, which will not work).
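
To illustrate what "having the right structure" means, the sketch below writes such a module by hand and checks it against `Set.OrderedType`, the standard library's name for the signature that `Set.Make` requires. The module name `StringOrder` is made up for this example.

```ocaml
(* A hand-written module with the two items Set.Make looks for. *)
module StringOrder : Set.OrderedType with type t = string = struct
  type t = string

  let compare = String.compare
end

(* Passing it to Set.Make should give the same behavior as Set.Make (String). *)
module SS = Set.Make (StringOrder)

let () =
  assert (SS.elements (SS.of_list [ "b"; "a" ]) = [ "a"; "b" ])
```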

### Purely Functional Data Structures

The data structure implemented by the Set functor is a purely functional one.
What exactly that means is a big topic in itself (feel free to search for
"Purely Functional Data Structure" in Google or Wikipedia to learn more). As a
short oversimplification, this means that all instances of the data structure
that you create are immutable. The functions like `add` and `remove` do not
actually modify the set you pass in, but instead return a new set representing
the results of having performed the corresponding operation.

### Full API documentation

This tutorial focused on teaching how to quickly find a function that does what
you want by looking at the type signature. This is often the quickest and most
convenient way to discover useful functions. However, sometimes you do want to
see the formal documentation for the API provided by a module. For sets, the
API documentation you probably want to look at is at
https://ocaml.org/api/Set.Make.html