Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

sets: add TreeSet type for orderable data #33

Merged
merged 1 commit into from
Apr 3, 2023
Merged

Conversation

shoenig
Copy link
Member

@shoenig shoenig commented Mar 22, 2023

This PR adds TreeSet to the implemented set types.

A TreeSet has advantages in cases where the data being stored can be represented in some kind of order. Order-specific operations such as finding the minimum value of a set, finding the TopK or BottomK elements of a set, or iterating the elements of a set in order are dramatically less expensive when stored via binary search tree. A TreeSet[T] determines order between its elements via Compare[T], where the implementation of Compare is provided by the user. The Cmp[BuiltIn] helper provides a convenience implementation of Compare for built-in types.

The TreeSet type is backed by an underlying Red-Black Binary Search Tree[0], a well established data structure found in major projects such as the Linux Kernel and Java Virtual Machine.

Benchmarks are provided below for comparing some operations across the different set types, including plain slices. The following table describes the expected asymptotic performance for some operations of each type.

slice Set HashSet TreeSet
Insert (avg) O(1) O(1) O(1) O(log(n))
Insert (worst) O(n) O(n) O(n) O(log(n))
Delete (avg) O(n) O(1) O(1) O(log(n))
Delete (worst) O(n) O(n) O(n) O(log(n))
Contains O(n) O(1) O(1) O(log(n))
Min O(n) O(n) O(n) O(log(n))
TopK O(n * log(n)) O(n * log(n)) O(n * log(n)) O(log(n))
Iterate O(n) O(n) O(n) O(n)
Iterate (in order) O(n * log(n)) O(n * log(n)) O(n * log(n)) O(n)

[0] https://en.wikipedia.org/wiki/Red–black_tree

Benchmark_Slice_Insert/10-12                   283483903                3.538 ns/op
Benchmark_Slice_Insert/1000-12                 554723365                3.249 ns/op
Benchmark_Slice_Insert/100000-12               626779956                2.414 ns/op
Benchmark_Slice_Insert/1000000-12              615125025                3.122 ns/op

Benchmark_Set_Insert/10-12                     19020508               117.2 ns/op
Benchmark_Set_Insert/1000-12                   19318110               117.0 ns/op
Benchmark_Set_Insert/100000-12                 18946682               116.8 ns/op
Benchmark_Set_Insert/1000000-12                13727908               125.4 ns/op

Benchmark_HashSet_Insert/10-12                 14467316               139.3 ns/op
Benchmark_HashSet_Insert/1000-12               15396433               140.8 ns/op
Benchmark_HashSet_Insert/100000-12             14508603               138.8 ns/op
Benchmark_HashSet_Insert/1000000-12            10585650               119.0 ns/op

Benchmark_TreeSet_Insert/10-12                 10633179               120.7 ns/op
Benchmark_TreeSet_Insert/1000-12               10658127               122.1 ns/op
Benchmark_TreeSet_Insert/100000-12             10504867               126.2 ns/op
Benchmark_TreeSet_Insert/1000000-12            10902920               125.9 ns/op

Benchmark_Slice_Minimum/10-12                  12310653                96.23 ns/op
Benchmark_Slice_Minimum/1000-12                  144500              8181 ns/op
Benchmark_Slice_Minimum/100000-12                  1434            819509 ns/op
Benchmark_Slice_Minimum/1000000-12                  134           8655484 ns/op

Benchmark_Set_Minimum/10-12                     5023794               235.9 ns/op
Benchmark_Set_Minimum/1000-12                     18967             63799 ns/op
Benchmark_Set_Minimum/100000-12                     124           9530887 ns/op
Benchmark_Set_Minimum/1000000-12                      9         112611542 ns/op

Benchmark_HashSet_Minimum/10-12                 4418487               269.0 ns/op
Benchmark_HashSet_Minimum/1000-12                 19364             61636 ns/op
Benchmark_HashSet_Minimum/100000-12                 128           9290069 ns/op
Benchmark_HashSet_Minimum/1000000-12                 10         110000279 ns/op

Benchmark_TreeSet_Minimum/10-12                591382812                2.018 ns/op
Benchmark_TreeSet_Minimum/1000-12              298672525                4.021 ns/op
Benchmark_TreeSet_Minimum/100000-12            208646332                5.738 ns/op
Benchmark_TreeSet_Minimum/1000000-12           181639167                6.598 ns/op

Benchmark_Slice_Contains/10-12                 246240872                4.866 ns/op
Benchmark_Slice_Contains/1000-12                3896030               305.8 ns/op
Benchmark_Slice_Contains/100000-12                41647             28732 ns/op
Benchmark_Slice_Contains/1000000-12                4068            288287 ns/op

Benchmark_Set_Contains/10-12                   190191267                6.304 ns/op
Benchmark_Set_Contains/1000-12                 183063370                6.526 ns/op
Benchmark_Set_Contains/100000-12               127484904                9.357 ns/op
Benchmark_Set_Contains/1000000-12              82559810                12.68 ns/op

Benchmark_HashSetContains/10-12                173321290                6.877 ns/op
Benchmark_HashSetContains/1000-12              166393195                7.205 ns/op
Benchmark_HashSetContains/100000-12            120025850                9.974 ns/op
Benchmark_HashSetContains/1000000-12           40584700                26.22 ns/op

Benchmark_TreeSetContains/10-12                167040844                7.163 ns/op
Benchmark_TreeSetContains/1000-12              51727203                22.68 ns/op
Benchmark_TreeSetContains/100000-12            42277028                27.81 ns/op
Benchmark_TreeSetContains/1000000-12           29599938                40.17 ns/op

This PR adds TreeSet to the implemented set types.

A TreeSet has advantages in cases where the data being stored can be
represented in some kind of order. Order-specific operations such as finding
the minimum value of a set, finding the TopK or BottomK elements of a set,
or iterating the elements of a set in order are dramatically less expensive
when stored via binary search tree. A TreeSet[T] determines order between
its elements via Compare[T], where the implementation of Compare is provided
by the user. The Cmp[BuiltIn] helper provides a convenience implementation
of Compare for built-in types.

The TreeSet type is backed by an underlying Red-Black Binary Search Tree[0],
a well established data structure found in major projects such as the
Linux Kernel and Java Virtual Machine.

Benchmarks are provided below for comparing some operations across the
different set types, including plain slices. The following table describes
the expected asymptotic performance for some operations of each type.

|                    | slice         | Set           | HashSet       | TreeSet   |
| :--                | :--           | :--           | :--           | :--       |
| Insert (avg)       | O(1)          | O(1)          | O(1)          | O(log(n)) |
| Insert (worst)     | O(n)          | O(n)          | O(n)          | O(log(n)) |
| Delete (avg)       | O(n)          | O(1)          | O(1)          | O(log(n)) |
| Delete (worst)     | O(n)          | O(n)          | O(n)          | O(log(n)) |
| Contains           | O(n)          | O(1)          | O(1)          | O(log(n)) |
| Min                | O(n)          | O(n)          | O(n)          | O(log(n)) |
| TopK               | O(n * log(n)) | O(n * log(n)) | O(n * log(n)) | O(log(n)) |
| Iterate            | O(n)          | O(n)          | O(n)          | O(n)      |
| Iterate (in order) | O(n * log(n)) | O(n * log(n)) | O(n * log(n)) | O(n)      |

[0] https://en.wikipedia.org/wiki/Red–black_tree

Benchmark_Slice_Insert/10-12                   283483903                3.538 ns/op
Benchmark_Slice_Insert/1000-12                 554723365                3.249 ns/op
Benchmark_Slice_Insert/100000-12               626779956                2.414 ns/op
Benchmark_Slice_Insert/1000000-12              615125025                3.122 ns/op

Benchmark_Set_Insert/10-12                     19020508               117.2 ns/op
Benchmark_Set_Insert/1000-12                   19318110               117.0 ns/op
Benchmark_Set_Insert/100000-12                 18946682               116.8 ns/op
Benchmark_Set_Insert/1000000-12                13727908               125.4 ns/op

Benchmark_HashSet_Insert/10-12                 14467316               139.3 ns/op
Benchmark_HashSet_Insert/1000-12               15396433               140.8 ns/op
Benchmark_HashSet_Insert/100000-12             14508603               138.8 ns/op
Benchmark_HashSet_Insert/1000000-12            10585650               119.0 ns/op

Benchmark_TreeSet_Insert/10-12                 10633179               120.7 ns/op
Benchmark_TreeSet_Insert/1000-12               10658127               122.1 ns/op
Benchmark_TreeSet_Insert/100000-12             10504867               126.2 ns/op
Benchmark_TreeSet_Insert/1000000-12            10902920               125.9 ns/op

Benchmark_Slice_Minimum/10-12                  12310653                96.23 ns/op
Benchmark_Slice_Minimum/1000-12                  144500              8181 ns/op
Benchmark_Slice_Minimum/100000-12                  1434            819509 ns/op
Benchmark_Slice_Minimum/1000000-12                  134           8655484 ns/op

Benchmark_Set_Minimum/10-12                     5023794               235.9 ns/op
Benchmark_Set_Minimum/1000-12                     18967             63799 ns/op
Benchmark_Set_Minimum/100000-12                     124           9530887 ns/op
Benchmark_Set_Minimum/1000000-12                      9         112611542 ns/op

Benchmark_HashSet_Minimum/10-12                 4418487               269.0 ns/op
Benchmark_HashSet_Minimum/1000-12                 19364             61636 ns/op
Benchmark_HashSet_Minimum/100000-12                 128           9290069 ns/op
Benchmark_HashSet_Minimum/1000000-12                 10         110000279 ns/op

Benchmark_TreeSet_Minimum/10-12                591382812                2.018 ns/op
Benchmark_TreeSet_Minimum/1000-12              298672525                4.021 ns/op
Benchmark_TreeSet_Minimum/100000-12            208646332                5.738 ns/op
Benchmark_TreeSet_Minimum/1000000-12           181639167                6.598 ns/op

Benchmark_Slice_Contains/10-12                 246240872                4.866 ns/op
Benchmark_Slice_Contains/1000-12                3896030               305.8 ns/op
Benchmark_Slice_Contains/100000-12                41647             28732 ns/op
Benchmark_Slice_Contains/1000000-12                4068            288287 ns/op

Benchmark_Set_Contains/10-12                   190191267                6.304 ns/op
Benchmark_Set_Contains/1000-12                 183063370                6.526 ns/op
Benchmark_Set_Contains/100000-12               127484904                9.357 ns/op
Benchmark_Set_Contains/1000000-12              82559810                12.68 ns/op

Benchmark_HashSetContains/10-12                173321290                6.877 ns/op
Benchmark_HashSetContains/1000-12              166393195                7.205 ns/op
Benchmark_HashSetContains/100000-12            120025850                9.974 ns/op
Benchmark_HashSetContains/1000000-12           40584700                26.22 ns/op

Benchmark_TreeSetContains/10-12                167040844                7.163 ns/op
Benchmark_TreeSetContains/1000-12              51727203                22.68 ns/op
Benchmark_TreeSetContains/100000-12            42277028                27.81 ns/op
Benchmark_TreeSetContains/1000000-12           29599938                40.17 ns/op
@shoenig shoenig merged commit 579403b into main Apr 3, 2023
@shoenig shoenig deleted the red-black-tree-set branch April 15, 2024 16:03
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

1 participant