-
Notifications
You must be signed in to change notification settings - Fork 8
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
sets: add TreeSet type for orderable data #33
Merged
Merged
Conversation
This file contains bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
This PR adds TreeSet to the implemented set types. A TreeSet has advantages in cases where the data being stored can be represented in some kind of order. Order-specific operations such as finding the minimum value of a set, finding the TopK or BottomK elements of a set, or iterating the elements of a set in order are dramatically less expensive when stored via binary search tree. A TreeSet[T] determines order between its elements via Compare[T], where the implementation of Compare is provided by the user. The Cmp[BuiltIn] helper provides a convenience implementation of Compare for built-in types. The TreeSet type is backed by an underlying Red-Black Binary Search Tree[0], a well established data structure found in major projects such as the Linux Kernel and Java Virtual Machine. Benchmarks are provided below for comparing some operations across the different set types, including plain slices. The following table describes the expected asymptotic performance for some operations of each type. | | slice | Set | HashSet | TreeSet | | :-- | :-- | :-- | :-- | :-- | | Insert (avg) | O(1) | O(1) | O(1) | O(log(n)) | | Insert (worst) | O(n) | O(n) | O(n) | O(log(n)) | | Delete (avg) | O(n) | O(1) | O(1) | O(log(n)) | | Delete (worst) | O(n) | O(n) | O(n) | O(log(n)) | | Contains | O(n) | O(1) | O(1) | O(log(n)) | | Min | O(n) | O(n) | O(n) | O(log(n)) | | TopK | O(n * log(n)) | O(n * log(n)) | O(n * log(n)) | O(log(n)) | | Iterate | O(n) | O(n) | O(n) | O(n) | | Iterate (in order) | O(n * log(n)) | O(n * log(n)) | O(n * log(n)) | O(n) | [0] https://en.wikipedia.org/wiki/Red–black_tree Benchmark_Slice_Insert/10-12 283483903 3.538 ns/op Benchmark_Slice_Insert/1000-12 554723365 3.249 ns/op Benchmark_Slice_Insert/100000-12 626779956 2.414 ns/op Benchmark_Slice_Insert/1000000-12 615125025 3.122 ns/op Benchmark_Set_Insert/10-12 19020508 117.2 ns/op Benchmark_Set_Insert/1000-12 19318110 117.0 ns/op Benchmark_Set_Insert/100000-12 18946682 116.8 ns/op Benchmark_Set_Insert/1000000-12 13727908 125.4 ns/op Benchmark_HashSet_Insert/10-12 14467316 139.3 ns/op Benchmark_HashSet_Insert/1000-12 15396433 140.8 ns/op Benchmark_HashSet_Insert/100000-12 14508603 138.8 ns/op Benchmark_HashSet_Insert/1000000-12 10585650 119.0 ns/op Benchmark_TreeSet_Insert/10-12 10633179 120.7 ns/op Benchmark_TreeSet_Insert/1000-12 10658127 122.1 ns/op Benchmark_TreeSet_Insert/100000-12 10504867 126.2 ns/op Benchmark_TreeSet_Insert/1000000-12 10902920 125.9 ns/op Benchmark_Slice_Minimum/10-12 12310653 96.23 ns/op Benchmark_Slice_Minimum/1000-12 144500 8181 ns/op Benchmark_Slice_Minimum/100000-12 1434 819509 ns/op Benchmark_Slice_Minimum/1000000-12 134 8655484 ns/op Benchmark_Set_Minimum/10-12 5023794 235.9 ns/op Benchmark_Set_Minimum/1000-12 18967 63799 ns/op Benchmark_Set_Minimum/100000-12 124 9530887 ns/op Benchmark_Set_Minimum/1000000-12 9 112611542 ns/op Benchmark_HashSet_Minimum/10-12 4418487 269.0 ns/op Benchmark_HashSet_Minimum/1000-12 19364 61636 ns/op Benchmark_HashSet_Minimum/100000-12 128 9290069 ns/op Benchmark_HashSet_Minimum/1000000-12 10 110000279 ns/op Benchmark_TreeSet_Minimum/10-12 591382812 2.018 ns/op Benchmark_TreeSet_Minimum/1000-12 298672525 4.021 ns/op Benchmark_TreeSet_Minimum/100000-12 208646332 5.738 ns/op Benchmark_TreeSet_Minimum/1000000-12 181639167 6.598 ns/op Benchmark_Slice_Contains/10-12 246240872 4.866 ns/op Benchmark_Slice_Contains/1000-12 3896030 305.8 ns/op Benchmark_Slice_Contains/100000-12 41647 28732 ns/op Benchmark_Slice_Contains/1000000-12 4068 288287 ns/op Benchmark_Set_Contains/10-12 190191267 6.304 ns/op Benchmark_Set_Contains/1000-12 183063370 6.526 ns/op Benchmark_Set_Contains/100000-12 127484904 9.357 ns/op Benchmark_Set_Contains/1000000-12 82559810 12.68 ns/op Benchmark_HashSetContains/10-12 173321290 6.877 ns/op Benchmark_HashSetContains/1000-12 166393195 7.205 ns/op Benchmark_HashSetContains/100000-12 120025850 9.974 ns/op Benchmark_HashSetContains/1000000-12 40584700 26.22 ns/op Benchmark_TreeSetContains/10-12 167040844 7.163 ns/op Benchmark_TreeSetContains/1000-12 51727203 22.68 ns/op Benchmark_TreeSetContains/100000-12 42277028 27.81 ns/op Benchmark_TreeSetContains/1000000-12 29599938 40.17 ns/op
shoenig
force-pushed
the
red-black-tree-set
branch
from
March 22, 2023 14:18
fb92e24
to
f099f0b
Compare
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.
This suggestion is invalid because no changes were made to the code.
Suggestions cannot be applied while the pull request is closed.
Suggestions cannot be applied while viewing a subset of changes.
Only one suggestion per line can be applied in a batch.
Add this suggestion to a batch that can be applied as a single commit.
Applying suggestions on deleted lines is not supported.
You must change the existing code in this line in order to create a valid suggestion.
Outdated suggestions cannot be applied.
This suggestion has been applied or marked resolved.
Suggestions cannot be applied from pending reviews.
Suggestions cannot be applied on multi-line comments.
Suggestions cannot be applied while the pull request is queued to merge.
Suggestion cannot be applied right now. Please check back later.
This PR adds
TreeSet
to the implemented set types.A
TreeSet
has advantages in cases where the data being stored can be represented in some kind of order. Order-specific operations such as finding the minimum value of a set, finding theTopK
orBottomK
elements of a set, or iterating the elements of a set in order are dramatically less expensive when stored via binary search tree. ATreeSet[T]
determines order between its elements viaCompare[T]
, where the implementation ofCompare
is provided by the user. TheCmp[BuiltIn]
helper provides a convenience implementation ofCompare
for built-in types.The
TreeSet
type is backed by an underlying Red-Black Binary Search Tree[0], a well established data structure found in major projects such as the Linux Kernel and Java Virtual Machine.Benchmarks are provided below for comparing some operations across the different set types, including plain slices. The following table describes the expected asymptotic performance for some operations of each type.
[0] https://en.wikipedia.org/wiki/Red–black_tree