-
Notifications
You must be signed in to change notification settings - Fork 99
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
[Milestone-3] Serokell: New OrderedSet.mo & OrderedMap fixup #662
base: master
Are you sure you want to change the base?
Conversation
One quick observation. subset and equality seem very expensive: I wonder if that's because the isSubsetHelper still uses an iterator, that allocates. Maybe just use recursion again? Perhaps there's even faster way but I don't know enough about red-black trees to suggest one. If the representation is canonical, maybe you can exploit that. But I'm not sure it is or depends on the order of insertions/deletions. Do you know? |
It's not canonical, so we cannot have just a linear one-to-one comparison. I've experimented with a couple of alternatives I think it works pretty well now. I've managed to use the order of nodes in the tree to optimize it a bit. I have updated the benchmark results in the description, for the comparison of the version before and after the last commit see serokell#37 . |
That's much better indeed! Thank you! |
Also, I've optimized |
src/OrderedMap.mo
Outdated
}; | ||
|
||
/// Test helpers | ||
public module MapDebug { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Should we move this module into the test dir, to avoid polluting the library?
Currently a persistent ordered set is implemented by storing () in PersistentOrderedMap, this PR manually specializes the set type.
Namely: + inline color into constructor + optimize folds via direct recursion + filterMap through fold
Store the set size in the root node and make set operations to use the set size instead of calculating black height.
* rename elements() -> vals() * add valsRev() * Hide helper types for iterations into the Internal module
* rename `rbSet` to `set` or `s` * add import `Debug` into examples * fix typos
+ Minor call optimization
Also: * fix code format * fix typo * refactor internal functions a bit
Co-authored-by: Claudio Russo <claudio@dfinity.org>
Co-authored-by: Claudio Russo <claudio@dfinity.org>
Head branch was pushed to by a user without write access
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks for the adjustments!
This is an MR for the 3rd Milestone of the Serokell's grant about improving Motoko's base library.
The main goal of the PR is to introduce a new functional implementation of the set data structure to the' base' library. Also, it brings a few changes to the new functional map that was added in #664 , #654 .
General changes:
PersistentOrderedMap
toOrderedMap
(same for theOrderedSet
)Functional Map changes:
New functionality:
any
/all
functionscontains
functionminEntry
/maxEntry
Optimizations:
size
in the Map, benchmark resultsFixup:
entriesRev()
, removeiter()
NEW functional Set:
The new data structure implements an ordered set interface using Red-Black trees as well as the new functional map from the 1-2 Milestones.
API implemented:
put
,delete
,contains
,fromIter
, etcmap
,mapFilter
,foldLeft
,foldRight
union
,intersect
,diff
,isSubset
,equal
OrderedMap
):min
/max
,all
/some
Maintainance support:
Applied optimizations:
map
/filterMap
throughfoldLeft
foldLeft
OrderedMap
instead of sharing it, benchmark resultsintersect
optimization: use order of output values to build the resulting tree faster, see Optimize set operations: intersect serokell/motoko-base#39isSubset
,equal
optimization: use early exit and use order of subtrees to reduce intermediate tree height, see OrderedSet: optimze isSubset,equal serokell/motoko-base#37Rejected optimizations:
Initially, we were planning to use an implementation of set operations (
intersect
,union
,diff
) from Nipkow's book. However, the experiment shows that naive implementation with a simple size heuristic performs better. The benchmark results are comparing 3 versions:O(min(n,m)log((max(n,m))
which is very close to Nipkow's version). Sizes of sets are also stored but only in the root.The last one outperforms others and keeps a tree slim in terms of byte size. Thus, we have picked it.
Final benchmark results:
Collection benchmarks
set API
new set API