Serokell: [Milestone-1] New PersistentOrderedMap.mo lib #654

GoPavel · 2024-09-05T12:16:46Z

This is an MR for the 1st Milestone of Serokell's grant about improving Motoko's base library.

Within it, we are introducing a new functional implementation of the map data structure based on the current RBTree.mo.

Content of the MR:

extract the implementation from the RBTree.mo
change API to keep it convenient to store the map in the stable variable
add missing functionality
- map
- foldLeft, foldRight
- mapFilter
- keys, vals
optimize tree representation via elimination of option type ?V -> V
unit tests
property-based tests
documentation
- check typos again
- add examples

sa-github-api · 2024-09-05T12:17:49Z

Dear @GoPavel,

In order to potentially merge your code in this open-source repository and therefore proceed with your contribution, we need to have your approval on DFINITY's CLA.

If you decide to agree with it, please visit this issue and read the instructions there. Once you have signed it, re-trigger the workflow on this PR to see if your code can be merged.

— The DFINITY Foundation

src/PersistentOrderedMap.mo

crusso

Some interim comments (more to come, just submitting because I expect GH might lost my review soon)

src/PersistentOrderedMap.mo

crusso · 2024-10-04T16:13:10Z

src/PersistentOrderedMap.mo

+  /// with `Map` to maintain invariant that comparator did not changed.
+  ///
+  /// `MapOps` contains methods that require `compare` internally:
+  /// operations that may reshape a `Map` or should find something.


Suggested change

/// operations that may reshape a `Map` or should find something.

/// operations that update, traverse or search a map.

I think traverse doesn't require comp (since we keep tree structure) so it's outside of the MapOps.
So I think it should be

Suggested change

/// operations that may reshape a `Map` or should find something.

/// operations that update or search a map.

I guess that's a UX question - should we put all operations in MapOps, even if they don't require the comparison, just for uniformity? I guess we could also do that later easily enough. Something to think about.

should we put all operations in MapOps

I also thought about this. We were going to measure its overhead on a benchmark and then decide. If there is no difference I would also put it there.

However for now some modules like RBTree or TrieMap provide some functionality as a module-function, not as a member-function e.g. map or iter. So we thought that the current solution is not very inconsistent.

We did performance experiments:

Comparing inlined internal functions and the current version (all calls through MapOps): [DMS-68] Inline internal functions into MapOps object serokell/motoko-base#21

Comparing calling methods through MapOps and through the Internal module passing comp manually: [DMS-68] Compare function calls from MapOps with direct function call serokell/motoko-base#20

The 1st experiment shows that the inlined version has more or less the same performance (<1% difference).
The 2nd one shows that a static function call is faster than a similar one from MapOps on 1-4% percent.

Thus we can say that static calls are less convenient but faster, so probably it makes sense to keep the option ?
We can add the rest of the functions into MapOps and make the Internal module public.
OR keep the current version as compromised

WDYT?

So I guess we decide to put all operations in MapOps and leave Internal private for now at a 1-4% perf hit.
Does the perf hit disappear entirely if we inline the Internal functions in MapOps, or am I misunderstanding?

src/PersistentOrderedMap.mo

crusso

Added some more comments...

src/PersistentOrderedMap.mo

No changes in logic so far, just simple refactoring

Add `MapOps` class with the following signature: public class MapOps<K>(compare : (K,K) -> O.Order) { public func put<V>(rbMap : Map<K, V>, key : K, value : V) : Map<K, V> public func fromIter<V>(i : I.Iter<(K,V)>) : Map<K, V> public func replace<V>(rbMap : Map<K, V>, key : K, value : V) : (Map<K,V>, ?V) public func mapFilter<V1, V2>(f : (K, V1) -> ?V2, rbMap : Map<K, V1>) : Map<K, V2> public func get<V>(key : K, rbMap : Map<K, V>) : ?V public func delete<V>(rbMap : Map<K, V>, key : K) : Map<K, V> public func remove< V>(rbMap : Map<K, V>, key : K) : (Map<K,V>, ?V) }; The other functionality provided as standalone functions, as they don't require comparator: public type Direction = { #fwd; #bwd }; public func iter<K, V>(rbMap : Map<K, V>, direction : Direction) : I.Iter<(K, V)> public func entries<K, V>(m : Map<K, V>) : I.Iter<(K, V)> public func keys<K, V>(m : Map<K, V>, direction : Direction) : I.Iter<K> public func vals<K, V>(m : Map<K, V>, direction : Direction) : I.Iter<V> public func map<K, V1, V2>(f : (K, V1) -> V2, rbMap : Map<K, V1>) : Map<K, V2> public func size<K, V>(t : Map<K, V>) : Nat public func foldLeft<Key, Value, Accum>( combine : (Key, Value, Accum) -> Accum, base : Accum, rbMap : Map<Key, Value> ) : Accum And foldRight with the same signature as foldLeft The following functions are new for the API: - MapOps.put, MapOps.delete - MapOps.fromIter, entries, keys, vals - MapOps.mapFilter, map - foldLeft, foldRight

Problem: now order is not consistent within new module and with old modules as well. Solution: make the map argument always go first

In addition to tests this patch removes `direction` argument from `keys` and `values` function to keep them simple and provides a new function `Map.empty` to create a map without knowing its internal representation.

* rename `rbMap` into `m` in signature for brevity & consistent language * rename `rbMap` into `map` in examples for brevity & encapsulation sake * rename `tree` into `map` in doc comments for the encapsulation sake

GoPavel · 2024-10-21T15:36:36Z

@crusso
I've rebased the branch and fixed the name rbMap in doc comments and signatures.
Please resolve conversations that you think should be resolved, so we will see if something left about his PR.
Also, I would ask you to approve the CI workflow since now I see "This workflow is awaiting approval from a maintainer in" here

crusso

Added some comments

crusso · 2024-10-24T16:28:01Z

src/PersistentOrderedMap.mo

+  /// with `Map` to maintain invariant that comparator did not changed.
+  ///
+  /// `MapOps` contains methods that require `compare` internally:
+  /// operations that may reshape a `Map` or should find something.


So I guess we decide to put all operations in MapOps and leave Internal private for now at a 1-4% perf hit.
Does the perf hit disappear entirely if we inline the Internal functions in MapOps, or am I misunderstanding?

test/PersistentOrderedMap.prop.test.mo

GoPavel · 2024-10-24T16:54:51Z

So I guess we decide to put all operations in MapOps and leave Internal private for now at a 1-4% perf hit.
Does the perf hit disappear entirely if we inline the Internal functions in MapOps, or am I misunderstanding?

We have tried it, and it doesn't affect the benchmark at all. See serokell#21

This is an MR for the 3rd Milestone of the Serokell's grant about improving Motoko's base library. The main goal of the PR is to introduce a new functional implementation of the set data structure to the' base' library. Also, it brings a few changes to the new functional map that was added in #664 , #654 . # General changes: * rename `PersistentOrderedMap` to `OrderedMap` (same for the `OrderedSet`) * improve docs # Functional Map changes: ## New functionality: + add `any`/`all` functions + add `contains` function + add `minEntry`/`maxEntry` ## Optimizations: + Store `size` in the Map, [benchmark results](serokell#35) ## Fixup: + add `entriesRev()`, remove `iter()` # NEW functional Set: The new data structure implements an ordered set interface using Red-Black trees as well as the new functional map from the 1-2 Milestones. ## API implemented: * Basic operations (based on the map): `put`, `delete`, `contains`, `fromIter`, etc * Maps and folds: `map`, `mapFilter`, `foldLeft`, `foldRight` * Set operations: `union` , `intersect`, `diff`, `isSubset`, `equal` * Additional operations (as for the `OrderedMap`): `min`/`max`, `all`/`some` ## Maintainance support: * Unit, property tests * Documentation ## Applied optimizations: * Same optimizations that were useful for the functional map: * inline node color * float-out exceeded matching in iteration * `map`/`filterMap` through `foldLeft` * direct recursion in `foldLeft` * [Benchmark results for all four optimizations together](serokell#27) * store size in the root of the tree, [benchmark results](serokell#36 (comment)) * Pattern matching order optimization, [benchmark results](serokell#36 (comment)) * Other optimizations: * Inline code of `OrderedMap` instead of sharing it, [benchmark results](serokell#25) * `intersect` optimization: use order of output values to build the resulting tree faster, see serokell#39 * `isSubset`, `equal` optimization: use early exit and use order of subtrees to reduce intermediate tree height, see serokell#37 ## Rejected optimizations: * Nipkow's implementation of set operation [Tobias Nipkow's "Functional Data Structures and Algorithms", 117]. Initially, we were planning to use an implementation of set operations (`intersect`, `union`, `diff`) from Nipkow's book. However, the experiment shows that naive implementation with a simple size heuristic performs better. [The benchmark results](serokell#33) are comparing 3 versions: * persistentset_baseline -- original implementation that uses Nipkow's algorithms. However, the black height is calculated before each set operation (the book assumes it's stored). * persistentset_bh -- the same as the baseline but the black height is stored in each node. * persistentset -- naive implementation that looks up in a smaller set and modifies a bigger one (it gives us `O(min(n,m)log((max(n,m))` which is very close to Nipkow's version). Sizes of sets are also stored but only in the root. The last one outperforms others and keeps a tree slim in terms of byte size. Thus, we have picked it. ## Final benchmark results: ### Collection benchmarks | |binary_size|generate|max mem|batch_get 50|batch_put 50|batch_remove 50|upgrade| |--:|--:|--:|--:|--:|--:|--:|--:| |orderedset+100|218_168|186_441|37_916|53_044|121_237|127_460|346_108| |trieset+100|211_245|574_022|47_652|131_218|288_429|268_499|729_696| |orderedset+1000|218_168|2_561_296|520_364|69_883|158_349|170_418|3_186_579| |trieset+1000|211_245|7_374_045|633_440|162_806|383_594|375_264|9_178_466| |orderedset+10000|218_168|40_015_301|320_532|84_660|192_931|215_592|31_522_120| |trieset+10000|211_245|105_695_670|682_792|192_931|457_923|462_594|129_453_045| |orderedset+100000|218_168|476_278_087|3_200_532|98_553|230_123|258_372|409_032_232| |trieset+100000|211_245|1_234_038_235|6_826_516|222_247|560_440|549_813|1_525_692_388| |orderedset+1000000|218_168|5_514_198_432|32_000_532|115_836|268_236|306_896|4_090_302_778| |trieset+1000000|211_245|13_990_048_548|68_228_312|252_211|650_405|642_099|17_455_845_492| ### set API | |size|intersect|union|diff|equals|isSubset| |--:|--:|--:|--:|--:|--:|--:| |orderedset+100|100|146_264|157_544|215_871|28_117|27_726| |trieset+100|100|352_496|411_306|350_935|201_896|201_456| |orderedset+1000|1000|162_428|194_198|286_747|242_329|241_938| |trieset+1000|1000|731_650|1_079_906|912_629|2_589_090|4_023_673| |orderedset+10000|10000|177_080|231_070|345_529|2_383_587|2_383_591| |trieset+10000|10000|3_986_854|21_412_306|5_984_106|46_174_710|31_885_381| |orderedset+100000|100000|190_727|267_008|402_081|91_300_348|91_300_393| |trieset+100000|100000|178_863_894|209_889_623|199_028_396|521_399_350|521_399_346| |orderedset+1000000|1000000|205_022|304_937|464_859|912_901_595|912_901_558| |trieset+1000000|1000000|1_782_977_198|2_092_850_787|1_984_818_266|5_813_335_155|5_813_335_151| ### new set API | |size|foldLeft|foldRight|mapfilter|map| |--:|--:|--:|--:|--:|--:| |orderedset|100|16_487|16_463|88_028|224_597| |orderedset|1000|133_685|131_953|1_526_510|4_035_782| |orderedset|10000|1_305_120|1_287_495|28_455_361|51_527_733| |orderedset|100000|13_041_665|12_849_418|344_132_505|630_692_463| |orderedset|1000000|130_428_573|803_454_777|4_019_592_041|7_453_944_902| --------- Co-authored-by: Andrei Borzenkov <andreyborzenkov2002@gmail.com> Co-authored-by: Andrei Borzenkov <root@sandwitch.dev> Co-authored-by: Sergey Gulin <sergeygulin96@gmail.com> Co-authored-by: Claudio Russo <claudio@dfinity.org>

github-actions bot added the external-contributor label Sep 5, 2024

GoPavel force-pushed the milestone-1 branch from 740c391 to 41b6516 Compare September 30, 2024 22:56

GoPavel changed the title ~~[Draft] Serokell: New PersistentMap.mo lib~~ [Draft] Serokell: New PersistentOrderedMap.mo lib Sep 30, 2024

GoPavel changed the title ~~[Draft] Serokell: New PersistentOrderedMap.mo lib~~ Serokell: New PersistentOrderedMap.mo lib Oct 2, 2024

crusso reviewed Oct 4, 2024

View reviewed changes

src/PersistentOrderedMap.mo Outdated Show resolved Hide resolved

crusso reviewed Oct 7, 2024

View reviewed changes

src/PersistentOrderedMap.mo Show resolved Hide resolved

src/PersistentOrderedMap.mo Outdated Show resolved Hide resolved

src/PersistentOrderedMap.mo Outdated Show resolved Hide resolved

GoPavel changed the title ~~Serokell: New PersistentOrderedMap.mo lib~~ Serokell: [Milestone-1] New PersistentOrderedMap.mo lib Oct 14, 2024

GoPavel mentioned this pull request Oct 14, 2024

Serokell: [Milestone-2] Optimize PersistenOrderedMap.mo #664

Open

s-and-witch and others added 10 commits October 21, 2024 17:31

Extract persistent RBTree from mutable wrapper into a separate module

d97833b

No changes in logic so far, just simple refactoring

[DMS-66] fix order of method arguments

564f95a

Problem: now order is not consistent within new module and with old modules as well. Solution: make the map argument always go first

[DMS-76] Persistent ordered map unit testing

174e877

In addition to tests this patch removes `direction` argument from `keys` and `values` function to keep them simple and provides a new function `Map.empty` to create a map without knowing its internal representation.

[DMS-67] add property-based tests for PersistentOrderedMap

92925ab

[DMS-77] Add usage examples to the documentation

af994d6

Specify keys order in iterators

e040734

PersistentOrderedMap: fix typos

070673f

PersistentOrderedMap: fix imports in examples

ef13759

PerisistentOrderedMap: fix docs

d4b02eb

* rename `rbMap` into `m` in signature for brevity & consistent language * rename `rbMap` into `map` in examples for brevity & encapsulation sake * rename `tree` into `map` in doc comments for the encapsulation sake

GoPavel force-pushed the milestone-1 branch from 273e788 to d4b02eb Compare October 21, 2024 15:32

crusso reviewed Oct 24, 2024

View reviewed changes

PersistentOrderedMap: fix newline in prop tests

26a33c1

GoPavel mentioned this pull request Nov 4, 2024

[Milestone-3] Serokell: New OrderedSet.mo & OrderedMap fixup #662

Open

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Serokell: [Milestone-1] New PersistentOrderedMap.mo lib #654

Serokell: [Milestone-1] New PersistentOrderedMap.mo lib #654

GoPavel commented Sep 5, 2024 •

edited by aterga

Loading

sa-github-api commented Sep 5, 2024

crusso left a comment

crusso Oct 4, 2024

GoPavel Oct 7, 2024

crusso Oct 8, 2024

GoPavel Oct 8, 2024

GoPavel Oct 14, 2024

crusso Oct 24, 2024

crusso left a comment

GoPavel commented Oct 21, 2024

crusso left a comment

crusso Oct 24, 2024

GoPavel commented Oct 24, 2024

	/// operations that may reshape a `Map` or should find something.
	/// operations that update, traverse or search a map.

	/// operations that may reshape a `Map` or should find something.
	/// operations that update or search a map.

Serokell: [Milestone-1] New PersistentOrderedMap.mo lib #654

Are you sure you want to change the base?

Serokell: [Milestone-1] New PersistentOrderedMap.mo lib #654

Conversation

GoPavel commented Sep 5, 2024 • edited by aterga Loading

sa-github-api commented Sep 5, 2024

crusso left a comment

Choose a reason for hiding this comment

crusso Oct 4, 2024

Choose a reason for hiding this comment

GoPavel Oct 7, 2024

Choose a reason for hiding this comment

crusso Oct 8, 2024

Choose a reason for hiding this comment

GoPavel Oct 8, 2024

Choose a reason for hiding this comment

GoPavel Oct 14, 2024

Choose a reason for hiding this comment

crusso Oct 24, 2024

Choose a reason for hiding this comment

crusso left a comment

Choose a reason for hiding this comment

GoPavel commented Oct 21, 2024

crusso left a comment

Choose a reason for hiding this comment

crusso Oct 24, 2024

Choose a reason for hiding this comment

GoPavel commented Oct 24, 2024

GoPavel commented Sep 5, 2024 •

edited by aterga

Loading