
More performance optimizations #136

@alexdowad

Description

  • Investigate the effect of making SortedSet rebalance less frequently (or even moving to red-black trees rather than AVL trees)
  • Even when there is no need to rebalance, SortedSet sometimes copies more nodes than is necessary. Fix this.
  • SortedSet#delete_at traverses the tree twice. Add a method to AVLNode so it only needs to traverse the tree once. (See the single-pass delete sketch after this list.)
  • When a SortedSet is created with a 1-param block (to derive sort keys), should the sort keys for each node be cached? In some cases, the computation to derive the sort key might be expensive. (See the sort-key caching sketch after this list.)
  • In Set#union, check whether other is a Hamster::Set which is larger than self. If so, swap self and other, since it will be cheaper to build up the result using the larger trie. (See the union sketch after this list.)
  • In Set#intersection, if other is significantly smaller than self but is not a Hamster::Set, create an empty Trie, iterate through other, and add to the new Trie all the elements which are also in self. (See the intersection sketch after this list.)
  • In Set#subset? and Set#proper_subset?, if other is not a ::Set or Hamster::Set (i.e. its #include? may be slow) and other is large, it may be faster to first build a temporary ::Set out of other and use that for testing inclusion. (Don't bother using a Hamster::Set for this purpose; it will be slower.) Another option would be to iterate through other, count the number of elements which are in self, and check whether that count equals self.size. (See the subset/disjoint sketch after this list.)
  • Likewise in Set#disjoint?: if self.size < other.size but other is not a ::Set or Hamster::Set, testing other.include?(item) may be slow (and may lead to O(n^2) run time). In this case, it may be better to build a temporary ::Set for testing inclusion, or just use the 2nd branch so the inclusion test is done on self rather than other. (See the subset/disjoint sketch after this list.)
  • In Trie, add methods for bulk insertion and deletion (which copy fewer nodes than repeated addition/deletion of single elements). Use these for operations like Set#union and so on.
  • Investigate the effect on Vector of using an implementation strategy more like the Persistent::Vector library: all Vector tries would always be composed of full, 32-item nodes, and each Vector would carry a "tail" array of 0-31 additional items at the end. Vector#push would then only have to modify the trie once for every 32 pushes; the rest of the time, it would just copy the "tail" array. Likewise for Vector#pop. If this proves worthwhile, consider having both "head" and "tail" arrays for odd numbers of items at the front and back, which could also make Vector#shift and #unshift fast. Retrieving an item from a Vector by index would then require checking whether it falls within the "head", the "tail", or the vector trie itself. (A reduced sketch of the "tail" idea appears after this list.)
  • If giving Vectors a "head" and "tail" proves to be a worthwhile and successful strategy, compare its performance as a stack/queue with Deque. If Vector can be made almost as efficient as Deque for that purpose, drop Deque and allow Vector to replace it.
  • Otherwise, if keeping Deque is necessary, try to mitigate the nasty worst-case scenario that occurs when a large Deque is built up and items are then alternately popped from the front and back. (It will keep reversing the list, and then reversing it back again.) If, say, we are reversing @rear and moving the items to @front, I would suggest: copy the first 1/3 of the items and keep them on @rear, then reverse the remaining 2/3 and put them on @front. (See the deque sketch after this list.)
  • There is another, less severe worst-case scenario when retrieving the #first or #last item from a Deque with all the items in a long @front or @rear. Perhaps each Deque could keep the last item of @front and @rear in instance variables, so #first and #last can always be returned quickly. (The deque sketch after this list touches on this too.)
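
A rough illustration of the single-pass delete: the sketch below uses a hypothetical size-annotated node class (SketchNode, its EMPTY sentinel, and the naive merge are all inventions for this example, and AVL rebalancing is left out entirely). The only point is that one recursive traversal can hand back both the new subtree and the removed item, instead of calling #at and then #delete separately.

```ruby
# Hypothetical size-annotated node, *not* the real AVLNode: one recursive pass
# returns both the new subtree and the item removed at `index`.
class SketchNode
  EMPTY = Object.new
  def EMPTY.size; 0; end
  def EMPTY.empty?; true; end
  def EMPTY.delete_at(_index); [self, nil]; end

  attr_reader :item, :left, :right, :size

  def initialize(item, left = EMPTY, right = EMPTY)
    @item, @left, @right = item, left, right
    @size = left.size + right.size + 1
  end

  def empty?; false; end

  # Returns [new_subtree, removed_item] in a single traversal.
  def delete_at(index)
    if index < left.size
      new_left, removed = left.delete_at(index)
      [SketchNode.new(item, new_left, right), removed]
    elsif index == left.size
      [merge(left, right), item]
    else
      new_right, removed = right.delete_at(index - left.size - 1)
      [SketchNode.new(item, left, new_right), removed]
    end
  end

  private

  # Naive merge of the two subtrees left behind when this node is removed.
  def merge(l, r)
    return r if l.empty?
    return l if r.empty?
    new_r, promoted = r.delete_at(0) # promote the right subtree's leftmost item
    SketchNode.new(promoted, l, new_r)
  end
end

root = SketchNode.new(2, SketchNode.new(1), SketchNode.new(3))
rest, removed = root.delete_at(1) # removed == 2, rest holds 1 and 3
```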
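
On the sort-key caching question, here is a minimal sketch of the idea, assuming a hypothetical entry struct and a plain sorted Array standing in for the real AVL tree; the point is just that the 1-param block runs once per item at insertion time, not once per comparison.

```ruby
# Hypothetical wrapper (not SortedSet's real node layout): the derived sort key
# is computed exactly once, when the entry is created, and reused afterwards.
CachedEntry = Struct.new(:item, :sort_key)

class KeyCachingSortedList
  def initialize(&key_block)
    @key_block = key_block
    @entries = [] # kept ordered by the cached sort_key
  end

  def add(item)
    entry = CachedEntry.new(item, @key_block.call(item)) # key derived once
    index = @entries.bsearch_index { |e| e.sort_key >= entry.sort_key } || @entries.size
    @entries.insert(index, entry)
    self
  end

  def to_a
    @entries.map(&:item)
  end
end

# The (possibly expensive) key derivation runs once per added item,
# not once per comparison during insertion.
words = KeyCachingSortedList.new { |w| w.downcase.reverse }
words.add("Banana").add("apple").add("Cherry")
words.to_a # => ["Banana", "apple", "Cherry"] (ordered by reversed lowercase form)
```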
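
For the Set#union size check, a sketch under the assumption that folding the smaller collection into the larger set is acceptable; union_sketch is a free-standing helper invented for illustration, not Hamster's actual implementation, which would work on the underlying Trie directly.

```ruby
require "hamster/set"

# Start from whichever Hamster::Set is larger and add the smaller collection's
# items into it, so fewer trie nodes get copied.
def union_sketch(this, other)
  if other.is_a?(Hamster::Set) && other.size > this.size
    this, other = other, this # build on top of the larger trie
  end
  other.reduce(this) { |result, item| result.add(item) }
end

small = Hamster::Set[1, 2]
big   = Hamster::Set[1, 2, 3, 4, 5, 6]
union_sketch(small, big) # internally builds on `big`, adding only 1 and 2
```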
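
For the Set#intersection case, a similar free-standing sketch; a real implementation would fill a fresh Trie directly rather than going through Hamster::Set#add, so treat the helper below as illustrative only.

```ruby
require "hamster/set"

# When `other` is much smaller than self and not already a Hamster::Set,
# iterate over `other` and keep only the items also present in self.
def intersection_sketch(this, other)
  if !other.is_a?(Hamster::Set) && other.size < this.size
    other.reduce(Hamster::Set[]) do |result, item|
      this.include?(item) ? result.add(item) : result
    end
  else
    this & other # otherwise fall back to the existing implementation
  end
end

big_set = Hamster::Set[*(1..100)]
intersection_sketch(big_set, [3, 5, 99, 12_345]) # walks only the small Array
```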
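
For the subset?/proper_subset?/disjoint? items, the sketch below shows both strategies: the counting variant (which assumes other contains no duplicate elements) and a temporary ::Set built purely to make repeated #include? calls cheap. The method names are hypothetical.

```ruby
require "set"
require "hamster/set"

# Counting approach for subset?: one pass over `other`, no temporary collection.
# Assumes `other` has no duplicates.
def subset_sketch?(this, other)
  other.count { |item| this.include?(item) } == this.size
end

# Temporary-::Set approach for disjoint?: when `other` is a plain Enumerable
# whose #include? may be slow, copy it into a ::Set once so each membership
# test during the scan is cheap.
def disjoint_sketch?(this, other)
  unless other.is_a?(::Set) || other.is_a?(Hamster::Set)
    other = ::Set.new(other)
  end
  this.none? { |item| other.include?(item) }
end

this = Hamster::Set[1, 2, 3]
subset_sketch?(this, [3, 2, 1, 0])    # => true
disjoint_sketch?(this, (10..20).to_a) # => true
```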
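
For the Vector "tail" idea, a heavily reduced sketch: an Array of full 32-item chunks stands in for the real trie, and all names here (TailedVector, BLOCK, @chunks) are invented for the example. The point is that #push touches the "trie" only once every 32 pushes; the rest of the time it copies only the small tail.

```ruby
class TailedVector
  BLOCK = 32

  def initialize(chunks = [], tail = [])
    @chunks = chunks # stand-in for the trie: each entry is a full 32-item array
    @tail   = tail   # 0..31 most recently pushed items
  end

  def size
    @chunks.size * BLOCK + @tail.size
  end

  def push(item)
    new_tail = @tail + [item]
    if new_tail.size == BLOCK
      # Only now does the "trie" get modified: the full tail becomes a new chunk.
      TailedVector.new(@chunks + [new_tail], [])
    else
      TailedVector.new(@chunks, new_tail)
    end
  end

  def get(index)
    chunk, offset = index.divmod(BLOCK)
    chunk < @chunks.size ? @chunks[chunk][offset] : @tail[offset]
  end
end

v = (0...40).reduce(TailedVector.new) { |vec, i| vec.push(i) }
v.get(5)  # => 5, from the first full chunk
v.get(39) # => 39, from the tail
```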
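
Finally, for the two Deque items, a sketch of the 1/3-2/3 split using plain Arrays in place of the cons lists Hamster's Deque actually uses; SplitDeque and its method bodies are illustrative assumptions, not the real implementation.

```ruby
# When the front side runs dry, only the front-most 2/3 of @rear is reversed
# across; the remaining 1/3 stays put, so alternating front/back removals can
# no longer force a full reversal every time.
class SplitDeque
  def initialize(front = [], rear = [])
    @front = front # index 0 = front-most item
    @rear  = rear  # index 0 = back-most item
  end

  def push(item) # add at the back
    SplitDeque.new(@front, [item] + @rear)
  end

  def shift # remove from the front
    if @front.empty?
      return self if @rear.empty?
      keep  = @rear.size / 3
      moved = @rear[keep..-1].reverse # front-most item ends up at moved[0]
      SplitDeque.new(moved.drop(1), @rear[0, keep])
    else
      SplitDeque.new(@front.drop(1), @rear)
    end
  end

  # With cons lists, the `.last` calls below would be O(n) when one side holds
  # everything; caching the two end items in instance variables (as the last
  # bullet suggests) would keep #first/#last O(1) in every case.
  def first
    @front.first || @rear.last
  end

  def last
    @rear.first || @front.last
  end
end

dq = (1..9).reduce(SplitDeque.new) { |d, i| d.push(i) }
dq.first       # => 1
dq.shift.first # => 2; the three back-most items never changed sides
```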
