Skip to content

Metabug: Libcollections TODO #18009

Closed
Closed
@Gankra

Description

@Gankra

As part of the stabilization process, we'd like our standard collection implementations to be good. Maybe even great. Wouldn't that be nice? The overall quality of each implementation is pretty varied, partially just because the language has dramatically shifted out from underneath them. Vec and HashMap are in a good place because they're The Big Dogs of data structuring, and have gotten substantial attention as a result. Other less-used collections are in worse shape, and need some serious attention.

If you're interested in working on a Rust project, or collections in particular, here's a big list of things to do experiment with on libcollections.

  • DList's internal "safe" API is totally busted. It's not safe at all. Possibly simply because RawLink is Copyable. This allows a reference to a RawLink to be converted into a value, allowing mutation of an immutable DList.
  • Investigate implementing a proper bidirectional cursor on DList that can insert/remove values. It's borderline useless without this.
  • Full code review of BitV and BitVSet. They've had a lot of bugs because the code base has seen a lot of churn. Some legacy references to the bad-old-day of BitV being an enum still exist (such as a benchmark that just tests how fast bitshifting a uint is).
  • Bitv uses architecture-sized uints for backing storage #16736 BitV's internal representation is a Vec of uints. It should almost certainly be a Vec of u32's, or some other non-machine-sized type.
  • RingBuf should not be a Vec of Options.
  • Investigate whether accepting total unsafety in TreeMap and using RawPtrs everywhere is worth it. May be necessary in a future full of scoped allocators.
  • Replace the bulky triple of Vecs used in BTreeMap's Node struct with a single manually allocated
    buffer, in the same vein as HashMap's RawTable.
  • Investigate a richer default B-selection algorithm for BTreeMap's default constructor than "6"
  • Investigate alternative node-search schemes over trivial linear search for BTreeMap. Binary search? skip-k-at-a-time linear search? Dynamic (or static!) selection of these algorithms based on B/K/V?
  • Seriously optimize BTreeMap at all. At all.
  • Investigate the possibility of replacing the heap-allocated Vec<*mut Node<T>>-based search-stacks of BTreeMap, TreeMap, and TrieMap with a stack-allocated [*mut Node<T>, ..uint::BITS] (TreeMap might need more, it's an AA-tree, dunno how tall they can get). Note that a preliminary experiment on BTree gave very poor results, though it has the least to gain from such a design from it's current design (tracks depth, can use with_capacity.
  • Investigate the possibility of reusing the SearchStack concept used in BTreeMap to more safely wrap the rawptr-based search stacks of TreeMap and TrieMap.
  • Investigate the possibility of making the iterators for TreeMap and TrieMap DoubleEnded, rather than having separate forward/backward iterators.
  • At very least, get rid of the old-style internal backwards iterator on TrieMap.
  • Investigate the possibility of cutting out a lot of the macro-madness from TreeMap and TrieMap (BTreeMap has no macros and little-to-no code duplication, just hardcore generics). Possibly using the Deref/DerefMut trick in HashMap.
  • Investigate completely replacing the unsafe rawptr search-stacks in the iterators of TreeMap and TrieMap with safe stacks of iterators over the nodes themselves, like BTreeMap does.
  • Investigate arbitrary sub-range iterators for sorted sets/maps.
  • Investigate functionality augmentations to a lot of collections iterators. For instance DList's iterator provides peek_next and remove_next methods for soundly removing elements from the list during iteration. At least Vec, RingBuf, and HashMap could all support this soundly, I believe. Vec, RingBuf, and DList could also support insert_next. Special iterator-like objects dedicated to these tasks might be better ideas, though (cursors, zippers, etc).
  • Implement "entry" API for maps #17320 Implement the new Entry API on TreeMap and TrieMap.
  • Investigate different hashing architectures and mechanisms for hash algorithm selection than the current one. For instance, our current stream-based architecture is poorly designed for small keys (e.g. u64's) which can and should be hashed in a single step.
  • Add a min heap option to PriorityQueue #15947 Investigate making ordering-based collections comparator based rather than directly Ord-based using unboxed closures. Ord types should be supported through some kind of "natural comparator" that is reversible (for supporting reverse orderings). PriorityQueue in particular would like this to simplify min-vs-max-heap.
  • Investigate optimizing PriorityQueue.extend with a modified heapify operation.
  • Consider more slice utility methods. slice.rotate_left/right? slice.shift_and_replace_left/right?
  • Consider providing split(&mut self, at: Index) -> Self on some collections.
  • Investigate tricks used in other languages to optimize for common/special access patterns.
  • Tests! Code coverage!
  • Docs! Examples!
  • Detailed discussions of what individual collections are good/bad for.
  • Performance comparison tables (probably just assymptoticss
  • There was only one other checked box and it looked lonely so I made it a friend. :D

Metadata

Metadata

Assignees

No one assigned

    Labels

    metabugIssues about issues themselves ("bugs about bugs")

    Type

    No type

    Projects

    No projects

    Milestone

    No milestone

    Relationships

    None yet

    Development

    No branches or pull requests

    Issue actions