Skip to content

BTreeMap Raw Entry API #73

Closed
Closed
@mokhaled2992

Description

@mokhaled2992

Proposal

Problem statement

We need an API that will allow the users to

  1. Fetch or insert entries without doing double lookups. Currently this is only possible with a sequence of get, insert: so two lookups.
  2. Implement custom generic comparators without the need to store the comparator function in the BTreeMap itself (as in C++). Additionally those custom comparators would allow the users to store entries in an order that can rely on some external contexts. Currently this is only possible through one of these two unpleasant alternatives:
    2.1. Making your external context a global mutable and having to deal with the hassle that comes out of that because of threading/synchronization etc....just no
    2.2. Making your context part of the key. This requires the context to be some Rc<RefCell<....>> and the key to have some Weak pointer to that. Obviously a big unnecessary overhead.

Motivation, use-cases

Imagine implementing a dictionary where the strings are stored contiguously in some vector and a you want to establish some sorted lightweight index on that vector. The index would be a BTreeMap of usize keys which are an index into the vector. Obviously the sorting order would then have to be different from the natural sorting order of the usize and it will need to rely on the actual strings vector, we want alphabetical order sorting for our index. By making the raw entry API take a closure we achieve two things: enabling mixed type comparisons, and enabling comparators to consider external contexts during comparisons and only during comparisons. In fact this is even better than the C++ alternative where the comparator lambda has to be passed to the map constructor to be stored as member variable in the map and keep whatever references it needs throughout the whole lifetime of the map.

Furthermore, the raw entry API would enable the users to efficiently tackle the very common case of "get or insert", in which the caller wants to get a reference to the value if it exists or insert a new value immediately at the right position without having to do the lookup twice; once to check for existence and another to insert.

Below is a basic user code that summarizes the use case that was described above

use std::collections::BTreeMap;


// holds all strings contiguously
// maintains a lightweight sorted index
// on such strings.
struct Storage {
    dict: Vec<String>,
    index: BTreeMap<usize, ()>,
}

impl Storage {
    fn get_or_insert(& mut self, input: &str) -> usize {
        match self.index.raw_entry_mut(|key| -> Ordering {
            self.dict[key].cmp(input)
        })
        {
            RawEntryMut::Occupied(o) => o.key(),
            RawEntryMut::Vacant(v) => {
                self.dict.push(input.to_owned());
                v.insert(self.dict.len() - 1, ());
                self.dict.len() - 1
            }
        }
    }
}

Solution sketches

The proposed API looks like this

pub fn BTreeMap::raw_entry_mut<F, U>(&self, u: &U, cmp: F) -> RawEntryMut<'_, K, V>
where F: FnMut(&K, &U) -> Ordering;

pub fn BTreeMap::raw_entry_mut<F>(&self, cmp: F) -> RawEntryMut<'_, K, V>
where F: FnMut(&K) -> Ordering;

RawEntryMut can be an enum Vacant/Occupied where the Vacant variant points to where that key would be inserted, or using C++ terms:

iterator to the position before which the new element will be inserted

Nevertheless, this is an implementation detail of Vacant, the user obviously will just call insert to insert an entry in a vacant location.
Finally, I'm aware of the cursor APIs discussions. I believe future work can extend the methods of RawEntryMut to add things like insert_(before|after) and so on.

Links and related work

I gathered some links showing the demand for such a feature

  1. BTreeSet sorted by dynamically generated keys
  2. How can I implement Ord when the comparison depends on data not part of the compared items?
  3. Interface improvements for BTreeMap
  4. Comparator for BTree(Map|Set)s

Furthermore,

This use case is fully achievable in C++ (C++ Use case code snippet) and I demonstrate that below.

#include <vector>
#include <string>
#include <set>
#include <iostream>

struct Comparator {
    using is_transparent = std::true_type;

    bool operator()(const std::size_t lhs, const std::size_t rhs) const 
    {
        return dict[lhs] < dict[rhs];
    }

    bool operator()(const std::string & lhs, const std::size_t rhs) const {
        return lhs < dict[rhs];
    }

    bool operator()(const std::size_t lhs, const std::string & rhs) const {
        return dict[lhs] < rhs;
    }

    Comparator(const std::vector<std::string> & dict) : dict(dict) {}
    const std::vector<std::string> & dict;
};

struct Storage
{
    std::vector<std::string> dict;
    std::set<std::size_t, Comparator> index;

    Storage() : index(Comparator(dict)) {}

    std::size_t get_or_insert(const std::string& s) {
        const auto it = index.lower_bound(s);
        if(it == index.end() || dict[*it] != s) {
            dict.push_back(s);
            index.emplace_hint(it, dict.size() - 1);
            return dict.size() - 1;
        }

        return *it;
    }
};


int main(int argc, const char ** argv) {
    Storage s;

    std::cout << s.get_or_insert("HelloWorld") << std::endl;
    std::cout << s.get_or_insert("HelloWorld") << std::endl;
    std::cout << s.get_or_insert("C++") << std::endl;
    std::cout << s.get_or_insert("Rust") << std::endl;
    std::cout << s.get_or_insert("C++") << std::endl;
    std::cout << s.get_or_insert("C++") << std::endl;
    std::cout << s.get_or_insert("Rust") << std::endl;
    std::cout << s.get_or_insert("C++") << std::endl;
    std::cout << s.get_or_insert("C++") << std::endl;

    return 0;
}

What happens now?

This issue is part of the libs-api team API change proposal process. Once this issue is filed the libs-api team will review open proposals in its weekly meeting. You should receive feedback within a week or two.

Metadata

Metadata

Assignees

No one assigned

    Labels

    T-libs-apiapi-change-proposalA proposal to add or alter unstable APIs in the standard libraries

    Type

    No type

    Projects

    No projects

    Milestone

    No milestone

    Relationships

    None yet

    Development

    No branches or pull requests

    Issue actions