Description
Is your feature request related to a problem? Please describe.
Due to the policy-based data structures (PBDS) library being gcc specific, I had to change the implementation of Quantile and Rank in cpp/cppnodes/statsimpl.cpp to not use an order statistics tree. Instead, they use a std::multiset to store sorted values, and iterate on that set in linear time to find each quantile/rank when triggered. This is much more inefficient than the O(log n) lookup in the PBDS tree.
Describe the solution you'd like
It would be helpful if someone implemented a balanced (red-black) order statistics tree (OST) natively in csp. The OST is a binary search tree with the additional invariant that each node contains the size of its own subtree (https://en.wikipedia.org/wiki/Order_statistic_tree). This allows for efficient (log n time) insert, find, delete, and index-based lookup.
Describe alternatives you've considered
There are actually 2 data structures that will achieve O(log n) insert, delete and index-based lookup
- The order statistics tree mentioned above
- An indexable skiplist. This is what pandas uses for their rolling quantiles. However, pandas has the advantage of static data, so they can size it up front knowing the maximum size of the data interval. In a time-windowed csp stats node, we do not know the maximum number of ticks which will fall in an interval, so sizing becomes a bit trickier (I think we'd have to resize/add links dynamically at powers of 2). Here is a link explaining the impl https://rhettinger.wordpress.com/2010/02/06/lost-knowledge/.
Additional context
This would be helpful for any non gcc users and would also make the code more readable since we're no longer using 2 different approaches for the same stats calculation.
Activity
AdamGlustein commentedon Feb 13, 2024
May be able to leverage the boost multindex datat structure for this:
https://www.boost.org/doc/libs/1_75_0/libs/multi_index/doc/tutorial/indices.html#rnk_indices
Improve non-GCC implementation of statistics functions Quantile/Rank,
boost::multi_index
#333Improve non-GCC implementation of statistics functions Quantile/Rank,