Skip to content

feat(search): Introduce RangeTree. DRAFT #5397

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Conversation

BagritsevichStepan
Copy link
Contributor

@BagritsevichStepan BagritsevichStepan commented Jul 1, 2025

TODO: description

@BagritsevichStepan BagritsevichStepan self-assigned this Jul 1, 2025
Signed-off-by: Stepan Bagritsevich <stefan@dragonflydb.io>
Signed-off-by: Stepan Bagritsevich <stefan@dragonflydb.io>
Signed-off-by: Stepan Bagritsevich <stefan@dragonflydb.io>
@BagritsevichStepan BagritsevichStepan force-pushed the search/speed-up-numeric-indexes branch from 06a4be2 to 15cc964 Compare July 2, 2025 04:19
Signed-off-by: Stepan Bagritsevich <stefan@dragonflydb.io>
Copy link
Contributor

@dranikpg dranikpg left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

  1. We could've started without blocklist 🙂
  2. The tree is ok. Let's use doubles from the start
  3. RangeResult and ResultsMerger are too complicated. They can be implemented much simpler inside a single function. Please don't try to optimize so much stuff prematurely

Comment on lines +19 to +20
using RangeNumber = long long;
using Key = std::pair<RangeNumber, RangeNumber>;
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What if all values are in the range 0...1? 🔍 I suggest just to use doubles. RS treats everything as a double, so do we

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, it can be changed

Comment on lines +71 to +75
public:
class ResultsMerger {
private:
using MiddleBlockIteratorList = absl::InlinedVector<RangeBlockIterator, 10>;

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why not replace all this with just:

auto compare = []()...
queue<pair<B::const_iterator, /* cur */ B::const_iterator /* end */>, ...> q;
// fill q
while (!q.empty()) {
    auto [it, end_it] = q.top();
    if (it->second >= l && it->second <= r && it->first != result.back().first)
      result.push_back(*it);
    if (++it != end_it)
      q.emplace(it, end_it);
  }

This task in it's basic form shouldn't take 200LOC

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If we want to save unnecessary comparisons we'll add just a flag to the iterator, I don't see a reason for separating blocks into left, middle, right as fields

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Especially GetMin(x) shouldn't exist, we always collect the values sequentially advancing one by one, assuming x is always result.back() + 1

Copy link
Contributor Author

@BagritsevichStepan BagritsevichStepan Jul 2, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is done in this PR for the future use. RangeResult can be easily added to the IndexResult. GetMin with size_t min_index will help a lot for merging std::vector (or whatever in other IndexResult) with RangeResult. Based on the some_customer requests I see that this can improve performance a lot (we have quite different number of matches between fields for single query). Also I added a benchmarks with two numeric indexes - and benchmark with text/tag index to compare a difference in the future

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

True, yet:

  • The comments above (about fields, queue design) still apply
  • We don't have skiplist merges for blocklists yet. Let's start simple. Once we do, we'll just wrap it into a stateful struct

Copy link
Contributor

@dranikpg dranikpg Jul 2, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ironically one of your most common cases, a range that lands in one or two blocks (they will be "border" blocks) is not a fast path: you filter them, copying filtered values into a temporary container, loosing your natural ("future") skiplist at the same time (bc you turn it into a vector). And you still route everything through the queue

Copy link
Contributor Author

@BagritsevichStepan BagritsevichStepan Jul 2, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, when I was writing this logic I didn’t expect that the optimal block size would be so large — so that most range requests only cover 2–3 blocks.
But it’s just a few lines to fix: just change the RangeResult constructor - we can simply store l and r in the RangeResult and not calling those Utils functions you dislike

Copy link
Contributor Author

@BagritsevichStepan BagritsevichStepan Jul 2, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You also can see that I'm using in constructor absl::InlinedVector<RangeBlockPointer, 10> blocks , which is not needed (10 is too big number)

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

But it’s just a few lines to fix ... we can simply store l and r in the RangeResult

So then we don't need to handle border blocks separately, then we can simplify the queue... and we'll end up with a simpler solution 🤓

@dranikpg dranikpg changed the title fix(search_family): Fix perfomance for the NumericIndex::Range method feat(search): Introduce RangeTree Jul 2, 2025
@BagritsevichStepan
Copy link
Contributor Author

We are not going to merge this PR. I'm splitting it to several steps - that's why it pushed as a draft. But I will fix you comments for next PR

@BagritsevichStepan BagritsevichStepan changed the title feat(search): Introduce RangeTree feat(search): Introduce RangeTree. DRAFT Jul 3, 2025
@BagritsevichStepan
Copy link
Contributor Author

PR was splitted in several PRs

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants