-
Notifications
You must be signed in to change notification settings - Fork 1.1k
feat(search): Introduce RangeTree. DRAFT #5397
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
feat(search): Introduce RangeTree. DRAFT #5397
Conversation
Signed-off-by: Stepan Bagritsevich <stefan@dragonflydb.io>
Signed-off-by: Stepan Bagritsevich <stefan@dragonflydb.io>
Signed-off-by: Stepan Bagritsevich <stefan@dragonflydb.io>
06a4be2
to
15cc964
Compare
Signed-off-by: Stepan Bagritsevich <stefan@dragonflydb.io>
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
- We could've started without blocklist 🙂
- The tree is ok. Let's use doubles from the start
- RangeResult and ResultsMerger are too complicated. They can be implemented much simpler inside a single function. Please don't try to optimize so much stuff prematurely
using RangeNumber = long long; | ||
using Key = std::pair<RangeNumber, RangeNumber>; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
What if all values are in the range 0...1? 🔍 I suggest just to use doubles. RS treats everything as a double, so do we
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes, it can be changed
public: | ||
class ResultsMerger { | ||
private: | ||
using MiddleBlockIteratorList = absl::InlinedVector<RangeBlockIterator, 10>; | ||
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Why not replace all this with just:
auto compare = []()...
queue<pair<B::const_iterator, /* cur */ B::const_iterator /* end */>, ...> q;
// fill q
while (!q.empty()) {
auto [it, end_it] = q.top();
if (it->second >= l && it->second <= r && it->first != result.back().first)
result.push_back(*it);
if (++it != end_it)
q.emplace(it, end_it);
}
This task in it's basic form shouldn't take 200LOC
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
If we want to save unnecessary comparisons we'll add just a flag to the iterator, I don't see a reason for separating blocks into left, middle, right
as fields
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Especially GetMin(x)
shouldn't exist, we always collect the values sequentially advancing one by one, assuming x
is always result.back() + 1
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is done in this PR for the future use. RangeResult can be easily added to the IndexResult. GetMin with size_t min_index will help a lot for merging std::vector (or whatever in other IndexResult) with RangeResult. Based on the some_customer requests I see that this can improve performance a lot (we have quite different number of matches between fields for single query). Also I added a benchmarks with two numeric indexes - and benchmark with text/tag index to compare a difference in the future
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
True, yet:
- The comments above (about fields, queue design) still apply
- We don't have skiplist merges for blocklists yet. Let's start simple. Once we do, we'll just wrap it into a stateful struct
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ironically one of your most common cases, a range that lands in one or two blocks (they will be "border" blocks) is not a fast path: you filter them, copying filtered values into a temporary container, loosing your natural ("future") skiplist at the same time (bc you turn it into a vector). And you still route everything through the queue
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes, when I was writing this logic I didn’t expect that the optimal block size would be so large — so that most range requests only cover 2–3 blocks.
But it’s just a few lines to fix: just change the RangeResult constructor - we can simply store l and r in the RangeResult and not calling those Utils functions you dislike
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
You also can see that I'm using in constructor absl::InlinedVector<RangeBlockPointer, 10> blocks , which is not needed (10 is too big number)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
But it’s just a few lines to fix ... we can simply store l and r in the RangeResult
So then we don't need to handle border blocks separately, then we can simplify the queue... and we'll end up with a simpler solution 🤓
We are not going to merge this PR. I'm splitting it to several steps - that's why it pushed as a draft. But I will fix you comments for next PR |
PR was splitted in several PRs |
TODO: description