feat(search_family): Introduce merging of sorted search results from shards. FIRST PR #5381
Conversation
…shards Signed-off-by: Stepan Bagritsevich <stefan@dragonflydb.io>
(force-pushed f7a6ddf to ff598ab)
// Sort order for sorting indices and results.
enum class SortOrder { ASC, DESC };

/* Optimized comparator for ascending or descending sort.
Can you demonstrate the performance improvement here?
Especially after introducing dynamic dispatch with std::function 😆 It will be at least two times slower. I'm not sure the instantiated sort is fully optimized for the ascending parameter being immutable, but I doubt it has a serious impact.
Yes, I'm aware that std::function introduces dynamic dispatch. It was initially added based on several comments and discussions from Borys in other PRs (regarding the SORTBY option in the FT.AGGREGATE command). I still added std::function here because we can always implement our own lightweight version; there are a lot of such implementations, and I remember implementing one myself some time ago.
For this PR we can remove it, and reintroduce this change together with a proper implementation.
They optimize allocations, not dispatch cost. Why add so much code for this at all when:
- we can just use static dispatch with templates
- this is likely a premature optimization here
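The static-dispatch alternative mentioned above could be sketched roughly as follows. This is a minimal illustration, not code from the PR; `StaticComparator` and `SortByOrder` are made-up names, and only `SortOrder` comes from the diff:

```cpp
#include <algorithm>
#include <vector>

enum class SortOrder { ASC, DESC };

// Comparator resolved at compile time: no std::function, no indirect call.
template <bool Ascending, typename T>
struct StaticComparator {
  bool operator()(const T& l, const T& r) const {
    if constexpr (Ascending)
      return l < r;
    else
      return r < l;
  }
};

// Branch once on the runtime flag; each instantiated sort is fully inlined.
template <typename T>
void SortByOrder(std::vector<T>& v, SortOrder order) {
  if (order == SortOrder::ASC)
    std::sort(v.begin(), v.end(), StaticComparator<true, T>{});
  else
    std::sort(v.begin(), v.end(), StaticComparator<false, T>{});
}
```

The cost of the runtime `order` check is paid once per call rather than once per comparison, which is the point the reviewer is making.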
This PR mixes many changes, but in practice the change you suggest can be demonstrated with a much smaller PR. It does not have to work for all cases to demonstrate the value.
What this PR does not have is benchmarks, which are in fact necessary for such changes.
// TODO: use min-heap if limit is much smaller than the number of results
std::partial_sort(knn_distances_.begin(), knn_distances_.begin() + knn.limit,
                  knn_distances_.end());
knn_distances_.resize(knn.limit);
If the limit is much smaller than the size of the array, use k-min-heap selection instead of std::partial_sort. This will significantly reduce allocations and improve efficiency.
Bold claim. I'm not sure at all that it's more effective than partial sort.
Why? Note that std::partial_sort also uses a heap approach for small k (as I gather from the documentation). But here you first need to do knn_distances_.reserve(sub_results.Size()); and push the values into the array. If the limit is small and there is a large number of sub_results, the heap approach should be better. But I agree it is not a priority for the next PR.
True, we can avoid allocating the knn distances vector (though that is likely cheap compared to computing vector distances).
search_result->RearrangeAccordingToIndexes(ids_to_sort);

std::vector<SortableValue> out(size);
for (size_t i = 0; i < size; i++) {
  const auto doc_id = search_result->ids[i];
  auto& value = values_[doc_id];
  if constexpr (!std::is_arithmetic_v<T>) {
    out[i] = std::string{value};
  } else {
    out[i] = static_cast<double>(value);
  }
}
- If there was a simple way to sort two arrays at once, you wouldn't need the indices array and could keep a separate array for scores (to avoid reallocating into an array of pairs). It requires modifying the swap.
- I understand it's temporary(?), but passing SearchAlgorithmResult* is not the best interface.
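For context, the common workaround for "sorting two arrays at once" is to sort an index permutation and then apply it to both arrays. A minimal sketch with made-up names (`SortParallel`, `int` ids, `double` scores are all assumptions, not from the PR):

```cpp
#include <algorithm>
#include <numeric>
#include <vector>

// Sort `ids` by `scores` without building a vector of pairs:
// sort an index permutation by score, then apply it to both arrays.
void SortParallel(std::vector<int>& ids, std::vector<double>& scores) {
  std::vector<size_t> perm(ids.size());
  std::iota(perm.begin(), perm.end(), 0);
  std::sort(perm.begin(), perm.end(),
            [&](size_t a, size_t b) { return scores[a] < scores[b]; });

  std::vector<int> ids_out(ids.size());
  std::vector<double> scores_out(scores.size());
  for (size_t i = 0; i < perm.size(); ++i) {
    ids_out[i] = ids[perm[i]];
    scores_out[i] = scores[perm[i]];
  }
  ids = std::move(ids_out);
  scores = std::move(scores_out);
}
```

This still allocates the index array, which is exactly the reviewer's point: a truly allocation-free version needs a sort whose swap moves both arrays together (e.g., C++23 `std::ranges::sort` over `std::views::zip`).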
Yes, it's temporary. I added TODOs and also mentioned in the PR description that it will be fixed in the very next PR.
Good idea regarding the swap.
Also, take into account that now (on the current main branch) this method is not called at all 😄
search::SortOrder sort_order, search::SearchAlrgorithmResult* search_result) const {
  auto fident = field.GetIdentifier(base_->schema, false);

  if (IsSortableField(fident, base_->schema)) {
Nit: maybe let's not move logic higher than needed; doc_index itself knows whether it's sortable, so it can handle this internally, and we just pass an accessor object unconditionally.
for (size_t i = 0; i < results.size(); ++i) {
  if (!results[i].docs.empty()) {
    const auto& first_doc = results[i].docs[0];
    update_cached_value(i, first_doc, &cached_values);
nit: update_cached_value can really just be a get_value(i)
🤔
for (size_t i = 0; i < results.size(); ++i) {
  if (indexes[i] < results[i].docs.size() && (!next_doc || comparator(i, next_index))) {
    next_doc = &results[i].docs[indexes[i]];
    next_index = i;
  }
}
this is actually where you can use a heap optimization with make_heap or something akin
Yes, it is temporary; note that the number of shards is not big.
Maybe later we can change this
Let's not review this PR for now - it won't be submitted in this form as it's too big and not focused.
Fixes non-optimal sorting and limiting introduced in #4942
The problem:
In PR #4942, we moved sorting and limiting of search results to the coordinator thread to support sorting on non-sortable fields. However, this introduced performance issues: each shard returns all possible IDs without applying any local limiting, the coordinator moves everything into a single array, and only then performs a partial or full sort.
Even before that PR, for KNN search we were performing a full sort on the coordinator thread instead of merging sorted shard results.
How it works now:
This change should improve performance (though benchmark tests are still in progress), especially because limiting is now done early on the shards.
Coordinator logic is already quite optimized: no unnecessary large allocations (e.g., `offset + limit`) are made; only up to `limit` elements are allocated and processed.

However, the shard-side logic still requires improvements, which will be fixed in the second PR:

- `offset + limit` in sort indexes and non-sortable field sorting. Instead, we should use `variant<vector<DocId>, vector<pair<DocId, ResultScore>>>` in `SearchAlrgorithmResult`
- If the limit is much smaller than the size of the array, use `k-min-heap` selection instead of `std::partial_sort`. This will significantly reduce allocations and improve efficiency.
- `Take` method

There are also other improvements, but I haven't implemented them yet; I don't think it's worth optimizing this further (after the second PR).