feat(search_family): Introduce merging of sorted search results from shards. FIRST PR #5381

Conversation

@BagritsevichStepan (Contributor) commented Jun 29, 2025

Fixes non-optimal sorting and limiting introduced in #4942

The problem:
In PR #4942, we moved sorting and limiting of search results to the coordinator thread to support sorting on non-sortable fields. However, this introduced performance issues: each shard returns all matching IDs without applying any local limit, the coordinator moves everything into a single array, and only then performs a partial or full sort.
Even before that PR, KNN search performed a full sort on the coordinator thread instead of merging sorted shard results.

How it works now:

  1. Sorting and limiting are now performed on the shards in all cases: SORTBY on sortable fields, SORTBY on non-sortable fields, KNN search, and KNN search + SORTBY.
  2. The coordinator thread now performs only merging of already sorted results; sorting has been completely removed from this stage.
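The two steps above can be sketched as a k-way merge with an early limit. This is an illustrative stand-in, not the PR's code: `ShardResult`, `MergeSorted`, and the `int` doc type are hypothetical names, assuming each shard already returns its results sorted ascending.

```cpp
#include <cstddef>
#include <vector>

// Hypothetical stand-in for one shard's already-sorted result set.
struct ShardResult {
  std::vector<int> docs;  // sorted ascending by the shard
};

// Merge sorted shard results, keeping at most `limit` elements.
// The coordinator never sorts; it only picks the next smallest head.
std::vector<int> MergeSorted(const std::vector<ShardResult>& shards, size_t limit) {
  std::vector<size_t> pos(shards.size(), 0);  // cursor per shard
  std::vector<int> out;
  out.reserve(limit);  // only up to `limit` elements are ever allocated
  while (out.size() < limit) {
    const int* best = nullptr;
    size_t best_shard = 0;
    for (size_t i = 0; i < shards.size(); ++i) {
      if (pos[i] < shards[i].docs.size() &&
          (!best || shards[i].docs[pos[i]] < *best)) {
        best = &shards[i].docs[pos[i]];
        best_shard = i;
      }
    }
    if (!best) break;  // all shards exhausted
    out.push_back(*best);
    ++pos[best_shard];
  }
  return out;
}
```

Because each shard already limited locally, the coordinator touches at most `shards * limit` elements instead of every candidate ID.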

This change should improve performance (benchmark tests are still in progress), especially because limiting is now applied early on the shards.

The coordinator logic is already quite optimized: no unnecessary large allocations (e.g., of size offset + limit) are made; only up to limit elements are allocated and processed.

However, the shard-side logic still requires improvements, which will be fixed in the second PR:

  1. Remove unnecessary allocations of size offset + limit in sort indexes and in non-sortable field sorting.
    Instead, we should use variant<vector<DocId>, vector<pair<DocId, ResultScore>>> in SearchAlrgorithmResult.
  2. If the limit is much smaller than the size of the array, use k-min-heap selection instead of std::partial_sort.
    This will significantly reduce allocations and improve efficiency.
  3. Do the sorting in the search algorithm before calling the Take method.

There are also other possible improvements, but I haven't implemented them yet; I don't think it's worth optimizing this further after the second PR.
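The k-min-heap selection mentioned in item 2 is a standard technique; a minimal sketch (assumed shape, not the Dragonfly implementation) that keeps the k smallest scores using a bounded max-heap, so the full candidate array never needs to be materialized:

```cpp
#include <algorithm>
#include <cstddef>
#include <queue>
#include <vector>

// Keep the k smallest values seen so far using a max-heap of size k.
// Unlike std::partial_sort, this never requires all inputs in one array.
std::vector<double> KSmallest(const std::vector<double>& input, size_t k) {
  std::priority_queue<double> heap;  // max-heap: top() is the largest kept
  for (double v : input) {
    if (heap.size() < k) {
      heap.push(v);
    } else if (v < heap.top()) {
      heap.pop();   // evict the current worst
      heap.push(v);
    }
  }
  std::vector<double> out;
  out.reserve(heap.size());
  while (!heap.empty()) {
    out.push_back(heap.top());
    heap.pop();
  }
  std::reverse(out.begin(), out.end());  // ascending order
  return out;
}
```

This does O(n log k) comparisons and O(k) extra memory, which is the win when the limit is much smaller than the candidate count.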

…shards

Signed-off-by: Stepan Bagritsevich <stefan@dragonflydb.io>
@BagritsevichStepan force-pushed the search/speed-up-sort-limit branch from f7a6ddf to ff598ab on June 29, 2025 20:54
// Sort order for sorting indices and results.
enum class SortOrder { ASC, DESC };

/* Optimized comparator for ascending or descending sort.
Collaborator
Can you demonstrate the performance improvement here?

Contributor

Especially after introducing dynamic dispatch with std::function 😆 It will be at least two times slower. I'm not sure the instantiated sort is fully optimized for the ascending parameter being immutable, but I doubt it has a serious impact.

@BagritsevichStepan (Contributor, Author) commented Jun 30, 2025

Yes, I'm aware that std::function introduces dynamic dispatch. It was initially added based on several comments and discussions from Borys in other PRs (regarding the SORTBY option in the FT.AGGREGATE command). I still added std::function here because we can always implement our own lightweight version; there are a lot of such implementations, and I remember implementing one myself some time ago.

For this PR we can remove it, and reintroduce the change once a proper implementation is added.

Contributor

They optimize allocations, not dispatch cost. Why add so much code for this at all, when

  • we can just use static dispatch with templates
  • this is likely premature optimization here
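The static-dispatch alternative the reviewer refers to can be sketched as follows (illustrative names, not the PR's code): the sort order becomes a template parameter, so the comparator inlines into the sort and no std::function indirection is paid.

```cpp
#include <algorithm>
#include <functional>
#include <vector>

enum class SortOrder { ASC, DESC };

// The direction is baked in at compile time, so the comparator
// inlines into std::sort with no std::function call overhead.
template <SortOrder Order, typename T>
void SortByOrder(std::vector<T>* values) {
  if constexpr (Order == SortOrder::ASC) {
    std::sort(values->begin(), values->end(), std::less<T>{});
  } else {
    std::sort(values->begin(), values->end(), std::greater<T>{});
  }
}

// A single runtime branch at the call site selects the instantiation.
template <typename T>
void Sort(std::vector<T>* values, SortOrder order) {
  if (order == SortOrder::ASC)
    SortByOrder<SortOrder::ASC>(values);
  else
    SortByOrder<SortOrder::DESC>(values);
}
```

The cost of the dynamic order is one branch per call instead of one indirect call per comparison.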

@romange (Collaborator) left a comment

This PR mixes many changes, but in practice the change you suggest can be demonstrated with a much smaller PR. It does not have to work for all cases to demonstrate the value.

What this PR does not have is benchmarks, which in fact are necessary for such changes.


Comment on lines +420 to +423
// TODO: use min-heap if limit is much smaller than the number of results
std::partial_sort(knn_distances_.begin(), knn_distances_.begin() + knn.limit,
                  knn_distances_.end());
knn_distances_.resize(knn.limit);
Contributor

> If the limit is much smaller than the size of the array, use k-min-heap selection instead of std::partial_sort.
> This will significantly reduce allocations and improve efficiency.

Bold claim. I'm not at all sure it's more effective than partial sort.

@BagritsevichStepan (Contributor, Author) commented Jun 30, 2025

Why? Note that std::partial_sort also uses a heap-based approach for small k (as I understand from the documentation). But here you first need to do knn_distances_.reserve(sub_results.Size()); and push the values into the array. If the limit is small and the number of sub_results is large, the heap approach should be better. But I agree that it is not a priority for the next PR.

@dranikpg (Contributor) commented Jul 1, 2025

True, we can avoid allocating the knn distances vector (but it is likely cheap compared to computing vector distances).

Comment on lines +82 to +93
search_result->RearrangeAccordingToIndexes(ids_to_sort);

std::vector<SortableValue> out(size);
for (size_t i = 0; i < size; i++) {
  const auto doc_id = search_result->ids[i];
  auto& value = values_[doc_id];
  if constexpr (!std::is_arithmetic_v<T>) {
    out[i] = std::string{value};
  } else {
    out[i] = static_cast<double>(value);
  }
}
Contributor

  1. If there were a simple way to sort two arrays at once, you wouldn't need the indices array and could keep a separate array for scores (to avoid reallocating into an array of pairs). This requires modifying the swap.
  2. I understand it's temporary(?), but passing SearchAlgorithmResult* is not the best interface.
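The "modified swap" idea from point 1 can be sketched with a hand-rolled sort that keeps ids and scores in separate parallel arrays and swaps both on every exchange. This is a hypothetical illustration (names and `int`/`double` element types assumed), not the PR's SearchAlgorithmResult code:

```cpp
#include <utility>
#include <vector>

// Quicksort over two parallel arrays, ordered by `scores`. Every exchange
// swaps both arrays, so no array-of-pairs reallocation is needed.
void SortTogether(std::vector<int>& ids, std::vector<double>& scores,
                  long lo, long hi) {
  if (lo >= hi) return;
  double pivot = scores[lo + (hi - lo) / 2];
  long i = lo, j = hi;
  while (i <= j) {
    while (scores[i] < pivot) ++i;
    while (scores[j] > pivot) --j;
    if (i <= j) {
      std::swap(scores[i], scores[j]);
      std::swap(ids[i], ids[j]);  // the "modified swap": move the pair as a unit
      ++i;
      --j;
    }
  }
  SortTogether(ids, scores, lo, j);
  SortTogether(ids, scores, i, hi);
}

void SortTogether(std::vector<int>& ids, std::vector<double>& scores) {
  SortTogether(ids, scores, 0, static_cast<long>(ids.size()) - 1);
}
```

In real code one would reuse the library sort with a zip-style iterator instead of hand-rolling the partition, but the sketch shows why only the swap needs to change.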

Contributor Author

Yes, it's temporary. I added TODOs and also mentioned in the PR description that it will be fixed in the very next PR.

Good idea regarding the swap.

@BagritsevichStepan (Contributor, Author) commented Jun 30, 2025

Also, take into account that now (on the current main branch) this method is not called at all 😄

search::SortOrder sort_order, search::SearchAlrgorithmResult* search_result) const {
  auto fident = field.GetIdentifier(base_->schema, false);

  if (IsSortableField(fident, base_->schema)) {
Contributor

Nit: maybe let's not move logic higher than needed; doc_index itself knows whether a field is sortable, so it can handle this internally, and we just pass an accessor object unconditionally.

for (size_t i = 0; i < results.size(); ++i) {
  if (!results[i].docs.empty()) {
    const auto& first_doc = results[i].docs[0];
    update_cached_value(i, first_doc, &cached_values);
Contributor

nit: update_cached_value can really just be a get_value(i) 🤔

Comment on lines +619 to +624
for (size_t i = 0; i < results.size(); ++i) {
  if (indexes[i] < results[i].docs.size() && (!next_doc || comparator(i, next_index))) {
    next_doc = &results[i].docs[indexes[i]];
    next_index = i;
  }
}
Contributor

this is actually where you can use a heap optimization with make_heap or something akin
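A make_heap-based version of that merge loop could look like this sketch (the `Cursor` type and function names are assumptions, not the PR's code): the heap holds one cursor per shard, so choosing the next element costs O(log shards) instead of a linear scan.

```cpp
#include <algorithm>
#include <cstddef>
#include <vector>

// One cursor per shard result; the heap orders cursors by their head value.
struct Cursor {
  const std::vector<int>* docs;
  size_t pos = 0;
  int Head() const { return (*docs)[pos]; }
};

// Heap-based k-way merge: O(total * log(num_shards)) comparisons
// instead of scanning every shard for each emitted element.
std::vector<int> MergeWithHeap(const std::vector<std::vector<int>>& shards,
                               size_t limit) {
  auto greater = [](const Cursor& a, const Cursor& b) { return a.Head() > b.Head(); };
  std::vector<Cursor> heap;
  for (const auto& docs : shards)
    if (!docs.empty()) heap.push_back(Cursor{&docs, 0});
  std::make_heap(heap.begin(), heap.end(), greater);  // min-heap via `greater`

  std::vector<int> out;
  out.reserve(limit);
  while (out.size() < limit && !heap.empty()) {
    std::pop_heap(heap.begin(), heap.end(), greater);  // smallest head to back
    Cursor cur = heap.back();
    heap.pop_back();
    out.push_back(cur.Head());
    if (++cur.pos < cur.docs->size()) {  // shard not exhausted: reinsert
      heap.push_back(cur);
      std::push_heap(heap.begin(), heap.end(), greater);
    }
  }
  return out;
}
```

As noted in the replies below, with a small shard count the linear scan is likely fine, so this is an optional refinement.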

Contributor Author

Yes, it is temporary; note that the number of shards is not big.

Contributor Author

Maybe later we can change this

@romange (Collaborator) commented Jul 1, 2025

Let's not review this PR for now - it won't be submitted in this form as it's too big and not focused.
