Skip to content

fix(search_family): Fix SORTBY option in FT.SEARCH for non-sortable fields and KNN search #4942

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Conversation

BagritsevichStepan
Copy link
Contributor

@BagritsevichStepan BagritsevichStepan commented Apr 15, 2025

fixes #4939, #4934

Fixes two issues. They are good described in the tickets above.

…ields and KNN search

fixes dragonflydb#4939, dragonflydb#4934

Signed-off-by: Stepan Bagritsevich <stefan@dragonflydb.io>
Signed-off-by: Stepan Bagritsevich <stefan@dragonflydb.io>
@vyavdoshenko
Copy link
Contributor

lgtm

@romange
Copy link
Collaborator

romange commented Apr 15, 2025

it's quite a big change, while it's clear what it fixes, it's not clear what was missing or faulty and how it was fixed.
I see lots of code was removed, so I am curious what happens in this PR.

@@ -50,6 +50,68 @@ void TraverseAllMatching(const DocIndex& index, const OpArgs& op_args, F&& f) {
} while (cursor);
}

bool IsSortableField(std::string_view field_identifier, const search::Schema& schema) {
Copy link
Contributor Author

@BagritsevichStepan BagritsevichStepan Apr 15, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@romange All the code here was simply moved. Previously, we had non-class functions defined in a non-anonymous namespace, so I moved them into an anonymous namespace.

using SortIndiciesFieldsList =
std::vector<std::pair<string_view /*identifier*/, string_view /*alias*/>>;

std::pair<SearchFieldsList, SortIndiciesFieldsList> PreprocessAggregateFields(
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This was moved. But I can revert this

if (params.sort_option) {
auto field_alias = params.sort_option->field.NameView();

auto comparator = [&](const SerializedSearchDoc* l_doc, const SerializedSearchDoc* r_doc) {
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

All other updates are related to the issue with sorting the non-sortable fields. Maybe I can remove changes to the SearchField and use just string for SortParams. But it is not a big change

@BagritsevichStepan
Copy link
Contributor Author

Here's what's happening: previously, sorting was performed inside the search logic itself, using a AST node — AstSortNode. This meant that sorting happened locally on each shard. And this will not work as needed.

Actually, since we are sorting only inside shards I think we can find a test case where the final result is not globally sorted.

I've now moved the sorting logic out of the shards into the coordinator thread — similar to how it's handled in the FT.AGGREGATE command

@romange
Copy link
Collaborator

romange commented Apr 15, 2025

For future, it is best to separate it into two PRs : the one that moves the code and does not change functionality and the one that changes functionality. otherwise it's a nightmare for a reviewer to review the code.

@romange
Copy link
Collaborator

romange commented Apr 15, 2025

comment: while I understand the bug, it is possible that sorting locally on each shard might still be beneficial in the future - because then you can merge-sort already sorted results coming from each shard in more efficient manner in O(logK*N)

@BagritsevichStepan
Copy link
Contributor Author

comment: while I understand the bug, it is possible that sorting locally on each shard might still be beneficial in the future - because then you can merge-sort already sorted results coming from each shard in more efficient manner in O(logK*N)

The issue here is that we also support sorting for KNN search. In this case, we first perform KNN, then apply LIMIT and OFFSET (i.e., take a subarray of the sorted KNN results), and then sort that subarray by a field value.

So, the possible solution here is to have separate logic for knn results and sort them in the coordinator thread. For basic sort we can still sort in the shard.
However, I don't think it's worth including this in 1.29. It would be better to create a new ticket and implement it properly in a separate PR.

Comment on lines +586 to +587
auto l_it = l.find(field_alias);
auto r_it = r.find(field_alias);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

looks really slow

Copy link
Contributor Author

@BagritsevichStepan BagritsevichStepan Apr 17, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I will fix it for the next version.
And it is not very slow. Since we will need to allocate more memory to fix this

@BagritsevichStepan BagritsevichStepan merged commit c81d990 into dragonflydb:main Apr 17, 2025
10 checks passed
@BagritsevichStepan BagritsevichStepan deleted the search/fix-sorting-for-nonsortable-fields branch April 17, 2025 11:43
kostasrim pushed a commit that referenced this pull request Apr 22, 2025
…ields and KNN search (#4942)

* fix(search_family): Fix SORTBY option in FT.SEARCH for non-sortable fields and KNN search

fixes #4939, #4934

Signed-off-by: Stepan Bagritsevich <stefan@dragonflydb.io>

* fix compilation error

Signed-off-by: Stepan Bagritsevich <stefan@dragonflydb.io>

---------

Signed-off-by: Stepan Bagritsevich <stefan@dragonflydb.io>
Co-authored-by: Roman Gershman <roman@dragonflydb.io>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

SORTBY is ignored when used with KNN in FT.SEARCH
4 participants