add tag search #411

Closed
wants to merge 3 commits into from

@caohassl caohassl commented Sep 5, 2022

I noticed issue #294 (Support "skipping" nodes during traversal?) and implemented support for it, extending it to tag search:

1. When building the index, a tag can be defined for each vector.
2. When the build is finished, an inverted file is added to store those tags.
3. When querying, the inverted file is used to skip nodes by tag (or ID).
4. A Python file is added for testing.
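
To make the inverted-file idea in the steps above concrete, here is a minimal sketch in plain Python. It is not the code from this PR: the names `build_inverted_file`, `make_tag_filter`, and `tags_by_id` are hypothetical, and the sketch only assumes that each vector id can carry zero or more string tags.

```python
from collections import defaultdict


def build_inverted_file(tags_by_id):
    """Map each tag to the set of vector ids that carry it."""
    inverted = defaultdict(set)
    for vec_id, tags in tags_by_id.items():
        for tag in tags:
            inverted[tag].add(vec_id)
    return dict(inverted)


def make_tag_filter(inverted, tag):
    """Return a predicate that accepts only ids carrying `tag`."""
    allowed = inverted.get(tag, set())
    return lambda vec_id: vec_id in allowed


# Hypothetical data: id -> tags assigned when the vector was added.
tags_by_id = {0: ["red"], 1: ["blue"], 2: ["red", "round"], 3: []}
inverted = build_inverted_file(tags_by_id)
is_red = make_tag_filter(inverted, "red")
```

A traversal could then call `is_red(node_id)` for each visited node and skip the node when it returns `False`. A set lookup keeps that check at constant time per node.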

@yurymalkov
Member

Hi @caohassl,

Thank you for the PR! It seems to conflict (in terms of functionality) with #402, so we would need to figure out how to merge them.
It also seems that the Windows tests are failing due to dirent.h.

@caohassl
Author

caohassl commented Sep 6, 2022

Hi @yurymalkov

Thanks for the reply!
I have read PR #402; it seems that positive ID filtering (keeping the nodes we need) is already supported.

The first scenario is that we need to discard some unnecessary nodes during the scan, not just keep the nodes we need.
The second scenario is that we also need to find the top N results that satisfy a particular tag, rather than taking top N + 1000 and filtering out 1000.

To support ID filtering (both positive and negative) and tag filtering, I committed this PR (and fixed the CI error).
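
The difference between the two scenarios can be illustrated with a small sketch. It uses an exact brute-force scan rather than the HNSW graph, so it only shows the semantics of filtering during the scan versus filtering afterwards. All function and variable names here are hypothetical.

```python
import heapq


def squared_distance(a, b):
    return sum((x - y) ** 2 for x, y in zip(a, b))


def top_n_filtered(vectors, query, n, accept):
    """Exact top-n over only the ids that `accept` allows (filter during the scan)."""
    candidates = (
        (squared_distance(v, query), i) for i, v in vectors.items() if accept(i)
    )
    return [i for _, i in heapq.nsmallest(n, candidates)]


def top_n_post_filtered(vectors, query, n, accept):
    """Take the unfiltered top-n first, then drop the rejected ids."""
    nearest = top_n_filtered(vectors, query, n, lambda i: True)
    return [i for i in nearest if accept(i)]


vectors = {0: (0.0,), 1: (1.0,), 2: (2.0,), 3: (3.0,), 4: (4.0,)}
query = (0.0,)

blocked = {0, 1}  # negative filter: ids to discard
wanted = {3, 4}   # positive filter: ids to keep

in_scan = top_n_filtered(vectors, query, 2, lambda i: i not in blocked)
after = top_n_post_filtered(vectors, query, 2, lambda i: i not in blocked)
positive = top_n_filtered(vectors, query, 2, lambda i: i in wanted)
```

Here `in_scan` still contains two results, while `after` is empty because both of the unfiltered nearest neighbours were blocked. Post-filtering therefore needs an oversized candidate list (the "top N + 1000" above) to be safe.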

@caohassl caohassl closed this Sep 9, 2022
@dyashuni
Contributor

dyashuni commented Sep 9, 2022

Hi @caohassl, why did you close the PR?

@caohassl
Author

Hi @dyashuni,

I realized that heavy filtering may cause performance problems, so I closed the PR.
I will try adding a threshold so the scan can exit early; if the performance is good, I will reopen it.
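
One possible reading of such a threshold is a budget on rejected candidates, sketched below. This is only an assumption about what the threshold could look like, not the design that was planned for this PR. The function name, the `max_rejected` parameter, and the premise that candidates arrive roughly in order of increasing distance are all hypothetical.

```python
def scan_with_rejection_budget(sorted_ids, accept, n, max_rejected):
    """Collect up to n accepted ids; stop once more than max_rejected were skipped."""
    results, rejected = [], 0
    for vec_id in sorted_ids:
        if len(results) == n:
            break
        if accept(vec_id):
            results.append(vec_id)
        else:
            rejected += 1
            if rejected > max_rejected:
                break  # give up early; may return fewer than n results
    return results, rejected


# Hypothetical filter that accepts only a small fraction of the ids.
accept = lambda i: i % 5 == 4
tight = scan_with_rejection_budget(range(10), accept, n=2, max_rejected=3)
loose = scan_with_rejection_budget(range(10), accept, n=2, max_rejected=100)
```

The trade-off is explicit: a tight budget bounds the cost of a heavily filtered query but can return fewer than `n` results, while a loose budget finds all matches at the price of scanning more rejected nodes.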

@dyashuni
Contributor

@caohassl Thank you. Yes, heavy filtering makes search slow.
I think there is no need to create a new class SearchHNSW because its code is almost the same as the code of the HierarchicalNSW class. You can use the HierarchicalNSW class with a custom implementation of FilterFunctor to avoid code duplication. If serialization is required you can add serialization/deserialization methods to your implementation of FilterFunctor as well.
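
As a rough Python analogue of this suggestion (the actual FilterFunctor interface is C++ and is not reproduced here), the sketch below keeps the tag logic and its serialization inside a single callable filter object instead of a separate index class. The class name `TagFilter`, its methods, and the JSON format are hypothetical.

```python
import json


class TagFilter:
    """Callable filter: accepts a label if it carries the wanted tag."""

    def __init__(self, tags_by_label, wanted_tag):
        # Normalize keys to int so that a JSON round trip yields the same mapping.
        self.tags_by_label = {int(k): set(v) for k, v in tags_by_label.items()}
        self.wanted_tag = wanted_tag

    def __call__(self, label):
        return self.wanted_tag in self.tags_by_label.get(label, set())

    def dumps(self):
        """Serialize the filter state to a JSON string."""
        payload = {
            "wanted_tag": self.wanted_tag,
            "tags_by_label": {
                str(k): sorted(v) for k, v in self.tags_by_label.items()
            },
        }
        return json.dumps(payload)

    @classmethod
    def loads(cls, text):
        """Rebuild a filter from the output of `dumps`."""
        payload = json.loads(text)
        return cls(payload["tags_by_label"], payload["wanted_tag"])


original = TagFilter({0: ["a"], 1: ["b"]}, "a")
restored = TagFilter.loads(original.dumps())
```

With this split, the index class stays unchanged and only sees a predicate on labels. The tag storage can evolve independently of the search code.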

@caohassl caohassl deleted the develop branch September 11, 2022 10:05