-
Notifications
You must be signed in to change notification settings - Fork 711
Insufficient results may occur(has marked_deleted node) #344
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Insufficient results may occur(has marked_deleted node) #344
Conversation
If the top_candidate is insufficient, the problem can be supplemented by allowing the cadidate_set to search further when there is data.
Thanks! Will test the PR shortly. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I've test the PR. Thanks!
it does solve the problem (I've added a test), but causes a performance degradation.
Some form of performance degradation is unavoidable with deleted elements, but I the practice it to avoid it when a user does not uses deletes. To check that there is a preprocessor flag has_deletions which allows avoiding executing the code for deletes when there are none.
I've put on what to change in the comment.
Please add this, or alternatively, I can merge it myself (and credit you at the next release). Please let me know.
hnswlib/hnswalg.h
Outdated
@@ -271,7 +271,7 @@ namespace hnswlib { | |||
|
|||
std::pair<dist_t, tableint> current_node_pair = candidate_set.top(); | |||
|
|||
if ((-current_node_pair.first) > lowerBound) { | |||
if ((-current_node_pair.first) > lowerBound && top_candidates.size() == ef) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think it should be
if ((-current_node_pair.first) > lowerBound && (top_candidates.size() == ef || has_deletions == false))
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I added it :) Thank you for your kind explanation.
(To check that there is a preprocessor flag has_deletions which allows avoiding executing the code for deletes when there are none. )
Thanks for the PR! |
Very similar in nature to nmslib#344
Very similar in nature to nmslib#344
Very similar in nature to #344
In the case of marked_deleted, Insufficient results may occur(edge case)
problem
If top_candidate is insufficient, additional search is performed for the Candidate_set.
When the experiment was conducted with 50% of deletion masking, about 3 out of 10,000 cases had insufficient results. (eg. 100 nodes have been requested, but 1 is returned.)
If all of the neighbors of the entry point are deleted nodes, the candidate_set contains data to be searched, but the lower_bound is not modified by the top_candidate, so there is a problem of stopping without further searching.
[Examples of cases that occurrence]
e.g Entry point is a normal node.

If all neighboring entries at the entry point are masked deletion nodes, the Candidate_set tour ends at the lower_bound.
e.g If there were no delete masked nodes, the Candidate_set would have been additionally toured.

solve
hnswlib/hnswlib/hnswalg.h
Lines 272 to 276 in ac43973
thanks :)