Skip to content

Insufficient results may occur(has marked_deleted node) #344

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged

Conversation

dorosy-yeong
Copy link
Contributor

In the case of marked_deleted, Insufficient results may occur(edge case)

problem

  • If top_candidate is insufficient, additional search is performed for the Candidate_set.

  • When the experiment was conducted with 50% of deletion masking, about 3 out of 10,000 cases had insufficient results. (eg. 100 nodes have been requested, but 1 is returned.)

  • If all of the neighbors of the entry point are deleted nodes, the candidate_set contains data to be searched, but the lower_bound is not modified by the top_candidate, so there is a problem of stopping without further searching.

[Examples of cases that occurrence]

e.g Entry point is a normal node.
If all neighboring entries at the entry point are masked deletion nodes, the Candidate_set tour ends at the lower_bound.
image1

e.g If there were no delete masked nodes, the Candidate_set would have been additionally toured.
image2

solve

  • If the top_candidate is insufficient, the problem can be supplemented by allowing the cadidate_set to search further when there is data.

hnswlib/hnswlib/hnswalg.h

Lines 272 to 276 in ac43973

std::pair<dist_t, tableint> current_node_pair = candidate_set.top();
if ((-current_node_pair.first) > lowerBound) {
break;
}

                std::pair<dist_t, tableint> current_node_pair = candidate_set.top();

                // TO-BE: I suggest
                if ((-current_node_pair.first) > lowerBound and top_candidates.size() == ef) {  
                //if ((-current_node_pair.first) > lowerBound) { // AS-IS 
                    break;
                }

thanks :)

If the top_candidate is insufficient, the problem can be supplemented by
allowing the cadidate_set to search further when there is data.
@yurymalkov
Copy link
Member

Thanks! Will test the PR shortly.

Copy link
Member

@yurymalkov yurymalkov left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I've test the PR. Thanks!
it does solve the problem (I've added a test), but causes a performance degradation.
Some form of performance degradation is unavoidable with deleted elements, but I the practice it to avoid it when a user does not uses deletes. To check that there is a preprocessor flag has_deletions which allows avoiding executing the code for deletes when there are none.
I've put on what to change in the comment.
Please add this, or alternatively, I can merge it myself (and credit you at the next release). Please let me know.

@@ -271,7 +271,7 @@ namespace hnswlib {

std::pair<dist_t, tableint> current_node_pair = candidate_set.top();

if ((-current_node_pair.first) > lowerBound) {
if ((-current_node_pair.first) > lowerBound && top_candidates.size() == ef) {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think it should be

if ((-current_node_pair.first) > lowerBound && (top_candidates.size() == ef || has_deletions == false))

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I added it :) Thank you for your kind explanation.

(To check that there is a preprocessor flag has_deletions which allows avoiding executing the code for deletes
when there are none. )
@yurymalkov yurymalkov merged commit 67a7a1b into nmslib:develop Nov 8, 2021
@yurymalkov
Copy link
Member

Thanks for the PR!

kishorenc added a commit to typesense/hnswlib that referenced this pull request Nov 2, 2022
kishorenc added a commit to typesense/hnswlib that referenced this pull request Jan 12, 2023
dyashuni pushed a commit that referenced this pull request Jan 14, 2023
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants