Add this suggestion to a batch that can be applied as a single commit.
This suggestion is invalid because no changes were made to the code.
Suggestions cannot be applied while the pull request is closed.
Suggestions cannot be applied while viewing a subset of changes.
Only one suggestion per line can be applied in a batch.
Add this suggestion to a batch that can be applied as a single commit.
Applying suggestions on deleted lines is not supported.
You must change the existing code in this line in order to create a valid suggestion.
Outdated suggestions cannot be applied.
This suggestion has been applied or marked resolved.
Suggestions cannot be applied from pending reviews.
Suggestions cannot be applied on multi-line comments.
Suggestions cannot be applied while the pull request is queued to merge.
Suggestion cannot be applied right now. Please check back later.
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Add approximate filters #607
Add approximate filters #607
Changes from all commits
b56713b
File filter
Filter by extension
Conversations
Jump to
There are no files selected for viewing
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This change is technically not related to this PR, but the previous
if
statement throws a ConfigurationError ifdefault == 0
. All unit tests pass in both configurations.There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'd rather use a Python utility here but I couldn't find one in the standard library. Any suggestions?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Im not aware of anything. Also, given that the list may not be sorted, it might be harder to do. That being said, I think this is good.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This logic is necessary when the dataset is first generated and then neighbors are filtered out. This occurs in some of the perf tools datasets like here. These datasets first generate true neighbors for the test vector, such as
[5, 2, 3]
, and then post-filter the neighbors with attributes that do not pass the filter. The documents without the attributes are replaced by-1
. However looking more closely at the dataset generation code I'm not sure that all of the-1
entries are at the end of the array. @martin-gaievski I see you wrote the filtering code, could you please clarify? ThanksThere was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
they should be at the end of collection of nearest neighbors for each query, although I'm not clearly remember sorting them. Can you share example where -1 are in the row, but they are mixed with some other ids?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I didn't see any in testing, but wanted to ask since I couldn't find a sort. I took another look at the code for filtering neighbors and it looks like the initial array with all
-1
entries is overwritten from left to right, so any remaining-1
ids are at the end. I don't think we'll use these filter datasets anyways (since Heemin's nested fields script generatesk
true neighbors after filtering instead of generatingk
true neighbors and then post-filtering to have<k
neighbors), but I'll call out the assumption that all-1
are at the end in a comment.