Disable sdc table for HNSWPQ read-only indices #1518
Merged
jmazanec15 added the Bug Fixes (changes to a system or product designed to handle a programming bug/glitch) and backport 2.x labels on Mar 7, 2024
jmazanec15 requested review from heemin32, navneet1v, VijayanB, vamshin, naveentatikonda, junqiu-lei, martin-gaievski and ryanbogan as code owners on March 7, 2024 00:15
Passes a flag to disable the sdc table for HNSWPQ indices. This table is only used by HNSWPQ during graph creation, to compare nodes already present in the graph. When we call load index, the graph is read-only. Hence, we won't be doing any ingestion, and the table can be disabled to save some memory.
Along with this, added a unit test and a couple of test helper methods for generating random data.
Signed-off-by: John Mazanec <jmazane@amazon.com>
Codecov Report
All modified and coverable lines are covered by tests ✅
Additional details and impacted files
@@ Coverage Diff @@
## main #1518 +/- ##
============================================
- Coverage 85.11% 85.09% -0.02%
Complexity 1281 1281
============================================
Files 168 168
Lines 5232 5232
Branches 495 495
============================================
- Hits 4453 4452 -1
- Misses 572 573 +1
Partials 207 207
☔ View full report in Codecov by Sentry.
ryanbogan approved these changes on Mar 7, 2024
LGTM!
junqiu-lei approved these changes on Mar 7, 2024
opensearch-trigger-bot bot pushed a commit that referenced this pull request on Mar 7, 2024:
Passes a flag to disable the sdc table for HNSWPQ indices. This table is only used by HNSWPQ during graph creation, to compare nodes already present in the graph. When we call load index, the graph is read-only. Hence, we won't be doing any ingestion, and the table can be disabled to save some memory.
Along with this, added a unit test and a couple of test helper methods for generating random data.
Signed-off-by: John Mazanec <jmazane@amazon.com>
(cherry picked from commit c9262f5)
junqiu-lei pushed a commit to junqiu-lei/k-NN that referenced this pull request on Mar 7, 2024:
Passes a flag to disable the sdc table for HNSWPQ indices. This table is only used by HNSWPQ during graph creation, to compare nodes already present in the graph. When we call load index, the graph is read-only. Hence, we won't be doing any ingestion, and the table can be disabled to save some memory.
Along with this, added a unit test and a couple of test helper methods for generating random data.
Signed-off-by: John Mazanec <jmazane@amazon.com>
(cherry picked from commit c9262f5)
junqiu-lei added a commit that referenced this pull request on Mar 7, 2024:
* Manually install zlib for win CI (#1513) Signed-off-by: John Mazanec <jmazane@amazon.com> (cherry picked from commit 231ad93)
* Upgrade faiss to 12b92e9 (#1509) Upgrades faiss to facebookresearch/faiss@12b92e9. Cleanup outdated patches. Signed-off-by: John Mazanec <jmazane@amazon.com> (cherry picked from commit 1303182)
* Disable sdc table for HNSWPQ read-only indices (#1518)
Passes a flag to disable the sdc table for HNSWPQ indices. This table is only used by HNSWPQ during graph creation, to compare nodes already present in the graph. When we call load index, the graph is read-only. Hence, we won't be doing any ingestion, and the table can be disabled to save some memory.
Along with this, added a unit test and a couple of test helper methods for generating random data.
Signed-off-by: John Mazanec <jmazane@amazon.com>
(cherry picked from commit c9262f5)
---------
Co-authored-by: John Mazanec <jmazane@amazon.com>
junqiu-lei added a commit that referenced this pull request on Mar 12, 2024:
* Optimize Faiss Query With Filters: Reduce iteration and memory for id filter (#1402)
* Optimize Faiss Query With Filters. Reduce iteration copy for docid set iterator Signed-off-by: luyuncheng <luyuncheng@bytedance.com>
* Optimize Faiss Query With Filters. Reduce iteration copy for docid set iterator. Use Bitmap And Batch to do id filter. and you sparse or fixed bitset do exact ANN search Signed-off-by: luyuncheng <luyuncheng@bytedance.com>
* Using int64_t instead of long type for GetLongArrayElements Signed-off-by: luyuncheng <luyuncheng@bytedance.com>
* Add IDSelectorJlongBitmap Signed-off-by: luyuncheng <luyuncheng@bytedance.com>
* 1. Add IDSelectorJlongBitmap and UT for it 2. Move FilterIdsSelectorType to a util class Signed-off-by: luyuncheng <luyuncheng@bytedance.com>
* 1. Add IDSelectorJlongBitmap and UT for it 2. Move FilterIdsSelectorType to a util class 3. Spotless apply Signed-off-by: luyuncheng <luyuncheng@bytedance.com>
* Rebase remote-tracking branch 'origin/main' into Filter Signed-off-by: luyuncheng <luyuncheng@bytedance.com>
* tidy Signed-off-by: luyuncheng <luyuncheng@bytedance.com>
* Add Changelog Signed-off-by: luyuncheng <luyuncheng@bytedance.com>
* fix javadoc tasks Signed-off-by: luyuncheng <luyuncheng@bytedance.com>
* fix bwc javadoc Signed-off-by: luyuncheng <luyuncheng@bytedance.com>
* UpdatedFilterIdsSelector Signed-off-by: luyuncheng <luyuncheng@bytedance.com>
* UpdatedFilterIdsSelector Signed-off-by: luyuncheng <luyuncheng@bytedance.com>
* Rebase faiss_wrapper.cpp Signed-off-by: luyuncheng <luyuncheng@bytedance.com>
* UpdatedFilterIdsSelector For description Select different FilterIdsSelectorType Signed-off-by: luyuncheng <luyuncheng@bytedance.com>
* UpdatedFilterIdsSelector For description Select different FilterIdsSelectorType Signed-off-by: luyuncheng <luyuncheng@bytedance.com>
* UpdatedFilterIdsSelector as Byte.SIZE Signed-off-by: luyuncheng <luyuncheng@bytedance.com>
* UpdatedFilterIdsSelector For comments Signed-off-by: luyuncheng <luyuncheng@bytedance.com>
--------- Signed-off-by: luyuncheng <luyuncheng@bytedance.com>
* Increment 2.12.0-SNAPSHOT to 2.13.0-SNAPSHOT in BWC workflow (#1505) Signed-off-by: Varun Jain <varunudr@amazon.com>
* Manually install zlib for win CI (#1513) Signed-off-by: John Mazanec <jmazane@amazon.com>
* Upgrade faiss to 12b92e9 (#1509) Upgrades faiss to facebookresearch/faiss@12b92e9. Cleanup outdated patches. Signed-off-by: John Mazanec <jmazane@amazon.com>
* Disable sdc table for HNSWPQ read-only indices (#1518)
Passes a flag to disable the sdc table for HNSWPQ indices. This table is only used by HNSWPQ during graph creation, to compare nodes already present in the graph. When we call load index, the graph is read-only. Hence, we won't be doing any ingestion, and the table can be disabled to save some memory.
Along with this, added a unit test and a couple of test helper methods for generating random data.
Signed-off-by: John Mazanec <jmazane@amazon.com>
* Support distance type radius search for Lucene engine Signed-off-by: Junqiu Lei <junqiu@amazon.com>
* Resolve feedback Signed-off-by: Junqiu Lei <junqiu@amazon.com>
* Resolve feedback Signed-off-by: Junqiu Lei <junqiu@amazon.com>
* Resolve comments Signed-off-by: Junqiu Lei <junqiu@amazon.com>
* Resolve comments Signed-off-by: Junqiu Lei <junqiu@amazon.com>
* Add RNNQueryFactory class Signed-off-by: Junqiu Lei <junqiu@amazon.com>
* Add javadoc Signed-off-by: Junqiu Lei <junqiu@amazon.com>
* Resolve feedback Signed-off-by: Junqiu Lei <junqiu@amazon.com>
* Resolve feedback Signed-off-by: Junqiu Lei <junqiu@amazon.com>
* Resolve feedback Signed-off-by: Junqiu Lei <junqiu@amazon.com>
---------
Signed-off-by: luyuncheng <luyuncheng@bytedance.com>
Signed-off-by: Varun Jain <varunudr@amazon.com>
Signed-off-by: John Mazanec <jmazane@amazon.com>
Signed-off-by: Junqiu Lei <junqiu@amazon.com>
Co-authored-by: luyuncheng <luyuncheng@bytedance.com>
Co-authored-by: Varun Jain <varunudr@amazon.com>
Co-authored-by: John Mazanec <jmazane@amazon.com>
junqiu-lei added a commit to junqiu-lei/k-NN that referenced this pull request on Mar 15, 2024:
…ject#1498)
* Optimize Faiss Query With Filters: Reduce iteration and memory for id filter (opensearch-project#1402)
* Optimize Faiss Query With Filters. Reduce iteration copy for docid set iterator Signed-off-by: luyuncheng <luyuncheng@bytedance.com>
* Optimize Faiss Query With Filters. Reduce iteration copy for docid set iterator. Use Bitmap And Batch to do id filter. and you sparse or fixed bitset do exact ANN search Signed-off-by: luyuncheng <luyuncheng@bytedance.com>
* Using int64_t instead of long type for GetLongArrayElements Signed-off-by: luyuncheng <luyuncheng@bytedance.com>
* Add IDSelectorJlongBitmap Signed-off-by: luyuncheng <luyuncheng@bytedance.com>
* 1. Add IDSelectorJlongBitmap and UT for it 2. Move FilterIdsSelectorType to a util class Signed-off-by: luyuncheng <luyuncheng@bytedance.com>
* 1. Add IDSelectorJlongBitmap and UT for it 2. Move FilterIdsSelectorType to a util class 3. Spotless apply Signed-off-by: luyuncheng <luyuncheng@bytedance.com>
* Rebase remote-tracking branch 'origin/main' into Filter Signed-off-by: luyuncheng <luyuncheng@bytedance.com>
* tidy Signed-off-by: luyuncheng <luyuncheng@bytedance.com>
* Add Changelog Signed-off-by: luyuncheng <luyuncheng@bytedance.com>
* fix javadoc tasks Signed-off-by: luyuncheng <luyuncheng@bytedance.com>
* fix bwc javadoc Signed-off-by: luyuncheng <luyuncheng@bytedance.com>
* UpdatedFilterIdsSelector Signed-off-by: luyuncheng <luyuncheng@bytedance.com>
* UpdatedFilterIdsSelector Signed-off-by: luyuncheng <luyuncheng@bytedance.com>
* Rebase faiss_wrapper.cpp Signed-off-by: luyuncheng <luyuncheng@bytedance.com>
* UpdatedFilterIdsSelector For description Select different FilterIdsSelectorType Signed-off-by: luyuncheng <luyuncheng@bytedance.com>
* UpdatedFilterIdsSelector For description Select different FilterIdsSelectorType Signed-off-by: luyuncheng <luyuncheng@bytedance.com>
* UpdatedFilterIdsSelector as Byte.SIZE Signed-off-by: luyuncheng <luyuncheng@bytedance.com>
* UpdatedFilterIdsSelector For comments Signed-off-by: luyuncheng <luyuncheng@bytedance.com>
--------- Signed-off-by: luyuncheng <luyuncheng@bytedance.com>
* Increment 2.12.0-SNAPSHOT to 2.13.0-SNAPSHOT in BWC workflow (opensearch-project#1505) Signed-off-by: Varun Jain <varunudr@amazon.com>
* Manually install zlib for win CI (opensearch-project#1513) Signed-off-by: John Mazanec <jmazane@amazon.com>
* Upgrade faiss to 12b92e9 (opensearch-project#1509) Upgrades faiss to facebookresearch/faiss@12b92e9. Cleanup outdated patches. Signed-off-by: John Mazanec <jmazane@amazon.com>
* Disable sdc table for HNSWPQ read-only indices (opensearch-project#1518)
Passes a flag to disable the sdc table for HNSWPQ indices. This table is only used by HNSWPQ during graph creation, to compare nodes already present in the graph. When we call load index, the graph is read-only. Hence, we won't be doing any ingestion, and the table can be disabled to save some memory.
Along with this, added a unit test and a couple of test helper methods for generating random data.
Signed-off-by: John Mazanec <jmazane@amazon.com>
* Support distance type radius search for Lucene engine Signed-off-by: Junqiu Lei <junqiu@amazon.com>
* Resolve feedback Signed-off-by: Junqiu Lei <junqiu@amazon.com>
* Resolve feedback Signed-off-by: Junqiu Lei <junqiu@amazon.com>
* Resolve comments Signed-off-by: Junqiu Lei <junqiu@amazon.com>
* Resolve comments Signed-off-by: Junqiu Lei <junqiu@amazon.com>
* Add RNNQueryFactory class Signed-off-by: Junqiu Lei <junqiu@amazon.com>
* Add javadoc Signed-off-by: Junqiu Lei <junqiu@amazon.com>
* Resolve feedback Signed-off-by: Junqiu Lei <junqiu@amazon.com>
* Resolve feedback Signed-off-by: Junqiu Lei <junqiu@amazon.com>
* Resolve feedback Signed-off-by: Junqiu Lei <junqiu@amazon.com>
---------
Signed-off-by: luyuncheng <luyuncheng@bytedance.com>
Signed-off-by: Varun Jain <varunudr@amazon.com>
Signed-off-by: John Mazanec <jmazane@amazon.com>
Signed-off-by: Junqiu Lei <junqiu@amazon.com>
Co-authored-by: luyuncheng <luyuncheng@bytedance.com>
Co-authored-by: Varun Jain <varunudr@amazon.com>
Co-authored-by: John Mazanec <jmazane@amazon.com>
junqiu-lei added a commit to junqiu-lei/k-NN that referenced this pull request on Mar 19, 2024:
…ject#1498)
* Optimize Faiss Query With Filters: Reduce iteration and memory for id filter (opensearch-project#1402)
* Optimize Faiss Query With Filters. Reduce iteration copy for docid set iterator Signed-off-by: luyuncheng <luyuncheng@bytedance.com>
* Optimize Faiss Query With Filters. Reduce iteration copy for docid set iterator. Use Bitmap And Batch to do id filter. and you sparse or fixed bitset do exact ANN search Signed-off-by: luyuncheng <luyuncheng@bytedance.com>
* Using int64_t instead of long type for GetLongArrayElements Signed-off-by: luyuncheng <luyuncheng@bytedance.com>
* Add IDSelectorJlongBitmap Signed-off-by: luyuncheng <luyuncheng@bytedance.com>
* 1. Add IDSelectorJlongBitmap and UT for it 2. Move FilterIdsSelectorType to a util class Signed-off-by: luyuncheng <luyuncheng@bytedance.com>
* 1. Add IDSelectorJlongBitmap and UT for it 2. Move FilterIdsSelectorType to a util class 3. Spotless apply Signed-off-by: luyuncheng <luyuncheng@bytedance.com>
* Rebase remote-tracking branch 'origin/main' into Filter Signed-off-by: luyuncheng <luyuncheng@bytedance.com>
* tidy Signed-off-by: luyuncheng <luyuncheng@bytedance.com>
* Add Changelog Signed-off-by: luyuncheng <luyuncheng@bytedance.com>
* fix javadoc tasks Signed-off-by: luyuncheng <luyuncheng@bytedance.com>
* fix bwc javadoc Signed-off-by: luyuncheng <luyuncheng@bytedance.com>
* UpdatedFilterIdsSelector Signed-off-by: luyuncheng <luyuncheng@bytedance.com>
* UpdatedFilterIdsSelector Signed-off-by: luyuncheng <luyuncheng@bytedance.com>
* Rebase faiss_wrapper.cpp Signed-off-by: luyuncheng <luyuncheng@bytedance.com>
* UpdatedFilterIdsSelector For description Select different FilterIdsSelectorType Signed-off-by: luyuncheng <luyuncheng@bytedance.com>
* UpdatedFilterIdsSelector For description Select different FilterIdsSelectorType Signed-off-by: luyuncheng <luyuncheng@bytedance.com>
* UpdatedFilterIdsSelector as Byte.SIZE Signed-off-by: luyuncheng <luyuncheng@bytedance.com>
* UpdatedFilterIdsSelector For comments Signed-off-by: luyuncheng <luyuncheng@bytedance.com>
--------- Signed-off-by: luyuncheng <luyuncheng@bytedance.com>
* Increment 2.12.0-SNAPSHOT to 2.13.0-SNAPSHOT in BWC workflow (opensearch-project#1505) Signed-off-by: Varun Jain <varunudr@amazon.com>
* Manually install zlib for win CI (opensearch-project#1513) Signed-off-by: John Mazanec <jmazane@amazon.com>
* Upgrade faiss to 12b92e9 (opensearch-project#1509) Upgrades faiss to facebookresearch/faiss@12b92e9. Cleanup outdated patches. Signed-off-by: John Mazanec <jmazane@amazon.com>
* Disable sdc table for HNSWPQ read-only indices (opensearch-project#1518)
Passes a flag to disable the sdc table for HNSWPQ indices. This table is only used by HNSWPQ during graph creation, to compare nodes already present in the graph. When we call load index, the graph is read-only. Hence, we won't be doing any ingestion, and the table can be disabled to save some memory.
Along with this, added a unit test and a couple of test helper methods for generating random data.
Signed-off-by: John Mazanec <jmazane@amazon.com>
* Support distance type radius search for Lucene engine Signed-off-by: Junqiu Lei <junqiu@amazon.com>
* Resolve feedback Signed-off-by: Junqiu Lei <junqiu@amazon.com>
* Resolve feedback Signed-off-by: Junqiu Lei <junqiu@amazon.com>
* Resolve comments Signed-off-by: Junqiu Lei <junqiu@amazon.com>
* Resolve comments Signed-off-by: Junqiu Lei <junqiu@amazon.com>
* Add RNNQueryFactory class Signed-off-by: Junqiu Lei <junqiu@amazon.com>
* Add javadoc Signed-off-by: Junqiu Lei <junqiu@amazon.com>
* Resolve feedback Signed-off-by: Junqiu Lei <junqiu@amazon.com>
* Resolve feedback Signed-off-by: Junqiu Lei <junqiu@amazon.com>
* Resolve feedback Signed-off-by: Junqiu Lei <junqiu@amazon.com>
---------
Signed-off-by: luyuncheng <luyuncheng@bytedance.com>
Signed-off-by: Varun Jain <varunudr@amazon.com>
Signed-off-by: John Mazanec <jmazane@amazon.com>
Signed-off-by: Junqiu Lei <junqiu@amazon.com>
Co-authored-by: luyuncheng <luyuncheng@bytedance.com>
Co-authored-by: Varun Jain <varunudr@amazon.com>
Co-authored-by: John Mazanec <jmazane@amazon.com>
Description
Passes a flag to disable the sdc table for HNSWPQ indices. This table is only used by HNSWPQ during graph creation, to compare nodes already present in the graph. When we call load index, the graph is read-only. Hence, we won't be doing any ingestion, and the table can be disabled to save some memory.
Along with this, added a unit test and a couple of test helper methods for generating random data.
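For a rough sense of how much memory is at stake, the sketch below estimates the size of a product quantizer's sdc (symmetric distance computation) table. It assumes the layout Faiss uses as far as I understand it: one ksub × ksub block of 32-bit floats per subquantizer, where ksub = 2^nbits. The function name `sdc_table_bytes` and the example parameters (m = 32, 8-bit codes) are hypothetical and are not taken from this PR.

```python
def sdc_table_bytes(m: int, nbits: int, bytes_per_entry: int = 4) -> int:
    """Estimate the sdc table size of a product quantizer, in bytes.

    m      -- number of subquantizers (the PQ "m" parameter)
    nbits  -- bits per subquantizer code, so ksub = 2 ** nbits centroids
    Assumption: one ksub x ksub block of float32 distances per subquantizer.
    """
    if m <= 0 or nbits <= 0:
        raise ValueError("m and nbits must be positive")
    ksub = 1 << nbits  # number of centroids per subquantizer
    return m * ksub * ksub * bytes_per_entry


if __name__ == "__main__":
    # Hypothetical configuration: 32 subquantizers with 8-bit codes.
    size = sdc_table_bytes(m=32, nbits=8)
    print(f"{size} bytes ({size / 2**20:.1f} MiB)")
```

Under these assumptions the table is independent of the number of vectors, so skipping it would save a fixed amount for every HNSWPQ index that is loaded read-only.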
Issues Resolved
#1507 partial
Faiss issue: facebookresearch/faiss#3246.
Check List
By submitting this pull request, I confirm that my contribution is made under the terms of the Apache 2.0 license.
For more information on following Developer Certificate of Origin and signing off your commits, please check here.