Disable sdc table for HNSWPQ read-only indices #1518

Merged
merged 1 commit into from
Mar 7, 2024

Conversation

jmazanec15
Member

Description

Passes a flag to disable the sdc table for HNSWPQ indices. HNSWPQ uses this table only during graph creation, to compare nodes already present in the graph. When we load an index, the graph is read-only, so no ingestion takes place and the table can be disabled to save some memory.

Along with this, added a unit test and a couple of test helper methods for generating random data.
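
To get a rough sense of the memory involved, here is a minimal sketch. It assumes the sdc (symmetric distance computation) table holds one float32 per pair of centroids in each PQ sub-quantizer, i.e. `m * ksub * ksub` entries with `ksub = 2 ** nbits`. The function name `sdc_table_bytes` and the parameter values are illustrative and are not taken from this PR.

```python
def sdc_table_bytes(m: int, nbits: int, bytes_per_entry: int = 4) -> int:
    """Estimate the size of a PQ sdc table in bytes.

    Assumption: one entry per (sub-quantizer, centroid, centroid) triple,
    so m * ksub * ksub entries, where ksub = 2 ** nbits.
    """
    ksub = 1 << nbits  # number of centroids per sub-quantizer
    return m * ksub * ksub * bytes_per_entry


# Hypothetical PQ settings, chosen only for illustration.
for m, nbits in [(8, 8), (16, 8), (32, 8)]:
    mib = sdc_table_bytes(m, nbits) / (1024 * 1024)
    print(f"m={m}, nbits={nbits}: ~{mib:.1f} MiB per loaded index")
```

Under these assumptions the table size does not depend on the number of vectors, so the saving applies to every loaded index, however small.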

Issues Resolved

#1507 partial

Faiss issue: facebookresearch/faiss#3246.

Check List

  • New functionality includes testing.
    • All tests pass
  • New functionality has been documented.
    • New functionality has javadoc added
  • Commits are signed as per the DCO using --signoff

By submitting this pull request, I confirm that my contribution is made under the terms of the Apache 2.0 license.
For more information on following Developer Certificate of Origin and signing off your commits, please check here.

@jmazanec15 added the Bug Fixes and backport 2.x labels Mar 7, 2024
Passes flag to disable sdc table for the HNSWPQ indices. This table is
only used by HNSWPQ during graph creation to compare nodes already
present in graph. When we call load index, the graph is read only.
Hence, we won't be doing any ingestion and so the table can be disabled
to save some memory.

Along with this, added a unit test and a couple test helper methods for
generating random data.

Signed-off-by: John Mazanec <jmazane@amazon.com>

codecov bot commented Mar 7, 2024

Codecov Report

All modified and coverable lines are covered by tests ✅

Project coverage is 85.09%. Comparing base (1303182) to head (8790bf7).

Additional details and impacted files
@@             Coverage Diff              @@
##               main    #1518      +/-   ##
============================================
- Coverage     85.11%   85.09%   -0.02%     
  Complexity     1281     1281              
============================================
  Files           168      168              
  Lines          5232     5232              
  Branches        495      495              
============================================
- Hits           4453     4452       -1     
- Misses          572      573       +1     
  Partials        207      207              


Member

@ryanbogan ryanbogan left a comment

LGTM!

@jmazanec15 jmazanec15 merged commit c9262f5 into opensearch-project:main Mar 7, 2024
51 checks passed
opensearch-trigger-bot bot pushed a commit that referenced this pull request Mar 7, 2024
(cherry picked from commit c9262f5)
junqiu-lei pushed a commit to junqiu-lei/k-NN that referenced this pull request Mar 7, 2024
(cherry picked from commit c9262f5)
junqiu-lei added a commit that referenced this pull request Mar 7, 2024
* Manually install zlib for win CI (#1513)

Signed-off-by: John Mazanec <jmazane@amazon.com>
(cherry picked from commit 231ad93)

* Upgrade faiss to 12b92e9 (#1509)

Upgrades faiss to facebookresearch/faiss@12b92e9. Cleanup outdated patches.

Signed-off-by: John Mazanec <jmazane@amazon.com>
(cherry picked from commit 1303182)

* Disable sdc table for HNSWPQ read-only indices (#1518)

Passes flag to disable sdc table for the HNSWPQ indices. This table is
only used by HNSWPQ during graph creation to compare nodes already
present in graph. When we call load index, the graph is read only.
Hence, we won't be doing any ingestion and so the table can be disabled
to save some memory.

Along with this, added a unit test and a couple test helper methods for
generating random data.

Signed-off-by: John Mazanec <jmazane@amazon.com>
(cherry picked from commit c9262f5)

---------

Co-authored-by: John Mazanec <jmazane@amazon.com>
junqiu-lei added a commit that referenced this pull request Mar 12, 2024
* Optimize Faiss Query With Filters: Reduce iteration and memory for id filter (#1402)

* Optimize Faiss Query With Filters. Reduce iteration copy for docid set iterator

Signed-off-by: luyuncheng <luyuncheng@bytedance.com>

* Optimize Faiss Query With Filters. Reduce iteration copy for docid set iterator.
Use Bitmap and Batch to do the id filter, and use a sparse or fixed bitset to do exact ANN search

Signed-off-by: luyuncheng <luyuncheng@bytedance.com>

* Using int64_t instead of long type for GetLongArrayElements

Signed-off-by: luyuncheng <luyuncheng@bytedance.com>

* Add IDSelectorJlongBitmap

Signed-off-by: luyuncheng <luyuncheng@bytedance.com>

* 1. Add IDSelectorJlongBitmap and UT for it
2. Move FilterIdsSelectorType to a util class

Signed-off-by: luyuncheng <luyuncheng@bytedance.com>

* 1. Add IDSelectorJlongBitmap and UT for it
2. Move FilterIdsSelectorType to a util class
3. Spotless apply

Signed-off-by: luyuncheng <luyuncheng@bytedance.com>

* Rebase remote-tracking branch 'origin/main' into Filter

Signed-off-by: luyuncheng <luyuncheng@bytedance.com>

* tidy

Signed-off-by: luyuncheng <luyuncheng@bytedance.com>

* Add Changelog

Signed-off-by: luyuncheng <luyuncheng@bytedance.com>

* fix javadoc tasks

Signed-off-by: luyuncheng <luyuncheng@bytedance.com>

* fix bwc javadoc

Signed-off-by: luyuncheng <luyuncheng@bytedance.com>

* UpdatedFilterIdsSelector

Signed-off-by: luyuncheng <luyuncheng@bytedance.com>

* UpdatedFilterIdsSelector

Signed-off-by: luyuncheng <luyuncheng@bytedance.com>

* Rebase faiss_wrapper.cpp

Signed-off-by: luyuncheng <luyuncheng@bytedance.com>

* UpdatedFilterIdsSelector For description Select different FilterIdsSelectorType

Signed-off-by: luyuncheng <luyuncheng@bytedance.com>

* UpdatedFilterIdsSelector For description Select different FilterIdsSelectorType

Signed-off-by: luyuncheng <luyuncheng@bytedance.com>

* UpdatedFilterIdsSelector as Byte.SIZE

Signed-off-by: luyuncheng <luyuncheng@bytedance.com>

* UpdatedFilterIdsSelector For comments

Signed-off-by: luyuncheng <luyuncheng@bytedance.com>

---------

Signed-off-by: luyuncheng <luyuncheng@bytedance.com>

* Increment 2.12.0-SNAPSHOT to 2.13.0-SNAPSHOT in BWC workflow (#1505)

Signed-off-by: Varun Jain <varunudr@amazon.com>

* Manually install zlib for win CI (#1513)

Signed-off-by: John Mazanec <jmazane@amazon.com>

* Upgrade faiss to 12b92e9 (#1509)

Upgrades faiss to facebookresearch/faiss@12b92e9. Cleanup outdated patches.

Signed-off-by: John Mazanec <jmazane@amazon.com>

* Disable sdc table for HNSWPQ read-only indices (#1518)

Passes flag to disable sdc table for the HNSWPQ indices. This table is
only used by HNSWPQ during graph creation to compare nodes already
present in graph. When we call load index, the graph is read only.
Hence, we won't be doing any ingestion and so the table can be disabled
to save some memory.

Along with this, added a unit test and a couple test helper methods for
generating random data.

Signed-off-by: John Mazanec <jmazane@amazon.com>

* Support distance type radius search for Lucene engine

Signed-off-by: Junqiu Lei <junqiu@amazon.com>

* Resolve feedback

Signed-off-by: Junqiu Lei <junqiu@amazon.com>

* Resolve feedback

Signed-off-by: Junqiu Lei <junqiu@amazon.com>

* Resolve comments

Signed-off-by: Junqiu Lei <junqiu@amazon.com>

* Resolve comments

Signed-off-by: Junqiu Lei <junqiu@amazon.com>

* Add RNNQueryFactory class

Signed-off-by: Junqiu Lei <junqiu@amazon.com>

* Add javadoc

Signed-off-by: Junqiu Lei <junqiu@amazon.com>

* Resolve feedback

Signed-off-by: Junqiu Lei <junqiu@amazon.com>

* Resolve feedback

Signed-off-by: Junqiu Lei <junqiu@amazon.com>

* Resolve feedback

Signed-off-by: Junqiu Lei <junqiu@amazon.com>

---------

Signed-off-by: luyuncheng <luyuncheng@bytedance.com>
Signed-off-by: Varun Jain <varunudr@amazon.com>
Signed-off-by: John Mazanec <jmazane@amazon.com>
Signed-off-by: Junqiu Lei <junqiu@amazon.com>
Co-authored-by: luyuncheng <luyuncheng@bytedance.com>
Co-authored-by: Varun Jain <varunudr@amazon.com>
Co-authored-by: John Mazanec <jmazane@amazon.com>
junqiu-lei added a commit to junqiu-lei/k-NN that referenced this pull request Mar 15, 2024
…ject#1498)

junqiu-lei added a commit to junqiu-lei/k-NN that referenced this pull request Mar 19, 2024
…ject#1498)

Labels
backport 2.x, Bug Fixes
3 participants