Implement off-heap quantized scoring #14863
Conversation
This PR does not have an entry in lucene/CHANGES.txt. Consider adding one. If the PR doesn't need a changelog entry, then add the skip-changelog label to it and you will stop receiving this reminder on future updates to the PR.
I ran some benchmarks on Cohere vectors (768d) for 7-bit and 4-bit (compressed) quantization.

[benchmark tables for "This PR without" and "This PR with" not reproduced]

I did see slight fluctuation across runs, but search time was ~10% faster for 7-bit and very slightly faster for 4-bit (compressed). Indexing and force-merge times improved by ~15%.
FYI, I observed a strange phenomenon: if the query vector is on heap, like `this.query = MemorySegment.ofArray(targetBytes);`, instead of the current off-heap implementation in this PR, `this.query = Arena.ofAuto().allocateFrom(JAVA_BYTE, targetBytes);`, then we see a performance regression.

Maybe I'm missing something obvious, but I haven't found the root cause yet.
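To make the two alternatives concrete, here is a minimal standalone sketch (not the PR's code; the class name is hypothetical) contrasting a heap-backed segment that wraps the query array with a native segment allocated via an automatic arena, which copies the bytes off-heap once:

```java
import java.lang.foreign.Arena;
import java.lang.foreign.MemorySegment;
import static java.lang.foreign.ValueLayout.JAVA_BYTE;

public class QuerySegmentDemo {
    public static void main(String[] args) {
        byte[] targetBytes = {1, 2, 3, 4};

        // Heap-backed: wraps the existing array with no copy. Mixing a
        // heap segment with native (off-heap) data segments in one loop
        // is where JIT issues like un-hoisted bounds checks can bite.
        MemorySegment onHeap = MemorySegment.ofArray(targetBytes);

        // Off-heap: copies the query once into native memory; the auto
        // arena frees it when the segment becomes unreachable.
        MemorySegment offHeap = Arena.ofAuto().allocateFrom(JAVA_BYTE, targetBytes);

        System.out.println(onHeap.isNative());   // false
        System.out.println(offHeap.isNative());  // true
    }
}
```

The one-time copy in `allocateFrom` is cheap relative to the many distance computations a query performs, which is consistent with the off-heap variant winning here.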
Yeah, I've seen similar before. You might be hitting a problem with the loop bound not being hoisted. I will try to take a look.
Thanks @ChrisHegarty! I saw that we use a heap-backed …
This PR has not had activity in the past 2 weeks, labeling it as stale. If the PR is waiting for review, notify the dev@lucene.apache.org list. Thank you for your contribution!
Description
Off-heap scoring for quantized vectors! Related to #13515
This scorer is in line with `Lucene99MemorySegmentFlatVectorsScorer`, and will automatically be used with `PanamaVectorizationProvider` (i.e. on adding `jdk.incubator.vector`). Note that the computations are already vectorized, but we avoid the unnecessary copy to heap here.

I added off-heap Dot Product functions for two compressed 4-bit ints (i.e. no need to "decompress" them). I can try to come up with similar ones for Euclidean if this approach seems fine.