
Implement off-heap quantized scoring #14863


Open · wants to merge 1 commit into main

Conversation

kaivalnp
Contributor

Description

Off-heap scoring for quantized vectors! Related to #13515

This scorer is in line with Lucene99MemorySegmentFlatVectorsScorer, and will automatically be used with PanamaVectorizationProvider (i.e. when jdk.incubator.vector is added). Note that the computations are already vectorized; this change avoids the unnecessary copy to heap.

I added off-heap dot product functions for two compressed 4-bit int vectors (i.e. no need to "decompress" them first). I can try to come up with similar ones for Euclidean distance if this approach seems fine.
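
For reference, a minimal scalar sketch of the idea, assuming each byte packs two 4-bit values in its low and high nibbles (the vectorized MemorySegment code in this PR is more involved; class and method names here are illustrative only):

import java.lang.foreign.MemorySegment;
import static java.lang.foreign.ValueLayout.JAVA_BYTE;

class PackedInt4DotProduct {
  // Dot product of two packed 4-bit vectors, read nibble by nibble, so the packed
  // bytes never need to be expanded into a separate "decompressed" array first.
  static int dotProduct(MemorySegment a, MemorySegment b) {
    int total = 0;
    long length = a.byteSize(); // each segment holds dim/2 packed bytes
    for (long i = 0; i < length; i++) {
      byte x = a.get(JAVA_BYTE, i);
      byte y = b.get(JAVA_BYTE, i);
      total += (x & 0x0F) * (y & 0x0F);               // low nibbles
      total += ((x >> 4) & 0x0F) * ((y >> 4) & 0x0F); // high nibbles
    }
    return total;
  }
}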


This PR does not have an entry in lucene/CHANGES.txt. Consider adding one. If the PR doesn't need a changelog entry, then add the skip-changelog label to it and you will stop receiving this reminder on future updates to the PR.

@kaivalnp
Contributor Author

I ran some benchmarks on Cohere vectors (768d) for 7-bit and 4-bit (compressed) quantization.

main without jdk.incubator.vector:

recall  latency(ms)  netCPU  avgCpuCount    nDoc  topK  fanout  maxConn  beamWidth  quantized  index(s)  index_docs/s  force_merge(s)  num_segments  index_size(MB)  vec_disk(MB)  vec_RAM(MB)  indexType
 0.860        2.815   2.806        0.997  100000   100      50       64        250     7 bits     44.07       2269.17           46.79             1          373.72       366.592       73.624       HNSW
 0.545        3.193   3.185        0.997  100000   100      50       64        250     4 bits     47.26       2115.95           50.04             1          338.13       329.971       37.003       HNSW

main with jdk.incubator.vector:

recall  latency(ms)  netCPU  avgCpuCount    nDoc  topK  fanout  maxConn  beamWidth  quantized  index(s)  index_docs/s  force_merge(s)  num_segments  index_size(MB)  vec_disk(MB)  vec_RAM(MB)  indexType
 0.863        1.904   1.886        0.991  100000   100      50       64        250     7 bits     28.65       3490.65           29.66             1          373.69       366.592       73.624       HNSW
 0.545        1.313   1.305        0.994  100000   100      50       64        250     4 bits     22.86       4373.88           17.84             1          338.13       329.971       37.003       HNSW

This PR without jdk.incubator.vector:

recall  latency(ms)  netCPU  avgCpuCount    nDoc  topK  fanout  maxConn  beamWidth  quantized  index(s)  index_docs/s  force_merge(s)  num_segments  index_size(MB)  vec_disk(MB)  vec_RAM(MB)  indexType
 0.861        2.774   2.765        0.997  100000   100      50       64        250     7 bits     44.60       2242.00           46.71             1          373.73       366.592       73.624       HNSW
 0.545        3.147   3.139        0.997  100000   100      50       64        250     4 bits     47.93       2086.51           50.20             1          338.11       329.971       37.003       HNSW

This PR with jdk.incubator.vector:

recall  latency(ms)  netCPU  avgCpuCount    nDoc  topK  fanout  maxConn  beamWidth  quantized  index(s)  index_docs/s  force_merge(s)  num_segments  index_size(MB)  vec_disk(MB)  vec_RAM(MB)  indexType
 0.861        1.612   1.603        0.994  100000   100      50       64        250     7 bits     22.99       4349.53           24.78             1          373.70       366.592       73.624       HNSW
 0.545        1.277   1.269        0.994  100000   100      50       64        250     4 bits     21.60       4630.49           17.41             1          338.11       329.971       37.003       HNSW

I did see slight fluctuation across runs, but search time was ~10% faster for 7-bit and very slightly faster for 4-bit (compressed). Indexing and force-merge times improved by ~15%.

@kaivalnp
Contributor Author

FYI, I observed a strange phenomenon: if the query vector is kept on heap, like:

this.query = MemorySegment.ofArray(targetBytes);

instead of the current off-heap implementation in this PR:

this.query = Arena.ofAuto().allocateFrom(JAVA_BYTE, targetBytes);

...then we see a performance regression:

recall  latency(ms)  netCPU  avgCpuCount    nDoc  topK  fanout  maxConn  beamWidth  quantized  index(s)  index_docs/s  force_merge(s)  num_segments  index_size(MB)  vec_disk(MB)  vec_RAM(MB)  indexType
 0.862        3.043   3.034        0.997  100000   100      50       64        250     7 bits     23.25       4301.82           25.29             1          373.70       366.592       73.624       HNSW
 0.545        2.060   2.049        0.995  100000   100      50       64        250     4 bits     22.19       4506.33           17.99             1          338.17       329.971       37.003       HNSW

Maybe I'm missing something obvious, but I haven't found the root cause yet.
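
To make the two fragments above concrete, here is a self-contained sketch of both allocation paths (class and method names are mine, not from the PR):

import java.lang.foreign.Arena;
import java.lang.foreign.MemorySegment;
import static java.lang.foreign.ValueLayout.JAVA_BYTE;

class QuerySegments {
  // Heap-backed view: zero-copy, but the segment's memory stays on the Java heap
  static MemorySegment heapQuery(byte[] targetBytes) {
    return MemorySegment.ofArray(targetBytes);
  }

  // Native copy: a one-time copy into off-heap memory owned by an automatic arena,
  // freed once the arena becomes unreachable
  static MemorySegment offHeapQuery(byte[] targetBytes) {
    return Arena.ofAuto().allocateFrom(JAVA_BYTE, targetBytes);
  }
}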

@ChrisHegarty
Contributor

> ...then we see a performance regression:
> ...
> Maybe I'm missing something obvious, but I haven't found the root cause yet.

Yeah, I've seen something similar before. You might be hitting a problem with the loop bound not being hoisted. I will try to take a look.
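
(For illustration, one generic form of this pattern, assuming the bound would otherwise be re-evaluated in the loop condition; a sketch only, not necessarily the exact cause here:)

import java.lang.foreign.MemorySegment;
import static java.lang.foreign.ValueLayout.JAVA_BYTE;

class LoopBoundExample {
  static long sum(MemorySegment seg) {
    long total = 0;
    long bound = seg.byteSize();        // read the bound once into a local...
    for (long i = 0; i < bound; i++) {  // ...instead of re-evaluating it on every iteration
      total += seg.get(JAVA_BYTE, i);
    }
    return total;
  }
}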

@kaivalnp
Contributor Author

Thanks @ChrisHegarty! I saw that we use a heap-backed MemorySegment while scoring byte vectors, so I opened #14874 to investigate whether we can improve performance by moving to an off-heap query.


This PR has not had activity in the past 2 weeks, labeling it as stale. If the PR is waiting for review, notify the dev@lucene.apache.org list. Thank you for your contribution!

@github-actions github-actions bot added the Stale label Jul 15, 2025