@punAhuja (Contributor) commented Sep 22, 2025

Java support for Binary and Scalar Quantization
- Incorporated changes for indexing and searching on a quantized dataset
- Added tests covering quantization, indexing, and searching on a quantized dataset with a quantized query

NOTE: These changes (#1104) were previously merged by accident, ahead of higher-priority features, and later reverted (#1274) to reduce rework on the other PRs.
I have rebased the branch onto the latest branch-25.10.
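
For readers unfamiliar with the two terms, here is a minimal, library-independent sketch of what scalar and binary quantization do to a float vector. It is not the cuVS Java API: the class name `QuantizationSketch`, both method names, the fixed `[min, max]` range, and the threshold parameter are all assumptions made for illustration.

```java
/** Illustrative only: a plain-Java sketch of the two quantization ideas, not the cuVS API. */
final class QuantizationSketch {

    private QuantizationSketch() {}

    /**
     * Scalar quantization (assumed int8 target): clamp each value to [min, max],
     * then map that range linearly onto the signed byte range [-128, 127].
     */
    static byte[] scalarQuantize(float[] vector, float min, float max) {
        if (!(max > min)) {
            throw new IllegalArgumentException("max must be greater than min");
        }
        byte[] out = new byte[vector.length];
        float scale = 255.0f / (max - min);
        for (int i = 0; i < vector.length; i++) {
            float clamped = Math.max(min, Math.min(max, vector[i]));
            // Shift the 0..255 bucket index down by 128 so it fits a signed byte.
            out[i] = (byte) (Math.round((clamped - min) * scale) - 128);
        }
        return out;
    }

    /**
     * Binary quantization: keep one bit per dimension (1 if the value is above
     * the threshold), packed eight dimensions per byte, lowest bit first.
     */
    static byte[] binaryQuantize(float[] vector, float threshold) {
        byte[] out = new byte[(vector.length + 7) / 8];
        for (int i = 0; i < vector.length; i++) {
            if (vector[i] > threshold) {
                out[i / 8] |= (byte) (1 << (i % 8));
            }
        }
        return out;
    }
}
```

With these definitions, `scalarQuantize(new float[] {-1f, 0f, 1f}, -1f, 1f)` yields `{-128, 0, 127}`, a 4x size reduction over 32-bit floats, while `binaryQuantize` shrinks the same data 32x at the cost of keeping only one bit per dimension.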

punAhuja and others added 30 commits July 16, 2025 16:27
This PR is a follow-up from rapidsai#902.
Still WIP (see self-comments on the changes) but I'd like some early feedback.

Authors:
  - Lorenzo Dematté (https://github.com/ldematte)
  - MithunR (https://github.com/mythrocks)

Approvers:
  - Chris Hegarty (https://github.com/ChrisHegarty)
  - MithunR (https://github.com/mythrocks)

URL: rapidsai#1024
This PR adds the ability to define a Dataset directly over a MemorySegment, "wrapping" it instead of allocating a new one.

- Depends on rapidsai#1033 and rapidsai#1024
- ~~The new API has an `Object memorySegment` parameter, as we target Java 21 for the API (but 22 for the implementation); it works but it's definitely a hack and we need to sort this out~~
   - As discussed, we want to keep targeting Java 21 for the API. This means the API will return a `MethodHandle`, and the Java 22 implementation will use it to return a factory method to build a Dataset from a MemorySegment.
   - This factory method can then be used as shown in the tests (see the `DatasetHelper` convenience class/method).
- Benchmarks show a sizeable speedup. It is still small relative to the "big picture" (index build time), but there is an improvement and, above all, we avoid a whole new copy of the input data (halving the memory requirements).
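
The `MethodHandle` approach described in the list above can be sketched roughly as follows. This is a simplified illustration under stated assumptions, not the code from this PR: it uses `java.nio.ByteBuffer` as a stand-in for `MemorySegment` so that it compiles without the Java 22 foreign-memory API, and `WrappingFactorySketch`, `WrappedDataset`, `wrap`, `datasetFactory`, and `wrapViaHandle` are all hypothetical names.

```java
import java.lang.invoke.MethodHandle;
import java.lang.invoke.MethodHandles;
import java.lang.invoke.MethodType;
import java.nio.ByteBuffer;

/** Illustrative only: ByteBuffer stands in for MemorySegment; all names are hypothetical. */
final class WrappingFactorySketch {

    private WrappingFactorySketch() {}

    /** A dataset view that reads floats directly from the caller's memory (no copy). */
    static final class WrappedDataset {
        private final ByteBuffer memory;
        private final int cols;

        private WrappedDataset(ByteBuffer memory, int rows, int cols) {
            if (memory.capacity() < (long) rows * cols * Float.BYTES) {
                throw new IllegalArgumentException("memory too small for rows x cols floats");
            }
            this.memory = memory;
            this.cols = cols;
        }

        float get(int row, int col) {
            return memory.getFloat((row * cols + col) * Float.BYTES);
        }
    }

    /** Implementation-side factory: the only method whose signature names the memory type. */
    static WrappedDataset wrap(ByteBuffer memory, int rows, int cols) {
        return new WrappedDataset(memory, rows, cols);
    }

    /** API-side accessor: exposes the factory as a MethodHandle, so no memory type appears here. */
    static MethodHandle datasetFactory() {
        try {
            return MethodHandles.lookup().findStatic(
                WrappingFactorySketch.class, "wrap",
                MethodType.methodType(WrappedDataset.class, ByteBuffer.class, int.class, int.class));
        } catch (ReflectiveOperationException e) {
            throw new IllegalStateException("factory method not found", e);
        }
    }

    /** Convenience wrapper that hides the checked Throwable of MethodHandle.invoke. */
    static WrappedDataset wrapViaHandle(ByteBuffer memory, int rows, int cols) {
        try {
            return (WrappedDataset) datasetFactory().invoke(memory, rows, cols);
        } catch (RuntimeException | Error e) {
            throw e;
        } catch (Throwable t) {
            throw new IllegalStateException(t);
        }
    }
}
```

Because the dataset only wraps the buffer, a write to the buffer after `wrapViaHandle(buffer, rows, cols)` returns is visible through `get`, which is the "no extra copy" property the benchmarks bullet refers to. The convenience wrapper plays a role loosely similar to the `DatasetHelper` mentioned above, though its actual shape may differ.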

Fixes rapidsai#698

Authors:
  - Lorenzo Dematté (https://github.com/ldematte)
  - MithunR (https://github.com/mythrocks)
  - Ben Frederickson (https://github.com/benfred)

Approvers:
  - Chris Hegarty (https://github.com/ChrisHegarty)
  - MithunR (https://github.com/mythrocks)

URL: rapidsai#1034
- Added branching for quantization in GPU and CPU
- Added some more tests
- Using DataType and removed precision
- Refactored tests accordingly
- Using try-with-resources in CagraQuery

copy-pr-bot bot commented Sep 22, 2025

This pull request requires additional validation before any workflows can run on NVIDIA's runners.

Pull request vetters can view their responsibilities here.

Contributors can view more details about this message here.
