[Java] Add Dataset based on `MemorySegment` #1034

ldematte · 2025-06-19T10:09:17Z

This PR adds the ability to define a Dataset directly over a MemorySegment, "wrapping" it instead of allocating a new one.

- Depends on [Java] Add Java API benchmarks #1033 and [Java] Encapsulate on-heap float arrays into Dataset #1024.
- ~~The new API has an `Object memorySegment` parameter, as we target Java 21 for the API (but 22 for the implementation); it works but it's definitely a hack and we need to sort this out~~
- As discussed, we want to keep targeting Java 21 for the API. This means the API will return a `MethodHandle`, and the Java 22 implementation will use it to return a factory method to build a Dataset from a MemorySegment.
- This factory method can then be used as shown in the tests (see the `DatasetHelper` convenience class/method).
- Benchmarks show a sizeable speedup. It is still small relative to the "big picture" (index build time), but there is an improvement and, above all, we avoid a whole new copy of the input data (halving the memory requirements).

Fixes #698
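To make the `MethodHandle` approach described above more concrete, here is a minimal sketch of the general pattern, assuming Java 22 or later at runtime. Every name in it (`SketchDataset`, `SketchProvider`, `SegmentDataset`, `Java22Provider`, `fromSegment`) is hypothetical and chosen for illustration; it is not the cuvs-java API and not the actual `DatasetHelper`. The idea is that the API-level interfaces never mention `MemorySegment` in a signature, while the implementation exposes a factory through a method handle.

```java
import java.lang.foreign.Arena;
import java.lang.foreign.MemorySegment;
import java.lang.invoke.MethodHandle;
import java.lang.invoke.MethodHandles;
import java.lang.invoke.MethodType;

// Hypothetical names throughout; this is NOT the cuvs-java API.
final class MethodHandleFactorySketch {

    // "API" layer: no MemorySegment appears in any signature here.
    interface SketchDataset {
        int size();
        int dimensions();
    }

    interface SketchProvider {
        // Expected handle type: (MemorySegment, int, int) -> SketchDataset
        MethodHandle datasetFactory();
    }

    // "Implementation" layer: free to use MemorySegment directly.
    record SegmentDataset(MemorySegment segment, int size, int dimensions)
            implements SketchDataset {

        // Wraps the caller's segment; nothing is allocated or copied.
        static SketchDataset create(MemorySegment segment, int size, int dimensions) {
            long required = (long) size * dimensions * Float.BYTES;
            if (segment.byteSize() < required) {
                throw new IllegalArgumentException("segment too small for " + size + " x " + dimensions);
            }
            return new SegmentDataset(segment, size, dimensions);
        }
    }

    static final class Java22Provider implements SketchProvider {
        @Override
        public MethodHandle datasetFactory() {
            try {
                return MethodHandles.lookup().findStatic(
                        SegmentDataset.class,
                        "create",
                        MethodType.methodType(
                                SketchDataset.class, MemorySegment.class, int.class, int.class));
            } catch (ReflectiveOperationException e) {
                throw new IllegalStateException(e);
            }
        }
    }

    // Convenience wrapper that hides the MethodHandle invocation from callers.
    static SketchDataset fromSegment(SketchProvider provider, MemorySegment segment,
                                     int size, int dimensions) {
        try {
            return (SketchDataset) provider.datasetFactory().invoke(segment, size, dimensions);
        } catch (RuntimeException | Error e) {
            throw e;
        } catch (Throwable t) {
            throw new IllegalStateException(t);
        }
    }

    public static void main(String[] args) {
        try (Arena arena = Arena.ofConfined()) {
            MemorySegment segment = arena.allocate(4L * 8 * Float.BYTES);
            SketchDataset dataset = fromSegment(new Java22Provider(), segment, 4, 8);
            System.out.println(dataset.size() + " x " + dataset.dimensions());
        }
    }
}
```

A wrapper like `fromSegment` keeps the loosely typed `invoke` call in one place, so calling code stays type-safe even though the API surface only exposes a `MethodHandle`.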

copy-pr-bot · 2025-06-19T10:09:21Z

This pull request requires additional validation before any workflows can run on NVIDIA's runners.

Pull request vetters can view their responsibilities here.

Contributors can view more details about this message here.

ldematte · 2025-06-19T13:33:47Z

I've refactored the code a bit to better handle MemorySegment allocation/deallocation; now I am able to benchmark Dataset creation separately:


Benchmark                                          (dims)  (size)  Mode  Cnt      Score       Error  Units
CagraIndexBenchmarks.testDatasetFromHeap             1024     100  avgt    5  45798.418 ± 12296.887  ns/op
CagraIndexBenchmarks.testDatasetFromMemorySegment    1024     100  avgt    5      2.736 ±     0.016  ns/op

Of course this is a tiny part of building an index; even when I use a small index (size 100, dims 1024), the time to create the index still dominates:

Benchmark                                           (dims)  (size)   Mode  Cnt    Score   Error  Units
CagraIndexBenchmarks.testIndexingFromHeap             1024     100  thrpt    5  123.619 ± 1.854  ops/s
CagraIndexBenchmarks.testIndexingFromMemorySegment    1024     100  thrpt    5  184.753 ± 2.229  ops/s

But there still is a benefit, and I have the poorest of the GPUs on my dev machine (it's a 2070 mobile).
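As a rough illustration of why the two Dataset-creation paths can differ so much, the sketch below contrasts copying an on-heap `float[][]` into newly allocated native memory with wrapping a segment that already exists. It is an assumption about the general shape of the two paths, not the cuvs-java implementation; `CopyVersusWrapSketch`, `copyFromHeap` and `wrap` are hypothetical names. It assumes Java 22 or later.

```java
import java.lang.foreign.Arena;
import java.lang.foreign.MemorySegment;
import java.lang.foreign.ValueLayout;

// Hypothetical helper; it only illustrates "copy" versus "wrap".
final class CopyVersusWrapSketch {

    // Heap path: allocate native memory and copy every row into it.
    // Cost grows with rows * dims.
    static MemorySegment copyFromHeap(float[][] vectors, Arena arena) {
        int rows = vectors.length;
        int dims = rows == 0 ? 0 : vectors[0].length;
        MemorySegment dst = arena.allocate((long) rows * dims * Float.BYTES, Float.BYTES);
        for (int r = 0; r < rows; r++) {
            long dstOffsetBytes = (long) r * dims * Float.BYTES;
            MemorySegment.copy(vectors[r], 0, dst, ValueLayout.JAVA_FLOAT, dstOffsetBytes, dims);
        }
        return dst;
    }

    // Wrapping path: validate the size and return a view of the same memory.
    // Cost does not depend on the amount of data.
    static MemorySegment wrap(MemorySegment existing, int rows, int dims) {
        long required = (long) rows * dims * Float.BYTES;
        if (existing.byteSize() < required) {
            throw new IllegalArgumentException("segment too small");
        }
        return existing.asSlice(0, required);
    }

    public static void main(String[] args) {
        float[][] vectors = {{1f, 2f}, {3f, 4f}};
        try (Arena arena = Arena.ofConfined()) {
            MemorySegment copied = copyFromHeap(vectors, arena);
            MemorySegment wrapped = wrap(copied, 2, 2);
            // Same address: the wrapped view shares memory with the original segment.
            System.out.println(copied.address() == wrapped.address());
        }
    }
}
```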

ldematte · 2025-06-19T13:46:36Z

When I bump up data size (-p size=N), of course the margin becomes smaller and smaller as the index build time dominates; still, if the data comes from off-heap memory (e.g. memory mapped file), we potentially use half the memory for the input data (or 1/3 in some cases), which I think is the key point.
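For the memory-mapped case mentioned above, this is a minimal sketch of how input vectors stored in a file can be exposed as a `MemorySegment` without loading them onto the Java heap. It uses only JDK APIs (Java 22 or later); the file layout (raw little-endian floats, row-major) and the names `MappedVectorsSketch`, `writeTempFile`, `map` and `valueAt` are assumptions made for the example. Handing the resulting segment to cuVS is deliberately not shown.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.lang.foreign.Arena;
import java.lang.foreign.MemorySegment;
import java.lang.foreign.ValueLayout;
import java.nio.ByteBuffer;
import java.nio.ByteOrder;
import java.nio.channels.FileChannel;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardOpenOption;

// Hypothetical helper; the file format is an assumption for this example.
final class MappedVectorsSketch {

    static final ValueLayout.OfFloat LE_FLOAT =
            ValueLayout.JAVA_FLOAT.withOrder(ByteOrder.LITTLE_ENDIAN);

    // Writes the vectors as raw little-endian floats, row after row.
    static Path writeTempFile(float[][] vectors) {
        ByteBuffer buffer = ByteBuffer
                .allocate(vectors.length * vectors[0].length * Float.BYTES)
                .order(ByteOrder.LITTLE_ENDIAN);
        for (float[] row : vectors) {
            for (float value : row) {
                buffer.putFloat(value);
            }
        }
        try {
            Path file = Files.createTempFile("vectors", ".bin");
            Files.write(file, buffer.array());
            return file;
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    // Maps the whole file; the mapping lives until the arena is closed.
    static MemorySegment map(Path file, Arena arena) {
        try (FileChannel channel = FileChannel.open(file, StandardOpenOption.READ)) {
            return channel.map(FileChannel.MapMode.READ_ONLY, 0, channel.size(), arena);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    // Reads one element from the row-major layout.
    static float valueAt(MemorySegment segment, int dims, int row, int col) {
        return segment.getAtIndex(LE_FLOAT, (long) row * dims + col);
    }

    public static void main(String[] args) {
        Path file = writeTempFile(new float[][] {{1f, 2f}, {3f, 4f}});
        try (Arena arena = Arena.ofConfined()) {
            MemorySegment mapped = map(file, arena);
            System.out.println(valueAt(mapped, 2, 1, 1));
        } finally {
            file.toFile().delete();
        }
    }
}
```

With a heap-based input, the same data would typically exist both in the `float[][]` and in the native copy, which is where the "half the memory" estimate comes from.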

ChrisHegarty · 2025-06-19T14:17:53Z

Thanks for doing this @ldematte. I think this is a case for revisiting the Java 21/22 decision that we previously made. I'll take a closer look and give it some thought.

…segment-dataset # Conflicts: # java/cuvs-java/src/main/java/com/nvidia/cuvs/CagraIndex.java # java/cuvs-java/src/main/java22/com/nvidia/cuvs/internal/BruteForceIndexImpl.java # java/cuvs-java/src/main/java22/com/nvidia/cuvs/internal/CagraIndexImpl.java # java/cuvs-java/src/main/java22/com/nvidia/cuvs/internal/DatasetImpl.java # java/cuvs-java/src/main/java22/com/nvidia/cuvs/spi/JDKProvider.java # java/cuvs-java/src/test/java/com/nvidia/cuvs/CagraBuildAndSearchIT.java

# Conflicts: # java/cuvs-java/src/main/java/com/nvidia/cuvs/Dataset.java # java/cuvs-java/src/main/java22/com/nvidia/cuvs/internal/BruteForceIndexImpl.java # java/cuvs-java/src/main/java22/com/nvidia/cuvs/internal/CagraIndexImpl.java # java/cuvs-java/src/main/java22/com/nvidia/cuvs/internal/DatasetImpl.java # java/cuvs-java/src/main/java22/com/nvidia/cuvs/spi/JDKProvider.java

mythrocks · 2025-07-10T21:26:03Z

Hmm. There seem to be java test failures here:

Error:  Failures: 
Error:    CagraBuildAndSearchIT.testIndexingAndSearchingFlow:145->queryAndCompare:464->CuVSTestCase.checkResults:148 expected:<[{0=0.83774555, 2=0.3590463, 3=0.038782578}, {0=0.12472608, 1=0.31918612, 2=0.21700792}, {0=0.48305473, 2=0.20332818, 3=0.047766715}, {0=0.59063464, 1=0.15224178, 3=0.5986642}]> but was:<[{0=0.23431994, 2=0.23431946, 3=0.23431946}, {0=0.68692017, 2=0.6869197, 3=0.6869197}, {0=0.33261782, 2=0.3326173, 3=0.3326173}, {0=0.3379389, 2=0.33793885, 3=0.33793885}]>
Error:  Errors: 
Error:    CagraBuildAndSearchIT.testIndexing:233->runConcurrently:85 ? Execution java.lang.AssertionError: Exception while executing runnable: java.lang.RuntimeException: cuvsCagraBuild returned 0[RAFT failure at file=/tmp/conda-bld-output/bld/rattler-build_libcuvs/work/cpp/src/neighbors/detail/cagra/graph_core.cuh line=1406: Could not generate an intermediate CAGRA graph because the initial kNN graph contains too many invalid or duplicated neighbor nodes. This error can occur, for example, if too many overflows occur during the norm computation between the dataset vectors.
Obtained 6 stack frames
#1 in /opt/conda/envs/java/lib/jvm/lib/server/../../../libcuvs.so(+0x49117d) [0x7c03a02ec17d]
#2 in /opt/conda/envs/java/lib/jvm/lib/server/../../../libcuvs.so: void cuvs::neighbors::cagra::detail::graph::optimize<unsigned int, raft::host_device_accessor<std::experimental::default_accessor<unsigned int>, (raft::memory_type)0> >(raft::resources const&, std::experimental::mdspan<unsigned int, std::experimental::extents<long, 18446744073709551615ul, 18446744073709551615ul>, std::experimental::layout_right, raft::host_device_accessor<std::experimental::default_accessor<unsigned int>, (raft::memory_type)0> >, std::experimental::mdspan<unsigned int, std::experimental::extents<long, 18446744073709551615ul, 18446744073709551615ul>, std::experimental::layout_right, raft::host_device_accessor<std::experimental::default_accessor<unsigned int>, (raft::memory_type)0> >, bool, bool) +0x1aa3 [0x7c03a0d3b4d3]
#3 in /opt/conda/envs/java/lib/jvm/lib/server/../../../libcuvs.so: cuvs::neighbors::cagra::index<float, unsigned int> cuvs::neighbors::cagra::detail::build<float, unsigned int, raft::host_device_accessor<std::experimental::default_accessor<float const>, (raft::memory_type)2> >(raft::resources const&, cuvs::neighbors::cagra::index_params const&, std::experimental::mdspan<float const, std::experimental::extents<long, 18446744073709551615ul, 18446744073709551615ul>, std::experimental::layout_right, raft::host_device_accessor<std::experimental::default_accessor<float const>, (raft::memory_type)2> >) +0xd81 [0x7c03a0d629b1]
#4 in /opt/conda/envs/java/lib/jvm/lib/server/../../../libcuvs.so: cuvs::neighbors::cagra::build(raft::resources const&, cuvs::neighbors::cagra::index_params const&, std::experimental::mdspan<float const, std::experimental::extents<long, 18446744073709551615ul, 18446744073709551615ul>, std::experimental::layout_right, raft::host_device_accessor<std::experimental::default_accessor<float const>, (raft::memory_type)2> >) +0x21 [0x7c03a0d1d761]
#5 in /opt/conda/envs/java/lib/jvm/lib/server/../../../libcuvs_c.so: cuvsCagraBuild +0x806 [0x7c04352dbb96]
#6 in [0x7c045c0a6ec7]

ldematte · 2025-07-11T06:50:36Z

> Hmm. There seem to be java test failures here:

Yeah, it's a chicken-and-egg problem :) I fixed those in another PR (#1089), but that PR depends on this one being merged first. These tests fail randomly because they use resources and threading incorrectly.
I will cherry-pick the test fixes from #1089 so we can unblock the situation.

ldematte · 2025-07-11T07:35:20Z

(BTW, the issue is already there; it's just random that it happened to fail in this PR. Better to fix it sooner rather than later.)
@mythrocks, I cherry-picked the test fix, so this is good to go again.

mythrocks · 2025-07-11T21:30:52Z

/ok to test 8582abb

ldematte · 2025-07-14T15:12:03Z

@mythrocks is there anything else I can do to unblock this PR?

mythrocks · 2025-07-14T15:43:46Z

/ok to test 63c4932

mythrocks · 2025-07-14T23:05:10Z

The CI builds should be fixed once #1114 is resolved.

benfred · 2025-07-15T01:53:45Z

/ok to test 2e65a55

mythrocks · 2025-07-15T04:52:00Z

/merge

This PR adds the ability to define a Dataset directly over a MemorySegment, "wrapping" it instead of allocating a new one.

- Depends on rapidsai#1033 and rapidsai#1024
- ~~The new API has a `Object memorySegment` parameter, as we target Java 21 for the API (but 22 for the implementation); it works but it's definitely a hack and we need to sort this out~~
- As discussed, we want to keep targeting Java 21 for the API. This means the API will return a `MethodHandle`, and the Java 22 implementation will use it to return a factory method to build a Dataset from a MemorySegment.
- This factory method can then be used as shown in the tests (see the `DatasetHelper` convenience class/method).
- Benchmarks show a sizeable speedup -- it is still tiny related to the "big picture" (index build time), but there is an improvement and above all we avoid a whole new copy of the input data (halving the memory requirements).

Fixes rapidsai#698

Authors:
- Lorenzo Dematté (https://github.com/ldematte)
- MithunR (https://github.com/mythrocks)
- Ben Frederickson (https://github.com/benfred)

Approvers:
- Chris Hegarty (https://github.com/ChrisHegarty)
- MithunR (https://github.com/mythrocks)

URL: rapidsai#1034

In #902 and #1034 we introduced a `Dataset` interface to support on-heap and off-heap ("native") memory seamlessly as inputs for cagra and bruteforce index building. As we expand the functionality of cuvs-java, we realized we have similar needs for outputs (see e.g. #1105 / #1102 or #1104).

This PR extends `Dataset` so that it can be used as an output, wrapping native (off-heap) memory in a convenient and efficient way, and providing common utilities to transform to and from on-heap memory. This work is inspired by the existing raft `mdspan` and `DLTensor` data structures, but tailored to our needs (2d only, just 3 data types, etc.). The PR keeps the current implementation simple and minimal on purpose, but structured in a way that is simple to extend.

By itself, the PR is just a refactoring to extend the `Dataset` implementation and reorganize the implementation classes; its real usefulness will be in using it in the PRs mentioned above (in fact, this PR has been extracted from #1105).

The implementation class hierarchy is designed with future extensions in mind: at the moment we have one `HostMemoryDatasetImpl`, but we are already thinking of adding a corresponding `DeviceMemoryDatasetImpl` that will wrap and manage (views on) GPU memory, to avoid (in some cases) extra copies of data from GPU memory to CPU memory only to process them or forward them to another algorithm (e.g. quantization followed by indexing).

Future work will also include adding support for, or refactoring, the allocation and management of GPU memory and DLTensors (e.g. working better with, or refactoring, `prepareTensor`).

Authors:
- Lorenzo Dematté (https://github.com/ldematte)
- MithunR (https://github.com/mythrocks)

Approvers:
- MithunR (https://github.com/mythrocks)

URL: #1111
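As an illustration of what a 2d, row-major view over host memory with conversion to and from on-heap arrays could look like, here is a minimal sketch assuming Java 22 or later. It is not the `HostMemoryDatasetImpl` from the PR: the names `HostMatrixSketch`, `HostMatrix` and `ElementType` are hypothetical, and the three element types listed (`FLOAT`, `INT`, `BYTE`) are an assumption, since the description only says there are three.

```java
import java.lang.foreign.Arena;
import java.lang.foreign.MemorySegment;
import java.lang.foreign.ValueLayout;
import java.util.Objects;

// Hypothetical sketch; names and element types are assumptions.
final class HostMatrixSketch {

    enum ElementType {
        FLOAT(Float.BYTES), INT(Integer.BYTES), BYTE(Byte.BYTES);

        final int bytes;

        ElementType(int bytes) {
            this.bytes = bytes;
        }
    }

    // A 2d, row-major view over memory owned by someone else.
    record HostMatrix(MemorySegment memory, int rows, int cols, ElementType type) {

        HostMatrix {
            long required = (long) rows * cols * type.bytes;
            if (rows < 0 || cols < 0 || memory.byteSize() < required) {
                throw new IllegalArgumentException("shape does not fit the segment");
            }
        }

        // Byte offset of element (row, col), with bounds checks.
        long byteOffset(int row, int col) {
            Objects.checkIndex(row, rows);
            Objects.checkIndex(col, cols);
            return ((long) row * cols + col) * type.bytes;
        }

        float getFloat(int row, int col) {
            requireFloat();
            return memory.get(ValueLayout.JAVA_FLOAT, byteOffset(row, col));
        }

        // Off-heap to on-heap: copies each row into a new float[].
        float[][] toHeap() {
            requireFloat();
            float[][] out = new float[rows][cols];
            for (int r = 0; r < rows; r++) {
                long srcOffset = (long) r * cols * Float.BYTES;
                MemorySegment.copy(memory, ValueLayout.JAVA_FLOAT, srcOffset, out[r], 0, cols);
            }
            return out;
        }

        // On-heap to off-heap: allocates from the arena and copies row by row.
        static HostMatrix fromHeap(float[][] src, Arena arena) {
            int rows = src.length;
            int cols = rows == 0 ? 0 : src[0].length;
            MemorySegment dst = arena.allocate((long) rows * cols * Float.BYTES, Float.BYTES);
            for (int r = 0; r < rows; r++) {
                long dstOffset = (long) r * cols * Float.BYTES;
                MemorySegment.copy(src[r], 0, dst, ValueLayout.JAVA_FLOAT, dstOffset, cols);
            }
            return new HostMatrix(dst, rows, cols, ElementType.FLOAT);
        }

        private void requireFloat() {
            if (type != ElementType.FLOAT) {
                throw new IllegalStateException("not a float matrix");
            }
        }
    }

    public static void main(String[] args) {
        try (Arena arena = Arena.ofConfined()) {
            HostMatrix matrix = HostMatrix.fromHeap(new float[][] {{1f, 2f, 3f}, {4f, 5f, 6f}}, arena);
            System.out.println(matrix.getFloat(1, 2));
        }
    }
}
```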

ldematte added 2 commits June 16, 2025 17:31

Use Dataset to encapsulate on-heap arrays too

d0f53c0

Missing assert; renaming

114285a

ldematte added 6 commits June 19, 2025 12:36

Add benchmarks project

02d58ab

Merge remote-tracking branch 'upstream/branch-25.08' into java/add-benchmarks

42b4bde

Add benchmarks README

84ec3d9

Merge branch 'java/add-benchmarks' into java/memorysegment-dataset

1a155ce

Add MemorySegment based Dataset (0-copy) + benchmark

01ca37e

Alternative: move MemorySegment creation "up", at params build time

86d9575

ldematte and others added 6 commits June 19, 2025 17:38

Different MemorySegment lifetime management (reduced scope) + specific Dataset benchmarks

7f91a23

Different MemorySegment lifetime management (reduced scope)

3f348c6

Move addVector to Dataset.Builder

97c01ed

Extending Dataset lifetime to coincide with Index lifetime

052b62f

Reverting dataset change for BruteForceIndex -- it will be handled differently

47cc5bc

Merge branch 'branch-25.08' into java/float-array-dataset

c54faba

ChrisHegarty mentioned this pull request Jun 30, 2025

Examine bumping cuvs-java to a minimum of Java 22 #1066

Closed

cjnolet assigned ldematte Jun 30, 2025

cjnolet added improvement Improves an existing functionality non-breaking Introduces a non-breaking change labels Jun 30, 2025

cjnolet added this to Vector Search, ML, & Data Mining Release Board Jun 30, 2025

cjnolet moved this to In Progress in Vector Search, ML, & Data Mining Release Board Jun 30, 2025

ldematte added 6 commits July 2, 2025 12:10

Merge remote-tracking branch 'upstream/branch-25.08' into java/float-array-dataset

30f4441

MemorySegment-based Dataset via MethodHandle

635b7f0

MemorySegment-based Dataset via MethodHandle

3708192

Protect against Dataset double-close

07fd375

ldematte added 4 commits July 11, 2025 08:51

Merge remote-tracking branch 'upstream/branch-25.08' into java/memorysegment-dataset

6159eb4

Cherry-pick test fixes

80e8333

Merge branch 'java/memorysegment-dataset' of github.com:ldematte/cuvs into java/memorysegment-dataset

ab2338b

More test fixes

c855b9b

Merge branch 'branch-25.08' into java/memorysegment-dataset

8582abb

Merge branch 'branch-25.08' into java/memorysegment-dataset

63c4932

ldematte requested a review from a team as a code owner July 12, 2025 05:58

ldematte mentioned this pull request Jul 14, 2025

[Java] Extend Dataset to work as an output data container #1111

Merged

Merge branch 'branch-25.08' into java/memorysegment-dataset

2e65a55

This was referenced Jul 15, 2025

[Java] Fix POM #1106

Merged

[Java] Tidy up MemorySegments lifecycle #1069

Merged

rapids-bot bot merged commit 32d6d83 into rapidsai:branch-25.08 Jul 15, 2025
53 checks passed

github-project-automation bot moved this from In Progress to Done in Vector Search, ML, & Data Mining Release Board Jul 15, 2025

mythrocks moved this to Done in Elasticsearch + cuVS Team Jul 16, 2025

mythrocks added this to Elasticsearch + cuVS Team Jul 16, 2025
