-
Notifications
You must be signed in to change notification settings - Fork 130
[Java] Add Dataset based on MemorySegment
#1034
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
[Java] Add Dataset based on MemorySegment
#1034
Conversation
I've refactored the code a bit to better handle MemorySegment allocation/deallocation; now I am able to benchmark Dataset creation separately:
Of course this is a tiny part of building an index; even when I use a small index (size 100, dims 1024) the time to create the index still dominates
But there still is a benefit, and I have the poorest of the GPUs on my dev machine (it's a 2070 mobile). |
When I bump up data size ( |
Thanks for doing this @ldematte, I think that this is case for revisiting the Java 21/22 decision that we previously made. I'll take a closer look and give it some thought. |
…c Dataset benchmarks
…segment-dataset # Conflicts: # java/cuvs-java/src/main/java/com/nvidia/cuvs/CagraIndex.java # java/cuvs-java/src/main/java22/com/nvidia/cuvs/internal/BruteForceIndexImpl.java # java/cuvs-java/src/main/java22/com/nvidia/cuvs/internal/CagraIndexImpl.java # java/cuvs-java/src/main/java22/com/nvidia/cuvs/internal/DatasetImpl.java # java/cuvs-java/src/main/java22/com/nvidia/cuvs/spi/JDKProvider.java # java/cuvs-java/src/test/java/com/nvidia/cuvs/CagraBuildAndSearchIT.java
# Conflicts: # java/cuvs-java/src/main/java/com/nvidia/cuvs/Dataset.java # java/cuvs-java/src/main/java22/com/nvidia/cuvs/internal/BruteForceIndexImpl.java # java/cuvs-java/src/main/java22/com/nvidia/cuvs/internal/CagraIndexImpl.java # java/cuvs-java/src/main/java22/com/nvidia/cuvs/internal/DatasetImpl.java # java/cuvs-java/src/main/java22/com/nvidia/cuvs/spi/JDKProvider.java
Hmm. There seem to be
|
Yeah, it's a chicken and egg problem :) I fixed those in another PR (#1089) but that PR depends on this to be merged first. These tests randomly fail because they are using resources and threading wrong. |
… into java/memorysegment-dataset
(BTW, the issue is already there, it's just random that it happened to fail in this PR -- better fix it sooner than later) |
/ok to test 8582abb |
@mythrocks is there anything else I can do to unblock this PR? |
/ok to test 63c4932 |
The CI builds should be fixed once #1114 is resolved. |
/ok to test 2e65a55 |
/merge |
This PR adds the ability to define a Dataset directly over a MemorySegment, "wrapping" it instead of allocating a new one. - Depends on rapidsai#1033 and rapidsai#1024 - ~~The new API has a `Object memorySegment` parameter, as we target Java 21 for the API (but 22 for the implementation); it works but it's definitely a hack and we need to sort this out~~ - As discussed, we want to keep targeting Java 21 for the API. This means the API will return a `MethodHandle`, and the Java 22 implementation will use it to return a factory method to build a Dataset from a MemorySegment. - This factory method can then be used as shown in the tests (see the `DatasetHelper` convenience class/method). - Benchmarks show a sizeable speedup -- it is still tiny related to the "big picture" (index build time), but there is an improvement and above all we avoid a whole new copy of the input data (halving the memory requirements). Fixes rapidsai#698 Authors: - Lorenzo Dematté (https://github.com/ldematte) - MithunR (https://github.com/mythrocks) - Ben Frederickson (https://github.com/benfred) Approvers: - Chris Hegarty (https://github.com/ChrisHegarty) - MithunR (https://github.com/mythrocks) URL: rapidsai#1034
This PR adds the ability to define a Dataset directly over a MemorySegment, "wrapping" it instead of allocating a new one. - Depends on rapidsai#1033 and rapidsai#1024 - ~~The new API has a `Object memorySegment` parameter, as we target Java 21 for the API (but 22 for the implementation); it works but it's definitely a hack and we need to sort this out~~ - As discussed, we want to keep targeting Java 21 for the API. This means the API will return a `MethodHandle`, and the Java 22 implementation will use it to return a factory method to build a Dataset from a MemorySegment. - This factory method can then be used as shown in the tests (see the `DatasetHelper` convenience class/method). - Benchmarks show a sizeable speedup -- it is still tiny related to the "big picture" (index build time), but there is an improvement and above all we avoid a whole new copy of the input data (halving the memory requirements). Fixes rapidsai#698 Authors: - Lorenzo Dematté (https://github.com/ldematte) - MithunR (https://github.com/mythrocks) - Ben Frederickson (https://github.com/benfred) Approvers: - Chris Hegarty (https://github.com/ChrisHegarty) - MithunR (https://github.com/mythrocks) URL: rapidsai#1034
This PR adds the ability to define a Dataset directly over a MemorySegment, "wrapping" it instead of allocating a new one. - Depends on rapidsai#1033 and rapidsai#1024 - ~~The new API has a `Object memorySegment` parameter, as we target Java 21 for the API (but 22 for the implementation); it works but it's definitely a hack and we need to sort this out~~ - As discussed, we want to keep targeting Java 21 for the API. This means the API will return a `MethodHandle`, and the Java 22 implementation will use it to return a factory method to build a Dataset from a MemorySegment. - This factory method can then be used as shown in the tests (see the `DatasetHelper` convenience class/method). - Benchmarks show a sizeable speedup -- it is still tiny related to the "big picture" (index build time), but there is an improvement and above all we avoid a whole new copy of the input data (halving the memory requirements). Fixes rapidsai#698 Authors: - Lorenzo Dematté (https://github.com/ldematte) - MithunR (https://github.com/mythrocks) - Ben Frederickson (https://github.com/benfred) Approvers: - Chris Hegarty (https://github.com/ChrisHegarty) - MithunR (https://github.com/mythrocks) URL: rapidsai#1034
In #902 and #1034 we introduced a `Dataset` interface to support on-heap and off-heap ("native") memory seamlessly as inputs for cagra and bruteforce index building. As we expand the functionality of cuvs-java, we realized we have similar needs for outputs (see e.g. #1105 / #1102 or #1104). This PR extends `Dataset` to support being used as an output, wrapping native (off-heap) memory in a convenient and efficient way, and providing common utilities to transform to and from on-heap memory. This work is inspired by the existing raft `mdspan` and `DLTensor` data structures, but tailored to our needs (2d only, just 3 data types, etc.). The PR keeps the current implementation simple and minimal on purpose, but structured in a way that is simple to extend. By itself, the PR is just a refactoring to extend the `Dataset` implementation and reorganize the implementation classes; its real usefulness will be in using it in the PRs mentioned above (in fact, this PR has been extracted from #1105). The implementation class hierarchy is implemented with future extensions in mind: atm we have one `HostMemoryDatasetImpl`, but we are already thinking to have a corresponding `DeviceMemoryDatasetImpl` that will wrap and manage (views) on GPU memory to avoid (in some cases) extra copies of data from GPU memory to CPU memory only to process them or forward them to another algorithm (e.g quantization followed by indexing). Future work will also include add support/refactoring to allocate and manage GPU memory and DLTensors (e.g. working better with/refactoring `prepareTensor`). Authors: - Lorenzo Dematté (https://github.com/ldematte) - MithunR (https://github.com/mythrocks) Approvers: - MithunR (https://github.com/mythrocks) URL: #1111
…#1111) In rapidsai#902 and rapidsai#1034 we introduced a `Dataset` interface to support on-heap and off-heap ("native") memory seamlessly as inputs for cagra and bruteforce index building. As we expand the functionality of cuvs-java, we realized we have similar needs for outputs (see e.g. rapidsai#1105 / rapidsai#1102 or rapidsai#1104). This PR extends `Dataset` to support being used as an output, wrapping native (off-heap) memory in a convenient and efficient way, and providing common utilities to transform to and from on-heap memory. This work is inspired by the existing raft `mdspan` and `DLTensor` data structures, but tailored to our needs (2d only, just 3 data types, etc.). The PR keeps the current implementation simple and minimal on purpose, but structured in a way that is simple to extend. By itself, the PR is just a refactoring to extend the `Dataset` implementation and reorganize the implementation classes; its real usefulness will be in using it in the PRs mentioned above (in fact, this PR has been extracted from rapidsai#1105). The implementation class hierarchy is implemented with future extensions in mind: atm we have one `HostMemoryDatasetImpl`, but we are already thinking to have a corresponding `DeviceMemoryDatasetImpl` that will wrap and manage (views) on GPU memory to avoid (in some cases) extra copies of data from GPU memory to CPU memory only to process them or forward them to another algorithm (e.g quantization followed by indexing). Future work will also include add support/refactoring to allocate and manage GPU memory and DLTensors (e.g. working better with/refactoring `prepareTensor`). Authors: - Lorenzo Dematté (https://github.com/ldematte) - MithunR (https://github.com/mythrocks) Approvers: - MithunR (https://github.com/mythrocks) URL: rapidsai#1111
…#1111) In rapidsai#902 and rapidsai#1034 we introduced a `Dataset` interface to support on-heap and off-heap ("native") memory seamlessly as inputs for cagra and bruteforce index building. As we expand the functionality of cuvs-java, we realized we have similar needs for outputs (see e.g. rapidsai#1105 / rapidsai#1102 or rapidsai#1104). This PR extends `Dataset` to support being used as an output, wrapping native (off-heap) memory in a convenient and efficient way, and providing common utilities to transform to and from on-heap memory. This work is inspired by the existing raft `mdspan` and `DLTensor` data structures, but tailored to our needs (2d only, just 3 data types, etc.). The PR keeps the current implementation simple and minimal on purpose, but structured in a way that is simple to extend. By itself, the PR is just a refactoring to extend the `Dataset` implementation and reorganize the implementation classes; its real usefulness will be in using it in the PRs mentioned above (in fact, this PR has been extracted from rapidsai#1105). The implementation class hierarchy is implemented with future extensions in mind: atm we have one `HostMemoryDatasetImpl`, but we are already thinking to have a corresponding `DeviceMemoryDatasetImpl` that will wrap and manage (views) on GPU memory to avoid (in some cases) extra copies of data from GPU memory to CPU memory only to process them or forward them to another algorithm (e.g quantization followed by indexing). Future work will also include add support/refactoring to allocate and manage GPU memory and DLTensors (e.g. working better with/refactoring `prepareTensor`). Authors: - Lorenzo Dematté (https://github.com/ldematte) - MithunR (https://github.com/mythrocks) Approvers: - MithunR (https://github.com/mythrocks) URL: rapidsai#1111
This PR adds the ability to define a Dataset directly over a MemorySegment, "wrapping" it instead of allocating a new one.
Dataset
#1024The new API has aObject memorySegment
parameter, as we target Java 21 for the API (but 22 for the implementation); it works but it's definitely a hack and we need to sort this outMethodHandle
, and the Java 22 implementation will use it to return a factory method to build a Dataset from a MemorySegment.DatasetHelper
convenience class/method).Fixes #698