[WIP] Add disk2disk serialization for ACE Algorithm #1410

mfoerste4 · 2025-10-06T11:59:11Z

This PR adds a serialization routine that combines dataset, graph, and mapping as per step (3) of #1404. The data is combined on the fly as it is streamed from disk to disk, keeping the required host memory as small as possible.
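To make the disk-to-disk idea concrete, here is a minimal sketch of interleaving three on-disk files into one output file in fixed-size batches. It is not the routine added by this PR: the function name `interleave_rows`, the `[neighbors | vector | label]` record layout, and all parameters are assumptions chosen for illustration, and the real HNSW file format is more involved.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <fstream>
#include <stdexcept>
#include <string>
#include <vector>

// Hypothetical sketch, not the cuVS implementation. Streams three row-major
// binary files (graph rows, dataset rows, one label per row) into a single
// output file whose records are [neighbors | vector | label]. Only one batch
// of rows is held in host memory at any time. Returns the bytes written.
inline std::size_t interleave_rows(const std::string& graph_path,
                                   const std::string& dataset_path,
                                   const std::string& mapping_path,
                                   const std::string& out_path,
                                   std::size_t n_rows,
                                   std::size_t graph_row_bytes,
                                   std::size_t data_row_bytes,
                                   std::size_t label_bytes,
                                   std::size_t batch_rows)
{
  assert(batch_rows > 0);
  std::ifstream graph(graph_path, std::ios::binary);
  std::ifstream data(dataset_path, std::ios::binary);
  std::ifstream labels(mapping_path, std::ios::binary);
  std::ofstream out(out_path, std::ios::binary | std::ios::trunc);
  if (!graph || !data || !labels || !out) {
    throw std::runtime_error("cannot open input or output file");
  }

  // Staging buffers sized for one batch; this bounds the host memory used.
  std::vector<char> g(batch_rows * graph_row_bytes);
  std::vector<char> d(batch_rows * data_row_bytes);
  std::vector<char> l(batch_rows * label_bytes);

  std::size_t bytes_written = 0;
  for (std::size_t first = 0; first < n_rows; first += batch_rows) {
    const std::size_t rows = std::min(batch_rows, n_rows - first);
    graph.read(g.data(), static_cast<std::streamsize>(rows * graph_row_bytes));
    data.read(d.data(), static_cast<std::streamsize>(rows * data_row_bytes));
    labels.read(l.data(), static_cast<std::streamsize>(rows * label_bytes));
    if (!graph || !data || !labels) {
      throw std::runtime_error("input file is shorter than expected");
    }
    // Emit one combined record per row.
    for (std::size_t i = 0; i < rows; ++i) {
      out.write(g.data() + i * graph_row_bytes, static_cast<std::streamsize>(graph_row_bytes));
      out.write(d.data() + i * data_row_bytes, static_cast<std::streamsize>(data_row_bytes));
      out.write(l.data() + i * label_bytes, static_cast<std::streamsize>(label_bytes));
    }
    bytes_written += rows * (graph_row_bytes + data_row_bytes + label_bytes);
  }
  out.flush();
  if (!out) { throw std::runtime_error("write failed"); }
  return bytes_written;
}
```

In this sketch the peak host memory is `batch_rows * (graph_row_bytes + data_row_bytes + label_bytes)`, independent of `n_rows`, so the batch size trades memory for fewer, larger I/O calls.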

It is built on top of #1404. More details to follow.

CC @tfeher , @julianmi

copy-pr-bot · 2025-10-06T11:59:15Z

This pull request requires additional validation before any workflows can run on NVIDIA's runners.

Pull request vetters can view their responsibilities here.

Contributors can view more details about this message here.

@anaruse

- Adds the out-of-tree ACE method of @anaruse. This assumes graphs smaller than host memory.
- Adds `disk_enabled` and `graph_build_dir` parameters to select ACE method.

- Use partitions instead of clusters in ACE to distinguish between ACE clusters and regular KNN graph building clusters.

- Introduced dynamic configuration of nprobes and nlists for IVF-PQ based on partition size to improve KNN graph construction.
- Added logging for both IVF-PQ and NN-Descent parameters to provide better insights during the graph building process.
- Ensured default parameters are set when no specific graph build parameters are provided.
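The commit text does not state how nlists and nprobes are derived from the partition size. As a rough sketch only, one common rule of thumb scales the number of lists with the square root of the row count and probes a fixed fraction of them. The function name, the struct, and the constants (the cap of 65536 and the 10% probe ratio) are all assumptions for illustration, not values taken from this PR.

```cpp
#include <algorithm>
#include <cassert>
#include <cmath>
#include <cstddef>
#include <cstdint>

// Hypothetical result type; not a cuVS structure.
struct ivf_pq_sizes {
  std::uint32_t n_lists;
  std::uint32_t n_probes;
};

// Hypothetical heuristic: n_lists ~ sqrt(rows), n_probes ~ 10% of n_lists.
// Both values are kept at 1 or above so tiny partitions still work.
inline ivf_pq_sizes pick_ivf_pq_sizes(std::size_t partition_rows)
{
  assert(partition_rows > 0);
  const auto root = static_cast<std::size_t>(std::sqrt(static_cast<double>(partition_rows)));
  const std::size_t max_lists = 65536;  // assumed upper bound
  const std::size_t n_lists   = std::min(std::max<std::size_t>(root, 1), max_lists);
  const std::size_t n_probes  = std::max<std::size_t>(n_lists / 10, 1);
  return {static_cast<std::uint32_t>(n_lists), static_cast<std::uint32_t>(n_probes)};
}
```

Deriving both values from the partition size, rather than from the full dataset size, keeps each per-partition build from using far more lists than the partition has rows to fill.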

- Added logic to identify and merge small partitions that do not meet the minimum size requirement for stable KNN graph construction.

- Replaced `disk_enabled` and `graph_build_dir` with `ace_npartitions` and `ace_build_dir` in the parameter parsing logic.
- Updated function signatures and documentation to clarify the new partitioning approach for ACE builds.

- Introduced new functions for reordering and storing datasets on disk, optimizing for NVMe performance.
- Clarified namings.

…gement

- Added a `file_descriptor` class to manage file descriptors with RAII, ensuring proper resource cleanup.
- Updated file handling in `ace_write_large_file` and `ace_reorder_and_store_dataset` to use the new wrapper.
- Improved error handling and logging for file operations.
- Enhanced input validation in `build_ace` function for better robustness.
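For readers unfamiliar with the pattern, an RAII wrapper around a POSIX file descriptor can look roughly like the following. This is a sketch, not the `file_descriptor` class from the PR: the name `fd_guard` and its interface are assumptions.

```cpp
#include <fcntl.h>
#include <sys/types.h>
#include <unistd.h>

#include <utility>

// Illustrative RAII wrapper around a POSIX file descriptor. The class name
// and interface are assumptions; the class added in the PR may differ.
class fd_guard {
 public:
  fd_guard() = default;
  explicit fd_guard(int fd) : fd_(fd) {}
  // Opens `path`; on failure the guard is simply left invalid.
  fd_guard(const char* path, int flags, mode_t mode = 0644) : fd_(::open(path, flags, mode)) {}

  // Copying would lead to a double close, so only moves are allowed.
  fd_guard(const fd_guard&)            = delete;
  fd_guard& operator=(const fd_guard&) = delete;
  fd_guard(fd_guard&& other) noexcept : fd_(std::exchange(other.fd_, -1)) {}
  fd_guard& operator=(fd_guard&& other) noexcept
  {
    if (this != &other) {
      reset();
      fd_ = std::exchange(other.fd_, -1);
    }
    return *this;
  }

  ~fd_guard() { reset(); }

  bool is_valid() const noexcept { return fd_ >= 0; }
  int get() const noexcept { return fd_; }

  // Closes the descriptor if one is held and marks the guard invalid.
  void reset() noexcept
  {
    if (fd_ >= 0) {
      ::close(fd_);
      fd_ = -1;
    }
  }

 private:
  int fd_ = -1;
};
```

The benefit is that every early return or exception path in a function holding an `fd_guard` closes the file automatically, which is what "ensuring proper resource cleanup" refers to.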

…tion handling

- Introduced a minimum partition size parameter to improve partition stability.
- Replaced standard K-means with balanced K-means for more even partition sizes.
- Implemented logic to reassign vectors from small partitions to the nearest larger ones.
- Added detailed logging for partition statistics and warnings for imbalances.
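The reassignment step described above can be sketched on the host as follows. This is a simplified illustration under stated assumptions: the function name, the flat row-major layout, and the use of squared L2 distance to the partition centroids are all hypothetical, and the PR's actual implementation may differ.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <limits>
#include <vector>

// Hypothetical sketch: moves every vector that sits in a partition smaller
// than `min_partition_size` to the nearest partition that is large enough.
// `vectors` and `centroids` are row-major with `dim` floats per row.
// Returns the partition sizes after reassignment.
inline std::vector<std::size_t> reassign_small_partitions(std::vector<std::uint32_t>& labels,
                                                          const std::vector<float>& vectors,
                                                          const std::vector<float>& centroids,
                                                          std::size_t dim,
                                                          std::size_t min_partition_size)
{
  assert(dim > 0);
  const std::size_t n_rows  = labels.size();
  const std::size_t n_parts = centroids.size() / dim;
  assert(vectors.size() == n_rows * dim);

  std::vector<std::size_t> sizes(n_parts, 0);
  for (std::uint32_t p : labels) {
    assert(p < n_parts);
    ++sizes[p];
  }

  // Validity is decided once from the initial sizes.
  std::vector<bool> valid(n_parts, false);
  bool any_valid = false;
  for (std::size_t p = 0; p < n_parts; ++p) {
    valid[p]  = sizes[p] >= min_partition_size;
    any_valid = any_valid || valid[p];
  }
  if (!any_valid) { return sizes; }  // nothing to merge into

  for (std::size_t i = 0; i < n_rows; ++i) {
    if (valid[labels[i]]) { continue; }
    float best_dist      = std::numeric_limits<float>::max();
    std::uint32_t best_p = labels[i];
    for (std::size_t p = 0; p < n_parts; ++p) {
      if (!valid[p]) { continue; }
      float dist = 0.0f;  // squared L2 distance to centroid p
      for (std::size_t j = 0; j < dim; ++j) {
        const float diff = vectors[i * dim + j] - centroids[p * dim + j];
        dist += diff * diff;
      }
      if (dist < best_dist) {
        best_dist = dist;
        best_p    = static_cast<std::uint32_t>(p);
      }
    }
    --sizes[labels[i]];
    ++sizes[best_p];
    labels[i] = best_p;
  }
  return sizes;
}
```

After the call, every undersized partition is empty and can be dropped, while the surviving partitions only grow.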

- Introduced `ace_read_large_file` function for efficient reading of large files in chunks.
- Improved error handling and logging in file operations.
- Refactored existing file handling in `ace_write_large_file` and `ace_reorder_and_store_dataset` to utilize the new reading function.
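Chunked reads matter because a single POSIX `read`/`pread` call may return fewer bytes than requested, and on Linux one call transfers at most about 2 GiB. A minimal sketch of such a loop is shown below; `read_chunked` and its signature are assumptions for illustration, not the `ace_read_large_file` function from the PR.

```cpp
#include <fcntl.h>
#include <sys/types.h>
#include <unistd.h>

#include <algorithm>
#include <cassert>
#include <cerrno>
#include <cstddef>

// Hypothetical helper: reads up to `nbytes` starting at `offset` into `dst`,
// issuing requests of at most `chunk_bytes` each. Short reads and EINTR are
// retried. Returns the number of bytes read (less than `nbytes` only when the
// end of the file is reached) or -1 on error.
inline ssize_t read_chunked(int fd, void* dst, std::size_t nbytes, off_t offset,
                            std::size_t chunk_bytes)
{
  assert(chunk_bytes > 0);
  char* out        = static_cast<char*>(dst);
  std::size_t done = 0;
  while (done < nbytes) {
    const std::size_t want = std::min(chunk_bytes, nbytes - done);
    const ssize_t got      = ::pread(fd, out + done, want, offset + static_cast<off_t>(done));
    if (got < 0) {
      if (errno == EINTR) { continue; }  // interrupted, try again
      return -1;
    }
    if (got == 0) { break; }  // end of file
    done += static_cast<std::size_t>(got);
  }
  return static_cast<ssize_t>(done);
}
```

Using `pread` with an explicit offset leaves the descriptor's file position untouched, so several partitions could be read from the same descriptor without seeking.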

- Introduced methods to check if the index is stored on disk and to retrieve the file directory.
- Added functionality to set disk storage parameters within the index structure.
- Updated the `build_ace` function to set the disk-based index when `use_disk = true`.

Resolves rapidsai#1344

Authors:
- Anupam (https://github.com/aamijar)

Approvers:
- Corey J. Nolet (https://github.com/cjnolet)

URL: rapidsai#1402

Currently only IVF-PQ can be used as the graph building algorithm (NN Descent does not support Cosine). As a result, we are limited by IVF-PQ's restriction of data to be of float / half type for the Cosine metric. This PR also fixes an in-place data modification that was being done by IVF-PQ.

Opportunities for optimization: NN Descent to support Cosine and compute dataset norms only once -- during NN Descent. Re-use those for CAGRA.

[UPDATE 08/21/2025]: NN Descent now supports Cosine. This PR allows the initial CAGRA graph to be built by both methods -- IVF_PQ, NN_DESCENT. The IVF_PQ restriction on data types holds, but uint8 and int8 can be supported with NN Descent as the graph building algorithm. ITERATIVE CAGRA SEARCH is currently disabled for Cosine.

[UPDATE 09/23/2025]: This PR also adds Cosine support for IVF_PQ with uint8 / int8 inputs. The above-mentioned restriction with IVF_PQ has been removed. So with this PR CAGRA supports Cosine wholly, for float, uint8 and int8 inputs. ITERATIVE_SEARCH however still has some issues as the graph building method with the Cosine metric and has been disabled.

[UPDATE 09/25/2025]: Binary size comparison for libcuvs.so (CUDA 12.9, x86):

- branch-25.10: 1154.42 MB
- This PR: 1160.73 MB

Total CAGRA testing time:

branch-25.10:
```
      Start 10: NEIGHBORS_ANN_CAGRA_FLOAT_UINT32_TEST
19/37 Test #10: NEIGHBORS_ANN_CAGRA_FLOAT_UINT32_TEST ...   Passed  825.43 sec
      Start 11: NEIGHBORS_ANN_CAGRA_HELPERS_TEST
20/37 Test #11: NEIGHBORS_ANN_CAGRA_HELPERS_TEST ........   Passed    0.58 sec
      Start 12: NEIGHBORS_ANN_CAGRA_HALF_UINT32_TEST
21/37 Test #12: NEIGHBORS_ANN_CAGRA_HALF_UINT32_TEST ....   Passed  663.97 sec
      Start 13: NEIGHBORS_ANN_CAGRA_INT8_UINT32_TEST
22/37 Test #13: NEIGHBORS_ANN_CAGRA_INT8_UINT32_TEST ....   Passed  397.57 sec
      Start 14: NEIGHBORS_ANN_CAGRA_UINT8_UINT32_TEST
23/37 Test #14: NEIGHBORS_ANN_CAGRA_UINT8_UINT32_TEST ...   Passed  408.16 sec
```

This PR:
```
      Start 10: NEIGHBORS_ANN_CAGRA_FLOAT_UINT32_TEST
19/37 Test #10: NEIGHBORS_ANN_CAGRA_FLOAT_UINT32_TEST ...   Passed 1830.34 sec
      Start 11: NEIGHBORS_ANN_CAGRA_HELPERS_TEST
20/37 Test #11: NEIGHBORS_ANN_CAGRA_HELPERS_TEST ........   Passed    0.45 sec
      Start 12: NEIGHBORS_ANN_CAGRA_HALF_UINT32_TEST
21/37 Test #12: NEIGHBORS_ANN_CAGRA_HALF_UINT32_TEST ....   Passed 1444.14 sec
      Start 13: NEIGHBORS_ANN_CAGRA_INT8_UINT32_TEST
22/37 Test #13: NEIGHBORS_ANN_CAGRA_INT8_UINT32_TEST ....   Passed  973.64 sec
      Start 14: NEIGHBORS_ANN_CAGRA_UINT8_UINT32_TEST
23/37 Test #14: NEIGHBORS_ANN_CAGRA_UINT8_UINT32_TEST ...   Passed 1010.46 sec
```

[UPDATE 09/30/2025]: Updates to CAGRA C++ tests according to the latest PR reviews. New total CAGRA testing time:

branch-25.10:
```
      Start  9: NEIGHBORS_ANN_CAGRA_TEST_BUGS
18/37 Test  #9: NEIGHBORS_ANN_CAGRA_TEST_BUGS ...........   Passed   16.99 sec
      Start 10: NEIGHBORS_ANN_CAGRA_FLOAT_UINT32_TEST
19/37 Test #10: NEIGHBORS_ANN_CAGRA_FLOAT_UINT32_TEST ...   Passed  803.64 sec
      Start 11: NEIGHBORS_ANN_CAGRA_HELPERS_TEST
20/37 Test #11: NEIGHBORS_ANN_CAGRA_HELPERS_TEST ........   Passed    0.49 sec
      Start 12: NEIGHBORS_ANN_CAGRA_HALF_UINT32_TEST
21/37 Test #12: NEIGHBORS_ANN_CAGRA_HALF_UINT32_TEST ....   Passed  667.89 sec
      Start 13: NEIGHBORS_ANN_CAGRA_INT8_UINT32_TEST
22/37 Test #13: NEIGHBORS_ANN_CAGRA_INT8_UINT32_TEST ....   Passed  420.49 sec
      Start 14: NEIGHBORS_ANN_CAGRA_UINT8_UINT32_TEST
23/37 Test #14: NEIGHBORS_ANN_CAGRA_UINT8_UINT32_TEST ...   Passed  429.57 sec
```

This PR:
```
      Start  9: NEIGHBORS_ANN_CAGRA_TEST_BUGS
18/37 Test  #9: NEIGHBORS_ANN_CAGRA_TEST_BUGS ...........   Passed   26.62 sec
      Start 10: NEIGHBORS_ANN_CAGRA_FLOAT_UINT32_TEST
19/37 Test #10: NEIGHBORS_ANN_CAGRA_FLOAT_UINT32_TEST ...   Passed  973.23 sec
      Start 11: NEIGHBORS_ANN_CAGRA_HELPERS_TEST
20/37 Test #11: NEIGHBORS_ANN_CAGRA_HELPERS_TEST ........   Passed    0.43 sec
      Start 12: NEIGHBORS_ANN_CAGRA_HALF_UINT32_TEST
21/37 Test #12: NEIGHBORS_ANN_CAGRA_HALF_UINT32_TEST ....   Passed  702.02 sec
      Start 13: NEIGHBORS_ANN_CAGRA_INT8_UINT32_TEST
22/37 Test #13: NEIGHBORS_ANN_CAGRA_INT8_UINT32_TEST ....   Passed  491.65 sec
      Start 14: NEIGHBORS_ANN_CAGRA_UINT8_UINT32_TEST
23/37 Test #14: NEIGHBORS_ANN_CAGRA_UINT8_UINT32_TEST ...   Passed  541.43 sec
```

Fixes rapidsai#1288
Fixes rapidsai#389

Authors:
- Tarang Jain (https://github.com/tarang-jain)
- Corey J. Nolet (https://github.com/cjnolet)

Approvers:
- Tamas Bela Feher (https://github.com/tfeher)
- Corey J. Nolet (https://github.com/cjnolet)

URL: rapidsai#197

- Added a parameter to disable dataset attachment during index building.
- Refactored dataset file handling to preallocate space for reordered, augmented, and mapping datasets, improving performance and error handling.
- Updated logging to reflect sizes in GiB for consistency.
- Adjusted partition handling to prevent creation of excessively small partitions, issuing a warning when necessary.

- Introduced `ace_ef_construction` parameter to enhance index quality during ACE builds.
- Refactored to use `to_cagra_params`.

- Added ACE as a new graph building algorithm in the CAGRA framework.
- Introduced `cuvsAceParams` structure to encapsulate ACE-specific parameters including `ace_npartitions`, `ace_ef_construction`, `ace_build_dir`, and `ace_use_disk`.
- Updated relevant parsing functions to handle ACE parameters from configuration.
- Refactored index parameter handling to support ACE builds, ensuring compatibility with existing graph building methods.
- Enhanced documentation and comments for clarity on ACE functionality.

- Removes the augmented dataset from file to clear up space.

- Updated the CAGRA parameters to use `hnsw_to_cagra_params` for better compatibility with HNSW indexing.
- Enhanced documentation for ACE partitioning parameters, clarifying the impact of partition sizes on performance and memory usage.
- Added the HNSW `to_cagra_params` routine to not break the API.

- Changed `graph_build_params` type to a void pointer for flexibility in specifying different graph build parameters.
- Enhanced documentation to clarify the usage of `graph_build_params` for various algorithms.
- Updated ACE build directory handling to validate path length and characters, resetting to a default if invalid.
- Improved logging messages for ACE build operations to reflect memory unit changes.

- The disk mode tests fail until rapidsai#1410 is merged.

@anaruse

This PR introduces **Augmented Core Extraction (ACE)**, an approach proposed by @anaruse for building CAGRA indices on very large datasets that exceed GPU memory capacity. ACE enables users to build high-quality approximate nearest neighbor search indices on datasets that would otherwise be impossible to process on a single GPU. The approach uses the host memory if large enough and falls back to the disk if required.

This work is a collaboration: @anaruse, @tfeher, @achirkin, @mfoerste4

## Algorithm Description

1. **Dataset Partitioning**: The dataset is partitioned using balanced k-means clustering on sampled data. Each vector is assigned to its two closest partition centroids (primary and augmented). The primary partitions are non-overlapping. The augmentation ensures that cross-partition edges are captured in the final graph. Partitions smaller than a minimum threshold are automatically merged with larger partitions to ensure computational efficiency and graph quality. Vectors from small partitions are reassigned to the nearest valid partitions.
2. **Per-Partition Graph Building**: For each partition, a sub-index is built independently (regular `build_knn_graph()` flow) with its primary vectors plus augmented vectors from neighboring partitions.
3. **Graph Combining**: The per-partition graphs are combined into a single unified CAGRA index. Merging is not needed since the primary partitions are non-overlapping. The in-memory variant remaps the local partition IDs to global dataset IDs to create a correct index. The disk variant stores the backward index mappings (`dataset_mapping.bin`), the reordered dataset (`reordered_dataset.bin`) and the optimized CAGRA graph (`cagra_graph.bin`) on disk. The index is then incomplete, as shown by `cuvs::neighbors::index::on_disk()`. The files are stored in `cuvs::neighbors::index::file_directory()`.

The HNSW index serialization was provided by @mfoerste4 in #1410, which was merged here. This adds the `serialize_to_hnsw()` serialization routine that allows combination of dataset, graph, and mapping. The data is combined on the fly as it is streamed from disk to disk, keeping the required host memory as small as possible. The host still needs enough memory to hold the index, though.

## Core Components

- **`ace_build()`**: Main routine which users should call.
- **`ace_get_partition_labels()`**: Performs balanced k-means clustering to assign each vector to its two closest partitions while handling small partition merging.
- **`ace_create_forward_and_backward_lists()`**: Creates bidirectional ID mappings between original dataset indices and reordered partition-local indices.
- **`ace_set_index_params()`**: Sets the index parameters based on the partition and augmented dataset to ensure efficient KNN graph building.
- **`ace_gather_partition_dataset()`**: In-memory only: Gather the partition and augmented dataset.
- **`ace_adjust_sub_graph_ids`**: In-memory only: Adjust IDs in the sub search graph and store them into the main search graph.
- **`ace_adjust_final_graph_ids`**: In-memory only: Map graph neighbor IDs from reordered space back to original vector IDs.
- **`ace_reorder_and_store_dataset`**: Disk only: Reorder the dataset based on partitions and store to disk. Uses write buffers to improve performance.
- **`ace_load_partition_dataset_from_disk`**: Disk only: Load partition dataset and augmented dataset from disk.
- **`file_descriptor` and `ace_read_large_file()` / `ace_write_large_file()`**: RAII file handle and chunked file I/O operations.
- **CAGRA index changes**: Added `on_disk_` flag and `file_directory_` to the CAGRA index structure to support disk-backed indices.
- **CAGRA parameter changes**: Added `ace_npartitions` and `ace_build_dir` to the CAGRA parameters for users to specify that ACE should be used and which directory should be used if required.

## Usage

### C++ API

```cpp
#include <cuvs/neighbors/cagra.hpp>

using namespace cuvs::neighbors;

// Configure index parameters
cagra::index_params params;
params.ace_npartitions = 10;  // Number of partitions (unset or <= 1 to disable ACE)
params.ace_build_dir = "/tmp/ace_build";  // Directory for intermediate files (should be a fast NVMe)
params.graph_degree = 64;
params.intermediate_graph_degree = 128;

// Build ACE index (dataset can be on host memory)
auto dataset = raft::make_host_matrix<float, int64_t>(n_rows, n_cols);
// ... load dataset ...
auto index = cagra::build_ace(res, params, dataset.view(), params.ace_npartitions);

// Search works identically to standard CAGRA if the host has enough memory (index.on_disk() == false)
cagra::search_params search_params;
auto neighbors = raft::make_device_matrix<uint32_t>(res, n_queries, k);
auto distances = raft::make_device_matrix<float>(res, n_queries, k);
cagra::search(res, search_params, index, queries, neighbors.view(), distances.view());
```

### Storage Requirements

1. `cagra_graph.bin`: `n_rows * graph_degree * sizeof(IdxT)`
2. `dataset_mapping.bin`: `n_rows * sizeof(IdxT)`
3. `reordered_dataset.bin`: Size of the input dataset
4. `augmented_dataset.bin`: Size of the input dataset

Authors:
- Julian Miller (https://github.com/julianmi)
- Anupam (https://github.com/aamijar)
- Tarang Jain (https://github.com/tarang-jain)
- Malte Förster (https://github.com/mfoerste4)
- Jake Awe (https://github.com/AyodeAwe)
- Bradley Dice (https://github.com/bdice)
- Artem M. Chirkin (https://github.com/achirkin)
- Jinsol Park (https://github.com/jinsolp)

Approvers:
- MithunR (https://github.com/mythrocks)
- Robert Maynard (https://github.com/robertmaynard)
- Tamas Bela Feher (https://github.com/tfeher)
- Corey J. Nolet (https://github.com/cjnolet)

URL: #1404
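As an illustration of the dataset partitioning step in the description above, the following host-side sketch assigns each vector to its closest (primary) and second-closest (augmented) centroid. It is an assumption-laden toy version: the struct and function names are hypothetical, squared L2 distance is assumed, and the balanced k-means that produces the centroids is not shown.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <limits>
#include <vector>

// Hypothetical result type; not a cuVS structure.
struct partition_pair {
  std::uint32_t primary;    // closest centroid
  std::uint32_t augmented;  // second-closest centroid
};

// Hypothetical sketch: for each row of `vectors` (row-major, `dim` floats per
// row) find the two nearest rows of `centroids` by squared L2 distance.
inline std::vector<partition_pair> assign_two_nearest(const std::vector<float>& vectors,
                                                      const std::vector<float>& centroids,
                                                      std::size_t dim)
{
  assert(dim > 0);
  const std::size_t n_rows  = vectors.size() / dim;
  const std::size_t n_parts = centroids.size() / dim;
  assert(n_parts >= 2);  // an augmented partition needs a second centroid

  std::vector<partition_pair> result(n_rows);
  for (std::size_t i = 0; i < n_rows; ++i) {
    float d1 = std::numeric_limits<float>::infinity();  // best distance
    float d2 = std::numeric_limits<float>::infinity();  // second-best distance
    std::uint32_t p1 = 0;
    std::uint32_t p2 = 0;
    for (std::size_t p = 0; p < n_parts; ++p) {
      float dist = 0.0f;
      for (std::size_t j = 0; j < dim; ++j) {
        const float diff = vectors[i * dim + j] - centroids[p * dim + j];
        dist += diff * diff;
      }
      if (dist < d1) {
        d2 = d1;
        p2 = p1;
        d1 = dist;
        p1 = static_cast<std::uint32_t>(p);
      } else if (dist < d2) {
        d2 = dist;
        p2 = static_cast<std::uint32_t>(p);
      }
    }
    result[i] = {p1, p2};
  }
  return result;
}
```

Because every vector has exactly one primary partition, the primary sets are disjoint, which is why the description notes that no merging is needed when the per-partition graphs are combined.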


cjnolet · 2025-11-19T21:33:18Z

@mfoerste4 has this already been rolled into the other ACE PR? Can we safely close this one?

mfoerste4 · 2025-11-19T22:49:00Z

> @mfoerste4 has this already been rolled into the other ACE PR? Can we safely close this one?

Yes, this can be closed.

github-project-automation bot added this to Vector Search, ML, & Data Mining Release Board Oct 6, 2025

github-project-automation bot moved this to Todo in Vector Search, ML, & Data Mining Release Board Oct 6, 2025

julianmi and others added 27 commits October 6, 2025 14:48

Integrate @anaruse's ACE method for large graphs

5e11ce6

- Adds the out-of-tree ACE method of @anaruse. This assumes graphs smaller than host memory.
- Adds `disk_enabled` and `graph_build_dir` parameters to select ACE method.

ACE: Clarify partition naming

2d64ff3

- Use partitions instead of clusters in ACE to distinguish between ACE clusters and regular KNN graph building clusters.

ACE: Implement merging of small partitions

4a779a4

- Added logic to identify and merge small partitions that do not meet the minimum size requirement for stable KNN graph construction.

ACE: Update parameters to clarify ace method usage

9f5d31c

- Replaced `disk_enabled` and `graph_build_dir` with `ace_npartitions` and `ace_build_dir` in the parameter parsing logic.
- Updated function signatures and documentation to clarify the new partitioning approach for ACE builds.

ACE: Add timings

5552873

ACE: Remove unused vector_fwd_list_1 in build_ace

68a0ad8

ACE: Check if we have enough host memory

0e86d18

ACE: Restructure parameter setting

12b0366

ACE: Restructure small partition merging

b0f1d04

ACE: Refactor partition data gathering

e25375f

ACE: Refactor forward backward list creation

434bb4d

ACE: Refactor id adjusting of sub search graph

3b1010f

ACE: Refactor id adjusting of final search graph

31431ba

ACE: Refactor partition label handling and dataset storage

1c53df2

- Introduced new functions for reordering and storing datasets on disk, optimizing for NVMe performance.
- Clarified namings.

ACE: Improve file I/O speeds

128031b

ACE: Reduce logging

8efde31

ACE: Fix issue in main loop logging

eecb1a4

ACE: Store backward mapping for HNSW

f9fc127

ACE: Formatting

5056eca

Move eigen_solvers from raft (rapidsai#1402)

b2819f4

Resolves rapidsai#1344

Authors:
- Anupam (https://github.com/aamijar)

Approvers:
- Corey J. Nolet (https://github.com/cjnolet)

URL: rapidsai#1402

Merge remote-tracking branch 'upstream/branch-25.12' into ace-disk

19eb437

julianmi and others added 17 commits October 14, 2025 19:10

ACE: Move build dir check

487a3a1

ACE: Add ace_ef_construction parameter for index quality control

aa4f20a

- Introduced `ace_ef_construction` parameter to enhance index quality during ACE builds.
- Refactored to use `to_cagra_params`.

fix assert comment

df350f0

fix merge conflict

18df7c1

ACE: Fix overflow in byte offset calculation

a4315cf

Merge branch 'branch-25.12' into ace-disk

cdb8077

support hierarchy::none in from_cagra disk-index

56ef910

ACE: Add missing c interfaces

6e20374

ACE: Clean up augmented file

097f78c

- Removes the augmented dataset from file to clear up space.

properly release mmap

7e74a1c

buffer ofstream

19589b8

ACE: Add example

bead16d

ACE: Improve Java and Python interfaces

9459fc3

tfeher added the improvement and non-breaking labels Oct 20, 2025

mfoerste4 and others added 3 commits October 20, 2025 15:49

Merge remote-tracking branch 'julian/ace-disk' into ace_serialize

201c3b5

Merge branch 'branch-25.12' into ace-disk

c9b39b4

Merge remote-tracking branch 'julian/ace-disk' into ace_serialize

bd4742e

julianmi added a commit to julianmi/cuvs that referenced this pull request Oct 22, 2025

ACE: Add CAGRA ACE unit tests

40752f9

- The disk mode tests fail until rapidsai#1410 is merged.

julianmi mentioned this pull request Oct 22, 2025

Add Augmented Core Extraction Algorithm #1404

Merged

mfoerste4 closed this Nov 19, 2025

github-project-automation bot moved this from Todo to Done in Vector Search, ML, & Data Mining Release Board Nov 19, 2025


10 participants
