Skip to content

[Feat] Expose C API for CAGRA merge #860

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Open
wants to merge 3 commits into
base: branch-25.06
Choose a base branch
from

Conversation

rhdong
Copy link
Member

@rhdong rhdong commented Apr 30, 2025

  • Expose C API for CAGRA merge

@rhdong rhdong requested a review from a team as a code owner April 30, 2025 01:17
@github-actions github-actions bot added the cpp label Apr 30, 2025
@rhdong rhdong added feature request New feature or request non-breaking Introduces a non-breaking change labels Apr 30, 2025
*
*/
/** Strategy for merging CAGRA indices. */
typedef enum {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can we reuse this in the C++ version so we don't have to duplicate?

Copy link
Member Author

@rhdong rhdong Apr 30, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I suppose we can't do it because C doesn't support namespaces, :: , and enum class. It's not the first time we re-define in C( which should work well), like hash_mode vs cuvsCagraHashMode

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can we forward it? Something like "using cuvsCagraMergeStrategy as cuvs::neighbors::cagra::merge_strategy"?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We do this with the the "DistanceType` in pairwise distances. Maybe we can do the same thing here?

Copy link
Member Author

@rhdong rhdong May 3, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We do this with the the "DistanceType` in pairwise distances. Maybe we can do the same thing here?

Sorry, I found we defined the separate DistanceType and converted it to cpp type via static_cast. Maybe I found the improper position. Could you hint me at the code position, by any chance, many thanks?

@rhdong rhdong requested a review from benfred May 2, 2025 00:07
Comment on lines 300 to 327
if (out_idx_params.graph_build_params) {
auto ivf_params = static_cast<cuvsIvfPqParams*>(out_idx_params.graph_build_params);
if (ivf_params->ivf_pq_build_params) {
auto bp = ivf_params->ivf_pq_build_params;
pq_params.build_params.add_data_on_build = bp->add_data_on_build;
pq_params.build_params.n_lists = bp->n_lists;
pq_params.build_params.kmeans_n_iters = bp->kmeans_n_iters;
pq_params.build_params.kmeans_trainset_fraction = bp->kmeans_trainset_fraction;
pq_params.build_params.pq_bits = bp->pq_bits;
pq_params.build_params.pq_dim = bp->pq_dim;
pq_params.build_params.codebook_kind =
static_cast<cuvs::neighbors::ivf_pq::codebook_gen>(bp->codebook_kind);
pq_params.build_params.force_random_rotation = bp->force_random_rotation;
pq_params.build_params.conservative_memory_allocation =
bp->conservative_memory_allocation;
pq_params.build_params.max_train_points_per_pq_code = bp->max_train_points_per_pq_code;
}
if (ivf_params->ivf_pq_search_params) {
auto sp = ivf_params->ivf_pq_search_params;
pq_params.search_params.n_probes = sp->n_probes;
pq_params.search_params.lut_dtype = sp->lut_dtype;
pq_params.search_params.internal_distance_dtype = sp->internal_distance_dtype;
pq_params.search_params.preferred_shmem_carveout = sp->preferred_shmem_carveout;
}
if (ivf_params->refinement_rate > 1.0f) {
pq_params.refinement_rate = ivf_params->refinement_rate;
}
}
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

A bunch of this code seems cut and paste from the _build function - can we move into a function and share it so that we don't need to keep up to date in two spots?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done

@rhdong
Copy link
Member Author

rhdong commented May 4, 2025

/ok to test cbefdc6

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
cpp feature request New feature or request non-breaking Introduces a non-breaking change
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants