Add the example for building the CAGRA index in a streaming fashion #1391
base: branch-25.10
@@ -0,0 +1,310 @@
/*
 * Copyright (c) 2024-2025, NVIDIA CORPORATION.
Since this is a new file, the copyright header should list only the current year.
namespace {

void make_host_dataset(raft::host_matrix_view<float, int64_t, raft::row_major> dataset)
Can we maybe call this "generate host dataset"? Honestly, I think I'd prefer to use make_blobs from raft rather than just generating completely random uniform vectors. make_blobs at least gives the vector space some locality that IVFPQ and CAGRA can exploit. It would also look a little cleaner to have a simple function call to "make_blobs" followed by a copy to host.
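To illustrate the locality point, here is a minimal host-only sketch of clustered data generation. It is not the RAFT make_blobs API: the function name `generate_blobs_host`, its parameters, and the center range of [-10, 10] are all hypothetical and use only the C++ standard library. Each row is placed near one of a few random centers instead of being drawn uniformly.

```cpp
#include <cstddef>
#include <cstdint>
#include <random>
#include <vector>

// Hypothetical stand-in for a make_blobs-style generator (NOT the RAFT API).
// Returns a row-major n_rows x n_cols buffer in which every row is a random
// cluster center plus Gaussian noise, so nearby rows share structure.
std::vector<float> generate_blobs_host(std::int64_t n_rows,
                                       std::int64_t n_cols,
                                       std::int64_t n_clusters,
                                       float cluster_std,
                                       std::uint64_t seed)
{
  std::mt19937_64 rng(seed);
  std::uniform_real_distribution<float> center_dist(-10.0f, 10.0f);  // assumed range
  std::normal_distribution<float> noise(0.0f, cluster_std);

  // Draw the cluster centers first.
  std::vector<float> centers(static_cast<std::size_t>(n_clusters * n_cols));
  for (auto& c : centers) { c = center_dist(rng); }

  // Assign rows to clusters round-robin and perturb the chosen center.
  std::vector<float> data(static_cast<std::size_t>(n_rows * n_cols));
  for (std::int64_t row = 0; row < n_rows; ++row) {
    const std::int64_t cluster = row % n_clusters;
    for (std::int64_t col = 0; col < n_cols; ++col) {
      data[static_cast<std::size_t>(row * n_cols + col)] =
        centers[static_cast<std::size_t>(cluster * n_cols + col)] + noise(rng);
    }
  }
  return data;
}
```

A call such as `generate_blobs_host(100, 8, 4, 0.5f, 42)` yields 800 floats grouped around 4 centers. In the actual example, the suggestion is to call RAFT's make_blobs on the device and copy the result to host instead.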
} // namespace

void streaming_cagra_build_example(
Can we pull everything above this point into a corresponding cagra_streaming_example.hpp header for readability? Ideally the user would be able to follow the meat of the example first, and then refer to the header for the implementation details.
The other benefit of this approach is that users can essentially drop the header into their own project and copy/paste the relevant blocks from the source file into their own applications.
Sure thing. I was trying to follow the existing examples as closely as possible, but I like your idea.
/ok to test 877f61d
This example shows how to build a CAGRA graph index by streaming host batches:
Closes #1146