Wrapper for all-neighbors knn graph building #785
Conversation
struct ivf_pq_params {
  cuvs::neighbors::ivf_pq::index_params build_params;
  cuvs::neighbors::ivf_pq::search_params search_params;
  float refinement_rate = 2.0;
We might want to turn off refinement initially (set it to 1.0) but I'll defer to you on the recall/perf impact there.
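For reference, a minimal sketch of that suggestion using the struct above (namespace elided; per the reviewer's comment, a rate of 1.0 retrieves exactly k candidates, so the refinement step has nothing extra to re-rank):

```cpp
// Sketch only: turning refinement off as suggested above.
ivf_pq_params pq_params;          // the struct from this diff
pq_params.refinement_rate = 1.0;  // diff default is 2.0 (retrieve 2*k candidates, then refine)
```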
/ok to test
This is a non-blocking review, and it's mostly just bookkeeping questions.
 */
auto build(const raft::resources& handle,
           raft::host_matrix_view<const float, int64_t, row_major> dataset,
           int64_t k,
Should this be an index param?
I'm not sure if I understood your comment properly. Are you asking if it should be an index param because you think these parameters are not related to the index and therefore the name seems wrong?
I agree that `k` should be separated from the `index_params`, as it is the degree of the graph and not part of the params for building the original index.
It is an index parameter in the sense that the size of the built index depends on the value of k, making it a hyperparameter. It doesn't play the same role as the k in a query API.
Oh sorry, I think I was looking at the wrong line. We have `k` inside the `index` struct. If we had to choose between having `k` inside `index` vs `index_params`, I think the former makes more sense?
           bool return_distances = false) -> index<int64_t, float>;

/**
 * @brief Build a new all-neighbors index
Am I correct to assume that this API builds an approximate pairwise distance and index matrix? If yes, can we add that to the docs?
@divyegala it's an approximate knn, not an approximate pairwise distance matrix. All-neighbors just means we're computing the nearest neighbors for all the training vectors. I do agree we could define this more clearly in the docs.
I couldn't think of a better term, but wanted it expressed in the docs that this API builds graphs of shape (nrow, nrow).
This API builds graphs of shape (nrow x k), holding the top k nearest neighbors for each data point. So it's a nearest-neighbor finder over the entire dataset (no separate build and search functions).
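To pin down those semantics, here is a small brute-force reference (illustrative only, not this PR's implementation) of what the (nrow x k) output represents, assuming L2 distance:

```cpp
#include <algorithm>
#include <cstddef>
#include <cstdint>
#include <numeric>
#include <vector>

// Brute-force reference for an "all-neighbors" knn graph: for EVERY row of
// the dataset, find the indices of its k nearest rows. The real API
// approximates this; there is no separate build/search step.
std::vector<std::vector<int64_t>> all_neighbors_reference(
  const std::vector<std::vector<float>>& data, int64_t k)
{
  const int64_t n = static_cast<int64_t>(data.size());
  std::vector<std::vector<int64_t>> graph(n);
  for (int64_t i = 0; i < n; ++i) {
    // Squared L2 distance from row i to every row (including itself).
    std::vector<float> dist(n, 0.0f);
    for (int64_t j = 0; j < n; ++j) {
      for (std::size_t c = 0; c < data[i].size(); ++c) {
        const float diff = data[i][c] - data[j][c];
        dist[j] += diff * diff;
      }
    }
    std::vector<int64_t> ids(n);
    std::iota(ids.begin(), ids.end(), int64_t{0});
    // Keep the k closest row indices; each graph row has length k,
    // so the full graph has shape (nrow x k).
    std::partial_sort(ids.begin(), ids.begin() + k, ids.end(),
                      [&](int64_t a, int64_t b) { return dist[a] < dist[b]; });
    graph[i].assign(ids.begin(), ids.begin() + k);
  }
  return graph;
}
```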
void check_metric(const index_params& params)
{
  if (std::holds_alternative<graph_build_params::nn_descent_params>(params.graph_build_params)) {
    auto allowed_metrics = params.metric == cuvs::distance::DistanceType::L2Expanded ||
Don't we allow inner product as well?
Now depending on this PR
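If inner product does become allowed, the `check_metric` snippet above could plausibly be relaxed like this (a sketch; `InnerProduct` is an existing `cuvs::distance::DistanceType` value, but whether NN Descent accepts it here depends on that PR):

```cpp
// Sketch of the suggested relaxation for NN Descent graph builds.
auto allowed_metrics = params.metric == cuvs::distance::DistanceType::L2Expanded ||
                       params.metric == cuvs::distance::DistanceType::InnerProduct;
RAFT_EXPECTS(allowed_metrics,
             "the chosen metric is not supported by the NN Descent graph build");
```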
size_t num_cols = dataset.extent(1);

int num_ranks = 0;
cudaGetDeviceCount(&num_ranks);
Ideally this would be part of the API for the user to tell us how many/which GPUs to use.
(comment below)
{
  int num_ranks = 0;
  cudaGetDeviceCount(&num_ranks);
  if (num_ranks > 1) {
I don't know if this is the intention, but I'm not a fan of the fact that we do a multi-GPU build without an explicit request from the user. Did I miss somewhere in the docs whether this is default behavior? And if the intention is for it to be default, can we add a log?
The goal of this API is to just dispatch a multi-GPU or single-GPU build depending on the visible cuda devices (so that we don't have to add another `n_gpus` parameter to this, and also to UMAP).
Ideally, all the user needs to do to use this feature is change the visible GPUs when running the script, like `CUDA_VISIBLE_DEVICES=0,1,2,3...`, without having to explicitly change any parameters.
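Concretely (an assumed invocation; `./app` is a placeholder, and `cudaGetDeviceCount` respects this environment variable): launching with `CUDA_VISIBLE_DEVICES=0,1 ./app` exposes two devices and takes the multi-GPU path, while `CUDA_VISIBLE_DEVICES=0 ./app` takes the single-GPU path.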
But a user can have multiple visible CUDA devices for many reasons. Just because the devices are visible, we should not assume they want us to use them all. Ideally this should be opt-in behavior, but otherwise at least add a log to let them know we're using all GPUs.
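A minimal sketch of that "at least add a log" suggestion, applied at the dispatch point shown in the diff above (`RAFT_LOG_INFO` is raft's logging macro; the placement is illustrative):

```cpp
int num_ranks = 0;
cudaGetDeviceCount(&num_ranks);
if (num_ranks > 1) {
  // Make the implicit multi-GPU dispatch visible to the user.
  RAFT_LOG_INFO("all-neighbors build: %d visible CUDA devices; using the multi-GPU path",
                num_ranks);
}
```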
> I don't know if this is the intention but I'm not a fan of the fact that we do multi-GPU build without explicit request from the user.

100% agreed w/ divye here. In fact, @viclafargue built the single-node multi-GPU indexes for most of the ANN algorithms fairly quickly, and the most time was spent on the API experience. We need to be very intentional and careful that we design the user experience and APIs to be consistent, flexible, and easy to use. Ideally we would be defining, configuring, and otherwise specifying multi-GPU configuration information in the underlying `raft::resources` as much as possible so that we can separate the concerns here. This is where I think you should take a look at the `raft::resources` APIs more deeply, @jinsolp (in `raft/include/core/resource`).
Thanks @divyegala @cjnolet. Looking through, I realized that we have `raft::device_resources_snmg` (seems like a relatively new merge). Would we want to use this to handle resources without explicitly making calls like `cudaSetDevice`?
But if we want the user to specify how many / which GPUs they want to use, should we have the device ids as some form of vector as part of the `index_params`?
> But if we want the user to specify how many / which GPUs they want to use, should we have the device ids as some form of vector as part of the index_params?

Absolutely not. That's my point: we want the resources/environment to be completely independent of the machine learning hyperparameters. The `index_params` should be the same whether single- or multi-GPU. This is now making me realize that we need more than just `device_resources_snmg`, which always assumes NCCL. I think what we need is for the algorithm to tell us whether we need a NCCL clique, and for the `device_resources` to respond appropriately, lazily, at runtime, based on the needs of the algorithm. This is actually why we use the factory pattern inside the various `raft::resource` instances. Ideally the `device_resources_snmg` should carry with it ONLY the GPUs that the user wants to use (we can default to all available when unspecified), and the algorithm should ask for a NCCL clique when it needs it, or not. Does that make sense? We should be able to implement this with a moderately small change to the current `device_resources_snmg`.
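A rough sketch of the shape this proposal could take; the explicit device-list constructor and the `get_nccl_clique` accessor are hypothetical here, illustrating the proposal rather than a confirmed raft API:

```cpp
#include <raft/core/device_resources_snmg.hpp>  // assumed header location

void build_on_selected_gpus()
{
  // The user opts into GPUs via the resources object, not via index_params,
  // keeping the ML hyperparameters identical for single- and multi-GPU runs.
  // (Hypothetical constructor; could default to all visible GPUs.)
  raft::device_resources_snmg res({0, 1});

  // The algorithm, not the user, decides whether collective communication is
  // needed and asks for it lazily at runtime (hypothetical accessor):
  // auto const& clique = raft::resource::get_nccl_clique(res);
}
```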
Description

This PR introduces a wrapper for building all-neighbors knn graphs.
The multi-GPU batched all-neighbors knn graph building algorithm is part of this PR, which enables all-neighbors graph building for datasets that do not fit in device memory.
Batching can be done by using `n_clusters > 1`. The multi-GPU feature is enabled simply by setting `CUDA_VISIBLE_DEVICES`. Further details below!

Related Issue

- batch knn multi-gpu algorithm implementation #727

[Supported ANN algorithms]
- NN Descent and IVF-PQ, per the parameter structs in this PR (use `n_clusters=1` when accuracy matters most, because using batching means we cannot guarantee 100% accuracy)

[Supported data types]

Currently only supports fp32 data types, but we plan to explore and add fp16 data types in the future.
[Number of nearest clusters and total number of clusters]

`batch_ann::index` has two important configurable parameters:
- `n_nearest_clusters`: number of clusters each data point is assigned to
- `n_clusters`: total number of clusters

Some important notes about these two parameters (a worked example follows this list):
- The ratio `n_nearest_clusters / n_clusters` determines device memory usage: roughly (`n_nearest_clusters` / `n_clusters`) * `num_rows_in_entire_data` rows will be put on device memory at once. E.g. for (`n_nearest_clusters` / `n_clusters`) = 2/10 and 2/20, the latter will use less device memory.
- A larger `n_nearest_clusters` results in better accuracy of the final all-neighbors knn graph. E.g. (`n_nearest_clusters` / `n_clusters`) = 4/20 will have better accuracy than 2/10, at the cost of performance.
- `n_clusters` = 1 does not perform batching.
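As a concrete illustration of the memory rule above (hypothetical numbers): with 100M total rows and (`n_nearest_clusters` / `n_clusters`) = 2/20, roughly (2/20) * 100M = 10M rows sit on device memory at once, while 2/10 would keep 20M rows resident.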
Dependent on...

How to Use

Can be used like the sketch below; `batch_ann::build` internally runs on multi-GPU or single-GPU depending on the visible cuda devices.
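A hedged reconstruction of the usage, based on the `build` signature in the diff above; the header, the full namespace path, and the position of the params argument are assumptions:

```cpp
#include <cstdint>
#include <raft/core/device_resources.hpp>
#include <raft/core/host_mdspan.hpp>
// #include <cuvs/neighbors/batch_ann.hpp>  // assumed header for this PR

void build_all_neighbors(
  const raft::resources& handle,
  raft::host_matrix_view<const float, int64_t, raft::row_major> dataset)
{
  cuvs::neighbors::batch_ann::index_params params;  // assumed params type
  params.n_clusters         = 4;  // >1 enables batching (see notes above)
  params.n_nearest_clusters = 2;

  int64_t k = 64;  // degree of the all-neighbors knn graph

  // Builds an (n_rows x k) knn graph over the whole dataset. Whether this
  // runs on one GPU or all visible GPUs is decided internally from
  // CUDA_VISIBLE_DEVICES (see the dispatch snippet earlier in this PR).
  auto idx = cuvs::neighbors::batch_ann::build(handle, dataset, k, params,
                                               /*return_distances=*/true);
}
```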
Numbers

All experiments were run on L40 GPUs.

[Large dataset to check performance and memory usage]

Below is a result of running on a 269GB dataset (wiki-all).
[Smaller dataset to verify recall]

It is impossible to generate a ground-truth all-neighbors graph for a large dataset, so below is the result on a 3.8GB dataset (Deep images) to check how `n_nearest_clusters` and `n_clusters` affect recall. E.g. comparing (`n_nearest_clusters` / `n_clusters`) = 2/20 and 4/40: these two configurations use a similar amount of device memory, but the latter has much better recall.