Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

CAGRA API update and allow async host refinement #131

Merged
Merged
Show file tree
Hide file tree
Changes from 6 commits
Commits
Show all changes
24 commits
Select commit Hold shift + click to select a range
4416b0b
enable asynchronous host-refinement for cagra index build
mfoerste4 May 17, 2024
946e106
move alloc out of OMP region
mfoerste4 May 17, 2024
7c6644e
added test for refinement -- disabled until detail::build API is exposed
mfoerste4 May 17, 2024
2c3f5e4
expose more cagra build parameters
mfoerste4 May 22, 2024
4d5ff0c
clarify code
mfoerste4 May 22, 2024
545032b
fix merge conflict
mfoerste4 May 22, 2024
d2c1dad
slightly reduce expected recall to catch refinement 1 results on all …
mfoerste4 May 24, 2024
9823831
cleanup API - use std::variant to switch algo of cagra graph build
mfoerste4 May 29, 2024
f310971
fix c-api nn_descent
mfoerste4 May 29, 2024
c1136c5
move documentation to API header
mfoerste4 May 29, 2024
bb87c97
resolve merge conflicts
mfoerste4 May 29, 2024
8506c6a
correct typo
mfoerste4 May 29, 2024
3a6e776
review suggestions
mfoerste4 May 29, 2024
42aa3eb
check valid nn_descent graph-degree / rename build_paramas to graph_b…
mfoerste4 May 29, 2024
68f490b
updating docs
mfoerste4 May 29, 2024
95f6064
Update CAGRA API (NN descent build by default)
tfeher May 30, 2024
3e22b13
Fix style
tfeher May 30, 2024
fd66402
remove cagra's build_host and build_device functions
tfeher May 30, 2024
8b1fbb8
remove debug printouts
tfeher May 30, 2024
e688453
Merge remote-tracking branch 'origin/branch-24.06' into cagra_ivfpq_i…
tfeher May 30, 2024
f5dff7d
fix style
tfeher May 30, 2024
71c779a
Merge branch 'branch-24.06' into cagra_ivfpq_index_async_refine
cjnolet May 30, 2024
2540dbf
Fix style
tfeher May 30, 2024
72d7b62
Fix CAGRA C API default algo params
tfeher May 30, 2024
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
90 changes: 61 additions & 29 deletions cpp/include/cuvs/neighbors/cagra.hpp
Original file line number Diff line number Diff line change
Expand Up @@ -19,6 +19,8 @@
#include "common.hpp"
#include <cuvs/distance/distance.hpp>
#include <cuvs/neighbors/common.hpp>
#include <cuvs/neighbors/ivf_pq.hpp>
#include <cuvs/neighbors/nn_descent.hpp>
#include <raft/core/device_mdspan.hpp>
#include <raft/core/host_device_accessor.hpp>
#include <raft/core/host_mdspan.hpp>
Expand Down Expand Up @@ -385,35 +387,65 @@ struct index : cuvs::neighbors::index {
* @defgroup cagra_cpp_index_build CAGRA index build functions
* @{
*/
auto build(raft::resources const& handle,
const cuvs::neighbors::cagra::index_params& params,
raft::device_matrix_view<const float, int64_t, raft::row_major> dataset)
-> cuvs::neighbors::cagra::index<float, uint32_t>;

auto build(raft::resources const& handle,
const cuvs::neighbors::cagra::index_params& params,
raft::host_matrix_view<const float, int64_t, raft::row_major> dataset)
-> cuvs::neighbors::cagra::index<float, uint32_t>;

auto build(raft::resources const& handle,
const cuvs::neighbors::cagra::index_params& params,
raft::device_matrix_view<const int8_t, int64_t, raft::row_major> dataset)
-> cuvs::neighbors::cagra::index<int8_t, uint32_t>;

auto build(raft::resources const& handle,
const cuvs::neighbors::cagra::index_params& params,
raft::host_matrix_view<const int8_t, int64_t, raft::row_major> dataset)
-> cuvs::neighbors::cagra::index<int8_t, uint32_t>;

auto build(raft::resources const& handle,
const cuvs::neighbors::cagra::index_params& params,
raft::device_matrix_view<const uint8_t, int64_t, raft::row_major> dataset)
-> cuvs::neighbors::cagra::index<uint8_t, uint32_t>;

auto build(raft::resources const& handle,
const cuvs::neighbors::cagra::index_params& params,
raft::host_matrix_view<const uint8_t, int64_t, raft::row_major> dataset)
-> cuvs::neighbors::cagra::index<uint8_t, uint32_t>;
auto build(
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I believe the documentation from cagra.cuh shall be moved here. @cjnolet to confirm.

Copy link
Member

@cjnolet cjnolet May 24, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Correct- we should document every function / prototype exposed in include/cuvs/**.hpp. It seems repetitive but the API docs are fully indexed and searchable so when a user finds one of these functions through search, it's guaranteed to have the docs carried along with it.

raft::resources const& handle,
const cuvs::neighbors::cagra::index_params& params,
raft::device_matrix_view<const float, int64_t, raft::row_major> dataset,
std::optional<cuvs::neighbors::nn_descent::index_params> nn_descent_params = std::nullopt,
std::optional<float> refine_rate = std::nullopt,
std::optional<cuvs::neighbors::ivf_pq::index_params> pq_build_params = std::nullopt,
std::optional<cuvs::neighbors::ivf_pq::search_params> search_params = std::nullopt,
bool construct_index_with_dataset = true) -> cuvs::neighbors::cagra::index<float, uint32_t>;

auto build(
raft::resources const& handle,
const cuvs::neighbors::cagra::index_params& params,
raft::host_matrix_view<const float, int64_t, raft::row_major> dataset,
std::optional<cuvs::neighbors::nn_descent::index_params> nn_descent_params = std::nullopt,
std::optional<float> refine_rate = std::nullopt,
std::optional<cuvs::neighbors::ivf_pq::index_params> pq_build_params = std::nullopt,
std::optional<cuvs::neighbors::ivf_pq::search_params> search_params = std::nullopt,
bool construct_index_with_dataset = true) -> cuvs::neighbors::cagra::index<float, uint32_t>;

auto build(
raft::resources const& handle,
const cuvs::neighbors::cagra::index_params& params,
raft::device_matrix_view<const int8_t, int64_t, raft::row_major> dataset,
std::optional<cuvs::neighbors::nn_descent::index_params> nn_descent_params = std::nullopt,
std::optional<float> refine_rate = std::nullopt,
std::optional<cuvs::neighbors::ivf_pq::index_params> pq_build_params = std::nullopt,
std::optional<cuvs::neighbors::ivf_pq::search_params> search_params = std::nullopt,
bool construct_index_with_dataset = true) -> cuvs::neighbors::cagra::index<int8_t, uint32_t>;

auto build(
raft::resources const& handle,
const cuvs::neighbors::cagra::index_params& params,
raft::host_matrix_view<const int8_t, int64_t, raft::row_major> dataset,
std::optional<cuvs::neighbors::nn_descent::index_params> nn_descent_params = std::nullopt,
std::optional<float> refine_rate = std::nullopt,
std::optional<cuvs::neighbors::ivf_pq::index_params> pq_build_params = std::nullopt,
std::optional<cuvs::neighbors::ivf_pq::search_params> search_params = std::nullopt,
bool construct_index_with_dataset = true) -> cuvs::neighbors::cagra::index<int8_t, uint32_t>;

auto build(
raft::resources const& handle,
const cuvs::neighbors::cagra::index_params& params,
raft::device_matrix_view<const uint8_t, int64_t, raft::row_major> dataset,
std::optional<cuvs::neighbors::nn_descent::index_params> nn_descent_params = std::nullopt,
std::optional<float> refine_rate = std::nullopt,
std::optional<cuvs::neighbors::ivf_pq::index_params> pq_build_params = std::nullopt,
std::optional<cuvs::neighbors::ivf_pq::search_params> search_params = std::nullopt,
bool construct_index_with_dataset = true) -> cuvs::neighbors::cagra::index<uint8_t, uint32_t>;

auto build(
raft::resources const& handle,
const cuvs::neighbors::cagra::index_params& params,
raft::host_matrix_view<const uint8_t, int64_t, raft::row_major> dataset,
std::optional<cuvs::neighbors::nn_descent::index_params> nn_descent_params = std::nullopt,
Copy link
Member

@cjnolet cjnolet May 24, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is awkward- a user is going to have to set std::nullopt for at least one of nn-descent_params or ivf_pq params. I think we should make these build_params for cagra. Actually, I think what we should do is take away the graph_build_algo option from build_params and add a graph_build_params option that accepts the cuvs::neighbors::build_params base class, documents it approprirately with what's allowed, and then dispatches accordingly. Also remember to raise an exception if anything other than ivf_pq or nn_descent is used. This allows us to add more graph build algos in the future without too much additional effort but it also makes the public APIs super simple for the user- by default we can just set the build_params to the default nn_descent::build_params.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

On second though- since you need refine_ratio on the ivf_pq_build_params, why not have a new type cagra::graph_build_params which can extend cuvs::neighbors::build_params? Then you can have a cagra::graph_build_params::ivf_pq which extends ivf_pq::build_params and cagra::graph_build_params and adds the refine ratio. Then you can have cagra__graph_build_params::nn_descent which does the same for nn-descent. This way you can have cagra::build_params.graph_build_params which accepts anything that extends cagra::graph_build_params and dispatches the build algo accordingly? This would get all this additional stuff out of cagra::build and keep the API clean and easy to follow, while minimizing duplication.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@cjnolet , i am not sure whether I understand correctly what you are proposing. An abstract graph_build_params struct that is inherited from different variants, e.g.

namespace ivf_pq { 
  struct graph_build_params : cuvs::neighbors::cagra::index_params {
    cuvs::neighbors::ivf_pq::index_params pq_build_params;
    cuvs::neighbors::ivf_pq::search_params pq_search_params;
    float refinement_rate = 2f;
};
} // namespace

for the ivf_pq variant? To my understanding this would require to either implement interfaces individually or use of polymorphism/pointers that can be casted/dispatched. Do I understand your proposal correctly?

As an alternative cleanup step we could also combine additional variant-specific-parameters into (ivf_pq::)graph_build_params and add them as a single argument to the cagra::index_params

struct index_params : cuvs::neighbors::index_params {
  size_t intermediate_graph_degree = 128;
  size_t graph_degree = 64;
  std::optional<vpq_params> compression = std::nullopt;
  bool construct_index_with_dataset = true;
  std::variant<ivf_pq::graph_build_params, nn_descent::graph_build_params> graph_build_params;
};

What do you think should we go for?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think your last idea looks great. It's clean, explicit, and tells the user right away what graph_build_params variants are supported.

std::optional<float> refine_rate = std::nullopt,
std::optional<cuvs::neighbors::ivf_pq::index_params> pq_build_params = std::nullopt,
std::optional<cuvs::neighbors::ivf_pq::search_params> search_params = std::nullopt,
bool construct_index_with_dataset = true) -> cuvs::neighbors::cagra::index<uint8_t, uint32_t>;
/**
* @}
*/
Expand Down
21 changes: 19 additions & 2 deletions cpp/src/neighbors/cagra.cuh
Original file line number Diff line number Diff line change
Expand Up @@ -279,6 +279,11 @@ void optimize(
* @param[in] res
* @param[in] params parameters for building the index
* @param[in] dataset a matrix view (host or device) to a row-major matrix [n_rows, dim]
* @param[in] nn_descent_params optional parameters for NN_DESCENT (default: nullopt)
* @param[in] refine_rate optional refinement rate for IVF_PQ (default: nullopt)
* @param[in] pq_build_params optional build parameters for IVF_PQ (default: nullopt)
* @param[in] search_params optional search parameters for IVF_PQ (default: nullopt)
* @param[in] construct_index_with_dataset optional boolean (default: true)
*
* @return the constructed cagra index
*/
Expand All @@ -289,9 +294,21 @@ template <typename T,
index<T, IdxT> build(
raft::resources const& res,
const index_params& params,
raft::mdspan<const T, raft::matrix_extent<int64_t>, raft::row_major, Accessor> dataset)
raft::mdspan<const T, raft::matrix_extent<int64_t>, raft::row_major, Accessor> dataset,
std::optional<cuvs::neighbors::nn_descent::index_params> nn_descent_params,
std::optional<float> refine_rate,
std::optional<cuvs::neighbors::ivf_pq::index_params> pq_build_params,
std::optional<cuvs::neighbors::ivf_pq::search_params> search_params,
bool construct_index_with_dataset)
{
return cuvs::neighbors::cagra::detail::build<T, IdxT, Accessor>(res, params, dataset);
return cuvs::neighbors::cagra::detail::build<T, IdxT, Accessor>(res,
params,
dataset,
nn_descent_params,
refine_rate,
pq_build_params,
search_params,
construct_index_with_dataset);
}

/**
Expand Down
32 changes: 28 additions & 4 deletions cpp/src/neighbors/cagra_build_float.cu
Original file line number Diff line number Diff line change
Expand Up @@ -22,18 +22,42 @@ namespace cuvs::neighbors::cagra {
#define RAFT_INST_CAGRA_BUILD(T, IdxT) \
auto build(raft::resources const& handle, \
const cuvs::neighbors::cagra::index_params& params, \
raft::device_matrix_view<const T, int64_t, raft::row_major> dataset) \
raft::device_matrix_view<const T, int64_t, raft::row_major> dataset, \
std::optional<cuvs::neighbors::nn_descent::index_params> nn_descent_params, \
std::optional<float> refine_rate, \
std::optional<cuvs::neighbors::ivf_pq::index_params> pq_build_params, \
std::optional<cuvs::neighbors::ivf_pq::search_params> search_params, \
bool construct_index_with_dataset) \
->cuvs::neighbors::cagra::index<T, IdxT> \
{ \
return cuvs::neighbors::cagra::build<T, IdxT>(handle, params, dataset); \
return cuvs::neighbors::cagra::build<T, IdxT>(handle, \
params, \
dataset, \
nn_descent_params, \
refine_rate, \
pq_build_params, \
search_params, \
construct_index_with_dataset); \
} \
\
auto build(raft::resources const& handle, \
const cuvs::neighbors::cagra::index_params& params, \
raft::host_matrix_view<const T, int64_t, raft::row_major> dataset) \
raft::host_matrix_view<const T, int64_t, raft::row_major> dataset, \
std::optional<cuvs::neighbors::nn_descent::index_params> nn_descent_params, \
std::optional<float> refine_rate, \
std::optional<cuvs::neighbors::ivf_pq::index_params> pq_build_params, \
std::optional<cuvs::neighbors::ivf_pq::search_params> search_params, \
bool construct_index_with_dataset) \
->cuvs::neighbors::cagra::index<T, IdxT> \
{ \
return cuvs::neighbors::cagra::build<T, IdxT>(handle, params, dataset); \
return cuvs::neighbors::cagra::build<T, IdxT>(handle, \
params, \
dataset, \
nn_descent_params, \
refine_rate, \
pq_build_params, \
search_params, \
construct_index_with_dataset); \
} \
\
void build_device(raft::resources const& handle, \
Expand Down
32 changes: 28 additions & 4 deletions cpp/src/neighbors/cagra_build_int8.cu
Original file line number Diff line number Diff line change
Expand Up @@ -22,18 +22,42 @@ namespace cuvs::neighbors::cagra {
#define RAFT_INST_CAGRA_BUILD(T, IdxT) \
auto build(raft::resources const& handle, \
const cuvs::neighbors::cagra::index_params& params, \
raft::device_matrix_view<const T, int64_t, raft::row_major> dataset) \
raft::device_matrix_view<const T, int64_t, raft::row_major> dataset, \
std::optional<cuvs::neighbors::nn_descent::index_params> nn_descent_params, \
std::optional<float> refine_rate, \
std::optional<cuvs::neighbors::ivf_pq::index_params> pq_build_params, \
std::optional<cuvs::neighbors::ivf_pq::search_params> search_params, \
bool construct_index_with_dataset) \
->cuvs::neighbors::cagra::index<T, IdxT> \
{ \
return cuvs::neighbors::cagra::build<T, IdxT>(handle, params, dataset); \
return cuvs::neighbors::cagra::build<T, IdxT>(handle, \
params, \
dataset, \
nn_descent_params, \
refine_rate, \
pq_build_params, \
search_params, \
construct_index_with_dataset); \
} \
\
auto build(raft::resources const& handle, \
const cuvs::neighbors::cagra::index_params& params, \
raft::host_matrix_view<const T, int64_t, raft::row_major> dataset) \
raft::host_matrix_view<const T, int64_t, raft::row_major> dataset, \
std::optional<cuvs::neighbors::nn_descent::index_params> nn_descent_params, \
std::optional<float> refine_rate, \
std::optional<cuvs::neighbors::ivf_pq::index_params> pq_build_params, \
std::optional<cuvs::neighbors::ivf_pq::search_params> search_params, \
bool construct_index_with_dataset) \
->cuvs::neighbors::cagra::index<T, IdxT> \
{ \
return cuvs::neighbors::cagra::build<T, IdxT>(handle, params, dataset); \
return cuvs::neighbors::cagra::build<T, IdxT>(handle, \
params, \
dataset, \
nn_descent_params, \
refine_rate, \
pq_build_params, \
search_params, \
construct_index_with_dataset); \
} \
\
void build_device(raft::resources const& handle, \
Expand Down
Loading
Loading