-
Notifications
You must be signed in to change notification settings - Fork 99
[FEA] Build Single Linkage Linkage API #820
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: branch-25.06
Are you sure you want to change the base?
Conversation
tarang-jain
commented
Apr 12, 2025
•
edited
Loading
edited
- Expose functions for building the dendrogram on the mutual reachability graph
…nto build-linkage
/ok to test |
@tarang-jain, there was an error processing your request: See the following link for more information: https://docs.gha-runners.nvidia.com/cpr/e/1/ |
/ok to test 2738db7 |
Notes: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thank you for this PR Tarang! LGTM, just a few small suggestions : )
Co-authored-by: Jinsol Park <jinsolp@nvidia.com>
Co-authored-by: Jinsol Park <jinsolp@nvidia.com>
/ok to test |
@tarang-jain, there was an error processing your request: See the following link for more information: https://docs.gha-runners.nvidia.com/cpr/e/1/ |
/ok to test a35ad32 |
/ok to test e0c2171 |
@tarang-jain, there was an error processing your request: See the following link for more information: https://docs.gha-runners.nvidia.com/cpr/e/1/ |
/ok to test e0c2171 |
cuvs::distance::DistanceType metric, | ||
raft::device_vector_view<float, int> core_dists, | ||
raft::device_vector_view<int, int> indptr, | ||
raft::sparse::COO<float, int>& mutual_reachability_coo, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
For the public API, can we please use raft::sparse::device_coo_matrix_view
for this? I'd prefer not to use deprecated RAFT APIs in cuVS public APIs
raft::device_vector_view<int, int> indptr, | ||
raft::sparse::COO<float, int>& mutual_reachability_coo, | ||
raft::device_vector_view<int, int> out_mst_src, | ||
raft::device_vector_view<int, int> out_mst_dst, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Please use a raft::sparse::device_coo_matrix_view
for theout_mst_dst
, out_mst_src
, and out_mst_weights
.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The mst is sorted by weights (so the src and dst on the edges wont be in any ascending order). On the other hand, the rows and columns will be sorted if we have a COO, right?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ok upon reading up a bit -- it looks like the COO representation does not have row and column index sorting as a hard requirement. I can definitely use a COO in this case.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
A coo isn’t required to be sorted. It’s what we call an edge list. The new sparse matrix types are just convenience containers so we don’t have to pass around and manage 3 separate arrays everywhere.
* @param[out] out_delta distances of output | ||
* @param[out] out_size cluster sizes of output | ||
*/ | ||
void build_mutual_reachability_linkage( |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Please put this in the cuvs::cluster::agglomerative::helpers
namespace to designate it as a public API that is more of a helper function.
@@ -21,6 +21,7 @@ | |||
|
|||
#include <raft/core/device_mdspan.hpp> | |||
#include <raft/core/resources.hpp> | |||
#include <raft/sparse/coo.hpp> |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The more I'm thinking about this- why is this even in the agglomerative
APIs? If this is just the mutual reachaiblity piece, why isn't it in cuvs/neighbors/reachability
here? And why can't it be consolidated with the existing mutual_reachaiblity API?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Actually, I think we might want to separate the connectivities from the linkage here- we have a 'mutual_reachability' API already. Can't we separate the mutual reachability (aka all-neighbors graph construction) from the linkage? We can still expose the linkage functions in cuVS, but putting them all together in cuvs seems too specific to hdbscan.
@@ -39,4 +39,34 @@ void single_linkage(raft::resources const& handle, | |||
handle, X, dendrogram, labels, metric, n_clusters, c); | |||
} | |||
} | |||
|
|||
void build_mutual_reachability_linkage( |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Oh, I see- this has nothing to do w/ mutual reachability, this is just buildig a linkage. The mutual reachability part would have already been done upstream from here. I think you only need the linkage part here. This is essentialy just building the dendrogram. I would name it appropriately.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yeah we are indeed only doing the linkage here, but the connectivities functor used to connect components (to build the mst) is exclusively for Mutual Reachability. I wanted to capture the term "mutual_reachability" in the name to avoid confusion.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We don't name things based on where they are used, we name based on what they are doing. If this isn't doing a mutual reachability then we don't include that in the name. Please name this accordingly or break apart as needed.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
These are primitives that are used to construct a linkage (aka knn graph + mst) and dendrogram. The reason we have primitives is so they can be composed in different ways for different algorithms. The more we can break them apart the more reusable they become. I asked you to bring these piece into cuVS because knn graph, mst, and thus single linkage cluster (aka nearest neighbors clustering) are all neighborhood methods.
/ok to test 074153a |
/ok to test ee1d5b2 |