-
Notifications
You must be signed in to change notification settings - Fork 335
Optimize check for vertex existence #4966
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Optimize check for vertex existence #4966
Conversation
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is this PR ready for review?
I see just debug print statements and commented out code. If the PR is not ready for review, you may better create a draft PR (this also saves CI resources).
cpp/src/c_api/graph_functions.cpp
Outdated
@@ -132,6 +132,7 @@ struct two_hop_neighbors_functor : public cugraph::c_api::abstract_functor { | |||
bool multi_gpu> | |||
void operator()() | |||
{ | |||
printf("\nin two_hop_neighbors \n"); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Please delete debug print statements.
…4_optimize-vertex-existance-check
cpp/src/c_api/graph_functions.cpp
Outdated
rmm::device_uvector<vertex_t> vertex_array(1, handle_.get_stream()); | ||
|
||
cugraph::detail::sequence_fill( | ||
handle_.get_stream(), vertex_array.data(), vertex_array.size(), vertex_t(vertex_)); | ||
|
||
if constexpr (multi_gpu) { | ||
vertex_array = cugraph::shuffle_ext_vertices(handle_, std::move(vertex_array)); | ||
} | ||
|
||
cugraph::renumber_ext_vertices<vertex_t, multi_gpu>( | ||
handle_, | ||
vertex_array.data(), | ||
vertex_array.size(), | ||
number_map->data(), | ||
graph_view.local_vertex_partition_range_first(), | ||
graph_view.local_vertex_partition_range_last(), | ||
do_expensive_check_); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I assume the same vertex_t value is passed to every GPU in multi-GPU, right? Or are you assuming that each GPU can call this function with different vertex_
values?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
If the former is the case, you don't need to really shuffle vertices, just need to call compute_gpu_id_from_ext_vertex_t
to find the owning GPU. The owning GPU performs the check and broadcast the check result.
cpp/src/c_api/graph_sg.cpp
Outdated
raft::copy<vertex_t>( | ||
vertices.data(), edgelist_srcs.data(), edgelist_srcs.size(), handle_.get_stream()); | ||
|
||
cugraph::detail::sort_ints(handle_.get_stream(), | ||
raft::device_span<vertex_t>{vertices.data(), vertices.size()}); | ||
|
||
size_t unique_vertices_size = cugraph::detail::unique_ints( | ||
handle_.get_stream(), raft::device_span<vertex_t>{vertices.data(), vertices.size()}); | ||
|
||
vertices.resize(unique_vertices_size + edgelist_dsts.size(), handle_.get_stream()); | ||
|
||
raft::copy<vertex_t>(vertices.data() + unique_vertices_size, | ||
edgelist_dsts.data(), | ||
edgelist_dsts.size(), | ||
handle_.get_stream()); | ||
|
||
cugraph::detail::sort_ints(handle_.get_stream(), | ||
raft::device_span<vertex_t>{vertices.data(), vertices.size()}); | ||
|
||
unique_vertices_size = cugraph::detail::unique_ints( | ||
handle_.get_stream(), raft::device_span<vertex_t>{vertices.data(), vertices.size()}); | ||
|
||
vertices.resize(unique_vertices_size, handle_.get_stream()); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
To reduce memory footprint, you can sort & unique sources & destinations separately and merge instead of sort & unique sources, then copy all the destinations, and sort & unique.
You may further cut the memory footprint by first performing hash based group by (https://github.com/rapidsai/cugraph/blob/branch-25.04/cpp/include/cugraph/utilities/shuffle_comm.cuh#L761 with mem_frugal_threshold set), but this might be little bit of an overkill and you may defer this till this actually becomes a bottlenck (especially considering that renumber = false is here mainly for historical/debugging reasons).
If you just want to put working simple implementation here (as this won't be a case actual users will heavily use), just copy all sources & destinations to a single array and sort & unique.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
And don't forget to call shrink_to_fit() to actually release memory after resize.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
To reduce memory footprint, you can sort & unique sources & destinations separately and
Right. I now sort & unique sources & destinations separately. I also call shrink_to_fit
.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
You may further cut the memory footprint by first performing hash based group by ...
Right, we can add this optimization if this becomes a bottleneck which might not be for now. And as you mention, the case this code is targeting is for historical/debugging reasons
*/ | ||
cugraph_error_code_t cugraph_has_vertex(const cugraph_resource_handle_t* handle, | ||
cugraph_graph_t* graph, | ||
const int vertex, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can't use an int here. A vertex can be int32_t or int64_t.
For SSSP, we pass in a size_t
and cast it to either int32_t or int64_t inside the C API implementation.
cpp/src/c_api/graph_functions.cpp
Outdated
struct has_vertex_functor : public cugraph::c_api::abstract_functor { | ||
raft::handle_t const& handle_{}; | ||
cugraph::c_api::cugraph_graph_t* graph_{nullptr}; | ||
const int vertex_{}; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can't be an int.
cpp/src/c_api/graph_functions.cpp
Outdated
|
||
has_vertex_functor(::cugraph_resource_handle_t const* handle, | ||
::cugraph_graph_t* graph, | ||
const int vertex, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can't be an int.
cdef cugraph_error_code_t cugraph_has_vertex( | ||
const cugraph_resource_handle_t* handle, | ||
const cugraph_graph_t* graph, | ||
const int vertex, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can't be an int
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Looks good, thanks!
I just have a few minor requests.
@@ -145,6 +147,10 @@ def bfs(ResourceHandle handle, _GPUGraph graph, | |||
|
|||
assert_CAI_type(sources, "sources") | |||
|
|||
# Check if sources are valid | |||
for v in sources.values_host: | |||
if not pylibcugraph.has_vertex(handle, graph, v, do_expensive_check): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
If you just import has_vertex
from pylibcugraph
, you can change this:
if not pylibcugraph.has_vertex(handle, graph, v, do_expensive_check): | |
if not has_vertex(handle, graph, v, do_expensive_check): |
@@ -20,6 +20,7 @@ from libc.stdint cimport uintptr_t | |||
from libc.stdint cimport int32_t | |||
from libc.limits cimport INT_MAX | |||
|
|||
import pylibcugraph |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
You can remove this if you import has_vertex
directly from pylibcugraph
, which you're doing below.
|
||
|
||
return True if result else False |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I would only have one blank line here.
Also, will this work instead?
return True if result else False | |
return bool(result) |
from pylibcugraph._cugraph_c.array cimport ( | ||
cugraph_type_erased_device_array_view_t, | ||
cugraph_type_erased_device_array_view_free, | ||
) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don't see these being used here, can they be removed?
copy_to_cupy_array, | ||
create_cugraph_type_erased_device_array_view_from_py_obj |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don't see these being used here, can they be removed?
/** | ||
* @ingroup utility_wrappers_cpp | ||
* @brief Update a value in a device span to 0 if it matches the target_value or 1 | ||
* | ||
* @tparam value_t type of the value to operate on. Must be either int32_t or int64_t. | ||
* | ||
* @param[out] values device span to update | ||
* @param[in] target_value value to be querried | ||
* @param[in] stream_view stream view | ||
* | ||
*/ | ||
template <typename value_t> | ||
void transform_binary(raft::device_span<value_t> values, | ||
value_t target_value, | ||
raft::handle_t const& handle); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This doesn't look generic enough to be included as a utility function.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think we at least need a better name for this if you need this to call thrust function from .cpp file.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Better store/return results in bool
array than using value_t
.
Something like
rmm::device_uvector<bool>
elementwise_not_equal(
raft::device_span<value_t> values,
value_t compare,
rmm::cuda_stream_view const& stream_view);
or transform_not_equal or transform_compare_not_equal? I won't sure about the best name but transform_binary
doesn't match very well with what this function does.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Just FYI: I asked this to chatgpt and got this.
A good name for this function should align with STL naming conventions, which are often verb-based and descriptive of their operation. Some well-known STL functions that perform element-wise transformations or checks include std::transform, std::count_if, and std::any_of.
Since your function checks whether each element differs from a given value and returns a boolean result for each, a name that follows STL style could be:
Suggested names:
mismatch_mask – Inspired by std::mismatch, emphasizing that it creates a mask of mismatches.
not_equal_mask – Mirrors std::not_equal_to, clearly indicating the operation.
transform_not_equal – Follows the std::transform pattern, emphasizing element-wise transformation.
compare_not_equal – Explicit about the comparison operation.
Recommended choice:
not_equal_mask
This name aligns well with STL conventions, is concise, and clearly communicates that the function generates a mask indicating inequality.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Interesting. I might go for chatgpt
next time I am trying to find a name for my thrust-like
function. I chose transform_not_equal
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yeah... you may try cursor chatting as well. Haven't compared but in theory, it can better consider the other code in cugraph codebase.
cpp/src/c_api/graph_functions.cpp
Outdated
/* | ||
if constexpr (multi_gpu) { | ||
vertices = cugraph::shuffle_ext_vertices(handle_, std::move(vertices)); | ||
} | ||
*/ |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Delete dead code.
cpp/src/c_api/graph_functions.cpp
Outdated
cugraph::detail::transform_binary( | ||
raft::device_span<vertex_t>{vertices.data(), vertices.size()}, | ||
cugraph::invalid_vertex_id<vertex_t>::value, | ||
handle_.get_stream()); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think the comparison result should better be stored in rmm::device_uvector<bool>
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I created a rmm::device_uvector<bool>
to store the result of the comparison.
cpp/src/c_api/graph_sg.cpp
Outdated
|
||
vertices.resize(unique_edgelist_srcs_size + unique_edgelist_dsts_size, | ||
handle_.get_stream()); | ||
vertices.shrink_to_fit(handle_.get_stream()); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This shrink_to_fit is unnecessary. You are not shrinking a vector here. You just increased the size of the vector.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Right this is addressed
cpp/src/c_api/graph_sg.cpp
Outdated
return; | ||
} | ||
|
||
*number_map = std::move(vertices); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I guess this unnecessary. number_map already holds consecutive integers starting from 0.
void transform_not_equal(raft::device_span<value_t> values, | ||
raft::device_span<bool> result, | ||
value_t compare, | ||
raft::handle_t const& handle); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Our convention is to pass handle as the first input argument (and stream as the last).
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM (except for unnecessary empty lines).
cpp/src/c_api/graph_sg.cpp
Outdated
@@ -292,11 +294,34 @@ struct create_graph_functor : public cugraph::c_api::abstract_functor { | |||
if (renumber_) { | |||
*number_map = std::move(new_number_map.value()); | |||
} else { | |||
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This empty line is unnecessary.
cpp/src/c_api/graph_sg.cpp
Outdated
|
||
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
These empty lines are unnecessary.
cpp/src/c_api/graph_sg.cpp
Outdated
|
||
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
No need for 3 empty lines. 1 is sufficient.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Recent changes look good but I have a question.
cdef cugraph_type_erased_device_array_view_t* \ | ||
result_view_ptr = \ | ||
cugraph_type_erased_device_array_view( | ||
result_ptr) | ||
|
||
cupy_has_vertex = copy_to_cupy_array(c_resource_handle_ptr, result_view_ptr) | ||
|
||
return True if result else False | ||
return cupy_has_vertex |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Why was this copy operation added, and are we no longer returning a bool from this function as implied in the docstring? Also, do the benchmarks in the PR description include this change?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Talked to @jnke2016 offline: reason for returning the array is to facilitate passing/querying multiple vertices and getting individual bools back for each. This also helps facilitate MG use cases. The benchmarks were not impacted significantly with this change.
@jnke2016 is going to update the docstring and example now that the return value is different.
/merge |
This PR leverages the CAPI to optimize the check for vertex existence which can lead to +100x speedup compared to the cudf based version as it can be noticed from the performance figure below
closes #4956