Refactor Force Atlas 2 #4969

jnke2016 · 2025-03-12T12:20:59Z

This PR exposes FA2 to the PLC API

ChuckHastings · 2025-03-14T17:02:18Z

cpp/src/c_api/legacy_fa2.cpp

+namespace cugraph {
+namespace c_api {
+
+struct cugraph_clustering_result_t {


This seems to be left over from a copy/paste. It's not relevant here.

The new cugraph_layout_result_t object would be defined here instead.

I defined cugraph_layout_result_t instead

ChuckHastings · 2025-03-14T17:04:04Z

cpp/src/layout/legacy/force_atlas2.cu

@@ -78,6 +78,7 @@ void force_atlas2(raft::handle_t const& handle,
                                                            verbose,
                                                            callback);
  }
+


Seems like an arbitrary change. I'd suggest deleting this line to eliminate the file change.

ChuckHastings · 2025-03-14T17:06:31Z

cpp/src/c_api/legacy_fa2.cpp

+
+      cugraph::internals::GraphBasedDimRedCallback* callback = nullptr;
+
+      //cugraph::legacy::GraphBasedDimRedCallback() // FIXME Add support for callback


Does anything in python use this callback feature?

I don't see any other implementations offering that feature. I'm guessing it was for debugging purposes? It would be nice to be able to just drop this feature... but we currently support it.

Thoughts @rlratzel?

This is exposed in the current Python API too, meaning users could theoretically be using it now. However, I really doubt anyone is, and I'm in favor of making this a breaking change as a tradeoff for not having to support, test, and document it in the new PLC-based API.

I like the code and it could be useful for something, but it may mean that we can't release the GIL (which we've been wanting to do in more of our functions), and I suspect we could get most of the intended functionality with an option that enables debug prints if we really wanted to.

Maybe @hlinsen has more info about the intended use case though?

This was an asked by @trxcllnt and @exactlyallan for dynamic viz. There is maybe some cyber use cases or demo that could need it. I think it is also exposed in umap for cuml.

Thanks for the feedback @hlinsen . Do you have suggestions about ways to test FA2 apart from the smoke tests cuGraph and networkX have?

Some ideas:

Checking that we converged by checking some of the values like speed, swinging values here

Trying to make barnes hut deterministic and compare positions

Check consistency in nearest neighbor nodes, look at a top k. Some times the layout is the same but shifted and I think nearest nodes should still be the same ones.

cpp/include/cugraph_c/layout_algorithms.h

ChuckHastings · 2025-03-14T21:33:34Z

cpp/include/cugraph_c/layout_algorithms.h

+ * @param [in]   handle          Handle for accessing resources
+ * @param [in]   graph           Pointer to graph.  NOTE: Graph might be modified if the storage
+ *                               needs to be transposed
+ * @param [out]  pos             Position of the vertices


Note that in the C++ code, pos is a 2-d array and assumes that the output is ordered by internal vertex ids.

Generally, the C API hides the renumbering of vertices from the user (we automatically renumber/unrenumber in the C API before calling the C++ API. As such, this really needs include the vertex ids.

I'd suggest creating a cugraph_layout_result_t type and including vertex id, x position, y position so that you can easily create a data frame for the resulting python.

ChuckHastings · 2025-03-14T21:35:39Z

cpp/include/cugraph_c/layout_algorithms.h

+ *                               needs to be transposed
+ * @param [out]  pos             Position of the vertices
+ * @param [in]   max_iter        Maximum number of iterations. Initial vertex positioning
+ * @param [in]   x_start         Initial vertex positioning (x-axis)


Same issue for x_start and y_start. The values in the legacy C++ implementation are assumed to be in the order of the internal vertex ids... but the python code doesn't know this order.

I'd suggest using the cugraph_layout_result_t object as input here. You can create one, passing in the vertex ids, x_start and y_start values and use it as input.

With your latest refactoring you did not address this concern. The legacy C++ implementation assumes that x_start and y_start are sorted such that element 0 refers to the starting position of vertex 0 using the internal vertex id. If the graph was created with renumbering, there is no way for the caller to know what order to put x_start and y_start.

Simplest solution would be to add a vertices parameter (optional like x_start and y_start) that identified the vertex id for each position.

Then in the implementation you can make a copy of this data into rmm device vectors, renumber the vertices and then sort the vertices/x_start/y_start by vertex id. I think page rank can be used as a model for handling the intial guess.

Also, documentation for x_start and y_start (and the new vertices parameter) should indicate that they are optional. And there should be a CAPI_EXPECTS call that verifies that if you specify x_start and y_start you must also specify vertices.

x_start and y_start are optional. The documentation should reflect the following:

You can pass NULL for x_start and y_start and no initial starting position will be implied

If you pass a value for one you must pass a value for the other

Should also indicate that x_start and y_start the vertex order. You assume in the implementation that the values of x_start and y_start are sorted by vertex id, the documentation should make this clear.

ChuckHastings · 2025-03-14T21:37:19Z

cpp/include/cugraph_c/layout_algorithms.h

+ * @param [in]   verbose         Output convergence info at each interation.
+ * @param [in]   do_expensive_check
+ *                               A flag to run expensive checks for input arguments (if set to true)
+ * @param [out]  result          Opaque object containing the clustering result


This result would be what pos above is. I'd delete the pos parameter above and move it to this position. By convention, the C API functions pass all input parameters before the output parameters.

ChuckHastings · 2025-03-14T21:41:05Z

cpp/src/c_api/legacy_fa2.cpp

+    } else if constexpr (!std::is_same_v<edge_t, int32_t>) {
+      unsupported();
+    } else {
+      // FIXME: Does FA2 expects store_transposed == false


FA2 is expecting a legacy::GraphCOOView, which you're creating below. there's no reason to transpose the graph, you should be able to use it exactly as is.

I'd take out this logic. You can then remove the annotation in the C API documentation that the graph object might be modified... because in the case of this algorithm it would not be.

…4_refactor-fa2

cpp/include/cugraph_c/layout_algorithms.h

ChuckHastings · 2025-03-19T16:18:46Z

cpp/include/cugraph_c/layout_algorithms.h

+ *                               needs to be transposed
+ * @param [out]  pos             Position of the vertices
+ * @param [in]   max_iter        Maximum number of iterations. Initial vertex positioning
+ * @param [in]   x_start         Initial vertex positioning (x-axis)


With your latest refactoring you did not address this concern. The legacy C++ implementation assumes that x_start and y_start are sorted such that element 0 refers to the starting position of vertex 0 using the internal vertex id. If the graph was created with renumbering, there is no way for the caller to know what order to put x_start and y_start.

Simplest solution would be to add a vertices parameter (optional like x_start and y_start) that identified the vertex id for each position.

Then in the implementation you can make a copy of this data into rmm device vectors, renumber the vertices and then sort the vertices/x_start/y_start by vertex id. I think page rank can be used as a model for handling the intial guess.

Also, documentation for x_start and y_start (and the new vertices parameter) should indicate that they are optional. And there should be a CAPI_EXPECTS call that verifies that if you specify x_start and y_start you must also specify vertices.

ChuckHastings · 2025-03-19T16:19:36Z

cpp/src/c_api/legacy_fa2.cpp

+        legacy_coo_graph_view,
+        pos.data(),
+        max_iter_,
+        x_start_ != nullptr ? x_start_->as_type<float>() : nullptr,


As mentioned above, this need to be sorted properly, so you'll need to make a copy and sort by vertex id after renumbering.

After our discussion, I sort both the x_start and y_start by key where the key refers to the position of the number_map

ChuckHastings · 2025-03-19T16:21:37Z

cpp/src/c_api/legacy_fa2.cpp

+
+      cugraph::internals::GraphBasedDimRedCallback* callback = nullptr;
+
+      rmm::device_uvector<vertex_t> vertices(graph_view.local_vertex_partition_range_size(),


I would move this down to where you do x_axis and y_axis... I don't think you need this until constructing the result.

cpp/src/c_api/legacy_fa2.cpp

rlratzel

LGTM once the other review comments are addressed. I just had a few minor issues/questions.

Also, we'll need to make sure this is addressed before updating the cugraph Python API. Since this PR is adding to PLC, I don't believe the callback use cases using cugraph Python are affected (but we need to confirm).

rlratzel · 2025-03-20T07:06:36Z

python/pylibcugraph/pylibcugraph/_cugraph_c/layout_algorithms.pxd

+    cugraph_error_t,
+)
+from pylibcugraph._cugraph_c.array cimport (
+    cugraph_type_erased_device_array_t,


I don't think this is used in this file. Can it be removed?

rlratzel · 2025-03-20T07:06:50Z

python/pylibcugraph/pylibcugraph/_cugraph_c/layout_algorithms.pxd

+
+cdef extern from "cugraph_c/layout_algorithms.h":
+    ###########################################################################
+    # triangle_count


copy/paste error?

rlratzel · 2025-03-20T07:26:21Z

cpp/src/c_api/legacy_fa2.cpp

I'm just curious about the legacy_ prefix for the file name. Is legacy referring to the C++ implementation not using primitives? If so, once it uses primitives, would this file need to be replaced? I figured this would just be called fa2.cpp and only the C++ impl would change once primitives are used.

Is legacy referring to the C++ implementation not using primitives

I believe that is correct @ChuckHastings

If so, once it uses primitives, would this file need to be replaced? I figured this would just be called fa2.cpp and only the C++ impl would change once primitives are used.

Right. No changes to the C (except the file name), and PLC API

Yes, we targeted getting the API definition right for both legacy implementation and some future primitive based implementation. The implementation is currently solely a legacy implementation. The implementation of the C API will look totally different on a primitive based C++ implementation.

The implementation of the C API will look totally different

However the function definition in the C API will most likely be unchanged right? and regarding C the implementation, we will remove the legacy COO logic and call the primitive based C++ implementation right

Correct. Function definition would be the same. The functor definition might be the same... the implementation inside the operator() method would likely be totally different (simpler) as we wouldn't have to worry about legacy graph objects.

Function definition would be the same.

ok, that's what I figured too. In that case, my preference would be to just call the file fa2.cpp since the _legacy prefix isn't needed and will be misleading later, but I don't feel strongly about it, and it can always be renamed later if needed.

I think it's helpful to have the legacy in the name when we're looking over the codebase later. If the transition occurs as we anticipate we'll just do a rename and preserve the history when we transition.

…4_refactor-fa2

ChuckHastings

A few documentation cleanup comments.

ChuckHastings · 2025-03-21T14:59:06Z

cpp/include/cugraph_c/layout_algorithms.h

+ *                               needs to be transposed
+ * @param [out]  pos             Position of the vertices
+ * @param [in]   max_iter        Maximum number of iterations. Initial vertex positioning
+ * @param [in]   x_start         Initial vertex positioning (x-axis)


x_start and y_start are optional. The documentation should reflect the following:

You can pass NULL for x_start and y_start and no initial starting position will be implied

If you pass a value for one you must pass a value for the other

Should also indicate that x_start and y_start the vertex order. You assume in the implementation that the values of x_start and y_start are sorted by vertex id, the documentation should make this clear.

ChuckHastings · 2025-03-21T14:59:59Z

cpp/src/c_api/legacy_fa2.cpp

+  cugraph::c_api::cugraph_layout_result_t** result,
+  cugraph_error_t** error)
+{
+  force_atlas2_functor functor(handle,


We should add CAPI_EXPECTS calls to validate that x_start and y_start are either both NULL or are both specified.

I updated the docs and in case x_start and y_start are not provided, it is now mentioned in the documentation that those will be selected randomly internally.

rlratzel · 2025-03-24T20:47:46Z

/merge

add C API for the legacy fa2

5227c07

jnke2016 requested a review from a team as a code owner March 12, 2025 12:21

jnke2016 self-assigned this Mar 12, 2025

github-actions bot added the cuGraph label Mar 12, 2025

jnke2016 added this to the 25.04 milestone Mar 12, 2025

rlratzel added improvement Improvement / enhancement to an existing function non-breaking Non-breaking change labels Mar 12, 2025

jnke2016 added 2 commits March 13, 2025 01:29

fix typo

8cde98a

add fa2 to the CAPI

b4261e7

jnke2016 requested a review from a team as a code owner March 13, 2025 13:49

github-actions bot added the CMake label Mar 13, 2025

jnke2016 added 2 commits March 13, 2025 10:03

add PLC implementation of force atlas 2

00744d0

update type

06a582e

jnke2016 requested a review from a team as a code owner March 13, 2025 17:04

github-actions bot added the python label Mar 13, 2025

jnke2016 added 9 commits March 14, 2025 05:30

remove debug functions

a128ba9

remove debug print and outdated comments

ce404ee

remove outdated fixme

8f120ca

add doctrings example and resolve fixme

e180a44

remove unused methods

c4a32e0

fix typo

c7f42ab

resolve fixme

60548e9

remove debug print

38f1c5b

add EOL

45c58a6

ChuckHastings reviewed Mar 14, 2025

View reviewed changes

jnke2016 added 5 commits March 18, 2025 08:28

remove extra line

8076a49

create layout result type

e8e9a44

remove outdated code

2ade19d

return layout result

d9a7462

remove const to variable passed by reference

8b52847

jnke2016 added 3 commits March 18, 2025 16:30

add argument for rng_state

9c0bec5

fix style

e4f04ac

Merge remote-tracking branch 'upstream/branch-25.04' into branch-25.0…

e6a731d

…4_refactor-fa2

ChuckHastings reviewed Mar 19, 2025

View reviewed changes

rlratzel reviewed Mar 20, 2025

View reviewed changes

jnke2016 added 8 commits March 20, 2025 06:39

renumber x and y start based on internal vertex IDs

8e716f9

fix style

4b60240

move variable declaration

ddc8003

Merge remote-tracking branch 'upstream/branch-25.04' into branch-25.0…

5b0cc5c

…4_refactor-fa2

fix typo

3440257

add method to sort tuple by keys

4f5ccfb

fix style

0ab7dc9

Merge remote-tracking branch 'upstream/branch-25.04' into branch-25.0…

bbd230a

…4_refactor-fa2

ChuckHastings reviewed Mar 21, 2025

View reviewed changes

jnke2016 added 2 commits March 21, 2025 16:20

update docs and add checks

37f9d5d

fix style

d7ab4bf

ChuckHastings approved these changes Mar 24, 2025

View reviewed changes

rlratzel approved these changes Mar 24, 2025

View reviewed changes

rapids-bot bot merged commit d7564af into rapidsai:branch-25.04 Mar 24, 2025
81 checks passed

rlratzel mentioned this pull request Aug 12, 2025

Update force_atlas2() to remove legacy Cython code #5224

Open


		cugraph::internals::GraphBasedDimRedCallback* callback = nullptr;

		//cugraph::legacy::GraphBasedDimRedCallback() // FIXME Add support for callback


		cugraph::internals::GraphBasedDimRedCallback* callback = nullptr;

		rmm::device_uvector<vertex_t> vertices(graph_view.local_vertex_partition_range_size(),

Refactor Force Atlas 2 #4969

Refactor Force Atlas 2 #4969

Uh oh!

Conversation

jnke2016 commented Mar 12, 2025 • edited by rlratzel Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Uh oh!

rlratzel left a comment

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

ChuckHastings left a comment

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

rlratzel commented Mar 24, 2025

Uh oh!

Uh oh!

Uh oh!

jnke2016 commented Mar 12, 2025 •

edited by rlratzel

Loading