
Conversation

@ChuckHastings (Collaborator) commented Mar 14, 2025

Betweenness centrality normalization is not quite right if you specify excluding endpoints and use approximate betweenness.

This PR temporarily disables some of the Python tests that compare results with networkx, since the networkx PR that updates the normalization scores is not yet merged. Once networkx/networkx#7908 is merged, we should be able to create another PR to re-enable those tests. Each of the disabled tests is skipped with a link to that PR as the reason.

Closes #4941

copy-pr-bot bot commented Mar 14, 2025

Auto-sync is disabled for draft pull requests in this repository. Workflows must be run manually.


@ChuckHastings ChuckHastings self-assigned this Mar 14, 2025
@ChuckHastings ChuckHastings added bug Something isn't working non-breaking Non-breaking change labels Mar 14, 2025
@ChuckHastings ChuckHastings marked this pull request as ready for review March 14, 2025 22:35
@ChuckHastings ChuckHastings requested a review from a team as a code owner March 14, 2025 22:35
@ChuckHastings ChuckHastings requested a review from a team as a code owner March 18, 2025 20:00
…ality.py

Co-authored-by: Erik Welch <erik.n.welch@gmail.com>
@seunghwak (Contributor) left a comment

Looks good to me, but I have some questions about the logic to set scale_factor.

}
} else if (graph_view.number_of_vertices() > 2) {
scale_factor = static_cast<weight_t>(
std::min(static_cast<vertex_t>(num_sources), graph_view.number_of_vertices() - 1) *
Contributor


No need to subtract 1 from num_sources? (i.e. static_cast<vertex_t>(num_sources - 1)?)

I assume num_sources == graph_view.number_of_vertices() for full BC. It looks a bit weird to subtract 1 just from graph_view.number_of_vertices().

Collaborator Author


We had some complex gyrations around the formulas.

There are a couple of things being accounted for in the scaling factor. In the normalization path, we're trying to divide by the maximum number of times a vertex could appear in the shortest paths. For the full graph, since we're not including endpoints, this is (n-1) * (n-2), where n is the number of vertices in the graph. This maximum would occur for a vertex v that has an input edge from every vertex in the graph. The (n-1) factor counts every vertex other than v (when we start at v we won't travel back to v, and we're not counting the endpoint), and the (n-2) factor is the maximum number of paths that could travel through v.

For approximate betweenness, we're only traveling through num_sources samples. So the maximum value would be num_sources * (n-2). This would occur in any variant of the graph described above where the randomly selected sources do not include the vertex v.

I agree it looks odd.
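The reasoning above can be sketched as follows (an illustrative sketch only; `scale_factor_no_endpoints` is a hypothetical helper, not the cuGraph API):

```cpp
#include <algorithm>
#include <cstdint>

// Maximum number of shortest paths that can pass through a vertex v when
// endpoints are excluded.  With full BC every vertex is a source, giving
// (n - 1) * (n - 2).  With approximate BC only num_sources sampled sources
// are traversed, so the first factor is capped at min(num_sources, n - 1);
// the (n - 2) factor (interior positions on a path) is unchanged.
double scale_factor_no_endpoints(std::int64_t n, std::int64_t num_sources)
{
  return static_cast<double>(std::min(num_sources, n - 1) * (n - 2));
}
```

For the full graph (num_sources == n) this reduces to (n - 1) * (n - 2), which is why only the number_of_vertices() term gets the minus one.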

scale_factor = (graph_view.is_symmetric() ? weight_t{2} : weight_t{1}) *
static_cast<weight_t>(num_sources) /
(include_endpoints ? static_cast<weight_t>(graph_view.number_of_vertices())
: static_cast<weight_t>(graph_view.number_of_vertices() - 1));
Contributor


We don't check vertices.size() (or the sum of vertices.size() in multi-GPU) > 0. So it is technically possible to pass empty seed vertices, and in that case num_sources == 0 && graph_view.number_of_vertices() == 1 is possible; then scale_factor can become 0, leading to a divide by 0.

Collaborator Author


Good catch. That check exists in the if statements above but not in this one. I will add it.
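A hedged sketch of the kind of guard being discussed (`scale_factor_with_endpoints` is a hypothetical helper, not the actual patch): return no scale factor when normalization would produce a zero divisor, i.e. when there are no sources or fewer than two vertices.

```cpp
#include <cstdint>
#include <optional>

// Include-endpoints scale factor, mirroring the formula above:
// (2 for symmetric graphs, 1 for directed) * num_sources / n.
// Returns nullopt when the factor would be zero (no sources) or the
// graph is too small to normalize, so callers never divide by zero.
std::optional<double> scale_factor_with_endpoints(bool is_symmetric,
                                                  std::int64_t n,
                                                  std::int64_t num_sources)
{
  if (num_sources == 0 || n < 2) { return std::nullopt; }
  return (is_symmetric ? 2.0 : 1.0) * static_cast<double>(num_sources) /
         static_cast<double>(n);
}
```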

Collaborator Author


Just pushed an update

@ChuckHastings
Collaborator Author

/merge

@rapids-bot rapids-bot bot merged commit 6ef7d0b into rapidsai:branch-25.04 Mar 20, 2025
82 checks passed
rapids-bot bot pushed a commit that referenced this pull request Jun 12, 2025
networkx/networkx#7949 implements an update to normalization that was originally described in #4941. We pushed an update in #4974 that addressed the specific examples that the user identified in the original cugraph issue. However, while @eriknw was exploring updates to networkx to address this, he identified a few more edge conditions that needed to be satisfied. This PR addresses those remaining edge conditions.

Note that the Python tests comparing results to networkx are still disabled. These can't be re-enabled until networkx/networkx#7949 is included in a networkx release.

Closes #5006 
Closes #5107

Authors:
  - Chuck Hastings (https://github.com/ChuckHastings)
  - Rick Ratzel (https://github.com/rlratzel)

Approvers:
  - Seunghwa Kang (https://github.com/seunghwak)
  - Joseph Nke (https://github.com/jnke2016)
  - Rick Ratzel (https://github.com/rlratzel)

URL: #5105
Labels: bug (Something isn't working), cuGraph, non-breaking (Non-breaking change), python

Successfully merging this pull request may close these issues.

[BUG]: normalization issue in betweenness_centrality
4 participants