
Conversation

@ChuckHastings (Collaborator) commented Mar 14, 2025

Betweenness centrality normalization is not quite right if you specify excluding endpoints and use approximate betweenness.

This PR temporarily disables some of the Python tests that compare results with networkx, since the networkx PR that updates the normalization scores is not yet merged. Once networkx/networkx#7908 is merged, we should be able to create another PR to re-enable those tests. Each of the disabled tests is skipped with a link to that PR as the reason.

Closes #4941

copy-pr-bot bot commented Mar 14, 2025

Auto-sync is disabled for draft pull requests in this repository. Workflows must be run manually.


@ChuckHastings ChuckHastings self-assigned this Mar 14, 2025
@ChuckHastings ChuckHastings added bug Something isn't working non-breaking Non-breaking change labels Mar 14, 2025
@ChuckHastings ChuckHastings marked this pull request as ready for review March 14, 2025 22:35
@ChuckHastings ChuckHastings requested a review from a team as a code owner March 14, 2025 22:35
@ChuckHastings ChuckHastings requested a review from a team as a code owner March 18, 2025 20:00
…ality.py

Co-authored-by: Erik Welch <erik.n.welch@gmail.com>
@seunghwak (Contributor) left a comment

Looks good to me, but I have some questions about the logic to set scale_factor.

}
} else if (graph_view.number_of_vertices() > 2) {
scale_factor = static_cast<weight_t>(
std::min(static_cast<vertex_t>(num_sources), graph_view.number_of_vertices() - 1) *
Contributor


No need to subtract 1 from num_sources? (i.e. static_cast<vertex_t>(num_sources - 1)?)

I assume num_sources == graph_view.number_of_vertices() for full BC. It looks a bit weird to subtract 1 just from graph_view.number_of_vertices().

Collaborator Author


We had some complex gyrations around the formulas.

There are a couple of things being accounted for in the scaling factor. In the normalization path, we're trying to divide by the maximum number of times a vertex could appear in the shortest paths. For the full graph, since we're not including endpoints, this is (n-1) * (n-2), where n is the number of vertices in the graph. This maximum would occur for a vertex v that has an input edge from every vertex in the graph. The (n-1) factor counts every vertex other than v (when we start at v we won't travel back to v, and we're not counting the endpoint), and the (n-2) factor is the maximum number of paths that could travel through v.

For approximate betweenness, we're only traveling through num_sources samples. So the maximum value would be num_sources * (n-2). This would occur in any variant of the graph described above where the randomly selected sources do not include the vertex v.

I agree it looks odd.
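The reasoning above can be sketched as follows (an illustrative sketch only; `scale_factor_no_endpoints` is a hypothetical helper, not the cuGraph API):

```cpp
#include <algorithm>
#include <cstdint>

// Maximum number of shortest paths that can pass through a vertex v when
// endpoints are excluded.  With full BC every vertex is a source, giving
// (n - 1) * (n - 2).  With approximate BC only num_sources sampled sources
// are traversed, so the first factor is capped at min(num_sources, n - 1);
// the (n - 2) factor (interior positions on a path) is unchanged.
double scale_factor_no_endpoints(std::int64_t n, std::int64_t num_sources)
{
  return static_cast<double>(std::min(num_sources, n - 1) * (n - 2));
}
```

For the full graph (num_sources == n) this reduces to (n - 1) * (n - 2), which is why only the number_of_vertices() term gets the minus one.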

scale_factor = (graph_view.is_symmetric() ? weight_t{2} : weight_t{1}) *
static_cast<weight_t>(num_sources) /
(include_endpoints ? static_cast<weight_t>(graph_view.number_of_vertices())
: static_cast<weight_t>(graph_view.number_of_vertices() - 1));
Contributor


We don't check vertices.size() (or the sum of vertices.size() in multi-GPU) > 0. So it is technically possible to pass empty seed vertices, and in that case num_sources == 0 && graph_view.number_of_vertices() == 1 is possible; then scale_factor can become 0, leading to a divide by 0.

Collaborator Author


Good catch. That check exists in the if statements above but not in this one. I will add it.
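A hedged sketch of the kind of guard being discussed (`scale_factor_with_endpoints` is a hypothetical helper, not the actual patch): return no scale factor when normalization would produce a zero divisor, i.e. when there are no sources or fewer than two vertices.

```cpp
#include <cstdint>
#include <optional>

// Include-endpoints scale factor, mirroring the formula above:
// (2 for symmetric graphs, 1 for directed) * num_sources / n.
// Returns nullopt when the factor would be zero (no sources) or the
// graph is too small to normalize, so callers never divide by zero.
std::optional<double> scale_factor_with_endpoints(bool is_symmetric,
                                                  std::int64_t n,
                                                  std::int64_t num_sources)
{
  if (num_sources == 0 || n < 2) { return std::nullopt; }
  return (is_symmetric ? 2.0 : 1.0) * static_cast<double>(num_sources) /
         static_cast<double>(n);
}
```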

Collaborator Author


Just pushed an update

@ChuckHastings
Collaborator Author

/merge

@rapids-bot rapids-bot bot merged commit 6ef7d0b into rapidsai:branch-25.04 Mar 20, 2025
82 checks passed
rapids-bot bot pushed a commit that referenced this pull request Jun 12, 2025
networkx/networkx#7949 implements an update to normalization that was originally described in #4941. We pushed an update in #4974 that addressed the specific examples that the user identified in the original cugraph issue. However, while @eriknw was exploring updates to networkx to address this, he identified a few more edge conditions that needed to be satisfied. This PR addresses those remaining edge conditions.

Note that the Python tests comparing results to networkx are still disabled. These can't be re-enabled until networkx/networkx#7949 is included in a networkx release.

Closes #5006 
Closes #5107

Authors:
  - Chuck Hastings (https://github.com/ChuckHastings)
  - Rick Ratzel (https://github.com/rlratzel)

Approvers:
  - Seunghwa Kang (https://github.com/seunghwak)
  - Joseph Nke (https://github.com/jnke2016)
  - Rick Ratzel (https://github.com/rlratzel)

URL: #5105
Labels: bug (Something isn't working), cuGraph, non-breaking (Non-breaking change), python

Successfully merging this pull request may close these issues.

[BUG]: normalization issue in betweenness_centrality
4 participants