
[FEA]: Add a reproducibility/seed option to cugraph.louvain (API parity with cugraph.leiden) #5516

@juliaczhao


Is this a new feature, an improvement, or a change to existing functionality?

New Feature

How would you describe the priority of this feature request

Medium

Please provide a clear description of problem this feature solves

cugraph.louvain produces non-deterministic partitions across repeated runs on the same graph, same RAPIDS version, same GPU. cugraph.leiden already accepts a random_state argument, but cugraph.louvain exposes no equivalent, so there is no way to obtain reproducible Louvain results.

cugraph version: 26.04.00 (also seen on earlier releases)

Reproducibility evidence. Running Louvain 6× on a fixed ~2k-node weighted graph (community-detection step of a gene-program pipeline):

  • 5/6 runs produced an identical partition
  • 1/6 collapsed ~15 communities into their neighbors (different community count)

The same workload under cugraph.leiden(random_state=42) was bit-identical across all 6 runs. The downstream effect in our case was a ~20% swing in the number of detected communities from one Louvain run to the next.
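For anyone who wants to repeat this kind of check, here is a minimal sketch of how runs can be compared. It is not the exact script we used. `canonical_partition` and `summarize_runs` are hypothetical helper names, and `run_once` stands for any callable that returns a vertex-to-community mapping. Partitions are compared as sets of vertex sets, so two runs that differ only in label numbering count as identical.

```python
from collections import Counter
from typing import Callable, Dict, Hashable, List, Tuple

# vertex id -> community label, as produced by one clustering run
Partition = Dict[Hashable, Hashable]


def canonical_partition(labels: Partition) -> frozenset:
    """Return the partition as a set of vertex sets, ignoring label names."""
    groups: Dict[Hashable, set] = {}
    for vertex, label in labels.items():
        groups.setdefault(label, set()).add(vertex)
    return frozenset(frozenset(members) for members in groups.values())


def summarize_runs(
    run_once: Callable[[], Partition], n_runs: int = 6
) -> List[Tuple[int, int]]:
    """Call run_once n_runs times and tally the distinct partitions.

    Returns (community_count, number_of_runs) pairs, most frequent first.
    """
    tally = Counter(canonical_partition(run_once()) for _ in range(n_runs))
    return [(len(part), count) for part, count in tally.most_common()]


if __name__ == "__main__":
    # Toy stand-in for six clustering runs (hypothetical data, not our graph).
    stable = {"a": 0, "b": 0, "c": 1, "d": 1}
    relabeled = {"a": 7, "b": 7, "c": 3, "d": 3}  # same grouping, new label ids
    merged = {"a": 0, "b": 0, "c": 0, "d": 0}     # communities collapsed
    results = iter([stable, relabeled, stable, stable, merged, stable])
    print(summarize_runs(lambda: next(results), n_runs=6))
```

With cuGraph, `run_once` would wrap the `cugraph.louvain` call and turn the returned `vertex` / `partition` columns into a dict; that wrapper is left out here so the sketch runs without a GPU.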

Describe your ideal solution

A way to make cugraph.louvain reproducible, ideally:

  1. A random_state parameter on cugraph.louvain (and pylibcugraph.louvain), matching the existing cugraph.leiden signature, to seed any RNG-driven tie-breaking; and/or
  2. A documented deterministic mode — since Louvain's variation also stems from parallel vertex-move ordering and non-associative floating-point reductions of modularity gains, a seed alone may not be sufficient. An optional flag that fixes traversal order / uses deterministic reductions (accepting a performance cost) would let users trade speed for reproducibility when needed.
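To illustrate why a seed alone may not be enough, the toy sketch below shows how the order of a floating-point reduction can flip a move decision. This is not cuGraph's kernel; `accumulate` and `should_move` are hypothetical names, and the gain terms are made-up values chosen to expose the rounding effect. The same three contributions are summed in two orders and compared against an incumbent gain of 0.6.

```python
from typing import Sequence


def accumulate(values: Sequence[float]) -> float:
    """Sum left to right with plain float additions (no compensation)."""
    total = 0.0
    for v in values:
        total += v
    return total


def should_move(candidate_terms: Sequence[float], incumbent_gain: float) -> bool:
    """Toy rule: move the vertex only if the candidate gain is strictly larger."""
    return accumulate(candidate_terms) > incumbent_gain


# Hypothetical modularity-gain contributions for one candidate community.
terms = [0.1, 0.2, 0.3]
incumbent = 0.6

# Same terms, two reduction orders, as could happen when threads finish
# in a different order from one run to the next.
move_forward = should_move(terms, incumbent)
move_reversed = should_move(list(reversed(terms)), incumbent)

print(accumulate(terms), accumulate(list(reversed(terms))))
print(move_forward, move_reversed)
```

In IEEE 754 double precision the forward sum rounds to slightly above 0.6 and the reversed sum to exactly 0.6, so the two orders disagree on whether the vertex moves. No RNG is involved, which is why a deterministic-reduction option would be needed in addition to `random_state`.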

API parity with cugraph.leiden is the minimum bar; even just documenting which sources of non-determinism a seed would and would not address would help.

Describe any alternatives you have considered

No response

Additional context

Use case: community detection on weighted similarity graphs in a single-cell genomics pipeline, where reproducible runs are required for published results. Happy to share a minimal reproducer graph if useful.

Code of Conduct

  • I agree to follow cuGraph's Code of Conduct
  • I have searched the open feature requests and have found no duplicates for this feature request
