Skip to content

find_import_cycles: nx.simple_cycles() called without length_bound hangs on large import graphs #1196

Description

@alewcock

Summary

find_import_cycles() in graphify/analyze.py enumerates cycles with an unbounded nx.simple_cycles(file_graph) and only afterwards filters to len(cycle) <= max_cycle_length, breaking once top_n * 10 short cycles are collected:

# Step 2: Find simple cycles, bounded by length.   <-- comment says bounded, call is not
cycles: list[list[str]] = []
for cycle in nx.simple_cycles(file_graph):
    if len(cycle) <= max_cycle_length:
        cycles.append(cycle)
    if len(cycles) >= top_n * 10:
        break

On a real-world repo whose file-level import graph contains many long elementary cycles, the generator yields an astronomically large number of long cycles that all fail the <= max_cycle_length filter, so cycles never reaches top_n * 10 and the early-break never fires. nx.simple_cycles is an unbounded elementary-cycle enumerator (worst case exponential), so report generation hangs at 100% CPU and never returns.

Environment

  • graphifyy 0.8.35, networkx 3.6.1, Python 3.12, macOS (Apple Silicon)
  • Corpus: a large TS + Rust monorepo. Reproduced at both ~12.8k-node and ~172k-node graph sizes.

Repro

  1. Build a graph for any large codebase with cyclic file-level imports.
  2. Report generation (generate()find_import_cycles()) hangs. Observed 16+ minutes with no completion; sample/profiling shows it spinning inside simple_cycles.

Fix

Pass length_bound so networkx prunes during enumeration instead of enumerating every elementary cycle first:

for cycle in nx.simple_cycles(file_graph, length_bound=max_cycle_length):
    ...

length_bound is supported on networkx.simple_cycles since NetworkX 3.1. With it, report generation dropped from "never returns" to ~0.1s on a 32k-node graph in our testing. The subsequent len(cycle) <= max_cycle_length check then becomes redundant but harmless.

Separately (DX note, not the bug)

On graphs of this size, clustering also falls back to pure-Python networkx.community.louvain_communities when graspologic isn't installed, which took >15 min and pegged a core. Installing graspologic (native Leiden) brought it to <1s. Might be worth documenting graspologic as a recommended extra for large graphs, or surfacing a warning when falling back.

Happy to send a PR for the length_bound one-liner if useful.

Metadata

Metadata

Assignees

No one assigned

    Labels

    No labels
    No labels

    Type

    No type

    Fields

    No fields configured for issues without a type.

    Projects

    No projects

    Milestone

    No milestone

    Relationships

    None yet

    Development

    No branches or pull requests

    Issue actions