Summary
find_import_cycles() in graphify/analyze.py enumerates cycles with an unbounded nx.simple_cycles(file_graph) and only afterwards filters to len(cycle) <= max_cycle_length, breaking once top_n * 10 short cycles are collected:
# Step 2: Find simple cycles, bounded by length. <-- comment says bounded, call is not
cycles: list[list[str]] = []
for cycle in nx.simple_cycles(file_graph):
if len(cycle) <= max_cycle_length:
cycles.append(cycle)
if len(cycles) >= top_n * 10:
break
On a real-world repo whose file-level import graph contains many long elementary cycles, the generator yields an astronomically large number of long cycles that all fail the <= max_cycle_length filter, so cycles never reaches top_n * 10 and the early-break never fires. nx.simple_cycles is an unbounded elementary-cycle enumerator (worst case exponential), so report generation hangs at 100% CPU and never returns.
Environment
- graphifyy 0.8.35, networkx 3.6.1, Python 3.12, macOS (Apple Silicon)
- Corpus: a large TS + Rust monorepo. Reproduced at both ~12.8k-node and ~172k-node graph sizes.
Repro
- Build a graph for any large codebase with cyclic file-level imports.
- Report generation (
generate() → find_import_cycles()) hangs. Observed 16+ minutes with no completion; sample/profiling shows it spinning inside simple_cycles.
Fix
Pass length_bound so networkx prunes during enumeration instead of enumerating every elementary cycle first:
for cycle in nx.simple_cycles(file_graph, length_bound=max_cycle_length):
...
length_bound is supported on networkx.simple_cycles since NetworkX 3.1. With it, report generation dropped from "never returns" to ~0.1s on a 32k-node graph in our testing. The subsequent len(cycle) <= max_cycle_length check then becomes redundant but harmless.
Separately (DX note, not the bug)
On graphs of this size, clustering also falls back to pure-Python networkx.community.louvain_communities when graspologic isn't installed, which took >15 min and pegged a core. Installing graspologic (native Leiden) brought it to <1s. Might be worth documenting graspologic as a recommended extra for large graphs, or surfacing a warning when falling back.
Happy to send a PR for the length_bound one-liner if useful.
Summary
find_import_cycles()ingraphify/analyze.pyenumerates cycles with an unboundednx.simple_cycles(file_graph)and only afterwards filters tolen(cycle) <= max_cycle_length, breaking oncetop_n * 10short cycles are collected:On a real-world repo whose file-level import graph contains many long elementary cycles, the generator yields an astronomically large number of long cycles that all fail the
<= max_cycle_lengthfilter, socyclesnever reachestop_n * 10and the early-break never fires.nx.simple_cyclesis an unbounded elementary-cycle enumerator (worst case exponential), so report generation hangs at 100% CPU and never returns.Environment
Repro
generate()→find_import_cycles()) hangs. Observed 16+ minutes with no completion;sample/profiling shows it spinning insidesimple_cycles.Fix
Pass
length_boundso networkx prunes during enumeration instead of enumerating every elementary cycle first:length_boundis supported onnetworkx.simple_cyclessince NetworkX 3.1. With it, report generation dropped from "never returns" to ~0.1s on a 32k-node graph in our testing. The subsequentlen(cycle) <= max_cycle_lengthcheck then becomes redundant but harmless.Separately (DX note, not the bug)
On graphs of this size, clustering also falls back to pure-Python
networkx.community.louvain_communitieswhengraspologicisn't installed, which took >15 min and pegged a core. Installinggraspologic(native Leiden) brought it to <1s. Might be worth documentinggraspologicas a recommended extra for large graphs, or surfacing a warning when falling back.Happy to send a PR for the
length_boundone-liner if useful.