
cluster-only doesn't persist .graphify_analysis.json, causing export html to silently report "Single community" #1610

Description

@drmzperx

Bug: graphify cluster-only doesn't leave .graphify_analysis.json behind, so a subsequent graphify export html silently skips with "Single community"

Version: graphifyy 0.8.13 (installed via uv tool install)

Steps to reproduce

  1. Run the full pipeline once on a corpus large enough to exceed the HTML viz node limit (>5000 nodes), so graphify-out/graph.json and graphify-out/GRAPH_REPORT.md exist.
  2. Manually edit graphify-out/graph.json (e.g. merge two duplicate nodes) so the graph no longer matches the last full extraction.
  3. Run graphify cluster-only . to re-cluster the edited graph. It reports success (e.g. Done — 387 communities. GRAPH_REPORT.md and graph.json updated.) and, in my case, also printed Skipped graph.html: ... too large for HTML viz (limit: 5000).
  4. Run graphify export html to get the aggregated community view for the oversized graph.

Expected

graphify export html builds the aggregated community meta-graph (one node per community, as it does when run right after the initial full pipeline) and writes graph.html.

Actual

Graph has 5558 nodes (above 5000 limit). Building aggregated community view...
Single community - aggregated view not useful. Skipping graph.html.

There is no error and no indication that anything is wrong: the command behaves as if the graph had zero or one community and silently skips the file.

Root cause

export html's CLI handler reads community assignments from a specific file, not from graph.json's own per-node community attribute:

# graphify/__main__.py, ~line 2119
analysis_path = Path(_GRAPHIFY_OUT) / ".graphify_analysis.json"
...
# ~line 2240-2242
communities: dict[int, list[str]] = {}
if analysis_path.exists():
    _an = json.loads(analysis_path.read_text(encoding="utf-8"))
    communities = {int(k): v for k, v in _an.get("communities", {}).items()}

graphify cluster-only never writes .graphify_analysis.json. It computes communities internally, writes only graph.json (with per-node community fields) and GRAPH_REPORT.md, and then presumably cleans up its own intermediates. After a cluster-only run, .graphify_analysis.json therefore doesn't exist, communities ends up as {}, and to_html's aggregation path (graphify/export.py, to_html(), the node_limit is not None branch) iterates over zero communities:

meta = _nx.Graph()
for cid, members in communities.items():   # communities == {} here
    meta.add_node(...)
...
if meta.number_of_nodes() <= 1:
    print("Single community - aggregated view not useful. Skipping graph.html.")
    return

Hence the misleading "Single community" message on a graph that actually has hundreds of real communities, all of which are still recoverable from graph.json's per-node community field.
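
The check quoted above treats "zero communities loaded" and "exactly one community" the same way. A minimal sketch of how the two cases could be told apart follows. The helper name aggregation_skip_reason and the wording of the first message are hypothetical and are not part of graphify.

```python
from typing import Optional


def aggregation_skip_reason(communities: dict[int, list[str]]) -> Optional[str]:
    """Return why the aggregated view should be skipped, or None to proceed.

    Hypothetical helper for illustration only; not graphify's actual code.
    """
    if not communities:
        # Nothing was loaded at all, which points at missing input data
        # rather than at a graph that really has a single community.
        return (
            "No community assignments loaded "
            "(is .graphify_analysis.json missing?). Skipping graph.html."
        )
    if len(communities) == 1:
        # Same wording as the message in the current output.
        return "Single community - aggregated view not useful. Skipping graph.html."
    return None
```

With a guard like this, the empty case would surface the missing file directly instead of reporting a single community.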

Suggested fix

Either:

  • Have cluster-only also (re)write .graphify_analysis.json (communities/cohesion/gods/surprises) alongside graph.json and GRAPH_REPORT.md, matching what the full pipeline leaves behind at the equivalent step, or
  • Have export html's CLI handler fall back to deriving communities from graph.json's per-node community field when .graphify_analysis.json is missing, rather than defaulting to {}.

Either fix would also make the "Single community" message accurate again. Right now it fires even when the graph clearly has many communities, which is confusing to debug from the CLI output alone (I only found the cause by reading the graphify/export.py and graphify/__main__.py source directly).
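
As a rough illustration of the second option, the fallback could look something like the sketch below. It assumes graph.json uses a node-link layout with a top-level "nodes" list whose entries carry "id" and "community" keys. I have not verified that layout against every graphify version, and the function names are hypothetical.

```python
from collections import defaultdict
from typing import Optional


def communities_from_graph(graph: dict) -> dict[int, list[str]]:
    """Group node ids by their per-node community attribute.

    Assumes a node-link layout: {"nodes": [{"id": ..., "community": ...}, ...]}.
    """
    grouped: dict[int, list[str]] = defaultdict(list)
    for node in graph.get("nodes", []):
        cid = node.get("community")
        if cid is not None:  # skip nodes that were never assigned a community
            grouped[int(cid)].append(node["id"])
    return dict(grouped)


def resolve_communities(analysis: Optional[dict], graph: dict) -> dict[int, list[str]]:
    """Prefer the analysis data; fall back to graph.json when it is absent or empty."""
    if analysis:
        loaded = {int(k): v for k, v in analysis.get("communities", {}).items()}
        if loaded:
            return loaded
    return communities_from_graph(graph)
```

Here analysis stands for the parsed contents of .graphify_analysis.json (or None when the file is missing) and graph for the parsed graph.json.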

Workaround

Manually reconstruct and write graphify-out/.graphify_analysis.json with {"communities": ..., "cohesion": ..., "gods": ..., "surprises": ...} (derived from graph.json's per-node community field) before calling graphify export html.
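
For reference, a sketch of that workaround as a small script. It assumes graph.json has a top-level "nodes" list with "id" and "community" keys. The empty values written for cohesion, gods and surprises are placeholders whose real shapes I have not confirmed; the handler code quoted above only reads the communities key.

```python
import json
from collections import defaultdict
from pathlib import Path


def rebuild_analysis_file(out_dir: str = "graphify-out") -> Path:
    """Recreate .graphify_analysis.json from graph.json's per-node community field."""
    out = Path(out_dir)
    graph = json.loads((out / "graph.json").read_text(encoding="utf-8"))

    # JSON object keys are strings; the export handler converts them with int(k).
    communities: dict[str, list[str]] = defaultdict(list)
    for node in graph.get("nodes", []):  # assumed node-link layout
        cid = node.get("community")
        if cid is not None:
            communities[str(int(cid))].append(node["id"])

    analysis = {
        "communities": dict(communities),
        # Placeholders only: the real shapes of these values are an assumption.
        "cohesion": {},
        "gods": [],
        "surprises": [],
    }
    target = out / ".graphify_analysis.json"
    target.write_text(json.dumps(analysis, indent=2), encoding="utf-8")
    return target


if __name__ == "__main__":
    print(f"Wrote {rebuild_analysis_file()}")
```

Run it from the project root after graphify cluster-only . and before graphify export html.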
