export obsidian: KeyError in to_obsidian when a community member id has no backing node in G #1236

Description

@antongulin

Summary

graphify export obsidian aborts with KeyError in to_obsidian when a community's member list contains an id that has no backing node in G. The exporter assumes every clustered community member is a key in G.nodes (and in node_filename), but at least one synthesized member id ('agents_doc' in my run) is not — so the whole vault export crashes instead of skipping the dangling member.

Environment

  • graphify(y) 0.8.36
  • networkx 3.6.1
  • Python 3.13.13
  • backend: claude-cli (semantic), macOS

Repro

Run the commands below on a repo whose graph clusters into many communities (mine: ~10k nodes, 1206 communities after cluster-only) and where multiple documents normalize to the same stem (e.g. several *-AGENTS.md / doc files across different directories):

graphify extract . --backend claude-cli
graphify cluster-only "$(pwd)" --backend=claude-cli   # succeeds: "Done - 1206 communities. GRAPH_REPORT.md and graph.json updated."
graphify export obsidian --dir "$VAULT"               # crashes

Traceback

Traceback (most recent call last):
  File ".../bin/graphify", line 10, in <module>
    sys.exit(main())
  File ".../graphify/__main__.py", line 3712, in main
    n = _to_obsidian(G, communities, str(obsidian_dir),
                     community_labels=labels or None, cohesion=cohesion or None)
  File ".../graphify/export.py", line 1010, in to_obsidian
    for node_id in sorted(members, key=lambda n: G.nodes[n].get("label", n)):
  File ".../graphify/export.py", line 1010, in <lambda>
    for node_id in sorted(members, key=lambda n: G.nodes[n].get("label", n)):
  File ".../networkx/classes/reportviews.py", line 196, in __getitem__
    return self._nodes[n]
KeyError: 'agents_doc'

Root cause

In to_obsidian (export.py:1010), the ## Members section iterates a community's members and dereferences each one in two places: via G.nodes[n] in the sort key, and via G.nodes[node_id] and node_filename[node_id] in the loop body:

for node_id in sorted(members, key=lambda n: G.nodes[n].get("label", n)):
    data = G.nodes[node_id]
    node_label = node_filename[node_id]
    ...

members can contain an id that is not a node in G. Evidence from my run: the offending id "agents_doc" occurs 0 times in graphify-out/graph.json but does appear in graphify-out/.graphify_analysis.json (the sidecar the exporter draws community/label data from). The graph does contain real nodes whose ids end in _agents_doc (e.g. t3x_rte_ckeditor_image_classes_agents_doc and other *-AGENTS.md doc nodes from different directories), so this looks like a normalized/collapsed concept id that ends up in community membership without being materialized as a node in G. Either way, the export layer trusts a 1:1 member → G.nodes mapping that does not hold.
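To confirm which member ids are dangling before touching the exporter, a check along these lines can help. It is only a sketch: `find_dangling_members` is a hypothetical helper, and the shape of `communities` (community id mapped to a list of member ids) is an assumption about what the exporter receives, not something taken from graphify's source. The toy data stands in for the real graph and sidecar.

```python
from typing import Dict, Hashable, Iterable, List


def find_dangling_members(
    communities: Dict[Hashable, Iterable[str]],
    node_ids: Iterable[str],
) -> Dict[Hashable, List[str]]:
    """Return {community_id: [member ids that are not graph nodes]}.

    Hypothetical helper; `communities` is assumed to map a community id
    to an iterable of member ids, and `node_ids` is every id present in G.
    """
    known = set(node_ids)
    dangling: Dict[Hashable, List[str]] = {}
    for community_id, members in communities.items():
        missing = [m for m in members if m not in known]
        if missing:  # only report communities that have a problem
            dangling[community_id] = missing
    return dangling


# Toy data mirroring the report: one collapsed id with no backing node.
communities = {
    0: ["t3x_rte_ckeditor_image_classes_agents_doc", "agents_doc"],
    1: ["other_dir_agents_doc"],
}
node_ids = ["t3x_rte_ckeditor_image_classes_agents_doc", "other_dir_agents_doc"]

dangling = find_dangling_members(communities, node_ids)
print(dangling)  # {0: ['agents_doc']}
```

With a real networkx graph, `node_ids` could simply be `G.nodes`, since iterating a node view yields the node ids.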

Suggested fix

Make the member iteration defensive so one dangling id can't abort the entire vault export:

members = [m for m in members if m in G.nodes and m in node_filename]
for node_id in sorted(members, key=lambda n: G.nodes[n].get("label", n)):
    ...

Optionally, log.debug the skipped ids so the mismatch stays visible. The deeper fix belongs upstream: clustering/label assignment should only emit real node ids as community members. Guarding the exporter still prevents a single synthesized member from taking down the whole export obsidian run.
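A slightly fuller version of the guard, with the skipped ids logged, might look like the sketch below. `resolvable_members` and the logger name are hypothetical, not part of graphify. Plain dicts stand in for `G.nodes` and `node_filename`, which works because the function only relies on membership tests and item lookup.

```python
import logging
from typing import Any, Iterable, List, Mapping

log = logging.getLogger("obsidian_export_sketch")  # hypothetical logger name


def resolvable_members(
    members: Iterable[str],
    nodes: Mapping[str, Mapping[str, Any]],
    node_filename: Mapping[str, str],
) -> List[str]:
    """Drop member ids with no backing node or filename, then sort by label.

    Hypothetical helper: `nodes` plays the role of G.nodes (id -> attribute
    dict) and `node_filename` the exporter's id -> note filename mapping.
    """
    kept: List[str] = []
    skipped: List[str] = []
    for member in members:
        if member in nodes and member in node_filename:
            kept.append(member)
        else:
            skipped.append(member)
    if skipped:
        log.debug("skipping %d member(s) with no backing node: %s",
                  len(skipped), skipped)
    # Same sort key as the original loop; safe now that every id resolves.
    return sorted(kept, key=lambda n: nodes[n].get("label", n))


# Toy stand-ins for G.nodes and node_filename.
nodes = {"x_agents_doc": {"label": "B doc"}, "y_readme": {"label": "A readme"}}
node_filename = {"x_agents_doc": "B doc", "y_readme": "A readme"}

ordered = resolvable_members(["x_agents_doc", "agents_doc", "y_readme"],
                             nodes, node_filename)
print(ordered)  # ['y_readme', 'x_agents_doc']
```

Filtering before the sort matters here, because the crash in the traceback happens inside the sort key, before the loop body ever runs.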

Workaround

Excluding the directory whose docs produced the colliding stem (via .graphifyignore) and re-running extract removes the synthesized member and lets the export complete.
