Incremental --update leaves ghost nodes for deleted files (prune abs-vs-relative source_file mismatch) #1571

Description

@goodjira

Summary

Incremental --update leaves ghost nodes when a file is deleted from the corpus. The deleted file's nodes survive the prune and remain in graph.json indefinitely, because build_merge's prune-matching compares absolute paths (from detect_incremental) against the relative source_file keys stored on nodes.

Version: graphifyy==0.9.3

Root cause

The --update runbook (in the shipped skill, references/update.md) calls:

G = build_merge(
    [new_extraction],
    graph_path='graphify-out/graph.json',
    prune_sources=prune,          # <-- no root= passed
)

prune comes from detect_incremental(...)['deleted_files'], which returns absolute paths (e.g. /repo/HANDOFF.md). Stored nodes keep a relative source_file (e.g. HANDOFF.md).

In build_merge, the prune set is built as {p, _norm_source_file(p, _root_str)}. But because root was never threaded through, _root_str is None, and _norm_source_file(abs, None) returns the absolute path unchanged — so neither entry matches the relative node key:

from graphify.build import _norm_source_file
_norm_source_file('/repo/HANDOFF.md', None)     # -> '/repo/HANDOFF.md'   (no match vs 'HANDOFF.md')
_norm_source_file('/repo/HANDOFF.md', '/repo')  # -> 'HANDOFF.md'          (matches)

The console still prints Pruned N node(s) from M deleted source file(s) (M counts the inputs, not actual matches), so the failure is silent.
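The mismatch and the misleading count can be sketched without graphify installed. The `norm_source_file` below is an illustrative stand-in written for this issue, not the library's `_norm_source_file`; it only mimics the behavior described above (relativize under `root`, return the path unchanged when `root` is None). `prune_keys` and the sample paths are likewise hypothetical.

```python
import posixpath


def norm_source_file(path, root):
    """Illustrative stand-in (not graphify's code) for the behavior described above."""
    if root is None or not posixpath.isabs(path):
        return path  # nothing to relativize against
    rel = posixpath.relpath(path, root)
    # Leave paths that fall outside root untouched.
    return path if rel == ".." or rel.startswith("../") else rel


def prune_keys(prune_sources, root):
    """Build the match set as {p, normalized(p)} for each deleted path."""
    keys = set()
    for p in prune_sources:
        keys.update({p, norm_source_file(p, root)})
    return keys


stored_keys = {"HANDOFF.md", "README.md"}  # relative source_file values on nodes
deleted = ["/repo/HANDOFF.md"]             # absolute, as detect_incremental reports

matched_without_root = prune_keys(deleted, None) & stored_keys
matched_with_root = prune_keys(deleted, "/repo") & stored_keys

# The number of inputs is the same in both cases; only the matches differ.
print(len(deleted), "deleted input(s);", len(matched_without_root), "matched without root")
print(len(deleted), "deleted input(s);", len(matched_with_root), "matched with root")
```

Reporting the size of the intersection rather than the number of inputs would make a zero-match prune visible in the console.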

Reproduction

  1. Build a graph on a repo containing a doc file, e.g. HANDOFF.md.
  2. Delete HANDOFF.md.
  3. Run /graphify <repo> --update (or the build_merge call from references/update.md).
  4. Observe: graph.json still contains all HANDOFF.md-sourced nodes; [n for n in nodes if n['source_file']=='HANDOFF.md'] is non-empty.
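Step 4 can be scripted roughly as follows. This is a sketch that assumes graph.json holds a top-level "nodes" list whose entries carry a source_file key, as the expression in step 4 suggests; the helper names `load_graph` and `nodes_from_source` are made up for illustration.

```python
import json
from pathlib import Path


def load_graph(path="graphify-out/graph.json"):
    """Read graph.json; assumes a JSON object with a top-level 'nodes' list."""
    return json.loads(Path(path).read_text(encoding="utf-8"))


def nodes_from_source(graph, source_file):
    """Return the nodes whose source_file equals the given relative path."""
    return [n for n in graph.get("nodes", []) if n.get("source_file") == source_file]


# In-memory sample standing in for a graph.json after the failed prune.
sample = {
    "nodes": [
        {"id": "readme_intro", "source_file": "README.md"},
        {"id": "handoff_notes", "source_file": "HANDOFF.md"},
    ]
}
ghosts = nodes_from_source(sample, "HANDOFF.md")
print(len(ghosts), "node(s) still reference HANDOFF.md")
```

Against a real run, `nodes_from_source(load_graph(), 'HANDOFF.md')` returning a non-empty list after step 3 would confirm the bug.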

Suggested fix

Either (or both):

  1. Runbook: pass root in references/update.md's build_merge call, consistent with the rest of the runbook which already threads root='INPUT_PATH':
    G = build_merge([new_extraction], graph_path='graphify-out/graph.json',
                    prune_sources=prune, root='INPUT_PATH')
  2. Library hardening: in build_merge, when root is None, default the prune-normalization root to the graph's scan root (e.g. graph_path.parent.parent or the saved .graphify_root) so absolute prune_sources always relativize to the stored key form.
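Option 2 might look roughly like the sketch below. It is not a patch against build_merge: `effective_prune_root` and `relativize` are hypothetical helpers, and the fallback assumes the graph lives at <root>/graphify-out/graph.json so that the scan root is two levels above the file. Reading a saved .graphify_root is not covered here.

```python
import posixpath


def effective_prune_root(graph_path, root=None):
    """Use the caller's root if given; otherwise fall back to graph_path's grandparent."""
    if root is not None:
        return root
    # Assumption: graph.json sits at <root>/graphify-out/graph.json.
    return posixpath.dirname(posixpath.dirname(posixpath.abspath(graph_path)))


def relativize(path, root):
    """Turn an absolute path under root into the relative form stored on nodes."""
    if not posixpath.isabs(path):
        return path
    rel = posixpath.relpath(path, root)
    # Paths outside root are returned unchanged.
    return path if rel == ".." or rel.startswith("../") else rel


root = effective_prune_root("/repo/graphify-out/graph.json")
prune = {relativize(p, root) for p in ["/repo/HANDOFF.md", "/repo/docs/old.md"]}
print(root, sorted(prune))
```

With a fallback like this, absolute prune_sources would relativize to the stored key form even when the caller omits root, which is the case the runbook currently hits.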

Impact

Any --update run after files have been deleted from the corpus accumulates stale nodes and edges across runs, silently degrading graph accuracy over time. Workaround for now: after an --update with deletions, manually drop nodes whose source_file no longer exists on disk and rebuild.
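A minimal sketch of that workaround, under several assumptions about graph.json that this issue does not confirm: nodes carry an "id", edges live under "links" (or "edges") with "source" and "target" fields, and source_file is relative to the scan root. `drop_missing_sources` is a hypothetical helper, and the temporary directory only stands in for a repo.

```python
import tempfile
from pathlib import Path


def drop_missing_sources(graph, root):
    """Return a copy of graph without nodes whose source_file is gone from disk."""
    root = Path(root)
    kept = [
        n for n in graph.get("nodes", [])
        # Nodes without a source_file are kept; they cannot be checked.
        if not n.get("source_file") or (root / n["source_file"]).exists()
    ]
    kept_ids = {n["id"] for n in kept}
    edge_key = "links" if "links" in graph else "edges"  # assumed key names
    edges = [
        e for e in graph.get(edge_key, [])
        if e["source"] in kept_ids and e["target"] in kept_ids
    ]
    return {**graph, "nodes": kept, edge_key: edges}


with tempfile.TemporaryDirectory() as repo:
    (Path(repo) / "README.md").write_text("still here\n", encoding="utf-8")
    graph = {
        "nodes": [
            {"id": "a", "source_file": "README.md"},
            {"id": "b", "source_file": "HANDOFF.md"},  # deleted from disk
        ],
        "links": [{"source": "a", "target": "b"}],
    }
    cleaned = drop_missing_sources(graph, repo)

print([n["id"] for n in cleaned["nodes"]], cleaned["links"])
```

Edges touching a dropped node are removed as well, since leaving them would keep dangling references in the graph.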
