
graphify update never prunes a deleted import's edge (stale edges -> false circular-dependency findings) #1521

Description

@UltronOfSpace

graphify update never prunes a deleted import's edge → stale edges → false structural findings

Component: graphify (PyPI graphifyy) incremental update
Affected versions: confirmed on 0.8.51 (latest) and 0.8.44
Severity: correctness — produces silently wrong graphs that drive false analysis (e.g. phantom circular-dependency reports)
Platform observed: Windows 11, Python 3.13 (behavior is in pure graph-merge logic, not platform-specific)


Summary

When an import (or any edge-producing reference) is deleted from a file, graphify update re-extracts the file and writes a new graph, but the old edge is carried forward — it is never pruned. graphify update --force does not fix it either; only a full clean rebuild (delete graph.json, then update) removes the stale edge.

Because the edge survives, downstream analysis is wrong. In our case a file used to import another, the import was refactored out (replaced by a registration/callback pattern), and for months the stale edge made the dependency graph keep reporting a circular dependency that no longer existed, until a clean rebuild was forced.
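To illustrate how a single surviving edge is enough to produce a phantom cycle, here is a minimal sketch of a cycle check over a plain edge list. It is not graphify's analysis code; `find_cycle` and the edge directions are hypothetical, with `b.ts -> a.ts` standing in for the dependency that remained after the refactor.

```python
def find_cycle(edges):
    """Return one cycle as a list of nodes, or None if the graph is acyclic."""
    graph = {}
    for src, dst in edges:
        graph.setdefault(src, []).append(dst)
        graph.setdefault(dst, [])

    WHITE, GRAY, BLACK = 0, 1, 2  # unvisited, on the current DFS path, finished
    color = {node: WHITE for node in graph}
    path = []

    def visit(node):
        color[node] = GRAY
        path.append(node)
        for nxt in graph[node]:
            if color[nxt] == GRAY:  # back edge: nxt is already on the path
                return path[path.index(nxt):] + [nxt]
            if color[nxt] == WHITE:
                found = visit(nxt)
                if found:
                    return found
        path.pop()
        color[node] = BLACK
        return None

    for node in graph:
        if color[node] == WHITE:
            found = visit(node)
            if found:
                return found
    return None


# Hypothetical edges: what the code really contains after the refactor...
current_edges = [("b.ts", "a.ts")]
# ...and the deleted import that was never pruned from graph.json.
stale_edges = [("a.ts", "b.ts")]

print(find_cycle(current_edges))                # None
print(find_cycle(current_edges + stale_edges))  # ['b.ts', 'a.ts', 'b.ts']
```

The second call reports a cycle even though no such cycle exists in the source tree, which matches the false finding described above.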

Minimal reproduction

mkdir -p repro/src && cd repro

printf "import { foo } from './b';\nexport function useA(): void { foo(); }\n" > src/a.ts
printf "export function foo(): void {}\n" > src/b.ts
graphify update .                       # builds graph: a.ts -> b.ts import edge exists

printf "export function useA(): void {}\n" > src/a.ts    # DELETE the import
graphify update .                       # re-extract + rebuild

# BUG: graphify-out/graph.json still contains the a.ts -> b.ts import edge.
#      `graphify update . --force` does not remove it either.
#      Only `rm -f graphify-out/graph.json && graphify update .` removes it.

Check programmatically:

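import json

# Load the graph written by `graphify update`.
with open("graphify-out/graph.json", encoding="utf-8") as fh:
    g = json.load(fh)
# Map node id -> source_file, for edges that lack their own source_file.
id2f = {n["id"]: n.get("source_file") for n in g["nodes"]}
# Import edges that still point from a.ts to b.ts.
stale = [l for l in g["links"]
         if l.get("relation") in ("imports", "imports_from")
         and "a.ts" in str(l.get("source_file") or id2f.get(l["source"]))
         and "b.ts" in str(id2f.get(l["target"]))]
print(stale)
# Expected after deleting the import: []   Actual: two stale edges.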

Expected vs actual

  • Expected: after re-extraction, a re-extracted file's removed edges are gone from the graph.
  • Actual: the removed edges persist. graphify update prints e.g. [graphify watch] Rebuilt: 4 nodes, 5 edges — the 5 edges still include the 2 stale a.ts -> b.ts import edges.

The surviving edge clearly belongs to the changed file (note source_file):

{"relation": "imports_from", "context": "import", "source_file": "src/a.ts",
 "source_location": "L1", "source": "src_a", "target": "src_b"}

Root cause (with code references, v0.8.51)

flowchart TD
    U["graphify update ."] --> RC["watch.py _rebuild_code<br/>(prints 'Rebuilt: N edges')"]
    RC --> EX["extract re-extracted files"]
    EX --> WR["write graph.json"]
    WR --> STALE["deleted import's edge survives — BUG"]
    BM["build.py build_merge<br/>(518-533: drop existing nodes/edges<br/>whose source_file was re-extracted)"]
    RC -. "does NOT call / does NOT replicate" .-> BM
    BM -. "recommended: apply this prune here too" .-> RC

The correct stale-edge prevention already exists in build.py::build_merge:

build.py:480-483: "Re-extracted files REPLACE their prior contribution: any source_file present in new_chunks is dropped from the loaded graph before merging, so a changed file's stale nodes/edges don't accumulate."

…implemented at build.py:518-533: collect new_sources (every source_file re-extracted) and drop every existing node and edge whose source_file is in that set.
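The replacement rule described above can be sketched as follows. This is an illustration of the idea only, not a copy of build.py: the function name `merge_with_replacement`, its parameters, and the flat `{"nodes": [...], "links": [...]}` chunk shape are assumptions made for the example.

```python
def merge_with_replacement(existing, new_chunk, reextracted_files):
    """Merge a re-extracted chunk into an existing graph (hypothetical helper).

    Anything the existing graph attributes to a re-extracted file is dropped
    first, so the new chunk replaces that file's prior contribution.
    """
    drop = set(reextracted_files)
    kept_nodes = [n for n in existing["nodes"] if n.get("source_file") not in drop]
    kept_links = [l for l in existing["links"] if l.get("source_file") not in drop]
    return {
        "nodes": kept_nodes + new_chunk["nodes"],
        "links": kept_links + new_chunk["links"],
    }


# Graph from the first run of the repro: a.ts imports b.ts.
existing = {
    "nodes": [
        {"id": "src_a", "source_file": "src/a.ts"},
        {"id": "src_b", "source_file": "src/b.ts"},
    ],
    "links": [
        {"relation": "imports_from", "source_file": "src/a.ts",
         "source": "src_a", "target": "src_b"},
    ],
}

# Re-extraction of a.ts after the import was deleted: same node, no edges.
new_chunk = {"nodes": [{"id": "src_a", "source_file": "src/a.ts"}], "links": []}

merged = merge_with_replacement(existing, new_chunk, ["src/a.ts"])
print([n["id"] for n in merged["nodes"]])  # ['src_b', 'src_a']
print(merged["links"])                     # []
```

Passing the list of re-extracted files explicitly, rather than deriving it from the chunk's contents, means a file whose new extraction is empty still has its old nodes and edges removed.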

But the graphify update CLI path does not use build_merge. It goes through watch.py (_rebuild_code, the path that prints [graphify watch] Rebuilt: …). That module contains no reference to build_merge, new_sources, or the source_file-replacement logic — it builds the graph from extract(...) output and writes it without dropping a re-extracted file's prior edges. So the very protection build_merge documents is bypassed on the most common path (graphify update, which the README and watch hooks recommend running after edits).

Additional notes:

  • --force only relaxes the node-shrink guard (build.py:588-597, "refuse to shrink graph … pass prune_sources"); it does not add edge pruning, so it does not help.
  • The same_topology short-circuit in watch.py:740-759 is not the cause here — in the repro the graph is rebuilt and rewritten; the rewritten graph simply still contains the stale edges.

Recommended fix

Apply build_merge's source_file replacement on the incremental update path too. Either:

  1. Route the update/_rebuild_code merge through build_merge() (preferred — single source of truth), or
  2. Replicate build.py:518-533 in watch.py's rebuild: before/while merging the re-extracted chunks into the existing graph, drop every existing node and edge whose source_file is among the re-extracted files.

Edges reliably carry their own source_file (e.g. the stale edge above has "source_file": "src/a.ts"), so the edge-side prune is straightforward and symmetric with the node-side prune. This makes graphify update self-correcting and removes the need for users to periodically rm -f graph.json.

Impact / why it matters

graphify update is the documented "keep the graph current" command and is commonly wired into commit hooks / watch. Every deleted import (or other removed reference) leaves a ghost edge with no warning, which silently corrupts structural analyses — circular-dependency detection, hub/coupling metrics, impact maps — until someone happens to do a full clean rebuild. The graph drifts further from reality the longer incremental updates run.
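Until the update path prunes on its own, one way to measure the drift is to compare the incrementally maintained graph against a clean rebuild. The sketch below is a workaround idea, not a graphify feature: `edge_keys` and `ghost_edges` are hypothetical helpers, and the only assumption about graph.json is the `links` / `source` / `target` / `relation` layout shown earlier in this report.

```python
def edge_keys(graph):
    """Reduce each link to a comparable (source, target, relation) tuple."""
    return {(l["source"], l["target"], l.get("relation")) for l in graph["links"]}


def ghost_edges(incremental, clean):
    """Edges present in the incremental graph but absent from a clean rebuild."""
    return sorted(edge_keys(incremental) - edge_keys(clean), key=str)


# In practice both dicts would come from json.load() on two graph.json files,
# e.g. a copy saved before `rm -f graphify-out/graph.json` and the rebuilt one.
incremental = {"links": [
    {"source": "src_a", "target": "src_b", "relation": "imports_from"},
    {"source": "src_b", "target": "src_c", "relation": "imports_from"},
]}
clean = {"links": [
    {"source": "src_b", "target": "src_c", "relation": "imports_from"},
]}

print(ghost_edges(incremental, clean))
# [('src_a', 'src_b', 'imports_from')]
```

A non-empty result means the incremental graph has accumulated edges that a fresh extraction no longer produces.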
