build_merge / --update drops existing graph's hyperedges from unchanged files (only nodes+edges are read from graph.json) #1574

## Summary

`build_merge()` — the function backing `graphify --update` — reads only `nodes` and `edges` from the existing `graph.json`, never `hyperedges`. As a result, **every incremental update silently drops all hyperedges belonging to files that weren't re-extracted in that run.** After a full build the graph carries all its hyperedges; the first `--update` that touches even one file collapses the graph's hyperedge set down to just the hyperedges of the changed file(s).

Hyperedges are the highest-signal part of the semantic layer (domain-flow groupings), so losing them disproportionately damages `query`/`explain` quality.

Version: `graphifyy==0.9.3`.

## Root cause

In `build_merge()` (`build.py:687`), the existing graph is loaded with only:

```python
existing_nodes = list(data.get("nodes", []))   # build.py:719
existing_edges = list(data.get(links_key, [])) # build.py:720
```

`data.get("hyperedges")` is never read. The merged graph's hyperedges are then set solely from the new chunks in `build()`:

```python
hyperedges = extraction.get("hyperedges", [])
if hyperedges:
    ...
    G.graph["hyperedges"] = hyperedges         # build.py:591  (plain overwrite)
```

`to_json()` faithfully writes whatever `build_merge` left (`export.py:536`), so the output graph ends up with only the changed files' hyperedges. `--update` reaches this via `_build_merge(...)` at `__main__.py:4844`.
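To make the failure mode concrete, here is a toy model of the behaviour described above. It is not the actual `graphify` code: `toy_build_merge` is a hypothetical stand-in that works on plain dicts, and it assumes nodes and hyperedges each carry a `source_file` key.

```python
def toy_build_merge(existing_graph: dict, new_extraction: dict) -> dict:
    """Toy model (not graphify code): nodes are merged, hyperedges are overwritten."""
    # Files that were re-extracted in this run (assumes a `source_file` key on nodes).
    changed = {n["source_file"] for n in new_extraction.get("nodes", [])}

    # Nodes from unchanged files are carried over; changed files are replaced.
    kept_nodes = [
        n for n in existing_graph.get("nodes", [])
        if n["source_file"] not in changed
    ]
    merged = {"nodes": kept_nodes + new_extraction.get("nodes", [])}

    # The gap: existing_graph["hyperedges"] is never consulted.
    merged["hyperedges"] = list(new_extraction.get("hyperedges", []))
    return merged


existing = {
    "nodes": [
        {"id": "a", "source_file": "a.md"},
        {"id": "b", "source_file": "b.md"},
    ],
    "hyperedges": [
        {"id": "h_a", "source_file": "a.md"},
        {"id": "h_b", "source_file": "b.md"},
    ],
}
update = {
    "nodes": [{"id": "b", "source_file": "b.md"}],
    "hyperedges": [{"id": "h_b", "source_file": "b.md"}],
}

result = toy_build_merge(existing, update)
print([n["id"] for n in result["nodes"]])       # ['a', 'b']
print([h["id"] for h in result["hyperedges"]])  # ['h_b']
```

Both nodes survive the merge, but `h_a` is gone even though `a.md` was not touched.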

## Why this looks like an unintended gap (not by design)

The codebase already knows how to preserve hyperedges across a merge; `build_merge` just doesn't use either mechanism:

- `attach_hyperedges()` (`export.py:464`) merges new hyperedges into the existing set with id-level dedup:
  ```python
  existing = G.graph.get("hyperedges", [])
  seen_ids = {h["id"] for h in existing}
  for h in hyperedges:
      if h.get("id") and h["id"] not in seen_ids:
          existing.append(h)
  G.graph["hyperedges"] = existing
  ```
- The `watch` path explicitly carries existing hyperedges forward (`watch.py:682`):
  ```python
  "hyperedges": existing.get("hyperedges", []),
  ```

Only the `build_merge` / `--update` path plain-overwrites from the new chunks.

## Reproduction

1. Full-build a repo that produces hyperedges from several doc files (`graphify .`). Note the hyperedge count in `graph.json` (`.graph.hyperedges` / top-level `hyperedges`).
2. Modify a single file and run `graphify --update`.
3. `graph.json` now contains only the hyperedges extracted from that one changed file; every hyperedge from the untouched files is gone.

Observed in practice: a repo whose semantic cache holds 57 hyperedges across its doc files had `graph.json` collapse to 2 hyperedges after an `--update` that re-extracted 2 files. The 55 lost hyperedges were all from unchanged files and remain intact in `graphify-out/cache/semantic/*` — only the merged graph dropped them.
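A small helper along these lines can confirm the loss by diffing hyperedge ids between a copy of `graph.json` taken before the `--update` and the file afterwards. This is a sketch, not part of `graphify`: the function names are hypothetical, and it assumes hyperedges live either at the top level or under a `graph` key, matching the two locations mentioned in step 1.

```python
import json
from pathlib import Path


def hyperedge_ids(graph_json_path) -> set:
    """Return the set of hyperedge ids found in a graph.json file."""
    data = json.loads(Path(graph_json_path).read_text(encoding="utf-8"))
    # Assumption: hyperedges are stored top-level or under a "graph" key.
    hyperedges = data.get("hyperedges")
    if hyperedges is None:
        hyperedges = data.get("graph", {}).get("hyperedges", [])
    return {h["id"] for h in hyperedges if h.get("id")}


def lost_hyperedges(before_path, after_path) -> set:
    """Ids present before the update but missing afterwards."""
    return hyperedge_ids(before_path) - hyperedge_ids(after_path)
```

Usage: copy `graph.json` aside after the full build, run `graphify --update`, then call `lost_hyperedges(copy_path, graph_json_path)`. A non-empty result made up of ids from untouched files matches the behaviour reported here.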

## Impact

- Every incremental update degrades the semantic layer; the graph only ever retains the last-updated files' hyperedges.
- Distinct from #1005 (which drops hyperedges only in `graph.html` when the viz node limit is exceeded — `graph.json` is unaffected there). Here the loss is in `graph.json` itself.
- Also distinct from #1561 (member-list alias keys during extraction) and #1430 (extraction prompt drift). This is purely the merge/preservation step.

## Suggested fix

Have `build_merge()` read `data.get("hyperedges", [])` from the existing graph and merge it with the new chunks' hyperedges via `attach_hyperedges()` (id-level dedup). Hyperedges from re-extracted files should replace those files' prior contribution, keyed by `source_file`. This mirrors how `build_merge` already replaces nodes/edges per source, and how `watch.py` already preserves hyperedges.
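A minimal sketch of that merge policy, written against plain lists of dicts rather than the real `build_merge()` internals. The function name `merge_hyperedges` and the `reextracted_files` parameter are hypothetical, and the sketch assumes each hyperedge carries `id` and `source_file` keys.

```python
def merge_hyperedges(existing: list, new: list, reextracted_files: set) -> list:
    """Keep hyperedges from unchanged files, replace those from re-extracted files."""
    # Drop the prior contribution of every file that was re-extracted in this run.
    kept = [h for h in existing if h.get("source_file") not in reextracted_files]

    # Id-level dedup while appending the freshly extracted hyperedges.
    seen_ids = {h["id"] for h in kept if h.get("id")}
    merged = list(kept)
    for h in new:
        hid = h.get("id")
        if hid and hid not in seen_ids:
            merged.append(h)
            seen_ids.add(hid)
    return merged


existing = [
    {"id": "h_a", "source_file": "a.md"},
    {"id": "h_b_old", "source_file": "b.md"},
]
new = [{"id": "h_b_new", "source_file": "b.md"}]

merged = merge_hyperedges(existing, new, reextracted_files={"b.md"})
print([h["id"] for h in merged])  # ['h_a', 'h_b_new']
```

Filtering by `source_file` before the dedup step is what lets a stale hyperedge such as `h_b_old` disappear when its file is re-extracted, while `h_a` from the untouched file is carried forward.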

