extract: AttributeError ('list' object has no attribute 'get') at merge when a semantic chunk fails; partial results discarded, semantic cache write also fails #1631

## Summary
`graphify extract` crashes with `AttributeError: 'list' object has no attribute 'get'` at the merge step when a semantic chunk fails and partial results are returned — discarding all successful chunks. The semantic cache write fails with the same error just before the crash.

## Environment
- graphifyy **0.9.5**, source build from `v8` @ `cf4b4ef85a72c407b5e1cb5e0678faa0497a2747`
- Python 3.12 (venv), Ubuntu 24.04
- Backend: `--backend ollama --model hermes3:8b --token-budget 4000`, `GRAPHIFY_OLLAMA_NUM_CTX=8192`, `GRAPHIFY_MAX_OUTPUT_TOKENS=3072`
- Corpus: ~119 markdown docs + 23 code files → 34 chunks

## What happened
33/34 chunks succeeded; 1 chunk failed (request timeout → bisect exhausted). Then:

```text
[graphify] WARNING: 1/34 semantic chunk(s) failed — see errors above. Partial results returned.
[graphify extract] warning: could not write semantic cache: 'list' object has no attribute 'get'
Traceback (most recent call last):
  File "~/graphify-trial/venv/bin/graphify", line 6, in <module>
    sys.exit(main())
  File ".../site-packages/graphify/__main__.py", line 4860, in main
    e.get("source_file", "") for e in sem_result.get("edges", [])
AttributeError: 'list' object has no attribute 'get'
```

No `graph.json` is written; all successful extraction work is lost. Because the semantic cache write fails too, a re-run re-extracts everything.

## Root cause (from reading the source)
`sem_result["edges"]` can contain a **list** entry instead of a dict — a malformed LLM response (JSON array where an edge object belongs) that slips past response validation, apparently on the failed-chunk/partial path. The `_sem_extracted` comprehension at `__main__.py:4860` then calls `.get()` on it. The semantic-cache writer iterates the same entries and fails the same way (caught, warning only).
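The failure mode can be reproduced in isolation. The sketch below is not graphify code: the `sem_result` contents and the `collect_source_files` helper are hypothetical stand-ins that only mirror the access pattern shown in the traceback.

```python
# Hypothetical data shaped like the collected semantic result; the real
# structure in graphify may carry more keys.
sem_result = {
    "nodes": [{"id": "a"}, {"id": "b"}],
    "edges": [
        {"source": "a", "target": "b", "source_file": "docs/a.md"},
        # Malformed entry: a JSON array where an edge object belongs.
        [{"source": "b", "target": "a"}],
    ],
}


def collect_source_files(result):
    # Same pattern as the traceback line: .get() is called on every edge entry.
    return {e.get("source_file", "") for e in result.get("edges", [])}


try:
    collect_source_files(sem_result)
    error_message = None
except AttributeError as exc:
    error_message = str(exc)

print(error_message)
```

One list entry among otherwise valid edges is enough to abort the whole comprehension, which matches the observation that 33 successful chunks are lost to a single bad entry.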

## Suggested fix
Normalize/sanitize collected semantic results before cache-write/merge (or harden per-entry validation where fresh chunk results are extended), e.g.:

```python
for k in ("nodes", "edges", "hyperedges"):
    sem_result[k] = [x for x in sem_result.get(k, []) if isinstance(x, dict)]
```
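A slightly more defensive variant, as a sketch only: `sanitize_semantic_result` is a hypothetical helper name, not an existing graphify function, and it assumes the collected result is a plain dict keyed by `nodes`, `edges`, and `hyperedges`. It also tolerates a missing or non-list value for a key and reports how many entries were dropped, so the loss can be logged rather than hidden.

```python
def sanitize_semantic_result(sem_result, keys=("nodes", "edges", "hyperedges")):
    """Drop non-dict entries in place; return the number dropped per key."""
    dropped = {}
    for key in keys:
        entries = sem_result.get(key)
        if not isinstance(entries, list):
            # Missing, None, or otherwise malformed container: treat as empty.
            entries = []
        kept = [x for x in entries if isinstance(x, dict)]
        dropped[key] = len(entries) - len(kept)
        sem_result[key] = kept
    return dropped


# Example with made-up data: one malformed edge, no hyperedges key at all.
result = {
    "nodes": [{"id": "a"}],
    "edges": [{"source": "a", "target": "a"}, ["not", "an", "edge"]],
}
dropped = sanitize_semantic_result(result)
print(dropped)
```

Returning the counts keeps the behavior observable: the caller could emit a warning when any count is non-zero instead of silently discarding entries.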

## Workaround we're running
We inserted the two-line sanitize from the suggested fix just before the `# Merge AST + semantic ...` block. With it, the same corpus completes: partial results flow through, `graph.json` is written, and the 1 failed chunk re-queues incrementally as designed (#933 comment behavior).

Happy to provide more detail. Thanks for graphify — the local-first design (tree-sitter + ollama backend) is exactly why we adopted it.

