Summary
graphify extract crashes with AttributeError: 'list' object has no attribute 'get' at the merge step when a semantic chunk fails and partial results are returned, discarding the output of every successful chunk. Just before the crash, the semantic cache write fails with the same error.
Environment
- graphifyy 0.9.5, source build from v8 @ cf4b4ef85a72c407b5e1cb5e0678faa0497a2747
- Python 3.12 (venv), Ubuntu 24.04
- Backend: --backend ollama --model hermes3:8b --token-budget 4000, GRAPHIFY_OLLAMA_NUM_CTX=8192, GRAPHIFY_MAX_OUTPUT_TOKENS=3072
- Corpus: ~119 markdown docs + 23 code files → 34 chunks
What happened
33/34 chunks succeeded; 1 chunk failed (request timeout → bisect exhausted). Then:
[graphify] WARNING: 1/34 semantic chunk(s) failed — see errors above. Partial results returned.
[graphify extract] warning: could not write semantic cache: 'list' object has no attribute 'get'
Traceback (most recent call last):
  File "~/graphify-trial/venv/bin/graphify", line 6, in <module>
    sys.exit(main())
  File ".../site-packages/graphify/__main__.py", line 4860, in main
    e.get("source_file", "") for e in sem_result.get("edges", [])
AttributeError: 'list' object has no attribute 'get'
No graph.json is written; all successful extraction work is lost. Because the semantic cache write fails too, a re-run re-extracts everything.
Root cause (from reading the source)
sem_result["edges"] can contain a list entry instead of a dict — a malformed LLM response (JSON array where an edge object belongs) that slips past response validation, apparently on the failed-chunk/partial path. The _sem_extracted comprehension at __main__.py:4860 then calls .get() on it. The semantic-cache writer iterates the same entries and fails the same way (caught, warning only).
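A minimal reproduction of the failing pattern, outside graphify. The dict shape is an assumption inferred from the traceback (keys nodes/edges, a source_file field on edges), and extracted_source_files is a hypothetical stand-in for the comprehension at __main__.py:4860, not graphify's actual code.

```python
# Hypothetical sem_result: shape inferred from the traceback, not graphify's source.
sem_result = {
    "nodes": [{"id": "a", "source_file": "docs/a.md"}],
    "edges": [
        {"source": "a", "target": "b", "source_file": "docs/a.md"},
        ["a", "b"],  # malformed entry: a JSON array where an edge object belongs
    ],
}


def extracted_source_files(result):
    # Hypothetical stand-in for the comprehension at __main__.py:4860:
    # it assumes every edge is a dict and calls .get() on it.
    return {e.get("source_file", "") for e in result.get("edges", [])}


error_message = None
try:
    extracted_source_files(sem_result)
except AttributeError as exc:
    error_message = str(exc)

print(error_message)  # 'list' object has no attribute 'get'
```

A single non-dict entry is enough to abort the whole comprehension, which is why 33 good chunks are lost along with the one bad entry.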
Suggested fix
Normalize/sanitize collected semantic results before cache-write/merge (or harden per-entry validation where fresh chunk results are extended), e.g.:
for k in ("nodes", "edges", "hyperedges"):
    sem_result[k] = [x for x in sem_result.get(k, []) if isinstance(x, dict)]
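For the alternative (validating per entry where fresh chunk results are extended), a sketch might look like the following. It assumes each chunk result is a dict holding lists under nodes/edges/hyperedges; extend_validated and SEMANTIC_KEYS are hypothetical names, not part of graphify. Returning a dropped count lets the caller warn about malformed entries instead of failing later at cache-write or merge time.

```python
from typing import Any

# Hypothetical: the result keys the sanitize snippet above iterates over.
SEMANTIC_KEYS = ("nodes", "edges", "hyperedges")


def extend_validated(collected: dict[str, list[dict[str, Any]]],
                     chunk_result: Any) -> int:
    """Hypothetical helper: append only dict entries from one chunk's result.

    Returns how many items were dropped so the caller can log a warning.
    """
    if not isinstance(chunk_result, dict):
        # Count a wholly malformed chunk result as a single dropped item.
        return 1
    dropped = 0
    for key in SEMANTIC_KEYS:
        entries = chunk_result.get(key, [])
        if not isinstance(entries, list):
            # e.g. "edges": "..." instead of a list of edge objects
            dropped += 1
            continue
        for entry in entries:
            if isinstance(entry, dict):
                collected.setdefault(key, []).append(entry)
            else:
                dropped += 1
    return dropped


collected: dict[str, list[dict[str, Any]]] = {key: [] for key in SEMANTIC_KEYS}
dropped = extend_validated(collected, {
    "nodes": [{"id": "a"}],
    "edges": [{"source": "a", "target": "b"}, ["a", "b"]],
})
print(dropped, len(collected["edges"]))  # 1 1
```

Filtering at extend time keeps bad entries out of both the cache writer and the merge, so neither needs its own guard; the post-collection sanitize above is the smaller change if the extend sites are hard to find.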
Workaround we're running
We inserted the 4-line sanitize above just before the # Merge AST + semantic ... block. With it, the same corpus completes: partial results flow through, graph.json is written, and the 1 failed chunk re-queues incrementally as designed (#933 comment behavior).
Happy to provide more detail. Thanks for graphify — the local-first design (tree-sitter + ollama backend) is exactly why we adopted it.