Description
When a semantic extraction chunk fails (e.g. 429 rate limit), graphify marks
the files in that chunk as processed in manifest.json anyway. On the next run,
incremental mode reports "0 files changed" and skips them permanently — those
files never make it into the graph.
Steps to reproduce
- Run graphify extract with the Gemini free tier (TPM=250K/min):
  graphify extract . --backend gemini --token-budget 100000 --max-concurrency 1
- Chunks 1 and 2 succeed (~100K tokens each, 200K total within the minute window)
- Chunk 3 hits 429 — cumulative 300K tokens exceeds TPM=250K limit
- Re-run the same command
- Output: incremental mode, 0 files to re-extract
- Files from chunk 3 onwards are absent from the graph — silently
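The arithmetic behind the failure can be checked with a small sketch. It assumes every chunk is sent inside the same one-minute window and consumes the full token budget; the function name is illustrative and not part of graphify.

```python
def first_failing_chunk(tpm_limit: int, tokens_per_chunk: int) -> tuple[int, int]:
    """Return (chunk number, cumulative tokens) for the first chunk that
    pushes cumulative usage past the per-minute token limit.

    Assumption: all chunks land in the same rate-limit window.
    """
    used = 0
    chunk = 0
    while True:
        chunk += 1
        used += tokens_per_chunk
        if used > tpm_limit:
            return chunk, used


# Values from the report: TPM=250K, --token-budget 100000
chunk, used = first_failing_chunk(tpm_limit=250_000, tokens_per_chunk=100_000)
print(f"chunk {chunk} fails at {used} cumulative tokens")
```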
Expected behavior
Files whose chunk failed should remain pending in the manifest.
Re-running should build new chunks only from pending files.
Root cause
manifest.json records file state (mtime/hash) per file, not chunk-level
success/failure. There is no way to distinguish "file was successfully
semantically extracted" from "file was grouped into a chunk that later failed".
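A minimal sketch of the ambiguity, assuming a manifest entry that holds only mtime and hash. The `ManifestEntry` class and `is_changed` function are hypothetical stand-ins, not graphify's actual data structures.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class ManifestEntry:
    """Hypothetical per-file record: file state only, no extraction outcome."""
    mtime: float
    content_hash: str


def is_changed(entry: Optional[ManifestEntry], mtime: float, content_hash: str) -> bool:
    """Incremental check: a file is re-extracted only if its state differs."""
    return entry is None or entry.mtime != mtime or entry.content_hash != content_hash


# Both files were written to the manifest, regardless of chunk outcome.
manifest = {
    "a.py": ManifestEntry(1700000000.0, "aaa"),  # its chunk succeeded
    "b.py": ManifestEntry(1700000100.0, "bbb"),  # its chunk failed with 429
}

# On the next run neither file has changed on disk, so neither is selected.
changed = [
    path for path, entry in manifest.items()
    if is_changed(entry, entry.mtime, entry.content_hash)
]
print(changed)
```

Nothing in the record separates `b.py` from `a.py`, so the incremental check has no basis for retrying it.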
Environment
- graphify 0.8.8
- Backend: gemini (gemini-3-flash), --max-concurrency 1, --token-budget 100000
- Some chunks succeeded, others failed with 429 (TPM=250K/min limit exceeded)
Suggested fix
Write chunk results to manifest only after the chunk completes successfully.
Files whose chunk failed keep their previous manifest state and are included
in the next run's chunk planning.
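One way the suggested fix could look, as a sketch under stated assumptions: the manifest is a plain dict from path to file state, and `run_chunks`, `pending_files`, and the `extract` callback are hypothetical names rather than graphify's API.

```python
from typing import Callable, Dict, List, Sequence

FileState = str  # stand-in for the (mtime, hash) state the manifest records


def pending_files(manifest: Dict[str, FileState],
                  current: Dict[str, FileState]) -> List[str]:
    """Files whose recorded state is missing or differs from the current state."""
    return sorted(p for p, state in current.items() if manifest.get(p) != state)


def run_chunks(chunks: Sequence[Sequence[str]],
               extract: Callable[[Sequence[str]], None],
               manifest: Dict[str, FileState],
               current: Dict[str, FileState]) -> List[List[str]]:
    """Run each chunk; update the manifest only for chunks that succeed.

    Returns the chunks that failed. Their files keep their previous
    manifest state and therefore stay pending for the next run.
    """
    failed: List[List[str]] = []
    for chunk in chunks:
        try:
            extract(chunk)
        except Exception:  # a real implementation would catch narrower errors
            failed.append(list(chunk))
            continue
        # Commit file state only after the whole chunk completed.
        for path in chunk:
            manifest[path] = current[path]
    return failed
```

With this ordering, a re-run can plan its chunks from `pending_files(manifest, current)` alone, so files from a failed chunk are picked up again without re-extracting the ones that already succeeded.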