Files in failed semantic chunks are permanently skipped on re-run #933

@sub4biz

Description

When a semantic extraction chunk fails (e.g. with a 429 rate-limit error),
graphify still marks the files in that chunk as processed in manifest.json.
On the next run, incremental mode reports "0 files changed" and skips them
permanently, so those files never make it into the graph.

Steps to reproduce

  1. Run graphify extract with Gemini free tier (TPM=250K/min):
    graphify extract . --backend gemini --token-budget 100000 --max-concurrency 1
  2. Chunks 1 and 2 succeed (~100K tokens each, 200K total within the minute window)
  3. Chunk 3 hits 429: the cumulative 300K tokens exceed the TPM=250K limit
  4. Re-run the same command
  5. Output: incremental mode, 0 files to re-extract
  6. Files from chunk 3 onwards are silently absent from the graph
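
The token arithmetic behind steps 2 and 3 can be checked with a small sketch. It assumes all chunks are sent inside the same one-minute window and that each chunk costs roughly its full token budget; `first_failing_chunk` is a hypothetical helper for illustration, not part of graphify.

```python
def first_failing_chunk(chunk_tokens, tpm_limit):
    """Return the 1-based index of the first chunk whose cumulative
    token count exceeds the per-minute limit, or None if all fit.

    Assumption: every chunk lands in the same one-minute window.
    """
    used = 0
    for index, tokens in enumerate(chunk_tokens, start=1):
        used += tokens
        if used > tpm_limit:
            return index
    return None


# Three chunks of ~100K tokens against a 250K tokens-per-minute limit.
failing = first_failing_chunk([100_000, 100_000, 100_000], 250_000)
print(failing)  # 3
```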

Expected behavior

Files whose chunk failed should remain pending in the manifest.
Re-running should build new chunks from the pending files only.

Root cause

manifest.json records file state (mtime/hash) per file, not chunk-level
success/failure. There is no way to distinguish "file was successfully
semantically extracted" from "file was grouped into a chunk that later failed".
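
A minimal sketch of why this makes the files invisible to incremental mode. The manifest shape here (path mapped to a hash string) and the `files_to_extract` helper are assumptions for illustration; the real manifest.json layout may differ.

```python
def files_to_extract(manifest, current_hashes):
    """Return paths whose current hash differs from the recorded one.

    Hypothetical stand-in for an incremental-mode check that only
    compares per-file state.
    """
    return [path for path, digest in current_hashes.items()
            if manifest.get(path) != digest]


current = {"a.py": "h1", "b.py": "h2", "c.py": "h3"}

# Assumed failure mode: every file is recorded as processed,
# even though the chunk holding c.py failed with a 429.
manifest = dict(current)
extracted = {"a.py", "b.py"}

pending = files_to_extract(manifest, current)
missing = sorted(set(current) - extracted)

print(pending)  # [] : nothing looks changed on re-run
print(missing)  # ['c.py'] : yet this file never reached the graph
```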

Environment

  • graphify 0.8.8
  • Backend: gemini (gemini-3-flash), --max-concurrency 1, --token-budget 100000
  • Some chunks succeeded, others failed with 429 (TPM=250K/min limit exceeded)

Suggested fix

Write chunk results to the manifest only after the chunk completes successfully.
Files whose chunk failed keep their previous manifest state and are included
in the next run's chunk planning.
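
A rough sketch of that ordering, assuming a manifest that maps each path to its recorded state. `run_chunks`, the `extract` callable, and the data shapes are hypothetical and do not reflect graphify's actual internals.

```python
def run_chunks(chunks, extract, manifest, current_state):
    """Extract each chunk and record its files only on success.

    chunks:        list of lists of file paths
    extract:       callable taking one chunk; raises on failure (e.g. a 429)
    manifest:      dict of path -> recorded state, updated in place
    current_state: dict of path -> state (mtime/hash) observed this run
    Returns the paths left pending for the next run.
    """
    pending = []
    for chunk in chunks:
        try:
            extract(chunk)
        except Exception:
            # Sketch only: a real fix would catch the backend's specific error.
            # The manifest is left untouched, so these files stay pending.
            pending.extend(chunk)
            continue
        for path in chunk:
            manifest[path] = current_state[path]
    return pending


def fake_extract(chunk):
    """Stand-in backend that fails on the chunk containing c.py."""
    if "c.py" in chunk:
        raise RuntimeError("429 rate limit")


state = {"a.py": "h1", "b.py": "h2", "c.py": "h3"}
manifest = {}
pending = run_chunks([["a.py"], ["b.py"], ["c.py"]], fake_extract, manifest, state)
print(pending)           # ['c.py']
print(sorted(manifest))  # ['a.py', 'b.py']
```

Because c.py is never written to the manifest, a hash or mtime comparison on the next run would still report it as changed and schedule it again.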
