
fix: prevent infinite retry loop from orphaned tool_use blocks#15152

Closed
ryankass-cb wants to merge 1 commit into anomalyco:dev from ryankass-cb:fix/orphaned-tool-use-retry-loop


@ryankass-cb

Summary

Fixes an unrecoverable session crash caused by orphaned tool_use blocks triggering an infinite retry loop against the Anthropic API.

Error: tool_use ids were found without tool_result blocks immediately after: toolu_vrtx_...

Root Cause

Three bugs combine to create the infinite loop:

  1. Retry loop skips orphan cleanup (processor.ts): The continue statement in the retry path jumps back to the top of the while(true) loop, bypassing the orphan cleanup code (lines 393-409) that marks pending/running tools as errors. As a result, orphaned tool_use blocks persist in message history.

  2. Stale messages on retry (processor.ts): streamInput.messages is built once before the retry loop and never refreshed from the database. Even if orphan cleanup ran, retries send the same broken messages.

  3. invalid_request_error incorrectly classified as retryable (retry.ts): The catch-all return JSON.stringify(json) makes ALL JSON error bodies retryable, including invalid_request_error — a structural issue that fails identically on every attempt.
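To make bugs 1 and 2 concrete, here is a minimal model of the control flow. It is a simplified sketch, not the code in processor.ts: `ToolPart`, `cleanupOrphans`, and `runBuggy` are hypothetical names, and every API call is assumed to fail. The attempt count is capped only so the demo terminates; in the reported bug the loop has no such cap.

```typescript
// Hypothetical, simplified model of the buggy control flow described above.
// None of these names come from processor.ts; they only illustrate the shape.
type ToolState = "pending" | "running" | "completed" | "error";

interface ToolPart {
  id: string;
  state: ToolState;
}

// Marks any tool part that never finished as an error (the "orphan cleanup").
function cleanupOrphans(parts: ToolPart[]): void {
  for (const part of parts) {
    if (part.state === "pending" || part.state === "running") {
      part.state = "error";
    }
  }
}

// Every attempt fails, and the retry path `continue`s before the cleanup
// that sits at the bottom of the loop body. Returns each payload that was sent.
function runBuggy(db: ToolPart[], maxAttempts: number): ToolPart[][] {
  const sent: ToolPart[][] = [];
  // Bug 2: the request payload is built once, outside the loop.
  const messages = db.map((p) => ({ ...p }));
  let attempt = 0;
  while (true) {
    attempt++;
    sent.push(messages.map((p) => ({ ...p })));
    const failed = true; // the API rejects the orphaned tool_use every time
    if (failed && attempt < maxAttempts) {
      continue; // Bug 1: jumps back to the top, skipping cleanupOrphans()
    }
    cleanupOrphans(db);
    break;
  }
  return sent;
}

const db: ToolPart[] = [{ id: "toolu_1", state: "running" }];
const sent = runBuggy(db, 3);
console.log(sent.map((m) => m[0].state)); // [ 'running', 'running', 'running' ]
```

Every retry resends the unfinished tool call, and the cleanup only runs once the loop is forced to stop.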

Fixes

| File | Change |
| --- | --- |
| processor.ts | Move orphan cleanup before the retry continue; rebuild messages from DB on retry via new rebuildMessages callback |
| llm.ts | Add optional rebuildMessages callback to StreamInput interface |
| prompt.ts | Wire up rebuildMessages to re-read messages from DB and convert via toModelMessages() |
| retry.ts | Mark invalid_request_error as non-retryable; replace catch-all return JSON.stringify(json) with return undefined |
| transform.ts | Add defensive repairOrphanedToolUse() validation as a last resort before API calls: injects a synthetic error tool_result for any orphaned tool_use blocks |
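The processor.ts and llm.ts changes can be sketched as follows. This is a simplified illustration, not the PR's code: `StreamInput` here carries a toy `ToolPart[]` instead of real model messages, and `runFixed` and the fake `send` function are hypothetical. `rebuildMessages` is synchronous for brevity; a real database read would more likely be async.

```typescript
// Hypothetical sketch of the fixed retry flow. StreamInput, ToolPart, and
// runFixed are simplified stand-ins, not the PR's actual types or functions.
type ToolState = "pending" | "running" | "completed" | "error";

interface ToolPart {
  id: string;
  state: ToolState;
}

interface StreamInput {
  messages: ToolPart[];
  // Optional hook that re-reads history from storage (synchronous for brevity).
  rebuildMessages?: () => ToolPart[];
}

function cleanupOrphans(parts: ToolPart[]): void {
  for (const part of parts) {
    if (part.state === "pending" || part.state === "running") part.state = "error";
  }
}

function runFixed(
  input: StreamInput,
  db: ToolPart[],
  send: (messages: ToolPart[]) => boolean, // true when the API accepts the request
  maxAttempts: number,
): ToolPart[][] {
  const sent: ToolPart[][] = [];
  let messages = input.messages;
  let attempt = 0;
  while (true) {
    attempt++;
    sent.push(messages.map((p) => ({ ...p })));
    const ok = send(messages);
    if (!ok && attempt < maxAttempts) {
      cleanupOrphans(db); // Fix 1: cleanup now runs before `continue`
      if (input.rebuildMessages) messages = input.rebuildMessages(); // Fix 2: fresh payload
      continue;
    }
    cleanupOrphans(db);
    break;
  }
  return sent;
}

// Usage: storage starts with a tool call that was interrupted mid-run.
const db: ToolPart[] = [{ id: "toolu_1", state: "running" }];
const input: StreamInput = {
  messages: db.map((p) => ({ ...p })),
  rebuildMessages: () => db.map((p) => ({ ...p })),
};
// Fake API that rejects any payload still containing an unfinished tool call.
const send = (msgs: ToolPart[]) => msgs.every((p) => p.state !== "pending" && p.state !== "running");
console.log(runFixed(input, db, send, 5).map((m) => m[0].state)); // [ 'running', 'error' ]
```

Both fixes are needed together: cleanup alone updates storage but, without `rebuildMessages`, the stale payload is still resent until attempts run out.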

Defense in Depth

The fix operates at three layers:

  1. Prevention: Orphan cleanup runs before each retry, and messages are rebuilt from the cleaned-up DB state
  2. Circuit breaker: invalid_request_error stops the retry loop immediately
  3. Last resort: repairOrphanedToolUse() in the transform layer catches any orphans that slip through other defenses
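The circuit-breaker layer depends on how error bodies are classified. A minimal sketch, assuming the Anthropic error body shape `{"type":"error","error":{"type":...,"message":...}}`; `retryReason` and the `RETRYABLE` set are hypothetical, and which types count as transient is an assumption of this sketch rather than what retry.ts does. The point it mirrors from the PR is that `invalid_request_error`, and any unrecognized type, yields `undefined` instead of falling through to `JSON.stringify(json)`.

```typescript
// Hypothetical retry classifier modeled on the retry.ts change described above.
// Returns a reason string when an error should be retried, otherwise undefined.
interface ErrorBody {
  type?: string;
  error?: { type?: string; message?: string };
}

// Structural errors: resending the same request fails the same way.
const NON_RETRYABLE = new Set(["invalid_request_error"]);
// Transient errors worth retrying (an assumption made for this sketch).
const RETRYABLE = new Set(["overloaded_error", "rate_limit_error", "api_error"]);

function retryReason(raw: string): string | undefined {
  let json: ErrorBody | null;
  try {
    json = JSON.parse(raw) as ErrorBody | null;
  } catch {
    return undefined; // not a JSON body, so not classified as retryable here
  }
  const kind = json?.error?.type;
  if (kind === undefined || NON_RETRYABLE.has(kind)) return undefined;
  if (RETRYABLE.has(kind)) return json?.error?.message ?? kind;
  // Per the PR description, the old fallback `return JSON.stringify(json)`
  // made every JSON body retryable. Unrecognized types are now not retried.
  return undefined;
}

const body = JSON.stringify({
  type: "error",
  error: {
    type: "invalid_request_error",
    message: "tool_use ids were found without tool_result blocks immediately after",
  },
});
console.log(retryReason(body)); // undefined
```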

Reproduction

  1. Start a session with tools enabled
  2. Trigger a tool call, then interrupt/abort mid-execution (timeout, network error, etc.)
  3. The session enters an infinite retry loop showing the invalid_request_error repeatedly
  4. The session becomes unrecoverable — no way to continue without starting fresh

Three bugs combined to cause unrecoverable sessions when tool execution
was interrupted:

1. Retry loop skips orphan cleanup: When processor.ts retries after an
   API error, the `continue` statement in the retry path jumps back to
   the top of the while loop, bypassing the orphan cleanup code that
   marks pending/running tools as errors. This leaves orphaned tool_use
   blocks in the message history.

2. Stale messages on retry: The streamInput.messages array is built once
   before the retry loop starts and never refreshed. Even if orphan
   cleanup ran, the retry would send the same stale messages with
   orphaned tool_use blocks.

3. invalid_request_error incorrectly retried: The catch-all
   `return JSON.stringify(json)` in retry.ts makes ALL JSON error bodies
   retryable, including `invalid_request_error`, which is a structural
   issue that will fail identically on every attempt, creating an
   infinite loop.

Fixes:
- Move orphan cleanup before the retry `continue` in processor.ts
- Add `rebuildMessages` callback to StreamInput so processor can
  refresh messages from DB after orphan cleanup on retry
- Mark `invalid_request_error` as non-retryable in retry.ts
- Remove catch-all retry classification for unrecognized error types
- Add defensive `repairOrphanedToolUse()` in transform.ts as a
  last-resort validation that injects synthetic error tool_results
  for any orphaned tool_use blocks before sending to the API
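The last-resort repair pass can be sketched as below. This assumes simplified Anthropic-style content blocks (`tool_use` with an `id`, `tool_result` with a `tool_use_id` and `is_error`); the PR's transform.ts works on its own message types, so this `repairOrphanedToolUse` is a hypothetical stand-in and the synthetic error text is made up. For each assistant turn it collects the `tool_use` ids, checks which ones the immediately following user turn answers, and adds an error `tool_result` for the rest, creating a user turn if none follows.

```typescript
// Hypothetical last-resort repair pass over simplified Anthropic-style messages.
// The real transform.ts uses different message types; names here are illustrative.
type Block =
  | { type: "text"; text: string }
  | { type: "tool_use"; id: string; name: string; input: unknown }
  | { type: "tool_result"; tool_use_id: string; content: string; is_error?: boolean };

interface Message {
  role: "user" | "assistant";
  content: Block[];
}

function repairOrphanedToolUse(messages: Message[]): Message[] {
  const out: Message[] = [];
  for (let i = 0; i < messages.length; i++) {
    const msg = messages[i];
    out.push(msg);
    if (msg.role !== "assistant") continue;

    const toolUseIds = msg.content.flatMap((b) => (b.type === "tool_use" ? [b.id] : []));
    if (toolUseIds.length === 0) continue;

    // Each tool_use must be answered by a tool_result in the very next message.
    const next: Message | undefined = messages[i + 1];
    const answered = new Set<string>(
      next?.role === "user"
        ? next.content.flatMap((b) => (b.type === "tool_result" ? [b.tool_use_id] : []))
        : [],
    );
    const synthetic = toolUseIds
      .filter((id) => !answered.has(id))
      .map((id): Block => ({
        type: "tool_result",
        tool_use_id: id,
        content: "Tool execution was interrupted before it returned a result.",
        is_error: true,
      }));
    if (synthetic.length === 0) continue;

    if (next?.role === "user") {
      // Emit a patched copy of the next user turn, with tool_result blocks first.
      out.push({ role: "user", content: [...synthetic, ...next.content] });
      i++; // skip the original, since its patched copy was just emitted
    } else {
      // No user turn follows (e.g. history ends mid-tool-call): add one.
      out.push({ role: "user", content: synthetic });
    }
  }
  return out;
}

const history: Message[] = [
  { role: "assistant", content: [{ type: "tool_use", id: "toolu_1", name: "bash", input: {} }] },
];
console.log(repairOrphanedToolUse(history).length); // 2
```

Returning a new array instead of mutating the input keeps the repair a pure transform applied just before the API call, so stored history is still left to the regular orphan cleanup.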
@github-actions
Contributor

Thanks for your contribution!

This PR doesn't have a linked issue. All PRs must reference an existing issue.

Please:

  1. Open an issue describing the bug/feature (if one doesn't exist)
  2. Add Fixes #<number> or Closes #<number> to this PR description

See CONTRIBUTING.md for details.

github-actions bot added the needs:issue and needs:compliance labels on Feb 26, 2026 (needs:compliance means the issue will auto-close after 2 hours)
@github-actions
Contributor

This PR doesn't fully meet our contributing guidelines and PR template.

What needs to be fixed:

  • PR description is missing required template sections. Please use the PR template.

Please edit this PR description to address the above within 2 hours, or it will be automatically closed.

If you believe this was flagged incorrectly, please let a maintainer know.

@github-actions
Contributor

The following comment was made by an LLM; it may be inaccurate:

Based on my search, I found potentially related PRs:

PR #8497: "fix: handle dangling tool_use blocks for LiteLLM proxy compatibility"

PR #14456: "fix(core): repair interleaved text/tool-call parts in assistant messages"

However, PR #15152 (the current PR) appears to be the primary/newest fix addressing the specific infinite retry loop issue with orphaned tool_use blocks, while the older PRs address related but distinct aspects of tool_use handling.

@github-actions
Contributor

This pull request has been automatically closed because it was not updated to meet our contributing guidelines within the 2-hour window.

Feel free to open a new pull request that follows our guidelines.

github-actions bot removed the needs:compliance label on Feb 26, 2026
github-actions bot closed this on Feb 26, 2026