Skip to content

fix(backend): add circuit breaker for infinite tool call retry loops#12499

Merged
majdyz merged 14 commits into
devfrom
fix/copilot-empty-tool-params
Mar 24, 2026
Merged

fix(backend): add circuit breaker for infinite tool call retry loops#12499
majdyz merged 14 commits into
devfrom
fix/copilot-empty-tool-params

Conversation

@majdyz
Copy link
Copy Markdown
Contributor

@majdyz majdyz commented Mar 20, 2026

Summary

  • Adds a two-layer circuit breaker to prevent AutoPilot from looping infinitely when tool calls fail with empty parameters
  • Tool-level: After 3 consecutive identical failures per tool, returns a hard-stop message instructing the model to output content as text instead of retrying
  • Stream-level: After 6 consecutive empty tool calls (input: {}), aborts the stream entirely with a user-visible error and retry button

Background

In session c5548b48, the model completed all research successfully but then spent 51+ minutes in an infinite loop trying to write output — every tool call was sent with input: {} (likely due to context saturation preventing argument serialization). 21+ identical failing tool calls with no circuit breaker.

Changes

  • tool_adapter.py: Added _check_circuit_breaker, _record_tool_failure, _clear_tool_failures functions with a ContextVar-based tracker. Integrated into both create_tool_handler (BaseTool) and the _truncating wrapper (all tools).
  • service.py: Added empty-tool-call detection in the main stream loop that counts consecutive AssistantMessages with empty ToolUseBlock.input and aborts after the limit.
  • test_circuit_breaker.py: 7 unit tests covering threshold behavior, per-args tracking, reset on success, and uninitialized tracker safety.

Test plan

  • Unit tests pass (pytest backend/copilot/sdk/test_circuit_breaker.py — 8/8 passing)
  • Pre-commit hooks pass (Ruff, Black, isort, typecheck all pass)
  • E2E: CoPilot tool calls work normally (GetCurrentTimeBlock returned 09:16:39 UTC)
  • E2E: Circuit breaker pass-through verified (successful calls don't trigger breaker)
  • E2E: Circuit breaker code integrated into tool_adapter truncating wrapper

@majdyz majdyz requested a review from a team as a code owner March 20, 2026 22:21
@majdyz majdyz requested review from Pwuts and Swiftyos and removed request for a team March 20, 2026 22:21
@github-project-automation github-project-automation Bot moved this to 🆕 Needs initial review in AutoGPT development kanban Mar 20, 2026
@github-actions github-actions Bot added platform/backend AutoGPT Platform - Back end size/l labels Mar 20, 2026
@coderabbitai
Copy link
Copy Markdown
Contributor

coderabbitai Bot commented Mar 20, 2026

Note

Reviews paused

It looks like this branch is under active development. To avoid overwhelming you with review comments due to an influx of new commits, CodeRabbit has automatically paused this review. You can configure this behavior by changing the reviews.auto_review.auto_pause_after_reviewed_commits setting.

Use the following commands to manage reviews:

  • @coderabbitai resume to resume automatic reviews.
  • @coderabbitai review to trigger a single review.

Use the checkboxes below for quick actions:

  • ▶️ Resume reviews
  • 🔍 Trigger review

Walkthrough

Adds two circuit-breakers: one in the SDK streaming path to detect repeated empty Assistant tool calls and abort the stream with a retryable StreamError; and one in the tool adapter that tracks per-execution-context consecutive tool failures (keyed by tool+args) and short-circuits repeated failures.

Changes

Cohort / File(s) Summary
Streaming circuit breaker & transient error threading
autogpt_platform/backend/backend/copilot/sdk/service.py
Tracks consecutive_empty_tool_calls during streaming, introduces _EMPTY_TOOL_CALL_LIMIT and _CIRCUIT_BREAKER_ERROR_MSG, threads error_msg through _TransientErrorHandled/attempt flow, appends retryable error marker, yields StreamError(code="circuit_breaker_empty_tool_calls"), and aborts attempt when limit reached; expands ResultMessage logging.
Per-context tool failure tracker
autogpt_platform/backend/backend/copilot/sdk/tool_adapter.py
Adds a ContextVar-backed _consecutive_tool_failures keyed by tool_name + stable JSON of args; implements _check_circuit_breaker, _record_tool_failure, _clear_tool_failures; short-circuits tool execution when threshold reached; records failures on expansion/MCP errors and clears on success.
Tests for breaker behavior
autogpt_platform/backend/backend/copilot/sdk/test_circuit_breaker.py
New pytest suite exercising tracker isolation, threshold / near-threshold behavior, per-args/tool keying (including {}), clearing behavior, and uninitialized-tracker robustness.
Prompt guidance doc update
autogpt_platform/backend/backend/copilot/prompting.py
Adds guidance for generating large files using multiple bash_exec chunks and @@agptfile:/... rather than single large write_file/bash_exec calls (documentation only).

Sequence Diagram(s)

sequenceDiagram
    participant Client as SDK Client
    participant Stream as Service Stream Loop
    participant Detector as Empty Tool Detector
    participant Counter as consecutive_empty_tool_calls
    participant Error as Stream Error Handler

    Client->>Stream: start streaming
    Stream->>Client: receive AssistantMessage
    Stream->>Detector: inspect AssistantMessage for ToolUseBlock input
    alt empty/falsey tool input
        Detector->>Counter: increment
        Counter->>Detector: value
        alt value < _EMPTY_TOOL_CALL_LIMIT
            Detector->>Stream: continue streaming
        else value >= _EMPTY_TOOL_CALL_LIMIT
            Detector->>Error: append retryable session error marker (error_msg)
            Error->>Stream: yield StreamError(code="circuit_breaker_empty_tool_calls")
            Stream->>Stream: break (terminate attempt)
        end
    else non-empty tool or no tool
        Detector->>Counter: reset to 0
        Stream->>Client: continue streaming
    end
Loading
sequenceDiagram
    participant Handler as Tool Handler
    participant ExecCtx as Execution Context
    participant Breaker as _check_circuit_breaker
    participant Tracker as _consecutive_tool_failures

    Handler->>ExecCtx: initialize ContextVar tracker
    Handler->>Breaker: _check_circuit_breaker(tool_name, args)
    alt breaker tripped
        Breaker->>Handler: return MCP STOP error (short-circuit)
        Handler->>Handler: return early with MCP error marker
    else not tripped
        Handler->>Handler: expand refs / call tool
        alt success (no MCP error)
            Handler->>Tracker: _clear_tool_failures(tool_name)
        else failure (FileRefExpansionError or MCP error)
            Handler->>Tracker: _record_tool_failure(tool_name, args)
        end
    end
Loading

Estimated code review effort

🎯 4 (Complex) | ⏱️ ~45 minutes

Possibly related PRs

Suggested reviewers

  • Swiftyos
  • Pwuts
  • ntindle

Poem

🐰 I counted echoes in a stream so wide,

Empty calls piled up like a rising tide.
Six little taps — I pulled the cord,
Wrote a friendly note and stopped the board.
Now quiet flows rest safe inside.

🚥 Pre-merge checks | ✅ 3
✅ Passed checks (3 passed)
Check name Status Explanation
Title check ✅ Passed The title accurately and concisely summarizes the main objective of the changeset: adding a circuit breaker mechanism to prevent infinite tool call retry loops in the backend.
Docstring Coverage ✅ Passed Docstring coverage is 86.36% which is sufficient. The required threshold is 80.00%.
Description check ✅ Passed The pull request description clearly explains the two-layer circuit breaker implementation, the motivation from a real session loophole, and the specific changes made to tool_adapter.py, service.py, and test files.

✏️ Tip: You can configure your own custom pre-merge checks in the settings.

✨ Finishing Touches
🧪 Generate unit tests (beta)
  • Create PR with unit tests
  • Commit unit tests in branch fix/copilot-empty-tool-params

Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out.

❤️ Share

Comment @coderabbitai help to get the list of available commands and usage tips.

@github-actions
Copy link
Copy Markdown
Contributor

github-actions Bot commented Mar 20, 2026

🔍 PR Overlap Detection

This check compares your PR against all other open PRs targeting the same branch to detect potential merge conflicts early.

🔴 Merge Conflicts Detected

The following PRs have been tested and will have merge conflicts if merged after this PR. Consider coordinating with the authors.

🟢 Low Risk — File Overlap Only

These PRs touch the same files but different sections (click to expand)

Summary: 2 conflict(s), 0 medium risk, 2 low risk (out of 4 PRs with file overlap)


Auto-generated on push. Ignores: openapi.json, lock files.

Comment thread autogpt_platform/backend/backend/copilot/sdk/tool_adapter.py Outdated
Comment thread autogpt_platform/backend/backend/copilot/sdk/service.py Outdated
Comment thread autogpt_platform/backend/backend/copilot/sdk/service.py Outdated
Comment thread autogpt_platform/backend/backend/copilot/sdk/tool_adapter.py Outdated
Comment thread autogpt_platform/backend/backend/copilot/sdk/service.py Outdated
Comment thread autogpt_platform/backend/backend/copilot/sdk/tool_adapter.py
Comment thread autogpt_platform/backend/backend/copilot/sdk/service.py Outdated
Copy link
Copy Markdown
Contributor

@coderabbitai coderabbitai Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🧹 Nitpick comments (5)
autogpt_platform/backend/backend/copilot/sdk/service.py (2)

1304-1323: Extract duplicated error message to a constant.

The same error message string appears twice (lines 1307-1311 and 1315-1320). Extracting it would prevent divergence and simplify maintenance.

Suggested refactor
+_CIRCUIT_BREAKER_ERROR_MSG = (
+    "AutoPilot was unable to complete the tool call "
+    "— this usually happens when the response is "
+    "too large to fit in a single tool call. "
+    "Try breaking your request into smaller parts."
+)
+
 # In the detection block:
 _append_error_marker(
     ctx.session,
-    (
-        "AutoPilot was unable to complete the tool call "
-        "— this usually happens when the response is "
-        "too large to fit in a single tool call. "
-        "Try breaking your request into smaller parts."
-    ),
+    _CIRCUIT_BREAKER_ERROR_MSG,
     retryable=True,
 )
 yield StreamError(
-    errorText=(
-        "AutoPilot was unable to complete the tool "
-        "call — this usually happens when the response "
-        "is too large to fit in a single tool call. "
-        "Try breaking your request into smaller parts."
-    ),
+    errorText=_CIRCUIT_BREAKER_ERROR_MSG,
     code="circuit_breaker_empty_tool_calls",
 )
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@autogpt_platform/backend/backend/copilot/sdk/service.py` around lines 1304 -
1323, Duplicate multi-sentence error text is used in the call to
_append_error_marker(...) and in the StreamError(...) construction; extract that
message into a single constant (e.g., TOOL_CALL_TOO_LARGE_ERROR or
TOOL_CALL_TOO_LARGE_MSG) and replace both inline strings with a reference to
that constant, keeping the same wording, and leave the surrounding calls to
_append_error_marker, StreamError, and the ended_with_stream_error assignment
unchanged.

1075-1079: Consider moving _EMPTY_TOOL_CALL_LIMIT to module level for consistency.

This constant is defined inside _run_stream_attempt, but similar constants like _MAX_STREAM_ATTEMPTS (line 107) and _MAX_CONSECUTIVE_TOOL_FAILURES in tool_adapter.py are module-level. Moving it would improve discoverability and allow reuse if needed.

Suggested refactor
 _MAX_STREAM_ATTEMPTS = 3
+
+# Hard circuit breaker: abort the stream if the model sends too many
+# consecutive tool calls with empty parameters (a sign of context
+# saturation or serialization failure).
+_EMPTY_TOOL_CALL_LIMIT = 6

Then remove lines 1075-1079 and reference the module constant.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@autogpt_platform/backend/backend/copilot/sdk/service.py` around lines 1075 -
1079, Move the _EMPTY_TOOL_CALL_LIMIT constant out of the _run_stream_attempt
function and define it at module level (alongside _MAX_STREAM_ATTEMPTS and other
top-level constants) so it’s discoverable and reusable; then remove the
in-function declaration and update any references inside _run_stream_attempt
(and related logic) to use the new module-level _EMPTY_TOOL_CALL_LIMIT constant.
autogpt_platform/backend/backend/copilot/sdk/tool_adapter.py (2)

273-282: Consider documenting the "clear all args" behavior.

_clear_tool_failures removes all failure entries for a tool regardless of args, which means success with args_B clears failures for args_A. This is a reasonable design choice for the bug scenario (repeated identical empty args), but it could allow a specific failing pattern to retry indefinitely if interleaved with different successful calls.

A brief docstring explaining this trade-off would help future maintainers.

Suggested docstring enhancement
 def _clear_tool_failures(tool_name: str) -> None:
-    """Clear failure tracking for a tool on success."""
+    """Clear failure tracking for a tool on success.
+
+    Clears ALL args variants for the tool, not just the successful call's args.
+    This gives the tool a "fresh start" on any success, which is appropriate for
+    the primary use case (detecting infinite loops with identical failing args).
+    Trade-off: interleaved successes with different args could reset a failing pattern.
+    """
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@autogpt_platform/backend/backend/copilot/sdk/tool_adapter.py` around lines
273 - 282, The _clear_tool_failures function currently clears all failure
entries for a given tool name regardless of argument variants, which can allow a
failing arg pattern to be retried if any other arg succeeds; update the
docstring for _clear_tool_failures to explicitly state that it clears all
entries for tool_name (not per-args), mention the trade-off (simpler recovery
vs. possible indefinite retry of specific failing args), and note where
_consecutive_tool_failures and keys_to_remove are used so maintainers can
consider switching to per-arg clearing if needed.

436-440: Redundant circuit breaker check for BaseTool handlers.

The _truncating wrapper adds a circuit breaker check, but create_tool_handler (which it wraps for BaseTool handlers) already has the same check at line 294. This means BaseTool handlers are checked twice.

While harmless (first check trips, second never reached), this is redundant. Consider either:

  1. Removing the check from create_tool_handler since all handlers go through _truncating
  2. Documenting that _truncating is the authoritative check point
Option 1: Remove from create_tool_handler
 async def tool_handler(args: dict[str, Any]) -> dict[str, Any]:
     """Execute the wrapped tool and return MCP-formatted response."""
-    # Circuit breaker: stop infinite retry loops
-    stop_msg = _check_circuit_breaker(base_tool.name, args)
-    if stop_msg:
-        return _mcp_error(stop_msg)
-
     user_id, session = get_execution_context()

The _truncating wrapper already handles the check for all tools uniformly.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@autogpt_platform/backend/backend/copilot/sdk/tool_adapter.py` around lines
436 - 440, There is a redundant circuit breaker invocation: both the _truncating
wrapper and create_tool_handler call _check_circuit_breaker and return
_mcp_error, causing double-checks for BaseTool handlers; remove the duplicate
check from create_tool_handler (delete the _check_circuit_breaker call and its
_mcp_error return in create_tool_handler) so that _truncating is the single
authoritative circuit-breaker entry point for all tools (leave _truncating,
_check_circuit_breaker, and _mcp_error unchanged).
autogpt_platform/backend/backend/copilot/sdk/test_circuit_breaker.py (1)

44-53: Consider adding a test for different tool names being tracked separately.

The test suite covers different args for the same tool, but doesn't explicitly verify that different tool names have independent counters.

Suggested additional test
def test_different_tools_tracked_separately(self):
    """Different tools should have separate failure counters."""
    args = {"file_path": "/tmp/test.txt"}
    for _ in range(_MAX_CONSECUTIVE_TOOL_FAILURES):
        _record_tool_failure("tool_a", args)
    # tool_a should trip
    assert _check_circuit_breaker("tool_a", args) is not None
    # tool_b with same args should NOT trip
    assert _check_circuit_breaker("tool_b", args) is None
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@autogpt_platform/backend/backend/copilot/sdk/test_circuit_breaker.py` around
lines 44 - 53, Add a new unit test (e.g.,
test_different_tools_tracked_separately) that uses the same args dict but
records failures for one tool name via _record_tool_failure("tool_a", args) for
_MAX_CONSECUTIVE_TOOL_FAILURES iterations, assert that
_check_circuit_breaker("tool_a", args) is tripped (not None), and then assert
that _check_circuit_breaker("tool_b", args) with the same args is not tripped
(is None), verifying tool-name isolation of the failure counters.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Nitpick comments:
In `@autogpt_platform/backend/backend/copilot/sdk/service.py`:
- Around line 1304-1323: Duplicate multi-sentence error text is used in the call
to _append_error_marker(...) and in the StreamError(...) construction; extract
that message into a single constant (e.g., TOOL_CALL_TOO_LARGE_ERROR or
TOOL_CALL_TOO_LARGE_MSG) and replace both inline strings with a reference to
that constant, keeping the same wording, and leave the surrounding calls to
_append_error_marker, StreamError, and the ended_with_stream_error assignment
unchanged.
- Around line 1075-1079: Move the _EMPTY_TOOL_CALL_LIMIT constant out of the
_run_stream_attempt function and define it at module level (alongside
_MAX_STREAM_ATTEMPTS and other top-level constants) so it’s discoverable and
reusable; then remove the in-function declaration and update any references
inside _run_stream_attempt (and related logic) to use the new module-level
_EMPTY_TOOL_CALL_LIMIT constant.

In `@autogpt_platform/backend/backend/copilot/sdk/test_circuit_breaker.py`:
- Around line 44-53: Add a new unit test (e.g.,
test_different_tools_tracked_separately) that uses the same args dict but
records failures for one tool name via _record_tool_failure("tool_a", args) for
_MAX_CONSECUTIVE_TOOL_FAILURES iterations, assert that
_check_circuit_breaker("tool_a", args) is tripped (not None), and then assert
that _check_circuit_breaker("tool_b", args) with the same args is not tripped
(is None), verifying tool-name isolation of the failure counters.

In `@autogpt_platform/backend/backend/copilot/sdk/tool_adapter.py`:
- Around line 273-282: The _clear_tool_failures function currently clears all
failure entries for a given tool name regardless of argument variants, which can
allow a failing arg pattern to be retried if any other arg succeeds; update the
docstring for _clear_tool_failures to explicitly state that it clears all
entries for tool_name (not per-args), mention the trade-off (simpler recovery
vs. possible indefinite retry of specific failing args), and note where
_consecutive_tool_failures and keys_to_remove are used so maintainers can
consider switching to per-arg clearing if needed.
- Around line 436-440: There is a redundant circuit breaker invocation: both the
_truncating wrapper and create_tool_handler call _check_circuit_breaker and
return _mcp_error, causing double-checks for BaseTool handlers; remove the
duplicate check from create_tool_handler (delete the _check_circuit_breaker call
and its _mcp_error return in create_tool_handler) so that _truncating is the
single authoritative circuit-breaker entry point for all tools (leave
_truncating, _check_circuit_breaker, and _mcp_error unchanged).

ℹ️ Review info
⚙️ Run configuration

Configuration used: Organization UI

Review profile: CHILL

Plan: Pro

Run ID: a00d8554-76da-4c9d-84a9-7bccff354ed9

📥 Commits

Reviewing files that changed from the base of the PR and between ab7c38b and 993d688.

📒 Files selected for processing (3)
  • autogpt_platform/backend/backend/copilot/sdk/service.py
  • autogpt_platform/backend/backend/copilot/sdk/test_circuit_breaker.py
  • autogpt_platform/backend/backend/copilot/sdk/tool_adapter.py
📜 Review details
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (11)
  • GitHub Check: check API types
  • GitHub Check: Seer Code Review
  • GitHub Check: end-to-end tests
  • GitHub Check: test (3.13)
  • GitHub Check: test (3.12)
  • GitHub Check: type-check (3.13)
  • GitHub Check: type-check (3.12)
  • GitHub Check: type-check (3.11)
  • GitHub Check: test (3.11)
  • GitHub Check: Analyze (python)
  • GitHub Check: Check PR Status
🧰 Additional context used
📓 Path-based instructions (4)
autogpt_platform/backend/**/*.py

📄 CodeRabbit inference engine (.github/copilot-instructions.md)

autogpt_platform/backend/**/*.py: Use Python 3.11 (required; managed by Poetry via pyproject.toml) for backend development
Always run 'poetry run format' (Black + isort) before linting in backend development
Always run 'poetry run lint' (ruff) after formatting in backend development

autogpt_platform/backend/**/*.py: Use poetry run ... for all Python package commands (install, migrations, tests, linting, formatting)
Use top-level imports only; avoid local/inner imports except for lazy imports of heavy optional dependencies like openpyxl
Do not use duck typing; avoid hasattr(), getattr(), or isinstance() for type dispatch; use typed interfaces, unions, or protocols instead
Use Pydantic models over dataclass, namedtuple, or dict for structured data
Do not use linter suppressors; no # type: ignore, # noqa, or # pyright: ignore comments; fix the type or code instead
Use list comprehensions over manual loop-and-append patterns
Use early return with guard clauses first; avoid deep nesting
Use %s for deferred interpolation in logger.debug() statements; use f-strings elsewhere for readability (e.g., logger.debug("Processing %s items", count), logger.info(f"Processing {count} items"))
Sanitize error paths using os.path.basename() in error messages to avoid leaking directory structure
Be aware of TOCTOU (time-of-check-time-of-use) vulnerabilities; avoid check-then-act patterns for file access and credit charging
Use transaction=True in Redis pipelines for atomicity on multi-step operations
Use max(0, value) guards for computed values that should never be negative
Keep files under ~300 lines; split by responsibility if a file grows beyond this (extract helpers, models, or sub-modules into new files). Never keep appending to a long file.
Keep functions under ~40 lines; extract named helpers when a function grows longer. Long functions indicate mixed concerns, not just complexity.
Use top-down ordering: define the main/public function or class f...

Files:

  • autogpt_platform/backend/backend/copilot/sdk/tool_adapter.py
  • autogpt_platform/backend/backend/copilot/sdk/service.py
  • autogpt_platform/backend/backend/copilot/sdk/test_circuit_breaker.py
autogpt_platform/**/*.py

📄 CodeRabbit inference engine (AGENTS.md)

Format Python code with poetry run format

Files:

  • autogpt_platform/backend/backend/copilot/sdk/tool_adapter.py
  • autogpt_platform/backend/backend/copilot/sdk/service.py
  • autogpt_platform/backend/backend/copilot/sdk/test_circuit_breaker.py
autogpt_platform/backend/**/*test*.py

📄 CodeRabbit inference engine (AGENTS.md)

Run poetry run test for backend testing (runs pytest with docker based postgres + prisma)

Files:

  • autogpt_platform/backend/backend/copilot/sdk/test_circuit_breaker.py
autogpt_platform/backend/**/test_*.py

📄 CodeRabbit inference engine (autogpt_platform/CLAUDE.md)

Backend test files should use @pytest.mark.xfail to mark failing tests during TDD

Files:

  • autogpt_platform/backend/backend/copilot/sdk/test_circuit_breaker.py
🧠 Learnings (14)
📓 Common learnings
Learnt from: majdyz
Repo: Significant-Gravitas/AutoGPT PR: 12439
File: autogpt_platform/backend/backend/blocks/autogpt_copilot.py:0-0
Timestamp: 2026-03-16T17:00:02.827Z
Learning: In autogpt_platform/backend/backend/blocks/autogpt_copilot.py, the recursion guard uses two module-level ContextVars: `_copilot_recursion_depth` (tracks current nesting depth) and `_copilot_recursion_limit` (stores the chain-wide ceiling). On the first invocation, `_copilot_recursion_limit` is set to `max_recursion_depth`; nested calls use `min(inherited_limit, max_recursion_depth)`, so they can only lower the cap, never raise it. The entry/exit logic is extracted into module-level helper functions. This is the approved pattern for preventing runaway sub-agent recursion in AutogptCopilotBlock (PR `#12439`, commits 348e9f8e2 and 3b70f61b1).
Learnt from: majdyz
Repo: Significant-Gravitas/AutoGPT PR: 12440
File: autogpt_platform/backend/backend/copilot/workflow_import/converter.py:0-0
Timestamp: 2026-03-17T10:57:10.126Z
Learning: In Significant-Gravitas/AutoGPT PR `#12440`, `autogpt_platform/backend/backend/copilot/workflow_import/converter.py` was fully rewritten (commit 732960e2d) to no longer make direct LLM/OpenAI API calls. The converter now builds a structured text prompt for AutoPilot/CoPilot instead. There is no `response.choices` access or any direct LLM client usage in this file. Do not flag `response.choices` access or LLM client initialization patterns as issues in this file.
Learnt from: majdyz
Repo: Significant-Gravitas/AutoGPT PR: 12356
File: autogpt_platform/backend/backend/copilot/constants.py:9-12
Timestamp: 2026-03-10T08:39:22.025Z
Learning: In Significant-Gravitas/AutoGPT PR `#12356`, the `COPILOT_SYNTHETIC_ID_PREFIX = "copilot-"` check in `create_auto_approval_record` (human_review.py) is intentional and safe. The `graph_exec_id` passed to this function comes from server-side `PendingHumanReview` DB records (not from user input); the API only accepts `node_exec_id` from users. Synthetic `copilot-*` IDs are only ever created server-side in `run_block.py`. The prefix skip avoids a DB lookup for a `AgentGraphExecution` record that legitimately does not exist for CoPilot sessions, while `user_id` scoping is enforced at the auth layer and on the resulting auto-approval record.
📚 Learning: 2026-03-16T17:00:02.827Z
Learnt from: majdyz
Repo: Significant-Gravitas/AutoGPT PR: 12439
File: autogpt_platform/backend/backend/blocks/autogpt_copilot.py:0-0
Timestamp: 2026-03-16T17:00:02.827Z
Learning: In autogpt_platform/backend/backend/blocks/autogpt_copilot.py, the recursion guard uses two module-level ContextVars: `_copilot_recursion_depth` (tracks current nesting depth) and `_copilot_recursion_limit` (stores the chain-wide ceiling). On the first invocation, `_copilot_recursion_limit` is set to `max_recursion_depth`; nested calls use `min(inherited_limit, max_recursion_depth)`, so they can only lower the cap, never raise it. The entry/exit logic is extracted into module-level helper functions. This is the approved pattern for preventing runaway sub-agent recursion in AutogptCopilotBlock (PR `#12439`, commits 348e9f8e2 and 3b70f61b1).

Applied to files:

  • autogpt_platform/backend/backend/copilot/sdk/tool_adapter.py
  • autogpt_platform/backend/backend/copilot/sdk/service.py
  • autogpt_platform/backend/backend/copilot/sdk/test_circuit_breaker.py
📚 Learning: 2026-02-26T17:02:22.448Z
Learnt from: Pwuts
Repo: Significant-Gravitas/AutoGPT PR: 12211
File: .pre-commit-config.yaml:160-179
Timestamp: 2026-02-26T17:02:22.448Z
Learning: Keep the pre-commit hook pattern broad for autogpt_platform/backend to ensure OpenAPI schema changes are captured. Do not narrow to backend/api/ alone, since the generated schema depends on Pydantic models across multiple directories (backend/data/, backend/blocks/, backend/copilot/, backend/integrations/, backend/util/). Narrowing could miss schema changes and cause frontend type desynchronization.

Applied to files:

  • autogpt_platform/backend/backend/copilot/sdk/tool_adapter.py
  • autogpt_platform/backend/backend/copilot/sdk/service.py
  • autogpt_platform/backend/backend/copilot/sdk/test_circuit_breaker.py
📚 Learning: 2026-03-04T08:04:35.881Z
Learnt from: majdyz
Repo: Significant-Gravitas/AutoGPT PR: 12273
File: autogpt_platform/backend/backend/copilot/tools/workspace_files.py:216-220
Timestamp: 2026-03-04T08:04:35.881Z
Learning: In the AutoGPT Copilot backend, ensure that SVG images are not treated as vision image types by excluding 'image/svg+xml' from INLINEABLE_MIME_TYPES and MULTIMODAL_TYPES in tool_adapter.py; the Claude API supports PNG, JPEG, GIF, and WebP for vision. SVGs (XML text) should be handled via the text path instead, not the vision path.

Applied to files:

  • autogpt_platform/backend/backend/copilot/sdk/tool_adapter.py
  • autogpt_platform/backend/backend/copilot/sdk/service.py
  • autogpt_platform/backend/backend/copilot/sdk/test_circuit_breaker.py
📚 Learning: 2026-03-05T15:42:08.207Z
Learnt from: ntindle
Repo: Significant-Gravitas/AutoGPT PR: 12297
File: .claude/skills/backend-check/SKILL.md:14-16
Timestamp: 2026-03-05T15:42:08.207Z
Learning: In Python files under autogpt_platform/backend (recursively), rely on poetry run format to perform formatting (Black + isort) and linting (ruff). Do not run poetry run lint as a separate step after poetry run format, since format already includes linting checks.

Applied to files:

  • autogpt_platform/backend/backend/copilot/sdk/tool_adapter.py
  • autogpt_platform/backend/backend/copilot/sdk/service.py
  • autogpt_platform/backend/backend/copilot/sdk/test_circuit_breaker.py
📚 Learning: 2026-03-16T16:35:40.236Z
Learnt from: majdyz
Repo: Significant-Gravitas/AutoGPT PR: 12440
File: autogpt_platform/backend/backend/api/features/workflow_import.py:54-63
Timestamp: 2026-03-16T16:35:40.236Z
Learning: Avoid using the word 'competitor' in public-facing identifiers and text. Use neutral naming for API paths, model names, function names, and UI text. Examples: rename 'CompetitorFormat' to 'SourcePlatform', 'convert_competitor_workflow' to 'convert_workflow', '/competitor-workflow' to '/workflow'. Apply this guideline to files under autogpt_platform/backend and autogpt_platform/frontend.

Applied to files:

  • autogpt_platform/backend/backend/copilot/sdk/tool_adapter.py
  • autogpt_platform/backend/backend/copilot/sdk/service.py
  • autogpt_platform/backend/backend/copilot/sdk/test_circuit_breaker.py
📚 Learning: 2026-03-17T06:48:21.085Z
Learnt from: majdyz
Repo: Significant-Gravitas/AutoGPT PR: 12445
File: autogpt_platform/backend/backend/copilot/sdk/service.py:1071-1072
Timestamp: 2026-03-17T06:48:21.085Z
Learning: In Significant-Gravitas/AutoGPT (autogpt_platform), the AI SDK enforces `z.strictObject({type, errorText})` on SSE `StreamError` responses, so additional fields like `retryable: bool` cannot be added to `StreamError` or serialized via `to_sse()`. Instead, retry signaling for transient Anthropic API errors is done via the `COPILOT_RETRYABLE_ERROR_PREFIX` constant prepended to persisted session messages (in `ChatMessage.content`). The frontend detects retryable errors by checking `markerType === "retryable_error"` from `parseSpecialMarkers()` — no SSE schema changes and no string matching on error text. This pattern was established in PR `#12445`, commit 64d82797b.

Applied to files:

  • autogpt_platform/backend/backend/copilot/sdk/service.py
📚 Learning: 2026-03-17T10:57:10.126Z
Learnt from: majdyz
Repo: Significant-Gravitas/AutoGPT PR: 12440
File: autogpt_platform/backend/backend/copilot/workflow_import/converter.py:0-0
Timestamp: 2026-03-17T10:57:10.126Z
Learning: In Significant-Gravitas/AutoGPT PR `#12440`, `autogpt_platform/backend/backend/copilot/workflow_import/converter.py` was fully rewritten (commit 732960e2d) to no longer make direct LLM/OpenAI API calls. The converter now builds a structured text prompt for AutoPilot/CoPilot instead. There is no `response.choices` access or any direct LLM client usage in this file. Do not flag `response.choices` access or LLM client initialization patterns as issues in this file.

Applied to files:

  • autogpt_platform/backend/backend/copilot/sdk/service.py
📚 Learning: 2026-03-20T09:30:22.638Z
Learnt from: CR
Repo: Significant-Gravitas/AutoGPT PR: 0
File: autogpt_platform/backend/CLAUDE.md:0-0
Timestamp: 2026-03-20T09:30:22.638Z
Learning: Applies to autogpt_platform/backend/**/*_test.py : Mock at boundaries — mock where the symbol is **used**, not where it's **defined**. After refactoring, update mock targets to match new module paths.

Applied to files:

  • autogpt_platform/backend/backend/copilot/sdk/test_circuit_breaker.py
📚 Learning: 2026-03-20T09:30:22.638Z
Learnt from: CR
Repo: Significant-Gravitas/AutoGPT PR: 0
File: autogpt_platform/backend/CLAUDE.md:0-0
Timestamp: 2026-03-20T09:30:22.638Z
Learning: Applies to autogpt_platform/backend/**/*_test.py : Use pytest with snapshot testing for API responses; colocate test files with source files using `*_test.py` naming convention

Applied to files:

  • autogpt_platform/backend/backend/copilot/sdk/test_circuit_breaker.py
📚 Learning: 2026-02-04T16:49:42.490Z
Learnt from: CR
Repo: Significant-Gravitas/AutoGPT PR: 0
File: .github/copilot-instructions.md:0-0
Timestamp: 2026-02-04T16:49:42.490Z
Learning: Applies to autogpt_platform/backend/**/test/**/*.py : Use snapshot testing with '--snapshot-update' flag in backend tests when output changes; always review with 'git diff'

Applied to files:

  • autogpt_platform/backend/backend/copilot/sdk/test_circuit_breaker.py
📚 Learning: 2026-03-20T09:29:57.525Z
Learnt from: CR
Repo: Significant-Gravitas/AutoGPT PR: 0
File: autogpt_platform/CLAUDE.md:0-0
Timestamp: 2026-03-20T09:29:57.525Z
Learning: Follow Test-Driven Development (TDD): Write failing test first marked with `pytest.mark.xfail` (backend) or `.fixme` (Playwright), implement the fix, then remove the marker

Applied to files:

  • autogpt_platform/backend/backend/copilot/sdk/test_circuit_breaker.py
📚 Learning: 2026-03-20T09:29:57.525Z
Learnt from: CR
Repo: Significant-Gravitas/AutoGPT PR: 0
File: autogpt_platform/CLAUDE.md:0-0
Timestamp: 2026-03-20T09:29:57.525Z
Learning: Applies to autogpt_platform/backend/**/test_*.py : Backend test files should use `pytest.mark.xfail` to mark failing tests during TDD

Applied to files:

  • autogpt_platform/backend/backend/copilot/sdk/test_circuit_breaker.py
📚 Learning: 2026-03-19T15:10:50.676Z
Learnt from: majdyz
Repo: Significant-Gravitas/AutoGPT PR: 12483
File: autogpt_platform/backend/backend/copilot/tools/test_dry_run.py:298-303
Timestamp: 2026-03-19T15:10:50.676Z
Learning: When using Python’s `unittest.mock.patch` in tests, choose the patch target based on how the imported name is resolved:
- If the code under test uses an **eager/module-level import** (e.g., `from foo.bar import baz` at module top), patch **the module where the name is looked up** (i.e., where it is used in the SUT), e.g. `patch("mymodule.baz")`.
- If the code under test uses a **lazy import** executed later (e.g., `from foo.bar import baz` inside a function/branch), patch **the source module** (e.g., `patch("foo.bar.baz")`) because the late `from ... import` will read the (potentially patched) name from the source module at call time.

For a concrete example: if `simulate_block` is imported inside an `if dry_run:` block in the SUT, then the correct test patch target is the source module path for `simulate_block` as it exists at call time (e.g., `patch("backend.executor.simulator.simulate_block")`), not the test file’s import location.

Applied to files:

  • autogpt_platform/backend/backend/copilot/sdk/test_circuit_breaker.py
🔇 Additional comments (6)
autogpt_platform/backend/backend/copilot/sdk/service.py (1)

1281-1328: LGTM! Circuit breaker implementation is sound.

The detection logic correctly:

  • Identifies empty tool calls via not b.input (catches both {} and None)
  • Tracks consecutive occurrences with proper reset on non-empty messages
  • Uses >= threshold comparison
  • Sets appropriate error state and breaks cleanly

The placement after compaction handling and before dispatch is appropriate.

autogpt_platform/backend/backend/copilot/sdk/tool_adapter.py (3)

69-77: LGTM! ContextVar-based circuit breaker state is well-designed.

The pattern follows the established approach used for other context variables in this module (_pending_tool_outputs, _stash_event). Using None as default with explicit initialization in set_execution_context ensures proper per-request isolation.


291-315: LGTM! Circuit breaker integration in create_tool_handler is correct.

The flow properly:

  1. Checks circuit breaker before execution
  2. Records failure on isError results
  3. Clears failures on success
  4. Records failure on exceptions before returning error

447-454: LGTM! Failure recording on FileRefExpansionError uses original args correctly.

Recording the failure with the original (pre-expansion) args is correct since that's what the model sent and would retry with.

autogpt_platform/backend/backend/copilot/sdk/test_circuit_breaker.py (2)

14-19: LGTM! Fixture correctly uses ContextVar token for isolation.

The _reset_tracker fixture properly initializes and resets the ContextVar using the token pattern, ensuring test isolation.


22-86: LGTM! Test coverage is comprehensive.

The tests correctly verify:

  • Threshold behavior (below and at limit)
  • Per-args isolation
  • Empty args tracking (the specific bug scenario)
  • Reset/clear behavior
  • Uninitialized tracker safety

@majdyz
Copy link
Copy Markdown
Contributor Author

majdyz commented Mar 20, 2026

🤖 Addressed CodeRabbit review feedback in c224e0d and previous commits:

  1. Extract duplicated error message → Already fixed in 5e8eaee (_CIRCUIT_BREAKER_ERROR_MSG constant)
  2. Move _EMPTY_TOOL_CALL_LIMIT to module level → Already fixed in 5e8eaee
  3. Document _clear_tool_failures behavior → Fixed in c224e0d: enhanced docstring
  4. Redundant circuit breaker in create_tool_handler → Already fixed in 5e8eaee
  5. Add test for different tool names → Fixed in c224e0d: added test_different_tools_tracked_separately

Comment thread autogpt_platform/backend/backend/copilot/prompting.py Outdated
Comment thread autogpt_platform/backend/backend/copilot/sdk/service.py Outdated
Comment thread autogpt_platform/backend/backend/copilot/sdk/service.py
Comment thread autogpt_platform/backend/backend/copilot/sdk/tool_adapter.py Outdated
@majdyz
Copy link
Copy Markdown
Contributor Author

majdyz commented Mar 21, 2026

🤖 CodeRabbit review items were already addressed in earlier commits:

  1. Extract duplicated error message → Fixed in 5e8eaee (_CIRCUIT_BREAKER_ERROR_MSG constant)
  2. Move _EMPTY_TOOL_CALL_LIMIT to module level → Fixed in 5e8eaee
    3-5. Other items also addressed in previous commits. See commit history.

Copy link
Copy Markdown
Contributor

@coderabbitai coderabbitai Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Caution

Some comments are outside the diff and can’t be posted inline due to platform limitations.

⚠️ Outside diff range comments (1)
autogpt_platform/backend/backend/copilot/sdk/service.py (1)

1914-1937: ⚠️ Potential issue | 🟠 Major

Skip or rebuild the resume transcript after this rollback.

Line 1928 trims session.messages, but state.transcript_builder still contains entries from the aborted attempt. The finally block still uploads that builder, so a circuit-breaker/transient abort can leak rolled-back assistant/tool entries into the next --resume context.

💡 Minimal fix
             session.messages = session.messages[:pre_attempt_msg_count]
+            # `state.transcript_builder` still reflects the aborted attempt.
+            # Avoid uploading a resume transcript that no longer matches
+            # `session.messages`.
+            skip_transcript_upload = True
             # Re-append the error marker so it survives the rollback

If you want to preserve the current user turn in the transcript, rebuild the builder from session.messages instead of skipping upload.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@autogpt_platform/backend/backend/copilot/sdk/service.py` around lines 1914 -
1937, After rolling back session.messages inside the except
_TransientErrorHandled handler, ensure state.transcript_builder no longer
contains entries from the aborted attempt: either rebuild
state.transcript_builder from the trimmed session.messages or clear it so the
finally block won't upload rolled-back assistant/tool entries; update the logic
in the except block referencing session.messages and state.transcript_builder
(and ensure the finally upload path uses the rebuilt/cleared builder) so resume
transcripts reflect only persisted session.messages.
🧹 Nitpick comments (2)
autogpt_platform/backend/backend/copilot/sdk/service.py (2)

1044-1058: Make this handled-error path generic, not “transient.”

Because Line 1455 now raises _TransientErrorHandled for any ended_with_stream_error, the class/docstring and the warning in this handler will also label circuit_breaker_empty_tool_calls as transient. Renaming it and carrying code/retryable alongside error_msg would keep the new breaker visible in logs without overloading the transient path.

Also applies to: 1455-1457, 1914-1937

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@autogpt_platform/backend/backend/copilot/sdk/service.py` around lines 1044 -
1058, The class _TransientErrorHandled and its usage should be made generic:
rename it (e.g., _HandledStreamError) and update its docstring to remove
"transient", add attributes for code: str | None and retryable: bool | None
alongside error_msg, and update its __init__ to accept and store these; then
replace all raises of _TransientErrorHandled in _run_stream_attempt and the
handler paths (the places that set ended_with_stream_error and the warning logs)
to pass through code and retryable so the handler can log the specific breaker
(e.g., circuit_breaker_empty_tool_calls) without labeling it transient. Ensure
the handler warning/logging uses the new attributes (error_msg, code, retryable)
instead of assuming transient, and update any references/names accordingly where
the type/class is imported or checked.

1305-1374: Extract the empty-tool-call breaker into a helper.

This works, but it adds another stateful branch to one of the hardest paths in the service. Pulling the detection/logging/abort decision into a small helper would make _run_stream_attempt easier to test and reason about.

As per coding guidelines "Keep functions under ~40 lines; extract named helpers when a function grows longer. Long functions indicate mixed concerns, not just complexity."

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@autogpt_platform/backend/backend/copilot/sdk/service.py` around lines 1305 -
1374, Extract the empty-tool-call detection, logging and abort logic from
_run_stream_attempt into a small helper (e.g. _handle_empty_tool_call) that
takes the context (ctx), the AssistantMessage (sdk_msg), the current
consecutive_empty_tool_calls counter, the conversation state (state), and
required globals/constants (logger, _EMPTY_TOOL_CALL_LIMIT,
_CIRCUIT_BREAKER_ERROR_MSG, _append_error_marker) and returns the updated
consecutive_empty_tool_calls and a result tuple indicating whether the circuit
breaker fired and the StreamError to yield (or None). Replace the inlined block
in _run_stream_attempt with a call to this helper and use its returned counter
and result to reset counters, log, call _append_error_marker, yield the
StreamError, set ended_with_stream_error, and break exactly as before; preserve
all log messages, error codes ("circuit_breaker_empty_tool_calls"), and
retryable behavior so behavior is unchanged while keeping _run_stream_attempt
shorter and easier to test.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Outside diff comments:
In `@autogpt_platform/backend/backend/copilot/sdk/service.py`:
- Around line 1914-1937: After rolling back session.messages inside the except
_TransientErrorHandled handler, ensure state.transcript_builder no longer
contains entries from the aborted attempt: either rebuild
state.transcript_builder from the trimmed session.messages or clear it so the
finally block won't upload rolled-back assistant/tool entries; update the logic
in the except block referencing session.messages and state.transcript_builder
(and ensure the finally upload path uses the rebuilt/cleared builder) so resume
transcripts reflect only persisted session.messages.

---

Nitpick comments:
In `@autogpt_platform/backend/backend/copilot/sdk/service.py`:
- Around line 1044-1058: The class _TransientErrorHandled and its usage should
be made generic: rename it (e.g., _HandledStreamError) and update its docstring
to remove "transient", add attributes for code: str | None and retryable: bool |
None alongside error_msg, and update its __init__ to accept and store these;
then replace all raises of _TransientErrorHandled in _run_stream_attempt and the
handler paths (the places that set ended_with_stream_error and the warning logs)
to pass through code and retryable so the handler can log the specific breaker
(e.g., circuit_breaker_empty_tool_calls) without labeling it transient. Ensure
the handler warning/logging uses the new attributes (error_msg, code, retryable)
instead of assuming transient, and update any references/names accordingly where
the type/class is imported or checked.
- Around line 1305-1374: Extract the empty-tool-call detection, logging and
abort logic from _run_stream_attempt into a small helper (e.g.
_handle_empty_tool_call) that takes the context (ctx), the AssistantMessage
(sdk_msg), the current consecutive_empty_tool_calls counter, the conversation
state (state), and required globals/constants (logger, _EMPTY_TOOL_CALL_LIMIT,
_CIRCUIT_BREAKER_ERROR_MSG, _append_error_marker) and returns the updated
consecutive_empty_tool_calls and a result tuple indicating whether the circuit
breaker fired and the StreamError to yield (or None). Replace the inlined block
in _run_stream_attempt with a call to this helper and use its returned counter
and result to reset counters, log, call _append_error_marker, yield the
StreamError, set ended_with_stream_error, and break exactly as before; preserve
all log messages, error codes ("circuit_breaker_empty_tool_calls"), and
retryable behavior so behavior is unchanged while keeping _run_stream_attempt
shorter and easier to test.

ℹ️ Review info
⚙️ Run configuration

Configuration used: Organization UI

Review profile: CHILL

Plan: Pro

Run ID: 1ad236b7-cc96-4ad1-b53b-b12ee90a5bb6

📥 Commits

Reviewing files that changed from the base of the PR and between c224e0d and 0c56fdf.

📒 Files selected for processing (2)
  • autogpt_platform/backend/backend/copilot/prompting.py
  • autogpt_platform/backend/backend/copilot/sdk/service.py
✅ Files skipped from review due to trivial changes (1)
  • autogpt_platform/backend/backend/copilot/prompting.py
📜 Review details
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (7)
  • GitHub Check: check API types
  • GitHub Check: end-to-end tests
  • GitHub Check: test (3.11)
  • GitHub Check: test (3.12)
  • GitHub Check: test (3.13)
  • GitHub Check: Seer Code Review
  • GitHub Check: Check PR Status
🧰 Additional context used
📓 Path-based instructions (2)
autogpt_platform/backend/**/*.py

📄 CodeRabbit inference engine (.github/copilot-instructions.md)

autogpt_platform/backend/**/*.py: Use Python 3.11 (required; managed by Poetry via pyproject.toml) for backend development
Always run 'poetry run format' (Black + isort) before linting in backend development
Always run 'poetry run lint' (ruff) after formatting in backend development

autogpt_platform/backend/**/*.py: Use poetry run ... for all Python package commands (install, migrations, tests, linting, formatting)
Use top-level imports only; avoid local/inner imports except for lazy imports of heavy optional dependencies like openpyxl
Do not use duck typing; avoid hasattr(), getattr(), or isinstance() for type dispatch; use typed interfaces, unions, or protocols instead
Use Pydantic models over dataclass, namedtuple, or dict for structured data
Do not use linter suppressors; no # type: ignore, # noqa, or # pyright: ignore comments; fix the type or code instead
Use list comprehensions over manual loop-and-append patterns
Use early return with guard clauses first; avoid deep nesting
Use %s for deferred interpolation in logger.debug() statements; use f-strings elsewhere for readability (e.g., logger.debug("Processing %s items", count), logger.info(f"Processing {count} items"))
Sanitize error paths using os.path.basename() in error messages to avoid leaking directory structure
Be aware of TOCTOU (time-of-check-time-of-use) vulnerabilities; avoid check-then-act patterns for file access and credit charging
Use transaction=True in Redis pipelines for atomicity on multi-step operations
Use max(0, value) guards for computed values that should never be negative
Keep files under ~300 lines; split by responsibility if a file grows beyond this (extract helpers, models, or sub-modules into new files). Never keep appending to a long file.
Keep functions under ~40 lines; extract named helpers when a function grows longer. Long functions indicate mixed concerns, not just complexity.
Use top-down ordering: define the main/public function or class f...

Files:

  • autogpt_platform/backend/backend/copilot/sdk/service.py
autogpt_platform/**/*.py

📄 CodeRabbit inference engine (AGENTS.md)

Format Python code with poetry run format

Files:

  • autogpt_platform/backend/backend/copilot/sdk/service.py
🧠 Learnings (10)
📓 Common learnings
Learnt from: majdyz
Repo: Significant-Gravitas/AutoGPT PR: 12439
File: autogpt_platform/backend/backend/blocks/autogpt_copilot.py:0-0
Timestamp: 2026-03-16T17:00:02.827Z
Learning: In autogpt_platform/backend/backend/blocks/autogpt_copilot.py, the recursion guard uses two module-level ContextVars: `_copilot_recursion_depth` (tracks current nesting depth) and `_copilot_recursion_limit` (stores the chain-wide ceiling). On the first invocation, `_copilot_recursion_limit` is set to `max_recursion_depth`; nested calls use `min(inherited_limit, max_recursion_depth)`, so they can only lower the cap, never raise it. The entry/exit logic is extracted into module-level helper functions. This is the approved pattern for preventing runaway sub-agent recursion in AutogptCopilotBlock (PR `#12439`, commits 348e9f8e2 and 3b70f61b1).
Learnt from: majdyz
Repo: Significant-Gravitas/AutoGPT PR: 12440
File: autogpt_platform/backend/backend/copilot/workflow_import/converter.py:0-0
Timestamp: 2026-03-17T10:57:10.126Z
Learning: In Significant-Gravitas/AutoGPT PR `#12440`, `autogpt_platform/backend/backend/copilot/workflow_import/converter.py` was fully rewritten (commit 732960e2d) to no longer make direct LLM/OpenAI API calls. The converter now builds a structured text prompt for AutoPilot/CoPilot instead. There is no `response.choices` access or any direct LLM client usage in this file. Do not flag `response.choices` access or LLM client initialization patterns as issues in this file.
Learnt from: majdyz
Repo: Significant-Gravitas/AutoGPT PR: 12445
File: autogpt_platform/backend/backend/copilot/sdk/service.py:1071-1072
Timestamp: 2026-03-17T06:48:21.085Z
Learning: In Significant-Gravitas/AutoGPT (autogpt_platform), the AI SDK enforces `z.strictObject({type, errorText})` on SSE `StreamError` responses, so additional fields like `retryable: bool` cannot be added to `StreamError` or serialized via `to_sse()`. Instead, retry signaling for transient Anthropic API errors is done via the `COPILOT_RETRYABLE_ERROR_PREFIX` constant prepended to persisted session messages (in `ChatMessage.content`). The frontend detects retryable errors by checking `markerType === "retryable_error"` from `parseSpecialMarkers()` — no SSE schema changes and no string matching on error text. This pattern was established in PR `#12445`, commit 64d82797b.
Learnt from: majdyz
Repo: Significant-Gravitas/AutoGPT PR: 12356
File: autogpt_platform/backend/backend/copilot/constants.py:9-12
Timestamp: 2026-03-10T08:39:22.025Z
Learning: In Significant-Gravitas/AutoGPT PR `#12356`, the `COPILOT_SYNTHETIC_ID_PREFIX = "copilot-"` check in `create_auto_approval_record` (human_review.py) is intentional and safe. The `graph_exec_id` passed to this function comes from server-side `PendingHumanReview` DB records (not from user input); the API only accepts `node_exec_id` from users. Synthetic `copilot-*` IDs are only ever created server-side in `run_block.py`. The prefix skip avoids a DB lookup for a `AgentGraphExecution` record that legitimately does not exist for CoPilot sessions, while `user_id` scoping is enforced at the auth layer and on the resulting auto-approval record.
Learnt from: majdyz
Repo: Significant-Gravitas/AutoGPT PR: 12213
File: autogpt_platform/frontend/src/app/api/openapi.json:9983-9995
Timestamp: 2026-02-27T15:59:00.370Z
Learning: Repo: Significant-Gravitas/AutoGPT PR: 12213 — OpenAPI/codegen
Learning: Ensuring a field is required in generated TS types needs two sides: (1) no default value on the Pydantic field, and (2) the OpenAPI model's "required" array must list it. For MCPToolInfo, making input_schema required in OpenAPI and removing Field(default_factory=dict) in the backend prevents optional typing drift.
📚 Learning: 2026-03-17T06:48:21.085Z
Learnt from: majdyz
Repo: Significant-Gravitas/AutoGPT PR: 12445
File: autogpt_platform/backend/backend/copilot/sdk/service.py:1071-1072
Timestamp: 2026-03-17T06:48:21.085Z
Learning: In Significant-Gravitas/AutoGPT (autogpt_platform), the AI SDK enforces `z.strictObject({type, errorText})` on SSE `StreamError` responses, so additional fields like `retryable: bool` cannot be added to `StreamError` or serialized via `to_sse()`. Instead, retry signaling for transient Anthropic API errors is done via the `COPILOT_RETRYABLE_ERROR_PREFIX` constant prepended to persisted session messages (in `ChatMessage.content`). The frontend detects retryable errors by checking `markerType === "retryable_error"` from `parseSpecialMarkers()` — no SSE schema changes and no string matching on error text. This pattern was established in PR `#12445`, commit 64d82797b.

Applied to files:

  • autogpt_platform/backend/backend/copilot/sdk/service.py
📚 Learning: 2026-03-17T10:57:10.126Z
Learnt from: majdyz
Repo: Significant-Gravitas/AutoGPT PR: 12440
File: autogpt_platform/backend/backend/copilot/workflow_import/converter.py:0-0
Timestamp: 2026-03-17T10:57:10.126Z
Learning: In Significant-Gravitas/AutoGPT PR `#12440`, `autogpt_platform/backend/backend/copilot/workflow_import/converter.py` was fully rewritten (commit 732960e2d) to no longer make direct LLM/OpenAI API calls. The converter now builds a structured text prompt for AutoPilot/CoPilot instead. There is no `response.choices` access or any direct LLM client usage in this file. Do not flag `response.choices` access or LLM client initialization patterns as issues in this file.

Applied to files:

  • autogpt_platform/backend/backend/copilot/sdk/service.py
📚 Learning: 2026-03-16T17:00:02.827Z
Learnt from: majdyz
Repo: Significant-Gravitas/AutoGPT PR: 12439
File: autogpt_platform/backend/backend/blocks/autogpt_copilot.py:0-0
Timestamp: 2026-03-16T17:00:02.827Z
Learning: In autogpt_platform/backend/backend/blocks/autogpt_copilot.py, the recursion guard uses two module-level ContextVars: `_copilot_recursion_depth` (tracks current nesting depth) and `_copilot_recursion_limit` (stores the chain-wide ceiling). On the first invocation, `_copilot_recursion_limit` is set to `max_recursion_depth`; nested calls use `min(inherited_limit, max_recursion_depth)`, so they can only lower the cap, never raise it. The entry/exit logic is extracted into module-level helper functions. This is the approved pattern for preventing runaway sub-agent recursion in AutogptCopilotBlock (PR `#12439`, commits 348e9f8e2 and 3b70f61b1).

Applied to files:

  • autogpt_platform/backend/backend/copilot/sdk/service.py
📚 Learning: 2026-03-15T15:30:09.706Z
Learnt from: majdyz
Repo: Significant-Gravitas/AutoGPT PR: 12385
File: autogpt_platform/backend/backend/copilot/tools/helpers.py:149-185
Timestamp: 2026-03-15T15:30:09.706Z
Learning: In autogpt_platform/backend/backend/copilot/tools/helpers.py, within execute_block, when InsufficientBalanceError occurs after post-execution credit charging (concurrent balance drain after pre-check passed), this is treated as a non-fatal billing leak: log at ERROR level with structured JSON fields `{"billing_leak": True, "user_id": ..., "cost": ...}` for monitoring/alerting, then return BlockOutputResponse normally. Discarding the output would worsen UX since the block already executed with potential side effects. Reuse the credit_model obtained during the pre-execution balance check (guarded by `if cost > 0 and credit_model:`) for the post-execution charge; do not perform a second get_user_credit_model call.

Applied to files:

  • autogpt_platform/backend/backend/copilot/sdk/service.py
📚 Learning: 2026-03-20T09:30:22.638Z
Learnt from: CR
Repo: Significant-Gravitas/AutoGPT PR: 0
File: autogpt_platform/backend/CLAUDE.md:0-0
Timestamp: 2026-03-20T09:30:22.638Z
Learning: Applies to autogpt_platform/backend/**/*.py : Sanitize error paths using `os.path.basename()` in error messages to avoid leaking directory structure

Applied to files:

  • autogpt_platform/backend/backend/copilot/sdk/service.py
📚 Learning: 2026-02-26T17:02:22.448Z
Learnt from: Pwuts
Repo: Significant-Gravitas/AutoGPT PR: 12211
File: .pre-commit-config.yaml:160-179
Timestamp: 2026-02-26T17:02:22.448Z
Learning: Keep the pre-commit hook pattern broad for autogpt_platform/backend to ensure OpenAPI schema changes are captured. Do not narrow to backend/api/ alone, since the generated schema depends on Pydantic models across multiple directories (backend/data/, backend/blocks/, backend/copilot/, backend/integrations/, backend/util/). Narrowing could miss schema changes and cause frontend type desynchronization.

Applied to files:

  • autogpt_platform/backend/backend/copilot/sdk/service.py
📚 Learning: 2026-03-04T08:04:35.881Z
Learnt from: majdyz
Repo: Significant-Gravitas/AutoGPT PR: 12273
File: autogpt_platform/backend/backend/copilot/tools/workspace_files.py:216-220
Timestamp: 2026-03-04T08:04:35.881Z
Learning: In the AutoGPT Copilot backend, ensure that SVG images are not treated as vision image types by excluding 'image/svg+xml' from INLINEABLE_MIME_TYPES and MULTIMODAL_TYPES in tool_adapter.py; the Claude API supports PNG, JPEG, GIF, and WebP for vision. SVGs (XML text) should be handled via the text path instead, not the vision path.

Applied to files:

  • autogpt_platform/backend/backend/copilot/sdk/service.py
📚 Learning: 2026-03-05T15:42:08.207Z
Learnt from: ntindle
Repo: Significant-Gravitas/AutoGPT PR: 12297
File: .claude/skills/backend-check/SKILL.md:14-16
Timestamp: 2026-03-05T15:42:08.207Z
Learning: In Python files under autogpt_platform/backend (recursively), rely on poetry run format to perform formatting (Black + isort) and linting (ruff). Do not run poetry run lint as a separate step after poetry run format, since format already includes linting checks.

Applied to files:

  • autogpt_platform/backend/backend/copilot/sdk/service.py
📚 Learning: 2026-03-16T16:35:40.236Z
Learnt from: majdyz
Repo: Significant-Gravitas/AutoGPT PR: 12440
File: autogpt_platform/backend/backend/api/features/workflow_import.py:54-63
Timestamp: 2026-03-16T16:35:40.236Z
Learning: Avoid using the word 'competitor' in public-facing identifiers and text. Use neutral naming for API paths, model names, function names, and UI text. Examples: rename 'CompetitorFormat' to 'SourcePlatform', 'convert_competitor_workflow' to 'convert_workflow', '/competitor-workflow' to '/workflow'. Apply this guideline to files under autogpt_platform/backend and autogpt_platform/frontend.

Applied to files:

  • autogpt_platform/backend/backend/copilot/sdk/service.py

Comment thread autogpt_platform/backend/backend/copilot/prompting.py
majdyz added a commit that referenced this pull request Mar 22, 2026
After rolling back session.messages on transient/stream errors, the
transcript_builder still contains entries from the aborted attempt.
Set skip_transcript_upload=True so the finally block won't upload
stale transcript data that could corrupt future --resume context.

Addresses CodeRabbit review feedback on PR #12499.
@majdyz
Copy link
Copy Markdown
Contributor Author

majdyz commented Mar 22, 2026

🤖 Fixed in c65703a: Added skip_transcript_upload = True after session message rollback in both _TransientErrorHandled and generic Exception handlers, so the finally block won't upload stale transcript data from the aborted attempt.

majdyz added 9 commits March 24, 2026 08:26
- Simplify heredoc example to use conventional EOF delimiter
- Reduce log noise: full diagnostics only on first empty tool call
Sentry found that @@agptfile: expansion mutates args between the
_check_circuit_breaker call (pre-expansion) and _record_tool_failure
call (post-expansion), causing mismatched fingerprints. Fix by saving
original_args before expansion and using it consistently for both
check and record operations.
Make the "Writing large files" prompt more specific:
- Call out the research-compilation scenario explicitly
- Lower the threshold guidance to ~2000 words
- Show section-by-section pattern with cat > / cat >>
- Explain that each bash_exec should contain one section
After rolling back session.messages on transient/stream errors, the
transcript_builder still contains entries from the aborted attempt.
Set skip_transcript_upload=True so the finally block won't upload
stale transcript data that could corrupt future --resume context.

Addresses CodeRabbit review feedback on PR #12499.
The system prompt example for writing large files implied the LLM should
generate content from scratch. Updated to make clear it should use data
already collected from prior tool calls (web_fetch, run_block, etc.)
rather than re-generating or re-fetching.
…writes

When tool outputs are already in files, the LLM can compose a report in a
single bash_exec call using @@agptfile: references instead of writing
section-by-section. The section-by-section approach is now documented as
a fallback for when content must be generated from conversation context.
Empty input ({}) tool calls are never legitimate — they indicate context
saturation or serialization failure. 3 consecutive empty calls is
conclusive; no need to wait for 6.
…or and extract empty-tool-call breaker

- Renamed _TransientErrorHandled to _HandledStreamError with code/retryable
  attributes so the handler can log specific error types without assuming transient
- Extracted inline empty-tool-call detection into _check_empty_tool_breaker helper
  with _EmptyToolBreakResult dataclass, reducing _run_stream_attempt complexity
The tool-level circuit breaker's failure counters persisted across retry
attempts, causing counts from a rolled-back attempt to carry over and
prematurely trip the breaker on a fresh attempt with reduced context.
@majdyz majdyz force-pushed the fix/copilot-empty-tool-params branch from a9e2fed to 56729f7 Compare March 24, 2026 01:27
Copy link
Copy Markdown
Contributor

@Abhi1992002 Abhi1992002 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

lgtm

@majdyz majdyz added this pull request to the merge queue Mar 24, 2026
@github-project-automation github-project-automation Bot moved this from 🆕 Needs initial review to 👍🏼 Mergeable in AutoGPT development kanban Mar 24, 2026
Merged via the queue into dev with commit 3d4fcfa Mar 24, 2026
26 checks passed
@majdyz majdyz deleted the fix/copilot-empty-tool-params branch March 24, 2026 05:58
@github-project-automation github-project-automation Bot moved this from 👍🏼 Mergeable to ✅ Done in AutoGPT development kanban Mar 24, 2026
@sentry
Copy link
Copy Markdown

sentry Bot commented Mar 26, 2026

Issues attributed to commits in this pull request

This pull request was merged and Sentry observed the following issues:

Bentlybro pushed a commit that referenced this pull request Apr 4, 2026
#12507)

## Summary
- Adds `/pr-test` skill for automated E2E testing of PRs using docker
compose, agent-browser, and API calls
- Covers full environment setup (copy .env, configure copilot auth,
ARM64 Docker fix)
- Includes browser UI testing, direct API testing, screenshot capture,
and test report generation
- Has `--fix` mode for auto-fixing bugs found during testing (similar to
`/pr-address`)
- **Screenshot uploads use GitHub Git API** (blobs → tree → commit →
ref) — no local git operations, safe for worktrees
- **Subscription mode improvements:**
- Extract subscription auth logic to `sdk/subscription.py` — uses SDK's
bundled CLI binary instead of requiring `npm install -g
@anthropic-ai/claude-code`
- Auto-provision `~/.claude/.credentials.json` from
`CLAUDE_CODE_OAUTH_TOKEN` env var on container startup — no `claude
login` needed in Docker
- Add `scripts/refresh_claude_token.sh` — cross-platform helper
(macOS/Linux/Windows) to extract OAuth tokens from host and update
`backend/.env`

## Test plan
- [x] Validated skill on multiple PRs (#12482, #12483, #12499, #12500,
#12501, #12440, #12472) — all test scenarios passed
- [x] Confirmed screenshot upload via GitHub Git API renders correctly
on all 7 PRs
- [x] Verified subscription mode E2E in Docker:
`refresh_claude_token.sh` → `docker compose up` → copilot chat responds
correctly with no API keys (pure OAuth subscription)
- [x] Verified auto-provisioning of credentials file inside container
from `CLAUDE_CODE_OAUTH_TOKEN` env var
- [x] Confirmed bundled CLI detection
(`claude_agent_sdk._bundled/claude`) works without system-installed
`claude`
- [x] `poetry run pytest backend/copilot/sdk/service_test.py` — 24/24
tests pass
majdyz added a commit that referenced this pull request Apr 30, 2026
The empty-tool-call detector treats any ``ToolUseBlock`` whose ``input``
is falsy as a saturation-failure marker (model emits ``{}`` because
context-saturation broke argument serialization, the original bug fixed
in #12499). But ``get_agent_building_guide`` and ``get_mcp_guide`` both
declare ``properties: {}, required: []`` — the model legitimately calls
them with empty input. The detector was counting those as failures and
spamming Sentry warnings on every long-running session that touched
either tool.

Fix: cache a frozenset of tool names whose schema is fully no-arg
(both bare and ``mcp__copilot__``-prefixed forms because
``ToolUseBlock.name`` carries the MCP prefix), and exclude those from
the breaker's empty-tool list. Counter still trips for any tool that
DOES require args but receives ``{}`` input (the saturation failure
mode the breaker exists for).

Tests:
- ``test_no_arg_tool_names_includes_mcp_prefixed_form`` — pins the
  registry-derived set against the actual prefix the SDK reports.
- ``test_no_arg_tool_with_empty_input_does_not_trip_counter`` —
  consecutive counter resets to 0 even when starting at 4 (just-below
  trip threshold), proving the false-positive can't trip a long session
  even after many no-arg calls.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

platform/backend AutoGPT Platform - Back end size/l

Projects

Status: ✅ Done

Development

Successfully merging this pull request may close these issues.

2 participants