fix(backend): add circuit breaker for infinite tool call retry loops by majdyz · Pull Request #12499 · Significant-Gravitas/AutoGPT

majdyz · 2026-03-20T22:21:23Z

Summary

Adds a two-layer circuit breaker to prevent AutoPilot from looping infinitely when tool calls fail with empty parameters
Tool-level: After 3 consecutive identical failures per tool, returns a hard-stop message instructing the model to output content as text instead of retrying
Stream-level: After 6 consecutive empty tool calls (input: {}), aborts the stream entirely with a user-visible error and retry button

Background

In session c5548b48, the model completed all research successfully but then spent 51+ minutes in an infinite loop trying to write output — every tool call was sent with input: {} (likely due to context saturation preventing argument serialization). 21+ identical failing tool calls with no circuit breaker.

Changes

tool_adapter.py: Added _check_circuit_breaker, _record_tool_failure, _clear_tool_failures functions with a ContextVar-based tracker. Integrated into both create_tool_handler (BaseTool) and the _truncating wrapper (all tools).
service.py: Added empty-tool-call detection in the main stream loop that counts consecutive AssistantMessages with empty ToolUseBlock.input and aborts after the limit.
test_circuit_breaker.py: 7 unit tests covering threshold behavior, per-args tracking, reset on success, and uninitialized tracker safety.

Test plan

Unit tests pass (pytest backend/copilot/sdk/test_circuit_breaker.py — 8/8 passing)
Pre-commit hooks pass (Ruff, Black, isort, typecheck all pass)
E2E: CoPilot tool calls work normally (GetCurrentTimeBlock returned 09:16:39 UTC)
E2E: Circuit breaker pass-through verified (successful calls don't trigger breaker)
E2E: Circuit breaker code integrated into tool_adapter truncating wrapper

coderabbitai · 2026-03-20T22:21:45Z

Note

Reviews paused

It looks like this branch is under active development. To avoid overwhelming you with review comments due to an influx of new commits, CodeRabbit has automatically paused this review. You can configure this behavior by changing the reviews.auto_review.auto_pause_after_reviewed_commits setting.

Use the following commands to manage reviews:

@coderabbitai resume to resume automatic reviews.
@coderabbitai review to trigger a single review.

Use the checkboxes below for quick actions:

▶️ Resume reviews
🔍 Trigger review

Walkthrough

Adds two circuit-breakers: one in the SDK streaming path to detect repeated empty Assistant tool calls and abort the stream with a retryable StreamError; and one in the tool adapter that tracks per-execution-context consecutive tool failures (keyed by tool+args) and short-circuits repeated failures.

Changes

Cohort / File(s)	Summary
Streaming circuit breaker & transient error threading `autogpt_platform/backend/backend/copilot/sdk/service.py`	Tracks `consecutive_empty_tool_calls` during streaming, introduces `_EMPTY_TOOL_CALL_LIMIT` and `_CIRCUIT_BREAKER_ERROR_MSG`, threads `error_msg` through `_TransientErrorHandled`/attempt flow, appends retryable error marker, yields `StreamError(code="circuit_breaker_empty_tool_calls")`, and aborts attempt when limit reached; expands ResultMessage logging.
Per-context tool failure tracker `autogpt_platform/backend/backend/copilot/sdk/tool_adapter.py`	Adds a `ContextVar`-backed `_consecutive_tool_failures` keyed by `tool_name` + stable JSON of `args`; implements `_check_circuit_breaker`, `_record_tool_failure`, `_clear_tool_failures`; short-circuits tool execution when threshold reached; records failures on expansion/MCP errors and clears on success.
Tests for breaker behavior `autogpt_platform/backend/backend/copilot/sdk/test_circuit_breaker.py`	New pytest suite exercising tracker isolation, threshold / near-threshold behavior, per-args/tool keying (including `{}`), clearing behavior, and uninitialized-tracker robustness.
Prompt guidance doc update `autogpt_platform/backend/backend/copilot/prompting.py`	Adds guidance for generating large files using multiple `bash_exec` chunks and `@@agptfile:/...` rather than single large `write_file`/`bash_exec` calls (documentation only).

Sequence Diagram(s)

sequenceDiagram
    participant Client as SDK Client
    participant Stream as Service Stream Loop
    participant Detector as Empty Tool Detector
    participant Counter as consecutive_empty_tool_calls
    participant Error as Stream Error Handler

    Client->>Stream: start streaming
    Stream->>Client: receive AssistantMessage
    Stream->>Detector: inspect AssistantMessage for ToolUseBlock input
    alt empty/falsey tool input
        Detector->>Counter: increment
        Counter->>Detector: value
        alt value < _EMPTY_TOOL_CALL_LIMIT
            Detector->>Stream: continue streaming
        else value >= _EMPTY_TOOL_CALL_LIMIT
            Detector->>Error: append retryable session error marker (error_msg)
            Error->>Stream: yield StreamError(code="circuit_breaker_empty_tool_calls")
            Stream->>Stream: break (terminate attempt)
        end
    else non-empty tool or no tool
        Detector->>Counter: reset to 0
        Stream->>Client: continue streaming
    end

sequenceDiagram
    participant Handler as Tool Handler
    participant ExecCtx as Execution Context
    participant Breaker as _check_circuit_breaker
    participant Tracker as _consecutive_tool_failures

    Handler->>ExecCtx: initialize ContextVar tracker
    Handler->>Breaker: _check_circuit_breaker(tool_name, args)
    alt breaker tripped
        Breaker->>Handler: return MCP STOP error (short-circuit)
        Handler->>Handler: return early with MCP error marker
    else not tripped
        Handler->>Handler: expand refs / call tool
        alt success (no MCP error)
            Handler->>Tracker: _clear_tool_failures(tool_name)
        else failure (FileRefExpansionError or MCP error)
            Handler->>Tracker: _record_tool_failure(tool_name, args)
        end
    end

Estimated code review effort

🎯 4 (Complex) | ⏱️ ~45 minutes

Possibly related PRs

feat(copilot): remove legacy copilot, add baseline non-SDK mode with tool calling #12276 — overlaps changes in service.py streaming/chat completion flow and stream error/tool-call behavior.
fix(frontend): API stream issues leaking into prompt #12063 — modifies SSE/StreamError surface and normalization, related to appended stream error markers.
hotfix(backend/copilot): capture tool results in transcript #12323 — changes streaming loop handling of tool/assistant outputs; insertion points overlap with these breaker changes.

Suggested reviewers

Swiftyos
Pwuts
ntindle

Poem

🐰 I counted echoes in a stream so wide,

Empty calls piled up like a rising tide.
Six little taps — I pulled the cord,
Wrote a friendly note and stopped the board.
Now quiet flows rest safe inside.

🚥 Pre-merge checks | ✅ 3

✅ Passed checks (3 passed)

Check name	Status	Explanation
Title check	✅ Passed	The title accurately and concisely summarizes the main objective of the changeset: adding a circuit breaker mechanism to prevent infinite tool call retry loops in the backend.
Docstring Coverage	✅ Passed	Docstring coverage is 86.36% which is sufficient. The required threshold is 80.00%.
Description check	✅ Passed	The pull request description clearly explains the two-layer circuit breaker implementation, the motivation from a real session loophole, and the specific changes made to tool_adapter.py, service.py, and test files.

_{✏️ Tip: You can configure your own custom pre-merge checks in the settings.}

✨ Finishing Touches

🧪 Generate unit tests (beta)

Create PR with unit tests
Commit unit tests in branch fix/copilot-empty-tool-params

Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out.

❤️ Share

_{Comment @coderabbitai help to get the list of available commands and usage tips.}

github-actions · 2026-03-20T22:22:10Z

🔍 PR Overlap Detection

This check compares your PR against all other open PRs targeting the same branch to detect potential merge conflicts early.

🔴 Merge Conflicts Detected

The following PRs have been tested and will have merge conflicts if merged after this PR. Consider coordinating with the authors.

feat(backend/copilot): parallel block execution via infrastructure-level pre-launch #12472 (majdyz · updated 13h ago)
- 📁 autogpt_platform/backend/backend/copilot/sdk/
  - service.py (2 conflicts, ~16 lines)
  - tool_adapter.py (1 conflict, ~60 lines)
fix(copilot): resolve OTel context ValueError during stream cancellation #12421 (Otto-AGPT · updated 5d ago)

🟢 Low Risk — File Overlap Only

These PRs touch the same files but different sections (click to expand)

fix(backend): register AutoPilot sessions with stream registry for SSE updates #12500 (majdyz · updated just now)
- autogpt_platform/backend/backend/copilot/sdk/service.py: L1930-1937
feat(copilot): add tool/block capability filtering to AutoPilotBlock #12482 (majdyz · updated just now)
- autogpt_platform/backend/backend/copilot/sdk/tool_adapter.py: L72-81, L91-96

Summary: 2 conflict(s), 0 medium risk, 2 low risk (out of 4 PRs with file overlap)

Auto-generated on push. Ignores: openapi.json, lock files.

coderabbitai

🧹 Nitpick comments (5)

autogpt_platform/backend/backend/copilot/sdk/service.py (2)

1304-1323: Extract duplicated error message to a constant.

The same error message string appears twice (lines 1307-1311 and 1315-1320). Extracting it would prevent divergence and simplify maintenance.

Suggested refactor

+_CIRCUIT_BREAKER_ERROR_MSG = (
+    "AutoPilot was unable to complete the tool call "
+    "— this usually happens when the response is "
+    "too large to fit in a single tool call. "
+    "Try breaking your request into smaller parts."
+)
+
 # In the detection block:
 _append_error_marker(
     ctx.session,
-    (
-        "AutoPilot was unable to complete the tool call "
-        "— this usually happens when the response is "
-        "too large to fit in a single tool call. "
-        "Try breaking your request into smaller parts."
-    ),
+    _CIRCUIT_BREAKER_ERROR_MSG,
     retryable=True,
 )
 yield StreamError(
-    errorText=(
-        "AutoPilot was unable to complete the tool "
-        "call — this usually happens when the response "
-        "is too large to fit in a single tool call. "
-        "Try breaking your request into smaller parts."
-    ),
+    errorText=_CIRCUIT_BREAKER_ERROR_MSG,
     code="circuit_breaker_empty_tool_calls",
 )

🤖 Prompt for AI Agents

Verify each finding against the current code and only fix it if needed.

In `@autogpt_platform/backend/backend/copilot/sdk/service.py` around lines 1304 -
1323, Duplicate multi-sentence error text is used in the call to
_append_error_marker(...) and in the StreamError(...) construction; extract that
message into a single constant (e.g., TOOL_CALL_TOO_LARGE_ERROR or
TOOL_CALL_TOO_LARGE_MSG) and replace both inline strings with a reference to
that constant, keeping the same wording, and leave the surrounding calls to
_append_error_marker, StreamError, and the ended_with_stream_error assignment
unchanged.

1075-1079: Consider moving _EMPTY_TOOL_CALL_LIMIT to module level for consistency.

This constant is defined inside _run_stream_attempt, but similar constants like _MAX_STREAM_ATTEMPTS (line 107) and _MAX_CONSECUTIVE_TOOL_FAILURES in tool_adapter.py are module-level. Moving it would improve discoverability and allow reuse if needed.

Suggested refactor

 _MAX_STREAM_ATTEMPTS = 3
+
+# Hard circuit breaker: abort the stream if the model sends too many
+# consecutive tool calls with empty parameters (a sign of context
+# saturation or serialization failure).
+_EMPTY_TOOL_CALL_LIMIT = 6

Then remove lines 1075-1079 and reference the module constant.

🤖 Prompt for AI Agents

Verify each finding against the current code and only fix it if needed.

In `@autogpt_platform/backend/backend/copilot/sdk/service.py` around lines 1075 -
1079, Move the _EMPTY_TOOL_CALL_LIMIT constant out of the _run_stream_attempt
function and define it at module level (alongside _MAX_STREAM_ATTEMPTS and other
top-level constants) so it’s discoverable and reusable; then remove the
in-function declaration and update any references inside _run_stream_attempt
(and related logic) to use the new module-level _EMPTY_TOOL_CALL_LIMIT constant.

autogpt_platform/backend/backend/copilot/sdk/tool_adapter.py (2)

273-282: Consider documenting the "clear all args" behavior.

_clear_tool_failures removes all failure entries for a tool regardless of args, which means success with args_B clears failures for args_A. This is a reasonable design choice for the bug scenario (repeated identical empty args), but it could allow a specific failing pattern to retry indefinitely if interleaved with different successful calls.

A brief docstring explaining this trade-off would help future maintainers.
Suggested docstring enhancement
 def _clear_tool_failures(tool_name: str) -> None:
-    """Clear failure tracking for a tool on success."""
+    """Clear failure tracking for a tool on success.
+
+    Clears ALL args variants for the tool, not just the successful call's args.
+    This gives the tool a "fresh start" on any success, which is appropriate for
+    the primary use case (detecting infinite loops with identical failing args).
+    Trade-off: interleaved successes with different args could reset a failing pattern.
+    """
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@autogpt_platform/backend/backend/copilot/sdk/tool_adapter.py` around lines
273 - 282, The _clear_tool_failures function currently clears all failure
entries for a given tool name regardless of argument variants, which can allow a
failing arg pattern to be retried if any other arg succeeds; update the
docstring for _clear_tool_failures to explicitly state that it clears all
entries for tool_name (not per-args), mention the trade-off (simpler recovery
vs. possible indefinite retry of specific failing args), and note where
_consecutive_tool_failures and keys_to_remove are used so maintainers can
consider switching to per-arg clearing if needed.
436-440: Redundant circuit breaker check for BaseTool handlers.

The _truncating wrapper adds a circuit breaker check, but create_tool_handler (which it wraps for BaseTool handlers) already has the same check at line 294. This means BaseTool handlers are checked twice.

While harmless (first check trips, second never reached), this is redundant. Consider either:

Removing the check from create_tool_handler since all handlers go through _truncating

Documenting that _truncating is the authoritative check point
Option 1: Remove from create_tool_handler
 async def tool_handler(args: dict[str, Any]) -> dict[str, Any]:
     """Execute the wrapped tool and return MCP-formatted response."""
-    # Circuit breaker: stop infinite retry loops
-    stop_msg = _check_circuit_breaker(base_tool.name, args)
-    if stop_msg:
-        return _mcp_error(stop_msg)
-
     user_id, session = get_execution_context()
The _truncating wrapper already handles the check for all tools uniformly.
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@autogpt_platform/backend/backend/copilot/sdk/tool_adapter.py` around lines
436 - 440, There is a redundant circuit breaker invocation: both the _truncating
wrapper and create_tool_handler call _check_circuit_breaker and return
_mcp_error, causing double-checks for BaseTool handlers; remove the duplicate
check from create_tool_handler (delete the _check_circuit_breaker call and its
_mcp_error return in create_tool_handler) so that _truncating is the single
authoritative circuit-breaker entry point for all tools (leave _truncating,
_check_circuit_breaker, and _mcp_error unchanged).

autogpt_platform/backend/backend/copilot/sdk/test_circuit_breaker.py (1)

44-53: Consider adding a test for different tool names being tracked separately.

The test suite covers different args for the same tool, but doesn't explicitly verify that different tool names have independent counters.

Suggested additional test

def test_different_tools_tracked_separately(self):
    """Different tools should have separate failure counters."""
    args = {"file_path": "/tmp/test.txt"}
    for _ in range(_MAX_CONSECUTIVE_TOOL_FAILURES):
        _record_tool_failure("tool_a", args)
    # tool_a should trip
    assert _check_circuit_breaker("tool_a", args) is not None
    # tool_b with same args should NOT trip
    assert _check_circuit_breaker("tool_b", args) is None

🤖 Prompt for AI Agents

Verify each finding against the current code and only fix it if needed.

In `@autogpt_platform/backend/backend/copilot/sdk/test_circuit_breaker.py` around
lines 44 - 53, Add a new unit test (e.g.,
test_different_tools_tracked_separately) that uses the same args dict but
records failures for one tool name via _record_tool_failure("tool_a", args) for
_MAX_CONSECUTIVE_TOOL_FAILURES iterations, assert that
_check_circuit_breaker("tool_a", args) is tripped (not None), and then assert
that _check_circuit_breaker("tool_b", args) with the same args is not tripped
(is None), verifying tool-name isolation of the failure counters.

🤖 Prompt for all review comments with AI agents

Verify each finding against the current code and only fix it if needed.

Nitpick comments:
In `@autogpt_platform/backend/backend/copilot/sdk/service.py`:
- Around line 1304-1323: Duplicate multi-sentence error text is used in the call
to _append_error_marker(...) and in the StreamError(...) construction; extract
that message into a single constant (e.g., TOOL_CALL_TOO_LARGE_ERROR or
TOOL_CALL_TOO_LARGE_MSG) and replace both inline strings with a reference to
that constant, keeping the same wording, and leave the surrounding calls to
_append_error_marker, StreamError, and the ended_with_stream_error assignment
unchanged.
- Around line 1075-1079: Move the _EMPTY_TOOL_CALL_LIMIT constant out of the
_run_stream_attempt function and define it at module level (alongside
_MAX_STREAM_ATTEMPTS and other top-level constants) so it’s discoverable and
reusable; then remove the in-function declaration and update any references
inside _run_stream_attempt (and related logic) to use the new module-level
_EMPTY_TOOL_CALL_LIMIT constant.

In `@autogpt_platform/backend/backend/copilot/sdk/test_circuit_breaker.py`:
- Around line 44-53: Add a new unit test (e.g.,
test_different_tools_tracked_separately) that uses the same args dict but
records failures for one tool name via _record_tool_failure("tool_a", args) for
_MAX_CONSECUTIVE_TOOL_FAILURES iterations, assert that
_check_circuit_breaker("tool_a", args) is tripped (not None), and then assert
that _check_circuit_breaker("tool_b", args) with the same args is not tripped
(is None), verifying tool-name isolation of the failure counters.

In `@autogpt_platform/backend/backend/copilot/sdk/tool_adapter.py`:
- Around line 273-282: The _clear_tool_failures function currently clears all
failure entries for a given tool name regardless of argument variants, which can
allow a failing arg pattern to be retried if any other arg succeeds; update the
docstring for _clear_tool_failures to explicitly state that it clears all
entries for tool_name (not per-args), mention the trade-off (simpler recovery
vs. possible indefinite retry of specific failing args), and note where
_consecutive_tool_failures and keys_to_remove are used so maintainers can
consider switching to per-arg clearing if needed.
- Around line 436-440: There is a redundant circuit breaker invocation: both the
_truncating wrapper and create_tool_handler call _check_circuit_breaker and
return _mcp_error, causing double-checks for BaseTool handlers; remove the
duplicate check from create_tool_handler (delete the _check_circuit_breaker call
and its _mcp_error return in create_tool_handler) so that _truncating is the
single authoritative circuit-breaker entry point for all tools (leave
_truncating, _check_circuit_breaker, and _mcp_error unchanged).

ℹ️ Review info

⚙️ Run configuration

Configuration used: Organization UI

Review profile: CHILL

Plan: Pro

Run ID: a00d8554-76da-4c9d-84a9-7bccff354ed9

📥 Commits

Reviewing files that changed from the base of the PR and between ab7c38b and 993d688.

📒 Files selected for processing (3)

autogpt_platform/backend/backend/copilot/sdk/service.py
autogpt_platform/backend/backend/copilot/sdk/test_circuit_breaker.py
autogpt_platform/backend/backend/copilot/sdk/tool_adapter.py

📜 Review details

⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (11)

GitHub Check: check API types
GitHub Check: Seer Code Review
GitHub Check: end-to-end tests
GitHub Check: test (3.13)
GitHub Check: test (3.12)
GitHub Check: type-check (3.13)
GitHub Check: type-check (3.12)
GitHub Check: type-check (3.11)
GitHub Check: test (3.11)
GitHub Check: Analyze (python)
GitHub Check: Check PR Status

🧰 Additional context used

📓 Path-based instructions (4)

autogpt_platform/backend/**/*.py

📄 CodeRabbit inference engine (.github/copilot-instructions.md)

autogpt_platform/backend/**/*.py: Use Python 3.11 (required; managed by Poetry via pyproject.toml) for backend development
Always run 'poetry run format' (Black + isort) before linting in backend development
Always run 'poetry run lint' (ruff) after formatting in backend development

autogpt_platform/backend/**/*.py: Use poetry run ... for all Python package commands (install, migrations, tests, linting, formatting)
Use top-level imports only; avoid local/inner imports except for lazy imports of heavy optional dependencies like openpyxl
Do not use duck typing; avoid hasattr(), getattr(), or isinstance() for type dispatch; use typed interfaces, unions, or protocols instead
Use Pydantic models over dataclass, namedtuple, or dict for structured data
Do not use linter suppressors; no # type: ignore, # noqa, or # pyright: ignore comments; fix the type or code instead
Use list comprehensions over manual loop-and-append patterns
Use early return with guard clauses first; avoid deep nesting
Use %s for deferred interpolation in logger.debug() statements; use f-strings elsewhere for readability (e.g., logger.debug("Processing %s items", count), logger.info(f"Processing {count} items"))
Sanitize error paths using os.path.basename() in error messages to avoid leaking directory structure
Be aware of TOCTOU (time-of-check-time-of-use) vulnerabilities; avoid check-then-act patterns for file access and credit charging
Use transaction=True in Redis pipelines for atomicity on multi-step operations
Use max(0, value) guards for computed values that should never be negative
Keep files under ~300 lines; split by responsibility if a file grows beyond this (extract helpers, models, or sub-modules into new files). Never keep appending to a long file.
Keep functions under ~40 lines; extract named helpers when a function grows longer. Long functions indicate mixed concerns, not just complexity.
Use top-down ordering: define the main/public function or class f...

Files:

autogpt_platform/backend/backend/copilot/sdk/tool_adapter.py
autogpt_platform/backend/backend/copilot/sdk/service.py
autogpt_platform/backend/backend/copilot/sdk/test_circuit_breaker.py

autogpt_platform/**/*.py

📄 CodeRabbit inference engine (AGENTS.md)

Format Python code with poetry run format

Files:

autogpt_platform/backend/backend/copilot/sdk/tool_adapter.py
autogpt_platform/backend/backend/copilot/sdk/service.py
autogpt_platform/backend/backend/copilot/sdk/test_circuit_breaker.py

autogpt_platform/backend/**/*test*.py

📄 CodeRabbit inference engine (AGENTS.md)

Run poetry run test for backend testing (runs pytest with docker based postgres + prisma)

Files:

autogpt_platform/backend/backend/copilot/sdk/test_circuit_breaker.py

autogpt_platform/backend/**/test_*.py

📄 CodeRabbit inference engine (autogpt_platform/CLAUDE.md)

Backend test files should use @pytest.mark.xfail to mark failing tests during TDD

Files:

autogpt_platform/backend/backend/copilot/sdk/test_circuit_breaker.py

🧠 Learnings (14)

📓 Common learnings

Learnt from: majdyz
Repo: Significant-Gravitas/AutoGPT PR: 12439
File: autogpt_platform/backend/backend/blocks/autogpt_copilot.py:0-0
Timestamp: 2026-03-16T17:00:02.827Z
Learning: In autogpt_platform/backend/backend/blocks/autogpt_copilot.py, the recursion guard uses two module-level ContextVars: `_copilot_recursion_depth` (tracks current nesting depth) and `_copilot_recursion_limit` (stores the chain-wide ceiling). On the first invocation, `_copilot_recursion_limit` is set to `max_recursion_depth`; nested calls use `min(inherited_limit, max_recursion_depth)`, so they can only lower the cap, never raise it. The entry/exit logic is extracted into module-level helper functions. This is the approved pattern for preventing runaway sub-agent recursion in AutogptCopilotBlock (PR `#12439`, commits 348e9f8e2 and 3b70f61b1).

Learnt from: majdyz
Repo: Significant-Gravitas/AutoGPT PR: 12440
File: autogpt_platform/backend/backend/copilot/workflow_import/converter.py:0-0
Timestamp: 2026-03-17T10:57:10.126Z
Learning: In Significant-Gravitas/AutoGPT PR `#12440`, `autogpt_platform/backend/backend/copilot/workflow_import/converter.py` was fully rewritten (commit 732960e2d) to no longer make direct LLM/OpenAI API calls. The converter now builds a structured text prompt for AutoPilot/CoPilot instead. There is no `response.choices` access or any direct LLM client usage in this file. Do not flag `response.choices` access or LLM client initialization patterns as issues in this file.

Learnt from: majdyz
Repo: Significant-Gravitas/AutoGPT PR: 12356
File: autogpt_platform/backend/backend/copilot/constants.py:9-12
Timestamp: 2026-03-10T08:39:22.025Z
Learning: In Significant-Gravitas/AutoGPT PR `#12356`, the `COPILOT_SYNTHETIC_ID_PREFIX = "copilot-"` check in `create_auto_approval_record` (human_review.py) is intentional and safe. The `graph_exec_id` passed to this function comes from server-side `PendingHumanReview` DB records (not from user input); the API only accepts `node_exec_id` from users. Synthetic `copilot-*` IDs are only ever created server-side in `run_block.py`. The prefix skip avoids a DB lookup for a `AgentGraphExecution` record that legitimately does not exist for CoPilot sessions, while `user_id` scoping is enforced at the auth layer and on the resulting auto-approval record.

📚 Learning: 2026-03-16T17:00:02.827Z

Learnt from: majdyz
Repo: Significant-Gravitas/AutoGPT PR: 12439
File: autogpt_platform/backend/backend/blocks/autogpt_copilot.py:0-0
Timestamp: 2026-03-16T17:00:02.827Z
Learning: In autogpt_platform/backend/backend/blocks/autogpt_copilot.py, the recursion guard uses two module-level ContextVars: `_copilot_recursion_depth` (tracks current nesting depth) and `_copilot_recursion_limit` (stores the chain-wide ceiling). On the first invocation, `_copilot_recursion_limit` is set to `max_recursion_depth`; nested calls use `min(inherited_limit, max_recursion_depth)`, so they can only lower the cap, never raise it. The entry/exit logic is extracted into module-level helper functions. This is the approved pattern for preventing runaway sub-agent recursion in AutogptCopilotBlock (PR `#12439`, commits 348e9f8e2 and 3b70f61b1).

Applied to files:

autogpt_platform/backend/backend/copilot/sdk/tool_adapter.py
autogpt_platform/backend/backend/copilot/sdk/service.py
autogpt_platform/backend/backend/copilot/sdk/test_circuit_breaker.py

📚 Learning: 2026-02-26T17:02:22.448Z

Learnt from: Pwuts
Repo: Significant-Gravitas/AutoGPT PR: 12211
File: .pre-commit-config.yaml:160-179
Timestamp: 2026-02-26T17:02:22.448Z
Learning: Keep the pre-commit hook pattern broad for autogpt_platform/backend to ensure OpenAPI schema changes are captured. Do not narrow to backend/api/ alone, since the generated schema depends on Pydantic models across multiple directories (backend/data/, backend/blocks/, backend/copilot/, backend/integrations/, backend/util/). Narrowing could miss schema changes and cause frontend type desynchronization.

Applied to files:

autogpt_platform/backend/backend/copilot/sdk/tool_adapter.py
autogpt_platform/backend/backend/copilot/sdk/service.py
autogpt_platform/backend/backend/copilot/sdk/test_circuit_breaker.py

📚 Learning: 2026-03-04T08:04:35.881Z

Learnt from: majdyz
Repo: Significant-Gravitas/AutoGPT PR: 12273
File: autogpt_platform/backend/backend/copilot/tools/workspace_files.py:216-220
Timestamp: 2026-03-04T08:04:35.881Z
Learning: In the AutoGPT Copilot backend, ensure that SVG images are not treated as vision image types by excluding 'image/svg+xml' from INLINEABLE_MIME_TYPES and MULTIMODAL_TYPES in tool_adapter.py; the Claude API supports PNG, JPEG, GIF, and WebP for vision. SVGs (XML text) should be handled via the text path instead, not the vision path.

Applied to files:

autogpt_platform/backend/backend/copilot/sdk/tool_adapter.py
autogpt_platform/backend/backend/copilot/sdk/service.py
autogpt_platform/backend/backend/copilot/sdk/test_circuit_breaker.py

📚 Learning: 2026-03-05T15:42:08.207Z

Learnt from: ntindle
Repo: Significant-Gravitas/AutoGPT PR: 12297
File: .claude/skills/backend-check/SKILL.md:14-16
Timestamp: 2026-03-05T15:42:08.207Z
Learning: In Python files under autogpt_platform/backend (recursively), rely on poetry run format to perform formatting (Black + isort) and linting (ruff). Do not run poetry run lint as a separate step after poetry run format, since format already includes linting checks.

Applied to files:

autogpt_platform/backend/backend/copilot/sdk/tool_adapter.py
autogpt_platform/backend/backend/copilot/sdk/service.py
autogpt_platform/backend/backend/copilot/sdk/test_circuit_breaker.py

📚 Learning: 2026-03-16T16:35:40.236Z

Learnt from: majdyz
Repo: Significant-Gravitas/AutoGPT PR: 12440
File: autogpt_platform/backend/backend/api/features/workflow_import.py:54-63
Timestamp: 2026-03-16T16:35:40.236Z
Learning: Avoid using the word 'competitor' in public-facing identifiers and text. Use neutral naming for API paths, model names, function names, and UI text. Examples: rename 'CompetitorFormat' to 'SourcePlatform', 'convert_competitor_workflow' to 'convert_workflow', '/competitor-workflow' to '/workflow'. Apply this guideline to files under autogpt_platform/backend and autogpt_platform/frontend.

Applied to files:

autogpt_platform/backend/backend/copilot/sdk/tool_adapter.py
autogpt_platform/backend/backend/copilot/sdk/service.py
autogpt_platform/backend/backend/copilot/sdk/test_circuit_breaker.py

📚 Learning: 2026-03-17T06:48:21.085Z

Learnt from: majdyz
Repo: Significant-Gravitas/AutoGPT PR: 12445
File: autogpt_platform/backend/backend/copilot/sdk/service.py:1071-1072
Timestamp: 2026-03-17T06:48:21.085Z
Learning: In Significant-Gravitas/AutoGPT (autogpt_platform), the AI SDK enforces `z.strictObject({type, errorText})` on SSE `StreamError` responses, so additional fields like `retryable: bool` cannot be added to `StreamError` or serialized via `to_sse()`. Instead, retry signaling for transient Anthropic API errors is done via the `COPILOT_RETRYABLE_ERROR_PREFIX` constant prepended to persisted session messages (in `ChatMessage.content`). The frontend detects retryable errors by checking `markerType === "retryable_error"` from `parseSpecialMarkers()` — no SSE schema changes and no string matching on error text. This pattern was established in PR `#12445`, commit 64d82797b.

Applied to files:

autogpt_platform/backend/backend/copilot/sdk/service.py

📚 Learning: 2026-03-17T10:57:10.126Z

Learnt from: majdyz
Repo: Significant-Gravitas/AutoGPT PR: 12440
File: autogpt_platform/backend/backend/copilot/workflow_import/converter.py:0-0
Timestamp: 2026-03-17T10:57:10.126Z
Learning: In Significant-Gravitas/AutoGPT PR `#12440`, `autogpt_platform/backend/backend/copilot/workflow_import/converter.py` was fully rewritten (commit 732960e2d) to no longer make direct LLM/OpenAI API calls. The converter now builds a structured text prompt for AutoPilot/CoPilot instead. There is no `response.choices` access or any direct LLM client usage in this file. Do not flag `response.choices` access or LLM client initialization patterns as issues in this file.

Applied to files:

autogpt_platform/backend/backend/copilot/sdk/service.py

📚 Learning: 2026-03-20T09:30:22.638Z

Learnt from: CR
Repo: Significant-Gravitas/AutoGPT PR: 0
File: autogpt_platform/backend/CLAUDE.md:0-0
Timestamp: 2026-03-20T09:30:22.638Z
Learning: Applies to autogpt_platform/backend/**/*_test.py : Mock at boundaries — mock where the symbol is **used**, not where it's **defined**. After refactoring, update mock targets to match new module paths.

Applied to files:

autogpt_platform/backend/backend/copilot/sdk/test_circuit_breaker.py

📚 Learning: 2026-03-20T09:30:22.638Z

Learnt from: CR
Repo: Significant-Gravitas/AutoGPT PR: 0
File: autogpt_platform/backend/CLAUDE.md:0-0
Timestamp: 2026-03-20T09:30:22.638Z
Learning: Applies to autogpt_platform/backend/**/*_test.py : Use pytest with snapshot testing for API responses; colocate test files with source files using `*_test.py` naming convention

Applied to files:

autogpt_platform/backend/backend/copilot/sdk/test_circuit_breaker.py

📚 Learning: 2026-02-04T16:49:42.490Z

Learnt from: CR
Repo: Significant-Gravitas/AutoGPT PR: 0
File: .github/copilot-instructions.md:0-0
Timestamp: 2026-02-04T16:49:42.490Z
Learning: Applies to autogpt_platform/backend/**/test/**/*.py : Use snapshot testing with '--snapshot-update' flag in backend tests when output changes; always review with 'git diff'

Applied to files:

autogpt_platform/backend/backend/copilot/sdk/test_circuit_breaker.py

📚 Learning: 2026-03-20T09:29:57.525Z

Learnt from: CR
Repo: Significant-Gravitas/AutoGPT PR: 0
File: autogpt_platform/CLAUDE.md:0-0
Timestamp: 2026-03-20T09:29:57.525Z
Learning: Follow Test-Driven Development (TDD): Write failing test first marked with `pytest.mark.xfail` (backend) or `.fixme` (Playwright), implement the fix, then remove the marker

Applied to files:

autogpt_platform/backend/backend/copilot/sdk/test_circuit_breaker.py

📚 Learning: 2026-03-20T09:29:57.525Z

Learnt from: CR
Repo: Significant-Gravitas/AutoGPT PR: 0
File: autogpt_platform/CLAUDE.md:0-0
Timestamp: 2026-03-20T09:29:57.525Z
Learning: Applies to autogpt_platform/backend/**/test_*.py : Backend test files should use `pytest.mark.xfail` to mark failing tests during TDD

Applied to files:

autogpt_platform/backend/backend/copilot/sdk/test_circuit_breaker.py

📚 Learning: 2026-03-19T15:10:50.676Z

Learnt from: majdyz
Repo: Significant-Gravitas/AutoGPT PR: 12483
File: autogpt_platform/backend/backend/copilot/tools/test_dry_run.py:298-303
Timestamp: 2026-03-19T15:10:50.676Z
Learning: When using Python’s `unittest.mock.patch` in tests, choose the patch target based on how the imported name is resolved:
- If the code under test uses an **eager/module-level import** (e.g., `from foo.bar import baz` at module top), patch **the module where the name is looked up** (i.e., where it is used in the SUT), e.g. `patch("mymodule.baz")`.
- If the code under test uses a **lazy import** executed later (e.g., `from foo.bar import baz` inside a function/branch), patch **the source module** (e.g., `patch("foo.bar.baz")`) because the late `from ... import` will read the (potentially patched) name from the source module at call time.

For a concrete example: if `simulate_block` is imported inside an `if dry_run:` block in the SUT, then the correct test patch target is the source module path for `simulate_block` as it exists at call time (e.g., `patch("backend.executor.simulator.simulate_block")`), not the test file’s import location.

Applied to files:

autogpt_platform/backend/backend/copilot/sdk/test_circuit_breaker.py

🔇 Additional comments (6)

autogpt_platform/backend/backend/copilot/sdk/service.py (1)

1281-1328: LGTM! Circuit breaker implementation is sound.

The detection logic correctly:

Identifies empty tool calls via not b.input (catches both {} and None)

Tracks consecutive occurrences with proper reset on non-empty messages

Uses >= threshold comparison

Sets appropriate error state and breaks cleanly

The placement after compaction handling and before dispatch is appropriate.

autogpt_platform/backend/backend/copilot/sdk/tool_adapter.py (3)

69-77: LGTM! ContextVar-based circuit breaker state is well-designed.

The pattern follows the established approach used for other context variables in this module (_pending_tool_outputs, _stash_event). Using None as default with explicit initialization in set_execution_context ensures proper per-request isolation.

291-315: LGTM! Circuit breaker integration in create_tool_handler is correct.

The flow properly:

Checks circuit breaker before execution

Records failure on isError results

Clears failures on success

Records failure on exceptions before returning error

447-454: LGTM! Failure recording on FileRefExpansionError uses original args correctly.

Recording the failure with the original (pre-expansion) args is correct since that's what the model sent and would retry with.

autogpt_platform/backend/backend/copilot/sdk/test_circuit_breaker.py (2)

14-19: LGTM! Fixture correctly uses ContextVar token for isolation.

The _reset_tracker fixture properly initializes and resets the ContextVar using the token pattern, ensuring test isolation.

22-86: LGTM! Test coverage is comprehensive.

The tests correctly verify:

Threshold behavior (below and at limit)

Per-args isolation

Empty args tracking (the specific bug scenario)

Reset/clear behavior

Uninitialized tracker safety

majdyz · 2026-03-20T22:50:04Z

🤖 Addressed CodeRabbit review feedback in c224e0d and previous commits:

Extract duplicated error message → Already fixed in 5e8eaee (_CIRCUIT_BREAKER_ERROR_MSG constant)
Move _EMPTY_TOOL_CALL_LIMIT to module level → Already fixed in 5e8eaee
Document _clear_tool_failures behavior → Fixed in c224e0d: enhanced docstring
Redundant circuit breaker in create_tool_handler → Already fixed in 5e8eaee
Add test for different tool names → Fixed in c224e0d: added test_different_tools_tracked_separately

majdyz · 2026-03-21T16:36:24Z

🤖 CodeRabbit review items were already addressed in earlier commits:

Extract duplicated error message → Fixed in 5e8eaee (_CIRCUIT_BREAKER_ERROR_MSG constant)
Move _EMPTY_TOOL_CALL_LIMIT to module level → Fixed in 5e8eaee
3-5. Other items also addressed in previous commits. See commit history.

coderabbitai

Caution

Some comments are outside the diff and can’t be posted inline due to platform limitations.

⚠️ Outside diff range comments (1)

autogpt_platform/backend/backend/copilot/sdk/service.py (1)
1914-1937: ⚠️ Potential issue | 🟠 Major

Skip or rebuild the resume transcript after this rollback.

Line 1928 trims session.messages, but state.transcript_builder still contains entries from the aborted attempt. The finally block still uploads that builder, so a circuit-breaker/transient abort can leak rolled-back assistant/tool entries into the next --resume context.
💡 Minimal fix
             session.messages = session.messages[:pre_attempt_msg_count]
+            # `state.transcript_builder` still reflects the aborted attempt.
+            # Avoid uploading a resume transcript that no longer matches
+            # `session.messages`.
+            skip_transcript_upload = True
             # Re-append the error marker so it survives the rollback
If you want to preserve the current user turn in the transcript, rebuild the builder from session.messages instead of skipping upload.
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@autogpt_platform/backend/backend/copilot/sdk/service.py` around lines 1914 -
1937, After rolling back session.messages inside the except
_TransientErrorHandled handler, ensure state.transcript_builder no longer
contains entries from the aborted attempt: either rebuild
state.transcript_builder from the trimmed session.messages or clear it so the
finally block won't upload rolled-back assistant/tool entries; update the logic
in the except block referencing session.messages and state.transcript_builder
(and ensure the finally upload path uses the rebuilt/cleared builder) so resume
transcripts reflect only persisted session.messages.

🧹 Nitpick comments (2)

autogpt_platform/backend/backend/copilot/sdk/service.py (2)
1044-1058: Make this handled-error path generic, not “transient.”

Because Line 1455 now raises _TransientErrorHandled for any ended_with_stream_error, the class/docstring and the warning in this handler will also label circuit_breaker_empty_tool_calls as transient. Renaming it and carrying code/retryable alongside error_msg would keep the new breaker visible in logs without overloading the transient path.

Also applies to: 1455-1457, 1914-1937
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@autogpt_platform/backend/backend/copilot/sdk/service.py` around lines 1044 -
1058, The class _TransientErrorHandled and its usage should be made generic:
rename it (e.g., _HandledStreamError) and update its docstring to remove
"transient", add attributes for code: str | None and retryable: bool | None
alongside error_msg, and update its __init__ to accept and store these; then
replace all raises of _TransientErrorHandled in _run_stream_attempt and the
handler paths (the places that set ended_with_stream_error and the warning logs)
to pass through code and retryable so the handler can log the specific breaker
(e.g., circuit_breaker_empty_tool_calls) without labeling it transient. Ensure
the handler warning/logging uses the new attributes (error_msg, code, retryable)
instead of assuming transient, and update any references/names accordingly where
the type/class is imported or checked.
1305-1374: Extract the empty-tool-call breaker into a helper.

This works, but it adds another stateful branch to one of the hardest paths in the service. Pulling the detection/logging/abort decision into a small helper would make _run_stream_attempt easier to test and reason about.

As per coding guidelines "Keep functions under ~40 lines; extract named helpers when a function grows longer. Long functions indicate mixed concerns, not just complexity."
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@autogpt_platform/backend/backend/copilot/sdk/service.py` around lines 1305 -
1374, Extract the empty-tool-call detection, logging and abort logic from
_run_stream_attempt into a small helper (e.g. _handle_empty_tool_call) that
takes the context (ctx), the AssistantMessage (sdk_msg), the current
consecutive_empty_tool_calls counter, the conversation state (state), and
required globals/constants (logger, _EMPTY_TOOL_CALL_LIMIT,
_CIRCUIT_BREAKER_ERROR_MSG, _append_error_marker) and returns the updated
consecutive_empty_tool_calls and a result tuple indicating whether the circuit
breaker fired and the StreamError to yield (or None). Replace the inlined block
in _run_stream_attempt with a call to this helper and use its returned counter
and result to reset counters, log, call _append_error_marker, yield the
StreamError, set ended_with_stream_error, and break exactly as before; preserve
all log messages, error codes ("circuit_breaker_empty_tool_calls"), and
retryable behavior so behavior is unchanged while keeping _run_stream_attempt
shorter and easier to test.

🤖 Prompt for all review comments with AI agents

Verify each finding against the current code and only fix it if needed.

Outside diff comments:
In `@autogpt_platform/backend/backend/copilot/sdk/service.py`:
- Around line 1914-1937: After rolling back session.messages inside the except
_TransientErrorHandled handler, ensure state.transcript_builder no longer
contains entries from the aborted attempt: either rebuild
state.transcript_builder from the trimmed session.messages or clear it so the
finally block won't upload rolled-back assistant/tool entries; update the logic
in the except block referencing session.messages and state.transcript_builder
(and ensure the finally upload path uses the rebuilt/cleared builder) so resume
transcripts reflect only persisted session.messages.

---

Nitpick comments:
In `@autogpt_platform/backend/backend/copilot/sdk/service.py`:
- Around line 1044-1058: The class _TransientErrorHandled and its usage should
be made generic: rename it (e.g., _HandledStreamError) and update its docstring
to remove "transient", add attributes for code: str | None and retryable: bool |
None alongside error_msg, and update its __init__ to accept and store these;
then replace all raises of _TransientErrorHandled in _run_stream_attempt and the
handler paths (the places that set ended_with_stream_error and the warning logs)
to pass through code and retryable so the handler can log the specific breaker
(e.g., circuit_breaker_empty_tool_calls) without labeling it transient. Ensure
the handler warning/logging uses the new attributes (error_msg, code, retryable)
instead of assuming transient, and update any references/names accordingly where
the type/class is imported or checked.
- Around line 1305-1374: Extract the empty-tool-call detection, logging and
abort logic from _run_stream_attempt into a small helper (e.g.
_handle_empty_tool_call) that takes the context (ctx), the AssistantMessage
(sdk_msg), the current consecutive_empty_tool_calls counter, the conversation
state (state), and required globals/constants (logger, _EMPTY_TOOL_CALL_LIMIT,
_CIRCUIT_BREAKER_ERROR_MSG, _append_error_marker) and returns the updated
consecutive_empty_tool_calls and a result tuple indicating whether the circuit
breaker fired and the StreamError to yield (or None). Replace the inlined block
in _run_stream_attempt with a call to this helper and use its returned counter
and result to reset counters, log, call _append_error_marker, yield the
StreamError, set ended_with_stream_error, and break exactly as before; preserve
all log messages, error codes ("circuit_breaker_empty_tool_calls"), and
retryable behavior so behavior is unchanged while keeping _run_stream_attempt
shorter and easier to test.

ℹ️ Review info

⚙️ Run configuration

Configuration used: Organization UI

Review profile: CHILL

Plan: Pro

Run ID: 1ad236b7-cc96-4ad1-b53b-b12ee90a5bb6

📥 Commits

Reviewing files that changed from the base of the PR and between c224e0d and 0c56fdf.

📒 Files selected for processing (2)

autogpt_platform/backend/backend/copilot/prompting.py
autogpt_platform/backend/backend/copilot/sdk/service.py

✅ Files skipped from review due to trivial changes (1)

autogpt_platform/backend/backend/copilot/prompting.py

📜 Review details

⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (7)

GitHub Check: check API types
GitHub Check: end-to-end tests
GitHub Check: test (3.11)
GitHub Check: test (3.12)
GitHub Check: test (3.13)
GitHub Check: Seer Code Review
GitHub Check: Check PR Status

🧰 Additional context used

📓 Path-based instructions (2)

autogpt_platform/backend/**/*.py

📄 CodeRabbit inference engine (.github/copilot-instructions.md)

autogpt_platform/backend/**/*.py: Use Python 3.11 (required; managed by Poetry via pyproject.toml) for backend development
Always run 'poetry run format' (Black + isort) before linting in backend development
Always run 'poetry run lint' (ruff) after formatting in backend development

autogpt_platform/backend/**/*.py: Use poetry run ... for all Python package commands (install, migrations, tests, linting, formatting)
Use top-level imports only; avoid local/inner imports except for lazy imports of heavy optional dependencies like openpyxl
Do not use duck typing; avoid hasattr(), getattr(), or isinstance() for type dispatch; use typed interfaces, unions, or protocols instead
Use Pydantic models over dataclass, namedtuple, or dict for structured data
Do not use linter suppressors; no # type: ignore, # noqa, or # pyright: ignore comments; fix the type or code instead
Use list comprehensions over manual loop-and-append patterns
Use early return with guard clauses first; avoid deep nesting
Use %s for deferred interpolation in logger.debug() statements; use f-strings elsewhere for readability (e.g., logger.debug("Processing %s items", count), logger.info(f"Processing {count} items"))
Sanitize error paths using os.path.basename() in error messages to avoid leaking directory structure
Be aware of TOCTOU (time-of-check-time-of-use) vulnerabilities; avoid check-then-act patterns for file access and credit charging
Use transaction=True in Redis pipelines for atomicity on multi-step operations
Use max(0, value) guards for computed values that should never be negative
Keep files under ~300 lines; split by responsibility if a file grows beyond this (extract helpers, models, or sub-modules into new files). Never keep appending to a long file.
Keep functions under ~40 lines; extract named helpers when a function grows longer. Long functions indicate mixed concerns, not just complexity.
Use top-down ordering: define the main/public function or class f...

Files:

autogpt_platform/backend/backend/copilot/sdk/service.py

autogpt_platform/**/*.py

📄 CodeRabbit inference engine (AGENTS.md)

Format Python code with poetry run format

Files:

autogpt_platform/backend/backend/copilot/sdk/service.py

🧠 Learnings (10)

📓 Common learnings

Learnt from: majdyz
Repo: Significant-Gravitas/AutoGPT PR: 12439
File: autogpt_platform/backend/backend/blocks/autogpt_copilot.py:0-0
Timestamp: 2026-03-16T17:00:02.827Z
Learning: In autogpt_platform/backend/backend/blocks/autogpt_copilot.py, the recursion guard uses two module-level ContextVars: `_copilot_recursion_depth` (tracks current nesting depth) and `_copilot_recursion_limit` (stores the chain-wide ceiling). On the first invocation, `_copilot_recursion_limit` is set to `max_recursion_depth`; nested calls use `min(inherited_limit, max_recursion_depth)`, so they can only lower the cap, never raise it. The entry/exit logic is extracted into module-level helper functions. This is the approved pattern for preventing runaway sub-agent recursion in AutogptCopilotBlock (PR `#12439`, commits 348e9f8e2 and 3b70f61b1).

Learnt from: majdyz
Repo: Significant-Gravitas/AutoGPT PR: 12440
File: autogpt_platform/backend/backend/copilot/workflow_import/converter.py:0-0
Timestamp: 2026-03-17T10:57:10.126Z
Learning: In Significant-Gravitas/AutoGPT PR `#12440`, `autogpt_platform/backend/backend/copilot/workflow_import/converter.py` was fully rewritten (commit 732960e2d) to no longer make direct LLM/OpenAI API calls. The converter now builds a structured text prompt for AutoPilot/CoPilot instead. There is no `response.choices` access or any direct LLM client usage in this file. Do not flag `response.choices` access or LLM client initialization patterns as issues in this file.

Learnt from: majdyz
Repo: Significant-Gravitas/AutoGPT PR: 12445
File: autogpt_platform/backend/backend/copilot/sdk/service.py:1071-1072
Timestamp: 2026-03-17T06:48:21.085Z
Learning: In Significant-Gravitas/AutoGPT (autogpt_platform), the AI SDK enforces `z.strictObject({type, errorText})` on SSE `StreamError` responses, so additional fields like `retryable: bool` cannot be added to `StreamError` or serialized via `to_sse()`. Instead, retry signaling for transient Anthropic API errors is done via the `COPILOT_RETRYABLE_ERROR_PREFIX` constant prepended to persisted session messages (in `ChatMessage.content`). The frontend detects retryable errors by checking `markerType === "retryable_error"` from `parseSpecialMarkers()` — no SSE schema changes and no string matching on error text. This pattern was established in PR `#12445`, commit 64d82797b.

Learnt from: majdyz
Repo: Significant-Gravitas/AutoGPT PR: 12356
File: autogpt_platform/backend/backend/copilot/constants.py:9-12
Timestamp: 2026-03-10T08:39:22.025Z
Learning: In Significant-Gravitas/AutoGPT PR `#12356`, the `COPILOT_SYNTHETIC_ID_PREFIX = "copilot-"` check in `create_auto_approval_record` (human_review.py) is intentional and safe. The `graph_exec_id` passed to this function comes from server-side `PendingHumanReview` DB records (not from user input); the API only accepts `node_exec_id` from users. Synthetic `copilot-*` IDs are only ever created server-side in `run_block.py`. The prefix skip avoids a DB lookup for a `AgentGraphExecution` record that legitimately does not exist for CoPilot sessions, while `user_id` scoping is enforced at the auth layer and on the resulting auto-approval record.

Learnt from: majdyz
Repo: Significant-Gravitas/AutoGPT PR: 12213
File: autogpt_platform/frontend/src/app/api/openapi.json:9983-9995
Timestamp: 2026-02-27T15:59:00.370Z
Learning: Repo: Significant-Gravitas/AutoGPT PR: 12213 — OpenAPI/codegen
Learning: Ensuring a field is required in generated TS types needs two sides: (1) no default value on the Pydantic field, and (2) the OpenAPI model's "required" array must list it. For MCPToolInfo, making input_schema required in OpenAPI and removing Field(default_factory=dict) in the backend prevents optional typing drift.

📚 Learning: 2026-03-17T06:48:21.085Z

Learnt from: majdyz
Repo: Significant-Gravitas/AutoGPT PR: 12445
File: autogpt_platform/backend/backend/copilot/sdk/service.py:1071-1072
Timestamp: 2026-03-17T06:48:21.085Z
Learning: In Significant-Gravitas/AutoGPT (autogpt_platform), the AI SDK enforces `z.strictObject({type, errorText})` on SSE `StreamError` responses, so additional fields like `retryable: bool` cannot be added to `StreamError` or serialized via `to_sse()`. Instead, retry signaling for transient Anthropic API errors is done via the `COPILOT_RETRYABLE_ERROR_PREFIX` constant prepended to persisted session messages (in `ChatMessage.content`). The frontend detects retryable errors by checking `markerType === "retryable_error"` from `parseSpecialMarkers()` — no SSE schema changes and no string matching on error text. This pattern was established in PR `#12445`, commit 64d82797b.

Applied to files:

autogpt_platform/backend/backend/copilot/sdk/service.py

📚 Learning: 2026-03-17T10:57:10.126Z

Learnt from: majdyz
Repo: Significant-Gravitas/AutoGPT PR: 12440
File: autogpt_platform/backend/backend/copilot/workflow_import/converter.py:0-0
Timestamp: 2026-03-17T10:57:10.126Z
Learning: In Significant-Gravitas/AutoGPT PR `#12440`, `autogpt_platform/backend/backend/copilot/workflow_import/converter.py` was fully rewritten (commit 732960e2d) to no longer make direct LLM/OpenAI API calls. The converter now builds a structured text prompt for AutoPilot/CoPilot instead. There is no `response.choices` access or any direct LLM client usage in this file. Do not flag `response.choices` access or LLM client initialization patterns as issues in this file.

Applied to files:

autogpt_platform/backend/backend/copilot/sdk/service.py

📚 Learning: 2026-03-16T17:00:02.827Z

Learnt from: majdyz
Repo: Significant-Gravitas/AutoGPT PR: 12439
File: autogpt_platform/backend/backend/blocks/autogpt_copilot.py:0-0
Timestamp: 2026-03-16T17:00:02.827Z
Learning: In autogpt_platform/backend/backend/blocks/autogpt_copilot.py, the recursion guard uses two module-level ContextVars: `_copilot_recursion_depth` (tracks current nesting depth) and `_copilot_recursion_limit` (stores the chain-wide ceiling). On the first invocation, `_copilot_recursion_limit` is set to `max_recursion_depth`; nested calls use `min(inherited_limit, max_recursion_depth)`, so they can only lower the cap, never raise it. The entry/exit logic is extracted into module-level helper functions. This is the approved pattern for preventing runaway sub-agent recursion in AutogptCopilotBlock (PR `#12439`, commits 348e9f8e2 and 3b70f61b1).

Applied to files:

autogpt_platform/backend/backend/copilot/sdk/service.py

📚 Learning: 2026-03-15T15:30:09.706Z

Learnt from: majdyz
Repo: Significant-Gravitas/AutoGPT PR: 12385
File: autogpt_platform/backend/backend/copilot/tools/helpers.py:149-185
Timestamp: 2026-03-15T15:30:09.706Z
Learning: In autogpt_platform/backend/backend/copilot/tools/helpers.py, within execute_block, when InsufficientBalanceError occurs after post-execution credit charging (concurrent balance drain after pre-check passed), this is treated as a non-fatal billing leak: log at ERROR level with structured JSON fields `{"billing_leak": True, "user_id": ..., "cost": ...}` for monitoring/alerting, then return BlockOutputResponse normally. Discarding the output would worsen UX since the block already executed with potential side effects. Reuse the credit_model obtained during the pre-execution balance check (guarded by `if cost > 0 and credit_model:`) for the post-execution charge; do not perform a second get_user_credit_model call.

Applied to files:

autogpt_platform/backend/backend/copilot/sdk/service.py

📚 Learning: 2026-03-20T09:30:22.638Z

Learnt from: CR
Repo: Significant-Gravitas/AutoGPT PR: 0
File: autogpt_platform/backend/CLAUDE.md:0-0
Timestamp: 2026-03-20T09:30:22.638Z
Learning: Applies to autogpt_platform/backend/**/*.py : Sanitize error paths using `os.path.basename()` in error messages to avoid leaking directory structure

Applied to files:

autogpt_platform/backend/backend/copilot/sdk/service.py

📚 Learning: 2026-02-26T17:02:22.448Z

Learnt from: Pwuts
Repo: Significant-Gravitas/AutoGPT PR: 12211
File: .pre-commit-config.yaml:160-179
Timestamp: 2026-02-26T17:02:22.448Z
Learning: Keep the pre-commit hook pattern broad for autogpt_platform/backend to ensure OpenAPI schema changes are captured. Do not narrow to backend/api/ alone, since the generated schema depends on Pydantic models across multiple directories (backend/data/, backend/blocks/, backend/copilot/, backend/integrations/, backend/util/). Narrowing could miss schema changes and cause frontend type desynchronization.

Applied to files:

autogpt_platform/backend/backend/copilot/sdk/service.py

📚 Learning: 2026-03-04T08:04:35.881Z

Learnt from: majdyz
Repo: Significant-Gravitas/AutoGPT PR: 12273
File: autogpt_platform/backend/backend/copilot/tools/workspace_files.py:216-220
Timestamp: 2026-03-04T08:04:35.881Z
Learning: In the AutoGPT Copilot backend, ensure that SVG images are not treated as vision image types by excluding 'image/svg+xml' from INLINEABLE_MIME_TYPES and MULTIMODAL_TYPES in tool_adapter.py; the Claude API supports PNG, JPEG, GIF, and WebP for vision. SVGs (XML text) should be handled via the text path instead, not the vision path.

Applied to files:

autogpt_platform/backend/backend/copilot/sdk/service.py

📚 Learning: 2026-03-05T15:42:08.207Z

Learnt from: ntindle
Repo: Significant-Gravitas/AutoGPT PR: 12297
File: .claude/skills/backend-check/SKILL.md:14-16
Timestamp: 2026-03-05T15:42:08.207Z
Learning: In Python files under autogpt_platform/backend (recursively), rely on poetry run format to perform formatting (Black + isort) and linting (ruff). Do not run poetry run lint as a separate step after poetry run format, since format already includes linting checks.

Applied to files:

autogpt_platform/backend/backend/copilot/sdk/service.py

📚 Learning: 2026-03-16T16:35:40.236Z

Learnt from: majdyz
Repo: Significant-Gravitas/AutoGPT PR: 12440
File: autogpt_platform/backend/backend/api/features/workflow_import.py:54-63
Timestamp: 2026-03-16T16:35:40.236Z
Learning: Avoid using the word 'competitor' in public-facing identifiers and text. Use neutral naming for API paths, model names, function names, and UI text. Examples: rename 'CompetitorFormat' to 'SourcePlatform', 'convert_competitor_workflow' to 'convert_workflow', '/competitor-workflow' to '/workflow'. Apply this guideline to files under autogpt_platform/backend and autogpt_platform/frontend.

Applied to files:

autogpt_platform/backend/backend/copilot/sdk/service.py

After rolling back session.messages on transient/stream errors, the transcript_builder still contains entries from the aborted attempt. Set skip_transcript_upload=True so the finally block won't upload stale transcript data that could corrupt future --resume context. Addresses CodeRabbit review feedback on PR #12499.

majdyz · 2026-03-22T12:33:24Z

🤖 Fixed in c65703a: Added skip_transcript_upload = True after session message rollback in both _TransientErrorHandled and generic Exception handlers, so the finally block won't upload stale transcript data from the aborted attempt.

- Simplify heredoc example to use conventional EOF delimiter - Reduce log noise: full diagnostics only on first empty tool call

Sentry found that @@agptfile: expansion mutates args between the _check_circuit_breaker call (pre-expansion) and _record_tool_failure call (post-expansion), causing mismatched fingerprints. Fix by saving original_args before expansion and using it consistently for both check and record operations.

Make the "Writing large files" prompt more specific: - Call out the research-compilation scenario explicitly - Lower the threshold guidance to ~2000 words - Show section-by-section pattern with cat > / cat >> - Explain that each bash_exec should contain one section

After rolling back session.messages on transient/stream errors, the transcript_builder still contains entries from the aborted attempt. Set skip_transcript_upload=True so the finally block won't upload stale transcript data that could corrupt future --resume context. Addresses CodeRabbit review feedback on PR #12499.

The system prompt example for writing large files implied the LLM should generate content from scratch. Updated to make clear it should use data already collected from prior tool calls (web_fetch, run_block, etc.) rather than re-generating or re-fetching.

…writes When tool outputs are already in files, the LLM can compose a report in a single bash_exec call using @@agptfile: references instead of writing section-by-section. The section-by-section approach is now documented as a fallback for when content must be generated from conversation context.

Empty input ({}) tool calls are never legitimate — they indicate context saturation or serialization failure. 3 consecutive empty calls is conclusive; no need to wait for 6.

…or and extract empty-tool-call breaker - Renamed _TransientErrorHandled to _HandledStreamError with code/retryable attributes so the handler can log specific error types without assuming transient - Extracted inline empty-tool-call detection into _check_empty_tool_breaker helper with _EmptyToolBreakResult dataclass, reducing _run_stream_attempt complexity

The tool-level circuit breaker's failure counters persisted across retry attempts, causing counts from a rolled-back attempt to carry over and prematurely trip the breaker on a fresh attempt with reduced context.

Abhi1992002

lgtm

sentry · 2026-03-26T11:16:12Z

Issues attributed to commits in this pull request

This pull request was merged and Sentry observed the following issues:

‼️ RuntimeError: Attempted to exit cancel scope in a... in app:prod-behave:cloud

#12507) ## Summary - Adds `/pr-test` skill for automated E2E testing of PRs using docker compose, agent-browser, and API calls - Covers full environment setup (copy .env, configure copilot auth, ARM64 Docker fix) - Includes browser UI testing, direct API testing, screenshot capture, and test report generation - Has `--fix` mode for auto-fixing bugs found during testing (similar to `/pr-address`) - **Screenshot uploads use GitHub Git API** (blobs → tree → commit → ref) — no local git operations, safe for worktrees - **Subscription mode improvements:** - Extract subscription auth logic to `sdk/subscription.py` — uses SDK's bundled CLI binary instead of requiring `npm install -g @anthropic-ai/claude-code` - Auto-provision `~/.claude/.credentials.json` from `CLAUDE_CODE_OAUTH_TOKEN` env var on container startup — no `claude login` needed in Docker - Add `scripts/refresh_claude_token.sh` — cross-platform helper (macOS/Linux/Windows) to extract OAuth tokens from host and update `backend/.env` ## Test plan - [x] Validated skill on multiple PRs (#12482, #12483, #12499, #12500, #12501, #12440, #12472) — all test scenarios passed - [x] Confirmed screenshot upload via GitHub Git API renders correctly on all 7 PRs - [x] Verified subscription mode E2E in Docker: `refresh_claude_token.sh` → `docker compose up` → copilot chat responds correctly with no API keys (pure OAuth subscription) - [x] Verified auto-provisioning of credentials file inside container from `CLAUDE_CODE_OAUTH_TOKEN` env var - [x] Confirmed bundled CLI detection (`claude_agent_sdk._bundled/claude`) works without system-installed `claude` - [x] `poetry run pytest backend/copilot/sdk/service_test.py` — 24/24 tests pass

The empty-tool-call detector treats any ``ToolUseBlock`` whose ``input`` is falsy as a saturation-failure marker (model emits ``{}`` because context-saturation broke argument serialization, the original bug fixed in #12499). But ``get_agent_building_guide`` and ``get_mcp_guide`` both declare ``properties: {}, required: []`` — the model legitimately calls them with empty input. The detector was counting those as failures and spamming Sentry warnings on every long-running session that touched either tool. Fix: cache a frozenset of tool names whose schema is fully no-arg (both bare and ``mcp__copilot__``-prefixed forms because ``ToolUseBlock.name`` carries the MCP prefix), and exclude those from the breaker's empty-tool list. Counter still trips for any tool that DOES require args but receives ``{}`` input (the saturation failure mode the breaker exists for). Tests: - ``test_no_arg_tool_names_includes_mcp_prefixed_form`` — pins the registry-derived set against the actual prefix the SDK reports. - ``test_no_arg_tool_with_empty_input_does_not_trip_counter`` — consecutive counter resets to 0 even when starting at 4 (just-below trip threshold), proving the false-positive can't trip a long session even after many no-arg calls.

majdyz requested a review from a team as a code owner March 20, 2026 22:21

majdyz requested review from Pwuts and Swiftyos and removed request for a team March 20, 2026 22:21

github-project-automation Bot added this to AutoGPT development kanban Mar 20, 2026

github-project-automation Bot moved this to 🆕 Needs initial review in AutoGPT development kanban Mar 20, 2026

github-actions Bot added platform/backend AutoGPT Platform - Back end size/l labels Mar 20, 2026