fix(backend): add circuit breaker for infinite tool call retry loops#12499
Conversation
|
Note Reviews pausedIt looks like this branch is under active development. To avoid overwhelming you with review comments due to an influx of new commits, CodeRabbit has automatically paused this review. You can configure this behavior by changing the Use the following commands to manage reviews:
Use the checkboxes below for quick actions:
WalkthroughAdds two circuit-breakers: one in the SDK streaming path to detect repeated empty Assistant tool calls and abort the stream with a retryable StreamError; and one in the tool adapter that tracks per-execution-context consecutive tool failures (keyed by tool+args) and short-circuits repeated failures. Changes
Sequence Diagram(s)sequenceDiagram
participant Client as SDK Client
participant Stream as Service Stream Loop
participant Detector as Empty Tool Detector
participant Counter as consecutive_empty_tool_calls
participant Error as Stream Error Handler
Client->>Stream: start streaming
Stream->>Client: receive AssistantMessage
Stream->>Detector: inspect AssistantMessage for ToolUseBlock input
alt empty/falsey tool input
Detector->>Counter: increment
Counter->>Detector: value
alt value < _EMPTY_TOOL_CALL_LIMIT
Detector->>Stream: continue streaming
else value >= _EMPTY_TOOL_CALL_LIMIT
Detector->>Error: append retryable session error marker (error_msg)
Error->>Stream: yield StreamError(code="circuit_breaker_empty_tool_calls")
Stream->>Stream: break (terminate attempt)
end
else non-empty tool or no tool
Detector->>Counter: reset to 0
Stream->>Client: continue streaming
end
sequenceDiagram
participant Handler as Tool Handler
participant ExecCtx as Execution Context
participant Breaker as _check_circuit_breaker
participant Tracker as _consecutive_tool_failures
Handler->>ExecCtx: initialize ContextVar tracker
Handler->>Breaker: _check_circuit_breaker(tool_name, args)
alt breaker tripped
Breaker->>Handler: return MCP STOP error (short-circuit)
Handler->>Handler: return early with MCP error marker
else not tripped
Handler->>Handler: expand refs / call tool
alt success (no MCP error)
Handler->>Tracker: _clear_tool_failures(tool_name)
else failure (FileRefExpansionError or MCP error)
Handler->>Tracker: _record_tool_failure(tool_name, args)
end
end
Estimated code review effort🎯 4 (Complex) | ⏱️ ~45 minutes Possibly related PRs
Suggested reviewers
Poem
🚥 Pre-merge checks | ✅ 3✅ Passed checks (3 passed)
✏️ Tip: You can configure your own custom pre-merge checks in the settings. ✨ Finishing Touches🧪 Generate unit tests (beta)
Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out. Comment |
🔍 PR Overlap DetectionThis check compares your PR against all other open PRs targeting the same branch to detect potential merge conflicts early. 🔴 Merge Conflicts DetectedThe following PRs have been tested and will have merge conflicts if merged after this PR. Consider coordinating with the authors.
🟢 Low Risk — File Overlap OnlyThese PRs touch the same files but different sections (click to expand)
Summary: 2 conflict(s), 0 medium risk, 2 low risk (out of 4 PRs with file overlap) Auto-generated on push. Ignores: |
There was a problem hiding this comment.
🧹 Nitpick comments (5)
autogpt_platform/backend/backend/copilot/sdk/service.py (2)
1304-1323: Extract duplicated error message to a constant.The same error message string appears twice (lines 1307-1311 and 1315-1320). Extracting it would prevent divergence and simplify maintenance.
Suggested refactor
+_CIRCUIT_BREAKER_ERROR_MSG = ( + "AutoPilot was unable to complete the tool call " + "— this usually happens when the response is " + "too large to fit in a single tool call. " + "Try breaking your request into smaller parts." +) + # In the detection block: _append_error_marker( ctx.session, - ( - "AutoPilot was unable to complete the tool call " - "— this usually happens when the response is " - "too large to fit in a single tool call. " - "Try breaking your request into smaller parts." - ), + _CIRCUIT_BREAKER_ERROR_MSG, retryable=True, ) yield StreamError( - errorText=( - "AutoPilot was unable to complete the tool " - "call — this usually happens when the response " - "is too large to fit in a single tool call. " - "Try breaking your request into smaller parts." - ), + errorText=_CIRCUIT_BREAKER_ERROR_MSG, code="circuit_breaker_empty_tool_calls", )🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@autogpt_platform/backend/backend/copilot/sdk/service.py` around lines 1304 - 1323, Duplicate multi-sentence error text is used in the call to _append_error_marker(...) and in the StreamError(...) construction; extract that message into a single constant (e.g., TOOL_CALL_TOO_LARGE_ERROR or TOOL_CALL_TOO_LARGE_MSG) and replace both inline strings with a reference to that constant, keeping the same wording, and leave the surrounding calls to _append_error_marker, StreamError, and the ended_with_stream_error assignment unchanged.
1075-1079: Consider moving_EMPTY_TOOL_CALL_LIMITto module level for consistency.This constant is defined inside
_run_stream_attempt, but similar constants like_MAX_STREAM_ATTEMPTS(line 107) and_MAX_CONSECUTIVE_TOOL_FAILURESintool_adapter.pyare module-level. Moving it would improve discoverability and allow reuse if needed.Suggested refactor
_MAX_STREAM_ATTEMPTS = 3 + +# Hard circuit breaker: abort the stream if the model sends too many +# consecutive tool calls with empty parameters (a sign of context +# saturation or serialization failure). +_EMPTY_TOOL_CALL_LIMIT = 6Then remove lines 1075-1079 and reference the module constant.
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@autogpt_platform/backend/backend/copilot/sdk/service.py` around lines 1075 - 1079, Move the _EMPTY_TOOL_CALL_LIMIT constant out of the _run_stream_attempt function and define it at module level (alongside _MAX_STREAM_ATTEMPTS and other top-level constants) so it’s discoverable and reusable; then remove the in-function declaration and update any references inside _run_stream_attempt (and related logic) to use the new module-level _EMPTY_TOOL_CALL_LIMIT constant.autogpt_platform/backend/backend/copilot/sdk/tool_adapter.py (2)
273-282: Consider documenting the "clear all args" behavior.
_clear_tool_failuresremoves all failure entries for a tool regardless of args, which means success withargs_Bclears failures forargs_A. This is a reasonable design choice for the bug scenario (repeated identical empty args), but it could allow a specific failing pattern to retry indefinitely if interleaved with different successful calls.A brief docstring explaining this trade-off would help future maintainers.
Suggested docstring enhancement
def _clear_tool_failures(tool_name: str) -> None: - """Clear failure tracking for a tool on success.""" + """Clear failure tracking for a tool on success. + + Clears ALL args variants for the tool, not just the successful call's args. + This gives the tool a "fresh start" on any success, which is appropriate for + the primary use case (detecting infinite loops with identical failing args). + Trade-off: interleaved successes with different args could reset a failing pattern. + """🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@autogpt_platform/backend/backend/copilot/sdk/tool_adapter.py` around lines 273 - 282, The _clear_tool_failures function currently clears all failure entries for a given tool name regardless of argument variants, which can allow a failing arg pattern to be retried if any other arg succeeds; update the docstring for _clear_tool_failures to explicitly state that it clears all entries for tool_name (not per-args), mention the trade-off (simpler recovery vs. possible indefinite retry of specific failing args), and note where _consecutive_tool_failures and keys_to_remove are used so maintainers can consider switching to per-arg clearing if needed.
436-440: Redundant circuit breaker check for BaseTool handlers.The
_truncatingwrapper adds a circuit breaker check, butcreate_tool_handler(which it wraps for BaseTool handlers) already has the same check at line 294. This means BaseTool handlers are checked twice.While harmless (first check trips, second never reached), this is redundant. Consider either:
- Removing the check from
create_tool_handlersince all handlers go through_truncating- Documenting that
_truncatingis the authoritative check pointOption 1: Remove from create_tool_handler
async def tool_handler(args: dict[str, Any]) -> dict[str, Any]: """Execute the wrapped tool and return MCP-formatted response.""" - # Circuit breaker: stop infinite retry loops - stop_msg = _check_circuit_breaker(base_tool.name, args) - if stop_msg: - return _mcp_error(stop_msg) - user_id, session = get_execution_context()The
_truncatingwrapper already handles the check for all tools uniformly.🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@autogpt_platform/backend/backend/copilot/sdk/tool_adapter.py` around lines 436 - 440, There is a redundant circuit breaker invocation: both the _truncating wrapper and create_tool_handler call _check_circuit_breaker and return _mcp_error, causing double-checks for BaseTool handlers; remove the duplicate check from create_tool_handler (delete the _check_circuit_breaker call and its _mcp_error return in create_tool_handler) so that _truncating is the single authoritative circuit-breaker entry point for all tools (leave _truncating, _check_circuit_breaker, and _mcp_error unchanged).autogpt_platform/backend/backend/copilot/sdk/test_circuit_breaker.py (1)
44-53: Consider adding a test for different tool names being tracked separately.The test suite covers different args for the same tool, but doesn't explicitly verify that different tool names have independent counters.
Suggested additional test
def test_different_tools_tracked_separately(self): """Different tools should have separate failure counters.""" args = {"file_path": "/tmp/test.txt"} for _ in range(_MAX_CONSECUTIVE_TOOL_FAILURES): _record_tool_failure("tool_a", args) # tool_a should trip assert _check_circuit_breaker("tool_a", args) is not None # tool_b with same args should NOT trip assert _check_circuit_breaker("tool_b", args) is None🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@autogpt_platform/backend/backend/copilot/sdk/test_circuit_breaker.py` around lines 44 - 53, Add a new unit test (e.g., test_different_tools_tracked_separately) that uses the same args dict but records failures for one tool name via _record_tool_failure("tool_a", args) for _MAX_CONSECUTIVE_TOOL_FAILURES iterations, assert that _check_circuit_breaker("tool_a", args) is tripped (not None), and then assert that _check_circuit_breaker("tool_b", args) with the same args is not tripped (is None), verifying tool-name isolation of the failure counters.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Nitpick comments:
In `@autogpt_platform/backend/backend/copilot/sdk/service.py`:
- Around line 1304-1323: Duplicate multi-sentence error text is used in the call
to _append_error_marker(...) and in the StreamError(...) construction; extract
that message into a single constant (e.g., TOOL_CALL_TOO_LARGE_ERROR or
TOOL_CALL_TOO_LARGE_MSG) and replace both inline strings with a reference to
that constant, keeping the same wording, and leave the surrounding calls to
_append_error_marker, StreamError, and the ended_with_stream_error assignment
unchanged.
- Around line 1075-1079: Move the _EMPTY_TOOL_CALL_LIMIT constant out of the
_run_stream_attempt function and define it at module level (alongside
_MAX_STREAM_ATTEMPTS and other top-level constants) so it’s discoverable and
reusable; then remove the in-function declaration and update any references
inside _run_stream_attempt (and related logic) to use the new module-level
_EMPTY_TOOL_CALL_LIMIT constant.
In `@autogpt_platform/backend/backend/copilot/sdk/test_circuit_breaker.py`:
- Around line 44-53: Add a new unit test (e.g.,
test_different_tools_tracked_separately) that uses the same args dict but
records failures for one tool name via _record_tool_failure("tool_a", args) for
_MAX_CONSECUTIVE_TOOL_FAILURES iterations, assert that
_check_circuit_breaker("tool_a", args) is tripped (not None), and then assert
that _check_circuit_breaker("tool_b", args) with the same args is not tripped
(is None), verifying tool-name isolation of the failure counters.
In `@autogpt_platform/backend/backend/copilot/sdk/tool_adapter.py`:
- Around line 273-282: The _clear_tool_failures function currently clears all
failure entries for a given tool name regardless of argument variants, which can
allow a failing arg pattern to be retried if any other arg succeeds; update the
docstring for _clear_tool_failures to explicitly state that it clears all
entries for tool_name (not per-args), mention the trade-off (simpler recovery
vs. possible indefinite retry of specific failing args), and note where
_consecutive_tool_failures and keys_to_remove are used so maintainers can
consider switching to per-arg clearing if needed.
- Around line 436-440: There is a redundant circuit breaker invocation: both the
_truncating wrapper and create_tool_handler call _check_circuit_breaker and
return _mcp_error, causing double-checks for BaseTool handlers; remove the
duplicate check from create_tool_handler (delete the _check_circuit_breaker call
and its _mcp_error return in create_tool_handler) so that _truncating is the
single authoritative circuit-breaker entry point for all tools (leave
_truncating, _check_circuit_breaker, and _mcp_error unchanged).
ℹ️ Review info
⚙️ Run configuration
Configuration used: Organization UI
Review profile: CHILL
Plan: Pro
Run ID: a00d8554-76da-4c9d-84a9-7bccff354ed9
📒 Files selected for processing (3)
autogpt_platform/backend/backend/copilot/sdk/service.pyautogpt_platform/backend/backend/copilot/sdk/test_circuit_breaker.pyautogpt_platform/backend/backend/copilot/sdk/tool_adapter.py
📜 Review details
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (11)
- GitHub Check: check API types
- GitHub Check: Seer Code Review
- GitHub Check: end-to-end tests
- GitHub Check: test (3.13)
- GitHub Check: test (3.12)
- GitHub Check: type-check (3.13)
- GitHub Check: type-check (3.12)
- GitHub Check: type-check (3.11)
- GitHub Check: test (3.11)
- GitHub Check: Analyze (python)
- GitHub Check: Check PR Status
🧰 Additional context used
📓 Path-based instructions (4)
autogpt_platform/backend/**/*.py
📄 CodeRabbit inference engine (.github/copilot-instructions.md)
autogpt_platform/backend/**/*.py: Use Python 3.11 (required; managed by Poetry via pyproject.toml) for backend development
Always run 'poetry run format' (Black + isort) before linting in backend development
Always run 'poetry run lint' (ruff) after formatting in backend development
autogpt_platform/backend/**/*.py: Usepoetry run ...for all Python package commands (install, migrations, tests, linting, formatting)
Use top-level imports only; avoid local/inner imports except for lazy imports of heavy optional dependencies likeopenpyxl
Do not use duck typing; avoidhasattr(),getattr(), orisinstance()for type dispatch; use typed interfaces, unions, or protocols instead
Use Pydantic models over dataclass, namedtuple, or dict for structured data
Do not use linter suppressors; no# type: ignore,# noqa, or# pyright: ignorecomments; fix the type or code instead
Use list comprehensions over manual loop-and-append patterns
Use early return with guard clauses first; avoid deep nesting
Use%sfor deferred interpolation inlogger.debug()statements; use f-strings elsewhere for readability (e.g.,logger.debug("Processing %s items", count),logger.info(f"Processing {count} items"))
Sanitize error paths usingos.path.basename()in error messages to avoid leaking directory structure
Be aware of TOCTOU (time-of-check-time-of-use) vulnerabilities; avoid check-then-act patterns for file access and credit charging
Usetransaction=Truein Redis pipelines for atomicity on multi-step operations
Usemax(0, value)guards for computed values that should never be negative
Keep files under ~300 lines; split by responsibility if a file grows beyond this (extract helpers, models, or sub-modules into new files). Never keep appending to a long file.
Keep functions under ~40 lines; extract named helpers when a function grows longer. Long functions indicate mixed concerns, not just complexity.
Use top-down ordering: define the main/public function or class f...
Files:
autogpt_platform/backend/backend/copilot/sdk/tool_adapter.pyautogpt_platform/backend/backend/copilot/sdk/service.pyautogpt_platform/backend/backend/copilot/sdk/test_circuit_breaker.py
autogpt_platform/**/*.py
📄 CodeRabbit inference engine (AGENTS.md)
Format Python code with
poetry run format
Files:
autogpt_platform/backend/backend/copilot/sdk/tool_adapter.pyautogpt_platform/backend/backend/copilot/sdk/service.pyautogpt_platform/backend/backend/copilot/sdk/test_circuit_breaker.py
autogpt_platform/backend/**/*test*.py
📄 CodeRabbit inference engine (AGENTS.md)
Run
poetry run testfor backend testing (runs pytest with docker based postgres + prisma)
Files:
autogpt_platform/backend/backend/copilot/sdk/test_circuit_breaker.py
autogpt_platform/backend/**/test_*.py
📄 CodeRabbit inference engine (autogpt_platform/CLAUDE.md)
Backend test files should use
@pytest.mark.xfailto mark failing tests during TDD
Files:
autogpt_platform/backend/backend/copilot/sdk/test_circuit_breaker.py
🧠 Learnings (14)
📓 Common learnings
Learnt from: majdyz
Repo: Significant-Gravitas/AutoGPT PR: 12439
File: autogpt_platform/backend/backend/blocks/autogpt_copilot.py:0-0
Timestamp: 2026-03-16T17:00:02.827Z
Learning: In autogpt_platform/backend/backend/blocks/autogpt_copilot.py, the recursion guard uses two module-level ContextVars: `_copilot_recursion_depth` (tracks current nesting depth) and `_copilot_recursion_limit` (stores the chain-wide ceiling). On the first invocation, `_copilot_recursion_limit` is set to `max_recursion_depth`; nested calls use `min(inherited_limit, max_recursion_depth)`, so they can only lower the cap, never raise it. The entry/exit logic is extracted into module-level helper functions. This is the approved pattern for preventing runaway sub-agent recursion in AutogptCopilotBlock (PR `#12439`, commits 348e9f8e2 and 3b70f61b1).
Learnt from: majdyz
Repo: Significant-Gravitas/AutoGPT PR: 12440
File: autogpt_platform/backend/backend/copilot/workflow_import/converter.py:0-0
Timestamp: 2026-03-17T10:57:10.126Z
Learning: In Significant-Gravitas/AutoGPT PR `#12440`, `autogpt_platform/backend/backend/copilot/workflow_import/converter.py` was fully rewritten (commit 732960e2d) to no longer make direct LLM/OpenAI API calls. The converter now builds a structured text prompt for AutoPilot/CoPilot instead. There is no `response.choices` access or any direct LLM client usage in this file. Do not flag `response.choices` access or LLM client initialization patterns as issues in this file.
Learnt from: majdyz
Repo: Significant-Gravitas/AutoGPT PR: 12356
File: autogpt_platform/backend/backend/copilot/constants.py:9-12
Timestamp: 2026-03-10T08:39:22.025Z
Learning: In Significant-Gravitas/AutoGPT PR `#12356`, the `COPILOT_SYNTHETIC_ID_PREFIX = "copilot-"` check in `create_auto_approval_record` (human_review.py) is intentional and safe. The `graph_exec_id` passed to this function comes from server-side `PendingHumanReview` DB records (not from user input); the API only accepts `node_exec_id` from users. Synthetic `copilot-*` IDs are only ever created server-side in `run_block.py`. The prefix skip avoids a DB lookup for a `AgentGraphExecution` record that legitimately does not exist for CoPilot sessions, while `user_id` scoping is enforced at the auth layer and on the resulting auto-approval record.
📚 Learning: 2026-03-16T17:00:02.827Z
Learnt from: majdyz
Repo: Significant-Gravitas/AutoGPT PR: 12439
File: autogpt_platform/backend/backend/blocks/autogpt_copilot.py:0-0
Timestamp: 2026-03-16T17:00:02.827Z
Learning: In autogpt_platform/backend/backend/blocks/autogpt_copilot.py, the recursion guard uses two module-level ContextVars: `_copilot_recursion_depth` (tracks current nesting depth) and `_copilot_recursion_limit` (stores the chain-wide ceiling). On the first invocation, `_copilot_recursion_limit` is set to `max_recursion_depth`; nested calls use `min(inherited_limit, max_recursion_depth)`, so they can only lower the cap, never raise it. The entry/exit logic is extracted into module-level helper functions. This is the approved pattern for preventing runaway sub-agent recursion in AutogptCopilotBlock (PR `#12439`, commits 348e9f8e2 and 3b70f61b1).
Applied to files:
autogpt_platform/backend/backend/copilot/sdk/tool_adapter.pyautogpt_platform/backend/backend/copilot/sdk/service.pyautogpt_platform/backend/backend/copilot/sdk/test_circuit_breaker.py
📚 Learning: 2026-02-26T17:02:22.448Z
Learnt from: Pwuts
Repo: Significant-Gravitas/AutoGPT PR: 12211
File: .pre-commit-config.yaml:160-179
Timestamp: 2026-02-26T17:02:22.448Z
Learning: Keep the pre-commit hook pattern broad for autogpt_platform/backend to ensure OpenAPI schema changes are captured. Do not narrow to backend/api/ alone, since the generated schema depends on Pydantic models across multiple directories (backend/data/, backend/blocks/, backend/copilot/, backend/integrations/, backend/util/). Narrowing could miss schema changes and cause frontend type desynchronization.
Applied to files:
autogpt_platform/backend/backend/copilot/sdk/tool_adapter.pyautogpt_platform/backend/backend/copilot/sdk/service.pyautogpt_platform/backend/backend/copilot/sdk/test_circuit_breaker.py
📚 Learning: 2026-03-04T08:04:35.881Z
Learnt from: majdyz
Repo: Significant-Gravitas/AutoGPT PR: 12273
File: autogpt_platform/backend/backend/copilot/tools/workspace_files.py:216-220
Timestamp: 2026-03-04T08:04:35.881Z
Learning: In the AutoGPT Copilot backend, ensure that SVG images are not treated as vision image types by excluding 'image/svg+xml' from INLINEABLE_MIME_TYPES and MULTIMODAL_TYPES in tool_adapter.py; the Claude API supports PNG, JPEG, GIF, and WebP for vision. SVGs (XML text) should be handled via the text path instead, not the vision path.
Applied to files:
autogpt_platform/backend/backend/copilot/sdk/tool_adapter.pyautogpt_platform/backend/backend/copilot/sdk/service.pyautogpt_platform/backend/backend/copilot/sdk/test_circuit_breaker.py
📚 Learning: 2026-03-05T15:42:08.207Z
Learnt from: ntindle
Repo: Significant-Gravitas/AutoGPT PR: 12297
File: .claude/skills/backend-check/SKILL.md:14-16
Timestamp: 2026-03-05T15:42:08.207Z
Learning: In Python files under autogpt_platform/backend (recursively), rely on poetry run format to perform formatting (Black + isort) and linting (ruff). Do not run poetry run lint as a separate step after poetry run format, since format already includes linting checks.
Applied to files:
autogpt_platform/backend/backend/copilot/sdk/tool_adapter.pyautogpt_platform/backend/backend/copilot/sdk/service.pyautogpt_platform/backend/backend/copilot/sdk/test_circuit_breaker.py
📚 Learning: 2026-03-16T16:35:40.236Z
Learnt from: majdyz
Repo: Significant-Gravitas/AutoGPT PR: 12440
File: autogpt_platform/backend/backend/api/features/workflow_import.py:54-63
Timestamp: 2026-03-16T16:35:40.236Z
Learning: Avoid using the word 'competitor' in public-facing identifiers and text. Use neutral naming for API paths, model names, function names, and UI text. Examples: rename 'CompetitorFormat' to 'SourcePlatform', 'convert_competitor_workflow' to 'convert_workflow', '/competitor-workflow' to '/workflow'. Apply this guideline to files under autogpt_platform/backend and autogpt_platform/frontend.
Applied to files:
autogpt_platform/backend/backend/copilot/sdk/tool_adapter.pyautogpt_platform/backend/backend/copilot/sdk/service.pyautogpt_platform/backend/backend/copilot/sdk/test_circuit_breaker.py
📚 Learning: 2026-03-17T06:48:21.085Z
Learnt from: majdyz
Repo: Significant-Gravitas/AutoGPT PR: 12445
File: autogpt_platform/backend/backend/copilot/sdk/service.py:1071-1072
Timestamp: 2026-03-17T06:48:21.085Z
Learning: In Significant-Gravitas/AutoGPT (autogpt_platform), the AI SDK enforces `z.strictObject({type, errorText})` on SSE `StreamError` responses, so additional fields like `retryable: bool` cannot be added to `StreamError` or serialized via `to_sse()`. Instead, retry signaling for transient Anthropic API errors is done via the `COPILOT_RETRYABLE_ERROR_PREFIX` constant prepended to persisted session messages (in `ChatMessage.content`). The frontend detects retryable errors by checking `markerType === "retryable_error"` from `parseSpecialMarkers()` — no SSE schema changes and no string matching on error text. This pattern was established in PR `#12445`, commit 64d82797b.
Applied to files:
autogpt_platform/backend/backend/copilot/sdk/service.py
📚 Learning: 2026-03-17T10:57:10.126Z
Learnt from: majdyz
Repo: Significant-Gravitas/AutoGPT PR: 12440
File: autogpt_platform/backend/backend/copilot/workflow_import/converter.py:0-0
Timestamp: 2026-03-17T10:57:10.126Z
Learning: In Significant-Gravitas/AutoGPT PR `#12440`, `autogpt_platform/backend/backend/copilot/workflow_import/converter.py` was fully rewritten (commit 732960e2d) to no longer make direct LLM/OpenAI API calls. The converter now builds a structured text prompt for AutoPilot/CoPilot instead. There is no `response.choices` access or any direct LLM client usage in this file. Do not flag `response.choices` access or LLM client initialization patterns as issues in this file.
Applied to files:
autogpt_platform/backend/backend/copilot/sdk/service.py
📚 Learning: 2026-03-20T09:30:22.638Z
Learnt from: CR
Repo: Significant-Gravitas/AutoGPT PR: 0
File: autogpt_platform/backend/CLAUDE.md:0-0
Timestamp: 2026-03-20T09:30:22.638Z
Learning: Applies to autogpt_platform/backend/**/*_test.py : Mock at boundaries — mock where the symbol is **used**, not where it's **defined**. After refactoring, update mock targets to match new module paths.
Applied to files:
autogpt_platform/backend/backend/copilot/sdk/test_circuit_breaker.py
📚 Learning: 2026-03-20T09:30:22.638Z
Learnt from: CR
Repo: Significant-Gravitas/AutoGPT PR: 0
File: autogpt_platform/backend/CLAUDE.md:0-0
Timestamp: 2026-03-20T09:30:22.638Z
Learning: Applies to autogpt_platform/backend/**/*_test.py : Use pytest with snapshot testing for API responses; colocate test files with source files using `*_test.py` naming convention
Applied to files:
autogpt_platform/backend/backend/copilot/sdk/test_circuit_breaker.py
📚 Learning: 2026-02-04T16:49:42.490Z
Learnt from: CR
Repo: Significant-Gravitas/AutoGPT PR: 0
File: .github/copilot-instructions.md:0-0
Timestamp: 2026-02-04T16:49:42.490Z
Learning: Applies to autogpt_platform/backend/**/test/**/*.py : Use snapshot testing with '--snapshot-update' flag in backend tests when output changes; always review with 'git diff'
Applied to files:
autogpt_platform/backend/backend/copilot/sdk/test_circuit_breaker.py
📚 Learning: 2026-03-20T09:29:57.525Z
Learnt from: CR
Repo: Significant-Gravitas/AutoGPT PR: 0
File: autogpt_platform/CLAUDE.md:0-0
Timestamp: 2026-03-20T09:29:57.525Z
Learning: Follow Test-Driven Development (TDD): Write failing test first marked with `pytest.mark.xfail` (backend) or `.fixme` (Playwright), implement the fix, then remove the marker
Applied to files:
autogpt_platform/backend/backend/copilot/sdk/test_circuit_breaker.py
📚 Learning: 2026-03-20T09:29:57.525Z
Learnt from: CR
Repo: Significant-Gravitas/AutoGPT PR: 0
File: autogpt_platform/CLAUDE.md:0-0
Timestamp: 2026-03-20T09:29:57.525Z
Learning: Applies to autogpt_platform/backend/**/test_*.py : Backend test files should use `pytest.mark.xfail` to mark failing tests during TDD
Applied to files:
autogpt_platform/backend/backend/copilot/sdk/test_circuit_breaker.py
📚 Learning: 2026-03-19T15:10:50.676Z
Learnt from: majdyz
Repo: Significant-Gravitas/AutoGPT PR: 12483
File: autogpt_platform/backend/backend/copilot/tools/test_dry_run.py:298-303
Timestamp: 2026-03-19T15:10:50.676Z
Learning: When using Python’s `unittest.mock.patch` in tests, choose the patch target based on how the imported name is resolved:
- If the code under test uses an **eager/module-level import** (e.g., `from foo.bar import baz` at module top), patch **the module where the name is looked up** (i.e., where it is used in the SUT), e.g. `patch("mymodule.baz")`.
- If the code under test uses a **lazy import** executed later (e.g., `from foo.bar import baz` inside a function/branch), patch **the source module** (e.g., `patch("foo.bar.baz")`) because the late `from ... import` will read the (potentially patched) name from the source module at call time.
For a concrete example: if `simulate_block` is imported inside an `if dry_run:` block in the SUT, then the correct test patch target is the source module path for `simulate_block` as it exists at call time (e.g., `patch("backend.executor.simulator.simulate_block")`), not the test file’s import location.
Applied to files:
autogpt_platform/backend/backend/copilot/sdk/test_circuit_breaker.py
🔇 Additional comments (6)
autogpt_platform/backend/backend/copilot/sdk/service.py (1)
1281-1328: LGTM! Circuit breaker implementation is sound.The detection logic correctly:
- Identifies empty tool calls via
not b.input(catches both{}andNone)- Tracks consecutive occurrences with proper reset on non-empty messages
- Uses
>=threshold comparison- Sets appropriate error state and breaks cleanly
The placement after compaction handling and before dispatch is appropriate.
autogpt_platform/backend/backend/copilot/sdk/tool_adapter.py (3)
69-77: LGTM! ContextVar-based circuit breaker state is well-designed.The pattern follows the established approach used for other context variables in this module (
_pending_tool_outputs,_stash_event). UsingNoneas default with explicit initialization inset_execution_contextensures proper per-request isolation.
291-315: LGTM! Circuit breaker integration increate_tool_handleris correct.The flow properly:
- Checks circuit breaker before execution
- Records failure on
isErrorresults- Clears failures on success
- Records failure on exceptions before returning error
447-454: LGTM! Failure recording on FileRefExpansionError uses original args correctly.Recording the failure with the original (pre-expansion) args is correct since that's what the model sent and would retry with.
autogpt_platform/backend/backend/copilot/sdk/test_circuit_breaker.py (2)
14-19: LGTM! Fixture correctly uses ContextVar token for isolation.The
_reset_trackerfixture properly initializes and resets the ContextVar using the token pattern, ensuring test isolation.
22-86: LGTM! Test coverage is comprehensive.The tests correctly verify:
- Threshold behavior (below and at limit)
- Per-args isolation
- Empty args tracking (the specific bug scenario)
- Reset/clear behavior
- Uninitialized tracker safety
|
🤖 Addressed CodeRabbit review feedback in c224e0d and previous commits:
|
There was a problem hiding this comment.
Caution
Some comments are outside the diff and can’t be posted inline due to platform limitations.
⚠️ Outside diff range comments (1)
autogpt_platform/backend/backend/copilot/sdk/service.py (1)
1914-1937:⚠️ Potential issue | 🟠 MajorSkip or rebuild the resume transcript after this rollback.
Line 1928 trims
session.messages, butstate.transcript_builderstill contains entries from the aborted attempt. Thefinallyblock still uploads that builder, so a circuit-breaker/transient abort can leak rolled-back assistant/tool entries into the next--resumecontext.💡 Minimal fix
session.messages = session.messages[:pre_attempt_msg_count] + # `state.transcript_builder` still reflects the aborted attempt. + # Avoid uploading a resume transcript that no longer matches + # `session.messages`. + skip_transcript_upload = True # Re-append the error marker so it survives the rollbackIf you want to preserve the current user turn in the transcript, rebuild the builder from
session.messagesinstead of skipping upload.🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@autogpt_platform/backend/backend/copilot/sdk/service.py` around lines 1914 - 1937, After rolling back session.messages inside the except _TransientErrorHandled handler, ensure state.transcript_builder no longer contains entries from the aborted attempt: either rebuild state.transcript_builder from the trimmed session.messages or clear it so the finally block won't upload rolled-back assistant/tool entries; update the logic in the except block referencing session.messages and state.transcript_builder (and ensure the finally upload path uses the rebuilt/cleared builder) so resume transcripts reflect only persisted session.messages.
🧹 Nitpick comments (2)
autogpt_platform/backend/backend/copilot/sdk/service.py (2)
1044-1058: Make this handled-error path generic, not “transient.”Because Line 1455 now raises
_TransientErrorHandledfor anyended_with_stream_error, the class/docstring and the warning in this handler will also labelcircuit_breaker_empty_tool_callsas transient. Renaming it and carryingcode/retryablealongsideerror_msgwould keep the new breaker visible in logs without overloading the transient path.Also applies to: 1455-1457, 1914-1937
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@autogpt_platform/backend/backend/copilot/sdk/service.py` around lines 1044 - 1058, The class _TransientErrorHandled and its usage should be made generic: rename it (e.g., _HandledStreamError) and update its docstring to remove "transient", add attributes for code: str | None and retryable: bool | None alongside error_msg, and update its __init__ to accept and store these; then replace all raises of _TransientErrorHandled in _run_stream_attempt and the handler paths (the places that set ended_with_stream_error and the warning logs) to pass through code and retryable so the handler can log the specific breaker (e.g., circuit_breaker_empty_tool_calls) without labeling it transient. Ensure the handler warning/logging uses the new attributes (error_msg, code, retryable) instead of assuming transient, and update any references/names accordingly where the type/class is imported or checked.
1305-1374: Extract the empty-tool-call breaker into a helper.This works, but it adds another stateful branch to one of the hardest paths in the service. Pulling the detection/logging/abort decision into a small helper would make
_run_stream_attempteasier to test and reason about.As per coding guidelines "Keep functions under ~40 lines; extract named helpers when a function grows longer. Long functions indicate mixed concerns, not just complexity."
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@autogpt_platform/backend/backend/copilot/sdk/service.py` around lines 1305 - 1374, Extract the empty-tool-call detection, logging and abort logic from _run_stream_attempt into a small helper (e.g. _handle_empty_tool_call) that takes the context (ctx), the AssistantMessage (sdk_msg), the current consecutive_empty_tool_calls counter, the conversation state (state), and required globals/constants (logger, _EMPTY_TOOL_CALL_LIMIT, _CIRCUIT_BREAKER_ERROR_MSG, _append_error_marker) and returns the updated consecutive_empty_tool_calls and a result tuple indicating whether the circuit breaker fired and the StreamError to yield (or None). Replace the inlined block in _run_stream_attempt with a call to this helper and use its returned counter and result to reset counters, log, call _append_error_marker, yield the StreamError, set ended_with_stream_error, and break exactly as before; preserve all log messages, error codes ("circuit_breaker_empty_tool_calls"), and retryable behavior so behavior is unchanged while keeping _run_stream_attempt shorter and easier to test.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Outside diff comments:
In `@autogpt_platform/backend/backend/copilot/sdk/service.py`:
- Around line 1914-1937: After rolling back session.messages inside the except
_TransientErrorHandled handler, ensure state.transcript_builder no longer
contains entries from the aborted attempt: either rebuild
state.transcript_builder from the trimmed session.messages or clear it so the
finally block won't upload rolled-back assistant/tool entries; update the logic
in the except block referencing session.messages and state.transcript_builder
(and ensure the finally upload path uses the rebuilt/cleared builder) so resume
transcripts reflect only persisted session.messages.
---
Nitpick comments:
In `@autogpt_platform/backend/backend/copilot/sdk/service.py`:
- Around line 1044-1058: The class _TransientErrorHandled and its usage should
be made generic: rename it (e.g., _HandledStreamError) and update its docstring
to remove "transient", add attributes for code: str | None and retryable: bool |
None alongside error_msg, and update its __init__ to accept and store these;
then replace all raises of _TransientErrorHandled in _run_stream_attempt and the
handler paths (the places that set ended_with_stream_error and the warning logs)
to pass through code and retryable so the handler can log the specific breaker
(e.g., circuit_breaker_empty_tool_calls) without labeling it transient. Ensure
the handler warning/logging uses the new attributes (error_msg, code, retryable)
instead of assuming transient, and update any references/names accordingly where
the type/class is imported or checked.
- Around line 1305-1374: Extract the empty-tool-call detection, logging and
abort logic from _run_stream_attempt into a small helper (e.g.
_handle_empty_tool_call) that takes the context (ctx), the AssistantMessage
(sdk_msg), the current consecutive_empty_tool_calls counter, the conversation
state (state), and required globals/constants (logger, _EMPTY_TOOL_CALL_LIMIT,
_CIRCUIT_BREAKER_ERROR_MSG, _append_error_marker) and returns the updated
consecutive_empty_tool_calls and a result tuple indicating whether the circuit
breaker fired and the StreamError to yield (or None). Replace the inlined block
in _run_stream_attempt with a call to this helper and use its returned counter
and result to reset counters, log, call _append_error_marker, yield the
StreamError, set ended_with_stream_error, and break exactly as before; preserve
all log messages, error codes ("circuit_breaker_empty_tool_calls"), and
retryable behavior so behavior is unchanged while keeping _run_stream_attempt
shorter and easier to test.
ℹ️ Review info
⚙️ Run configuration
Configuration used: Organization UI
Review profile: CHILL
Plan: Pro
Run ID: 1ad236b7-cc96-4ad1-b53b-b12ee90a5bb6
📒 Files selected for processing (2)
autogpt_platform/backend/backend/copilot/prompting.pyautogpt_platform/backend/backend/copilot/sdk/service.py
✅ Files skipped from review due to trivial changes (1)
- autogpt_platform/backend/backend/copilot/prompting.py
📜 Review details
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (7)
- GitHub Check: check API types
- GitHub Check: end-to-end tests
- GitHub Check: test (3.11)
- GitHub Check: test (3.12)
- GitHub Check: test (3.13)
- GitHub Check: Seer Code Review
- GitHub Check: Check PR Status
🧰 Additional context used
📓 Path-based instructions (2)
autogpt_platform/backend/**/*.py
📄 CodeRabbit inference engine (.github/copilot-instructions.md)
autogpt_platform/backend/**/*.py: Use Python 3.11 (required; managed by Poetry via pyproject.toml) for backend development
Always run 'poetry run format' (Black + isort) before linting in backend development
Always run 'poetry run lint' (ruff) after formatting in backend development
autogpt_platform/backend/**/*.py: Usepoetry run ...for all Python package commands (install, migrations, tests, linting, formatting)
Use top-level imports only; avoid local/inner imports except for lazy imports of heavy optional dependencies likeopenpyxl
Do not use duck typing; avoidhasattr(),getattr(), orisinstance()for type dispatch; use typed interfaces, unions, or protocols instead
Use Pydantic models over dataclass, namedtuple, or dict for structured data
Do not use linter suppressors; no# type: ignore,# noqa, or# pyright: ignorecomments; fix the type or code instead
Use list comprehensions over manual loop-and-append patterns
Use early return with guard clauses first; avoid deep nesting
Use%sfor deferred interpolation inlogger.debug()statements; use f-strings elsewhere for readability (e.g.,logger.debug("Processing %s items", count),logger.info(f"Processing {count} items"))
Sanitize error paths usingos.path.basename()in error messages to avoid leaking directory structure
Be aware of TOCTOU (time-of-check-time-of-use) vulnerabilities; avoid check-then-act patterns for file access and credit charging
Usetransaction=Truein Redis pipelines for atomicity on multi-step operations
Usemax(0, value)guards for computed values that should never be negative
Keep files under ~300 lines; split by responsibility if a file grows beyond this (extract helpers, models, or sub-modules into new files). Never keep appending to a long file.
Keep functions under ~40 lines; extract named helpers when a function grows longer. Long functions indicate mixed concerns, not just complexity.
Use top-down ordering: define the main/public function or class f...
Files:
autogpt_platform/backend/backend/copilot/sdk/service.py
autogpt_platform/**/*.py
📄 CodeRabbit inference engine (AGENTS.md)
Format Python code with
poetry run format
Files:
autogpt_platform/backend/backend/copilot/sdk/service.py
🧠 Learnings (10)
📓 Common learnings
Learnt from: majdyz
Repo: Significant-Gravitas/AutoGPT PR: 12439
File: autogpt_platform/backend/backend/blocks/autogpt_copilot.py:0-0
Timestamp: 2026-03-16T17:00:02.827Z
Learning: In autogpt_platform/backend/backend/blocks/autogpt_copilot.py, the recursion guard uses two module-level ContextVars: `_copilot_recursion_depth` (tracks current nesting depth) and `_copilot_recursion_limit` (stores the chain-wide ceiling). On the first invocation, `_copilot_recursion_limit` is set to `max_recursion_depth`; nested calls use `min(inherited_limit, max_recursion_depth)`, so they can only lower the cap, never raise it. The entry/exit logic is extracted into module-level helper functions. This is the approved pattern for preventing runaway sub-agent recursion in AutogptCopilotBlock (PR `#12439`, commits 348e9f8e2 and 3b70f61b1).
Learnt from: majdyz
Repo: Significant-Gravitas/AutoGPT PR: 12440
File: autogpt_platform/backend/backend/copilot/workflow_import/converter.py:0-0
Timestamp: 2026-03-17T10:57:10.126Z
Learning: In Significant-Gravitas/AutoGPT PR `#12440`, `autogpt_platform/backend/backend/copilot/workflow_import/converter.py` was fully rewritten (commit 732960e2d) to no longer make direct LLM/OpenAI API calls. The converter now builds a structured text prompt for AutoPilot/CoPilot instead. There is no `response.choices` access or any direct LLM client usage in this file. Do not flag `response.choices` access or LLM client initialization patterns as issues in this file.
Learnt from: majdyz
Repo: Significant-Gravitas/AutoGPT PR: 12445
File: autogpt_platform/backend/backend/copilot/sdk/service.py:1071-1072
Timestamp: 2026-03-17T06:48:21.085Z
Learning: In Significant-Gravitas/AutoGPT (autogpt_platform), the AI SDK enforces `z.strictObject({type, errorText})` on SSE `StreamError` responses, so additional fields like `retryable: bool` cannot be added to `StreamError` or serialized via `to_sse()`. Instead, retry signaling for transient Anthropic API errors is done via the `COPILOT_RETRYABLE_ERROR_PREFIX` constant prepended to persisted session messages (in `ChatMessage.content`). The frontend detects retryable errors by checking `markerType === "retryable_error"` from `parseSpecialMarkers()` — no SSE schema changes and no string matching on error text. This pattern was established in PR `#12445`, commit 64d82797b.
Learnt from: majdyz
Repo: Significant-Gravitas/AutoGPT PR: 12356
File: autogpt_platform/backend/backend/copilot/constants.py:9-12
Timestamp: 2026-03-10T08:39:22.025Z
Learning: In Significant-Gravitas/AutoGPT PR `#12356`, the `COPILOT_SYNTHETIC_ID_PREFIX = "copilot-"` check in `create_auto_approval_record` (human_review.py) is intentional and safe. The `graph_exec_id` passed to this function comes from server-side `PendingHumanReview` DB records (not from user input); the API only accepts `node_exec_id` from users. Synthetic `copilot-*` IDs are only ever created server-side in `run_block.py`. The prefix skip avoids a DB lookup for a `AgentGraphExecution` record that legitimately does not exist for CoPilot sessions, while `user_id` scoping is enforced at the auth layer and on the resulting auto-approval record.
Learnt from: majdyz
Repo: Significant-Gravitas/AutoGPT PR: 12213
File: autogpt_platform/frontend/src/app/api/openapi.json:9983-9995
Timestamp: 2026-02-27T15:59:00.370Z
Learning: Repo: Significant-Gravitas/AutoGPT PR: 12213 — OpenAPI/codegen
Learning: Ensuring a field is required in generated TS types needs two sides: (1) no default value on the Pydantic field, and (2) the OpenAPI model's "required" array must list it. For MCPToolInfo, making input_schema required in OpenAPI and removing Field(default_factory=dict) in the backend prevents optional typing drift.
📚 Learning: 2026-03-17T06:48:21.085Z
Learnt from: majdyz
Repo: Significant-Gravitas/AutoGPT PR: 12445
File: autogpt_platform/backend/backend/copilot/sdk/service.py:1071-1072
Timestamp: 2026-03-17T06:48:21.085Z
Learning: In Significant-Gravitas/AutoGPT (autogpt_platform), the AI SDK enforces `z.strictObject({type, errorText})` on SSE `StreamError` responses, so additional fields like `retryable: bool` cannot be added to `StreamError` or serialized via `to_sse()`. Instead, retry signaling for transient Anthropic API errors is done via the `COPILOT_RETRYABLE_ERROR_PREFIX` constant prepended to persisted session messages (in `ChatMessage.content`). The frontend detects retryable errors by checking `markerType === "retryable_error"` from `parseSpecialMarkers()` — no SSE schema changes and no string matching on error text. This pattern was established in PR `#12445`, commit 64d82797b.
Applied to files:
autogpt_platform/backend/backend/copilot/sdk/service.py
📚 Learning: 2026-03-17T10:57:10.126Z
Learnt from: majdyz
Repo: Significant-Gravitas/AutoGPT PR: 12440
File: autogpt_platform/backend/backend/copilot/workflow_import/converter.py:0-0
Timestamp: 2026-03-17T10:57:10.126Z
Learning: In Significant-Gravitas/AutoGPT PR `#12440`, `autogpt_platform/backend/backend/copilot/workflow_import/converter.py` was fully rewritten (commit 732960e2d) to no longer make direct LLM/OpenAI API calls. The converter now builds a structured text prompt for AutoPilot/CoPilot instead. There is no `response.choices` access or any direct LLM client usage in this file. Do not flag `response.choices` access or LLM client initialization patterns as issues in this file.
Applied to files:
autogpt_platform/backend/backend/copilot/sdk/service.py
📚 Learning: 2026-03-16T17:00:02.827Z
Learnt from: majdyz
Repo: Significant-Gravitas/AutoGPT PR: 12439
File: autogpt_platform/backend/backend/blocks/autogpt_copilot.py:0-0
Timestamp: 2026-03-16T17:00:02.827Z
Learning: In autogpt_platform/backend/backend/blocks/autogpt_copilot.py, the recursion guard uses two module-level ContextVars: `_copilot_recursion_depth` (tracks current nesting depth) and `_copilot_recursion_limit` (stores the chain-wide ceiling). On the first invocation, `_copilot_recursion_limit` is set to `max_recursion_depth`; nested calls use `min(inherited_limit, max_recursion_depth)`, so they can only lower the cap, never raise it. The entry/exit logic is extracted into module-level helper functions. This is the approved pattern for preventing runaway sub-agent recursion in AutogptCopilotBlock (PR `#12439`, commits 348e9f8e2 and 3b70f61b1).
Applied to files:
autogpt_platform/backend/backend/copilot/sdk/service.py
📚 Learning: 2026-03-15T15:30:09.706Z
Learnt from: majdyz
Repo: Significant-Gravitas/AutoGPT PR: 12385
File: autogpt_platform/backend/backend/copilot/tools/helpers.py:149-185
Timestamp: 2026-03-15T15:30:09.706Z
Learning: In autogpt_platform/backend/backend/copilot/tools/helpers.py, within execute_block, when InsufficientBalanceError occurs after post-execution credit charging (concurrent balance drain after pre-check passed), this is treated as a non-fatal billing leak: log at ERROR level with structured JSON fields `{"billing_leak": True, "user_id": ..., "cost": ...}` for monitoring/alerting, then return BlockOutputResponse normally. Discarding the output would worsen UX since the block already executed with potential side effects. Reuse the credit_model obtained during the pre-execution balance check (guarded by `if cost > 0 and credit_model:`) for the post-execution charge; do not perform a second get_user_credit_model call.
Applied to files:
autogpt_platform/backend/backend/copilot/sdk/service.py
📚 Learning: 2026-03-20T09:30:22.638Z
Learnt from: CR
Repo: Significant-Gravitas/AutoGPT PR: 0
File: autogpt_platform/backend/CLAUDE.md:0-0
Timestamp: 2026-03-20T09:30:22.638Z
Learning: Applies to autogpt_platform/backend/**/*.py : Sanitize error paths using `os.path.basename()` in error messages to avoid leaking directory structure
Applied to files:
autogpt_platform/backend/backend/copilot/sdk/service.py
📚 Learning: 2026-02-26T17:02:22.448Z
Learnt from: Pwuts
Repo: Significant-Gravitas/AutoGPT PR: 12211
File: .pre-commit-config.yaml:160-179
Timestamp: 2026-02-26T17:02:22.448Z
Learning: Keep the pre-commit hook pattern broad for autogpt_platform/backend to ensure OpenAPI schema changes are captured. Do not narrow to backend/api/ alone, since the generated schema depends on Pydantic models across multiple directories (backend/data/, backend/blocks/, backend/copilot/, backend/integrations/, backend/util/). Narrowing could miss schema changes and cause frontend type desynchronization.
Applied to files:
autogpt_platform/backend/backend/copilot/sdk/service.py
📚 Learning: 2026-03-04T08:04:35.881Z
Learnt from: majdyz
Repo: Significant-Gravitas/AutoGPT PR: 12273
File: autogpt_platform/backend/backend/copilot/tools/workspace_files.py:216-220
Timestamp: 2026-03-04T08:04:35.881Z
Learning: In the AutoGPT Copilot backend, ensure that SVG images are not treated as vision image types by excluding 'image/svg+xml' from INLINEABLE_MIME_TYPES and MULTIMODAL_TYPES in tool_adapter.py; the Claude API supports PNG, JPEG, GIF, and WebP for vision. SVGs (XML text) should be handled via the text path instead, not the vision path.
Applied to files:
autogpt_platform/backend/backend/copilot/sdk/service.py
📚 Learning: 2026-03-05T15:42:08.207Z
Learnt from: ntindle
Repo: Significant-Gravitas/AutoGPT PR: 12297
File: .claude/skills/backend-check/SKILL.md:14-16
Timestamp: 2026-03-05T15:42:08.207Z
Learning: In Python files under autogpt_platform/backend (recursively), rely on poetry run format to perform formatting (Black + isort) and linting (ruff). Do not run poetry run lint as a separate step after poetry run format, since format already includes linting checks.
Applied to files:
autogpt_platform/backend/backend/copilot/sdk/service.py
📚 Learning: 2026-03-16T16:35:40.236Z
Learnt from: majdyz
Repo: Significant-Gravitas/AutoGPT PR: 12440
File: autogpt_platform/backend/backend/api/features/workflow_import.py:54-63
Timestamp: 2026-03-16T16:35:40.236Z
Learning: Avoid using the word 'competitor' in public-facing identifiers and text. Use neutral naming for API paths, model names, function names, and UI text. Examples: rename 'CompetitorFormat' to 'SourcePlatform', 'convert_competitor_workflow' to 'convert_workflow', '/competitor-workflow' to '/workflow'. Apply this guideline to files under autogpt_platform/backend and autogpt_platform/frontend.
Applied to files:
autogpt_platform/backend/backend/copilot/sdk/service.py
After rolling back session.messages on transient/stream errors, the transcript_builder still contains entries from the aborted attempt. Set skip_transcript_upload=True so the finally block won't upload stale transcript data that could corrupt future --resume context. Addresses CodeRabbit review feedback on PR #12499.
|
🤖 Fixed in c65703a: Added |
- Simplify heredoc example to use conventional EOF delimiter - Reduce log noise: full diagnostics only on first empty tool call
Sentry found that @@agptfile: expansion mutates args between the _check_circuit_breaker call (pre-expansion) and _record_tool_failure call (post-expansion), causing mismatched fingerprints. Fix by saving original_args before expansion and using it consistently for both check and record operations.
Make the "Writing large files" prompt more specific: - Call out the research-compilation scenario explicitly - Lower the threshold guidance to ~2000 words - Show section-by-section pattern with cat > / cat >> - Explain that each bash_exec should contain one section
After rolling back session.messages on transient/stream errors, the transcript_builder still contains entries from the aborted attempt. Set skip_transcript_upload=True so the finally block won't upload stale transcript data that could corrupt future --resume context. Addresses CodeRabbit review feedback on PR #12499.
The system prompt example for writing large files implied the LLM should generate content from scratch. Updated to make clear it should use data already collected from prior tool calls (web_fetch, run_block, etc.) rather than re-generating or re-fetching.
…writes When tool outputs are already in files, the LLM can compose a report in a single bash_exec call using @@agptfile: references instead of writing section-by-section. The section-by-section approach is now documented as a fallback for when content must be generated from conversation context.
Empty input ({}) tool calls are never legitimate — they indicate context
saturation or serialization failure. 3 consecutive empty calls is
conclusive; no need to wait for 6.
…or and extract empty-tool-call breaker - Renamed _TransientErrorHandled to _HandledStreamError with code/retryable attributes so the handler can log specific error types without assuming transient - Extracted inline empty-tool-call detection into _check_empty_tool_breaker helper with _EmptyToolBreakResult dataclass, reducing _run_stream_attempt complexity
The tool-level circuit breaker's failure counters persisted across retry attempts, causing counts from a rolled-back attempt to carry over and prematurely trip the breaker on a fresh attempt with reduced context.
a9e2fed to
56729f7
Compare
Issues attributed to commits in this pull requestThis pull request was merged and Sentry observed the following issues:
|
#12507) ## Summary - Adds `/pr-test` skill for automated E2E testing of PRs using docker compose, agent-browser, and API calls - Covers full environment setup (copy .env, configure copilot auth, ARM64 Docker fix) - Includes browser UI testing, direct API testing, screenshot capture, and test report generation - Has `--fix` mode for auto-fixing bugs found during testing (similar to `/pr-address`) - **Screenshot uploads use GitHub Git API** (blobs → tree → commit → ref) — no local git operations, safe for worktrees - **Subscription mode improvements:** - Extract subscription auth logic to `sdk/subscription.py` — uses SDK's bundled CLI binary instead of requiring `npm install -g @anthropic-ai/claude-code` - Auto-provision `~/.claude/.credentials.json` from `CLAUDE_CODE_OAUTH_TOKEN` env var on container startup — no `claude login` needed in Docker - Add `scripts/refresh_claude_token.sh` — cross-platform helper (macOS/Linux/Windows) to extract OAuth tokens from host and update `backend/.env` ## Test plan - [x] Validated skill on multiple PRs (#12482, #12483, #12499, #12500, #12501, #12440, #12472) — all test scenarios passed - [x] Confirmed screenshot upload via GitHub Git API renders correctly on all 7 PRs - [x] Verified subscription mode E2E in Docker: `refresh_claude_token.sh` → `docker compose up` → copilot chat responds correctly with no API keys (pure OAuth subscription) - [x] Verified auto-provisioning of credentials file inside container from `CLAUDE_CODE_OAUTH_TOKEN` env var - [x] Confirmed bundled CLI detection (`claude_agent_sdk._bundled/claude`) works without system-installed `claude` - [x] `poetry run pytest backend/copilot/sdk/service_test.py` — 24/24 tests pass
The empty-tool-call detector treats any ``ToolUseBlock`` whose ``input``
is falsy as a saturation-failure marker (model emits ``{}`` because
context-saturation broke argument serialization, the original bug fixed
in #12499). But ``get_agent_building_guide`` and ``get_mcp_guide`` both
declare ``properties: {}, required: []`` — the model legitimately calls
them with empty input. The detector was counting those as failures and
spamming Sentry warnings on every long-running session that touched
either tool.
Fix: cache a frozenset of tool names whose schema is fully no-arg
(both bare and ``mcp__copilot__``-prefixed forms because
``ToolUseBlock.name`` carries the MCP prefix), and exclude those from
the breaker's empty-tool list. Counter still trips for any tool that
DOES require args but receives ``{}`` input (the saturation failure
mode the breaker exists for).
Tests:
- ``test_no_arg_tool_names_includes_mcp_prefixed_form`` — pins the
registry-derived set against the actual prefix the SDK reports.
- ``test_no_arg_tool_with_empty_input_does_not_trip_counter`` —
consecutive counter resets to 0 even when starting at 4 (just-below
trip threshold), proving the false-positive can't trip a long session
even after many no-arg calls.
Summary
input: {}), aborts the stream entirely with a user-visible error and retry buttonBackground
In session
c5548b48, the model completed all research successfully but then spent 51+ minutes in an infinite loop trying to write output — every tool call was sent withinput: {}(likely due to context saturation preventing argument serialization). 21+ identical failing tool calls with no circuit breaker.Changes
tool_adapter.py: Added_check_circuit_breaker,_record_tool_failure,_clear_tool_failuresfunctions with aContextVar-based tracker. Integrated into bothcreate_tool_handler(BaseTool) and the_truncatingwrapper (all tools).service.py: Added empty-tool-call detection in the main stream loop that counts consecutiveAssistantMessages with emptyToolUseBlock.inputand aborts after the limit.test_circuit_breaker.py: 7 unit tests covering threshold behavior, per-args tracking, reset on success, and uninitialized tracker safety.Test plan
pytest backend/copilot/sdk/test_circuit_breaker.py— 8/8 passing)