fix(backend/copilot): preserve interrupted SDK partial work on final-failure exit#12918
Conversation
…failure exit SECRT-2275 — when an SDK turn was interrupted (transient API errors with exhausted retries, mid-stream LLM exceptions, or context-overflow with all attempts exhausted) the retry loop's pre-decision rollback discarded the assistant's partial work (text + tool calls + reasoning) that had been incrementally appended to session.messages during the failed attempt. Users described it as "the turn is gone": their UI streamed tokens live, then a refresh showed an empty turn and the next message would prompt the model to "continue" with no context, so it picked an unrelated old task. Fix: capture the rolled-back partial in the retry-loop exception handlers and re-attach it via a single helper on every final-failure branch (including the events_yielded > 0 path that previously skipped the error marker entirely and the non-context-non-transient + attempts-exhausted paths). Synthesize "interrupted" tool_result rows for any orphan tool_use so the next turn's LLM context stays API-valid. Successful retry breaks clear the captured partial so attempt #1's rolled-back content doesn't leak into a successful attempt #2's history. Baseline path already preserves partial via its existing finally block; only SDK was affected.
|
Note Reviews pausedIt looks like this branch is under active development. To avoid overwhelming you with review comments due to an influx of new commits, CodeRabbit has automatically paused this review. You can configure this behavior by changing the Use the following commands to manage reviews:
Use the checkboxes below for quick actions:
WalkthroughAdds a test module and refactors SDK streaming/failure handling to capture per-attempt partial assistant/tool output, roll back session/transcript state on interrupted attempts, flush unresolved tool uses into synthetic tool messages, and restore the captured partial once on final failure with a single error marker. Changes
Sequence Diagram(s)sequenceDiagram
participant Client as Client
participant SDK as SDKService
participant Adapter as ResponseAdapter
participant Session as ChatSession
Client->>SDK: start stream_chat_completion_sdk()
SDK->>Adapter: begin attempt (streaming)
Adapter-->>SDK: assistant/tool messages appended to session
Adapter-->>SDK: report unresolved tool calls
SDK->>SDK: attempt fails (exception)
SDK->>Session: capture rolled-back session messages (partial)
SDK->>SDK: snapshot/restore TranscriptBuilder state
alt Adapter has unresolved tool calls
SDK->>Adapter: flush_unresolved_tool_calls()
Adapter-->>Session: insert synthetic `tool` messages into session partial
Adapter-->>SDK: return same tool-output events to emit
end
alt retries remain
SDK->>SDK: retry loop (partial preserved)
else final failure / no retries
SDK->>Session: restore captured partial into session
SDK->>Session: append single copilot error marker (retryable/non-retryable)
SDK->>Client: emit StreamError if not already yielded
end
Estimated code review effort🎯 4 (Complex) | ⏱️ ~45 minutes Possibly related PRs
Suggested labels
Suggested reviewers
Poem
🚥 Pre-merge checks | ✅ 4 | ❌ 1❌ Failed checks (1 warning)
✅ Passed checks (4 passed)
✏️ Tip: You can configure your own custom pre-merge checks in the settings. ✨ Finishing Touches🧪 Generate unit tests (beta)
Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out. Comment |
🔍 PR Overlap DetectionThis check compares your PR against all other open PRs targeting the same branch to detect potential merge conflicts early. 🔴 Merge Conflicts DetectedThe following PRs have been tested and will have merge conflicts if merged after this PR. Consider coordinating with the authors.
🟢 Low Risk — File Overlap OnlyThese PRs touch the same files but different sections (click to expand)
Summary: 5 conflict(s), 0 medium risk, 2 low risk (out of 7 PRs with file overlap) Auto-generated on push. Ignores: |
There was a problem hiding this comment.
Actionable comments posted: 1
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Inline comments:
In `@autogpt_platform/backend/backend/copilot/sdk/service.py`:
- Around line 615-631: The captured partial in
_rollback_attempt_capturing_partial currently slices session.messages from
pre_attempt_msg_count and therefore can include an already-appended error marker
inserted by _run_stream_attempt on _HandledStreamError paths (idle timeout /
empty-tool breaker); update _rollback_attempt_capturing_partial to filter out
any trailing error-marker message(s) when building captured (e.g., drop final
messages that match the error-marker shape/flag) so that
_restore_partial_with_error_marker does not replay stale markers or duplicate
them, while still restoring transcript via
transcript_builder.restore(transcript_snap) and returning only the true
assistant work to be replayed on final failure. Ensure the detection logic
matches whatever marker identity _run_stream_attempt uses (type/flag/content) so
it won't remove legitimate assistant messages.
🪄 Autofix (Beta)
Fix all unresolved CodeRabbit comments on this PR:
- Push a commit to this branch (recommended)
- Create a new PR with the fixes
ℹ️ Review info
⚙️ Run configuration
Configuration used: Organization UI
Review profile: CHILL
Plan: Pro
Run ID: 3cfb1bc1-ba5e-400a-bc3a-8eb568adc248
📒 Files selected for processing (2)
autogpt_platform/backend/backend/copilot/sdk/interrupted_partial_test.pyautogpt_platform/backend/backend/copilot/sdk/service.py
📜 Review details
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (9)
- GitHub Check: check API types
- GitHub Check: Seer Code Review
- GitHub Check: end-to-end tests
- GitHub Check: test (3.11)
- GitHub Check: test (3.12)
- GitHub Check: test (3.13)
- GitHub Check: type-check (3.13)
- GitHub Check: Check PR Status
- GitHub Check: Analyze (python)
🧰 Additional context used
📓 Path-based instructions (3)
autogpt_platform/backend/**/*.py
📄 CodeRabbit inference engine (.github/copilot-instructions.md)
autogpt_platform/backend/**/*.py: Use Python 3.11 (required; managed by Poetry via pyproject.toml) for backend development
Always run 'poetry run format' (Black + isort) before linting in backend development
Always run 'poetry run lint' (ruff) after formatting in backend development
autogpt_platform/backend/**/*.py: Usepoetry run ...command for executing Python package dependencies
Use top-level imports only — avoid local/inner imports except for lazy imports of heavy optional dependencies likeopenpyxl
Use absolute imports withfrom backend.module import ...for cross-package imports; single-dot relative imports are acceptable for sibling modules within the same package; avoid double-dot relative imports
Do not use duck typing — avoidhasattr/getattr/isinstancefor type dispatch; use typed interfaces/unions/protocols instead
Use Pydantic models over dataclass/namedtuple/dict for structured data
Do not use linter suppressors — no# type: ignore,# noqa,# pyright: ignore; fix the type/code instead
Prefer list comprehensions over manual loop-and-append patterns
Use early return with guard clauses first to avoid deep nesting
Use%sfor deferred interpolation indebuglog statements for efficiency; use f-strings elsewhere for readability (e.g.,logger.debug("Processing %s items", count)vslogger.info(f"Processing {count} items"))
Sanitize error paths by usingos.path.basename()in error messages to avoid leaking directory structure
Be aware of TOCTOU (Time-Of-Check-Time-Of-Use) issues — avoid check-then-act patterns for file access and credit charging
Usetransaction=Truefor Redis pipelines to ensure atomicity on multi-step operations
Usemax(0, value)guards for computed values that should never be negative
Keep files under ~300 lines; if a file grows beyond this, split by responsibility (extract helpers, models, or a sub-module into a new file)
Keep functions under ~40 lines; extract named helpers when a function grows longer
...
Files:
autogpt_platform/backend/backend/copilot/sdk/interrupted_partial_test.pyautogpt_platform/backend/backend/copilot/sdk/service.py
autogpt_platform/{backend,autogpt_libs}/**/*.py
📄 CodeRabbit inference engine (AGENTS.md)
Format Python code with
poetry run format
Files:
autogpt_platform/backend/backend/copilot/sdk/interrupted_partial_test.pyautogpt_platform/backend/backend/copilot/sdk/service.py
autogpt_platform/backend/**/*_test.py
📄 CodeRabbit inference engine (autogpt_platform/backend/AGENTS.md)
autogpt_platform/backend/**/*_test.py: Use pytest with snapshot testing for API responses
Colocate test files with source files using*_test.pynaming convention
Mock at boundaries — mock where the symbol is used, not where it's defined; after refactoring, update mock targets to match new module paths
UseAsyncMockfromunittest.mockfor async functions in tests
When writing tests, use Test-Driven Development (TDD): write failing tests marked with@pytest.mark.xfailbefore implementation, then remove the marker once the implementation is complete
When creating snapshots in tests, usepoetry run pytest path/to/test.py --snapshot-update; always review snapshot changes withgit diffbefore committing
Files:
autogpt_platform/backend/backend/copilot/sdk/interrupted_partial_test.py
🧠 Learnings (27)
📓 Common learnings
Learnt from: majdyz
Repo: Significant-Gravitas/AutoGPT PR: 12797
File: autogpt_platform/backend/backend/copilot/sdk/service.py:1991-2021
Timestamp: 2026-04-15T13:44:34.273Z
Learning: In `autogpt_platform/backend/backend/copilot/sdk/service.py` (`_run_stream_attempt`), the pre-create block (PR `#12797`) intentionally does NOT call `state.transcript_builder.append_assistant(...)` when inserting the empty assistant placeholder into `ctx.session.messages`. The transcript is left ending at the `tool_result` entry (N entries) while `message_count` metadata is N+1. This mismatch is benign and deliberate: on the next `--resume`, the SDK sees the transcript ending at `tool_result` and correctly regenerates the assistant response. Pre-appending the assistant turn to the transcript would suppress regeneration while leaving `session.messages[-1].content = ""` permanently (worse outcome). On the gap-fallback path, `transcript_msg_count (N+1) >= msg_count-1 (N)` means no gap is injected for the empty placeholder, which is correct because injecting an empty assistant message as context would mislead the SDK. Do NOT flag this transcript/message_count discrepancy as a bug.
Learnt from: majdyz
Repo: Significant-Gravitas/AutoGPT PR: 12773
File: autogpt_platform/backend/backend/blocks/autopilot.py:631-638
Timestamp: 2026-04-14T07:35:11.464Z
Learning: In `autogpt_platform/backend/backend/copilot/executor/utils.py`, `CoPilotExecutionEntry` includes a `permissions: CopilotPermissions | None` field (added in PR `#12773` / commit a0184c87b9). `enqueue_copilot_turn` accepts and serializes this field into the queue entry, `_enqueue_for_recovery` in `autopilot.py` accepts and forwards `permissions` to `enqueue_copilot_turn`, and `_execute_async` in `processor.py` restores `entry.permissions` and passes it into `stream_chat_completion_sdk`/`stream_chat_completion_baseline` via `set_execution_context`. This ensures recovered sub-agent turns respect the same tool/block permission ceiling as the original in-process execution (mirroring `_merge_inherited_permissions`). Do NOT flag recovered turns as losing their permission ceiling — it is now fully propagated through the queue.
Learnt from: majdyz
Repo: Significant-Gravitas/AutoGPT PR: 12445
File: autogpt_platform/backend/backend/copilot/sdk/service.py:1071-1072
Timestamp: 2026-03-17T06:48:26.471Z
Learning: In Significant-Gravitas/AutoGPT (autogpt_platform), the AI SDK enforces `z.strictObject({type, errorText})` on SSE `StreamError` responses, so additional fields like `retryable: bool` cannot be added to `StreamError` or serialized via `to_sse()`. Instead, retry signaling for transient Anthropic API errors is done via the `COPILOT_RETRYABLE_ERROR_PREFIX` constant prepended to persisted session messages (in `ChatMessage.content`). The frontend detects retryable errors by checking `markerType === "retryable_error"` from `parseSpecialMarkers()` — no SSE schema changes and no string matching on error text. This pattern was established in PR `#12445`, commit 64d82797b.
Learnt from: majdyz
Repo: Significant-Gravitas/AutoGPT PR: 12773
File: autogpt_platform/backend/backend/copilot/pending_messages.py:52-64
Timestamp: 2026-04-14T14:36:25.545Z
Learning: In `autogpt_platform/backend/backend/copilot` (PR `#12773`, commit d7bced0c6): when draining pending messages into `session.messages`, each message's text is sanitized via `strip_user_context_tags` before persistence to prevent user-controlled `<user_context>` injection from bypassing the trusted server-side context prefix. Additionally, if `upsert_chat_session` fails after draining, the drained `PendingMessage` objects are requeued back to Redis to avoid silent message loss. Do NOT flag the drain-then-requeue pattern as redundant — it is the intentional failure-resilience strategy for the pending buffer.
Learnt from: majdyz
Repo: Significant-Gravitas/AutoGPT PR: 12879
File: autogpt_platform/backend/backend/copilot/baseline/service.py:0-0
Timestamp: 2026-04-22T05:57:34.861Z
Learning: In `autogpt_platform/backend/backend/copilot/baseline/service.py`, the approved pattern for `_run_task_subagent` (PR `#12879`, commit 187f0a5) uses a nested `try/except Exception` inside an outer `try/finally`. The outer `finally` block resets `_TASK_DEPTH_VAR` (via `_TASK_DEPTH_VAR.reset(token)`) AND calls `_absorb_inner_usage(parent_state, inner_state)` unconditionally, so both the depth ContextVar and usage roll-up are guaranteed on all exit paths including `CancelledError`/`KeyboardInterrupt`/`SystemExit`. The inner `except Exception` catches and converts failures into a `TaskResponse` error payload that is returned as `StreamToolOutputAvailable`. Do NOT flag missing ContextVar reset or usage roll-up on BaseException paths in this function.
Learnt from: majdyz
Repo: Significant-Gravitas/AutoGPT PR: 0
File: :0-0
Timestamp: 2026-04-23T00:07:27.117Z
Learning: In `autogpt_platform/backend/backend/copilot/sdk/service.py`, background tasks that persist cost or emit Langfuse backfill (e.g. the cost-reconcile task) must be anchored to `_background_tasks` using `_background_tasks.add(task)` and `task.add_done_callback(_background_tasks.discard)`, mirroring the existing pattern at lines 3063 / 4232 / 4256. This prevents the asyncio task from being garbage-collected before persistence or Langfuse emission completes. Do NOT flag the absence of this anchoring as acceptable in this file. Established in PR `#12889` commit 5ce3d0388.
Learnt from: majdyz
Repo: Significant-Gravitas/AutoGPT PR: 12881
File: autogpt_platform/backend/backend/copilot/sdk/service.py:0-0
Timestamp: 2026-04-22T12:26:42.571Z
Learning: In `autogpt_platform/backend/backend/copilot/sdk/service.py`, `_resolve_sdk_model_for_request`: when a per-user LaunchDarkly model value fails `_normalize_model_name` (e.g. a `moonshotai/kimi-*` slug in direct-Anthropic mode), the fallback must be tier-specific — `config.thinking_advanced_model` for advanced tier, `config.thinking_standard_model` for standard tier — NOT the generic `_resolve_sdk_model()` (which is standard-only and returns None under subscription mode). If the tier-specific config default also fails `_normalize_model_name`, re-raise the original LD error; this is a deployment-level misconfiguration that `model_validator` should have caught at startup. Established in PR `#12881` commit 637d2fef5.
Learnt from: majdyz
Repo: Significant-Gravitas/AutoGPT PR: 0
File: :0-0
Timestamp: 2026-04-23T01:26:38.257Z
Learning: In `autogpt_platform/backend/backend/copilot/sdk/service.py`, `langfuse_trace_id = get_client().get_current_trace_id()` must be captured under the `if _lf_span is not None:` guard (before `_lf_span` is torn down), NOT under `if _otel_ctx is not None:`. The `_otel_ctx` guard is too narrow: if `propagate_attributes().__enter__()` raises, `_otel_ctx` is never assigned, and placing the trace-id capture there would silently orphan the `openrouter-cost-reconcile` event from its parent span. Established in PR `#12889` commit d243bf6c9.
Learnt from: majdyz
Repo: Significant-Gravitas/AutoGPT PR: 12356
File: autogpt_platform/backend/backend/copilot/constants.py:9-12
Timestamp: 2026-03-10T08:39:22.025Z
Learning: In Significant-Gravitas/AutoGPT PR `#12356`, the `COPILOT_SYNTHETIC_ID_PREFIX = "copilot-"` check in `create_auto_approval_record` (human_review.py) is intentional and safe. The `graph_exec_id` passed to this function comes from server-side `PendingHumanReview` DB records (not from user input); the API only accepts `node_exec_id` from users. Synthetic `copilot-*` IDs are only ever created server-side in `run_block.py`. The prefix skip avoids a DB lookup for a `AgentGraphExecution` record that legitimately does not exist for CoPilot sessions, while `user_id` scoping is enforced at the auth layer and on the resulting auto-approval record.
Learnt from: majdyz
Repo: Significant-Gravitas/AutoGPT PR: 12796
File: autogpt_platform/backend/backend/api/features/chat/routes.py:504-527
Timestamp: 2026-04-16T12:33:44.990Z
Learning: In `autogpt_platform/backend/backend/api/features/chat/routes.py`, `get_session` (PR `#12796`, commit 3771bfad9c1) closes the TOCTOU race between the initial `stream_registry.get_active_session()` pre-check and `get_chat_messages_paginated()` with a post-check re-verification: after the DB fetch, if `is_initial_load and active_session is not None`, it calls `get_active_session` a second time; if `post_active is None` (stream completed during the window), it resets `from_start=True`, `forward_paginated=True`, and re-fetches messages from sequence 0. Do NOT flag the double `get_active_session` call pattern as redundant — it is the intentional TOCTOU mitigation for pagination direction selection.
📚 Learning: 2026-04-15T13:44:34.273Z
Learnt from: majdyz
Repo: Significant-Gravitas/AutoGPT PR: 12797
File: autogpt_platform/backend/backend/copilot/sdk/service.py:1991-2021
Timestamp: 2026-04-15T13:44:34.273Z
Learning: In `autogpt_platform/backend/backend/copilot/sdk/service.py` (`_run_stream_attempt`), the pre-create block (PR `#12797`) intentionally does NOT call `state.transcript_builder.append_assistant(...)` when inserting the empty assistant placeholder into `ctx.session.messages`. The transcript is left ending at the `tool_result` entry (N entries) while `message_count` metadata is N+1. This mismatch is benign and deliberate: on the next `--resume`, the SDK sees the transcript ending at `tool_result` and correctly regenerates the assistant response. Pre-appending the assistant turn to the transcript would suppress regeneration while leaving `session.messages[-1].content = ""` permanently (worse outcome). On the gap-fallback path, `transcript_msg_count (N+1) >= msg_count-1 (N)` means no gap is injected for the empty placeholder, which is correct because injecting an empty assistant message as context would mislead the SDK. Do NOT flag this transcript/message_count discrepancy as a bug.
Applied to files:
autogpt_platform/backend/backend/copilot/sdk/interrupted_partial_test.pyautogpt_platform/backend/backend/copilot/sdk/service.py
📚 Learning: 2026-03-26T07:00:03.405Z
Learnt from: majdyz
Repo: Significant-Gravitas/AutoGPT PR: 12574
File: autogpt_platform/backend/backend/copilot/sdk/transcript.py:980-990
Timestamp: 2026-03-26T07:00:03.405Z
Learning: In `autogpt_platform/backend/backend/copilot/sdk/transcript.py`, `_rechain_tail` intentionally rewrites `parentUuid` for **all** tail entries (not just the first), because a single assistant turn can span multiple consecutive JSONL entries sharing the same `message.id` (e.g., a thinking entry + a tool_use entry). Their original `parentUuid` values may reference entries that were absorbed into the compressed prefix, so sequential rechaining of the entire tail is required to maintain a valid parent→child graph. The test `test_chains_multiple_tail_entries` validates this: the second tail entry's `parentUuid` is rewritten from its original value to the uuid of the first tail entry.
Applied to files:
autogpt_platform/backend/backend/copilot/sdk/interrupted_partial_test.py
📚 Learning: 2026-04-14T07:35:11.464Z
Learnt from: majdyz
Repo: Significant-Gravitas/AutoGPT PR: 12773
File: autogpt_platform/backend/backend/blocks/autopilot.py:631-638
Timestamp: 2026-04-14T07:35:11.464Z
Learning: In `autogpt_platform/backend/backend/copilot/executor/utils.py`, `CoPilotExecutionEntry` includes a `permissions: CopilotPermissions | None` field (added in PR `#12773` / commit a0184c87b9). `enqueue_copilot_turn` accepts and serializes this field into the queue entry, `_enqueue_for_recovery` in `autopilot.py` accepts and forwards `permissions` to `enqueue_copilot_turn`, and `_execute_async` in `processor.py` restores `entry.permissions` and passes it into `stream_chat_completion_sdk`/`stream_chat_completion_baseline` via `set_execution_context`. This ensures recovered sub-agent turns respect the same tool/block permission ceiling as the original in-process execution (mirroring `_merge_inherited_permissions`). Do NOT flag recovered turns as losing their permission ceiling — it is now fully propagated through the queue.
Applied to files:
autogpt_platform/backend/backend/copilot/sdk/interrupted_partial_test.pyautogpt_platform/backend/backend/copilot/sdk/service.py
📚 Learning: 2026-04-22T05:57:34.861Z
Learnt from: majdyz
Repo: Significant-Gravitas/AutoGPT PR: 12879
File: autogpt_platform/backend/backend/copilot/baseline/service.py:0-0
Timestamp: 2026-04-22T05:57:34.861Z
Learning: In `autogpt_platform/backend/backend/copilot/baseline/service.py`, the approved pattern for `_run_task_subagent` (PR `#12879`, commit 187f0a5) uses a nested `try/except Exception` inside an outer `try/finally`. The outer `finally` block resets `_TASK_DEPTH_VAR` (via `_TASK_DEPTH_VAR.reset(token)`) AND calls `_absorb_inner_usage(parent_state, inner_state)` unconditionally, so both the depth ContextVar and usage roll-up are guaranteed on all exit paths including `CancelledError`/`KeyboardInterrupt`/`SystemExit`. The inner `except Exception` catches and converts failures into a `TaskResponse` error payload that is returned as `StreamToolOutputAvailable`. Do NOT flag missing ContextVar reset or usage roll-up on BaseException paths in this function.
Applied to files:
autogpt_platform/backend/backend/copilot/sdk/interrupted_partial_test.pyautogpt_platform/backend/backend/copilot/sdk/service.py
📚 Learning: 2026-02-04T16:49:42.490Z
Learnt from: CR
Repo: Significant-Gravitas/AutoGPT PR: 0
File: .github/copilot-instructions.md:0-0
Timestamp: 2026-02-04T16:49:42.490Z
Learning: Applies to autogpt_platform/backend/**/test/**/*.py : Use snapshot testing with '--snapshot-update' flag in backend tests when output changes; always review with 'git diff'
Applied to files:
autogpt_platform/backend/backend/copilot/sdk/interrupted_partial_test.py
📚 Learning: 2026-03-17T06:48:26.471Z
Learnt from: majdyz
Repo: Significant-Gravitas/AutoGPT PR: 12445
File: autogpt_platform/backend/backend/copilot/sdk/service.py:1071-1072
Timestamp: 2026-03-17T06:48:26.471Z
Learning: In Significant-Gravitas/AutoGPT (autogpt_platform), the AI SDK enforces `z.strictObject({type, errorText})` on SSE `StreamError` responses, so additional fields like `retryable: bool` cannot be added to `StreamError` or serialized via `to_sse()`. Instead, retry signaling for transient Anthropic API errors is done via the `COPILOT_RETRYABLE_ERROR_PREFIX` constant prepended to persisted session messages (in `ChatMessage.content`). The frontend detects retryable errors by checking `markerType === "retryable_error"` from `parseSpecialMarkers()` — no SSE schema changes and no string matching on error text. This pattern was established in PR `#12445`, commit 64d82797b.
Applied to files:
autogpt_platform/backend/backend/copilot/sdk/interrupted_partial_test.pyautogpt_platform/backend/backend/copilot/sdk/service.py
📚 Learning: 2026-04-23T00:07:27.117Z
Learnt from: majdyz
Repo: Significant-Gravitas/AutoGPT PR: 0
File: :0-0
Timestamp: 2026-04-23T00:07:27.117Z
Learning: In `autogpt_platform/backend/backend/copilot/sdk/service.py`, background tasks that persist cost or emit Langfuse backfill (e.g. the cost-reconcile task) must be anchored to `_background_tasks` using `_background_tasks.add(task)` and `task.add_done_callback(_background_tasks.discard)`, mirroring the existing pattern at lines 3063 / 4232 / 4256. This prevents the asyncio task from being garbage-collected before persistence or Langfuse emission completes. Do NOT flag the absence of this anchoring as acceptable in this file. Established in PR `#12889` commit 5ce3d0388.
Applied to files:
autogpt_platform/backend/backend/copilot/sdk/interrupted_partial_test.py
📚 Learning: 2026-04-22T12:26:42.571Z
Learnt from: majdyz
Repo: Significant-Gravitas/AutoGPT PR: 12881
File: autogpt_platform/backend/backend/copilot/sdk/service.py:0-0
Timestamp: 2026-04-22T12:26:42.571Z
Learning: In `autogpt_platform/backend/backend/copilot/sdk/service.py`, `_resolve_sdk_model_for_request`: when a per-user LaunchDarkly model value fails `_normalize_model_name` (e.g. a `moonshotai/kimi-*` slug in direct-Anthropic mode), the fallback must be tier-specific — `config.thinking_advanced_model` for advanced tier, `config.thinking_standard_model` for standard tier — NOT the generic `_resolve_sdk_model()` (which is standard-only and returns None under subscription mode). If the tier-specific config default also fails `_normalize_model_name`, re-raise the original LD error; this is a deployment-level misconfiguration that `model_validator` should have caught at startup. Established in PR `#12881` commit 637d2fef5.
Applied to files:
autogpt_platform/backend/backend/copilot/sdk/interrupted_partial_test.py
📚 Learning: 2026-04-08T17:28:23.439Z
Learnt from: CR
Repo: Significant-Gravitas/AutoGPT PR: 0
File: autogpt_platform/backend/AGENTS.md:0-0
Timestamp: 2026-04-08T17:28:23.439Z
Learning: Applies to autogpt_platform/backend/**/*_test.py : When writing tests, use Test-Driven Development (TDD): write failing tests marked with `pytest.mark.xfail` before implementation, then remove the marker once the implementation is complete
Applied to files:
autogpt_platform/backend/backend/copilot/sdk/interrupted_partial_test.py
📚 Learning: 2026-04-08T17:28:23.439Z
Learnt from: CR
Repo: Significant-Gravitas/AutoGPT PR: 0
File: autogpt_platform/backend/AGENTS.md:0-0
Timestamp: 2026-04-08T17:28:23.439Z
Learning: Applies to autogpt_platform/backend/**/*_test.py : When creating snapshots in tests, use `poetry run pytest path/to/test.py --snapshot-update`; always review snapshot changes with `git diff` before committing
Applied to files:
autogpt_platform/backend/backend/copilot/sdk/interrupted_partial_test.py
📚 Learning: 2026-02-26T17:02:22.448Z
Learnt from: Pwuts
Repo: Significant-Gravitas/AutoGPT PR: 12211
File: .pre-commit-config.yaml:160-179
Timestamp: 2026-02-26T17:02:22.448Z
Learning: Keep the pre-commit hook pattern broad for autogpt_platform/backend to ensure OpenAPI schema changes are captured. Do not narrow to backend/api/ alone, since the generated schema depends on Pydantic models across multiple directories (backend/data/, backend/blocks/, backend/copilot/, backend/integrations/, backend/util/). Narrowing could miss schema changes and cause frontend type desynchronization.
Applied to files:
autogpt_platform/backend/backend/copilot/sdk/interrupted_partial_test.pyautogpt_platform/backend/backend/copilot/sdk/service.py
📚 Learning: 2026-03-04T08:04:35.881Z
Learnt from: majdyz
Repo: Significant-Gravitas/AutoGPT PR: 12273
File: autogpt_platform/backend/backend/copilot/tools/workspace_files.py:216-220
Timestamp: 2026-03-04T08:04:35.881Z
Learning: In the AutoGPT Copilot backend, ensure that SVG images are not treated as vision image types by excluding 'image/svg+xml' from INLINEABLE_MIME_TYPES and MULTIMODAL_TYPES in tool_adapter.py; the Claude API supports PNG, JPEG, GIF, and WebP for vision. SVGs (XML text) should be handled via the text path instead, not the vision path.
Applied to files:
autogpt_platform/backend/backend/copilot/sdk/interrupted_partial_test.pyautogpt_platform/backend/backend/copilot/sdk/service.py
📚 Learning: 2026-04-01T04:17:41.600Z
Learnt from: majdyz
Repo: Significant-Gravitas/AutoGPT PR: 12632
File: autogpt_platform/backend/backend/copilot/tools/workspace_files.py:0-0
Timestamp: 2026-04-01T04:17:41.600Z
Learning: When reviewing AutoGPT Copilot tool implementations, accept that `readOnlyHint=True` (provided via `ToolAnnotations`) may be applied unconditionally to *all* tools—even tools that have side effects (e.g., `bash_exec`, `write_workspace_file`, or other write/save operations). Do **not** flag these tools for having `readOnlyHint=True`; this is intentional to enable fully-parallel dispatch by the Anthropic SDK/CLI and has been E2E validated. Only flag `readOnlyHint` issues if they conflict with the established `ToolAnnotations` behavior (e.g., missing/incorrect propagation relative to the intended annotation mechanism).
Applied to files:
autogpt_platform/backend/backend/copilot/sdk/interrupted_partial_test.pyautogpt_platform/backend/backend/copilot/sdk/service.py
📚 Learning: 2026-03-05T15:42:08.207Z
Learnt from: ntindle
Repo: Significant-Gravitas/AutoGPT PR: 12297
File: .claude/skills/backend-check/SKILL.md:14-16
Timestamp: 2026-03-05T15:42:08.207Z
Learning: In Python files under autogpt_platform/backend (recursively), rely on poetry run format to perform formatting (Black + isort) and linting (ruff). Do not run poetry run lint as a separate step after poetry run format, since format already includes linting checks.
Applied to files:
autogpt_platform/backend/backend/copilot/sdk/interrupted_partial_test.pyautogpt_platform/backend/backend/copilot/sdk/service.py
📚 Learning: 2026-03-16T16:35:40.236Z
Learnt from: majdyz
Repo: Significant-Gravitas/AutoGPT PR: 12440
File: autogpt_platform/backend/backend/api/features/workflow_import.py:54-63
Timestamp: 2026-03-16T16:35:40.236Z
Learning: Avoid using the word 'competitor' in public-facing identifiers and text. Use neutral naming for API paths, model names, function names, and UI text. Examples: rename 'CompetitorFormat' to 'SourcePlatform', 'convert_competitor_workflow' to 'convert_workflow', '/competitor-workflow' to '/workflow'. Apply this guideline to files under autogpt_platform/backend and autogpt_platform/frontend.
Applied to files:
autogpt_platform/backend/backend/copilot/sdk/interrupted_partial_test.pyautogpt_platform/backend/backend/copilot/sdk/service.py
📚 Learning: 2026-03-31T15:37:38.626Z
Learnt from: majdyz
Repo: Significant-Gravitas/AutoGPT PR: 12623
File: autogpt_platform/backend/backend/copilot/tools/agent_generator/fixer.py:37-47
Timestamp: 2026-03-31T15:37:38.626Z
Learning: When validating/constructing Anthropic API model IDs in Significant-Gravitas/AutoGPT, allow the hyphen-separated Claude Opus 4.6 model ID `claude-opus-4-6` (it corresponds to `LlmModel.CLAUDE_4_6_OPUS` in `autogpt_platform/backend/backend/blocks/llm.py`). Do NOT require the dot-separated form in Anthropic contexts. Only OpenRouter routing variants should use the dot separator (e.g., `anthropic/claude-opus-4.6`); `claude-opus-4-6` should be treated as correct when passed to Anthropic, and flagged only if it’s used in the OpenRouter path where the dot form is expected.
Applied to files:
autogpt_platform/backend/backend/copilot/sdk/interrupted_partial_test.pyautogpt_platform/backend/backend/copilot/sdk/service.py
📚 Learning: 2026-04-15T02:43:36.890Z
Learnt from: ntindle
Repo: Significant-Gravitas/AutoGPT PR: 12780
File: autogpt_platform/backend/backend/copilot/tools/workspace_files.py:0-0
Timestamp: 2026-04-15T02:43:36.890Z
Learning: When reviewing Python exception handlers, do not flag `isinstance(e, X)` checks as dead/unreachable if the caught exception `X` is a subclass of the exception type being handled. For example, if `X` (e.g., `VirusScanError`) inherits from `ValueError` (directly or via an intermediate class) and it can be raised within an `except ValueError:` block, then `isinstance(e, X)` inside that handler is reachable and should not be treated as dead code.
Applied to files:
autogpt_platform/backend/backend/copilot/sdk/interrupted_partial_test.pyautogpt_platform/backend/backend/copilot/sdk/service.py
📚 Learning: 2026-04-22T11:46:04.431Z
Learnt from: majdyz
Repo: Significant-Gravitas/AutoGPT PR: 12881
File: autogpt_platform/backend/backend/copilot/config.py:0-0
Timestamp: 2026-04-22T11:46:04.431Z
Learning: Do not flag the Claude Sonnet 4.6 model ID as incorrect when it uses the project’s established hyphenated convention: `anthropic/claude-sonnet-4-6`. This hyphen form is the intentional, production convention and should be treated as valid (including in files like llm.py, blocks tests, reasoning.py, `_is_anthropic_model` tests, and config defaults). Note that OpenRouter also accepts the dot variant `anthropic/claude-sonnet-4.6`, so either form may be tolerated, but `anthropic/claude-sonnet-4-6` should be considered the standard to match project usage.
Applied to files:
autogpt_platform/backend/backend/copilot/sdk/interrupted_partial_test.pyautogpt_platform/backend/backend/copilot/sdk/service.py
📚 Learning: 2026-04-22T11:46:12.892Z
Learnt from: majdyz
Repo: Significant-Gravitas/AutoGPT PR: 12881
File: autogpt_platform/backend/backend/copilot/baseline/service.py:322-332
Timestamp: 2026-04-22T11:46:12.892Z
Learning: In this codebase (Significant-Gravitas/AutoGPT), OpenRouter-routed Anthropic model IDs should use the hyphen-separated convention (e.g., `anthropic/claude-sonnet-4-6`, `anthropic/claude-opus-4-6`). Although OpenRouter may accept both hyphen and dot variants, treat the hyphen-separated form as the intended, correct codebase-wide convention and do not flag it as an error. Only flag the dot-separated variant (e.g., `anthropic/claude-sonnet-4.6`) as incorrect when reviewing/validating model ID strings for OpenRouter-routed Anthropic models.
Applied to files:
autogpt_platform/backend/backend/copilot/sdk/interrupted_partial_test.pyautogpt_platform/backend/backend/copilot/sdk/service.py
📚 Learning: 2026-04-14T14:36:25.545Z
Learnt from: majdyz
Repo: Significant-Gravitas/AutoGPT PR: 12773
File: autogpt_platform/backend/backend/copilot/pending_messages.py:52-64
Timestamp: 2026-04-14T14:36:25.545Z
Learning: In `autogpt_platform/backend/backend/copilot` (PR `#12773`, commit d7bced0c6): when draining pending messages into `session.messages`, each message's text is sanitized via `strip_user_context_tags` before persistence to prevent user-controlled `<user_context>` injection from bypassing the trusted server-side context prefix. Additionally, if `upsert_chat_session` fails after draining, the drained `PendingMessage` objects are requeued back to Redis to avoid silent message loss. Do NOT flag the drain-then-requeue pattern as redundant — it is the intentional failure-resilience strategy for the pending buffer.
Applied to files:
autogpt_platform/backend/backend/copilot/sdk/service.py
📚 Learning: 2026-04-03T11:14:45.569Z
Learnt from: majdyz
Repo: Significant-Gravitas/AutoGPT PR: 12623
File: autogpt_platform/backend/backend/copilot/baseline/service.py:0-0
Timestamp: 2026-04-03T11:14:45.569Z
Learning: In `autogpt_platform/backend/backend/copilot/baseline/service.py`, `transcript_builder.append_user(content=message)` is called unconditionally even when the message is a duplicate that was suppressed by the `is_new_message` guard. This is intentional: the downloaded transcript may be stale (uploaded before the previous attempt persisted the message), so always appending the current user turn prevents a malformed assistant-after-assistant transcript structure. The `is_user_message` flag is still checked (`if message and is_user_message:`), so assistant-role inputs are excluded. Do NOT flag this as a bug.
Applied to files:
autogpt_platform/backend/backend/copilot/sdk/service.py
📚 Learning: 2026-03-17T10:57:12.953Z
Learnt from: majdyz
Repo: Significant-Gravitas/AutoGPT PR: 12440
File: autogpt_platform/backend/backend/copilot/workflow_import/converter.py:0-0
Timestamp: 2026-03-17T10:57:12.953Z
Learning: In Significant-Gravitas/AutoGPT PR `#12440`, `autogpt_platform/backend/backend/copilot/workflow_import/converter.py` was fully rewritten (commit 732960e2d) to no longer make direct LLM/OpenAI API calls. The converter now builds a structured text prompt for AutoPilot/CoPilot instead. There is no `response.choices` access or any direct LLM client usage in this file. Do not flag `response.choices` access or LLM client initialization patterns as issues in this file.
Applied to files:
autogpt_platform/backend/backend/copilot/sdk/service.py
📚 Learning: 2026-04-16T13:28:20.824Z
Learnt from: majdyz
Repo: Significant-Gravitas/AutoGPT PR: 12814
File: autogpt_platform/backend/backend/copilot/model.py:661-679
Timestamp: 2026-04-16T13:28:20.824Z
Learning: In `autogpt_platform/backend/backend/copilot/model.py` (PR `#12814`, commit 259d37083): `append_and_save_message` acquires `_get_session_lock` — a redis-py built-in Lock at key `copilot:session_lock:{session_id}` (timeout=10s, blocking_timeout=2s) — to serialize concurrent writers across replicas. On Redis failure the lock is skipped with a warning and the function continues. Inside the lock it re-fetches the session via `get_chat_session` (cache-first), performs an idempotency check (`session.messages[-1].role == message.role and session.messages[-1].content == message.content`), and returns early if matched. On successful DB write but failed cache write, it calls `invalidate_session_cache(session_id)` (the pre-existing best-effort helper) to evict the stale cache entry so subsequent retries fall back to the authoritative DB. Do NOT expect `asyncio.Lock` or a manual NX poll loop (`copilot:msg_append:{session_id}`) — those were removed. Do NOT flag the `invalidate_session_cache` call on ...
Applied to files:
autogpt_platform/backend/backend/copilot/sdk/service.py
📚 Learning: 2026-04-16T12:33:44.990Z
Learnt from: majdyz
Repo: Significant-Gravitas/AutoGPT PR: 12796
File: autogpt_platform/backend/backend/api/features/chat/routes.py:504-527
Timestamp: 2026-04-16T12:33:44.990Z
Learning: In `autogpt_platform/backend/backend/api/features/chat/routes.py`, `get_session` (PR `#12796`, commit 3771bfad9c1) closes the TOCTOU race between the initial `stream_registry.get_active_session()` pre-check and `get_chat_messages_paginated()` with a post-check re-verification: after the DB fetch, if `is_initial_load and active_session is not None`, it calls `get_active_session` a second time; if `post_active is None` (stream completed during the window), it resets `from_start=True`, `forward_paginated=True`, and re-fetches messages from sequence 0. Do NOT flag the double `get_active_session` call pattern as redundant — it is the intentional TOCTOU mitigation for pagination direction selection.
Applied to files:
autogpt_platform/backend/backend/copilot/sdk/service.py
📚 Learning: 2026-04-21T11:41:05.877Z
Learnt from: majdyz
Repo: Significant-Gravitas/AutoGPT PR: 0
File: :0-0
Timestamp: 2026-04-21T11:41:05.877Z
Learning: In `autogpt_platform/backend/backend/copilot/baseline/service.py` (PR `#12870`, commits 080d42b9d and 3d7b38162), the `_close_reasoning_block_if_open(state)` helper centralises all four reasoning-block-close call sites (text branch, tool_calls branch, stream-end, exception path). The outer `finally` block of `_baseline_llm_caller` calls this helper plus stripper flush + `StreamTextEnd` to guarantee matched end events are emitted before `StreamFinishStep` on both normal and exception paths. Do NOT flag duplicated close logic or missing reasoning-end-on-exception as issues in this function.
Applied to files:
autogpt_platform/backend/backend/copilot/sdk/service.py
📚 Learning: 2026-04-23T01:26:38.257Z
Learnt from: majdyz
Repo: Significant-Gravitas/AutoGPT PR: 0
File: :0-0
Timestamp: 2026-04-23T01:26:38.257Z
Learning: In `autogpt_platform/backend/backend/copilot/sdk/service.py`, `langfuse_trace_id = get_client().get_current_trace_id()` must be captured under the `if _lf_span is not None:` guard (before `_lf_span` is torn down), NOT under `if _otel_ctx is not None:`. The `_otel_ctx` guard is too narrow: if `propagate_attributes().__enter__()` raises, `_otel_ctx` is never assigned, and placing the trace-id capture there would silently orphan the `openrouter-cost-reconcile` event from its parent span. Established in PR `#12889` commit d243bf6c9.
Applied to files:
autogpt_platform/backend/backend/copilot/sdk/service.py
Codecov Report❌ Patch coverage is Additional details and impacted files@@ Coverage Diff @@
## dev #12918 +/- ##
==========================================
+ Coverage 68.23% 68.32% +0.08%
==========================================
Files 1960 1961 +1
Lines 150178 150528 +350
Branches 15621 15639 +18
==========================================
+ Hits 102473 102841 +368
+ Misses 44664 44640 -24
- Partials 3041 3047 +6
Flags with carried forward coverage won't be shown. Click here to find out more.
🚀 New features to boost your workflow:
|
…block type-check The inline _restore_partial_with_error_marker calls across five retry-loop branches pushed stream_chat_completion_sdk past pyright's complexity heuristic (CI type-check failed on main). Consolidate into a single post-loop block keyed off ended_with_stream_error + the existing attempts_exhausted / transient_exhausted / stream_err flags, plus a new handled_error_info tuple that carries _HandledStreamError's final-yield decision out of the retry loop. Behaviour is unchanged — same restore semantics, same client-facing StreamError sequencing, same transcript-upload skip. Confirmed with 319 existing + new tests (backend/copilot/sdk + baseline). Pyright still bails on the function body (1500 LoC — the retry loop with context-overflow fallback + transient backoff + partial-work preservation shares too much state across branches to split cleanly without hurting readability). A file-targeted reportGeneralTypeIssues suppression covers the complexity bailout while keeping real type errors elsewhere in the file surfaced.
There was a problem hiding this comment.
♻️ Duplicate comments (1)
autogpt_platform/backend/backend/copilot/sdk/service.py (1)
615-631:⚠️ Potential issue | 🟠 MajorPartial still captures pre-appended error markers on
_HandledStreamErrorpaths.
captured = session.messages[pre_attempt_msg_count:]will still include the canonical error marker that_run_stream_attemptalready appended for the idle-timeout (line 2296) and circuit-breaker (line 2162) branches before raising_HandledStreamError(already_yielded=True). The consolidated restore block at lines 4072-4100 then calls_restore_partial_with_error_marker, which:
- Re-extends
session.messageswith the captured partial (re-inserting the stale marker).- Calls
_flush_orphan_tool_uses_to_session— can inject synthesizedtool_resultrows after that stale marker, producing an invalidassistant(error) → tool_resultordering if the next turn replays history beforestream_chat_completion_sdk's start-of-turn marker cleanup runs.- Appends a second copy of the same marker via
_append_error_marker.Result: duplicate error bubbles on idle-timeout / empty-tool-breaker turns, and a transiently malformed sequence if orphans are present. Next-turn cleanup (lines 3117-3125) only trims trailing markers, so a tool_result sandwiched between them leaves the earlier marker in place.
🛠️ Consider stripping trailing markers during capture
def _rollback_attempt_capturing_partial( session: "ChatSession", transcript_builder: "TranscriptBuilder", transcript_snap: object, pre_attempt_msg_count: int, ) -> list[ChatMessage]: @@ - captured = list(session.messages[pre_attempt_msg_count:]) + captured = list(session.messages[pre_attempt_msg_count:]) + while ( + captured + and captured[-1].role == "assistant" + and captured[-1].content + and ( + captured[-1].content.startswith(COPILOT_ERROR_PREFIX) + or captured[-1].content.startswith(COPILOT_RETRYABLE_ERROR_PREFIX) + ) + ): + captured.pop() session.messages = session.messages[:pre_attempt_msg_count] transcript_builder.restore(transcript_snap) # type: ignore[arg-type] return capturedA targeted regression test simulating an idle-timeout / empty-tool-breaker
_HandledStreamErrorafter partial assistant work would lock this down.🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@autogpt_platform/backend/backend/copilot/sdk/service.py` around lines 615 - 631, The captured list in _rollback_attempt_capturing_partial is including the canonical error marker appended earlier on _HandledStreamError paths, so change _rollback_attempt_capturing_partial to strip trailing error-marker messages from the captured slice before returning (e.g., trim any trailing assistant error-marker objects from captured by checking the same predicate used when appending markers), ensuring session.messages still rolls back to pre_attempt_msg_count; reference _restore_partial_with_error_marker and _append_error_marker/_HandledStreamError to validate behavior and add a regression test that simulates an idle-timeout/empty-tool-breaker _HandledStreamError after partial assistant output to assert no duplicate or sandwiched error markers are reintroduced.
🧹 Nitpick comments (2)
autogpt_platform/backend/backend/copilot/sdk/service.py (2)
3070-3088: File-level# pyright: ignore[reportGeneralTypeIssues]violates the no-suppressor rule.The rationale in the docstring is understood — the function is large and the retry/finalization state is tightly coupled — but the coding guideline forbids
# pyright: ignoresuppressors. Two viable alternatives:
- Extract the retry-loop body (and the consolidated failure-finalization block at 4068-4100) into a helper that takes
_StreamContext+_RetryStateand returns a small result tuple (final_msg, retryable, handled_error_info, etc.).stream_chat_completion_sdkthen only orchestrates setup/teardown.- Split the finally-block post-turn work (OTEL span teardown, cost reconcile, CLI upload) into a dedicated
_finalize_turnhelper.Either extraction shrinks the type-check surface below pyright's heuristic without losing shared state (most of it already lives on
_RetryState/_StreamContext).As per coding guidelines: "Do not use linter suppressors — no
# type: ignore,# noqa,# pyright: ignore; fix the type/code instead" and "Keep functions under ~40 lines; extract named helpers when a function grows longer".🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@autogpt_platform/backend/backend/copilot/sdk/service.py` around lines 3070 - 3088, Remove the file-level "# pyright: ignore[reportGeneralTypeIssues]" on stream_chat_completion_sdk and reduce the function's type complexity by extracting the big retry-loop body and the consolidated failure/finalization block into a typed helper (e.g., _run_stream_retry_cycle) that accepts the existing _StreamContext and _RetryState and returns a small result tuple (final_message, retryable: bool, handled_error_info, etc.); alternatively move the OTEL/span teardown, cost reconcile and CLI upload into a dedicated _finalize_turn helper called from stream_chat_completion_sdk. Ensure the new helpers have precise signatures and return types so Pyright can type-check them and that stream_chat_completion_sdk becomes a slim orchestrator delegating to _run_stream_retry_cycle and/or _finalize_turn.
560-588: Avoid the# noqa: SLF001linter suppressor here.Line 577 suppresses ruff's private-member access warning to reach
state.adapter._flush_unresolved_tool_calls. The coding guideline explicitly forbids# noqa/# type: ignore/# pyright: ignorecomments — the correct fix is to expose a public adapter method (e.g.flush_unresolved_tool_calls) and update the existing call sites at lines 2801 and 4115 to use it as well (those sites access the same private without the suppressor today, so they'd also become compliant).As per coding guidelines: "Do not use linter suppressors — no
# type: ignore,# noqa,# pyright: ignore; fix the type/code instead".🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@autogpt_platform/backend/backend/copilot/sdk/service.py` around lines 560 - 588, Replace the private call to state.adapter._flush_unresolved_tool_calls (and remove the "# noqa: SLF001") by adding a public adapter method flush_unresolved_tool_calls that preserves the same behavior and typing (e.g., returns a list[StreamBaseResponse] or accepts a mutable list to populate), then call state.adapter.flush_unresolved_tool_calls() from _flush_orphan_tool_uses_to_session instead of the private member; also update the other sites that currently call state.adapter._flush_unresolved_tool_calls to use the new public flush_unresolved_tool_calls so no linter suppressors are needed.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Duplicate comments:
In `@autogpt_platform/backend/backend/copilot/sdk/service.py`:
- Around line 615-631: The captured list in _rollback_attempt_capturing_partial
is including the canonical error marker appended earlier on _HandledStreamError
paths, so change _rollback_attempt_capturing_partial to strip trailing
error-marker messages from the captured slice before returning (e.g., trim any
trailing assistant error-marker objects from captured by checking the same
predicate used when appending markers), ensuring session.messages still rolls
back to pre_attempt_msg_count; reference _restore_partial_with_error_marker and
_append_error_marker/_HandledStreamError to validate behavior and add a
regression test that simulates an idle-timeout/empty-tool-breaker
_HandledStreamError after partial assistant output to assert no duplicate or
sandwiched error markers are reintroduced.
---
Nitpick comments:
In `@autogpt_platform/backend/backend/copilot/sdk/service.py`:
- Around line 3070-3088: Remove the file-level "# pyright:
ignore[reportGeneralTypeIssues]" on stream_chat_completion_sdk and reduce the
function's type complexity by extracting the big retry-loop body and the
consolidated failure/finalization block into a typed helper (e.g.,
_run_stream_retry_cycle) that accepts the existing _StreamContext and
_RetryState and returns a small result tuple (final_message, retryable: bool,
handled_error_info, etc.); alternatively move the OTEL/span teardown, cost
reconcile and CLI upload into a dedicated _finalize_turn helper called from
stream_chat_completion_sdk. Ensure the new helpers have precise signatures and
return types so Pyright can type-check them and that stream_chat_completion_sdk
becomes a slim orchestrator delegating to _run_stream_retry_cycle and/or
_finalize_turn.
- Around line 560-588: Replace the private call to
state.adapter._flush_unresolved_tool_calls (and remove the "# noqa: SLF001") by
adding a public adapter method flush_unresolved_tool_calls that preserves the
same behavior and typing (e.g., returns a list[StreamBaseResponse] or accepts a
mutable list to populate), then call state.adapter.flush_unresolved_tool_calls()
from _flush_orphan_tool_uses_to_session instead of the private member; also
update the other sites that currently call
state.adapter._flush_unresolved_tool_calls to use the new public
flush_unresolved_tool_calls so no linter suppressors are needed.
ℹ️ Review info
⚙️ Run configuration
Configuration used: Organization UI
Review profile: CHILL
Plan: Pro
Run ID: add0ec08-6ab9-4bf8-8e5d-562ee84f91c0
📒 Files selected for processing (1)
autogpt_platform/backend/backend/copilot/sdk/service.py
📜 Review details
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (11)
- GitHub Check: check API types
- GitHub Check: Seer Code Review
- GitHub Check: type-check (3.13)
- GitHub Check: type-check (3.11)
- GitHub Check: test (3.13)
- GitHub Check: test (3.12)
- GitHub Check: test (3.11)
- GitHub Check: end-to-end tests
- GitHub Check: Check PR Status
- GitHub Check: Analyze (typescript)
- GitHub Check: Analyze (python)
🧰 Additional context used
📓 Path-based instructions (2)
autogpt_platform/backend/**/*.py
📄 CodeRabbit inference engine (.github/copilot-instructions.md)
autogpt_platform/backend/**/*.py: Use Python 3.11 (required; managed by Poetry via pyproject.toml) for backend development
Always run 'poetry run format' (Black + isort) before linting in backend development
Always run 'poetry run lint' (ruff) after formatting in backend development
autogpt_platform/backend/**/*.py: Usepoetry run ...command for executing Python package dependencies
Use top-level imports only — avoid local/inner imports except for lazy imports of heavy optional dependencies likeopenpyxl
Use absolute imports withfrom backend.module import ...for cross-package imports; single-dot relative imports are acceptable for sibling modules within the same package; avoid double-dot relative imports
Do not use duck typing — avoidhasattr/getattr/isinstancefor type dispatch; use typed interfaces/unions/protocols instead
Use Pydantic models over dataclass/namedtuple/dict for structured data
Do not use linter suppressors — no# type: ignore,# noqa,# pyright: ignore; fix the type/code instead
Prefer list comprehensions over manual loop-and-append patterns
Use early return with guard clauses first to avoid deep nesting
Use%sfor deferred interpolation indebuglog statements for efficiency; use f-strings elsewhere for readability (e.g.,logger.debug("Processing %s items", count)vslogger.info(f"Processing {count} items"))
Sanitize error paths by usingos.path.basename()in error messages to avoid leaking directory structure
Be aware of TOCTOU (Time-Of-Check-Time-Of-Use) issues — avoid check-then-act patterns for file access and credit charging
Usetransaction=Truefor Redis pipelines to ensure atomicity on multi-step operations
Usemax(0, value)guards for computed values that should never be negative
Keep files under ~300 lines; if a file grows beyond this, split by responsibility (extract helpers, models, or a sub-module into a new file)
Keep functions under ~40 lines; extract named helpers when a function grows longer
...
Files:
autogpt_platform/backend/backend/copilot/sdk/service.py
autogpt_platform/{backend,autogpt_libs}/**/*.py
📄 CodeRabbit inference engine (AGENTS.md)
Format Python code with
poetry run format
Files:
autogpt_platform/backend/backend/copilot/sdk/service.py
🧠 Learnings (22)
📓 Common learnings
Learnt from: majdyz
Repo: Significant-Gravitas/AutoGPT PR: 12797
File: autogpt_platform/backend/backend/copilot/sdk/service.py:1991-2021
Timestamp: 2026-04-15T13:44:34.273Z
Learning: In `autogpt_platform/backend/backend/copilot/sdk/service.py` (`_run_stream_attempt`), the pre-create block (PR `#12797`) intentionally does NOT call `state.transcript_builder.append_assistant(...)` when inserting the empty assistant placeholder into `ctx.session.messages`. The transcript is left ending at the `tool_result` entry (N entries) while `message_count` metadata is N+1. This mismatch is benign and deliberate: on the next `--resume`, the SDK sees the transcript ending at `tool_result` and correctly regenerates the assistant response. Pre-appending the assistant turn to the transcript would suppress regeneration while leaving `session.messages[-1].content = ""` permanently (worse outcome). On the gap-fallback path, `transcript_msg_count (N+1) >= msg_count-1 (N)` means no gap is injected for the empty placeholder, which is correct because injecting an empty assistant message as context would mislead the SDK. Do NOT flag this transcript/message_count discrepancy as a bug.
Learnt from: majdyz
Repo: Significant-Gravitas/AutoGPT PR: 12445
File: autogpt_platform/backend/backend/copilot/sdk/service.py:1071-1072
Timestamp: 2026-03-17T06:48:26.471Z
Learning: In Significant-Gravitas/AutoGPT (autogpt_platform), the AI SDK enforces `z.strictObject({type, errorText})` on SSE `StreamError` responses, so additional fields like `retryable: bool` cannot be added to `StreamError` or serialized via `to_sse()`. Instead, retry signaling for transient Anthropic API errors is done via the `COPILOT_RETRYABLE_ERROR_PREFIX` constant prepended to persisted session messages (in `ChatMessage.content`). The frontend detects retryable errors by checking `markerType === "retryable_error"` from `parseSpecialMarkers()` — no SSE schema changes and no string matching on error text. This pattern was established in PR `#12445`, commit 64d82797b.
Learnt from: majdyz
Repo: Significant-Gravitas/AutoGPT PR: 12773
File: autogpt_platform/backend/backend/copilot/pending_messages.py:52-64
Timestamp: 2026-04-14T14:36:25.545Z
Learning: In `autogpt_platform/backend/backend/copilot` (PR `#12773`, commit d7bced0c6): when draining pending messages into `session.messages`, each message's text is sanitized via `strip_user_context_tags` before persistence to prevent user-controlled `<user_context>` injection from bypassing the trusted server-side context prefix. Additionally, if `upsert_chat_session` fails after draining, the drained `PendingMessage` objects are requeued back to Redis to avoid silent message loss. Do NOT flag the drain-then-requeue pattern as redundant — it is the intentional failure-resilience strategy for the pending buffer.
Learnt from: majdyz
Repo: Significant-Gravitas/AutoGPT PR: 12773
File: autogpt_platform/backend/backend/blocks/autopilot.py:631-638
Timestamp: 2026-04-14T07:35:11.464Z
Learning: In `autogpt_platform/backend/backend/copilot/executor/utils.py`, `CoPilotExecutionEntry` includes a `permissions: CopilotPermissions | None` field (added in PR `#12773` / commit a0184c87b9). `enqueue_copilot_turn` accepts and serializes this field into the queue entry, `_enqueue_for_recovery` in `autopilot.py` accepts and forwards `permissions` to `enqueue_copilot_turn`, and `_execute_async` in `processor.py` restores `entry.permissions` and passes it into `stream_chat_completion_sdk`/`stream_chat_completion_baseline` via `set_execution_context`. This ensures recovered sub-agent turns respect the same tool/block permission ceiling as the original in-process execution (mirroring `_merge_inherited_permissions`). Do NOT flag recovered turns as losing their permission ceiling — it is now fully propagated through the queue.
Learnt from: majdyz
Repo: Significant-Gravitas/AutoGPT PR: 12879
File: autogpt_platform/backend/backend/copilot/baseline/service.py:0-0
Timestamp: 2026-04-22T05:57:34.861Z
Learning: In `autogpt_platform/backend/backend/copilot/baseline/service.py`, the approved pattern for `_run_task_subagent` (PR `#12879`, commit 187f0a5) uses a nested `try/except Exception` inside an outer `try/finally`. The outer `finally` block resets `_TASK_DEPTH_VAR` (via `_TASK_DEPTH_VAR.reset(token)`) AND calls `_absorb_inner_usage(parent_state, inner_state)` unconditionally, so both the depth ContextVar and usage roll-up are guaranteed on all exit paths including `CancelledError`/`KeyboardInterrupt`/`SystemExit`. The inner `except Exception` catches and converts failures into a `TaskResponse` error payload that is returned as `StreamToolOutputAvailable`. Do NOT flag missing ContextVar reset or usage roll-up on BaseException paths in this function.
Learnt from: majdyz
Repo: Significant-Gravitas/AutoGPT PR: 0
File: :0-0
Timestamp: 2026-04-23T00:07:27.117Z
Learning: In `autogpt_platform/backend/backend/copilot/sdk/service.py`, background tasks that persist cost or emit Langfuse backfill (e.g. the cost-reconcile task) must be anchored to `_background_tasks` using `_background_tasks.add(task)` and `task.add_done_callback(_background_tasks.discard)`, mirroring the existing pattern at lines 3063 / 4232 / 4256. This prevents the asyncio task from being garbage-collected before persistence or Langfuse emission completes. Do NOT flag the absence of this anchoring as acceptable in this file. Established in PR `#12889` commit 5ce3d0388.
Learnt from: majdyz
Repo: Significant-Gravitas/AutoGPT PR: 12881
File: autogpt_platform/backend/backend/copilot/sdk/service.py:0-0
Timestamp: 2026-04-22T12:26:42.571Z
Learning: In `autogpt_platform/backend/backend/copilot/sdk/service.py`, `_resolve_sdk_model_for_request`: when a per-user LaunchDarkly model value fails `_normalize_model_name` (e.g. a `moonshotai/kimi-*` slug in direct-Anthropic mode), the fallback must be tier-specific — `config.thinking_advanced_model` for advanced tier, `config.thinking_standard_model` for standard tier — NOT the generic `_resolve_sdk_model()` (which is standard-only and returns None under subscription mode). If the tier-specific config default also fails `_normalize_model_name`, re-raise the original LD error; this is a deployment-level misconfiguration that `model_validator` should have caught at startup. Established in PR `#12881` commit 637d2fef5.
Learnt from: majdyz
Repo: Significant-Gravitas/AutoGPT PR: 0
File: :0-0
Timestamp: 2026-04-23T01:26:38.257Z
Learning: In `autogpt_platform/backend/backend/copilot/sdk/service.py`, `langfuse_trace_id = get_client().get_current_trace_id()` must be captured under the `if _lf_span is not None:` guard (before `_lf_span` is torn down), NOT under `if _otel_ctx is not None:`. The `_otel_ctx` guard is too narrow: if `propagate_attributes().__enter__()` raises, `_otel_ctx` is never assigned, and placing the trace-id capture there would silently orphan the `openrouter-cost-reconcile` event from its parent span. Established in PR `#12889` commit d243bf6c9.
Learnt from: majdyz
Repo: Significant-Gravitas/AutoGPT PR: 0
File: :0-0
Timestamp: 2026-04-21T11:41:05.877Z
Learning: In `autogpt_platform/backend/backend/copilot/baseline/service.py` (PR `#12870`, commits 080d42b9d and 3d7b38162), the `_close_reasoning_block_if_open(state)` helper centralises all four reasoning-block-close call sites (text branch, tool_calls branch, stream-end, exception path). The outer `finally` block of `_baseline_llm_caller` calls this helper plus stripper flush + `StreamTextEnd` to guarantee matched end events are emitted before `StreamFinishStep` on both normal and exception paths. Do NOT flag duplicated close logic or missing reasoning-end-on-exception as issues in this function.
Learnt from: majdyz
Repo: Significant-Gravitas/AutoGPT PR: 12796
File: autogpt_platform/backend/backend/api/features/chat/routes.py:504-527
Timestamp: 2026-04-16T12:33:44.990Z
Learning: In `autogpt_platform/backend/backend/api/features/chat/routes.py`, `get_session` (PR `#12796`, commit 3771bfad9c1) closes the TOCTOU race between the initial `stream_registry.get_active_session()` pre-check and `get_chat_messages_paginated()` with a post-check re-verification: after the DB fetch, if `is_initial_load and active_session is not None`, it calls `get_active_session` a second time; if `post_active is None` (stream completed during the window), it resets `from_start=True`, `forward_paginated=True`, and re-fetches messages from sequence 0. Do NOT flag the double `get_active_session` call pattern as redundant — it is the intentional TOCTOU mitigation for pagination direction selection.
📚 Learning: 2026-04-15T13:44:34.273Z
Learnt from: majdyz
Repo: Significant-Gravitas/AutoGPT PR: 12797
File: autogpt_platform/backend/backend/copilot/sdk/service.py:1991-2021
Timestamp: 2026-04-15T13:44:34.273Z
Learning: In `autogpt_platform/backend/backend/copilot/sdk/service.py` (`_run_stream_attempt`), the pre-create block (PR `#12797`) intentionally does NOT call `state.transcript_builder.append_assistant(...)` when inserting the empty assistant placeholder into `ctx.session.messages`. The transcript is left ending at the `tool_result` entry (N entries) while `message_count` metadata is N+1. This mismatch is benign and deliberate: on the next `--resume`, the SDK sees the transcript ending at `tool_result` and correctly regenerates the assistant response. Pre-appending the assistant turn to the transcript would suppress regeneration while leaving `session.messages[-1].content = ""` permanently (worse outcome). On the gap-fallback path, `transcript_msg_count (N+1) >= msg_count-1 (N)` means no gap is injected for the empty placeholder, which is correct because injecting an empty assistant message as context would mislead the SDK. Do NOT flag this transcript/message_count discrepancy as a bug.
Applied to files:
autogpt_platform/backend/backend/copilot/sdk/service.py
📚 Learning: 2026-03-17T06:48:26.471Z
Learnt from: majdyz
Repo: Significant-Gravitas/AutoGPT PR: 12445
File: autogpt_platform/backend/backend/copilot/sdk/service.py:1071-1072
Timestamp: 2026-03-17T06:48:26.471Z
Learning: In Significant-Gravitas/AutoGPT (autogpt_platform), the AI SDK enforces `z.strictObject({type, errorText})` on SSE `StreamError` responses, so additional fields like `retryable: bool` cannot be added to `StreamError` or serialized via `to_sse()`. Instead, retry signaling for transient Anthropic API errors is done via the `COPILOT_RETRYABLE_ERROR_PREFIX` constant prepended to persisted session messages (in `ChatMessage.content`). The frontend detects retryable errors by checking `markerType === "retryable_error"` from `parseSpecialMarkers()` — no SSE schema changes and no string matching on error text. This pattern was established in PR `#12445`, commit 64d82797b.
Applied to files:
autogpt_platform/backend/backend/copilot/sdk/service.py
📚 Learning: 2026-04-22T05:57:34.861Z
Learnt from: majdyz
Repo: Significant-Gravitas/AutoGPT PR: 12879
File: autogpt_platform/backend/backend/copilot/baseline/service.py:0-0
Timestamp: 2026-04-22T05:57:34.861Z
Learning: In `autogpt_platform/backend/backend/copilot/baseline/service.py`, the approved pattern for `_run_task_subagent` (PR `#12879`, commit 187f0a5) uses a nested `try/except Exception` inside an outer `try/finally`. The outer `finally` block resets `_TASK_DEPTH_VAR` (via `_TASK_DEPTH_VAR.reset(token)`) AND calls `_absorb_inner_usage(parent_state, inner_state)` unconditionally, so both the depth ContextVar and usage roll-up are guaranteed on all exit paths including `CancelledError`/`KeyboardInterrupt`/`SystemExit`. The inner `except Exception` catches and converts failures into a `TaskResponse` error payload that is returned as `StreamToolOutputAvailable`. Do NOT flag missing ContextVar reset or usage roll-up on BaseException paths in this function.
Applied to files:
autogpt_platform/backend/backend/copilot/sdk/service.py
📚 Learning: 2026-04-03T11:14:45.569Z
Learnt from: majdyz
Repo: Significant-Gravitas/AutoGPT PR: 12623
File: autogpt_platform/backend/backend/copilot/baseline/service.py:0-0
Timestamp: 2026-04-03T11:14:45.569Z
Learning: In `autogpt_platform/backend/backend/copilot/baseline/service.py`, `transcript_builder.append_user(content=message)` is called unconditionally even when the message is a duplicate that was suppressed by the `is_new_message` guard. This is intentional: the downloaded transcript may be stale (uploaded before the previous attempt persisted the message), so always appending the current user turn prevents a malformed assistant-after-assistant transcript structure. The `is_user_message` flag is still checked (`if message and is_user_message:`), so assistant-role inputs are excluded. Do NOT flag this as a bug.
Applied to files:
autogpt_platform/backend/backend/copilot/sdk/service.py
📚 Learning: 2026-04-14T07:35:11.464Z
Learnt from: majdyz
Repo: Significant-Gravitas/AutoGPT PR: 12773
File: autogpt_platform/backend/backend/blocks/autopilot.py:631-638
Timestamp: 2026-04-14T07:35:11.464Z
Learning: In `autogpt_platform/backend/backend/copilot/executor/utils.py`, `CoPilotExecutionEntry` includes a `permissions: CopilotPermissions | None` field (added in PR `#12773` / commit a0184c87b9). `enqueue_copilot_turn` accepts and serializes this field into the queue entry, `_enqueue_for_recovery` in `autopilot.py` accepts and forwards `permissions` to `enqueue_copilot_turn`, and `_execute_async` in `processor.py` restores `entry.permissions` and passes it into `stream_chat_completion_sdk`/`stream_chat_completion_baseline` via `set_execution_context`. This ensures recovered sub-agent turns respect the same tool/block permission ceiling as the original in-process execution (mirroring `_merge_inherited_permissions`). Do NOT flag recovered turns as losing their permission ceiling — it is now fully propagated through the queue.
Applied to files:
autogpt_platform/backend/backend/copilot/sdk/service.py
📚 Learning: 2026-04-14T14:36:25.545Z
Learnt from: majdyz
Repo: Significant-Gravitas/AutoGPT PR: 12773
File: autogpt_platform/backend/backend/copilot/pending_messages.py:52-64
Timestamp: 2026-04-14T14:36:25.545Z
Learning: In `autogpt_platform/backend/backend/copilot` (PR `#12773`, commit d7bced0c6): when draining pending messages into `session.messages`, each message's text is sanitized via `strip_user_context_tags` before persistence to prevent user-controlled `<user_context>` injection from bypassing the trusted server-side context prefix. Additionally, if `upsert_chat_session` fails after draining, the drained `PendingMessage` objects are requeued back to Redis to avoid silent message loss. Do NOT flag the drain-then-requeue pattern as redundant — it is the intentional failure-resilience strategy for the pending buffer.
Applied to files:
autogpt_platform/backend/backend/copilot/sdk/service.py
📚 Learning: 2026-03-26T07:00:03.405Z
Learnt from: majdyz
Repo: Significant-Gravitas/AutoGPT PR: 12574
File: autogpt_platform/backend/backend/copilot/sdk/transcript.py:980-990
Timestamp: 2026-03-26T07:00:03.405Z
Learning: In `autogpt_platform/backend/backend/copilot/sdk/transcript.py`, `_rechain_tail` intentionally rewrites `parentUuid` for **all** tail entries (not just the first), because a single assistant turn can span multiple consecutive JSONL entries sharing the same `message.id` (e.g., a thinking entry + a tool_use entry). Their original `parentUuid` values may reference entries that were absorbed into the compressed prefix, so sequential rechaining of the entire tail is required to maintain a valid parent→child graph. The test `test_chains_multiple_tail_entries` validates this: the second tail entry's `parentUuid` is rewritten from its original value to the uuid of the first tail entry.
Applied to files:
autogpt_platform/backend/backend/copilot/sdk/service.py
📚 Learning: 2026-04-22T12:26:42.571Z
Learnt from: majdyz
Repo: Significant-Gravitas/AutoGPT PR: 12881
File: autogpt_platform/backend/backend/copilot/sdk/service.py:0-0
Timestamp: 2026-04-22T12:26:42.571Z
Learning: In `autogpt_platform/backend/backend/copilot/sdk/service.py`, `_resolve_sdk_model_for_request`: when a per-user LaunchDarkly model value fails `_normalize_model_name` (e.g. a `moonshotai/kimi-*` slug in direct-Anthropic mode), the fallback must be tier-specific — `config.thinking_advanced_model` for advanced tier, `config.thinking_standard_model` for standard tier — NOT the generic `_resolve_sdk_model()` (which is standard-only and returns None under subscription mode). If the tier-specific config default also fails `_normalize_model_name`, re-raise the original LD error; this is a deployment-level misconfiguration that `model_validator` should have caught at startup. Established in PR `#12881` commit 637d2fef5.
Applied to files:
autogpt_platform/backend/backend/copilot/sdk/service.py
📚 Learning: 2026-04-21T11:41:05.877Z
Learnt from: majdyz
Repo: Significant-Gravitas/AutoGPT PR: 0
File: :0-0
Timestamp: 2026-04-21T11:41:05.877Z
Learning: In `autogpt_platform/backend/backend/copilot/baseline/service.py` (PR `#12870`, commits 080d42b9d and 3d7b38162), the `_close_reasoning_block_if_open(state)` helper centralises all four reasoning-block-close call sites (text branch, tool_calls branch, stream-end, exception path). The outer `finally` block of `_baseline_llm_caller` calls this helper plus stripper flush + `StreamTextEnd` to guarantee matched end events are emitted before `StreamFinishStep` on both normal and exception paths. Do NOT flag duplicated close logic or missing reasoning-end-on-exception as issues in this function.
Applied to files:
autogpt_platform/backend/backend/copilot/sdk/service.py
📚 Learning: 2026-04-21T17:31:23.683Z
Learnt from: majdyz
Repo: Significant-Gravitas/AutoGPT PR: 12873
File: autogpt_platform/backend/backend/copilot/baseline/reasoning.py:0-0
Timestamp: 2026-04-21T17:31:23.683Z
Learning: In `autogpt_platform/backend/backend/copilot/baseline/reasoning.py` (`BaselineReasoningEmitter`), when `render_in_ui=False`, BOTH the `StreamReasoning*` wire events AND the `ChatMessage(role="reasoning")` persistence append must be suppressed together. `convertChatSessionToUiMessages.ts` unconditionally re-renders all persisted `role="reasoning"` rows as `{type:"reasoning"}` UI parts on reload, so persisting rows while silencing live wire events would resurrect the reasoning collapse on page refresh. The audit trail is preserved through the provider transcript and `_format_sdk_content_blocks` (SDK path) instead. The baseline and SDK paths mirror each other: flag off → no live wire event, no persisted row, no hydrated collapse. This was established in PR `#12873`, commit 7ef10b26c.
Applied to files:
autogpt_platform/backend/backend/copilot/sdk/service.py
📚 Learning: 2026-04-16T12:33:44.990Z
Learnt from: majdyz
Repo: Significant-Gravitas/AutoGPT PR: 12796
File: autogpt_platform/backend/backend/api/features/chat/routes.py:504-527
Timestamp: 2026-04-16T12:33:44.990Z
Learning: In `autogpt_platform/backend/backend/api/features/chat/routes.py`, `get_session` (PR `#12796`, commit 3771bfad9c1) closes the TOCTOU race between the initial `stream_registry.get_active_session()` pre-check and `get_chat_messages_paginated()` with a post-check re-verification: after the DB fetch, if `is_initial_load and active_session is not None`, it calls `get_active_session` a second time; if `post_active is None` (stream completed during the window), it resets `from_start=True`, `forward_paginated=True`, and re-fetches messages from sequence 0. Do NOT flag the double `get_active_session` call pattern as redundant — it is the intentional TOCTOU mitigation for pagination direction selection.
Applied to files:
autogpt_platform/backend/backend/copilot/sdk/service.py
📚 Learning: 2026-04-23T01:26:38.257Z
Learnt from: majdyz
Repo: Significant-Gravitas/AutoGPT PR: 0
File: :0-0
Timestamp: 2026-04-23T01:26:38.257Z
Learning: In `autogpt_platform/backend/backend/copilot/sdk/service.py`, `langfuse_trace_id = get_client().get_current_trace_id()` must be captured under the `if _lf_span is not None:` guard (before `_lf_span` is torn down), NOT under `if _otel_ctx is not None:`. The `_otel_ctx` guard is too narrow: if `propagate_attributes().__enter__()` raises, `_otel_ctx` is never assigned, and placing the trace-id capture there would silently orphan the `openrouter-cost-reconcile` event from its parent span. Established in PR `#12889` commit d243bf6c9.
Applied to files:
autogpt_platform/backend/backend/copilot/sdk/service.py
📚 Learning: 2026-02-26T17:02:22.448Z
Learnt from: Pwuts
Repo: Significant-Gravitas/AutoGPT PR: 12211
File: .pre-commit-config.yaml:160-179
Timestamp: 2026-02-26T17:02:22.448Z
Learning: Keep the pre-commit hook pattern broad for autogpt_platform/backend to ensure OpenAPI schema changes are captured. Do not narrow to backend/api/ alone, since the generated schema depends on Pydantic models across multiple directories (backend/data/, backend/blocks/, backend/copilot/, backend/integrations/, backend/util/). Narrowing could miss schema changes and cause frontend type desynchronization.
Applied to files:
autogpt_platform/backend/backend/copilot/sdk/service.py
📚 Learning: 2026-03-04T08:04:35.881Z
Learnt from: majdyz
Repo: Significant-Gravitas/AutoGPT PR: 12273
File: autogpt_platform/backend/backend/copilot/tools/workspace_files.py:216-220
Timestamp: 2026-03-04T08:04:35.881Z
Learning: In the AutoGPT Copilot backend, ensure that SVG images are not treated as vision image types by excluding 'image/svg+xml' from INLINEABLE_MIME_TYPES and MULTIMODAL_TYPES in tool_adapter.py; the Claude API supports PNG, JPEG, GIF, and WebP for vision. SVGs (XML text) should be handled via the text path instead, not the vision path.
Applied to files:
autogpt_platform/backend/backend/copilot/sdk/service.py
📚 Learning: 2026-04-01T04:17:41.600Z
Learnt from: majdyz
Repo: Significant-Gravitas/AutoGPT PR: 12632
File: autogpt_platform/backend/backend/copilot/tools/workspace_files.py:0-0
Timestamp: 2026-04-01T04:17:41.600Z
Learning: When reviewing AutoGPT Copilot tool implementations, accept that `readOnlyHint=True` (provided via `ToolAnnotations`) may be applied unconditionally to *all* tools—even tools that have side effects (e.g., `bash_exec`, `write_workspace_file`, or other write/save operations). Do **not** flag these tools for having `readOnlyHint=True`; this is intentional to enable fully-parallel dispatch by the Anthropic SDK/CLI and has been E2E validated. Only flag `readOnlyHint` issues if they conflict with the established `ToolAnnotations` behavior (e.g., missing/incorrect propagation relative to the intended annotation mechanism).
Applied to files:
autogpt_platform/backend/backend/copilot/sdk/service.py
📚 Learning: 2026-03-05T15:42:08.207Z
Learnt from: ntindle
Repo: Significant-Gravitas/AutoGPT PR: 12297
File: .claude/skills/backend-check/SKILL.md:14-16
Timestamp: 2026-03-05T15:42:08.207Z
Learning: In Python files under autogpt_platform/backend (recursively), rely on poetry run format to perform formatting (Black + isort) and linting (ruff). Do not run poetry run lint as a separate step after poetry run format, since format already includes linting checks.
Applied to files:
autogpt_platform/backend/backend/copilot/sdk/service.py
📚 Learning: 2026-03-16T16:35:40.236Z
Learnt from: majdyz
Repo: Significant-Gravitas/AutoGPT PR: 12440
File: autogpt_platform/backend/backend/api/features/workflow_import.py:54-63
Timestamp: 2026-03-16T16:35:40.236Z
Learning: Avoid using the word 'competitor' in public-facing identifiers and text. Use neutral naming for API paths, model names, function names, and UI text. Examples: rename 'CompetitorFormat' to 'SourcePlatform', 'convert_competitor_workflow' to 'convert_workflow', '/competitor-workflow' to '/workflow'. Apply this guideline to files under autogpt_platform/backend and autogpt_platform/frontend.
Applied to files:
autogpt_platform/backend/backend/copilot/sdk/service.py
📚 Learning: 2026-03-31T15:37:38.626Z
Learnt from: majdyz
Repo: Significant-Gravitas/AutoGPT PR: 12623
File: autogpt_platform/backend/backend/copilot/tools/agent_generator/fixer.py:37-47
Timestamp: 2026-03-31T15:37:38.626Z
Learning: When validating/constructing Anthropic API model IDs in Significant-Gravitas/AutoGPT, allow the hyphen-separated Claude Opus 4.6 model ID `claude-opus-4-6` (it corresponds to `LlmModel.CLAUDE_4_6_OPUS` in `autogpt_platform/backend/backend/blocks/llm.py`). Do NOT require the dot-separated form in Anthropic contexts. Only OpenRouter routing variants should use the dot separator (e.g., `anthropic/claude-opus-4.6`); `claude-opus-4-6` should be treated as correct when passed to Anthropic, and flagged only if it’s used in the OpenRouter path where the dot form is expected.
Applied to files:
autogpt_platform/backend/backend/copilot/sdk/service.py
📚 Learning: 2026-04-15T02:43:36.890Z
Learnt from: ntindle
Repo: Significant-Gravitas/AutoGPT PR: 12780
File: autogpt_platform/backend/backend/copilot/tools/workspace_files.py:0-0
Timestamp: 2026-04-15T02:43:36.890Z
Learning: When reviewing Python exception handlers, do not flag `isinstance(e, X)` checks as dead/unreachable if the caught exception `X` is a subclass of the exception type being handled. For example, if `X` (e.g., `VirusScanError`) inherits from `ValueError` (directly or via an intermediate class) and it can be raised within an `except ValueError:` block, then `isinstance(e, X)` inside that handler is reachable and should not be treated as dead code.
Applied to files:
autogpt_platform/backend/backend/copilot/sdk/service.py
📚 Learning: 2026-04-22T11:46:04.431Z
Learnt from: majdyz
Repo: Significant-Gravitas/AutoGPT PR: 12881
File: autogpt_platform/backend/backend/copilot/config.py:0-0
Timestamp: 2026-04-22T11:46:04.431Z
Learning: Do not flag the Claude Sonnet 4.6 model ID as incorrect when it uses the project’s established hyphenated convention: `anthropic/claude-sonnet-4-6`. This hyphen form is the intentional, production convention and should be treated as valid (including in files like llm.py, blocks tests, reasoning.py, `_is_anthropic_model` tests, and config defaults). Note that OpenRouter also accepts the dot variant `anthropic/claude-sonnet-4.6`, so either form may be tolerated, but `anthropic/claude-sonnet-4-6` should be considered the standard to match project usage.
Applied to files:
autogpt_platform/backend/backend/copilot/sdk/service.py
📚 Learning: 2026-04-22T11:46:12.892Z
Learnt from: majdyz
Repo: Significant-Gravitas/AutoGPT PR: 12881
File: autogpt_platform/backend/backend/copilot/baseline/service.py:322-332
Timestamp: 2026-04-22T11:46:12.892Z
Learning: In this codebase (Significant-Gravitas/AutoGPT), OpenRouter-routed Anthropic model IDs should use the hyphen-separated convention (e.g., `anthropic/claude-sonnet-4-6`, `anthropic/claude-opus-4-6`). Although OpenRouter may accept both hyphen and dot variants, treat the hyphen-separated form as the intended, correct codebase-wide convention and do not flag it as an error. Only flag the dot-separated variant (e.g., `anthropic/claude-sonnet-4.6`) as incorrect when reviewing/validating model ID strings for OpenRouter-routed Anthropic models.
Applied to files:
autogpt_platform/backend/backend/copilot/sdk/service.py
…ial + carry retryable through _HandledStreamError Two fixes layered on the partial-restore path introduced by this PR: 1. _rollback_attempt_capturing_partial now drops trailing error markers (COPILOT_ERROR_PREFIX / COPILOT_RETRYABLE_ERROR_PREFIX) from the captured partial. _run_stream_attempt's idle-timeout and circuit-breaker paths append a marker via _append_error_marker BEFORE raising _HandledStreamError; without this filter the post-loop restore would replay the stale marker and then add a fresh one, leaving duplicate error bubbles and pushing any synthetic tool_result after an assistant(error) turn that has no matching tool_use. 2. Replace the (msg, code, already_yielded) 3-tuple carrying _HandledStreamError state out of the retry loop with a frozen _HandledErrorInfo dataclass that also carries `retryable`. The post-loop block now uses exc.retryable instead of hardcoding True, so a future _HandledStreamError(retryable=False, ...) won't silently write the wrong marker prefix. 3 new tests cover the rollback marker-stripping contract.
There was a problem hiding this comment.
Actionable comments posted: 1
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Inline comments:
In `@autogpt_platform/backend/backend/copilot/sdk/service.py`:
- Around line 560-577: The early flush currently calls the private adapter
method _flush_unresolved_tool_calls(safety) which mutates resolved_tool_calls
and clears has_unresolved_tool_calls, preventing the later error-cleanup flush
from running and it also suppresses lint with # noqa: SLF001; fix this by
exposing a public adapter API (e.g., flush_unresolved_tool_calls or
flush_unresolved_tool_calls_returning_events) that returns the synthesized
StreamBaseResponse list without flipping has_unresolved_tool_calls (or otherwise
returns the events and leaves state mutation to the caller), update
_flush_orphan_tool_uses_to_session to call the new public method and
capture/return the events for reuse by the later cleanup block, and remove the #
noqa suppressor so the public method is used instead of calling the private
_flush_unresolved_tool_calls.
🪄 Autofix (Beta)
Fix all unresolved CodeRabbit comments on this PR:
- Push a commit to this branch (recommended)
- Create a new PR with the fixes
ℹ️ Review info
⚙️ Run configuration
Configuration used: Organization UI
Review profile: CHILL
Plan: Pro
Run ID: 0ae1a767-1368-4708-af11-ffcb2e522d50
📒 Files selected for processing (2)
autogpt_platform/backend/backend/copilot/sdk/interrupted_partial_test.pyautogpt_platform/backend/backend/copilot/sdk/service.py
🚧 Files skipped from review as they are similar to previous changes (1)
- autogpt_platform/backend/backend/copilot/sdk/interrupted_partial_test.py
📜 Review details
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (11)
- GitHub Check: check API types
- GitHub Check: Seer Code Review
- GitHub Check: Analyze (python)
- GitHub Check: Analyze (typescript)
- GitHub Check: type-check (3.13)
- GitHub Check: test (3.13)
- GitHub Check: test (3.11)
- GitHub Check: type-check (3.12)
- GitHub Check: test (3.12)
- GitHub Check: end-to-end tests
- GitHub Check: Check PR Status
🧰 Additional context used
📓 Path-based instructions (2)
autogpt_platform/backend/**/*.py
📄 CodeRabbit inference engine (.github/copilot-instructions.md)
autogpt_platform/backend/**/*.py: Use Python 3.11 (required; managed by Poetry via pyproject.toml) for backend development
Always run 'poetry run format' (Black + isort) before linting in backend development
Always run 'poetry run lint' (ruff) after formatting in backend development
autogpt_platform/backend/**/*.py: Usepoetry run ...command for executing Python package dependencies
Use top-level imports only — avoid local/inner imports except for lazy imports of heavy optional dependencies likeopenpyxl
Use absolute imports withfrom backend.module import ...for cross-package imports; single-dot relative imports are acceptable for sibling modules within the same package; avoid double-dot relative imports
Do not use duck typing — avoidhasattr/getattr/isinstancefor type dispatch; use typed interfaces/unions/protocols instead
Use Pydantic models over dataclass/namedtuple/dict for structured data
Do not use linter suppressors — no# type: ignore,# noqa,# pyright: ignore; fix the type/code instead
Prefer list comprehensions over manual loop-and-append patterns
Use early return with guard clauses first to avoid deep nesting
Use%sfor deferred interpolation indebuglog statements for efficiency; use f-strings elsewhere for readability (e.g.,logger.debug("Processing %s items", count)vslogger.info(f"Processing {count} items"))
Sanitize error paths by usingos.path.basename()in error messages to avoid leaking directory structure
Be aware of TOCTOU (Time-Of-Check-Time-Of-Use) issues — avoid check-then-act patterns for file access and credit charging
Usetransaction=Truefor Redis pipelines to ensure atomicity on multi-step operations
Usemax(0, value)guards for computed values that should never be negative
Keep files under ~300 lines; if a file grows beyond this, split by responsibility (extract helpers, models, or a sub-module into a new file)
Keep functions under ~40 lines; extract named helpers when a function grows longer
...
Files:
autogpt_platform/backend/backend/copilot/sdk/service.py
autogpt_platform/{backend,autogpt_libs}/**/*.py
📄 CodeRabbit inference engine (AGENTS.md)
Format Python code with
poetry run format
Files:
autogpt_platform/backend/backend/copilot/sdk/service.py
🧠 Learnings (38)
📓 Common learnings
Learnt from: majdyz
Repo: Significant-Gravitas/AutoGPT PR: 12797
File: autogpt_platform/backend/backend/copilot/sdk/service.py:1991-2021
Timestamp: 2026-04-15T13:44:34.273Z
Learning: In `autogpt_platform/backend/backend/copilot/sdk/service.py` (`_run_stream_attempt`), the pre-create block (PR `#12797`) intentionally does NOT call `state.transcript_builder.append_assistant(...)` when inserting the empty assistant placeholder into `ctx.session.messages`. The transcript is left ending at the `tool_result` entry (N entries) while `message_count` metadata is N+1. This mismatch is benign and deliberate: on the next `--resume`, the SDK sees the transcript ending at `tool_result` and correctly regenerates the assistant response. Pre-appending the assistant turn to the transcript would suppress regeneration while leaving `session.messages[-1].content = ""` permanently (worse outcome). On the gap-fallback path, `transcript_msg_count (N+1) >= msg_count-1 (N)` means no gap is injected for the empty placeholder, which is correct because injecting an empty assistant message as context would mislead the SDK. Do NOT flag this transcript/message_count discrepancy as a bug.
Learnt from: majdyz
Repo: Significant-Gravitas/AutoGPT PR: 12773
File: autogpt_platform/backend/backend/copilot/pending_messages.py:52-64
Timestamp: 2026-04-14T14:36:25.545Z
Learning: In `autogpt_platform/backend/backend/copilot` (PR `#12773`, commit d7bced0c6): when draining pending messages into `session.messages`, each message's text is sanitized via `strip_user_context_tags` before persistence to prevent user-controlled `<user_context>` injection from bypassing the trusted server-side context prefix. Additionally, if `upsert_chat_session` fails after draining, the drained `PendingMessage` objects are requeued back to Redis to avoid silent message loss. Do NOT flag the drain-then-requeue pattern as redundant — it is the intentional failure-resilience strategy for the pending buffer.
Learnt from: majdyz
Repo: Significant-Gravitas/AutoGPT PR: 12773
File: autogpt_platform/backend/backend/blocks/autopilot.py:631-638
Timestamp: 2026-04-14T07:35:11.464Z
Learning: In `autogpt_platform/backend/backend/copilot/executor/utils.py`, `CoPilotExecutionEntry` includes a `permissions: CopilotPermissions | None` field (added in PR `#12773` / commit a0184c87b9). `enqueue_copilot_turn` accepts and serializes this field into the queue entry, `_enqueue_for_recovery` in `autopilot.py` accepts and forwards `permissions` to `enqueue_copilot_turn`, and `_execute_async` in `processor.py` restores `entry.permissions` and passes it into `stream_chat_completion_sdk`/`stream_chat_completion_baseline` via `set_execution_context`. This ensures recovered sub-agent turns respect the same tool/block permission ceiling as the original in-process execution (mirroring `_merge_inherited_permissions`). Do NOT flag recovered turns as losing their permission ceiling — it is now fully propagated through the queue.
Learnt from: majdyz
Repo: Significant-Gravitas/AutoGPT PR: 12445
File: autogpt_platform/backend/backend/copilot/sdk/service.py:1071-1072
Timestamp: 2026-03-17T06:48:26.471Z
Learning: In Significant-Gravitas/AutoGPT (autogpt_platform), the AI SDK enforces `z.strictObject({type, errorText})` on SSE `StreamError` responses, so additional fields like `retryable: bool` cannot be added to `StreamError` or serialized via `to_sse()`. Instead, retry signaling for transient Anthropic API errors is done via the `COPILOT_RETRYABLE_ERROR_PREFIX` constant prepended to persisted session messages (in `ChatMessage.content`). The frontend detects retryable errors by checking `markerType === "retryable_error"` from `parseSpecialMarkers()` — no SSE schema changes and no string matching on error text. This pattern was established in PR `#12445`, commit 64d82797b.
Learnt from: majdyz
Repo: Significant-Gravitas/AutoGPT PR: 12879
File: autogpt_platform/backend/backend/copilot/baseline/service.py:0-0
Timestamp: 2026-04-22T05:57:34.861Z
Learning: In `autogpt_platform/backend/backend/copilot/baseline/service.py`, the approved pattern for `_run_task_subagent` (PR `#12879`, commit 187f0a5) uses a nested `try/except Exception` inside an outer `try/finally`. The outer `finally` block resets `_TASK_DEPTH_VAR` (via `_TASK_DEPTH_VAR.reset(token)`) AND calls `_absorb_inner_usage(parent_state, inner_state)` unconditionally, so both the depth ContextVar and usage roll-up are guaranteed on all exit paths including `CancelledError`/`KeyboardInterrupt`/`SystemExit`. The inner `except Exception` catches and converts failures into a `TaskResponse` error payload that is returned as `StreamToolOutputAvailable`. Do NOT flag missing ContextVar reset or usage roll-up on BaseException paths in this function.
Learnt from: majdyz
Repo: Significant-Gravitas/AutoGPT PR: 0
File: :0-0
Timestamp: 2026-04-23T01:26:38.257Z
Learning: In `autogpt_platform/backend/backend/copilot/sdk/service.py`, `langfuse_trace_id = get_client().get_current_trace_id()` must be captured under the `if _lf_span is not None:` guard (before `_lf_span` is torn down), NOT under `if _otel_ctx is not None:`. The `_otel_ctx` guard is too narrow: if `propagate_attributes().__enter__()` raises, `_otel_ctx` is never assigned, and placing the trace-id capture there would silently orphan the `openrouter-cost-reconcile` event from its parent span. Established in PR `#12889` commit d243bf6c9.
Learnt from: majdyz
Repo: Significant-Gravitas/AutoGPT PR: 12881
File: autogpt_platform/backend/backend/copilot/sdk/service.py:0-0
Timestamp: 2026-04-22T12:26:42.571Z
Learning: In `autogpt_platform/backend/backend/copilot/sdk/service.py`, `_resolve_sdk_model_for_request`: when a per-user LaunchDarkly model value fails `_normalize_model_name` (e.g. a `moonshotai/kimi-*` slug in direct-Anthropic mode), the fallback must be tier-specific — `config.thinking_advanced_model` for advanced tier, `config.thinking_standard_model` for standard tier — NOT the generic `_resolve_sdk_model()` (which is standard-only and returns None under subscription mode). If the tier-specific config default also fails `_normalize_model_name`, re-raise the original LD error; this is a deployment-level misconfiguration that `model_validator` should have caught at startup. Established in PR `#12881` commit 637d2fef5.
Learnt from: majdyz
Repo: Significant-Gravitas/AutoGPT PR: 0
File: :0-0
Timestamp: 2026-04-23T00:07:27.117Z
Learning: In `autogpt_platform/backend/backend/copilot/sdk/service.py`, background tasks that persist cost or emit Langfuse backfill (e.g. the cost-reconcile task) must be anchored to `_background_tasks` using `_background_tasks.add(task)` and `task.add_done_callback(_background_tasks.discard)`, mirroring the existing pattern at lines 3063 / 4232 / 4256. This prevents the asyncio task from being garbage-collected before persistence or Langfuse emission completes. Do NOT flag the absence of this anchoring as acceptable in this file. Established in PR `#12889` commit 5ce3d0388.
Learnt from: majdyz
Repo: Significant-Gravitas/AutoGPT PR: 0
File: :0-0
Timestamp: 2026-04-21T11:41:05.877Z
Learning: In `autogpt_platform/backend/backend/copilot/baseline/service.py` (PR `#12870`, commits 080d42b9d and 3d7b38162), the `_close_reasoning_block_if_open(state)` helper centralises all four reasoning-block-close call sites (text branch, tool_calls branch, stream-end, exception path). The outer `finally` block of `_baseline_llm_caller` calls this helper plus stripper flush + `StreamTextEnd` to guarantee matched end events are emitted before `StreamFinishStep` on both normal and exception paths. Do NOT flag duplicated close logic or missing reasoning-end-on-exception as issues in this function.
Learnt from: majdyz
Repo: Significant-Gravitas/AutoGPT PR: 12796
File: autogpt_platform/backend/backend/api/features/chat/routes.py:504-527
Timestamp: 2026-04-16T12:33:44.990Z
Learning: In `autogpt_platform/backend/backend/api/features/chat/routes.py`, `get_session` (PR `#12796`, commit 3771bfad9c1) closes the TOCTOU race between the initial `stream_registry.get_active_session()` pre-check and `get_chat_messages_paginated()` with a post-check re-verification: after the DB fetch, if `is_initial_load and active_session is not None`, it calls `get_active_session` a second time; if `post_active is None` (stream completed during the window), it resets `from_start=True`, `forward_paginated=True`, and re-fetches messages from sequence 0. Do NOT flag the double `get_active_session` call pattern as redundant — it is the intentional TOCTOU mitigation for pagination direction selection.
📚 Learning: 2026-04-15T13:44:34.273Z
Learnt from: majdyz
Repo: Significant-Gravitas/AutoGPT PR: 12797
File: autogpt_platform/backend/backend/copilot/sdk/service.py:1991-2021
Timestamp: 2026-04-15T13:44:34.273Z
Learning: In `autogpt_platform/backend/backend/copilot/sdk/service.py` (`_run_stream_attempt`), the pre-create block (PR `#12797`) intentionally does NOT call `state.transcript_builder.append_assistant(...)` when inserting the empty assistant placeholder into `ctx.session.messages`. The transcript is left ending at the `tool_result` entry (N entries) while `message_count` metadata is N+1. This mismatch is benign and deliberate: on the next `--resume`, the SDK sees the transcript ending at `tool_result` and correctly regenerates the assistant response. Pre-appending the assistant turn to the transcript would suppress regeneration while leaving `session.messages[-1].content = ""` permanently (worse outcome). On the gap-fallback path, `transcript_msg_count (N+1) >= msg_count-1 (N)` means no gap is injected for the empty placeholder, which is correct because injecting an empty assistant message as context would mislead the SDK. Do NOT flag this transcript/message_count discrepancy as a bug.
Applied to files:
autogpt_platform/backend/backend/copilot/sdk/service.py
📚 Learning: 2026-03-17T06:48:26.471Z
Learnt from: majdyz
Repo: Significant-Gravitas/AutoGPT PR: 12445
File: autogpt_platform/backend/backend/copilot/sdk/service.py:1071-1072
Timestamp: 2026-03-17T06:48:26.471Z
Learning: In Significant-Gravitas/AutoGPT (autogpt_platform), the AI SDK enforces `z.strictObject({type, errorText})` on SSE `StreamError` responses, so additional fields like `retryable: bool` cannot be added to `StreamError` or serialized via `to_sse()`. Instead, retry signaling for transient Anthropic API errors is done via the `COPILOT_RETRYABLE_ERROR_PREFIX` constant prepended to persisted session messages (in `ChatMessage.content`). The frontend detects retryable errors by checking `markerType === "retryable_error"` from `parseSpecialMarkers()` — no SSE schema changes and no string matching on error text. This pattern was established in PR `#12445`, commit 64d82797b.
Applied to files:
autogpt_platform/backend/backend/copilot/sdk/service.py
📚 Learning: 2026-04-22T05:57:34.861Z
Learnt from: majdyz
Repo: Significant-Gravitas/AutoGPT PR: 12879
File: autogpt_platform/backend/backend/copilot/baseline/service.py:0-0
Timestamp: 2026-04-22T05:57:34.861Z
Learning: In `autogpt_platform/backend/backend/copilot/baseline/service.py`, the approved pattern for `_run_task_subagent` (PR `#12879`, commit 187f0a5) uses a nested `try/except Exception` inside an outer `try/finally`. The outer `finally` block resets `_TASK_DEPTH_VAR` (via `_TASK_DEPTH_VAR.reset(token)`) AND calls `_absorb_inner_usage(parent_state, inner_state)` unconditionally, so both the depth ContextVar and usage roll-up are guaranteed on all exit paths including `CancelledError`/`KeyboardInterrupt`/`SystemExit`. The inner `except Exception` catches and converts failures into a `TaskResponse` error payload that is returned as `StreamToolOutputAvailable`. Do NOT flag missing ContextVar reset or usage roll-up on BaseException paths in this function.
Applied to files:
autogpt_platform/backend/backend/copilot/sdk/service.py
📚 Learning: 2026-04-14T07:35:11.464Z
Learnt from: majdyz
Repo: Significant-Gravitas/AutoGPT PR: 12773
File: autogpt_platform/backend/backend/blocks/autopilot.py:631-638
Timestamp: 2026-04-14T07:35:11.464Z
Learning: In `autogpt_platform/backend/backend/copilot/executor/utils.py`, `CoPilotExecutionEntry` includes a `permissions: CopilotPermissions | None` field (added in PR `#12773` / commit a0184c87b9). `enqueue_copilot_turn` accepts and serializes this field into the queue entry, `_enqueue_for_recovery` in `autopilot.py` accepts and forwards `permissions` to `enqueue_copilot_turn`, and `_execute_async` in `processor.py` restores `entry.permissions` and passes it into `stream_chat_completion_sdk`/`stream_chat_completion_baseline` via `set_execution_context`. This ensures recovered sub-agent turns respect the same tool/block permission ceiling as the original in-process execution (mirroring `_merge_inherited_permissions`). Do NOT flag recovered turns as losing their permission ceiling — it is now fully propagated through the queue.
Applied to files:
autogpt_platform/backend/backend/copilot/sdk/service.py
📚 Learning: 2026-04-14T14:36:25.545Z
Learnt from: majdyz
Repo: Significant-Gravitas/AutoGPT PR: 12773
File: autogpt_platform/backend/backend/copilot/pending_messages.py:52-64
Timestamp: 2026-04-14T14:36:25.545Z
Learning: In `autogpt_platform/backend/backend/copilot` (PR `#12773`, commit d7bced0c6): when draining pending messages into `session.messages`, each message's text is sanitized via `strip_user_context_tags` before persistence to prevent user-controlled `<user_context>` injection from bypassing the trusted server-side context prefix. Additionally, if `upsert_chat_session` fails after draining, the drained `PendingMessage` objects are requeued back to Redis to avoid silent message loss. Do NOT flag the drain-then-requeue pattern as redundant — it is the intentional failure-resilience strategy for the pending buffer.
Applied to files:
autogpt_platform/backend/backend/copilot/sdk/service.py
📚 Learning: 2026-04-03T11:14:45.569Z
Learnt from: majdyz
Repo: Significant-Gravitas/AutoGPT PR: 12623
File: autogpt_platform/backend/backend/copilot/baseline/service.py:0-0
Timestamp: 2026-04-03T11:14:45.569Z
Learning: In `autogpt_platform/backend/backend/copilot/baseline/service.py`, `transcript_builder.append_user(content=message)` is called unconditionally even when the message is a duplicate that was suppressed by the `is_new_message` guard. This is intentional: the downloaded transcript may be stale (uploaded before the previous attempt persisted the message), so always appending the current user turn prevents a malformed assistant-after-assistant transcript structure. The `is_user_message` flag is still checked (`if message and is_user_message:`), so assistant-role inputs are excluded. Do NOT flag this as a bug.
Applied to files:
autogpt_platform/backend/backend/copilot/sdk/service.py
📚 Learning: 2026-04-16T13:28:20.824Z
Learnt from: majdyz
Repo: Significant-Gravitas/AutoGPT PR: 12814
File: autogpt_platform/backend/backend/copilot/model.py:661-679
Timestamp: 2026-04-16T13:28:20.824Z
Learning: In `autogpt_platform/backend/backend/copilot/model.py` (PR `#12814`, commit 259d37083): `append_and_save_message` acquires `_get_session_lock` — a redis-py built-in Lock at key `copilot:session_lock:{session_id}` (timeout=10s, blocking_timeout=2s) — to serialize concurrent writers across replicas. On Redis failure the lock is skipped with a warning and the function continues. Inside the lock it re-fetches the session via `get_chat_session` (cache-first), performs an idempotency check (`session.messages[-1].role == message.role and session.messages[-1].content == message.content`), and returns early if matched. On successful DB write but failed cache write, it calls `invalidate_session_cache(session_id)` (the pre-existing best-effort helper) to evict the stale cache entry so subsequent retries fall back to the authoritative DB. Do NOT expect `asyncio.Lock` or a manual NX poll loop (`copilot:msg_append:{session_id}`) — those were removed. Do NOT flag the `invalidate_session_cache` call on ...
Applied to files:
autogpt_platform/backend/backend/copilot/sdk/service.py
📚 Learning: 2026-03-17T10:57:12.953Z
Learnt from: majdyz
Repo: Significant-Gravitas/AutoGPT PR: 12440
File: autogpt_platform/backend/backend/copilot/workflow_import/converter.py:0-0
Timestamp: 2026-03-17T10:57:12.953Z
Learning: In Significant-Gravitas/AutoGPT PR `#12440`, `autogpt_platform/backend/backend/copilot/workflow_import/converter.py` was fully rewritten (commit 732960e2d) to no longer make direct LLM/OpenAI API calls. The converter now builds a structured text prompt for AutoPilot/CoPilot instead. There is no `response.choices` access or any direct LLM client usage in this file. Do not flag `response.choices` access or LLM client initialization patterns as issues in this file.
Applied to files:
autogpt_platform/backend/backend/copilot/sdk/service.py
📚 Learning: 2026-04-16T13:28:28.641Z
Learnt from: majdyz
Repo: Significant-Gravitas/AutoGPT PR: 12814
File: autogpt_platform/backend/backend/copilot/model.py:0-0
Timestamp: 2026-04-16T13:28:28.641Z
Learning: In `autogpt_platform/backend/backend/copilot/model.py` (PR `#12814`, commit 259d37083): `append_and_save_message` uses `async with _get_session_lock(session_id)` — the same shared context manager used across the module — which internally acquires `redis-py`'s built-in `Lock` (key `copilot:session_lock:{session_id}`, timeout=10s, blocking_timeout=2s) via an atomic Lua-script. Lock release is also owner-verified via Lua so a slow pod can never delete a lock it no longer holds. On Redis failure the lock is skipped with a warning; the in-function idempotency check (`session.messages[-1].role` and `.content` comparison) still runs as a fallback. Do NOT expect a raw `redis.set(nx=True)` / `redis.delete()` pattern here — that intermediate approach was replaced in commit 259d37083.
Applied to files:
autogpt_platform/backend/backend/copilot/sdk/service.py
📚 Learning: 2026-04-21T11:41:05.877Z
Learnt from: majdyz
Repo: Significant-Gravitas/AutoGPT PR: 0
File: :0-0
Timestamp: 2026-04-21T11:41:05.877Z
Learning: In `autogpt_platform/backend/backend/copilot/baseline/service.py` (PR `#12870`, commits 080d42b9d and 3d7b38162), the `_close_reasoning_block_if_open(state)` helper centralises all four reasoning-block-close call sites (text branch, tool_calls branch, stream-end, exception path). The outer `finally` block of `_baseline_llm_caller` calls this helper plus stripper flush + `StreamTextEnd` to guarantee matched end events are emitted before `StreamFinishStep` on both normal and exception paths. Do NOT flag duplicated close logic or missing reasoning-end-on-exception as issues in this function.
Applied to files:
autogpt_platform/backend/backend/copilot/sdk/service.py
📚 Learning: 2026-04-16T12:33:44.990Z
Learnt from: majdyz
Repo: Significant-Gravitas/AutoGPT PR: 12796
File: autogpt_platform/backend/backend/api/features/chat/routes.py:504-527
Timestamp: 2026-04-16T12:33:44.990Z
Learning: In `autogpt_platform/backend/backend/api/features/chat/routes.py`, `get_session` (PR `#12796`, commit 3771bfad9c1) closes the TOCTOU race between the initial `stream_registry.get_active_session()` pre-check and `get_chat_messages_paginated()` with a post-check re-verification: after the DB fetch, if `is_initial_load and active_session is not None`, it calls `get_active_session` a second time; if `post_active is None` (stream completed during the window), it resets `from_start=True`, `forward_paginated=True`, and re-fetches messages from sequence 0. Do NOT flag the double `get_active_session` call pattern as redundant — it is the intentional TOCTOU mitigation for pagination direction selection.
Applied to files:
autogpt_platform/backend/backend/copilot/sdk/service.py
📚 Learning: 2026-04-21T17:31:23.683Z
Learnt from: majdyz
Repo: Significant-Gravitas/AutoGPT PR: 12873
File: autogpt_platform/backend/backend/copilot/baseline/reasoning.py:0-0
Timestamp: 2026-04-21T17:31:23.683Z
Learning: In `autogpt_platform/backend/backend/copilot/baseline/reasoning.py` (`BaselineReasoningEmitter`), when `render_in_ui=False`, BOTH the `StreamReasoning*` wire events AND the `ChatMessage(role="reasoning")` persistence append must be suppressed together. `convertChatSessionToUiMessages.ts` unconditionally re-renders all persisted `role="reasoning"` rows as `{type:"reasoning"}` UI parts on reload, so persisting rows while silencing live wire events would resurrect the reasoning collapse on page refresh. The audit trail is preserved through the provider transcript and `_format_sdk_content_blocks` (SDK path) instead. The baseline and SDK paths mirror each other: flag off → no live wire event, no persisted row, no hydrated collapse. This was established in PR `#12873`, commit 7ef10b26c.
Applied to files:
autogpt_platform/backend/backend/copilot/sdk/service.py
📚 Learning: 2026-04-23T01:26:38.257Z
Learnt from: majdyz
Repo: Significant-Gravitas/AutoGPT PR: 0
File: :0-0
Timestamp: 2026-04-23T01:26:38.257Z
Learning: In `autogpt_platform/backend/backend/copilot/sdk/service.py`, `langfuse_trace_id = get_client().get_current_trace_id()` must be captured under the `if _lf_span is not None:` guard (before `_lf_span` is torn down), NOT under `if _otel_ctx is not None:`. The `_otel_ctx` guard is too narrow: if `propagate_attributes().__enter__()` raises, `_otel_ctx` is never assigned, and placing the trace-id capture there would silently orphan the `openrouter-cost-reconcile` event from its parent span. Established in PR `#12889` commit d243bf6c9.
Applied to files:
autogpt_platform/backend/backend/copilot/sdk/service.py
📚 Learning: 2026-04-14T06:34:02.835Z
Learnt from: majdyz
Repo: Significant-Gravitas/AutoGPT PR: 12774
File: autogpt_platform/backend/backend/copilot/tools/e2b_sandbox.py:0-0
Timestamp: 2026-04-14T06:34:02.835Z
Learning: In `autogpt_platform/backend/backend/copilot/tools/e2b_sandbox.py`, the `asyncio.wait_for()` retry loop around `AsyncSandbox.create()` (introduced in PR `#12774`) can leak up to `_SANDBOX_CREATE_MAX_RETRIES - 1` (≤2) orphaned E2B sandboxes per hang incident because `wait_for` cancels only the client-side wait while E2B may complete server-side provisioning. With the default `on_timeout="pause"` lifecycle, leaked orphaned sandboxes are **paused** (not killed) when their original `end_at` is reached and persist indefinitely until explicitly killed — there is NO automatic E2B project-level cleanup. Operators must manage these manually or via their own cleanup jobs. The sandbox_id is not accessible from the timed-out coroutine, so recovery via `AsyncSandbox.connect(sandbox_id)` is not possible at timeout. This is an intentionally accepted trade-off; a proper fix is deferred to a follow-up PR. Do NOT flag the retry loop as a blocking issue.
Applied to files:
autogpt_platform/backend/backend/copilot/sdk/service.py
📚 Learning: 2026-04-23T00:07:27.117Z
Learnt from: majdyz
Repo: Significant-Gravitas/AutoGPT PR: 0
File: :0-0
Timestamp: 2026-04-23T00:07:27.117Z
Learning: In `autogpt_platform/backend/backend/copilot/sdk/service.py`, background tasks that persist cost or emit Langfuse backfill (e.g. the cost-reconcile task) must be anchored to `_background_tasks` using `_background_tasks.add(task)` and `task.add_done_callback(_background_tasks.discard)`, mirroring the existing pattern at lines 3063 / 4232 / 4256. This prevents the asyncio task from being garbage-collected before persistence or Langfuse emission completes. Do NOT flag the absence of this anchoring as acceptable in this file. Established in PR `#12889` commit 5ce3d0388.
Applied to files:
autogpt_platform/backend/backend/copilot/sdk/service.py
📚 Learning: 2026-03-17T07:24:34.302Z
Learnt from: majdyz
Repo: Significant-Gravitas/AutoGPT PR: 12385
File: autogpt_platform/backend/backend/copilot/rate_limit.py:0-0
Timestamp: 2026-03-17T07:24:34.302Z
Learning: In `autogpt_platform/backend/backend/copilot/rate_limit.py`, all fail-open `except` blocks catch `(RedisError, ConnectionError, OSError)` specifically — not bare `except Exception`. This applies to `_session_reset_from_ttl`, `get_usage_status`, `check_rate_limit`, and `record_token_usage`. The narrowed tuple ensures only genuine Redis/network failures are swallowed; unexpected exceptions propagate normally.
Applied to files:
autogpt_platform/backend/backend/copilot/sdk/service.py
📚 Learning: 2026-04-08T17:28:23.439Z
Learnt from: CR
Repo: Significant-Gravitas/AutoGPT PR: 0
File: autogpt_platform/backend/AGENTS.md:0-0
Timestamp: 2026-04-08T17:28:23.439Z
Learning: Applies to autogpt_platform/backend/**/*.py : Do not use linter suppressors — no `# type: ignore`, `# noqa`, `# pyright: ignore`; fix the type/code instead
Applied to files:
autogpt_platform/backend/backend/copilot/sdk/service.py
📚 Learning: 2026-04-16T13:28:22.385Z
Learnt from: majdyz
Repo: Significant-Gravitas/AutoGPT PR: 12814
File: autogpt_platform/backend/backend/copilot/model.py:0-0
Timestamp: 2026-04-16T13:28:22.385Z
Learning: In `autogpt_platform/backend/backend/copilot/model.py` (PR `#12814`, commit 259d37083): `append_and_save_message` uses `_get_session_lock(session_id)` — a redis-py built-in `Lock` (Lua-script atomic acquire/release) keyed as `copilot:session_lock:{session_id}` with `timeout=10s` (crash-safety TTL) and `blocking_timeout=2s`. There is NO manual NX-poll loop and NO `asyncio.Lock`. On Redis failure, `_get_session_lock` logs a warning and yields without a lock — the in-function idempotency check (compare `session.messages[-1].role` and `.content`) still runs as a fallback. Do NOT expect a manual `SET NX` poll loop or `asyncio.Lock` to wrap `append_and_save_message`.
Applied to files:
autogpt_platform/backend/backend/copilot/sdk/service.py
📚 Learning: 2026-04-22T11:46:04.431Z
Learnt from: majdyz
Repo: Significant-Gravitas/AutoGPT PR: 12881
File: autogpt_platform/backend/backend/copilot/config.py:0-0
Timestamp: 2026-04-22T11:46:04.431Z
Learning: Do not flag the Claude Sonnet 4.6 model ID as incorrect when it uses the project’s established hyphenated convention: `anthropic/claude-sonnet-4-6`. This hyphen form is the intentional, production convention and should be treated as valid (including in files like llm.py, blocks tests, reasoning.py, `_is_anthropic_model` tests, and config defaults). Note that OpenRouter also accepts the dot variant `anthropic/claude-sonnet-4.6`, so either form may be tolerated, but `anthropic/claude-sonnet-4-6` should be considered the standard to match project usage.
Applied to files:
autogpt_platform/backend/backend/copilot/sdk/service.py
📚 Learning: 2026-03-31T15:37:38.626Z
Learnt from: majdyz
Repo: Significant-Gravitas/AutoGPT PR: 12623
File: autogpt_platform/backend/backend/copilot/tools/agent_generator/fixer.py:37-47
Timestamp: 2026-03-31T15:37:38.626Z
Learning: When validating/constructing Anthropic API model IDs in Significant-Gravitas/AutoGPT, allow the hyphen-separated Claude Opus 4.6 model ID `claude-opus-4-6` (it corresponds to `LlmModel.CLAUDE_4_6_OPUS` in `autogpt_platform/backend/backend/blocks/llm.py`). Do NOT require the dot-separated form in Anthropic contexts. Only OpenRouter routing variants should use the dot separator (e.g., `anthropic/claude-opus-4.6`); `claude-opus-4-6` should be treated as correct when passed to Anthropic, and flagged only if it’s used in the OpenRouter path where the dot form is expected.
Applied to files:
autogpt_platform/backend/backend/copilot/sdk/service.py
📚 Learning: 2026-04-22T11:46:12.892Z
Learnt from: majdyz
Repo: Significant-Gravitas/AutoGPT PR: 12881
File: autogpt_platform/backend/backend/copilot/baseline/service.py:322-332
Timestamp: 2026-04-22T11:46:12.892Z
Learning: In this codebase (Significant-Gravitas/AutoGPT), OpenRouter-routed Anthropic model IDs should use the hyphen-separated convention (e.g., `anthropic/claude-sonnet-4-6`, `anthropic/claude-opus-4-6`). Although OpenRouter may accept both hyphen and dot variants, treat the hyphen-separated form as the intended, correct codebase-wide convention and do not flag it as an error. Only flag the dot-separated variant (e.g., `anthropic/claude-sonnet-4.6`) as incorrect when reviewing/validating model ID strings for OpenRouter-routed Anthropic models.
Applied to files:
autogpt_platform/backend/backend/copilot/sdk/service.py
📚 Learning: 2026-04-22T12:26:42.571Z
Learnt from: majdyz
Repo: Significant-Gravitas/AutoGPT PR: 12881
File: autogpt_platform/backend/backend/copilot/sdk/service.py:0-0
Timestamp: 2026-04-22T12:26:42.571Z
Learning: In `autogpt_platform/backend/backend/copilot/sdk/service.py`, `_resolve_sdk_model_for_request`: when a per-user LaunchDarkly model value fails `_normalize_model_name` (e.g. a `moonshotai/kimi-*` slug in direct-Anthropic mode), the fallback must be tier-specific — `config.thinking_advanced_model` for advanced tier, `config.thinking_standard_model` for standard tier — NOT the generic `_resolve_sdk_model()` (which is standard-only and returns None under subscription mode). If the tier-specific config default also fails `_normalize_model_name`, re-raise the original LD error; this is a deployment-level misconfiguration that `model_validator` should have caught at startup. Established in PR `#12881` commit 637d2fef5.
Applied to files:
autogpt_platform/backend/backend/copilot/sdk/service.py
📚 Learning: 2026-03-10T08:39:22.025Z
Learnt from: majdyz
Repo: Significant-Gravitas/AutoGPT PR: 12356
File: autogpt_platform/backend/backend/copilot/constants.py:9-12
Timestamp: 2026-03-10T08:39:22.025Z
Learning: In Significant-Gravitas/AutoGPT PR `#12356`, the `COPILOT_SYNTHETIC_ID_PREFIX = "copilot-"` check in `create_auto_approval_record` (human_review.py) is intentional and safe. The `graph_exec_id` passed to this function comes from server-side `PendingHumanReview` DB records (not from user input); the API only accepts `node_exec_id` from users. Synthetic `copilot-*` IDs are only ever created server-side in `run_block.py`. The prefix skip avoids a DB lookup for a `AgentGraphExecution` record that legitimately does not exist for CoPilot sessions, while `user_id` scoping is enforced at the auth layer and on the resulting auto-approval record.
Applied to files:
autogpt_platform/backend/backend/copilot/sdk/service.py
📚 Learning: 2026-04-01T04:17:41.600Z
Learnt from: majdyz
Repo: Significant-Gravitas/AutoGPT PR: 12632
File: autogpt_platform/backend/backend/copilot/tools/workspace_files.py:0-0
Timestamp: 2026-04-01T04:17:41.600Z
Learning: When reviewing AutoGPT Copilot tool implementations, accept that `readOnlyHint=True` (provided via `ToolAnnotations`) may be applied unconditionally to *all* tools—even tools that have side effects (e.g., `bash_exec`, `write_workspace_file`, or other write/save operations). Do **not** flag these tools for having `readOnlyHint=True`; this is intentional to enable fully-parallel dispatch by the Anthropic SDK/CLI and has been E2E validated. Only flag `readOnlyHint` issues if they conflict with the established `ToolAnnotations` behavior (e.g., missing/incorrect propagation relative to the intended annotation mechanism).
Applied to files:
autogpt_platform/backend/backend/copilot/sdk/service.py
📚 Learning: 2026-04-01T04:17:38.279Z
Learnt from: majdyz
Repo: Significant-Gravitas/AutoGPT PR: 12632
File: autogpt_platform/backend/backend/copilot/sdk/tool_adapter.py:530-535
Timestamp: 2026-04-01T04:17:38.279Z
Learning: In `autogpt_platform/backend/backend/copilot/sdk/tool_adapter.py`, the `ToolAnnotations(readOnlyHint=True)` annotation (stored as `_PARALLEL_ANNOTATION`) is intentionally applied to ALL registered MCP tools — including E2B write/edit tools (e.g., `write_file`, `edit_file`). This is a parallel-dispatch hint to the Claude Agent SDK CLI, not a semantic read-only contract. The `_READ_ONLY_E2B_TOOLS` set was dead code and was removed in commit `12ae03c`; the constant was renamed from `_READONLY_ANNOTATION` to `_PARALLEL_ANNOTATION` in commit `c88ca88` to avoid confusion. Do not flag this as a correctness issue.
Applied to files:
autogpt_platform/backend/backend/copilot/sdk/service.py
📚 Learning: 2026-04-14T21:27:04.525Z
Learnt from: ntindle
Repo: Significant-Gravitas/AutoGPT PR: 12765
File: autogpt_platform/backend/backend/copilot/tools/graphiti_forget.py:227-234
Timestamp: 2026-04-14T21:27:04.525Z
Learning: In `autogpt_platform/backend/backend/copilot/tools/graphiti_forget.py`, the `getattr(client, "graph_driver", None) or getattr(client, "driver", None)` pattern for accessing the Neo4j driver from a `graphiti_core.Graphiti` instance is intentional and correct. `graphiti_core.Graphiti` does not expose `driver` as a stable public property (`dir(Graphiti)` shows no `driver` or `graph_driver` public property); the attribute name has varied across library versions. The fallback chain handles cross-version compatibility. Do NOT flag this as a duck-typing violation. Additionally, soft delete (temporal invalidation), per-UUID success/failure reporting, and episode back-reference cleanup all require raw Cypher queries — the `EntityEdge.delete_by_uuids` batch API does not cover these cases.
Applied to files:
autogpt_platform/backend/backend/copilot/sdk/service.py
📚 Learning: 2026-03-09T10:50:43.907Z
Learnt from: Bentlybro
Repo: Significant-Gravitas/AutoGPT PR: 0
File: :0-0
Timestamp: 2026-03-09T10:50:43.907Z
Learning: Repo: Significant-Gravitas/AutoGPT — File: autogpt_platform/backend/backend/blocks/llm.py
For xAI Grok models accessed via OpenRouter, the API returns `null` for `max_completion_tokens`. The convention in this codebase is to use the model's context window size as the `max_output_tokens` value in ModelMetadata. For example, Grok 3 uses 131072 (128k) and Grok 4 uses 262144 (256k). Do not flag these as incorrect max output token values.
Applied to files:
autogpt_platform/backend/backend/copilot/sdk/service.py
📚 Learning: 2026-04-07T10:12:18.517Z
Learnt from: majdyz
Repo: Significant-Gravitas/AutoGPT PR: 12691
File: .claude/skills/orchestrate/SKILL.md:0-0
Timestamp: 2026-04-07T10:12:18.517Z
Learning: In Significant-Gravitas/AutoGPT's Claude skill markdown files under `.claude/skills/orchestrate/`, fenced code blocks in `SKILL.md`-style skill documents may intentionally omit a fenced code language (no `text`, `bash`, etc.). These blocks are used for Claude Code inline pseudocode/conceptual helpers rather than runnable scripts. During reviews, avoid treating MD040 (fenced-code-language) as an issue for these specific skill-format blocks, even if the language identifier is missing, since this omission is expected and has been accepted as a false positive for this skill format.
Applied to files:
autogpt_platform/backend/backend/copilot/sdk/service.py
📚 Learning: 2026-04-13T13:11:09.987Z
Learnt from: 0ubbe
Repo: Significant-Gravitas/AutoGPT PR: 12764
File: autogpt_platform/frontend/src/app/(platform)/library/components/SitrepItem/SitrepItem.tsx:143-145
Timestamp: 2026-04-13T13:11:09.987Z
Learning: In Significant-Gravitas/AutoGPT `autogpt_platform/frontend`, `executionID` values used as URL query params (e.g. `activeItem=` in `SitrepItem.tsx`) are always UUIDs (e.g. `550e8400-e29b-41d4-a716-446655440000`). Their character set `[0-9a-f-]` contains no reserved URL characters, so `encodeURIComponent` or Next.js object-based `href` encoding is unnecessary. Do not flag direct UUID string interpolation into query strings as a URL-encoding issue.
Applied to files:
autogpt_platform/backend/backend/copilot/sdk/service.py
📚 Learning: 2026-03-30T11:49:37.770Z
Learnt from: majdyz
Repo: Significant-Gravitas/AutoGPT PR: 12604
File: autogpt_platform/backend/backend/copilot/sdk/security_hooks.py:165-171
Timestamp: 2026-03-30T11:49:37.770Z
Learning: In `autogpt_platform/backend/backend/copilot/sdk/security_hooks.py`, the `web_search_count` and `total_tool_call_count` circuit-breaker counters in `create_security_hooks` are intentionally per-turn (closure-local), not per-session. Hooks are recreated per stream invocation in `service.py`, so counters reset each turn. This is an accepted v1 design: it caps a single runaway turn (incident d2f7cba3: 179 WebSearch calls, $20.66). True per-session persistence via Redis is deferred to a later iteration. Do not flag these as a per-session vs. per-turn mismatch bug.
Applied to files:
autogpt_platform/backend/backend/copilot/sdk/service.py
📚 Learning: 2026-03-26T07:00:03.405Z
Learnt from: majdyz
Repo: Significant-Gravitas/AutoGPT PR: 12574
File: autogpt_platform/backend/backend/copilot/sdk/transcript.py:980-990
Timestamp: 2026-03-26T07:00:03.405Z
Learning: In `autogpt_platform/backend/backend/copilot/sdk/transcript.py`, `_rechain_tail` intentionally rewrites `parentUuid` for **all** tail entries (not just the first), because a single assistant turn can span multiple consecutive JSONL entries sharing the same `message.id` (e.g., a thinking entry + a tool_use entry). Their original `parentUuid` values may reference entries that were absorbed into the compressed prefix, so sequential rechaining of the entire tail is required to maintain a valid parent→child graph. The test `test_chains_multiple_tail_entries` validates this: the second tail entry's `parentUuid` is rewritten from its original value to the uuid of the first tail entry.
Applied to files:
autogpt_platform/backend/backend/copilot/sdk/service.py
📚 Learning: 2026-04-03T11:14:16.378Z
Learnt from: majdyz
Repo: Significant-Gravitas/AutoGPT PR: 12623
File: autogpt_platform/backend/backend/copilot/transcript_builder.py:30-34
Timestamp: 2026-04-03T11:14:16.378Z
Learning: In `autogpt_platform/backend/backend/copilot/transcript_builder.py` (and its re-export shim at `sdk/transcript_builder.py`), `TranscriptEntry.parentUuid` is typed `str` (not `str | None`) and root entries use `parentUuid=""` (empty string) to match the canonical `_messages_to_transcript` JSONL format. `_parse_entry`, `append_user`, and `append_assistant` all coerce `None` to `""`. Do NOT flag `parentUuid=""` as incorrect — it is the correct root marker. This was fixed in PR `#12623`, commit b753cb7d0b.
Applied to files:
autogpt_platform/backend/backend/copilot/sdk/service.py
📚 Learning: 2026-02-26T17:02:22.448Z
Learnt from: Pwuts
Repo: Significant-Gravitas/AutoGPT PR: 12211
File: .pre-commit-config.yaml:160-179
Timestamp: 2026-02-26T17:02:22.448Z
Learning: Keep the pre-commit hook pattern broad for autogpt_platform/backend to ensure OpenAPI schema changes are captured. Do not narrow to backend/api/ alone, since the generated schema depends on Pydantic models across multiple directories (backend/data/, backend/blocks/, backend/copilot/, backend/integrations/, backend/util/). Narrowing could miss schema changes and cause frontend type desynchronization.
Applied to files:
autogpt_platform/backend/backend/copilot/sdk/service.py
📚 Learning: 2026-03-04T08:04:35.881Z
Learnt from: majdyz
Repo: Significant-Gravitas/AutoGPT PR: 12273
File: autogpt_platform/backend/backend/copilot/tools/workspace_files.py:216-220
Timestamp: 2026-03-04T08:04:35.881Z
Learning: In the AutoGPT Copilot backend, ensure that SVG images are not treated as vision image types by excluding 'image/svg+xml' from INLINEABLE_MIME_TYPES and MULTIMODAL_TYPES in tool_adapter.py; the Claude API supports PNG, JPEG, GIF, and WebP for vision. SVGs (XML text) should be handled via the text path instead, not the vision path.
Applied to files:
autogpt_platform/backend/backend/copilot/sdk/service.py
📚 Learning: 2026-03-05T15:42:08.207Z
Learnt from: ntindle
Repo: Significant-Gravitas/AutoGPT PR: 12297
File: .claude/skills/backend-check/SKILL.md:14-16
Timestamp: 2026-03-05T15:42:08.207Z
Learning: In Python files under autogpt_platform/backend (recursively), rely on poetry run format to perform formatting (Black + isort) and linting (ruff). Do not run poetry run lint as a separate step after poetry run format, since format already includes linting checks.
Applied to files:
autogpt_platform/backend/backend/copilot/sdk/service.py
📚 Learning: 2026-03-16T16:35:40.236Z
Learnt from: majdyz
Repo: Significant-Gravitas/AutoGPT PR: 12440
File: autogpt_platform/backend/backend/api/features/workflow_import.py:54-63
Timestamp: 2026-03-16T16:35:40.236Z
Learning: Avoid using the word 'competitor' in public-facing identifiers and text. Use neutral naming for API paths, model names, function names, and UI text. Examples: rename 'CompetitorFormat' to 'SourcePlatform', 'convert_competitor_workflow' to 'convert_workflow', '/competitor-workflow' to '/workflow'. Apply this guideline to files under autogpt_platform/backend and autogpt_platform/frontend.
Applied to files:
autogpt_platform/backend/backend/copilot/sdk/service.py
📚 Learning: 2026-04-15T02:43:36.890Z
Learnt from: ntindle
Repo: Significant-Gravitas/AutoGPT PR: 12780
File: autogpt_platform/backend/backend/copilot/tools/workspace_files.py:0-0
Timestamp: 2026-04-15T02:43:36.890Z
Learning: When reviewing Python exception handlers, do not flag `isinstance(e, X)` checks as dead/unreachable if the caught exception `X` is a subclass of the exception type being handled. For example, if `X` (e.g., `VirusScanError`) inherits from `ValueError` (directly or via an intermediate class) and it can be raised within an `except ValueError:` block, then `isinstance(e, X)` inside that handler is reachable and should not be treated as dead code.
Applied to files:
autogpt_platform/backend/backend/copilot/sdk/service.py
…o _InterruptedAttempt Previous revisions carried the failed-attempt state across three separate function-scope variables (last_attempt_partial, handled_error_info) + four module-level helpers (_rollback_attempt_capturing_partial, _restore_partial_with_error_marker, _flush_orphan_tool_uses_to_session, _append_error_marker). The retry loop mutated all three and the post-loop block reassembled the pieces by hand. Scattered and hard to follow. Collapse to one dataclass with capture / clear / finalize + one _classify_final_failure helper that picks the display message based on which failure flag the retry loop set (attempts_exhausted, transient_exhausted, stream_err, handled_error). Call sites: - success break: interrupted.clear() - _HandledStreamError: interrupted.capture(...); interrupted.handled_error = ... - Exception: interrupted.capture(...) - post-loop: final_msg, retryable = _classify_final_failure(interrupted, ...); interrupted.finalize(...) - outer except: interrupted.finalize(...) Behaviour is unchanged — same restore semantics, same StreamError sequencing, same transcript-upload skip, same orphan tool_use flush, same stale-marker stripping from b1172e2 / 5406fe9. The retry-scenarios suite (48 integration tests) plus the rewritten interrupted_partial_test (14 unit tests) both pass; the full SDK test suite (1012 tests) is green.
There was a problem hiding this comment.
♻️ Duplicate comments (1)
autogpt_platform/backend/backend/copilot/sdk/service.py (1)
634-650:⚠️ Potential issue | 🟠 MajorDon't consume the adapter flush before the client cleanup path runs.
_flush_orphan_tool_uses_to_session()calls_flush_unresolved_tool_calls()during restore, which clearshas_unresolved_tool_callsbefore the latererror_flushblock checks it. That preserves DB/session validity, but it also prevents theStreamToolOutputAvailablecleanup events from being yielded to the client, so interrupted tool widgets/spinners can stay open until refresh. Persist the synthesizedtool_resultrows without mutating adapter state yet, or return/reuse the generated flush events in the 4146-4161 block instead of flushing twice. As per coding guidelines, "Do not use linter suppressors — no# type: ignore,# noqa,# pyright: ignore; fix the type/code instead."Also applies to: 4135-4161
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@autogpt_platform/backend/backend/copilot/sdk/service.py` around lines 634 - 650, The current _flush_orphan_tool_uses_to_session calls state.adapter._flush_unresolved_tool_calls which mutates adapter state (clearing has_unresolved_tool_calls) and prevents the later error_flush block from yielding StreamToolOutputAvailable events to the client; instead, change the flow to synthesize the StreamBaseResponse rows without mutating adapter state: either add/use a non-mutating helper on the adapter that returns the synthesized events (e.g., make _flush_unresolved_tool_calls return a list of StreamBaseResponse or add a new method like _collect_unresolved_tool_calls) and append/return those events to be consumed by the later error_flush block, or modify _flush_unresolved_tool_calls to accept a no_mutation flag so _flush_orphan_tool_uses_to_session can collect responses but defer state changes until the client cleanup path runs; remove the noqa suppressor and ensure has_unresolved_tool_calls is only cleared when the actual client-yielding cleanup path consumes the events.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Duplicate comments:
In `@autogpt_platform/backend/backend/copilot/sdk/service.py`:
- Around line 634-650: The current _flush_orphan_tool_uses_to_session calls
state.adapter._flush_unresolved_tool_calls which mutates adapter state (clearing
has_unresolved_tool_calls) and prevents the later error_flush block from
yielding StreamToolOutputAvailable events to the client; instead, change the
flow to synthesize the StreamBaseResponse rows without mutating adapter state:
either add/use a non-mutating helper on the adapter that returns the synthesized
events (e.g., make _flush_unresolved_tool_calls return a list of
StreamBaseResponse or add a new method like _collect_unresolved_tool_calls) and
append/return those events to be consumed by the later error_flush block, or
modify _flush_unresolved_tool_calls to accept a no_mutation flag so
_flush_orphan_tool_uses_to_session can collect responses but defer state changes
until the client cleanup path runs; remove the noqa suppressor and ensure
has_unresolved_tool_calls is only cleared when the actual client-yielding
cleanup path consumes the events.
ℹ️ Review info
⚙️ Run configuration
Configuration used: Organization UI
Review profile: CHILL
Plan: Pro
Run ID: 39798136-a2aa-4a01-99b6-c78ed21a48de
📒 Files selected for processing (2)
autogpt_platform/backend/backend/copilot/sdk/interrupted_partial_test.pyautogpt_platform/backend/backend/copilot/sdk/service.py
🚧 Files skipped from review as they are similar to previous changes (1)
- autogpt_platform/backend/backend/copilot/sdk/interrupted_partial_test.py
📜 Review details
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (10)
- GitHub Check: check API types
- GitHub Check: Seer Code Review
- GitHub Check: end-to-end tests
- GitHub Check: test (3.12)
- GitHub Check: test (3.11)
- GitHub Check: type-check (3.12)
- GitHub Check: test (3.13)
- GitHub Check: Analyze (python)
- GitHub Check: Analyze (typescript)
- GitHub Check: Check PR Status
🧰 Additional context used
📓 Path-based instructions (2)
autogpt_platform/backend/**/*.py
📄 CodeRabbit inference engine (.github/copilot-instructions.md)
autogpt_platform/backend/**/*.py: Use Python 3.11 (required; managed by Poetry via pyproject.toml) for backend development
Always run 'poetry run format' (Black + isort) before linting in backend development
Always run 'poetry run lint' (ruff) after formatting in backend development
autogpt_platform/backend/**/*.py: Usepoetry run ...command for executing Python package dependencies
Use top-level imports only — avoid local/inner imports except for lazy imports of heavy optional dependencies likeopenpyxl
Use absolute imports withfrom backend.module import ...for cross-package imports; single-dot relative imports are acceptable for sibling modules within the same package; avoid double-dot relative imports
Do not use duck typing — avoidhasattr/getattr/isinstancefor type dispatch; use typed interfaces/unions/protocols instead
Use Pydantic models over dataclass/namedtuple/dict for structured data
Do not use linter suppressors — no# type: ignore,# noqa,# pyright: ignore; fix the type/code instead
Prefer list comprehensions over manual loop-and-append patterns
Use early return with guard clauses first to avoid deep nesting
Use%sfor deferred interpolation indebuglog statements for efficiency; use f-strings elsewhere for readability (e.g.,logger.debug("Processing %s items", count)vslogger.info(f"Processing {count} items"))
Sanitize error paths by usingos.path.basename()in error messages to avoid leaking directory structure
Be aware of TOCTOU (Time-Of-Check-Time-Of-Use) issues — avoid check-then-act patterns for file access and credit charging
Usetransaction=Truefor Redis pipelines to ensure atomicity on multi-step operations
Usemax(0, value)guards for computed values that should never be negative
Keep files under ~300 lines; if a file grows beyond this, split by responsibility (extract helpers, models, or a sub-module into a new file)
Keep functions under ~40 lines; extract named helpers when a function grows longer
...
Files:
autogpt_platform/backend/backend/copilot/sdk/service.py
autogpt_platform/{backend,autogpt_libs}/**/*.py
📄 CodeRabbit inference engine (AGENTS.md)
Format Python code with
poetry run format
Files:
autogpt_platform/backend/backend/copilot/sdk/service.py
🧠 Learnings (44)
📓 Common learnings
Learnt from: majdyz
Repo: Significant-Gravitas/AutoGPT PR: 12797
File: autogpt_platform/backend/backend/copilot/sdk/service.py:1991-2021
Timestamp: 2026-04-15T13:44:34.273Z
Learning: In `autogpt_platform/backend/backend/copilot/sdk/service.py` (`_run_stream_attempt`), the pre-create block (PR `#12797`) intentionally does NOT call `state.transcript_builder.append_assistant(...)` when inserting the empty assistant placeholder into `ctx.session.messages`. The transcript is left ending at the `tool_result` entry (N entries) while `message_count` metadata is N+1. This mismatch is benign and deliberate: on the next `--resume`, the SDK sees the transcript ending at `tool_result` and correctly regenerates the assistant response. Pre-appending the assistant turn to the transcript would suppress regeneration while leaving `session.messages[-1].content = ""` permanently (worse outcome). On the gap-fallback path, `transcript_msg_count (N+1) >= msg_count-1 (N)` means no gap is injected for the empty placeholder, which is correct because injecting an empty assistant message as context would mislead the SDK. Do NOT flag this transcript/message_count discrepancy as a bug.
Learnt from: majdyz
Repo: Significant-Gravitas/AutoGPT PR: 12773
File: autogpt_platform/backend/backend/blocks/autopilot.py:631-638
Timestamp: 2026-04-14T07:35:11.464Z
Learning: In `autogpt_platform/backend/backend/copilot/executor/utils.py`, `CoPilotExecutionEntry` includes a `permissions: CopilotPermissions | None` field (added in PR `#12773` / commit a0184c87b9). `enqueue_copilot_turn` accepts and serializes this field into the queue entry, `_enqueue_for_recovery` in `autopilot.py` accepts and forwards `permissions` to `enqueue_copilot_turn`, and `_execute_async` in `processor.py` restores `entry.permissions` and passes it into `stream_chat_completion_sdk`/`stream_chat_completion_baseline` via `set_execution_context`. This ensures recovered sub-agent turns respect the same tool/block permission ceiling as the original in-process execution (mirroring `_merge_inherited_permissions`). Do NOT flag recovered turns as losing their permission ceiling — it is now fully propagated through the queue.
Learnt from: majdyz
Repo: Significant-Gravitas/AutoGPT PR: 12879
File: autogpt_platform/backend/backend/copilot/baseline/service.py:0-0
Timestamp: 2026-04-22T05:57:34.861Z
Learning: In `autogpt_platform/backend/backend/copilot/baseline/service.py`, the approved pattern for `_run_task_subagent` (PR `#12879`, commit 187f0a5) uses a nested `try/except Exception` inside an outer `try/finally`. The outer `finally` block resets `_TASK_DEPTH_VAR` (via `_TASK_DEPTH_VAR.reset(token)`) AND calls `_absorb_inner_usage(parent_state, inner_state)` unconditionally, so both the depth ContextVar and usage roll-up are guaranteed on all exit paths including `CancelledError`/`KeyboardInterrupt`/`SystemExit`. The inner `except Exception` catches and converts failures into a `TaskResponse` error payload that is returned as `StreamToolOutputAvailable`. Do NOT flag missing ContextVar reset or usage roll-up on BaseException paths in this function.
Learnt from: majdyz
Repo: Significant-Gravitas/AutoGPT PR: 12773
File: autogpt_platform/backend/backend/copilot/pending_messages.py:52-64
Timestamp: 2026-04-14T14:36:25.545Z
Learning: In `autogpt_platform/backend/backend/copilot` (PR `#12773`, commit d7bced0c6): when draining pending messages into `session.messages`, each message's text is sanitized via `strip_user_context_tags` before persistence to prevent user-controlled `<user_context>` injection from bypassing the trusted server-side context prefix. Additionally, if `upsert_chat_session` fails after draining, the drained `PendingMessage` objects are requeued back to Redis to avoid silent message loss. Do NOT flag the drain-then-requeue pattern as redundant — it is the intentional failure-resilience strategy for the pending buffer.
Learnt from: majdyz
Repo: Significant-Gravitas/AutoGPT PR: 12445
File: autogpt_platform/backend/backend/copilot/sdk/service.py:1071-1072
Timestamp: 2026-03-17T06:48:26.471Z
Learning: In Significant-Gravitas/AutoGPT (autogpt_platform), the AI SDK enforces `z.strictObject({type, errorText})` on SSE `StreamError` responses, so additional fields like `retryable: bool` cannot be added to `StreamError` or serialized via `to_sse()`. Instead, retry signaling for transient Anthropic API errors is done via the `COPILOT_RETRYABLE_ERROR_PREFIX` constant prepended to persisted session messages (in `ChatMessage.content`). The frontend detects retryable errors by checking `markerType === "retryable_error"` from `parseSpecialMarkers()` — no SSE schema changes and no string matching on error text. This pattern was established in PR `#12445`, commit 64d82797b.
Learnt from: majdyz
Repo: Significant-Gravitas/AutoGPT PR: 0
File: :0-0
Timestamp: 2026-04-23T00:07:27.117Z
Learning: In `autogpt_platform/backend/backend/copilot/sdk/service.py`, background tasks that persist cost or emit Langfuse backfill (e.g. the cost-reconcile task) must be anchored to `_background_tasks` using `_background_tasks.add(task)` and `task.add_done_callback(_background_tasks.discard)`, mirroring the existing pattern at lines 3063 / 4232 / 4256. This prevents the asyncio task from being garbage-collected before persistence or Langfuse emission completes. Do NOT flag the absence of this anchoring as acceptable in this file. Established in PR `#12889` commit 5ce3d0388.
Learnt from: majdyz
Repo: Significant-Gravitas/AutoGPT PR: 0
File: :0-0
Timestamp: 2026-04-23T01:26:38.257Z
Learning: In `autogpt_platform/backend/backend/copilot/sdk/service.py`, `langfuse_trace_id = get_client().get_current_trace_id()` must be captured under the `if _lf_span is not None:` guard (before `_lf_span` is torn down), NOT under `if _otel_ctx is not None:`. The `_otel_ctx` guard is too narrow: if `propagate_attributes().__enter__()` raises, `_otel_ctx` is never assigned, and placing the trace-id capture there would silently orphan the `openrouter-cost-reconcile` event from its parent span. Established in PR `#12889` commit d243bf6c9.
Learnt from: majdyz
Repo: Significant-Gravitas/AutoGPT PR: 12881
File: autogpt_platform/backend/backend/copilot/sdk/service.py:0-0
Timestamp: 2026-04-22T12:26:42.571Z
Learning: In `autogpt_platform/backend/backend/copilot/sdk/service.py`, `_resolve_sdk_model_for_request`: when a per-user LaunchDarkly model value fails `_normalize_model_name` (e.g. a `moonshotai/kimi-*` slug in direct-Anthropic mode), the fallback must be tier-specific — `config.thinking_advanced_model` for advanced tier, `config.thinking_standard_model` for standard tier — NOT the generic `_resolve_sdk_model()` (which is standard-only and returns None under subscription mode). If the tier-specific config default also fails `_normalize_model_name`, re-raise the original LD error; this is a deployment-level misconfiguration that `model_validator` should have caught at startup. Established in PR `#12881` commit 637d2fef5.
Learnt from: majdyz
Repo: Significant-Gravitas/AutoGPT PR: 0
File: :0-0
Timestamp: 2026-04-21T11:41:05.877Z
Learning: In `autogpt_platform/backend/backend/copilot/baseline/service.py` (PR `#12870`, commits 080d42b9d and 3d7b38162), the `_close_reasoning_block_if_open(state)` helper centralises all four reasoning-block-close call sites (text branch, tool_calls branch, stream-end, exception path). The outer `finally` block of `_baseline_llm_caller` calls this helper plus stripper flush + `StreamTextEnd` to guarantee matched end events are emitted before `StreamFinishStep` on both normal and exception paths. Do NOT flag duplicated close logic or missing reasoning-end-on-exception as issues in this function.
Learnt from: majdyz
Repo: Significant-Gravitas/AutoGPT PR: 12814
File: autogpt_platform/backend/backend/copilot/model.py:0-0
Timestamp: 2026-04-16T13:28:28.641Z
Learning: In `autogpt_platform/backend/backend/copilot/model.py` (PR `#12814`, commit 259d37083): `append_and_save_message` uses `async with _get_session_lock(session_id)` — the same shared context manager used across the module — which internally acquires `redis-py`'s built-in `Lock` (key `copilot:session_lock:{session_id}`, timeout=10s, blocking_timeout=2s) via an atomic Lua-script. Lock release is also owner-verified via Lua so a slow pod can never delete a lock it no longer holds. On Redis failure the lock is skipped with a warning; the in-function idempotency check (`session.messages[-1].role` and `.content` comparison) still runs as a fallback. Do NOT expect a raw `redis.set(nx=True)` / `redis.delete()` pattern here — that intermediate approach was replaced in commit 259d37083.
📚 Learning: 2026-04-22T05:57:34.861Z
Learnt from: majdyz
Repo: Significant-Gravitas/AutoGPT PR: 12879
File: autogpt_platform/backend/backend/copilot/baseline/service.py:0-0
Timestamp: 2026-04-22T05:57:34.861Z
Learning: In `autogpt_platform/backend/backend/copilot/baseline/service.py`, the approved pattern for `_run_task_subagent` (PR `#12879`, commit 187f0a5) uses a nested `try/except Exception` inside an outer `try/finally`. The outer `finally` block resets `_TASK_DEPTH_VAR` (via `_TASK_DEPTH_VAR.reset(token)`) AND calls `_absorb_inner_usage(parent_state, inner_state)` unconditionally, so both the depth ContextVar and usage roll-up are guaranteed on all exit paths including `CancelledError`/`KeyboardInterrupt`/`SystemExit`. The inner `except Exception` catches and converts failures into a `TaskResponse` error payload that is returned as `StreamToolOutputAvailable`. Do NOT flag missing ContextVar reset or usage roll-up on BaseException paths in this function.
Applied to files:
autogpt_platform/backend/backend/copilot/sdk/service.py
📚 Learning: 2026-03-17T06:48:26.471Z
Learnt from: majdyz
Repo: Significant-Gravitas/AutoGPT PR: 12445
File: autogpt_platform/backend/backend/copilot/sdk/service.py:1071-1072
Timestamp: 2026-03-17T06:48:26.471Z
Learning: In Significant-Gravitas/AutoGPT (autogpt_platform), the AI SDK enforces `z.strictObject({type, errorText})` on SSE `StreamError` responses, so additional fields like `retryable: bool` cannot be added to `StreamError` or serialized via `to_sse()`. Instead, retry signaling for transient Anthropic API errors is done via the `COPILOT_RETRYABLE_ERROR_PREFIX` constant prepended to persisted session messages (in `ChatMessage.content`). The frontend detects retryable errors by checking `markerType === "retryable_error"` from `parseSpecialMarkers()` — no SSE schema changes and no string matching on error text. This pattern was established in PR `#12445`, commit 64d82797b.
Applied to files:
autogpt_platform/backend/backend/copilot/sdk/service.py
📚 Learning: 2026-04-15T13:44:34.273Z
Learnt from: majdyz
Repo: Significant-Gravitas/AutoGPT PR: 12797
File: autogpt_platform/backend/backend/copilot/sdk/service.py:1991-2021
Timestamp: 2026-04-15T13:44:34.273Z
Learning: In `autogpt_platform/backend/backend/copilot/sdk/service.py` (`_run_stream_attempt`), the pre-create block (PR `#12797`) intentionally does NOT call `state.transcript_builder.append_assistant(...)` when inserting the empty assistant placeholder into `ctx.session.messages`. The transcript is left ending at the `tool_result` entry (N entries) while `message_count` metadata is N+1. This mismatch is benign and deliberate: on the next `--resume`, the SDK sees the transcript ending at `tool_result` and correctly regenerates the assistant response. Pre-appending the assistant turn to the transcript would suppress regeneration while leaving `session.messages[-1].content = ""` permanently (worse outcome). On the gap-fallback path, `transcript_msg_count (N+1) >= msg_count-1 (N)` means no gap is injected for the empty placeholder, which is correct because injecting an empty assistant message as context would mislead the SDK. Do NOT flag this transcript/message_count discrepancy as a bug.
Applied to files:
autogpt_platform/backend/backend/copilot/sdk/service.py
📚 Learning: 2026-04-21T11:41:05.877Z
Learnt from: majdyz
Repo: Significant-Gravitas/AutoGPT PR: 0
File: :0-0
Timestamp: 2026-04-21T11:41:05.877Z
Learning: In `autogpt_platform/backend/backend/copilot/baseline/service.py` (PR `#12870`, commits 080d42b9d and 3d7b38162), the `_close_reasoning_block_if_open(state)` helper centralises all four reasoning-block-close call sites (text branch, tool_calls branch, stream-end, exception path). The outer `finally` block of `_baseline_llm_caller` calls this helper plus stripper flush + `StreamTextEnd` to guarantee matched end events are emitted before `StreamFinishStep` on both normal and exception paths. Do NOT flag duplicated close logic or missing reasoning-end-on-exception as issues in this function.
Applied to files:
autogpt_platform/backend/backend/copilot/sdk/service.py
📚 Learning: 2026-04-14T07:35:11.464Z
Learnt from: majdyz
Repo: Significant-Gravitas/AutoGPT PR: 12773
File: autogpt_platform/backend/backend/blocks/autopilot.py:631-638
Timestamp: 2026-04-14T07:35:11.464Z
Learning: In `autogpt_platform/backend/backend/copilot/executor/utils.py`, `CoPilotExecutionEntry` includes a `permissions: CopilotPermissions | None` field (added in PR `#12773` / commit a0184c87b9). `enqueue_copilot_turn` accepts and serializes this field into the queue entry, `_enqueue_for_recovery` in `autopilot.py` accepts and forwards `permissions` to `enqueue_copilot_turn`, and `_execute_async` in `processor.py` restores `entry.permissions` and passes it into `stream_chat_completion_sdk`/`stream_chat_completion_baseline` via `set_execution_context`. This ensures recovered sub-agent turns respect the same tool/block permission ceiling as the original in-process execution (mirroring `_merge_inherited_permissions`). Do NOT flag recovered turns as losing their permission ceiling — it is now fully propagated through the queue.
Applied to files:
autogpt_platform/backend/backend/copilot/sdk/service.py
📚 Learning: 2026-04-14T14:36:25.545Z
Learnt from: majdyz
Repo: Significant-Gravitas/AutoGPT PR: 12773
File: autogpt_platform/backend/backend/copilot/pending_messages.py:52-64
Timestamp: 2026-04-14T14:36:25.545Z
Learning: In `autogpt_platform/backend/backend/copilot` (PR `#12773`, commit d7bced0c6): when draining pending messages into `session.messages`, each message's text is sanitized via `strip_user_context_tags` before persistence to prevent user-controlled `<user_context>` injection from bypassing the trusted server-side context prefix. Additionally, if `upsert_chat_session` fails after draining, the drained `PendingMessage` objects are requeued back to Redis to avoid silent message loss. Do NOT flag the drain-then-requeue pattern as redundant — it is the intentional failure-resilience strategy for the pending buffer.
Applied to files:
autogpt_platform/backend/backend/copilot/sdk/service.py
📚 Learning: 2026-04-16T13:28:28.641Z
Learnt from: majdyz
Repo: Significant-Gravitas/AutoGPT PR: 12814
File: autogpt_platform/backend/backend/copilot/model.py:0-0
Timestamp: 2026-04-16T13:28:28.641Z
Learning: In `autogpt_platform/backend/backend/copilot/model.py` (PR `#12814`, commit 259d37083): `append_and_save_message` uses `async with _get_session_lock(session_id)` — the same shared context manager used across the module — which internally acquires `redis-py`'s built-in `Lock` (key `copilot:session_lock:{session_id}`, timeout=10s, blocking_timeout=2s) via an atomic Lua-script. Lock release is also owner-verified via Lua so a slow pod can never delete a lock it no longer holds. On Redis failure the lock is skipped with a warning; the in-function idempotency check (`session.messages[-1].role` and `.content` comparison) still runs as a fallback. Do NOT expect a raw `redis.set(nx=True)` / `redis.delete()` pattern here — that intermediate approach was replaced in commit 259d37083.
Applied to files:
autogpt_platform/backend/backend/copilot/sdk/service.py
📚 Learning: 2026-03-17T07:24:34.302Z
Learnt from: majdyz
Repo: Significant-Gravitas/AutoGPT PR: 12385
File: autogpt_platform/backend/backend/copilot/rate_limit.py:0-0
Timestamp: 2026-03-17T07:24:34.302Z
Learning: In `autogpt_platform/backend/backend/copilot/rate_limit.py`, all fail-open `except` blocks catch `(RedisError, ConnectionError, OSError)` specifically — not bare `except Exception`. This applies to `_session_reset_from_ttl`, `get_usage_status`, `check_rate_limit`, and `record_token_usage`. The narrowed tuple ensures only genuine Redis/network failures are swallowed; unexpected exceptions propagate normally.
Applied to files:
autogpt_platform/backend/backend/copilot/sdk/service.py
📚 Learning: 2026-04-16T13:28:20.824Z
Learnt from: majdyz
Repo: Significant-Gravitas/AutoGPT PR: 12814
File: autogpt_platform/backend/backend/copilot/model.py:661-679
Timestamp: 2026-04-16T13:28:20.824Z
Learning: In `autogpt_platform/backend/backend/copilot/model.py` (PR `#12814`, commit 259d37083): `append_and_save_message` acquires `_get_session_lock` — a redis-py built-in Lock at key `copilot:session_lock:{session_id}` (timeout=10s, blocking_timeout=2s) — to serialize concurrent writers across replicas. On Redis failure the lock is skipped with a warning and the function continues. Inside the lock it re-fetches the session via `get_chat_session` (cache-first), performs an idempotency check (`session.messages[-1].role == message.role and session.messages[-1].content == message.content`), and returns early if matched. On successful DB write but failed cache write, it calls `invalidate_session_cache(session_id)` (the pre-existing best-effort helper) to evict the stale cache entry so subsequent retries fall back to the authoritative DB. Do NOT expect `asyncio.Lock` or a manual NX poll loop (`copilot:msg_append:{session_id}`) — those were removed. Do NOT flag the `invalidate_session_cache` call on ...
Applied to files:
autogpt_platform/backend/backend/copilot/sdk/service.py
📚 Learning: 2026-04-22T12:26:42.571Z
Learnt from: majdyz
Repo: Significant-Gravitas/AutoGPT PR: 12881
File: autogpt_platform/backend/backend/copilot/sdk/service.py:0-0
Timestamp: 2026-04-22T12:26:42.571Z
Learning: In `autogpt_platform/backend/backend/copilot/sdk/service.py`, `_resolve_sdk_model_for_request`: when a per-user LaunchDarkly model value fails `_normalize_model_name` (e.g. a `moonshotai/kimi-*` slug in direct-Anthropic mode), the fallback must be tier-specific — `config.thinking_advanced_model` for advanced tier, `config.thinking_standard_model` for standard tier — NOT the generic `_resolve_sdk_model()` (which is standard-only and returns None under subscription mode). If the tier-specific config default also fails `_normalize_model_name`, re-raise the original LD error; this is a deployment-level misconfiguration that `model_validator` should have caught at startup. Established in PR `#12881` commit 637d2fef5.
Applied to files:
autogpt_platform/backend/backend/copilot/sdk/service.py
📚 Learning: 2026-04-03T11:14:45.569Z
Learnt from: majdyz
Repo: Significant-Gravitas/AutoGPT PR: 12623
File: autogpt_platform/backend/backend/copilot/baseline/service.py:0-0
Timestamp: 2026-04-03T11:14:45.569Z
Learning: In `autogpt_platform/backend/backend/copilot/baseline/service.py`, `transcript_builder.append_user(content=message)` is called unconditionally even when the message is a duplicate that was suppressed by the `is_new_message` guard. This is intentional: the downloaded transcript may be stale (uploaded before the previous attempt persisted the message), so always appending the current user turn prevents a malformed assistant-after-assistant transcript structure. The `is_user_message` flag is still checked (`if message and is_user_message:`), so assistant-role inputs are excluded. Do NOT flag this as a bug.
Applied to files:
autogpt_platform/backend/backend/copilot/sdk/service.py
📚 Learning: 2026-04-16T12:33:44.990Z
Learnt from: majdyz
Repo: Significant-Gravitas/AutoGPT PR: 12796
File: autogpt_platform/backend/backend/api/features/chat/routes.py:504-527
Timestamp: 2026-04-16T12:33:44.990Z
Learning: In `autogpt_platform/backend/backend/api/features/chat/routes.py`, `get_session` (PR `#12796`, commit 3771bfad9c1) closes the TOCTOU race between the initial `stream_registry.get_active_session()` pre-check and `get_chat_messages_paginated()` with a post-check re-verification: after the DB fetch, if `is_initial_load and active_session is not None`, it calls `get_active_session` a second time; if `post_active is None` (stream completed during the window), it resets `from_start=True`, `forward_paginated=True`, and re-fetches messages from sequence 0. Do NOT flag the double `get_active_session` call pattern as redundant — it is the intentional TOCTOU mitigation for pagination direction selection.
Applied to files:
autogpt_platform/backend/backend/copilot/sdk/service.py
📚 Learning: 2026-04-21T17:31:23.683Z
Learnt from: majdyz
Repo: Significant-Gravitas/AutoGPT PR: 12873
File: autogpt_platform/backend/backend/copilot/baseline/reasoning.py:0-0
Timestamp: 2026-04-21T17:31:23.683Z
Learning: In `autogpt_platform/backend/backend/copilot/baseline/reasoning.py` (`BaselineReasoningEmitter`), when `render_in_ui=False`, BOTH the `StreamReasoning*` wire events AND the `ChatMessage(role="reasoning")` persistence append must be suppressed together. `convertChatSessionToUiMessages.ts` unconditionally re-renders all persisted `role="reasoning"` rows as `{type:"reasoning"}` UI parts on reload, so persisting rows while silencing live wire events would resurrect the reasoning collapse on page refresh. The audit trail is preserved through the provider transcript and `_format_sdk_content_blocks` (SDK path) instead. The baseline and SDK paths mirror each other: flag off → no live wire event, no persisted row, no hydrated collapse. This was established in PR `#12873`, commit 7ef10b26c.
Applied to files:
autogpt_platform/backend/backend/copilot/sdk/service.py
📚 Learning: 2026-04-23T01:26:38.257Z
Learnt from: majdyz
Repo: Significant-Gravitas/AutoGPT PR: 0
File: :0-0
Timestamp: 2026-04-23T01:26:38.257Z
Learning: In `autogpt_platform/backend/backend/copilot/sdk/service.py`, `langfuse_trace_id = get_client().get_current_trace_id()` must be captured under the `if _lf_span is not None:` guard (before `_lf_span` is torn down), NOT under `if _otel_ctx is not None:`. The `_otel_ctx` guard is too narrow: if `propagate_attributes().__enter__()` raises, `_otel_ctx` is never assigned, and placing the trace-id capture there would silently orphan the `openrouter-cost-reconcile` event from its parent span. Established in PR `#12889` commit d243bf6c9.
Applied to files:
autogpt_platform/backend/backend/copilot/sdk/service.py
📚 Learning: 2026-04-14T06:34:02.835Z
Learnt from: majdyz
Repo: Significant-Gravitas/AutoGPT PR: 12774
File: autogpt_platform/backend/backend/copilot/tools/e2b_sandbox.py:0-0
Timestamp: 2026-04-14T06:34:02.835Z
Learning: In `autogpt_platform/backend/backend/copilot/tools/e2b_sandbox.py`, the `asyncio.wait_for()` retry loop around `AsyncSandbox.create()` (introduced in PR `#12774`) can leak up to `_SANDBOX_CREATE_MAX_RETRIES - 1` (≤2) orphaned E2B sandboxes per hang incident because `wait_for` cancels only the client-side wait while E2B may complete server-side provisioning. With the default `on_timeout="pause"` lifecycle, leaked orphaned sandboxes are **paused** (not killed) when their original `end_at` is reached and persist indefinitely until explicitly killed — there is NO automatic E2B project-level cleanup. Operators must manage these manually or via their own cleanup jobs. The sandbox_id is not accessible from the timed-out coroutine, so recovery via `AsyncSandbox.connect(sandbox_id)` is not possible at timeout. This is an intentionally accepted trade-off; a proper fix is deferred to a follow-up PR. Do NOT flag the retry loop as a blocking issue.
Applied to files:
autogpt_platform/backend/backend/copilot/sdk/service.py
📚 Learning: 2026-04-23T00:07:27.117Z
Learnt from: majdyz
Repo: Significant-Gravitas/AutoGPT PR: 0
File: :0-0
Timestamp: 2026-04-23T00:07:27.117Z
Learning: In `autogpt_platform/backend/backend/copilot/sdk/service.py`, background tasks that persist cost or emit Langfuse backfill (e.g. the cost-reconcile task) must be anchored to `_background_tasks` using `_background_tasks.add(task)` and `task.add_done_callback(_background_tasks.discard)`, mirroring the existing pattern at lines 3063 / 4232 / 4256. This prevents the asyncio task from being garbage-collected before persistence or Langfuse emission completes. Do NOT flag the absence of this anchoring as acceptable in this file. Established in PR `#12889` commit 5ce3d0388.
Applied to files:
autogpt_platform/backend/backend/copilot/sdk/service.py
📚 Learning: 2026-04-08T17:28:23.439Z
Learnt from: CR
Repo: Significant-Gravitas/AutoGPT PR: 0
File: autogpt_platform/backend/AGENTS.md:0-0
Timestamp: 2026-04-08T17:28:23.439Z
Learning: Applies to autogpt_platform/backend/**/*.py : Do not use linter suppressors — no `# type: ignore`, `# noqa`, `# pyright: ignore`; fix the type/code instead
Applied to files:
autogpt_platform/backend/backend/copilot/sdk/service.py
📚 Learning: 2026-04-16T13:28:22.385Z
Learnt from: majdyz
Repo: Significant-Gravitas/AutoGPT PR: 12814
File: autogpt_platform/backend/backend/copilot/model.py:0-0
Timestamp: 2026-04-16T13:28:22.385Z
Learning: In `autogpt_platform/backend/backend/copilot/model.py` (PR `#12814`, commit 259d37083): `append_and_save_message` uses `_get_session_lock(session_id)` — a redis-py built-in `Lock` (Lua-script atomic acquire/release) keyed as `copilot:session_lock:{session_id}` with `timeout=10s` (crash-safety TTL) and `blocking_timeout=2s`. There is NO manual NX-poll loop and NO `asyncio.Lock`. On Redis failure, `_get_session_lock` logs a warning and yields without a lock — the in-function idempotency check (compare `session.messages[-1].role` and `.content`) still runs as a fallback. Do NOT expect a manual `SET NX` poll loop or `asyncio.Lock` to wrap `append_and_save_message`.
Applied to files:
autogpt_platform/backend/backend/copilot/sdk/service.py
📚 Learning: 2026-04-22T11:46:04.431Z
Learnt from: majdyz
Repo: Significant-Gravitas/AutoGPT PR: 12881
File: autogpt_platform/backend/backend/copilot/config.py:0-0
Timestamp: 2026-04-22T11:46:04.431Z
Learning: Do not flag the Claude Sonnet 4.6 model ID as incorrect when it uses the project’s established hyphenated convention: `anthropic/claude-sonnet-4-6`. This hyphen form is the intentional, production convention and should be treated as valid (including in files like llm.py, blocks tests, reasoning.py, `_is_anthropic_model` tests, and config defaults). Note that OpenRouter also accepts the dot variant `anthropic/claude-sonnet-4.6`, so either form may be tolerated, but `anthropic/claude-sonnet-4-6` should be considered the standard to match project usage.
Applied to files:
autogpt_platform/backend/backend/copilot/sdk/service.py
📚 Learning: 2026-03-31T15:37:38.626Z
Learnt from: majdyz
Repo: Significant-Gravitas/AutoGPT PR: 12623
File: autogpt_platform/backend/backend/copilot/tools/agent_generator/fixer.py:37-47
Timestamp: 2026-03-31T15:37:38.626Z
Learning: When validating/constructing Anthropic API model IDs in Significant-Gravitas/AutoGPT, allow the hyphen-separated Claude Opus 4.6 model ID `claude-opus-4-6` (it corresponds to `LlmModel.CLAUDE_4_6_OPUS` in `autogpt_platform/backend/backend/blocks/llm.py`). Do NOT require the dot-separated form in Anthropic contexts. Only OpenRouter routing variants should use the dot separator (e.g., `anthropic/claude-opus-4.6`); `claude-opus-4-6` should be treated as correct when passed to Anthropic, and flagged only if it’s used in the OpenRouter path where the dot form is expected.
Applied to files:
autogpt_platform/backend/backend/copilot/sdk/service.py
📚 Learning: 2026-04-22T11:46:12.892Z
Learnt from: majdyz
Repo: Significant-Gravitas/AutoGPT PR: 12881
File: autogpt_platform/backend/backend/copilot/baseline/service.py:322-332
Timestamp: 2026-04-22T11:46:12.892Z
Learning: In this codebase (Significant-Gravitas/AutoGPT), OpenRouter-routed Anthropic model IDs should use the hyphen-separated convention (e.g., `anthropic/claude-sonnet-4-6`, `anthropic/claude-opus-4-6`). Although OpenRouter may accept both hyphen and dot variants, treat the hyphen-separated form as the intended, correct codebase-wide convention and do not flag it as an error. Only flag the dot-separated variant (e.g., `anthropic/claude-sonnet-4.6`) as incorrect when reviewing/validating model ID strings for OpenRouter-routed Anthropic models.
Applied to files:
autogpt_platform/backend/backend/copilot/sdk/service.py
📚 Learning: 2026-03-10T08:39:22.025Z
Learnt from: majdyz
Repo: Significant-Gravitas/AutoGPT PR: 12356
File: autogpt_platform/backend/backend/copilot/constants.py:9-12
Timestamp: 2026-03-10T08:39:22.025Z
Learning: In Significant-Gravitas/AutoGPT PR `#12356`, the `COPILOT_SYNTHETIC_ID_PREFIX = "copilot-"` check in `create_auto_approval_record` (human_review.py) is intentional and safe. The `graph_exec_id` passed to this function comes from server-side `PendingHumanReview` DB records (not from user input); the API only accepts `node_exec_id` from users. Synthetic `copilot-*` IDs are only ever created server-side in `run_block.py`. The prefix skip avoids a DB lookup for a `AgentGraphExecution` record that legitimately does not exist for CoPilot sessions, while `user_id` scoping is enforced at the auth layer and on the resulting auto-approval record.
Applied to files:
autogpt_platform/backend/backend/copilot/sdk/service.py
📚 Learning: 2026-04-01T04:17:41.600Z
Learnt from: majdyz
Repo: Significant-Gravitas/AutoGPT PR: 12632
File: autogpt_platform/backend/backend/copilot/tools/workspace_files.py:0-0
Timestamp: 2026-04-01T04:17:41.600Z
Learning: When reviewing AutoGPT Copilot tool implementations, accept that `readOnlyHint=True` (provided via `ToolAnnotations`) may be applied unconditionally to *all* tools—even tools that have side effects (e.g., `bash_exec`, `write_workspace_file`, or other write/save operations). Do **not** flag these tools for having `readOnlyHint=True`; this is intentional to enable fully-parallel dispatch by the Anthropic SDK/CLI and has been E2E validated. Only flag `readOnlyHint` issues if they conflict with the established `ToolAnnotations` behavior (e.g., missing/incorrect propagation relative to the intended annotation mechanism).
Applied to files:
autogpt_platform/backend/backend/copilot/sdk/service.py
📚 Learning: 2026-04-01T04:17:38.279Z
Learnt from: majdyz
Repo: Significant-Gravitas/AutoGPT PR: 12632
File: autogpt_platform/backend/backend/copilot/sdk/tool_adapter.py:530-535
Timestamp: 2026-04-01T04:17:38.279Z
Learning: In `autogpt_platform/backend/backend/copilot/sdk/tool_adapter.py`, the `ToolAnnotations(readOnlyHint=True)` annotation (stored as `_PARALLEL_ANNOTATION`) is intentionally applied to ALL registered MCP tools — including E2B write/edit tools (e.g., `write_file`, `edit_file`). This is a parallel-dispatch hint to the Claude Agent SDK CLI, not a semantic read-only contract. The `_READ_ONLY_E2B_TOOLS` set was dead code and was removed in commit `12ae03c`; the constant was renamed from `_READONLY_ANNOTATION` to `_PARALLEL_ANNOTATION` in commit `c88ca88` to avoid confusion. Do not flag this as a correctness issue.
Applied to files:
autogpt_platform/backend/backend/copilot/sdk/service.py
📚 Learning: 2026-04-14T21:27:04.525Z
Learnt from: ntindle
Repo: Significant-Gravitas/AutoGPT PR: 12765
File: autogpt_platform/backend/backend/copilot/tools/graphiti_forget.py:227-234
Timestamp: 2026-04-14T21:27:04.525Z
Learning: In `autogpt_platform/backend/backend/copilot/tools/graphiti_forget.py`, the `getattr(client, "graph_driver", None) or getattr(client, "driver", None)` pattern for accessing the Neo4j driver from a `graphiti_core.Graphiti` instance is intentional and correct. `graphiti_core.Graphiti` does not expose `driver` as a stable public property (`dir(Graphiti)` shows no `driver` or `graph_driver` public property); the attribute name has varied across library versions. The fallback chain handles cross-version compatibility. Do NOT flag this as a duck-typing violation. Additionally, soft delete (temporal invalidation), per-UUID success/failure reporting, and episode back-reference cleanup all require raw Cypher queries — the `EntityEdge.delete_by_uuids` batch API does not cover these cases.
Applied to files:
autogpt_platform/backend/backend/copilot/sdk/service.py
📚 Learning: 2026-03-09T10:50:43.907Z
Learnt from: Bentlybro
Repo: Significant-Gravitas/AutoGPT PR: 0
File: :0-0
Timestamp: 2026-03-09T10:50:43.907Z
Learning: Repo: Significant-Gravitas/AutoGPT — File: autogpt_platform/backend/backend/blocks/llm.py
For xAI Grok models accessed via OpenRouter, the API returns `null` for `max_completion_tokens`. The convention in this codebase is to use the model's context window size as the `max_output_tokens` value in ModelMetadata. For example, Grok 3 uses 131072 (128k) and Grok 4 uses 262144 (256k). Do not flag these as incorrect max output token values.
Applied to files:
autogpt_platform/backend/backend/copilot/sdk/service.py
📚 Learning: 2026-03-17T10:57:12.953Z
Learnt from: majdyz
Repo: Significant-Gravitas/AutoGPT PR: 12440
File: autogpt_platform/backend/backend/copilot/workflow_import/converter.py:0-0
Timestamp: 2026-03-17T10:57:12.953Z
Learning: In Significant-Gravitas/AutoGPT PR `#12440`, `autogpt_platform/backend/backend/copilot/workflow_import/converter.py` was fully rewritten (commit 732960e2d) to no longer make direct LLM/OpenAI API calls. The converter now builds a structured text prompt for AutoPilot/CoPilot instead. There is no `response.choices` access or any direct LLM client usage in this file. Do not flag `response.choices` access or LLM client initialization patterns as issues in this file.
Applied to files:
autogpt_platform/backend/backend/copilot/sdk/service.py
📚 Learning: 2026-04-07T10:12:18.517Z
Learnt from: majdyz
Repo: Significant-Gravitas/AutoGPT PR: 12691
File: .claude/skills/orchestrate/SKILL.md:0-0
Timestamp: 2026-04-07T10:12:18.517Z
Learning: In Significant-Gravitas/AutoGPT's Claude skill markdown files under `.claude/skills/orchestrate/`, fenced code blocks in `SKILL.md`-style skill documents may intentionally omit a fenced code language (no `text`, `bash`, etc.). These blocks are used for Claude Code inline pseudocode/conceptual helpers rather than runnable scripts. During reviews, avoid treating MD040 (fenced-code-language) as an issue for these specific skill-format blocks, even if the language identifier is missing, since this omission is expected and has been accepted as a false positive for this skill format.
Applied to files:
autogpt_platform/backend/backend/copilot/sdk/service.py
📚 Learning: 2026-04-13T13:11:09.987Z
Learnt from: 0ubbe
Repo: Significant-Gravitas/AutoGPT PR: 12764
File: autogpt_platform/frontend/src/app/(platform)/library/components/SitrepItem/SitrepItem.tsx:143-145
Timestamp: 2026-04-13T13:11:09.987Z
Learning: In Significant-Gravitas/AutoGPT `autogpt_platform/frontend`, `executionID` values used as URL query params (e.g. `activeItem=` in `SitrepItem.tsx`) are always UUIDs (e.g. `550e8400-e29b-41d4-a716-446655440000`). Their character set `[0-9a-f-]` contains no reserved URL characters, so `encodeURIComponent` or Next.js object-based `href` encoding is unnecessary. Do not flag direct UUID string interpolation into query strings as a URL-encoding issue.
Applied to files:
autogpt_platform/backend/backend/copilot/sdk/service.py
📚 Learning: 2026-03-30T11:49:37.770Z
Learnt from: majdyz
Repo: Significant-Gravitas/AutoGPT PR: 12604
File: autogpt_platform/backend/backend/copilot/sdk/security_hooks.py:165-171
Timestamp: 2026-03-30T11:49:37.770Z
Learning: In `autogpt_platform/backend/backend/copilot/sdk/security_hooks.py`, the `web_search_count` and `total_tool_call_count` circuit-breaker counters in `create_security_hooks` are intentionally per-turn (closure-local), not per-session. Hooks are recreated per stream invocation in `service.py`, so counters reset each turn. This is an accepted v1 design: it caps a single runaway turn (incident d2f7cba3: 179 WebSearch calls, $20.66). True per-session persistence via Redis is deferred to a later iteration. Do not flag these as a per-session vs. per-turn mismatch bug.
Applied to files:
autogpt_platform/backend/backend/copilot/sdk/service.py
📚 Learning: 2026-03-26T07:00:03.405Z
Learnt from: majdyz
Repo: Significant-Gravitas/AutoGPT PR: 12574
File: autogpt_platform/backend/backend/copilot/sdk/transcript.py:980-990
Timestamp: 2026-03-26T07:00:03.405Z
Learning: In `autogpt_platform/backend/backend/copilot/sdk/transcript.py`, `_rechain_tail` intentionally rewrites `parentUuid` for **all** tail entries (not just the first), because a single assistant turn can span multiple consecutive JSONL entries sharing the same `message.id` (e.g., a thinking entry + a tool_use entry). Their original `parentUuid` values may reference entries that were absorbed into the compressed prefix, so sequential rechaining of the entire tail is required to maintain a valid parent→child graph. The test `test_chains_multiple_tail_entries` validates this: the second tail entry's `parentUuid` is rewritten from its original value to the uuid of the first tail entry.
Applied to files:
autogpt_platform/backend/backend/copilot/sdk/service.py
📚 Learning: 2026-04-03T11:14:16.378Z
Learnt from: majdyz
Repo: Significant-Gravitas/AutoGPT PR: 12623
File: autogpt_platform/backend/backend/copilot/transcript_builder.py:30-34
Timestamp: 2026-04-03T11:14:16.378Z
Learning: In `autogpt_platform/backend/backend/copilot/transcript_builder.py` (and its re-export shim at `sdk/transcript_builder.py`), `TranscriptEntry.parentUuid` is typed `str` (not `str | None`) and root entries use `parentUuid=""` (empty string) to match the canonical `_messages_to_transcript` JSONL format. `_parse_entry`, `append_user`, and `append_assistant` all coerce `None` to `""`. Do NOT flag `parentUuid=""` as incorrect — it is the correct root marker. This was fixed in PR `#12623`, commit b753cb7d0b.
Applied to files:
autogpt_platform/backend/backend/copilot/sdk/service.py
📚 Learning: 2026-04-01T14:54:01.937Z
Learnt from: majdyz
Repo: Significant-Gravitas/AutoGPT PR: 12636
File: autogpt_platform/backend/backend/copilot/sdk/service.py:0-0
Timestamp: 2026-04-01T14:54:01.937Z
Learning: In Significant-Gravitas/AutoGPT (autogpt_platform), `claude_agent_max_transient_retries` (default=3) in `ChatConfig` counts **total attempts including the initial one**, not the number of extra retries. With the pre-incremented `transient_retries >= max_transient` guard in `service.py`, a value of 3 yields 3 total stream attempts (initial + 2 retries with exponential backoff: 1s, 2s). Do NOT flag this as an off-by-one — the `>=` check is intentional.
Applied to files:
autogpt_platform/backend/backend/copilot/sdk/service.py
📚 Learning: 2026-03-16T17:00:02.827Z
Learnt from: majdyz
Repo: Significant-Gravitas/AutoGPT PR: 12439
File: autogpt_platform/backend/backend/blocks/autogpt_copilot.py:0-0
Timestamp: 2026-03-16T17:00:02.827Z
Learning: In autogpt_platform/backend/backend/blocks/autogpt_copilot.py, the recursion guard uses two module-level ContextVars: `_copilot_recursion_depth` (tracks current nesting depth) and `_copilot_recursion_limit` (stores the chain-wide ceiling). On the first invocation, `_copilot_recursion_limit` is set to `max_recursion_depth`; nested calls use `min(inherited_limit, max_recursion_depth)`, so they can only lower the cap, never raise it. The entry/exit logic is extracted into module-level helper functions. This is the approved pattern for preventing runaway sub-agent recursion in AutogptCopilotBlock (PR `#12439`, commits 348e9f8e2 and 3b70f61b1).
Applied to files:
autogpt_platform/backend/backend/copilot/sdk/service.py
📚 Learning: 2026-03-16T16:30:30.764Z
Learnt from: Abhi1992002
Repo: Significant-Gravitas/AutoGPT PR: 12417
File: autogpt_platform/backend/backend/blocks/agent_mail/pods.py:62-74
Timestamp: 2026-03-16T16:30:30.764Z
Learning: In autogpt_platform/backend/backend/blocks/**/*.py, explicit try/except in the `run()` method is NOT required for standard error handling. The block framework's `_execute()` method in `_base.py` catches unhandled exceptions and re-raises them as `BlockExecutionError` or `BlockUnknownError`. Additionally, when a block yields `("error", message)`, `_execute()` immediately raises `BlockExecutionError` — so the `error` output port never propagates downstream. Explicit try/except is only needed when partial output must be controlled (e.g., attachment blocks that must skip yielding `content_base64` on failure).
Applied to files:
autogpt_platform/backend/backend/copilot/sdk/service.py
📚 Learning: 2026-04-03T13:53:33.653Z
Learnt from: Pwuts
Repo: Significant-Gravitas/AutoGPT PR: 12206
File: autogpt_platform/backend/snapshots/v2_unhandled_exception_500:1-5
Timestamp: 2026-04-03T13:53:33.653Z
Learning: In Significant-Gravitas/AutoGPT (autogpt_platform), the catch-all `Exception` handler in `autogpt_platform/backend/backend/api/utils/exceptions.py` (`_handle_error()`) intentionally surfaces `str(exc)` as the `detail` field in HTTP 500 responses for non-Prisma errors. This is by design: errors are logged server-side, and the detail helps API consumers report issues. Only `PrismaError` responses are sanitized (see commit ce6910b4a). Do not flag `str(exc)` in the generic 500 handler as an information disclosure issue; the snapshot `autogpt_platform/backend/snapshots/v2_unhandled_exception_500` ("connection refused") correctly reflects this behavior.
Applied to files:
autogpt_platform/backend/backend/copilot/sdk/service.py
📚 Learning: 2026-04-15T22:49:10.465Z
Learnt from: ntindle
Repo: Significant-Gravitas/AutoGPT PR: 11235
File: autogpt_platform/frontend/src/app/(platform)/admin/diagnostics/components/ExecutionsTable.tsx:0-0
Timestamp: 2026-04-15T22:49:10.465Z
Learning: In Significant-Gravitas/AutoGPT (autogpt_platform/frontend), `Sentry.captureException` is NOT required in `catch` blocks for React Query mutation error paths. React Query already handles error propagation internally, and the correct pattern is: toast notifications for mutation errors, ErrorCard for render/fetch errors. Only add `Sentry.captureException` for truly manual/unexpected exception paths outside of React Query's scope (e.g., standalone async utilities, event handlers not wired through React Query).
Applied to files:
autogpt_platform/backend/backend/copilot/sdk/service.py
📚 Learning: 2026-04-14T06:39:52.592Z
Learnt from: majdyz
Repo: Significant-Gravitas/AutoGPT PR: 12773
File: autogpt_platform/frontend/src/app/api/openapi.json:12803-12806
Timestamp: 2026-04-14T06:39:52.592Z
Learning: Repo: Significant-Gravitas/AutoGPT — autogpt_platform
Intentional message length caps:
- StreamChatRequest.message maxLength = 64000.
- QueuePendingMessageRequest.message maxLength = 32000 (matches PendingMessage.content).
Rationale: both feed the same LLM context window; pending must not exceed stream, and larger ceilings replace legacy 4000/16000.
Applied to files:
autogpt_platform/backend/backend/copilot/sdk/service.py
📚 Learning: 2026-02-26T17:02:22.448Z
Learnt from: Pwuts
Repo: Significant-Gravitas/AutoGPT PR: 12211
File: .pre-commit-config.yaml:160-179
Timestamp: 2026-02-26T17:02:22.448Z
Learning: Keep the pre-commit hook pattern broad for autogpt_platform/backend to ensure OpenAPI schema changes are captured. Do not narrow to backend/api/ alone, since the generated schema depends on Pydantic models across multiple directories (backend/data/, backend/blocks/, backend/copilot/, backend/integrations/, backend/util/). Narrowing could miss schema changes and cause frontend type desynchronization.
Applied to files:
autogpt_platform/backend/backend/copilot/sdk/service.py
📚 Learning: 2026-03-04T08:04:35.881Z
Learnt from: majdyz
Repo: Significant-Gravitas/AutoGPT PR: 12273
File: autogpt_platform/backend/backend/copilot/tools/workspace_files.py:216-220
Timestamp: 2026-03-04T08:04:35.881Z
Learning: In the AutoGPT Copilot backend, ensure that SVG images are not treated as vision image types by excluding 'image/svg+xml' from INLINEABLE_MIME_TYPES and MULTIMODAL_TYPES in tool_adapter.py; the Claude API supports PNG, JPEG, GIF, and WebP for vision. SVGs (XML text) should be handled via the text path instead, not the vision path.
Applied to files:
autogpt_platform/backend/backend/copilot/sdk/service.py
📚 Learning: 2026-03-05T15:42:08.207Z
Learnt from: ntindle
Repo: Significant-Gravitas/AutoGPT PR: 12297
File: .claude/skills/backend-check/SKILL.md:14-16
Timestamp: 2026-03-05T15:42:08.207Z
Learning: In Python files under autogpt_platform/backend (recursively), rely on poetry run format to perform formatting (Black + isort) and linting (ruff). Do not run poetry run lint as a separate step after poetry run format, since format already includes linting checks.
Applied to files:
autogpt_platform/backend/backend/copilot/sdk/service.py
📚 Learning: 2026-03-16T16:35:40.236Z
Learnt from: majdyz
Repo: Significant-Gravitas/AutoGPT PR: 12440
File: autogpt_platform/backend/backend/api/features/workflow_import.py:54-63
Timestamp: 2026-03-16T16:35:40.236Z
Learning: Avoid using the word 'competitor' in public-facing identifiers and text. Use neutral naming for API paths, model names, function names, and UI text. Examples: rename 'CompetitorFormat' to 'SourcePlatform', 'convert_competitor_workflow' to 'convert_workflow', '/competitor-workflow' to '/workflow'. Apply this guideline to files under autogpt_platform/backend and autogpt_platform/frontend.
Applied to files:
autogpt_platform/backend/backend/copilot/sdk/service.py
📚 Learning: 2026-04-15T02:43:36.890Z
Learnt from: ntindle
Repo: Significant-Gravitas/AutoGPT PR: 12780
File: autogpt_platform/backend/backend/copilot/tools/workspace_files.py:0-0
Timestamp: 2026-04-15T02:43:36.890Z
Learning: When reviewing Python exception handlers, do not flag `isinstance(e, X)` checks as dead/unreachable if the caught exception `X` is a subclass of the exception type being handled. For example, if `X` (e.g., `VirusScanError`) inherits from `ValueError` (directly or via an intermediate class) and it can be raised within an `except ValueError:` block, then `isinstance(e, X)` inside that handler is reachable and should not be treated as dead code.
Applied to files:
autogpt_platform/backend/backend/copilot/sdk/service.py
🔇 Additional comments (1)
autogpt_platform/backend/backend/copilot/sdk/service.py (1)
562-631: Good consolidation of interrupted-attempt state.Centralizing rollback/restore into
_InterruptedAttemptand carryingretryable/already_yieldedvia_HandledErrorInfomakes the final-failure path much easier to reason about, and it closes the stale-marker + handled-error contract gaps cleanly.Also applies to: 2137-2149, 4045-4050, 4188-4195
…final-failure emit CodeRabbit flagged that _flush_orphan_tool_uses_to_session (called from _InterruptedAttempt.finalize) used state.adapter._flush_unresolved_tool_calls with a # noqa: SLF001 suppressor. The private call mutates resolved_tool_calls and flips has_unresolved_tool_calls to False, which caused the downstream error-cleanup block at lines 4185-4200 to skip its own flush — UI spinners on the client stayed open until page refresh because no cleanup events were yielded after the early flush swallowed the unresolved state. Changes: - Rename _flush_unresolved_tool_calls → flush_unresolved_tool_calls (public) in response_adapter.py; update 3 internal call sites + 2 service.py sites. Drops the # noqa: SLF001 suppressor (no longer a private-access violation). - _flush_orphan_tool_uses_to_session and _InterruptedAttempt.finalize now return the list[StreamBaseResponse] produced by the flush so the caller yields them to the client instead of re-flushing. - Replace the three scattered post-loop error blocks (partial restore + redundant flush + stream_err yield + handled_error yield) with one consolidated block that: (a) calls _classify_final_failure → _FinalFailure, (b) yields finalize()'s events + _end_text_if_open, (c) yields one StreamError (unless handled_error.already_yielded=True). Fixes the double-flush skip-cleanup bug and eliminates duplicated error-text/code strings between history marker and SSE yield. - _classify_final_failure now returns _FinalFailure(display_msg, code, retryable) instead of a (msg, retryable) tuple — single source of truth for in-history marker + SSE event so they can't drift. Tests: +5 _classify_final_failure contract tests, +2 return-value assertions on finalize/orphan-flush. All 1022 SDK tests pass (was 1012).
|
🤖 @coderabbitai — the duplicate-comments section in this review (line 634-650: "Don't consume the adapter flush before the client cleanup path runs") was already addressed in 6576bf561:\n\n- |
|
Tip For best results, initiate chat on the files or code changes.
✏️ Learnings added
🧠 Learnings usedIf you found this review helpful, would you consider giving us a shout-out on X? Thank you for using CodeRabbit! |
There was a problem hiding this comment.
Caution
Some comments are outside the diff and can’t be posted inline due to platform limitations.
⚠️ Outside diff range comments (1)
autogpt_platform/backend/backend/copilot/sdk/service.py (1)
4114-4152:⚠️ Potential issue | 🟠 MajorKeep mid-stream transient failures retryable.
When Line 4114 short-circuits because content was already streamed, the transient branch below never runs. A mid-stream
ECONNRESET/429 then falls through to_classify_final_failure(..., stream_err=...), which persists a non-retryablesdk_stream_errormarker instead of the retryable marker the frontend expects for “Try again”.💡 Suggested fix
if events_yielded > 0: # Events were already sent to the frontend and cannot be # unsent. Retrying would produce duplicate/inconsistent # output, so treat this as a final error. logger.warning( "%s Not retrying — %d events already yielded", log_prefix, events_yielded, ) + if is_transient: + interrupted.handled_error = _HandledErrorInfo( + error_msg=FRIENDLY_TRANSIENT_MSG, + code="transient_api_error", + retryable=True, + already_yielded=False, + ) skip_transcript_upload = True ended_with_stream_error = True breakBased on learnings, retry signaling for transient Anthropic API errors is done via
COPILOT_RETRYABLE_ERROR_PREFIXin persisted session messages, and the frontend derivesmarkerType === "retryable_error"from that marker.🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@autogpt_platform/backend/backend/copilot/sdk/service.py` around lines 4114 - 4152, The current early-exit when events_yielded > 0 prevents transient-error retry logic from running and causes mid-stream transient failures to be marked non-retryable; update the control flow so that transient errors are handled even if events_yielded > 0: when is_transient is true (and even if events_yielded > 0) call _next_transient_backoff(...) and, if backoff is returned, run the async backoff loop via _do_transient_backoff(...) and continue retrying instead of immediately setting skip_transcript_upload/ended_with_stream_error; only fall through to the non-retryable final error path if transient retries are exhausted (set transient_exhausted and persist the retryable marker using the same retry signaling used elsewhere, e.g., COPILOT_RETRYABLE_ERROR_PREFIX), keeping references to events_yielded, is_transient, _next_transient_backoff, _do_transient_backoff, and transient_exhausted to locate the changes.
♻️ Duplicate comments (1)
autogpt_platform/backend/backend/copilot/sdk/service.py (1)
4287-4293:⚠️ Potential issue | 🟠 MajorDon't persist
"Operation cancelled"after a successful turn.After Line 4029 clears
interruptedon success, this block still callsinterrupted.finalize(...)for any laterCancelledError/disconnect. If that happens while yieldingStreamUsageor trailing events, the turn can finish successfully and still get a persisted cancellation marker appended on refresh.💡 Suggested fix
- if not ended_with_stream_error: - interrupted.finalize(session, state, display_msg, retryable=is_transient) + if not ended_with_stream_error: + has_interrupted_state = ( + bool(interrupted.partial) or interrupted.handled_error is not None + ) + if has_interrupted_state: + interrupted.finalize(session, state, display_msg, retryable=is_transient) + elif not isinstance(e, asyncio.CancelledError) and not _is_sdk_disconnect_error(e): + _append_error_marker(session, display_msg, retryable=is_transient) logger.debug( "%s Appended error marker, will be persisted in finally", log_prefix, )🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@autogpt_platform/backend/backend/copilot/sdk/service.py` around lines 4287 - 4293, The finalize call unconditionally persists a cancellation marker even when interrupted was cleared earlier on success; change the guard so we only call interrupted.finalize(...) when interrupted still represents a pending cancellation (e.g. wrap the existing if not ended_with_stream_error: with an additional check like if interrupted and interrupted.is_pending()/not interrupted.is_cleared(): interrupted.finalize(session, state, display_msg, retryable=is_transient)), using the object's existing truthiness or its status method (is_pending/is_cleared) to locate the correct condition in the code paths that touch interrupted and finalize.
🧹 Nitpick comments (1)
autogpt_platform/backend/backend/copilot/sdk/interrupted_partial_test.py (1)
39-57: Optional: tighten type hints on test helpers.
_tool_output'soutputparam,_adapter_with_unresolved's return type, and the inner_flush(out: list)element type are unannotated. Adds minimal value but keeps the helpers self-documenting and consistent with the typed helpers below.♻️ Proposed refinement
-def _tool_output(tool_call_id: str, output) -> StreamToolOutputAvailable: +def _tool_output( + tool_call_id: str, output: str | dict[str, object] +) -> StreamToolOutputAvailable: return StreamToolOutputAvailable( toolCallId=tool_call_id, toolName="t", output=output ) -def _adapter_with_unresolved(responses: list[StreamToolOutputAvailable]): +def _adapter_with_unresolved( + responses: list[StreamToolOutputAvailable], +) -> MagicMock: """Stub _RetryState whose adapter flushes the given responses.""" adapter = MagicMock() adapter.has_unresolved_tool_calls = bool(responses) - def _flush(out: list) -> None: + def _flush(out: list[StreamToolOutputAvailable]) -> None: out.extend(responses) adapter.has_unresolved_tool_calls = False🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@autogpt_platform/backend/backend/copilot/sdk/interrupted_partial_test.py` around lines 39 - 57, Annotate the test helpers: import typing.Any and typing.List if not already, change _tool_output signature to def _tool_output(tool_call_id: str, output: Any) -> StreamToolOutputAvailable, annotate _adapter_with_unresolved to return a MagicMock (def _adapter_with_unresolved(responses: list[StreamToolOutputAvailable]) -> MagicMock) and type the inner flush parameter as def _flush(out: List[StreamToolOutputAvailable]) -> None; this keeps helpers self-documenting and consistent with the typed helpers below.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Outside diff comments:
In `@autogpt_platform/backend/backend/copilot/sdk/service.py`:
- Around line 4114-4152: The current early-exit when events_yielded > 0 prevents
transient-error retry logic from running and causes mid-stream transient
failures to be marked non-retryable; update the control flow so that transient
errors are handled even if events_yielded > 0: when is_transient is true (and
even if events_yielded > 0) call _next_transient_backoff(...) and, if backoff is
returned, run the async backoff loop via _do_transient_backoff(...) and continue
retrying instead of immediately setting
skip_transcript_upload/ended_with_stream_error; only fall through to the
non-retryable final error path if transient retries are exhausted (set
transient_exhausted and persist the retryable marker using the same retry
signaling used elsewhere, e.g., COPILOT_RETRYABLE_ERROR_PREFIX), keeping
references to events_yielded, is_transient, _next_transient_backoff,
_do_transient_backoff, and transient_exhausted to locate the changes.
---
Duplicate comments:
In `@autogpt_platform/backend/backend/copilot/sdk/service.py`:
- Around line 4287-4293: The finalize call unconditionally persists a
cancellation marker even when interrupted was cleared earlier on success; change
the guard so we only call interrupted.finalize(...) when interrupted still
represents a pending cancellation (e.g. wrap the existing if not
ended_with_stream_error: with an additional check like if interrupted and
interrupted.is_pending()/not interrupted.is_cleared():
interrupted.finalize(session, state, display_msg, retryable=is_transient)),
using the object's existing truthiness or its status method
(is_pending/is_cleared) to locate the correct condition in the code paths that
touch interrupted and finalize.
---
Nitpick comments:
In `@autogpt_platform/backend/backend/copilot/sdk/interrupted_partial_test.py`:
- Around line 39-57: Annotate the test helpers: import typing.Any and
typing.List if not already, change _tool_output signature to def
_tool_output(tool_call_id: str, output: Any) -> StreamToolOutputAvailable,
annotate _adapter_with_unresolved to return a MagicMock (def
_adapter_with_unresolved(responses: list[StreamToolOutputAvailable]) ->
MagicMock) and type the inner flush parameter as def _flush(out:
List[StreamToolOutputAvailable]) -> None; this keeps helpers self-documenting
and consistent with the typed helpers below.
ℹ️ Review info
⚙️ Run configuration
Configuration used: Organization UI
Review profile: CHILL
Plan: Pro
Run ID: 3e4c17da-2cf0-43b1-897b-d917faab2b60
📒 Files selected for processing (3)
autogpt_platform/backend/backend/copilot/sdk/interrupted_partial_test.pyautogpt_platform/backend/backend/copilot/sdk/response_adapter.pyautogpt_platform/backend/backend/copilot/sdk/service.py
…-limit
Mode 1 (rate-limit at turn start, user message never persisted): the
backend's `check_rate_limit` raises BEFORE `append_and_save_message`, so
when a 429 fires the user's text only exists in the optimistic `useChat`
bubble — refresh or even a successful retry would lose it.
See `autogpt_platform/backend/backend/api/features/chat/routes.py:916-922`
(rate-limit check) and `routes.py:945` (later append-and-save) — backend
can't recover this on its own.
New flow on 429:
- drop the optimistic user bubble (since DB has no record of it),
- push `lastSubmittedMsgRef.current` back into the composer via the
existing `setInitialPrompt` slot — same path URL pre-fills use, so
`useChatInput`'s `consumeInitialPrompt` effect picks it up
automatically,
- clear `lastSubmittedMsgRef` so dedup doesn't block re-send.
In-memory only; refresh-survival is a separate follow-up.
E2E Live-Stack Test Report — PR #12918Date: 2026-04-25 Verdict: APPROVE — all live scenarios PASS, both modes of SECRT-2275 verified end-to-end.Scenario 1: Happy path
Scenario 2: Rate-limit at turn start (Mode 1)
Scenario 3: Synthetic tool-call limit graceful finish
Scenario 4: Mid-stream failure (Mode 2 — the fix's actual target path)
Scenario 5: Post-cleanup verification
Cleanup
Notes
|
Adds two cases that the existing 429 test did not exercise so codecov/patch clears the 80% threshold: 1. The setMessages updater is invoked but no-ops when the trailing message is not a user bubble (assistant reply already landed). 2. The composer-restore branch is skipped entirely when no unsent text was captured (lastSubmittedMsgRef is null at error time).
Replace `transcript_snap: object` + `# type: ignore[arg-type]` on the restore() call with a `TranscriptSnapshot` type alias exported from transcript_builder, so `_InterruptedAttempt.capture` is fully typed.
Background
SECRT-2275. User report: when a copilot ("autopilot") turn is interrupted by a usage-limit, tool-call-limit, or other run interruption, the user's recent work disappears. User described it as: "my initial message was lost 3 times and it disappeared, then when I would say 'continue' it would do a random old task."
Investigation surfaced two distinct failure modes. This PR addresses both.
useChatbubble; the backend rejects before the message is persisted, so the bubble is a lie and a refresh / retry would lose the text.Mode 1 — frontend: restore unsent text on 429
Backend can't recover this on its own:
check_rate_limitraises beforeappend_and_save_message, so by the time the 429 surfaces there is no DB row to roll forward. Seeautogpt_platform/backend/backend/api/features/chat/routes.py:916-922(rate-limit check) androutes.py:945(later append-and-save).Frontend fix in
autogpt_platform/frontend/src/app/(platform)/copilot/useCopilotStream.ts: whenuseChat'sonErrorreports a usage-limit error, welastSubmittedMsgRef.currentback into the composer via the existingsetInitialPromptslot — the same slot URL pre-fills use, souseChatInput'sconsumeInitialPrompteffect picks it up automatically,lastSubmittedMsgRefso the dedup guard doesn't block re-send.In-memory only; surviving a hard refresh while rate-limited is a separate follow-up (would need localStorage persistence with TTL).
Test:
autogpt_platform/frontend/src/app/(platform)/copilot/__tests__/useCopilotStream.test.ts— verifies the composer is repopulated and the optimistic bubble is dropped on a 429.Mode 2 — backend: preserve interrupted partial in DB
Root cause
The SDK retry loop in
stream_chat_completion_sdkalways rolls backsession.messagesto the pre-attempt watermark on any exception. That rollback is correct before a retry so attempt #2 doesn't duplicate attempt #1's content. But it runs before the retry decision is made, so when retries are exhausted (or no retry is attempted) the partial work is discarded too.Three branches of the retry loop ended in a final-failure state with side effects worse than just losing the partial:
_HandledStreamErrornon-transient: rollback then add error marker — partial goneExceptionwithevents_yielded > 0: rollback then break — no error marker added either, so on refresh the chat looks like nothing happened even though the user just watched tokens stream liveExceptionnon-context-non-transient + the while-else:exhaustion path: same, no markerFix
autogpt_platform/backend/backend/copilot/sdk/service.py:_InterruptedAttemptdataclass — holds the rolled-backpartial: list[ChatMessage]+ optionalhandled_error: _HandledErrorInfo. Three methods drive the contract:capture(session, transcript_builder, transcript_snap, pre_attempt_msg_count)— slicessession.messages, restores the transcript, strips trailing error markers to prevent duplicate markers after restore.clear()— drops captured state on a successful retry so outer cleanup paths don't replay pre-retry content.finalize(session, state, display_msg, retryable=...) -> list[StreamBaseResponse]— re-attaches partial, synthesizestool_resultrows for orphantool_useblocks, appends the canonical error marker, and returns the flushed events so the caller can yield them to the client (no double-flush)._flush_orphan_tool_uses_to_session(session, state) -> list[StreamBaseResponse]— synthesizestool_resultrows for anytool_usethat never resolved before the error so the next turn's LLM context stays API-valid (Anthropic rejects orphan tool_use). Uses the publicadapter.flush_unresolved_tool_callsand returns the events for the caller to yield._classify_final_failure(...) -> _FinalFailure | None— picks the display message + stream code + retryable flag for the final-failure exit. One source of truth for the in-history error marker and the client-facingStreamErrorSSE yield so they can't drift.yield StreamErrorsites) collapsed to one block driven by_classify_final_failure→_FinalFailure→finalize()→ yield events + singleStreamError.flush_unresolved_tool_calls(renamed from_flush_unresolved_tool_callsto drop the# noqa: SLF001suppressors on cross-module callers).Each retry-loop rollback site calls
interrupted.capture(...); the success break callsinterrupted.clear(); the post-loop failure block callsinterrupted.finalize(...)exactly once.The baseline service already preserves partial work via its existing finally block — no change needed there.
Tests
Backend (
backend/copilot/sdk/interrupted_partial_test.py, new, 18 tests):TestInterruptedAttemptCapture— slice semantics + stale-marker strippingTestInterruptedAttemptFinalize— appends partial then marker, handles empty partial, no-op onNonesession, flushes unresolved tools between partial and marker, returns flushed events for caller to yieldTestFlushOrphanToolUses— synthesizestool_resultrows, returns events, no-op on None state / no unresolvedTestClassifyFinalFailure— handled_error wins, attempts_exhausted, transient_exhausted, stream_err fallback, returns None on success pathTestRetryRollbackContract— end-to-end: capture + finalize yields the exact content the user saw streaming live plus the error marker1022 total SDK tests pass (baseline + new).
Frontend (
useCopilotStream.test.ts): 1 new test —restores the unsent text and drops the optimistic user bubble on 429 usage-limit.Out of scope
finallydoesn't run — needs a different mechanism (pod-level checkpoint sweeper).Checklist
poetry run format)Test plan