Skip to content

fix(backend/copilot): preserve interrupted SDK partial work on final-failure exit#12918

Merged
majdyz merged 8 commits into
devfrom
zamilmajdy/secrt-2275-persist-interrupted-turn-state
Apr 25, 2026
Merged

fix(backend/copilot): preserve interrupted SDK partial work on final-failure exit#12918
majdyz merged 8 commits into
devfrom
zamilmajdy/secrt-2275-persist-interrupted-turn-state

Conversation

@majdyz
Copy link
Copy Markdown
Contributor

@majdyz majdyz commented Apr 25, 2026

Background

SECRT-2275. User report: when a copilot ("autopilot") turn is interrupted by a usage-limit, tool-call-limit, or other run interruption, the user's recent work disappears. User described it as: "my initial message was lost 3 times and it disappeared, then when I would say 'continue' it would do a random old task."

Investigation surfaced two distinct failure modes. This PR addresses both.

  • Mode 1 — rate-limit (or other pre-stream rejection) at turn start: the user's text only ever lives in the optimistic useChat bubble; the backend rejects before the message is persisted, so the bubble is a lie and a refresh / retry would lose the text.
  • Mode 2 — long-running turn interrupted mid-stream: the entire turn's progress (assistant text, tool calls, reasoning) vanishes on interruption — what users describe as "the turn is gone."

Mode 1 — frontend: restore unsent text on 429

Backend can't recover this on its own: check_rate_limit raises before append_and_save_message, so by the time the 429 surfaces there is no DB row to roll forward. See autogpt_platform/backend/backend/api/features/chat/routes.py:916-922 (rate-limit check) and routes.py:945 (later append-and-save).

Frontend fix in autogpt_platform/frontend/src/app/(platform)/copilot/useCopilotStream.ts: when useChat's onError reports a usage-limit error, we

  • drop the optimistic user bubble (DB has no record of it, so leaving it would be a phantom),
  • push lastSubmittedMsgRef.current back into the composer via the existing setInitialPrompt slot — the same slot URL pre-fills use, so useChatInput's consumeInitialPrompt effect picks it up automatically,
  • clear lastSubmittedMsgRef so the dedup guard doesn't block re-send.

In-memory only; surviving a hard refresh while rate-limited is a separate follow-up (would need localStorage persistence with TTL).

Test: autogpt_platform/frontend/src/app/(platform)/copilot/__tests__/useCopilotStream.test.ts — verifies the composer is repopulated and the optimistic bubble is dropped on a 429.

Mode 2 — backend: preserve interrupted partial in DB

Root cause

The SDK retry loop in stream_chat_completion_sdk always rolls back session.messages to the pre-attempt watermark on any exception. That rollback is correct before a retry so attempt #2 doesn't duplicate attempt #1's content. But it runs before the retry decision is made, so when retries are exhausted (or no retry is attempted) the partial work is discarded too.

Three branches of the retry loop ended in a final-failure state with side effects worse than just losing the partial:

  • _HandledStreamError non-transient: rollback then add error marker — partial gone
  • Exception with events_yielded > 0: rollback then break — no error marker added either, so on refresh the chat looks like nothing happened even though the user just watched tokens stream live
  • Exception non-context-non-transient + the while-else: exhaustion path: same, no marker
  • Outer except (cancellation, GeneratorExit cleanup): didn't restore captured partial

Fix

autogpt_platform/backend/backend/copilot/sdk/service.py:

  1. _InterruptedAttempt dataclass — holds the rolled-back partial: list[ChatMessage] + optional handled_error: _HandledErrorInfo. Three methods drive the contract:
    • capture(session, transcript_builder, transcript_snap, pre_attempt_msg_count) — slices session.messages, restores the transcript, strips trailing error markers to prevent duplicate markers after restore.
    • clear() — drops captured state on a successful retry so outer cleanup paths don't replay pre-retry content.
    • finalize(session, state, display_msg, retryable=...) -> list[StreamBaseResponse] — re-attaches partial, synthesizes tool_result rows for orphan tool_use blocks, appends the canonical error marker, and returns the flushed events so the caller can yield them to the client (no double-flush).
  2. _flush_orphan_tool_uses_to_session(session, state) -> list[StreamBaseResponse] — synthesizes tool_result rows for any tool_use that never resolved before the error so the next turn's LLM context stays API-valid (Anthropic rejects orphan tool_use). Uses the public adapter.flush_unresolved_tool_calls and returns the events for the caller to yield.
  3. _classify_final_failure(...) -> _FinalFailure | None — picks the display message + stream code + retryable flag for the final-failure exit. One source of truth for the in-history error marker and the client-facing StreamError SSE yield so they can't drift.
  4. Consolidated post-loop emit: the former three scattered blocks (partial restore + redundant re-flush + two separate yield StreamError sites) collapsed to one block driven by _classify_final_failure_FinalFailurefinalize() → yield events + single StreamError.
  5. Adapter flush_unresolved_tool_calls (renamed from _flush_unresolved_tool_calls to drop the # noqa: SLF001 suppressors on cross-module callers).

Each retry-loop rollback site calls interrupted.capture(...); the success break calls interrupted.clear(); the post-loop failure block calls interrupted.finalize(...) exactly once.

The baseline service already preserves partial work via its existing finally block — no change needed there.

Tests

Backend (backend/copilot/sdk/interrupted_partial_test.py, new, 18 tests):

  • TestInterruptedAttemptCapture — slice semantics + stale-marker stripping
  • TestInterruptedAttemptFinalize — appends partial then marker, handles empty partial, no-op on None session, flushes unresolved tools between partial and marker, returns flushed events for caller to yield
  • TestFlushOrphanToolUses — synthesizes tool_result rows, returns events, no-op on None state / no unresolved
  • TestClassifyFinalFailure — handled_error wins, attempts_exhausted, transient_exhausted, stream_err fallback, returns None on success path
  • TestRetryRollbackContract — end-to-end: capture + finalize yields the exact content the user saw streaming live plus the error marker

1022 total SDK tests pass (baseline + new).

Frontend (useCopilotStream.test.ts): 1 new test — restores the unsent text and drops the optimistic user bubble on 429 usage-limit.

Out of scope

  • Frontend rendering tweaks for the interrupted-turn marker (existing error-marker rendering already works).
  • Refresh-survival of the unsent text in Mode 1 (would require localStorage persistence with TTL) — separate follow-up.
  • Hard process-kill / OOM where Python finally doesn't run — needs a different mechanism (pod-level checkpoint sweeper).

Checklist

  • My code follows the style guidelines of this project (black/isort/ruff via poetry run format)
  • I have performed a self-review of my own code
  • I have added relevant unit tests
  • I have run lint and tests locally (1022 SDK tests pass)

Test plan

  • Verify a long-running turn that hits transient-retry exhaustion preserves partial assistant text + tool results in chat history after refresh
  • Verify the next user message after an interrupted turn carries enough context that the model can continue the prior task instead of inventing a new one
  • Verify a successful retry (attempt Complete prompt redesign #1 fails, attempt #2 succeeds) shows ONLY attempt #2's content (no leaked partial from Complete prompt redesign #1)
  • Verify hitting daily usage limit at turn start re-populates the composer with the unsent text and removes the optimistic user bubble

…failure exit

SECRT-2275 — when an SDK turn was interrupted (transient API errors with
exhausted retries, mid-stream LLM exceptions, or context-overflow with all
attempts exhausted) the retry loop's pre-decision rollback discarded the
assistant's partial work (text + tool calls + reasoning) that had been
incrementally appended to session.messages during the failed attempt.

Users described it as "the turn is gone": their UI streamed tokens live, then
a refresh showed an empty turn and the next message would prompt the model
to "continue" with no context, so it picked an unrelated old task.

Fix: capture the rolled-back partial in the retry-loop exception handlers and
re-attach it via a single helper on every final-failure branch (including
the events_yielded > 0 path that previously skipped the error marker entirely
and the non-context-non-transient + attempts-exhausted paths). Synthesize
"interrupted" tool_result rows for any orphan tool_use so the next turn's
LLM context stays API-valid. Successful retry breaks clear the captured
partial so attempt #1's rolled-back content doesn't leak into a successful
attempt #2's history.

Baseline path already preserves partial via its existing finally block; only
SDK was affected.
@majdyz majdyz requested a review from a team as a code owner April 25, 2026 01:20
@majdyz majdyz requested review from 0ubbe and kcze and removed request for a team April 25, 2026 01:20
@github-project-automation github-project-automation Bot moved this to 🆕 Needs initial review in AutoGPT development kanban Apr 25, 2026
@github-actions github-actions Bot added the platform/backend AutoGPT Platform - Back end label Apr 25, 2026
@coderabbitai
Copy link
Copy Markdown
Contributor

coderabbitai Bot commented Apr 25, 2026

Note

Reviews paused

It looks like this branch is under active development. To avoid overwhelming you with review comments due to an influx of new commits, CodeRabbit has automatically paused this review. You can configure this behavior by changing the reviews.auto_review.auto_pause_after_reviewed_commits setting.

Use the following commands to manage reviews:

  • @coderabbitai resume to resume automatic reviews.
  • @coderabbitai review to trigger a single review.

Use the checkboxes below for quick actions:

  • ▶️ Resume reviews
  • 🔍 Trigger review

Walkthrough

Adds a test module and refactors SDK streaming/failure handling to capture per-attempt partial assistant/tool output, roll back session/transcript state on interrupted attempts, flush unresolved tool uses into synthetic tool messages, and restore the captured partial once on final failure with a single error marker.

Changes

Cohort / File(s) Summary
New Test Suite
autogpt_platform/backend/backend/copilot/sdk/interrupted_partial_test.py
Adds tests exercising _InterruptedAttempt capture/finalize/clear lifecycle, unresolved tool-call flushing into synthetic tool messages, classification of final failures, and end-to-end capture+finalize behavior.
SDK Streaming / Error Handling
autogpt_platform/backend/backend/copilot/sdk/service.py
Introduces _InterruptedAttempt, _HandledErrorInfo, _classify_final_failure, and _flush_orphan_tool_uses_to_session; centralizes deferred post-loop restoration and error emission; removes scattered retry/error marker handling.
Response Adapter API
autogpt_platform/backend/backend/copilot/sdk/response_adapter.py
Renames and exposes flush_unresolved_tool_calls (public) and updates call sites to coordinate single flush that mutates resolved_tool_calls and returns synthetic tool-output events.

Sequence Diagram(s)

sequenceDiagram
  participant Client as Client
  participant SDK as SDKService
  participant Adapter as ResponseAdapter
  participant Session as ChatSession

  Client->>SDK: start stream_chat_completion_sdk()
  SDK->>Adapter: begin attempt (streaming)
  Adapter-->>SDK: assistant/tool messages appended to session
  Adapter-->>SDK: report unresolved tool calls
  SDK->>SDK: attempt fails (exception)
  SDK->>Session: capture rolled-back session messages (partial)
  SDK->>SDK: snapshot/restore TranscriptBuilder state
  alt Adapter has unresolved tool calls
    SDK->>Adapter: flush_unresolved_tool_calls()
    Adapter-->>Session: insert synthetic `tool` messages into session partial
    Adapter-->>SDK: return same tool-output events to emit
  end
  alt retries remain
    SDK->>SDK: retry loop (partial preserved)
  else final failure / no retries
    SDK->>Session: restore captured partial into session
    SDK->>Session: append single copilot error marker (retryable/non-retryable)
    SDK->>Client: emit StreamError if not already yielded
  end
Loading

Estimated code review effort

🎯 4 (Complex) | ⏱️ ~45 minutes

Possibly related PRs

Suggested labels

size/xl

Suggested reviewers

  • 0ubbe
  • kcze
  • Pwuts
  • Bentlybro

Poem

"🐇 I hopped through streams where fragments hide,
I gathered partials swept back by the tide.
Flushed stray tool-crumbs, stitched the tale once more,
Left one final mark upon the shore.
A little hop — the session's whole once more."

🚥 Pre-merge checks | ✅ 4 | ❌ 1

❌ Failed checks (1 warning)

Check name Status Explanation Resolution
Docstring Coverage ⚠️ Warning Docstring coverage is 39.47% which is insufficient. The required threshold is 80.00%. Write docstrings for the functions missing them to satisfy the coverage threshold.
✅ Passed checks (4 passed)
Check name Status Explanation
Title check ✅ Passed The PR title clearly and specifically describes the main change: preserving interrupted SDK partial work on final-failure exits, which directly addresses the root cause and fix outlined in the detailed description.
Linked Issues check ✅ Passed Check skipped because no linked issues were found for this pull request.
Out of Scope Changes check ✅ Passed Check skipped because no linked issues were found for this pull request.
Description check ✅ Passed The PR description is directly related to the changeset, detailing the root cause, fix implementation across multiple files, and test coverage for preserving interrupted partial work.

✏️ Tip: You can configure your own custom pre-merge checks in the settings.

✨ Finishing Touches
🧪 Generate unit tests (beta)
  • Create PR with unit tests
  • Commit unit tests in branch zamilmajdy/secrt-2275-persist-interrupted-turn-state

Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out.

❤️ Share

Comment @coderabbitai help to get the list of available commands and usage tips.

@github-actions
Copy link
Copy Markdown
Contributor

github-actions Bot commented Apr 25, 2026

🔍 PR Overlap Detection

This check compares your PR against all other open PRs targeting the same branch to detect potential merge conflicts early.

🔴 Merge Conflicts Detected

The following PRs have been tested and will have merge conflicts if merged after this PR. Consider coordinating with the authors.

  • fix(copilot): prevent 524 timeout on chat deletion by deferring cleanup #12668 (Otto-AGPT · updated 8d ago)

    • autogpt_platform/backend/backend/api/features/library/db.py (5 conflicts, ~67 lines)
    • autogpt_platform/backend/backend/api/features/library/model.py (1 conflict, ~4 lines)
    • autogpt_platform/backend/backend/api/features/subscription_routes_test.py (22 conflicts, ~1047 lines)
    • autogpt_platform/backend/backend/api/features/v1.py (10 conflicts, ~233 lines)
    • autogpt_platform/backend/backend/copilot/baseline/service.py (2 conflicts, ~15 lines)
    • autogpt_platform/backend/backend/copilot/model_test.py (1 conflict, ~5 lines)
    • autogpt_platform/backend/backend/copilot/prompting.py (1 conflict, ~5 lines)
    • autogpt_platform/backend/backend/copilot/sdk/service.py (3 conflicts, ~53 lines)
    • autogpt_platform/backend/backend/copilot/sdk/service_helpers_test.py (1 conflict, ~129 lines)
    • autogpt_platform/backend/backend/copilot/transcript.py (1 conflict, ~11 lines)
    • autogpt_platform/backend/backend/data/credit.py (12 conflicts, ~816 lines)
    • autogpt_platform/backend/backend/data/credit_subscription_test.py (26 conflicts, ~1894 lines)
    • autogpt_platform/frontend/src/app/(platform)/copilot/components/PulseChips/usePulseChips.ts (1 conflict, ~13 lines)
    • autogpt_platform/frontend/src/app/(platform)/copilot/components/usageHelpers.ts (1 conflict, ~9 lines)
    • autogpt_platform/frontend/src/app/(platform)/library/components/AgentBriefingPanel/BriefingTabContent.tsx (9 conflicts, ~147 lines)
    • autogpt_platform/frontend/src/app/(platform)/library/components/AgentBriefingPanel/StatsGrid.tsx (2 conflicts, ~9 lines)
    • autogpt_platform/frontend/src/app/(platform)/library/components/ContextualActionButton/ContextualActionButton.tsx (2 conflicts, ~12 lines)
    • autogpt_platform/frontend/src/app/(platform)/library/components/SitrepItem/SitrepItem.tsx (2 conflicts, ~15 lines)
    • autogpt_platform/frontend/src/app/(platform)/library/components/SitrepItem/useSitrepItems.ts (4 conflicts, ~97 lines)
    • autogpt_platform/frontend/src/app/(platform)/library/hooks/useAgentStatus.ts (2 conflicts, ~10 lines)
    • autogpt_platform/frontend/src/app/(platform)/library/hooks/useLibraryFleetSummary.ts (7 conflicts, ~57 lines)
    • autogpt_platform/frontend/src/app/(platform)/library/types.ts (1 conflict, ~4 lines)
    • autogpt_platform/frontend/src/app/(platform)/profile/(user)/credits/components/SubscriptionTierSection/SubscriptionTierSection.tsx (11 conflicts, ~185 lines)
    • autogpt_platform/frontend/src/app/(platform)/profile/(user)/credits/components/SubscriptionTierSection/__tests__/SubscriptionTierSection.test.tsx (21 conflicts, ~486 lines)
    • autogpt_platform/frontend/src/app/(platform)/profile/(user)/credits/components/SubscriptionTierSection/useSubscriptionTierSection.ts (4 conflicts, ~60 lines)
    • autogpt_platform/frontend/src/app/api/openapi.json (2 conflicts, ~40 lines)
    • docs/integrations/block-integrations/llm.md (7 conflicts, ~35 lines)
    • docs/integrations/block-integrations/misc.md (1 conflict, ~5 lines)
  • feat(platform/copilot): Reduce time to first output #12828 (Pwuts · updated 8d ago)

    • autogpt_platform/backend/backend/api/features/chat/routes.py (3 conflicts, ~75 lines)
    • autogpt_platform/backend/backend/api/features/library/db.py (5 conflicts, ~67 lines)
    • autogpt_platform/backend/backend/api/features/library/model.py (1 conflict, ~4 lines)
    • autogpt_platform/backend/backend/api/features/subscription_routes_test.py (22 conflicts, ~1047 lines)
    • autogpt_platform/backend/backend/api/features/v1.py (10 conflicts, ~233 lines)
    • autogpt_platform/backend/backend/copilot/config.py (1 conflict, ~17 lines)
    • autogpt_platform/backend/backend/copilot/sdk/security_hooks.py (1 conflict, ~12 lines)
    • autogpt_platform/backend/backend/copilot/sdk/service.py (2 conflicts, ~83 lines)
    • autogpt_platform/backend/backend/data/credit.py (12 conflicts, ~816 lines)
    • autogpt_platform/backend/backend/data/credit_subscription_test.py (26 conflicts, ~1894 lines)
    • autogpt_platform/frontend/src/app/(platform)/copilot/components/PulseChips/usePulseChips.ts (1 conflict, ~13 lines)
    • autogpt_platform/frontend/src/app/(platform)/copilot/components/usageHelpers.ts (1 conflict, ~9 lines)
    • autogpt_platform/frontend/src/app/(platform)/library/components/AgentBriefingPanel/BriefingTabContent.tsx (9 conflicts, ~147 lines)
    • autogpt_platform/frontend/src/app/(platform)/library/components/AgentBriefingPanel/StatsGrid.tsx (2 conflicts, ~9 lines)
    • autogpt_platform/frontend/src/app/(platform)/library/components/ContextualActionButton/ContextualActionButton.tsx (2 conflicts, ~12 lines)
    • autogpt_platform/frontend/src/app/(platform)/library/components/SitrepItem/SitrepItem.tsx (2 conflicts, ~15 lines)
    • autogpt_platform/frontend/src/app/(platform)/library/components/SitrepItem/useSitrepItems.ts (4 conflicts, ~97 lines)
    • autogpt_platform/frontend/src/app/(platform)/library/hooks/useAgentStatus.ts (2 conflicts, ~10 lines)
    • autogpt_platform/frontend/src/app/(platform)/library/hooks/useLibraryFleetSummary.ts (7 conflicts, ~57 lines)
    • autogpt_platform/frontend/src/app/(platform)/library/types.ts (1 conflict, ~4 lines)
    • autogpt_platform/frontend/src/app/(platform)/profile/(user)/credits/components/SubscriptionTierSection/SubscriptionTierSection.tsx (11 conflicts, ~185 lines)
    • autogpt_platform/frontend/src/app/(platform)/profile/(user)/credits/components/SubscriptionTierSection/__tests__/SubscriptionTierSection.test.tsx (21 conflicts, ~486 lines)
    • autogpt_platform/frontend/src/app/(platform)/profile/(user)/credits/components/SubscriptionTierSection/useSubscriptionTierSection.ts (4 conflicts, ~60 lines)
    • autogpt_platform/frontend/src/app/api/openapi.json (2 conflicts, ~40 lines)
    • docs/integrations/block-integrations/llm.md (7 conflicts, ~35 lines)
    • docs/integrations/block-integrations/misc.md (1 conflict, ~5 lines)
  • fix(frontend/copilot): fix streaming reconnect races, hydration ordering, and reasoning split #12813 (0ubbe · updated 8h ago)

    • 📁 autogpt_platform/frontend/src/app/(platform)/copilot/
      • __tests__/useCopilotStream.test.ts (3 conflicts, ~151 lines)
      • useCopilotStream.ts (1 conflict, ~44 lines)
  • feat(platform): estimate CoPilot turn cost and require approval for high-cost requests #12877 (Rushi-Balapure · updated 3d ago)

    • 📁 autogpt_platform/backend/backend/
      • api/features/chat/routes.py (2 conflicts, ~32 lines)
      • util/feature_flag.py (1 conflict, ~11 lines)
  • fix(copilot): mandate gh auth status check before connect_integration #12852 (tianhaocui · updated 2d ago)

    • 📁 autogpt_platform/backend/backend/copilot/sdk/
      • service.py (1 conflict, ~20 lines)

🟢 Low Risk — File Overlap Only

These PRs touch the same files but different sections (click to expand)

Summary: 5 conflict(s), 0 medium risk, 2 low risk (out of 7 PRs with file overlap)


Auto-generated on push. Ignores: openapi.json, lock files.

Copy link
Copy Markdown
Contributor

@coderabbitai coderabbitai Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 1

🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@autogpt_platform/backend/backend/copilot/sdk/service.py`:
- Around line 615-631: The captured partial in
_rollback_attempt_capturing_partial currently slices session.messages from
pre_attempt_msg_count and therefore can include an already-appended error marker
inserted by _run_stream_attempt on _HandledStreamError paths (idle timeout /
empty-tool breaker); update _rollback_attempt_capturing_partial to filter out
any trailing error-marker message(s) when building captured (e.g., drop final
messages that match the error-marker shape/flag) so that
_restore_partial_with_error_marker does not replay stale markers or duplicate
them, while still restoring transcript via
transcript_builder.restore(transcript_snap) and returning only the true
assistant work to be replayed on final failure. Ensure the detection logic
matches whatever marker identity _run_stream_attempt uses (type/flag/content) so
it won't remove legitimate assistant messages.
🪄 Autofix (Beta)

Fix all unresolved CodeRabbit comments on this PR:

  • Push a commit to this branch (recommended)
  • Create a new PR with the fixes

ℹ️ Review info
⚙️ Run configuration

Configuration used: Organization UI

Review profile: CHILL

Plan: Pro

Run ID: 3cfb1bc1-ba5e-400a-bc3a-8eb568adc248

📥 Commits

Reviewing files that changed from the base of the PR and between 06188a8 and 7b66f25.

📒 Files selected for processing (2)
  • autogpt_platform/backend/backend/copilot/sdk/interrupted_partial_test.py
  • autogpt_platform/backend/backend/copilot/sdk/service.py
📜 Review details
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (9)
  • GitHub Check: check API types
  • GitHub Check: Seer Code Review
  • GitHub Check: end-to-end tests
  • GitHub Check: test (3.11)
  • GitHub Check: test (3.12)
  • GitHub Check: test (3.13)
  • GitHub Check: type-check (3.13)
  • GitHub Check: Check PR Status
  • GitHub Check: Analyze (python)
🧰 Additional context used
📓 Path-based instructions (3)
autogpt_platform/backend/**/*.py

📄 CodeRabbit inference engine (.github/copilot-instructions.md)

autogpt_platform/backend/**/*.py: Use Python 3.11 (required; managed by Poetry via pyproject.toml) for backend development
Always run 'poetry run format' (Black + isort) before linting in backend development
Always run 'poetry run lint' (ruff) after formatting in backend development

autogpt_platform/backend/**/*.py: Use poetry run ... command for executing Python package dependencies
Use top-level imports only — avoid local/inner imports except for lazy imports of heavy optional dependencies like openpyxl
Use absolute imports with from backend.module import ... for cross-package imports; single-dot relative imports are acceptable for sibling modules within the same package; avoid double-dot relative imports
Do not use duck typing — avoid hasattr/getattr/isinstance for type dispatch; use typed interfaces/unions/protocols instead
Use Pydantic models over dataclass/namedtuple/dict for structured data
Do not use linter suppressors — no # type: ignore, # noqa, # pyright: ignore; fix the type/code instead
Prefer list comprehensions over manual loop-and-append patterns
Use early return with guard clauses first to avoid deep nesting
Use %s for deferred interpolation in debug log statements for efficiency; use f-strings elsewhere for readability (e.g., logger.debug("Processing %s items", count) vs logger.info(f"Processing {count} items"))
Sanitize error paths by using os.path.basename() in error messages to avoid leaking directory structure
Be aware of TOCTOU (Time-Of-Check-Time-Of-Use) issues — avoid check-then-act patterns for file access and credit charging
Use transaction=True for Redis pipelines to ensure atomicity on multi-step operations
Use max(0, value) guards for computed values that should never be negative
Keep files under ~300 lines; if a file grows beyond this, split by responsibility (extract helpers, models, or a sub-module into a new file)
Keep functions under ~40 lines; extract named helpers when a function grows longer
...

Files:

  • autogpt_platform/backend/backend/copilot/sdk/interrupted_partial_test.py
  • autogpt_platform/backend/backend/copilot/sdk/service.py
autogpt_platform/{backend,autogpt_libs}/**/*.py

📄 CodeRabbit inference engine (AGENTS.md)

Format Python code with poetry run format

Files:

  • autogpt_platform/backend/backend/copilot/sdk/interrupted_partial_test.py
  • autogpt_platform/backend/backend/copilot/sdk/service.py
autogpt_platform/backend/**/*_test.py

📄 CodeRabbit inference engine (autogpt_platform/backend/AGENTS.md)

autogpt_platform/backend/**/*_test.py: Use pytest with snapshot testing for API responses
Colocate test files with source files using *_test.py naming convention
Mock at boundaries — mock where the symbol is used, not where it's defined; after refactoring, update mock targets to match new module paths
Use AsyncMock from unittest.mock for async functions in tests
When writing tests, use Test-Driven Development (TDD): write failing tests marked with @pytest.mark.xfail before implementation, then remove the marker once the implementation is complete
When creating snapshots in tests, use poetry run pytest path/to/test.py --snapshot-update; always review snapshot changes with git diff before committing

Files:

  • autogpt_platform/backend/backend/copilot/sdk/interrupted_partial_test.py
🧠 Learnings (27)
📓 Common learnings
Learnt from: majdyz
Repo: Significant-Gravitas/AutoGPT PR: 12797
File: autogpt_platform/backend/backend/copilot/sdk/service.py:1991-2021
Timestamp: 2026-04-15T13:44:34.273Z
Learning: In `autogpt_platform/backend/backend/copilot/sdk/service.py` (`_run_stream_attempt`), the pre-create block (PR `#12797`) intentionally does NOT call `state.transcript_builder.append_assistant(...)` when inserting the empty assistant placeholder into `ctx.session.messages`. The transcript is left ending at the `tool_result` entry (N entries) while `message_count` metadata is N+1. This mismatch is benign and deliberate: on the next `--resume`, the SDK sees the transcript ending at `tool_result` and correctly regenerates the assistant response. Pre-appending the assistant turn to the transcript would suppress regeneration while leaving `session.messages[-1].content = ""` permanently (worse outcome). On the gap-fallback path, `transcript_msg_count (N+1) >= msg_count-1 (N)` means no gap is injected for the empty placeholder, which is correct because injecting an empty assistant message as context would mislead the SDK. Do NOT flag this transcript/message_count discrepancy as a bug.
Learnt from: majdyz
Repo: Significant-Gravitas/AutoGPT PR: 12773
File: autogpt_platform/backend/backend/blocks/autopilot.py:631-638
Timestamp: 2026-04-14T07:35:11.464Z
Learning: In `autogpt_platform/backend/backend/copilot/executor/utils.py`, `CoPilotExecutionEntry` includes a `permissions: CopilotPermissions | None` field (added in PR `#12773` / commit a0184c87b9). `enqueue_copilot_turn` accepts and serializes this field into the queue entry, `_enqueue_for_recovery` in `autopilot.py` accepts and forwards `permissions` to `enqueue_copilot_turn`, and `_execute_async` in `processor.py` restores `entry.permissions` and passes it into `stream_chat_completion_sdk`/`stream_chat_completion_baseline` via `set_execution_context`. This ensures recovered sub-agent turns respect the same tool/block permission ceiling as the original in-process execution (mirroring `_merge_inherited_permissions`). Do NOT flag recovered turns as losing their permission ceiling — it is now fully propagated through the queue.
Learnt from: majdyz
Repo: Significant-Gravitas/AutoGPT PR: 12445
File: autogpt_platform/backend/backend/copilot/sdk/service.py:1071-1072
Timestamp: 2026-03-17T06:48:26.471Z
Learning: In Significant-Gravitas/AutoGPT (autogpt_platform), the AI SDK enforces `z.strictObject({type, errorText})` on SSE `StreamError` responses, so additional fields like `retryable: bool` cannot be added to `StreamError` or serialized via `to_sse()`. Instead, retry signaling for transient Anthropic API errors is done via the `COPILOT_RETRYABLE_ERROR_PREFIX` constant prepended to persisted session messages (in `ChatMessage.content`). The frontend detects retryable errors by checking `markerType === "retryable_error"` from `parseSpecialMarkers()` — no SSE schema changes and no string matching on error text. This pattern was established in PR `#12445`, commit 64d82797b.
Learnt from: majdyz
Repo: Significant-Gravitas/AutoGPT PR: 12773
File: autogpt_platform/backend/backend/copilot/pending_messages.py:52-64
Timestamp: 2026-04-14T14:36:25.545Z
Learning: In `autogpt_platform/backend/backend/copilot` (PR `#12773`, commit d7bced0c6): when draining pending messages into `session.messages`, each message's text is sanitized via `strip_user_context_tags` before persistence to prevent user-controlled `<user_context>` injection from bypassing the trusted server-side context prefix. Additionally, if `upsert_chat_session` fails after draining, the drained `PendingMessage` objects are requeued back to Redis to avoid silent message loss. Do NOT flag the drain-then-requeue pattern as redundant — it is the intentional failure-resilience strategy for the pending buffer.
Learnt from: majdyz
Repo: Significant-Gravitas/AutoGPT PR: 12879
File: autogpt_platform/backend/backend/copilot/baseline/service.py:0-0
Timestamp: 2026-04-22T05:57:34.861Z
Learning: In `autogpt_platform/backend/backend/copilot/baseline/service.py`, the approved pattern for `_run_task_subagent` (PR `#12879`, commit 187f0a5) uses a nested `try/except Exception` inside an outer `try/finally`. The outer `finally` block resets `_TASK_DEPTH_VAR` (via `_TASK_DEPTH_VAR.reset(token)`) AND calls `_absorb_inner_usage(parent_state, inner_state)` unconditionally, so both the depth ContextVar and usage roll-up are guaranteed on all exit paths including `CancelledError`/`KeyboardInterrupt`/`SystemExit`. The inner `except Exception` catches and converts failures into a `TaskResponse` error payload that is returned as `StreamToolOutputAvailable`. Do NOT flag missing ContextVar reset or usage roll-up on BaseException paths in this function.
Learnt from: majdyz
Repo: Significant-Gravitas/AutoGPT PR: 0
File: :0-0
Timestamp: 2026-04-23T00:07:27.117Z
Learning: In `autogpt_platform/backend/backend/copilot/sdk/service.py`, background tasks that persist cost or emit Langfuse backfill (e.g. the cost-reconcile task) must be anchored to `_background_tasks` using `_background_tasks.add(task)` and `task.add_done_callback(_background_tasks.discard)`, mirroring the existing pattern at lines 3063 / 4232 / 4256. This prevents the asyncio task from being garbage-collected before persistence or Langfuse emission completes. Do NOT flag the absence of this anchoring as acceptable in this file. Established in PR `#12889` commit 5ce3d0388.
Learnt from: majdyz
Repo: Significant-Gravitas/AutoGPT PR: 12881
File: autogpt_platform/backend/backend/copilot/sdk/service.py:0-0
Timestamp: 2026-04-22T12:26:42.571Z
Learning: In `autogpt_platform/backend/backend/copilot/sdk/service.py`, `_resolve_sdk_model_for_request`: when a per-user LaunchDarkly model value fails `_normalize_model_name` (e.g. a `moonshotai/kimi-*` slug in direct-Anthropic mode), the fallback must be tier-specific — `config.thinking_advanced_model` for advanced tier, `config.thinking_standard_model` for standard tier — NOT the generic `_resolve_sdk_model()` (which is standard-only and returns None under subscription mode). If the tier-specific config default also fails `_normalize_model_name`, re-raise the original LD error; this is a deployment-level misconfiguration that `model_validator` should have caught at startup. Established in PR `#12881` commit 637d2fef5.
Learnt from: majdyz
Repo: Significant-Gravitas/AutoGPT PR: 0
File: :0-0
Timestamp: 2026-04-23T01:26:38.257Z
Learning: In `autogpt_platform/backend/backend/copilot/sdk/service.py`, `langfuse_trace_id = get_client().get_current_trace_id()` must be captured under the `if _lf_span is not None:` guard (before `_lf_span` is torn down), NOT under `if _otel_ctx is not None:`. The `_otel_ctx` guard is too narrow: if `propagate_attributes().__enter__()` raises, `_otel_ctx` is never assigned, and placing the trace-id capture there would silently orphan the `openrouter-cost-reconcile` event from its parent span. Established in PR `#12889` commit d243bf6c9.
Learnt from: majdyz
Repo: Significant-Gravitas/AutoGPT PR: 12356
File: autogpt_platform/backend/backend/copilot/constants.py:9-12
Timestamp: 2026-03-10T08:39:22.025Z
Learning: In Significant-Gravitas/AutoGPT PR `#12356`, the `COPILOT_SYNTHETIC_ID_PREFIX = "copilot-"` check in `create_auto_approval_record` (human_review.py) is intentional and safe. The `graph_exec_id` passed to this function comes from server-side `PendingHumanReview` DB records (not from user input); the API only accepts `node_exec_id` from users. Synthetic `copilot-*` IDs are only ever created server-side in `run_block.py`. The prefix skip avoids a DB lookup for a `AgentGraphExecution` record that legitimately does not exist for CoPilot sessions, while `user_id` scoping is enforced at the auth layer and on the resulting auto-approval record.
Learnt from: majdyz
Repo: Significant-Gravitas/AutoGPT PR: 12796
File: autogpt_platform/backend/backend/api/features/chat/routes.py:504-527
Timestamp: 2026-04-16T12:33:44.990Z
Learning: In `autogpt_platform/backend/backend/api/features/chat/routes.py`, `get_session` (PR `#12796`, commit 3771bfad9c1) closes the TOCTOU race between the initial `stream_registry.get_active_session()` pre-check and `get_chat_messages_paginated()` with a post-check re-verification: after the DB fetch, if `is_initial_load and active_session is not None`, it calls `get_active_session` a second time; if `post_active is None` (stream completed during the window), it resets `from_start=True`, `forward_paginated=True`, and re-fetches messages from sequence 0. Do NOT flag the double `get_active_session` call pattern as redundant — it is the intentional TOCTOU mitigation for pagination direction selection.
📚 Learning: 2026-04-15T13:44:34.273Z
Learnt from: majdyz
Repo: Significant-Gravitas/AutoGPT PR: 12797
File: autogpt_platform/backend/backend/copilot/sdk/service.py:1991-2021
Timestamp: 2026-04-15T13:44:34.273Z
Learning: In `autogpt_platform/backend/backend/copilot/sdk/service.py` (`_run_stream_attempt`), the pre-create block (PR `#12797`) intentionally does NOT call `state.transcript_builder.append_assistant(...)` when inserting the empty assistant placeholder into `ctx.session.messages`. The transcript is left ending at the `tool_result` entry (N entries) while `message_count` metadata is N+1. This mismatch is benign and deliberate: on the next `--resume`, the SDK sees the transcript ending at `tool_result` and correctly regenerates the assistant response. Pre-appending the assistant turn to the transcript would suppress regeneration while leaving `session.messages[-1].content = ""` permanently (worse outcome). On the gap-fallback path, `transcript_msg_count (N+1) >= msg_count-1 (N)` means no gap is injected for the empty placeholder, which is correct because injecting an empty assistant message as context would mislead the SDK. Do NOT flag this transcript/message_count discrepancy as a bug.

Applied to files:

  • autogpt_platform/backend/backend/copilot/sdk/interrupted_partial_test.py
  • autogpt_platform/backend/backend/copilot/sdk/service.py
📚 Learning: 2026-03-26T07:00:03.405Z
Learnt from: majdyz
Repo: Significant-Gravitas/AutoGPT PR: 12574
File: autogpt_platform/backend/backend/copilot/sdk/transcript.py:980-990
Timestamp: 2026-03-26T07:00:03.405Z
Learning: In `autogpt_platform/backend/backend/copilot/sdk/transcript.py`, `_rechain_tail` intentionally rewrites `parentUuid` for **all** tail entries (not just the first), because a single assistant turn can span multiple consecutive JSONL entries sharing the same `message.id` (e.g., a thinking entry + a tool_use entry). Their original `parentUuid` values may reference entries that were absorbed into the compressed prefix, so sequential rechaining of the entire tail is required to maintain a valid parent→child graph. The test `test_chains_multiple_tail_entries` validates this: the second tail entry's `parentUuid` is rewritten from its original value to the uuid of the first tail entry.

Applied to files:

  • autogpt_platform/backend/backend/copilot/sdk/interrupted_partial_test.py
📚 Learning: 2026-04-14T07:35:11.464Z
Learnt from: majdyz
Repo: Significant-Gravitas/AutoGPT PR: 12773
File: autogpt_platform/backend/backend/blocks/autopilot.py:631-638
Timestamp: 2026-04-14T07:35:11.464Z
Learning: In `autogpt_platform/backend/backend/copilot/executor/utils.py`, `CoPilotExecutionEntry` includes a `permissions: CopilotPermissions | None` field (added in PR `#12773` / commit a0184c87b9). `enqueue_copilot_turn` accepts and serializes this field into the queue entry, `_enqueue_for_recovery` in `autopilot.py` accepts and forwards `permissions` to `enqueue_copilot_turn`, and `_execute_async` in `processor.py` restores `entry.permissions` and passes it into `stream_chat_completion_sdk`/`stream_chat_completion_baseline` via `set_execution_context`. This ensures recovered sub-agent turns respect the same tool/block permission ceiling as the original in-process execution (mirroring `_merge_inherited_permissions`). Do NOT flag recovered turns as losing their permission ceiling — it is now fully propagated through the queue.

Applied to files:

  • autogpt_platform/backend/backend/copilot/sdk/interrupted_partial_test.py
  • autogpt_platform/backend/backend/copilot/sdk/service.py
📚 Learning: 2026-04-22T05:57:34.861Z
Learnt from: majdyz
Repo: Significant-Gravitas/AutoGPT PR: 12879
File: autogpt_platform/backend/backend/copilot/baseline/service.py:0-0
Timestamp: 2026-04-22T05:57:34.861Z
Learning: In `autogpt_platform/backend/backend/copilot/baseline/service.py`, the approved pattern for `_run_task_subagent` (PR `#12879`, commit 187f0a5) uses a nested `try/except Exception` inside an outer `try/finally`. The outer `finally` block resets `_TASK_DEPTH_VAR` (via `_TASK_DEPTH_VAR.reset(token)`) AND calls `_absorb_inner_usage(parent_state, inner_state)` unconditionally, so both the depth ContextVar and usage roll-up are guaranteed on all exit paths including `CancelledError`/`KeyboardInterrupt`/`SystemExit`. The inner `except Exception` catches and converts failures into a `TaskResponse` error payload that is returned as `StreamToolOutputAvailable`. Do NOT flag missing ContextVar reset or usage roll-up on BaseException paths in this function.

Applied to files:

  • autogpt_platform/backend/backend/copilot/sdk/interrupted_partial_test.py
  • autogpt_platform/backend/backend/copilot/sdk/service.py
📚 Learning: 2026-02-04T16:49:42.490Z
Learnt from: CR
Repo: Significant-Gravitas/AutoGPT PR: 0
File: .github/copilot-instructions.md:0-0
Timestamp: 2026-02-04T16:49:42.490Z
Learning: Applies to autogpt_platform/backend/**/test/**/*.py : Use snapshot testing with '--snapshot-update' flag in backend tests when output changes; always review with 'git diff'

Applied to files:

  • autogpt_platform/backend/backend/copilot/sdk/interrupted_partial_test.py
📚 Learning: 2026-03-17T06:48:26.471Z
Learnt from: majdyz
Repo: Significant-Gravitas/AutoGPT PR: 12445
File: autogpt_platform/backend/backend/copilot/sdk/service.py:1071-1072
Timestamp: 2026-03-17T06:48:26.471Z
Learning: In Significant-Gravitas/AutoGPT (autogpt_platform), the AI SDK enforces `z.strictObject({type, errorText})` on SSE `StreamError` responses, so additional fields like `retryable: bool` cannot be added to `StreamError` or serialized via `to_sse()`. Instead, retry signaling for transient Anthropic API errors is done via the `COPILOT_RETRYABLE_ERROR_PREFIX` constant prepended to persisted session messages (in `ChatMessage.content`). The frontend detects retryable errors by checking `markerType === "retryable_error"` from `parseSpecialMarkers()` — no SSE schema changes and no string matching on error text. This pattern was established in PR `#12445`, commit 64d82797b.

Applied to files:

  • autogpt_platform/backend/backend/copilot/sdk/interrupted_partial_test.py
  • autogpt_platform/backend/backend/copilot/sdk/service.py
📚 Learning: 2026-04-23T00:07:27.117Z
Learnt from: majdyz
Repo: Significant-Gravitas/AutoGPT PR: 0
File: :0-0
Timestamp: 2026-04-23T00:07:27.117Z
Learning: In `autogpt_platform/backend/backend/copilot/sdk/service.py`, background tasks that persist cost or emit Langfuse backfill (e.g. the cost-reconcile task) must be anchored to `_background_tasks` using `_background_tasks.add(task)` and `task.add_done_callback(_background_tasks.discard)`, mirroring the existing pattern at lines 3063 / 4232 / 4256. This prevents the asyncio task from being garbage-collected before persistence or Langfuse emission completes. Do NOT flag the absence of this anchoring as acceptable in this file. Established in PR `#12889` commit 5ce3d0388.

Applied to files:

  • autogpt_platform/backend/backend/copilot/sdk/interrupted_partial_test.py
📚 Learning: 2026-04-22T12:26:42.571Z
Learnt from: majdyz
Repo: Significant-Gravitas/AutoGPT PR: 12881
File: autogpt_platform/backend/backend/copilot/sdk/service.py:0-0
Timestamp: 2026-04-22T12:26:42.571Z
Learning: In `autogpt_platform/backend/backend/copilot/sdk/service.py`, `_resolve_sdk_model_for_request`: when a per-user LaunchDarkly model value fails `_normalize_model_name` (e.g. a `moonshotai/kimi-*` slug in direct-Anthropic mode), the fallback must be tier-specific — `config.thinking_advanced_model` for advanced tier, `config.thinking_standard_model` for standard tier — NOT the generic `_resolve_sdk_model()` (which is standard-only and returns None under subscription mode). If the tier-specific config default also fails `_normalize_model_name`, re-raise the original LD error; this is a deployment-level misconfiguration that `model_validator` should have caught at startup. Established in PR `#12881` commit 637d2fef5.

Applied to files:

  • autogpt_platform/backend/backend/copilot/sdk/interrupted_partial_test.py
📚 Learning: 2026-04-08T17:28:23.439Z
Learnt from: CR
Repo: Significant-Gravitas/AutoGPT PR: 0
File: autogpt_platform/backend/AGENTS.md:0-0
Timestamp: 2026-04-08T17:28:23.439Z
Learning: Applies to autogpt_platform/backend/**/*_test.py : When writing tests, use Test-Driven Development (TDD): write failing tests marked with `pytest.mark.xfail` before implementation, then remove the marker once the implementation is complete

Applied to files:

  • autogpt_platform/backend/backend/copilot/sdk/interrupted_partial_test.py
📚 Learning: 2026-04-08T17:28:23.439Z
Learnt from: CR
Repo: Significant-Gravitas/AutoGPT PR: 0
File: autogpt_platform/backend/AGENTS.md:0-0
Timestamp: 2026-04-08T17:28:23.439Z
Learning: Applies to autogpt_platform/backend/**/*_test.py : When creating snapshots in tests, use `poetry run pytest path/to/test.py --snapshot-update`; always review snapshot changes with `git diff` before committing

Applied to files:

  • autogpt_platform/backend/backend/copilot/sdk/interrupted_partial_test.py
📚 Learning: 2026-02-26T17:02:22.448Z
Learnt from: Pwuts
Repo: Significant-Gravitas/AutoGPT PR: 12211
File: .pre-commit-config.yaml:160-179
Timestamp: 2026-02-26T17:02:22.448Z
Learning: Keep the pre-commit hook pattern broad for autogpt_platform/backend to ensure OpenAPI schema changes are captured. Do not narrow to backend/api/ alone, since the generated schema depends on Pydantic models across multiple directories (backend/data/, backend/blocks/, backend/copilot/, backend/integrations/, backend/util/). Narrowing could miss schema changes and cause frontend type desynchronization.

Applied to files:

  • autogpt_platform/backend/backend/copilot/sdk/interrupted_partial_test.py
  • autogpt_platform/backend/backend/copilot/sdk/service.py
📚 Learning: 2026-03-04T08:04:35.881Z
Learnt from: majdyz
Repo: Significant-Gravitas/AutoGPT PR: 12273
File: autogpt_platform/backend/backend/copilot/tools/workspace_files.py:216-220
Timestamp: 2026-03-04T08:04:35.881Z
Learning: In the AutoGPT Copilot backend, ensure that SVG images are not treated as vision image types by excluding 'image/svg+xml' from INLINEABLE_MIME_TYPES and MULTIMODAL_TYPES in tool_adapter.py; the Claude API supports PNG, JPEG, GIF, and WebP for vision. SVGs (XML text) should be handled via the text path instead, not the vision path.

Applied to files:

  • autogpt_platform/backend/backend/copilot/sdk/interrupted_partial_test.py
  • autogpt_platform/backend/backend/copilot/sdk/service.py
📚 Learning: 2026-04-01T04:17:41.600Z
Learnt from: majdyz
Repo: Significant-Gravitas/AutoGPT PR: 12632
File: autogpt_platform/backend/backend/copilot/tools/workspace_files.py:0-0
Timestamp: 2026-04-01T04:17:41.600Z
Learning: When reviewing AutoGPT Copilot tool implementations, accept that `readOnlyHint=True` (provided via `ToolAnnotations`) may be applied unconditionally to *all* tools—even tools that have side effects (e.g., `bash_exec`, `write_workspace_file`, or other write/save operations). Do **not** flag these tools for having `readOnlyHint=True`; this is intentional to enable fully-parallel dispatch by the Anthropic SDK/CLI and has been E2E validated. Only flag `readOnlyHint` issues if they conflict with the established `ToolAnnotations` behavior (e.g., missing/incorrect propagation relative to the intended annotation mechanism).

Applied to files:

  • autogpt_platform/backend/backend/copilot/sdk/interrupted_partial_test.py
  • autogpt_platform/backend/backend/copilot/sdk/service.py
📚 Learning: 2026-03-05T15:42:08.207Z
Learnt from: ntindle
Repo: Significant-Gravitas/AutoGPT PR: 12297
File: .claude/skills/backend-check/SKILL.md:14-16
Timestamp: 2026-03-05T15:42:08.207Z
Learning: In Python files under autogpt_platform/backend (recursively), rely on poetry run format to perform formatting (Black + isort) and linting (ruff). Do not run poetry run lint as a separate step after poetry run format, since format already includes linting checks.

Applied to files:

  • autogpt_platform/backend/backend/copilot/sdk/interrupted_partial_test.py
  • autogpt_platform/backend/backend/copilot/sdk/service.py
📚 Learning: 2026-03-16T16:35:40.236Z
Learnt from: majdyz
Repo: Significant-Gravitas/AutoGPT PR: 12440
File: autogpt_platform/backend/backend/api/features/workflow_import.py:54-63
Timestamp: 2026-03-16T16:35:40.236Z
Learning: Avoid using the word 'competitor' in public-facing identifiers and text. Use neutral naming for API paths, model names, function names, and UI text. Examples: rename 'CompetitorFormat' to 'SourcePlatform', 'convert_competitor_workflow' to 'convert_workflow', '/competitor-workflow' to '/workflow'. Apply this guideline to files under autogpt_platform/backend and autogpt_platform/frontend.

Applied to files:

  • autogpt_platform/backend/backend/copilot/sdk/interrupted_partial_test.py
  • autogpt_platform/backend/backend/copilot/sdk/service.py
📚 Learning: 2026-03-31T15:37:38.626Z
Learnt from: majdyz
Repo: Significant-Gravitas/AutoGPT PR: 12623
File: autogpt_platform/backend/backend/copilot/tools/agent_generator/fixer.py:37-47
Timestamp: 2026-03-31T15:37:38.626Z
Learning: When validating/constructing Anthropic API model IDs in Significant-Gravitas/AutoGPT, allow the hyphen-separated Claude Opus 4.6 model ID `claude-opus-4-6` (it corresponds to `LlmModel.CLAUDE_4_6_OPUS` in `autogpt_platform/backend/backend/blocks/llm.py`). Do NOT require the dot-separated form in Anthropic contexts. Only OpenRouter routing variants should use the dot separator (e.g., `anthropic/claude-opus-4.6`); `claude-opus-4-6` should be treated as correct when passed to Anthropic, and flagged only if it’s used in the OpenRouter path where the dot form is expected.

Applied to files:

  • autogpt_platform/backend/backend/copilot/sdk/interrupted_partial_test.py
  • autogpt_platform/backend/backend/copilot/sdk/service.py
📚 Learning: 2026-04-15T02:43:36.890Z
Learnt from: ntindle
Repo: Significant-Gravitas/AutoGPT PR: 12780
File: autogpt_platform/backend/backend/copilot/tools/workspace_files.py:0-0
Timestamp: 2026-04-15T02:43:36.890Z
Learning: When reviewing Python exception handlers, do not flag `isinstance(e, X)` checks as dead/unreachable if the caught exception `X` is a subclass of the exception type being handled. For example, if `X` (e.g., `VirusScanError`) inherits from `ValueError` (directly or via an intermediate class) and it can be raised within an `except ValueError:` block, then `isinstance(e, X)` inside that handler is reachable and should not be treated as dead code.

Applied to files:

  • autogpt_platform/backend/backend/copilot/sdk/interrupted_partial_test.py
  • autogpt_platform/backend/backend/copilot/sdk/service.py
📚 Learning: 2026-04-22T11:46:04.431Z
Learnt from: majdyz
Repo: Significant-Gravitas/AutoGPT PR: 12881
File: autogpt_platform/backend/backend/copilot/config.py:0-0
Timestamp: 2026-04-22T11:46:04.431Z
Learning: Do not flag the Claude Sonnet 4.6 model ID as incorrect when it uses the project’s established hyphenated convention: `anthropic/claude-sonnet-4-6`. This hyphen form is the intentional, production convention and should be treated as valid (including in files like llm.py, blocks tests, reasoning.py, `_is_anthropic_model` tests, and config defaults). Note that OpenRouter also accepts the dot variant `anthropic/claude-sonnet-4.6`, so either form may be tolerated, but `anthropic/claude-sonnet-4-6` should be considered the standard to match project usage.

Applied to files:

  • autogpt_platform/backend/backend/copilot/sdk/interrupted_partial_test.py
  • autogpt_platform/backend/backend/copilot/sdk/service.py
📚 Learning: 2026-04-22T11:46:12.892Z
Learnt from: majdyz
Repo: Significant-Gravitas/AutoGPT PR: 12881
File: autogpt_platform/backend/backend/copilot/baseline/service.py:322-332
Timestamp: 2026-04-22T11:46:12.892Z
Learning: In this codebase (Significant-Gravitas/AutoGPT), OpenRouter-routed Anthropic model IDs should use the hyphen-separated convention (e.g., `anthropic/claude-sonnet-4-6`, `anthropic/claude-opus-4-6`). Although OpenRouter may accept both hyphen and dot variants, treat the hyphen-separated form as the intended, correct codebase-wide convention and do not flag it as an error. Only flag the dot-separated variant (e.g., `anthropic/claude-sonnet-4.6`) as incorrect when reviewing/validating model ID strings for OpenRouter-routed Anthropic models.

Applied to files:

  • autogpt_platform/backend/backend/copilot/sdk/interrupted_partial_test.py
  • autogpt_platform/backend/backend/copilot/sdk/service.py
📚 Learning: 2026-04-14T14:36:25.545Z
Learnt from: majdyz
Repo: Significant-Gravitas/AutoGPT PR: 12773
File: autogpt_platform/backend/backend/copilot/pending_messages.py:52-64
Timestamp: 2026-04-14T14:36:25.545Z
Learning: In `autogpt_platform/backend/backend/copilot` (PR `#12773`, commit d7bced0c6): when draining pending messages into `session.messages`, each message's text is sanitized via `strip_user_context_tags` before persistence to prevent user-controlled `<user_context>` injection from bypassing the trusted server-side context prefix. Additionally, if `upsert_chat_session` fails after draining, the drained `PendingMessage` objects are requeued back to Redis to avoid silent message loss. Do NOT flag the drain-then-requeue pattern as redundant — it is the intentional failure-resilience strategy for the pending buffer.

Applied to files:

  • autogpt_platform/backend/backend/copilot/sdk/service.py
📚 Learning: 2026-04-03T11:14:45.569Z
Learnt from: majdyz
Repo: Significant-Gravitas/AutoGPT PR: 12623
File: autogpt_platform/backend/backend/copilot/baseline/service.py:0-0
Timestamp: 2026-04-03T11:14:45.569Z
Learning: In `autogpt_platform/backend/backend/copilot/baseline/service.py`, `transcript_builder.append_user(content=message)` is called unconditionally even when the message is a duplicate that was suppressed by the `is_new_message` guard. This is intentional: the downloaded transcript may be stale (uploaded before the previous attempt persisted the message), so always appending the current user turn prevents a malformed assistant-after-assistant transcript structure. The `is_user_message` flag is still checked (`if message and is_user_message:`), so assistant-role inputs are excluded. Do NOT flag this as a bug.

Applied to files:

  • autogpt_platform/backend/backend/copilot/sdk/service.py
📚 Learning: 2026-03-17T10:57:12.953Z
Learnt from: majdyz
Repo: Significant-Gravitas/AutoGPT PR: 12440
File: autogpt_platform/backend/backend/copilot/workflow_import/converter.py:0-0
Timestamp: 2026-03-17T10:57:12.953Z
Learning: In Significant-Gravitas/AutoGPT PR `#12440`, `autogpt_platform/backend/backend/copilot/workflow_import/converter.py` was fully rewritten (commit 732960e2d) to no longer make direct LLM/OpenAI API calls. The converter now builds a structured text prompt for AutoPilot/CoPilot instead. There is no `response.choices` access or any direct LLM client usage in this file. Do not flag `response.choices` access or LLM client initialization patterns as issues in this file.

Applied to files:

  • autogpt_platform/backend/backend/copilot/sdk/service.py
📚 Learning: 2026-04-16T13:28:20.824Z
Learnt from: majdyz
Repo: Significant-Gravitas/AutoGPT PR: 12814
File: autogpt_platform/backend/backend/copilot/model.py:661-679
Timestamp: 2026-04-16T13:28:20.824Z
Learning: In `autogpt_platform/backend/backend/copilot/model.py` (PR `#12814`, commit 259d37083): `append_and_save_message` acquires `_get_session_lock` — a redis-py built-in Lock at key `copilot:session_lock:{session_id}` (timeout=10s, blocking_timeout=2s) — to serialize concurrent writers across replicas. On Redis failure the lock is skipped with a warning and the function continues. Inside the lock it re-fetches the session via `get_chat_session` (cache-first), performs an idempotency check (`session.messages[-1].role == message.role and session.messages[-1].content == message.content`), and returns early if matched. On successful DB write but failed cache write, it calls `invalidate_session_cache(session_id)` (the pre-existing best-effort helper) to evict the stale cache entry so subsequent retries fall back to the authoritative DB. Do NOT expect `asyncio.Lock` or a manual NX poll loop (`copilot:msg_append:{session_id}`) — those were removed. Do NOT flag the `invalidate_session_cache` call on ...

Applied to files:

  • autogpt_platform/backend/backend/copilot/sdk/service.py
📚 Learning: 2026-04-16T12:33:44.990Z
Learnt from: majdyz
Repo: Significant-Gravitas/AutoGPT PR: 12796
File: autogpt_platform/backend/backend/api/features/chat/routes.py:504-527
Timestamp: 2026-04-16T12:33:44.990Z
Learning: In `autogpt_platform/backend/backend/api/features/chat/routes.py`, `get_session` (PR `#12796`, commit 3771bfad9c1) closes the TOCTOU race between the initial `stream_registry.get_active_session()` pre-check and `get_chat_messages_paginated()` with a post-check re-verification: after the DB fetch, if `is_initial_load and active_session is not None`, it calls `get_active_session` a second time; if `post_active is None` (stream completed during the window), it resets `from_start=True`, `forward_paginated=True`, and re-fetches messages from sequence 0. Do NOT flag the double `get_active_session` call pattern as redundant — it is the intentional TOCTOU mitigation for pagination direction selection.

Applied to files:

  • autogpt_platform/backend/backend/copilot/sdk/service.py
📚 Learning: 2026-04-21T11:41:05.877Z
Learnt from: majdyz
Repo: Significant-Gravitas/AutoGPT PR: 0
File: :0-0
Timestamp: 2026-04-21T11:41:05.877Z
Learning: In `autogpt_platform/backend/backend/copilot/baseline/service.py` (PR `#12870`, commits 080d42b9d and 3d7b38162), the `_close_reasoning_block_if_open(state)` helper centralises all four reasoning-block-close call sites (text branch, tool_calls branch, stream-end, exception path). The outer `finally` block of `_baseline_llm_caller` calls this helper plus stripper flush + `StreamTextEnd` to guarantee matched end events are emitted before `StreamFinishStep` on both normal and exception paths. Do NOT flag duplicated close logic or missing reasoning-end-on-exception as issues in this function.

Applied to files:

  • autogpt_platform/backend/backend/copilot/sdk/service.py
📚 Learning: 2026-04-23T01:26:38.257Z
Learnt from: majdyz
Repo: Significant-Gravitas/AutoGPT PR: 0
File: :0-0
Timestamp: 2026-04-23T01:26:38.257Z
Learning: In `autogpt_platform/backend/backend/copilot/sdk/service.py`, `langfuse_trace_id = get_client().get_current_trace_id()` must be captured under the `if _lf_span is not None:` guard (before `_lf_span` is torn down), NOT under `if _otel_ctx is not None:`. The `_otel_ctx` guard is too narrow: if `propagate_attributes().__enter__()` raises, `_otel_ctx` is never assigned, and placing the trace-id capture there would silently orphan the `openrouter-cost-reconcile` event from its parent span. Established in PR `#12889` commit d243bf6c9.

Applied to files:

  • autogpt_platform/backend/backend/copilot/sdk/service.py

Comment thread autogpt_platform/backend/backend/copilot/sdk/service.py Outdated
@codecov
Copy link
Copy Markdown

codecov Bot commented Apr 25, 2026

Codecov Report

❌ Patch coverage is 95.70815% with 10 lines in your changes missing coverage. Please review.
✅ Project coverage is 68.32%. Comparing base (2deac20) to head (220ca3d).
⚠️ Report is 3 commits behind head on dev.

Additional details and impacted files
@@            Coverage Diff             @@
##              dev   #12918      +/-   ##
==========================================
+ Coverage   68.23%   68.32%   +0.08%     
==========================================
  Files        1960     1961       +1     
  Lines      150178   150528     +350     
  Branches    15621    15639      +18     
==========================================
+ Hits       102473   102841     +368     
+ Misses      44664    44640      -24     
- Partials     3041     3047       +6     
Flag Coverage Δ
platform-backend 77.94% <95.53%> (+0.07%) ⬆️
platform-frontend 25.93% <100.00%> (+0.07%) ⬆️
platform-frontend-e2e 29.59% <20.00%> (-0.20%) ⬇️

Flags with carried forward coverage won't be shown. Click here to find out more.

Components Coverage Δ
Platform Backend 77.94% <95.53%> (+0.07%) ⬆️
Platform Frontend 32.66% <100.00%> (+0.05%) ⬆️
AutoGPT Libs ∅ <ø> (∅)
Classic AutoGPT 28.43% <ø> (ø)
🚀 New features to boost your workflow:
  • 📦 JS Bundle Analysis: Save yourself from yourself by tracking and limiting bundle sizes in JS merges.

…block type-check

The inline _restore_partial_with_error_marker calls across five retry-loop
branches pushed stream_chat_completion_sdk past pyright's complexity heuristic
(CI type-check failed on main). Consolidate into a single post-loop block keyed
off ended_with_stream_error + the existing attempts_exhausted / transient_exhausted
/ stream_err flags, plus a new handled_error_info tuple that carries
_HandledStreamError's final-yield decision out of the retry loop.

Behaviour is unchanged — same restore semantics, same client-facing StreamError
sequencing, same transcript-upload skip. Confirmed with 319 existing + new
tests (backend/copilot/sdk + baseline).

Pyright still bails on the function body (1500 LoC — the retry loop with
context-overflow fallback + transient backoff + partial-work preservation
shares too much state across branches to split cleanly without hurting
readability). A file-targeted reportGeneralTypeIssues suppression covers the
complexity bailout while keeping real type errors elsewhere in the file
surfaced.
Comment thread autogpt_platform/backend/backend/copilot/sdk/service.py Outdated
Comment thread autogpt_platform/backend/backend/copilot/sdk/service.py Outdated
Comment thread autogpt_platform/backend/backend/copilot/sdk/service.py
Comment thread autogpt_platform/backend/backend/copilot/sdk/service.py Outdated
Copy link
Copy Markdown
Contributor

@coderabbitai coderabbitai Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

♻️ Duplicate comments (1)
autogpt_platform/backend/backend/copilot/sdk/service.py (1)

615-631: ⚠️ Potential issue | 🟠 Major

Partial still captures pre-appended error markers on _HandledStreamError paths.

captured = session.messages[pre_attempt_msg_count:] will still include the canonical error marker that _run_stream_attempt already appended for the idle-timeout (line 2296) and circuit-breaker (line 2162) branches before raising _HandledStreamError(already_yielded=True). The consolidated restore block at lines 4072-4100 then calls _restore_partial_with_error_marker, which:

  1. Re-extends session.messages with the captured partial (re-inserting the stale marker).
  2. Calls _flush_orphan_tool_uses_to_session — can inject synthesized tool_result rows after that stale marker, producing an invalid assistant(error) → tool_result ordering if the next turn replays history before stream_chat_completion_sdk's start-of-turn marker cleanup runs.
  3. Appends a second copy of the same marker via _append_error_marker.

Result: duplicate error bubbles on idle-timeout / empty-tool-breaker turns, and a transiently malformed sequence if orphans are present. Next-turn cleanup (lines 3117-3125) only trims trailing markers, so a tool_result sandwiched between them leaves the earlier marker in place.

🛠️ Consider stripping trailing markers during capture
 def _rollback_attempt_capturing_partial(
     session: "ChatSession",
     transcript_builder: "TranscriptBuilder",
     transcript_snap: object,
     pre_attempt_msg_count: int,
 ) -> list[ChatMessage]:
@@
-    captured = list(session.messages[pre_attempt_msg_count:])
+    captured = list(session.messages[pre_attempt_msg_count:])
+    while (
+        captured
+        and captured[-1].role == "assistant"
+        and captured[-1].content
+        and (
+            captured[-1].content.startswith(COPILOT_ERROR_PREFIX)
+            or captured[-1].content.startswith(COPILOT_RETRYABLE_ERROR_PREFIX)
+        )
+    ):
+        captured.pop()
     session.messages = session.messages[:pre_attempt_msg_count]
     transcript_builder.restore(transcript_snap)  # type: ignore[arg-type]
     return captured

A targeted regression test simulating an idle-timeout / empty-tool-breaker _HandledStreamError after partial assistant work would lock this down.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@autogpt_platform/backend/backend/copilot/sdk/service.py` around lines 615 -
631, The captured list in _rollback_attempt_capturing_partial is including the
canonical error marker appended earlier on _HandledStreamError paths, so change
_rollback_attempt_capturing_partial to strip trailing error-marker messages from
the captured slice before returning (e.g., trim any trailing assistant
error-marker objects from captured by checking the same predicate used when
appending markers), ensuring session.messages still rolls back to
pre_attempt_msg_count; reference _restore_partial_with_error_marker and
_append_error_marker/_HandledStreamError to validate behavior and add a
regression test that simulates an idle-timeout/empty-tool-breaker
_HandledStreamError after partial assistant output to assert no duplicate or
sandwiched error markers are reintroduced.
🧹 Nitpick comments (2)
autogpt_platform/backend/backend/copilot/sdk/service.py (2)

3070-3088: File-level # pyright: ignore[reportGeneralTypeIssues] violates the no-suppressor rule.

The rationale in the docstring is understood — the function is large and the retry/finalization state is tightly coupled — but the coding guideline forbids # pyright: ignore suppressors. Two viable alternatives:

  • Extract the retry-loop body (and the consolidated failure-finalization block at 4068-4100) into a helper that takes _StreamContext + _RetryState and returns a small result tuple (final_msg, retryable, handled_error_info, etc.). stream_chat_completion_sdk then only orchestrates setup/teardown.
  • Split the finally-block post-turn work (OTEL span teardown, cost reconcile, CLI upload) into a dedicated _finalize_turn helper.

Either extraction shrinks the type-check surface below pyright's heuristic without losing shared state (most of it already lives on _RetryState / _StreamContext).

As per coding guidelines: "Do not use linter suppressors — no # type: ignore, # noqa, # pyright: ignore; fix the type/code instead" and "Keep functions under ~40 lines; extract named helpers when a function grows longer".

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@autogpt_platform/backend/backend/copilot/sdk/service.py` around lines 3070 -
3088, Remove the file-level "# pyright: ignore[reportGeneralTypeIssues]" on
stream_chat_completion_sdk and reduce the function's type complexity by
extracting the big retry-loop body and the consolidated failure/finalization
block into a typed helper (e.g., _run_stream_retry_cycle) that accepts the
existing _StreamContext and _RetryState and returns a small result tuple
(final_message, retryable: bool, handled_error_info, etc.); alternatively move
the OTEL/span teardown, cost reconcile and CLI upload into a dedicated
_finalize_turn helper called from stream_chat_completion_sdk. Ensure the new
helpers have precise signatures and return types so Pyright can type-check them
and that stream_chat_completion_sdk becomes a slim orchestrator delegating to
_run_stream_retry_cycle and/or _finalize_turn.

560-588: Avoid the # noqa: SLF001 linter suppressor here.

Line 577 suppresses ruff's private-member access warning to reach state.adapter._flush_unresolved_tool_calls. The coding guideline explicitly forbids # noqa / # type: ignore / # pyright: ignore comments — the correct fix is to expose a public adapter method (e.g. flush_unresolved_tool_calls) and update the existing call sites at lines 2801 and 4115 to use it as well (those sites access the same private without the suppressor today, so they'd also become compliant).

As per coding guidelines: "Do not use linter suppressors — no # type: ignore, # noqa, # pyright: ignore; fix the type/code instead".

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@autogpt_platform/backend/backend/copilot/sdk/service.py` around lines 560 -
588, Replace the private call to state.adapter._flush_unresolved_tool_calls (and
remove the "# noqa: SLF001") by adding a public adapter method
flush_unresolved_tool_calls that preserves the same behavior and typing (e.g.,
returns a list[StreamBaseResponse] or accepts a mutable list to populate), then
call state.adapter.flush_unresolved_tool_calls() from
_flush_orphan_tool_uses_to_session instead of the private member; also update
the other sites that currently call state.adapter._flush_unresolved_tool_calls
to use the new public flush_unresolved_tool_calls so no linter suppressors are
needed.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Duplicate comments:
In `@autogpt_platform/backend/backend/copilot/sdk/service.py`:
- Around line 615-631: The captured list in _rollback_attempt_capturing_partial
is including the canonical error marker appended earlier on _HandledStreamError
paths, so change _rollback_attempt_capturing_partial to strip trailing
error-marker messages from the captured slice before returning (e.g., trim any
trailing assistant error-marker objects from captured by checking the same
predicate used when appending markers), ensuring session.messages still rolls
back to pre_attempt_msg_count; reference _restore_partial_with_error_marker and
_append_error_marker/_HandledStreamError to validate behavior and add a
regression test that simulates an idle-timeout/empty-tool-breaker
_HandledStreamError after partial assistant output to assert no duplicate or
sandwiched error markers are reintroduced.

---

Nitpick comments:
In `@autogpt_platform/backend/backend/copilot/sdk/service.py`:
- Around line 3070-3088: Remove the file-level "# pyright:
ignore[reportGeneralTypeIssues]" on stream_chat_completion_sdk and reduce the
function's type complexity by extracting the big retry-loop body and the
consolidated failure/finalization block into a typed helper (e.g.,
_run_stream_retry_cycle) that accepts the existing _StreamContext and
_RetryState and returns a small result tuple (final_message, retryable: bool,
handled_error_info, etc.); alternatively move the OTEL/span teardown, cost
reconcile and CLI upload into a dedicated _finalize_turn helper called from
stream_chat_completion_sdk. Ensure the new helpers have precise signatures and
return types so Pyright can type-check them and that stream_chat_completion_sdk
becomes a slim orchestrator delegating to _run_stream_retry_cycle and/or
_finalize_turn.
- Around line 560-588: Replace the private call to
state.adapter._flush_unresolved_tool_calls (and remove the "# noqa: SLF001") by
adding a public adapter method flush_unresolved_tool_calls that preserves the
same behavior and typing (e.g., returns a list[StreamBaseResponse] or accepts a
mutable list to populate), then call state.adapter.flush_unresolved_tool_calls()
from _flush_orphan_tool_uses_to_session instead of the private member; also
update the other sites that currently call
state.adapter._flush_unresolved_tool_calls to use the new public
flush_unresolved_tool_calls so no linter suppressors are needed.

ℹ️ Review info
⚙️ Run configuration

Configuration used: Organization UI

Review profile: CHILL

Plan: Pro

Run ID: add0ec08-6ab9-4bf8-8e5d-562ee84f91c0

📥 Commits

Reviewing files that changed from the base of the PR and between 7b66f25 and b1172e2.

📒 Files selected for processing (1)
  • autogpt_platform/backend/backend/copilot/sdk/service.py
📜 Review details
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (11)
  • GitHub Check: check API types
  • GitHub Check: Seer Code Review
  • GitHub Check: type-check (3.13)
  • GitHub Check: type-check (3.11)
  • GitHub Check: test (3.13)
  • GitHub Check: test (3.12)
  • GitHub Check: test (3.11)
  • GitHub Check: end-to-end tests
  • GitHub Check: Check PR Status
  • GitHub Check: Analyze (typescript)
  • GitHub Check: Analyze (python)
🧰 Additional context used
📓 Path-based instructions (2)
autogpt_platform/backend/**/*.py

📄 CodeRabbit inference engine (.github/copilot-instructions.md)

autogpt_platform/backend/**/*.py: Use Python 3.11 (required; managed by Poetry via pyproject.toml) for backend development
Always run 'poetry run format' (Black + isort) before linting in backend development
Always run 'poetry run lint' (ruff) after formatting in backend development

autogpt_platform/backend/**/*.py: Use poetry run ... command for executing Python package dependencies
Use top-level imports only — avoid local/inner imports except for lazy imports of heavy optional dependencies like openpyxl
Use absolute imports with from backend.module import ... for cross-package imports; single-dot relative imports are acceptable for sibling modules within the same package; avoid double-dot relative imports
Do not use duck typing — avoid hasattr/getattr/isinstance for type dispatch; use typed interfaces/unions/protocols instead
Use Pydantic models over dataclass/namedtuple/dict for structured data
Do not use linter suppressors — no # type: ignore, # noqa, # pyright: ignore; fix the type/code instead
Prefer list comprehensions over manual loop-and-append patterns
Use early return with guard clauses first to avoid deep nesting
Use %s for deferred interpolation in debug log statements for efficiency; use f-strings elsewhere for readability (e.g., logger.debug("Processing %s items", count) vs logger.info(f"Processing {count} items"))
Sanitize error paths by using os.path.basename() in error messages to avoid leaking directory structure
Be aware of TOCTOU (Time-Of-Check-Time-Of-Use) issues — avoid check-then-act patterns for file access and credit charging
Use transaction=True for Redis pipelines to ensure atomicity on multi-step operations
Use max(0, value) guards for computed values that should never be negative
Keep files under ~300 lines; if a file grows beyond this, split by responsibility (extract helpers, models, or a sub-module into a new file)
Keep functions under ~40 lines; extract named helpers when a function grows longer
...

Files:

  • autogpt_platform/backend/backend/copilot/sdk/service.py
autogpt_platform/{backend,autogpt_libs}/**/*.py

📄 CodeRabbit inference engine (AGENTS.md)

Format Python code with poetry run format

Files:

  • autogpt_platform/backend/backend/copilot/sdk/service.py
🧠 Learnings (22)
📓 Common learnings
Learnt from: majdyz
Repo: Significant-Gravitas/AutoGPT PR: 12797
File: autogpt_platform/backend/backend/copilot/sdk/service.py:1991-2021
Timestamp: 2026-04-15T13:44:34.273Z
Learning: In `autogpt_platform/backend/backend/copilot/sdk/service.py` (`_run_stream_attempt`), the pre-create block (PR `#12797`) intentionally does NOT call `state.transcript_builder.append_assistant(...)` when inserting the empty assistant placeholder into `ctx.session.messages`. The transcript is left ending at the `tool_result` entry (N entries) while `message_count` metadata is N+1. This mismatch is benign and deliberate: on the next `--resume`, the SDK sees the transcript ending at `tool_result` and correctly regenerates the assistant response. Pre-appending the assistant turn to the transcript would suppress regeneration while leaving `session.messages[-1].content = ""` permanently (worse outcome). On the gap-fallback path, `transcript_msg_count (N+1) >= msg_count-1 (N)` means no gap is injected for the empty placeholder, which is correct because injecting an empty assistant message as context would mislead the SDK. Do NOT flag this transcript/message_count discrepancy as a bug.
Learnt from: majdyz
Repo: Significant-Gravitas/AutoGPT PR: 12445
File: autogpt_platform/backend/backend/copilot/sdk/service.py:1071-1072
Timestamp: 2026-03-17T06:48:26.471Z
Learning: In Significant-Gravitas/AutoGPT (autogpt_platform), the AI SDK enforces `z.strictObject({type, errorText})` on SSE `StreamError` responses, so additional fields like `retryable: bool` cannot be added to `StreamError` or serialized via `to_sse()`. Instead, retry signaling for transient Anthropic API errors is done via the `COPILOT_RETRYABLE_ERROR_PREFIX` constant prepended to persisted session messages (in `ChatMessage.content`). The frontend detects retryable errors by checking `markerType === "retryable_error"` from `parseSpecialMarkers()` — no SSE schema changes and no string matching on error text. This pattern was established in PR `#12445`, commit 64d82797b.
Learnt from: majdyz
Repo: Significant-Gravitas/AutoGPT PR: 12773
File: autogpt_platform/backend/backend/copilot/pending_messages.py:52-64
Timestamp: 2026-04-14T14:36:25.545Z
Learning: In `autogpt_platform/backend/backend/copilot` (PR `#12773`, commit d7bced0c6): when draining pending messages into `session.messages`, each message's text is sanitized via `strip_user_context_tags` before persistence to prevent user-controlled `<user_context>` injection from bypassing the trusted server-side context prefix. Additionally, if `upsert_chat_session` fails after draining, the drained `PendingMessage` objects are requeued back to Redis to avoid silent message loss. Do NOT flag the drain-then-requeue pattern as redundant — it is the intentional failure-resilience strategy for the pending buffer.
Learnt from: majdyz
Repo: Significant-Gravitas/AutoGPT PR: 12773
File: autogpt_platform/backend/backend/blocks/autopilot.py:631-638
Timestamp: 2026-04-14T07:35:11.464Z
Learning: In `autogpt_platform/backend/backend/copilot/executor/utils.py`, `CoPilotExecutionEntry` includes a `permissions: CopilotPermissions | None` field (added in PR `#12773` / commit a0184c87b9). `enqueue_copilot_turn` accepts and serializes this field into the queue entry, `_enqueue_for_recovery` in `autopilot.py` accepts and forwards `permissions` to `enqueue_copilot_turn`, and `_execute_async` in `processor.py` restores `entry.permissions` and passes it into `stream_chat_completion_sdk`/`stream_chat_completion_baseline` via `set_execution_context`. This ensures recovered sub-agent turns respect the same tool/block permission ceiling as the original in-process execution (mirroring `_merge_inherited_permissions`). Do NOT flag recovered turns as losing their permission ceiling — it is now fully propagated through the queue.
Learnt from: majdyz
Repo: Significant-Gravitas/AutoGPT PR: 12879
File: autogpt_platform/backend/backend/copilot/baseline/service.py:0-0
Timestamp: 2026-04-22T05:57:34.861Z
Learning: In `autogpt_platform/backend/backend/copilot/baseline/service.py`, the approved pattern for `_run_task_subagent` (PR `#12879`, commit 187f0a5) uses a nested `try/except Exception` inside an outer `try/finally`. The outer `finally` block resets `_TASK_DEPTH_VAR` (via `_TASK_DEPTH_VAR.reset(token)`) AND calls `_absorb_inner_usage(parent_state, inner_state)` unconditionally, so both the depth ContextVar and usage roll-up are guaranteed on all exit paths including `CancelledError`/`KeyboardInterrupt`/`SystemExit`. The inner `except Exception` catches and converts failures into a `TaskResponse` error payload that is returned as `StreamToolOutputAvailable`. Do NOT flag missing ContextVar reset or usage roll-up on BaseException paths in this function.
Learnt from: majdyz
Repo: Significant-Gravitas/AutoGPT PR: 0
File: :0-0
Timestamp: 2026-04-23T00:07:27.117Z
Learning: In `autogpt_platform/backend/backend/copilot/sdk/service.py`, background tasks that persist cost or emit Langfuse backfill (e.g. the cost-reconcile task) must be anchored to `_background_tasks` using `_background_tasks.add(task)` and `task.add_done_callback(_background_tasks.discard)`, mirroring the existing pattern at lines 3063 / 4232 / 4256. This prevents the asyncio task from being garbage-collected before persistence or Langfuse emission completes. Do NOT flag the absence of this anchoring as acceptable in this file. Established in PR `#12889` commit 5ce3d0388.
Learnt from: majdyz
Repo: Significant-Gravitas/AutoGPT PR: 12881
File: autogpt_platform/backend/backend/copilot/sdk/service.py:0-0
Timestamp: 2026-04-22T12:26:42.571Z
Learning: In `autogpt_platform/backend/backend/copilot/sdk/service.py`, `_resolve_sdk_model_for_request`: when a per-user LaunchDarkly model value fails `_normalize_model_name` (e.g. a `moonshotai/kimi-*` slug in direct-Anthropic mode), the fallback must be tier-specific — `config.thinking_advanced_model` for advanced tier, `config.thinking_standard_model` for standard tier — NOT the generic `_resolve_sdk_model()` (which is standard-only and returns None under subscription mode). If the tier-specific config default also fails `_normalize_model_name`, re-raise the original LD error; this is a deployment-level misconfiguration that `model_validator` should have caught at startup. Established in PR `#12881` commit 637d2fef5.
Learnt from: majdyz
Repo: Significant-Gravitas/AutoGPT PR: 0
File: :0-0
Timestamp: 2026-04-23T01:26:38.257Z
Learning: In `autogpt_platform/backend/backend/copilot/sdk/service.py`, `langfuse_trace_id = get_client().get_current_trace_id()` must be captured under the `if _lf_span is not None:` guard (before `_lf_span` is torn down), NOT under `if _otel_ctx is not None:`. The `_otel_ctx` guard is too narrow: if `propagate_attributes().__enter__()` raises, `_otel_ctx` is never assigned, and placing the trace-id capture there would silently orphan the `openrouter-cost-reconcile` event from its parent span. Established in PR `#12889` commit d243bf6c9.
Learnt from: majdyz
Repo: Significant-Gravitas/AutoGPT PR: 0
File: :0-0
Timestamp: 2026-04-21T11:41:05.877Z
Learning: In `autogpt_platform/backend/backend/copilot/baseline/service.py` (PR `#12870`, commits 080d42b9d and 3d7b38162), the `_close_reasoning_block_if_open(state)` helper centralises all four reasoning-block-close call sites (text branch, tool_calls branch, stream-end, exception path). The outer `finally` block of `_baseline_llm_caller` calls this helper plus stripper flush + `StreamTextEnd` to guarantee matched end events are emitted before `StreamFinishStep` on both normal and exception paths. Do NOT flag duplicated close logic or missing reasoning-end-on-exception as issues in this function.
Learnt from: majdyz
Repo: Significant-Gravitas/AutoGPT PR: 12796
File: autogpt_platform/backend/backend/api/features/chat/routes.py:504-527
Timestamp: 2026-04-16T12:33:44.990Z
Learning: In `autogpt_platform/backend/backend/api/features/chat/routes.py`, `get_session` (PR `#12796`, commit 3771bfad9c1) closes the TOCTOU race between the initial `stream_registry.get_active_session()` pre-check and `get_chat_messages_paginated()` with a post-check re-verification: after the DB fetch, if `is_initial_load and active_session is not None`, it calls `get_active_session` a second time; if `post_active is None` (stream completed during the window), it resets `from_start=True`, `forward_paginated=True`, and re-fetches messages from sequence 0. Do NOT flag the double `get_active_session` call pattern as redundant — it is the intentional TOCTOU mitigation for pagination direction selection.
📚 Learning: 2026-04-15T13:44:34.273Z
Learnt from: majdyz
Repo: Significant-Gravitas/AutoGPT PR: 12797
File: autogpt_platform/backend/backend/copilot/sdk/service.py:1991-2021
Timestamp: 2026-04-15T13:44:34.273Z
Learning: In `autogpt_platform/backend/backend/copilot/sdk/service.py` (`_run_stream_attempt`), the pre-create block (PR `#12797`) intentionally does NOT call `state.transcript_builder.append_assistant(...)` when inserting the empty assistant placeholder into `ctx.session.messages`. The transcript is left ending at the `tool_result` entry (N entries) while `message_count` metadata is N+1. This mismatch is benign and deliberate: on the next `--resume`, the SDK sees the transcript ending at `tool_result` and correctly regenerates the assistant response. Pre-appending the assistant turn to the transcript would suppress regeneration while leaving `session.messages[-1].content = ""` permanently (worse outcome). On the gap-fallback path, `transcript_msg_count (N+1) >= msg_count-1 (N)` means no gap is injected for the empty placeholder, which is correct because injecting an empty assistant message as context would mislead the SDK. Do NOT flag this transcript/message_count discrepancy as a bug.

Applied to files:

  • autogpt_platform/backend/backend/copilot/sdk/service.py
📚 Learning: 2026-03-17T06:48:26.471Z
Learnt from: majdyz
Repo: Significant-Gravitas/AutoGPT PR: 12445
File: autogpt_platform/backend/backend/copilot/sdk/service.py:1071-1072
Timestamp: 2026-03-17T06:48:26.471Z
Learning: In Significant-Gravitas/AutoGPT (autogpt_platform), the AI SDK enforces `z.strictObject({type, errorText})` on SSE `StreamError` responses, so additional fields like `retryable: bool` cannot be added to `StreamError` or serialized via `to_sse()`. Instead, retry signaling for transient Anthropic API errors is done via the `COPILOT_RETRYABLE_ERROR_PREFIX` constant prepended to persisted session messages (in `ChatMessage.content`). The frontend detects retryable errors by checking `markerType === "retryable_error"` from `parseSpecialMarkers()` — no SSE schema changes and no string matching on error text. This pattern was established in PR `#12445`, commit 64d82797b.

Applied to files:

  • autogpt_platform/backend/backend/copilot/sdk/service.py
📚 Learning: 2026-04-22T05:57:34.861Z
Learnt from: majdyz
Repo: Significant-Gravitas/AutoGPT PR: 12879
File: autogpt_platform/backend/backend/copilot/baseline/service.py:0-0
Timestamp: 2026-04-22T05:57:34.861Z
Learning: In `autogpt_platform/backend/backend/copilot/baseline/service.py`, the approved pattern for `_run_task_subagent` (PR `#12879`, commit 187f0a5) uses a nested `try/except Exception` inside an outer `try/finally`. The outer `finally` block resets `_TASK_DEPTH_VAR` (via `_TASK_DEPTH_VAR.reset(token)`) AND calls `_absorb_inner_usage(parent_state, inner_state)` unconditionally, so both the depth ContextVar and usage roll-up are guaranteed on all exit paths including `CancelledError`/`KeyboardInterrupt`/`SystemExit`. The inner `except Exception` catches and converts failures into a `TaskResponse` error payload that is returned as `StreamToolOutputAvailable`. Do NOT flag missing ContextVar reset or usage roll-up on BaseException paths in this function.

Applied to files:

  • autogpt_platform/backend/backend/copilot/sdk/service.py
📚 Learning: 2026-04-03T11:14:45.569Z
Learnt from: majdyz
Repo: Significant-Gravitas/AutoGPT PR: 12623
File: autogpt_platform/backend/backend/copilot/baseline/service.py:0-0
Timestamp: 2026-04-03T11:14:45.569Z
Learning: In `autogpt_platform/backend/backend/copilot/baseline/service.py`, `transcript_builder.append_user(content=message)` is called unconditionally even when the message is a duplicate that was suppressed by the `is_new_message` guard. This is intentional: the downloaded transcript may be stale (uploaded before the previous attempt persisted the message), so always appending the current user turn prevents a malformed assistant-after-assistant transcript structure. The `is_user_message` flag is still checked (`if message and is_user_message:`), so assistant-role inputs are excluded. Do NOT flag this as a bug.

Applied to files:

  • autogpt_platform/backend/backend/copilot/sdk/service.py
📚 Learning: 2026-04-14T07:35:11.464Z
Learnt from: majdyz
Repo: Significant-Gravitas/AutoGPT PR: 12773
File: autogpt_platform/backend/backend/blocks/autopilot.py:631-638
Timestamp: 2026-04-14T07:35:11.464Z
Learning: In `autogpt_platform/backend/backend/copilot/executor/utils.py`, `CoPilotExecutionEntry` includes a `permissions: CopilotPermissions | None` field (added in PR `#12773` / commit a0184c87b9). `enqueue_copilot_turn` accepts and serializes this field into the queue entry, `_enqueue_for_recovery` in `autopilot.py` accepts and forwards `permissions` to `enqueue_copilot_turn`, and `_execute_async` in `processor.py` restores `entry.permissions` and passes it into `stream_chat_completion_sdk`/`stream_chat_completion_baseline` via `set_execution_context`. This ensures recovered sub-agent turns respect the same tool/block permission ceiling as the original in-process execution (mirroring `_merge_inherited_permissions`). Do NOT flag recovered turns as losing their permission ceiling — it is now fully propagated through the queue.

Applied to files:

  • autogpt_platform/backend/backend/copilot/sdk/service.py
📚 Learning: 2026-04-14T14:36:25.545Z
Learnt from: majdyz
Repo: Significant-Gravitas/AutoGPT PR: 12773
File: autogpt_platform/backend/backend/copilot/pending_messages.py:52-64
Timestamp: 2026-04-14T14:36:25.545Z
Learning: In `autogpt_platform/backend/backend/copilot` (PR `#12773`, commit d7bced0c6): when draining pending messages into `session.messages`, each message's text is sanitized via `strip_user_context_tags` before persistence to prevent user-controlled `<user_context>` injection from bypassing the trusted server-side context prefix. Additionally, if `upsert_chat_session` fails after draining, the drained `PendingMessage` objects are requeued back to Redis to avoid silent message loss. Do NOT flag the drain-then-requeue pattern as redundant — it is the intentional failure-resilience strategy for the pending buffer.

Applied to files:

  • autogpt_platform/backend/backend/copilot/sdk/service.py
📚 Learning: 2026-03-26T07:00:03.405Z
Learnt from: majdyz
Repo: Significant-Gravitas/AutoGPT PR: 12574
File: autogpt_platform/backend/backend/copilot/sdk/transcript.py:980-990
Timestamp: 2026-03-26T07:00:03.405Z
Learning: In `autogpt_platform/backend/backend/copilot/sdk/transcript.py`, `_rechain_tail` intentionally rewrites `parentUuid` for **all** tail entries (not just the first), because a single assistant turn can span multiple consecutive JSONL entries sharing the same `message.id` (e.g., a thinking entry + a tool_use entry). Their original `parentUuid` values may reference entries that were absorbed into the compressed prefix, so sequential rechaining of the entire tail is required to maintain a valid parent→child graph. The test `test_chains_multiple_tail_entries` validates this: the second tail entry's `parentUuid` is rewritten from its original value to the uuid of the first tail entry.

Applied to files:

  • autogpt_platform/backend/backend/copilot/sdk/service.py
📚 Learning: 2026-04-22T12:26:42.571Z
Learnt from: majdyz
Repo: Significant-Gravitas/AutoGPT PR: 12881
File: autogpt_platform/backend/backend/copilot/sdk/service.py:0-0
Timestamp: 2026-04-22T12:26:42.571Z
Learning: In `autogpt_platform/backend/backend/copilot/sdk/service.py`, `_resolve_sdk_model_for_request`: when a per-user LaunchDarkly model value fails `_normalize_model_name` (e.g. a `moonshotai/kimi-*` slug in direct-Anthropic mode), the fallback must be tier-specific — `config.thinking_advanced_model` for advanced tier, `config.thinking_standard_model` for standard tier — NOT the generic `_resolve_sdk_model()` (which is standard-only and returns None under subscription mode). If the tier-specific config default also fails `_normalize_model_name`, re-raise the original LD error; this is a deployment-level misconfiguration that `model_validator` should have caught at startup. Established in PR `#12881` commit 637d2fef5.

Applied to files:

  • autogpt_platform/backend/backend/copilot/sdk/service.py
📚 Learning: 2026-04-21T11:41:05.877Z
Learnt from: majdyz
Repo: Significant-Gravitas/AutoGPT PR: 0
File: :0-0
Timestamp: 2026-04-21T11:41:05.877Z
Learning: In `autogpt_platform/backend/backend/copilot/baseline/service.py` (PR `#12870`, commits 080d42b9d and 3d7b38162), the `_close_reasoning_block_if_open(state)` helper centralises all four reasoning-block-close call sites (text branch, tool_calls branch, stream-end, exception path). The outer `finally` block of `_baseline_llm_caller` calls this helper plus stripper flush + `StreamTextEnd` to guarantee matched end events are emitted before `StreamFinishStep` on both normal and exception paths. Do NOT flag duplicated close logic or missing reasoning-end-on-exception as issues in this function.

Applied to files:

  • autogpt_platform/backend/backend/copilot/sdk/service.py
📚 Learning: 2026-04-21T17:31:23.683Z
Learnt from: majdyz
Repo: Significant-Gravitas/AutoGPT PR: 12873
File: autogpt_platform/backend/backend/copilot/baseline/reasoning.py:0-0
Timestamp: 2026-04-21T17:31:23.683Z
Learning: In `autogpt_platform/backend/backend/copilot/baseline/reasoning.py` (`BaselineReasoningEmitter`), when `render_in_ui=False`, BOTH the `StreamReasoning*` wire events AND the `ChatMessage(role="reasoning")` persistence append must be suppressed together. `convertChatSessionToUiMessages.ts` unconditionally re-renders all persisted `role="reasoning"` rows as `{type:"reasoning"}` UI parts on reload, so persisting rows while silencing live wire events would resurrect the reasoning collapse on page refresh. The audit trail is preserved through the provider transcript and `_format_sdk_content_blocks` (SDK path) instead. The baseline and SDK paths mirror each other: flag off → no live wire event, no persisted row, no hydrated collapse. This was established in PR `#12873`, commit 7ef10b26c.

Applied to files:

  • autogpt_platform/backend/backend/copilot/sdk/service.py
📚 Learning: 2026-04-16T12:33:44.990Z
Learnt from: majdyz
Repo: Significant-Gravitas/AutoGPT PR: 12796
File: autogpt_platform/backend/backend/api/features/chat/routes.py:504-527
Timestamp: 2026-04-16T12:33:44.990Z
Learning: In `autogpt_platform/backend/backend/api/features/chat/routes.py`, `get_session` (PR `#12796`, commit 3771bfad9c1) closes the TOCTOU race between the initial `stream_registry.get_active_session()` pre-check and `get_chat_messages_paginated()` with a post-check re-verification: after the DB fetch, if `is_initial_load and active_session is not None`, it calls `get_active_session` a second time; if `post_active is None` (stream completed during the window), it resets `from_start=True`, `forward_paginated=True`, and re-fetches messages from sequence 0. Do NOT flag the double `get_active_session` call pattern as redundant — it is the intentional TOCTOU mitigation for pagination direction selection.

Applied to files:

  • autogpt_platform/backend/backend/copilot/sdk/service.py
📚 Learning: 2026-04-23T01:26:38.257Z
Learnt from: majdyz
Repo: Significant-Gravitas/AutoGPT PR: 0
File: :0-0
Timestamp: 2026-04-23T01:26:38.257Z
Learning: In `autogpt_platform/backend/backend/copilot/sdk/service.py`, `langfuse_trace_id = get_client().get_current_trace_id()` must be captured under the `if _lf_span is not None:` guard (before `_lf_span` is torn down), NOT under `if _otel_ctx is not None:`. The `_otel_ctx` guard is too narrow: if `propagate_attributes().__enter__()` raises, `_otel_ctx` is never assigned, and placing the trace-id capture there would silently orphan the `openrouter-cost-reconcile` event from its parent span. Established in PR `#12889` commit d243bf6c9.

Applied to files:

  • autogpt_platform/backend/backend/copilot/sdk/service.py
📚 Learning: 2026-02-26T17:02:22.448Z
Learnt from: Pwuts
Repo: Significant-Gravitas/AutoGPT PR: 12211
File: .pre-commit-config.yaml:160-179
Timestamp: 2026-02-26T17:02:22.448Z
Learning: Keep the pre-commit hook pattern broad for autogpt_platform/backend to ensure OpenAPI schema changes are captured. Do not narrow to backend/api/ alone, since the generated schema depends on Pydantic models across multiple directories (backend/data/, backend/blocks/, backend/copilot/, backend/integrations/, backend/util/). Narrowing could miss schema changes and cause frontend type desynchronization.

Applied to files:

  • autogpt_platform/backend/backend/copilot/sdk/service.py
📚 Learning: 2026-03-04T08:04:35.881Z
Learnt from: majdyz
Repo: Significant-Gravitas/AutoGPT PR: 12273
File: autogpt_platform/backend/backend/copilot/tools/workspace_files.py:216-220
Timestamp: 2026-03-04T08:04:35.881Z
Learning: In the AutoGPT Copilot backend, ensure that SVG images are not treated as vision image types by excluding 'image/svg+xml' from INLINEABLE_MIME_TYPES and MULTIMODAL_TYPES in tool_adapter.py; the Claude API supports PNG, JPEG, GIF, and WebP for vision. SVGs (XML text) should be handled via the text path instead, not the vision path.

Applied to files:

  • autogpt_platform/backend/backend/copilot/sdk/service.py
📚 Learning: 2026-04-01T04:17:41.600Z
Learnt from: majdyz
Repo: Significant-Gravitas/AutoGPT PR: 12632
File: autogpt_platform/backend/backend/copilot/tools/workspace_files.py:0-0
Timestamp: 2026-04-01T04:17:41.600Z
Learning: When reviewing AutoGPT Copilot tool implementations, accept that `readOnlyHint=True` (provided via `ToolAnnotations`) may be applied unconditionally to *all* tools—even tools that have side effects (e.g., `bash_exec`, `write_workspace_file`, or other write/save operations). Do **not** flag these tools for having `readOnlyHint=True`; this is intentional to enable fully-parallel dispatch by the Anthropic SDK/CLI and has been E2E validated. Only flag `readOnlyHint` issues if they conflict with the established `ToolAnnotations` behavior (e.g., missing/incorrect propagation relative to the intended annotation mechanism).

Applied to files:

  • autogpt_platform/backend/backend/copilot/sdk/service.py
📚 Learning: 2026-03-05T15:42:08.207Z
Learnt from: ntindle
Repo: Significant-Gravitas/AutoGPT PR: 12297
File: .claude/skills/backend-check/SKILL.md:14-16
Timestamp: 2026-03-05T15:42:08.207Z
Learning: In Python files under autogpt_platform/backend (recursively), rely on poetry run format to perform formatting (Black + isort) and linting (ruff). Do not run poetry run lint as a separate step after poetry run format, since format already includes linting checks.

Applied to files:

  • autogpt_platform/backend/backend/copilot/sdk/service.py
📚 Learning: 2026-03-16T16:35:40.236Z
Learnt from: majdyz
Repo: Significant-Gravitas/AutoGPT PR: 12440
File: autogpt_platform/backend/backend/api/features/workflow_import.py:54-63
Timestamp: 2026-03-16T16:35:40.236Z
Learning: Avoid using the word 'competitor' in public-facing identifiers and text. Use neutral naming for API paths, model names, function names, and UI text. Examples: rename 'CompetitorFormat' to 'SourcePlatform', 'convert_competitor_workflow' to 'convert_workflow', '/competitor-workflow' to '/workflow'. Apply this guideline to files under autogpt_platform/backend and autogpt_platform/frontend.

Applied to files:

  • autogpt_platform/backend/backend/copilot/sdk/service.py
📚 Learning: 2026-03-31T15:37:38.626Z
Learnt from: majdyz
Repo: Significant-Gravitas/AutoGPT PR: 12623
File: autogpt_platform/backend/backend/copilot/tools/agent_generator/fixer.py:37-47
Timestamp: 2026-03-31T15:37:38.626Z
Learning: When validating/constructing Anthropic API model IDs in Significant-Gravitas/AutoGPT, allow the hyphen-separated Claude Opus 4.6 model ID `claude-opus-4-6` (it corresponds to `LlmModel.CLAUDE_4_6_OPUS` in `autogpt_platform/backend/backend/blocks/llm.py`). Do NOT require the dot-separated form in Anthropic contexts. Only OpenRouter routing variants should use the dot separator (e.g., `anthropic/claude-opus-4.6`); `claude-opus-4-6` should be treated as correct when passed to Anthropic, and flagged only if it’s used in the OpenRouter path where the dot form is expected.

Applied to files:

  • autogpt_platform/backend/backend/copilot/sdk/service.py
📚 Learning: 2026-04-15T02:43:36.890Z
Learnt from: ntindle
Repo: Significant-Gravitas/AutoGPT PR: 12780
File: autogpt_platform/backend/backend/copilot/tools/workspace_files.py:0-0
Timestamp: 2026-04-15T02:43:36.890Z
Learning: When reviewing Python exception handlers, do not flag `isinstance(e, X)` checks as dead/unreachable if the caught exception `X` is a subclass of the exception type being handled. For example, if `X` (e.g., `VirusScanError`) inherits from `ValueError` (directly or via an intermediate class) and it can be raised within an `except ValueError:` block, then `isinstance(e, X)` inside that handler is reachable and should not be treated as dead code.

Applied to files:

  • autogpt_platform/backend/backend/copilot/sdk/service.py
📚 Learning: 2026-04-22T11:46:04.431Z
Learnt from: majdyz
Repo: Significant-Gravitas/AutoGPT PR: 12881
File: autogpt_platform/backend/backend/copilot/config.py:0-0
Timestamp: 2026-04-22T11:46:04.431Z
Learning: Do not flag the Claude Sonnet 4.6 model ID as incorrect when it uses the project’s established hyphenated convention: `anthropic/claude-sonnet-4-6`. This hyphen form is the intentional, production convention and should be treated as valid (including in files like llm.py, blocks tests, reasoning.py, `_is_anthropic_model` tests, and config defaults). Note that OpenRouter also accepts the dot variant `anthropic/claude-sonnet-4.6`, so either form may be tolerated, but `anthropic/claude-sonnet-4-6` should be considered the standard to match project usage.

Applied to files:

  • autogpt_platform/backend/backend/copilot/sdk/service.py
📚 Learning: 2026-04-22T11:46:12.892Z
Learnt from: majdyz
Repo: Significant-Gravitas/AutoGPT PR: 12881
File: autogpt_platform/backend/backend/copilot/baseline/service.py:322-332
Timestamp: 2026-04-22T11:46:12.892Z
Learning: In this codebase (Significant-Gravitas/AutoGPT), OpenRouter-routed Anthropic model IDs should use the hyphen-separated convention (e.g., `anthropic/claude-sonnet-4-6`, `anthropic/claude-opus-4-6`). Although OpenRouter may accept both hyphen and dot variants, treat the hyphen-separated form as the intended, correct codebase-wide convention and do not flag it as an error. Only flag the dot-separated variant (e.g., `anthropic/claude-sonnet-4.6`) as incorrect when reviewing/validating model ID strings for OpenRouter-routed Anthropic models.

Applied to files:

  • autogpt_platform/backend/backend/copilot/sdk/service.py

…ial + carry retryable through _HandledStreamError

Two fixes layered on the partial-restore path introduced by this PR:

1. _rollback_attempt_capturing_partial now drops trailing error markers
   (COPILOT_ERROR_PREFIX / COPILOT_RETRYABLE_ERROR_PREFIX) from the
   captured partial. _run_stream_attempt's idle-timeout and
   circuit-breaker paths append a marker via _append_error_marker BEFORE
   raising _HandledStreamError; without this filter the post-loop
   restore would replay the stale marker and then add a fresh one,
   leaving duplicate error bubbles and pushing any synthetic tool_result
   after an assistant(error) turn that has no matching tool_use.

2. Replace the (msg, code, already_yielded) 3-tuple carrying
   _HandledStreamError state out of the retry loop with a frozen
   _HandledErrorInfo dataclass that also carries `retryable`. The
   post-loop block now uses exc.retryable instead of hardcoding True,
   so a future _HandledStreamError(retryable=False, ...) won't silently
   write the wrong marker prefix.

3 new tests cover the rollback marker-stripping contract.
@github-actions github-actions Bot added size/xl and removed size/l labels Apr 25, 2026
Copy link
Copy Markdown
Contributor

@coderabbitai coderabbitai Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 1

🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@autogpt_platform/backend/backend/copilot/sdk/service.py`:
- Around line 560-577: The early flush currently calls the private adapter
method _flush_unresolved_tool_calls(safety) which mutates resolved_tool_calls
and clears has_unresolved_tool_calls, preventing the later error-cleanup flush
from running and it also suppresses lint with # noqa: SLF001; fix this by
exposing a public adapter API (e.g., flush_unresolved_tool_calls or
flush_unresolved_tool_calls_returning_events) that returns the synthesized
StreamBaseResponse list without flipping has_unresolved_tool_calls (or otherwise
returns the events and leaves state mutation to the caller), update
_flush_orphan_tool_uses_to_session to call the new public method and
capture/return the events for reuse by the later cleanup block, and remove the #
noqa suppressor so the public method is used instead of calling the private
_flush_unresolved_tool_calls.
🪄 Autofix (Beta)

Fix all unresolved CodeRabbit comments on this PR:

  • Push a commit to this branch (recommended)
  • Create a new PR with the fixes

ℹ️ Review info
⚙️ Run configuration

Configuration used: Organization UI

Review profile: CHILL

Plan: Pro

Run ID: 0ae1a767-1368-4708-af11-ffcb2e522d50

📥 Commits

Reviewing files that changed from the base of the PR and between b1172e2 and 5406fe9.

📒 Files selected for processing (2)
  • autogpt_platform/backend/backend/copilot/sdk/interrupted_partial_test.py
  • autogpt_platform/backend/backend/copilot/sdk/service.py
🚧 Files skipped from review as they are similar to previous changes (1)
  • autogpt_platform/backend/backend/copilot/sdk/interrupted_partial_test.py
📜 Review details
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (11)
  • GitHub Check: check API types
  • GitHub Check: Seer Code Review
  • GitHub Check: Analyze (python)
  • GitHub Check: Analyze (typescript)
  • GitHub Check: type-check (3.13)
  • GitHub Check: test (3.13)
  • GitHub Check: test (3.11)
  • GitHub Check: type-check (3.12)
  • GitHub Check: test (3.12)
  • GitHub Check: end-to-end tests
  • GitHub Check: Check PR Status
🧰 Additional context used
📓 Path-based instructions (2)
autogpt_platform/backend/**/*.py

📄 CodeRabbit inference engine (.github/copilot-instructions.md)

autogpt_platform/backend/**/*.py: Use Python 3.11 (required; managed by Poetry via pyproject.toml) for backend development
Always run 'poetry run format' (Black + isort) before linting in backend development
Always run 'poetry run lint' (ruff) after formatting in backend development

autogpt_platform/backend/**/*.py: Use poetry run ... command for executing Python package dependencies
Use top-level imports only — avoid local/inner imports except for lazy imports of heavy optional dependencies like openpyxl
Use absolute imports with from backend.module import ... for cross-package imports; single-dot relative imports are acceptable for sibling modules within the same package; avoid double-dot relative imports
Do not use duck typing — avoid hasattr/getattr/isinstance for type dispatch; use typed interfaces/unions/protocols instead
Use Pydantic models over dataclass/namedtuple/dict for structured data
Do not use linter suppressors — no # type: ignore, # noqa, # pyright: ignore; fix the type/code instead
Prefer list comprehensions over manual loop-and-append patterns
Use early return with guard clauses first to avoid deep nesting
Use %s for deferred interpolation in debug log statements for efficiency; use f-strings elsewhere for readability (e.g., logger.debug("Processing %s items", count) vs logger.info(f"Processing {count} items"))
Sanitize error paths by using os.path.basename() in error messages to avoid leaking directory structure
Be aware of TOCTOU (Time-Of-Check-Time-Of-Use) issues — avoid check-then-act patterns for file access and credit charging
Use transaction=True for Redis pipelines to ensure atomicity on multi-step operations
Use max(0, value) guards for computed values that should never be negative
Keep files under ~300 lines; if a file grows beyond this, split by responsibility (extract helpers, models, or a sub-module into a new file)
Keep functions under ~40 lines; extract named helpers when a function grows longer
...

Files:

  • autogpt_platform/backend/backend/copilot/sdk/service.py
autogpt_platform/{backend,autogpt_libs}/**/*.py

📄 CodeRabbit inference engine (AGENTS.md)

Format Python code with poetry run format

Files:

  • autogpt_platform/backend/backend/copilot/sdk/service.py
🧠 Learnings (38)
📓 Common learnings
Learnt from: majdyz
Repo: Significant-Gravitas/AutoGPT PR: 12797
File: autogpt_platform/backend/backend/copilot/sdk/service.py:1991-2021
Timestamp: 2026-04-15T13:44:34.273Z
Learning: In `autogpt_platform/backend/backend/copilot/sdk/service.py` (`_run_stream_attempt`), the pre-create block (PR `#12797`) intentionally does NOT call `state.transcript_builder.append_assistant(...)` when inserting the empty assistant placeholder into `ctx.session.messages`. The transcript is left ending at the `tool_result` entry (N entries) while `message_count` metadata is N+1. This mismatch is benign and deliberate: on the next `--resume`, the SDK sees the transcript ending at `tool_result` and correctly regenerates the assistant response. Pre-appending the assistant turn to the transcript would suppress regeneration while leaving `session.messages[-1].content = ""` permanently (worse outcome). On the gap-fallback path, `transcript_msg_count (N+1) >= msg_count-1 (N)` means no gap is injected for the empty placeholder, which is correct because injecting an empty assistant message as context would mislead the SDK. Do NOT flag this transcript/message_count discrepancy as a bug.
Learnt from: majdyz
Repo: Significant-Gravitas/AutoGPT PR: 12773
File: autogpt_platform/backend/backend/copilot/pending_messages.py:52-64
Timestamp: 2026-04-14T14:36:25.545Z
Learning: In `autogpt_platform/backend/backend/copilot` (PR `#12773`, commit d7bced0c6): when draining pending messages into `session.messages`, each message's text is sanitized via `strip_user_context_tags` before persistence to prevent user-controlled `<user_context>` injection from bypassing the trusted server-side context prefix. Additionally, if `upsert_chat_session` fails after draining, the drained `PendingMessage` objects are requeued back to Redis to avoid silent message loss. Do NOT flag the drain-then-requeue pattern as redundant — it is the intentional failure-resilience strategy for the pending buffer.
Learnt from: majdyz
Repo: Significant-Gravitas/AutoGPT PR: 12773
File: autogpt_platform/backend/backend/blocks/autopilot.py:631-638
Timestamp: 2026-04-14T07:35:11.464Z
Learning: In `autogpt_platform/backend/backend/copilot/executor/utils.py`, `CoPilotExecutionEntry` includes a `permissions: CopilotPermissions | None` field (added in PR `#12773` / commit a0184c87b9). `enqueue_copilot_turn` accepts and serializes this field into the queue entry, `_enqueue_for_recovery` in `autopilot.py` accepts and forwards `permissions` to `enqueue_copilot_turn`, and `_execute_async` in `processor.py` restores `entry.permissions` and passes it into `stream_chat_completion_sdk`/`stream_chat_completion_baseline` via `set_execution_context`. This ensures recovered sub-agent turns respect the same tool/block permission ceiling as the original in-process execution (mirroring `_merge_inherited_permissions`). Do NOT flag recovered turns as losing their permission ceiling — it is now fully propagated through the queue.
Learnt from: majdyz
Repo: Significant-Gravitas/AutoGPT PR: 12445
File: autogpt_platform/backend/backend/copilot/sdk/service.py:1071-1072
Timestamp: 2026-03-17T06:48:26.471Z
Learning: In Significant-Gravitas/AutoGPT (autogpt_platform), the AI SDK enforces `z.strictObject({type, errorText})` on SSE `StreamError` responses, so additional fields like `retryable: bool` cannot be added to `StreamError` or serialized via `to_sse()`. Instead, retry signaling for transient Anthropic API errors is done via the `COPILOT_RETRYABLE_ERROR_PREFIX` constant prepended to persisted session messages (in `ChatMessage.content`). The frontend detects retryable errors by checking `markerType === "retryable_error"` from `parseSpecialMarkers()` — no SSE schema changes and no string matching on error text. This pattern was established in PR `#12445`, commit 64d82797b.
Learnt from: majdyz
Repo: Significant-Gravitas/AutoGPT PR: 12879
File: autogpt_platform/backend/backend/copilot/baseline/service.py:0-0
Timestamp: 2026-04-22T05:57:34.861Z
Learning: In `autogpt_platform/backend/backend/copilot/baseline/service.py`, the approved pattern for `_run_task_subagent` (PR `#12879`, commit 187f0a5) uses a nested `try/except Exception` inside an outer `try/finally`. The outer `finally` block resets `_TASK_DEPTH_VAR` (via `_TASK_DEPTH_VAR.reset(token)`) AND calls `_absorb_inner_usage(parent_state, inner_state)` unconditionally, so both the depth ContextVar and usage roll-up are guaranteed on all exit paths including `CancelledError`/`KeyboardInterrupt`/`SystemExit`. The inner `except Exception` catches and converts failures into a `TaskResponse` error payload that is returned as `StreamToolOutputAvailable`. Do NOT flag missing ContextVar reset or usage roll-up on BaseException paths in this function.
Learnt from: majdyz
Repo: Significant-Gravitas/AutoGPT PR: 0
File: :0-0
Timestamp: 2026-04-23T01:26:38.257Z
Learning: In `autogpt_platform/backend/backend/copilot/sdk/service.py`, `langfuse_trace_id = get_client().get_current_trace_id()` must be captured under the `if _lf_span is not None:` guard (before `_lf_span` is torn down), NOT under `if _otel_ctx is not None:`. The `_otel_ctx` guard is too narrow: if `propagate_attributes().__enter__()` raises, `_otel_ctx` is never assigned, and placing the trace-id capture there would silently orphan the `openrouter-cost-reconcile` event from its parent span. Established in PR `#12889` commit d243bf6c9.
Learnt from: majdyz
Repo: Significant-Gravitas/AutoGPT PR: 12881
File: autogpt_platform/backend/backend/copilot/sdk/service.py:0-0
Timestamp: 2026-04-22T12:26:42.571Z
Learning: In `autogpt_platform/backend/backend/copilot/sdk/service.py`, `_resolve_sdk_model_for_request`: when a per-user LaunchDarkly model value fails `_normalize_model_name` (e.g. a `moonshotai/kimi-*` slug in direct-Anthropic mode), the fallback must be tier-specific — `config.thinking_advanced_model` for advanced tier, `config.thinking_standard_model` for standard tier — NOT the generic `_resolve_sdk_model()` (which is standard-only and returns None under subscription mode). If the tier-specific config default also fails `_normalize_model_name`, re-raise the original LD error; this is a deployment-level misconfiguration that `model_validator` should have caught at startup. Established in PR `#12881` commit 637d2fef5.
Learnt from: majdyz
Repo: Significant-Gravitas/AutoGPT PR: 0
File: :0-0
Timestamp: 2026-04-23T00:07:27.117Z
Learning: In `autogpt_platform/backend/backend/copilot/sdk/service.py`, background tasks that persist cost or emit Langfuse backfill (e.g. the cost-reconcile task) must be anchored to `_background_tasks` using `_background_tasks.add(task)` and `task.add_done_callback(_background_tasks.discard)`, mirroring the existing pattern at lines 3063 / 4232 / 4256. This prevents the asyncio task from being garbage-collected before persistence or Langfuse emission completes. Do NOT flag the absence of this anchoring as acceptable in this file. Established in PR `#12889` commit 5ce3d0388.
Learnt from: majdyz
Repo: Significant-Gravitas/AutoGPT PR: 0
File: :0-0
Timestamp: 2026-04-21T11:41:05.877Z
Learning: In `autogpt_platform/backend/backend/copilot/baseline/service.py` (PR `#12870`, commits 080d42b9d and 3d7b38162), the `_close_reasoning_block_if_open(state)` helper centralises all four reasoning-block-close call sites (text branch, tool_calls branch, stream-end, exception path). The outer `finally` block of `_baseline_llm_caller` calls this helper plus stripper flush + `StreamTextEnd` to guarantee matched end events are emitted before `StreamFinishStep` on both normal and exception paths. Do NOT flag duplicated close logic or missing reasoning-end-on-exception as issues in this function.
Learnt from: majdyz
Repo: Significant-Gravitas/AutoGPT PR: 12796
File: autogpt_platform/backend/backend/api/features/chat/routes.py:504-527
Timestamp: 2026-04-16T12:33:44.990Z
Learning: In `autogpt_platform/backend/backend/api/features/chat/routes.py`, `get_session` (PR `#12796`, commit 3771bfad9c1) closes the TOCTOU race between the initial `stream_registry.get_active_session()` pre-check and `get_chat_messages_paginated()` with a post-check re-verification: after the DB fetch, if `is_initial_load and active_session is not None`, it calls `get_active_session` a second time; if `post_active is None` (stream completed during the window), it resets `from_start=True`, `forward_paginated=True`, and re-fetches messages from sequence 0. Do NOT flag the double `get_active_session` call pattern as redundant — it is the intentional TOCTOU mitigation for pagination direction selection.
📚 Learning: 2026-04-15T13:44:34.273Z
Learnt from: majdyz
Repo: Significant-Gravitas/AutoGPT PR: 12797
File: autogpt_platform/backend/backend/copilot/sdk/service.py:1991-2021
Timestamp: 2026-04-15T13:44:34.273Z
Learning: In `autogpt_platform/backend/backend/copilot/sdk/service.py` (`_run_stream_attempt`), the pre-create block (PR `#12797`) intentionally does NOT call `state.transcript_builder.append_assistant(...)` when inserting the empty assistant placeholder into `ctx.session.messages`. The transcript is left ending at the `tool_result` entry (N entries) while `message_count` metadata is N+1. This mismatch is benign and deliberate: on the next `--resume`, the SDK sees the transcript ending at `tool_result` and correctly regenerates the assistant response. Pre-appending the assistant turn to the transcript would suppress regeneration while leaving `session.messages[-1].content = ""` permanently (worse outcome). On the gap-fallback path, `transcript_msg_count (N+1) >= msg_count-1 (N)` means no gap is injected for the empty placeholder, which is correct because injecting an empty assistant message as context would mislead the SDK. Do NOT flag this transcript/message_count discrepancy as a bug.

Applied to files:

  • autogpt_platform/backend/backend/copilot/sdk/service.py
📚 Learning: 2026-03-17T06:48:26.471Z
Learnt from: majdyz
Repo: Significant-Gravitas/AutoGPT PR: 12445
File: autogpt_platform/backend/backend/copilot/sdk/service.py:1071-1072
Timestamp: 2026-03-17T06:48:26.471Z
Learning: In Significant-Gravitas/AutoGPT (autogpt_platform), the AI SDK enforces `z.strictObject({type, errorText})` on SSE `StreamError` responses, so additional fields like `retryable: bool` cannot be added to `StreamError` or serialized via `to_sse()`. Instead, retry signaling for transient Anthropic API errors is done via the `COPILOT_RETRYABLE_ERROR_PREFIX` constant prepended to persisted session messages (in `ChatMessage.content`). The frontend detects retryable errors by checking `markerType === "retryable_error"` from `parseSpecialMarkers()` — no SSE schema changes and no string matching on error text. This pattern was established in PR `#12445`, commit 64d82797b.

Applied to files:

  • autogpt_platform/backend/backend/copilot/sdk/service.py
📚 Learning: 2026-04-22T05:57:34.861Z
Learnt from: majdyz
Repo: Significant-Gravitas/AutoGPT PR: 12879
File: autogpt_platform/backend/backend/copilot/baseline/service.py:0-0
Timestamp: 2026-04-22T05:57:34.861Z
Learning: In `autogpt_platform/backend/backend/copilot/baseline/service.py`, the approved pattern for `_run_task_subagent` (PR `#12879`, commit 187f0a5) uses a nested `try/except Exception` inside an outer `try/finally`. The outer `finally` block resets `_TASK_DEPTH_VAR` (via `_TASK_DEPTH_VAR.reset(token)`) AND calls `_absorb_inner_usage(parent_state, inner_state)` unconditionally, so both the depth ContextVar and usage roll-up are guaranteed on all exit paths including `CancelledError`/`KeyboardInterrupt`/`SystemExit`. The inner `except Exception` catches and converts failures into a `TaskResponse` error payload that is returned as `StreamToolOutputAvailable`. Do NOT flag missing ContextVar reset or usage roll-up on BaseException paths in this function.

Applied to files:

  • autogpt_platform/backend/backend/copilot/sdk/service.py
📚 Learning: 2026-04-14T07:35:11.464Z
Learnt from: majdyz
Repo: Significant-Gravitas/AutoGPT PR: 12773
File: autogpt_platform/backend/backend/blocks/autopilot.py:631-638
Timestamp: 2026-04-14T07:35:11.464Z
Learning: In `autogpt_platform/backend/backend/copilot/executor/utils.py`, `CoPilotExecutionEntry` includes a `permissions: CopilotPermissions | None` field (added in PR `#12773` / commit a0184c87b9). `enqueue_copilot_turn` accepts and serializes this field into the queue entry, `_enqueue_for_recovery` in `autopilot.py` accepts and forwards `permissions` to `enqueue_copilot_turn`, and `_execute_async` in `processor.py` restores `entry.permissions` and passes it into `stream_chat_completion_sdk`/`stream_chat_completion_baseline` via `set_execution_context`. This ensures recovered sub-agent turns respect the same tool/block permission ceiling as the original in-process execution (mirroring `_merge_inherited_permissions`). Do NOT flag recovered turns as losing their permission ceiling — it is now fully propagated through the queue.

Applied to files:

  • autogpt_platform/backend/backend/copilot/sdk/service.py
📚 Learning: 2026-04-14T14:36:25.545Z
Learnt from: majdyz
Repo: Significant-Gravitas/AutoGPT PR: 12773
File: autogpt_platform/backend/backend/copilot/pending_messages.py:52-64
Timestamp: 2026-04-14T14:36:25.545Z
Learning: In `autogpt_platform/backend/backend/copilot` (PR `#12773`, commit d7bced0c6): when draining pending messages into `session.messages`, each message's text is sanitized via `strip_user_context_tags` before persistence to prevent user-controlled `<user_context>` injection from bypassing the trusted server-side context prefix. Additionally, if `upsert_chat_session` fails after draining, the drained `PendingMessage` objects are requeued back to Redis to avoid silent message loss. Do NOT flag the drain-then-requeue pattern as redundant — it is the intentional failure-resilience strategy for the pending buffer.

Applied to files:

  • autogpt_platform/backend/backend/copilot/sdk/service.py
📚 Learning: 2026-04-03T11:14:45.569Z
Learnt from: majdyz
Repo: Significant-Gravitas/AutoGPT PR: 12623
File: autogpt_platform/backend/backend/copilot/baseline/service.py:0-0
Timestamp: 2026-04-03T11:14:45.569Z
Learning: In `autogpt_platform/backend/backend/copilot/baseline/service.py`, `transcript_builder.append_user(content=message)` is called unconditionally even when the message is a duplicate that was suppressed by the `is_new_message` guard. This is intentional: the downloaded transcript may be stale (uploaded before the previous attempt persisted the message), so always appending the current user turn prevents a malformed assistant-after-assistant transcript structure. The `is_user_message` flag is still checked (`if message and is_user_message:`), so assistant-role inputs are excluded. Do NOT flag this as a bug.

Applied to files:

  • autogpt_platform/backend/backend/copilot/sdk/service.py
📚 Learning: 2026-04-16T13:28:20.824Z
Learnt from: majdyz
Repo: Significant-Gravitas/AutoGPT PR: 12814
File: autogpt_platform/backend/backend/copilot/model.py:661-679
Timestamp: 2026-04-16T13:28:20.824Z
Learning: In `autogpt_platform/backend/backend/copilot/model.py` (PR `#12814`, commit 259d37083): `append_and_save_message` acquires `_get_session_lock` — a redis-py built-in Lock at key `copilot:session_lock:{session_id}` (timeout=10s, blocking_timeout=2s) — to serialize concurrent writers across replicas. On Redis failure the lock is skipped with a warning and the function continues. Inside the lock it re-fetches the session via `get_chat_session` (cache-first), performs an idempotency check (`session.messages[-1].role == message.role and session.messages[-1].content == message.content`), and returns early if matched. On successful DB write but failed cache write, it calls `invalidate_session_cache(session_id)` (the pre-existing best-effort helper) to evict the stale cache entry so subsequent retries fall back to the authoritative DB. Do NOT expect `asyncio.Lock` or a manual NX poll loop (`copilot:msg_append:{session_id}`) — those were removed. Do NOT flag the `invalidate_session_cache` call on ...

Applied to files:

  • autogpt_platform/backend/backend/copilot/sdk/service.py
📚 Learning: 2026-03-17T10:57:12.953Z
Learnt from: majdyz
Repo: Significant-Gravitas/AutoGPT PR: 12440
File: autogpt_platform/backend/backend/copilot/workflow_import/converter.py:0-0
Timestamp: 2026-03-17T10:57:12.953Z
Learning: In Significant-Gravitas/AutoGPT PR `#12440`, `autogpt_platform/backend/backend/copilot/workflow_import/converter.py` was fully rewritten (commit 732960e2d) to no longer make direct LLM/OpenAI API calls. The converter now builds a structured text prompt for AutoPilot/CoPilot instead. There is no `response.choices` access or any direct LLM client usage in this file. Do not flag `response.choices` access or LLM client initialization patterns as issues in this file.

Applied to files:

  • autogpt_platform/backend/backend/copilot/sdk/service.py
📚 Learning: 2026-04-16T13:28:28.641Z
Learnt from: majdyz
Repo: Significant-Gravitas/AutoGPT PR: 12814
File: autogpt_platform/backend/backend/copilot/model.py:0-0
Timestamp: 2026-04-16T13:28:28.641Z
Learning: In `autogpt_platform/backend/backend/copilot/model.py` (PR `#12814`, commit 259d37083): `append_and_save_message` uses `async with _get_session_lock(session_id)` — the same shared context manager used across the module — which internally acquires `redis-py`'s built-in `Lock` (key `copilot:session_lock:{session_id}`, timeout=10s, blocking_timeout=2s) via an atomic Lua-script. Lock release is also owner-verified via Lua so a slow pod can never delete a lock it no longer holds. On Redis failure the lock is skipped with a warning; the in-function idempotency check (`session.messages[-1].role` and `.content` comparison) still runs as a fallback. Do NOT expect a raw `redis.set(nx=True)` / `redis.delete()` pattern here — that intermediate approach was replaced in commit 259d37083.

Applied to files:

  • autogpt_platform/backend/backend/copilot/sdk/service.py
📚 Learning: 2026-04-21T11:41:05.877Z
Learnt from: majdyz
Repo: Significant-Gravitas/AutoGPT PR: 0
File: :0-0
Timestamp: 2026-04-21T11:41:05.877Z
Learning: In `autogpt_platform/backend/backend/copilot/baseline/service.py` (PR `#12870`, commits 080d42b9d and 3d7b38162), the `_close_reasoning_block_if_open(state)` helper centralises all four reasoning-block-close call sites (text branch, tool_calls branch, stream-end, exception path). The outer `finally` block of `_baseline_llm_caller` calls this helper plus stripper flush + `StreamTextEnd` to guarantee matched end events are emitted before `StreamFinishStep` on both normal and exception paths. Do NOT flag duplicated close logic or missing reasoning-end-on-exception as issues in this function.

Applied to files:

  • autogpt_platform/backend/backend/copilot/sdk/service.py
📚 Learning: 2026-04-16T12:33:44.990Z
Learnt from: majdyz
Repo: Significant-Gravitas/AutoGPT PR: 12796
File: autogpt_platform/backend/backend/api/features/chat/routes.py:504-527
Timestamp: 2026-04-16T12:33:44.990Z
Learning: In `autogpt_platform/backend/backend/api/features/chat/routes.py`, `get_session` (PR `#12796`, commit 3771bfad9c1) closes the TOCTOU race between the initial `stream_registry.get_active_session()` pre-check and `get_chat_messages_paginated()` with a post-check re-verification: after the DB fetch, if `is_initial_load and active_session is not None`, it calls `get_active_session` a second time; if `post_active is None` (stream completed during the window), it resets `from_start=True`, `forward_paginated=True`, and re-fetches messages from sequence 0. Do NOT flag the double `get_active_session` call pattern as redundant — it is the intentional TOCTOU mitigation for pagination direction selection.

Applied to files:

  • autogpt_platform/backend/backend/copilot/sdk/service.py
📚 Learning: 2026-04-21T17:31:23.683Z
Learnt from: majdyz
Repo: Significant-Gravitas/AutoGPT PR: 12873
File: autogpt_platform/backend/backend/copilot/baseline/reasoning.py:0-0
Timestamp: 2026-04-21T17:31:23.683Z
Learning: In `autogpt_platform/backend/backend/copilot/baseline/reasoning.py` (`BaselineReasoningEmitter`), when `render_in_ui=False`, BOTH the `StreamReasoning*` wire events AND the `ChatMessage(role="reasoning")` persistence append must be suppressed together. `convertChatSessionToUiMessages.ts` unconditionally re-renders all persisted `role="reasoning"` rows as `{type:"reasoning"}` UI parts on reload, so persisting rows while silencing live wire events would resurrect the reasoning collapse on page refresh. The audit trail is preserved through the provider transcript and `_format_sdk_content_blocks` (SDK path) instead. The baseline and SDK paths mirror each other: flag off → no live wire event, no persisted row, no hydrated collapse. This was established in PR `#12873`, commit 7ef10b26c.

Applied to files:

  • autogpt_platform/backend/backend/copilot/sdk/service.py
📚 Learning: 2026-04-23T01:26:38.257Z
Learnt from: majdyz
Repo: Significant-Gravitas/AutoGPT PR: 0
File: :0-0
Timestamp: 2026-04-23T01:26:38.257Z
Learning: In `autogpt_platform/backend/backend/copilot/sdk/service.py`, `langfuse_trace_id = get_client().get_current_trace_id()` must be captured under the `if _lf_span is not None:` guard (before `_lf_span` is torn down), NOT under `if _otel_ctx is not None:`. The `_otel_ctx` guard is too narrow: if `propagate_attributes().__enter__()` raises, `_otel_ctx` is never assigned, and placing the trace-id capture there would silently orphan the `openrouter-cost-reconcile` event from its parent span. Established in PR `#12889` commit d243bf6c9.

Applied to files:

  • autogpt_platform/backend/backend/copilot/sdk/service.py
📚 Learning: 2026-04-14T06:34:02.835Z
Learnt from: majdyz
Repo: Significant-Gravitas/AutoGPT PR: 12774
File: autogpt_platform/backend/backend/copilot/tools/e2b_sandbox.py:0-0
Timestamp: 2026-04-14T06:34:02.835Z
Learning: In `autogpt_platform/backend/backend/copilot/tools/e2b_sandbox.py`, the `asyncio.wait_for()` retry loop around `AsyncSandbox.create()` (introduced in PR `#12774`) can leak up to `_SANDBOX_CREATE_MAX_RETRIES - 1` (≤2) orphaned E2B sandboxes per hang incident because `wait_for` cancels only the client-side wait while E2B may complete server-side provisioning. With the default `on_timeout="pause"` lifecycle, leaked orphaned sandboxes are **paused** (not killed) when their original `end_at` is reached and persist indefinitely until explicitly killed — there is NO automatic E2B project-level cleanup. Operators must manage these manually or via their own cleanup jobs. The sandbox_id is not accessible from the timed-out coroutine, so recovery via `AsyncSandbox.connect(sandbox_id)` is not possible at timeout. This is an intentionally accepted trade-off; a proper fix is deferred to a follow-up PR. Do NOT flag the retry loop as a blocking issue.

Applied to files:

  • autogpt_platform/backend/backend/copilot/sdk/service.py
📚 Learning: 2026-04-23T00:07:27.117Z
Learnt from: majdyz
Repo: Significant-Gravitas/AutoGPT PR: 0
File: :0-0
Timestamp: 2026-04-23T00:07:27.117Z
Learning: In `autogpt_platform/backend/backend/copilot/sdk/service.py`, background tasks that persist cost or emit Langfuse backfill (e.g. the cost-reconcile task) must be anchored to `_background_tasks` using `_background_tasks.add(task)` and `task.add_done_callback(_background_tasks.discard)`, mirroring the existing pattern at lines 3063 / 4232 / 4256. This prevents the asyncio task from being garbage-collected before persistence or Langfuse emission completes. Do NOT flag the absence of this anchoring as acceptable in this file. Established in PR `#12889` commit 5ce3d0388.

Applied to files:

  • autogpt_platform/backend/backend/copilot/sdk/service.py
📚 Learning: 2026-03-17T07:24:34.302Z
Learnt from: majdyz
Repo: Significant-Gravitas/AutoGPT PR: 12385
File: autogpt_platform/backend/backend/copilot/rate_limit.py:0-0
Timestamp: 2026-03-17T07:24:34.302Z
Learning: In `autogpt_platform/backend/backend/copilot/rate_limit.py`, all fail-open `except` blocks catch `(RedisError, ConnectionError, OSError)` specifically — not bare `except Exception`. This applies to `_session_reset_from_ttl`, `get_usage_status`, `check_rate_limit`, and `record_token_usage`. The narrowed tuple ensures only genuine Redis/network failures are swallowed; unexpected exceptions propagate normally.

Applied to files:

  • autogpt_platform/backend/backend/copilot/sdk/service.py
📚 Learning: 2026-04-08T17:28:23.439Z
Learnt from: CR
Repo: Significant-Gravitas/AutoGPT PR: 0
File: autogpt_platform/backend/AGENTS.md:0-0
Timestamp: 2026-04-08T17:28:23.439Z
Learning: Applies to autogpt_platform/backend/**/*.py : Do not use linter suppressors — no `# type: ignore`, `# noqa`, `# pyright: ignore`; fix the type/code instead

Applied to files:

  • autogpt_platform/backend/backend/copilot/sdk/service.py
📚 Learning: 2026-04-16T13:28:22.385Z
Learnt from: majdyz
Repo: Significant-Gravitas/AutoGPT PR: 12814
File: autogpt_platform/backend/backend/copilot/model.py:0-0
Timestamp: 2026-04-16T13:28:22.385Z
Learning: In `autogpt_platform/backend/backend/copilot/model.py` (PR `#12814`, commit 259d37083): `append_and_save_message` uses `_get_session_lock(session_id)` — a redis-py built-in `Lock` (Lua-script atomic acquire/release) keyed as `copilot:session_lock:{session_id}` with `timeout=10s` (crash-safety TTL) and `blocking_timeout=2s`. There is NO manual NX-poll loop and NO `asyncio.Lock`. On Redis failure, `_get_session_lock` logs a warning and yields without a lock — the in-function idempotency check (compare `session.messages[-1].role` and `.content`) still runs as a fallback. Do NOT expect a manual `SET NX` poll loop or `asyncio.Lock` to wrap `append_and_save_message`.

Applied to files:

  • autogpt_platform/backend/backend/copilot/sdk/service.py
📚 Learning: 2026-04-22T11:46:04.431Z
Learnt from: majdyz
Repo: Significant-Gravitas/AutoGPT PR: 12881
File: autogpt_platform/backend/backend/copilot/config.py:0-0
Timestamp: 2026-04-22T11:46:04.431Z
Learning: Do not flag the Claude Sonnet 4.6 model ID as incorrect when it uses the project’s established hyphenated convention: `anthropic/claude-sonnet-4-6`. This hyphen form is the intentional, production convention and should be treated as valid (including in files like llm.py, blocks tests, reasoning.py, `_is_anthropic_model` tests, and config defaults). Note that OpenRouter also accepts the dot variant `anthropic/claude-sonnet-4.6`, so either form may be tolerated, but `anthropic/claude-sonnet-4-6` should be considered the standard to match project usage.

Applied to files:

  • autogpt_platform/backend/backend/copilot/sdk/service.py
📚 Learning: 2026-03-31T15:37:38.626Z
Learnt from: majdyz
Repo: Significant-Gravitas/AutoGPT PR: 12623
File: autogpt_platform/backend/backend/copilot/tools/agent_generator/fixer.py:37-47
Timestamp: 2026-03-31T15:37:38.626Z
Learning: When validating/constructing Anthropic API model IDs in Significant-Gravitas/AutoGPT, allow the hyphen-separated Claude Opus 4.6 model ID `claude-opus-4-6` (it corresponds to `LlmModel.CLAUDE_4_6_OPUS` in `autogpt_platform/backend/backend/blocks/llm.py`). Do NOT require the dot-separated form in Anthropic contexts. Only OpenRouter routing variants should use the dot separator (e.g., `anthropic/claude-opus-4.6`); `claude-opus-4-6` should be treated as correct when passed to Anthropic, and flagged only if it’s used in the OpenRouter path where the dot form is expected.

Applied to files:

  • autogpt_platform/backend/backend/copilot/sdk/service.py
📚 Learning: 2026-04-22T11:46:12.892Z
Learnt from: majdyz
Repo: Significant-Gravitas/AutoGPT PR: 12881
File: autogpt_platform/backend/backend/copilot/baseline/service.py:322-332
Timestamp: 2026-04-22T11:46:12.892Z
Learning: In this codebase (Significant-Gravitas/AutoGPT), OpenRouter-routed Anthropic model IDs should use the hyphen-separated convention (e.g., `anthropic/claude-sonnet-4-6`, `anthropic/claude-opus-4-6`). Although OpenRouter may accept both hyphen and dot variants, treat the hyphen-separated form as the intended, correct codebase-wide convention and do not flag it as an error. Only flag the dot-separated variant (e.g., `anthropic/claude-sonnet-4.6`) as incorrect when reviewing/validating model ID strings for OpenRouter-routed Anthropic models.

Applied to files:

  • autogpt_platform/backend/backend/copilot/sdk/service.py
📚 Learning: 2026-04-22T12:26:42.571Z
Learnt from: majdyz
Repo: Significant-Gravitas/AutoGPT PR: 12881
File: autogpt_platform/backend/backend/copilot/sdk/service.py:0-0
Timestamp: 2026-04-22T12:26:42.571Z
Learning: In `autogpt_platform/backend/backend/copilot/sdk/service.py`, `_resolve_sdk_model_for_request`: when a per-user LaunchDarkly model value fails `_normalize_model_name` (e.g. a `moonshotai/kimi-*` slug in direct-Anthropic mode), the fallback must be tier-specific — `config.thinking_advanced_model` for advanced tier, `config.thinking_standard_model` for standard tier — NOT the generic `_resolve_sdk_model()` (which is standard-only and returns None under subscription mode). If the tier-specific config default also fails `_normalize_model_name`, re-raise the original LD error; this is a deployment-level misconfiguration that `model_validator` should have caught at startup. Established in PR `#12881` commit 637d2fef5.

Applied to files:

  • autogpt_platform/backend/backend/copilot/sdk/service.py
📚 Learning: 2026-03-10T08:39:22.025Z
Learnt from: majdyz
Repo: Significant-Gravitas/AutoGPT PR: 12356
File: autogpt_platform/backend/backend/copilot/constants.py:9-12
Timestamp: 2026-03-10T08:39:22.025Z
Learning: In Significant-Gravitas/AutoGPT PR `#12356`, the `COPILOT_SYNTHETIC_ID_PREFIX = "copilot-"` check in `create_auto_approval_record` (human_review.py) is intentional and safe. The `graph_exec_id` passed to this function comes from server-side `PendingHumanReview` DB records (not from user input); the API only accepts `node_exec_id` from users. Synthetic `copilot-*` IDs are only ever created server-side in `run_block.py`. The prefix skip avoids a DB lookup for a `AgentGraphExecution` record that legitimately does not exist for CoPilot sessions, while `user_id` scoping is enforced at the auth layer and on the resulting auto-approval record.

Applied to files:

  • autogpt_platform/backend/backend/copilot/sdk/service.py
📚 Learning: 2026-04-01T04:17:41.600Z
Learnt from: majdyz
Repo: Significant-Gravitas/AutoGPT PR: 12632
File: autogpt_platform/backend/backend/copilot/tools/workspace_files.py:0-0
Timestamp: 2026-04-01T04:17:41.600Z
Learning: When reviewing AutoGPT Copilot tool implementations, accept that `readOnlyHint=True` (provided via `ToolAnnotations`) may be applied unconditionally to *all* tools—even tools that have side effects (e.g., `bash_exec`, `write_workspace_file`, or other write/save operations). Do **not** flag these tools for having `readOnlyHint=True`; this is intentional to enable fully-parallel dispatch by the Anthropic SDK/CLI and has been E2E validated. Only flag `readOnlyHint` issues if they conflict with the established `ToolAnnotations` behavior (e.g., missing/incorrect propagation relative to the intended annotation mechanism).

Applied to files:

  • autogpt_platform/backend/backend/copilot/sdk/service.py
📚 Learning: 2026-04-01T04:17:38.279Z
Learnt from: majdyz
Repo: Significant-Gravitas/AutoGPT PR: 12632
File: autogpt_platform/backend/backend/copilot/sdk/tool_adapter.py:530-535
Timestamp: 2026-04-01T04:17:38.279Z
Learning: In `autogpt_platform/backend/backend/copilot/sdk/tool_adapter.py`, the `ToolAnnotations(readOnlyHint=True)` annotation (stored as `_PARALLEL_ANNOTATION`) is intentionally applied to ALL registered MCP tools — including E2B write/edit tools (e.g., `write_file`, `edit_file`). This is a parallel-dispatch hint to the Claude Agent SDK CLI, not a semantic read-only contract. The `_READ_ONLY_E2B_TOOLS` set was dead code and was removed in commit `12ae03c`; the constant was renamed from `_READONLY_ANNOTATION` to `_PARALLEL_ANNOTATION` in commit `c88ca88` to avoid confusion. Do not flag this as a correctness issue.

Applied to files:

  • autogpt_platform/backend/backend/copilot/sdk/service.py
📚 Learning: 2026-04-14T21:27:04.525Z
Learnt from: ntindle
Repo: Significant-Gravitas/AutoGPT PR: 12765
File: autogpt_platform/backend/backend/copilot/tools/graphiti_forget.py:227-234
Timestamp: 2026-04-14T21:27:04.525Z
Learning: In `autogpt_platform/backend/backend/copilot/tools/graphiti_forget.py`, the `getattr(client, "graph_driver", None) or getattr(client, "driver", None)` pattern for accessing the Neo4j driver from a `graphiti_core.Graphiti` instance is intentional and correct. `graphiti_core.Graphiti` does not expose `driver` as a stable public property (`dir(Graphiti)` shows no `driver` or `graph_driver` public property); the attribute name has varied across library versions. The fallback chain handles cross-version compatibility. Do NOT flag this as a duck-typing violation. Additionally, soft delete (temporal invalidation), per-UUID success/failure reporting, and episode back-reference cleanup all require raw Cypher queries — the `EntityEdge.delete_by_uuids` batch API does not cover these cases.

Applied to files:

  • autogpt_platform/backend/backend/copilot/sdk/service.py
📚 Learning: 2026-03-09T10:50:43.907Z
Learnt from: Bentlybro
Repo: Significant-Gravitas/AutoGPT PR: 0
File: :0-0
Timestamp: 2026-03-09T10:50:43.907Z
Learning: Repo: Significant-Gravitas/AutoGPT — File: autogpt_platform/backend/backend/blocks/llm.py
For xAI Grok models accessed via OpenRouter, the API returns `null` for `max_completion_tokens`. The convention in this codebase is to use the model's context window size as the `max_output_tokens` value in ModelMetadata. For example, Grok 3 uses 131072 (128k) and Grok 4 uses 262144 (256k). Do not flag these as incorrect max output token values.

Applied to files:

  • autogpt_platform/backend/backend/copilot/sdk/service.py
📚 Learning: 2026-04-07T10:12:18.517Z
Learnt from: majdyz
Repo: Significant-Gravitas/AutoGPT PR: 12691
File: .claude/skills/orchestrate/SKILL.md:0-0
Timestamp: 2026-04-07T10:12:18.517Z
Learning: In Significant-Gravitas/AutoGPT's Claude skill markdown files under `.claude/skills/orchestrate/`, fenced code blocks in `SKILL.md`-style skill documents may intentionally omit a fenced code language (no `text`, `bash`, etc.). These blocks are used for Claude Code inline pseudocode/conceptual helpers rather than runnable scripts. During reviews, avoid treating MD040 (fenced-code-language) as an issue for these specific skill-format blocks, even if the language identifier is missing, since this omission is expected and has been accepted as a false positive for this skill format.

Applied to files:

  • autogpt_platform/backend/backend/copilot/sdk/service.py
📚 Learning: 2026-04-13T13:11:09.987Z
Learnt from: 0ubbe
Repo: Significant-Gravitas/AutoGPT PR: 12764
File: autogpt_platform/frontend/src/app/(platform)/library/components/SitrepItem/SitrepItem.tsx:143-145
Timestamp: 2026-04-13T13:11:09.987Z
Learning: In Significant-Gravitas/AutoGPT `autogpt_platform/frontend`, `executionID` values used as URL query params (e.g. `activeItem=` in `SitrepItem.tsx`) are always UUIDs (e.g. `550e8400-e29b-41d4-a716-446655440000`). Their character set `[0-9a-f-]` contains no reserved URL characters, so `encodeURIComponent` or Next.js object-based `href` encoding is unnecessary. Do not flag direct UUID string interpolation into query strings as a URL-encoding issue.

Applied to files:

  • autogpt_platform/backend/backend/copilot/sdk/service.py
📚 Learning: 2026-03-30T11:49:37.770Z
Learnt from: majdyz
Repo: Significant-Gravitas/AutoGPT PR: 12604
File: autogpt_platform/backend/backend/copilot/sdk/security_hooks.py:165-171
Timestamp: 2026-03-30T11:49:37.770Z
Learning: In `autogpt_platform/backend/backend/copilot/sdk/security_hooks.py`, the `web_search_count` and `total_tool_call_count` circuit-breaker counters in `create_security_hooks` are intentionally per-turn (closure-local), not per-session. Hooks are recreated per stream invocation in `service.py`, so counters reset each turn. This is an accepted v1 design: it caps a single runaway turn (incident d2f7cba3: 179 WebSearch calls, $20.66). True per-session persistence via Redis is deferred to a later iteration. Do not flag these as a per-session vs. per-turn mismatch bug.

Applied to files:

  • autogpt_platform/backend/backend/copilot/sdk/service.py
📚 Learning: 2026-03-26T07:00:03.405Z
Learnt from: majdyz
Repo: Significant-Gravitas/AutoGPT PR: 12574
File: autogpt_platform/backend/backend/copilot/sdk/transcript.py:980-990
Timestamp: 2026-03-26T07:00:03.405Z
Learning: In `autogpt_platform/backend/backend/copilot/sdk/transcript.py`, `_rechain_tail` intentionally rewrites `parentUuid` for **all** tail entries (not just the first), because a single assistant turn can span multiple consecutive JSONL entries sharing the same `message.id` (e.g., a thinking entry + a tool_use entry). Their original `parentUuid` values may reference entries that were absorbed into the compressed prefix, so sequential rechaining of the entire tail is required to maintain a valid parent→child graph. The test `test_chains_multiple_tail_entries` validates this: the second tail entry's `parentUuid` is rewritten from its original value to the uuid of the first tail entry.

Applied to files:

  • autogpt_platform/backend/backend/copilot/sdk/service.py
📚 Learning: 2026-04-03T11:14:16.378Z
Learnt from: majdyz
Repo: Significant-Gravitas/AutoGPT PR: 12623
File: autogpt_platform/backend/backend/copilot/transcript_builder.py:30-34
Timestamp: 2026-04-03T11:14:16.378Z
Learning: In `autogpt_platform/backend/backend/copilot/transcript_builder.py` (and its re-export shim at `sdk/transcript_builder.py`), `TranscriptEntry.parentUuid` is typed `str` (not `str | None`) and root entries use `parentUuid=""` (empty string) to match the canonical `_messages_to_transcript` JSONL format. `_parse_entry`, `append_user`, and `append_assistant` all coerce `None` to `""`. Do NOT flag `parentUuid=""` as incorrect — it is the correct root marker. This was fixed in PR `#12623`, commit b753cb7d0b.

Applied to files:

  • autogpt_platform/backend/backend/copilot/sdk/service.py
📚 Learning: 2026-02-26T17:02:22.448Z
Learnt from: Pwuts
Repo: Significant-Gravitas/AutoGPT PR: 12211
File: .pre-commit-config.yaml:160-179
Timestamp: 2026-02-26T17:02:22.448Z
Learning: Keep the pre-commit hook pattern broad for autogpt_platform/backend to ensure OpenAPI schema changes are captured. Do not narrow to backend/api/ alone, since the generated schema depends on Pydantic models across multiple directories (backend/data/, backend/blocks/, backend/copilot/, backend/integrations/, backend/util/). Narrowing could miss schema changes and cause frontend type desynchronization.

Applied to files:

  • autogpt_platform/backend/backend/copilot/sdk/service.py
📚 Learning: 2026-03-04T08:04:35.881Z
Learnt from: majdyz
Repo: Significant-Gravitas/AutoGPT PR: 12273
File: autogpt_platform/backend/backend/copilot/tools/workspace_files.py:216-220
Timestamp: 2026-03-04T08:04:35.881Z
Learning: In the AutoGPT Copilot backend, ensure that SVG images are not treated as vision image types by excluding 'image/svg+xml' from INLINEABLE_MIME_TYPES and MULTIMODAL_TYPES in tool_adapter.py; the Claude API supports PNG, JPEG, GIF, and WebP for vision. SVGs (XML text) should be handled via the text path instead, not the vision path.

Applied to files:

  • autogpt_platform/backend/backend/copilot/sdk/service.py
📚 Learning: 2026-03-05T15:42:08.207Z
Learnt from: ntindle
Repo: Significant-Gravitas/AutoGPT PR: 12297
File: .claude/skills/backend-check/SKILL.md:14-16
Timestamp: 2026-03-05T15:42:08.207Z
Learning: In Python files under autogpt_platform/backend (recursively), rely on poetry run format to perform formatting (Black + isort) and linting (ruff). Do not run poetry run lint as a separate step after poetry run format, since format already includes linting checks.

Applied to files:

  • autogpt_platform/backend/backend/copilot/sdk/service.py
📚 Learning: 2026-03-16T16:35:40.236Z
Learnt from: majdyz
Repo: Significant-Gravitas/AutoGPT PR: 12440
File: autogpt_platform/backend/backend/api/features/workflow_import.py:54-63
Timestamp: 2026-03-16T16:35:40.236Z
Learning: Avoid using the word 'competitor' in public-facing identifiers and text. Use neutral naming for API paths, model names, function names, and UI text. Examples: rename 'CompetitorFormat' to 'SourcePlatform', 'convert_competitor_workflow' to 'convert_workflow', '/competitor-workflow' to '/workflow'. Apply this guideline to files under autogpt_platform/backend and autogpt_platform/frontend.

Applied to files:

  • autogpt_platform/backend/backend/copilot/sdk/service.py
📚 Learning: 2026-04-15T02:43:36.890Z
Learnt from: ntindle
Repo: Significant-Gravitas/AutoGPT PR: 12780
File: autogpt_platform/backend/backend/copilot/tools/workspace_files.py:0-0
Timestamp: 2026-04-15T02:43:36.890Z
Learning: When reviewing Python exception handlers, do not flag `isinstance(e, X)` checks as dead/unreachable if the caught exception `X` is a subclass of the exception type being handled. For example, if `X` (e.g., `VirusScanError`) inherits from `ValueError` (directly or via an intermediate class) and it can be raised within an `except ValueError:` block, then `isinstance(e, X)` inside that handler is reachable and should not be treated as dead code.

Applied to files:

  • autogpt_platform/backend/backend/copilot/sdk/service.py

Comment thread autogpt_platform/backend/backend/copilot/sdk/service.py Outdated
…o _InterruptedAttempt

Previous revisions carried the failed-attempt state across three separate
function-scope variables (last_attempt_partial, handled_error_info) + four
module-level helpers (_rollback_attempt_capturing_partial,
_restore_partial_with_error_marker, _flush_orphan_tool_uses_to_session,
_append_error_marker). The retry loop mutated all three and the post-loop
block reassembled the pieces by hand. Scattered and hard to follow.

Collapse to one dataclass with capture / clear / finalize + one
_classify_final_failure helper that picks the display message based on
which failure flag the retry loop set (attempts_exhausted,
transient_exhausted, stream_err, handled_error). Call sites:

  - success break:          interrupted.clear()
  - _HandledStreamError:    interrupted.capture(...); interrupted.handled_error = ...
  - Exception:              interrupted.capture(...)
  - post-loop:              final_msg, retryable = _classify_final_failure(interrupted, ...); interrupted.finalize(...)
  - outer except:           interrupted.finalize(...)

Behaviour is unchanged — same restore semantics, same StreamError
sequencing, same transcript-upload skip, same orphan tool_use flush, same
stale-marker stripping from b1172e2 / 5406fe9. The retry-scenarios
suite (48 integration tests) plus the rewritten interrupted_partial_test
(14 unit tests) both pass; the full SDK test suite (1012 tests) is green.
@github-actions github-actions Bot added size/l and removed size/xl labels Apr 25, 2026
Comment thread autogpt_platform/backend/backend/copilot/sdk/service.py Outdated
Copy link
Copy Markdown
Contributor

@coderabbitai coderabbitai Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

♻️ Duplicate comments (1)
autogpt_platform/backend/backend/copilot/sdk/service.py (1)

634-650: ⚠️ Potential issue | 🟠 Major

Don't consume the adapter flush before the client cleanup path runs.

_flush_orphan_tool_uses_to_session() calls _flush_unresolved_tool_calls() during restore, which clears has_unresolved_tool_calls before the later error_flush block checks it. That preserves DB/session validity, but it also prevents the StreamToolOutputAvailable cleanup events from being yielded to the client, so interrupted tool widgets/spinners can stay open until refresh. Persist the synthesized tool_result rows without mutating adapter state yet, or return/reuse the generated flush events in the 4146-4161 block instead of flushing twice. As per coding guidelines, "Do not use linter suppressors — no # type: ignore, # noqa, # pyright: ignore; fix the type/code instead."

Also applies to: 4135-4161

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@autogpt_platform/backend/backend/copilot/sdk/service.py` around lines 634 -
650, The current _flush_orphan_tool_uses_to_session calls
state.adapter._flush_unresolved_tool_calls which mutates adapter state (clearing
has_unresolved_tool_calls) and prevents the later error_flush block from
yielding StreamToolOutputAvailable events to the client; instead, change the
flow to synthesize the StreamBaseResponse rows without mutating adapter state:
either add/use a non-mutating helper on the adapter that returns the synthesized
events (e.g., make _flush_unresolved_tool_calls return a list of
StreamBaseResponse or add a new method like _collect_unresolved_tool_calls) and
append/return those events to be consumed by the later error_flush block, or
modify _flush_unresolved_tool_calls to accept a no_mutation flag so
_flush_orphan_tool_uses_to_session can collect responses but defer state changes
until the client cleanup path runs; remove the noqa suppressor and ensure
has_unresolved_tool_calls is only cleared when the actual client-yielding
cleanup path consumes the events.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Duplicate comments:
In `@autogpt_platform/backend/backend/copilot/sdk/service.py`:
- Around line 634-650: The current _flush_orphan_tool_uses_to_session calls
state.adapter._flush_unresolved_tool_calls which mutates adapter state (clearing
has_unresolved_tool_calls) and prevents the later error_flush block from
yielding StreamToolOutputAvailable events to the client; instead, change the
flow to synthesize the StreamBaseResponse rows without mutating adapter state:
either add/use a non-mutating helper on the adapter that returns the synthesized
events (e.g., make _flush_unresolved_tool_calls return a list of
StreamBaseResponse or add a new method like _collect_unresolved_tool_calls) and
append/return those events to be consumed by the later error_flush block, or
modify _flush_unresolved_tool_calls to accept a no_mutation flag so
_flush_orphan_tool_uses_to_session can collect responses but defer state changes
until the client cleanup path runs; remove the noqa suppressor and ensure
has_unresolved_tool_calls is only cleared when the actual client-yielding
cleanup path consumes the events.

ℹ️ Review info
⚙️ Run configuration

Configuration used: Organization UI

Review profile: CHILL

Plan: Pro

Run ID: 39798136-a2aa-4a01-99b6-c78ed21a48de

📥 Commits

Reviewing files that changed from the base of the PR and between 5406fe9 and 2e7c5fb.

📒 Files selected for processing (2)
  • autogpt_platform/backend/backend/copilot/sdk/interrupted_partial_test.py
  • autogpt_platform/backend/backend/copilot/sdk/service.py
🚧 Files skipped from review as they are similar to previous changes (1)
  • autogpt_platform/backend/backend/copilot/sdk/interrupted_partial_test.py
📜 Review details
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (10)
  • GitHub Check: check API types
  • GitHub Check: Seer Code Review
  • GitHub Check: end-to-end tests
  • GitHub Check: test (3.12)
  • GitHub Check: test (3.11)
  • GitHub Check: type-check (3.12)
  • GitHub Check: test (3.13)
  • GitHub Check: Analyze (python)
  • GitHub Check: Analyze (typescript)
  • GitHub Check: Check PR Status
🧰 Additional context used
📓 Path-based instructions (2)
autogpt_platform/backend/**/*.py

📄 CodeRabbit inference engine (.github/copilot-instructions.md)

autogpt_platform/backend/**/*.py: Use Python 3.11 (required; managed by Poetry via pyproject.toml) for backend development
Always run 'poetry run format' (Black + isort) before linting in backend development
Always run 'poetry run lint' (ruff) after formatting in backend development

autogpt_platform/backend/**/*.py: Use poetry run ... command for executing Python package dependencies
Use top-level imports only — avoid local/inner imports except for lazy imports of heavy optional dependencies like openpyxl
Use absolute imports with from backend.module import ... for cross-package imports; single-dot relative imports are acceptable for sibling modules within the same package; avoid double-dot relative imports
Do not use duck typing — avoid hasattr/getattr/isinstance for type dispatch; use typed interfaces/unions/protocols instead
Use Pydantic models over dataclass/namedtuple/dict for structured data
Do not use linter suppressors — no # type: ignore, # noqa, # pyright: ignore; fix the type/code instead
Prefer list comprehensions over manual loop-and-append patterns
Use early return with guard clauses first to avoid deep nesting
Use %s for deferred interpolation in debug log statements for efficiency; use f-strings elsewhere for readability (e.g., logger.debug("Processing %s items", count) vs logger.info(f"Processing {count} items"))
Sanitize error paths by using os.path.basename() in error messages to avoid leaking directory structure
Be aware of TOCTOU (Time-Of-Check-Time-Of-Use) issues — avoid check-then-act patterns for file access and credit charging
Use transaction=True for Redis pipelines to ensure atomicity on multi-step operations
Use max(0, value) guards for computed values that should never be negative
Keep files under ~300 lines; if a file grows beyond this, split by responsibility (extract helpers, models, or a sub-module into a new file)
Keep functions under ~40 lines; extract named helpers when a function grows longer
...

Files:

  • autogpt_platform/backend/backend/copilot/sdk/service.py
autogpt_platform/{backend,autogpt_libs}/**/*.py

📄 CodeRabbit inference engine (AGENTS.md)

Format Python code with poetry run format

Files:

  • autogpt_platform/backend/backend/copilot/sdk/service.py
🧠 Learnings (44)
📓 Common learnings
Learnt from: majdyz
Repo: Significant-Gravitas/AutoGPT PR: 12797
File: autogpt_platform/backend/backend/copilot/sdk/service.py:1991-2021
Timestamp: 2026-04-15T13:44:34.273Z
Learning: In `autogpt_platform/backend/backend/copilot/sdk/service.py` (`_run_stream_attempt`), the pre-create block (PR `#12797`) intentionally does NOT call `state.transcript_builder.append_assistant(...)` when inserting the empty assistant placeholder into `ctx.session.messages`. The transcript is left ending at the `tool_result` entry (N entries) while `message_count` metadata is N+1. This mismatch is benign and deliberate: on the next `--resume`, the SDK sees the transcript ending at `tool_result` and correctly regenerates the assistant response. Pre-appending the assistant turn to the transcript would suppress regeneration while leaving `session.messages[-1].content = ""` permanently (worse outcome). On the gap-fallback path, `transcript_msg_count (N+1) >= msg_count-1 (N)` means no gap is injected for the empty placeholder, which is correct because injecting an empty assistant message as context would mislead the SDK. Do NOT flag this transcript/message_count discrepancy as a bug.
Learnt from: majdyz
Repo: Significant-Gravitas/AutoGPT PR: 12773
File: autogpt_platform/backend/backend/blocks/autopilot.py:631-638
Timestamp: 2026-04-14T07:35:11.464Z
Learning: In `autogpt_platform/backend/backend/copilot/executor/utils.py`, `CoPilotExecutionEntry` includes a `permissions: CopilotPermissions | None` field (added in PR `#12773` / commit a0184c87b9). `enqueue_copilot_turn` accepts and serializes this field into the queue entry, `_enqueue_for_recovery` in `autopilot.py` accepts and forwards `permissions` to `enqueue_copilot_turn`, and `_execute_async` in `processor.py` restores `entry.permissions` and passes it into `stream_chat_completion_sdk`/`stream_chat_completion_baseline` via `set_execution_context`. This ensures recovered sub-agent turns respect the same tool/block permission ceiling as the original in-process execution (mirroring `_merge_inherited_permissions`). Do NOT flag recovered turns as losing their permission ceiling — it is now fully propagated through the queue.
Learnt from: majdyz
Repo: Significant-Gravitas/AutoGPT PR: 12879
File: autogpt_platform/backend/backend/copilot/baseline/service.py:0-0
Timestamp: 2026-04-22T05:57:34.861Z
Learning: In `autogpt_platform/backend/backend/copilot/baseline/service.py`, the approved pattern for `_run_task_subagent` (PR `#12879`, commit 187f0a5) uses a nested `try/except Exception` inside an outer `try/finally`. The outer `finally` block resets `_TASK_DEPTH_VAR` (via `_TASK_DEPTH_VAR.reset(token)`) AND calls `_absorb_inner_usage(parent_state, inner_state)` unconditionally, so both the depth ContextVar and usage roll-up are guaranteed on all exit paths including `CancelledError`/`KeyboardInterrupt`/`SystemExit`. The inner `except Exception` catches and converts failures into a `TaskResponse` error payload that is returned as `StreamToolOutputAvailable`. Do NOT flag missing ContextVar reset or usage roll-up on BaseException paths in this function.
Learnt from: majdyz
Repo: Significant-Gravitas/AutoGPT PR: 12773
File: autogpt_platform/backend/backend/copilot/pending_messages.py:52-64
Timestamp: 2026-04-14T14:36:25.545Z
Learning: In `autogpt_platform/backend/backend/copilot` (PR `#12773`, commit d7bced0c6): when draining pending messages into `session.messages`, each message's text is sanitized via `strip_user_context_tags` before persistence to prevent user-controlled `<user_context>` injection from bypassing the trusted server-side context prefix. Additionally, if `upsert_chat_session` fails after draining, the drained `PendingMessage` objects are requeued back to Redis to avoid silent message loss. Do NOT flag the drain-then-requeue pattern as redundant — it is the intentional failure-resilience strategy for the pending buffer.
Learnt from: majdyz
Repo: Significant-Gravitas/AutoGPT PR: 12445
File: autogpt_platform/backend/backend/copilot/sdk/service.py:1071-1072
Timestamp: 2026-03-17T06:48:26.471Z
Learning: In Significant-Gravitas/AutoGPT (autogpt_platform), the AI SDK enforces `z.strictObject({type, errorText})` on SSE `StreamError` responses, so additional fields like `retryable: bool` cannot be added to `StreamError` or serialized via `to_sse()`. Instead, retry signaling for transient Anthropic API errors is done via the `COPILOT_RETRYABLE_ERROR_PREFIX` constant prepended to persisted session messages (in `ChatMessage.content`). The frontend detects retryable errors by checking `markerType === "retryable_error"` from `parseSpecialMarkers()` — no SSE schema changes and no string matching on error text. This pattern was established in PR `#12445`, commit 64d82797b.
Learnt from: majdyz
Repo: Significant-Gravitas/AutoGPT PR: 0
File: :0-0
Timestamp: 2026-04-23T00:07:27.117Z
Learning: In `autogpt_platform/backend/backend/copilot/sdk/service.py`, background tasks that persist cost or emit Langfuse backfill (e.g. the cost-reconcile task) must be anchored to `_background_tasks` using `_background_tasks.add(task)` and `task.add_done_callback(_background_tasks.discard)`, mirroring the existing pattern at lines 3063 / 4232 / 4256. This prevents the asyncio task from being garbage-collected before persistence or Langfuse emission completes. Do NOT flag the absence of this anchoring as acceptable in this file. Established in PR `#12889` commit 5ce3d0388.
Learnt from: majdyz
Repo: Significant-Gravitas/AutoGPT PR: 0
File: :0-0
Timestamp: 2026-04-23T01:26:38.257Z
Learning: In `autogpt_platform/backend/backend/copilot/sdk/service.py`, `langfuse_trace_id = get_client().get_current_trace_id()` must be captured under the `if _lf_span is not None:` guard (before `_lf_span` is torn down), NOT under `if _otel_ctx is not None:`. The `_otel_ctx` guard is too narrow: if `propagate_attributes().__enter__()` raises, `_otel_ctx` is never assigned, and placing the trace-id capture there would silently orphan the `openrouter-cost-reconcile` event from its parent span. Established in PR `#12889` commit d243bf6c9.
Learnt from: majdyz
Repo: Significant-Gravitas/AutoGPT PR: 12881
File: autogpt_platform/backend/backend/copilot/sdk/service.py:0-0
Timestamp: 2026-04-22T12:26:42.571Z
Learning: In `autogpt_platform/backend/backend/copilot/sdk/service.py`, `_resolve_sdk_model_for_request`: when a per-user LaunchDarkly model value fails `_normalize_model_name` (e.g. a `moonshotai/kimi-*` slug in direct-Anthropic mode), the fallback must be tier-specific — `config.thinking_advanced_model` for advanced tier, `config.thinking_standard_model` for standard tier — NOT the generic `_resolve_sdk_model()` (which is standard-only and returns None under subscription mode). If the tier-specific config default also fails `_normalize_model_name`, re-raise the original LD error; this is a deployment-level misconfiguration that `model_validator` should have caught at startup. Established in PR `#12881` commit 637d2fef5.
Learnt from: majdyz
Repo: Significant-Gravitas/AutoGPT PR: 0
File: :0-0
Timestamp: 2026-04-21T11:41:05.877Z
Learning: In `autogpt_platform/backend/backend/copilot/baseline/service.py` (PR `#12870`, commits 080d42b9d and 3d7b38162), the `_close_reasoning_block_if_open(state)` helper centralises all four reasoning-block-close call sites (text branch, tool_calls branch, stream-end, exception path). The outer `finally` block of `_baseline_llm_caller` calls this helper plus stripper flush + `StreamTextEnd` to guarantee matched end events are emitted before `StreamFinishStep` on both normal and exception paths. Do NOT flag duplicated close logic or missing reasoning-end-on-exception as issues in this function.
Learnt from: majdyz
Repo: Significant-Gravitas/AutoGPT PR: 12814
File: autogpt_platform/backend/backend/copilot/model.py:0-0
Timestamp: 2026-04-16T13:28:28.641Z
Learning: In `autogpt_platform/backend/backend/copilot/model.py` (PR `#12814`, commit 259d37083): `append_and_save_message` uses `async with _get_session_lock(session_id)` — the same shared context manager used across the module — which internally acquires `redis-py`'s built-in `Lock` (key `copilot:session_lock:{session_id}`, timeout=10s, blocking_timeout=2s) via an atomic Lua-script. Lock release is also owner-verified via Lua so a slow pod can never delete a lock it no longer holds. On Redis failure the lock is skipped with a warning; the in-function idempotency check (`session.messages[-1].role` and `.content` comparison) still runs as a fallback. Do NOT expect a raw `redis.set(nx=True)` / `redis.delete()` pattern here — that intermediate approach was replaced in commit 259d37083.
📚 Learning: 2026-04-22T05:57:34.861Z
Learnt from: majdyz
Repo: Significant-Gravitas/AutoGPT PR: 12879
File: autogpt_platform/backend/backend/copilot/baseline/service.py:0-0
Timestamp: 2026-04-22T05:57:34.861Z
Learning: In `autogpt_platform/backend/backend/copilot/baseline/service.py`, the approved pattern for `_run_task_subagent` (PR `#12879`, commit 187f0a5) uses a nested `try/except Exception` inside an outer `try/finally`. The outer `finally` block resets `_TASK_DEPTH_VAR` (via `_TASK_DEPTH_VAR.reset(token)`) AND calls `_absorb_inner_usage(parent_state, inner_state)` unconditionally, so both the depth ContextVar and usage roll-up are guaranteed on all exit paths including `CancelledError`/`KeyboardInterrupt`/`SystemExit`. The inner `except Exception` catches and converts failures into a `TaskResponse` error payload that is returned as `StreamToolOutputAvailable`. Do NOT flag missing ContextVar reset or usage roll-up on BaseException paths in this function.

Applied to files:

  • autogpt_platform/backend/backend/copilot/sdk/service.py
📚 Learning: 2026-03-17T06:48:26.471Z
Learnt from: majdyz
Repo: Significant-Gravitas/AutoGPT PR: 12445
File: autogpt_platform/backend/backend/copilot/sdk/service.py:1071-1072
Timestamp: 2026-03-17T06:48:26.471Z
Learning: In Significant-Gravitas/AutoGPT (autogpt_platform), the AI SDK enforces `z.strictObject({type, errorText})` on SSE `StreamError` responses, so additional fields like `retryable: bool` cannot be added to `StreamError` or serialized via `to_sse()`. Instead, retry signaling for transient Anthropic API errors is done via the `COPILOT_RETRYABLE_ERROR_PREFIX` constant prepended to persisted session messages (in `ChatMessage.content`). The frontend detects retryable errors by checking `markerType === "retryable_error"` from `parseSpecialMarkers()` — no SSE schema changes and no string matching on error text. This pattern was established in PR `#12445`, commit 64d82797b.

Applied to files:

  • autogpt_platform/backend/backend/copilot/sdk/service.py
📚 Learning: 2026-04-15T13:44:34.273Z
Learnt from: majdyz
Repo: Significant-Gravitas/AutoGPT PR: 12797
File: autogpt_platform/backend/backend/copilot/sdk/service.py:1991-2021
Timestamp: 2026-04-15T13:44:34.273Z
Learning: In `autogpt_platform/backend/backend/copilot/sdk/service.py` (`_run_stream_attempt`), the pre-create block (PR `#12797`) intentionally does NOT call `state.transcript_builder.append_assistant(...)` when inserting the empty assistant placeholder into `ctx.session.messages`. The transcript is left ending at the `tool_result` entry (N entries) while `message_count` metadata is N+1. This mismatch is benign and deliberate: on the next `--resume`, the SDK sees the transcript ending at `tool_result` and correctly regenerates the assistant response. Pre-appending the assistant turn to the transcript would suppress regeneration while leaving `session.messages[-1].content = ""` permanently (worse outcome). On the gap-fallback path, `transcript_msg_count (N+1) >= msg_count-1 (N)` means no gap is injected for the empty placeholder, which is correct because injecting an empty assistant message as context would mislead the SDK. Do NOT flag this transcript/message_count discrepancy as a bug.

Applied to files:

  • autogpt_platform/backend/backend/copilot/sdk/service.py
📚 Learning: 2026-04-21T11:41:05.877Z
Learnt from: majdyz
Repo: Significant-Gravitas/AutoGPT PR: 0
File: :0-0
Timestamp: 2026-04-21T11:41:05.877Z
Learning: In `autogpt_platform/backend/backend/copilot/baseline/service.py` (PR `#12870`, commits 080d42b9d and 3d7b38162), the `_close_reasoning_block_if_open(state)` helper centralises all four reasoning-block-close call sites (text branch, tool_calls branch, stream-end, exception path). The outer `finally` block of `_baseline_llm_caller` calls this helper plus stripper flush + `StreamTextEnd` to guarantee matched end events are emitted before `StreamFinishStep` on both normal and exception paths. Do NOT flag duplicated close logic or missing reasoning-end-on-exception as issues in this function.

Applied to files:

  • autogpt_platform/backend/backend/copilot/sdk/service.py
📚 Learning: 2026-04-14T07:35:11.464Z
Learnt from: majdyz
Repo: Significant-Gravitas/AutoGPT PR: 12773
File: autogpt_platform/backend/backend/blocks/autopilot.py:631-638
Timestamp: 2026-04-14T07:35:11.464Z
Learning: In `autogpt_platform/backend/backend/copilot/executor/utils.py`, `CoPilotExecutionEntry` includes a `permissions: CopilotPermissions | None` field (added in PR `#12773` / commit a0184c87b9). `enqueue_copilot_turn` accepts and serializes this field into the queue entry, `_enqueue_for_recovery` in `autopilot.py` accepts and forwards `permissions` to `enqueue_copilot_turn`, and `_execute_async` in `processor.py` restores `entry.permissions` and passes it into `stream_chat_completion_sdk`/`stream_chat_completion_baseline` via `set_execution_context`. This ensures recovered sub-agent turns respect the same tool/block permission ceiling as the original in-process execution (mirroring `_merge_inherited_permissions`). Do NOT flag recovered turns as losing their permission ceiling — it is now fully propagated through the queue.

Applied to files:

  • autogpt_platform/backend/backend/copilot/sdk/service.py
📚 Learning: 2026-04-14T14:36:25.545Z
Learnt from: majdyz
Repo: Significant-Gravitas/AutoGPT PR: 12773
File: autogpt_platform/backend/backend/copilot/pending_messages.py:52-64
Timestamp: 2026-04-14T14:36:25.545Z
Learning: In `autogpt_platform/backend/backend/copilot` (PR `#12773`, commit d7bced0c6): when draining pending messages into `session.messages`, each message's text is sanitized via `strip_user_context_tags` before persistence to prevent user-controlled `<user_context>` injection from bypassing the trusted server-side context prefix. Additionally, if `upsert_chat_session` fails after draining, the drained `PendingMessage` objects are requeued back to Redis to avoid silent message loss. Do NOT flag the drain-then-requeue pattern as redundant — it is the intentional failure-resilience strategy for the pending buffer.

Applied to files:

  • autogpt_platform/backend/backend/copilot/sdk/service.py
📚 Learning: 2026-04-16T13:28:28.641Z
Learnt from: majdyz
Repo: Significant-Gravitas/AutoGPT PR: 12814
File: autogpt_platform/backend/backend/copilot/model.py:0-0
Timestamp: 2026-04-16T13:28:28.641Z
Learning: In `autogpt_platform/backend/backend/copilot/model.py` (PR `#12814`, commit 259d37083): `append_and_save_message` uses `async with _get_session_lock(session_id)` — the same shared context manager used across the module — which internally acquires `redis-py`'s built-in `Lock` (key `copilot:session_lock:{session_id}`, timeout=10s, blocking_timeout=2s) via an atomic Lua-script. Lock release is also owner-verified via Lua so a slow pod can never delete a lock it no longer holds. On Redis failure the lock is skipped with a warning; the in-function idempotency check (`session.messages[-1].role` and `.content` comparison) still runs as a fallback. Do NOT expect a raw `redis.set(nx=True)` / `redis.delete()` pattern here — that intermediate approach was replaced in commit 259d37083.

Applied to files:

  • autogpt_platform/backend/backend/copilot/sdk/service.py
📚 Learning: 2026-03-17T07:24:34.302Z
Learnt from: majdyz
Repo: Significant-Gravitas/AutoGPT PR: 12385
File: autogpt_platform/backend/backend/copilot/rate_limit.py:0-0
Timestamp: 2026-03-17T07:24:34.302Z
Learning: In `autogpt_platform/backend/backend/copilot/rate_limit.py`, all fail-open `except` blocks catch `(RedisError, ConnectionError, OSError)` specifically — not bare `except Exception`. This applies to `_session_reset_from_ttl`, `get_usage_status`, `check_rate_limit`, and `record_token_usage`. The narrowed tuple ensures only genuine Redis/network failures are swallowed; unexpected exceptions propagate normally.

Applied to files:

  • autogpt_platform/backend/backend/copilot/sdk/service.py
📚 Learning: 2026-04-16T13:28:20.824Z
Learnt from: majdyz
Repo: Significant-Gravitas/AutoGPT PR: 12814
File: autogpt_platform/backend/backend/copilot/model.py:661-679
Timestamp: 2026-04-16T13:28:20.824Z
Learning: In `autogpt_platform/backend/backend/copilot/model.py` (PR `#12814`, commit 259d37083): `append_and_save_message` acquires `_get_session_lock` — a redis-py built-in Lock at key `copilot:session_lock:{session_id}` (timeout=10s, blocking_timeout=2s) — to serialize concurrent writers across replicas. On Redis failure the lock is skipped with a warning and the function continues. Inside the lock it re-fetches the session via `get_chat_session` (cache-first), performs an idempotency check (`session.messages[-1].role == message.role and session.messages[-1].content == message.content`), and returns early if matched. On successful DB write but failed cache write, it calls `invalidate_session_cache(session_id)` (the pre-existing best-effort helper) to evict the stale cache entry so subsequent retries fall back to the authoritative DB. Do NOT expect `asyncio.Lock` or a manual NX poll loop (`copilot:msg_append:{session_id}`) — those were removed. Do NOT flag the `invalidate_session_cache` call on ...

Applied to files:

  • autogpt_platform/backend/backend/copilot/sdk/service.py
📚 Learning: 2026-04-22T12:26:42.571Z
Learnt from: majdyz
Repo: Significant-Gravitas/AutoGPT PR: 12881
File: autogpt_platform/backend/backend/copilot/sdk/service.py:0-0
Timestamp: 2026-04-22T12:26:42.571Z
Learning: In `autogpt_platform/backend/backend/copilot/sdk/service.py`, `_resolve_sdk_model_for_request`: when a per-user LaunchDarkly model value fails `_normalize_model_name` (e.g. a `moonshotai/kimi-*` slug in direct-Anthropic mode), the fallback must be tier-specific — `config.thinking_advanced_model` for advanced tier, `config.thinking_standard_model` for standard tier — NOT the generic `_resolve_sdk_model()` (which is standard-only and returns None under subscription mode). If the tier-specific config default also fails `_normalize_model_name`, re-raise the original LD error; this is a deployment-level misconfiguration that `model_validator` should have caught at startup. Established in PR `#12881` commit 637d2fef5.

Applied to files:

  • autogpt_platform/backend/backend/copilot/sdk/service.py
📚 Learning: 2026-04-03T11:14:45.569Z
Learnt from: majdyz
Repo: Significant-Gravitas/AutoGPT PR: 12623
File: autogpt_platform/backend/backend/copilot/baseline/service.py:0-0
Timestamp: 2026-04-03T11:14:45.569Z
Learning: In `autogpt_platform/backend/backend/copilot/baseline/service.py`, `transcript_builder.append_user(content=message)` is called unconditionally even when the message is a duplicate that was suppressed by the `is_new_message` guard. This is intentional: the downloaded transcript may be stale (uploaded before the previous attempt persisted the message), so always appending the current user turn prevents a malformed assistant-after-assistant transcript structure. The `is_user_message` flag is still checked (`if message and is_user_message:`), so assistant-role inputs are excluded. Do NOT flag this as a bug.

Applied to files:

  • autogpt_platform/backend/backend/copilot/sdk/service.py
📚 Learning: 2026-04-16T12:33:44.990Z
Learnt from: majdyz
Repo: Significant-Gravitas/AutoGPT PR: 12796
File: autogpt_platform/backend/backend/api/features/chat/routes.py:504-527
Timestamp: 2026-04-16T12:33:44.990Z
Learning: In `autogpt_platform/backend/backend/api/features/chat/routes.py`, `get_session` (PR `#12796`, commit 3771bfad9c1) closes the TOCTOU race between the initial `stream_registry.get_active_session()` pre-check and `get_chat_messages_paginated()` with a post-check re-verification: after the DB fetch, if `is_initial_load and active_session is not None`, it calls `get_active_session` a second time; if `post_active is None` (stream completed during the window), it resets `from_start=True`, `forward_paginated=True`, and re-fetches messages from sequence 0. Do NOT flag the double `get_active_session` call pattern as redundant — it is the intentional TOCTOU mitigation for pagination direction selection.

Applied to files:

  • autogpt_platform/backend/backend/copilot/sdk/service.py
📚 Learning: 2026-04-21T17:31:23.683Z
Learnt from: majdyz
Repo: Significant-Gravitas/AutoGPT PR: 12873
File: autogpt_platform/backend/backend/copilot/baseline/reasoning.py:0-0
Timestamp: 2026-04-21T17:31:23.683Z
Learning: In `autogpt_platform/backend/backend/copilot/baseline/reasoning.py` (`BaselineReasoningEmitter`), when `render_in_ui=False`, BOTH the `StreamReasoning*` wire events AND the `ChatMessage(role="reasoning")` persistence append must be suppressed together. `convertChatSessionToUiMessages.ts` unconditionally re-renders all persisted `role="reasoning"` rows as `{type:"reasoning"}` UI parts on reload, so persisting rows while silencing live wire events would resurrect the reasoning collapse on page refresh. The audit trail is preserved through the provider transcript and `_format_sdk_content_blocks` (SDK path) instead. The baseline and SDK paths mirror each other: flag off → no live wire event, no persisted row, no hydrated collapse. This was established in PR `#12873`, commit 7ef10b26c.

Applied to files:

  • autogpt_platform/backend/backend/copilot/sdk/service.py
📚 Learning: 2026-04-23T01:26:38.257Z
Learnt from: majdyz
Repo: Significant-Gravitas/AutoGPT PR: 0
File: :0-0
Timestamp: 2026-04-23T01:26:38.257Z
Learning: In `autogpt_platform/backend/backend/copilot/sdk/service.py`, `langfuse_trace_id = get_client().get_current_trace_id()` must be captured under the `if _lf_span is not None:` guard (before `_lf_span` is torn down), NOT under `if _otel_ctx is not None:`. The `_otel_ctx` guard is too narrow: if `propagate_attributes().__enter__()` raises, `_otel_ctx` is never assigned, and placing the trace-id capture there would silently orphan the `openrouter-cost-reconcile` event from its parent span. Established in PR `#12889` commit d243bf6c9.

Applied to files:

  • autogpt_platform/backend/backend/copilot/sdk/service.py
📚 Learning: 2026-04-14T06:34:02.835Z
Learnt from: majdyz
Repo: Significant-Gravitas/AutoGPT PR: 12774
File: autogpt_platform/backend/backend/copilot/tools/e2b_sandbox.py:0-0
Timestamp: 2026-04-14T06:34:02.835Z
Learning: In `autogpt_platform/backend/backend/copilot/tools/e2b_sandbox.py`, the `asyncio.wait_for()` retry loop around `AsyncSandbox.create()` (introduced in PR `#12774`) can leak up to `_SANDBOX_CREATE_MAX_RETRIES - 1` (≤2) orphaned E2B sandboxes per hang incident because `wait_for` cancels only the client-side wait while E2B may complete server-side provisioning. With the default `on_timeout="pause"` lifecycle, leaked orphaned sandboxes are **paused** (not killed) when their original `end_at` is reached and persist indefinitely until explicitly killed — there is NO automatic E2B project-level cleanup. Operators must manage these manually or via their own cleanup jobs. The sandbox_id is not accessible from the timed-out coroutine, so recovery via `AsyncSandbox.connect(sandbox_id)` is not possible at timeout. This is an intentionally accepted trade-off; a proper fix is deferred to a follow-up PR. Do NOT flag the retry loop as a blocking issue.

Applied to files:

  • autogpt_platform/backend/backend/copilot/sdk/service.py
📚 Learning: 2026-04-23T00:07:27.117Z
Learnt from: majdyz
Repo: Significant-Gravitas/AutoGPT PR: 0
File: :0-0
Timestamp: 2026-04-23T00:07:27.117Z
Learning: In `autogpt_platform/backend/backend/copilot/sdk/service.py`, background tasks that persist cost or emit Langfuse backfill (e.g. the cost-reconcile task) must be anchored to `_background_tasks` using `_background_tasks.add(task)` and `task.add_done_callback(_background_tasks.discard)`, mirroring the existing pattern at lines 3063 / 4232 / 4256. This prevents the asyncio task from being garbage-collected before persistence or Langfuse emission completes. Do NOT flag the absence of this anchoring as acceptable in this file. Established in PR `#12889` commit 5ce3d0388.

Applied to files:

  • autogpt_platform/backend/backend/copilot/sdk/service.py
📚 Learning: 2026-04-08T17:28:23.439Z
Learnt from: CR
Repo: Significant-Gravitas/AutoGPT PR: 0
File: autogpt_platform/backend/AGENTS.md:0-0
Timestamp: 2026-04-08T17:28:23.439Z
Learning: Applies to autogpt_platform/backend/**/*.py : Do not use linter suppressors — no `# type: ignore`, `# noqa`, `# pyright: ignore`; fix the type/code instead

Applied to files:

  • autogpt_platform/backend/backend/copilot/sdk/service.py
📚 Learning: 2026-04-16T13:28:22.385Z
Learnt from: majdyz
Repo: Significant-Gravitas/AutoGPT PR: 12814
File: autogpt_platform/backend/backend/copilot/model.py:0-0
Timestamp: 2026-04-16T13:28:22.385Z
Learning: In `autogpt_platform/backend/backend/copilot/model.py` (PR `#12814`, commit 259d37083): `append_and_save_message` uses `_get_session_lock(session_id)` — a redis-py built-in `Lock` (Lua-script atomic acquire/release) keyed as `copilot:session_lock:{session_id}` with `timeout=10s` (crash-safety TTL) and `blocking_timeout=2s`. There is NO manual NX-poll loop and NO `asyncio.Lock`. On Redis failure, `_get_session_lock` logs a warning and yields without a lock — the in-function idempotency check (compare `session.messages[-1].role` and `.content`) still runs as a fallback. Do NOT expect a manual `SET NX` poll loop or `asyncio.Lock` to wrap `append_and_save_message`.

Applied to files:

  • autogpt_platform/backend/backend/copilot/sdk/service.py
📚 Learning: 2026-04-22T11:46:04.431Z
Learnt from: majdyz
Repo: Significant-Gravitas/AutoGPT PR: 12881
File: autogpt_platform/backend/backend/copilot/config.py:0-0
Timestamp: 2026-04-22T11:46:04.431Z
Learning: Do not flag the Claude Sonnet 4.6 model ID as incorrect when it uses the project’s established hyphenated convention: `anthropic/claude-sonnet-4-6`. This hyphen form is the intentional, production convention and should be treated as valid (including in files like llm.py, blocks tests, reasoning.py, `_is_anthropic_model` tests, and config defaults). Note that OpenRouter also accepts the dot variant `anthropic/claude-sonnet-4.6`, so either form may be tolerated, but `anthropic/claude-sonnet-4-6` should be considered the standard to match project usage.

Applied to files:

  • autogpt_platform/backend/backend/copilot/sdk/service.py
📚 Learning: 2026-03-31T15:37:38.626Z
Learnt from: majdyz
Repo: Significant-Gravitas/AutoGPT PR: 12623
File: autogpt_platform/backend/backend/copilot/tools/agent_generator/fixer.py:37-47
Timestamp: 2026-03-31T15:37:38.626Z
Learning: When validating/constructing Anthropic API model IDs in Significant-Gravitas/AutoGPT, allow the hyphen-separated Claude Opus 4.6 model ID `claude-opus-4-6` (it corresponds to `LlmModel.CLAUDE_4_6_OPUS` in `autogpt_platform/backend/backend/blocks/llm.py`). Do NOT require the dot-separated form in Anthropic contexts. Only OpenRouter routing variants should use the dot separator (e.g., `anthropic/claude-opus-4.6`); `claude-opus-4-6` should be treated as correct when passed to Anthropic, and flagged only if it’s used in the OpenRouter path where the dot form is expected.

Applied to files:

  • autogpt_platform/backend/backend/copilot/sdk/service.py
📚 Learning: 2026-04-22T11:46:12.892Z
Learnt from: majdyz
Repo: Significant-Gravitas/AutoGPT PR: 12881
File: autogpt_platform/backend/backend/copilot/baseline/service.py:322-332
Timestamp: 2026-04-22T11:46:12.892Z
Learning: In this codebase (Significant-Gravitas/AutoGPT), OpenRouter-routed Anthropic model IDs should use the hyphen-separated convention (e.g., `anthropic/claude-sonnet-4-6`, `anthropic/claude-opus-4-6`). Although OpenRouter may accept both hyphen and dot variants, treat the hyphen-separated form as the intended, correct codebase-wide convention and do not flag it as an error. Only flag the dot-separated variant (e.g., `anthropic/claude-sonnet-4.6`) as incorrect when reviewing/validating model ID strings for OpenRouter-routed Anthropic models.

Applied to files:

  • autogpt_platform/backend/backend/copilot/sdk/service.py
📚 Learning: 2026-03-10T08:39:22.025Z
Learnt from: majdyz
Repo: Significant-Gravitas/AutoGPT PR: 12356
File: autogpt_platform/backend/backend/copilot/constants.py:9-12
Timestamp: 2026-03-10T08:39:22.025Z
Learning: In Significant-Gravitas/AutoGPT PR `#12356`, the `COPILOT_SYNTHETIC_ID_PREFIX = "copilot-"` check in `create_auto_approval_record` (human_review.py) is intentional and safe. The `graph_exec_id` passed to this function comes from server-side `PendingHumanReview` DB records (not from user input); the API only accepts `node_exec_id` from users. Synthetic `copilot-*` IDs are only ever created server-side in `run_block.py`. The prefix skip avoids a DB lookup for a `AgentGraphExecution` record that legitimately does not exist for CoPilot sessions, while `user_id` scoping is enforced at the auth layer and on the resulting auto-approval record.

Applied to files:

  • autogpt_platform/backend/backend/copilot/sdk/service.py
📚 Learning: 2026-04-01T04:17:41.600Z
Learnt from: majdyz
Repo: Significant-Gravitas/AutoGPT PR: 12632
File: autogpt_platform/backend/backend/copilot/tools/workspace_files.py:0-0
Timestamp: 2026-04-01T04:17:41.600Z
Learning: When reviewing AutoGPT Copilot tool implementations, accept that `readOnlyHint=True` (provided via `ToolAnnotations`) may be applied unconditionally to *all* tools—even tools that have side effects (e.g., `bash_exec`, `write_workspace_file`, or other write/save operations). Do **not** flag these tools for having `readOnlyHint=True`; this is intentional to enable fully-parallel dispatch by the Anthropic SDK/CLI and has been E2E validated. Only flag `readOnlyHint` issues if they conflict with the established `ToolAnnotations` behavior (e.g., missing/incorrect propagation relative to the intended annotation mechanism).

Applied to files:

  • autogpt_platform/backend/backend/copilot/sdk/service.py
📚 Learning: 2026-04-01T04:17:38.279Z
Learnt from: majdyz
Repo: Significant-Gravitas/AutoGPT PR: 12632
File: autogpt_platform/backend/backend/copilot/sdk/tool_adapter.py:530-535
Timestamp: 2026-04-01T04:17:38.279Z
Learning: In `autogpt_platform/backend/backend/copilot/sdk/tool_adapter.py`, the `ToolAnnotations(readOnlyHint=True)` annotation (stored as `_PARALLEL_ANNOTATION`) is intentionally applied to ALL registered MCP tools — including E2B write/edit tools (e.g., `write_file`, `edit_file`). This is a parallel-dispatch hint to the Claude Agent SDK CLI, not a semantic read-only contract. The `_READ_ONLY_E2B_TOOLS` set was dead code and was removed in commit `12ae03c`; the constant was renamed from `_READONLY_ANNOTATION` to `_PARALLEL_ANNOTATION` in commit `c88ca88` to avoid confusion. Do not flag this as a correctness issue.

Applied to files:

  • autogpt_platform/backend/backend/copilot/sdk/service.py
📚 Learning: 2026-04-14T21:27:04.525Z
Learnt from: ntindle
Repo: Significant-Gravitas/AutoGPT PR: 12765
File: autogpt_platform/backend/backend/copilot/tools/graphiti_forget.py:227-234
Timestamp: 2026-04-14T21:27:04.525Z
Learning: In `autogpt_platform/backend/backend/copilot/tools/graphiti_forget.py`, the `getattr(client, "graph_driver", None) or getattr(client, "driver", None)` pattern for accessing the Neo4j driver from a `graphiti_core.Graphiti` instance is intentional and correct. `graphiti_core.Graphiti` does not expose `driver` as a stable public property (`dir(Graphiti)` shows no `driver` or `graph_driver` public property); the attribute name has varied across library versions. The fallback chain handles cross-version compatibility. Do NOT flag this as a duck-typing violation. Additionally, soft delete (temporal invalidation), per-UUID success/failure reporting, and episode back-reference cleanup all require raw Cypher queries — the `EntityEdge.delete_by_uuids` batch API does not cover these cases.

Applied to files:

  • autogpt_platform/backend/backend/copilot/sdk/service.py
📚 Learning: 2026-03-09T10:50:43.907Z
Learnt from: Bentlybro
Repo: Significant-Gravitas/AutoGPT PR: 0
File: :0-0
Timestamp: 2026-03-09T10:50:43.907Z
Learning: Repo: Significant-Gravitas/AutoGPT — File: autogpt_platform/backend/backend/blocks/llm.py
For xAI Grok models accessed via OpenRouter, the API returns `null` for `max_completion_tokens`. The convention in this codebase is to use the model's context window size as the `max_output_tokens` value in ModelMetadata. For example, Grok 3 uses 131072 (128k) and Grok 4 uses 262144 (256k). Do not flag these as incorrect max output token values.

Applied to files:

  • autogpt_platform/backend/backend/copilot/sdk/service.py
📚 Learning: 2026-03-17T10:57:12.953Z
Learnt from: majdyz
Repo: Significant-Gravitas/AutoGPT PR: 12440
File: autogpt_platform/backend/backend/copilot/workflow_import/converter.py:0-0
Timestamp: 2026-03-17T10:57:12.953Z
Learning: In Significant-Gravitas/AutoGPT PR `#12440`, `autogpt_platform/backend/backend/copilot/workflow_import/converter.py` was fully rewritten (commit 732960e2d) to no longer make direct LLM/OpenAI API calls. The converter now builds a structured text prompt for AutoPilot/CoPilot instead. There is no `response.choices` access or any direct LLM client usage in this file. Do not flag `response.choices` access or LLM client initialization patterns as issues in this file.

Applied to files:

  • autogpt_platform/backend/backend/copilot/sdk/service.py
📚 Learning: 2026-04-07T10:12:18.517Z
Learnt from: majdyz
Repo: Significant-Gravitas/AutoGPT PR: 12691
File: .claude/skills/orchestrate/SKILL.md:0-0
Timestamp: 2026-04-07T10:12:18.517Z
Learning: In Significant-Gravitas/AutoGPT's Claude skill markdown files under `.claude/skills/orchestrate/`, fenced code blocks in `SKILL.md`-style skill documents may intentionally omit a fenced code language (no `text`, `bash`, etc.). These blocks are used for Claude Code inline pseudocode/conceptual helpers rather than runnable scripts. During reviews, avoid treating MD040 (fenced-code-language) as an issue for these specific skill-format blocks, even if the language identifier is missing, since this omission is expected and has been accepted as a false positive for this skill format.

Applied to files:

  • autogpt_platform/backend/backend/copilot/sdk/service.py
📚 Learning: 2026-04-13T13:11:09.987Z
Learnt from: 0ubbe
Repo: Significant-Gravitas/AutoGPT PR: 12764
File: autogpt_platform/frontend/src/app/(platform)/library/components/SitrepItem/SitrepItem.tsx:143-145
Timestamp: 2026-04-13T13:11:09.987Z
Learning: In Significant-Gravitas/AutoGPT `autogpt_platform/frontend`, `executionID` values used as URL query params (e.g. `activeItem=` in `SitrepItem.tsx`) are always UUIDs (e.g. `550e8400-e29b-41d4-a716-446655440000`). Their character set `[0-9a-f-]` contains no reserved URL characters, so `encodeURIComponent` or Next.js object-based `href` encoding is unnecessary. Do not flag direct UUID string interpolation into query strings as a URL-encoding issue.

Applied to files:

  • autogpt_platform/backend/backend/copilot/sdk/service.py
📚 Learning: 2026-03-30T11:49:37.770Z
Learnt from: majdyz
Repo: Significant-Gravitas/AutoGPT PR: 12604
File: autogpt_platform/backend/backend/copilot/sdk/security_hooks.py:165-171
Timestamp: 2026-03-30T11:49:37.770Z
Learning: In `autogpt_platform/backend/backend/copilot/sdk/security_hooks.py`, the `web_search_count` and `total_tool_call_count` circuit-breaker counters in `create_security_hooks` are intentionally per-turn (closure-local), not per-session. Hooks are recreated per stream invocation in `service.py`, so counters reset each turn. This is an accepted v1 design: it caps a single runaway turn (incident d2f7cba3: 179 WebSearch calls, $20.66). True per-session persistence via Redis is deferred to a later iteration. Do not flag these as a per-session vs. per-turn mismatch bug.

Applied to files:

  • autogpt_platform/backend/backend/copilot/sdk/service.py
📚 Learning: 2026-03-26T07:00:03.405Z
Learnt from: majdyz
Repo: Significant-Gravitas/AutoGPT PR: 12574
File: autogpt_platform/backend/backend/copilot/sdk/transcript.py:980-990
Timestamp: 2026-03-26T07:00:03.405Z
Learning: In `autogpt_platform/backend/backend/copilot/sdk/transcript.py`, `_rechain_tail` intentionally rewrites `parentUuid` for **all** tail entries (not just the first), because a single assistant turn can span multiple consecutive JSONL entries sharing the same `message.id` (e.g., a thinking entry + a tool_use entry). Their original `parentUuid` values may reference entries that were absorbed into the compressed prefix, so sequential rechaining of the entire tail is required to maintain a valid parent→child graph. The test `test_chains_multiple_tail_entries` validates this: the second tail entry's `parentUuid` is rewritten from its original value to the uuid of the first tail entry.

Applied to files:

  • autogpt_platform/backend/backend/copilot/sdk/service.py
📚 Learning: 2026-04-03T11:14:16.378Z
Learnt from: majdyz
Repo: Significant-Gravitas/AutoGPT PR: 12623
File: autogpt_platform/backend/backend/copilot/transcript_builder.py:30-34
Timestamp: 2026-04-03T11:14:16.378Z
Learning: In `autogpt_platform/backend/backend/copilot/transcript_builder.py` (and its re-export shim at `sdk/transcript_builder.py`), `TranscriptEntry.parentUuid` is typed `str` (not `str | None`) and root entries use `parentUuid=""` (empty string) to match the canonical `_messages_to_transcript` JSONL format. `_parse_entry`, `append_user`, and `append_assistant` all coerce `None` to `""`. Do NOT flag `parentUuid=""` as incorrect — it is the correct root marker. This was fixed in PR `#12623`, commit b753cb7d0b.

Applied to files:

  • autogpt_platform/backend/backend/copilot/sdk/service.py
📚 Learning: 2026-04-01T14:54:01.937Z
Learnt from: majdyz
Repo: Significant-Gravitas/AutoGPT PR: 12636
File: autogpt_platform/backend/backend/copilot/sdk/service.py:0-0
Timestamp: 2026-04-01T14:54:01.937Z
Learning: In Significant-Gravitas/AutoGPT (autogpt_platform), `claude_agent_max_transient_retries` (default=3) in `ChatConfig` counts **total attempts including the initial one**, not the number of extra retries. With the pre-incremented `transient_retries >= max_transient` guard in `service.py`, a value of 3 yields 3 total stream attempts (initial + 2 retries with exponential backoff: 1s, 2s). Do NOT flag this as an off-by-one — the `>=` check is intentional.

Applied to files:

  • autogpt_platform/backend/backend/copilot/sdk/service.py
📚 Learning: 2026-03-16T17:00:02.827Z
Learnt from: majdyz
Repo: Significant-Gravitas/AutoGPT PR: 12439
File: autogpt_platform/backend/backend/blocks/autogpt_copilot.py:0-0
Timestamp: 2026-03-16T17:00:02.827Z
Learning: In autogpt_platform/backend/backend/blocks/autogpt_copilot.py, the recursion guard uses two module-level ContextVars: `_copilot_recursion_depth` (tracks current nesting depth) and `_copilot_recursion_limit` (stores the chain-wide ceiling). On the first invocation, `_copilot_recursion_limit` is set to `max_recursion_depth`; nested calls use `min(inherited_limit, max_recursion_depth)`, so they can only lower the cap, never raise it. The entry/exit logic is extracted into module-level helper functions. This is the approved pattern for preventing runaway sub-agent recursion in AutogptCopilotBlock (PR `#12439`, commits 348e9f8e2 and 3b70f61b1).

Applied to files:

  • autogpt_platform/backend/backend/copilot/sdk/service.py
📚 Learning: 2026-03-16T16:30:30.764Z
Learnt from: Abhi1992002
Repo: Significant-Gravitas/AutoGPT PR: 12417
File: autogpt_platform/backend/backend/blocks/agent_mail/pods.py:62-74
Timestamp: 2026-03-16T16:30:30.764Z
Learning: In autogpt_platform/backend/backend/blocks/**/*.py, explicit try/except in the `run()` method is NOT required for standard error handling. The block framework's `_execute()` method in `_base.py` catches unhandled exceptions and re-raises them as `BlockExecutionError` or `BlockUnknownError`. Additionally, when a block yields `("error", message)`, `_execute()` immediately raises `BlockExecutionError` — so the `error` output port never propagates downstream. Explicit try/except is only needed when partial output must be controlled (e.g., attachment blocks that must skip yielding `content_base64` on failure).

Applied to files:

  • autogpt_platform/backend/backend/copilot/sdk/service.py
📚 Learning: 2026-04-03T13:53:33.653Z
Learnt from: Pwuts
Repo: Significant-Gravitas/AutoGPT PR: 12206
File: autogpt_platform/backend/snapshots/v2_unhandled_exception_500:1-5
Timestamp: 2026-04-03T13:53:33.653Z
Learning: In Significant-Gravitas/AutoGPT (autogpt_platform), the catch-all `Exception` handler in `autogpt_platform/backend/backend/api/utils/exceptions.py` (`_handle_error()`) intentionally surfaces `str(exc)` as the `detail` field in HTTP 500 responses for non-Prisma errors. This is by design: errors are logged server-side, and the detail helps API consumers report issues. Only `PrismaError` responses are sanitized (see commit ce6910b4a). Do not flag `str(exc)` in the generic 500 handler as an information disclosure issue; the snapshot `autogpt_platform/backend/snapshots/v2_unhandled_exception_500` ("connection refused") correctly reflects this behavior.

Applied to files:

  • autogpt_platform/backend/backend/copilot/sdk/service.py
📚 Learning: 2026-04-15T22:49:10.465Z
Learnt from: ntindle
Repo: Significant-Gravitas/AutoGPT PR: 11235
File: autogpt_platform/frontend/src/app/(platform)/admin/diagnostics/components/ExecutionsTable.tsx:0-0
Timestamp: 2026-04-15T22:49:10.465Z
Learning: In Significant-Gravitas/AutoGPT (autogpt_platform/frontend), `Sentry.captureException` is NOT required in `catch` blocks for React Query mutation error paths. React Query already handles error propagation internally, and the correct pattern is: toast notifications for mutation errors, ErrorCard for render/fetch errors. Only add `Sentry.captureException` for truly manual/unexpected exception paths outside of React Query's scope (e.g., standalone async utilities, event handlers not wired through React Query).

Applied to files:

  • autogpt_platform/backend/backend/copilot/sdk/service.py
📚 Learning: 2026-04-14T06:39:52.592Z
Learnt from: majdyz
Repo: Significant-Gravitas/AutoGPT PR: 12773
File: autogpt_platform/frontend/src/app/api/openapi.json:12803-12806
Timestamp: 2026-04-14T06:39:52.592Z
Learning: Repo: Significant-Gravitas/AutoGPT — autogpt_platform
Intentional message length caps:
- StreamChatRequest.message maxLength = 64000.
- QueuePendingMessageRequest.message maxLength = 32000 (matches PendingMessage.content).
Rationale: both feed the same LLM context window; pending must not exceed stream, and larger ceilings replace legacy 4000/16000.

Applied to files:

  • autogpt_platform/backend/backend/copilot/sdk/service.py
📚 Learning: 2026-02-26T17:02:22.448Z
Learnt from: Pwuts
Repo: Significant-Gravitas/AutoGPT PR: 12211
File: .pre-commit-config.yaml:160-179
Timestamp: 2026-02-26T17:02:22.448Z
Learning: Keep the pre-commit hook pattern broad for autogpt_platform/backend to ensure OpenAPI schema changes are captured. Do not narrow to backend/api/ alone, since the generated schema depends on Pydantic models across multiple directories (backend/data/, backend/blocks/, backend/copilot/, backend/integrations/, backend/util/). Narrowing could miss schema changes and cause frontend type desynchronization.

Applied to files:

  • autogpt_platform/backend/backend/copilot/sdk/service.py
📚 Learning: 2026-03-04T08:04:35.881Z
Learnt from: majdyz
Repo: Significant-Gravitas/AutoGPT PR: 12273
File: autogpt_platform/backend/backend/copilot/tools/workspace_files.py:216-220
Timestamp: 2026-03-04T08:04:35.881Z
Learning: In the AutoGPT Copilot backend, ensure that SVG images are not treated as vision image types by excluding 'image/svg+xml' from INLINEABLE_MIME_TYPES and MULTIMODAL_TYPES in tool_adapter.py; the Claude API supports PNG, JPEG, GIF, and WebP for vision. SVGs (XML text) should be handled via the text path instead, not the vision path.

Applied to files:

  • autogpt_platform/backend/backend/copilot/sdk/service.py
📚 Learning: 2026-03-05T15:42:08.207Z
Learnt from: ntindle
Repo: Significant-Gravitas/AutoGPT PR: 12297
File: .claude/skills/backend-check/SKILL.md:14-16
Timestamp: 2026-03-05T15:42:08.207Z
Learning: In Python files under autogpt_platform/backend (recursively), rely on poetry run format to perform formatting (Black + isort) and linting (ruff). Do not run poetry run lint as a separate step after poetry run format, since format already includes linting checks.

Applied to files:

  • autogpt_platform/backend/backend/copilot/sdk/service.py
📚 Learning: 2026-03-16T16:35:40.236Z
Learnt from: majdyz
Repo: Significant-Gravitas/AutoGPT PR: 12440
File: autogpt_platform/backend/backend/api/features/workflow_import.py:54-63
Timestamp: 2026-03-16T16:35:40.236Z
Learning: Avoid using the word 'competitor' in public-facing identifiers and text. Use neutral naming for API paths, model names, function names, and UI text. Examples: rename 'CompetitorFormat' to 'SourcePlatform', 'convert_competitor_workflow' to 'convert_workflow', '/competitor-workflow' to '/workflow'. Apply this guideline to files under autogpt_platform/backend and autogpt_platform/frontend.

Applied to files:

  • autogpt_platform/backend/backend/copilot/sdk/service.py
📚 Learning: 2026-04-15T02:43:36.890Z
Learnt from: ntindle
Repo: Significant-Gravitas/AutoGPT PR: 12780
File: autogpt_platform/backend/backend/copilot/tools/workspace_files.py:0-0
Timestamp: 2026-04-15T02:43:36.890Z
Learning: When reviewing Python exception handlers, do not flag `isinstance(e, X)` checks as dead/unreachable if the caught exception `X` is a subclass of the exception type being handled. For example, if `X` (e.g., `VirusScanError`) inherits from `ValueError` (directly or via an intermediate class) and it can be raised within an `except ValueError:` block, then `isinstance(e, X)` inside that handler is reachable and should not be treated as dead code.

Applied to files:

  • autogpt_platform/backend/backend/copilot/sdk/service.py
🔇 Additional comments (1)
autogpt_platform/backend/backend/copilot/sdk/service.py (1)

562-631: Good consolidation of interrupted-attempt state.

Centralizing rollback/restore into _InterruptedAttempt and carrying retryable / already_yielded via _HandledErrorInfo makes the final-failure path much easier to reason about, and it closes the stale-marker + handled-error contract gaps cleanly.

Also applies to: 2137-2149, 4045-4050, 4188-4195

…final-failure emit

CodeRabbit flagged that _flush_orphan_tool_uses_to_session (called from
_InterruptedAttempt.finalize) used state.adapter._flush_unresolved_tool_calls
with a # noqa: SLF001 suppressor. The private call mutates resolved_tool_calls
and flips has_unresolved_tool_calls to False, which caused the downstream
error-cleanup block at lines 4185-4200 to skip its own flush — UI spinners
on the client stayed open until page refresh because no cleanup events were
yielded after the early flush swallowed the unresolved state.

Changes:
- Rename _flush_unresolved_tool_calls → flush_unresolved_tool_calls (public)
  in response_adapter.py; update 3 internal call sites + 2 service.py sites.
  Drops the # noqa: SLF001 suppressor (no longer a private-access violation).
- _flush_orphan_tool_uses_to_session and _InterruptedAttempt.finalize now
  return the list[StreamBaseResponse] produced by the flush so the caller
  yields them to the client instead of re-flushing.
- Replace the three scattered post-loop error blocks (partial restore +
  redundant flush + stream_err yield + handled_error yield) with one
  consolidated block that: (a) calls _classify_final_failure → _FinalFailure,
  (b) yields finalize()'s events + _end_text_if_open, (c) yields one
  StreamError (unless handled_error.already_yielded=True). Fixes the
  double-flush skip-cleanup bug and eliminates duplicated error-text/code
  strings between history marker and SSE yield.
- _classify_final_failure now returns _FinalFailure(display_msg, code,
  retryable) instead of a (msg, retryable) tuple — single source of truth
  for in-history marker + SSE event so they can't drift.

Tests: +5 _classify_final_failure contract tests, +2 return-value assertions
on finalize/orphan-flush. All 1022 SDK tests pass (was 1012).
@github-actions github-actions Bot removed the size/l label Apr 25, 2026
@majdyz
Copy link
Copy Markdown
Contributor Author

majdyz commented Apr 25, 2026

🤖 @coderabbitai — the duplicate-comments section in this review (line 634-650: "Don't consume the adapter flush before the client cleanup path runs") was already addressed in 6576bf561:\n\n- _flush_unresolved_tool_calls renamed to flush_unresolved_tool_calls (public) and all call sites updated. The # noqa: SLF001 suppressor is gone.\n- _flush_orphan_tool_uses_to_session and _InterruptedAttempt.finalize now return the list[StreamBaseResponse] they produce. The post-loop caller yields those events directly instead of re-flushing — no more double-flush, no skipped UI cleanup.\n- The three former post-loop blocks (partial restore + redundant re-flush + two separate yield StreamError sites) collapsed to one consolidated block driven by _classify_final_failure returning a _FinalFailure(display_msg, code, retryable) dataclass. History marker and SSE yield now share one source of truth.

@coderabbitai
Copy link
Copy Markdown
Contributor

coderabbitai Bot commented Apr 25, 2026

Tip

For best results, initiate chat on the files or code changes.

@majdyz: Thanks for the detailed breakdown — the refactor is a clean resolution on all three axes:

  1. Public flush methodflush_unresolved_tool_calls (public) removes the # noqa: SLF001 workaround entirely.
  2. Return-value propagation_flush_orphan_tool_uses_to_session and _InterruptedAttempt.finalize returning list[StreamBaseResponse] eliminates the double-flush and guarantees the UI cleanup events aren't silently dropped.
  3. Single source of truth — collapsing the three post-loop blocks into one _classify_final_failure_FinalFailure path ensures the history marker and SSE StreamError always stay in sync.

✏️ Learnings added
Learnt from: majdyz
URL: https://github.com/Significant-Gravitas/AutoGPT/pull/12918

Timestamp: 2026-04-25T02:53:53.938Z
Learning: In `autogpt_platform/backend/backend/copilot/sdk/service.py` (PR `#12918`, commit 6576bf561):
- `_flush_unresolved_tool_calls` was renamed to `flush_unresolved_tool_calls` (public); all call sites updated, `# noqa: SLF001` suppressor removed.
- `_flush_orphan_tool_uses_to_session` and `_InterruptedAttempt.finalize` both return `list[StreamBaseResponse]`; the post-loop caller yields those events directly to avoid double-flush and skipped UI cleanup events.
- The three former post-loop blocks (partial restore + redundant re-flush + two separate `yield StreamError` sites) are collapsed into a single block driven by `_classify_final_failure` returning a `_FinalFailure(display_msg, code, retryable)` dataclass, so history marker and SSE yield share one source of truth.
Do NOT flag double-flush risk or mismatched history/SSE marker as issues in the post-loop section of `stream_chat_completion_sdk`.

Note: Learnings are effective only in the context of similar code segments. To apply general coding guidelines, please configure review instructions. You can manage existing learnings in the app.


🧠 Learnings used
Learnt from: majdyz
Repo: Significant-Gravitas/AutoGPT PR: 12440
File: autogpt_platform/backend/backend/copilot/workflow_import/converter.py:0-0
Timestamp: 2026-03-17T10:57:12.953Z
Learning: In Significant-Gravitas/AutoGPT PR `#12440`, `autogpt_platform/backend/backend/copilot/workflow_import/converter.py` was fully rewritten (commit 732960e2d) to no longer make direct LLM/OpenAI API calls. The converter now builds a structured text prompt for AutoPilot/CoPilot instead. There is no `response.choices` access or any direct LLM client usage in this file. Do not flag `response.choices` access or LLM client initialization patterns as issues in this file.

Learnt from: majdyz
Repo: Significant-Gravitas/AutoGPT PR: 12356
File: autogpt_platform/backend/backend/copilot/constants.py:9-12
Timestamp: 2026-03-10T08:39:22.025Z
Learning: In Significant-Gravitas/AutoGPT PR `#12356`, the `COPILOT_SYNTHETIC_ID_PREFIX = "copilot-"` check in `create_auto_approval_record` (human_review.py) is intentional and safe. The `graph_exec_id` passed to this function comes from server-side `PendingHumanReview` DB records (not from user input); the API only accepts `node_exec_id` from users. Synthetic `copilot-*` IDs are only ever created server-side in `run_block.py`. The prefix skip avoids a DB lookup for a `AgentGraphExecution` record that legitimately does not exist for CoPilot sessions, while `user_id` scoping is enforced at the auth layer and on the resulting auto-approval record.

Learnt from: majdyz
Repo: Significant-Gravitas/AutoGPT PR: 12879
File: autogpt_platform/backend/backend/copilot/baseline/service.py:0-0
Timestamp: 2026-04-22T05:57:34.861Z
Learning: In `autogpt_platform/backend/backend/copilot/baseline/service.py`, the approved pattern for `_run_task_subagent` (PR `#12879`, commit 187f0a5) uses a nested `try/except Exception` inside an outer `try/finally`. The outer `finally` block resets `_TASK_DEPTH_VAR` (via `_TASK_DEPTH_VAR.reset(token)`) AND calls `_absorb_inner_usage(parent_state, inner_state)` unconditionally, so both the depth ContextVar and usage roll-up are guaranteed on all exit paths including `CancelledError`/`KeyboardInterrupt`/`SystemExit`. The inner `except Exception` catches and converts failures into a `TaskResponse` error payload that is returned as `StreamToolOutputAvailable`. Do NOT flag missing ContextVar reset or usage roll-up on BaseException paths in this function.

Learnt from: majdyz
Repo: Significant-Gravitas/AutoGPT PR: 12796
File: autogpt_platform/backend/backend/api/features/chat/routes.py:504-527
Timestamp: 2026-04-16T12:33:44.990Z
Learning: In `autogpt_platform/backend/backend/api/features/chat/routes.py`, `get_session` (PR `#12796`, commit 3771bfad9c1) closes the TOCTOU race between the initial `stream_registry.get_active_session()` pre-check and `get_chat_messages_paginated()` with a post-check re-verification: after the DB fetch, if `is_initial_load and active_session is not None`, it calls `get_active_session` a second time; if `post_active is None` (stream completed during the window), it resets `from_start=True`, `forward_paginated=True`, and re-fetches messages from sequence 0. Do NOT flag the double `get_active_session` call pattern as redundant — it is the intentional TOCTOU mitigation for pagination direction selection.

Learnt from: majdyz
Repo: Significant-Gravitas/AutoGPT PR: 12445
File: autogpt_platform/backend/backend/copilot/sdk/service.py:1071-1072
Timestamp: 2026-03-17T06:48:26.471Z
Learning: In Significant-Gravitas/AutoGPT (autogpt_platform), the AI SDK enforces `z.strictObject({type, errorText})` on SSE `StreamError` responses, so additional fields like `retryable: bool` cannot be added to `StreamError` or serialized via `to_sse()`. Instead, retry signaling for transient Anthropic API errors is done via the `COPILOT_RETRYABLE_ERROR_PREFIX` constant prepended to persisted session messages (in `ChatMessage.content`). The frontend detects retryable errors by checking `markerType === "retryable_error"` from `parseSpecialMarkers()` — no SSE schema changes and no string matching on error text. This pattern was established in PR `#12445`, commit 64d82797b.

Learnt from: majdyz
Repo: Significant-Gravitas/AutoGPT PR: 12797
File: autogpt_platform/backend/backend/copilot/sdk/service.py:1991-2021
Timestamp: 2026-04-15T13:44:34.273Z
Learning: In `autogpt_platform/backend/backend/copilot/sdk/service.py` (`_run_stream_attempt`), the pre-create block (PR `#12797`) intentionally does NOT call `state.transcript_builder.append_assistant(...)` when inserting the empty assistant placeholder into `ctx.session.messages`. The transcript is left ending at the `tool_result` entry (N entries) while `message_count` metadata is N+1. This mismatch is benign and deliberate: on the next `--resume`, the SDK sees the transcript ending at `tool_result` and correctly regenerates the assistant response. Pre-appending the assistant turn to the transcript would suppress regeneration while leaving `session.messages[-1].content = ""` permanently (worse outcome). On the gap-fallback path, `transcript_msg_count (N+1) >= msg_count-1 (N)` means no gap is injected for the empty placeholder, which is correct because injecting an empty assistant message as context would mislead the SDK. Do NOT flag this transcript/message_count discrepancy as a bug.

Learnt from: majdyz
Repo: Significant-Gravitas/AutoGPT PR: 12701
File: .claude/skills/orchestrate/scripts/verify-complete.sh:120-121
Timestamp: 2026-04-07T18:08:03.548Z
Learning: In Significant-Gravitas/AutoGPT, verify-complete.sh (`.claude/skills/orchestrate/scripts/verify-complete.sh`) uses `commits[-1].committedDate` (not `updatedAt`) to identify stale CHANGES_REQUESTED reviews. This is intentional: `updatedAt` changes on any PR activity (bot comments, label changes, description edits), which would falsely classify a reviewer's CHANGES_REQUESTED as stale — a silent false negative. The `committedDate` edge case (commit created locally before a review but pushed after) only causes a false positive (unnecessary re-brief), which is the safer failure mode. Do not suggest switching to `updatedAt` for this comparison.

Learnt from: Pwuts
Repo: Significant-Gravitas/AutoGPT PR: 12284
File: autogpt_platform/frontend/src/app/api/openapi.json:11897-11900
Timestamp: 2026-03-04T23:58:18.476Z
Learning: Repo: Significant-Gravitas/AutoGPT — PR `#12284`
Backend/frontend OpenAPI codegen convention: In backend/api/features/store/model.py, the StoreSubmission and StoreSubmissionAdminView models define submitted_at: datetime | None, changes_summary: str | None, and instructions: str | None with no default. This is intentional to produce “required but nullable” fields in OpenAPI (properties appear in required[] and use anyOf [type, null]). This matches Prisma’s submittedAt DateTime? and changesSummary String?. Do not flag this as a required/nullable mismatch.

Learnt from: majdyz
Repo: Significant-Gravitas/AutoGPT PR: 12632
File: autogpt_platform/backend/backend/copilot/tools/workspace_files.py:0-0
Timestamp: 2026-04-01T04:17:43.495Z
Learning: In autogpt_platform/backend/backend/copilot/sdk/tool_adapter.py (PR `#12632`, commit 12ae03c), the per-tool `BaseTool.read_only` property approach was removed. Instead, `readOnlyHint=True` (via `ToolAnnotations`) is applied unconditionally to ALL tools — including side-effect tools like `bash_exec` and `write_workspace_file` — to enable fully parallel dispatch by the Anthropic SDK/CLI. Do not flag tools with mutating operations (e.g. save_to_path, write operations) for having `readOnlyHint=True`; this is intentional and E2E validated (3x bash_exec(sleep 3) completed in 3.3s vs 9s sequential).

Learnt from: majdyz
Repo: Significant-Gravitas/AutoGPT PR: 0
File: :0-0
Timestamp: 2026-04-21T11:41:05.877Z
Learning: In `autogpt_platform/backend/backend/copilot/baseline/service.py` (PR `#12870`, commits 080d42b9d and 3d7b38162), the `_close_reasoning_block_if_open(state)` helper centralises all four reasoning-block-close call sites (text branch, tool_calls branch, stream-end, exception path). The outer `finally` block of `_baseline_llm_caller` calls this helper plus stripper flush + `StreamTextEnd` to guarantee matched end events are emitted before `StreamFinishStep` on both normal and exception paths. Do NOT flag duplicated close logic or missing reasoning-end-on-exception as issues in this function.

Learnt from: majdyz
Repo: Significant-Gravitas/AutoGPT PR: 12773
File: autogpt_platform/backend/backend/blocks/autopilot.py:631-638
Timestamp: 2026-04-14T07:35:11.464Z
Learning: In `autogpt_platform/backend/backend/copilot/executor/utils.py`, `CoPilotExecutionEntry` includes a `permissions: CopilotPermissions | None` field (added in PR `#12773` / commit a0184c87b9). `enqueue_copilot_turn` accepts and serializes this field into the queue entry, `_enqueue_for_recovery` in `autopilot.py` accepts and forwards `permissions` to `enqueue_copilot_turn`, and `_execute_async` in `processor.py` restores `entry.permissions` and passes it into `stream_chat_completion_sdk`/`stream_chat_completion_baseline` via `set_execution_context`. This ensures recovered sub-agent turns respect the same tool/block permission ceiling as the original in-process execution (mirroring `_merge_inherited_permissions`). Do NOT flag recovered turns as losing their permission ceiling — it is now fully propagated through the queue.

Learnt from: majdyz
Repo: Significant-Gravitas/AutoGPT PR: 12773
File: autogpt_platform/backend/backend/copilot/pending_messages.py:52-64
Timestamp: 2026-04-14T14:36:25.545Z
Learning: In `autogpt_platform/backend/backend/copilot` (PR `#12773`, commit d7bced0c6): when draining pending messages into `session.messages`, each message's text is sanitized via `strip_user_context_tags` before persistence to prevent user-controlled `<user_context>` injection from bypassing the trusted server-side context prefix. Additionally, if `upsert_chat_session` fails after draining, the drained `PendingMessage` objects are requeued back to Redis to avoid silent message loss. Do NOT flag the drain-then-requeue pattern as redundant — it is the intentional failure-resilience strategy for the pending buffer.

Learnt from: majdyz
Repo: Significant-Gravitas/AutoGPT PR: 0
File: :0-0
Timestamp: 2026-04-23T00:07:27.117Z
Learning: In `autogpt_platform/backend/backend/copilot/sdk/service.py`, background tasks that persist cost or emit Langfuse backfill (e.g. the cost-reconcile task) must be anchored to `_background_tasks` using `_background_tasks.add(task)` and `task.add_done_callback(_background_tasks.discard)`, mirroring the existing pattern at lines 3063 / 4232 / 4256. This prevents the asyncio task from being garbage-collected before persistence or Langfuse emission completes. Do NOT flag the absence of this anchoring as acceptable in this file. Established in PR `#12889` commit 5ce3d0388.

Learnt from: majdyz
Repo: Significant-Gravitas/AutoGPT PR: 0
File: :0-0
Timestamp: 2026-04-23T01:26:38.257Z
Learning: In `autogpt_platform/backend/backend/copilot/sdk/service.py`, `langfuse_trace_id = get_client().get_current_trace_id()` must be captured under the `if _lf_span is not None:` guard (before `_lf_span` is torn down), NOT under `if _otel_ctx is not None:`. The `_otel_ctx` guard is too narrow: if `propagate_attributes().__enter__()` raises, `_otel_ctx` is never assigned, and placing the trace-id capture there would silently orphan the `openrouter-cost-reconcile` event from its parent span. Established in PR `#12889` commit d243bf6c9.

Learnt from: majdyz
Repo: Significant-Gravitas/AutoGPT PR: 12881
File: autogpt_platform/backend/backend/copilot/sdk/service.py:0-0
Timestamp: 2026-04-22T12:26:42.571Z
Learning: In `autogpt_platform/backend/backend/copilot/sdk/service.py`, `_resolve_sdk_model_for_request`: when a per-user LaunchDarkly model value fails `_normalize_model_name` (e.g. a `moonshotai/kimi-*` slug in direct-Anthropic mode), the fallback must be tier-specific — `config.thinking_advanced_model` for advanced tier, `config.thinking_standard_model` for standard tier — NOT the generic `_resolve_sdk_model()` (which is standard-only and returns None under subscription mode). If the tier-specific config default also fails `_normalize_model_name`, re-raise the original LD error; this is a deployment-level misconfiguration that `model_validator` should have caught at startup. Established in PR `#12881` commit 637d2fef5.

Learnt from: majdyz
Repo: Significant-Gravitas/AutoGPT PR: 12814
File: autogpt_platform/backend/backend/copilot/model.py:0-0
Timestamp: 2026-04-16T13:28:28.641Z
Learning: In `autogpt_platform/backend/backend/copilot/model.py` (PR `#12814`, commit 259d37083): `append_and_save_message` uses `async with _get_session_lock(session_id)` — the same shared context manager used across the module — which internally acquires `redis-py`'s built-in `Lock` (key `copilot:session_lock:{session_id}`, timeout=10s, blocking_timeout=2s) via an atomic Lua-script. Lock release is also owner-verified via Lua so a slow pod can never delete a lock it no longer holds. On Redis failure the lock is skipped with a warning; the in-function idempotency check (`session.messages[-1].role` and `.content` comparison) still runs as a fallback. Do NOT expect a raw `redis.set(nx=True)` / `redis.delete()` pattern here — that intermediate approach was replaced in commit 259d37083.

If you found this review helpful, would you consider giving us a shout-out on X?

Thank you for using CodeRabbit!

Copy link
Copy Markdown
Contributor

@coderabbitai coderabbitai Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Caution

Some comments are outside the diff and can’t be posted inline due to platform limitations.

⚠️ Outside diff range comments (1)
autogpt_platform/backend/backend/copilot/sdk/service.py (1)

4114-4152: ⚠️ Potential issue | 🟠 Major

Keep mid-stream transient failures retryable.

When Line 4114 short-circuits because content was already streamed, the transient branch below never runs. A mid-stream ECONNRESET/429 then falls through to _classify_final_failure(..., stream_err=...), which persists a non-retryable sdk_stream_error marker instead of the retryable marker the frontend expects for “Try again”.

💡 Suggested fix
                 if events_yielded > 0:
                     # Events were already sent to the frontend and cannot be
                     # unsent.  Retrying would produce duplicate/inconsistent
                     # output, so treat this as a final error.
                     logger.warning(
                         "%s Not retrying — %d events already yielded",
                         log_prefix,
                         events_yielded,
                     )
+                    if is_transient:
+                        interrupted.handled_error = _HandledErrorInfo(
+                            error_msg=FRIENDLY_TRANSIENT_MSG,
+                            code="transient_api_error",
+                            retryable=True,
+                            already_yielded=False,
+                        )
                     skip_transcript_upload = True
                     ended_with_stream_error = True
                     break

Based on learnings, retry signaling for transient Anthropic API errors is done via COPILOT_RETRYABLE_ERROR_PREFIX in persisted session messages, and the frontend derives markerType === "retryable_error" from that marker.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@autogpt_platform/backend/backend/copilot/sdk/service.py` around lines 4114 -
4152, The current early-exit when events_yielded > 0 prevents transient-error
retry logic from running and causes mid-stream transient failures to be marked
non-retryable; update the control flow so that transient errors are handled even
if events_yielded > 0: when is_transient is true (and even if events_yielded >
0) call _next_transient_backoff(...) and, if backoff is returned, run the async
backoff loop via _do_transient_backoff(...) and continue retrying instead of
immediately setting skip_transcript_upload/ended_with_stream_error; only fall
through to the non-retryable final error path if transient retries are exhausted
(set transient_exhausted and persist the retryable marker using the same retry
signaling used elsewhere, e.g., COPILOT_RETRYABLE_ERROR_PREFIX), keeping
references to events_yielded, is_transient, _next_transient_backoff,
_do_transient_backoff, and transient_exhausted to locate the changes.
♻️ Duplicate comments (1)
autogpt_platform/backend/backend/copilot/sdk/service.py (1)

4287-4293: ⚠️ Potential issue | 🟠 Major

Don't persist "Operation cancelled" after a successful turn.

After Line 4029 clears interrupted on success, this block still calls interrupted.finalize(...) for any later CancelledError/disconnect. If that happens while yielding StreamUsage or trailing events, the turn can finish successfully and still get a persisted cancellation marker appended on refresh.

💡 Suggested fix
-        if not ended_with_stream_error:
-            interrupted.finalize(session, state, display_msg, retryable=is_transient)
+        if not ended_with_stream_error:
+            has_interrupted_state = (
+                bool(interrupted.partial) or interrupted.handled_error is not None
+            )
+            if has_interrupted_state:
+                interrupted.finalize(session, state, display_msg, retryable=is_transient)
+            elif not isinstance(e, asyncio.CancelledError) and not _is_sdk_disconnect_error(e):
+                _append_error_marker(session, display_msg, retryable=is_transient)
             logger.debug(
                 "%s Appended error marker, will be persisted in finally",
                 log_prefix,
             )
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@autogpt_platform/backend/backend/copilot/sdk/service.py` around lines 4287 -
4293, The finalize call unconditionally persists a cancellation marker even when
interrupted was cleared earlier on success; change the guard so we only call
interrupted.finalize(...) when interrupted still represents a pending
cancellation (e.g. wrap the existing if not ended_with_stream_error: with an
additional check like if interrupted and interrupted.is_pending()/not
interrupted.is_cleared(): interrupted.finalize(session, state, display_msg,
retryable=is_transient)), using the object's existing truthiness or its status
method (is_pending/is_cleared) to locate the correct condition in the code paths
that touch interrupted and finalize.
🧹 Nitpick comments (1)
autogpt_platform/backend/backend/copilot/sdk/interrupted_partial_test.py (1)

39-57: Optional: tighten type hints on test helpers.

_tool_output's output param, _adapter_with_unresolved's return type, and the inner _flush(out: list) element type are unannotated. Adds minimal value but keeps the helpers self-documenting and consistent with the typed helpers below.

♻️ Proposed refinement
-def _tool_output(tool_call_id: str, output) -> StreamToolOutputAvailable:
+def _tool_output(
+    tool_call_id: str, output: str | dict[str, object]
+) -> StreamToolOutputAvailable:
     return StreamToolOutputAvailable(
         toolCallId=tool_call_id, toolName="t", output=output
     )


-def _adapter_with_unresolved(responses: list[StreamToolOutputAvailable]):
+def _adapter_with_unresolved(
+    responses: list[StreamToolOutputAvailable],
+) -> MagicMock:
     """Stub _RetryState whose adapter flushes the given responses."""
     adapter = MagicMock()
     adapter.has_unresolved_tool_calls = bool(responses)

-    def _flush(out: list) -> None:
+    def _flush(out: list[StreamToolOutputAvailable]) -> None:
         out.extend(responses)
         adapter.has_unresolved_tool_calls = False
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@autogpt_platform/backend/backend/copilot/sdk/interrupted_partial_test.py`
around lines 39 - 57, Annotate the test helpers: import typing.Any and
typing.List if not already, change _tool_output signature to def
_tool_output(tool_call_id: str, output: Any) -> StreamToolOutputAvailable,
annotate _adapter_with_unresolved to return a MagicMock (def
_adapter_with_unresolved(responses: list[StreamToolOutputAvailable]) ->
MagicMock) and type the inner flush parameter as def _flush(out:
List[StreamToolOutputAvailable]) -> None; this keeps helpers self-documenting
and consistent with the typed helpers below.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Outside diff comments:
In `@autogpt_platform/backend/backend/copilot/sdk/service.py`:
- Around line 4114-4152: The current early-exit when events_yielded > 0 prevents
transient-error retry logic from running and causes mid-stream transient
failures to be marked non-retryable; update the control flow so that transient
errors are handled even if events_yielded > 0: when is_transient is true (and
even if events_yielded > 0) call _next_transient_backoff(...) and, if backoff is
returned, run the async backoff loop via _do_transient_backoff(...) and continue
retrying instead of immediately setting
skip_transcript_upload/ended_with_stream_error; only fall through to the
non-retryable final error path if transient retries are exhausted (set
transient_exhausted and persist the retryable marker using the same retry
signaling used elsewhere, e.g., COPILOT_RETRYABLE_ERROR_PREFIX), keeping
references to events_yielded, is_transient, _next_transient_backoff,
_do_transient_backoff, and transient_exhausted to locate the changes.

---

Duplicate comments:
In `@autogpt_platform/backend/backend/copilot/sdk/service.py`:
- Around line 4287-4293: The finalize call unconditionally persists a
cancellation marker even when interrupted was cleared earlier on success; change
the guard so we only call interrupted.finalize(...) when interrupted still
represents a pending cancellation (e.g. wrap the existing if not
ended_with_stream_error: with an additional check like if interrupted and
interrupted.is_pending()/not interrupted.is_cleared():
interrupted.finalize(session, state, display_msg, retryable=is_transient)),
using the object's existing truthiness or its status method
(is_pending/is_cleared) to locate the correct condition in the code paths that
touch interrupted and finalize.

---

Nitpick comments:
In `@autogpt_platform/backend/backend/copilot/sdk/interrupted_partial_test.py`:
- Around line 39-57: Annotate the test helpers: import typing.Any and
typing.List if not already, change _tool_output signature to def
_tool_output(tool_call_id: str, output: Any) -> StreamToolOutputAvailable,
annotate _adapter_with_unresolved to return a MagicMock (def
_adapter_with_unresolved(responses: list[StreamToolOutputAvailable]) ->
MagicMock) and type the inner flush parameter as def _flush(out:
List[StreamToolOutputAvailable]) -> None; this keeps helpers self-documenting
and consistent with the typed helpers below.

ℹ️ Review info
⚙️ Run configuration

Configuration used: Organization UI

Review profile: CHILL

Plan: Pro

Run ID: 3e4c17da-2cf0-43b1-897b-d917faab2b60

📥 Commits

Reviewing files that changed from the base of the PR and between 2e7c5fb and 6576bf5.

📒 Files selected for processing (3)
  • autogpt_platform/backend/backend/copilot/sdk/interrupted_partial_test.py
  • autogpt_platform/backend/backend/copilot/sdk/response_adapter.py
  • autogpt_platform/backend/backend/copilot/sdk/service.py

…-limit

Mode 1 (rate-limit at turn start, user message never persisted): the
backend's `check_rate_limit` raises BEFORE `append_and_save_message`, so
when a 429 fires the user's text only exists in the optimistic `useChat`
bubble — refresh or even a successful retry would lose it.

See `autogpt_platform/backend/backend/api/features/chat/routes.py:916-922`
(rate-limit check) and `routes.py:945` (later append-and-save) — backend
can't recover this on its own.

New flow on 429:
  - drop the optimistic user bubble (since DB has no record of it),
  - push `lastSubmittedMsgRef.current` back into the composer via the
    existing `setInitialPrompt` slot — same path URL pre-fills use, so
    `useChatInput`'s `consumeInitialPrompt` effect picks it up
    automatically,
  - clear `lastSubmittedMsgRef` so dedup doesn't block re-send.

In-memory only; refresh-survival is a separate follow-up.
@github-actions github-actions Bot added the platform/frontend AutoGPT Platform - Front end label Apr 25, 2026
@majdyz
Copy link
Copy Markdown
Contributor Author

majdyz commented Apr 25, 2026

E2E Live-Stack Test Report — PR #12918

Date: 2026-04-25
Tested HEAD: 40349c787 (after frontend draft-restore landed)
Worktree: /Users/majdyz/Code/AutoGPT2
Mode: native dev stack — poetry run app + docker-compose infra deps + real CHAT_USE_CLAUDE_AGENT_SDK=true + real Anthropic API key
Driver: direct REST API (UI onboarding skipped to reduce ceremony — same backend code path)

Verdict: APPROVE — all live scenarios PASS, both modes of SECRT-2275 verified end-to-end.

Scenario 1: Happy path

  • Command: POST /api/chat/sessions/{id}/stream with "Reply with exactly: hello world"
  • Expected: Clean SSE sequence, assistant + user msgs in DB.
  • Actual: start → start-step → text-start → text-delta → text-end → finish-step → usage → finish → DONE. DB has [user("Reply with exactly: hello world"), assistant("hello world")].
  • Result: PASS — no regression from the consolidation refactor.

Scenario 2: Rate-limit at turn start (Mode 1)

  • Setup: CHAT_DAILY_COST_LIMIT_MICRODOLLARS=1 + injected 999_999_999 µ$ into copilot:cost:daily:{user}:2026-04-25 redis key, restarted backend.
  • Command: POST a new message into the existing 6-msg session.
  • Expected: HTTP 429, prior history intact, rejected user message NOT persisted (since backend raises before append_and_save_message).
  • Actual:
    • HTTP 429, body: {"detail":"You've reached your daily usage limit. Resets in 20h 0m."}
    • GET /api/chat/usage: percent_used: 100.0
    • GET /api/chat/sessions/{id}: returns 200, 6 prior msgs intact, latest user message NOT in DB.
    • GET /api/chat/sessions (list): 200 — chat is fully functional.
  • Result: PASS — confirms backend cannot recover the lost user text by design (rate-limit check at routes.py:916-922 raises before routes.py:945 append_and_save_message). This is the gap the frontend draft-restore in 40349c787 closes.

Scenario 3: Synthetic tool-call limit graceful finish

  • Setup: CHAT_AGENT_MAX_TURNS=2, restarted backend.
  • Command: "Run three bash commands one at a time: uname -a, pwd, whoami." in a new session.
  • Expected: SDK CLI hits the 2-turn cap mid-task, finishes gracefully per PR fix(backend/copilot): raise baseline tool-round limit to 100 + graceful finish hint #12892.
  • Actual: HTTP 200, zero error events. SSE has 2 tool-input-available, 2 tool-output-available, 2 start-step/finish-step, 2 reasoning-start/end, 1 finish. DB has 7 rows: user → reasoning → assistant(tool_calls) → tool(uname result) → reasoning → assistant(text formatted result) → tool(pwd result).
  • Result: PASS — middle conversation fully preserved; only the third command (whoami) didn't run, exactly per max-turns design.

Scenario 4: Mid-stream failure (Mode 2 — the fix's actual target path)

  • Setup: Added a temp CHAT_TEST_INJECT_FAIL_AFTER_TOOL env-gated raise RuntimeError("ECONNRESET (test injection)") in _dispatch_response right after the first tool_result is appended to session.messages. Re-used the 6-msg session.

  • Command: "Run two bash commands one at a time: uname -a, then pwd."

  • Expected: Inject fires after first tool round → SDK retry-loop's except Exception fires with events_yielded > 0 → my fix's _InterruptedAttempt.capture() then _classify_final_failure() then finalize() re-attaches partial + adds error marker, all 6 prior msgs untouched.

  • Actual: SSE ends with error: "SDK stream error: ECONNRESET (test injection after first tool)". DB after: 11 messages

    • rows 1-6: 6 prior msgs intact
    • row 7: user (the new turn's input)
    • row 8: reasoning ("The user wants me to run two bash commands one at a time…")
    • row 9: assistant with tool_calls=[toolu_01Dy…]
    • row 10: tool with tool_call_id=toolu_01Dy… (the uname -a partial result — the work that would have been lost without the fix)
    • row 11: assistant with [__COPILOT_ERROR_f7a1__] SDK stream error: ECONNRESET …

    tool_calls[].idtool_msg.tool_call_id paired 1:1 (no orphan).

  • Result: PASS — the fix's core invariant (partial assistant work + tool result + error marker preserved across rollback, prior session history untouched) holds end-to-end on real SDK + real API.

Scenario 5: Post-cleanup verification

  • Setup: Reverted the temp injection, removed env var, synced working tree to origin/40349c787, restarted backend.
  • Command: Happy-path send in a new session.
  • Actual: Clean SSE sequence, 2 msgs persisted, zero error events.
  • Result: PASS — no residual state from injection.

Cleanup

  • Temp injection removed from working tree (git diff clean against origin/40349c787).
  • Env vars CHAT_DAILY_COST_LIMIT_MICRODOLLARS, CHAT_AGENT_MAX_TURNS, CHAT_TEST_INJECT_FAIL_AFTER_TOOL reverted.
  • Redis daily-cost counter reset.
  • Testing lock released.

Notes

  • Frontend (Mode 1) and backend (Mode 2) fixes are now both on 40349c787. The 1012 SDK tests + 48 retry-scenarios + 14 interrupted-partial unit tests + 3 frontend tests already verified the deterministic paths offline; this report adds end-to-end live confirmation against real Anthropic SDK + CLI tool execution.

majdyz and others added 2 commits April 25, 2026 11:57
Adds two cases that the existing 429 test did not exercise so codecov/patch
clears the 80% threshold:
1. The setMessages updater is invoked but no-ops when the trailing message
   is not a user bubble (assistant reply already landed).
2. The composer-restore branch is skipped entirely when no unsent text was
   captured (lastSubmittedMsgRef is null at error time).
Replace `transcript_snap: object` + `# type: ignore[arg-type]` on the
restore() call with a `TranscriptSnapshot` type alias exported from
transcript_builder, so `_InterruptedAttempt.capture` is fully typed.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

platform/backend AutoGPT Platform - Back end platform/frontend AutoGPT Platform - Front end size/xl

Projects

Status: ✅ Done
Status: Done

Development

Successfully merging this pull request may close these issues.

2 participants