fix(platform): try-compact-retry for prompt-too-long errors in CoPilot SDK by majdyz · Pull Request #12413 · Significant-Gravitas/AutoGPT

majdyz · 2026-03-13T22:56:16Z

Summary

When the Claude SDK returns a prompt-too-long error (e.g. transcript + query exceeds the model's context window), the streaming loop now retries with escalating fallbacks instead of failing immediately:

Attempt 1: Use the transcript as-is (normal path)
Attempt 2: Compact the transcript via LLM summarization (compact_transcript) and retry
Attempt 3: Drop the transcript entirely and fall back to DB-reconstructed context (_build_query_message)

If all 3 attempts fail, a StreamError(code="prompt_too_long") is yielded to the frontend.

Key changes

service.py

Add _is_prompt_too_long(err) — pattern-matches SDK exceptions for prompt-length errors (prompt is too long, prompt_too_long, context_length_exceeded, request too large)
Wrap async with ClaudeSDKClient in a 3-attempt retry for loop with compaction/fallback logic
Move current_message, _build_query_message, and _prepare_file_attachments before the retry loop (computed once, reused)
Skip transcript upload in finally when transcript_caused_error (avoids persisting a broken/empty transcript)
Reset stream_completed between retry iterations
Document outer-scope variable contract in _run_stream_attempt closure (which variables are reassigned between retries vs read-only)

transcript.py

Add compact_transcript(content, log_prefix, model) — converts JSONL → messages → compress_context (LLM summarization with truncation fallback) → JSONL
Add helpers: _flatten_assistant_content, _flatten_tool_result_content, _transcript_to_messages, _messages_to_transcript, _run_compression
Returns None when compaction fails or transcript is already within budget (signals caller to fall through to DB fallback)
Truncation fallback wrapped in 30s timeout to prevent unbounded CPU time on large transcripts
Accepts model parameter to avoid creating a new ChatConfig() on every call

util/prompt.py

Fix _truncate_middle_tokens edge case: returns empty string when max_tok < 1, properly handles max_tok < 3

config.py

E2B sandbox timeout raised from 5 min to 15 min to accommodate compaction retries

prompt_too_long_test.py (new, 45 tests)

_is_prompt_too_long positive/negative patterns, case sensitivity, BaseException handling
Flatten helpers for assistant/tool_result content blocks
_transcript_to_messages / _messages_to_transcript roundtrip, strippable types, empty content
compact_transcript async tests: too few messages, not compacted, successful compaction, compression failure

retry_scenarios_test.py (new, 27 tests)

Full retry state machine simulation covering all 8 scenarios:
1. Normal flow (no retry)
2. Compact succeeds → retry succeeds
3. Compact fails → DB fallback succeeds
4. No transcript → DB fallback succeeds
5. Double fail → DB fallback on attempt 3
6. All 3 attempts exhausted
7. Non-prompt-too-long error (no retry)
8. Compaction returns identical content → DB fallback
Edge cases: nested exceptions, case insensitivity, unicode content, large transcripts, resume-after-compaction flow

Shared test fixtures (conftest.py)

Extracted build_test_transcript helper used across 3 test files to eliminate duplication

Test plan

…reserve compaction The TranscriptBuilder accumulates all raw SDK stream messages including pre-compaction content. When the CLI compacts mid-stream, the uploaded transcript still contains the full uncompacted messages, causing "Prompt is too long" errors on the next --resume turn. Fix: - Read the CLI's own session file (~/.claude/projects/<cwd>/*.jsonl) which reflects mid-stream compaction, instead of TranscriptBuilder - Extract _cli_project_dir() helper, refactor cleanup_cli_project_dir - On "Prompt is too long" error with --resume, delete the oversized transcript so the next turn falls back to compression-based context

…sion Use skip_transcript_upload instead of reusing use_resume to gate transcript uploads. use_resume starts False and only becomes True after a successful download, so gating on it prevented first-turn transcripts from ever being uploaded (bootstrap paradox).

Move flag assignment before the await to prevent re-upload if cancellation interrupts delete_transcript.

…l protection - Replace `import glob` with `pathlib.Path.glob()` in `read_cli_session_file` - Add symlink path traversal validation on glob results using `is_relative_to` - Add unit tests for `read_cli_session_file` and `_cli_project_dir`

…n_file Wrap p.resolve() and stat() calls in try/except to prevent unhandled OSError/RuntimeError from propagating and silently dropping transcript uploads when the caller falls back to TranscriptBuilder.

…-compaction

- Move `import shutil` to module top-level (HIGH) - Extract `_safe_glob_jsonl` helper to shorten `read_cli_session_file` (MEDIUM) - Inline single-use `err_str` variable (LOW) - Use lazy `%s` formatting in all logger calls (LOW) - Fix implicit string concatenation in log format string (NIT) - Compute accurate entry_count when using CLI transcript path (WARNING)

TranscriptBuilder accumulated ALL messages (pre + post compaction), making it larger than the CLI's active context after mid-stream compaction. This caused "Prompt is too long" errors on resume. Now when the PreCompact hook fires, CompactionTracker captures the transcript_path. When compaction ends (next message arrives), we read the CLI session file, find the isCompactSummary entry, and replace TranscriptBuilder's entries with the compacted entries. This ensures TranscriptBuilder always mirrors the CLI's active context. Removes the racy read_cli_session_file fallback from the finally block - TranscriptBuilder is now the single source of truth for uploads.

…ath validation, tests - Make compaction_just_ended a one-shot flag that resets after being read, fixing multiple mid-stream compactions not triggering replace_entries - Add path validation to read_compacted_entries() to reject paths outside the CLI projects directory - Reset _transcript_path in reset_for_query() for consistency - Add tests for read_compacted_entries and TranscriptBuilder.replace_entries - Preserve isCompactSummary entries during STRIPPABLE_TYPES filtering

… last-summary, dedup helpers - Replace side-effect @Property with CompactionResult dataclass for TOCTOU safety - Use build-then-swap in replace_entries to prevent data loss on corrupt input - Fix read_compacted_entries to use LAST isCompactSummary (not first) - Guard isCompactSummary in strip_progress_entries and TranscriptBuilder - Extract _projects_base(), _build_path_from_parts(), _build_meta_storage_path() - Extract TranscriptBuilder._parse_entry() static method - Sanitize transcript_path in pre_compact_hook before logging - Update compaction_test.py and transcript_test.py for new behaviors

Add isCompactSummary field to TranscriptEntry model so compaction summaries survive the export→load_previous roundtrip. Without this, exported summaries with type="summary" were stripped on reload since "summary" is in STRIPPABLE_TYPES. Also add integration tests simulating the full compaction flow (load → append → compact → replace → export → reload).

…stream compaction When compaction ends, replace_entries loads the CLI session file which already contains the current message. Skip append_assistant and append_tool_result for that iteration to avoid duplicates that could cause 'prompt is too long' errors on subsequent turns.

Exercises the full service.py compaction flow end-to-end: TranscriptBuilder load → CompactionTracker state machine → read_compacted_entries → replace_entries → export → roundtrip.

When the transcript exceeds the context window, the SDK now retries with escalating strategies: 1. Use transcript as-is (normal path) 2. Compact transcript via LLM summarization, retry with --resume 3. Drop transcript entirely, fall back to DB-reconstructed context - Add _is_prompt_too_long detector with specific patterns - Add compact_transcript + helpers (_transcript_to_messages, _messages_to_transcript, _flatten_*) to transcript.py - Remove dead delete_transcript function and its tests - Guard finally-block transcript upload with transcript_caused_error to prevent re-uploading a problematic transcript - Add 90 tests covering detection, helpers, and compaction

github-actions · 2026-03-13T22:56:27Z

This pull request has conflicts with the base branch, please resolve those so we can evaluate the pull request.

coderabbitai · 2026-03-13T22:56:53Z

Note

Reviews paused

It looks like this branch is under active development. To avoid overwhelming you with review comments due to an influx of new commits, CodeRabbit has automatically paused this review. You can configure this behavior by changing the reviews.auto_review.auto_pause_after_reviewed_commits setting.

Use the following commands to manage reviews:

@coderabbitai resume to resume automatic reviews.
@coderabbitai review to trigger a single review.

Use the checkboxes below for quick actions:

▶️ Resume reviews
🔍 Trigger review

Walkthrough

Adds prompt-too-long detection, a per-attempt retry flow that can compact transcripts (LLM-backed) and retry, multimodal attachment handling in the streaming path, transcript flattening/compaction utilities, and extensive tests covering detection, conversion, compaction, and retry scenarios.

Changes

Cohort / File(s)	Summary
Tests `autogpt_platform/backend/backend/copilot/sdk/prompt_too_long_test.py`, `autogpt_platform/backend/backend/copilot/sdk/retry_scenarios_test.py`	New comprehensive tests for prompt-too-long detection, flattening helpers, transcript ↔ messages round-trips, compact_transcript behavior, and end-to-end retry/compaction scenarios with heavy parametrization and mocking.
Transcript utilities `autogpt_platform/backend/backend/copilot/sdk/transcript.py`	Adds compaction subsystem: constants (STOP_REASON_END_TURN, COMPACT_MSG_ID_PREFIX, ENTRY_TYPE_MESSAGE), flattening helpers (`_flatten_assistant_content`, `_flatten_tool_result_content`), conversion utilities (`_transcript_to_messages`, `_messages_to_transcript`), `_run_compression`, and `compact_transcript` with validation, logging, and OpenAI-based compression pathway.
Service integration `autogpt_platform/backend/backend/copilot/sdk/service.py`	Introduces `_PROMPT_TOO_LONG_PATTERNS` and `_is_prompt_too_long`; implements per-attempt retry loop invoking compaction, rebuilds query/message state on retry, handles multimodal attachments, aggregates tool results into transcripts, enhances streaming/error lifecycle handling, and adds transcript upload safeguards.

Sequence Diagram(s)

sequenceDiagram
    participant Client
    participant Service as Service (stream_chat_completion_sdk)
    participant Claude as Claude API
    participant Transcript as Transcript Manager
    participant Compressor as OpenAI Compressor

    Client->>Service: stream_chat_completion_sdk(query, context)
    activate Service

    rect rgba(100,150,200,0.5)
    Note over Service: Retry loop with resume/transcript handling
    Service->>Claude: send_request(query, transcript?)
    end

    Claude-->>Service: ✗ prompt_too_long error

    alt Prompt Too Long Detected
        Service->>Transcript: _transcript_to_messages(transcript)
        Transcript-->>Service: messages[]
        Service->>Compressor: _run_compression(messages, model, cfg)
        activate Compressor
        Compressor->>Compressor: compress via OpenAI
        Compressor-->>Service: CompressResult
        deactivate Compressor
        Service->>Transcript: _messages_to_transcript(compressed_messages)
        Transcript-->>Service: new_transcript
        Service->>Service: Rebuild query with compacted transcript
        Service->>Claude: Retry request (next attempt)
        Claude-->>Service: ✓ response | ✗ other error
    else Success or Non-Prompt Error
        Service->>Service: Continue with response
    end

    Service->>Transcript: Persist final transcript
    Service-->>Client: stream response
    deactivate Service

Estimated code review effort

🎯 4 (Complex) | ⏱️ ~60 minutes

Possibly related PRs

feat(copilot): remove legacy copilot, add baseline non-SDK mode with tool calling #12276 — overlaps on copilot SDK streaming flow, retry/compaction logic, and transcript handling.
fix(backend/copilot): sync TranscriptBuilder with CLI on mid-stream compaction #12401 — related changes to prompt-too-long handling, compaction/retry, and transcript upload/resume integration.
fix(copilot): SDK streaming reliability, parallel tools, incremental saves, frontend reconnection #12173 — coordinated edits to SDK streaming/service and transcript compaction plumbing.

Suggested labels

Review effort 4/5

Suggested reviewers

Swiftyos
0ubbe
Pwuts

Poem

🐇 I nudge the transcript, nibble tokens small,
Flatten the chatter, tuck tools in a sprawl.
Compress and retry, then onward we hop —
Messages stitched, no token will stop.
Hooray for compacted lanes on top!

🚥 Pre-merge checks | ✅ 2 | ❌ 1

❌ Failed checks (1 warning)

Check name	Status	Explanation	Resolution
Docstring Coverage	⚠️ Warning	Docstring coverage is 76.52% which is insufficient. The required threshold is 80.00%.	Write docstrings for the functions missing them to satisfy the coverage threshold.

✅ Passed checks (2 passed)

Check name	Status	Explanation
Title check	✅ Passed	The PR title clearly and specifically summarizes the main change: implementing retry-with-compaction logic for prompt-too-long errors in the CoPilot SDK.
Description check	✅ Passed	The PR description clearly outlines the prompt-too-long retry mechanism with three escalating fallback attempts, detailing key changes across multiple files and new test coverage.

_{✏️ Tip: You can configure your own custom pre-merge checks in the settings.}

✨ Finishing Touches

🧪 Generate unit tests (beta)

Create PR with unit tests
Commit unit tests in branch fix/copilot-transcript-compaction

📝 Coding Plan

Generate coding plan for human review comments

Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out.

❤️ Share

_{Comment @coderabbitai help to get the list of available commands and usage tips.}

coderabbitai

Actionable comments posted: 3

🧹 Nitpick comments (2)

autogpt_platform/backend/backend/copilot/sdk/transcript.py (1)

672-691: Consider adding a docstring note about the fallback behavior.

The function falls back to truncation-based compression when the LLM call fails, which is good resilience. However, the behavior difference between LLM compression and truncation might be worth documenting for maintainers.

📝 Suggested docstring enhancement

 async def _run_compression(
     messages: list[dict],
     model: str,
     cfg: "ChatConfig",
     log_prefix: str,
 ) -> "CompressResult":
-    """Run LLM-based compression with truncation fallback."""
+    """Run LLM-based compression with truncation fallback.
+
+    If the LLM call fails (API error, timeout, network), falls back to
+    truncation-based compression which drops older messages without
+    summarization.
+    """
     import openai

🤖 Prompt for AI Agents

Verify each finding against the current code and only fix it if needed.

In `@autogpt_platform/backend/backend/copilot/sdk/transcript.py` around lines 672
- 691, The _run_compression function's docstring should mention its fallback
behavior: when openai.AsyncOpenAI calls in _run_compression raise
openai.APIError, openai.APITimeoutError, or OSError the function logs a warning
and calls compress_context with client=None to perform truncation-based
compression; update the docstring above _run_compression to describe this
distinction (LLM-based compression vs truncation fallback), list the exceptions
that trigger the fallback, and note that the fallback will call
compress_context(messages, model, client=None) so maintainers understand the
behavior and implications.

autogpt_platform/backend/backend/copilot/sdk/prompt_too_long_test.py (1)

341-462: Consider using a pytest fixture for the ChatConfig mock.

The mock pattern for ChatConfig is repeated across multiple tests. A fixture could reduce duplication.

♻️ Suggested fixture extraction

`@pytest.fixture`
def mock_chat_config():
    """Mock ChatConfig for compact_transcript tests."""
    with patch(
        "backend.copilot.config.ChatConfig",
        return_value=type(
            "Cfg", (), {"model": "m", "api_key": "k", "base_url": "u"}
        )(),
    ) as mock:
        yield mock


class TestCompactTranscript:
    `@pytest.mark.asyncio`
    async def test_too_few_messages_returns_none(self, mock_chat_config):
        transcript = _build_transcript([("user", "Hello")])
        result = await compact_transcript(transcript)
        assert result is None
    # ... other tests using mock_chat_config fixture

🤖 Prompt for AI Agents

Verify each finding against the current code and only fix it if needed.

In `@autogpt_platform/backend/backend/copilot/sdk/prompt_too_long_test.py` around
lines 341 - 462, The tests in TestCompactTranscript repeat the same ChatConfig
patch; extract that into a pytest fixture (e.g., mock_chat_config) that patches
backend.copilot.config.ChatConfig and yields the mock config object, then add
the fixture as a parameter to each async test
(test_too_few_messages_returns_none, test_returns_content_when_not_compacted,
test_returns_compacted_transcript, test_returns_none_on_compression_failure) and
remove the inline patch blocks so compact_transcript tests reuse the fixture.

🤖 Prompt for all review comments with AI agents

Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@autogpt_platform/backend/backend/copilot/sdk/service.py`:
- Around line 991-1002: The fallback branch currently clears the
TranscriptBuilder but still allows the finally block to upload a transcript with
message_count=len(session.messages), which misrepresents the stored transcript
as a full prefix; update the code in the branch that sets use_resume = False /
resume_file = None / transcript_builder = TranscriptBuilder() (and the analogous
block around lines 1574-1603) to either (A) mark a flag (e.g.,
skip_transcript_upload) so the finally upload is skipped when no-resume fallback
is used, or (B) regenerate a true transcript that covers the actual prefix
before letting the finally block write it (ensure message_count matches
transcript_builder.length), and ensure _build_query_message() will see the
correct resume state rather than a falsely complete transcript.
- Around line 944-949: The current empty-prompt check (using
current_message.strip()) runs after the code path that appends the incoming
message to the session and persists it in the finally block, so whitespace-only
messages are being stored; move the trim/empty validation so it executes before
any mutation or append to the session (i.e., validate message/current_message
and yield StreamError("Message cannot be empty.", code="empty_prompt") and
return) OR, if you prefer minimal change, ensure you remove the just-appended
message from the session (undo the append/persist) before returning; locate the
append/persist calls (e.g., session.add_message, session.messages.append, or any
save_session/save_conversation calls) in this handler and either validate
earlier or explicitly remove that entry before exiting.
- Around line 962-999: The retry logic can re-enter compaction on the final
attempt because it keys off transcript_content; update the loop so the last
attempt (_query_attempt == _MAX_QUERY_ATTEMPTS - 1) unconditionally disables
resume and falls back to DB context. Concretely, inside the for loop around
_query_attempt, when handling the "Prompt too long" branch, check if this is the
final retry and if so set use_resume = False, resume_file = None,
transcript_content = "" and reset transcript_builder (e.g., new
TranscriptBuilder()) regardless of transcript_content/compaction result;
otherwise keep the existing compact_transcript flow that may attempt compaction
and set use_resume based on its outcome. Ensure the check uses the existing
symbols _query_attempt, _MAX_QUERY_ATTEMPTS, use_resume, resume_file,
transcript_content, TranscriptBuilder and compact_transcript so the last attempt
always disables resume and triggers the DB fallback.

---

Nitpick comments:
In `@autogpt_platform/backend/backend/copilot/sdk/prompt_too_long_test.py`:
- Around line 341-462: The tests in TestCompactTranscript repeat the same
ChatConfig patch; extract that into a pytest fixture (e.g., mock_chat_config)
that patches backend.copilot.config.ChatConfig and yields the mock config
object, then add the fixture as a parameter to each async test
(test_too_few_messages_returns_none, test_returns_content_when_not_compacted,
test_returns_compacted_transcript, test_returns_none_on_compression_failure) and
remove the inline patch blocks so compact_transcript tests reuse the fixture.

In `@autogpt_platform/backend/backend/copilot/sdk/transcript.py`:
- Around line 672-691: The _run_compression function's docstring should mention
its fallback behavior: when openai.AsyncOpenAI calls in _run_compression raise
openai.APIError, openai.APITimeoutError, or OSError the function logs a warning
and calls compress_context with client=None to perform truncation-based
compression; update the docstring above _run_compression to describe this
distinction (LLM-based compression vs truncation fallback), list the exceptions
that trigger the fallback, and note that the fallback will call
compress_context(messages, model, client=None) so maintainers understand the
behavior and implications.

ℹ️ Review info

⚙️ Run configuration

Configuration used: Organization UI

Review profile: CHILL

Plan: Pro

Run ID: a785c0d0-ea4e-4238-9542-641e8256eabc

📥 Commits

Reviewing files that changed from the base of the PR and between cfe22e5 and 41591fd.

📒 Files selected for processing (9)

autogpt_platform/backend/backend/copilot/sdk/compaction.py
autogpt_platform/backend/backend/copilot/sdk/compaction_test.py
autogpt_platform/backend/backend/copilot/sdk/e2e_compaction_test.py
autogpt_platform/backend/backend/copilot/sdk/prompt_too_long_test.py
autogpt_platform/backend/backend/copilot/sdk/security_hooks.py
autogpt_platform/backend/backend/copilot/sdk/service.py
autogpt_platform/backend/backend/copilot/sdk/transcript.py
autogpt_platform/backend/backend/copilot/sdk/transcript_builder.py
autogpt_platform/backend/backend/copilot/sdk/transcript_test.py

📜 Review details

⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (1)

GitHub Check: Seer Code Review

🧰 Additional context used

📓 Path-based instructions (6)

autogpt_platform/backend/**/*.py

📄 CodeRabbit inference engine (.github/copilot-instructions.md)

autogpt_platform/backend/**/*.py: Use Python 3.11 (required; managed by Poetry via pyproject.toml) for backend development
Always run 'poetry run format' (Black + isort) before linting in backend development
Always run 'poetry run lint' (ruff) after formatting in backend development

Files:

autogpt_platform/backend/backend/copilot/sdk/compaction.py
autogpt_platform/backend/backend/copilot/sdk/service.py
autogpt_platform/backend/backend/copilot/sdk/transcript_builder.py
autogpt_platform/backend/backend/copilot/sdk/e2e_compaction_test.py
autogpt_platform/backend/backend/copilot/sdk/security_hooks.py
autogpt_platform/backend/backend/copilot/sdk/transcript.py
autogpt_platform/backend/backend/copilot/sdk/transcript_test.py
autogpt_platform/backend/backend/copilot/sdk/prompt_too_long_test.py
autogpt_platform/backend/backend/copilot/sdk/compaction_test.py

autogpt_platform/backend/**/*.{py,txt}

📄 CodeRabbit inference engine (autogpt_platform/backend/CLAUDE.md)

Use poetry run prefix for all Python commands, including testing, linting, formatting, and migrations

Files:

autogpt_platform/backend/backend/copilot/sdk/compaction.py
autogpt_platform/backend/backend/copilot/sdk/service.py
autogpt_platform/backend/backend/copilot/sdk/transcript_builder.py
autogpt_platform/backend/backend/copilot/sdk/e2e_compaction_test.py
autogpt_platform/backend/backend/copilot/sdk/security_hooks.py
autogpt_platform/backend/backend/copilot/sdk/transcript.py
autogpt_platform/backend/backend/copilot/sdk/transcript_test.py
autogpt_platform/backend/backend/copilot/sdk/prompt_too_long_test.py
autogpt_platform/backend/backend/copilot/sdk/compaction_test.py

autogpt_platform/backend/backend/**/*.py

📄 CodeRabbit inference engine (autogpt_platform/backend/CLAUDE.md)

Use Prisma ORM for database operations in PostgreSQL with pgvector for embeddings

Files:

autogpt_platform/backend/backend/copilot/sdk/compaction.py
autogpt_platform/backend/backend/copilot/sdk/service.py
autogpt_platform/backend/backend/copilot/sdk/transcript_builder.py
autogpt_platform/backend/backend/copilot/sdk/e2e_compaction_test.py
autogpt_platform/backend/backend/copilot/sdk/security_hooks.py
autogpt_platform/backend/backend/copilot/sdk/transcript.py
autogpt_platform/backend/backend/copilot/sdk/transcript_test.py
autogpt_platform/backend/backend/copilot/sdk/prompt_too_long_test.py
autogpt_platform/backend/backend/copilot/sdk/compaction_test.py

autogpt_platform/**/*.py

📄 CodeRabbit inference engine (AGENTS.md)

Format Python code with poetry run format

Files:

autogpt_platform/backend/backend/copilot/sdk/compaction.py
autogpt_platform/backend/backend/copilot/sdk/service.py
autogpt_platform/backend/backend/copilot/sdk/transcript_builder.py
autogpt_platform/backend/backend/copilot/sdk/e2e_compaction_test.py
autogpt_platform/backend/backend/copilot/sdk/security_hooks.py
autogpt_platform/backend/backend/copilot/sdk/transcript.py
autogpt_platform/backend/backend/copilot/sdk/transcript_test.py
autogpt_platform/backend/backend/copilot/sdk/prompt_too_long_test.py
autogpt_platform/backend/backend/copilot/sdk/compaction_test.py

autogpt_platform/backend/**/*_test.py

📄 CodeRabbit inference engine (autogpt_platform/backend/CLAUDE.md)

autogpt_platform/backend/**/*_test.py: Always review snapshot changes with git diff before committing when updating snapshots with poetry run pytest --snapshot-update
Use pytest with snapshot testing for API responses in test files
Colocate test files with source files using the *_test.py naming convention

Files:

autogpt_platform/backend/backend/copilot/sdk/e2e_compaction_test.py
autogpt_platform/backend/backend/copilot/sdk/transcript_test.py
autogpt_platform/backend/backend/copilot/sdk/prompt_too_long_test.py
autogpt_platform/backend/backend/copilot/sdk/compaction_test.py

autogpt_platform/backend/**/*test*.py

📄 CodeRabbit inference engine (AGENTS.md)

Run poetry run test for backend testing (runs pytest with docker based postgres + prisma)

Files:

autogpt_platform/backend/backend/copilot/sdk/e2e_compaction_test.py
autogpt_platform/backend/backend/copilot/sdk/transcript_test.py
autogpt_platform/backend/backend/copilot/sdk/prompt_too_long_test.py
autogpt_platform/backend/backend/copilot/sdk/compaction_test.py

🧠 Learnings (4)

📚 Learning: 2026-02-26T17:02:22.448Z

Learnt from: Pwuts
Repo: Significant-Gravitas/AutoGPT PR: 12211
File: .pre-commit-config.yaml:160-179
Timestamp: 2026-02-26T17:02:22.448Z
Learning: Keep the pre-commit hook pattern broad for autogpt_platform/backend to ensure OpenAPI schema changes are captured. Do not narrow to backend/api/ alone, since the generated schema depends on Pydantic models across multiple directories (backend/data/, backend/blocks/, backend/copilot/, backend/integrations/, backend/util/). Narrowing could miss schema changes and cause frontend type desynchronization.

Applied to files:

autogpt_platform/backend/backend/copilot/sdk/compaction.py
autogpt_platform/backend/backend/copilot/sdk/service.py
autogpt_platform/backend/backend/copilot/sdk/transcript_builder.py
autogpt_platform/backend/backend/copilot/sdk/e2e_compaction_test.py
autogpt_platform/backend/backend/copilot/sdk/security_hooks.py
autogpt_platform/backend/backend/copilot/sdk/transcript.py
autogpt_platform/backend/backend/copilot/sdk/transcript_test.py
autogpt_platform/backend/backend/copilot/sdk/prompt_too_long_test.py
autogpt_platform/backend/backend/copilot/sdk/compaction_test.py

📚 Learning: 2026-03-04T08:04:35.881Z

Learnt from: majdyz
Repo: Significant-Gravitas/AutoGPT PR: 12273
File: autogpt_platform/backend/backend/copilot/tools/workspace_files.py:216-220
Timestamp: 2026-03-04T08:04:35.881Z
Learning: In the AutoGPT Copilot backend, ensure that SVG images are not treated as vision image types by excluding 'image/svg+xml' from INLINEABLE_MIME_TYPES and MULTIMODAL_TYPES in tool_adapter.py; the Claude API supports PNG, JPEG, GIF, and WebP for vision. SVGs (XML text) should be handled via the text path instead, not the vision path.

Applied to files:

autogpt_platform/backend/backend/copilot/sdk/compaction.py
autogpt_platform/backend/backend/copilot/sdk/service.py
autogpt_platform/backend/backend/copilot/sdk/transcript_builder.py
autogpt_platform/backend/backend/copilot/sdk/e2e_compaction_test.py
autogpt_platform/backend/backend/copilot/sdk/security_hooks.py
autogpt_platform/backend/backend/copilot/sdk/transcript.py
autogpt_platform/backend/backend/copilot/sdk/transcript_test.py
autogpt_platform/backend/backend/copilot/sdk/prompt_too_long_test.py
autogpt_platform/backend/backend/copilot/sdk/compaction_test.py

📚 Learning: 2026-03-05T15:42:08.207Z

Learnt from: ntindle
Repo: Significant-Gravitas/AutoGPT PR: 12297
File: .claude/skills/backend-check/SKILL.md:14-16
Timestamp: 2026-03-05T15:42:08.207Z
Learning: In Python files under autogpt_platform/backend (recursively), rely on poetry run format to perform formatting (Black + isort) and linting (ruff). Do not run poetry run lint as a separate step after poetry run format, since format already includes linting checks.

Applied to files:

autogpt_platform/backend/backend/copilot/sdk/compaction.py
autogpt_platform/backend/backend/copilot/sdk/service.py
autogpt_platform/backend/backend/copilot/sdk/transcript_builder.py
autogpt_platform/backend/backend/copilot/sdk/e2e_compaction_test.py
autogpt_platform/backend/backend/copilot/sdk/security_hooks.py
autogpt_platform/backend/backend/copilot/sdk/transcript.py
autogpt_platform/backend/backend/copilot/sdk/transcript_test.py
autogpt_platform/backend/backend/copilot/sdk/prompt_too_long_test.py
autogpt_platform/backend/backend/copilot/sdk/compaction_test.py

📚 Learning: 2026-02-04T16:49:42.490Z

Learnt from: CR
Repo: Significant-Gravitas/AutoGPT PR: 0
File: .github/copilot-instructions.md:0-0
Timestamp: 2026-02-04T16:49:42.490Z
Learning: Applies to autogpt_platform/backend/**/test/**/*.py : Use snapshot testing with '--snapshot-update' flag in backend tests when output changes; always review with 'git diff'

Applied to files:

autogpt_platform/backend/backend/copilot/sdk/transcript_test.py

🔇 Additional comments (17)

autogpt_platform/backend/backend/copilot/sdk/security_hooks.py (1)

305-319: LGTM! Good security practice with log sanitization.

The newline/carriage-return stripping prevents log injection attacks from untrusted transcript_path input. The updated callback signature properly propagates the transcript path through the compaction flow.

autogpt_platform/backend/backend/copilot/sdk/compaction.py (2)

30-41: LGTM! Well-designed dataclass to prevent TOCTOU races.

The CompactionResult properly bundles just_ended, events, and transcript_path together, eliminating the need for separate flag checks that could introduce race conditions between the emit call and state inspection.

195-208: LGTM! Good defensive logging on transcript_path overwrite.

The warning when _transcript_path is overwritten helps debug unexpected multiple compaction events within a single cycle.

autogpt_platform/backend/backend/copilot/sdk/transcript_builder.py (2)

57-73: LGTM! Good extraction of entry parsing logic.

The _parse_entry method properly filters strippable types while preserving entries with isCompactSummary=True, enabling compact summaries to survive load/replace operations. Reusing this method in both load_previous and replace_entries follows DRY principles.

175-210: LGTM! Defensive validation prevents data loss.

The non-empty check before swapping entries is a good safeguard against corrupt input wiping conversation history. The logging provides visibility into compaction outcomes.

autogpt_platform/backend/backend/copilot/sdk/compaction_test.py (2)

307-378: LGTM! Thorough testing of multi-compaction behavior.

The tests clearly document the expected behavior: within a single query, _done=True blocks subsequent compactions until reset_for_query is called. The inline comments explaining the state machine behavior are helpful for maintainability.

379-407: LGTM! Good coverage of transcript_path lifecycle.

Tests verify the complete lifecycle: storage via on_compact, return via emit_end_if_ready, and clearing after emission. This ensures the path doesn't leak into subsequent non-compaction calls.

autogpt_platform/backend/backend/copilot/sdk/e2e_compaction_test.py (1)

201-369: LGTM! Excellent end-to-end test coverage.

This test comprehensively simulates the service.py compaction flow across 13 steps, including transcript loading, message appending, PreCompact hook handling, CLI session reading, entry replacement, and roundtrip verification. The detailed assertions on parent UUID chains and isCompactSummary preservation provide confidence in the integration.

autogpt_platform/backend/backend/copilot/sdk/transcript.py (4)

156-176: LGTM! Proper path traversal protection.

The _cli_project_dir function validates that the resolved path stays within the projects base directory, preventing symlink-based path traversal attacks. Returning None for escaped paths is the right defensive approach.

179-199: LGTM! Safe glob implementation filtering symlink escapes.

The _safe_glob_jsonl function properly validates each candidate file's resolved path stays within the project directory, catching both symlink escapes and filesystem errors gracefully.

202-263: LGTM! Correctly finds the last compaction summary.

The implementation properly scans for the LAST isCompactSummary entry (not breaking on first match), which handles multiple compactions correctly. The path boundary check prevents reading files outside the projects directory.

95-99: LGTM! Proper preservation of compact summaries.

The isCompactSummary check ensures that summary entries marked as compaction results are preserved even though summary is in STRIPPABLE_TYPES. This is essential for maintaining compacted conversation state across turns.

Also applies to: 123-125

autogpt_platform/backend/backend/copilot/sdk/transcript_test.py (3)

319-340: LGTM! Good security test for symlink escape prevention.

This test verifies that symlinks pointing outside the project directory are correctly filtered out, preventing path traversal attacks via symbolic links.

604-691: LGTM! Comprehensive integration test simulating real service flow.

The test accurately simulates the service.py compaction flow: loading previous transcript, appending messages, reading compacted entries from CLI session, replacing builder state, and verifying the parent chain integrity. This provides confidence in the end-to-end integration.

753-781: LGTM! Excellent test for compact summary preservation during stripping.

This test verifies that strip_progress_entries correctly preserves entries with isCompactSummary=True while still stripping regular summary entries. This is a critical behavior for maintaining compacted state.

autogpt_platform/backend/backend/copilot/sdk/prompt_too_long_test.py (2)

27-76: LGTM! Thorough error detection testing.

The parametrized tests cover a good range of prompt-too-long error patterns including case variations, while explicitly testing that similar but non-matching patterns (like partial matches) don't trigger false positives.

291-303: LGTM! Essential roundtrip validation.

The roundtrip test verifies that messages → transcript → messages preserves both role and content, which is critical for the compaction flow where transcripts are converted back and forth.

Resolve merge conflicts in service.py, transcript.py, and transcript_test.py. Incorporates dev's compaction tracking (emit_end_if_ready, entries_replaced, read_compacted_entries) into the retry loop structure.

github-actions · 2026-03-13T23:09:28Z

🔍 PR Overlap Detection

This check compares your PR against all other open PRs targeting the same branch to detect potential merge conflicts early.

🔴 Merge Conflicts Detected

The following PRs have been tested and will have merge conflicts if merged after this PR. Consider coordinating with the authors.

fix(copilot): resolve OTel context ValueError during stream cancellation #12421 (Otto-AGPT · updated 19h ago)
- 📁 autogpt_platform/backend/backend/copilot/sdk/
  - service.py (4 conflicts, ~553 lines)
feat(platform): add nightly copilot automation flow #12407 (Swiftyos · updated 7m ago)
- 📁 autogpt_platform/backend/backend/notifications/
  - test_notifications.py (1 conflict, ~4 lines)
feat(platform/blocks): Add Instagram automation blocks #11602 (akkupratap323 · updated 7d ago)
ci: run pyright against Python 3.13 #12026 (Deeven-Seru · updated 8d ago)

🟡 Medium Risk — Some Line Overlap

These PRs have some overlapping changes:

feat(blocks): add AutoPilotBlock for invoking AutoPilot from graphs #12439 (majdyz · updated 19m ago)
- autogpt_platform/backend/backend/notifications/test_notifications.py: L19-25, L27-34, L331-338
feat(copilot): GitHub CLI support — inject GH_TOKEN and connect_integration tool #12426 (majdyz · updated 38m ago)
- autogpt_platform/backend/backend/notifications/test_notifications.py: L19-25, L27-34, L331-338

🟢 Low Risk — File Overlap Only

These PRs touch the same files but different sections (click to expand)

chore(backend): Update poetry to v2.2.1 #12459 (Pwuts · updated 37m ago)
- Shared files: autogpt_platform/backend/pyproject.toml
feat(backend/blocks): MCP message signing and tool integrity verification #12455 (dataCenter430 · updated 4h ago)
- Shared files: autogpt_platform/backend/pyproject.toml
ci(backend): Python version specific type checking #12453 (Pwuts · updated 11h ago)
- Shared files: autogpt_platform/backend/pyproject.toml
chore(backend/deps): bump the production-dependencies group across 1 directory with 19 updates #12442 (dependabot · updated 18h ago)
- Shared files: autogpt_platform/backend/pyproject.toml
chore(backend/deps-dev): bump faker from 38.3.0 to 40.4.0 in /autogpt_platform/backend #12133 (dependabot · updated 19h ago)
- Shared files: autogpt_platform/backend/pyproject.toml
chore(backend/deps-dev): bump the development-dependencies group across 1 directory with 2 updates #12197 (dependabot · updated 1d ago)
- Shared files: autogpt_platform/backend/pyproject.toml
chore(backend/deps): bump aiodns from 3.6.1 to 4.0.0 in /autogpt_platform/backend #12045 (dependabot · updated 4d ago)
- Shared files: autogpt_platform/backend/pyproject.toml
chore(backend/deps): bump todoist-api-python from 2.1.7 to 3.2.1 in /autogpt_platform/backend #12044 (dependabot · updated 4d ago)
- Shared files: autogpt_platform/backend/pyproject.toml
chore(backend/deps): bump stagehand from 0.5.9 to 3.5.0 in /autogpt_platform/backend #12039 (dependabot · updated 4d ago)
- Shared files: autogpt_platform/backend/pyproject.toml
chore(backend/deps): bump groq from 0.30.0 to 1.0.0 in /autogpt_platform/backend #12038 (dependabot · updated 4d ago)
- Shared files: autogpt_platform/backend/pyproject.toml
chore(backend/deps): bump yt-dlp from 2025.12.8 to 2026.2.4 in /autogpt_platform/backend #12041 (dependabot · updated 4d ago)
- Shared files: autogpt_platform/backend/pyproject.toml

Summary: 4 conflict(s), 2 medium risk, 11 low risk (out of 17 PRs with file overlap)

Auto-generated on push. Ignores: openapi.json, lock files.

…backticks Revert f-string to %-format logging conversions in security_hooks.py, tool_adapter.py, and response_adapter.py — purely cosmetic and outside this PR's scope. Replace rST double backticks with single backticks in service.py docstrings for codebase consistency.

sentry · 2026-03-26T11:16:07Z

Issues attributed to commits in this pull request

This pull request was merged and Sentry observed the following issues:

‼️ ValueError: <Token var=<ContextVar name='current_c... in app:prod-behave:cloud
‼️ RuntimeError: Attempted to exit cancel scope in a... in app:prod-behave:cloud

…aware compression, and truncated tool call recovery (#12625) ## Why CoPilot has several context management issues that degrade long sessions: 1. "Prompt is too long" errors crash the session instead of triggering retry/compaction 2. Stale thinking blocks bloat transcripts, causing unnecessary compaction every turn 3. Compression target is hardcoded regardless of model context window size 4. Truncated tool calls (empty `{}` args from max_tokens) kill the session instead of guiding the model to self-correct ## What **Fix 1: Prompt-too-long retry bypass (SENTRY-1207)** The SDK surfaces "prompt too long" via `AssistantMessage.error` and `ResultMessage.result` — neither triggered the retry/compaction loop (only Python exceptions did). Now both paths are intercepted and re-raised. **Fix 2: Strip stale thinking blocks before upload** Thinking/redacted_thinking blocks in non-last assistant entries are 10-50K tokens each but only needed for API signature verification in the *last* message. Stripping before upload reduces transcript size and prevents per-turn compaction. **Fix 3: Model-aware compression target** `compress_context()` now computes `target_tokens` from the model's context window (e.g. 140K for Opus 200K) instead of a hardcoded 120K default. Larger models retain more history; smaller models compress more aggressively. **Fix 4: Self-correcting truncated tool calls** When the model's response exceeds max_tokens, tool call inputs get silently truncated to `{}`. Previously this tripped a circuit breaker after 3 attempts. Now the MCP wrapper detects empty args and returns guidance: "write in chunks with `cat >>`, pass via `@@agptfile:filename`". The model can self-correct instead of the session dying. ## How - **service.py**: `_is_prompt_too_long` checks in both `AssistantMessage.error` and `ResultMessage` error handlers. Circuit breaker limit raised from 3→5. - **transcript.py**: `strip_stale_thinking_blocks()` reverse-scans for last assistant `message.id`, strips thinking blocks from all others. Called in `upload_transcript()`. - **prompt.py**: `get_compression_target(model)` computes `context_window - 60K overhead`. `compress_context()` uses it when `target_tokens` is None. - **tool_adapter.py**: `_truncating` wrapper intercepts empty args on tools with required params, returns actionable guidance instead of failing. ## Related - Fixes SENTRY-1207 - Sessions: `d2f7cba3` (repeated compaction), `08b807d4` (prompt too long), `130d527c` (truncated tool calls) - Extends #12413, consolidates #12626 ## Test plan - [x] 6 unit tests for `strip_stale_thinking_blocks` - [x] 1 integration test for ResultMessage prompt-too-long → compaction retry - [x] Pyright clean (0 errors), all pre-commit hooks pass - [ ] E2E: Load transcripts from affected sessions and verify behavior

…t SDK (#12413) ## Summary When the Claude SDK returns a prompt-too-long error (e.g. transcript + query exceeds the model's context window), the streaming loop now retries with escalating fallbacks instead of failing immediately: 1. **Attempt 1**: Use the transcript as-is (normal path) 2. **Attempt 2**: Compact the transcript via LLM summarization (`compact_transcript`) and retry 3. **Attempt 3**: Drop the transcript entirely and fall back to DB-reconstructed context (`_build_query_message`) If all 3 attempts fail, a `StreamError(code="prompt_too_long")` is yielded to the frontend. ### Key changes **`service.py`** - Add `_is_prompt_too_long(err)` — pattern-matches SDK exceptions for prompt-length errors (`prompt is too long`, `prompt_too_long`, `context_length_exceeded`, `request too large`) - Wrap `async with ClaudeSDKClient` in a 3-attempt retry `for` loop with compaction/fallback logic - Move `current_message`, `_build_query_message`, and `_prepare_file_attachments` before the retry loop (computed once, reused) - Skip transcript upload in `finally` when `transcript_caused_error` (avoids persisting a broken/empty transcript) - Reset `stream_completed` between retry iterations - Document outer-scope variable contract in `_run_stream_attempt` closure (which variables are reassigned between retries vs read-only) **`transcript.py`** - Add `compact_transcript(content, log_prefix, model)` — converts JSONL → messages → `compress_context` (LLM summarization with truncation fallback) → JSONL - Add helpers: `_flatten_assistant_content`, `_flatten_tool_result_content`, `_transcript_to_messages`, `_messages_to_transcript`, `_run_compression` - Returns `None` when compaction fails or transcript is already within budget (signals caller to fall through to DB fallback) - Truncation fallback wrapped in 30s timeout to prevent unbounded CPU time on large transcripts - Accepts `model` parameter to avoid creating a new `ChatConfig()` on every call **`util/prompt.py`** - Fix `_truncate_middle_tokens` edge case: returns empty string when `max_tok < 1`, properly handles `max_tok < 3` **`config.py`** - E2B sandbox timeout raised from 5 min to 15 min to accommodate compaction retries **`prompt_too_long_test.py`** (new, 45 tests) - `_is_prompt_too_long` positive/negative patterns, case sensitivity, BaseException handling - Flatten helpers for assistant/tool_result content blocks - `_transcript_to_messages` / `_messages_to_transcript` roundtrip, strippable types, empty content - `compact_transcript` async tests: too few messages, not compacted, successful compaction, compression failure **`retry_scenarios_test.py`** (new, 27 tests) - Full retry state machine simulation covering all 8 scenarios: 1. Normal flow (no retry) 2. Compact succeeds → retry succeeds 3. Compact fails → DB fallback succeeds 4. No transcript → DB fallback succeeds 5. Double fail → DB fallback on attempt 3 6. All 3 attempts exhausted 7. Non-prompt-too-long error (no retry) 8. Compaction returns identical content → DB fallback - Edge cases: nested exceptions, case insensitivity, unicode content, large transcripts, resume-after-compaction flow **Shared test fixtures** (`conftest.py`) - Extracted `build_test_transcript` helper used across 3 test files to eliminate duplication ## Test plan - [x] `_is_prompt_too_long` correctly identifies prompt-length errors (8 positive, 5 negative patterns) - [x] `compact_transcript` compacts oversized transcripts via LLM summarization - [x] `compact_transcript` returns `None` on failure or when already within budget - [x] Retry loop state machine: all 8 scenarios verified with state assertions - [x] `TranscriptBuilder` works correctly after loading compacted transcripts - [x] `_messages_to_transcript` roundtrip preserves content including unicode - [x] `transcript_caused_error` prevents stale transcript upload - [x] Truncation timeout prevents unbounded CPU time - [x] All 139 unit tests pass locally - [x] CI green (tests 3.11/3.12/3.13, types, CodeQL, linting)

…t SDK (Significant-Gravitas#12413) ## Summary When the Claude SDK returns a prompt-too-long error (e.g. transcript + query exceeds the model's context window), the streaming loop now retries with escalating fallbacks instead of failing immediately: 1. **Attempt 1**: Use the transcript as-is (normal path) 2. **Attempt 2**: Compact the transcript via LLM summarization (`compact_transcript`) and retry 3. **Attempt 3**: Drop the transcript entirely and fall back to DB-reconstructed context (`_build_query_message`) If all 3 attempts fail, a `StreamError(code="prompt_too_long")` is yielded to the frontend. ### Key changes **`service.py`** - Add `_is_prompt_too_long(err)` — pattern-matches SDK exceptions for prompt-length errors (`prompt is too long`, `prompt_too_long`, `context_length_exceeded`, `request too large`) - Wrap `async with ClaudeSDKClient` in a 3-attempt retry `for` loop with compaction/fallback logic - Move `current_message`, `_build_query_message`, and `_prepare_file_attachments` before the retry loop (computed once, reused) - Skip transcript upload in `finally` when `transcript_caused_error` (avoids persisting a broken/empty transcript) - Reset `stream_completed` between retry iterations - Document outer-scope variable contract in `_run_stream_attempt` closure (which variables are reassigned between retries vs read-only) **`transcript.py`** - Add `compact_transcript(content, log_prefix, model)` — converts JSONL → messages → `compress_context` (LLM summarization with truncation fallback) → JSONL - Add helpers: `_flatten_assistant_content`, `_flatten_tool_result_content`, `_transcript_to_messages`, `_messages_to_transcript`, `_run_compression` - Returns `None` when compaction fails or transcript is already within budget (signals caller to fall through to DB fallback) - Truncation fallback wrapped in 30s timeout to prevent unbounded CPU time on large transcripts - Accepts `model` parameter to avoid creating a new `ChatConfig()` on every call **`util/prompt.py`** - Fix `_truncate_middle_tokens` edge case: returns empty string when `max_tok < 1`, properly handles `max_tok < 3` **`config.py`** - E2B sandbox timeout raised from 5 min to 15 min to accommodate compaction retries **`prompt_too_long_test.py`** (new, 45 tests) - `_is_prompt_too_long` positive/negative patterns, case sensitivity, BaseException handling - Flatten helpers for assistant/tool_result content blocks - `_transcript_to_messages` / `_messages_to_transcript` roundtrip, strippable types, empty content - `compact_transcript` async tests: too few messages, not compacted, successful compaction, compression failure **`retry_scenarios_test.py`** (new, 27 tests) - Full retry state machine simulation covering all 8 scenarios: 1. Normal flow (no retry) 2. Compact succeeds → retry succeeds 3. Compact fails → DB fallback succeeds 4. No transcript → DB fallback succeeds 5. Double fail → DB fallback on attempt 3 6. All 3 attempts exhausted 7. Non-prompt-too-long error (no retry) 8. Compaction returns identical content → DB fallback - Edge cases: nested exceptions, case insensitivity, unicode content, large transcripts, resume-after-compaction flow **Shared test fixtures** (`conftest.py`) - Extracted `build_test_transcript` helper used across 3 test files to eliminate duplication ## Test plan - [x] `_is_prompt_too_long` correctly identifies prompt-length errors (8 positive, 5 negative patterns) - [x] `compact_transcript` compacts oversized transcripts via LLM summarization - [x] `compact_transcript` returns `None` on failure or when already within budget - [x] Retry loop state machine: all 8 scenarios verified with state assertions - [x] `TranscriptBuilder` works correctly after loading compacted transcripts - [x] `_messages_to_transcript` roundtrip preserves content including unicode - [x] `transcript_caused_error` prevents stale transcript upload - [x] Truncation timeout prevents unbounded CPU time - [x] All 139 unit tests pass locally - [x] CI green (tests 3.11/3.12/3.13, types, CodeQL, linting)

majdyz added 15 commits March 13, 2026 16:52

fix(backend/copilot): set skip_transcript_upload before await delete

152f54f

Move flag assignment before the await to prevent re-upload if cancellation interrupts delete_transcript.

fix(backend/copilot): handle filesystem exceptions in read_cli_sessio…

8c7b077

…n_file Wrap p.resolve() and stat() calls in try/except to prevent unhandled OSError/RuntimeError from propagating and silently dropping transcript uploads when the caller falls back to TranscriptBuilder.

Merge remote-tracking branch 'origin/dev' into fix/copilot-transcript…

696f533

…-compaction

fix(backend/copilot): fix mock patch target in delete_transcript tests

b04f806

test(backend/copilot): add e2e compaction lifecycle tests

7d95321

Exercises the full service.py compaction flow end-to-end: TranscriptBuilder load → CompactionTracker state machine → read_compacted_entries → replace_entries → export → roundtrip.

majdyz requested a review from a team as a code owner March 13, 2026 22:56

majdyz requested review from 0ubbe and Swiftyos and removed request for a team March 13, 2026 22:56

github-project-automation Bot added this to AutoGPT development kanban Mar 13, 2026

github-project-automation Bot moved this to 🆕 Needs initial review in AutoGPT development kanban Mar 13, 2026

github-actions Bot added platform/backend AutoGPT Platform - Back end conflicts Automatically applied to PRs with merge conflicts labels Mar 13, 2026

github-actions Bot added the size/xl label Mar 13, 2026

coderabbitai Bot reviewed Mar 13, 2026

View reviewed changes

Comment thread autogpt_platform/backend/backend/copilot/sdk/service.py

Comment thread autogpt_platform/backend/backend/copilot/sdk/service.py Outdated

Comment thread autogpt_platform/backend/backend/copilot/sdk/service.py Outdated

Merge origin/dev into fix/copilot-transcript-compaction

39c0fec

Resolve merge conflicts in service.py, transcript.py, and transcript_test.py. Incorporates dev's compaction tracking (emit_end_if_ready, entries_replaced, read_compacted_entries) into the retry loop structure.

github-actions Bot removed the conflicts Automatically applied to PRs with merge conflicts label Mar 13, 2026

sentry Bot reviewed Mar 18, 2026

View reviewed changes

Comment thread autogpt_platform/backend/backend/copilot/sdk/transcript.py

majdyz requested a review from Pwuts March 18, 2026 08:16

This was referenced Mar 18, 2026

chore(backend/deps-dev): bump faker from 38.3.0 to 40.4.0 in /autogpt_platform/backend #12133

Open

chore(backend/deps): bump the production-dependencies group across 1 directory with 22 updates #12470

Closed

Pwuts approved these changes Mar 18, 2026

View reviewed changes

majdyz added this pull request to the merge queue Mar 18, 2026

Merged via the queue into dev with commit 5a29de0 Mar 18, 2026
21 checks passed

github-project-automation Bot moved this from 👍🏼 Mergeable to ✅ Done in AutoGPT development kanban Mar 18, 2026

github-project-automation Bot moved this to Done in Frontend Mar 18, 2026

majdyz deleted the fix/copilot-transcript-compaction branch March 18, 2026 10:42

This was referenced Mar 20, 2026

fix(backend): add circuit breaker for infinite tool call retry loops #12499

Merged

fix(backend): preserve thinking blocks during transcript compaction #12574

Merged

majdyz mentioned this pull request Mar 31, 2026

fix(backend/copilot): prompt-too-long retry, compaction churn, model-aware compression, and truncated tool call recovery #12625

Merged

4 tasks

coderabbitai Bot mentioned this pull request Mar 31, 2026

fix(backend/copilot): CLI ResultMessage "Prompt is too long" bypasses compaction #12626

Closed

6 tasks

coderabbitai Bot mentioned this pull request Apr 1, 2026

fix(backend/copilot): detect prompt-too-long in AssistantMessage content and ResultMessage success subtype #12642

Merged

4 tasks

majdyz mentioned this pull request Apr 3, 2026

fix(copilot): prevent duplicate side effects from double-submit and stale-cache race #12660

Merged

4 tasks

coderabbitai Bot mentioned this pull request Apr 25, 2026

fix(backend/copilot): preserve interrupted SDK partial work on final-failure exit #12918

Merged

8 tasks

coderabbitai Bot mentioned this pull request May 4, 2026

fix(backend/copilot): re-prompt on thinking-only finish; route storage-limit through DB-manager #12992

Merged

5 tasks

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

fix(platform): try-compact-retry for prompt-too-long errors in CoPilot SDK#12413

fix(platform): try-compact-retry for prompt-too-long errors in CoPilot SDK#12413
majdyz merged 80 commits into
devfrom
fix/copilot-transcript-compaction

majdyz commented Mar 13, 2026 •

edited

Loading

Uh oh!

github-actions Bot commented Mar 13, 2026

Uh oh!

coderabbitai Bot commented Mar 13, 2026 •

edited

Loading

Reviews paused

❌ Failed checks (1 warning)

Uh oh!

coderabbitai Bot left a comment

Uh oh!

Uh oh!

Uh oh!

Uh oh!

github-actions Bot commented Mar 13, 2026 •

edited

Loading

Uh oh!

Uh oh!

Uh oh!

sentry Bot commented Mar 26, 2026 •

edited

Loading

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

3 participants

Conversation

majdyz commented Mar 13, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Summary

Key changes

Test plan

Uh oh!

github-actions Bot commented Mar 13, 2026

Uh oh!

coderabbitai Bot commented Mar 13, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Reviews paused

Walkthrough

Changes

Sequence Diagram(s)

Estimated code review effort

Possibly related PRs

Suggested labels

Suggested reviewers

Poem

❌ Failed checks (1 warning)

Uh oh!

coderabbitai Bot left a comment

Choose a reason for hiding this comment

Uh oh!

Uh oh!

Uh oh!

Uh oh!

github-actions Bot commented Mar 13, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

🔍 PR Overlap Detection

🔴 Merge Conflicts Detected

🟡 Medium Risk — Some Line Overlap

🟢 Low Risk — File Overlap Only

Uh oh!

Uh oh!

Uh oh!

sentry Bot commented Mar 26, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Issues attributed to commits in this pull request

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

3 participants

majdyz commented Mar 13, 2026 •

edited

Loading

coderabbitai Bot commented Mar 13, 2026 •

edited

Loading

github-actions Bot commented Mar 13, 2026 •

edited

Loading

sentry Bot commented Mar 26, 2026 •

edited

Loading