[CI BISECT — DO NOT MERGE] claude-agent-sdk 0.1.55 #12743

Closed
majdyz wants to merge 3 commits into dev from ci-test-sdk-bisect-0.1.55

@majdyz majdyz commented Apr 11, 2026

CI bisect probe for the OpenRouter compat investigation. NOT for merging — close after CI runs report back.

Bumps claude-agent-sdk to 0.1.55 to test whether the new cli_openrouter_compat_test.py reproduction passes / fails. The signal we care about:

  • ✅ test_cli_does_not_send_openrouter_incompatible_features passing → this version is OpenRouter-safe and a viable upgrade target.
  • ❌ same test failing → this version trips one of the two known forbidden patterns (tool_reference blocks or context-management-2025-06-27 beta).
  • ⚠️ same test skipping → CLI failed to make any HTTP request before the test could capture, treat as inconclusive.

Companion to #12741 (the cli_path plumbing + reproduction test PR).

Tracks anthropics/claude-agent-sdk-python#789.

majdyz added 3 commits April 11, 2026 07:05
…ests

We've been pinned at `claude-agent-sdk==0.1.45` (bundled CLI 2.1.63)
since PR #12294 because every version above introduces a 400 against
OpenRouter. There are two stacked regressions today:

1. CLI 2.1.69 (= SDK 0.1.46) added a `tool_reference` content block in
   `tool_result.content` that OpenRouter's stricter Zod validation
   rejects. CLI 2.1.70 added a proxy-detection workaround but our
   subsequent attempts at 0.1.55 and 0.1.56 still failed.
2. A newer regression — the `context-management-2025-06-27` beta
   header — appears in some CLI version after 2.1.91. Tracked upstream
   at anthropics/claude-agent-sdk-python#789, still open with no fix.

This commit doesn't actually upgrade the SDK — it adds the
infrastructure we need to upgrade safely *when* upstream lands a fix
or when we identify a known-good newer CLI version via bisection:

* `ChatConfig.claude_agent_cli_path` (env: `CLAUDE_AGENT_CLI_PATH`)
  threads through to `ClaudeAgentOptions(cli_path=...)` so we can
  decouple the Python SDK API surface from the CLI binary version.
  `_prewarm_cli` in the CoPilotExecutor honours the same override.

* `test_bundled_cli_version_is_known_good_against_openrouter` pins
  the bundled CLI to a known-good set (`{"2.1.63"}` today). Any
  `claude-agent-sdk` bump that changes the bundled CLI will fail this
  test loudly with a pointer to PR #12294 and issue #789, instead of
  silently re-breaking production.

* `test_sdk_exposes_cli_path_option` is a forward-compat sentinel that
  fails fast if upstream removes the `cli_path` option we depend on
  for the override.

* `cli_openrouter_compat_test.py` is the actual reproduction test:
  spawns the bundled (or `CLAUDE_AGENT_CLI_PATH`-overridden) CLI
  against an in-process aiohttp server pretending to be the Anthropic
  Messages API, captures every request body the CLI sends, and
  asserts that none of them contain the two known forbidden patterns
  (`"type": "tool_reference"` content blocks or
  `"context-management-2025-06-27"` in body or `anthropic-beta`
  header). The fake server returns a minimal valid streamed response
  so the CLI doesn't error out before we can inspect what it sent.
  No OpenRouter API key required — the test reproduces the *mechanism*
  rather than the symptom, so it's deterministic and free to run in CI.
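
The forbidden-pattern assertion described in the last bullet can be sketched roughly as follows. This is a minimal illustration and not the contents of `cli_openrouter_compat_test.py`: the helper names `find_forbidden_patterns` and `_walk`, and the shape of a captured request (a headers dict plus an already-parsed JSON body), are assumptions made for the example.

```python
import json
from typing import Any, Iterator

# The two patterns named in the commit message above.
FORBIDDEN_BLOCK_TYPE = "tool_reference"
FORBIDDEN_BETA = "context-management-2025-06-27"


def _walk(node: Any) -> Iterator[Any]:
    """Yield every nested value in a parsed JSON document, depth-first."""
    yield node
    if isinstance(node, dict):
        for value in node.values():
            yield from _walk(value)
    elif isinstance(node, list):
        for item in node:
            yield from _walk(item)


def find_forbidden_patterns(headers: dict[str, str], body: Any) -> list[str]:
    """Describe each forbidden pattern found in one captured request.

    Hypothetical helper: `headers` and `body` stand in for whatever the
    fake Messages API server recorded from the CLI.
    """
    findings: list[str] = []

    # HTTP header names are case-insensitive, so normalise before the lookup.
    lowered = {name.lower(): value for name, value in headers.items()}
    if FORBIDDEN_BETA in lowered.get("anthropic-beta", ""):
        findings.append(f"{FORBIDDEN_BETA!r} in 'anthropic-beta' header")

    # A tool_reference block can be nested inside tool_result.content,
    # so search the whole body rather than only the top level.
    for node in _walk(body):
        if isinstance(node, dict) and node.get("type") == FORBIDDEN_BLOCK_TYPE:
            findings.append(f"{FORBIDDEN_BLOCK_TYPE!r} content block in body")
            break

    # The beta name may also be echoed somewhere in the body itself.
    if FORBIDDEN_BETA in json.dumps(body):
        findings.append(f"{FORBIDDEN_BETA!r} in request body")

    return findings
```

A test built on a helper like this would call it once per captured request and assert that the combined list of findings is empty, skipping instead when no request was captured at all.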

Workflow for verifying a candidate upgrade going forward: bump the
SDK in `pyproject.toml`, push the commit, and watch the CI run for
both tests in `sdk_compat_test.py` and `cli_openrouter_compat_test.py`.
A clean run on both means it's safe to add the new bundled CLI version
to `_KNOWN_GOOD_BUNDLED_CLI_VERSIONS` and merge.
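
As a rough sketch of the gate this workflow relies on, the known-good check amounts to comparing the bundled CLI version against an allow-list. The constant name `_KNOWN_GOOD_BUNDLED_CLI_VERSIONS` and its single entry come from the commit message; the function `check_bundled_cli_version` and its error wording are hypothetical, and how the real test discovers the bundled CLI version is not shown here.

```python
# Allow-list named in the commit message; "2.1.63" is the only entry today.
_KNOWN_GOOD_BUNDLED_CLI_VERSIONS = {"2.1.63"}


def check_bundled_cli_version(bundled_version: str) -> None:
    """Raise AssertionError if the bundled CLI is not on the allow-list.

    Hypothetical helper: the caller supplies the version string.
    """
    if bundled_version not in _KNOWN_GOOD_BUNDLED_CLI_VERSIONS:
        raise AssertionError(
            f"Bundled CLI {bundled_version} is not verified against OpenRouter "
            "(see PR #12294 and anthropics/claude-agent-sdk-python#789). "
            "Run the compat tests, then add the version to "
            "_KNOWN_GOOD_BUNDLED_CLI_VERSIONS."
        )
```

Under this sketch, `check_bundled_cli_version("2.1.63")` returns silently, while any other version raises until someone adds it to the set after a clean compat run.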

CI bisect commit only — do NOT merge. 0.1.55 is the highest version
historically attempted by Dependabot before being rolled back. Tests
whether CLI 2.1.91 (which includes the MCP large-tool-result fix and
predates the suspected `context-management-2025-06-27` introduction)
still trips the OpenRouter forbidden-pattern guard.

Same pre-existing dev-branch lint issue from PR #12739 — black would
reformat this file (extra blank line between two test classes), which
fails the `lint` CI job for any PR branched from current dev.

coderabbitai Bot commented Apr 11, 2026

Important

Review skipped

Draft detected.

@github-actions

🔍 PR Overlap Detection

This check compares your PR against all other open PRs targeting the same branch to detect potential merge conflicts early.

🔴 Merge Conflicts Detected

The following PRs have been tested and will have merge conflicts if merged after this PR. Consider coordinating with the authors.

🟡 Medium Risk — Some Line Overlap

These PRs have some overlapping changes:

  • chore(copilot): SDK CLI override + OpenRouter compat regression tests #12741 (majdyz · updated just now)
    • autogpt_platform/backend/backend/copilot/config.py: L172-189
    • autogpt_platform/backend/backend/data/platform_cost_test.py: L35-41
    • autogpt_platform/backend/backend/copilot/executor/processor.py: L174-198
    • autogpt_platform/backend/backend/copilot/sdk/cli_openrouter_compat_test.py: L1-424
    • autogpt_platform/backend/backend/copilot/sdk/service.py: L2245-2256
    • autogpt_platform/backend/backend/copilot/sdk/sdk_compat_test.py: L196-274

🟢 Low Risk — File Overlap Only

These PRs touch the same files but different sections.

Summary: 5 conflict(s), 1 medium risk, 13 low risk (out of 19 PRs with file overlap)


Auto-generated on push. Ignores: openapi.json, lock files.

codecov Bot commented Apr 11, 2026

Codecov Report

❌ Patch coverage is 78.33333% with 26 lines in your changes missing coverage. Please review.
✅ Project coverage is 63.14%. Comparing base (b319c26) to head (552d84b).

❌ Your patch status has failed because the patch coverage (78.33%) is below the target coverage (80.00%). You can increase the patch coverage or adjust the target coverage.

Additional details and impacted files
@@           Coverage Diff            @@
##              dev   #12743    +/-   ##
========================================
  Coverage   63.14%   63.14%            
========================================
  Files        1811     1812     +1     
  Lines      130463   130581   +118     
  Branches    14260    14272    +12     
========================================
+ Hits        82376    82461    +85     
- Misses      45495    45519    +24     
- Partials     2592     2601     +9     
| Flag | Coverage Δ | |
|---|---|---|
| platform-backend | 74.64% <78.33%> (+0.01%) | ⬆️ |
| platform-frontend-e2e | 27.97% <ø> (-0.18%) | ⬇️ |

Flags with carried forward coverage won't be shown.

| Components | Coverage Δ | |
|---|---|---|
| Platform Backend | 74.64% <78.33%> (+0.01%) | ⬆️ |
| Platform Frontend | 23.72% <ø> (-0.06%) | ⬇️ |
| AutoGPT Libs | ∅ <ø> (∅) | |
| Classic AutoGPT | 28.43% <ø> (ø) | |

majdyz commented Apr 11, 2026

FAIL — 0.1.55 (bundled CLI 2.1.91) trips the reproduction test with: 'context-management-2025-06-27' in 'anthropic-beta' header — issue #789. Confirms the new regression, not the original tool_reference one. To upgrade to this version, the compat proxy in #12745 is required.

Closing — bisect verdict captured in the parent PR #12741 description and in PR #12745 (compat proxy). This was a CI-only probe that was never intended to merge.

@majdyz majdyz closed this Apr 11, 2026
@github-project-automation github-project-automation Bot moved this from 🆕 Needs initial review to ✅ Done in AutoGPT development kanban Apr 11, 2026
@majdyz majdyz deleted the ci-test-sdk-bisect-0.1.55 branch April 12, 2026 06:32
Labels

platform/backend AutoGPT Platform - Back end size/xl

Projects

Status: ✅ Done
