[CI BISECT — DO NOT MERGE] claude-agent-sdk 0.1.55#12743
…ests

We've been pinned at `claude-agent-sdk==0.1.45` (bundled CLI 2.1.63) since PR #12294 because every version above introduces a 400 against OpenRouter. There are two stacked regressions today:

1. CLI 2.1.69 (= SDK 0.1.46) added a `tool_reference` content block in `tool_result.content` that OpenRouter's stricter Zod validation rejects. CLI 2.1.70 added a proxy-detection workaround but our subsequent attempts at 0.1.55 and 0.1.56 still failed.
2. A newer regression, the `context-management-2025-06-27` beta header, appears in some CLI version after 2.1.91. Tracked upstream at anthropics/claude-agent-sdk-python#789, still open with no fix.

This commit doesn't actually upgrade the SDK. It adds the infrastructure we need to upgrade safely *when* upstream lands a fix or when we identify a known-good newer CLI version via bisection:

* `ChatConfig.claude_agent_cli_path` (env: `CLAUDE_AGENT_CLI_PATH`) threads through to `ClaudeAgentOptions(cli_path=...)` so we can decouple the Python SDK API surface from the CLI binary version. `_prewarm_cli` in the CoPilotExecutor honours the same override.
* `test_bundled_cli_version_is_known_good_against_openrouter` pins the bundled CLI to a known-good set (`{"2.1.63"}` today). Any `claude-agent-sdk` bump that changes the bundled CLI will fail this test loudly with a pointer to PR #12294 and issue #789, instead of silently re-breaking production.
* `test_sdk_exposes_cli_path_option` is a forward-compat sentinel that fails fast if upstream removes the `cli_path` option we depend on for the override.
* `cli_openrouter_compat_test.py` is the actual reproduction test: it spawns the bundled (or `CLAUDE_AGENT_CLI_PATH`-overridden) CLI against an in-process aiohttp server pretending to be the Anthropic Messages API, captures every request body the CLI sends, and asserts that none of them contain the two known forbidden patterns (`"type": "tool_reference"` content blocks or `"context-management-2025-06-27"` in body or `anthropic-beta` header). The fake server returns a minimal valid streamed response so the CLI doesn't error out before we can inspect what it sent. No OpenRouter API key is required: the test reproduces the *mechanism* rather than the symptom, so it's deterministic and free to run in CI.

Workflow for verifying a candidate upgrade going forward: bump the SDK in `pyproject.toml`, push the commit, and watch the CI run for the tests in both `sdk_compat_test.py` and `cli_openrouter_compat_test.py`. A clean run on both means it's safe to add the new bundled CLI version to `_KNOWN_GOOD_BUNDLED_CLI_VERSIONS` and merge.
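The forbidden-pattern check that the reproduction test applies to each captured request could look roughly like the sketch below. This is not the code from `cli_openrouter_compat_test.py`; the function names (`find_forbidden_patterns`, `_walk`) and the returned messages are hypothetical, and the sketch only assumes that each captured request is available as a parsed JSON body plus a header mapping.

```python
import json
from typing import Any, Iterator, List, Mapping

# The two patterns named in the commit message above.
FORBIDDEN_BLOCK_TYPE = "tool_reference"
FORBIDDEN_BETA = "context-management-2025-06-27"


def _walk(node: Any) -> Iterator[Any]:
    """Yield every nested dict, list, and scalar inside a parsed JSON value."""
    yield node
    if isinstance(node, dict):
        for value in node.values():
            yield from _walk(value)
    elif isinstance(node, list):
        for item in node:
            yield from _walk(item)


def find_forbidden_patterns(body: Any, headers: Mapping[str, str]) -> List[str]:
    """Return a description of each forbidden pattern found (hypothetical helper)."""
    problems: List[str] = []

    # 1. Any object anywhere in the body with "type": "tool_reference".
    if any(
        isinstance(node, dict) and node.get("type") == FORBIDDEN_BLOCK_TYPE
        for node in _walk(body)
    ):
        problems.append("tool_reference content block in request body")

    # 2. The beta identifier anywhere in the serialized body.
    if FORBIDDEN_BETA in json.dumps(body):
        problems.append("context-management beta in request body")

    # 3. The beta identifier in the anthropic-beta header (case-insensitive name).
    beta_header = next(
        (value for name, value in headers.items() if name.lower() == "anthropic-beta"),
        "",
    )
    if FORBIDDEN_BETA in beta_header:
        problems.append("context-management beta in anthropic-beta header")

    return problems
```

A test built on this would call the helper once per captured request and assert that the result is empty, so a failure message names the offending pattern instead of only reporting a 400.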
CI bisect commit only — do NOT merge. 0.1.55 is the highest version historically attempted by Dependabot before being rolled back. Tests whether CLI 2.1.91 (which includes the MCP large-tool-result fix and predates the suspected `context-management-2025-06-27` introduction) still trips the OpenRouter forbidden-pattern guard.
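For context, the known-good pin that a bump like this one is expected to trip can be pictured as the sketch below. It is an illustration only: `_KNOWN_GOOD_BUNDLED_CLI_VERSIONS` and its current contents come from the parent change, but `check_bundled_cli_version` is a hypothetical helper, and how the real test discovers the bundled CLI version is not shown here, so the version is passed in as a plain string.

```python
from typing import AbstractSet

# Known-good set as described in the parent change ({"2.1.63"} today).
_KNOWN_GOOD_BUNDLED_CLI_VERSIONS = frozenset({"2.1.63"})


def check_bundled_cli_version(
    version: str,
    known_good: AbstractSet[str] = _KNOWN_GOOD_BUNDLED_CLI_VERSIONS,
) -> None:
    """Raise AssertionError if the bundled CLI version is not known-good.

    Hypothetical helper: the message points at the PR and upstream issue so
    that whoever bumps the SDK knows where to look.
    """
    if version not in known_good:
        raise AssertionError(
            f"Bundled CLI {version} is not in the known-good set "
            f"{sorted(known_good)}; see PR #12294 and "
            "anthropics/claude-agent-sdk-python#789 before upgrading."
        )
```

With the set as it stands, `check_bundled_cli_version("2.1.63")` returns silently, while the CLI 2.1.91 bundled with 0.1.55 raises.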
Same pre-existing dev-branch lint issue from PR #12739 — black would reformat this file (extra blank line between two test classes), which fails the `lint` CI job for any PR branched from current dev.
🔍 PR Overlap Detection

This check compares your PR against all other open PRs targeting the same branch to detect potential merge conflicts early.

Summary: 5 conflict(s), 1 medium risk, 13 low risk (out of 19 PRs with file overlap).
Codecov Report

❌ Your patch status has failed because the patch coverage (78.33%) is below the target coverage (80.00%). You can increase the patch coverage or adjust the target coverage.

@@ Coverage Diff @@
## dev #12743 +/- ##
========================================
Coverage 63.14% 63.14%
========================================
Files 1811 1812 +1
Lines 130463 130581 +118
Branches 14260 14272 +12
========================================
+ Hits 82376 82461 +85
- Misses 45495 45519 +24
- Partials 2592 2601 +9
❌ FAIL: 0.1.55 (bundled CLI 2.1.91) trips the reproduction test with:

Closing. The bisect verdict is captured in the parent PR #12741 description and in PR #12745 (compat proxy). This was a CI-only probe that was never intended to merge.
CI bisect probe for the OpenRouter compat investigation. NOT for merging — close after CI runs report back.
Bumps `claude-agent-sdk` to `0.1.55` to test whether the new `cli_openrouter_compat_test.py` reproduction passes / fails. The signal we care about:

* `test_cli_does_not_send_openrouter_incompatible_features` passing → this version is OpenRouter-safe and a viable upgrade target.
* `test_cli_does_not_send_openrouter_incompatible_features` failing → this version still sends a forbidden pattern (`tool_reference` blocks or `context-management-2025-06-27` beta).

Companion to #12741 (the `cli_path` plumbing + reproduction test PR).
Tracks anthropics/claude-agent-sdk-python#789.