Skip to content

claude: short-circuit probe requests & drop context-1m beta on OAuth#3567

Open
lyzgeorge wants to merge 5 commits into
router-for-me:devfrom
lyzgeorge:fix/context-1m-and-probe-shortcircuit
Open

claude: short-circuit probe requests & drop context-1m beta on OAuth#3567
lyzgeorge wants to merge 5 commits into
router-for-me:devfrom
lyzgeorge:fix/context-1m-and-probe-shortcircuit

Conversation

@lyzgeorge
Copy link
Copy Markdown

Summary

  • claude_executor: when upstream auth is OAuth (no api_key), strip context-1m-2025-08-07 from Anthropic-Beta. Anthropic returns 429 Usage credits are required for long context requests for subscription accounts whose pay-as-you-go credits are not enabled; clients like the Claude Desktop App in cowork mode advertise the beta unconditionally and break sub-200k requests. API-key paths are unchanged and keep the full 1M window.
  • code_handlers: short-circuit /v1/messages locally when the request matches the actual Claude Desktop / Claude Code liveness-probe fingerprint (not bare max_tokens=1, which is a legitimate single-token classifier setting). The synthesized response carries stop_reason=max_tokens for both JSON and SSE. Probe predicate requires:
    • max_tokens == 1
    • messages contain a pre-filled assistant turn (real classifier inputs are user-only)
    • no tools / tool_choice (tool-gated classifiers won't be probed)
  • Adds code_handlers_probe_test.go covering the JSON path, SSE path, the positive predicate, and the false-positive shapes (single-token user-only classifier, tool-gated classifier via tools and via tool_choice).

History

Continuation of #3566 (closed); same branch, no force-pushes. Resolved feedback already incorporated:

  • gemini-code-assist[bot]: message_delta usage now carries only output_tokens (commit e17b18d).
  • chatgpt-codex-connector[bot] P1: probe predicate tightened to avoid false positives on real classification workloads (commit 5e4bd68).

Test plan

  • `go build ./cmd/server` (verified inside Docker multi-stage build, `golang:1.26-alpine`)
  • `go test ./sdk/api/handlers/claude/...` (probe tests including false-positive cases)
  • Deployed image under cowork-mode traffic; previously failing long-context requests now succeed at 200k; probe traffic short-circuits without hitting upstream; classifier-style `max_tokens=1` requests pass through unchanged.

🤖 Generated with Claude Code

George Liu and others added 4 commits May 26, 2026 18:17
Clients (e.g. Claude Desktop App) periodically probe the messages
endpoint with max_tokens=1 to check liveness/auth. Forwarding these to
upstream wastes a token quota and surfaces upstream auth failures on
otherwise idle UIs. Detect them in ClaudeMessages and synthesize a
local message with stop_reason=max_tokens, supporting both JSON and
SSE responses. Adds dedicated tests.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
Anthropic OAuth subscription accounts (Max/Pro) reject requests
carrying the context-1m-2025-08-07 beta with HTTP 429
"Usage credits are required for long context requests" unless
pay-as-you-go credits are enabled. Cowork-style clients (Claude
Desktop App) advertise the 1M beta unconditionally, breaking
otherwise valid sub-200k requests.

When the resolved auth is OAuth (no api_key attribute), filter the
beta out so requests fall back to the standard 200k window. API-key
upstreams keep the beta and full 1M capability.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
Co-authored-by: gemini-code-assist[bot] <176961590+gemini-code-assist[bot]@users.noreply.github.com>
Bare max_tokens=1 is a legitimate single-token classification setting,
so the previous predicate over-matched. Restrict the short-circuit to
the actual probe fingerprint sent by Claude Desktop App / Claude Code:

- max_tokens == 1
- messages contain a pre-filled assistant turn (real classifier inputs
  are user-only)
- no tools / tool_choice (tool-gated classifiers won't be probed)

Adds tests for the false-positive shapes the reviewer raised:
single-token user-only classifier and tool-gated classifier both
pass through to upstream.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
Copy link
Copy Markdown
Contributor

@gemini-code-assist gemini-code-assist Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Code Review

This pull request introduces logic to strip the context-1m-2025-08-07 beta header for OAuth upstreams to prevent 429 errors, and implements a short-circuit mechanism for Claude Desktop App/Claude Code liveness probes. However, the current implementation of shouldShortCircuitClaudeSample checks if any message in the array has the assistant role, which incorrectly flags legitimate multi-turn conversations where the last message is from the user. It is recommended to only check if the last message in the array is an assistant message, and to add a corresponding test case to prevent regressions.

Comment thread sdk/api/handlers/claude/code_handlers.go Outdated
Comment thread sdk/api/handlers/claude/code_handlers_probe_test.go
Copy link
Copy Markdown

@chatgpt-codex-connector chatgpt-codex-connector Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: 5e4bd686c5

ℹ️ About Codex in GitHub

Codex has been enabled to automatically review pull requests in this repo. Reviews are triggered when you

  • Open a pull request for review
  • Mark a draft as ready
  • Comment "@codex review".

If Codex has suggestions, it will comment; otherwise it will react with 👍.

When you sign up for Codex through ChatGPT, Codex can also answer questions or update the PR, like "@codex address that feedback".

Comment on lines +498 to +500
gjson.GetBytes(rawJSON, "messages").ForEach(func(_, msg gjson.Result) bool {
if msg.Get("role").String() == "assistant" {
hasAssistant = true
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

P1 Badge Require stricter probe fingerprint before short-circuiting

The current predicate treats any request with max_tokens == 1 plus an assistant message (and no tools) as a probe, but that shape is also valid for real Claude workloads (e.g., single-token classification or constrained continuation with prior assistant turns). In those cases this handler now returns a synthetic empty response locally instead of sending the request upstream, which can silently break legitimate production requests whenever callers intentionally use max_tokens=1 with multi-turn context.

Useful? React with 👍 / 👎.

The previous predicate scanned for any assistant turn in the history,
which would incorrectly short-circuit multi-turn classifier requests
that happen to set max_tokens=1 on a follow-up question (real
conversations naturally accumulate assistant turns).

Switch to gjson's messages.-1 to assert the last message is an
assistant turn, which is the actual hallmark of the Claude Desktop
App / Claude Code liveness probe (probes pre-fill an assistant turn
to coax the model into emitting one token immediately). Adds a test
for the multi-turn user-ending case.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
Copy link
Copy Markdown

@chatgpt-codex-connector chatgpt-codex-connector Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: 3da6162448

ℹ️ About Codex in GitHub

Codex has been enabled to automatically review pull requests in this repo. Reviews are triggered when you

  • Open a pull request for review
  • Mark a draft as ready
  • Comment "@codex review".

If Codex has suggestions, it will comment; otherwise it will react with 👍.

When you sign up for Codex through ChatGPT, Codex can also answer questions or update the PR, like "@codex address that feedback".

Comment on lines +497 to +498
lastMsg := gjson.GetBytes(rawJSON, "messages.-1")
return lastMsg.Exists() && lastMsg.Get("role").String() == "assistant"
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

P1 Badge Require a stricter probe fingerprint before short-circuiting

The short-circuit predicate still matches normal Claude requests that intentionally end with an assistant prefill and use max_tokens: 1, so those real requests are answered locally with a synthetic empty message instead of being sent upstream. Fresh evidence in this commit is TestShouldShortCircuitClaudeSample ("probe shape shorts") which explicitly marks a standard user -> assistant conversation as a probe, so this false-positive path is now codified and can silently break legitimate single-token continuation/classification workloads.

Useful? React with 👍 / 👎.

Copy link
Copy Markdown
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

applied in 3da6162. The predicate now uses messages.-1 and checks only the last message is assistant. Added the multi-turn-ending-on-user test case alongside it.

Comment on lines +982 to +983
if !useAPIKey && strings.Contains(baseBetas, "context-1m-2025-08-07") {
parts := strings.Split(baseBetas, ",")
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

P2 Badge Avoid stripping 1M context beta from all OAuth traffic

This removes context-1m-2025-08-07 for every non-api_key auth path, which forces all OAuth requests down to the 200k window. That means OAuth accounts that are actually eligible for long-context will now fail requests above 200k tokens because the required beta is always deleted, creating a functional regression for valid long-context workloads instead of only handling the failing subset.

Useful? React with 👍 / 👎.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant