claude: short-circuit probe requests & drop context-1m beta on OAuth by lyzgeorge · Pull Request #3567 · router-for-me/CLIProxyAPI

lyzgeorge · 2026-05-26T10:39:51Z

Summary

claude_executor: when upstream auth is OAuth (no api_key), strip context-1m-2025-08-07 from Anthropic-Beta. Anthropic returns 429 Usage credits are required for long context requests for subscription accounts whose pay-as-you-go credits are not enabled; clients like the Claude Desktop App in cowork mode advertise the beta unconditionally and break sub-200k requests. API-key paths are unchanged and keep the full 1M window.
code_handlers: short-circuit /v1/messages locally when the request matches the actual Claude Desktop / Claude Code liveness-probe fingerprint (not bare max_tokens=1, which is a legitimate single-token classifier setting). The synthesized response carries stop_reason=max_tokens for both JSON and SSE. Probe predicate requires:
- max_tokens == 1
- messages contain a pre-filled assistant turn (real classifier inputs are user-only)
- no tools / tool_choice (tool-gated classifiers won't be probed)
Adds code_handlers_probe_test.go covering the JSON path, SSE path, the positive predicate, and the false-positive shapes (single-token user-only classifier, tool-gated classifier via tools and via tool_choice).

History

Continuation of #3566 (closed); same branch, no force-pushes. Resolved feedback already incorporated:

gemini-code-assist[bot]: message_delta usage now carries only output_tokens (commit e17b18d).
chatgpt-codex-connector[bot] P1: probe predicate tightened to avoid false positives on real classification workloads (commit 5e4bd68).

Test plan

`go build ./cmd/server` (verified inside Docker multi-stage build, `golang:1.26-alpine`)
`go test ./sdk/api/handlers/claude/...` (probe tests including false-positive cases)
Deployed image under cowork-mode traffic; previously failing long-context requests now succeed at 200k; probe traffic short-circuits without hitting upstream; classifier-style `max_tokens=1` requests pass through unchanged.

🤖 Generated with Claude Code

Clients (e.g. Claude Desktop App) periodically probe the messages endpoint with max_tokens=1 to check liveness/auth. Forwarding these to upstream wastes a token quota and surfaces upstream auth failures on otherwise idle UIs. Detect them in ClaudeMessages and synthesize a local message with stop_reason=max_tokens, supporting both JSON and SSE responses. Adds dedicated tests. Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>

Anthropic OAuth subscription accounts (Max/Pro) reject requests carrying the context-1m-2025-08-07 beta with HTTP 429 "Usage credits are required for long context requests" unless pay-as-you-go credits are enabled. Cowork-style clients (Claude Desktop App) advertise the 1M beta unconditionally, breaking otherwise valid sub-200k requests. When the resolved auth is OAuth (no api_key attribute), filter the beta out so requests fall back to the standard 200k window. API-key upstreams keep the beta and full 1M capability. Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>

Co-authored-by: gemini-code-assist[bot] <176961590+gemini-code-assist[bot]@users.noreply.github.com>

Bare max_tokens=1 is a legitimate single-token classification setting, so the previous predicate over-matched. Restrict the short-circuit to the actual probe fingerprint sent by Claude Desktop App / Claude Code: - max_tokens == 1 - messages contain a pre-filled assistant turn (real classifier inputs are user-only) - no tools / tool_choice (tool-gated classifiers won't be probed) Adds tests for the false-positive shapes the reviewer raised: single-token user-only classifier and tool-gated classifier both pass through to upstream. Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>

gemini-code-assist

Code Review

This pull request introduces logic to strip the context-1m-2025-08-07 beta header for OAuth upstreams to prevent 429 errors, and implements a short-circuit mechanism for Claude Desktop App/Claude Code liveness probes. However, the current implementation of shouldShortCircuitClaudeSample checks if any message in the array has the assistant role, which incorrectly flags legitimate multi-turn conversations where the last message is from the user. It is recommended to only check if the last message in the array is an assistant message, and to add a corresponding test case to prevent regressions.

chatgpt-codex-connector

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: 5e4bd686c5

ℹ️ About Codex in GitHub

Codex has been enabled to automatically review pull requests in this repo. Reviews are triggered when you

Open a pull request for review
Mark a draft as ready
Comment "@codex review".

If Codex has suggestions, it will comment; otherwise it will react with 👍.

When you sign up for Codex through ChatGPT, Codex can also answer questions or update the PR, like "@codex address that feedback".

chatgpt-codex-connector · 2026-05-26T10:44:48Z

+	gjson.GetBytes(rawJSON, "messages").ForEach(func(_, msg gjson.Result) bool {
+		if msg.Get("role").String() == "assistant" {
+			hasAssistant = true


Require stricter probe fingerprint before short-circuiting

The current predicate treats any request with max_tokens == 1 plus an assistant message (and no tools) as a probe, but that shape is also valid for real Claude workloads (e.g., single-token classification or constrained continuation with prior assistant turns). In those cases this handler now returns a synthetic empty response locally instead of sending the request upstream, which can silently break legitimate production requests whenever callers intentionally use max_tokens=1 with multi-turn context.

Useful? React with 👍 / 👎.

The previous predicate scanned for any assistant turn in the history, which would incorrectly short-circuit multi-turn classifier requests that happen to set max_tokens=1 on a follow-up question (real conversations naturally accumulate assistant turns). Switch to gjson's messages.-1 to assert the last message is an assistant turn, which is the actual hallmark of the Claude Desktop App / Claude Code liveness probe (probes pre-fill an assistant turn to coax the model into emitting one token immediately). Adds a test for the multi-turn user-ending case. Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>

chatgpt-codex-connector

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: 3da6162448

ℹ️ About Codex in GitHub

Codex has been enabled to automatically review pull requests in this repo. Reviews are triggered when you

Open a pull request for review
Mark a draft as ready
Comment "@codex review".

If Codex has suggestions, it will comment; otherwise it will react with 👍.

When you sign up for Codex through ChatGPT, Codex can also answer questions or update the PR, like "@codex address that feedback".

chatgpt-codex-connector · 2026-05-26T10:48:53Z

+	lastMsg := gjson.GetBytes(rawJSON, "messages.-1")
+	return lastMsg.Exists() && lastMsg.Get("role").String() == "assistant"


Require a stricter probe fingerprint before short-circuiting

The short-circuit predicate still matches normal Claude requests that intentionally end with an assistant prefill and use max_tokens: 1, so those real requests are answered locally with a synthetic empty message instead of being sent upstream. Fresh evidence in this commit is TestShouldShortCircuitClaudeSample ("probe shape shorts") which explicitly marks a standard user -> assistant conversation as a probe, so this false-positive path is now codified and can silently break legitimate single-token continuation/classification workloads.

Useful? React with 👍 / 👎.

applied in 3da6162. The predicate now uses messages.-1 and checks only the last message is assistant. Added the multi-turn-ending-on-user test case alongside it.

chatgpt-codex-connector · 2026-05-26T10:48:53Z

+	if !useAPIKey && strings.Contains(baseBetas, "context-1m-2025-08-07") {
+		parts := strings.Split(baseBetas, ",")


Avoid stripping 1M context beta from all OAuth traffic

This removes context-1m-2025-08-07 for every non-api_key auth path, which forces all OAuth requests down to the 200k window. That means OAuth accounts that are actually eligible for long-context will now fail requests above 200k tokens because the required beta is always deleted, creating a functional regression for valid long-context workloads instead of only handling the failing subset.

Useful? React with 👍 / 👎.

George Liu and others added 4 commits May 26, 2026 18:17

Update sdk/api/handlers/claude/code_handlers.go

e17b18d

Co-authored-by: gemini-code-assist[bot] <176961590+gemini-code-assist[bot]@users.noreply.github.com>

gemini-code-assist Bot reviewed May 26, 2026

View reviewed changes

Comment thread sdk/api/handlers/claude/code_handlers.go Outdated

Comment thread sdk/api/handlers/claude/code_handlers_probe_test.go

chatgpt-codex-connector Bot reviewed May 26, 2026

View reviewed changes

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Uh oh!

claude: short-circuit probe requests & drop context-1m beta on OAuth#3567

claude: short-circuit probe requests & drop context-1m beta on OAuth#3567
lyzgeorge wants to merge 5 commits into
router-for-me:devfrom
lyzgeorge:fix/context-1m-and-probe-shortcircuit

lyzgeorge commented May 26, 2026

Uh oh!

gemini-code-assist Bot left a comment

Uh oh!

Uh oh!

Uh oh!

chatgpt-codex-connector Bot left a comment

Uh oh!

chatgpt-codex-connector Bot May 26, 2026

Uh oh!

chatgpt-codex-connector Bot left a comment

Uh oh!

chatgpt-codex-connector Bot May 26, 2026

Uh oh!

lyzgeorge May 26, 2026

Uh oh!

chatgpt-codex-connector Bot May 26, 2026

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

1 participant

		lastMsg := gjson.GetBytes(rawJSON, "messages.-1")
		return lastMsg.Exists() && lastMsg.Get("role").String() == "assistant"

		if !useAPIKey && strings.Contains(baseBetas, "context-1m-2025-08-07") {
		parts := strings.Split(baseBetas, ",")

Uh oh!

Conversation

lyzgeorge commented May 26, 2026

Summary

History

Test plan

Uh oh!

gemini-code-assist Bot left a comment

Choose a reason for hiding this comment

Code Review

Uh oh!

Uh oh!

Uh oh!

chatgpt-codex-connector Bot left a comment

Choose a reason for hiding this comment

💡 Codex Review

Uh oh!

chatgpt-codex-connector Bot May 26, 2026

Choose a reason for hiding this comment

Uh oh!

chatgpt-codex-connector Bot left a comment

Choose a reason for hiding this comment

💡 Codex Review

Uh oh!

chatgpt-codex-connector Bot May 26, 2026

Choose a reason for hiding this comment

Uh oh!

lyzgeorge May 26, 2026

Choose a reason for hiding this comment

Uh oh!

chatgpt-codex-connector Bot May 26, 2026

Choose a reason for hiding this comment

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

1 participant