Add full Anthropic router replay token handling #928

Open
willccbb wants to merge 5 commits into main from codex/add-router-replay-support-to-anthropic-client

Conversation

willccbb (Member) commented Feb 18, 2026

Motivation

  • Enable Anthropic-compatible backends (vLLM’s /v1/messages path) to return token-level outputs and router-replay payloads and have the client surface them in the same shape used for OpenAI chat completions.
  • Allow callers to pass router-replay payloads (e.g. routed_experts) through sampling_args without provider-specific branching by forwarding unknown sampling args into extra_body.

Description

  • Added parsing of token-level fields in AnthropicMessagesClient.from_native_response: prompt_token_ids, token_ids, and logprobs are validated and converted into ResponseTokens when present.
  • Implemented routed_experts decoding (base85 -> int32 -> reshape -> list) and attached decoded routed_experts to ResponseTokens when available.
  • Introduced parse_completion_logprobs and parse_tokens helpers and wired tokens=parse_tokens(response) into ResponseMessage while preserving tokens=None when fields are incomplete.
  • Kept and tightened request-side sampling-args normalization in get_native_response: validate that extra_body is a mapping, fall back to a default max_tokens when it is missing, and move unknown Anthropic args into sampling_args["extra_body"] so router-replay payloads are forwarded unchanged.
  • Added imports (base64, numpy) and new unit tests covering request forwarding, routed-expert decoding, token extraction, and the negative case when logprobs are missing.

Testing

  • Ran style/format checks with uv run ruff check --fix verifiers/clients/anthropic_messages_client.py tests/test_client_multimodal_types.py and they passed.
  • Executed unit tests with uv run pytest tests/test_client_multimodal_types.py tests/test_client_auth_errors.py and all tests passed (20 passed).
  • Ran uv run pre-commit run --all-files; the hooks reformatted one file on the first pass, and all checks passed on the final run.

Codex Task


Note

Medium Risk
Touches provider request/response translation and adds numpy/base85 decoding, which could affect Anthropic message calls and token accounting if parsing or arg-normalization has edge cases.

Overview
Adds token-level output support to AnthropicMessagesClient.from_native_response, populating ResponseMessage.tokens from prompt_token_ids, token_ids, and logprobs, and decoding optional router-replay routed_experts payloads (base85/int32/reshape) into the shared ResponseTokens shape.

Updates AnthropicMessagesClient.get_native_response to default max_tokens to 32768 when omitted and to forward unknown sampling args (e.g. routed_experts) via extra_body, with validation that extra_body is a mapping; adds focused unit tests covering forwarding behavior, defaulting, token extraction, routed-expert decoding, and the missing-logprobs fallback.
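The request-side normalization described above can be sketched roughly as below. The set of "known" Anthropic sampling args and the helper name are assumptions for illustration; only the default of 32768, the mapping check on extra_body, and the forward-unknowns-via-extra_body behavior come from the PR:

```python
ANTHROPIC_MAX_TOKENS = 32768  # default from the PR when callers omit max_tokens

# Assumed set of natively supported Anthropic sampling args for this sketch.
KNOWN_ARGS = {"max_tokens", "temperature", "top_p", "top_k", "stop_sequences"}

def normalize_sampling_args(sampling_args: dict) -> dict:
    args = dict(sampling_args)
    extra_body = args.setdefault("extra_body", {})
    if not isinstance(extra_body, dict):
        raise TypeError("extra_body must be a mapping")
    # Anthropic requires max_tokens on every request; default it when omitted.
    args.setdefault("max_tokens", ANTHROPIC_MAX_TOKENS)
    # Move unknown args (e.g. routed_experts) into extra_body unchanged,
    # so router-replay payloads pass through without provider-specific branching.
    for key in list(args):
        if key not in KNOWN_ARGS and key != "extra_body":
            extra_body[key] = args.pop(key)
    return args
```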

Written by Cursor Bugbot for commit 34bd2ca.

        ),
    )
else:
    routed_experts = None

Routed-experts decoding duplicated across two client files

Low Severity

The routed_experts decoding block — walrus-operator check, base64.b85decode, np.frombuffer with int32, .reshape, .tolist() — is copy-pasted verbatim from openai_chat_completions_client.py into anthropic_messages_client.py. Extracting this into a shared utility (e.g. in client_utils) would eliminate the duplication and ensure future fixes apply to both paths.


willccbb (Member, Author):

@cursoragent

let's have ANTHROPIC_MAX_TOKENS as a global constant

when the anthropic client is initialized, we should log the default and mention that the value is required


cursor bot commented Feb 18, 2026

  • Added a global constant ANTHROPIC_MAX_TOKENS (set to 32768) in verifiers/clients/anthropic_messages_client.py and switched the fallback in normalize_sampling_args to use it.

  • On client initialization, AnthropicMessagesClient.setup_client now logs: that max_tokens is required for every Anthropic request and the current default value that will be used when callers omit it.

  • Linted the file and pushed the change to codex/add-router-replay-support-to-anthropic-client.
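The constant-plus-logging change the bot describes could look roughly like this. The class and method names come from the PR; the body, logger setup, and message wording are an illustrative sketch, not the committed code:

```python
import logging

ANTHROPIC_MAX_TOKENS = 32768  # global default used when callers omit max_tokens

logger = logging.getLogger(__name__)

class AnthropicMessagesClient:
    def setup_client(self) -> None:
        # Log at init that max_tokens is required on every Anthropic request,
        # and which default will be applied when callers omit it.
        logger.info(
            "max_tokens is required for every Anthropic request; "
            "defaulting to %d when not provided.",
            ANTHROPIC_MAX_TOKENS,
        )
```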


…lt at init; use constant for fallback max_tokens

Co-authored-by: will brown <willccbb@users.noreply.github.com>
@CLAassistant

CLA assistant check
Thank you for your submission! We really appreciate it. Like many open source projects, we ask that you all sign our Contributor License Agreement before we can accept your contribution.
1 out of 2 committers have signed the CLA.

✅ willccbb
❌ cursoragent

@cursor cursor bot left a comment

Cursor Bugbot has reviewed your changes and found 1 potential issue.


        .reshape(routed_experts["shape"])
        .tolist()
    ),
)

Malformed routed_experts crashes entire response parsing

Medium Severity

The parse_tokens function gracefully returns None when prompt_token_ids, token_ids, or logprobs are missing or invalid, but the routed_experts decoding block (base64.b85decode + np.frombuffer + .reshape) has no try-except. If the server returns a routed_experts dict with valid data and shape keys but malformed content (e.g. corrupt base85 or shape mismatch), an unhandled ValueError propagates out of from_native_response, causing the entire response — including valid text content — to be lost as a ModelError. Wrapping the decode in a try-except and falling back to routed_experts = None would be consistent with the rest of the function's defensive design.



3 participants