
Conversation

@caozhiyuan (Contributor) commented Nov 3, 2025

This pull request introduces a new configuration system, structured logging, and support for the /v1/responses endpoint, along with improvements to model selection and request handling. The most important changes are grouped below:

Responses API Integration:

  • Added full support for the /v1/responses endpoint, including a new handler (src/routes/responses/handler.ts) that validates model support, streams or returns results, and logs all activity.
  • Enhanced src/routes/messages/handler.ts to route requests to the Responses API when supported by the selected model, including translation logic for payloads and results.
  • Updated the API documentation in README.md to include the new /v1/responses endpoint and clarify its purpose.

Configuration Management:

  • Added a new src/lib/config.ts module to provide persistent application configuration, including support for model-specific prompts, reasoning effort levels, and default model selection. Configuration is stored in a new config.json file in the app data directory, with automatic creation and safe permissions. [1] [2]

Logging Improvements:

  • Implemented a new buffered, file-based logging utility in src/lib/logger.ts for handler-level logging, with log rotation, retention, and structured output. Integrated this logger into key request handlers for better diagnostics. [1] [2] [3] [4] [5]
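For context, a hedged usage sketch of such a per-handler logger (createHandlerLogger is the factory this PR adds; the call site and argument shapes below are assumptions):

```ts
// Illustrative only — identifiers come from this PR, but this exact call site is assumed.
const logger = createHandlerLogger("messages-handler")
logger.debug("Incoming request", JSON.stringify({ model: payload.model }))
```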

Token Counting Logic:

  • Refactored token counting in src/lib/tokenizer.ts to more accurately account for tool calls, array parameters, and model-specific behaviors (including GPT and Anthropic/Grok models). Added support for excluding certain schema keys and improved calculation for nested parameters. [1] [2] [3] [4] [5] [6] [7] [8]

Summary by CodeRabbit

  • New Features
    • New OpenAI-compatible POST /v1/responses endpoint (with streaming) and Anthropic-compatible messaging endpoints; added usage monitoring endpoints (/usage, /token).
  • Improvements
    • Disk-backed configurable defaults (config.json) and per-handler daily logging.
    • More accurate token accounting and expanded model capability metadata (vision, streaming, structured outputs).
    • Better handling of Anthropic "thinking" blocks (now include signature) and warmup small-model fallback.
  • Tests
    • New suites for response stream translation and message/response translations.


…arsing and align with vscode-copilot-chat extractThinkingData, otherwise it will occasionally cause cache misses
…ing small model if no tools are used

2. add bun idleTimeout = 0
3. feat: compatible with Claude Code JSONL file usage error scenarios, delay closeBlockIfOpen, map the Responses API to Anthropic with tool_use support, and fix spelling errors
4. feat: add configuration management with extra prompt handling and ensure config file creation
…just runServer to set verbose level correctly

coderabbitai bot commented Nov 11, 2025

Walkthrough

Adds a new Responses API (POST /responses and POST /v1/responses), disk-backed config and per-handler loggers, a bidirectional Anthropic↔Copilot translation layer (including streaming translation and function-call assembly), tokenizer refinements, route wiring, service integration, and new tests.

Changes

Cohort / File(s) Summary
Docs
README.md
Adds OpenAI-compatible POST /v1/responses, reorders endpoints, introduces Anthropic-compatible endpoints and usage-monitoring entries, and updates endpoint tables/formatting.
Configuration & Paths
src/lib/config.ts, src/lib/paths.ts, src/lib/state.ts
New disk-backed AppConfig at CONFIG_PATH with defaults and getters (getConfig, mergeConfigWithDefaults, getExtraPromptForModel, getSmallModel, getReasoningEffortForModel); PATHS exposes CONFIG_PATH; state adds verbose.
Logging
src/lib/logger.ts
New per-handler logger factory createHandlerLogger with daily log files, buffered writes, rotation/cleanup, and graceful shutdown handling; integrates with state.verbose.
Tokenization
src/lib/tokenizer.ts
Refactors tool token counting with isGpt branching, added calculatePropertiesTokens, array/property traversal, exclusions, and adjusted tool-count logic.
Copilot Responses service
src/services/copilot/create-responses.ts, src/services/copilot/get-models.ts
New comprehensive Responses types and streaming event shapes; createResponses posts to Copilot /responses; Model and ModelSupports extended with vision, streaming, structured_outputs, and supported_endpoints.
Responses routes & utils
src/routes/responses/route.ts, src/routes/responses/handler.ts, src/routes/responses/utils.ts
New responses route and handler with rate limiting, model capability checks, approval hook, streaming (SSE) support, and helpers to detect vision input and agent initiator.
Translation layer
src/routes/messages/responses-translation.ts, src/routes/messages/responses-stream-translation.ts
New bidirectional translators: Anthropic ↔ Copilot payload/result translation and streaming event translation with stateful block/function-call argument handling; exports stream state utilities and error shaping.
Messages & Chat handlers
src/routes/chat-completions/handler.ts, src/routes/messages/handler.ts, src/routes/messages/count-tokens-handler.ts, src/routes/messages/anthropic-types.ts
Handlers now use per-handler loggers; messages handler can route to Responses API when supported and may warm to configured small model; count-tokens logic adjusted (Grok tweaks); AnthropicThinkingBlock adds signature.
Server & bootstrap
src/server.ts, src/start.ts
Registers responses routes at /responses and /v1/responses; startup calls mergeConfigWithDefaults, sets state.verbose from options, delays server import until run, and sets bun idleTimeout to 0.
Tests
tests/anthropic-request.test.ts, tests/responses-stream-translation.test.ts, tests/responses-translation.test.ts
Tests updated/added for thinking block signature, streaming translation (function-call arguments assembly), and round-trip translation between Anthropic and Responses formats.

Sequence Diagram(s)

sequenceDiagram
    participant Client
    participant Server
    participant Translator
    participant Copilot as Copilot Responses API
    participant Logger

    rect rgb(235,245,255)
    Note over Client,Server: Non-streaming /v1/responses flow
    Client->>Server: POST /v1/responses (Anthropic payload)
    Server->>Logger: logger.debug(request)
    Server->>Translator: translateAnthropicMessagesToResponsesPayload()
    Translator-->>Server: ResponsesPayload
    Server->>Copilot: POST /responses (JSON)
    Copilot-->>Server: ResponsesResult (JSON)
    Server->>Translator: translateResponsesResultToAnthropic()
    Translator-->>Server: AnthropicResponse
    Server-->>Client: 200 JSON
    end

    rect rgb(255,245,235)
    Note over Client,Server: Streaming flow (stream=true)
    Client->>Server: POST /v1/responses (stream=true)
    Server->>Copilot: POST /responses (stream)
    Copilot-->>Server: stream of ResponseStreamEvent
    loop for each event
        Server->>Translator: translateResponsesStreamEvent(event, state)
        Translator-->>Server: AnthropicStreamEvents[]
        Server-->>Client: SSE chunk(s)
    end
    end

Estimated code review effort

🎯 4 (Complex) | ⏱️ ~60 minutes

Attention recommended for:

  • src/lib/tokenizer.ts — model-specific token logic and new helpers.
  • src/routes/messages/responses-stream-translation.ts — stateful streaming translation and function-call argument assembly/validation.
  • src/routes/messages/responses-translation.ts — extensive bidirectional mapping and edge-case handling.
  • src/routes/messages/handler.ts — conditional routing between legacy and Responses flows and warmup logic.
  • src/lib/logger.ts — file/stream lifecycle, buffer flushing, and shutdown behavior.

Poem

🐰 I hop through streams and stitch each arg,
Logs roll nightly while deltas spark.
Configs snug, small models on call,
Responses hum — both big and small.
A carrot cheer for every new endpoint! 🥕

Pre-merge checks and finishing touches

❌ Failed checks (1 warning)
  • Docstring Coverage — ⚠️ Warning: docstring coverage is 7.14%, below the required threshold of 80.00%. Resolution: run @coderabbitai generate docstrings to improve docstring coverage.
✅ Passed checks (2 passed)
  • Description Check — ✅ Passed: check skipped because CodeRabbit’s high-level summary is enabled.
  • Title check — ✅ Passed: the PR title mentions three major features (responses API, configuration system, structured logging) but lacks specificity about the primary change and reads as a feature summary rather than a concise changeset description.

📜 Recent review details

Configuration used: CodeRabbit UI

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between f7835a4 and 318855e.

📒 Files selected for processing (6)
  • README.md (2 hunks)
  • src/lib/config.ts (1 hunks)
  • src/routes/messages/count-tokens-handler.ts (1 hunks)
  • src/routes/messages/handler.ts (5 hunks)
  • src/services/copilot/create-responses.ts (1 hunks)
  • src/start.ts (3 hunks)
🚧 Files skipped from review as they are similar to previous changes (1)
  • src/routes/messages/handler.ts
🧰 Additional context used
🧠 Learnings (1)
📚 Learning: 2025-11-11T04:33:30.522Z
Learnt from: caozhiyuan
Repo: ericc-ch/copilot-api PR: 142
File: src/routes/messages/handler.ts:50-52
Timestamp: 2025-11-11T04:33:30.522Z
Learning: In src/routes/messages/handler.ts, forcing anthropicPayload.model to getSmallModel() when no tools are present is intentional behavior to fix Claude Code 2.0.28 warmup requests consuming premium model tokens. This applies to all requests without tools, not just warmup requests, and is an accepted design decision.

Applied to files:

  • src/routes/messages/count-tokens-handler.ts
  • src/services/copilot/create-responses.ts
🧬 Code graph analysis (3)
src/start.ts (3)
src/lib/config.ts (1)
  • mergeConfigWithDefaults (99-120)
src/lib/state.ts (1)
  • state (21-27)
src/server.ts (1)
  • server (13-13)
src/lib/config.ts (1)
src/lib/paths.ts (1)
  • PATHS (10-14)
src/services/copilot/create-responses.ts (3)
src/lib/state.ts (1)
  • state (21-27)
src/lib/api-config.ts (2)
  • copilotHeaders (20-37)
  • copilotBaseUrl (16-19)
src/lib/error.ts (1)
  • HTTPError (6-13)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (1)
  • GitHub Check: Agent
🔇 Additional comments (13)
src/routes/messages/count-tokens-handler.ts (1)

36-52: LGTM! Clearer logic and refined token counting.

The refactored variable name addToolSystemPromptCount is more descriptive than the previous mcpToolExist. The explicit checks for mcp__ prefixes and the single Skill tool exclusion make the intent clear. The Grok token adjustment from +480 to +120 aligns with the tokenizer refinements mentioned in the PR.

src/services/copilot/create-responses.ts (3)

8-27: LGTM! Comprehensive ResponsesPayload interface.

The interface is well-defined with appropriate optional fields and clear documentation. The note about service_tier being unsupported by GitHub Copilot is helpful context for the mutation on line 341.


207-319: LGTM! Comprehensive streaming event types.

The discriminated union of streaming event types provides excellent type safety for consuming code. The coverage includes text deltas, function call arguments, reasoning summaries, and lifecycle events (created, completed, failed). The consistent structure with sequence_number and type discriminator follows best practices.
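For readers unfamiliar with the pattern, a hedged illustration of such a discriminated union (the interface names are listed in the reviewed file; the literal event strings below are assumptions):

```ts
// Each variant carries a literal `type` plus sequence_number, so a switch narrows the union safely.
type ResponseStreamEventSketch =
  | { type: "response.created"; sequence_number: number }
  | { type: "response.output_text.delta"; sequence_number: number; delta: string }
  | { type: "response.completed"; sequence_number: number }

const textDelta = (event: ResponseStreamEventSketch): string | undefined => {
  switch (event.type) {
    case "response.output_text.delta":
      return event.delta // narrowed: delta exists only on this variant
    default:
      return undefined
  }
}
```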


329-359: LGTM! Clean implementation with proper error handling.

The function has a clear structure:

  1. Token validation (early return on missing token)
  2. Header construction with vision and initiator flags
  3. Request execution
  4. Error handling with logging and HTTPError
  5. Streaming vs non-streaming return paths

The payload mutation on line 341 was previously discussed and justified by the author for performance reasons (avoiding object creation). The implementation correctly handles both streaming and non-streaming responses.

Based on past review discussion on line 341.
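A hedged outline of that five-step shape (helpers such as copilotHeaders, copilotBaseUrl, and HTTPError exist in this repo, but their exact signatures and the header names below are assumptions, not the actual implementation):

```ts
const createResponsesSketch = async (
  payload: ResponsesPayload,
  options: { vision: boolean; initiator: string },
) => {
  if (!state.copilotToken) return // 1. early return on missing token (field name assumed)

  const headers = {
    ...copilotHeaders(state), // 2. header construction (signature assumed)
    "X-Initiator": options.initiator,
    "Copilot-Vision-Request": String(options.vision), // header name assumed
  }

  const response = await fetch(`${copilotBaseUrl(state)}/responses`, {
    method: "POST",
    headers,
    body: JSON.stringify(payload), // 3. request execution
  })

  if (!response.ok) {
    throw new HTTPError("Failed to create responses", response) // 4. error handling (constructor shape assumed)
  }

  return payload.stream ? response.body : await response.json() // 5. streaming vs non-streaming paths
}
```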

src/start.ts (3)

9-9: LGTM! Config system integrated at startup.

The call to mergeConfigWithDefaults() early in the startup flow ensures that default configuration entries (like extraPrompts) are merged into the user's config file before the server starts processing requests. This placement is appropriate and aligns with the new configuration system introduced in the PR.

Also applies to: 31-32


38-38: LGTM! Verbose state integration and lazy server loading.

Setting state.verbose from options.verbose allows the logging infrastructure to adjust verbosity at runtime. The dynamic import of the server module on line 121 implements lazy loading, deferring module evaluation until the server is actually started. Both changes align with the PR's logging improvements and startup optimization.

Also applies to: 121-121


126-128: Verify explicit connection cleanup with idleTimeout: 0.

Setting idleTimeout: 0 disables Bun's automatic idle timeout, which is necessary for long-lived streaming connections (SSE). However, ensure that streaming handlers explicitly close or clean up connections when clients disconnect to prevent resource exhaustion.
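A hedged example of that pattern, assuming Bun's serve options and the standard Request.signal abort on client disconnect (the handler body is illustrative, not this repo's code):

```ts
Bun.serve({
  idleTimeout: 0, // no idle timeout, so long-lived SSE streams are not cut off
  fetch(req) {
    const encoder = new TextEncoder()
    const stream = new ReadableStream<Uint8Array>({
      start(controller) {
        const timer = setInterval(() => {
          controller.enqueue(encoder.encode("data: ping\n\n"))
        }, 15_000)
        // Clean up explicitly when the client disconnects instead of relying on an idle timeout.
        req.signal.addEventListener("abort", () => {
          clearInterval(timer)
          controller.close()
        })
      },
    })
    return new Response(stream, { headers: { "Content-Type": "text/event-stream" } })
  },
})
```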

README.md (2)

180-200: LGTM! Clear and comprehensive configuration documentation.

The documentation clearly explains:

  • Platform-specific config file locations
  • Default configuration structure
  • Purpose of each configuration field (extraPrompts, smallModel, modelReasoningEfforts)
  • How to customize and apply changes

The examples match the actual default values in the code (src/lib/config.ts), and the explanation of how extraPrompts are used in Anthropic translation is helpful for users.


210-215: LGTM! Endpoints table updated with new Responses API.

The addition of POST /v1/responses as the "most advanced interface" aligns with the PR's introduction of the Responses API. The table clearly documents both OpenAI and Anthropic compatible endpoints, providing users with a comprehensive overview of available interfaces.

src/lib/config.ts (4)

6-31: LGTM! Well-designed configuration structure.

The AppConfig interface is clean with appropriate optional fields for extensibility. The default gpt5ExplorationPrompt provides valuable guidance for efficient batching and parallel tool usage, which can help reduce token consumption. The default configuration sensibly targets gpt-5-mini as the small model with low reasoning effort for cost optimization.
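A rough sketch of the described shape (field names follow the review text; everything beyond the gpt-5-mini / "low" defaults mentioned above is an assumption):

```ts
interface AppConfigSketch {
  extraPrompts?: Record<string, string> // e.g. the gpt5ExplorationPrompt, keyed by model id
  smallModel?: string // defaults to "gpt-5-mini"
  modelReasoningEfforts?: Record<string, string> // e.g. { "gpt-5-mini": "low" }
}
```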


35-51: LGTM! Cross-platform file creation with secure defaults.

The function properly ensures the config file exists with appropriate permissions. The chmod to 0o600 sets read/write for the owner only, which is a good security practice for configuration files. The silent catch for chmod failures (lines 47-49) is appropriate for Windows compatibility, where Unix-style permissions aren't supported.
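A minimal sketch of that creation flow using Node/Bun fs APIs (only the behavior described above is taken from the code; the names and exact steps here are illustrative):

```ts
import fs from "node:fs/promises"
import path from "node:path"

const ensureConfigFileSketch = async (configPath: string): Promise<void> => {
  await fs.mkdir(path.dirname(configPath), { recursive: true })
  try {
    await fs.writeFile(configPath, "{}", { flag: "wx" }) // create only if missing
  } catch {
    // already exists
  }
  try {
    await fs.chmod(configPath, 0o600) // owner read/write only
  } catch {
    // chmod is unsupported on some platforms (e.g. Windows); ignore, as the real code does
  }
}
```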


72-97: LGTM! Correct merge logic preserving user customizations.

The merge function properly preserves user customizations while adding missing default prompts. The spread operator order on lines 91-92 (...defaultExtraPrompts, ...extraPrompts) ensures user values take precedence over defaults. The changed flag enables the caller to conditionally persist only when modifications are made, avoiding unnecessary disk writes.
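A hedged sketch of that spread-order behavior (names taken from the review text; the real function also persists only when changed is true):

```ts
const mergeExtraPromptsSketch = (
  defaultExtraPrompts: Record<string, string>,
  extraPrompts: Record<string, string> | undefined,
): { merged: Record<string, string>; changed: boolean } => {
  const user = extraPrompts ?? {}
  // User entries are spread last, so they win over defaults for the same key.
  const merged = { ...defaultExtraPrompts, ...user }
  // `changed` is true only when a default key was missing from the user's config.
  const changed = Object.keys(merged).length !== Object.keys(user).length
  return { merged, changed }
}
```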


122-142: LGTM! Clean getter API with sensible fallbacks.

The getter functions provide a clean, type-safe API for accessing configuration values. The lazy caching in getConfig() (line 123) efficiently defers config loading until needed. All getters have appropriate fallbacks:

  • getExtraPromptForModel: empty string (no extra prompt)
  • getSmallModel: "gpt-5-mini" (cost-effective default)
  • getReasoningEffortForModel: "high" (quality-first default)

The use of nullish coalescing (??) throughout ensures correct handling of undefined values.
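Illustrative getters matching those fallbacks (signatures assumed, written against the AppConfig shape sketched earlier rather than the actual module):

```ts
const getExtraPromptForModelSketch = (config: AppConfigSketch, model: string): string =>
  config.extraPrompts?.[model] ?? "" // no extra prompt by default

const getSmallModelSketch = (config: AppConfigSketch): string =>
  config.smallModel ?? "gpt-5-mini" // cost-effective default

const getReasoningEffortForModelSketch = (config: AppConfigSketch, model: string): string =>
  config.modelReasoningEfforts?.[model] ?? "high" // quality-first default
```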



coderabbitai bot left a comment

Actionable comments posted: 3

📜 Review details

Configuration used: CodeRabbit UI

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 0ea08fe and 01d4adb.

📒 Files selected for processing (22)
  • README.md (1 hunks)
  • src/lib/config.ts (1 hunks)
  • src/lib/logger.ts (1 hunks)
  • src/lib/paths.ts (1 hunks)
  • src/lib/state.ts (1 hunks)
  • src/lib/tokenizer.ts (8 hunks)
  • src/routes/chat-completions/handler.ts (4 hunks)
  • src/routes/messages/anthropic-types.ts (1 hunks)
  • src/routes/messages/count-tokens-handler.ts (1 hunks)
  • src/routes/messages/handler.ts (5 hunks)
  • src/routes/messages/responses-stream-translation.ts (1 hunks)
  • src/routes/messages/responses-translation.ts (1 hunks)
  • src/routes/responses/handler.ts (1 hunks)
  • src/routes/responses/route.ts (1 hunks)
  • src/routes/responses/utils.ts (1 hunks)
  • src/server.ts (2 hunks)
  • src/services/copilot/create-responses.ts (1 hunks)
  • src/services/copilot/get-models.ts (2 hunks)
  • src/start.ts (2 hunks)
  • tests/anthropic-request.test.ts (2 hunks)
  • tests/responses-stream-translation.test.ts (1 hunks)
  • tests/responses-translation.test.ts (1 hunks)
🧰 Additional context used
🧬 Code graph analysis (13)
src/server.ts (1)
src/routes/responses/route.ts (1)
  • responsesRoutes (7-7)
src/routes/responses/handler.ts (6)
src/lib/logger.ts (1)
  • createHandlerLogger (148-182)
src/lib/rate-limit.ts (1)
  • checkRateLimit (8-46)
src/lib/state.ts (1)
  • state (21-27)
src/services/copilot/create-responses.ts (3)
  • ResponsesPayload (8-26)
  • createResponses (328-355)
  • ResponsesResult (113-131)
src/routes/responses/utils.ts (1)
  • getResponsesRequestOptions (6-13)
src/lib/approval.ts (1)
  • awaitApproval (5-15)
src/lib/logger.ts (2)
src/lib/paths.ts (1)
  • PATHS (10-14)
src/lib/state.ts (1)
  • state (21-27)
tests/responses-stream-translation.test.ts (3)
src/services/copilot/create-responses.ts (3)
  • ResponseOutputItemAddedEvent (270-275)
  • ResponseFunctionCallArgumentsDeltaEvent (247-253)
  • ResponseFunctionCallArgumentsDoneEvent (255-262)
src/routes/messages/responses-stream-translation.ts (2)
  • createResponsesStreamState (74-82)
  • translateResponsesStreamEvent (84-142)
src/routes/messages/anthropic-types.ts (1)
  • AnthropicStreamEventData (185-193)
src/routes/responses/route.ts (2)
src/routes/responses/handler.ts (1)
  • handleResponses (21-73)
src/lib/error.ts (1)
  • forwardError (15-47)
src/routes/messages/responses-translation.ts (3)
src/routes/messages/anthropic-types.ts (13)
  • AnthropicMessagesPayload (3-26)
  • AnthropicMessage (82-82)
  • AnthropicUserMessage (72-75)
  • AnthropicAssistantMessage (77-80)
  • AnthropicUserContentBlock (62-65)
  • AnthropicAssistantContentBlock (67-70)
  • AnthropicImageBlock (33-40)
  • AnthropicThinkingBlock (56-60)
  • AnthropicToolUseBlock (49-54)
  • AnthropicToolResultBlock (42-47)
  • AnthropicTextBlock (28-31)
  • AnthropicTool (84-88)
  • AnthropicResponse (90-112)
src/services/copilot/create-responses.ts (20)
  • ResponsesPayload (8-26)
  • ResponseInputItem (89-94)
  • ResponseInputContent (96-99)
  • ResponseInputMessage (57-62)
  • ResponseInputText (101-104)
  • ResponseInputImage (106-111)
  • ResponseInputReasoning (79-87)
  • ResponseFunctionToolCallItem (64-70)
  • ResponseFunctionCallOutputItem (72-77)
  • Tool (35-35)
  • ToolChoiceOptions (28-28)
  • ToolChoiceFunction (30-33)
  • ResponsesResult (113-131)
  • ResponseOutputItem (143-146)
  • ResponseOutputContentBlock (178-181)
  • ResponseOutputReasoning (156-162)
  • ResponseReasoningBlock (164-167)
  • ResponseOutputFunctionCall (169-176)
  • ResponseOutputText (183-187)
  • ResponseOutputRefusal (189-192)
src/lib/config.ts (2)
  • getReasoningEffortForModel (92-97)
  • getExtraPromptForModel (82-85)
src/routes/messages/handler.ts (10)
src/lib/logger.ts (1)
  • createHandlerLogger (148-182)
src/lib/state.ts (1)
  • state (21-27)
src/routes/messages/anthropic-types.ts (1)
  • AnthropicMessagesPayload (3-26)
src/lib/config.ts (1)
  • getSmallModel (87-90)
src/routes/messages/non-stream-translation.ts (2)
  • translateToOpenAI (29-47)
  • translateToAnthropic (281-327)
src/services/copilot/create-chat-completions.ts (2)
  • createChatCompletions (8-47)
  • ChatCompletionResponse (95-110)
src/routes/messages/responses-translation.ts (2)
  • translateAnthropicMessagesToResponsesPayload (48-87)
  • translateResponsesResultToAnthropic (352-374)
src/routes/responses/utils.ts (1)
  • getResponsesRequestOptions (6-13)
src/services/copilot/create-responses.ts (3)
  • createResponses (328-355)
  • ResponseStreamEvent (206-219)
  • ResponsesResult (113-131)
src/routes/messages/responses-stream-translation.ts (3)
  • createResponsesStreamState (74-82)
  • translateResponsesStreamEvent (84-142)
  • buildErrorEvent (606-612)
src/routes/responses/utils.ts (1)
src/services/copilot/create-responses.ts (2)
  • ResponsesPayload (8-26)
  • ResponseInputItem (89-94)
src/lib/config.ts (1)
src/lib/paths.ts (1)
  • PATHS (10-14)
src/routes/messages/responses-stream-translation.ts (3)
src/services/copilot/create-responses.ts (15)
  • ResponseStreamEvent (206-219)
  • ResponseCreatedEvent (233-237)
  • ResponseOutputItemAddedEvent (270-275)
  • ResponseOutputItemDoneEvent (277-282)
  • ResponseFunctionCallArgumentsDeltaEvent (247-253)
  • ResponseFunctionCallArgumentsDoneEvent (255-262)
  • ResponseTextDeltaEvent (302-309)
  • ResponseReasoningSummaryTextDeltaEvent (284-291)
  • ResponseReasoningSummaryTextDoneEvent (293-300)
  • ResponseTextDoneEvent (311-318)
  • ResponseCompletedEvent (221-225)
  • ResponseIncompleteEvent (227-231)
  • ResponseFailedEvent (264-268)
  • ResponseErrorEvent (239-245)
  • ResponsesResult (113-131)
src/routes/messages/anthropic-types.ts (1)
  • AnthropicStreamEventData (185-193)
src/routes/messages/responses-translation.ts (1)
  • translateResponsesResultToAnthropic (352-374)
src/routes/chat-completions/handler.ts (2)
src/lib/logger.ts (1)
  • createHandlerLogger (148-182)
src/lib/state.ts (1)
  • state (21-27)
src/services/copilot/create-responses.ts (3)
src/lib/state.ts (1)
  • state (21-27)
src/lib/api-config.ts (2)
  • copilotHeaders (20-37)
  • copilotBaseUrl (16-19)
src/lib/error.ts (1)
  • HTTPError (6-13)
src/start.ts (2)
src/lib/state.ts (1)
  • state (21-27)
src/server.ts (1)
  • server (13-13)
🔇 Additional comments (16)
src/lib/paths.ts (2)

8-8: LGTM!

The CONFIG_PATH constant follows the established pattern and integrates cleanly with the existing paths system.


19-19: LGTM!

Ensuring the config file exists at startup is the correct approach for persistent configuration.

src/lib/state.ts (1)

18-18: LGTM!

The verbose flag integrates cleanly with the new logging system and follows the existing state pattern.

src/routes/responses/route.ts (1)

1-15: LGTM!

The route setup follows established patterns and includes proper error handling via forwardError.

src/start.ts (3)

34-34: LGTM!

Setting state.verbose early ensures the logging system is properly configured before server startup.


117-117: Clarify the rationale for dynamic server import.

The server module is now imported dynamically rather than at the top level. While this can be beneficial for deferring initialization until after setup, it also makes the dependency less obvious.

Could you clarify:

  1. What specific issue does the dynamic import solve?
  2. Are there side effects in the server module that need to be delayed?
  3. Does this affect error reporting during server initialization?

122-124: Verify the implications of disabling idle timeout.

Setting idleTimeout: 0 disables the idle connection timeout, which means connections can remain open indefinitely. This could potentially:

  • Prevent proper resource cleanup
  • Allow connections to accumulate if clients don't properly close them
  • Impact server resource usage under certain conditions

Please verify:

  1. Is there a specific issue with Bun's default idle timeout behavior?
  2. Have you tested this under load with long-lived connections?
  3. Are there application-level timeouts to prevent resource exhaustion?
src/server.ts (2)

9-9: LGTM!

The import follows the established pattern for route modules.


25-25: LGTM!

Mounting the responses route at both /responses and /v1/responses (line 31) provides consistent API versioning compatibility.

tests/anthropic-request.test.ts (2)

139-139: LGTM!

The test properly includes the new required signature field on thinking blocks, maintaining consistency with the updated AnthropicThinkingBlock interface.


172-172: LGTM!

The test data correctly includes the signature field for thinking blocks with tool calls.

src/routes/chat-completions/handler.ts (4)

6-6: LGTM!

The migration to per-handler logging improves log organization and aligns with the new structured logging system.


17-17: LGTM!

Creating a dedicated logger instance for the chat-completions handler enables better log filtering and rotation.


23-23: LGTM!

The logging calls are appropriately updated to use the handler-specific logger instance.


44-50: Good defensive coding with the isNullish check.

The isNullish guard properly handles cases where max_tokens is null or undefined, preventing potential runtime issues when accessing selectedModel?.capabilities.limits.max_output_tokens.
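For reference, a hedged one-liner of the guard being described (names taken from the review text, not verified against the file):

```ts
if (isNullish(payload.max_tokens)) {
  payload.max_tokens = selectedModel?.capabilities.limits.max_output_tokens
}
```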

src/routes/messages/anthropic-types.ts (1)

56-60: Signature field is properly handled across all translation layers and streaming.

All code paths that construct AnthropicThinkingBlock objects correctly include the signature field:

  • In responses-translation.ts:389, thinking blocks are constructed with signature: (item.encrypted_content ?? "") + "@" + item.id
  • In responses-stream-translation.ts:199-207, signature deltas are properly emitted as stream events
  • In responses-translation.ts:250-251, the signature field is correctly parsed and consumed
  • Tests include the signature field with proper values

No issues found. The required field is consistently set during construction and properly propagated through all translation and streaming modules.

Comment on lines +25 to +72
logger.debug("Responses request payload:", JSON.stringify(payload))

const selectedModel = state.models?.data.find(
(model) => model.id === payload.model,
)
const supportsResponses =
selectedModel?.supported_endpoints?.includes(RESPONSES_ENDPOINT) ?? false

if (!supportsResponses) {
return c.json(
{
error: {
message:
"This model does not support the responses endpoint. Please choose a different model.",
type: "invalid_request_error",
},
},
400,
)
}

const { vision, initiator } = getResponsesRequestOptions(payload)

if (state.manualApprove) {
await awaitApproval()
}

const response = await createResponses(payload, { vision, initiator })

if (isStreamingRequested(payload) && isAsyncIterable(response)) {
logger.debug("Forwarding native Responses stream")
return streamSSE(c, async (stream) => {
for await (const chunk of response) {
logger.debug("Responses stream chunk:", JSON.stringify(chunk))
await stream.writeSSE({
id: (chunk as { id?: string }).id,
event: (chunk as { event?: string }).event,
data: (chunk as { data?: string }).data ?? "",
})
}
})
}

logger.debug(
"Forwarding native Responses result:",
JSON.stringify(response).slice(-400),
)
return c.json(response as ResponsesResult)
coderabbitai bot commented Nov 11, 2025

⚠️ Potential issue | 🟠 Major

Avoid logging full request/response payloads
These logger.debug calls write the entire request payload and every streamed chunk to disk via the new per-handler logger (retained for seven days by default). That captures user prompts, tool arguments, and model outputs verbatim, which is a high-risk compliance/PII retention issue unless you have explicit consent and data-governance controls in place. Please remove or heavily redact these logs (e.g., log only metadata like model id or event types, or gate them behind a short-lived in-memory logger) before shipping.

🤖 Prompt for AI Agents
In src/routes/responses/handler.ts around lines 25 to 72, remove or replace the
current logger.debug calls that serialize the full request payload and streamed
chunks; instead log only non-PII metadata (e.g., model id, whether
vision/initiator flags are set, event type, chunk id) and never log prompt or
chunk.data content verbatim. For the initial payload log drop
JSON.stringify(payload) and log payload.model or selectedModel.id and any
boolean flags; for streaming, log only chunk.id and chunk.event (no chunk.data)
or redact/mask it, and consider gating any full-content logging behind an
explicit opt-in short-lived in-memory flag or a secure, auditable
data-governance path before shipping. Ensure logs written here do not persist
sensitive user data and update comments/tests accordingly.
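A minimal sketch of the metadata-only alternative (field names follow the handler code quoted above; this is the suggested redaction, not the shipped behavior):

```ts
// Initial request: log the model id and flags, never the prompt content.
logger.debug("Responses request:", JSON.stringify({ model: payload.model, vision, initiator }))

// Per streamed chunk: identifiers and event type only, never chunk.data.
logger.debug("Responses stream chunk:", JSON.stringify({ id: chunk.id, event: chunk.event }))
```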

caozhiyuan (Contributor Author) commented:

Users need to run start with -v to enable the logger.

coderabbitai bot replied:

@caozhiyuan, thank you for the clarification! I see that the logger.debug calls only write to disk when users explicitly run with the -v flag, which significantly reduces the exposure compared to always-on logging.

That said, even opt-in verbose logging carries some residual risk:

  1. User awareness: Users enabling -v for debugging may not realize it captures full prompts, tool arguments, and model outputs for 7 days.
  2. Compliance context: In some jurisdictions (GDPR, CCPA), even short-term retention of user content requires explicit consent and data-governance controls, even when opt-in.
  3. Production incidents: Someone might enable verbose mode to troubleshoot an issue and forget the retention implications.

Suggestions to further reduce risk:

  • Log metadata only in verbose mode: Even with -v, consider logging only payload.model, vision, initiator, chunk.id, and chunk.event (not chunk.data or full payloads).
  • Add a separate flag for full-content logging: Reserve a more explicit flag (e.g., --log-full-payloads) with a clear warning about PII retention.
  • Document the implications: Add a note in your README or startup message that -v logs request/response content for 7 days.

If verbose mode is specifically intended for debugging with full payloads and your users understand the implications, this design is reasonable. Just ensure it's documented and consider the suggestions above for production environments.


✏️ Learnings added
Learnt from: caozhiyuan
Repo: ericc-ch/copilot-api PR: 142
File: src/routes/responses/handler.ts:25-72
Timestamp: 2025-11-11T04:15:40.752Z
Learning: In the copilot-api repository, debug logging that includes full request/response payloads is intentionally gated behind the `-v` (verbose) flag to reduce PII exposure. Users must explicitly enable verbose mode to activate detailed logging.

Note: Learnings are effective only in the context of similar code segments. To apply general coding guidelines, please configure review instructions. You can manage existing learnings in the app.

Comment on lines +16 to +27
  getPayloadItems(payload).some((item) => {
    if (!("role" in item) || !item.role) {
      return true
    }
    const role = typeof item.role === "string" ? item.role.toLowerCase() : ""
    return role === "assistant"
  })

export const hasVisionInput = (payload: ResponsesPayload): boolean => {
  const values = getPayloadItems(payload)
  return values.some((item) => containsVisionContent(item))
}
coderabbitai bot commented Nov 11, 2025

⚠️ Potential issue | 🟠 Major

Fix initiator detection to avoid misclassifying user-driven calls.

hasAgentInitiator currently returns true for any payload item lacking a role (e.g., tool call entries) or whenever an assistant_role message appears anywhere in history. After translation, almost every request includes prior assistant/tool items, so we now send X-Initiator: agent even when the user just typed the latest message. That misreports real user traffic to Copilot and can trigger agent-only throttling or different policy handling. Please determine the initiator from the most recent message that actually has a role, ignoring items without one.

-export const hasAgentInitiator = (payload: ResponsesPayload): boolean =>
-  getPayloadItems(payload).some((item) => {
-    if (!("role" in item) || !item.role) {
-      return true
-    }
-    const role = typeof item.role === "string" ? item.role.toLowerCase() : ""
-    return role === "assistant"
-  })
+export const hasAgentInitiator = (payload: ResponsesPayload): boolean => {
+  const items = getPayloadItems(payload)
+
+  for (let index = items.length - 1; index >= 0; index -= 1) {
+    const item = items[index]
+    if (!("role" in item)) continue
+
+    const role = (item as { role?: unknown }).role
+    if (typeof role === "string" && role.length > 0) {
+      return role.toLowerCase() === "assistant"
+    }
+  }
+
+  return false
+}
🤖 Prompt for AI Agents
In src/routes/responses/utils.ts around lines 16 to 27, the current logic treats
any item lacking a role as an agent initiator and also looks anywhere in history
for assistant messages; update it to determine the initiator from the most
recent payload item that actually has a role: iterate the payload items in
reverse order, skip items without a "role" or with an empty role, take the first
item that has a role, normalize it to lowercase, and return true only if that
role equals "assistant" (otherwise return false); ensure you handle
string/possibly-nonstring role types safely and keep type signatures unchanged.

caozhiyuan (Contributor Author) commented:

The current logic is consistent with the create-chat-completions.ts isAgentCall logic: https://github.com/ericc-ch/copilot-api/blob/master/src/services/copilot/create-chat-completions.ts#L21

coderabbitai bot replied:

🧩 Analysis chain

🏁 Script executed:

#!/bin/bash
# Fetch the create-chat-completions.ts file to examine the isAgentCall logic
cat src/services/copilot/create-chat-completions.ts

Length of output: 4282


@caozhiyuan, thank you for pointing out the consistency with isAgentCall in create-chat-completions.ts. You're right that both implementations use .some() to detect agent involvement in the payload.

I can see that isAgentCall checks:

const isAgentCall = payload.messages.some((msg) =>
  ["assistant", "tool"].includes(msg.role),
)

While hasAgentInitiator has similar logic, there are a couple of differences:

  1. Missing role handling: hasAgentInitiator returns true for items without a role field, while isAgentCall doesn't encounter this case (all messages have roles)
  2. Tool role: isAgentCall checks for both "assistant" and "tool" roles, but hasAgentInitiator only checks for "assistant"

Questions to clarify the intended behavior:

  • Is the intent of X-Initiator to indicate "who sent the most recent message" or "is there any agent involvement in the conversation history"?
  • Should items without a role field (e.g., function outputs, tool call items in the Responses API structure) be treated as agent-initiated?
  • Should hasAgentInitiator also check for "tool" role like isAgentCall does for consistency?

If the current behavior (checking for any assistant/tool in the entire payload) is intentional for both endpoints, then the implementation is consistent as you noted, and the review comment's concern would apply to the broader design rather than just this specific function.

caozhiyuan (Contributor Author) commented:

In the Responses API, function outputs and tool call items are the items without a role field.

coderabbitai bot replied:

🧩 Analysis chain

🏁 Script executed:

#!/bin/bash
# Check the ResponseInputItem type definition and related types to understand which items have role fields
cat src/services/copilot/create-responses.ts | grep -A 30 "type ResponseInputItem"

Length of output: 696


🏁 Script executed:

#!/bin/bash
# Look for all the individual input item interfaces
ast-grep --pattern 'export interface Response$_Item {
  $$$
}'

Length of output: 0


🏁 Script executed:

#!/bin/bash
# Search for ResponseFunctionToolCallItem and ResponseFunctionCallOutputItem definitions
rg -nP -A 10 "interface Response(Function|FunctionCall)" src/services/copilot/create-responses.ts

Length of output: 1087


🏁 Script executed:

#!/bin/bash
# Get a broader view of the ResponseInputItem union and its constituents
rg -nP -B 2 -A 40 "^export type ResponseInputItem" src/services/copilot/create-responses.ts

Length of output: 1145


🏁 Script executed:

#!/bin/bash
# Check the ResponseInputMessage definition to confirm it has a role field
rg -nP -A 10 "^export interface ResponseInputMessage" src/services/copilot/create-responses.ts

Length of output: 320


@caozhiyuan, thank you for the clarification! I can now see from the type definitions that in the Responses API:

  • ResponseInputMessage has a role field (user/assistant/system/developer)
  • ResponseFunctionToolCallItem and ResponseFunctionCallOutputItem legitimately don't have a role field
  • ResponseInputReasoning also lacks a role field

So the current logic that returns true for items without a role field will catch function calls and outputs, treating them as agent-initiated. This makes sense because the presence of tool/function interaction typically indicates agent involvement in the workflow.

To clarify the intended semantics: Does the X-Initiator header indicate:

  1. Any agent involvement in the conversation history (current implementation via .some()), or
  2. Who sent the most recent conversational message (checking only the last message with a role)?

If the intent is #1 (detecting any agent involvement), then the current implementation is correct and consistent with isAgentCall. If it's #2 (identifying the initiator of the current turn), then we'd need to check only the most recent message-type item.

Could you confirm which semantic is intended for the Responses API?

🤔

caozhiyuan (Contributor Author) commented:

The current logic is consistent with the create-chat-completions.ts isAgentCall logic

coderabbitai bot replied:

@caozhiyuan, understood! Since the logic is intentionally consistent with the existing isAgentCall implementation in create-chat-completions.ts, and you've confirmed this twice, I'll consider this design decision final. The current implementation maintains consistency across your codebase, which is a valid architectural choice.

Thank you for clarifying! 👍


coderabbitai bot commented Nov 11, 2025


🧩 Analysis chain

🏁 Script executed:

#!/bin/bash
# Check the responses handler to see what payload it expects
echo "=== src/routes/responses/handler.ts ==="
cat src/routes/responses/handler.ts

echo -e "\n=== src/routes/messages/handler.ts (relevant sections) ==="
# Look for the translation logic in messages handler
rg -n -A 10 -B 2 "translateAnthropicMessagesToResponsesPayload" src/routes/messages/handler.ts

echo -e "\n=== Check handleWithResponsesApi function ==="
rg -n -A 30 "handleWithResponsesApi" src/routes/messages/handler.ts

Length of output: 5433


@caozhiyuan You're absolutely correct! I apologize for the error in the sequence diagram. After reviewing the actual implementation:

  • POST /v1/responses: Accepts ResponsesPayload directly (no Anthropic translation)
  • POST /v1/messages: Accepts Anthropic format and translates it via translateAnthropicMessagesToResponsesPayload() before calling the Responses API

Here's the corrected sequence diagram:

sequenceDiagram
    participant Client
    participant Server
    participant ResponsesAPI as Copilot<br/>Responses API
    participant Translator
    participant Logger

    rect rgb(230, 245, 255)
    Note over Client,Server: /v1/responses (Native ResponsesPayload)
    Client->>Server: POST /v1/responses<br/>(ResponsesPayload)
    Server->>Logger: debug: request received
    Server->>Server: validate payload & model capability
    Server->>ResponsesAPI: POST /responses with payload
    ResponsesAPI-->>Server: ResponsesResult
    Server-->>Client: JSON response (ResponsesResult)
    end

    rect rgb(255, 240, 245)
    Note over Client,Server: /v1/messages (Anthropic Format + Translation)
    Client->>Server: POST /v1/messages<br/>(AnthropicMessagesPayload)
    Server->>Server: shouldUseResponsesApi()?
    Server->>Translator: translateAnthropicMessagesToResponsesPayload()
    Translator-->>Server: ResponsesPayload
    Server->>ResponsesAPI: POST /responses with payload
    ResponsesAPI-->>Server: ResponsesResult
    Server->>Translator: translateResponsesResultToAnthropic()
    Translator-->>Server: AnthropicResponse
    Server-->>Client: JSON response (AnthropicResponse)
    end

    rect rgb(245, 245, 230)
    Note over Client,Server: Streaming (either endpoint)
    Client->>Server: POST with stream=true
    Server->>ResponsesAPI: fetch /responses (stream)
    ResponsesAPI-->>Server: EventSource (streamed events)
    loop for each ResponseStreamEvent
        alt /v1/responses
            Server-->>Client: SSE chunk (native format)
        else /v1/messages
            Server->>Translator: translateResponsesStreamEvent()
            Translator-->>Server: Array<AnthropicStreamEventData>
            Server-->>Client: SSE chunk (Anthropic format)
        end
    end
    end

    rect rgb(245, 230, 245)
    Note over Server,Logger: Logging Path
    Server->>Logger: createHandlerLogger("handler-name")
    Logger->>Logger: ensure logs/ directory
    Logger->>Logger: append to daily log file
    Logger->>Logger: flush buffer periodically
    end

Key differences:

  • /v1/responses is the native endpoint that accepts and returns Responses API format directly
  • /v1/messages is the Anthropic-compatible endpoint that performs bidirectional translation
  • Only the /v1/messages path uses the translation functions

Thank you for catching this!

🐰
(_(\
(='.'=)
(")_(")
learns from mistake

@caozhiyuan changed the title from "feature about responses api" to "feature about responses api, also introduces a new configuration system, structured logging" on Nov 11, 2025
coderabbitai bot left a comment

Actionable comments posted: 1

🧹 Nitpick comments (4)
src/routes/messages/responses-stream-translation.ts (4)

84-142: Guard post‑completion events and backstop message_start emission

Right now translateResponsesStreamEvent never checks state.messageCompleted, and handleResponseCompleted assumes a prior "response.created" event has already emitted the message_start event. If the upstream ever emits stray events after completion/failure or omits response.created, we could generate malformed Anthropic streams (content without a message_start, or extra deltas after message_stop).

Consider:

  • Early‑returning when a terminal state has been reached.
  • Emitting a message_start lazily from handleResponseCompleted when one hasn’t been sent yet.

For example:

 export const translateResponsesStreamEvent = (
   rawEvent: ResponseStreamEvent,
   state: ResponsesStreamState,
 ): Array<AnthropicStreamEventData> => {
+  if (state.messageCompleted) {
+    // Ignore any events after a terminal condition.
+    return []
+  }
   const eventType = rawEvent.type
   switch (eventType) {
 const handleResponseCompleted = (
   rawEvent: ResponseCompletedEvent | ResponseIncompleteEvent,
   state: ResponsesStreamState,
 ): Array<AnthropicStreamEventData> => {
   const response = rawEvent.response
   const events = new Array<AnthropicStreamEventData>()
 
+  if (!state.messageStartSent) {
+    events.push(...messageStart(state, response))
+  }
+
   closeAllOpenBlocks(state, events)
   const anthropic = translateResponsesResultToAnthropic(response)
   events.push(

Also applies to: 410-432, 479-506


215-243: functionCallState validation branch is effectively unreachable

In handleFunctionCallArgumentsDelta, you call openFunctionCallBlock before reading state.functionCallStateByOutputIndex:

const blockIndex = openFunctionCallBlock(state, { outputIndex, events })
const functionCallState =
  state.functionCallStateByOutputIndex.get(outputIndex)
if (!functionCallState) {
  return handleFunctionCallArgumentsValidationError(...)
}

Because openFunctionCallBlock always creates and registers a FunctionCallStreamState when one doesn’t exist, functionCallState will never be falsy here. The "Received function call arguments delta without an open tool call block." path can’t actually trigger.

If you still want to enforce “no delta before a tool‑use block” as a hard error, you likely meant to validate before opening a block. For example:

-  const blockIndex = openFunctionCallBlock(state, {
-    outputIndex,
-    events,
-  })
-
-  const functionCallState =
-    state.functionCallStateByOutputIndex.get(outputIndex)
-  if (!functionCallState) {
+  const existingState =
+    state.functionCallStateByOutputIndex.get(outputIndex)
+  if (!existingState) {
     return handleFunctionCallArgumentsValidationError(
       new FunctionCallArgumentsValidationError(
         "Received function call arguments delta without an open tool call block.",
       ),
       state,
       events,
     )
   }
+  const blockIndex = openFunctionCallBlock(state, {
+    outputIndex,
+    events,
+  })
+  const functionCallState = existingState

Otherwise, consider deleting the unreachable validation branch to keep the handler lean.


617-665: Assumption: one function call stream per outputIndex

openFunctionCallBlock and FunctionCallDetails key the function‑call stream solely by outputIndex:

state.functionCallStateByOutputIndex: Map<number, FunctionCallStreamState>

If the underlying Responses stream ever emits multiple function_call items for the same output_index (e.g., parallel tool calls inside a single output), they’ll be merged into a single Anthropic tool_use block with one id/name, and later calls would overwrite or reuse the first block.

If the Responses service guarantees at most one function call per output_index, this is fine but worth documenting near FunctionCallStreamState. If not, consider extending the key (e.g., using outputIndex + item_id or a composite key) so each distinct call maps to its own content block.

Also applies to: 667-693
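If the guarantee does not hold, a hedged sketch of the composite-key option (FunctionCallStreamStateSketch stands in for the type defined in the reviewed file; the key format is an assumption):

```ts
type FunctionCallStreamStateSketch = { id: string; name: string; argumentsJson: string }

const functionCallKey = (outputIndex: number, itemId: string): string =>
  `${outputIndex}:${itemId}` // one entry per distinct call, even when output_index repeats

const functionCallStateByKey = new Map<string, FunctionCallStreamStateSketch>()
```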


31-55: Whitespace‑run guard looks good; consider clarifying intent

The updateWhitespaceRunState + handleFunctionCallArgumentsDelta combo nicely defends against the “infinite line breaks” tool‑call bug while preserving previous state across chunks.

Note that the current implementation:

  • Counts only \r, \n, and \t toward MAX_CONSECUTIVE_FUNCTION_CALL_WHITESPACE.
  • Treats spaces as neutral (they neither increment nor reset the run).

If the intent is “cap consecutive line breaks/tabs” this is perfect. If instead you truly want “no more than N consecutive whitespace characters of any kind”, you’d need to include " " (and possibly other Unicode whitespace) in the counter.

Also applies to: 215-272
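For clarity, a hedged sketch of the counting behavior described above (the constant exists in the reviewed file, but its value and the exact update logic here are assumptions):

```ts
const MAX_CONSECUTIVE_FUNCTION_CALL_WHITESPACE = 10 // assumed limit

const updateWhitespaceRun = (run: number, chunk: string): number => {
  let current = run
  for (const char of chunk) {
    if (char === "\r" || char === "\n" || char === "\t") {
      current += 1 // line breaks and tabs extend the run
    } else if (char !== " ") {
      current = 0 // non-whitespace resets it; plain spaces are neutral
    }
  }
  return current
}
```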

📜 Review details

Configuration used: CodeRabbit UI

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 01d4adb and 3cdc32c.

📒 Files selected for processing (1)
  • src/routes/messages/responses-stream-translation.ts (1 hunks)
🧰 Additional context used
🧠 Learnings (1)
📚 Learning: 2025-11-11T04:33:30.522Z
Learnt from: caozhiyuan
Repo: ericc-ch/copilot-api PR: 142
File: src/routes/messages/handler.ts:50-52
Timestamp: 2025-11-11T04:33:30.522Z
Learning: In src/routes/messages/handler.ts, forcing anthropicPayload.model to getSmallModel() when no tools are present is intentional behavior to fix Claude Code 2.0.28 warmup requests consuming premium model tokens. This applies to all requests without tools, not just warmup requests, and is an accepted design decision.

Applied to files:

  • src/routes/messages/responses-stream-translation.ts
🧬 Code graph analysis (1)
src/routes/messages/responses-stream-translation.ts (3)
src/services/copilot/create-responses.ts (15)
  • ResponseStreamEvent (206-219)
  • ResponseCreatedEvent (233-237)
  • ResponseOutputItemAddedEvent (270-275)
  • ResponseOutputItemDoneEvent (277-282)
  • ResponseFunctionCallArgumentsDeltaEvent (247-253)
  • ResponseFunctionCallArgumentsDoneEvent (255-262)
  • ResponseTextDeltaEvent (302-309)
  • ResponseReasoningSummaryTextDeltaEvent (284-291)
  • ResponseReasoningSummaryTextDoneEvent (293-300)
  • ResponseTextDoneEvent (311-318)
  • ResponseCompletedEvent (221-225)
  • ResponseIncompleteEvent (227-231)
  • ResponseFailedEvent (264-268)
  • ResponseErrorEvent (239-245)
  • ResponsesResult (113-131)
src/routes/messages/anthropic-types.ts (1)
  • AnthropicStreamEventData (185-193)
src/routes/messages/responses-translation.ts (1)
  • translateResponsesResultToAnthropic (352-374)
🔇 Additional comments (1)
src/routes/messages/responses-stream-translation.ts (1)

57-82: Stateful block management is clean and matches Anthropic’s block model

The combination of ResponsesStreamState, openTextBlockIfNeeded, openThinkingBlockIfNeeded, and the text/thinking handlers yields:

  • Stable index assignments per (outputIndex, contentIndex) / thinking key.
  • At most one open block at a time via closeOpenBlocks, matching Anthropic’s sequential content block semantics.
  • Sensible fallbacks where *.done events synthesize a delta when no prior delta was seen.

Overall this is a solid, maintainable state machine for the streaming bridge.

Also applies to: 304-408, 508-572

Comment on lines +451 to +462
const handleErrorEvent = (
  rawEvent: ResponseErrorEvent,
  state: ResponsesStreamState,
): Array<AnthropicStreamEventData> => {
  const message =
    typeof rawEvent.message === "string" ?
      rawEvent.message
    : "An unexpected error occurred during streaming."

  state.messageCompleted = true
  return [buildErrorEvent(message)]
}
coderabbitai bot commented Nov 19, 2025

⚠️ Potential issue | 🟠 Major

🧩 Analysis chain

Ensure error events also close open content blocks

handleResponseFailed and handleFunctionCallArgumentsValidationError both call closeAllOpenBlocks, but handleErrorEvent does not. If an "error" event arrives while a text/tool/thinking block is open, Anthropic consumers will see an error without matching content_block_stop events, which can break parsers that assume blocks are well‑balanced.

Recommend aligning handleErrorEvent with the other error paths:

 const handleErrorEvent = (
   rawEvent: ResponseErrorEvent,
   state: ResponsesStreamState,
 ): Array<AnthropicStreamEventData> => {
-  const message =
-    typeof rawEvent.message === "string" ?
-      rawEvent.message
-    : "An unexpected error occurred during streaming."
-
-  state.messageCompleted = true
-  return [buildErrorEvent(message)]
+  const events: Array<AnthropicStreamEventData> = []
+  const message =
+    typeof rawEvent.message === "string"
+      ? rawEvent.message
+      : "An unexpected error occurred during streaming."
+
+  closeAllOpenBlocks(state, events)
+  state.messageCompleted = true
+  events.push(buildErrorEvent(message))
+  return events
 }

Also applies to: 574-604



---

🏁 Script executed:

```shell
#!/bin/bash
# Inspect the handleErrorEvent function and surrounding context
rg -n -A 20 "const handleErrorEvent" src/routes/messages/responses-stream-translation.ts

# Inspect handleResponseFailed
rg -n -A 20 "const handleResponseFailed" src/routes/messages/responses-stream-translation.ts

# Inspect handleFunctionCallArgumentsValidationError
rg -n -A 20 "const handleFunctionCallArgumentsValidationError" src/routes/messages/responses-stream-translation.ts

# Check the range 574-604 mentioned in the review
sed -n '574,604p' src/routes/messages/responses-stream-translation.ts

# Search for closeAllOpenBlocks usage
rg -n "closeAllOpenBlocks" src/routes/messages/responses-stream-translation.ts

Length of output: 2967



---

🏁 Script executed:

```shell
#!/bin/bash
# Find other error handlers or similar patterns after line 574
sed -n '574,650p' src/routes/messages/responses-stream-translation.ts | head -80

Length of output: 1938



---

🏁 Script executed:

```shell
#!/bin/bash
# Search for all functions that might be error handlers
rg -n "^const handle" src/routes/messages/responses-stream-translation.ts

# Also search for any patterns with "Error" or "error" in function names
rg -n "const.*[Ee]rror.*= \(" src/routes/messages/responses-stream-translation.ts

Length of output: 695


Ensure error events also close open content blocks

The review comment is correct. Verification confirms:

  • handleErrorEvent (lines 451–462) does not call closeAllOpenBlocks
  • handleResponseFailed (line 440) does call it
  • handleFunctionCallArgumentsValidationError (line 471) does call it

Without closing open blocks, an error event arriving during an open text/tool/thinking block will leave consumers with unbalanced content_block_stop events, breaking parsers expecting well-formed streams.

The recommended fix properly aligns handleErrorEvent with the established error-handling pattern:

 const handleErrorEvent = (
   rawEvent: ResponseErrorEvent,
   state: ResponsesStreamState,
 ): Array<AnthropicStreamEventData> => {
+  const events: Array<AnthropicStreamEventData> = []
   const message =
     typeof rawEvent.message === "string" ?
       rawEvent.message
     : "An unexpected error occurred during streaming."
 
+  closeAllOpenBlocks(state, events)
   state.messageCompleted = true
+  events.push(buildErrorEvent(message))
+  return events
-  return [buildErrorEvent(message)]
 }

(Note: The review's reference to "Also applies to: 574–604" appears to be a review artifact; no handler exists in that range.)

🤖 Prompt for AI Agents
In src/routes/messages/responses-stream-translation.ts around lines 451 to 462,
handleErrorEvent sets state.messageCompleted and returns a buildErrorEvent but
does not call closeAllOpenBlocks; add a call to closeAllOpenBlocks(state) before
returning so any open content blocks are closed, then keep setting
state.messageCompleted = true and return [buildErrorEvent(message)] to match the
other error handlers' pattern.

caozhiyuan (Contributor Author) commented:

Not calling closeAllOpenBlocks is also OK when an error event is sent to the client.

coderabbitai bot replied:

@caozhiyuan, understood! I'll note that your client implementation handles error events without requiring all content blocks to be closed first. Thank you for clarifying the intended behavior.

(\__/)
(•ㅅ•)
/>📝 Noted!

✏️ Learnings added
Learnt from: caozhiyuan
Repo: ericc-ch/copilot-api PR: 142
File: src/routes/messages/responses-stream-translation.ts:451-462
Timestamp: 2025-11-24T05:50:12.345Z
Learning: In src/routes/messages/responses-stream-translation.ts, the handleErrorEvent function intentionally does not call closeAllOpenBlocks before sending an error event to the client. This is acceptable in their error handling design - clients handle error events without requiring balanced content_block_stop events.

Note: Learnings are effective only in the context of similar code segments. To apply general coding guidelines, please configure review instructions. You can manage existing learnings in the app.


🧠 Learnings used
Learnt from: caozhiyuan
Repo: ericc-ch/copilot-api PR: 142
File: src/routes/messages/handler.ts:50-52
Timestamp: 2025-11-11T04:33:30.522Z
Learning: In src/routes/messages/handler.ts, forcing anthropicPayload.model to getSmallModel() when no tools are present is intentional behavior to fix Claude Code 2.0.28 warmup requests consuming premium model tokens. This applies to all requests without tools, not just warmup requests, and is an accepted design decision.

Some clients, such as RooCode, may send `service_tier` to the `/responses` endpoint, but Copilot does not support this field and returns an error
@coderabbitai coderabbitai bot left a comment

Actionable comments posted: 1

🧹 Nitpick comments (2)
src/services/copilot/create-responses.ts (2)

343-347: Consider adding a timeout to the fetch request.

Long-running requests without a timeout can hang indefinitely, degrading reliability in production environments.

You can add a timeout using AbortController:

const controller = new AbortController()
const timeoutId = setTimeout(() => controller.abort(), 60000) // 60s timeout

try {
  const response = await fetch(`${copilotBaseUrl(state)}/responses`, {
    method: "POST",
    headers,
    body: JSON.stringify(requestPayload),
    signal: controller.signal,
  })
  clearTimeout(timeoutId)
  // ... rest of the logic
} catch (error) {
  clearTimeout(timeoutId)
  throw error
}

350-350: Logging the entire response object may be verbose.

The response object includes headers, status, and other metadata that might not be necessary for error diagnostics and could expose sensitive information.

Consider logging only essential details:

- consola.error("Failed to create responses", response)
+ consola.error("Failed to create responses", { 
+   status: response.status, 
+   statusText: response.statusText 
+ })
📜 Review details

Configuration used: CodeRabbit UI

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 3cdc32c and f7835a4.

📒 Files selected for processing (1)
  • src/services/copilot/create-responses.ts (1 hunks)
🧰 Additional context used
🧬 Code graph analysis (1)
src/services/copilot/create-responses.ts (3)
src/lib/state.ts (1)
  • state (21-27)
src/lib/api-config.ts (2)
  • copilotHeaders (20-37)
  • copilotBaseUrl (16-19)
src/lib/error.ts (1)
  • HTTPError (6-13)

}

// service_tier is not supported by github copilot
payload.service_tier = null
@coderabbitai coderabbitai bot Nov 22, 2025

⚠️ Potential issue | 🟠 Major

Avoid mutating the input payload.

Directly mutating payload.service_tier creates an unexpected side effect for callers who may reuse the payload object. This violates the principle of least surprise.

Consider creating a copy before modification:

  const headers: Record<string, string> = {
    ...copilotHeaders(state, vision),
    "X-Initiator": initiator,
  }

- // service_tier is not supported by github copilot
- payload.service_tier = null
+ // service_tier is not supported by github copilot
+ const requestPayload = { ...payload, service_tier: null }

  const response = await fetch(`${copilotBaseUrl(state)}/responses`, {
    method: "POST",
    headers,
-   body: JSON.stringify(payload),
+   body: JSON.stringify(requestPayload),
  })
📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

Suggested change
payload.service_tier = null
const headers: Record<string, string> = {
  ...copilotHeaders(state, vision),
  "X-Initiator": initiator,
}
// service_tier is not supported by github copilot
const requestPayload = { ...payload, service_tier: null }
const response = await fetch(`${copilotBaseUrl(state)}/responses`, {
  method: "POST",
  headers,
  body: JSON.stringify(requestPayload),
})
🤖 Prompt for AI Agents
In src/services/copilot/create-responses.ts around line 341, the code mutates
the incoming payload by setting payload.service_tier = null; instead create a
shallow copy of the payload (e.g., using object spread or Object.assign) and
modify the copy's service_tier instead, then use that modified copy in
subsequent logic so the original payload passed by the caller remains unchanged.

Contributor Author

Setting payload.service_tier = null avoids creating a new object; this has already been tested and works fine.


@caozhiyuan, understood! If the performance benefit from avoiding object creation is valuable here and the mutation doesn't cause issues in your use case, that's a reasonable tradeoff. Thank you for clarifying the intent.


🧠 Learnings used
Learnt from: caozhiyuan
Repo: ericc-ch/copilot-api PR: 142
File: src/routes/messages/handler.ts:50-52
Timestamp: 2025-11-11T04:33:30.522Z
Learning: In src/routes/messages/handler.ts, forcing anthropicPayload.model to getSmallModel() when no tools are present is intentional behavior to fix Claude Code 2.0.28 warmup requests consuming premium model tokens. This applies to all requests without tools, not just warmup requests, and is an accepted design decision.

Copilot AI review requested due to automatic review settings December 5, 2025 15:34
Contributor

Copilot AI left a comment

Pull request overview

This pull request introduces comprehensive support for GitHub Copilot's Responses API alongside a new configuration system and structured logging infrastructure. The PR enables the use of advanced reasoning models through the /v1/responses endpoint while maintaining backward compatibility with the existing Chat Completions API. Key infrastructure improvements include persistent configuration management, per-handler file-based logging with rotation, and refined token counting logic.

Key Changes:

  • Added full Responses API support with request/response translation between Anthropic and OpenAI formats, including streaming and non-streaming modes
  • Introduced a persistent configuration system (config.json) for model-specific prompts, reasoning effort levels, and small model fallback
  • Implemented structured, buffered file-based logging with automatic rotation and retention for better diagnostics
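
A rough sketch of what a request to the new endpoint looks like from a client's point of view (the port, model name, and payload fields below are illustrative assumptions; the payload shape follows OpenAI's Responses API):

```ts
// Sketch only: call the proxy's OpenAI-compatible Responses endpoint.
// Base URL, port, and model name are assumptions for illustration.
const response = await fetch("http://localhost:4141/v1/responses", {
  method: "POST",
  headers: { "Content-Type": "application/json" },
  body: JSON.stringify({
    model: "gpt-5", // any model whose metadata advertises the responses endpoint
    input: [{ role: "user", content: "Summarize the last build failure." }],
    stream: false,
  }),
})

if (!response.ok) {
  throw new Error(`Responses request failed: ${response.status}`)
}

console.log(await response.json())
```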

Reviewed changes

Copilot reviewed 22 out of 22 changed files in this pull request and generated 18 comments.

Show a summary per file

| File | Description |
| --- | --- |
| tests/responses-translation.test.ts | Tests translation between Anthropic Messages and Responses API payloads/results |
| tests/responses-stream-translation.test.ts | Tests streaming event translation for Responses API |
| tests/anthropic-request.test.ts | Adds signature field to thinking blocks for testing |
| src/start.ts | Initializes config at startup, moves server import to lazy-load, adds idle timeout, stores verbose flag |
| src/services/copilot/get-models.ts | Extends model metadata with streaming, structured outputs, vision support, and supported endpoints |
| src/services/copilot/create-responses.ts | New service for creating responses via Copilot Responses API with streaming support |
| src/server.ts | Registers new /responses and /v1/responses routes |
| src/routes/responses/utils.ts | Utility functions for detecting vision input and agent initiator in Responses payloads |
| src/routes/responses/route.ts | Route handler for Responses endpoint |
| src/routes/responses/handler.ts | Validates model support, handles streaming/non-streaming Responses API requests |
| src/routes/messages/responses-translation.ts | Translates Anthropic Messages payloads to Responses format and vice versa |
| src/routes/messages/responses-stream-translation.ts | Translates Responses API streaming events to Anthropic streaming format with state management |
| src/routes/messages/handler.ts | Routes requests to Responses API for compatible models, replaces consola with handler logger |
| src/routes/messages/count-tokens-handler.ts | Adjusts token counting logic for tools and Grok models |
| src/routes/messages/anthropic-types.ts | Adds signature field to AnthropicThinkingBlock type |
| src/routes/chat-completions/handler.ts | Replaces consola with handler logger for consistent logging |
| src/lib/tokenizer.ts | Refactors token counting for tool calls, adds array parameter support, model-specific logic |
| src/lib/state.ts | Adds verbose flag to global state |
| src/lib/paths.ts | Adds CONFIG_PATH constant and ensures config file exists |
| src/lib/logger.ts | New buffered file-based logging system with rotation, retention, and per-handler log files |
| src/lib/config.ts | New configuration management system with disk persistence and default merging |
| README.md | Documents new configuration system and /v1/responses endpoint |


`🌐 Usage Viewer: https://ericc-ch.github.io/copilot-api?endpoint=${serverUrl}/usage`,
)

const { server } = await import("./server")
Copilot AI Dec 5, 2025

The dynamic import of server is moved inside the function, but there's no error handling for the import failure. If the import fails, it will throw an unhandled error. Consider wrapping this in a try-catch block and providing a helpful error message.

Suggested change
const { server } = await import("./server")
let server;
try {
  ({ server } = await import("./server"));
} catch (err) {
  consola.error("Failed to import the server module. Please ensure './server.ts' exists and is error-free.");
  consola.error(err);
  process.exit(1);
}

Comment on lines +17 to +19
if (!("role" in item) || !item.role) {
  return true
}
Copilot AI Dec 5, 2025

The logic for detecting agent initiator is inverted. The function returns true when there is NO role or when the role is "assistant", but the comment suggests it should detect agent presence. An item without a role returning true seems incorrect - typically items without roles should be user-initiated. This could cause incorrect X-Initiator headers to be sent.

Suggested change
if (!("role" in item) || !item.role) {
  return true
}
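
If the intent is to send X-Initiator: agent only when an assistant turn is actually present, a hedged sketch of role-based detection could look like this (the item type and helper name are assumptions, not the PR's actual code):

```ts
// Sketch only: treat a Responses request as agent-initiated when any input
// item carries an explicit assistant role; role-less items count as user input.
type ResponseInputItemLike = { role?: string }

const isAgentInitiated = (items: Array<ResponseInputItemLike>): boolean =>
  items.some((item) => item.role === "assistant")
```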

Comment on lines +464 to +477
const handleFunctionCallArgumentsValidationError = (
  error: FunctionCallArgumentsValidationError,
  state: ResponsesStreamState,
  events: Array<AnthropicStreamEventData> = [],
): Array<AnthropicStreamEventData> => {
  const reason = error.message

  closeAllOpenBlocks(state, events)
  state.messageCompleted = true

  events.push(buildErrorEvent(reason))

  return events
}
Copilot AI Dec 5, 2025

There's a potential issue with the handleFunctionCallArgumentsValidationError function - it closes all open blocks and marks the message as completed when encountering a validation error. However, the error event is pushed to the events array which is then returned, but the stream continues. This could lead to the stream continuing after an error is sent. Consider whether the stream should be forcefully terminated after validation errors.
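
One hedged way to make that termination explicit, assuming the translator is driven from a loop that forwards translated events to an SSE writer (every name below is a stand-in, not the PR's actual symbols):

```ts
// Sketch: stop pumping upstream events once the translated stream has
// completed (e.g. after a validation error emitted its error event).
declare const upstreamEvents: AsyncIterable<unknown>
declare const state: { messageCompleted: boolean }
declare function translateResponsesEvent(
  event: unknown,
  state: { messageCompleted: boolean },
): Array<unknown>
declare function writeSse(event: unknown): Promise<void>

for await (const rawEvent of upstreamEvents) {
  for (const event of translateResponsesEvent(rawEvent, state)) {
    await writeSse(event)
  }
  if (state.messageCompleted) {
    // The error handler already closed open blocks and emitted the error;
    // drop whatever the upstream stream still has buffered.
    break
  }
}
```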

)
}

// fix: copolit function call returning infinite line breaks until max_tokens limit
Copilot AI Dec 5, 2025

The comment states "fix: copolit function call returning infinite line breaks" but "copolit" is misspelled. It should be "copilot".

Suggested change
// fix: copolit function call returning infinite line breaks until max_tokens limit
// fix: copilot function call returning infinite line breaks until max_tokens limit

if (anthropicPayload.model.startsWith("claude")) {
  // https://docs.anthropic.com/en/docs/agents-and-tools/tool-use/overview#pricing
  tokenCount.input = tokenCount.input + 346
} else if (anthropicPayload.model.startsWith("grok")) {
Copilot AI Dec 5, 2025

The token count adjustment for Grok models has been reduced from 480 to 120 tokens without explanation. This is a significant change (75% reduction) that could affect billing or rate limiting calculations. Consider adding a comment explaining why this adjustment was made or verifying this is the correct value.

Suggested change
} else if (anthropicPayload.model.startsWith("grok")) {
} else if (anthropicPayload.model.startsWith("grok")) {
  // Token count adjustment for Grok models was reduced from 480 to 120 tokens
  // per updated documentation/pricing (see: [add link or ticket reference here]).
  // This significant reduction (75%) may affect billing or rate limiting calculations.
  // Please verify this is the correct value if updating related logic.

text,
})

const createOutPutTextContent = (text: string): ResponseInputText => ({
Copilot AI Dec 5, 2025

The spelling of "OutPut" has inconsistent capitalization. It should be "Output" to match the naming convention used elsewhere (e.g., ResponseOutputText, createTextContent).

Suggested change
const createOutPutTextContent = (text: string): ResponseInputText => ({
const createOutputTextContent = (text: string): ResponseInputText => ({

Comment on lines +40 to +42
tokens += encoder.encode(toolCall.id).length
tokens += encoder.encode(toolCall.function.name).length
tokens += encoder.encode(toolCall.function.arguments).length
Copilot AI Dec 5, 2025

The logic change for calculating tool call tokens is significant - it now encodes individual fields (id, name, arguments) separately instead of the entire JSON-stringified object. However, there's no verification that this produces the same or more accurate token count. Consider adding a comment explaining why this change was made and whether it's been validated against actual API behavior.
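
For context, the difference between the two approaches is roughly the following (a sketch; the encoder is assumed to expose a gpt-tokenizer-style encode(text) returning a token array, and any fixed per-call constants are omitted):

```ts
// Sketch: old vs. new tool-call token counting. Encoder API is assumed.
declare const encoder: { encode: (text: string) => Array<number> }

interface ToolCallLike {
  id: string
  function: { name: string; arguments: string }
}

// Old approach: count the whole JSON-stringified object, which also counts
// braces, quotes, and key names like "id" and "arguments".
const countWholeObject = (toolCall: ToolCallLike): number =>
  encoder.encode(JSON.stringify(toolCall)).length

// New approach: count only the field values; structural overhead is expected
// to be covered by fixed per-call constants elsewhere in the tokenizer.
const countPerField = (toolCall: ToolCallLike): number =>
  encoder.encode(toolCall.id).length
  + encoder.encode(toolCall.function.name).length
  + encoder.encode(toolCall.function.arguments).length
```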

Comment on lines +163 to +166
if (Date.now() - lastCleanup > CLEANUP_INTERVAL_MS) {
  cleanupOldLogs()
  lastCleanup = Date.now()
}
Copilot AI Dec 5, 2025

The cleanup interval tracking uses a module-level lastCleanup variable that's shared across all logger instances. This could lead to race conditions where multiple loggers try to clean up simultaneously, or one logger prevents others from cleaning up on time. Consider moving this to per-instance tracking or using a mutex/lock.
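
A minimal sketch of one way to serialize the cleanup in a single-process runtime, assuming a simple in-flight flag is sufficient (names are illustrative):

```ts
// Sketch: guard cleanup with an in-flight flag so concurrent flushes from
// different logger instances don't trigger overlapping cleanups.
let lastCleanup = 0
let cleanupInFlight = false
const CLEANUP_INTERVAL_MS = 60 * 60 * 1000 // assumed interval

async function maybeCleanup(cleanupOldLogs: () => Promise<void>): Promise<void> {
  if (cleanupInFlight || Date.now() - lastCleanup < CLEANUP_INTERVAL_MS) {
    return
  }
  cleanupInFlight = true
  try {
    await cleanupOldLogs()
    lastCleanup = Date.now()
  } finally {
    cleanupInFlight = false
  }
}
```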

type: "thinking",
thinking:
  "I need to call the weather API to get current weather information.",
signature: "def456",
Copilot AI Dec 5, 2025

The signature field is being added to test data but the test doesn't verify this field is properly handled or validated. Consider adding an assertion to ensure the signature is preserved through the translation.
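
A hedged example of the kind of assertion meant here, written against bun:test with stand-ins for whatever helper and fixture the real suite uses:

```ts
// Sketch only: assert the thinking block's signature survives translation.
// `translateToResponsesPayload` and `anthropicPayload` are assumed stand-ins.
import { expect, test } from "bun:test"

declare const anthropicPayload: unknown
declare function translateToResponsesPayload(payload: unknown): unknown

test("thinking signature is preserved through translation", () => {
  const translated = translateToResponsesPayload(anthropicPayload)
  expect(JSON.stringify(translated)).toContain("def456")
})
```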

// when use in codex cli, reasoning id is empty, so it will cause miss cache occasionally
const array = block.signature.split("@")
const signature = array[0]
const id = array[1]
Copilot AI Dec 5, 2025

The array index is being used but the split operation might not return the expected two parts if the signature format is different than expected. If block.signature doesn't contain "@", array[1] will be undefined, leading to an id of undefined. Consider adding validation or a fallback value.

Suggested change
const id = array[1]
const id = array.length > 1 && array[1] ? array[1] : ""
