Fix/gpt55 responses routing#3513
Conversation
The upstream Codex backend rejects HTTP /responses for newer models such as gpt-5.5 with: "routing_unsupported: GPT-5.5 routing only supports Responses-compatible requests". CodexAutoExecutor previously only chose the WebSocket transport when the downstream client also spoke WS, so HTTP-only clients (e.g. Cursor over /v1/chat/completions) always hit HTTP /responses and failed. Route to WS whenever the auth supports it AND either the downstream speaks WS OR the model is flagged prefer_websockets:true in codex_client_models.json (gpt-5.5, gpt-5.4, gpt-5.4-mini, gpt-5.3-codex, gpt-5.2, codex-auto-review). Sourcing the set from the embedded catalog means future Responses-only models start working without further changes.
…nses endpoint - Added tests for GPT-5.5 chat model to ensure correct usage of the /responses endpoint for both standard and streaming requests. - Updated OpenAICompatExecutor to route requests to the /responses endpoint when using GPT-5.5, enhancing compatibility with newer models. - Introduced helper functions for handling response translation and error management in streaming scenarios.
|
This pull request targeted The base branch has been automatically changed to |
There was a problem hiding this comment.
Code Review
This pull request updates the Codex and OpenAI-compatible executors to route specific models, such as gpt-5.5, through the /responses endpoint or WebSocket transport based on a dynamic model registry. Feedback highlights a compilation error due to missing helper functions, a bug where incorrect Content-Type headers are returned for non-streaming responses, and a recommendation to replace hardcoded model checks with the new registry logic. A performance optimization for SSE stream processing was also suggested.
| collectCodexOutputItemDone(eventData, outputItemsByIndex, &outputItemsFallback) | ||
| case "response.completed": | ||
| if reporter != nil { | ||
| if detail, ok := helps.ParseCodexUsage(eventData); ok { | ||
| reporter.Publish(ctx, detail) | ||
| } | ||
| } | ||
| completedData := patchCodexCompletedOutput(eventData, outputItemsByIndex, outputItemsFallback) |
There was a problem hiding this comment.
| return resp, err | ||
| } | ||
| reporter.EnsurePublished(ctx) | ||
| resp = cliproxyexecutor.Response{Payload: out, Headers: httpResp.Header.Clone()} |
There was a problem hiding this comment.
When useResponsesEndpoint is true, the upstream request is forced to stream (upstreamStream = true), meaning httpResp will have a Content-Type: text/event-stream header. However, the final response out is a single JSON object. Cloning the upstream headers directly will incorrectly tell the downstream client to expect an SSE stream. You should override the Content-Type to application/json for the final response.
respHeaders := httpResp.Header.Clone()
respHeaders.Set("Content-Type", "application/json")
resp = cliproxyexecutor.Response{Payload: out, Headers: respHeaders}| if idx := strings.LastIndex(normalized, "/"); idx >= 0 && idx < len(normalized)-1 { | ||
| normalized = strings.TrimSpace(normalized[idx+1:]) | ||
| } | ||
| return normalized == "gpt-5.5" || strings.HasPrefix(normalized, "gpt-5.5-") |
There was a problem hiding this comment.
The check for gpt-5.5 models is hardcoded here. To ensure consistency with the dynamic model catalog logic used in CodexAutoExecutor and to avoid future code changes when new models are released, consider using a shared helper that references the embedded model registry to determine if a model requires the /responses endpoint.
|
|
||
| outputItemsByIndex := make(map[int64][]byte) | ||
| var outputItemsFallback [][]byte | ||
| for _, line := range bytes.Split(rawJSON, []byte("\n")) { |
There was a problem hiding this comment.
Using bytes.Split(rawJSON, []byte("\n")) on the entire response body is less efficient than line-by-line processing, especially for large payloads. Since this function is parsing an SSE stream, consider using a bufio.Scanner with a bytes.NewReader and a large buffer (e.g., 50MB as seen in ExecuteStream) to iterate over the lines more efficiently.
scanner := bufio.NewScanner(bytes.NewReader(rawJSON))
scanner.Buffer(nil, 52_428_800) // 50MB
for scanner.Scan() {
line := scanner.Bytes()There was a problem hiding this comment.
Pull request overview
This PR updates executor routing so that model-specific transport requirements are honored (notably gpt-5.5 needing /responses or WebSocket transport), even when the downstream client uses a different protocol. It adds translation/handling for Responses API payloads and extends tests to cover both forced WebSocket routing and /responses routing for streaming and non-streaming use cases.
Changes:
- Route Codex “auto” execution to WebSockets when the selected model requires it (based on the embedded Codex model catalog), not only when the downstream is a WebSocket.
- Update
OpenAICompatExecutorto routegpt-5.5*to the/responsesendpoint and translate Responses API outputs for both non-stream and stream flows. - Add tests validating the new routing and translation behaviors for
gpt-5.5(and catalog-driven WebSocket requirements).
Reviewed changes
Copilot reviewed 4 out of 4 changed files in this pull request and generated no comments.
| File | Description |
|---|---|
| internal/runtime/executor/openai_compat_executor.go | Adds /responses routing for gpt-5.5* and Responses API translation/error handling for compat providers. |
| internal/runtime/executor/openai_compat_executor_compact_test.go | Adds tests ensuring gpt-5.5 uses /responses and translates correctly for stream and non-stream. |
| internal/runtime/executor/codex_websockets_executor.go | Adds catalog-driven logic to force WebSocket routing for models that prefer/require it. |
| internal/runtime/executor/codex_websockets_executor_test.go | Adds tests to validate catalog parsing and forced WebSocket routing for gpt-5.5 on HTTP downstream. |
Comments suppressed due to low confidence (1)
internal/runtime/executor/openai_compat_executor.go:143
- When routing non-stream downstream requests via
/responses, this path forcesstream:trueupstream but does not set anAccept: text/event-streamheader. Several executors in this repo setAcceptbased on whether the upstream request is streaming, and some providers may vary their response format based onAccept, which can break the SSE parsing inopenAICompatTranslateResponsesNonStream.
translated, _ = sjson.DeleteBytes(translated, "stream_options")
}
url := strings.TrimSuffix(baseURL, "/") + endpoint
httpReq, err := http.NewRequestWithContext(ctx, http.MethodPost, url, bytes.NewReader(translated))
💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.
There was a problem hiding this comment.
💡 Codex Review
Here are some automated review suggestions for this pull request.
Reviewed commit: aff797904d
ℹ️ About Codex in GitHub
Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you
- Open a pull request for review
- Mark a draft as ready
- Comment "@codex review".
If Codex has suggestions, it will comment; otherwise it will react with 👍.
Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".
| return false | ||
| } | ||
| codexWSPreferredModelsOnce.Do(loadCodexWSPreferredModels) | ||
| _, ok := codexWSPreferredModels[base] |
There was a problem hiding this comment.
Match websocket-required Codex model variants
codexModelRequiresWebsockets only does an exact slug lookup (codexWSPreferredModels[base]), so variant model names like gpt-5.5-<snapshot> or other suffixed aliases are treated as non-websocket models and will be routed to HTTP by codexShouldUseWebsockets. That contradicts the new intent to force websocket transport for the gpt-5.5 family and can still produce upstream routing failures for those variants, even though similar variant handling was added on the OpenAI-compat path.
Useful? React with 👍 / 👎.
This pull request introduces enhanced routing logic for handling Codex and OpenAI-compatible models, particularly for models (like
gpt-5.5) that require WebSocket or/responsesendpoint transport. The changes ensure that these models are always routed using the appropriate transport, regardless of the downstream client's protocol, and improve compatibility and future extensibility. Comprehensive tests are added to verify the new routing logic.Codex WebSocket Routing Enhancements:
codexShouldUseWebsocketsincodex_websockets_executor.goto determine if a request should be routed via WebSockets, based on model requirements and client capabilities. This leverages an updated model catalog to dynamically support new models that require WebSockets without additional code changes. [1] [2]codexModelRequiresWebsocketsto check if a model requires WebSockets, sourcing the information from the embedded Codex client models catalog.gpt-5.5are properly routed via WebSockets, even for HTTP downstream clients.OpenAI Compatibility Executor Routing and Translation:
OpenAICompatExecutorto automatically use the/responsesendpoint forgpt-5.5and similar models, including when invoked via the OpenAI API, ensuring correct streaming and translation. [1] [2] [3]/responsesendpoint responses, including support for both streaming and non-streaming cases. [1] [2] [3] [4]Testing Improvements:
/responsesendpoints is correctly applied forgpt-5.5and similar models, covering both streaming and non-streaming scenarios.Dependency and Import Updates:
These changes make the executor more robust and future-proof for new models with special transport requirements, and improve test coverage for critical routing logic.