Skip to content

Fix/gpt55 responses routing#3513

Open
SpectrumTantrum wants to merge 3 commits into
router-for-me:devfrom
SpectrumTantrum:fix/gpt55-responses-routing
Open

Fix/gpt55 responses routing#3513
SpectrumTantrum wants to merge 3 commits into
router-for-me:devfrom
SpectrumTantrum:fix/gpt55-responses-routing

Conversation

@SpectrumTantrum
Copy link
Copy Markdown

This pull request introduces enhanced routing logic for handling Codex and OpenAI-compatible models, particularly for models (like gpt-5.5) that require WebSocket or /responses endpoint transport. The changes ensure that these models are always routed using the appropriate transport, regardless of the downstream client's protocol, and improve compatibility and future extensibility. Comprehensive tests are added to verify the new routing logic.

Codex WebSocket Routing Enhancements:

  • Added a new function codexShouldUseWebsockets in codex_websockets_executor.go to determine if a request should be routed via WebSockets, based on model requirements and client capabilities. This leverages an updated model catalog to dynamically support new models that require WebSockets without additional code changes. [1] [2]
  • Implemented codexModelRequiresWebsockets to check if a model requires WebSockets, sourcing the information from the embedded Codex client models catalog.
  • Added tests verifying that models like gpt-5.5 are properly routed via WebSockets, even for HTTP downstream clients.

OpenAI Compatibility Executor Routing and Translation:

  • Updated OpenAICompatExecutor to automatically use the /responses endpoint for gpt-5.5 and similar models, including when invoked via the OpenAI API, ensuring correct streaming and translation. [1] [2] [3]
  • Added logic to handle translation and error handling for /responses endpoint responses, including support for both streaming and non-streaming cases. [1] [2] [3] [4]

Testing Improvements:

  • Added comprehensive tests to ensure that the new routing logic for WebSockets and /responses endpoints is correctly applied for gpt-5.5 and similar models, covering both streaming and non-streaming scenarios.

Dependency and Import Updates:

  • Updated imports in several files to include new dependencies required for model catalog parsing and JSON handling. [1] [2] [3]

These changes make the executor more robust and future-proof for new models with special transport requirements, and improve test coverage for critical routing logic.

SpectrumTantrum and others added 3 commits May 20, 2026 23:43
The upstream Codex backend rejects HTTP /responses for newer models such as
gpt-5.5 with: "routing_unsupported: GPT-5.5 routing only supports
Responses-compatible requests". CodexAutoExecutor previously only chose the
WebSocket transport when the downstream client also spoke WS, so HTTP-only
clients (e.g. Cursor over /v1/chat/completions) always hit HTTP /responses
and failed.

Route to WS whenever the auth supports it AND either the downstream speaks
WS OR the model is flagged prefer_websockets:true in
codex_client_models.json (gpt-5.5, gpt-5.4, gpt-5.4-mini, gpt-5.3-codex,
gpt-5.2, codex-auto-review). Sourcing the set from the embedded catalog
means future Responses-only models start working without further changes.
…nses endpoint

- Added tests for GPT-5.5 chat model to ensure correct usage of the /responses endpoint for both standard and streaming requests.
- Updated OpenAICompatExecutor to route requests to the /responses endpoint when using GPT-5.5, enhancing compatibility with newer models.
- Introduced helper functions for handling response translation and error management in streaming scenarios.
Copilot AI review requested due to automatic review settings May 22, 2026 05:11
@github-actions github-actions Bot changed the base branch from main to dev May 22, 2026 05:11
@github-actions
Copy link
Copy Markdown

This pull request targeted main.

The base branch has been automatically changed to dev.

Copy link
Copy Markdown
Contributor

@gemini-code-assist gemini-code-assist Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Code Review

This pull request updates the Codex and OpenAI-compatible executors to route specific models, such as gpt-5.5, through the /responses endpoint or WebSocket transport based on a dynamic model registry. Feedback highlights a compilation error due to missing helper functions, a bug where incorrect Content-Type headers are returned for non-streaming responses, and a recommendation to replace hardcoded model checks with the new registry logic. A performance optimization for SSE stream processing was also suggested.

Comment on lines +259 to +266
collectCodexOutputItemDone(eventData, outputItemsByIndex, &outputItemsFallback)
case "response.completed":
if reporter != nil {
if detail, ok := helps.ParseCodexUsage(eventData); ok {
reporter.Publish(ctx, detail)
}
}
completedData := patchCodexCompletedOutput(eventData, outputItemsByIndex, outputItemsFallback)
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

critical

The functions collectCodexOutputItemDone and patchCodexCompletedOutput are called within openAICompatTranslateResponsesNonStream but are not defined in this file or package. This will result in a compilation error. Please ensure these helper functions are implemented and included in the PR.

return resp, err
}
reporter.EnsurePublished(ctx)
resp = cliproxyexecutor.Response{Payload: out, Headers: httpResp.Header.Clone()}
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

high

When useResponsesEndpoint is true, the upstream request is forced to stream (upstreamStream = true), meaning httpResp will have a Content-Type: text/event-stream header. However, the final response out is a single JSON object. Cloning the upstream headers directly will incorrectly tell the downstream client to expect an SSE stream. You should override the Content-Type to application/json for the final response.

		respHeaders := httpResp.Header.Clone()
		respHeaders.Set("Content-Type", "application/json")
		resp = cliproxyexecutor.Response{Payload: out, Headers: respHeaders}

if idx := strings.LastIndex(normalized, "/"); idx >= 0 && idx < len(normalized)-1 {
normalized = strings.TrimSpace(normalized[idx+1:])
}
return normalized == "gpt-5.5" || strings.HasPrefix(normalized, "gpt-5.5-")
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

medium

The check for gpt-5.5 models is hardcoded here. To ensure consistency with the dynamic model catalog logic used in CodexAutoExecutor and to avoid future code changes when new models are released, consider using a shared helper that references the embedded model registry to determine if a model requires the /responses endpoint.


outputItemsByIndex := make(map[int64][]byte)
var outputItemsFallback [][]byte
for _, line := range bytes.Split(rawJSON, []byte("\n")) {
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

medium

Using bytes.Split(rawJSON, []byte("\n")) on the entire response body is less efficient than line-by-line processing, especially for large payloads. Since this function is parsing an SSE stream, consider using a bufio.Scanner with a bytes.NewReader and a large buffer (e.g., 50MB as seen in ExecuteStream) to iterate over the lines more efficiently.

	scanner := bufio.NewScanner(bytes.NewReader(rawJSON))
	scanner.Buffer(nil, 52_428_800) // 50MB
	for scanner.Scan() {
		line := scanner.Bytes()

Copy link
Copy Markdown
Contributor

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

This PR updates executor routing so that model-specific transport requirements are honored (notably gpt-5.5 needing /responses or WebSocket transport), even when the downstream client uses a different protocol. It adds translation/handling for Responses API payloads and extends tests to cover both forced WebSocket routing and /responses routing for streaming and non-streaming use cases.

Changes:

  • Route Codex “auto” execution to WebSockets when the selected model requires it (based on the embedded Codex model catalog), not only when the downstream is a WebSocket.
  • Update OpenAICompatExecutor to route gpt-5.5* to the /responses endpoint and translate Responses API outputs for both non-stream and stream flows.
  • Add tests validating the new routing and translation behaviors for gpt-5.5 (and catalog-driven WebSocket requirements).

Reviewed changes

Copilot reviewed 4 out of 4 changed files in this pull request and generated no comments.

File Description
internal/runtime/executor/openai_compat_executor.go Adds /responses routing for gpt-5.5* and Responses API translation/error handling for compat providers.
internal/runtime/executor/openai_compat_executor_compact_test.go Adds tests ensuring gpt-5.5 uses /responses and translates correctly for stream and non-stream.
internal/runtime/executor/codex_websockets_executor.go Adds catalog-driven logic to force WebSocket routing for models that prefer/require it.
internal/runtime/executor/codex_websockets_executor_test.go Adds tests to validate catalog parsing and forced WebSocket routing for gpt-5.5 on HTTP downstream.
Comments suppressed due to low confidence (1)

internal/runtime/executor/openai_compat_executor.go:143

  • When routing non-stream downstream requests via /responses, this path forces stream:true upstream but does not set an Accept: text/event-stream header. Several executors in this repo set Accept based on whether the upstream request is streaming, and some providers may vary their response format based on Accept, which can break the SSE parsing in openAICompatTranslateResponsesNonStream.
		translated, _ = sjson.DeleteBytes(translated, "stream_options")
	}

	url := strings.TrimSuffix(baseURL, "/") + endpoint
	httpReq, err := http.NewRequestWithContext(ctx, http.MethodPost, url, bytes.NewReader(translated))

💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.

Copy link
Copy Markdown

@chatgpt-codex-connector chatgpt-codex-connector Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: aff797904d

ℹ️ About Codex in GitHub

Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you

  • Open a pull request for review
  • Mark a draft as ready
  • Comment "@codex review".

If Codex has suggestions, it will comment; otherwise it will react with 👍.

Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".

return false
}
codexWSPreferredModelsOnce.Do(loadCodexWSPreferredModels)
_, ok := codexWSPreferredModels[base]
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

P1 Badge Match websocket-required Codex model variants

codexModelRequiresWebsockets only does an exact slug lookup (codexWSPreferredModels[base]), so variant model names like gpt-5.5-<snapshot> or other suffixed aliases are treated as non-websocket models and will be routed to HTTP by codexShouldUseWebsockets. That contradicts the new intent to force websocket transport for the gpt-5.5 family and can still produce upstream routing failures for those variants, even though similar variant handling was added on the OpenAI-compat path.

Useful? React with 👍 / 👎.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants