Implement WebSocket Mode support for OpenAI Responses API #8644

@localai-bot

Summary

Implement WebSocket Mode support for LocalAI's OpenAI-compatible Responses endpoint (/v1/responses). This enables persistent WebSocket connections for long-running, tool-call-heavy agentic workflows.

Background

OpenAI recently introduced WebSocket Mode for its Responses API (https://developers.openai.com/api/docs/guides/websocket-mode). This mode provides:

  • Faster agentic runs: OpenAI reports up to 40% faster end-to-end execution for workflows with 20+ tool calls
  • Persistent connections to /v1/responses via WebSocket
  • Incremental continuation: each turn sends only the new inputs plus previous_response_id
  • Connection-local caching for low-latency continuations
  • Compatibility with Zero Data Retention (ZDR) and store=false

Technical Specifications

Connection

  • Endpoint: wss://api.openai.com/v1/responses (LocalAI: ws://<host>:<port>/v1/responses)
  • Authentication: Bearer token in the Authorization header of the WebSocket handshake
  • Max duration: 60 minutes per connection
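The 60-minute cap implies per-connection bookkeeping on the server. A minimal sketch, assuming a monotonic clock is checked before each `response.create`; `ConnectionDeadline` and `MAX_CONNECTION_SECONDS` are hypothetical names, not existing LocalAI code:

```python
import time

# Spec above: a connection may stay open for at most 60 minutes.
MAX_CONNECTION_SECONDS = 60 * 60


class ConnectionDeadline:
    """Hypothetical helper that tracks one connection's 60-minute budget."""

    def __init__(self, clock=time.monotonic, max_seconds=MAX_CONNECTION_SECONDS):
        self._clock = clock  # injectable so tests can use a fake clock
        self._max = max_seconds
        self._opened_at = clock()

    def remaining(self):
        """Seconds left before the limit, never negative."""
        return max(0.0, self._max - (self._clock() - self._opened_at))

    def expired(self):
        """True once the connection has used its full budget."""
        return self.remaining() == 0.0
```

A server could check `expired()` before starting each turn (answering with `websocket_connection_limit_reached`) and schedule a close after `remaining()` seconds. Using a monotonic clock keeps the limit immune to wall-clock adjustments.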

Message Types

1. response.create (Initial Turn)

{
  "type": "response.create",
  "model": "gpt-4o",
  "store": false,
  "input": [
    {
      "type": "message",
      "role": "user",
      "content": [{"type": "input_text", "text": "..."}]
    }
  ],
  "tools": []
}

Note: stream and background fields are NOT used in WebSocket mode.
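On the client side, the initial turn is a single JSON text frame. A minimal sketch that builds the message shown above and flags the HTTP-only fields; `build_initial_turn`, `unused_fields`, and `to_frame` are hypothetical helpers, and whether LocalAI should reject or silently ignore `stream`/`background` is left open here:

```python
import json

# HTTP request fields that the spec says WebSocket mode does not use.
HTTP_ONLY_FIELDS = ("stream", "background")


def build_initial_turn(model, user_text, tools=None, store=False):
    """Build a first-turn response.create message (hypothetical helper)."""
    return {
        "type": "response.create",
        "model": model,
        "store": store,
        "input": [
            {
                "type": "message",
                "role": "user",
                "content": [{"type": "input_text", "text": user_text}],
            }
        ],
        "tools": list(tools or []),
    }


def unused_fields(msg):
    """List any HTTP-only fields present in a WebSocket message."""
    return [name for name in HTTP_ONLY_FIELDS if name in msg]


def to_frame(msg):
    """Serialize a message as one WebSocket text frame."""
    return json.dumps(msg)
```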

2. response.create with Warmup (Optional)

{
  "type": "response.create",
  "model": "gpt-4o",
  "generate": false,
  "input": [...],
  "tools": [...]
}

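With generate set to false, the server prepares the request state (input and tools) without generating output and returns a response_id that later turns can chain via previous_response_id.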
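Server-side, warmup and a normal turn can share one code path that differs only in whether the model runs. A minimal sketch, assuming a connection-local `cache` dict and a `generate(items, tools)` callable standing in for the LocalAI backend; `handle_create` is hypothetical and leaves out `previous_response_id` handling:

```python
import uuid


def handle_create(msg, cache, generate):
    """Process one response.create message (hypothetical sketch).

    `cache` is the connection-local store mapping response IDs to state;
    `generate(items, tools)` stands in for the backend model call.
    """
    response_id = "resp_" + uuid.uuid4().hex
    items = list(msg.get("input", []))
    tools = msg.get("tools", [])
    if msg.get("generate", True):
        output = generate(items, tools)  # normal turn: run the model
    else:
        output = []  # warmup: prepare state only, produce no output
    # Cache in both cases so the ID can be chained via previous_response_id.
    cache[response_id] = {"items": items + output, "tools": tools}
    return response_id, output
```

Because the warmup state is cached exactly like a generated turn, the next `response.create` can chain from it without a special case.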

3. response.create with Continuation (Subsequent Turns)

{
  "type": "response.create",
  "model": "gpt-4o",
  "store": false,
  "previous_response_id": "resp_123",
  "input": [
    {
      "type": "function_call_output",
      "call_id": "call_123",
      "output": "tool result"
    },
    {
      "type": "message",
      "role": "user",
      "content": [{"type": "input_text", "text": "..."}]
    }
  ],
  "tools": []
}
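For a continuation, the server rebuilds the full context from its cache and appends only the new items. A minimal sketch, assuming the cache maps response IDs to `{"items": [...]}` (inputs plus outputs of that response); `resolve_context` and `PreviousResponseNotFound` are hypothetical names. With `store=true`, a server could also fall back to persisted responses before failing; that path is not sketched:

```python
class PreviousResponseNotFound(Exception):
    """Corresponds to the previous_response_not_found (400) error."""


def resolve_context(msg, cache):
    """Return the full item list the model should see for this turn.

    Hypothetical sketch: `cache` maps response IDs to {"items": [...]}, holding
    the inputs and outputs of each response kept on this connection.
    """
    new_items = list(msg.get("input", []))
    prev_id = msg.get("previous_response_id")
    if prev_id is None:
        return new_items  # initial turn: client sent the full context
    state = cache.get(prev_id)
    if state is None:
        raise PreviousResponseNotFound(prev_id)
    # Prior turn's items first, then only the new inputs sent by the client.
    return state["items"] + new_items
```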

Response Events (Server → Client)

  1. response.created
{
  "type": "response.created",
  "response": {"id": "resp_abc", "model": "...", ...}
}
  2. response.progress
{
  "type": "response.progress",
  "response_id": "resp_abc",
  "output": [...]
}
  3. response.function_call_arguments.delta
{
  "type": "response.function_call_arguments.delta",
  "response_id": "resp_abc",
  "call_id": "call_123",
  "delta": "..."
}
  4. response.function_call_arguments.done
{
  "type": "response.function_call_arguments.done",
  "response_id": "resp_abc",
  "call_id": "call_123",
  "arguments": "..."
}
  5. response.done
{
  "type": "response.done",
  "response": {...}
}
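The event order for a turn that makes one tool call can be sketched as a generator. This assumes fixed-size slices of a finished arguments string stand in for the model's token stream, and it omits `response.progress`; `function_call_events` is a hypothetical name. The key invariant is that the concatenated deltas equal the final `arguments`:

```python
def function_call_events(response_id, call_id, arguments, chunk_size=8):
    """Yield the events for a response with one tool call (hypothetical sketch)."""
    yield {"type": "response.created", "response": {"id": response_id}}
    # Stream the arguments in pieces, as a model would emit them token by token.
    for start in range(0, len(arguments), chunk_size):
        yield {
            "type": "response.function_call_arguments.delta",
            "response_id": response_id,
            "call_id": call_id,
            "delta": arguments[start:start + chunk_size],
        }
    # The done event repeats the complete arguments string.
    yield {
        "type": "response.function_call_arguments.done",
        "response_id": response_id,
        "call_id": call_id,
        "arguments": arguments,
    }
    yield {"type": "response.done", "response": {"id": response_id}}
```

Each yielded dict would be sent as one JSON text frame, in order, on the same connection.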

Error Handling

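  • previous_response_not_found (400): returned when a continuation uses store=false and the previous response is not in the connection's cache
  • websocket_connection_limit_reached (400): returned when the connection reaches the 60-minute limit; the client must open a new connection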
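Both errors can be checked before a turn starts. A minimal sketch; the error frame shape (`type`, `status`, `error.code`, `error.message`) is an assumption rather than a confirmed wire format, a missing `store` is treated as true (also an assumption), and `error_event`/`precheck` are hypothetical names:

```python
PREVIOUS_RESPONSE_NOT_FOUND = "previous_response_not_found"
CONNECTION_LIMIT_REACHED = "websocket_connection_limit_reached"


def error_event(code, message, status=400):
    """Build an error frame (the exact wire shape here is an assumption)."""
    return {"type": "error", "status": status, "error": {"code": code, "message": message}}


def precheck(msg, cache, connection_expired):
    """Return an error frame if this response.create cannot run, else None."""
    if connection_expired:
        return error_event(CONNECTION_LIMIT_REACHED,
                           "Connection reached the 60-minute limit; open a new connection.")
    prev_id = msg.get("previous_response_id")
    # Assumption: a missing store is treated as true; only store=false relies solely on the cache.
    if prev_id is not None and msg.get("store", True) is False and prev_id not in cache:
        return error_event(PREVIOUS_RESPONSE_NOT_FOUND,
                           f"Previous response {prev_id!r} is not cached on this connection.")
    return None
```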

Implementation Requirements

  1. WebSocket Server: Add WebSocket endpoint for /v1/responses
  2. Connection Management: Track active connections with a 60-minute timeout
  3. State Caching: Implement connection-local in-memory cache for responses
  4. Message Parsing: Handle all message types (response.create, etc.)
  5. Event Streaming: Send proper response events back to client
  6. Error Handling: Return the error codes listed above (e.g. previous_response_not_found) for invalid states
  7. Compaction Support: Integrate with the /responses/compact endpoint
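Requirement 4 (message parsing) amounts to decoding each client text frame and routing it on its `type` field. A minimal sketch, assuming handlers are registered in a dict keyed by message type; `parse_client_message` and `ProtocolError` are hypothetical names:

```python
import json


class ProtocolError(ValueError):
    """Raised for frames the server cannot interpret."""


def parse_client_message(raw, handlers):
    """Decode one client text frame and route it by its "type" field (hypothetical sketch)."""
    try:
        msg = json.loads(raw)
    except json.JSONDecodeError as exc:
        raise ProtocolError(f"invalid JSON: {exc.msg}") from exc
    if not isinstance(msg, dict) or not isinstance(msg.get("type"), str):
        raise ProtocolError("message must be a JSON object with a string 'type'")
    handler = handlers.get(msg["type"])
    if handler is None:
        raise ProtocolError(f"unsupported message type: {msg['type']}")
    return handler(msg)
```

Keeping the routing table as data makes it easy to add message types later without touching the decode and validation path.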

Limits

  • One in-flight response at a time per connection
  • No multiplexing: parallel runs require multiple connections
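The one-in-flight rule can be enforced with a small per-connection guard. A minimal sketch, assuming the reader and writer may run on different threads (hence the lock); `InFlightGuard` is a hypothetical name, and whether a rejected request should be queued or answered with an error is not specified above:

```python
import threading


class InFlightGuard:
    """Allows at most one in-flight response per connection (hypothetical sketch)."""

    def __init__(self):
        self._lock = threading.Lock()
        self._active = None  # ID of the response currently being generated

    def begin(self, response_id):
        """Claim the slot; returns False if another response is still running."""
        with self._lock:
            if self._active is not None:
                return False
            self._active = response_id
            return True

    def end(self, response_id):
        """Release the slot, but only if this response holds it."""
        with self._lock:
            if self._active == response_id:
                self._active = None
```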

Acceptance Criteria

  1. WebSocket endpoint accepts connections at /v1/responses
  2. response.create works for initial turn with full context
  3. response.create with previous_response_id works for continuations
  4. All response events are properly streamed to client
  5. Function call arguments are properly delta-streamed
  6. Error handling for previous_response_not_found implemented
  7. Connection timeout (60 min) is enforced
  8. Tool calling works correctly over WebSocket
