Description
Is there an existing issue for this?
- [x] I have searched the existing issues
Kong version (`$ kong version`)
3.8.0, 3.9.0
Current Behavior
When using Kong with the `ai-proxy` plugin, streaming responses are "buffered" in Kong 3.9.0 but work with the same configuration in 3.8.0. See below for a fully reproducible example.
Expected Behavior
When requesting a streaming response, Kong should forward each SSE event as soon as it is received from the upstream.
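For illustration, an unbuffered stream delivers each `data:` event as the model produces it, e.g. (OpenAI-style chunk shape; IDs, timestamps, and content below are made up):

```
data: {"id":"chatcmpl-abc123","object":"chat.completion.chunk","created":1745000000,"model":"gpt-4o","choices":[{"index":0,"delta":{"content":"Deep"},"finish_reason":null}]}

data: {"id":"chatcmpl-abc123","object":"chat.completion.chunk","created":1745000000,"model":"gpt-4o","choices":[{"index":0,"delta":{"content":" learning"},"finish_reason":null}]}

data: [DONE]
```

With the buffering described above, all of these arrive in a single burst at the end of generation instead.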
Steps To Reproduce
You can use this `docker-compose.yaml`; you may need to comment/uncomment the `kong migrations bootstrap/up` piece (or run the migrations as a one-off, sketched after the file):
```yaml
name: local-kong
version: "3.4"

x-common-variables: &common-variables
  KONG_PG_PORT: 5432
  KONG_PG_USER: kong
  KONG_PG_PASSWORD: kong
  KONG_PG_DATABASE: kong
  POSTGRES_USER: kong
  POSTGRES_PASSWORD: kong
  POSTGRES_DB: kong
  KONG_HTTP_PROXY_PORT: 8000
  KONG_HTTP_ADMIN_PORT: 8001

services:
  kong-db:
    networks:
      - local-kong-net
    image: postgres:17.2-alpine
    ports:
      - 5432:5432
    environment:
      <<: *common-variables

  kong:
    platform: "linux/amd64"
    networks:
      - local-kong-net
    image: kong:3.8.0
    ports:
      - 8000:8000
      - 8443:8443
      - 8001:8001
      - 8002:8002
      - 8444:8444
    environment:
      <<: *common-variables
      KONG_DATABASE: postgres
      KONG_PG_HOST: kong-db
      KONG_ADMIN_LISTEN: 0.0.0.0:8001
      KONG_ADMIN_GUI_URL: http://localhost:8002
      KONG_PROXY_LISTEN: 0.0.0.0:8000
      KONG_PROXY_ACCESS_LOG: /dev/stdout
      KONG_ADMIN_ACCESS_LOG: /dev/stdout
      KONG_PROXY_ERROR_LOG: /dev/stderr
      KONG_ADMIN_ERROR_LOG: /dev/stderr
      KONG_PLUGINS: bundled
      KONG_LOG_LEVEL: info
    depends_on:
      - kong-db
    # command: ["kong", "migrations", "bootstrap"]

networks:
  local-kong-net:
    driver: bridge
```
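As an alternative to editing the file, the migrations can be run as a one-off through the `kong` service defined above (a sketch assuming the service names from this file):

```sh
# Run the migrations against kong-db, reusing the kong service's environment.
docker compose run --rm kong kong migrations bootstrap   # fresh database
docker compose run --rm kong kong migrations up          # after bumping the image, e.g. 3.8.0 -> 3.9.0
```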
Steps:

1. run `docker compose up`
2. run the following configuration:

   ```sh
   curl -X POST http://localhost:8001/services \
     --data "name=openai" \
     --data "url=https://api.openai.com"

   curl -X POST http://localhost:8001/routes \
     --data "service.id=$(curl -s http://localhost:8001/services/openai | jq -r '.id')" \
     --data name=openai \
     --data "paths[]=/openai"

   curl -X POST http://localhost:8001/services/openai/plugins \
     --header 'Content-Type: application/json' \
     --header 'accept: application/json' \
     --data '{
       "name": "ai-proxy",
       "instance_name": "openai",
       "config": {
         "route_type": "llm/v1/chat",
         "model": {
           "provider": "openai"
         },
         "auth": {
           "header_value": "Bearer <OPENAI_KEY>",
           "header_name": "Authorization"
         }
       }
     }'
   ```
3. verify streaming works (a timestamped variant of this request is sketched after this list):

   ```sh
   curl -X POST http://localhost:8000/openai/v1/chat/completions \
     -H 'Content-Type: application/json' \
     --data-raw '{"model": "gpt-4o", "messages": [{"role": "user", "content": "What is deep learning?"}], "temperature": 0.7, "stream": true, "max_tokens": 100}'
   ```
4. change the `kong` image to `3.9.0`, run the `migrations`, and then `docker compose up`
5. re-run the request from step 3; Kong will return all SSE events at once and also log the following warning (note that `3.8.0` does not emit it; see the buffering note at the end of this issue):

   ```
   kong-1 | 2025/04/18 17:52:53 [warn] 1405#0: *5218 an upstream response is buffered to a temporary file /usr/local/kong/proxy_temp/4/00/0000000004 while reading upstream, client: 172.19.0.1
   ```
6. disable the `ai-proxy` plugin and pass the OpenAI key as a bearer token instead: `-H "Authorization: Bearer $OPEN_AI_KEY"`
7. re-run the request from step 3; streaming will work.
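As a quick way to verify whether events are actually streamed, the request from step 3 can be prefixed with arrival timestamps. This is a minimal sketch: `-N` disables curl's own output buffering, and `%N` (nanoseconds) requires GNU `date` (on macOS, plain `%T` works):

```sh
# Print each SSE line with its arrival time: with real streaming the
# timestamps spread out over the generation; with a buffered response
# all lines arrive in one burst with (nearly) identical timestamps.
curl -sN -X POST http://localhost:8000/openai/v1/chat/completions \
  -H 'Content-Type: application/json' \
  --data-raw '{"model": "gpt-4o", "messages": [{"role": "user", "content": "What is deep learning?"}], "temperature": 0.7, "stream": true, "max_tokens": 100}' \
  | while IFS= read -r line; do printf '%s %s\n' "$(date +%T.%N)" "$line"; done
```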
Anything else?
While the steps above use `openai`, I verified the same behavior with multiple providers (including `bedrock` and self-hosted).
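Regarding the warning from step 5: that message is nginx spilling the upstream response to a temporary file, i.e. proxy-level buffering. One way to probe whether nginx buffering alone explains it (a diagnostic sketch assuming Kong's standard nginx directive injection via `KONG_NGINX_PROXY_*` environment variables, not a fix for the plugin):

```yaml
# Diagnostic only: add under the kong service's environment in the compose
# file above; injects `proxy_buffering off;` into the proxy server block.
KONG_NGINX_PROXY_PROXY_BUFFERING: "off"
```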