
Chat completions streaming: TTFT delay with llama-stack-client (TypeScript) vs OpenAI SDK (TypeScript) #53

@KodieGlosserIBM

Description


When streaming chat completions from our server endpoint, we consistently see an additional delay before the first streamed chunk (time to first token, TTFT) when using the LlamaStack TypeScript client, while the OpenAI Node SDK starts streaming almost immediately under the same conditions. The behavior is reproducible on both the latest release and v0.4.0-alpha.7 of llama-stack-client, and the issue appears specific to the LlamaStack TypeScript SDK's streaming path.

Environment

  • Node.js: v22.4.1
  • LlamaStack TypeScript client versions tested:
    • llama-stack-client (latest from npm at time of filing)
    • llama-stack-client@0.4.0-alpha.7
  • OpenAI Node SDK: openai (latest)
  • Request: streaming chat completion (stream: true)
    const stream = await client.chat.completions.create({
        model: model,
        stream: true,
        messages: [
            { role: "system", content: "You are a helpful assistant." },
            { role: "user", content: "Explain SSE streaming in one paragraph." },
        ],
        temperature: 0.7,
    });

The full script is attached below; change the import statement to switch between the openai and llama-stack-client SDKs.

stream-chat.js
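
For reference, here is a minimal sketch of the kind of timing harness that produces the traces below. It is illustrative only: the constructor options, base URL, and fallback model id are assumptions, and the attached stream-chat.js remains the actual reproduction script.

    // ttft-sketch.ts: illustrative TTFT harness, not the attached stream-chat.js
    import LlamaStackClient from "llama-stack-client";
    // import OpenAI from "openai"; // swap the import and constructor below to compare SDKs

    // Constructor options are assumptions; point baseURL at your server.
    const client = new LlamaStackClient({ baseURL: process.env.BASE_URL ?? "http://localhost:8321" });

    async function main() {
        const start = Date.now();
        const stream = await client.chat.completions.create({
            model: process.env.MODEL ?? "example-model", // hypothetical model id
            stream: true,
            messages: [
                { role: "system", content: "You are a helpful assistant." },
                { role: "user", content: "Explain SSE streaming in one paragraph." },
            ],
            temperature: 0.7,
        });

        // Print the elapsed time in front of every chunk; the first line is the TTFT.
        for await (const chunk of stream) {
            const delta = chunk.choices?.[0]?.delta?.content ?? "";
            process.stdout.write(`[+${((Date.now() - start) / 1000).toFixed(3)}s] ${delta}`);
        }
        process.stdout.write("\n");
    }

    main();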

Evidence

The delay is reproducible on every server we tested, including a local instance.

LlamaStack (remote server) — ~2.0s delay before first chunk

HTTP 200
alt-svc: h3=":443"; ma=93600
cache-control: max-age=0, no-cache, no-store
connection: keep-alive, Transfer-Encoding
content-type: text/event-stream;charset=utf-8
date: Tue, 27 Jan 2026 23:56:57 GMT
expires: Tue, 27 Jan 2026 23:56:57 GMT
pragma: no-cache
server-timing: cdn-cache; desc=MISS, edge; dur=1727, origin; dur=188, ak_p; desc="..."
strict-transport-security: max-age=15768000 ; includeSubDomains ; preload
transfer-encoding: chunked
x-correlation-id: 137df151-ee4f-448f-9686-9e8fed4f4c90
x-envoy-upstream-service-time: 158
x-request-id: 548d533c-1d63-4d78-b5a9-48ce7555fb07

[+1.996s] S[+1.998s] SE[+1.998s] ...

OpenAI SDK (remote server) — ~0.38–0.47s delay before first chunk

HTTP 200
alt-svc: h3=":443"; ma=93600
cache-control: max-age=0, no-cache, no-store
connection: keep-alive, Transfer-Encoding
content-type: text/event-stream;charset=utf-8
date: Tue, 27 Jan 2026 23:57:23 GMT
expires: Tue, 27 Jan 2026 23:57:23 GMT
pragma: no-cache
server-timing: cdn-cache; desc=MISS, edge; dur=81, origin; dur=163, ak_p; desc="..."
strict-transport-security: max-age=15768000 ; includeSubDomains ; preload
transfer-encoding: chunked
x-correlation-id: 872d2063-0805-4306-b0e5-679c84e130fd
x-envoy-upstream-service-time: 153
x-request-id: dda88dc6-1ea1-4ad7-8b47-812649bac8be

[+0.376s] S[+0.383s] SE[+0.389s] ...

LlamaStack (hitting localhost) — ~0.66–0.78s TTFT

HTTP 200
cache-control: no-cache
connection: keep-alive
content-type: text/event-stream;charset=utf-8
date: Tue, 27 Jan 2026 23:59:16 GMT
transfer-encoding: chunked
x-correlation-id: 98401572-d04d-4eaf-a6e3-9c6595c533b4
x-request-id: 3b554cfe-2693-4983-baf8-d99181e7391c

[+0.659s] Stream[+0.662s] ing[+0.662s] ...

OpenAI SDK (hitting localhost) — ~0.39s TTFT in the captured trace

HTTP 200
cache-control: no-cache
connection: keep-alive
content-type: text/event-stream;charset=utf-8
date: Tue, 27 Jan 2026 23:59:28 GMT
transfer-encoding: chunked
x-correlation-id: c2d7b76f-b522-42a3-b5b1-ef3b4359a647
x-request-id: 4bee3e1f-5d97-4b79-9299-01a5d26a6d9b

[+0.394s] S[+0.395s] SE[+0.396s]...

Request

Could you help verify whether the stream iterator in llama-stack-client introduces initial buffering before yielding the first data: frame?
If so, could we get a fix (or a configuration option) so the first chunk is emitted as soon as it is received?
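
To illustrate what "emit the first chunk as soon as it is received" would look like, here is a minimal raw SSE reader that bypasses both SDKs and prints each data: frame the moment it arrives. It is a sketch only: the endpoint path, request body, and server URL are assumptions, and this is not the SDK's internal implementation.

    // raw-sse-sketch.ts: times raw data: frames without either SDK (Node 18+ fetch)
    const BASE_URL = process.env.BASE_URL ?? "http://localhost:8321"; // assumed server URL

    async function main() {
        const start = Date.now();
        const res = await fetch(`${BASE_URL}/v1/chat/completions`, { // assumed OpenAI-compatible path
            method: "POST",
            headers: { "Content-Type": "application/json" },
            body: JSON.stringify({
                model: process.env.MODEL,
                stream: true,
                messages: [{ role: "user", content: "Explain SSE streaming in one paragraph." }],
            }),
        });

        const decoder = new TextDecoder();
        let buffer = "";

        // Decode raw bytes and log each complete SSE frame as soon as it is seen,
        // with no buffering beyond the frame boundary itself.
        // Node's web ReadableStream is async-iterable; the cast keeps the DOM types quiet.
        for await (const bytes of res.body as any) {
            buffer += decoder.decode(bytes, { stream: true });
            let sep: number;
            while ((sep = buffer.indexOf("\n\n")) !== -1) {
                const frame = buffer.slice(0, sep);
                buffer = buffer.slice(sep + 2);
                if (frame.startsWith("data:")) {
                    console.log(`[+${((Date.now() - start) / 1000).toFixed(3)}s]`, frame.slice(5, 60).trim());
                }
            }
        }
    }

    main();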
