
Chat dispatcher ignores body model field — always routes to primary slot #377

@thinmintdev

Description


Summary

POST /v1/chat/completions resolves every request through the legacy
slot resolver and dispatches to the primary slot, regardless of what
model the request body specifies.

Observed in production logs (2026-05-28):

2026-05-28T06:51:24 [info] dispatch.decision [hal0-dispatch]
    cache_state=legacy
    latency_ms=0.157
    model=qwen3-coder-reap-25b-a3b-q5km     ← caller asked for agent-hermes' model
    resolution_path=legacy_slot:primary     ← but we sent it to primary
    upstream=primary

The agent-hermes slot was loaded with qwen3-coder-reap-25b-a3b-q5km
on port 8002. The primary slot was loaded with the 40b coder on 8001.
A chat request asking for the 25b model was still forwarded to primary.
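Misroutes like this can be flagged mechanically from dispatch.decision lines. Below is a minimal sketch. It assumes the log line is a flat run of key=value pairs, as in the excerpt above, and that we know which model each slot has loaded. The SLOT_MODELS table and the helper name misrouted() are hypothetical. The 40b coder's exact model id isn't given in this issue, so a placeholder stands in for it.

```python
import re
from typing import Dict

# Hypothetical slot table for the deployment described above: slot -> loaded model id.
# "coder-40b-placeholder" stands in for the 40b coder's real id, which isn't recorded here.
SLOT_MODELS: Dict[str, str] = {
    "primary": "coder-40b-placeholder",
    "agent-hermes": "qwen3-coder-reap-25b-a3b-q5km",
}

# Matches key=value pairs; values run until the next whitespace.
KV = re.compile(r"(\w+)=(\S+)")

def misrouted(log_line: str, slot_models: Dict[str, str]) -> bool:
    """True if some slot serves the requested model but dispatch chose another upstream."""
    fields = dict(KV.findall(log_line))
    requested = fields.get("model")
    upstream = fields.get("upstream")
    serving = [slot for slot, loaded in slot_models.items() if loaded == requested]
    return bool(serving) and upstream not in serving

LINE = (
    "2026-05-28T06:51:24 [info] dispatch.decision [hal0-dispatch] "
    "cache_state=legacy latency_ms=0.157 model=qwen3-coder-reap-25b-a3b-q5km "
    "resolution_path=legacy_slot:primary upstream=primary"
)
```

With the production line above, misrouted(LINE, SLOT_MODELS) returns True, because agent-hermes serves the requested model but upstream=primary.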

Root cause

Dispatcher.dispatch() (src/hal0/dispatcher/router.py) reaches Step 4
(legacy heuristics) because:

  1. The model isn't in the upstream registry (Lemonade-loaded models don't auto-register).
  2. No upstream's cached /v1/models advertises it (or the cache is cold).
  3. resolve_slot() in dispatcher/proxy.py matches by path, not by
    model name, and /v1/chat/completions always resolves to primary.
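The fall-through above can be sketched as follows. This is not the actual router.py code. Upstream, LEGACY_PATH_SLOTS, and the "registry" / "models_cache" resolution labels are hypothetical, and whatever sits between Step 1 and Step 4 is collapsed into a single cached-/v1/models check. Only the legacy_slot:primary label comes from the log.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Set, Tuple

@dataclass
class Upstream:
    name: str
    # Model ids from this upstream's cached /v1/models; empty when the cache is cold.
    cached_models: Set[str] = field(default_factory=set)

# Hypothetical legacy table: request path -> slot. The model name never enters into it.
LEGACY_PATH_SLOTS = {"/v1/chat/completions": "primary"}

def resolve_slot(path: str) -> str:
    return LEGACY_PATH_SLOTS.get(path, "primary")

def dispatch(path: str, model: str, registry: Dict[str, str],
             upstreams: List[Upstream]) -> Tuple[str, str]:
    # Step 1: explicit registry entry (Lemonade-loaded models never land here today).
    if model in registry:
        return registry[model], "registry"
    # Intermediate steps, collapsed: any upstream whose cached /v1/models lists the model.
    for up in upstreams:
        if model in up.cached_models:
            return up.name, "models_cache"
    # Step 4: legacy heuristics, which only look at the path.
    slot = resolve_slot(path)
    return slot, f"legacy_slot:{slot}"
```

With an empty registry and cold caches, dispatch("/v1/chat/completions", "qwen3-coder-reap-25b-a3b-q5km", {}, [Upstream("primary"), Upstream("agent-hermes")]) returns ("primary", "legacy_slot:primary"). That matches the production log.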

So the WebUI dropdown that offers "talk to agent-hermes" is effectively
cosmetic for chat requests: they all land on primary.

Impact

  • Users can't route chat to specific slots by model name.
  • Multi-slot setups (primary + agent-hermes) effectively share the
    primary slot for all /v1/chat/completions traffic.
  • The problem will compound once we expose more chat-capable slots (NPU, FLM, etc.).

Proposed direction (out of scope for this issue; deferred)

Either:

  • Auto-register Lemonade-loaded models into the model registry on slot transition to READY, so Step 1 finds them; OR
  • Make resolve_slot() consult slot manifests' [model] default + models lists when the path is /v1/chat/completions.
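Both options can be sketched against already-parsed manifests. The sketch assumes each manifest's [model] table carries a `default` string and an optional `models` list. MANIFESTS, models_for(), and on_slot_ready() are hypothetical, as is the model-aware resolve_slot() signature. The 40b model id is a placeholder.

```python
from typing import Dict, Optional, Set

# Hypothetical parsed manifests, each mirroring a [model] table (default + models).
MANIFESTS: Dict[str, dict] = {
    "primary": {"model": {"default": "coder-40b-placeholder", "models": []}},
    "agent-hermes": {"model": {"default": "qwen3-coder-reap-25b-a3b-q5km", "models": []}},
}

def models_for(manifest: dict) -> Set[str]:
    """All model ids a slot manifest advertises: its default plus any extra models."""
    section = manifest.get("model", {})
    names = set(section.get("models", []))
    if "default" in section:
        names.add(section["default"])
    return names

# Option 1: register a slot's models when it transitions to READY, so Step 1 finds them.
def on_slot_ready(registry: Dict[str, str], slot: str, manifest: dict) -> None:
    for name in models_for(manifest):
        registry.setdefault(name, slot)  # assumption: first READY slot wins on conflicts

# Option 2: let resolve_slot() match chat requests by model before the path fallback.
def resolve_slot(path: str, model: Optional[str], manifests: Dict[str, dict]) -> str:
    if path == "/v1/chat/completions" and model:
        for slot, manifest in manifests.items():
            if model in models_for(manifest):
                return slot
    return "primary"  # legacy path-based fallback, unchanged
```

Option 1 keeps the registry authoritative, but it needs a tie-break rule when two slots load the same model. The setdefault() here silently keeps the first one. Option 2 is more local, but it duplicates model knowledge that Step 1 should own.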

Deferred from a debug session on 2026-05-28 where we fixed the
swap-window 503 race; see related branch fix/swap-window-503.

Related

  • ADR-0006 (Lemonade migration) — registry/catalog drift was noted but not closed
  • Memory [[hal0_lemonade_hf_cache_gotchas]] — model catalog surfaces

Metadata


Assignees

No one assigned

    Labels

    bug (Something isn't working)

    Type

    No type

    Projects

    No projects

    Milestone

    No milestone

    Relationships

    None yet

    Development

    No branches or pull requests
