Name and Version
llama-server f30f099
Operating systems
Linux
GGML backends
CUDA
Hardware
RTX 4090, CUDA
Models
E.g. Code Qwen 2.5 7B-Chat (Q8)
Problem description & steps to reproduce
llama-server stopped generating any tokens for me, regardless of model, starting with commit f30f099 from #11285.
Simply reverting that commit, e.g. on top of today's master (6171c9d), fixes the issue for me.
To reproduce: go to http://localhost:8080, enter a question, and hit return; nothing happens. An equivalent API-level reproduction is sketched below.
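For reference, the same failure can be triggered without the web UI by posting directly to the OpenAI-compatible chat endpoint that appears in the log below. A minimal sketch (the exact payload is an assumption; any valid chat request should behave the same):

```python
# Minimal reproduction sketch against a locally running llama-server.
# Assumptions: server listens on localhost:8080; the "model" field is a
# placeholder since llama-server serves whatever model it was started with.
import json
import urllib.error
import urllib.request

req = urllib.request.Request(
    "http://localhost:8080/v1/chat/completions",
    data=json.dumps({
        "model": "default",  # placeholder model name (assumption)
        "messages": [{"role": "user", "content": "Hello"}],
    }).encode("utf-8"),
    headers={"Content-Type": "application/json"},
    method="POST",
)

try:
    with urllib.request.urlopen(req) as resp:
        print(resp.status, resp.read().decode("utf-8"))
except urllib.error.HTTPError as e:
    # On the affected commit the server answers 400 instead of generating tokens.
    print(e.code, e.read().decode("utf-8"))
```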
First Bad Commit
f30f099 (#11285)
Relevant log output
main: server is listening on http://0.0.0.0:8080 - starting the main loop
srv  update_slots: all slots are idle
request: GET / 127.0.0.1 200
request: GET /favicon.ico 400
request: POST /v1/chat/completions 400