Steps to reproduce
Apply this configuration (e.g., with dstack apply):
type: service
name: deepseek-r1
image: lmsysorg/sglang:latest
env:
  - MODEL_ID=deepseek-ai/DeepSeek-R1-Distill-Qwen-1.5B
commands:
  - python3 -m sglang.launch_server
      --model-path $MODEL_ID
      --port 8000
      --trust-remote-code
port: 8000
# Register the model
model:
  name: deepseek-ai/DeepSeek-R1-Distill-Qwen-1.5B
  type: chat
  format: openai
# Uncomment to cache downloaded models
#volumes:
#  - /root/.cache/huggingface/hub:/root/.cache/huggingface/hub
# Disable authorization
auth: false
resources:
  gpu: 24GB
Try requesting the OpenAI-compatible endpoint.
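For reference, the same request can also be issued with the openai Python client. A minimal sketch, assuming the proxy base URL, project name (bihan), and placeholder token taken from the curl command below:

# Minimal sketch using the openai Python client against dstack's
# OpenAI-compatible proxy. base_url, project name, and token are
# placeholders copied from the curl command below, not verified values.
from openai import OpenAI

client = OpenAI(
    base_url="http://localhost:3000/proxy/models/bihan",
    api_key="<Token>",
)

stream = client.chat.completions.create(
    model="deepseek-ai/DeepSeek-R1-Distill-Qwen-1.5B",
    messages=[{"role": "user", "content": "Hello world"}],
    stream=True,
    max_tokens=512,
)
for chunk in stream:
    # Some chunks (e.g., the terminal one) may have no delta content
    if chunk.choices:
        print(chunk.choices[0].delta.content or "", end="")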
Actual behaviour
curl http://localhost:3000/proxy/models/bihan/chat/completions \
  -H "Content-Type: application/json" \
  -H "Authorization: Bearer <Token>" \
  -d '{
        "model": "deepseek-ai/DeepSeek-R1-Distill-Qwen-1.5B",
        "messages": [
          {
            "role": "user",
            "content": "Hello world"
          }
        ],
        "stream": true,
        "max_tokens": 512
      }'
{"detail":"Invalid chunk in model stream: 1 validation error for ChatCompletionsChunkResponse\nchoices -> 0 -> finish_reason\n unexpected value; permitted: 'stop', 'length', 'tool_calls', 'eos_token' (type=value_error.const; given=; permitted=('stop', 'length', 'tool_calls', 'eos_token'))"}
Expected behaviour
The request returns chat completions chunks.
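For comparison, a sketch of conforming chunk payloads (shapes assumed from the OpenAI streaming format, not captured output): intermediate chunks carry finish_reason null, and only the final chunk carries one of the permitted values.

# Hypothetical well-formed chunk shapes (not actual server output):
intermediate_chunk = {
    "choices": [{"index": 0, "delta": {"content": "Hello"}, "finish_reason": None}]
}
final_chunk = {
    "choices": [{"index": 0, "delta": {}, "finish_reason": "stop"}]
}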
dstack version
0.18.37
Server logs
Additional information
No response