🚀 The feature, motivation and pitch
I noticed that after deploying an embedding model with vLLM, requests do not fully match the OpenAI Embeddings API: the `dimensions` parameter is rejected. Every request that includes it fails with:
```json
{
  "object": "error",
  "message": "dimensions is currently not supported",
  "type": "BadRequestError",
  "param": null,
  "code": 400
}
```
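For context, here is a minimal reproduction sketch against vLLM's OpenAI-compatible `/v1/embeddings` endpoint (host/port match the serve command below; `512` is an arbitrary example value for `dimensions`):

```bash
# Request an embedding with a target dimensionality; the `dimensions`
# field is what triggers the 400 BadRequestError shown above.
curl -s http://localhost:8000/v1/embeddings \
  -H "Content-Type: application/json" \
  -d '{
        "model": "bce-embedding-base_v1",
        "input": "hello world",
        "dimensions": 512
      }'
```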
Alternatives
My script:

```bash
vllm serve "/workspace/share_data/base_llms/bce-embedding-base_v1" \
  --served-model-name "bce-embedding-base_v1" \
  --task "embedding" \
  --trust-remote-code \
  --host "0.0.0.0" \
  --port 8000 \
  --dtype auto \
  --gpu-memory-utilization 0.4 \
  --kv-cache-dtype auto \
  --enable-prefix-caching \
  --tensor-parallel-size 1 \
  --max-num-seqs 256
```
vLLM version: 0.7.2
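As a stopgap, omitting `dimensions` works; the server then returns the model's native-size embedding (a sketch, assuming the server started by the script above):

```bash
# Same request without `dimensions`: this succeeds and returns the
# embedding at the model's native dimensionality.
curl -s http://localhost:8000/v1/embeddings \
  -H "Content-Type: application/json" \
  -d '{
        "model": "bce-embedding-base_v1",
        "input": "hello world"
      }'
```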
Additional context
I hope support for the `dimensions` parameter can be added soon, as it would be immensely helpful for building knowledge bases.
Before submitting a new issue...
- Make sure you already searched for relevant issues, and asked the chatbot living at the bottom right corner of the documentation page, which can answer lots of frequently asked questions.