Your current environment
I am using the Docker image for vLLM: vllm/vllm-openai:v0.7.1
🐛 Describe the bug
I launched an OpenAI-compatible inference server on a k8s cluster serving the intfloat/multilingual-e5-large-instruct model. This is an XLMRobertaModel, which is supposed to use mean pooling rather than last-token pooling. However, I confirmed that the embedding I get from the vLLM server matches the one I get by normalizing the last hidden state. I thought this had been addressed by #9387, but apparently it wasn't.
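For reference, this is roughly how I compared the two pooling strategies against the local checkpoint (a minimal sketch; the input text is just an example):

```python
# Sketch: compare mean pooling vs. last-token pooling on the local checkpoint.
import torch
import torch.nn.functional as F
from transformers import AutoModel, AutoTokenizer

path = "/mnt/models/e5-large"
tokenizer = AutoTokenizer.from_pretrained(path)
model = AutoModel.from_pretrained(path)

batch = tokenizer(["query: hello world"], return_tensors="pt")
with torch.no_grad():
    hidden = model(**batch).last_hidden_state  # (batch, seq_len, 1024)

mask = batch["attention_mask"].unsqueeze(-1).float()
mean_pooled = (hidden * mask).sum(dim=1) / mask.sum(dim=1)  # what 1_Pooling/config.json asks for
last_token = hidden[:, -1, :]                               # what the server output matches

print(F.normalize(mean_pooled, dim=-1)[0, :8])
print(F.normalize(last_token, dim=-1)[0, :8])
```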
The command I used to launch the server is `python -m vllm.entrypoints.openai.api_server --model /mnt/models/e5-large ...`, and the directory under `/mnt/models/e5-large` looks like this:
```console
❯ ls -laRX .
drwxr-xr-x    - jisoo 1 Apr 15:00 .
drwxr-xr-x    - jisoo 1 Apr 13:45 ..
drwxr-xr-x    - jisoo 1 Apr 15:00 1_Pooling
lrw-r--r--  690 jisoo 1 Apr 13:45 config.json
lrw-r--r-- 1.1G jisoo 1 Apr 13:45 model.safetensors
lrw-r--r--  349 jisoo 1 Apr 15:00 modules.json
lrw-r--r--   53 jisoo 1 Apr 15:00 sentence_xlm-roberta_config.json
lrw-r--r-- 5.1M jisoo 1 Apr 13:45 sentencepiece.bpe.model
lrw-r--r--  964 jisoo 1 Apr 13:45 special_tokens_map.json
lrw-r--r--  17M jisoo 1 Apr 13:45 tokenizer.json
lrw-r--r-- 1.2k jisoo 1 Apr 13:45 tokenizer_config.json

./1_Pooling:
drwxr-xr-x   - jisoo 1 Apr 15:00 .
drwxr-xr-x   - jisoo 1 Apr 15:00 ..
lrw-r--r-- 271 jisoo 1 Apr 15:00 config.json
```
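For completeness, I request embeddings through the OpenAI-compatible endpoint roughly like this (`base_url` and `api_key` are placeholders for my k8s deployment):

```python
# Sketch of the request against the OpenAI-compatible embeddings endpoint
# (base_url and api_key are placeholders for the actual deployment).
from openai import OpenAI

client = OpenAI(base_url="http://localhost:8000/v1", api_key="EMPTY")
resp = client.embeddings.create(model="/mnt/models/e5-large",
                                input=["query: hello world"])
print(resp.data[0].embedding[:8])
```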
The relevant Sentence Transformers configs are as follows. `modules.json`:

```json
[
  {
    "idx": 0,
    "name": "0",
    "path": "",
    "type": "sentence_transformers.models.Transformer"
  },
  {
    "idx": 1,
    "name": "1",
    "path": "1_Pooling",
    "type": "sentence_transformers.models.Pooling"
  },
  {
    "idx": 2,
    "name": "2",
    "path": "2_Normalize",
    "type": "sentence_transformers.models.Normalize"
  }
]
```
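So the pipeline is Transformer → Pooling → Normalize, i.e. mean pooling followed by L2 normalization. Loading the same directory with sentence-transformers (which does read these files) gives the reference embedding I compare against; a sketch:

```python
# Reference embedding from sentence-transformers, which honors
# modules.json and 1_Pooling/config.json (mean pooling + L2 normalize).
from sentence_transformers import SentenceTransformer

st_model = SentenceTransformer("/mnt/models/e5-large")
ref = st_model.encode(["query: hello world"])
print(ref[0][:8])  # matches the mean-pooled result, not the vLLM server output
```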
`1_Pooling/config.json`:

```json
{
  "word_embedding_dimension": 1024,
  "pooling_mode_cls_token": false,
  "pooling_mode_mean_tokens": true,
  "pooling_mode_max_tokens": false,
  "pooling_mode_mean_sqrt_len_tokens": false,
  "pooling_mode_weightedmean_tokens": false,
  "pooling_mode_lasttoken": false
}
```
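As a possible workaround, if I read the docs correctly, the pooling mode can also be forced explicitly at launch; something like the following (I have not verified that `--override-pooler-config` takes effect for this model on v0.7.1):

```bash
# Assumption: forcing mean pooling + normalization via the pooler override flag.
python -m vllm.entrypoints.openai.api_server --model /mnt/models/e5-large \
    --override-pooler-config '{"pooling_type": "MEAN", "normalize": true}'
```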
Is there something that I am missing?
Thanks!