### Your current environment
```python
>>> import vllm
INFO 02-12 20:27:04 __init__.py:190] Automatically detected platform cuda.
>>> vllm.__version__
'0.7.2'
```
### 🐛 Describe the bug
Hi,

It looks like Qwen models can generate token ids that are out of vocabulary. This shows up when the generated tokens are fed back to the model as a prompt, which sometimes fails with the exception `Token id 151779 is out of vocabulary`. Here is minimal code to reproduce the error:
```python
import vllm
from transformers import AutoTokenizer
import numpy as np

PROMPT = """
<|im_start|>system
Please reason step by step, and put your final answer within \\boxed{}.<|im_end|>
<|im_start|>user
The equation $a^7xy-a^6y-a^5x=a^4(b^4-1)$ is equivalent to the equation $(a^mx-a^n)(a^py-a^2)=a^4b^4$ for some integers $m$, $n$, and $p$. Find $mnp$.<|im_end|>
<|im_start|>assistant
"""

if __name__ == '__main__':
    model_path = "Qwen/Qwen2.5-1.5B-Instruct"
    tokenizer = AutoTokenizer.from_pretrained(model_path)
    PROMPT_TOKEN_IDS = tokenizer.encode(PROMPT)
    sampling_params = vllm.SamplingParams(temperature=1.2, max_tokens=100)
    llm = vllm.LLM(model_path)

    # Can we now generate tokens out of vocabulary?
    out_of_vocab = []
    out_of_vocab_tokens = []
    for i in range(100):
        out = llm.generate(prompt_token_ids=PROMPT_TOKEN_IDS, sampling_params=sampling_params)
        PROMPT_COMPLETION_TOKEN_IDS = PROMPT_TOKEN_IDS + list(out[0].outputs[0].token_ids)
        try:
            # Feed the prompt plus the freshly generated tokens back to the model.
            out2 = llm.generate(prompt_token_ids=PROMPT_COMPLETION_TOKEN_IDS, sampling_params=sampling_params)
            out_of_vocab.append(0)
        except Exception as e:
            print(e)
            # Extract the offending token id from the error message.
            token_id = int(str(e).split("Token id ")[1].split(" ")[0])
            out_of_vocab_tokens.append(token_id)
            out_of_vocab.append(1)

    print(f"Proportion of out of vocabulary generations: {np.mean(out_of_vocab)}")
    print(out_of_vocab_tokens)
```
Selected output:
```text
Token id 151779 is out of vocabulary
Token id 151734 is out of vocabulary
...
Proportion of out of vocabulary generations: 0.03
[151925, 151779, 151734]
```
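My guess (not verified against vLLM's internals) is that the sampled ids land in the padding region between the tokenizer's vocabulary and the model's configured `vocab_size`, so they can be sampled but cannot be decoded or re-encoded. Here is a minimal sketch to check that, assuming the relevant sizes are `len(tokenizer)` and `config.vocab_size`; the three ids listed above are the ones observed in my run:

```python
from transformers import AutoConfig, AutoTokenizer

model_path = "Qwen/Qwen2.5-1.5B-Instruct"
tokenizer = AutoTokenizer.from_pretrained(model_path)
config = AutoConfig.from_pretrained(model_path)

# Number of ids the tokenizer can actually decode vs. the size of the
# distribution the model samples from.
print("len(tokenizer):   ", len(tokenizer))
print("config.vocab_size:", config.vocab_size)

# Check whether the offending ids from the run above fall in the gap
# [len(tokenizer), config.vocab_size).
for token_id in [151925, 151779, 151734]:
    print(token_id, len(tokenizer) <= token_id < config.vocab_size)
```

If that is the cause, a client-side workaround would be to drop any generated id at or above `len(tokenizer)` before re-feeding the completion (e.g. `clean_ids = [t for t in token_ids if t < len(tokenizer)]`), but a proper fix would presumably need to mask the padded logits during sampling.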
### Before submitting a new issue...
- Make sure you already searched for relevant issues, and asked the chatbot living at the bottom right corner of the documentation page, which can answer lots of frequently asked questions.