### Your current environment
```python
>>> import vllm
INFO 02-12 20:27:04 __init__.py:190] Automatically detected platform cuda.
>>> vllm.__version__
'0.7.2'
```
### 🐛 Describe the bug
Hi,

It looks like Qwen models can generate token ids that are out of vocabulary. This shows up when the generated tokens are fed back to the model as a prompt, which sometimes fails with the exception `Token id 151779 is out of vocabulary`. Here is minimal code to reproduce the error:
```python
import vllm
from transformers import AutoTokenizer
import numpy as np

PROMPT = """
<|im_start|>system
Please reason step by step, and put your final answer within \\boxed{}.<|im_end|>
<|im_start|>user
The equation $a^7xy-a^6y-a^5x=a^4(b^4-1)$ is equivalent to the equation $(a^mx-a^n)(a^py-a^2)=a^4b^4$ for some integers $m$, $n$, and $p$. Find $mnp$.<|im_end|>
<|im_start|>assistant
"""

if __name__ == '__main__':
    model_path = "Qwen/Qwen2.5-1.5B-Instruct"
    tokenizer = AutoTokenizer.from_pretrained(model_path)
    PROMPT_TOKEN_IDS = tokenizer.encode(PROMPT)
    sampling_params = vllm.SamplingParams(temperature=1.2, max_tokens=100)
    llm = vllm.LLM(model_path)

    # Can we now generate tokens out of vocabulary?
    out_of_vocab = []
    out_of_vocab_tokens = []
    for i in range(100):
        out = llm.generate(prompt_token_ids=PROMPT_TOKEN_IDS, sampling_params=sampling_params)
        PROMPT_COMPLETION_TOKEN_IDS = PROMPT_TOKEN_IDS + list(out[0].outputs[0].token_ids)
        try:
            # Feed the prompt plus the freshly generated tokens back to the model.
            out2 = llm.generate(prompt_token_ids=PROMPT_COMPLETION_TOKEN_IDS, sampling_params=sampling_params)
            out_of_vocab.append(0)
        except Exception as e:
            print(e)
            # Extract the offending token id from the error message.
            token_id = int(str(e).split("Token id ")[1].split(" ")[0])
            out_of_vocab_tokens.append(token_id)
            out_of_vocab.append(1)

    print(f"Proportion of out of vocabulary generations: {np.mean(out_of_vocab)}")
    print(out_of_vocab_tokens)
```
Selected output:
```text
Token id 151779 is out of vocabulary
Token id 151734 is out of vocabulary
...
Proportion of out of vocabulary generations: 0.03
[151925, 151779, 151734]
```
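My guess (not verified against vLLM's internals) is that the sampled ids land in the padding region between the tokenizer's vocabulary and the model's configured `vocab_size`, so they can be sampled but cannot be decoded or re-encoded. Here is a minimal sketch to check that, assuming the relevant sizes are `len(tokenizer)` and `config.vocab_size`; the three ids listed above are the ones observed in my run:

```python
from transformers import AutoConfig, AutoTokenizer

model_path = "Qwen/Qwen2.5-1.5B-Instruct"
tokenizer = AutoTokenizer.from_pretrained(model_path)
config = AutoConfig.from_pretrained(model_path)

# Number of ids the tokenizer can actually decode vs. the size of the
# distribution the model samples from.
print("len(tokenizer):   ", len(tokenizer))
print("config.vocab_size:", config.vocab_size)

# Check whether the offending ids from the run above fall in the gap
# [len(tokenizer), config.vocab_size).
for token_id in [151925, 151779, 151734]:
    print(token_id, len(tokenizer) <= token_id < config.vocab_size)
```

If that is the cause, a client-side workaround would be to drop any generated id at or above `len(tokenizer)` before re-feeding the completion (e.g. `clean_ids = [t for t in token_ids if t < len(tokenizer)]`), but a proper fix would presumably need to mask the padded logits during sampling.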
### Before submitting a new issue...
- Make sure you already searched for relevant issues, and asked the chatbot living at the bottom right corner of the documentation page, which can answer lots of frequently asked questions.