
[Bug]: Weird problem with max token length in Qwen2.5-Math-RM-72B reward task #20828

@Deaddawn

Description

Your current environment

ValueError: The decoder prompt (length 8274) is longer than the maximum model length of 4096. Make sure that `max_model_len` is no smaller than the number of text tokens.

🐛 Describe the bug

I used the 'reward' task with Qwen2.5-Math-RM-72B to process long prompts, which works fine in the original Hugging Face implementation, but I get the above error in vLLM. RoPE scaling enables a normal run, but I'm not sure this is the right fix. By the way, I also tried setting max_model_len: no error is reported, but the output is a NaN tensor, which is weird. Also, I checked the original config of Qwen2.5-Math-RM-72B and max_position_embeddings is indeed 4096, yet when I print rm_tokenizer.model_max_length with their tokenizer, the result is 131072. So I'm really confused about what is wrong...
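
For context, a rough sketch of the kind of invocation I mean (illustrative only, not my exact script; argument names such as `task="reward"`, `hf_overrides`, and `tensor_parallel_size` may differ across vLLM versions):

```python
# Illustrative sketch, not the exact repro script. Assumptions: task="reward",
# hf_overrides, and the yarn rope_scaling keys below may vary by vLLM version.
from vllm import LLM

llm = LLM(
    model="Qwen/Qwen2.5-Math-RM-72B",
    task="reward",           # pooling/reward task
    tensor_parallel_size=8,  # assumption: the 72B model is sharded across 8 GPUs
    max_model_len=16384,     # raise the cap past the 4096 from max_position_embeddings
    # Optionally add YaRN-style RoPE scaling so positions beyond 4096 are
    # actually covered, rather than only bypassing the length check:
    hf_overrides={
        "rope_scaling": {
            "rope_type": "yarn",
            "factor": 4.0,
            "original_max_position_embeddings": 4096,
        }
    },
)

# encode() returns pooled outputs for reward models; the placeholder prompt
# here stands in for the ~8k-token input from the error above.
outputs = llm.encode("<long chat-formatted prompt>")
print(outputs[0].outputs.data)
```

With only `max_model_len` raised and no RoPE scaling, the same call runs without error but the resulting reward tensor is NaN, which is the behaviour described above.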

Before submitting a new issue...

  • Make sure you already searched for relevant issues, and asked the chatbot living at the bottom right corner of the documentation page, which can answer lots of frequently asked questions.
