
Fix training stability issues with new vLLM version #140


Merged: 5 commits merged into main on May 23, 2025

Conversation

saum7800
Collaborator

Since vllm-project/vllm#12622, if you don't pass a generation config, vLLM uses whatever it finds in the model's generation_config.json, if one exists. To get vLLM's own defaults, which was the behavior before that commit, you now have to pass generation_config="vllm" explicitly.

For RL training, we need the following sampling params (a code sketch follows the list):
repetition_penalty = 1.0
top_p = 1.0
top_k = 0
temperature = 1.0
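
For illustration, a minimal sketch of pinning these values per request with vLLM's SamplingParams (not part of this PR; note that vLLM uses top_k = -1 rather than 0 to disable top-k sampling):

```python
from vllm import SamplingParams

# Sampling params matching the RL-training defaults listed above.
sampling_params = SamplingParams(
    temperature=1.0,
    top_p=1.0,
    top_k=-1,                # vLLM's "disabled" value; plays the role of top_k = 0 in HF generation configs
    repetition_penalty=1.0,
    logprobs=1,              # return per-token logprobs, which the training loss consumes
)
```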

Any change to these values changes the logprobs returned by vLLM, which we use to calculate losses and gradient updates, and that mismatch leads to unstable training.

We're setting the default generation config to "vllm" so that the sampling params above are used, instead of the model's generation_config.json.
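
Concretely, the opt-out looks roughly like this when the engine is built through vLLM's LLM class (a sketch, assuming a vLLM version that exposes the generation_config engine argument; the model name is just a placeholder):

```python
from vllm import LLM

llm = LLM(
    model="Qwen/Qwen2.5-7B-Instruct",  # placeholder model
    # Explicitly opt out of the model's generation_config.json so vLLM's own
    # sampling defaults (the values listed above) are used.
    generation_config="vllm",
)
```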

@saum7800 saum7800 requested review from bradhilton and corbt May 23, 2025 20:53
@bradhilton (Collaborator) left a comment


🚀

@saum7800 saum7800 merged commit 48918e0 into main May 23, 2025
1 check passed
@saum7800 saum7800 deleted the potential_fix branch May 23, 2025 21:18