[Misc] Add penalties sampling parameters to serve tool #25974
Code Review
This pull request adds support for frequency_penalty, presence_penalty, and repetition_penalty sampling parameters to the vllm bench serve tool. The implementation is correct in passing these parameters to the backend. However, there is a lack of client-side validation for the values of these new parameters. I've added a comment to suggest adding validation to improve user experience and prevent benchmark failures due to invalid inputs.
sampling_group.add_argument(
    "--frequency-penalty",
    type=float,
    default=None,
    help="Frequency penalty sampling parameter. Only has effect on "
    "openai-compatible backends.",
)
sampling_group.add_argument(
    "--presence-penalty",
    type=float,
    default=None,
    help="Presence penalty sampling parameter. Only has effect on "
    "openai-compatible backends.",
)
sampling_group.add_argument(
    "--repetition-penalty",
    type=float,
    default=None,
    help="Repetition penalty sampling parameter. Only has effect on "
    "openai-compatible backends.",
)
The newly added penalty parameters (frequency_penalty, presence_penalty, repetition_penalty) are parsed as floats without any validation. The OpenAI API, and vLLM's implementation of it, has specific valid ranges for these parameters:
- frequency_penalty: between -2.0 and 2.0.
- presence_penalty: between -2.0 and 2.0.
- repetition_penalty: must be a positive float.
Passing values outside these ranges will cause requests to fail at the server level, which could be confusing for users running benchmarks. It would be better to add client-side validation for these parameters to provide immediate and clear feedback on invalid inputs. This could be done using a custom type function with argparse.
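The custom type function suggested above could look roughly like the following. This is a minimal sketch, not the code from this PR: `bounded_float` is a hypothetical helper name, the bounds are the ones quoted in the review comment, and the standalone `parser` stands in for the tool's real argument parser.

```python
import argparse
import math


def bounded_float(low, high, *, low_inclusive=True):
    """Return an argparse `type` callable accepting finite floats in a range.

    Hypothetical helper for illustration; not part of vLLM.
    """

    def parse(text):
        try:
            value = float(text)
        except ValueError:
            raise argparse.ArgumentTypeError(f"{text!r} is not a number")
        # Reject NaN and infinities explicitly so they never reach the server.
        if not math.isfinite(value):
            raise argparse.ArgumentTypeError(f"{text!r} is not a finite number")
        above_low = value >= low if low_inclusive else value > low
        if not (above_low and value <= high):
            left = "[" if low_inclusive else "("
            raise argparse.ArgumentTypeError(
                f"{value} is outside the allowed range {left}{low}, {high}]"
            )
        return value

    return parse


# Stand-in parser; the real tool registers these on its own argument group.
parser = argparse.ArgumentParser()
parser.add_argument("--frequency-penalty", type=bounded_float(-2.0, 2.0), default=None)
parser.add_argument("--presence-penalty", type=bounded_float(-2.0, 2.0), default=None)
parser.add_argument(
    "--repetition-penalty",
    # Assumed bound: strictly positive, no upper limit.
    type=bounded_float(0.0, math.inf, low_inclusive=False),
    default=None,
)
```

With a type function like this, argparse itself reports an out-of-range value such as `--frequency-penalty 3` and exits before any request is sent, so the user sees the problem at the command line rather than as failed requests.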
Head branch was pushed to by a user without write access
a9082b2 to fe78868
Signed-off-by: Sergei Skvortsov <sergeyskv@nebius.com>
fe78868 to 7199ec1
    …25974) Signed-off-by: Sergei Skvortsov <sergeyskv@nebius.com> Co-authored-by: Sergei Skvortsov <sergeyskv@nebius.com> Signed-off-by: Tomer Asida <57313761+tomeras91@users.noreply.github.com>
…25974) Signed-off-by: Sergei Skvortsov <sergeyskv@nebius.com> Co-authored-by: Sergei Skvortsov <sergeyskv@nebius.com> Signed-off-by: Karan Goel <3261985+karan@users.noreply.github.com>
…25974) Signed-off-by: Sergei Skvortsov <sergeyskv@nebius.com> Co-authored-by: Sergei Skvortsov <sergeyskv@nebius.com>
…25974) Signed-off-by: Sergei Skvortsov <sergeyskv@nebius.com> Co-authored-by: Sergei Skvortsov <sergeyskv@nebius.com> Signed-off-by: xuebwang-amd <xuebwang@amd.com>
Purpose
Add the frequency_penalty, presence_penalty, and repetition_penalty sampling parameters to the vllm bench serve tool so that they can be enabled during performance measurement.
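To illustrate how optional flags that default to `None` can be forwarded only when the user sets them, here is a small sketch. It is not the PR's implementation: `collect_sampling_params`, the `payload` fields, and the model name are hypothetical and chosen only for the example.

```python
import argparse


def collect_sampling_params(args):
    """Return only the penalty parameters the user explicitly set.

    Hypothetical helper for illustration; unset flags (None) are dropped so
    the server falls back to its own defaults.
    """
    candidates = {
        "frequency_penalty": args.frequency_penalty,
        "presence_penalty": args.presence_penalty,
        "repetition_penalty": args.repetition_penalty,
    }
    return {name: value for name, value in candidates.items() if value is not None}


# Simulated parsed arguments: only two of the three flags were given.
args = argparse.Namespace(
    frequency_penalty=0.5,
    presence_penalty=None,
    repetition_penalty=1.1,
)

# Assumed shape of an OpenAI-style completion request body.
payload = {"model": "my-model", "prompt": "Hello", "max_tokens": 16}
payload.update(collect_sampling_params(args))
```

Dropping unset values keeps the request body identical to the pre-change behaviour when none of the new flags are used, which matters when comparing benchmark runs.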
Test Plan
Example for frequency_penalty:
Test Result