[Core] [Frontend] Priority scheduling for embeddings and in the OpenAI-API #8965

Merged
Improved documentation
schoennenbeck committed Sep 30, 2024
commit cff3fdeaf6ea77d8c4b5c00dc7bca3d7aa28d0fb
7 changes: 5 additions & 2 deletions vllm/engine/arg_utils.py
@@ -802,8 +802,11 @@ def add_cli_args(parser: FlexibleArgumentParser) -> FlexibleArgumentParser:
'--scheduling-policy',
choices=['fcfs', 'priority'],
default="fcfs",
- help='The scheduling policy to use. "fcfs" (default) or "priority"'
- )
+ help='The scheduling policy to use. "fcfs" (first come first serve'
+ ', i.e. requests are handled in order of arrival, this is the '
+ 'default) or "priority" (requests are handled based on given '
+ 'priority (lower value means earlier handling) and time of '
+ 'arrival deciding any ties).')

return parser
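
The ordering described in the new help text can be illustrated with a small standalone sketch. This is not vLLM's scheduler: `handling_order` and the `(name, priority)` request tuples are hypothetical names invented for the example, and the position in the input list stands in for time of arrival.

```python
def handling_order(requests, policy="fcfs"):
    """Return request names in the order they would be handled.

    `requests` is a list of (name, priority) tuples given in arrival order.
    Hypothetical helper for illustration only, not part of vLLM.
    """
    # Pair every request with its arrival index.
    arrivals = list(enumerate(requests))
    if policy == "fcfs":
        # First come first serve: keep arrival order.
        ordered = arrivals
    elif policy == "priority":
        # Lower priority value is handled earlier; arrival index breaks ties.
        ordered = sorted(arrivals, key=lambda item: (item[1][1], item[0]))
    else:
        raise ValueError(f"unknown scheduling policy: {policy!r}")
    return [name for _, (name, _priority) in ordered]


requests = [("a", 2), ("b", 0), ("c", 2), ("d", 1)]
print(handling_order(requests, "fcfs"))      # ['a', 'b', 'c', 'd']
print(handling_order(requests, "priority"))  # ['b', 'd', 'a', 'c']
```

Requests "a" and "c" share priority 2, so under "priority" they keep their relative arrival order, which is the tie-breaking rule the help text states.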
