[Benchmark] Refactor sample_requests in benchmark_throughput #3613
Conversation
Thanks for the PR! This change makes sense to me, and the time saved definitely adds up if you need to run this script repeatedly.
Could you do a quick run and compare the total number of input & output tokens with the default args (1000 prompts) before and after this PR? Ideally we would like to avoid any surprises (I don't think there will be any, but just making sure).
Sure. Before
After
I am confused about why this affects the e2e throughput. Doesn't request sampling happen before we measure the elapsed time?
Because the requests are randomly sampled, the test requests before this PR are different from those after it (random.sample vs. random.shuffle).
@rkooo567 Request sampling does happen before we measure the elapsed time, but as @gty111 mentioned, the set of requests is now different for a given seed, since the dataset is randomly shuffled first and then the first N prompts & expected outputs are tokenized to be counted (and passed on if needed). Since the input & output tokens do fall in the same ballpark, I think this QoL PR is worth merging, especially for people who run this script a lot.
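To illustrate the point above, here is a minimal sketch (using a toy integer dataset in place of the real entries) showing why random.sample and random.shuffle select different items even under the same seed:

```python
import random

# Toy stand-in for the dataset entries.
dataset = list(range(100))

# Before this PR: pick N requests directly.
random.seed(0)
sampled = random.sample(dataset, 5)

# After this PR: shuffle everything, then take the first N.
random.seed(0)
shuffled = dataset.copy()
random.shuffle(shuffled)
taken = shuffled[:5]

# The two methods consume the PRNG differently, so even with the same
# seed they generally select different requests.
print(sampled)
print(taken)
```

Either way the sampling is deterministic for a fixed seed; only the mapping from seed to selected requests changed.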
@ywang96 Can we merge this? Just wondering what blocks the merge.
@WoosukKwon Yea of course (back then I didn't have the merge permission) |
@ywang96 I see. 🤣 Please merge this if you think it's ready. |
@zhuohan123 Just FYI, this will change the dataset sampling slightly, so don't be surprised to see the results change, since you have been running this script a lot...
…oject#3613) Co-authored-by: Roger Wang <ywang@roblox.com>
sample_requests in benchmark_throughput.py is time-consuming because it tokenizes all the prompts and completions in the dataset by default, while we only use a subset of them. Instead, we can shuffle the dataset and generate the sampled requests on the fly, tokenizing only as many entries as needed.
Before
After
The difference in throughput is due to the random selection of requests.
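A rough sketch of the refactored approach, assuming a Hugging Face-style tokenizer; the function signature and the length filters here are illustrative, not the script's exact ones:

```python
import random
from typing import List, Tuple

def sample_requests(
    dataset: List[Tuple[str, str]],   # (prompt, completion) pairs
    num_requests: int,
    tokenizer,                        # e.g. a Hugging Face tokenizer
) -> List[Tuple[str, int, int]]:
    """Shuffle once, then tokenize lazily until enough samples pass."""
    random.shuffle(dataset)
    filtered: List[Tuple[str, int, int]] = []
    for prompt, completion in dataset:
        if len(filtered) == num_requests:
            break
        # Only entries actually considered get tokenized, instead of
        # tokenizing the whole dataset up front.
        prompt_len = len(tokenizer(prompt).input_ids)
        output_len = len(tokenizer(completion).input_ids)
        # Placeholder length filters; the real script applies its own bounds.
        if prompt_len < 4 or output_len < 4:
            continue
        if prompt_len > 1024 or prompt_len + output_len > 2048:
            continue
        filtered.append((prompt, prompt_len, output_len))
    return filtered
```

Since shuffling is cheap relative to tokenization, the startup cost drops roughly in proportion to the fraction of the dataset that is never sampled.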
PR Checklist
Thank you for your contribution to vLLM! Before submitting the pull request, please ensure the PR meets the following criteria. This helps vLLM maintain code quality and improves the efficiency of the review process.
PR Title and Classification
Only specific types of PRs will be reviewed. The PR title is prefixed appropriately to indicate the type of change. Please use one of the following:
- [Bugfix] for bug fixes.
- [CI/Build] for build or continuous integration improvements.
- [Doc] for documentation fixes and improvements.
- [Model] for adding a new model or improving an existing model. Model name should appear in the title.
- [Frontend] for changes on the vLLM frontend (e.g., OpenAI API server, LLM class, etc.)
- [Kernel] for changes affecting CUDA kernels or other compute kernels.
- [Core] for changes in the core vLLM logic (e.g., LLMEngine, AsyncLLMEngine, Scheduler, etc.)
- [Hardware][Vendor] for hardware-specific changes. Vendor name should appear in the prefix (e.g., [Hardware][AMD]).
- [Misc] for PRs that do not fit the above categories. Please use this sparingly.

Note: If the PR spans more than one category, please include all relevant prefixes.
Code Quality
The PR needs to meet the following code quality standards:
- Use format.sh to format your code.
- Add documentation to docs/source/ if the PR modifies the user-facing behaviors of vLLM. This helps vLLM users understand and utilize the new features or changes.

Notes for Large Changes
Please keep the changes as concise as possible. For major architectural changes (>500 LOC excluding kernel/data/config/test), we would expect a GitHub issue (RFC) discussing the technical design and justification. Otherwise, we will tag the PR with rfc-required and might not review it.

What to Expect for the Reviews
The goal of the vLLM team is to be a transparent reviewing machine. We would like to make the review process transparent and efficient and make sure no contributor feels confused or frustrated. However, the vLLM team is small, so we need to prioritize some PRs over others. Here is what you can expect from the review process:
- The reviewer will add an action-required label to the PR if changes are required. The contributor should address the comments and ping the reviewer to re-review the PR.

Thank You
Finally, thank you for taking the time to read these guidelines and for your interest in contributing to vLLM. Your contributions make vLLM a great tool for everyone!