Conversation

@luccafong
Collaborator

@luccafong luccafong commented Sep 22, 2025

Summary: Allow skipping the ready check for bench serve through `--ready-check-timeout-sec 0`

Test Plan: `vllm bench serve --ready-check-timeout-sec 0`

Differential Revision: D82995002

@facebook-github-bot

@luccafong has exported this pull request. If you are a Meta employee, you can view the originating diff in D82995002.

@mergify mergify bot added the performance Performance-related issues label Sep 22, 2025
@luccafong luccafong changed the title allow skip ready check for bench serve [benchmarks]allow skip ready check for bench serve Sep 22, 2025
Contributor

@gemini-code-assist gemini-code-assist bot left a comment


Code Review

This pull request introduces a --skip-ready-check flag for the bench serve command, allowing users to bypass the initial endpoint readiness check. The implementation is clear and correctly adds the new command-line argument and conditional logic. I've added one comment regarding user-facing log messages to improve clarity when the check is skipped. Additionally, while the change is functionally correct, it would benefit from corresponding test cases to verify the new flag's behavior and prevent future regressions.
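As a hypothetical sketch of what such a flag and its conditional logic could look like (the parser, function name, and return values here are illustrative, not the actual vLLM code):

```python
import argparse


def build_parser() -> argparse.ArgumentParser:
    # Illustrative sketch of registering the flag; not the real
    # vllm bench serve argument parser.
    parser = argparse.ArgumentParser()
    parser.add_argument(
        "--skip-ready-check",
        action="store_true",
        help="Skip the initial endpoint readiness check before benchmarking.",
    )
    return parser


def maybe_wait_for_endpoint(args: argparse.Namespace) -> str:
    # Conditional logic: only perform the readiness check when not skipped.
    if args.skip_ready_check:
        return "skipped"
    return "waited"
```

With `store_true`, the flag defaults to `False`, so existing invocations keep the readiness check unchanged.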

@yeqcharlotte
Collaborator

We can achieve the same purpose by setting `--ready-check-timeout-sec=0`?

@luccafong
Collaborator Author

We can achieve the same purpose by setting `--ready-check-timeout-sec=0`?

I think that will raise an error directly.
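For context on why a zero timeout raises rather than skips: in a deadline-based wait loop, a timeout of zero means the deadline has already passed on entry, so the check fails immediately even if the server is up. A minimal illustrative sketch (not the actual vLLM code):

```python
import time


def wait_for_ready(is_ready, timeout_sec: float) -> None:
    # Illustrative deadline loop: with timeout_sec=0 the deadline has
    # already passed on entry, so the loop body never runs and the
    # check raises instead of being skipped.
    deadline = time.monotonic() + timeout_sec
    while time.monotonic() < deadline:
        if is_ready():
            return
        time.sleep(0.1)
    raise TimeoutError("endpoint not ready within timeout")
```

This is why a dedicated skip flag behaves differently from simply zeroing the timeout.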

Lucia (Lu) Fang and others added 2 commits September 22, 2025 16:05
Summary: Allow skipping the ready check for bench serve through `--skip-ready-check`

Test Plan: `vllm bench serve --skip-ready-check`

Differential Revision: D82995002

Signed-off-by: Lu Fang <fanglu@fb.com>
Co-authored-by: gemini-code-assist[bot] <176961590+gemini-code-assist[bot]@users.noreply.github.com>
Signed-off-by: Lucia Fang <116399278+luccafong@users.noreply.github.com>
Signed-off-by: Lu Fang <fanglu@fb.com>
Signed-off-by: Lu Fang <fanglu@fb.com>
Member

@ywang96 ywang96 left a comment


What's the use case for this? I actually found the wait_for_endpoint util pretty handy (now I don't need to fire the vllm serve and vllm bench serve commands sequentially).

@luccafong
Collaborator Author

What's the use case for this? I actually found the wait_for_endpoint util pretty handy (now I don't need to fire the vllm serve and vllm bench serve commands sequentially).

When the workload is huge, we don't want to wait for one request to complete.

@ywang96
Member

ywang96 commented Sep 23, 2025

When the workload is huge, we don't want to wait for one request to complete.

Ah ok - then I think it's probably better to modify wait_for_endpoint so that it pings /health instead of sending one dummy request. (And maybe we add another flag `--do-validate-benchmark-dataset`.) Do you think that makes more sense?

I also don't have a strong opinion on this current PR, so I'm going to approve it.
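The suggested change could look roughly like the following sketch, which polls /health rather than issuing a dummy completion request (the function name, URL handling, and polling intervals are illustrative assumptions, not the actual wait_for_endpoint implementation):

```python
import time
import urllib.error
import urllib.request


def wait_for_health(base_url: str, timeout_sec: float = 600.0) -> bool:
    # Poll the server's /health endpoint until it answers 200, instead
    # of waiting for a full dummy request to complete. Returns True once
    # the server is up, False if the deadline expires first.
    deadline = time.monotonic() + timeout_sec
    while time.monotonic() < deadline:
        try:
            with urllib.request.urlopen(f"{base_url}/health", timeout=5) as resp:
                if resp.status == 200:
                    return True
        except (urllib.error.URLError, OSError):
            pass  # server not up yet; keep polling
        time.sleep(1.0)
    return False
```

A /health probe only confirms the server is listening, so it stays cheap even when the first real request would be expensive to serve, which addresses the huge-workload concern above.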

@ywang96 ywang96 added the ready ONLY add when PR is ready to merge/full CI is needed label Sep 23, 2025
@ywang96 ywang96 enabled auto-merge (squash) September 23, 2025 01:38
@minosfuture
Contributor

When the workload is huge, we don't want to wait for one request to complete.

Ah ok - then I think it's probably better to modify wait_for_endpoint so that it pings /health instead of sending one dummy request. (And maybe we add another flag `--do-validate-benchmark-dataset`.) Do you think that makes more sense?

I also don't have a strong opinion on this current PR, so I'm going to approve it.

Agree that pinging /health would be a better solution, so we don't completely skip the readiness check (but just skip the single-request test).

@ywang96 ywang96 merged commit eea1783 into vllm-project:main Sep 23, 2025
42 checks passed
@ywang96
Member

ywang96 commented Sep 23, 2025

When the workload is huge, we don't want to wait for one request to complete.

Ah ok - then I think it's probably better to modify wait_for_endpoint so that it pings /health instead of sending one dummy request. (And maybe we add another flag `--do-validate-benchmark-dataset`.) Do you think that makes more sense?
I also don't have a strong opinion on this current PR, so I'm going to approve it.

Agree that pinging /health would be a better solution, so we don't completely skip the readiness check (but just skip the single-request test).

@minosfuture Feel free to make a follow-up PR!

FeiDaLI pushed a commit to FeiDaLI/vllm that referenced this pull request Sep 25, 2025
Signed-off-by: Lu Fang <fanglu@fb.com>
Signed-off-by: Lucia Fang <116399278+luccafong@users.noreply.github.com>
Co-authored-by: Lucia (Lu) Fang <fanglu@meta.com>
charlifu pushed a commit to ROCm/vllm that referenced this pull request Sep 25, 2025
Signed-off-by: Lu Fang <fanglu@fb.com>
Signed-off-by: Lucia Fang <116399278+luccafong@users.noreply.github.com>
Co-authored-by: Lucia (Lu) Fang <fanglu@meta.com>
Signed-off-by: charlifu <charlifu@amd.com>
yewentao256 pushed a commit that referenced this pull request Oct 3, 2025
Signed-off-by: Lu Fang <fanglu@fb.com>
Signed-off-by: Lucia Fang <116399278+luccafong@users.noreply.github.com>
Co-authored-by: Lucia (Lu) Fang <fanglu@meta.com>
Signed-off-by: yewentao256 <zhyanwentao@126.com>
gjc0824 pushed a commit to gjc0824/vllm that referenced this pull request Oct 10, 2025
Signed-off-by: Lu Fang <fanglu@fb.com>
Signed-off-by: Lucia Fang <116399278+luccafong@users.noreply.github.com>
Co-authored-by: Lucia (Lu) Fang <fanglu@meta.com>
Signed-off-by: gaojc <1055866782@qq.com>
xuebwang-amd pushed a commit to xuebwang-amd/vllm that referenced this pull request Oct 10, 2025
Signed-off-by: Lu Fang <fanglu@fb.com>
Signed-off-by: Lucia Fang <116399278+luccafong@users.noreply.github.com>
Co-authored-by: Lucia (Lu) Fang <fanglu@meta.com>
Signed-off-by: xuebwang-amd <xuebwang@amd.com>
choprahetarth pushed a commit to Tandemn-Labs/vllm that referenced this pull request Oct 11, 2025
Signed-off-by: Lu Fang <fanglu@fb.com>
Signed-off-by: Lucia Fang <116399278+luccafong@users.noreply.github.com>
Co-authored-by: Lucia (Lu) Fang <fanglu@meta.com>
sducouedic pushed a commit to sducouedic/vllm that referenced this pull request Oct 16, 2025
Signed-off-by: Lu Fang <fanglu@fb.com>
Signed-off-by: Lucia Fang <116399278+luccafong@users.noreply.github.com>
Co-authored-by: Lucia (Lu) Fang <fanglu@meta.com>
lywa1998 pushed a commit to lywa1998/vllm that referenced this pull request Oct 20, 2025
Signed-off-by: Lu Fang <fanglu@fb.com>
Signed-off-by: Lucia Fang <116399278+luccafong@users.noreply.github.com>
Co-authored-by: Lucia (Lu) Fang <fanglu@meta.com>
xuebwang-amd pushed a commit to xuebwang-amd/vllm that referenced this pull request Oct 24, 2025
Signed-off-by: Lu Fang <fanglu@fb.com>
Signed-off-by: Lucia Fang <116399278+luccafong@users.noreply.github.com>
Co-authored-by: Lucia (Lu) Fang <fanglu@meta.com>
Signed-off-by: xuebwang-amd <xuebwang@amd.com>
Labels

performance Performance-related issues ready ONLY add when PR is ready to merge/full CI is needed

5 participants