
[Performance]: The performance of version 0.6.3 is weaker than that of version 0.6.2 in stress testing. #9581

skylee-01 opened this issue Oct 22, 2024 · 3 comments
Labels
performance Performance-related issues

Comments

@skylee-01

Proposal to improve performance

In stress testing, version 0.6.3 performs worse than version 0.6.2.
Scenario: Agent
Stress-testing data: input 50 tokens, output 20 tokens.

Version 0.6.3: peaks at 22 QPS, with a sharp drop in throughput after the maximum is reached.
[screenshot: 0.6.3 QPS curve]

Version 0.6.2: peaks at 24 QPS, still delivering over 90% of its performance at 36 QPS; the decline after the maximum is gradual.
[screenshot: 0.6.2 QPS curve]

Comparing version 0.6.3 with 0.6.2 under identical conditions, the prefill time in 0.6.3 is about 13 ms longer per instance than in 0.6.2. The main cause is a 200-300 microsecond pause between consecutive blocks.
Measurement conditions: batch_size=20, offline use of LLMEngine, 50 iterations.
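The per-iteration prefill numbers above can be reproduced with a simple timing harness. This is a self-contained sketch: in the issue's setup the workload would be a prefill pass of vLLM's `LLMEngine` over a batch of 20 prompts, but here it is any callable so the example runs on its own.

```python
import statistics
import time


def time_iterations(workload, iterations=50):
    """Call `workload()` repeatedly and report per-iteration latency in ms.

    In the issue's measurement, `workload` would be one prefill pass of
    vLLM's LLMEngine over a batch_size=20 batch; here it is an arbitrary
    callable so the sketch stays self-contained.
    """
    samples = []
    for _ in range(iterations):
        start = time.perf_counter()
        workload()
        samples.append((time.perf_counter() - start) * 1000.0)
    return {
        "mean_ms": statistics.mean(samples),
        "p50_ms": statistics.median(samples),
        "max_ms": max(samples),
    }


# Usage with a stand-in workload (sleeps at least 1 ms per call):
stats = time_iterations(lambda: time.sleep(0.001), iterations=50)
print(stats)
```

Running the same harness against both vLLM versions and diffing the `mean_ms` values is how a "13 ms longer per instance" figure would be established.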

Nsight Systems traces:

0.6.2: https://github.com/skylee-01/experimental_data/blob/main/nsys_vllm_062.nsys-rep

0.6.3: https://github.com/skylee-01/experimental_data/blob/main/nsys_vllm_063.nsys-rep

Report of performance regression

No response

Misc discussion on performance

No response

Your current environment (if you think it is necessary)

The output of `python collect_env.py`

Before submitting a new issue...

  • Make sure you already searched for relevant issues, and asked the chatbot living at the bottom right corner of the documentation page, which can answer lots of frequently asked questions.
@skylee-01 skylee-01 added the performance Performance-related issues label Oct 22, 2024
@youkaichao
Member

You can follow https://docs.vllm.ai/en/latest/getting_started/installation.html#install-the-latest-code to bisect the commits and find the commit to blame.

And if you can come up with a fix that does not affect the performance of unrelated use cases, that would be even better.
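The suggested bisection is just a binary search over the commit range between v0.6.2 (known good) and v0.6.3 (known bad). A self-contained sketch with a synthetic commit list follows; `is_bad` is a hypothetical predicate that, in practice, would install the per-commit wheel and rerun the stress test:

```python
def find_first_bad(commits, is_bad):
    """Binary-search `commits` (ordered oldest -> newest) for the first
    commit where `is_bad(commit)` is True.

    Assumes the first commit is good and the last is bad, so a single
    good->bad transition exists somewhere in the range.
    """
    lo, hi = 0, len(commits) - 1
    while lo < hi:
        mid = (lo + hi) // 2
        if is_bad(commits[mid]):
            hi = mid          # regression is at mid or earlier
        else:
            lo = mid + 1      # regression is after mid
    return commits[lo]


# Synthetic example: pretend the regression landed at commit "c6".
commits = [f"c{i}" for i in range(10)]
first_bad = find_first_bad(commits, lambda c: int(c[1:]) >= 6)
print(first_bad)  # -> c6
```

With N commits between the two releases, this takes about log2(N) stress-test runs rather than N, which matters when each `is_bad` check involves a full install plus benchmark.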

@vrdn-23
Contributor

vrdn-23 commented Oct 23, 2024

Is this related to #9764?

@skylee-01
Author

> Is this related to #9764?

Thank you for your reply, I will verify it later.
