
[PP] Correct cache size check #13873

Merged 1 commit into vllm-project:main on Feb 27, 2025

Conversation

@zhengy001 (Contributor) commented Feb 26, 2025

A single long prompt gets preempted during decode (because slots cannot be allocated), which leads to recomputation and an incorrect output length.

Basically, this is because CacheEngine splits the KV cache across pipeline stages via self.num_gpu_blocks //= parallel_config.pipeline_parallel_size, but a long prompt is not blocked even when it exceeds the per-stage KV cache capacity. The KV cache size check should apply the same per-stage logic.
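
For readers following along, here is a minimal, hypothetical sketch of the kind of per-stage check described above. The function name, signature, and error message are illustrative assumptions, not the exact code merged in this PR; only the division by pipeline_parallel_size mirrors the CacheEngine behavior quoted above.

```python
# Illustrative sketch only: the capacity check must mirror the per-stage
# split that CacheEngine applies (num_gpu_blocks //= pipeline_parallel_size).
# Names and message text are assumptions, not the exact vLLM diff.
def raise_if_cache_size_invalid(num_gpu_blocks: int, block_size: int,
                                max_model_len: int,
                                pipeline_parallel_size: int) -> None:
    # Each pipeline stage owns only its share of the GPU blocks, so the
    # longest storable sequence is bounded by the per-stage block count,
    # not the total block count.
    blocks_per_stage = num_gpu_blocks // pipeline_parallel_size
    max_seq_len = block_size * blocks_per_stage
    if max_model_len > max_seq_len:
        raise ValueError(
            f"max_model_len ({max_model_len}) exceeds the number of tokens "
            f"that fit in the KV cache of one pipeline stage "
            f"({max_seq_len}). Increase gpu_memory_utilization or decrease "
            f"max_model_len.")
```

With a check of this shape, a prompt longer than one stage's KV capacity fails fast at startup instead of being admitted and then preempted mid-decode.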


👋 Hi! Thank you for contributing to the vLLM project.

💬 Join our developer Slack at https://slack.vllm.ai to discuss your PR in #pr-reviews, coordinate on features in #feat- channels, or join special interest groups in #sig- channels.

Just a reminder: PRs do not trigger a full CI run by default. Instead, only the fastcheck CI runs, covering a small and essential subset of tests to catch errors quickly. You can run additional CI tests on top of those by going to your fastcheck build in the Buildkite UI (linked in the PR checks section) and unblocking them. If you do not have permission to unblock, ping simon-mo or khluu to add you to our Buildkite org.

Once the PR is approved and ready to go, your PR reviewer(s) can run CI to test the changes comprehensively before merging.

To run CI, PR reviewers can either add the ready label to the PR or enable auto-merge.

🚀

@youkaichao youkaichao requested a review from andoorve February 26, 2025 05:55
@youkaichao (Member) commented:

@andoorve can you help take a look?

@andoorve (Collaborator) left a comment:

This makes sense, thanks for taking a look!

@zhengy001 (Contributor, Author) commented:

Hi @DarkLight1337,
I ran into this CI error. Your PR ce26b16 changed prompt_tokens from 202 to 201, and now it is 202 again. I don't think it is related to this PR. Could you please take a look?

[screenshot of the CI error]

@DarkLight1337 (Member) commented:

Can you pull from latest main again? It might be fixed by #13890

@zhengy001 (Contributor, Author) commented:

> Can you pull from latest main again? It might be fixed by #13890

Sure.

@DarkLight1337 (Member) commented Feb 26, 2025

Doesn't look like it's fixed. Let me open another PR.

Updated: Opened #13895

@DarkLight1337 DarkLight1337 enabled auto-merge (squash) February 27, 2025 03:59
@github-actions github-actions bot added the ready label (ONLY add when PR is ready to merge / full CI is needed) Feb 27, 2025
Signed-off-by: Yang Zheng <zhengy.gator@gmail.com>
auto-merge was automatically disabled February 27, 2025 04:49

Head branch was pushed to by a user without write access

@DarkLight1337 DarkLight1337 merged commit 4b1d141 into vllm-project:main Feb 27, 2025
34 checks passed
johnnynunez pushed a commit to johnnynunez/vllm that referenced this pull request Mar 3, 2025
Signed-off-by: Yang Zheng <zhengy.gator@gmail.com>
Signed-off-by: Johnny <johnnynuca14@gmail.com>
Akshat-Tripathi pushed a commit to krai/vllm that referenced this pull request Mar 3, 2025
Signed-off-by: Yang Zheng <zhengy.gator@gmail.com>
lk-chen pushed a commit to lk-chen/vllm that referenced this pull request Mar 5, 2025
Signed-off-by: Yang Zheng <zhengy.gator@gmail.com>
Signed-off-by: Linkun Chen <github@lkchen.net>
Labels: ready (ONLY add when PR is ready to merge / full CI is needed)
4 participants