[V1][Kernel] Refactor the prefix_prefill kernel so that the caller no longer has to pass in the context lengths #13095
Conversation
👋 Hi! Thank you for contributing to the vLLM project. 💬 Join our developer Slack at https://slack.vllm.ai to discuss your PR in #pr-reviews, coordinate on features in #feat- channels, or join special interest groups in #sig- channels. Just a reminder: PRs do not trigger a full CI run by default; only a limited subset of checks runs automatically. Once the PR is approved and ready to go, your PR reviewer(s) can run CI to test the changes comprehensively before merging.
…prefix-prefill-refactor
Signed-off-by: Sage Moore <sage@neuralmagic.com>
```diff
- cur_batch_ctx_len = tl.load(B_Ctxlen + cur_batch)
  cur_batch_seq_len = tl.load(B_Seqlen + cur_batch)
  cur_batch_in_all_start_index = tl.load(B_Start_Loc + cur_batch)
- cur_batch_query_len = cur_batch_seq_len - cur_batch_ctx_len
+ cur_batch_in_all_stop_index = tl.load(B_Start_Loc + cur_batch + 1)
+ cur_batch_query_len = (cur_batch_in_all_stop_index -
+                        cur_batch_in_all_start_index)
+ cur_batch_ctx_len = cur_batch_seq_len - cur_batch_query_len
```
Do you think it would make sense to factor some of this out, since there's some repetition? I'm suggesting this, but I can't immediately tell whether it would actually improve things.
This might look better in the follow-up PR: #13305
LGTM, thanks!
… longer has to pass in the context lengths (vllm-project#13095)
… longer has to pass in the context lengths (vllm-project#13095) Signed-off-by: Louis Ulmer <ulmerlouis@gmail.com>
This patch changes the prefix_prefill kernel so that it calculates the context length from the query length and the sequence length, both of which are already passed in. This makes the kernel more usable in V1, where we don't keep track of the context lengths tensor in the attention metadata.
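To illustrate the arithmetic behind the refactor, here is a hypothetical plain-Python sketch (not the actual Triton kernel): the per-batch query length falls out of two consecutive entries of the cumulative start-location tensor, and the context length is then the sequence length minus the query length. The function name and list-based inputs are illustrative only.

```python
def derive_lengths(b_start_loc, b_seqlen):
    """For each batch, recover query_len from consecutive start
    locations and ctx_len as seq_len - query_len, mirroring the
    refactored kernel's computation (illustrative sketch)."""
    query_lens, ctx_lens = [], []
    for cur_batch, seq_len in enumerate(b_seqlen):
        start = b_start_loc[cur_batch]       # where this batch's query begins
        stop = b_start_loc[cur_batch + 1]    # where the next batch's begins
        query_len = stop - start             # tokens in this batch's query
        ctx_len = seq_len - query_len        # previously cached context
        query_lens.append(query_len)
        ctx_lens.append(ctx_len)
    return query_lens, ctx_lens

# Two sequences with queries of length 3 and 5 and total sequence
# lengths 7 and 9 yield context lengths 4 and 4.
print(derive_lengths([0, 3, 8], [7, 9]))  # ([3, 5], [4, 4])
```

This is why B_Ctxlen no longer needs to be passed in: everything it carried is already implied by B_Start_Loc and B_Seqlen.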