[Hardware][AMD] integrate aiter chunked prefill into vllm #18596

Zzz9990 · 2025-05-23T06:49:18Z

CMD: VLLM_TORCH_PROFILER_DIR=/mnt/raid0/sixifang/vllm/vllm_profile HIP_VISIBLE_DEVICES=4,5,6,7 VLLM_ROCM_USE_AITER=1 VLLM_USE_V1=1 vllm serve /models/models--amd--Meta-Llama-3.1-8B-Instruct-FP8-KV/snapshots/fa42f9a9105c545755fea25cf69f49ac8c8b40e1/ --tensor-parallel-size 4 --gpu-memory-utilization 0.9 --trust-remote-code --disable-log-requests --block-size 16 --max-model-len 32768 --dtype float16 --quantization fp8 --no-enable-prefix-caching --max-num-batched-tokens=8192

Performance with aiter:

prompts	isl	osl	con	req_throughput	median_e2e	median_ttft	median_tpot	median_itl	output_tps	total_tps
1280	2048	1024	128	2.18	58777.36	1041.87	56.22	42.28	2236.56	6707.49

Performance without aiter:

prompts	isl	osl	con	req_throughput	median_e2e	median_ttft	median_tpot	median_itl	output_tps	total_tps
1280	2048	1024	128	0.94	135871.09	2621.44	129.89	94.26	964.17	2891.56

Signed-off-by: fsx950223 <fsx950223@outlook.com>

github-actions · 2025-05-23T06:49:27Z

👋 Hi! Thank you for contributing to the vLLM project.

💬 Join our developer Slack at https://slack.vllm.ai to discuss your PR in #pr-reviews, coordinate on features in #feat- channels, or join special interest groups in #sig- channels.

Just a reminder: PRs would not trigger full CI run by default. Instead, it would only run fastcheck CI which starts running only a small and essential subset of CI tests to quickly catch errors. You can run other CI tests on top of those by going to your fastcheck build on Buildkite UI (linked in the PR checks section) and unblock them. If you do not have permission to unblock, ping simon-mo or khluu to add you in our Buildkite org.

Once the PR is approved and ready to go, your PR reviewer(s) can run CI to test the changes comprehensively before merging.

To run CI, PR reviewers can either: Add ready label to the PR or enable auto-merge.

🚀

mergify · 2025-05-23T06:50:01Z

This pull request has merge conflicts that must be resolved before it can be
merged. Please rebase the PR, @Zzz9990.

https://docs.github.com/en/pull-requests/collaborating-with-pull-requests/working-with-forks/syncing-a-fork

tjtanaa · 2025-05-23T15:58:07Z

@Zzz9990 What is the difference with this PR? #17710
Could you rename the PR to be semantically related to the feature you are adding? "integrate aiter into vllm" is too broad and generic

mergify · 2025-05-26T06:55:16Z

This pull request has merge conflicts that must be resolved before it can be
merged. Please rebase the PR, @Zzz9990.

https://docs.github.com/en/pull-requests/collaborating-with-pull-requests/working-with-forks/syncing-a-fork

Signed-off-by: fsx950223 <fsx950223@outlook.com>

Signed-off-by: charlifu <charlifu@amd.com>

HAIAI · 2025-06-17T07:06:31Z

Let's get this merged soon. cc @houseroad @WoosukKwon @simon-mo

houseroad

Accept to unblock. Since only touch AMD related logic, should be safe on other platform.

Could @HAIAI or someone from AMD approve this PR?

HAIAI

LGTM

HAIAI · 2025-06-17T07:41:06Z

@Zzz9990 would you take a look at failed checks?

houseroad · 2025-06-17T09:56:08Z

https://buildkite.com/vllm/ci/builds/22190#01977cbb-a80d-4f96-93d8-a821ca8095cf failure seems related? cc: @Zzz9990

houseroad · 2025-06-17T10:06:36Z

ImportError: cannot import name 'rocm_aiter_fused_add_rms_norm' from 'vllm.model_executor.layers.layernorm' (/usr/local/lib/python3.12/dist-packages/vllm/model_executor/layers/layernorm.py)

Signed-off-by: fsx950223 <fsx950223@outlook.com>

Zzz9990 · 2025-06-17T10:20:00Z

ImportError: cannot import name 'rocm_aiter_fused_add_rms_norm' from 'vllm.model_executor.layers.layernorm' (/usr/local/lib/python3.12/dist-packages/vllm/model_executor/layers/layernorm.py)
Done

gshtras · 2025-06-17T14:33:12Z

Please note all the failing AMD checks. Updating from the latest should help fix the build to get started.

HAIAI · 2025-06-17T18:58:16Z

@Zzz9990 @fsx950223 can you update/rebase from main?

SageMoore

Looks reasonable. If you can delete some more of the cascade attention code in the builder/metadata that would be great. If not, it's no big deal.

SageMoore · 2025-06-17T20:10:37Z

vllm/v1/attention/backends/rocm_aiter_fa.py

+            )
+            return output
+        else:
+            raise NotImplementedError(


AFAICT you need the use_cascade_attention method for API compatibility but everything else should be able to be removed. If not it's fine. No worries.

Signed-off-by: fsx950223 <fsx950223@outlook.com>

…ct#18596) Signed-off-by: fsx950223 <fsx950223@outlook.com> Signed-off-by: charlifu <charlifu@amd.com> Co-authored-by: fsx950223 <fsx950223@outlook.com> Co-authored-by: charlifu <charlifu@amd.com>

…ct#18596) Signed-off-by: fsx950223 <fsx950223@outlook.com> Signed-off-by: charlifu <charlifu@amd.com> Co-authored-by: fsx950223 <fsx950223@outlook.com> Co-authored-by: charlifu <charlifu@amd.com> Signed-off-by: minpeter <kali2005611@gmail.com>

…ct#18596) Signed-off-by: fsx950223 <fsx950223@outlook.com> Signed-off-by: charlifu <charlifu@amd.com> Co-authored-by: fsx950223 <fsx950223@outlook.com> Co-authored-by: charlifu <charlifu@amd.com> Signed-off-by: Yang Wang <elainewy@meta.com>

…ct#18596) Signed-off-by: fsx950223 <fsx950223@outlook.com> Signed-off-by: charlifu <charlifu@amd.com> Co-authored-by: fsx950223 <fsx950223@outlook.com> Co-authored-by: charlifu <charlifu@amd.com>

fsx950223 added 8 commits May 8, 2025 05:34

integrate aiter

d483fc2

Signed-off-by: fsx950223 <fsx950223@outlook.com>

add env variable

4f85566

Signed-off-by: fsx950223 <fsx950223@outlook.com>

rename function

ae85e79

Signed-off-by: fsx950223 <fsx950223@outlook.com>

optimize kernels with small query lens

87ea0ba

Signed-off-by: fsx950223 <fsx950223@outlook.com>

change condition

db4bc55

Signed-off-by: fsx950223 <fsx950223@outlook.com>

Merge remote-tracking branch 'upstream/main' into fa_upstream

efe59bd

Signed-off-by: fsx950223 <fsx950223@outlook.com>

add rocm aiter backend

40654e4

Signed-off-by: fsx950223 <fsx950223@outlook.com>

new fa impl

8dd236d

Signed-off-by: fsx950223 <fsx950223@outlook.com>

Zzz9990 requested review from WoosukKwon, robertgshaw2-redhat, njhill, ywang96, comaniac and alexm-redhat as code owners May 23, 2025 06:49

mergify bot added the v1 label May 23, 2025

mergify bot added the needs-rebase label May 23, 2025

Zzz9990 changed the title ~~[Hardware][AMD] integrate aiter into vll~~ [Hardware][AMD] integrate aiter chunked prefill into vllm May 26, 2025

Zzz9990 closed this May 26, 2025

Zzz9990 reopened this May 26, 2025

fsx950223 added 2 commits May 26, 2025 07:09

update api

e995fdc

Signed-off-by: fsx950223 <fsx950223@outlook.com>

optimize performance

a501ff0

Signed-off-by: fsx950223 <fsx950223@outlook.com>

fsx950223 force-pushed the fa_upstream3 branch from 71081a3 to a501ff0 Compare May 26, 2025 07:09

fsx950223 requested review from hmellor, tlrmchlsmth, mgoin and zhuohan123 as code owners May 26, 2025 07:12

charlifu added 2 commits June 6, 2025 16:07

Merge branch 'main' into fa_upstream3

13255d3

use on_gfx9 instead of on_mi250_mi300

7fae8df

Signed-off-by: charlifu <charlifu@amd.com>

houseroad mentioned this pull request Jun 10, 2025

[ROCm] Add rules to automatically label ROCm related PRs #19405

Merged

4 tasks

houseroad added the rocm Related to AMD ROCm label Jun 10, 2025

houseroad added the ready ONLY add when PR is ready to merge/full CI is needed label Jun 17, 2025

houseroad approved these changes Jun 17, 2025

View reviewed changes

HAIAI approved these changes Jun 17, 2025

View reviewed changes

houseroad enabled auto-merge (squash) June 17, 2025 08:08

houseroad disabled auto-merge June 17, 2025 09:56

revert layernorm

2e6e0df

Signed-off-by: fsx950223 <fsx950223@outlook.com>

fsx950223 force-pushed the fa_upstream3 branch from 49432b1 to 2e6e0df Compare June 17, 2025 10:13

SageMoore approved these changes Jun 17, 2025

View reviewed changes

Merge remote-tracking branch 'upstream/main' into fa_upstream3

77c95e4

houseroad enabled auto-merge (squash) June 18, 2025 06:20

fsx950223 added 2 commits June 18, 2025 07:11

fix bug

4c9bd30

Signed-off-by: fsx950223 <fsx950223@outlook.com>

fix upstream issue

262b47f

Signed-off-by: fsx950223 <fsx950223@outlook.com>

auto-merge was automatically disabled June 18, 2025 07:35
Head branch was pushed to by a user without write access

simon-mo merged commit 8b6e1d6 into vllm-project:main Jun 18, 2025
66 of 68 checks passed

tjtanaa mentioned this pull request Jun 20, 2025

[Bugfix][V1][ROCm] Fix AITER Flash Attention Backend (Fix API Break and Local Attention Logic: affecting Llama4) #19904

Merged

4 tasks

Uh oh!

[Hardware][AMD] integrate aiter chunked prefill into vllm #18596

[Hardware][AMD] integrate aiter chunked prefill into vllm #18596

Uh oh!

Conversation

Zzz9990 commented May 23, 2025 • edited by github-actions bot Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

github-actions bot commented May 23, 2025

Uh oh!

mergify bot commented May 23, 2025

Uh oh!

tjtanaa commented May 23, 2025

Uh oh!

mergify bot commented May 26, 2025

Uh oh!

HAIAI commented Jun 17, 2025

Uh oh!

houseroad left a comment

Choose a reason for hiding this comment

Uh oh!

HAIAI left a comment

Choose a reason for hiding this comment

Uh oh!

HAIAI commented Jun 17, 2025

Uh oh!

houseroad commented Jun 17, 2025

Uh oh!

houseroad commented Jun 17, 2025

Uh oh!

Zzz9990 commented Jun 17, 2025

Uh oh!

gshtras commented Jun 17, 2025

Uh oh!

HAIAI commented Jun 17, 2025 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

SageMoore left a comment

Choose a reason for hiding this comment

Uh oh!

SageMoore Jun 17, 2025

Choose a reason for hiding this comment

Uh oh!

Uh oh!

Uh oh!

Zzz9990 commented May 23, 2025 •

edited by github-actions bot

Loading

HAIAI commented Jun 17, 2025 •

edited

Loading