🚀 The feature, motivation and pitch
This issue tracks PRs related to AITER (https://github.com/ROCm/aiter).
AITER is AMD's centralized repository of high-performance AI operators for accelerating AI workloads. It is intended as a single, unified home for customer operator-level requests, so it can serve different customers' needs: developers focus on the operators, and customers integrate the operator collection into their own (private or public) frameworks.
Note: this tracker description has been reorganized from newest to oldest.
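For context on how these ops are consumed from vLLM, below is a minimal sketch of turning the AITER code paths on in a ROCm build. It assumes the `VLLM_ROCM_USE_AITER` environment toggle and uses an illustrative model name; it is not taken from any of the PRs tracked here.

```python
# Minimal sketch (illustrative, not from this issue): enable the AITER-backed
# kernels in a ROCm build of vLLM before the library is imported.
import os

os.environ["VLLM_ROCM_USE_AITER"] = "1"  # assumed master switch for the AITER code paths

from vllm import LLM, SamplingParams

# Illustrative model; any model supported by the ROCm backend is used the same way.
llm = LLM(model="meta-llama/Llama-3.1-8B-Instruct")
outputs = llm.generate(["Hello from ROCm"], SamplingParams(max_tokens=16))
print(outputs[0].outputs[0].text)
```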
Based on AITER commit (12 July 2025): 916bf3c
- [V1] [ROCm] [AITER] Upgrade AITER to commit 916bf3c and bugfix APIs #20880
- [FEAT] [ROCm] [AITER]: Add AITER HIP block quant kernel #21242
- [ROCm][AITER] Support AITER Rope ops in RotaryEmbedding Module. #22521
Based on AITER commit: 636a9f0d56c202040e93b9560c296441b7f77233
- Add weight preshuffled PTPC FP8 GEMM ([ROCm][FEAT] Integrate AITER gemm w8a8 ptpc #19417)
Based on AITER commit: 648764942e552a8bb5fe16026703716a81f05374
- AITER MHA V1 ([Hardware][AMD] integrate aiter chunked prefill into vllm #18596) ([Hardware][AMD] integrate aiter into vllm #17710)
- Patch for new AITER commit ([ROCm] [AITER] [Bugfix] Patch for AITER commit 648764942e552a8bb5fe16026703716a81f05374 #18990)
- [Bugfix][V1][ROCm] Fix AITER Flash Attention Backend (Fix API Break and Local Attention Logic: affecting Llama4) #19904
- [ROCm][FEAT] Enable Full Graph Mode in AITER MLA V1 Attn Backend (Decode Phase only) #20254
- [V1] [ROCm] Enable EP with AITER Fused MoE #20270
Enhancement
- Bugfix to enable PP with AITER MLA ([Bugfix] Enable PP with AITER+V1 #19822)
- Add padding to weights to use block-scaled fused MoE on Qwen3-235B TP4 ([Bugfix] Add padding for block-scale fused-moe weights for AITER lib #19234)
- [Bugfix][V1][ROCm] Fix AITER Flash Attention Backend (Fix API Break and Local Attention Logic: affecting Llama4) #19904
Based on AITER commit: c1debd87ce0391aa27438d9e07e76e4fea7c4b70
- Fix MLA Backend v0 due to an AITER API change in a newer version ([BugFix][AMD] Compatible patch for latest AITER(05/07/2025) #17864)
- It has been reverted (Revert "[BugFix][AMD] Compatible patch for latest AITER(05/07/2025)" #17910) because it introduced new properties that caused pre-commit to fail. The replacement bugfix PR is [BugFix][AMD] Compatible patch for AITER lib after 04/20 #17912
- Use AITER fused moe external API ([FEAT] [ROCm] Upgrade AITER Fused MoE kernels. #18271)
- [FEAT][ROCm] Upgrade AITER MLA v1 backend #18338
- [FEAT][ROCm] Add AITER grouped topk for DeepSeekV2 #18825
- Enable full context length of DeepSeekV3 ([ROCm] Remove unnecessary assertion of max_model_len in ROCM_AITER_MLA attention backend. #18938)
Based on AITER commit: 5a77249
The kernels from #14007 have been broken down into the following PRs for ease of review:
- AITER Linear ([FEAT] [ROCm]: Support AITER Linear #14916)
- AITER RMS Norm ([FEAT] [ROCm]: Add AITER RMS Norm (Layer Norm) Feature #14959)
- AITER Fused MoE + Block Scaled Fused MoE ([FEAT][ROCm] Integrate Fused MoE Kernels from AITER #14967)
- AITER Block Scaled A8W8 GEMM ([FEAT] [ROCm]: Add AITER Block-Scaled GEMM Feature #14968)
- AITER Paged Attention ([FEAT][ROCm] Integrate Paged Attention Kernel from AITER #15001)
- AITER INT8 a8w8 GEMM kernel ([FEAT] [ROCm] Add AITER int8 scaled gemm kernel #15433)
- AITER MLA ([FEAT][ROCm]: Support AITER MLA #15893)
- AITER Tkw1 for Llama4 FP8 ([ROCm] Add aiter tkw1 kernel for Llama4 fp8 #16727) ([ROCm] (Deprecated) Enable AITER Tkw1 kernel #16418)
- AITER CK_MoE for Llama4 BF16 ([ROCM] enable aiter fused moe kernel for llama4 bf16 checkpoints #16674)
- Enable AITER Fused MoE in V1 Engine ([FEAT] [ROCm]: AITER Fused MOE V1 Support #16752), to be merged after:
  - AITER Tkw1 ([ROCm] Add aiter tkw1 kernel for Llama4 fp8 #16727)
  - AITER CK_MoE for Llama4 ([ROCM] enable aiter fused moe kernel for llama4 bf16 checkpoints #16674)
- AITER 2Stage CK MoE ([FEAT] [ROCm]: Add AITER CK 2 Stages MoE support #17110)
- AITER MLA V1 ([FEAT][ROCm]: Support AITER MLA on V1 Engine #17523)
- AITER biased group topk ([FEAT] [ROCm] [V1]: Add AITER biased group topk for DeepSeekV3 #17955)
Enhancement
- Restrict Fused MoE to the models that actually use the kernel ([Misc][ROCm] Restrict Aiter moe to specific models. #16435)
Bugfix
- [BugFix] [ROCm]: Bugfix and handle addition case of input for rocm_aiter_rms_norm #17857
Archived on 2025-05-14
The kernels from #14007 have been broken down into the following PRs for ease of review:
- AITER Linear ([FEAT] [ROCm]: Support AITER Linear #14916)
- AITER RMS Norm ([FEAT] [ROCm]: Add AITER RMS Norm (Layer Norm) Feature #14959)
- AITER Fused MoE + Block Scaled Fused MoE ([FEAT][ROCm] Integrate Fused MoE Kernels from AITER #14967)
- AITER Block Scaled A8W8 GEMM ([FEAT] [ROCm]: Add AITER Block-Scaled GEMM Feature #14968)
- AITER Paged Attention ([FEAT][ROCm] Integrate Paged Attention Kernel from AITER #15001)
- AITER INT8 a8w8 GEMM kernel ([FEAT] [ROCm] Add AITER int8 scaled gemm kernel #15433)
- AITER MLA ([FEAT][ROCm]: Support AITER MLA #15893)
- AITER Tkw1 for Llama4 FP8 ([ROCm] Add aiter tkw1 kernel for Llama4 fp8 #16727) ([ROCm] (Deprecated) Enable AITER Tkw1 kernel #16418)
- AITER CK_MoE for Llama4 BF16 ([ROCM] enable aiter fused moe kernel for llama4 bf16 checkpoints #16674)
- Enable AITER Fused MoE in V1 Engine ([FEAT] [ROCm]: AITER Fused MOE V1 Support #16752), to be merged after:
  - AITER Tkw1 ([ROCm] Add aiter tkw1 kernel for Llama4 fp8 #16727)
  - AITER CK_MoE for Llama4 ([ROCM] enable aiter fused moe kernel for llama4 bf16 checkpoints #16674)
- AITER 2Stage CK MoE ([FEAT] [ROCm]: Add AITER CK 2 Stages MoE support #17110)
- AITER MLA V1 ([FEAT][ROCm]: Support AITER MLA on V1 Engine #17523)
- Fix MLA Backend v0 due to an AITER API change in a newer version ([BugFix][AMD] Compatible patch for latest AITER(05/07/2025) #17864)
- It has been reverted (Revert "[BugFix][AMD] Compatible patch for latest AITER(05/07/2025)" #17910) because it introduced new properties that caused pre-commit to fail. The replacement bugfix PR is [BugFix][AMD] Compatible patch for AITER lib after 04/20 #17912
- AITER MHA V1 ([Hardware][AMD] integrate aiter into vllm #17710)
- AITER biased group topk ([FEAT] [ROCm] [V1]: Add AITER biased group topk for DeepSeekV3 #17955)
Enhancement
- Restrict Fused MoE to the models that actually use the kernel ([Misc][ROCm] Restrict Aiter moe to specific models. #16435)
Bugfix
- [BugFix] [ROCm]: Bugfix and handle addition case of input for rocm_aiter_rms_norm #17857
Alternatives
No response
Additional context
No response
Before submitting a new issue...
- Make sure you already searched for relevant issues, and asked the chatbot living at the bottom right corner of the documentation page, which can answer lots of frequently asked questions.