Optimize fp8 headim64 tile #91

jmkuebler · 2025-09-15T20:14:35Z

This PR adds an optimized tile size to speed up decoding for models with head_dim 64 when running in FP8. This applies, for example, to GPT-OSS.
See vllm-project/vllm#24916 (optimization 3) for the improvement.

cc @LucasWilkinson

Signed-off-by: Jonas Kuebler <kuebj@amazon.com>

LucasWilkinson

LGTM; thanks for the contribution!

jmkuebler and others added 2 commits September 15, 2025 20:27

optimize fp8 tile sizes for headdim 64 for faster fp8 decoding

75136e4

Signed-off-by: Jonas Kuebler <kuebj@amazon.com>

fix syntax error

90a6e1d

Signed-off-by: Jonas Kuebler <kuebj@amazon.com>

jmkuebler force-pushed the optimize_fp8_headim64_tile branch from c689a53 to 90a6e1d September 15, 2025 20:28

jmkuebler mentioned this pull request Sep 15, 2025

[Feature]: Make FP8 Attention fast for GPT-OSS w/ FA3 on Hopper vllm-project/vllm#24916

Open

1 task

Merge branch 'main' into optimize_fp8_headim64_tile

ab8b8e7

LucasWilkinson approved these changes Sep 19, 2025

LucasWilkinson merged commit 4695e6b into vllm-project:main Sep 19, 2025
1 check passed

LucasWilkinson mentioned this pull request Sep 24, 2025

[FA/Chore] Bump vllm-flash-attention vllm-project/vllm#25537

Merged

jmkuebler mentioned this pull request Sep 29, 2025

[Hopper] optimize decoding performance for headdim 128 fp8 #96

Merged
