
Conversation

@AlpinDale (Contributor) commented on Nov 6, 2025

Purpose

FA3 is only used on Hopper, yet it is built for the Ampere (sm_80, sm_87) and Ada (sm_89) arches as well. This needlessly increases build times and inflates the wheel size. FA3 technically supports A100 (sm_80), but as discussed offline with @mgoin, it is not stable there and vLLM does not actually enable it.

vLLM FA PR: vllm-project/flash-attention#105


@chatgpt-codex-connector (bot) left a comment


💡 Codex Review

Here are some automated review suggestions for this pull request.


@gemini-code-assist (bot) left a comment


Code Review

This pull request aims to restrict the build of FlashAttention 3 (FA3) to only Hopper architectures to reduce build times and wheel size. It does this by updating the flash-attention dependency to a fork that contains the necessary CMake changes and by modifying the build conditions in setup.py.

My review identifies two issues in setup.py:

  1. A new helper function _has_sm90_or_higher uses a broad except Exception: pass, which can hide errors. I've suggested adding logging.
  2. The conditional logic for building the FA3 extension is flawed. It uses or instead of and, which makes the condition more permissive rather than more restrictive and could lead to build failures. I've provided a corrected version of the logic.

These changes are important for the stability and correctness of the build process.
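
For illustration only, here is a rough sketch of how the two suggestions could fit together in setup.py. The helper body, the logger setup, and the _is_cuda_build flag are assumptions made for this sketch, not the actual vLLM code:

    import logging

    import torch

    logger = logging.getLogger(__name__)


    def _has_sm90_or_higher() -> bool:
        # Best-effort check for a Hopper-or-newer (sm90+) target. The real
        # setup.py may consult TORCH_CUDA_ARCH_LIST rather than local devices.
        try:
            return any(
                torch.cuda.get_device_capability(i)[0] >= 9
                for i in range(torch.cuda.device_count())
            )
        except Exception:
            # Log the failure instead of silently swallowing it.
            logger.warning(
                "Could not query CUDA device capability; assuming no sm90+ target",
                exc_info=True,
            )
            return False


    # Restrictive rather than permissive: build FA3 only when this is a CUDA
    # build AND an sm90+ target is present (`and`, not `or`).
    _is_cuda_build = True  # placeholder for the real CUDA-build check
    build_fa3 = _is_cuda_build and _has_sm90_or_higher()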

Review thread on the CMake dependency change:

    GIT_REPOSITORY https://github.com/vllm-project/flash-attention.git
    GIT_TAG a893712401d70362fbb299cd9c4b3476e8e9ed54
    # temporary until vllm-project/flash-attention#105 is merged
    GIT_REPOSITORY https://github.com/AlpinDale/flash-attention.git
Collaborator commented:
We should wait until the above PR is merged and then use the flash-attention repo under vllm-project.

@mergify (bot) commented on Nov 11, 2025

This pull request has merge conflicts that must be resolved before it can be
merged. Please rebase the PR, @AlpinDale.

https://docs.github.com/en/pull-requests/collaborating-with-pull-requests/working-with-forks/syncing-a-fork

@LucasWilkinson (Collaborator) commented:

Apologies for the delay; there were offline discussions about deprecating FA2 and using FA3 on Ampere, but that seems far enough out that it makes sense to deprecate the non-Hopper FA3 builds for now.

The FA-side PR has been merged.

Review thread on the setup.py change:

        return version


    def _has_sm90_or_higher() -> bool:
@AlpinDale (Contributor, PR author) commented:

@LucasWilkinson thoughts on excluding Blackwell from FA3 builds as well? From what I understand, Blackwell still uses FA2 (at least until FA4 support lands).

@LucasWilkinson (Collaborator) commented on Nov 17, 2025:

Hmm, this _has_sm90_or_higher is a bit messy; sorry, I should have reviewed this more closely.

We should do one of the following:

  1. If arch_list is not set, fall back to torch.cuda.get_arch_list() (see the sketch below).
  2. Handle this at the CMake level like FlashMLA and just create empty targets when no kernels would get built:

         else()
             # Create empty targets for setup.py when not targeting sm90a systems
             add_custom_target(_flashmla_C)
             add_custom_target(_flashmla_extension_C)
         endif()

My preference is #2, in which case we wouldn't need to filter for Blackwell here.
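
As a rough illustration of option 1 only, the fallback could look something like the following; the helper names and the TORCH_CUDA_ARCH_LIST handling are hypothetical, not the actual setup.py code:

    import os
    import re

    import torch


    def _target_arches() -> list[str]:
        # Prefer an explicit TORCH_CUDA_ARCH_LIST; otherwise fall back to the
        # arch list the installed torch build reports.
        env_arches = os.environ.get("TORCH_CUDA_ARCH_LIST", "").strip()
        if env_arches:
            return [a for a in re.split(r"[;,\s]+", env_arches) if a]
        return torch.cuda.get_arch_list()


    def _targets_hopper(arches: list[str]) -> bool:
        # True if any target arch is Hopper: "9.0"/"9.0a" (env style) or
        # "sm_90"/"sm_90a" (torch style).
        return any(a.startswith(("9.0", "sm_90")) for a in arches)

Option 2 would instead leave setup.py alone and let CMake define empty FA3 targets whenever no sm90 arch is requested.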

@AlpinDale (Contributor, PR author) replied:

Understood. I'm not a fan of the current approach either; I'll change it to follow FlashMLA's example later today.

Collaborator replied:

thank you! 🙏
