Fix cuda_archs_loose_intersection when handling sm_*a #20207


Open · wants to merge 2 commits into main
Conversation

@huydhn (Contributor) commented Jun 28, 2025

After #20086 landed, I started to see the CUDA error: no kernel image is available for execution on the device error again for meta-llama/llama-4-maverick-17b-128e-instruct-fp8. The error surfaces on the periodic H100 benchmark run I set up at https://github.com/pytorch/pytorch-integration-testing/actions/runs/15941259223/job/44969615173#step:14:1508

It turns out that there is another bug in the cuda_archs_loose_intersection function: it wrongly calls set(_CUDA_ARCHS "*a") consecutively for 9.0a and 10.0a, effectively overwriting the former with the latter. The issue only occurs for sm_*a architectures when more than one of them appears in TORCH_CUDA_ARCH_LIST, e.g. TORCH_CUDA_ARCH_LIST='7.5;8.0;8.6;9.0a;10.0a'. I think this finally explains why this kernel works for either Hopper or Blackwell, but not both.
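For illustration, a minimal sketch of the pattern (not the exact vLLM code; variable names follow the function's conventions, and the surrounding bookkeeping is simplified):

```cmake
# Before (simplified): each sm_*a arch was special-cased, and set() overwrites,
# so whichever "a" arch was checked last won.
#   if ("9.0a" IN_LIST _SRC_CUDA_ARCHS AND "9.0" IN_LIST _TGT_CUDA_ARCHS)
#     list(REMOVE_ITEM _TGT_CUDA_ARCHS "9.0")
#     set(_CUDA_ARCHS "9.0a")
#   endif()
#   if ("10.0a" IN_LIST _SRC_CUDA_ARCHS AND "10.0" IN_LIST _TGT_CUDA_ARCHS)
#     list(REMOVE_ITEM _TGT_CUDA_ARCHS "10.0")
#     set(_CUDA_ARCHS "10.0a")   # clobbers the "9.0a" selected above
#   endif()

# After (simplified): loop over every arch ending in "a" and append instead.
set(_CUDA_ARCHS)
foreach(_arch ${_SRC_CUDA_ARCHS})
  if(_arch MATCHES "\\a$")
    string(REPLACE "a" "" _base "${_arch}")
    if ("${_base}" IN_LIST _TGT_CUDA_ARCHS)
      list(REMOVE_ITEM _TGT_CUDA_ARCHS "${_base}")
      list(APPEND _CUDA_ARCHS "${_arch}")
    endif()
  endif()
endforeach()
```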

Testing

Building vLLM locally now shows the correct CUDA archs being selected. Previously, it was only 10.0a, or only 9.0a if I excluded Blackwell from the list.

```
TORCH_CUDA_ARCH_LIST='7.5;8.0;8.6;9.0a;10.0a' VLLM_FA_CMAKE_GPU_ARCHES='80-real;90-real' python3 setup.py bdist_wheel --dist-dir=dist --py-limited-api=cp38

...
-- Building moe_data for archs: 9.0a;10.0a  <-- shows both
...
```

The CI build log also looks correct: https://buildkite.com/vllm/fastcheck/builds/28660#0197b5d8-a37f-4e28-9953-80f63a89a0de/127-5344

cc @mgoin

Signed-off-by: Huy Do <huydhn@gmail.com>
@huydhn requested a review from tlrmchlsmth as a code owner, June 28, 2025 09:18

👋 Hi! Thank you for contributing to the vLLM project.

💬 Join our developer Slack at https://slack.vllm.ai to discuss your PR in #pr-reviews, coordinate on features in #feat- channels, or join special interest groups in #sig- channels.

Just a reminder: PRs do not trigger a full CI run by default. Instead, only the fastcheck CI runs, covering a small, essential subset of tests to catch errors quickly. You can run further CI tests on top of those by going to your fastcheck build in the Buildkite UI (linked in the PR checks section) and unblocking them. If you do not have permission to unblock, ping simon-mo or khluu to add you to our Buildkite org.

Once the PR is approved and ready to go, your PR reviewer(s) can run CI to test the changes comprehensively before merging.

To run CI, PR reviewers can either add the ready label to the PR or enable auto-merge.

🚀

@gemini-code-assist (bot) left a comment:

Summary of Changes

Hello @huydhn, I'm Gemini Code Assist! I'm currently reviewing this pull request and will post my feedback shortly. In the meantime, here's a summary to help you and other reviewers quickly get up to speed!

This pull request addresses a critical bug in the CMake build configuration that prevented vLLM from correctly compiling kernels for multiple sm_*a CUDA architectures simultaneously. The fix ensures that all specified sm_*a architectures are properly recognized and included in the build process, thereby resolving runtime errors related to missing kernel images on devices like H100. Additionally, it improves the verbosity and helpfulness of build-time messages for the moe_data component.

Highlights

  • Core Bug Fix: cuda_archs_loose_intersection for sm_*a Architectures: The primary issue where sm_*a CUDA architectures (e.g., 9.0a for Hopper, 10.0a for Blackwell) were being incorrectly handled in the cuda_archs_loose_intersection CMake function has been resolved. Previously, the logic would overwrite the list of target architectures, leading to only one sm_*a variant being processed. The updated code now correctly iterates through and appends all relevant sm_*a architectures, ensuring proper kernel compilation for multi-architecture setups and resolving CUDA error: no kernel image is available for execution on the device errors.
  • Enhanced Build Feedback for moe_data: The build system now provides clearer status messages regarding the compilation of moe_data. A new message explicitly states which CUDA architectures moe_data is being built for. Additionally, improved messages are displayed when moe_data is not built, including a specific recommendation to upgrade the CUDA compiler to version 12.3 or later for optimal support of FP8 quantized MoE models on Hopper or Blackwell.
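A rough sketch of what that feedback amounts to (the variable name MOE_DATA_ARCHS and the exact wording are hypothetical; the "Building moe_data for archs" text matches the build output quoted in the PR description):

```cmake
if(MOE_DATA_ARCHS)
  message(STATUS "Building moe_data for archs: ${MOE_DATA_ARCHS}")
else()
  # Hypothetical wording; the PR's message recommends CUDA 12.3+ for
  # FP8 quantized MoE models on Hopper or Blackwell.
  message(STATUS "Not building moe_data: upgrade to CUDA 12.3 or later "
                 "for FP8 quantized MoE support on Hopper or Blackwell.")
endif()
```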

@mergify mergify bot added the ci/build label Jun 28, 2025
@gemini-code-assist (bot) left a comment:

Code Review

The code changes address a bug in the cuda_archs_loose_intersection function that caused incorrect CUDA architecture selection. The fix involves generalizing the architecture matching logic and adding more informative build messages.

Review thread on the changed CMake code (diff excerpt):

```cmake
list(REMOVE_ITEM _TGT_CUDA_ARCHS "10.0")
set(_CUDA_ARCHS "10.0a")
foreach(_arch ${_SRC_CUDA_ARCHS})
  if(_arch MATCHES "\\a$")
```
A collaborator commented:

Shall we explicitly handle 9.0a and 10.0a? If we get something new like 11.0a, do we expect this logic to handle it here?

@huydhn (Contributor, author) replied Jun 28, 2025:

The CMake output looks correct when I throw in 12.0a, printing -- Building moe_data for archs: 9.0a;10.0a;12.0a. I couldn't find a reason why 9.0a and 10.0a need to be handled explicitly here, and similar CMake loop logic is already in place to handle the +PTX suffix. So a loop that handles all *a cases looks like the right choice. Let's wait for more context from folks, I guess.

@LucasWilkinson (Collaborator) replied Jun 28, 2025:

I think the loop is the right solution 👍 This function was originally written when the only "a" variant was 9.0a, and we were naively hopeful that the lack of forward compatibility was a temporary, one-time Hopper thing; it looks like this is the direction NVIDIA is going, though 😞 so we should just embrace it haha. I think the 10.0a was just kinda tacked on, so good catch!

@mgoin (Member) commented Jun 28, 2025

Cc @LucasWilkinson @tlrmchlsmth

@houseroad (Collaborator) left a comment:

Looks good. Okay, after reading the code again, I think it makes sense to use a loop with a regex.

Is it because the old code called set(_CUDA_ARCHS "10.0a") instead of list(APPEND _CUDA_ARCHS "10.0a")?
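For reference, the difference the question points at, in isolation (hypothetical variable name):

```cmake
set(_ARCHS "9.0a")
set(_ARCHS "10.0a")          # set() overwrites: _ARCHS is now "10.0a"

set(_ARCHS "9.0a")
list(APPEND _ARCHS "10.0a")  # append keeps both: _ARCHS is "9.0a;10.0a"
```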

@houseroad added the ready label (ONLY add when PR is ready to merge/full CI is needed) Jun 28, 2025
@LucasWilkinson (Collaborator) left a comment:

LGTM; good catch! Thanks for the fix!

@LucasWilkinson (Collaborator) commented Jun 28, 2025

Side note (FYI @mgoin): we should probably restrict the use of a variants to only when necessary (i.e. when we use one of the a-specific features/instructions). E.g. I don't see anything in moe_data that would need a (which adds wgmma); we should probably just build it for 9.0 and 10.0 (NVIDIA has made their compute capability stuff so confusing now 😞).

I'll open a PR.
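For context on the side note above, a hedged sketch of the two options in CMake terms (target name hypothetical):

```cmake
# 90a/100a unlock arch-specific instructions (e.g. wgmma) but produce cubins
# that are not forward compatible; plain 90/100 binaries stay portable.
set_property(TARGET _moe_data PROPERTY CUDA_ARCHITECTURES 90a 100a)  # arch-specific
set_property(TARGET _moe_data PROPERTY CUDA_ARCHITECTURES 90 100)    # portable
```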
