-
Couldn't load subscription status.
- Fork 286
[Nvidia][SM121] Add intrin.h include to gemm_mma.h for sm120+ #785
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
To make sm120 arch runnable.
|
👋 Hi! Thank you for contributing to the TileLang project. Please remember to run We appreciate you taking this step! Our team will review your contribution, and we look forward to your awesome work! 🚀 |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Summary of Changes
Hello @HaoKang-Timmy, I'm Gemini Code Assist1! I'm currently reviewing this pull request and will post my feedback shortly. In the meantime, here's a summary to help you and other reviewers quickly get up to speed!
This pull request addresses a critical dependency by incorporating a missing header file into an existing template. This ensures that downstream components, specifically those targeting sm120 hardware, can compile correctly without errors related to undeclared intrinsics.
Highlights
- Missing Header Inclusion: The pull request resolves a compilation issue for tilelang code targeting sm120 hardware by adding the necessary #include "intrin.h" to gemm_mma.h. This is crucial because gemm_sm120.h relies on gemm_mma.h, and intrin.h provides essential intrinsics for sm120 compilation.
Using Gemini Code Assist
The full guide for Gemini Code Assist can be found on our documentation page, here are some quick tips.
Invoking Gemini
You can request assistance from Gemini at any point in your pull request via creating an issue comment (i.e. comment on the pull request page) using either /gemini <command> or @gemini-code-assist <command>. Below is a summary of the supported commands.
| Feature | Command | Description |
|---|---|---|
| Code Review | /gemini review |
Performs a code review for the current pull request in its current state. |
| Pull Request Summary | /gemini summary |
Provides a summary of the current pull request in its current state. |
| Comment | @gemini-code-assist | Responds in comments when explicitly tagged, both in issue comments and review comments. |
| Help | /gemini help |
Displays a list of available commands. |
Customization
To customize Gemini Code Assist for GitHub experience, repository maintainers can create a configuration file and/or provide a custom code review style guide (such as PEP-8 for Python) by creating and adding files to a .gemini/ folder in the base of the repository. Detailed instructions can be found here.
Limitations & Feedback
Gemini Code Assist may make mistakes. Please leave feedback on any instances where its feedback is incorrect or counter productive. You can react with 👍 and 👎 on @gemini-code-assist comments. If you're interested in giving your feedback about your experience with Gemini Code Assist for Github and other Google products, sign up here.
You can also get AI-powered code generation, chat, as well as code reviews directly in the IDE at no cost with the Gemini Code Assist IDE Extension.
Footnotes
-
Review the Privacy Notices, Generative AI Prohibited Use Policy, Terms of Service, and learn how to configure Gemini Code Assist in GitHub here. Gemini can make mistakes, so double check it and use code with caution. ↩
WalkthroughAdded a single include directive for "intrin.h" in src/tl_templates/cuda/gemm_mma.h, placed after the existing include of "cuda_fp8.h". No other code, API, or control-flow changes. Changes
Estimated code review effort🎯 1 (Trivial) | ⏱️ ~2 minutes Possibly related PRs
Poem
✨ Finishing Touches
🧪 Generate unit tests
Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out. 🪧 TipsChatThere are 3 ways to chat with CodeRabbit:
SupportNeed help? Create a ticket on our support page for assistance with any issues or questions. CodeRabbit Commands (Invoked using PR/Issue comments)Type Other keywords and placeholders
CodeRabbit Configuration File (
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Code Review
This pull request adds the intrin.h header to gemm_mma.h to support compilation for sm120 hardware. The change is functionally correct and safe, as the contents of intrin.h are guarded by a preprocessor check for newer architectures. I have added one comment with a suggestion to potentially improve modularity by including the header in a more specific file, which could enhance long-term maintainability.
|
|
||
| #include "common.h" | ||
| #include "cuda_fp8.h" | ||
| #include "intrin.h" |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
For better modularity, consider including intrin.h directly in the files that require it (e.g., gemm_sm120.h, as mentioned in the PR description) rather than in this more general header. This approach helps to minimize header dependencies and makes it explicit which components rely on the SM90+ intrinsics. If gemm_mma.h itself doesn't use any symbols from intrin.h, moving this include would lead to a cleaner design.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Actionable comments posted: 0
🧹 Nitpick comments (1)
src/tl_templates/cuda/gemm_mma.h (1)
12-12: Include LGTM; consider guarding for SM120 and avoiding Windows intrin.h collisionThe addition unblocks SM120 builds. Two optional tweaks:
- Gate it to SM120 to reduce surface area on other arches.
- Ensure we always include the local header (not MSVC’s intrin.h on Windows) by guarding or making the path explicit.
-#include "intrin.h" +#if defined(__CUDA_ARCH_LIST__) && (__CUDA_ARCH_LIST__ >= 1200) +# include "intrin.h" // SM120 intrinsics +#endifPlease confirm Windows CI resolves this to the project’s intrin.h (and not the toolchain one). If collision is observed, switch to an explicit relative path.
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro
💡 Knowledge Base configuration:
- MCP integration is disabled by default for public repositories
- Jira integration is disabled by default for public repositories
- Linear integration is disabled by default for public repositories
You can enable these sources in your CodeRabbit configuration.
📒 Files selected for processing (1)
src/tl_templates/cuda/gemm_mma.h(1 hunks)
gemm_sm120.his based ongemm_mma.h.However,
gemm_mma.hdid not includeintrin.h, which is essential to compile tilelang code on sm120 hardware.Summary by CodeRabbit