CUDA: Blackwell features for non-native builds #18436
Yes, you can run the Release CI manually against a specified branch, or on your own fork. Edit: I have already tested with
CISC
left a comment
Together with #18441 we should be in business...
@JohannesGaessler: This is the cmake output from a fresh build dir at commit 0c89864 on master: All compiled just fine. cmake version is 3.31.6. Next I will test your change in PR #18457.
This PR adds architectures to enable the recent Blackwell-specific MXFP4 optimizations for non-native builds. The problem with 120f-virtual, which we were using in the initial PR, is that it doesn't match a regex that CMake uses to validate CUDA architectures. The same regex does seem to accept 120a-real and 121a-real, so I would suggest that for now we simply build those, since there is no other hardware to cover. Newer CMake versions come with a bugfix for the regex, so presumably this will be less of a problem for us in the future. @CISC is there a way to run the Windows CUDA release CI without merging a PR?
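To illustrate the kind of mismatch described above, here is a minimal Python sketch. The pattern is an assumption made up for illustration and is not copied from CMake's sources; it only models a validator that allows an optional `a` suffix but no `f` suffix. The names `ARCH_PATTERN` and `is_accepted` are hypothetical.

```python
import re

# Hypothetical pattern (an assumption, NOT CMake's actual regex):
# digits, an optional "a" suffix, then an optional "-real" or "-virtual".
ARCH_PATTERN = re.compile(r"[0-9]+a?(-real|-virtual)?")


def is_accepted(arch: str) -> bool:
    """Return True if a single architecture entry matches the assumed pattern."""
    return ARCH_PATTERN.fullmatch(arch) is not None


if __name__ == "__main__":
    # Print whether each entry from the discussion passes the assumed check.
    for arch in ["120a-real", "121a-real", "120f-virtual"]:
        print(arch, is_accepted(arch))
```

Under this assumed pattern the `a`-suffixed entries pass while the `f`-suffixed one is rejected, which mirrors the behavior reported in the PR description; the real check in any given CMake version may differ.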