Remove int_scaled_mm's dependency on triton for cpu #128
Conversation
Hi @cpuhrsch Could you please review and see if the changes look reasonable to you? Thanks.
Hi @cpuhrsch Could you please suggest how to deal with the issue (the CPU impl's availability depends on triton and AUTOTUNER_ENABLE)? Thanks!
Hey @Xia-Weiwen - Thank you for the PR! Sorry for the delay in review. Also, please note the CI hasn't run green. Another way to resolve this could be to move … into …
@cpuhrsch Thanks! I will give it a try. A question is what `AUTOTUNER_ENABLE` is used for.
@Xia-Weiwen - it's used for a Triton autotuner that allows us to cycle over a very large number of configs for a given fixed input shape. See https://github.com/pytorch-labs/ao/tree/main/torchao/kernel#autotuner-and-custom-triton-kernels
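(For context, a minimal sketch of the pattern described above — `triton.autotune` benchmarks each candidate config for a given shape key and caches the winner. The kernel name and config values here are illustrative, not torchao's actual ones:)

```python
import triton
import triton.language as tl

@triton.autotune(
    configs=[
        triton.Config({"BLOCK_M": 64, "BLOCK_N": 64, "BLOCK_K": 32}, num_warps=4),
        triton.Config({"BLOCK_M": 128, "BLOCK_N": 64, "BLOCK_K": 32}, num_warps=8),
        # a real sweep would enumerate many more configs
    ],
    key=["M", "N", "K"],  # re-tune whenever the problem shape changes
)
@triton.jit
def int_mm_kernel(a_ptr, b_ptr, c_ptr, M, N, K,
                  BLOCK_M: tl.constexpr, BLOCK_N: tl.constexpr, BLOCK_K: tl.constexpr):
    # tiled int8 matmul body omitted for brevity
    pass
```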
Thank you @cpuhrsch. Looks like the CPU impl does not need this.
🔗 Helpful Links
🧪 See artifacts and rendered test results at hud.pytorch.org/pr/pytorch/ao/128
Note: Links to docs will display an error until the docs builds have been completed.
✅ No Failures as of commit 97dfea8 with merge base cbd90e3.
This comment was automatically generated by Dr. CI and updates every 15 minutes.
Hi @cpuhrsch This PR requires the latest torch nightly (after 20241026) to pass CI. May I know when torch nightly will be updated in the CI? Thanks.
@Xia-Weiwen - can you try merging the latest version of main? You might be building on top of a commit that pinned the nightly version. I see …
Thanks.
Hi @cpuhrsch CI is green. Could you please review? Thanks.
Commits:
* code beautification
* debug info
* debug
* add missing args
* typo
* fix dtype check
The `int_scaled_mm` op in torchao was originally designed for CUDA. However, this op is also needed for CPU. This PR adds a CPU path in `intmm.int_scaled_mm`. It is not registered as an implementation of `torchao.int_scaled_mm` because we want to use Inductor for further optimization, and Inductor cannot recognize the `torchao.int_scaled_mm` op.
This change requires pytorch/pytorch#136942; otherwise, there might be numerical issues. So it works with PyTorch nightly builds from 20241026 onward.
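(A minimal sketch of the dispatch described above, not the PR's exact code — function and op names follow the PR text but are otherwise assumptions; the idea is that the CPU branch stays in plain aten ops, which Inductor can trace and fuse:)

```python
import torch

def int_scaled_mm(a: torch.Tensor, b: torch.Tensor, scales1: torch.Tensor) -> torch.Tensor:
    """Sketch: int8 matmul followed by a scale multiply."""
    assert a.dtype == torch.int8 and b.dtype == torch.int8
    if a.device.type == "cpu":
        # Plain aten ops: visible to Inductor for fusion, no Triton needed.
        c = torch._int_mm(a, b)  # int32 accumulator
        return c.to(scales1.dtype) * scales1
    # CUDA keeps the existing custom-op route (op name assumed from the PR text).
    return torch.ops.torchao.int_scaled_mm(a, b, scales1)
```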
Tests are covered by `test/kernel/test_autotuner.py` and `test/prototype/test_smoothquant.py`.
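(For illustration, a quick check in the spirit of those tests — hypothetical, not the actual test code — could compare the CPU path against a float reference, which is exact at these sizes since the int32 accumulation fits in fp32:)

```python
import torch

a = torch.randint(-128, 128, (32, 64), dtype=torch.int8)
b = torch.randint(-128, 128, (64, 16), dtype=torch.int8)
scales = torch.rand(32, 1)  # per-row scales

ref = (a.float() @ b.float()) * scales  # float reference
out = int_scaled_mm(a, b, scales)       # sketch function from above
torch.testing.assert_close(out, ref)
```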