[reland2][ROCm] preshuffled weight mm #2207
🔗 Helpful Links
🧪 See artifacts and rendered test results at hud.pytorch.org/pr/pytorch/ao/2207
Note: Links to docs will display an error until the docs builds have been completed.
❌ 6 New Failures
As of commit b4115d3 with merge base 5549da8:
NEW FAILURES - The following jobs have failed:
This comment was automatically generated by Dr. CI and updates every 15 minutes. |
@mxz297 has imported this pull request. If you are a Meta employee, you can view this diff on Phabricator. |
To add the ciflow label This helps ensure we don't trigger CI on this PR until it is actually authorized to do so. Please ping one of the reviewers if you do not have access to approve and run workflows. |
@jeffdaily I am having issues importing this PR. Can you first try to resolve the build errors? |
@jeffdaily there is a linter failure |
@jeffdaily there is also a failure in the ROCm test,
but it looks like this test should not even be run on AMD? |
@pytorchbot run all |
❌ 🤖 pytorchbot command failed:
Try |
@pytorchbot drci |
There seem to be some CUDA test failures where the arch string parsing has an issue. It feels unlikely that they are caused by this PR, but I want to double check with you folks:
|
Also a noob question: how should I restart CI, or is CI always restarted automatically after a new commit is pushed? |
@mxz297 if you are a Meta employee it will restart automatically on commit push, but unfortunately for everyone else it needs to be kicked off manually |
Any insight on the following error? Processing /pytorch/ao × python setup.py egg_info did not run successfully. |
Taking a look |
Okay, so this is coming from this line:
>>> from torch.utils.cpp_extension import _get_cuda_arch_flags
>>> _get_cuda_arch_flags()
/Users/drisspg/.conda/envs/nightly/lib/python3.13/site-packages/torch/utils/cpp_extension.py:2410: UserWarning: TORCH_CUDA_ARCH_LIST is not set, all archs for visible cards are included for compilation.
If this is not desired, please set os.environ['TORCH_CUDA_ARCH_LIST'].
  warnings.warn(
Traceback (most recent call last):
  File "<input>", line 1, in <module>
    _get_cuda_arch_flags()
    ~~~~~~~~~~~~~~~~~~~~^^
  File "/Users/drisspg/.conda/envs/nightly/lib/python3.13/site-packages/torch/utils/cpp_extension.py", line 2430, in _get_cuda_arch_flags
    arch_list[-1] += '+PTX'
    ~~~~~~~~~^^^^
IndexError: list index out of range
This happens when you call get_arch_list with no args and the default system arch is not picked up by this logic: |
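For anyone following along, here is a minimal sketch of why that line fails. It is not the actual `torch.utils.cpp_extension` code: the two helper functions are hypothetical stand-ins, and the arch strings are made-up examples. The only part taken from the traceback is the `arch_list[-1] += '+PTX'` statement, which raises `IndexError` when no arch was detected and the list is empty.

```python
def add_ptx_suffix(arch_list):
    """Hypothetical stand-in for the failing step in the traceback."""
    arch_list = list(arch_list)
    # Same statement as in the traceback; fails if arch_list is empty.
    arch_list[-1] += '+PTX'
    return arch_list


def add_ptx_suffix_guarded(arch_list):
    """Hypothetical guarded variant: leave an empty list untouched."""
    arch_list = list(arch_list)
    if arch_list:
        arch_list[-1] += '+PTX'
    return arch_list


if __name__ == "__main__":
    print(add_ptx_suffix(["8.0", "9.0"]))   # last entry gets the +PTX suffix
    print(add_ptx_suffix_guarded([]))       # empty list is returned as-is
    try:
        add_ptx_suffix([])                  # no detected archs
    except IndexError as exc:
        print("IndexError:", exc)
```

As the warning in the output suggests, setting `os.environ['TORCH_CUDA_ARCH_LIST']` explicitly before the build may avoid the empty-list path, though I have not verified that for this CI setup.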
@jeffdaily Can you rebase? I am still a little confused by this CI.