
[reland2][ROCm] preshuffled weight mm #2207


Open · wants to merge 5 commits into base: main
Conversation

jeffdaily (Contributor)

No description provided.


pytorch-bot bot commented May 13, 2025

🔗 Helpful Links

🧪 See artifacts and rendered test results at hud.pytorch.org/pr/pytorch/ao/2207

Note: Links to docs will display an error until the docs builds have been completed.

❌ 6 New Failures

As of commit b4115d3 with merge base 5549da8:

NEW FAILURES - The following jobs have failed:

This comment was automatically generated by Dr. CI and updates every 15 minutes.

@facebook-github-bot facebook-github-bot added the CLA Signed This label is managed by the Facebook bot. Authors need to sign the CLA before a PR can be reviewed. label May 13, 2025
@facebook-github-bot (Contributor)

@mxz297 has imported this pull request. If you are a Meta employee, you can view this diff on Phabricator.


pytorch-bot bot commented May 14, 2025

To add the ciflow label ciflow/rocm please first approve the workflows that are awaiting approval (scroll to the bottom of this page).

This helps ensure we don't trigger CI on this PR until it is actually authorized to do so. Please ping one of the reviewers if you do not have access to approve and run workflows.

@pytorch-bot pytorch-bot bot removed the ciflow/rocm label May 14, 2025
@facebook-github-bot (Contributor)

@mxz297 has imported this pull request. If you are a Meta employee, you can view this diff on Phabricator.

1 similar comment
@facebook-github-bot (Contributor)

@mxz297 has imported this pull request. If you are a Meta employee, you can view this diff on Phabricator.

@mxz297

mxz297 commented May 14, 2025

@jeffdaily I am having trouble importing this PR. Can you first try to resolve the build errors?

@facebook-github-bot (Contributor)

@mxz297 has imported this pull request. If you are a Meta employee, you can view this diff on Phabricator.

@facebook-github-bot (Contributor)

@mxz297 has imported this pull request. If you are a Meta employee, you can view this diff on Phabricator.

1 similar comment
@facebook-github-bot (Contributor)

@mxz297 has imported this pull request. If you are a Meta employee, you can view this diff on Phabricator.

@mxz297

mxz297 commented May 15, 2025

@jeffdaily there is a linter failure.

@facebook-github-bot (Contributor)

@mxz297 has imported this pull request. If you are a Meta employee, you can view this diff on Phabricator.

@mxz297

mxz297 commented May 15, 2025

@jeffdaily there is also a failure in the ROCm test:

module = Linear(in_features=32, out_features=128, bias=False)
config = MXFPInferenceConfig(block_size=32, activation_dtype=torch.float4_e2m1fn_x2, weight_dtype=torch.float4_e2m1fn_x2, gemm_kernel_choice=<MXGemmKernelChoice.CUTLASS: 'cutlass'>, set_inductor_config=False)

    @register_quantize_module_handler(MXFPInferenceConfig)
    def _mx_inference_linear_transform(
        module: torch.nn.Module, config: MXFPInferenceConfig
    ):
        # TODO Sm120 has slightly more restrictive reqs
        # TODO handle AMD
>       assert is_sm_at_least_100(), "MXFP is only supported on sm100 machiens for now"
E       AssertionError: MXFP is only supported on sm100 machiens for now

but it looks like this test should not be run on AMD at all?

cc @drisspg @atalman @jerryzh168
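A guard along these lines could skip the test on ROCm builds (a minimal sketch: `is_rocm_build` and the test class are hypothetical names, and the `torch.version.hip` check is an assumption about how ROCm wheels report themselves, not the actual torchao fix):

```python
import unittest

def is_rocm_build():
    # Hypothetical helper: torch.version.hip is a version string on
    # ROCm wheels and None on CUDA wheels; False if torch is absent.
    try:
        import torch
        return torch.version.hip is not None
    except ImportError:
        return False

@unittest.skipIf(is_rocm_build(), "MXFP CUTLASS path requires CUDA sm100+")
class TestMXFPInference(unittest.TestCase):
    def test_smoke(self):
        # Placeholder body; the real test would exercise the
        # quantized Linear module from the failure above.
        self.assertTrue(True)
```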

@pytorch-bot pytorch-bot bot removed the ciflow/rocm label May 15, 2025
@facebook-github-bot (Contributor)

@mxz297 has imported this pull request. If you are a Meta employee, you can view this diff on Phabricator.

@drisspg drisspg added the topic: improvement Use this tag if this PR is an improvement (doesn't fit into any of the other categories) label May 16, 2025
@drisspg (Contributor)

drisspg commented May 16, 2025

@mxz297 Yeah, this should be skipped; can you rebase past #2209?

@facebook-github-bot (Contributor)

@mxz297 has imported this pull request. If you are a Meta employee, you can view this diff on Phabricator.

@mxz297

mxz297 commented May 16, 2025

@pytorchbot run all


pytorch-bot bot commented May 16, 2025

❌ 🤖 pytorchbot command failed:

@pytorchbot: error: argument command: invalid choice: 'run' (choose from 'merge', 'revert', 'rebase', 'label', 'drci', 'cherry-pick', 'close')

usage: @pytorchbot [-h] {merge,revert,rebase,label,drci,cherry-pick,close} ...

Try @pytorchbot --help for more info.

@mxz297

mxz297 commented May 16, 2025

@pytorchbot drci

@mxz297

mxz297 commented May 16, 2025

@drisspg @atalman @jerryzh168

There seem to be some CUDA test failures where arch string parsing has an issue. It feels unlikely that this PR caused them, but I want to double check with you folks:

Processing /pytorch/ao
  Preparing metadata (setup.py) ... error
  error: subprocess-exited-with-error
  
  × python setup.py egg_info did not run successfully.
  │ exit code: 1
  ╰─> [13 lines of output]
      W0516 16:40:07.414810 215 site-packages/torch/utils/cpp_extension.py:118] No CUDA runtime is found, using CUDA_HOME='/usr/local/cuda-12.6'
      W0516 16:40:07.421015 215 site-packages/torch/utils/cpp_extension.py:2414] TORCH_CUDA_ARCH_LIST is not set, all archs for visible cards are included for compilation.
      W0516 16:40:07.421015 215 site-packages/torch/utils/cpp_extension.py:2414] If this is not desired, please set os.environ['TORCH_CUDA_ARCH_LIST'] to specific architectures.
      Traceback (most recent call last):
        File "<string>", line 2, in <module>
        File "<pip-setuptools-caller>", line 35, in <module>
        File "/pytorch/ao/setup.py", line 544, in <module>
          ext_modules=get_extensions(),
        File "/pytorch/ao/setup.py", line 432, in get_extensions
          cuda_arch_flags = _get_cuda_arch_flags()
        File "/opt/conda/envs/venv/lib/python3.9/site-packages/torch/utils/cpp_extension.py", line 2434, in _get_cuda_arch_flags
          arch_list[-1] += '+PTX'
      IndexError: list index out of range

@mxz297

mxz297 commented May 16, 2025

Also a noob question: how do I restart CI, or is CI automatically restarted after a new commit is pushed?

@drisspg (Contributor)

drisspg commented May 16, 2025

@mxz297 If you are a Meta employee it will automatically restart on commit push, but unfortunately everyone else needs to kick it off manually.

@mxz297

mxz297 commented May 19, 2025

@drisspg @atalman @jerryzh168

Any insight on the following error?

Processing /pytorch/ao
  Preparing metadata (setup.py) ... error
  error: subprocess-exited-with-error

  × python setup.py egg_info did not run successfully.
  │ exit code: 1
  ╰─> [13 lines of output]
      W0516 16:40:07.414810 215 site-packages/torch/utils/cpp_extension.py:118] No CUDA runtime is found, using CUDA_HOME='/usr/local/cuda-12.6'
      W0516 16:40:07.421015 215 site-packages/torch/utils/cpp_extension.py:2414] TORCH_CUDA_ARCH_LIST is not set, all archs for visible cards are included for compilation.
      W0516 16:40:07.421015 215 site-packages/torch/utils/cpp_extension.py:2414] If this is not desired, please set os.environ['TORCH_CUDA_ARCH_LIST'] to specific architectures.
      Traceback (most recent call last):
        File "<string>", line 2, in <module>
        File "<pip-setuptools-caller>", line 35, in <module>
        File "/pytorch/ao/setup.py", line 544, in <module>
          ext_modules=get_extensions(),
        File "/pytorch/ao/setup.py", line 432, in get_extensions
          cuda_arch_flags = _get_cuda_arch_flags()
        File "/opt/conda/envs/venv/lib/python3.9/site-packages/torch/utils/cpp_extension.py", line 2434, in _get_cuda_arch_flags
          arch_list[-1] += '+PTX'
      IndexError: list index out of range

@drisspg (Contributor)

drisspg commented May 19, 2025

Taking a look

@drisspg (Contributor)

drisspg commented May 19, 2025

Okay, so this is coming from this line:

>>> from torch.utils.cpp_extension import _get_cuda_arch_flags
>>> _get_cuda_arch_flags()
/Users/drisspg/.conda/envs/nightly/lib/python3.13/site-packages/torch/utils/cpp_extension.py:2410:
UserWarning: TORCH_CUDA_ARCH_LIST is not set, all archs for visible cards are included for compilation.
If this is not desired, please set os.environ['TORCH_CUDA_ARCH_LIST'].
  warnings.warn(
Traceback (most recent call last):
  File "<input>", line 1, in <module>
    _get_cuda_arch_flags()
    ~~~~~~~~~~~~~~~~~~~~^^
  File "/Users/drisspg/.conda/envs/nightly/lib/python3.13/site-packages/torch/utils/cpp_extension.py", line 2430, in _get_cuda_arch_flags
    arch_list[-1] += '+PTX'
    ~~~~~~~~~^^^^
IndexError: list index out of range

This happens when _get_cuda_arch_flags is called with no args and the default system arch is not picked up by this logic:

https://github.com/pytorch/pytorch/blob/6487ea30b3fb3fe550d0e8e7feaf25bc3cffb626/torch/utils/cpp_extension.py#L2360
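The failing pattern can be sketched without torch at all (a simplified, illustrative stand-in for torch's _get_cuda_arch_flags, not the real implementation): when neither a TORCH_CUDA_ARCH_LIST-style value nor a detected device arch produces any entries, the unconditional `arch_list[-1] += '+PTX'` append hits an empty list.

```python
def cuda_arch_flags(env_arch_list=None, detected_archs=()):
    # Simplified stand-in for torch.utils.cpp_extension._get_cuda_arch_flags:
    # build the arch list from the env-var string, else from detected GPUs.
    if env_arch_list:
        arch_list = env_arch_list.replace(" ", ";").split(";")
    else:
        arch_list = sorted(detected_archs)
    # The real code appends '+PTX' to the last entry unconditionally,
    # so an empty arch list raises IndexError -- the CI failure above.
    arch_list[-1] += "+PTX"
    return arch_list

# No env var and no detected card reproduces the crash:
try:
    cuda_arch_flags()
except IndexError as e:
    print(e)  # list index out of range

# Pinning the arch list (as the warning suggests) avoids it:
print(cuda_arch_flags(env_arch_list="8.0;9.0"))  # ['8.0', '9.0+PTX']
```

This suggests the CI environment detected no visible CUDA device and had TORCH_CUDA_ARCH_LIST unset, which matches the "No CUDA runtime is found" warning in the log.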

@drisspg (Contributor)

drisspg commented May 22, 2025

@jeffdaily Can you rebase? I am still a little confused by this CI.

Labels: ci-no-td · CLA Signed · module: rocm · topic: improvement

5 participants