[AUTOGENERATED] [release/2.6][ROCm][TunableOp] Improve identification of fastest solution (#144942) #2041

Merged

rocm-mici

Cherry-pick of #2018

…tion (pytorch#144942) (#2018)

This PR addresses some stability issues with identifying the fastest
solution on AMD GPUs, particularly the MI300.

Changes include:
- An improved timer, StreamTimerNoSync
- More aggressive skipping of slow solutions
- Additional statistics that can be used for diagnostics
  (PYTORCH_TUNABLEOP_VERBOSE=3)
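
To illustrate the "more aggressive skipping of slow solutions" idea, here is a minimal sketch in Python. It is not the TunableOp implementation (which is C++ and times kernels on the GPU stream); the function name `pick_fastest`, its parameters, and the `slow_factor` threshold are all hypothetical and chosen only for illustration. Each candidate gets one rough timing first, and is skipped without full tuning if that timing is already much worse than the best average seen so far.

```python
import time
from typing import Callable, Dict, Optional, Tuple


def pick_fastest(
    candidates: Dict[str, Callable[[], None]],
    warmup_iters: int = 1,
    tuning_iters: int = 10,
    slow_factor: float = 1.5,  # hypothetical threshold, not a TunableOp setting
    timer: Callable[[], float] = time.perf_counter,
) -> Tuple[Optional[str], float]:
    """Return (name, average time) of the fastest candidate.

    A candidate whose single rough timing exceeds slow_factor times the
    best average so far is skipped before warmup and full tuning.
    """
    best_name: Optional[str] = None
    best_time = float("inf")

    for name, fn in candidates.items():
        # One rough measurement to decide whether full tuning is worthwhile.
        start = timer()
        fn()
        rough = timer() - start
        if rough > slow_factor * best_time:
            continue  # clearly slower than the current best: skip it

        for _ in range(warmup_iters):
            fn()

        # Average over several iterations to reduce timing noise.
        start = timer()
        for _ in range(tuning_iters):
            fn()
        avg = (timer() - start) / tuning_iters

        if avg < best_time:
            best_name, best_time = name, avg

    return best_name, best_time
```

The `timer` argument is injectable so that the selection logic can be exercised with a simulated clock; on a real GPU the timing would need to come from device events rather than a host clock.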

Pull Request resolved: pytorch#144942
Approved by: https://github.com/jeffdaily

(cherry picked from commit fd0cd6a)
@naromero77amd changed the title from "[AUTOGENERATED] [release/2.6] [release/2.5][ROCm][TunableOp] Improve identification of fastest solution (#144942)" to "[AUTOGENERATED] [release/2.6][ROCm][TunableOp] Improve identification of fastest solution (#144942)" on Apr 21, 2025
@naromero77amd naromero77amd marked this pull request as ready for review April 21, 2025 23:31
@rocm-repo-management-api
Jenkins build for 3c1cff822d696ea330e528435b4f0f555f305aab commit is in progress
Links: Blue Ocean view / Build artifacts

@jithunnair-amd jithunnair-amd merged commit 1a808fd into release/2.6 Apr 22, 2025
0 of 2 checks passed
@jithunnair-amd jithunnair-amd deleted the autogenerated/release/2.6_cherry-pick_pr-2018 branch April 22, 2025 16:05
3 participants