Re-enable MIOpen (cuDNN) for AMD cards; default MIOPEN_FIND_MODE=FAST, PYTORCH_MIOPEN_SUGGEST_NHWC=0 #11381
- Remove `cudnn.enabled = False` for AMD cards so MIOpen is enabled again.
- Set default env vars if they are not already specified (so they are easy for users to override if they care):
  - `MIOPEN_FIND_MODE=FAST` solves the initial slowdown issues, particularly for the VAE. MIOpen's search also seems to have little actual perf benefit even if you let it run (at least in my experience on RDNA3 for SDXL & Wan), so this seems a better default.
  - `PYTORCH_MIOPEN_SUGGEST_NHWC=0` resolves the significant regression in `ImageUpscaleWithModel` perf with MIOpen enabled on ROCm > 7. In particular, this improves `ImageUpscaleWithModel` perf on ROCm 7.1 from 7.9s to 2.4s (using a simple single-image example workflow).
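A minimal sketch of the "default only if not specified" behaviour described above, assuming plain Python. `MIOPEN_ENV_DEFAULTS` and `apply_miopen_env_defaults` are hypothetical names for illustration, not the code in this PR. It also assumes the variables are set before `torch` is imported, which is the safe choice since PyTorch and MIOpen read them from the environment. The other half of the change is simply no longer setting `torch.backends.cudnn.enabled = False`, which on ROCm builds of PyTorch is the switch that controls MIOpen.

```python
import os
from typing import MutableMapping, Optional

# Hypothetical table of the defaults proposed in this PR.
MIOPEN_ENV_DEFAULTS = {
    # Use MIOpen's fast find mode instead of a longer kernel search.
    "MIOPEN_FIND_MODE": "FAST",
    # Stop PyTorch from suggesting channels-last (NHWC) layouts for MIOpen.
    "PYTORCH_MIOPEN_SUGGEST_NHWC": "0",
}


def apply_miopen_env_defaults(env: Optional[MutableMapping[str, str]] = None) -> None:
    """Fill in MIOpen defaults without overriding values the user already set.

    Hypothetical helper; assumed to run before `torch` is imported.
    """
    target = os.environ if env is None else env
    for key, value in MIOPEN_ENV_DEFAULTS.items():
        # setdefault only writes the key when it is missing, so user overrides win.
        target.setdefault(key, value)
```

For example, a user who exported `MIOPEN_FIND_MODE=NORMAL` keeps that value and only the missing NHWC variable is filled in:

```python
env = {"MIOPEN_FIND_MODE": "NORMAL"}
apply_miopen_env_defaults(env)
# env == {"MIOPEN_FIND_MODE": "NORMAL", "PYTORCH_MIOPEN_SUGGEST_NHWC": "0"}
```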
Tested on my 7900 GRE (RDNA3) on Linux with ROCm 7.1 & 6.4.
Resolves #10447
Relates to #10302, #10448, pytorch/pytorch#170764, ROCm/TheRock#2485