@alexheretic commented Dec 17, 2025

Remove cudnn.enabled = False for AMD cards so MIOpen is enabled again.

Set defaults for the following env vars when they are not already specified, so users can still easily override them:

  • MIOPEN_FIND_MODE=FAST solves the initial slowdown issues, particularly for VAE. Letting the MIOpen search run also seems to have little actual perf benefit, at least in my experience on rdna3 for sdxl & wan, so FAST seems a better default.
  • PYTORCH_MIOPEN_SUGGEST_NHWC=0 resolves the significant regression in ImageUpscaleWithModel perf with MIOpen enabled on > rocm 7.
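
For illustration, the "default only if not specified" behaviour described above could be sketched roughly as follows. This is a minimal sketch, not the code from this PR: the names `AMD_ENV_DEFAULTS` and `apply_env_defaults` are hypothetical, and it assumes the defaults are applied early enough in startup that the variables are set before MIOpen reads them.

```python
import os

# Hypothetical table of defaults; the two entries are the ones named in this PR.
AMD_ENV_DEFAULTS = {
    "MIOPEN_FIND_MODE": "FAST",
    "PYTORCH_MIOPEN_SUGGEST_NHWC": "0",
}


def apply_env_defaults(defaults, environ=os.environ):
    """Set each variable only if it is not already present.

    Returns the list of names that were defaulted, so the caller can log them.
    `environ` is any mutable mapping; it defaults to the process environment.
    """
    applied = []
    for name, value in defaults.items():
        if name not in environ:
            # The user did not set this variable, so fall back to the default.
            environ[name] = value
            applied.append(name)
    return applied
```

Calling `apply_env_defaults(AMD_ENV_DEFAULTS)` leaves any value the user exported untouched, so something like `MIOPEN_FIND_MODE=NORMAL` set in the shell would still win over the default.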

In particular, this improves ImageUpscaleWithModel perf on rocm 7.1 from 7.9s to 2.4s (using a simple single-image example workflow).

Tested on my 7900 GRE (rdna3) on Linux with rocm 7.1 & 6.4.

Resolves #10447
Relates to #10302, #10448, pytorch/pytorch#170764, ROCm/TheRock#2485

Default MIOPEN_FIND_MODE=FAST
Default PYTORCH_MIOPEN_SUGGEST_NHWC=0
@alexheretic
Contributor Author

cc @comfyanonymous can you re-check if this works as well as disabling cudnn for your test scenarios? The additional PYTORCH_MIOPEN_SUGGEST_NHWC=0 switch resolves perf issues with rocm7.1 upscaling for me.

Development

Successfully merging this pull request may close these issues.

Disabling cudnn regresses ImageUpscaleWithModel performance on ROCM 6.4
