
Conversation

@comfy-ovum

To offset the substantial effects of #10302, this PR provides (and informs the user of) an environment variable that can be set to nullify the unilateral decision made there to disable cuDNN for all AMD users.

nothing else in comfy uses env vars to enable/disable stuff

@comfyanonymous

You keep insisting that because cudnn=False runs faster for you, it must therefore be forced on everyone. That is not engineering. That is theology.

Let us review what you have done. Your pull request simply hard-codes torch.backends.cudnn.enabled = False for all RDNA3 and RDNA4 users. You wrote that you "have no idea why it helps but it does" on your system. That may be true for your test box, your driver, your kernel. But issue #10447 shows another user whose performance collapsed the moment cudnn was disabled. Issue #10460 shows the same pattern. For them, your patch breaks what once worked. That alone should end this argument: if a change helps some and harms others, the correct path is configurability, not decree.

Then you said "nothing else in Comfy uses env vars to enable or disable stuff." False. Comfy already reads them: COMFYUI_DIR, proxies, path expansions, HTTP settings. Users have asked for .env support and config overrides repeatedly. Pretending this tool never touches environment variables is historical revision, not justification. The absence of a precedent is not a reason to block a useful one.

ComfyUI runs on wildly different hardware and software combinations: Linux, Windows, ROCm 6.x, Torch 2.x, FlashAttention, tuned kernels, patched builds. The very nature of this ecosystem demands flexibility. A developer who locks a single behavior across such diversity is courting regression. Hardware changes. Drivers update. A fix today becomes a bottleneck tomorrow. Your forced flag will age like milk.

The purpose of an env var is precisely this: to give users an escape hatch when automatic detection fails or when blanket assumptions crumble. A flag such as COMFYUI_CUDNN_ENABLED=1 or 0 would let everyone test, measure, and choose without touching the source. It adds no maintenance cost. It adds resilience. It adds honesty.
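For concreteness, a minimal sketch of what that escape hatch could look like; the variable name is the one proposed here, while the exact placement inside the RDNA3/RDNA4 detection in model_management.py is illustrative rather than the literal diff:

```python
import os
import torch

# Sketch only: assume this runs inside the existing RDNA3/RDNA4 branch in
# model_management.py. Keep the current default (cudnn off) but honor
# COMFYUI_CUDNN_ENABLED when the user sets it.
override = os.environ.get("COMFYUI_CUDNN_ENABLED")
if override is not None:
    torch.backends.cudnn.enabled = override.strip().lower() in ("1", "true", "yes")
else:
    torch.backends.cudnn.enabled = False  # the behavior #10302 hard-codes today
    print("cudnn disabled for AMD; set COMFYUI_CUDNN_ENABLED=1 to re-enable.")
```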

If you truly believe in your optimization, you can keep it as the default, as this PR allows. It simply adds a message and the required functionality to support it:

"cudnn disabled for AMD; set COMFYUI_CUDNN_ENABLED=1 to re-enable."

That informs without coercing. That is how grown-up software behaves.

Right now, your stance is that your environment defines truth. It does not. It defines your truth. And until you allow others to define theirs, what you are enforcing is not a performance improvement, but a constraint masquerading as genius.

@jovan2009

Related to this subject, the cudnn issue might be solved in the near future, at least with torch nightly builds, once they are compiled with cuDNN 9.15. Keeping an eye on some discussions in the PyTorch repository, I came to the conclusion that there may even be a way to use a separate cuDNN 9.15 installation instead of the one compiled into torch, beginning with stable 2.9.1. Exactly how that will work is above my level of knowledge, so I opened an issue in the PyTorch repository to ask for clarification, because the subject is way over my head. I too would like to see the issue solved as soon as possible, so I am kinda anxious.
pytorch/pytorch#167242

@RandomGitUser321
Contributor

RandomGitUser321 commented Nov 10, 2025

Why make it an environment variable when you can just make it a launch arg like --force-cudnn-enabled?

In cli_args.py, you'd add something like:
parser.add_argument("--force-cudnn-enabled", action="store_true", help="Force ComfyUI to use cuDNN even on hardware where it would otherwise be disabled.")

model_management.py already imports the args.
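Assuming that argument is added, the consuming side could be as small as this sketch (the flag name above is hypothetical; only `from comfy.cli_args import args` reflects how ComfyUI already exposes its parsed arguments):

```python
# Sketch only: honor the hypothetical --force-cudnn-enabled flag inside
# model_management.py, which already imports the parsed args.
import torch
from comfy.cli_args import args

if args.force_cudnn_enabled:
    torch.backends.cudnn.enabled = True  # override the AMD workaround
```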

@jovan2009

jovan2009 commented Nov 15, 2025

Related to this subject, the cudnn issue might be solved in the near future, at least with torch nightly builds, once they are compiled with cuDNN 9.15. Keeping an eye on some discussions in the PyTorch repository, I came to the conclusion that there may even be a way to use a separate cuDNN 9.15 installation instead of the one compiled into torch, beginning with stable 2.9.1. Exactly how that will work is above my level of knowledge, so I opened an issue in the PyTorch repository to ask for clarification, because the subject is way over my head. I too would like to see the issue solved as soon as possible, so I am kinda anxious. pytorch/pytorch#167242

Update: today I managed to run ComfyUI with the latest cuDNN version, 9.16. I got a performance leap with one of my usual WAN 2.2 I2V workflows, from about 45-50 s/it to about 35 s/it. For details about what I did, look at the link to the PyTorch repository I mentioned in my previous post here.
TL;DR: I used the latest cuDNN 9.16 downloaded from NVIDIA, installed it, and simply dragged and dropped the DLLs over the same DLLs in the torch folder, torch being yesterday's nightly 2.10.0.dev20251114+cu130. I modified a number in the file ops.py; you can see exactly where at this commit: b4f30bd. I put in a larger number, something like 91700, in order to skip over the current workaround. Instead of my PC exploding or being sucked into a black hole, it worked. :))))))
Cheers!
@comfy-ovum @comfyanonymous

@VladanZ

VladanZ commented Nov 16, 2025

Related to this subject, the issue with cudnn might be close to be solved in the near future, at least with torch nightly builds, once they will be compiled with cudnn v. 9.15. Keeping an eye on some discussions at the Pytorch repository I came to the conclusion that it might be the case that it will be even a way to use a separate cudnn 9.15 installation instead of the one compiled inside torch, beginning with stable 2.9.1. What exactly will be that way is above my level of knowledge. I opened an issue at Pytorch repository to ask some clarifications because the subject is way over my head. I too would like to see the issue solved as soon as possible so I am kinda anxious. pytorch/pytorch#167242

Update: today I managed to run ComfyUI with last CUDNN version 9.16. I got a performance leap with one of my usual WAN 2.2 I2V workflows from about 45-50 it/s to about 35 it/s. For details about what I did look at the link at pytorch repository I mentioned in my previous post here. TLDR: I used last CUDNN 9.16 downloaded from nvidia, I installed it and I symply drag and drop the dlls over the same dlls in torch folder, torch being the yesterday nightly 2.10.0.dev20251114+cu130. I modified a number in the file ops.py, you can see exacly where at this commit: b4f30bd. I put a larger number, something like 91700 in order to skip over the current workaround. Instead my PC exploding or being sucked into a black hole, it worked. :)))))) Cheers! @comfy-ovum @comfyanonymous

45-50 it/s down to 35 it/s seems like a significant slowdown, not a speedup, to me. But maybe my math is just wrong and works differently these days.

@jovan2009

45-50 it/s down to 35 it/s seems like a significant slowdown, not a speedup, to me. But maybe my math is just wrong and works differently these days.

@VladanZ
You are right, I apparently mistyped; it is the other way around, not it/s but s/it. In other words, a shorter time per step with this modification compared with before. Thanks for pointing this out, I will make the correction. It was a WAN 2.2 + 4-step LoRA workflow; it would have been wonderful to have 45 or 35 steps per second.

@tapstoop

Related to this subject, the issue with cudnn might be close to be solved in the near future, at least with torch nightly builds, once they will be compiled with cudnn v. 9.15. Keeping an eye on some discussions at the Pytorch repository I came to the conclusion that it might be the case that it will be even a way to use a separate cudnn 9.15 installation instead of the one compiled inside torch, beginning with stable 2.9.1. What exactly will be that way is above my level of knowledge. I opened an issue at Pytorch repository to ask some clarifications because the subject is way over my head. I too would like to see the issue solved as soon as possible so I am kinda anxious. pytorch/pytorch#167242

Update: today I managed to run ComfyUI with last CUDNN version 9.16. I got a performance leap with one of my usual WAN 2.2 I2V workflows from about 45-50 s/it to about 35 s/it. For details about what I did look at the link at pytorch repository I mentioned in my previous post here. TLDR: I used last CUDNN 9.16 downloaded from nvidia, I installed it and I symply drag and drop the dlls over the same dlls in torch folder, torch being the yesterday nightly 2.10.0.dev20251114+cu130. I modified a number in the file ops.py, you can see exacly where at this commit: b4f30bd. I put a larger number, something like 91700 in order to skip over the current workaround. Instead my PC exploding or being sucked into a black hole, it worked. :)))))) Cheers! @comfy-ovum @comfyanonymous

Any idea if there could be such a workaround for Linux?

@jovan2009

jovan2009 commented Dec 11, 2025

Any idea if there could be such a workaround for Linux?

If you look at the PyTorch issue linked above (and at the torch 2.9.1 release notes, https://github.com/pytorch/pytorch/releases), the "official" workaround beginning with PyTorch 2.9.1 was intended to be simply installing the latest nvidia-cudnn-cu13 Python package with pip, and PyTorch should then use that as its cuDNN. On Windows it didn't work for me, which is why I resorted to copying the cuDNN files over the PyTorch installation. But I assume it should work on Linux, at least?

Edit: note that if you manage to make PyTorch use the latest cuDNN, there is no need any more to edit ComfyUI files on the current git version; it will not use the conv3d workaround on cuDNN >= 9.15.
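If it helps, a quick way to confirm which cuDNN build PyTorch actually loaded (after installing the pip package or after copying the DLLs) is plain PyTorch, nothing ComfyUI-specific:

```python
import torch

# Reports the cuDNN version torch is actually using, e.g. 91600 for 9.16;
# run this after installing nvidia-cudnn-cu13 or after swapping the DLLs.
print(torch.__version__)
print(torch.backends.cudnn.is_available())
print(torch.backends.cudnn.version())
```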

@alexheretic
Contributor

This is about an AMD workaround. I don't think nvidia cudnn or cuda library versions can be related.

The root cause could be a regression in miopen in the latest rocm releases. I have confirmed at least some operations have regressed since 6.4, though I don't think this particular operation is what motivated disabling cudnn/miopen.
