Add environment variable to opt out of #10302 (forced disablement of cudnn for all AMD users) #10463
Closed
comfy-ovum wants to merge 64 commits into Comfy-Org:master
Conversation
Add environment variable to opt out of #10302 (forced disablement of cudnn for all AMD users)
Added a warning message about the state of torch-directml.
alexheretic reviewed Oct 26, 2025
AMD_RDNA2_AND_OLDER_ARCH = ["gfx1030", "gfx1031", "gfx1010", "gfx1011", "gfx1012", "gfx906", "gfx900", "gfx803"]

try:
    if is_amd():
Contributor
I think we still need the is_amd() check here, the following nested logic applies only to amd cards.
(quickly double checks)... it is in there, oh wait... hmm.... how did that happen! Fixed now.
That reminds me that the RDNA2 cut-off point is somewhat arbitrary, but it's not my code. RDNA2 VAE decoding certainly benefits just as much as RDNA3. Not sure how it behaves when you aren't using cobbled-together Windows drivers, though.
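For context, the arch gating under discussion can be sketched like this. This is a simplified reconstruction: the helper name and the prefix-matching are assumptions for illustration, not the PR's exact code.

```python
# ROCm device arch strings (e.g. from gcnArchName) can carry feature
# suffixes such as "gfx1030:sramecc-:xnack-", so prefix-match against
# the bare architecture identifiers.
AMD_RDNA2_AND_OLDER_ARCH = ["gfx1030", "gfx1031", "gfx1010", "gfx1011",
                            "gfx1012", "gfx906", "gfx900", "gfx803"]

def is_rdna2_or_older(arch_name: str) -> bool:
    # Hypothetical helper name; returns True for RDNA2 and older archs.
    return any(arch_name.startswith(a) for a in AMD_RDNA2_AND_OLDER_ARCH)

print(is_rdna2_or_older("gfx1030:sramecc-:xnack-"))  # True
print(is_rdna2_or_older("gfx1100"))                  # False (RDNA3)
```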
* Implement mixed precision operations with a registry design and metadata for the quant spec in the checkpoint.
* Updated design using Tensor Subclasses
* Fix FP8 MM
* An actually functional POC
* Remove CK reference and ensure correct compute dtype
* Update unit tests
* ruff lint
* Fix missing keys
* Rename quant dtype parameter
* Fix unittests for CPU build
…g#10499) In the case of --cache-none, lazy and subgraph execution can cause anything to be run multiple times per workflow. If one of those rerun nodes is itself a subgraph generator, this will crash for two reasons. First, pending_subgraph_results[] does not clean up entries after their use. So when a pending_subgraph_result is consumed, remove it from the list; then if the corresponding node is fully re-executed, the lookup misses and it falls through to execute the node as it should. Secondly, there is an explicit enforcement against duplicates when subgraph nodes are added as ephemerals to the dynprompt. Remove this enforcement, as the use case is now valid.
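The first half of that fix (consume-and-remove) can be sketched as follows. The dict name matches the description above, but the surrounding shape and helper name are assumptions, not the actual executor code:

```python
# Pending results for nodes that expanded into subgraphs, keyed by node id.
pending_subgraph_results = {}

def take_pending_result(node_id):
    # Hypothetical helper. pop() consumes the entry, so a later full
    # re-execution of the same node misses this lookup and falls through
    # to normal execution, as intended by the fix.
    return pending_subgraph_results.pop(node_id, None)

pending_subgraph_results["7"] = {"output": [1, 2, 3]}
print(take_pending_result("7"))  # {'output': [1, 2, 3]}
print(take_pending_result("7"))  # None -- entry was consumed
```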
To enable this feature use: --fast pinned_memory
Contributor
lgtm, much nicer to have env control of this rather than maintaining patches 👍
* ops: don't take an offload stream if you don't need one
* ops: prioritize mem transfer

The async offload stream's reason for existence is to transfer from RAM to GPU. The post-processing compute steps are a bonus on the side stream, but if the compute stream is running a long kernel, it can stall the side stream as it waits to type-cast the bias before transferring the weight. So do a pure transfer of the weight straight away, then do everything for the bias, then go back to fix the weight type and apply the weight patches.
Updated help text for the --fast argument to clarify potential risks.
…put of Rodin3D nodes (Comfy-Org#10556)
…nk-amd-cudnn-envvar
This commit has become too messy. Resubmitting as #10649
To offset the substantial effects of #10302, this PR provides (and informs the user of) an environment variable that can be set to nullify the unilateral decision made in #10302 to disable cuDNN for all AMD users.
It simply employs the standard pattern for such things:
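A minimal sketch of that pattern, assuming a check at startup. The environment-variable name and helper here are hypothetical illustrations; the PR defines its own:

```python
import os

def cudnn_should_be_enabled(is_amd: bool) -> bool:
    """Honor an opt-out env var before force-disabling cuDNN on AMD.

    COMFYUI_ENABLE_AMD_CUDNN is a hypothetical name for illustration.
    """
    if is_amd and os.environ.get("COMFYUI_ENABLE_AMD_CUDNN") != "1":
        return False  # default behavior from #10302: cuDNN off on AMD
    return True  # non-AMD, or the user explicitly opted back in

# With the variable set, AMD users keep cuDNN enabled:
os.environ["COMFYUI_ENABLE_AMD_CUDNN"] = "1"
print(cudnn_should_be_enabled(True))  # True
```

The result would typically be assigned to torch.backends.cudnn.enabled at startup.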
Should #10302 later be removed, this remains a useful addition that enhances configurability for AMD users.