Labels
Potential Bug: User is reporting a bug. This should be tested.
Description
Custom Node Testing
- I have tried disabling custom nodes and the issue persists (see how to disable custom nodes if you need help)
Expected Behavior
~15 seconds overhead for the first run
Actual Behavior
123 seconds of overhead on a simple SDXL workflow: the first run takes 135.39 seconds, the second only 12.50 seconds. Beyond the core nodes, the more you add to the workflow, the larger the overhead becomes; it is especially bad with detailer custom nodes, where first-run execution rises to 404.44 seconds with just one FaceDetailer.
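For reference, the reported ~123-second figure is simply the first-run time minus the steady-state time from the log below; a minimal sketch of that arithmetic:

```python
# Cold-start overhead implied by the two timings reported in the debug log.
first_run = 135.39   # seconds, first prompt execution
second_run = 12.50   # seconds, second prompt execution

overhead = first_run - second_run
print(f"cold-start overhead: {overhead:.2f} s")  # prints: cold-start overhead: 122.89 s
```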
Steps to Reproduce
Run the 2-pass workflow with upscale it was tested on; it ends up with ~123 s of overhead. Only explicitly disabling it via the `--fast fp16_accumulation fp8_matrix_mult cublas_ops` startup flag and relaunching helps.
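The relaunch workaround above amounts to restarting ComfyUI with those optimizations stated explicitly on the command line; a minimal sketch, assuming a source install run from its repository directory with an activated virtual environment (the paths are assumptions, the flag values are taken from the report):

```shell
# Relaunch ComfyUI with the optimization set passed explicitly, as described
# in the workaround above. Adjust the venv path to your own install.
source venv/bin/activate
python main.py --fast fp16_accumulation fp8_matrix_mult cublas_ops
```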
Debug Logs
Total VRAM 48519 MB, total RAM 160650 MB
pytorch version: 2.8.0+cu128
xformers version: 0.0.32.post2
Enabled fp16 accumulation.
Set vram state to: HIGH_VRAM
Device: cuda:0 NVIDIA GeForce RTX 4090 D : cudaMallocAsync
Using xformers attention
Python version: 3.12.3 (main, Aug 14 2025, 17:47:21) [GCC 13.3.0]
ComfyUI version: 0.3.57
ComfyUI frontend version: 1.25.11
Skipping loading of custom nodes
Context impl SQLiteImpl.
Will assume non-transactional DDL.
No target revision found.
Starting server
To see the GUI go to: http://0.0.0.0:8188
To see the GUI go to: http://[::]:8188
got prompt
model weight dtype torch.float16, manual cast: None
model_type V_PREDICTION
Using xformers attention in VAE
Using xformers attention in VAE
VAE load device: cuda:0, offload device: cpu, dtype: torch.bfloat16
CLIP/text encoder model load device: cuda:0, offload device: cpu, current: cpu, dtype: torch.float16
loaded diffusion model directly to GPU
Requested to load SDXL
loaded completely 9.5367431640625e+25 4897.0483474731445 True
Requested to load SDXLClipModel
loaded completely 9.5367431640625e+25 1560.802734375 True
100%|█████████████████████████████████████████████████████████| 28/28 [00:29<00:00, 1.06s/it]
Requested to load AutoencoderKL
loaded completely 9.5367431640625e+25 159.55708122253418 True
61%|██████████████████████████████████▊ | 11/18 [00:26<00:03, 2.00it/s]
100%|█████████████████████████████████████████████████████████| 18/18 [00:28<00:00, 1.57s/it]
Prompt executed in 135.39 seconds
got prompt
100%|█████████████████████████████████████████████████████████| 28/28 [00:03<00:00, 8.54it/s]
100%|█████████████████████████████████████████████████████████| 18/18 [00:05<00:00, 3.39it/s]
Prompt executed in 12.50 seconds
Other
No response