I just wanted to write a thank you to @robertvoy for the fantastic stack. I have been using ComfyUI for some time, and it has always been slow to get anything done because my hardware is a budget build. I could not afford a 3090, let alone a 5090, so I did what I could with what I had. At first I was using a single CMP 50HX, and of course it took forever to generate 4 draft images.
For example:
comfyui-cu126 | got prompt
comfyui-cu126 | loaded completely 8294.191662399291 8155.373291015625 True
100% 30/30 [11:35<00:00, 23.19s/it]
comfyui-cu126 | Requested to load AutoencodingEngine
comfyui-cu126 | loaded completely 179.30928993225098 159.87335777282715 True
comfyui-cu126 | Prompt executed in 00:11:40
After seeing the ComfyUI-Distributed video on YouTube, I decided to give it a try.
Then I saw the same card for 50 bucks, so I bought 3 more: I added one to my main system and put the other two into another system I had, linked over 10Gbit networking. So I had 40GB of VRAM in total, with 22.15 TFLOPS at FP16 and 560GB/s of memory bandwidth per card, for 200 bucks.
After configuring everything, with 2 nodes of two cards each for a total of 4 cards, my flow went from 23.19s/it (11 minutes and 40 seconds to completion) to 6.64s/it (under 5 minutes for the entire workflow), which is fantastic.
comfyui-cu126 | got prompt
100% 30/30 [03:19<00:00, 6.64s/it]
comfyui-cu126 | Requested to load AutoencodingEngine
comfyui-cu126 | loaded completely 179.30928993225098 159.87335777282715 True
comfyui-cu126 | [Distributed] Master - Timeout. Still waiting for workers: ['f1112ac6-4084-4428-92a4-e5888f18dc48', '41ee4efd-a9e7-4640-af76-4eb9869f3160']
comfyui-cu126 | [Distributed] Master - Probe grace: worker f1112ac6-4084-4428-92a4-e5888f18dc48 appears busy (queue_remaining=1). Continuing to wait.
comfyui-cu126 | Prompt executed in 295.25 seconds
So basically it cut my total time to less than half (11:40 down to about 4:55), and it cut the iteration time to a bit under 30% of the original (23.19s/it down to 6.64s/it, roughly 3.5x faster). This is fantastic.
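For anyone who wants to check the math, here is a minimal sketch that pulls the s/it values out of the two progress lines in the logs above and works out the speedups. The helper name `parse_progress` and the variable names are just my own for illustration; they are not part of ComfyUI or ComfyUI-Distributed.

```python
import re


def parse_progress(line: str) -> tuple[int, float]:
    """Return (total_steps, seconds_per_iteration) from a tqdm-style progress line.

    Hypothetical helper for this post, not a ComfyUI API.
    """
    match = re.search(r"(\d+)/(\d+) \[.*?,\s*([\d.]+)s/it\]", line)
    if match is None:
        raise ValueError(f"not a progress line: {line!r}")
    return int(match.group(2)), float(match.group(3))


# Progress lines copied from the two log excerpts above.
single_line = "100% 30/30 [11:35<00:00, 23.19s/it]"
distributed_line = "100% 30/30 [03:19<00:00, 6.64s/it]"

steps, single_s_per_it = parse_progress(single_line)
_, dist_s_per_it = parse_progress(distributed_line)

# Wall-clock totals from the "Prompt executed in ..." lines.
single_total_s = 11 * 60 + 40  # 00:11:40
dist_total_s = 295.25          # 295.25 seconds

iteration_speedup = single_s_per_it / dist_s_per_it
total_speedup = single_total_s / dist_total_s

print(f"steps per run:       {steps}")
print(f"per-iteration:       {iteration_speedup:.2f}x faster")
print(f"total workflow time: {total_speedup:.2f}x faster")
```

With the numbers from my logs this comes out to roughly 3.5x per iteration and about 2.4x on total time, so the end-to-end gain is smaller than the per-iteration gain.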
Thank you so much for this. I mean, if it can do this with ancient hardware, I can only imagine what it can do with 4 x 3060.
It is insane what this can do with distributed upscale. I am very grateful you did this, and I wish you nothing but success.
Thank you once more.
PS. Once I fine-tune this a little more, I will be posting a full article and how-to on Reddit. I will send you the link.