
Not an issue but a thank you! #52

@alsimms

I just wanted to write a thank you to @robertvoy for the fantastic stack. I have been using ComfyUI for some time, and it has always been slow to get anything done because my hardware is a budget build. I could not afford a 3090, and even less a 5090, so I did what I could with what I had. At first I was using a CMP 50HX, and of course it took forever to generate 4 draft images.

For example:

comfyui-cu126 | got prompt
comfyui-cu126 | loaded completely 8294.191662399291 8155.373291015625 True
100% 30/30 [11:35<00:00, 23.19s/it]
comfyui-cu126 | Requested to load AutoencodingEngine
comfyui-cu126 | loaded completely 179.30928993225098 159.87335777282715 True
comfyui-cu126 | Prompt executed in 00:11:40

After seeing the ComfyUI-Distributed video on YouTube, I decided to give it a try.

Then I saw the same card for 50 bucks, so I bought 3 more. I added one to my main system and put the other two into another system I had, linked over 10 Gbit networking. So I had 40 GB of VRAM, with 22.15 TFLOPS at FP16 and 560 GB/s of memory bandwidth per card, for 200 bucks.

After configuring everything, with 2 nodes of two cards each for a total of 4 cards, my flow went from 23.19 s/it, or 11 minutes and 40 seconds to completion, to 6.64 s/it, or under 5 minutes for the entire workflow, which is fantastic.

comfyui-cu126 | got prompt
100% 30/30 [03:19<00:00, 6.64s/it]
comfyui-cu126 | Requested to load AutoencodingEngine
comfyui-cu126 | loaded completely 179.30928993225098 159.87335777282715 True
comfyui-cu126 | [Distributed] Master - Timeout. Still waiting for workers: ['f1112ac6-4084-4428-92a4-e5888f18dc48', '41ee4efd-a9e7-4640-af76-4eb9869f3160']
comfyui-cu126 | [Distributed] Master - Probe grace: worker f1112ac6-4084-4428-92a4-e5888f18dc48 appears busy (queue_remaining=1). Continuing to wait.
comfyui-cu126 | Prompt executed in 295.25 seconds

So basically it cut my total time to less than half (about 2.4x faster end to end), and it cut the time per iteration to a little over a quarter of the original (about 3.5x faster). This is fantastic.
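For anyone who wants to check the numbers, here is a minimal sketch of the arithmetic in Python. The figures are copied from the two logs above; the function and variable names are just illustrative and are not part of ComfyUI or ComfyUI-Distributed.

```python
def speedup(before: float, after: float) -> float:
    """Return how many times faster `after` is compared to `before`."""
    return before / after


# Figures copied from the logs above (names are illustrative only).
single_s_per_it = 23.19             # 1 card:  "23.19s/it"
distributed_s_per_it = 6.64         # 4 cards: "6.64s/it"
single_total_s = 11 * 60 + 40       # "Prompt executed in 00:11:40"
distributed_total_s = 295.25        # "Prompt executed in 295.25 seconds"
distributed_sampling_s = 3 * 60 + 19  # tqdm bar: "[03:19<00:00, ...]"

iter_speedup = speedup(single_s_per_it, distributed_s_per_it)
total_speedup = speedup(single_total_s, distributed_total_s)
# Time in the distributed run that was not spent in the sampling loop.
outside_sampling_s = distributed_total_s - distributed_sampling_s

print(f"per-iteration speedup: {iter_speedup:.2f}x")
print(f"end-to-end speedup:    {total_speedup:.2f}x")
print(f"outside sampling:      {outside_sampling_s:.2f} s")
```

The end-to-end gain is smaller than the per-iteration gain because roughly a minute and a half of the distributed run falls outside the sampling loop. Going by the "Still waiting for workers" lines in the log, part of that is probably the master waiting for the other cards to finish, but that is my guess from the log, not something I measured.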

Thank you so much for this. I mean, if it can do this with ancient hardware, I can only imagine what it can do with 4 x 3060.

It is insane what this can do with distributed upscale. I am very grateful you did this, and I wish you nothing but success.

Thank you once more.

PS. Once I fine-tune this a little more, I will be posting a full article and how-to on Reddit. I will send you the link.
