WanImageToVideo, WanFirstLastFrameToVideo: Add vae_tile_size optional arg#10238
alexheretic wants to merge 1 commit into Comfy-Org:master
@comfyanonymous this provides quite a big improvement for me (589s -> 25s encode time), and perhaps for other AMD users too. wdyt?
Looks like a good improvement. Could you add a test, in case the maintainers would merge it?
Test Evidence Check
If this PR changes user-facing behavior, visual proof (screen recording or screenshot) is required. PRs without applicable visual documentation may not be reviewed until provided. You can add it by:
e7bc958 to c4913b1
I'd appreciate some guidance on this. Is this something that could get merged? Is one of the alternative approaches mentioned more attractive?
Hey, sorry about the long delays. From a node design point of view, this probably points to a flaw in the way the tiler nodes are designed (being limited to encode and decode nodes). Can you one-shot everyone's tiler needs with a VAE-in, VAE-out tiler node where you set the tiler config, so that all consumers of that node use that tile config? That could then feed this, the usual WAN I2V node, and all the other video models that bring in a VAE. Regarding the WAN VAE specifically, it's one of the gentler video VAEs for VRAM, so it's a weird one to have trouble with. The WAN VAE probably has significant scope for straight-up VRAM reduction using the recursive rolling upscale algorithm just implemented in the LTX VAE, which can reduce VRAM consumption to as low as a 3-frame window (WAN is at 6 today); there's a sweet spot at 4, last time I did the math on it.
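To make the VAE-in, VAE-out idea concrete, here is a minimal sketch of a wrapper that stores one tile config and routes plain encode/decode calls to the tiled paths. Everything in it is hypothetical and for illustration only: `TileConfig`, `StubVAE`, `TiledVAE` and `apply_tiler` are invented names, and the stub's `encode_tiled`/`decode_tiled` signatures are assumptions rather than ComfyUI's actual VAE API.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class TileConfig:
    """Hypothetical tiling settings shared by every consumer of the VAE."""
    tile_size: int = 256
    overlap: int = 32


class StubVAE:
    """Stand-in VAE (not ComfyUI's class); it only records which path ran."""

    def encode(self, pixels):
        return ("encode", pixels)

    def encode_tiled(self, pixels, tile_x, tile_y, overlap):
        return ("encode_tiled", pixels, tile_x, tile_y, overlap)

    def decode(self, latents):
        return ("decode", latents)

    def decode_tiled(self, latents, tile_x, tile_y, overlap):
        return ("decode_tiled", latents, tile_x, tile_y, overlap)


class TiledVAE:
    """Wraps a VAE so that ordinary encode/decode calls become tiled calls."""

    def __init__(self, inner, config):
        self.inner = inner
        self.config = config

    def encode(self, pixels):
        c = self.config
        return self.inner.encode_tiled(
            pixels, tile_x=c.tile_size, tile_y=c.tile_size, overlap=c.overlap
        )

    def decode(self, latents):
        c = self.config
        return self.inner.decode_tiled(
            latents, tile_x=c.tile_size, tile_y=c.tile_size, overlap=c.overlap
        )


def apply_tiler(vae, tile_size, overlap):
    """Body of a hypothetical VAE-in, VAE-out node: returns the wrapped VAE."""
    return TiledVAE(vae, TileConfig(tile_size=tile_size, overlap=overlap))
```

With a wrapper like this, a node that simply calls `vae.encode(...)` would pick up the tile config without needing its own tiling argument.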
Thanks for the feedback, that approach makes sense I think. I'll take a look.
I experience slow VAE performance on my AMD RX 7900 GRE GPU and can usually improve this by opting for the tiled VAE nodes. However, WanImageToVideo does VAE encoding and is currently not configurable. This leads to Wan workflows being slow for me, see benchmarks.
I propose we add a vae_tile_size optional argument to WanImageToVideo (and similar). By default this will be 0 to mean untiled, i.e. acting as it did previously. If set, the value will be used as the x & y tile size. This gives users like me a way to work around poor Wan VAE untiled encode performance.
As the default behaviour is unchanged, this should be backward compatible.
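The proposed behaviour can be sketched roughly as follows. This is not the code from the PR: `FakeVAE` and `encode_with_optional_tiling` are hypothetical names, and the `encode_tiled(pixels, tile_x, tile_y)` signature on the stand-in is an assumption made so the example runs on its own.

```python
class FakeVAE:
    """Stand-in VAE (hypothetical); it only reports which path was used."""

    def encode(self, pixels):
        return ("untiled", pixels)

    def encode_tiled(self, pixels, tile_x, tile_y):
        return ("tiled", pixels, tile_x, tile_y)


def encode_with_optional_tiling(vae, pixels, vae_tile_size=0):
    """Encode pixels, tiling only when vae_tile_size is non-zero.

    0 keeps the previous untiled behaviour, so existing workflows that
    never set the argument are unaffected.
    """
    if vae_tile_size < 0:
        raise ValueError("vae_tile_size must be >= 0")
    if vae_tile_size == 0:
        return vae.encode(pixels)
    # The single value is used for both the x and y tile size.
    return vae.encode_tiled(pixels, tile_x=vae_tile_size, tile_y=vae_tile_size)
```

Calling it without the argument takes the untiled path, while `vae_tile_size=256` requests 256x256 tiles.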
Alternatives
TiledWanImageToVideo.
Wan 2.1 VAE benchmarks (480x832 * 81 frames)
System info
MIOPEN_FIND_MODE=FAST
VAE Encode
Benches show significant improvement using tiled vae encoding. On my setup 256x256 performed best. 589s -> 25s.
Untiled vs 512 vs 384 vs 256 vs 128
2 runs each.
untiled
Yes really 10 minutes 😞
tiled 512,512,32,256,8
tiled 384,384,32,256,8
tiled 256,256,32,256,8
tiled 128,128,32,256,8
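For a rough sense of how much spatial work each configuration above implies, the sketch below counts tiles per 480x832 frame under a simplified model. It assumes the third number in each tiled configuration (32) is a spatial overlap in pixels and that tiles advance by a fixed stride of tile size minus overlap. Both are assumptions for illustration; the actual tiling in ComfyUI may differ, and `tiles_along` and `tiles_per_frame` are hypothetical helpers.

```python
import math


def tiles_along(length, tile, overlap):
    """Number of tiles needed to cover one axis with a fixed stride."""
    if not 0 <= overlap < tile:
        raise ValueError("overlap must be >= 0 and smaller than the tile size")
    if tile >= length:
        return 1
    stride = tile - overlap
    # Each tile after the first adds `stride` new pixels of coverage.
    return math.ceil((length - overlap) / stride)


def tiles_per_frame(height, width, tile, overlap):
    """Tile count for one frame using square tiles."""
    return tiles_along(height, tile, overlap) * tiles_along(width, tile, overlap)


if __name__ == "__main__":
    for tile in (512, 384, 256, 128):
        print(tile, tiles_per_frame(480, 832, tile, overlap=32))
```

Under this model, smaller tiles mean many more encode calls per frame, which may be part of why the smallest size was not the fastest on this setup.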
VAE Decode
Benches also show significant improvement using tiled vae decoding. On my setup 256x256 performed best.
Note: Decoding is already a separate node so no code changes required, this is just kinda related and perhaps interesting.
Untiled vs 512 vs 384 vs 256 vs 128
4 runs each (where possible).
untiled
OOM 😢
tiled 512,512,32,124,8
OOM 😢
tiled 384,384,32,124,8
tiled 256,256,32,124,8
tiled 128,128,32,124,8