WanImageToVideo, WanFirstLastFrameToVideo: Add vae_tile_size optional arg#10238

Open
alexheretic wants to merge 1 commit into Comfy-Org:master from alexheretic:wan-vae-tiled-encode

Conversation


@alexheretic alexheretic commented Oct 6, 2025

I experience slow VAE performance on my AMD RX 7900 GRE GPU and can usually improve it by opting for the tiled VAE nodes. However, WanImageToVideo performs VAE encoding internally and is currently not configurable, which leaves Wan workflows slow for me; see the benchmarks below.

I propose adding a vae_tile_size optional argument to WanImageToVideo (and similar nodes). It defaults to 0, meaning untiled, i.e. the existing behaviour. If set, the value is used as both the x and y tile size. This gives users like me a way to work around poor untiled Wan VAE encode performance.

As the default behaviour is unchanged, this should be backward compatible.
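The dispatch could look roughly like this. A minimal, self-contained sketch with a stand-in encoder: everything except the vae_tile_size name and its 0-means-untiled default is illustrative, and the real node would call ComfyUI's VAE encode paths rather than a callback.

```python
def tile_spans(size, tile, overlap):
    """Overlapping 1-D spans [start, end) covering the full axis."""
    if tile <= overlap:
        raise ValueError("tile size must exceed overlap")
    spans, start = [], 0
    while start < size:
        end = min(start + tile, size)
        spans.append((start, end))
        if end == size:
            break
        start += tile - overlap
    return spans

def encode_pixels(encode_fn, height, width, vae_tile_size=0, overlap=32):
    """Dispatch between untiled and tiled encoding.

    vae_tile_size == 0 keeps the old behaviour (one full-frame pass),
    matching the proposed backward-compatible default; any other value
    is used as both the x and y tile size.
    """
    if vae_tile_size == 0:
        return [encode_fn(0, height, 0, width)]
    return [
        encode_fn(y0, y1, x0, x1)
        for (y0, y1) in tile_spans(height, vae_tile_size, overlap)
        for (x0, x1) in tile_spans(width, vae_tile_size, overlap)
    ]
```

With 480x832 frames and 256x256 tiles this produces 2x4 = 8 overlapping tiles per frame instead of one full-frame pass.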

wan-vae-tile-size-screen

Alternatives

  • Add new "tiled" variant nodes for wan, e.g. TiledWanImageToVideo.
  • Automatically pick tiled encoding for certain GPUs, e.g. my GPU -> 256x256 tiled encoding.

Wan 2.1 VAE benchmarks (480x832 * 81 frames)

System info

MIOPEN_FIND_MODE=FAST

Total VRAM 16368 MB, total RAM 64217 MB
pytorch version: 2.9.0.dev20250827+rocm6.4
AMD arch: gfx1100
ROCm version: (6, 4)
Set vram state to: NORMAL_VRAM
Device: cuda:0 AMD Radeon RX 7900 GRE : native
Using Flash Attention
Python version: 3.12.11 (main, Jun  4 2025, 10:32:37) [GCC 15.1.1 20250425]
ComfyUI version: 0.3.62
ComfyUI frontend version: 1.27.7
Using split attention in VAE
VAE load device: cuda:0, offload device: cpu, dtype: torch.float16

VAE Encode

Benches show a significant improvement from tiled VAE encoding. On my setup 256x256 performed best: 589s -> 25s.

Untiled vs 512 vs 384 vs 256 vs 128

2 runs each.

untiled

Yes really 10 minutes 😞

[WanImageToVideo]: 608.79s
[WanImageToVideo]: 588.72s

tiled 512,512,32,256,8

[WanImageToVideo]: 41.86s
[WanImageToVideo]: 43.68s

tiled 384,384,32,256,8

[WanImageToVideo]: 30.41s
[WanImageToVideo]: 28.89s

tiled 256,256,32,256,8

[WanImageToVideo]: 25.00s
[WanImageToVideo]: 25.35s

tiled 128,128,32,256,8

[WanImageToVideo]: 45.57s
[WanImageToVideo]: 45.31s
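For intuition on why 128x128 is slower again despite smaller tiles: each tile redoes its overlap region, so smaller tiles re-encode proportionally more pixels. A rough sliding-window estimate (my own back-of-envelope arithmetic, not ComfyUI's exact tiler, which clamps edge tiles):

```python
import math

def tile_count(size, tile, overlap):
    """Number of overlapping tiles needed to cover one axis."""
    if size <= tile:
        return 1
    return 1 + math.ceil((size - tile) / (tile - overlap))

def work_ratio(h, w, tile, overlap):
    """Approximate ratio of pixels encoded (tiled) to pixels in the frame."""
    tiles = tile_count(h, tile, overlap) * tile_count(w, tile, overlap)
    return tiles * tile * tile / (h * w)
```

For 480x832 frames with a 32px overlap this gives roughly 1.3x the untiled pixel count at 256x256 tiles but about 1.8x at 128x128, consistent with 128 losing ground to 256 here.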

VAE Decode

Benches also show a significant improvement from tiled VAE decoding. On my setup 256x256 performed best.
Note: decoding is already a separate node, so no code changes are required; this is just related and perhaps interesting.

Untiled vs 512 vs 384 vs 256 vs 128

4 runs each (where possible).

untiled

OOM 😢

tiled 512,512,32,124,8

OOM 😢

tiled 384,384,32,124,8

[VAEDecodeTiled]: 73.94s
[VAEDecodeTiled]: 99.03s
[VAEDecodeTiled]: 62.71s
[VAEDecodeTiled]: 66.34s

tiled 256,256,32,124,8

[VAEDecodeTiled]: 60.79s
[VAEDecodeTiled]: 61.21s
[VAEDecodeTiled]: 54.53s
[VAEDecodeTiled]: 47.72s

tiled 128,128,32,124,8

[VAEDecodeTiled]: 72.18s
[VAEDecodeTiled]: 71.70s
[VAEDecodeTiled]: 71.47s
[VAEDecodeTiled]: 71.29s

@alexheretic (Contributor Author)

@comfyanonymous this provides quite a big improvement for me (589s -> 25s encode time), and perhaps for other AMD users too. wdyt?

@reneleonhardt

Looks like a good improvement. Could you add a test, in case the maintainers are open to merging it?

@comfy-pr-bot (Member)

Test Evidence Check

⚠️ Warning: Visual Documentation Missing

If this PR changes user-facing behavior, visual proof (screen recording or screenshot) is required. PRs without applicable visual documentation may not be reviewed until provided.

You can add it by:

  • GitHub: Drag & drop media directly into the PR description
  • YouTube: Include a link to a short demo

@alexheretic (Contributor Author)

I'd appreciate some guidance on this. Is this something that could get merged? Is one of the alternative approaches mentioned more attractive?

@rattus128 (Contributor)

Hey sorry about the long delays.

From a node design point of view this is probably pointing out a flaw in the way the tiler nodes are designed (being limited to separate encode and decode nodes). Could you cover everyone's tiling needs with a VAE-in, VAE-out tiler node where you set the tiler config once and all consumers of that VAE use that tile config? That could then feed this node, the usual Wan I2V node, and all the other video models that bring in a VAE.

Regarding the Wan VAE specifically, it's one of the gentler video VAEs for VRAM, so it's a weird one to have trouble with. The Wan VAE probably has significant scope for straight-up VRAM reduction using the recursive rolling upscale algorithm just implemented in the LTX VAE, which can reduce the VRAM consumption to as low as a 3-frame window (Wan is at 6 today); there's a sweet spot at 4, last time I did the math on it.
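That VAE-in, VAE-out idea could be sketched as a wrapper whose tile settings travel with the VAE object, so every downstream consumer picks them up. Names like TiledVAE and the encode_tiled signature are illustrative here, not a confirmed ComfyUI API:

```python
class TiledVAE:
    """Hypothetical tiler node output: wraps a VAE so every consumer
    encodes with the tile config set once, upstream of all I2V nodes."""

    def __init__(self, vae, tile_x, tile_y, overlap):
        self._vae = vae
        self._tile = (tile_x, tile_y, overlap)

    def encode(self, pixels):
        # Route all encodes through the wrapped VAE's tiled path.
        tile_x, tile_y, overlap = self._tile
        return self._vae.encode_tiled(
            pixels, tile_x=tile_x, tile_y=tile_y, overlap=overlap
        )

    def __getattr__(self, name):
        # Everything else (decode, dtype, devices, ...) falls through
        # to the wrapped VAE unchanged.
        return getattr(self._vae, name)
```

A downstream node would then call vae.encode(...) as usual and get tiling for free whenever the tiler node sits in between.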

@rattus128 rattus128 self-assigned this Feb 10, 2026
@alexheretic (Contributor Author)

Thanks for the feedback, that approach makes sense I think. I'll take a look.

