Pengjun Fang, Harry Yang
In this repository, we present WanFM (frame morphing), which builds upon Wan 2.2 Image-to-Video (I2V) and introduces several key enhancements:-
Last Frame Constraint: Enforces precise alignment between the generated last frame and the target frame, ensuring consistent video endpoints.
-
Bidirectional Denoising with Time Reversal Fusion: Performs denoising both forward (first-to-last) and backward (last-to-first), fusing intermediate results at every step for superior temporal coherence. To accommodate bidirectional fusion, the original denoising formula—where each step depends on the previous one—has been redesigned, allowing non-continuous, step-wise integration of forward and backward denoised states.
-
Prompt-Adapted Temporal Attention: During the reverse pass, temporal self-attention is rotated to align backward generation with the prompt, enabling bidirectionally refined, prompt-consistent video sequences.
With these improvements, we achieve First–Last–Frame-to-Video generation (FLF2V), enabling controllable and consistent video synthesis given the first and last frames as constraints.
demo.mp4
Please see Wan2.2 (https://github.com/Wan-Video/Wan2.2?tab=readme-ov-file#installation).
| Models | Download Links | Description |
|---|---|---|
| I2V-A14B | 🤗 Huggingface 🤖 ModelScope | Image-to-Video MoE model, supports 480P & 720P |
- Single-GPU inference
python generate.py \
--task flf2v-A14B \
--size 832*480 \
--ckpt_dir ./Wan2.2-I2V-A14B \
--offload_model False \
--frame_num 81 \
--sample_steps 40 \
--sample_shift 16 \
--sample_guide_scale 5 \
--prompt <prompt> \
--first_frame <first frame path> \
--last_frame <last frame path> \
--save_file <output path> \
--bidirectional_sampling- Multi-GPU inference using FSDP + DeepSpeed Ulysses
torchrun --nproc_per_node=8 --master_port 39550 generate.py \
--task flf2v-A14B \
--size 832*480 \
--ckpt_dir ./Wan2.2-I2V-A14B \
--offload_model False \
--convert_model_dtype \
--frame_num 81 \
--sample_steps 40 \
--sample_shift 16 \
--sample_guide_scale 5 \
--dit_fsdp \
--t5_fsdp \
--ulysses_size 2 \
--prompt <prompt> \
--first_frame <first frame path> \
--last_frame <last frame path> \
--save_file <output path> \
--bidirectional_samplingIf you encounter OOM (Out-of-Memory) issues, you can use the --offload_model True, --convert_model_dtype and --t5_cpu options to reduce GPU memory usage.
|
example0.mp4 |
|
example1.mp4 |
|
example2.mp4 |
|
example3.mp4 |
|
example4.mp4 |
|
example5.mp4 |
|
example6.mp4 |
|
example7.mp4 |
|
example8.mp4 |
|
example9.mp4 |




















