Wan Frame Morphing (WanFM)


Pengjun Fang, Harry Yang

In this repository, we present WanFM (Wan Frame Morphing), which builds upon Wan 2.2 Image-to-Video (I2V) and introduces several key enhancements:
  • Last Frame Constraint: Enforces precise alignment between the generated last frame and the target frame, ensuring consistent video endpoints.

  • Bidirectional Denoising with Time Reversal Fusion: Performs denoising both forward (first-to-last) and backward (last-to-first), fusing the intermediate results at every step for superior temporal coherence. To accommodate this fusion, the original denoising update, in which each step depends only on the previous one, has been redesigned to allow non-continuous, step-wise integration of the forward and backward denoised states (see the sketch below).

  • Prompt-Adapted Temporal Attention: During the reverse pass, temporal self-attention is rotated to align backward generation with the prompt, enabling bidirectionally refined, prompt-consistent video sequences.

With these improvements, we achieve First–Last–Frame-to-Video generation (FLF2V), enabling controllable and consistent video synthesis given the first and last frames as constraints.
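
As a rough illustration of the bidirectional scheme, the sketch below denoises one copy of the latents forward in time and one on the time-reversed latents, fusing the two estimates at every step. This is a minimal sketch under assumptions, not WanFM's actual sampler: denoise_step, the blend weight alpha, and the tensor layout are all hypothetical.

```python
import torch

def bidirectional_denoise(model, z, timesteps, alpha=0.5):
    """Sketch of bidirectional denoising with time reversal fusion.

    z: noisy video latents of shape (B, C, T, H, W).
    model.denoise_step: hypothetical one-step denoiser; the real
    sampler in generate.py differs.
    """
    z_fwd = z.clone()                         # first -> last pass
    z_bwd = torch.flip(z, dims=[2]).clone()   # last -> first pass (time reversed)

    for t in timesteps:                       # high noise -> low noise
        z_fwd = model.denoise_step(z_fwd, t)
        z_bwd = model.denoise_step(z_bwd, t)

        # Fuse both estimates in a common (forward) time order. The fused
        # state no longer lies on a single denoising trajectory, which is
        # why the step update must accept non-continuous inputs.
        fused = alpha * z_fwd + (1.0 - alpha) * torch.flip(z_bwd, dims=[2])
        z_fwd = fused
        z_bwd = torch.flip(fused, dims=[2])

    return z_fwd
```

In the same spirit, the prompt-adapted temporal attention of the reverse pass can be pictured as attending over reversed temporal positions, so that the backward trajectory still follows the prompt's first-to-last ordering.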

Demo

demo.mp4

Run WanFM

Environment Preparation

Please follow the installation instructions of Wan2.2 (https://github.com/Wan-Video/Wan2.2?tab=readme-ov-file#installation).

Model Download

| Models | Download Links | Description |
| --- | --- | --- |
| I2V-A14B | 🤗 Huggingface · 🤖 ModelScope | Image-to-Video MoE model, supports 480P & 720P |
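
For example, the checkpoint can be fetched with the Hugging Face CLI. The repo id below is an assumption based on the Wan2.2 release; adjust it if your download link differs.

```bash
pip install "huggingface_hub[cli]"
# Repo id assumed; the target dir matches --ckpt_dir in the commands below.
huggingface-cli download Wan-AI/Wan2.2-I2V-A14B --local-dir ./Wan2.2-I2V-A14B
```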

Run First-Last-Frame-to-Video Generation

  • Single-GPU inference
python generate.py \
    --task flf2v-A14B \
    --size 832*480 \
    --ckpt_dir ./Wan2.2-I2V-A14B \
    --offload_model False \
    --frame_num 81 \
    --sample_steps 40 \
    --sample_shift 16 \
    --sample_guide_scale 5 \
    --prompt <prompt> \
    --first_frame <first frame path> \
    --last_frame <last frame path> \
    --save_file <output path> \
    --bidirectional_sampling
  • Multi-GPU inference using FSDP + DeepSpeed Ulysses
torchrun --nproc_per_node=8 --master_port 39550 generate.py \
    --task flf2v-A14B \
    --size 832*480 \
    --ckpt_dir ./Wan2.2-I2V-A14B \
    --offload_model False \
    --convert_model_dtype \
    --frame_num 81 \
    --sample_steps 40 \
    --sample_shift 16 \
    --sample_guide_scale 5 \
    --dit_fsdp \
    --t5_fsdp \
    --ulysses_size 2 \
    --prompt <prompt> \
    --first_frame <first frame path> \
    --last_frame <last frame path> \
    --save_file <output path> \
    --bidirectional_sampling

If you encounter out-of-memory (OOM) errors, you can use the --offload_model True, --convert_model_dtype, and --t5_cpu options to reduce GPU memory usage, as in the sketch below.
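
For instance, the single-GPU command above could be extended as follows (a sketch; enable only the options you need):

```bash
python generate.py \
    --task flf2v-A14B \
    --size 832*480 \
    --ckpt_dir ./Wan2.2-I2V-A14B \
    --offload_model True \
    --convert_model_dtype \
    --t5_cpu \
    --frame_num 81 \
    --sample_steps 40 \
    --sample_shift 16 \
    --sample_guide_scale 5 \
    --prompt <prompt> \
    --first_frame <first frame path> \
    --last_frame <last frame path> \
    --save_file <output path> \
    --bidirectional_sampling
```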

More Examples

| First frame | Last frame | Result |
| --- | --- | --- |
| flf2v_input_first_frame | flf2v_input_last_frame | example0.mp4 |
| lmotion1_0 | lmotion1_1 | example1.mp4 |
| 002 | 003 | example2.mp4 |
| pusa0_0 | pusa0_1 | example3.mp4 |
| pusa1_0 | pusa1_1 | example4.mp4 |
| pusa2_0 | pusa2_1 | example5.mp4 |
| pusa3_0 | pusa3_1 | example6.mp4 |
| cola1 | cola2 | example7.mp4 |
| pusa4_0 | pusa4_1 | example8.mp4 |
| huang0 | huang1 | example9.mp4 |
