ZeroSmooth: Training-free Diffuser Adaptation for High Frame Rate Video Generation

Shaoshu Yang, Xiaodong Cun, Yong Zhang^#, Ying Shan, and Ran He^#

(# corresponding author)

TL;DR

ZeroSmooth is a training-free plug-in that enables video diffusers to generate high frame rate videos. With our method, one can build self-cascaded video models that generate smooth results while preserving the content of the original outputs.

🎶 Notes

Everyone is welcome to collaborate on the code repository, improve the methods, and explore more downstream tasks. Please check CONTRIBUTING.md.
If you have any questions or comments, we are open to discussion.

Abstract

Video generation has made remarkable progress in recent years, especially since the advent of video diffusion models. Many video generation models can produce plausible synthetic videos, e.g., Stable Video Diffusion (SVD). However, most video models can only generate low frame rate videos due to limited GPU memory as well as the difficulty of modeling a large set of frames. The training videos are always uniformly sampled at a specified interval for temporal compression. Previous methods raise the frame rate by either training a video interpolation model in pixel space as a postprocessing stage or training an interpolation model in latent space for a specific base video model. In this paper, we propose a training-free video interpolation method for generative video diffusion models, which generalizes to different models in a plug-and-play manner. We investigate the non-linearity in the feature space of video diffusion models and transform a video model into a self-cascaded video diffusion model by incorporating the designed hidden state correction modules. The self-cascaded architecture and the correction module are proposed to retain the temporal consistency between key frames and the interpolated frames. Extensive evaluations are performed on multiple popular video models to demonstrate the effectiveness of the proposed method; in particular, our training-free method is even comparable to trained interpolation models supported by huge compute resources and large-scale datasets.
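To make the distinction between key frames and interpolated frames concrete, here is a minimal sketch of how the frames of a base model's output could be laid out inside a higher frame rate sequence. This is not the ZeroSmooth method, and the code for that method has not been released at the time of writing. The function names `key_frame_positions` and `linear_fill`, the `factor` parameter, and the use of one scalar per frame are all assumptions made for illustration. The linear blend is only a placeholder baseline, not the hidden state correction module described above.

```python
from typing import List


def key_frame_positions(num_key_frames: int, factor: int) -> List[int]:
    """Return the indices that the original (key) frames occupy in the
    higher frame rate sequence, assuming `factor - 1` new frames are
    inserted between each pair of neighbouring key frames."""
    if num_key_frames < 1 or factor < 1:
        raise ValueError("num_key_frames and factor must be >= 1")
    return [i * factor for i in range(num_key_frames)]


def linear_fill(key_frames: List[float], factor: int) -> List[float]:
    """Naive baseline (an assumption, not the paper's method): fill the
    gaps between key frames by linear blending. Each frame is a single
    scalar here to keep the example dependency-free."""
    if not key_frames or factor < 1:
        raise ValueError("need at least one key frame and factor >= 1")
    out: List[float] = []
    for a, b in zip(key_frames, key_frames[1:]):
        for k in range(factor):
            t = k / factor  # t == 0 reproduces the key frame exactly
            out.append((1 - t) * a + t * b)
    out.append(key_frames[-1])  # the last key frame closes the sequence
    return out


if __name__ == "__main__":
    keys = [0.0, 2.0, 4.0]
    dense = linear_fill(keys, factor=2)
    positions = key_frame_positions(len(keys), factor=2)
    # Key frames are preserved at their positions in the dense sequence.
    print(dense, [dense[p] for p in positions])
```

In this layout, a sequence of N key frames expands to (N - 1) * factor + 1 frames, and the original frames remain untouched at every `factor`-th position, which mirrors the stated goal of preserving the content of the original outputs.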

Changelog

[2024.6.3]: 🔥 Release paper.

📝 TODO

Update gallery
Release code
Hugging Face Gradio demo

Citation

@misc{yang2024zerosmooth,
      title={ZeroSmooth: Training-free Diffuser Adaptation for High Frame Rate Video Generation}, 
      author={Shaoshu Yang and Yong Zhang and Xiaodong Cun and Ying Shan and Ran He},
      year={2024},
      eprint={2406.00908},
      archivePrefix={arXiv},
      primaryClass={cs.CV}
}
