
FreeNoise: Tuning-Free Longer Video Diffusion via Noise Rescheduling

🔥🔥🔥 LongerCrafter for longer high-quality video generation is now released!

✅ No tuning at all      ✅ Less than 20% extra time      ✅ Supports 512 frames

Haonan Qiu, Menghan Xia*, Yong Zhang, Yingqing He,
Xintao Wang, Ying Shan, and Ziwei Liu*


(* corresponding author)

From Tencent AI Lab and Nanyang Technological University.

Input: "A chihuahua in astronaut suit floating in space, cinematic lighting, glow effect";
Resolution: 1024 x 576; Frames: 64.

Input: "Campfire at night in a snowy forest with starry sky in the background";
Resolution: 1024 x 576; Frames: 64.

🔆 Introduction

🤗🤗🤗 LongerCrafter (FreeNoise) is a tuning-free and time-efficient paradigm for longer video generation based on pretrained video diffusion models.
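At its core, FreeNoise reschedules the initial per-frame noise: instead of sampling fresh, uncorrelated noise for the extra frames, the noise sampled for the training-length window is reused in locally shuffled order, which preserves long-range correlation across the longer sequence. The snippet below is a minimal PyTorch sketch of that idea; the function name, the [frames, channels, height, width] layout, and the shuffle stride are illustrative assumptions, not the repository's exact implementation.

```python
import torch

def reschedule_noise(window_noise: torch.Tensor, total_frames: int,
                     shuffle_stride: int = 4) -> torch.Tensor:
    """Sketch of FreeNoise-style noise rescheduling (assumed API).

    window_noise: [F, C, H, W] initial noise for the training-length window.
    Returns noise for `total_frames` frames, built by reusing the window's
    frames in locally shuffled order so long-range correlation is kept.
    """
    chunks = [window_noise]
    n_frames = window_noise.shape[0]
    current = window_noise
    while n_frames < total_frames:
        current = current.clone()
        # Shuffle the frame order within each local group of `shuffle_stride`.
        for start in range(0, current.shape[0], shuffle_stride):
            end = min(start + shuffle_stride, current.shape[0])
            perm = torch.randperm(end - start) + start
            current[start:end] = current[perm]  # advanced indexing copies first
        chunks.append(current)
        n_frames += current.shape[0]
    return torch.cat(chunks, dim=0)[:total_frames]

# Example: extend a 16-frame noise window to 64 frames of latent noise.
window = torch.randn(16, 4, 32, 32)
long_noise = reschedule_noise(window, total_frames=64)  # [64, 4, 32, 32]
```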

1. Longer Single-Prompt Text-to-video Generation

Longer single-prompt results. Resolution: 256 x 256; Frames: 512. (Compressed)

2. Longer Multi-Prompt Text-to-video Generation

Longer multi-prompt results. Resolution: 256 x 256; Frames: 256. (Compressed)

📝 Changelog

  • [2023.10.24]: 🔥🔥 Released LongerCrafter (FreeNoise) for longer video generation!
  • [2023.10.25]: 🔥🔥 Released the 256x256 model and added support for multi-prompt generation!

🧰 Models

| Model | Resolution | Checkpoint | Description |
|:------|:-----------|:-----------|:------------|
| VideoCrafter (Text2Video) | 320x512 | Temporarily Unavailable | Supports 128 frames on NVIDIA A100 (40GB) |
| VideoCrafter (Text2Video) | 576x1024 | Hugging Face | Supports 64 frames on NVIDIA A100 (40GB) |
| VideoCrafter (Text2Video) | 256x256 | Hugging Face | Supports 512 frames on NVIDIA A100 (40GB) |

(Reduce the number of frames if you have a smaller GPU, e.g., generate 64 frames at 256x256 resolution.)

⚙️ Setup

Install Environment via Anaconda (Recommended)

conda create -n freenoise python=3.8.5
conda activate freenoise
pip install -r requirements.txt

💫 Inference

1. Longer Text-to-Video

  1. Download the pretrained T2V model via Hugging Face, and place model.ckpt at checkpoints/base_1024_v1/model.ckpt.
  2. Run the following command in the terminal:
  sh scripts/run_text2video_freenoise_1024.sh

2. Longer Multi-Prompt Text-to-Video

  1. Download the pretrained T2V model via Hugging Face, and place model.pth at checkpoints/base_256_v1/model.pth.
  2. Run the following command in the terminal:
  sh scripts/run_text2video_freenoise_mp_256.sh

🧲 Support For Other Models

FreeNoise is expected to work with other, similar frameworks. An easy way to test compatibility is to shuffle the initial noise and check whether a new but similar video is generated (with eta set to 0). If you have any questions about applying FreeNoise to other frameworks, feel free to contact Haonan Qiu.
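The sketch below illustrates that check under stated assumptions: sample_fn is a hypothetical stand-in for your model's deterministic DDIM sampler (eta=0) that maps per-frame initial noise and a prompt to a video tensor.

```python
import torch

@torch.no_grad()
def noise_shuffle_test(sample_fn, noise: torch.Tensor, prompt: str):
    """Heuristic compatibility check (hypothetical helper).

    With deterministic sampling (eta=0), shuffling the per-frame initial
    noise should still yield a coherent video similar to the original if
    the backbone is a good candidate for FreeNoise.
    """
    original = sample_fn(noise, prompt)        # baseline video
    perm = torch.randperm(noise.shape[0])      # permute the frame axis
    shuffled = sample_fn(noise[perm], prompt)  # video from shuffled noise
    return original, shuffled                  # inspect the pair visually
```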

👨‍👩‍👧‍👦 Crafter Family

VideoCrafter: Framework for high-quality video generation.

ScaleCrafter: Tuning-free method for high-resolution image/video generation.

TaleCrafter: An interactive story visualization tool that supports multiple characters.

😉 Citation

@misc{qiu2023freenoise,
      title={FreeNoise: Tuning-Free Longer Video Diffusion Via Noise Rescheduling}, 
      author={Haonan Qiu and Menghan Xia and Yong Zhang and Yingqing He and Xintao Wang and Ying Shan and Ziwei Liu},
      year={2023},
      eprint={2310.15169},
      archivePrefix={arXiv},
      primaryClass={cs.CV}
}

📢 Disclaimer

We developed this repository for RESEARCH purposes, so it may only be used for personal/research/non-commercial purposes.