
DEMO: Enhancing Motion in Text-to-Video Generation with Decomposed Encoding and Conditioning

Penghui Ruan, Pichao Wang, Divya Saxena, Jiannong Cao, Yuhui Shi

Accepted at NeurIPS 2024 (Poster)


[Architecture overview figure]

"Slow motion flower petals fall from a blossom, landing softly on the ground."

[Video comparison: LaVie · VideoCrafter2 · ModelScope · DEMO]

"An old man with white hair is shown speaking."

[Video comparison: LaVie · VideoCrafter2 · ModelScope · DEMO]

"Jockeys racing."

[Video comparison: LaVie · VideoCrafter2 · ModelScope · DEMO]

1. Getting Started

Install ffmpeg

We write videos using ffmpeg; you can install it with the following command:

sudo apt-get update && sudo apt-get install -y ffmpeg libsm6 libxext6

Environment Preparation

git clone git@github.com:PR-Ryan/DEMO.git
conda create -n demo python=3.8
conda activate demo
pip install -r requirements.txt
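
Before moving on, a quick sanity check can confirm the environment is usable. This is a minimal sketch assuming requirements.txt installs a CUDA-enabled PyTorch build, which DeepSpeed-based training and inference need:

# Quick sanity check (assumes requirements.txt installs a CUDA-enabled
# PyTorch build, which DeepSpeed needs).
import torch

print("PyTorch:", torch.__version__)
print("CUDA available:", torch.cuda.is_available())
print("GPUs visible:", torch.cuda.device_count())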


2. Inference

Download Pretrained Models from ModelScope

To download pretrained models, run the following command:

bash models/download.sh

Alternatively, you can download directly from Hugging Face and place the downloaded folder in models/modelscopet2v.
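
If you prefer a scripted download, the huggingface_hub package offers snapshot_download. The repo id below is an assumption, so substitute the repository actually linked above:

# Hypothetical scripted alternative to models/download.sh.
# The repo_id is an assumption; substitute the Hugging Face repository
# linked in this README.
from huggingface_hub import snapshot_download

snapshot_download(
    repo_id="ali-vilab/text-to-video-ms-1.7b",  # assumed ModelScope T2V repo
    local_dir="models/modelscopet2v",
)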

Download Fine-Tuned Checkpoints

Download our fine-tuned checkpoints from Hugging Face.

Prepare Inference Prompt

Create an inference prompt file at prompts/test_prompt.csv. Here’s an example format:

id,prompt
1,a fat dog is playing in the yard.
2,a fat car is parked by the road.
3,a fat balloon is floating in the air.
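
If you generate prompts programmatically, here is a minimal sketch that writes a file in the id,prompt format shown above:

# Minimal sketch: write an inference prompt file in the id,prompt format.
import csv

prompts = [
    "a fat dog is playing in the yard.",
    "a fat car is parked by the road.",
    "a fat balloon is floating in the air.",
]

with open("prompts/test_prompt.csv", "w", newline="") as f:
    writer = csv.writer(f)
    writer.writerow(["id", "prompt"])
    for i, prompt in enumerate(prompts, start=1):
        writer.writerow([i, prompt])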

Start Inference

To start inference, run:

bash scripts/inference_deeepspeed.sh

By default, distributed inference utilizes all available GPUs. To manually specify GPUs, add the --include flag in the DeepSpeed command:

--include="localhost:<your gpu ids>"

Inference Configuration

All configurations for inference are found in configs/t2v_inference_deepspeed.yaml. In this file, you can adjust the following settings:

  • infer_dataset: Specify your dataset type and prompt path.
  • batch_size: Set the batch size for diffusion sampling.
  • decoder_bs: Define the batch size for VAE decoding.
  • pretrained: Set checkpoint paths for pretrained models.

The DeepSpeed configurations for inference are located in ds_config/ds_config_inference.json. You can also use a custom DeepSpeed configuration by modifying the deepspeed_config setting in configs/t2v_inference_deepspeed.yaml.
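
To double-check these settings before a run, a small sketch like the following can print them; it assumes PyYAML is installed and that the key names match those listed above:

# Sketch: inspect the inference configuration. Key names are taken from
# this README and may differ from the actual files in the repo.
import json
import yaml

with open("configs/t2v_inference_deepspeed.yaml") as f:
    cfg = yaml.safe_load(f)
print("batch_size:", cfg.get("batch_size"))
print("decoder_bs:", cfg.get("decoder_bs"))
print("deepspeed_config:", cfg.get("deepspeed_config"))

with open("ds_config/ds_config_inference.json") as f:
    ds_cfg = json.load(f)
print("DeepSpeed keys:", sorted(ds_cfg))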

With our optimized inference code, the model can generate 16-frame videos at 256x256 resolution on an 8GB GPU with a batch size of 1.

3. Training

Dataset Preparation

Follow the instructions to download the WebVid dataset. We provide an example training dataset under data/webvid_example. If you prefer to use your own dataset, refer to tools/datasets/video_datasets.py to define your own dataset and preprocessing steps; a minimal sketch follows below.
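
As a starting point for a custom dataset, here is a minimal sketch of the usual pattern. The CSV layout, returned field names, and tensor shape are assumptions; align them with the interfaces defined in tools/datasets/video_datasets.py:

# Minimal sketch of a custom video-caption dataset. The CSV layout,
# returned field names, and (frames, channels, height, width) tensor
# shape are assumptions; match them to tools/datasets/video_datasets.py.
import csv
from torch.utils.data import Dataset

class MyVideoDataset(Dataset):
    def __init__(self, csv_path, load_video_fn):
        # Expected CSV columns: video_path, caption
        with open(csv_path) as f:
            self.rows = list(csv.DictReader(f))
        self.load_video = load_video_fn  # returns an (F, C, H, W) float tensor

    def __len__(self):
        return len(self.rows)

    def __getitem__(self, idx):
        row = self.rows[idx]
        return {"video": self.load_video(row["video_path"]),
                "caption": row["caption"]}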

Download pretrained models from ModelScope

bash models/download.sh

You can also download directly from Hugging Face and place the folder at models/modelscopet2v.

Train the Model

To train the model, run the following command:

bash scripts/train_deeepspeed.sh

By default, distributed data parallel training is used, utilizing all available GPUs. If you want to manually specify the GPUs, add the --include flag to the DeepSpeed command:

--include="localhost:<gpu_ids>"

Training Configuration

All training configurations are in the configs/t2v_train_deepspeed.yaml file. You can customize the following settings:

  • train_dataset: Define your dataset type and provide the prompt path.
  • pretrained: Specify the checkpoint paths for pretrained models.

The DeepSpeed configurations for training are located in ds_config/ds_config_train.json. You can customize these settings or provide your own DeepSpeed configuration by modifying the deepspeed_config parameter in configs/t2v_train_deepspeed.yaml.

Key DeepSpeed Settings

In ds_config/ds_config_train.json, you can specify:

  • train_micro_batch_size_per_gpu: The batch size for each GPU.
  • gradient_accumulation_steps: Number of steps for gradient accumulation. (Together with the GPU count, these two settings determine the effective global batch size; see the sketch after this list.)
  • zero_optimization: Configurations for DeepSpeed's ZeRO optimization. By default, we use stage 2 with optimizer offloading to the CPU, which may increase CPU memory usage. Disable this if you have limited CPU memory. If your GPUs have large memory, you can switch to stage 1 for faster convergence.
  • optimizer: By default, we use DeepSpeed's highly optimized CPU Adam for faster training, which requires compiling with nvcc during the first run. You may need to set CUDA_HOME and LD_LIBRARY_PATH environment variables. Alternatively, you can switch to another optimizer in ds_config/ds_config_train.json. Refer to the DeepSpeed documentation for more information.
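
The first two settings combine with the number of GPUs to give the effective global batch size; a quick sanity calculation:

# Effective global batch size under DeepSpeed data parallelism.
# The values here are placeholders; read the real ones from
# ds_config/ds_config_train.json and your GPU count.
micro_batch_per_gpu = 1
gradient_accumulation_steps = 4
num_gpus = 8

global_batch = micro_batch_per_gpu * gradient_accumulation_steps * num_gpus
print("Effective global batch size:", global_batch)  # 32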

Monitoring Training

TensorBoard is enabled by default for monitoring the training process. To view the training progress, launch TensorBoard with:

tensorboard --logdir=tensorboard_log/demo

TODO

  • Release model weights.
  • Release inference and training code.
  • Hugging Face demo.
  • Gradio application.

License

Distributed under the MIT License. See LICENSE.txt for more information.

Contact

Penghui Ruan - penghui.ruan@connect.polyu.hk

Project Link: https://pr-ryan.github.io/DEMO-project/

Acknowledgments

This repo is heavily built upon VGen from Alibaba. We sincerely thank them for their contributions to the open-source community.

BibTex
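
The official entry was not captured here; a minimal entry assembled from the title, authors, and venue above would look roughly like this:

@inproceedings{ruan2024demo,
  title     = {DEMO: Enhancing Motion in Text-to-Video Generation with Decomposed Encoding and Conditioning},
  author    = {Ruan, Penghui and Wang, Pichao and Saxena, Divya and Cao, Jiannong and Shi, Yuhui},
  booktitle = {Advances in Neural Information Processing Systems (NeurIPS)},
  year      = {2024}
}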
