
Commit

update
PR-Ryan committed Oct 31, 2024
1 parent 8eaa91d commit ab67fe9
Showing 2 changed files with 93 additions and 25 deletions.
109 changes: 92 additions & 17 deletions README.md
@@ -57,9 +57,7 @@


<!-- GETTING STARTED -->
## Getting Started

### Prerequisites
### 1. Getting Started
#### Install ffmpeg
We use ffmpeg to write videos. You can install it with the following command:
```bash
@@ -78,49 +76,126 @@ sudo apt-get update && apt-get install ffmpeg libsm6 libxext6 -y



### Inference
It is worth noting that, with our optimized inference code, our model can generate 256×256 video with 16 frames even on a GPU with 8GB of memory at batch size 1.


### 2. Inference

#### Download Pretrained Models from ModelScope

To download pretrained models, run the following command:

```bash
bash models/download.sh
```

#### Download our fine-tuned [checkpoints](https://huggingface.co/Ryan-PR/DEMO) from Hugging Face.
Alternatively, you can download directly from [Hugging Face](https://huggingface.co/ali-vilab/modelscope-damo-text-to-video-synthesis) and place the downloaded folder in `models/modelscopet2v`.

#### Prepare the inference prompts in `prompts/your_prompt.csv`. Example prompt file:
```bash
#### Download Fine-Tuned Checkpoints

Download our fine-tuned [checkpoints](https://huggingface.co/Ryan-PR/DEMO) from Hugging Face.

#### Prepare Inference Prompt

Create an inference prompt file at `prompts/test_prompt.csv`. Here’s an example format:

```csv
id,prompt
1,a fat dog is playing in the yard.
2,a fat car is parked by the road.
3,a fat balloon is floating in the air.
```
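The prompt file can also be generated and checked programmatically. The following is a minimal sketch, assuming the loader only needs the two columns `id` and `prompt` shown above; the helper names `write_prompts` and `read_prompts` are hypothetical and not part of this repository.

```python
import csv
import io


def write_prompts(prompts, fileobj):
    """Write prompts as (id, prompt) rows, numbering ids from 1."""
    writer = csv.writer(fileobj, lineterminator="\n")
    writer.writerow(["id", "prompt"])
    for idx, text in enumerate(prompts, start=1):
        writer.writerow([idx, text])


def read_prompts(fileobj):
    """Read the file back and return a list of (id, prompt) tuples."""
    reader = csv.DictReader(fileobj)
    if reader.fieldnames != ["id", "prompt"]:
        raise ValueError(f"unexpected header: {reader.fieldnames}")
    return [(int(row["id"]), row["prompt"]) for row in reader]


# In-memory demo. For a real run, replace the buffer with
# open("prompts/test_prompt.csv", "w", newline="").
buffer = io.StringIO()
write_prompts(
    ["a fat dog is playing in the yard.", "a fat car is parked by the road."],
    buffer,
)
buffer.seek(0)
rows = read_prompts(buffer)
```

Python's `csv` module quotes any prompt that contains a comma. Whether the repository's own loader accepts quoted fields is not confirmed here, so comma-free prompts are the safer choice.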

#### Start Inference

To start inference, run:

```bash
bash scripts/inference_deeepspeed.sh
```

### Training
By default, distributed inference utilizes all available GPUs. To manually specify GPUs, add the `--include` flag in the DeepSpeed command:

#### Dataset Preparation
Follow the instructions to download the [Web-Vid](https://github.com/m-bain/webvid) dataset. If you prefer to use your own dataset, please refer to `tools/datasets/video_datasets.py` to define your own dataset and preprocessing steps.
```bash
--include="localhost:<your gpu ids>"
```
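As a small illustration of the flag format, the sketch below builds the `--include` value from a list of GPU indices. The helper `include_flag` is hypothetical; only the `host:id,id` form of the value comes from the DeepSpeed launcher.

```python
def include_flag(gpu_ids, host="localhost"):
    """Return a DeepSpeed --include argument for the given GPU indices."""
    ids = sorted({int(i) for i in gpu_ids})  # de-duplicate and order the ids
    if not ids:
        raise ValueError("at least one GPU id is required")
    if ids[0] < 0:
        raise ValueError("GPU ids must be non-negative")
    id_list = ",".join(str(i) for i in ids)
    return f'--include="{host}:{id_list}"'


flag = include_flag([0, 2])
print(flag)
```

For GPUs 0 and 2 on the local machine this yields `--include="localhost:0,2"`, which can be added to the DeepSpeed command in the script.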

### Inference Configuration

All configurations for inference are found in `configs/t2v_inference_deepspeed.yaml`. In this file, you can adjust the following settings:

- **`infer_dataset`**: Specify your dataset type and prompt path.
- **`batch_size`**: Set the batch size for diffusion sampling.
- **`decoder_bs`**: Define the batch size for VAE decoding.
- **`pretrained`**: Set checkpoint paths for pretrained models.

#### Download Pretrained Models from ModelScope
The DeepSpeed configurations for inference are located in `ds_config/ds_config_inference.json`. You can also use a custom DeepSpeed configuration by modifying the `deepspeed_config` setting in `configs/t2v_inference_deepspeed.yaml`.

With our optimized inference code, this model can generate video at 256x256 resolution with 16 frames on an 8GB GPU with a batch size of 1.
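To illustrate why a separate `decoder_bs` setting helps on small GPUs, here is a minimal sketch of chunked decoding: latents are passed to the decoder a few at a time, so peak memory depends on the chunk size rather than on the full clip. This is an assumption about the general technique, not the repository's actual implementation; `decode_in_chunks` and `fake_decode` are hypothetical names, and plain Python lists stand in for tensors.

```python
def decode_in_chunks(latents, decode_fn, decoder_bs):
    """Decode latents in chunks of at most decoder_bs items, preserving order."""
    if decoder_bs < 1:
        raise ValueError("decoder_bs must be at least 1")
    frames = []
    for start in range(0, len(latents), decoder_bs):
        chunk = latents[start:start + decoder_bs]
        frames.extend(decode_fn(chunk))  # only one chunk is decoded at a time
    return frames


# Stand-in decoder that records how many items each call received.
calls = []


def fake_decode(chunk):
    calls.append(len(chunk))
    return [f"frame-{latent}" for latent in chunk]


frames = decode_in_chunks(list(range(16)), fake_decode, decoder_bs=8)
```

With 16 frames and a chunk size of 8, the decoder is called twice. A smaller `decoder_bs` lowers peak memory at the cost of more decoder calls.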




### 3. Training

#### Dataset Preparation
Follow the instructions to download the [Web-Vid](https://github.com/m-bain/webvid) dataset. We provide an example training dataset under `data/webvid_example`.
If you prefer to use your own dataset, please refer to `tools/datasets/video_datasets.py` to define your own dataset and preprocessing steps.


#### Download pretrained models from ModelScope
```bash
bash models/download.sh
```
You can also download directly from [Hugging Face](https://huggingface.co/ali-vilab/modelscope-damo-text-to-video-synthesis) and place the folder at `models/modelscopet2v`.





#### Train the Model

To train the model, run the following command:

#### Train the model
```bash
bash scripts/train_deeepspeed.sh
```
Note that we use DeepSpeed stage 2 with cpu_adam to speed up training. You may need to set `CUDA_HOME` and `LD_LIBRARY_PATH` in the script so that DeepSpeed can compile the binaries for cpu_adam. You can also skip this by switching to another optimizer in `ds_configs/ds_config_train.json`.

By default, distributed data-parallel training is used, utilizing all available GPUs. If you want to manually specify the GPUs, add the `--include` flag to the DeepSpeed command:

```bash
--include="localhost:<gpu_ids>"
```

#### Training Configuration

All training configurations are in the `configs/t2v_train_deepspeed.yaml` file. You can customize the following settings:

- **`train_dataset`**: Define your dataset type and provide the prompt path.
- **`pretrained`**: Specify the checkpoint paths for pretrained models.

The DeepSpeed configurations for training are located in `ds_config/ds_config_train.json`. You can customize these settings or provide your own DeepSpeed configuration by modifying the `deepspeed_config` parameter in `configs/t2v_train_deepspeed.yaml`.

#### Key DeepSpeed Settings

In `ds_config/ds_config_train.json`, you can specify:

- **`train_micro_batch_size_per_gpu`**: The batch size for each GPU.
- **`gradient_accumulation_steps`**: Number of steps for gradient accumulation.
- **`zero_optimization`**: Configurations for DeepSpeed's ZeRO optimization. By default, we use stage 2 with optimizer offloading to the CPU, which may increase CPU memory usage. Disable this if you have limited CPU memory. If your GPUs have large memory, you can switch to stage 1 for faster convergence.
- **`optimizer`**: By default, we use DeepSpeed's highly optimized CPU Adam for faster training, which requires compiling with `nvcc` during the first run. You may need to set `CUDA_HOME` and `LD_LIBRARY_PATH` environment variables. Alternatively, you can switch to another optimizer in `ds_config/ds_config_train.json`. Refer to the [DeepSpeed documentation](https://www.deepspeed.ai/) for more information.
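The settings above interact: the effective global batch size is the per-GPU micro-batch size times the gradient accumulation steps times the number of GPUs. The sketch below shows a config fragment and that arithmetic. The numeric values are illustrative assumptions, not the defaults shipped in this repository, and `effective_batch_size` is a hypothetical helper.

```python
import json

# Illustrative values only; check ds_config/ds_config_train.json for the real ones.
ds_config = {
    "train_micro_batch_size_per_gpu": 2,
    "gradient_accumulation_steps": 4,
    "zero_optimization": {
        "stage": 2,
        # Optimizer state is offloaded to CPU memory.
        "offload_optimizer": {"device": "cpu"},
    },
}


def effective_batch_size(config, num_gpus):
    """Samples consumed per optimizer step across all GPUs."""
    return (
        config["train_micro_batch_size_per_gpu"]
        * config["gradient_accumulation_steps"]
        * num_gpus
    )


print(json.dumps(ds_config, indent=2))
print(effective_batch_size(ds_config, num_gpus=8))
```

With these example values and 8 GPUs, each optimizer step sees 2 × 4 × 8 = 64 samples. Lowering the micro-batch size while raising the accumulation steps keeps that total constant and reduces per-GPU memory.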

#### Monitoring Training

TensorBoard is enabled by default for monitoring the training process. To view the training progress, launch TensorBoard with:

```bash
tensorboard --logdir=tensorboard_log/demo
```







@@ -129,10 +204,10 @@ Note that, we use deepspeed stage 2 with cpu_adam for speeding up the train proc


<!-- ROADMAP -->
## Roadmap
## TODO

- [x] Open source model weights.
- [x] Open source inference and training code.
- [x] Release model weights.
- [x] Release inference and training code.
- [ ] Hugging Face demo.
- [ ] Gradio application.

9 changes: 1 addition & 8 deletions configs/t2v_inference_deepspeed.yaml
@@ -120,13 +120,11 @@ use_div_loss: False

# Model
scale_factor: 0.18215
# cfg.use_checkpoint = True
# cfg.use_sharded_ddp = False
use_fsdp: False
use_fp16: True
temporal_attention: True

# cfg.guidances = []


auto_encoder: {
'type': 'AutoencoderKL',
@@ -153,12 +151,7 @@ negative_prompt: 'Distorted, discontinuous, Ugly, blurry, low resolution, motion
# training and optimizer
ema_decay: 0.9999

# lr: 5e-5
weight_decay: 0.0
# betas: (0.9, 0.999)
# eps: 1.0e-8
# chunk_size: 16
# decoder_bs: 8
alpha: 0.7

# scheduler
