update

minhloi0901 · Oct 31, 2024 · ab67fe9
1 parent 8eaa91d
Showing 2 changed files with 93 additions and 25 deletions.
diff --git a/README.md b/README.md
@@ -57,9 +57,7 @@
 
 
 <!-- GETTING STARTED -->
-## Getting Started
-
-### Prerequisites
+### 1. Getting Started
 #### Install ffmpeg
 we write videos use ffmpeg, you can install by fllowing command:
 ```bash
@@ -78,49 +76,126 @@ sudo apt-get update && apt-get install ffmpeg libsm6 libxext6  -y
 
 
 
-### Inference
-It is worthy to note that, with our optimized inference code, our model allow to generate video with 256*256*16 with even on GPU with 8GB for batch size 1.
+
+
+
+### 2. Inference
+
 #### Download Pretrained Models from ModelScope
+
+To download pretrained models, run the following command:
+
 ```bash
 bash models/download.sh
 ```
 
-#### Download our fine-tuned [checkpoints](https://huggingface.co/Ryan-PR/DEMO) from huggingface.
+Alternatively, you can download directly from [Hugging Face](https://huggingface.co/ali-vilab/modelscope-damo-text-to-video-synthesis) and place the downloaded folder in `models/modelscopet2v`.
 
-#### Prepare inference prompt in prompts/your_prompt.csv. Example prompt file as:
-```bash
+#### Download Fine-Tuned Checkpoints
+
+Download our fine-tuned [checkpoints](https://huggingface.co/Ryan-PR/DEMO) from Hugging Face.
+
+#### Prepare Inference Prompt
+
+Create an inference prompt file at `prompts/test_prompt.csv`. Here’s an example format:
+
+```csv
 id,prompt
 1,a fat dog is playing in the yard.
 2,a fat car is parked by the road.
 3,a fat balloon is floating in the air.
 ```
+
 #### Start Inference
+
+To start inference, run:
+
 ```bash
 bash scripts/inference_deeepspeed.sh
 ```
 
-### Training
+By default, distributed inference utilizes all available GPUs. To manually specify GPUs, add the `--include` flag in the DeepSpeed command:
 
-#### Dataset Preparation
-Follow the instruction and download [Web-Vid](https://github.com/m-bain/webvid) dataset. If you prefer to use your own dataset, please refer to tools/datasets/video_datasets.py to define your own dataset and preprocessing step.
+```bash
+--include="localhost:<your gpu ids>"
+```
 
+#### Inference Configuration
 
+All configurations for inference are found in `configs/t2v_inference_deepspeed.yaml`. In this file, you can adjust the following settings:
 
+- **`infer_dataset`**: Specify your dataset type and prompt path.
+- **`batch_size`**: Set the batch size for diffusion sampling.
+- **`decoder_bs`**: Define the batch size for VAE decoding.
+- **`pretrained`**: Set checkpoint paths for pretrained models.
 
-#### Download Pretrained Models from ModelScope
+The DeepSpeed configurations for inference are located in `ds_config/ds_config_inference.json`. You can also use a custom DeepSpeed configuration by modifying the `deepspeed_config` setting in `configs/t2v_inference_deepspeed.yaml`.
+
+With our optimized inference code, the model can generate 16-frame videos at 256x256 resolution on an 8GB GPU with a batch size of 1.
+
+
+
+
+### 3. Training
+
+#### Dataset Preparation
+Follow the instructions to download the [Web-Vid](https://github.com/m-bain/webvid) dataset. We provide an example training dataset under `data/webvid_example`.
+If you prefer to use your own dataset, please refer to `tools/datasets/video_datasets.py` to define your own dataset and preprocessing steps.
 
+
+#### Download pretrained models from ModelScope
 ```bash
 bash models/download.sh
 ```
+You can also download directly from [Hugging Face](https://huggingface.co/ali-vilab/modelscope-damo-text-to-video-synthesis) and place the folder in `models/modelscopet2v`.
+
+
 
 
 
+#### Train the Model
+
+To train the model, run the following command:
 
-#### Train the model
 ```bash
 bash scripts/train_deeepspeed.sh
 ```
-Note that, we use deepspeed stage 2 with cpu_adam for speeding up the train process, you may need to specify the CUDA_HOME and LD_LIBRARY_PATH in the script, to allow deepspeed to compile binaries for cpu_adam. You can also simply skip this by switching to other optimizer in the ds_configs/ds_config_train.json
+
+By default, distributed data parallel training is used across all available GPUs. To manually specify the GPUs, add the `--include` flag to the DeepSpeed command:
+
+```bash
+--include="localhost:<gpu_ids>"
+```
+
+#### Training Configuration
+
+All training configurations are in the `configs/t2v_train_deepspeed.yaml` file. You can customize the following settings:
+
+- **`train_dataset`**: Define your dataset type and provide the prompt path.
+- **`pretrained`**: Specify the checkpoint paths for pretrained models.
+
+The DeepSpeed configurations for training are located in `ds_config/ds_config_train.json`. You can customize these settings or provide your own DeepSpeed configuration by modifying the `deepspeed_config` parameter in `configs/t2v_train_deepspeed.yaml`.
+
+#### Key DeepSpeed Settings
+
+In `ds_config/ds_config_train.json`, you can specify:
+
+- **`train_micro_batch_size_per_gpu`**: The batch size for each GPU.
+- **`gradient_accumulation_steps`**: Number of steps for gradient accumulation.
+- **`zero_optimization`**: Configurations for DeepSpeed's ZeRO optimization. By default, we use stage 2 with optimizer offloading to the CPU, which may increase CPU memory usage. Disable offloading if you have limited CPU memory. If your GPUs have enough memory, you can switch to stage 1 for faster training.
+- **`optimizer`**: By default, we use DeepSpeed's highly optimized CPU Adam for faster training, which requires compiling with `nvcc` during the first run. You may need to set `CUDA_HOME` and `LD_LIBRARY_PATH` environment variables. Alternatively, you can switch to another optimizer in `ds_config/ds_config_train.json`. Refer to the [DeepSpeed documentation](https://www.deepspeed.ai/) for more information.
+
+#### Monitoring Training
+
+TensorBoard is enabled by default for monitoring the training process. To view the training progress, launch TensorBoard with:
+
+```bash
+tensorboard --logdir=tensorboard_log/demo
+```
+
+
+
+
 
 
 
@@ -129,10 +204,10 @@ Note that, we use deepspeed stage 2 with cpu_adam for speeding up the train proc
 
 
 <!-- ROADMAP -->
-## Roadmap
+## TODO
 
-- [x] Open source model weights.
-- [x] Open source inference and training code.
+- [x] Release model weights.
+- [x] Release inference and training code.
 - [ ] Huggingface demo.
 - [ ] gradio application.
 

diff --git a/configs/t2v_inference_deepspeed.yaml b/configs/t2v_inference_deepspeed.yaml
@@ -120,13 +120,11 @@ use_div_loss: False
 
 # Model
 scale_factor: 0.18215  
-# cfg.use_checkpoint = True
-# cfg.use_sharded_ddp = False
 use_fsdp: False 
 use_fp16: True
 temporal_attention: True
 
-# cfg.guidances = []
+
 
 auto_encoder: {
     'type': 'AutoencoderKL',
@@ -153,12 +151,7 @@ negative_prompt: 'Distorted, discontinuous, Ugly, blurry, low resolution, motion
 # training and optimizer
 ema_decay: 0.9999
 
-# lr: 5e-5
 weight_decay: 0.0
-# betas: (0.9, 0.999)
-# eps: 1.0e-8
-# chunk_size: 16
-# decoder_bs: 8
 alpha: 0.7
 
 # scheduler