Please download [dataset] and unzip the files into the following structure:
```
COCO
├── annotations
│   ├── train2017
│   ├── val2017
│   ├── stuff_train2017.json
│   ├── stuff_val2017.json
│   ├── thing_train2017.json
│   └── thing_val2017.json
├── train2017
├── val2017
├── train_filenames.pkl
├── val_filenames.pkl
└── categories.pkl
```
All the `*.pkl` files are generated by `datasets/COCO.py` from `datasets/COCO/annotations/*.json`, but they have already been generated for you, so you do not need to regenerate them.
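To sanity-check the downloaded pickles, you can load each one and inspect it. A minimal sketch (`check_pickles` is a hypothetical helper, not part of the repo, and the assumption that each file holds a sized container like a list or dict is ours):

```python
import pickle
from pathlib import Path

def check_pickles(data_dir):
    """Report each expected *.pkl under data_dir: its top-level type and
    length, or 'missing' if the file is not there."""
    report = {}
    for name in ("train_filenames.pkl", "val_filenames.pkl", "categories.pkl"):
        path = Path(data_dir) / name
        if not path.exists():
            report[name] = "missing"
            continue
        with open(path, "rb") as f:
            obj = pickle.load(f)
        # Assumes the pickled object is a sized container (list/dict/etc.)
        report[name] = f"{type(obj).__name__} of length {len(obj)}"
    return report

if __name__ == "__main__":
    for name, status in check_pickles("./datasets/COCO").items():
        print(f"{name}: {status}")
```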
Please download [checkpoint], rename the downloaded `*.pt` file to `DDPM_pretrained.pt`, and put it under `CS280_project/checkpoints`.
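The rename-and-place step can be sketched as a couple of shell commands, run from the directory containing `CS280_project` (`downloaded.pt` is a placeholder name, and the `touch` merely stands in for the real download):

```shell
# "downloaded.pt" is a placeholder for the checkpoint file you downloaded;
# the touch stands in for the actual download in this sketch.
touch downloaded.pt
mkdir -p CS280_project/checkpoints
mv downloaded.pt CS280_project/checkpoints/DDPM_pretrained.pt
```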
- Train the model:

  ```shell
  python scripts/train.py --batch_size 2 --freeze_ddpm --data_dir ./datasets/COCO --category_pickle ./datasets/COCO/categories.pkl --filename_pickle ./datasets/COCO/train_filenames_sieved.pkl --save_dir ./result --DDPM_dir ./checkpoints/DDPM_pretrained.pt
  ```

  where `--batch_size` should be chosen according to your GPU memory.
Before training with multiple GPUs, ensure that `torchrun` is installed and located in your virtual environment rather than the system environment. Check its location with the following command:

```shell
which torchrun
```

If the output looks like `/home/username/anaconda3/bin/torchrun`, you need to reinstall `torch` in your virtual environment:

```shell
pip install --ignore-installed torch
```
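After reinstalling, you can double-check that `torchrun` now resolves inside your environment. A small shell sketch (`in_env` is a hypothetical helper; it assumes `$VIRTUAL_ENV` or `$CONDA_PREFIX` is set when an environment is active):

```shell
# Returns success if path $1 lies under environment prefix $2.
in_env() {
    case "$1" in
        "$2"*) return 0 ;;
        *) return 1 ;;
    esac
}

TORCHRUN="$(command -v torchrun || true)"
ENV_PREFIX="${VIRTUAL_ENV:-${CONDA_PREFIX:-}}"
if [ -n "$TORCHRUN" ] && [ -n "$ENV_PREFIX" ] && in_env "$TORCHRUN" "$ENV_PREFIX"; then
    echo "ok: $TORCHRUN is inside $ENV_PREFIX"
else
    echo "check manually: torchrun=$TORCHRUN env=$ENV_PREFIX"
fi
```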
After that, you can train the model with the following command:

```shell
torchrun --nproc-per-node $NUM_GPUS scripts/train.py --batch_size $BATCH_SIZE --freeze_ddpm --data_dir ./datasets/COCO --category_pickle ./datasets/COCO/categories.pkl --filename_pickle ./datasets/COCO/train_filenames_sieved.pkl --save_dir ./result --DDPM_dir ./checkpoints/DDPM_pretrained.pt
```

where `$NUM_GPUS` is the number of GPUs you want to use and `$BATCH_SIZE` is the batch size for each GPU.
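Since `torchrun` launches one process per GPU, the effective global batch size is the product of the two values — assuming the training script follows standard DistributedDataParallel semantics, which is our assumption here:

```python
def global_batch_size(num_gpus: int, per_gpu_batch: int) -> int:
    """Effective batch size when each of num_gpus processes sees
    per_gpu_batch samples per step (standard DDP assumption)."""
    return num_gpus * per_gpu_batch

# e.g. 4 GPUs with --batch_size 2 each -> effective batch of 8
print(global_batch_size(4, 2))
```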
Note: if you encounter an error like

```
ModuleNotFoundError: No module named 'improved_diffusion'
```

you have several options to solve it:
- Add the path of `improved-diffusion` to `PYTHONPATH` in your terminal (needs to be done every time you open a new terminal):

  ```shell
  export PYTHONPATH=$PYTHONPATH:$(pwd)
  ```
- Add the following code to the beginning of `train.py` (already added; permanent):

  ```python
  import os
  import sys

  sys.path.append(os.path.dirname(sys.path[0]))
  ```
- Install `improved-diffusion` as a package in editable mode (permanent):

  ```shell
  pip install -e .
  ```
If installing the package `mpi4py` fails on the AI cluster (`10.15.89.191/192`), you can try one of the following options:
- The original installation with `pip` fails because it cannot find the `mpi` compilers. You can get access to the compilers with:

  ```shell
  mpi-selector-menu
  ```

  Then choose the `openmpi` compiler for `user`. After that, you can install the package with `pip`:

  ```shell
  pip install mpi4py
  ```
- The installation with `conda` is much easier. You can install the package with the following command:

  ```shell
  conda install mpi4py
  ```