Train VGG-like and ResNet-like auto-encoders on image datasets such as ImageNet.
```
$imagenet-autoencoder
|── figs                    # result images
|   |── *.jpg
|── models
|   |── builder.py          # build autoencoder models
|   |── resnet.py           # ResNet-like autoencoder
|   |── vgg.py              # VGG-like autoencoder
|── run
|   |── eval.sh             # command to evaluate a single checkpoint
|   |── evalall.sh          # command to evaluate all checkpoints in a specific folder
|   |── train.sh            # command to train the auto-encoder
|── tools
|   |── decode.py           # decode random latent codes to images
|   |── encode.py           # encode a single image to a latent code
|   |── generate_list.py    # generate the image list for training
|   |── reconstruct.py      # reconstruct images to see the difference
|── dataloader.py           # dataset and dataloader
|── eval.py                 # evaluate checkpoints
|── train.py                # train models
|── utils.py                # other utility functions
|── requirements.txt
|── README.md
```
- Clone the project

  ```bash
  git clone https://github.com/Horizon2333/imagenet-autoencoder
  cd imagenet-autoencoder
  ```

- Install dependencies

  ```bash
  pip install -r requirements.txt
  ```

Your dataset should look like:
```
$your_dataset_path
|── class1
|   |── xxxx.jpg
|   |── ...
|── class2
|   |── xxxx.jpg
|   |── ...
|── ...
|── classN
|   |── xxxx.jpg
|   |── ...
```
Then you can use `tools/generate_list.py` to generate the list of training samples. We do not use `torchvision.datasets.ImageFolder` here because it is very slow when the dataset is fairly large. You can run

```bash
python tools/generate_list.py --name {name of your dataset, such as caltech256} --path {path to your dataset}
```

Two files will then be generated under the `list` folder: a `*_list.txt` that saves every image path and its class (the class is not used here), and a `*_name.txt` that saves the index of every class and its class name.
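For reference, a minimal sketch of what such a list generator might look like; the actual `tools/generate_list.py` may differ in details such as file filtering and formatting:

```python
# Sketch of list generation: walk the class folders and write the two
# files described above. Illustrative only, not the repo's exact script.
import argparse
import os

parser = argparse.ArgumentParser()
parser.add_argument("--name", required=True, help="dataset name, e.g. caltech256")
parser.add_argument("--path", required=True, help="root folder of the dataset")
args = parser.parse_args()

os.makedirs("list", exist_ok=True)
classes = sorted(d for d in os.listdir(args.path)
                 if os.path.isdir(os.path.join(args.path, d)))

with open(f"list/{args.name}_list.txt", "w") as f_list, \
     open(f"list/{args.name}_name.txt", "w") as f_name:
    for idx, cls in enumerate(classes):
        f_name.write(f"{idx} {cls}\n")                 # class index and name
        for fname in sorted(os.listdir(os.path.join(args.path, cls))):
            f_list.write(f"{os.path.join(args.path, cls, fname)} {idx}\n")
```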
For training:

```bash
bash run/train.sh {model architecture, such as vgg16} {your dataset name}
# For example
bash run/train.sh vgg16 caltech256
```
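Under the hood, autoencoder training optimizes a per-pixel reconstruction loss. The sketch below assumes a plain MSE objective and a tiny stand-in model; see `train.py` for the repo's actual training loop:

```python
# Minimal sketch of the autoencoder training objective, assuming a plain
# per-pixel MSE reconstruction loss (stand-in model, not the repo's).
import torch
import torch.nn as nn

model = nn.Sequential(                     # tiny stand-in autoencoder
    nn.Conv2d(3, 16, 3, padding=1), nn.ReLU(),
    nn.Conv2d(16, 3, 3, padding=1), nn.Sigmoid(),
)
optimizer = torch.optim.SGD(model.parameters(), lr=0.01)
criterion = nn.MSELoss()

images = torch.rand(8, 3, 224, 224)        # stand-in batch
for step in range(3):
    recon = model(images)
    loss = criterion(recon, images)        # the target is the input itself
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    print(f"step {step}: loss {loss.item():.4f}")
```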
For evaluating a single checkpoint:

```bash
bash run/eval.sh {model architecture} {checkpoint path} {dataset name}
# For example
bash run/eval.sh vgg16 results/caltech256-vgg16/099.pth caltech101
```
For evaluating all checkpoints under a specific folder:

```bash
bash run/evalall.sh {model architecture} {checkpoints path} {dataset name}
# For example
bash run/evalall.sh vgg16 results/caltech256-vgg16/ caltech101
```

When all checkpoints have been evaluated, a scatter diagram `figs/evalall.jpg` will be generated to show the evaluation loss trend.
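Such a scatter diagram takes only a few lines of matplotlib; the sketch below uses placeholder loss values rather than the losses `eval.py` actually computes:

```python
# Sketch of plotting an evaluation-loss trend like figs/evalall.jpg.
# The loss values are hypothetical placeholders.
import os
import matplotlib.pyplot as plt

epochs = list(range(100))                             # one checkpoint per epoch
losses = [0.05 + 0.5 * (0.97 ** e) for e in epochs]   # placeholder losses

os.makedirs("figs", exist_ok=True)
plt.scatter(epochs, losses, s=10)
plt.xlabel("checkpoint (epoch)")
plt.ylabel("evaluation loss")
plt.savefig("figs/evalall.jpg")
```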
For the model architecture, we currently support vgg11, vgg13, vgg16, vgg19 and resnet18, resnet34, resnet50, resnet101, resnet152.
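For intuition, a VGG-like autoencoder pairs a convolutional encoder of stacked 3x3 convolutions and pooling with a mirrored upsampling decoder. The toy sketch below only illustrates the idea; the real models in `models/vgg.py` and `models/resnet.py` are deeper and configurable:

```python
# Toy VGG-like autoencoder (illustrative only, not the repo's models).
import torch
import torch.nn as nn

class TinyVGGAutoEncoder(nn.Module):
    def __init__(self):
        super().__init__()
        # Encoder: stacked 3x3 convs with max-pooling, VGG style.
        self.encoder = nn.Sequential(
            nn.Conv2d(3, 64, 3, padding=1), nn.ReLU(inplace=True),
            nn.MaxPool2d(2),                       # 224 -> 112
            nn.Conv2d(64, 128, 3, padding=1), nn.ReLU(inplace=True),
            nn.MaxPool2d(2),                       # 112 -> 56
        )
        # Decoder: mirror of the encoder with upsampling.
        self.decoder = nn.Sequential(
            nn.Upsample(scale_factor=2),           # 56 -> 112
            nn.Conv2d(128, 64, 3, padding=1), nn.ReLU(inplace=True),
            nn.Upsample(scale_factor=2),           # 112 -> 224
            nn.Conv2d(64, 3, 3, padding=1), nn.Sigmoid(),
        )

    def forward(self, x):
        return self.decoder(self.encoder(x))

x = torch.randn(1, 3, 224, 224)
print(TinyVGGAutoEncoder()(x).shape)  # torch.Size([1, 3, 224, 224])
```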
We provide several tools to better visualize the auto-encoder results.
`reconstruct.py`

Reconstruct images from the originals. This script samples 64 images from the validation list and saves the comparison results to `figs/reconstruction.jpg`.

```bash
python tools/reconstruct.py --arch {model architecture} --resume {checkpoint path} --val_list {*_list.txt of your dataset}
# For example
python tools/reconstruct.py --arch vgg16 --resume results/caltech256-vgg16/099.pth --val_list caltech101_list.txt
```
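The comparison image is essentially a grid that pairs each original with its reconstruction. A sketch of that layout, using an identity module as a stand-in for a trained autoencoder:

```python
# Sketch of the side-by-side comparison grid saved as figs/reconstruction.jpg.
# An identity module stands in for a trained autoencoder, so the
# "reconstructions" here equal the inputs; only the layout is illustrated.
import os
import torch
from torchvision.utils import save_image

model = torch.nn.Identity()
images = torch.rand(64, 3, 224, 224)       # stand-in for 64 sampled images
with torch.no_grad():
    recon = model(images)

# Pair each original with its reconstruction, side by side in the grid.
pairs = torch.stack([images, recon], dim=1).flatten(0, 1)
os.makedirs("figs", exist_ok=True)
save_image(pairs, "figs/reconstruction.jpg", nrow=16)
```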
`encode.py` and `decode.py`

Encode an image to a latent code, or decode latent codes back to images.

`encode.py` transforms a single image into a latent code.

```bash
python tools/encode.py --arch {model architecture} --resume {checkpoint path} --img_path {image path}
# For example
python tools/encode.py --arch vgg16 --resume results/caltech256-vgg16/099.pth --img_path figs/reconstruction.jpg
```
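Conceptually, encoding is just a forward pass through the encoder half. A sketch with a hypothetical stand-in encoder and standard preprocessing (not the checkpointed model that `tools/encode.py` loads):

```python
# Sketch of encoding a single image to a latent code. The encoder,
# preprocessing, and latent shape are hypothetical stand-ins.
import torch
import torch.nn as nn
from PIL import Image
from torchvision import transforms

encoder = nn.Sequential(
    nn.Conv2d(3, 8, 3, stride=2, padding=1), nn.ReLU(),
    nn.AdaptiveAvgPool2d(7),                # collapse to an 8x7x7 latent
)
preprocess = transforms.Compose([
    transforms.Resize((224, 224)),
    transforms.ToTensor(),
])

img = preprocess(Image.open("figs/reconstruction.jpg").convert("RGB"))
with torch.no_grad():
    latent = encoder(img.unsqueeze(0))      # shape: (1, 8, 7, 7)
print(latent.shape)
```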
`decode.py` transforms 128 random latent codes into images.

```bash
python tools/decode.py --arch {model architecture} --resume {checkpoint path}
# For example
python tools/decode.py --arch vgg16 --resume results/caltech256-vgg16/099.pth
```

The decoded results will be saved as `figs/generation.jpg`.
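Decoding goes the other way: random latent tensors are pushed through the decoder half to produce images. A sketch with a hypothetical latent shape and stand-in decoder (see `tools/decode.py` for the real flow):

```python
# Sketch of decoding random latent codes into an image grid like
# figs/generation.jpg. Latent shape and decoder are hypothetical stand-ins.
import os
import torch
import torch.nn as nn
from torchvision.utils import save_image

decoder = nn.Sequential(
    nn.Upsample(scale_factor=32),           # 7x7 -> 224x224
    nn.Conv2d(8, 3, 3, padding=1), nn.Sigmoid(),
)

latents = torch.randn(128, 8, 7, 7)         # 128 random latent codes
with torch.no_grad():
    images = decoder(latents)

os.makedirs("figs", exist_ok=True)
save_image(images, "figs/generation.jpg", nrow=16)
```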
We provide pretrained checkpoints trained on different datasets:

| Dataset | VGG11 | VGG13 | VGG16 | VGG19 | ResNet18 | ResNet34 | ResNet50 | ResNet101 | ResNet152 |
|---|---|---|---|---|---|---|---|---|---|
| Caltech256 | link | link | link | link | link | link | link | link | link |
| Objects365 | link | link | link | link | | | | | |
| ImageNet | link | | | | | | | | |
Note that the Objects365 dataset is about half the size of the ImageNet dataset (which contains roughly 1.28 million images) and much larger than Caltech256, so the performance may be comparable.
