Shizhan Zhu
Released on Oct 11, 2017
The complete demo is now updated. Please refer to the complete demo section below for details.
To facilitate future research, we provide the indexing of our selected subset of the DeepFashion Dataset (attribute prediction task). It consists of a .mat file containing a 78,979-dimensional indexing vector that points to indices in the full set (values between 1 and 289,222). We also provide the name list of the selected subset. Download the indexing here.
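As a rough illustration of how this indexing could be used, here is a hedged Python sketch. The .mat variable name (`ind`), the file names, and the name-list format are assumptions for illustration only; inspect the downloaded files (e.g. with `scipy.io.whosmat`) before relying on them.

```python
# Minimal sketch (Python + SciPy): map a subset position back to the full
# DeepFashion set. File names and the variable name 'ind' are hypothetical.
import scipy.io as sio

mat = sio.loadmat('ind.mat')      # hypothetical file name for the indexing
ind = mat['ind'].ravel()          # 78979-dim vector, values in [1, 289222]

# Hypothetical name list of the full attribute-prediction set, one path per line.
with open('full_set_name_list.txt') as f:
    full_names = [line.strip() for line in f]

# The k-th image of our subset in the full set
# (subtract 1 because the stored indices are 1-based, MATLAB style).
k = 0
full_index = int(ind[k]) - 1
print(full_names[full_index])
```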
This is the implementation of Shizhan Zhu et al.'s ICCV-17 work Be Your Own Prada: Fashion Synthesis with Structural Coherence. It is open source under the BSD-3 license (see the LICENSE file). The code can be used freely for academic purposes only. If you want to apply it to industrial products, please send an email to Shizhan Zhu at zhshzhutah2@gmail.com first.
The motivation of this work, as well as the training data used, comes from the DeepFashion dataset. Please cite the following papers if you use the code or data of this work:
@inproceedings{liuLQWTcvpr16DeepFashion,
  author = {Ziwei Liu and Ping Luo and Shi Qiu and Xiaogang Wang and Xiaoou Tang},
  title = {DeepFashion: Powering Robust Clothes Recognition and Retrieval with Rich Annotations},
  booktitle = {Proceedings of IEEE Conference on Computer Vision and Pattern Recognition (CVPR)},
  month = {June},
  year = {2016}
}
@inproceedings{zhu2017be,
  title = {Be Your Own Prada: Fashion Synthesis with Structural Coherence},
  author = {Zhu, Shizhan and Fidler, Sanja and Urtasun, Raquel and Lin, Dahua and Loy, Chen Change},
  booktitle = {Proceedings of the IEEE International Conference on Computer Vision (ICCV)},
  year = {2017}
}
Matrix Visualization: The samples shown in the same row are generated from the same original person, while the samples shown in the same column are generated from the same text description.
Walking the latent space: For each row, the first and the last images are the two samples between which we interpolate, gradually changing the input starting from the left image. In the first row, we interpolate only the input to the first stage, so the generated results change only in shape. In the second row, we interpolate only the input to the second stage, so the results change only in texture. In the last row, we interpolate the inputs to both stages, so the generated results transition smoothly from left to right.
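For reference, a latent-space walk of this kind is simply a convex combination of two latent codes fed to the generator. The sketch below is a generic NumPy illustration under that assumption; the function and variable names are ours and do not correspond to the interface of this repo.

```python
# Generic latent-space interpolation sketch (NumPy). Feeding each intermediate
# code to the generator of one stage (or both) produces the rows shown above.
import numpy as np

def interpolate(z_left, z_right, n_steps=8):
    """Linearly blend two latent codes and return the intermediate codes."""
    ts = np.linspace(0.0, 1.0, n_steps)
    return [(1.0 - t) * z_left + t * z_right for t in ts]

# Interpolating only the first-stage input changes shape, only the
# second-stage input changes texture, and both give the full smooth transition.
```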
The implementation is based on Torch. cuDNN is required.
- Step 1: Run the following command to obtain part of the training data and the off-the-shelf pre-trained models. Folders for the models are also created by this script.
sh download.sh
This part of the data contains all the new annotations (language descriptions and segmentation maps) on the subset of the DeepFashion dataset, as well as the benchmarking info (the train-test split and the image-language pairs of the test set). Compared to the full data, it does not contain G2.h5 (which you need to obtain as described in Step 2 below).
- Step 2: You can obtain G2.h5 in the same way as the DeepFashion dataset. Please refer to this page for detailed instructions (e.g. signing the agreement). After obtaining G2.h5, put it into the ./data_release/supervision_signals/ directory before you can use the code.
Formatting of the data stored in the .h5 files:
b_: The segmentation label for each image, e.g. 0 represents the background.
ih: The 128x128 images.
ih_mean: The mean image.
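If you want to inspect these fields yourself, the following is a minimal Python sketch using h5py. The dataset names and the file location come from this README, but the exact shapes and storage order inside G2.h5 are assumptions you should verify.

```python
# Minimal sketch for inspecting the supervision data in G2.h5 with h5py.
# Shapes and storage order are not documented here, so check them yourself.
import h5py

with h5py.File('./data_release/supervision_signals/G2.h5', 'r') as f:
    print(list(f.keys()))          # expected to include 'b_', 'ih', 'ih_mean'
    seg = f['b_']                  # per-image segmentation labels (0 = background)
    imgs = f['ih']                 # the 128x128 images
    mean_img = f['ih_mean'][...]   # the mean image
    print(seg.shape, imgs.shape, mean_img.shape)
```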
For any questions regarding obtaining the data (e.g. if you cannot download it through the Dropbox link), please send an email to zhshzhutah2@gmail.com.
All the testing code is in the demo_release folder. Our implementation provides three options for the second-stage GAN:
- Run demo_full.lua with this line uncommented. The network structure is our originally submitted version.
- Run demo_full.lua as it is. This adds the skip-connection technique proposed in Hourglass and pix2pix.
- Run demo_p2p.lua. The network structure completely follows pix2pix. The texture looks nice but cannot be controlled.
You can modify this block to switch between different types of visualization.
- To train the first-stage GAN, enter the sr1 folder and run the train.lua file.
- To train the second-stage GAN, enter the relevant folder and run the train.lua file. The folder ih1 corresponds to our original submission, ih1_skip to the second-stage network coupled with skip connections, and ih1_p2p to the pix2pix-based second stage.
With the complete demo, you can provide your own image and text description as inputs. Your input image is not required to be 128x128, but our output is always 128x128. Your input sentence is assumed to contain only words that our model knows.
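Since out-of-vocabulary words are not handled, you may want to check your sentence before running the demo. The sketch below is only a hedged illustration: the vocabulary file name and its format (one word per line) are assumptions about how the released language model stores its vocabulary, so adapt it to the actual files.

```python
# Hedged sketch: list words the model may not know. The vocabulary file name
# and one-word-per-line format are assumptions, not part of this release.
def unknown_words(sentence, vocab_path='vocabulary.txt'):
    with open(vocab_path) as f:
        vocab = {line.strip().lower() for line in f}
    return [w for w in sentence.lower().split() if w not in vocab]

print(unknown_words('a lady wearing a blue long-sleeved blouse'))
```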
To set up, go to the root directory of the repo and run the following commands:
sh download.sh
cd complete_demo
sh setup.sh
In addition, we need the OpenPose library to detect the bounding box of the person in the image. Please follow its instructions to install OpenPose appropriately.
Please make sure that Torch, PyTorch, and MATLAB are available on your system.
The complete demo can be run with the command OPENPOSE_DIR=/path/to/your/installed/openpose sh demo.sh. The input folder should contain at least two samples like the provided ones (our apologies; this is due to MATLAB's automatic squeezing of singleton dimensions). After running the demo, the results are stored in the output folder.
The complete demo uses the Dense CRF and OpenPose libraries, as well as the ATR and LIP datasets. Please cite these works if you use our complete demo.
Please let us know if your MATLAB installation does not support the cp2tform function. Thanks!
Please refer to the language folder for training and testing the initial language encoding model.
Suggestions and opinions on this work (both positive and negative) are greatly welcome. Please contact the author by sending an email to zhshzhutah2@gmail.com.
BSD-3; see the LICENSE file for details.