Global-Flow-Local-Attention

Website | ArXiv | Application

The source code for our paper "Deep Image Spatial Transformation for Person Image Generation" (to appear in CVPR2020)

Our model can be flexibly applied to tasks requiring spatial transformations such as:

Pose-Guided Person Image Generation:

Left: generated results of our model; Right: Input source images.

Pose-Guided Person Image Animation

From Left to Right: Real Video; Extracted Pose; Animation Results.

Face Image Animation

Left: Input image; Right: Output results.

View Synthesis

From Left to Right: Input image, Results of Appearance Flow, Results of Ours, Ground-truth images.

News

2020.3.15 We have uploaded the code and trained models for Face Animation and View Synthesis!
2020.3.3 Project Website and Paper are available!
2020.2.29 Code for PyTorch is available now!

Installation

Requirements

Python 3
pytorch (1.0.0)
CUDA
visdom

Conda installation

# 1. Create a conda virtual environment.
conda create -n gfla python=3.6 -y
source activate gfla

# 2. Install dependency
pip install -r requirement.txt

# 3. Build pytorch Custom CUDA Extensions
./setup.sh

Note: The current code has been tested on a Tesla V100. If you use a different GPU, you may need to select the correct nvcc_args for it when you build the Custom CUDA Extensions. Comment or uncomment the --gencode lines in block_extractor/setup.py, local_attn_reshape/setup.py, and resample2d_package/setup.py. Please check here for details.
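As a rough aid for choosing the right entry, the sketch below formats an nvcc `-gencode` argument pair from a compute capability. The helper name `gencode_args` is hypothetical and not part of this repository, and the two-token layout is an assumption; check it against the `nvcc_args` list in each `setup.py` before editing.

```python
def gencode_args(major: int, minor: int) -> list:
    """Format an nvcc -gencode argument pair for one compute capability.

    Assumption: the setup.py files list the option and its value as two
    separate strings. Adjust if the files in your checkout differ.
    """
    arch = f"{major}{minor}"
    return ["-gencode", f"arch=compute_{arch},code=sm_{arch}"]


# A Tesla V100 has compute capability 7.0.
print(gencode_args(7, 0))
```

If PyTorch is already installed, `torch.cuda.get_device_capability(0)` returns the `(major, minor)` pair for the first visible GPU, which can be passed to the helper directly.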

Pre-Trained Models

We provide the pre-trained weights of our models. These resources can be downloaded from:

Pose-Guided Person Image Generation
- Fashion
- Market
Pose-Guided Person Image Animation
- Coming Soon
Face Image Animation
- Face Animation
Novel View Synthesis
- ShapeNet Car
- ShapeNet Chair

Pose-Guided Person Image Generation

Dataset

Market1501

Download the Market-1501 dataset from here. Rename bounding_box_train and bounding_box_test as train and test, and put them under the ./dataset/market directory.
Download train/test key points annotations from Google Drive including market-pairs-train.csv, market-pairs-test.csv, market-annotation-train.csv, market-annotation-test.csv. Put these files under the ./dataset/market directory.
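The renaming step can also be scripted. The following is a minimal sketch, assuming the two extracted folders have already been copied into the dataset directory; `rename_market_splits` is an illustrative helper, not a script shipped with this repository.

```python
from pathlib import Path


def rename_market_splits(market_root: str) -> list:
    """Rename bounding_box_train/test to train/test inside market_root.

    Returns the new names of the folders that were actually renamed.
    """
    root = Path(market_root)
    renamed = []
    for old, new in (("bounding_box_train", "train"),
                     ("bounding_box_test", "test")):
        src, dst = root / old, root / new
        # Skip splits that are missing or that were renamed earlier.
        if src.is_dir() and not dst.exists():
            src.rename(dst)
            renamed.append(new)
    return renamed


if __name__ == "__main__":
    print(rename_market_splits("./dataset/market"))
```

Running it a second time is harmless: folders that were already renamed are skipped.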

DeepFashion

Download img_highres.zip of the DeepFashion Dataset from In-shop Clothes Retrieval Benchmark.
Unzip img_highres.zip. You will need to request the password from the dataset maintainers. Then put the obtained folder img_highres under the ./dataset/fashion directory.
Download train/test key points annotations and the dataset list from Google Drive including fashion-pairs-train.csv, fashion-pairs-test.csv, fashion-annotation-train.csv, fashion-annotation-test.csv, train.lst, test.lst. Put these files under the ./dataset/fashion directory.
Run the following code to split the dataset into train and test sets.
```
python script/generate_fashion_datasets.py
```
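The inputs listed in the steps above can be checked before running the split script. The following is a minimal sketch; `missing_entries` is a hypothetical helper, and the list of names is copied from the steps above rather than from the repository code.

```python
from pathlib import Path

# Names taken from the DeepFashion steps above. Add the test annotation
# file to this list as well once you have confirmed its name.
FASHION_ENTRIES = [
    "img_highres",
    "fashion-pairs-train.csv",
    "fashion-pairs-test.csv",
    "fashion-annotation-train.csv",
    "train.lst",
    "test.lst",
]


def missing_entries(root: str, names: list) -> list:
    """Return the names that do not exist under root, in the given order."""
    base = Path(root)
    return [name for name in names if not (base / name).exists()]


if __name__ == "__main__":
    missing = missing_entries("./dataset/fashion", FASHION_ENTRIES)
    print("missing:", missing if missing else "nothing")
```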

Evaluation

Download the trained weights from the Fashion and Market links. Put the obtained checkpoints under ./result/pose_fashion_checkpoints and ./result/pose_market_checkpoints, respectively.

Run the following commands to obtain the pose-based transformation results.

# Test the DeepFashion dataset 
python test.py \
--name=pose_fashion_checkpoints \
--model=pose \
--attn_layer=2,3 \
--kernel_size=2=5,3=3 \
--gpu_id=0 \
--dataset_mode=fashion \
--dataroot=./dataset/fashion \
--results_dir=./eval_results/fashion

# Test the market dataset
python test.py \
--name=pose_market_checkpoints \
--model=pose \
--attn_layer=2 \
--kernel_size=2=3 \
--gpu_id=0 \
--dataset_mode=market \
--dataroot=./dataset/market \
--results_dir=./eval_results/market
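The `--attn_layer` and `--kernel_size` values in the commands above appear to pair each attention layer with a local kernel size, so `2=5,3=3` would read as "layer 2 uses size 5, layer 3 uses size 3". The sketch below parses that string form under this assumption. Both helpers are illustrative and are not the parser used by `test.py`.

```python
def parse_kernel_size(spec: str) -> dict:
    """Parse 'layer=size' pairs such as '2=5,3=3' into {2: 5, 3: 3}."""
    sizes = {}
    for pair in spec.split(","):
        layer, size = pair.split("=")
        sizes[int(layer)] = int(size)
    return sizes


def layers_match(attn_layer: str, kernel_size: str) -> bool:
    """Return True when the layers in both options are the same set."""
    layers = {int(x) for x in attn_layer.split(",")}
    return layers == set(parse_kernel_size(kernel_size))


# Values from the DeepFashion command above.
print(parse_kernel_size("2=5,3=3"))
print(layers_match("2,3", "2=5,3=3"))
```

A check like `layers_match` can catch a mismatch between the two options before a long evaluation run starts.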

You can use the provided evaluation code to evaluate the performance of our models.

# Evaluate the performance (FID and LPIPS scores) over the DeepFashion dataset.
CUDA_VISIBLE_DEVICES=0 python -m script.metrics \
--gt_path=./dataset/fashion/test_256 \
--distorated_path=./eval_results/fashion \
--fid_real_path=./dataset/fashion/train_256 \
--name=./fashion

# Evaluate the performance (FID and LPIPS scores) over the Market dataset.
CUDA_VISIBLE_DEVICES=0 python -m script.metrics \
--gt_path=./dataset/market/test_12864 \
--distorated_path=./eval_results/market \
--fid_real_path=./dataset/market/train_12864 \
--name=./market_12864

Note:

We calculate the LPIPS scores using the code provided by the official repository PerceptualSimilarity. Please clone their repository and place the PerceptualSimilarity folder inside the script folder.
For FID, the real data distributions are calculated over the whole training set.

Training

We train our model in stages. The Flow Field Estimator is first trained to generate flow fields. Then we train the whole model in an end-to-end manner.

For example, if you want to train our model with the DeepFashion dataset, you can use the following code.

# First train the Flow Field Estimator.
python train.py \
--name=fashion \
--model=poseflownet \
--attn_layer=2,3 \
--kernel_size=2=5,3=3 \
--gpu_id=0 \
--dataset_mode=fashion \
--dataroot=./dataset/fashion 

# Then, train the whole model in an end-to-end manner.
python train.py \
--name=fashion \
--model=pose \
--attn_layer=2,3 \
--kernel_size=2=5,3=3 \
--gpu_id=0 \
--dataset_mode=fashion \
--dataroot=./dataset/fashion \
--continue_train
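The two stages above share every option except `--model` and `--continue_train`. The sketch below builds both commands from one shared argument list, which makes that relationship explicit. `train_command` is a hypothetical helper, and the reading that the shared `--name` lets the second stage resume from the first stage's checkpoints is an assumption based on the commands above.

```python
# Options shared by both stages, copied from the commands above.
COMMON_ARGS = [
    "--name=fashion",
    "--attn_layer=2,3",
    "--kernel_size=2=5,3=3",
    "--gpu_id=0",
    "--dataset_mode=fashion",
    "--dataroot=./dataset/fashion",
]


def train_command(model: str, continue_train: bool = False) -> list:
    """Build the argument list for one training stage."""
    cmd = ["python", "train.py", f"--model={model}"] + COMMON_ARGS
    if continue_train:
        cmd.append("--continue_train")
    return cmd


# Stage 1: Flow Field Estimator only. Stage 2: the whole model.
stages = [
    train_command("poseflownet"),
    train_command("pose", continue_train=True),
]
for cmd in stages:
    # To execute instead of print: subprocess.run(cmd, check=True)
    print(" ".join(cmd))
```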

Visdom is required to display the intermediate results. You can access them at:

http://localhost:8096
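If the page does not load, a quick first check is whether anything is listening on that port. The following is a minimal sketch using only the standard library; `port_is_open` is an illustrative helper, and the port number is taken from the URL above.

```python
import socket


def port_is_open(host: str, port: int, timeout: float = 1.0) -> bool:
    """Return True if a TCP connection to host:port succeeds."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts, and unreachable hosts.
        return False


if __name__ == "__main__":
    print(port_is_open("localhost", 8096))
```

A result of `False` suggests the visdom server is not running on this machine or is bound to a different port.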

Other Applications

Our model can be flexibly applied to tasks requiring spatial transformation. We show two examples: Image Animation and View Synthesis.

Image Animation

Given an input source image and a guidance video sequence depicting the structure movements, our model generates a video containing the specific movements. We use the real videos of the FaceForensics dataset. See IMAGE_ANIMATION.md for more details.

View synthesis

View synthesis requires generating novel views of objects or scenes based on arbitrary input views. In this task, we use the car and chair categories of the ShapeNet dataset. See VIEW_SYNTHESIS.md for more details.

Citation

@article{ren2020deep,
  title={Deep Image Spatial Transformation for Person Image Generation},
  author={Ren, Yurui and Yu, Xiaoming and Chen, Junming and Li, Thomas H and Li, Ge},
  journal={arXiv preprint arXiv:2003.00696},
  year={2020}
}

Acknowledgement

We build our project based on Vid2Vid. Some dataset preprocessing methods are derived from Pose-Transfer.
