This paper will be presented at the International Conference on Robotics and Automation (ICRA) 2018 (Brisbane, Australia) and will appear in the proceedings of IEEE Robotics and Automation Letters.
We devise an unsupervised learning algorithm that trains a Deep Convolutional Neural Network to estimate planar homographies. We compare the proposed algorithm to traditional feature-based and direct methods, as well as a corresponding supervised learning algorithm. Our empirical results demonstrate that, compared to traditional approaches, the unsupervised algorithm achieves faster inference speed while maintaining comparable or better accuracy and robustness to illumination variation. In addition, on both a synthetic dataset and a representative real-world aerial dataset, our unsupervised method has superior adaptability and performance compared to the supervised deep learning method.
If you use this code for research, please cite:
@InProceedings{nguyen2017unsupervised,
title={Unsupervised Deep Homography: A Fast and Robust Homography Estimation Model},
author={Nguyen, Ty and Chen, Steven W and Shivakumar, Shreyas S and Taylor, Camillo J and Kumar, Vijay},
booktitle={RA-L},
pages={},
year={2018},
organization={IEEE},
month = " ",
url = "https://arxiv.org/abs/1709.03966"
}
Building and using this code requires the following libraries and programs:
cuda 8.0.61 (required for gpu support)
python 2.7.12
tensorflow 1.2.1 (or higher)
opencv 3.4.0 (can be installed using: pip install opencv-python)
We built our system on Ubuntu 16.04. Both the CPU and GPU versions of TensorFlow work well; we installed them in a virtualenv. Other ways of installing TensorFlow have not been tested.
Install the required Python packages (pip is required):
source virtualenv_name/bin/activate
pip install -r requirements.txt
git clone https://github.com/tynguyen/unsupervisedDeepHomographyRAL2018.git
Download at
https://drive.google.com/drive/folders/1Y9oNgbJTrAdkgf5-T1xONtU9n2ZqwDta?usp=sharing
Then store synthetic_models in the models folder.
https://drive.google.com/drive/folders/16RI7R0EVayiXfYoP2Ahhl4yN2sWhG76Z?usp=sharing
Note: to use this trained model, your images must be formatted to the correct size. The following sections describe how to format the raw images.
Download the MS-COCO dataset: http://cocodataset.org/#download
We use 2014/Train to generate the training data and 2014/Testing to generate the test set. Store them in RAW_DATA_PATH and TEST_RAW_DATA_PATH, the directories declared in the synthetic data generation script.
In the file code/utils/gen_synthetic_data.py, set the important parameters as follows:
RHO = 45 # Maximum perturbation. The higher it is, the larger the displacement between
# the two generated images.
DATA_NUMBER = 100000 # Number of synthetic image pairs in the training dataset
TEST_DATA_NUMBER = 5000 # Number of synthetic image pairs in the test dataset
IM_PER_REAL = 2 # Generate 2 different synthetic images from each real image
# Size of synthetic image
HEIGHT = 240
WIDTH = 320
# Size of crop
PATCH_SIZE = 128
# Directories to MS-COCO images
RAW_DATA_PATH = "/Earthbyte/tynguyen/rawdata/train/" # Real images used for generating synthetic data
TEST_RAW_DATA_PATH = "/Earthbyte/tynguyen/rawdata/test/" # Real images used for generating test synthetic data
# Synthetic data directories
DATA_PATH = "/home/tynguyen/pose_estimation/data/synthetic/" + str(RHO) + '/'
I_DIR = DATA_PATH + 'I/' # First large image in one pair
I_PRIME_DIR = DATA_PATH + 'I_prime/' # Second large image in one pair
# Since all generated images will be stored at the same location, we need .txt files to
# maintain training images and test images
FILENAMES_FILE = os.path.join(DATA_PATH,'train_synthetic.txt') # List of training images
TEST_FILENAMES_FILE = os.path.join(DATA_PATH,'test_synthetic.txt') # List of test images
GROUND_TRUTH_FILE = os.path.join(DATA_PATH,'gt.txt') # Training set: ground truth of homography parameters (delta movement of 4 corners)
PTS1_FILE = os.path.join(DATA_PATH,'pts1.txt') # Training set: 4 corners on the first image
TEST_PTS1_FILE = os.path.join(DATA_PATH,'test_pts1.txt') # Test set: 4 corners on the first image
TEST_GROUND_TRUTH_FILE = os.path.join(DATA_PATH,'test_gt.txt') # Test set: ground truth of homography parameters (delta movement of 4 corners)
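The ground-truth files above store each homography as the displacement of the 4 patch corners (8 numbers) rather than as a 3x3 matrix. As a minimal sketch, the snippet below recovers the 3x3 matrix from such a parameterization by solving the standard 8x8 linear system. The function names and the corner ordering are assumptions for illustration, not the repository's code; in practice `cv2.getPerspectiveTransform` does the same job.

```python
def solve_linear(a, b):
    """Solve a * x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    # Augmented matrix [a | b], copied so the inputs are left untouched
    m = [list(map(float, row)) + [float(rhs)] for row, rhs in zip(a, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        if abs(m[pivot][col]) < 1e-12:
            raise ValueError("degenerate corner configuration")
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            f = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= f * m[col][c]
    # Back substitution on the upper-triangular system
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        s = sum(m[r][c] * x[c] for c in range(r + 1, n))
        x[r] = (m[r][n] - s) / m[r][r]
    return x


def homography_from_4pt(corners, deltas):
    """corners: 4 (x, y) points on the first image; deltas: 4 (dx, dy) offsets.

    Returns H as a 3x3 nested list (H[2][2] = 1) that maps each corner
    to corner + delta.
    """
    a, b = [], []
    for (x, y), (dx, dy) in zip(corners, deltas):
        u, v = x + dx, y + dy
        a.append([x, y, 1, 0, 0, 0, -u * x, -u * y])
        b.append(u)
        a.append([0, 0, 0, x, y, 1, -v * x, -v * y])
        b.append(v)
    h = solve_linear(a, b)
    return [h[0:3], h[3:6], h[6:8] + [1.0]]
```

For example, giving all four corners the same offset, `homography_from_4pt(corners, [(5, -3)] * 4)`, yields (up to rounding) a pure translation by (5, -3).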
It takes a few hours to generate 100,000 data samples. You can choose a smaller number for debugging.
python utils/gen_synthetic_data.py --mode train --num_data [number of data]
python utils/gen_synthetic_data.py --mode test
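The role of RHO, PATCH_SIZE, HEIGHT and WIDTH in the generation step can be sketched as follows. This is only an illustration under assumptions, not the code in gen_synthetic_data.py: it assumes the patch is placed so that every perturbed corner stays within the image bounds, and that each corner coordinate is shifted by an integer drawn uniformly from [-RHO, RHO]. `sample_perturbed_patch` is a hypothetical name.

```python
import random

HEIGHT, WIDTH = 240, 320
PATCH_SIZE = 128
RHO = 45


def sample_perturbed_patch(rng=random):
    """Sample one patch and its perturbed corners (assumed scheme)."""
    # Top-left corner, leaving a RHO-wide margin on every side
    x = rng.randint(RHO, WIDTH - PATCH_SIZE - RHO)
    y = rng.randint(RHO, HEIGHT - PATCH_SIZE - RHO)
    # Corners in the order top-left, top-right, bottom-right, bottom-left
    pts1 = [(x, y), (x + PATCH_SIZE, y),
            (x + PATCH_SIZE, y + PATCH_SIZE), (x, y + PATCH_SIZE)]
    # Ground truth: 4 corner displacements (dx, dy), each within [-RHO, RHO]
    deltas = [(rng.randint(-RHO, RHO), rng.randint(-RHO, RHO)) for _ in pts1]
    pts2 = [(px + dx, py + dy) for (px, py), (dx, dy) in zip(pts1, deltas)]
    return pts1, deltas, pts2
```

Under these assumptions, with RHO = 45 on a 240 x 320 image the patch origin can only move within x in [45, 147] and y in [45, 67], so a larger RHO needs a larger image or a smaller patch.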
During training and testing, you can visualize images either with TensorBoard or by passing --visual True to the Python scripts. TensorBoard is highly recommended since it slows down the run less than plotting with the --visual flag. For example:
python homography_CNN_synthetic.py --mode train --lr 5e-4 --loss_type h_loss --visual True
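How the scripts parse `--visual True` is not shown here. If you add similar flags to your own scripts, note that `argparse` with `type=bool` treats any non-empty string, including `False`, as true. A minimal sketch of one common workaround follows; `str2bool` and `build_parser` are hypothetical helpers, not part of this repository.

```python
import argparse


def str2bool(value):
    """Convert 'True'/'False' style command-line strings to a bool."""
    text = value.strip().lower()
    if text in ("true", "1", "yes"):
        return True
    if text in ("false", "0", "no"):
        return False
    raise argparse.ArgumentTypeError("expected True or False, got %r" % value)


def build_parser():
    """Parser with a mode switch and an explicit boolean flag."""
    parser = argparse.ArgumentParser()
    parser.add_argument("--mode", choices=["train", "test"], default="train")
    parser.add_argument("--visual", type=str2bool, default=False)
    return parser
```

With this parser, `--visual False` really disables plotting instead of silently enabling it.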
In the file code/homography_CNN_synthetic.py, set the important parameters as follows:
# Size of synthetic image and the perturbation range (RHO)
HEIGHT = 240
WIDTH = 320
RHO = 45 # Maximum perturbation. The higher it is, the larger the displacement between
# the two generated images. Change this value to evaluate different levels of displacement
PATCH_SIZE = 128
# Synthetic data directories
DATA_PATH = "/home/tynguyen/pose_estimation/data/synthetic/" + str(RHO) + '/'
I_DIR = DATA_PATH + 'I/' # First large image in one pair
I_PRIME_DIR = DATA_PATH + 'I_prime/' # Second large image in one pair
# Since all generated images will be stored at the same location, we need .txt files to
# maintain training images and test images
FILENAMES_FILE = os.path.join(DATA_PATH,'train_synthetic.txt') # List of training images
TEST_FILENAMES_FILE = os.path.join(DATA_PATH,'test_synthetic.txt') # List of test images
GROUND_TRUTH_FILE = os.path.join(DATA_PATH,'gt.txt') # Training set: ground truth of homography parameters (delta movement of 4 corners)
PTS1_FILE = os.path.join(DATA_PATH,'pts1.txt') # Training set: 4 corners on the first image
TEST_PTS1_FILE = os.path.join(DATA_PATH,'test_pts1.txt') # Test set: 4 corners on the first image
TEST_GROUND_TRUTH_FILE = os.path.join(DATA_PATH,'test_gt.txt') # Test set: ground truth of homography parameters (delta movement of 4 corners)
# Log and model directories
MAIN_LOG_PATH = '/media/tynguyen/'
LOG_DIR = MAIN_LOG_PATH + "docker_folder/pose_estimation/logs/"
MODEL_DIR = MAIN_LOG_PATH + "docker_folder/pose_estimation/models/"
# Where to save visualization images (for report)
RESULTS_DIR = MAIN_LOG_PATH + "docker_folder/pose_estimation/results/synthetic/report/"
# List of augmentations to the data.
AUGMENT_LIST = ['normalize'] # 'normalize': standardize images
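`'normalize'` is described above only as standardizing images. Below is a minimal sketch of per-image standardization (zero mean, unit variance), assuming that is what the augmentation does; the exact statistics used by the training script may differ. TensorFlow provides `tf.image.per_image_standardization` for the same purpose. `standardize` is a hypothetical name.

```python
import math


def standardize(pixels):
    """Return pixels shifted to zero mean and scaled to unit variance.

    pixels: flat list of intensity values from one image.
    """
    n = len(pixels)
    mean = sum(pixels) / float(n)
    var = sum((p - mean) ** 2 for p in pixels) / float(n)
    # Lower-bound the deviation so a constant image does not divide by zero
    std = max(math.sqrt(var), 1.0 / math.sqrt(n))
    return [(p - mean) / std for p in pixels]
```

Standardizing each image this way removes global brightness and contrast differences before the images reach the network.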
python homography_CNN_synthetic.py --mode train --lr 5e-4 --loss_type h_loss
python homography_CNN_synthetic.py --mode train --lr 1e-4 --loss_type l1_loss
python homography_CNN_synthetic.py --mode test --lr 5e-4 --loss_type h_loss
python homography_CNN_synthetic.py --mode test --lr 1e-4 --loss_type l1_loss
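The two `--loss_type` values are not defined in this document. As a sketch of the distinction they are assumed to select (based on the flag names and on the supervised/unsupervised comparison in the abstract), the snippet below contrasts an L2 error on the 8 corner displacements, which needs ground truth, with a mean absolute photometric error between two patches, which does not. The function names are hypothetical and the exact formulas in the training script may differ.

```python
def corner_l2_loss(pred_h4pt, gt_h4pt):
    """Half the squared L2 distance between predicted and true corner offsets."""
    return 0.5 * sum((p - g) ** 2 for p, g in zip(pred_h4pt, gt_h4pt))


def photometric_l1_loss(warped_patch, target_patch):
    """Mean absolute intensity difference between two equally sized patches.

    Both patches are flat lists of pixel intensities.
    """
    diffs = [abs(a - b) for a, b in zip(warped_patch, target_patch)]
    return sum(diffs) / float(len(diffs))
```

The first function compares homography parameters directly; the second only compares pixels after warping, which is what makes training without ground-truth labels possible.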
Due to company privacy restrictions, we cannot make our aerial dataset publicly available. However, readers may be interested in an alternative: https://github.com/OpenDroneMap/OpenDroneMap/tree/master/tests/test_data
These datasets are quite similar to ours.
For the supervised method, everything is the same as for the synthetic dataset: we use aerial images to generate synthetic images to train the model.
In our aerial dataset, images are recorded in time sequence. We therefore treat two consecutive images as a pair and generate several training samples from it by random cropping. Each training sample consists of a pair of (HEIGHT x WIDTH) images and a pair of corresponding crops.
As mentioned in the paper, we first resize each original pair of images from (FULL_HEIGHT x FULL_WIDTH) to (HEIGHT x WIDTH), then crop both resized images at the same location (y,x). From each pair of original images, we generate IM_PER_REAL training samples by keeping y constant and randomizing x (with max perturbation = RHO).
How to resize and crop depends heavily on the level of displacement between a pair of original images. Our aerial dataset features a large displacement, so we have to keep y constant and make a large crop (a large PATCH_SIZE/WIDTH ratio). However, there is still a border effect during warping: the warped crop of the second image has a black area near its edge. For better performance, one can move the crop window to the largest overlapping area of the two images rather than just center-cropping.
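The cropping scheme described above can be sketched as follows. This is an illustration under assumptions, not the code in gen_real_data.py: it assumes the reference crop is centered in the resized image, that x is shifted by an integer drawn uniformly from [-RHO, RHO], and that the same window is applied to both images of a pair. `sample_crop_windows` is a hypothetical name.

```python
import random

HEIGHT, WIDTH = 142, 190
PATCH_SIZE = 128
RHO = 24
IM_PER_REAL = 20


def sample_crop_windows(rng=random):
    """Return IM_PER_REAL (y, x) top-left crop positions for one image pair."""
    y = (HEIGHT - PATCH_SIZE) // 2  # kept constant for every sample
    x_center = (WIDTH - PATCH_SIZE) // 2
    windows = []
    for _ in range(IM_PER_REAL):
        x = x_center + rng.randint(-RHO, RHO)
        # Clamp so the PATCH_SIZE-wide window stays inside the image
        x = min(max(x, 0), WIDTH - PATCH_SIZE)
        windows.append((y, x))
    return windows
```

Each window (y, x) then selects `image[y:y + PATCH_SIZE, x:x + PATCH_SIZE]` from both resized images of the pair.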
RHO = 24 # Maximum perturbation range
DATA_NUMBER = 10000
TEST_DATA_NUMBER = 1000
IM_PER_REAL = 20 # Generate 20 different pairs of images from one single real image
# Size of synthetic image
HEIGHT = 142
WIDTH = 190
PATCH_SIZE = 128
FULL_HEIGHT = 480
FULL_WIDTH = 640
# Directories to files
RAW_DATA_PATH = "/Earthbyte/tynguyen/real_rawdata/joe_data/train/" # Real images used for generating real dataset
TEST_RAW_DATA_PATH = "/Earthbyte/tynguyen/real_rawdata/joe_data/test/" # Real images used for generating real test dataset
# Data directories
DATA_PATH = "/Earthbyte/tynguyen/docker_folder/pose_estimation/data/synthetic/" + str(RHO) + '/'
I_DIR = DATA_PATH + 'I/' # Resized image (HEIGHT x WIDTH)
I_PRIME_DIR = DATA_PATH + 'I_prime/' # Resized image (HEIGHT x WIDTH)
FULL_I_DIR = DATA_PATH + 'FULL_I/' # Full image size 480 x 640
FULL_I_PRIME_DIR = DATA_PATH + 'FULL_I_prime/' # Full image size 480 x 640
PTS1_FILE = os.path.join(DATA_PATH,'pts1.txt')
FILENAMES_FILE = os.path.join(DATA_PATH,'train_real.txt')
GROUND_TRUTH_FILE = os.path.join(DATA_PATH,'gt.txt')
TEST_PTS1_FILE = os.path.join(DATA_PATH,'test_pts1.txt')
TEST_FILENAMES_FILE = os.path.join(DATA_PATH,'test_real.txt')
# In real dataset, ground truth file consists of correspondences
# Each row in the file contains 8 numbers:[corr1, corr2]
TEST_GROUND_TRUTH_FILE = os.path.join(DATA_PATH,'test_gt.txt')
python utils/gen_real_data.py --mode train --num_data [number of training data]
python utils/gen_real_data.py --mode test --num_data [number of test data]
In the file homography_CNN_real.py, set the parameters as follows:
# Size of synthetic image and the perturbation range (RHO)
HEIGHT = 142
WIDTH = 190
RHO = 24
PATCH_SIZE = 128
# Full image size (used for displaying)
FULL_HEIGHT = 240
FULL_WIDTH = 320
# Data directories
DATA_PATH = "/home/tynguyen/pose_estimation/data/real/" + str(RHO) + '/'
I_DIR = DATA_PATH + 'I/' # Large image
I_PRIME_DIR = DATA_PATH + 'I_prime/' # Large image
PTS1_FILE = os.path.join(DATA_PATH,'pts1.txt')
FILENAMES_FILE = os.path.join(DATA_PATH,'train_real.txt')
GROUND_TRUTH_FILE = None # There is no ground truth during training
FULL_I_DIR = DATA_PATH + 'FULL_I/' # Full-size image
FULL_I_PRIME_DIR = DATA_PATH + 'FULL_I_prime/' # Full-size image
TEST_PTS1_FILE = os.path.join(DATA_PATH,'test_pts1.txt')
TEST_FILENAMES_FILE = os.path.join(DATA_PATH,'test_real.txt')
# Correspondences in test set
TEST_GROUND_TRUTH_FILE = os.path.join(DATA_PATH,'test_gt.txt')
# Log and model directories
MAIN_LOG_PATH = '/media/tynguyen/DATA/'
LOG_DIR = MAIN_LOG_PATH + "docker_folder/pose_estimation/logs/"
# Where to load model. This could be the location of the model trained on synthetic data
# or any other dataset
LOAD_MODEL_DIR = MAIN_LOG_PATH + "docker_folder/pose_estimation/models/"
# Where to save new model. This is the location of the fine-tuned model
SAVE_MODEL_DIR = MAIN_LOG_PATH + "docker_folder/pose_estimation/models/real_models/"
# Where to save visualization images (for report)
RESULTS_DIR = MAIN_LOG_PATH + "docker_folder/pose_estimation/results/synthetic/report/"
# list of augmentations to the data
AUGMENT_LIST = ['normalize']
For the supervised method, after generating a new set of synthetic images from the aerial dataset, change DATA_PATH in homography_CNN_synthetic.py accordingly and run:
python homography_CNN_synthetic.py --mode train --lr 5e-4 --loss_type h_loss
python homography_CNN_real.py --mode train --lr 1e-4 --loss_type l1_loss
Several options are available during training:
python homography_CNN_real.py --mode train --lr 1e-4 --loss_type l1_loss --finetune False
python homography_CNN_real.py --mode train --lr 1e-4 --loss_type l1_loss --finetune True
python homography_CNN_real.py --mode train --lr 1e-4 --loss_type l1_loss --resume True
python homography_CNN_real.py --mode train --lr 1e-4 --loss_type l1_loss --resume True --retrain True