This repository is for our CoRL 2019 paper:
Alan Wu, AJ Piergiovanni, and Michael S. Ryoo
"Model-based Behavioral Cloning with Future Image Similarity Learning"
in CoRL 2019
If you find this repository useful for your research, please cite our paper:
@inproceedings{wu2019fisl,
title={Model-based Behavioral Cloning with Future Image Similarity Learning},
booktitle={Conference on Robot Learning (CoRL)},
author={Wu, Alan and Piergiovanni, AJ and Ryoo, Michael S.},
year={2019}
}
We present a visual imitation learning framework that enables learning of robot action policies solely from expert samples, without any robot trials. Robot exploration and on-policy trials in a real-world environment can be expensive or dangerous. We address this problem by learning a future scene prediction model solely from a collection of expert trajectories consisting of unlabeled example videos and actions, and by enabling generalized action cloning using future image similarity. The robot learns to visually predict the consequences of taking an action, and obtains the policy by evaluating how similar the predicted future image is to an expert image. We develop a stochastic action-conditioned convolutional autoencoder and show how we take advantage of future images for robot learning. We conduct experiments in simulated and real-life environments using a ground mobility robot, with and without obstacles, and compare our models to multiple baseline methods.
Here is a sample of training videos from a real office environment with various targets:
And here is a sample of training videos from a simulated environment (Gazebo) with various obstacles:
Sample training data can be found in the folders /dataset/office_real and /dataset/gazebo_sim. The entire dataset can be downloaded by clicking the link here: Dataset. We use images of size 64x64.
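For a quick look at the sample data, here is a minimal loading sketch, assuming the frames are stored as standard image files under those folders; the glob pattern and file extension are assumptions rather than part of the released code.

```python
# Minimal sketch for loading sample frames and resizing them to 64x64.
# The *.png pattern is an assumption about how frames are stored on disk.
from pathlib import Path

import numpy as np
from PIL import Image


def load_images(folder, size=(64, 64)):
    """Load every .png under `folder`, resized, as a float array in [0, 1]."""
    frames = []
    for path in sorted(Path(folder).rglob("*.png")):
        img = Image.open(path).convert("RGB").resize(size)
        frames.append(np.asarray(img, dtype=np.float32) / 255.0)
    return np.stack(frames) if frames else np.empty((0, size[0], size[1], 3), dtype=np.float32)


images = load_images("dataset/office_real")  # shape (N, 64, 64, 3)
```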
Here is an illustration of the stochastic image predictor model. The model takes the current image and action as input, and also learns to generate a prior, z_t, that varies with the input sequence. This prior is concatenated with the state representation before the future image is predicted. Using the learned prior allows better modeling of stochastic environments and produces clearer images.
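Below is a minimal PyTorch sketch of a stochastic action-conditioned autoencoder in this spirit; the layer sizes, the Gaussian parameterization of the prior, and the module names are illustrative assumptions, not the exact architecture in this repository.

```python
# Sketch of a stochastic action-conditioned future-image predictor.
# All sizes and the Gaussian prior head are illustrative assumptions.
import torch
import torch.nn as nn


class ActionConditionedPredictor(nn.Module):
    def __init__(self, action_dim=2, z_dim=8, hidden=128):
        super().__init__()
        # Encode the 64x64x3 current image into an 8x8 feature map.
        self.encoder = nn.Sequential(
            nn.Conv2d(3, 32, 4, stride=2, padding=1), nn.ReLU(),       # 32x32
            nn.Conv2d(32, 64, 4, stride=2, padding=1), nn.ReLU(),      # 16x16
            nn.Conv2d(64, hidden, 4, stride=2, padding=1), nn.ReLU(),  # 8x8
        )
        # Learned prior over z_t, conditioned on the encoded current image.
        self.prior = nn.Sequential(nn.Flatten(), nn.Linear(hidden * 8 * 8, 2 * z_dim))
        # Decode the features concatenated with the action and sampled z_t.
        self.decoder = nn.Sequential(
            nn.ConvTranspose2d(hidden + action_dim + z_dim, 64, 4, stride=2, padding=1), nn.ReLU(),
            nn.ConvTranspose2d(64, 32, 4, stride=2, padding=1), nn.ReLU(),
            nn.ConvTranspose2d(32, 3, 4, stride=2, padding=1), nn.Sigmoid(),  # back to 64x64
        )

    def forward(self, image, action):
        feat = self.encoder(image)                     # (B, hidden, 8, 8)
        mu, logvar = self.prior(feat).chunk(2, dim=1)  # parameters of the prior over z_t
        z = mu + torch.randn_like(mu) * (0.5 * logvar).exp()
        b, _, h, w = feat.shape
        a_map = action.view(b, -1, 1, 1).expand(b, action.shape[1], h, w)
        z_map = z.view(b, -1, 1, 1).expand(b, z.shape[1], h, w)
        return self.decoder(torch.cat([feat, a_map, z_map], dim=1))


# Usage: predict 64x64 future images for a batch of (image, action) pairs.
model = ActionConditionedPredictor()
pred = model(torch.randn(4, 3, 64, 64), torch.randn(4, 2))  # (4, 3, 64, 64)
```

The action and the sampled latent are broadcast spatially and concatenated with the encoded features, mirroring the concatenation step described above.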
Predicted future images in the real-life lab (top) and simulation (bottom) environments when taking different actions. The top two rows of each environment show the deterministic model with linear and convolutional state representations, respectively; the bottom two rows show the stochastic model with linear and convolutional state representations, respectively. The center image of each row is the current image, with each successive image to the left corresponding to an additional -5° turn and each successive image to the right to an additional +5° turn.
Sample predicted images from the real and simulation datasets. From left to right: current image; true next image; deterministic linear; deterministic convolutional; stochastic linear; stochastic convolutional.
High-level description of the action taken for each row, starting from the top: turn right; move forward; move forward slightly; move forward and turn left; move forward and turn left.
High-level description of the action taken for each row, starting from the top: move forward and turn right; turn right slightly; turn right; move forward slightly; turn left slightly.
Using the stochastic future image predictor, we can generate realistic images to train a critic V_hat that helps select the optimal action:
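For intuition, here is a hedged sketch of how such a critic might be used at test time: each candidate action is passed through the future image predictor, the critic scores the predicted image, and the highest-scoring action is executed. The `predictor` and `critic` callables and the candidate action grid are hypothetical placeholders, not the actual interfaces in this repository.

```python
# Sketch of critic-based action selection. `predictor(image, action)` is assumed
# to return a predicted future image and `critic(image)` a scalar score; both
# are hypothetical placeholders for the models trained in this repository.
import torch


def select_action(predictor, critic, image, candidate_actions):
    """Return the candidate action whose predicted future image scores highest."""
    scores = []
    with torch.no_grad():
        for action in candidate_actions:
            future = predictor(image, action.unsqueeze(0))
            scores.append(critic(future).item())
    return candidate_actions[int(torch.tensor(scores).argmax())]


# Example candidate set: turn commands from -15° to +15° at a fixed forward velocity.
angles = torch.linspace(-15.0, 15.0, steps=7)
candidates = torch.stack([torch.tensor([0.1, angle.item()]) for angle in angles])
```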
We verified our future image prediction model and critic model in real-life and simulation environments. Here are some example trajectories from the real-life robot experiments compared against the baselines (Clone, Handcrafted Critic, and Forward Consistency). Our method is labeled Critic-FutSim-Stoch. The red ‘X’ marks the location of the target object and the blue ‘∗’ marks the end of each robot trajectory.
Our code has been tested on Ubuntu 16.04 with Python 3.5 and PyTorch 0.3.0 on a Titan X GPU.