This project is an initial attempt to apply Imitation Learning (IL) to the planning problem in Autonomous Driving (AD) or, more generally, to robot control problems with visual input. Specifically, on a task simplified from AD, I propose a zero-trial model-based IL algorithm that trains an agent to control a ground mobile robot on a target-approaching task. The algorithm combines an action-based future image predictor, an actor-critic framework, an IL reward function, and Partial Trajectory, a technique I invented to accommodate the limitations of the state-transition model when hallucinating long trajectories. I discuss the reasoning behind this problem setting and the advantages of my algorithm's components under it. Finally, I show experimentally that my method achieves performance comparable to a Behavioral Cloning (BC) agent on Dynamic Time Warping tests.
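As a rough illustration only (not the actual implementation, which lives in `DDPG_SQIL.py`), the Partial Trajectory idea of capping hallucinated rollouts at a short fixed horizon, so that compounding prediction error in the learned image predictor stays bounded, might look like the sketch below. The names `env` and `policy` are placeholders, and the horizon of 7 matches the episode length reported in the training logs.

```python
def rollout_partial(env, policy, horizon=7):
    """Roll the policy out in a learned 'dream' environment, but stop
    after a short fixed horizon so compounding prediction error in the
    hallucinated observations stays bounded."""
    obs = env.reset()
    trajectory = []
    for _ in range(horizon):
        action = policy(obs)
        next_obs, reward, done, info = env.step(action)  # old-Gym 4-tuple API
        trajectory.append((obs, action, reward, next_obs))
        obs = next_obs
        if done:
            break
    return trajectory
```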
- `future-image-similarity/` is adapted from the GitHub repo for the paper *Model-based Behavioral Cloning with Future Image Similarity Learning* (MBC).
- `pytorch_ssim/` is adapted from the GitHub repo.
- `soft_dtw.py` is adapted from the GitHub repo.
- The rest of the `.py` files are my own work.
- `DDPG.py` is my implementation of DDPG. It is tested in `Gym`'s InvertedPendulum-v0, HalfCheetah-v0, and CarRacing-v0.
- `DDPG_SQIL.py` is my core algorithm. It combines the MBC action-based future image predictor, the DDPG actor-critic framework, the SQIL sparse-reward function, and the Partial Trajectory technique.
- `DTW_test.py` runs the Dynamic Time Warping (DTW) tests, comparing the trajectories hallucinated in the image predictor by `DDPG_SQIL.py` and MBC against the expert trajectories in the test set.
- `GazeboAPI.py` is unfinished. I plan to use the `rospy` package to implement a bridge between the Gazebo simulation and my algorithm in `DDPG_SQIL.py`. Hyperparameter tuning of my algorithm will be done after `GazeboAPI.py` is implemented.
- `MBC.py` wraps MBC's critic to expose the same API as `DDPG_SQIL.py`.
- `Wrapper.py` is used to run `DDPG.py` in CarRacing-v0.
- `gaz_IL.py` rewrites the `class Gazebo` custom dataset class from `future-image-similarity/data/gaz_value.py` to use the dataset collected in Gazebo in a more general way.
- `migrate_models.py` loads saved models from `future-image-similarity/logs` and saves them with whatever PyTorch version you are using.
- `predictor_env_wrapper.py` contains `class DreamGazeboEnv`, which wraps MBC's image predictor and the dataset loaded with `gaz_IL.py` as a `Gym` environment.
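The SQIL reward function mentioned above follows the original SQIL recipe: transitions from expert demonstrations are stored with a constant reward of 1, the agent's own transitions with a constant reward of 0 (which is why the training log always shows `reward: 0.0`), and training batches are drawn half from each buffer. A minimal sketch, with hypothetical `(state, action, next_state)` tuples standing in for the real transitions:

```python
import random

def label_sqil(transitions, is_expert):
    """Attach SQIL's constant reward to (state, action, next_state) tuples:
    expert demonstrations get r = 1, the agent's own rollouts get r = 0."""
    r = 1.0 if is_expert else 0.0
    return [(s, a, r, s2) for (s, a, s2) in transitions]

def sample_sqil_batch(expert_buffer, agent_buffer, batch_size):
    """SQIL trains on batches drawn half from the expert buffer and
    half from the agent's own replay buffer."""
    half = batch_size // 2
    return (random.sample(expert_buffer, half)
            + random.sample(agent_buffer, batch_size - half))
```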
- Clone this repository.
- Download the dataset from MBC here (the dataset link from MBC is currently unavailable). Put the zip file in `future-image-similarity/data` and unzip it there.
- Install the conda environment with `conda env create -f environment.yml`.
- Docker image for the Gazebo API is not ready...
- Run training with `python DDPG_SQIL.py`. Expect training output like `Episode x, length: 7 timesteps, reward: 0.0, moving average reward: 0.0, time used: 10.4`. Every episode has length 7 because of my `partial_traj` technique, the reward of 0 is the SQIL on-policy reward, and each episode takes ~10 s on an NVIDIA GTX 1060 6GB GPU.
- Check training loss plots in real time with `tensorboard --logdir logs/expDDPG_SQIL.pyDreamGazebo`, and open the provided link in a browser. You can check/uncheck the runs you want to see in the lower-left corner. Remember to click the refresh button or enable automatic reloading in the upper-right corner. Expect both loss plots of the current run to track the `clipgrad_partialtraj` run.
- Run testing with `python DTW_test.py`. Only the `dream` test mode is implemented for now.
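For reference, the quantity behind these tests is the classic Dynamic Time Warping distance: the minimal cumulative cost of aligning two sequences while allowing stretches and compressions in time. A minimal dynamic-programming sketch over 1-D sequences is shown below; the actual tests use the Soft-DTW implementation adapted in `soft_dtw.py` on multi-dimensional trajectories.

```python
def dtw_distance(traj_a, traj_b, cost=lambda x, y: abs(x - y)):
    """Classic DTW: fill a (n+1) x (m+1) table of minimal cumulative
    alignment costs and return the bottom-right entry."""
    n, m = len(traj_a), len(traj_b)
    INF = float("inf")
    D = [[INF] * (m + 1) for _ in range(n + 1)]
    D[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            c = cost(traj_a[i - 1], traj_b[j - 1])
            D[i][j] = c + min(D[i - 1][j],      # repeat a point of traj_b
                              D[i][j - 1],      # repeat a point of traj_a
                              D[i - 1][j - 1])  # advance both
    return D[n][m]
```

Identical trajectories have distance 0, and repeating a point in one trajectory costs nothing, which is what makes DTW suitable for comparing trajectories traversed at different speeds.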