
Learning-from-a-Driving-Simulator

Zero-trial Model-based Imitation Learning with Partial Trajectory

Abstract

This project is an initial attempt to apply Imitation Learning (IL) to train an agent to tackle the planning problem in Autonomous Driving (AD) or, more generally, robot control problems with visual input. Specifically, on a task simplified from AD, I propose a zero-trial model-based IL algorithm that trains an agent to control a ground mobile robot on a target-approaching task. The algorithm combines an action-based future image predictor, an actor-critic framework, an IL reward function, and partial trajectory, a technique I introduce to accommodate the limitations of the state-transition model when hallucinating long trajectories. I discuss my reasons for choosing this problem setting and the advantages of my algorithm's components under it. Finally, I show experimentally that my method achieves performance comparable to a Behavioral Cloning (BC) agent on Dynamic Time Warping tests.
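The core idea is that the agent never acts in the real environment: rollouts are hallucinated inside the learned image predictor, started from expert frames, and cut to short partial trajectories so that prediction error does not accumulate. Below is a minimal, illustrative sketch of one such rollout; the names (predictor, actor, replay_buffer) are placeholders rather than the repository's exact API.

```python
import torch

PARTIAL_TRAJ_LEN = 7  # matches the fixed 7-step episodes noted under "Running"

def collect_partial_trajectory(predictor, actor, expert_frame, replay_buffer):
    """Hallucinate a short rollout inside the image predictor, starting from
    an expert frame, and store it with the SQIL on-policy reward of 0."""
    obs = expert_frame
    for _ in range(PARTIAL_TRAJ_LEN):
        with torch.no_grad():
            action = actor(obs)                # policy action for the current image
            next_obs = predictor(obs, action)  # action-conditioned future image
        # SQIL-style imitation reward: agent transitions get reward 0,
        # while expert transitions are stored separately with reward 1.
        replay_buffer.add(obs, action, 0.0, next_obs)
        obs = next_obs
```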

References

The rest of the .py files are my own work.

  • DDPG.py is my implementation of DDPG. It is tested on Gym's InvertedPendulum-v0, HalfCheetah-v0, and CarRacing-v0.
  • DDPG_SQIL.py is my core algorithm. It combines the MBC action-based future image predictor, the DDPG actor-critic framework, the SQIL sparse reward function, and the Partial Trajectory technique.
  • DTW_test.py is the code for running Dynamic Time Warping (DTW) tests, comparing the trajectories hallucinated in the image predictor by running DDPG_SQIL and MBC against the expert trajectories in the testing set.
  • GazeboAPI.py is unfinished. I plan to use the rospy package to implement a bridge between the Gazebo simulation and my algorithm in DDPG_SQIL.py. Hyperparameter tuning of my algorithm will be done after GazeboAPI.py is implemented.
  • MBC.py wraps MBC's critic to have the same API as DDPG_SQIL.py.
  • Wrapper.py is used to run DDPG.py in CarRacing-v0.
  • gaz_IL.py rewrites the Gazebo custom dataset class from future-image-similarity/data/gaz_value.py to use the dataset collected in Gazebo in a more general way.
  • migrate_models.py loads saved models from future-image-similarity/logs and saves them with whatever PyTorch version you are using.
  • predictor_env_wrapper.py contains the class DreamGazeboEnv, which wraps MBC's image predictor and the dataset loaded with gaz_IL.py as a Gym environment; a minimal sketch of this interface follows the list.
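As a rough illustration of the environment wrapper described above, the sketch below shows a predictor-backed Gym environment. The observation/action shapes, the sample_frame helper, and the predictor call signature are assumptions for illustration, not the exact API of predictor_env_wrapper.py.

```python
import gym
import numpy as np

class DreamEnvSketch(gym.Env):
    """Illustrative stand-in for DreamGazeboEnv: the learned image predictor
    plays the role of the simulator, so the agent needs zero real trials."""

    def __init__(self, predictor, expert_dataset, horizon=7):
        self.predictor = predictor     # action-conditioned future image predictor
        self.dataset = expert_dataset  # expert frames loaded via gaz_IL.py
        self.horizon = horizon         # partial-trajectory length
        self.observation_space = gym.spaces.Box(0.0, 1.0, shape=(3, 64, 64), dtype=np.float32)
        self.action_space = gym.spaces.Box(-1.0, 1.0, shape=(2,), dtype=np.float32)

    def reset(self):
        self._t = 0
        self._obs = self.dataset.sample_frame()  # start from a real expert frame
        return self._obs

    def step(self, action):
        self._obs = self.predictor(self._obs, action)  # hallucinate the next frame
        self._t += 1
        done = self._t >= self.horizon                 # cut off the partial trajectory
        return self._obs, 0.0, done, {}                # SQIL on-policy reward is 0
```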

Installation

  • Clone this repository.
  • Then download the dataset from MBC here (the dataset link from MBC is currently unavailable). Put the zip file in future-image-similarity/data and unzip it there.
  • Install the conda environment with conda env create -f environment.yml.
  • The Docker image for the Gazebo API is not ready yet.

Running

  • Run training with python DDPG_SQIL.py. Expect the training output to show lines like Episode x, length: 7 timesteps, reward: 0.0, moving average reward: 0.0, time used: 10.4. Every episode has a length of 7 because of my partial_traj technique, the 0 reward is the SQIL on-policy reward, and each episode takes ~10 s on an NVIDIA GTX 1060 6GB GPU.
  • Check the training loss plots in real time with tensorboard --logdir logs/expDDPG_SQIL.pyDreamGazebo and open the provided link in a browser. You can check/uncheck the runs you want to see in the lower-left corner. Remember to click the refresh button or set automatic reloading in the upper-right corner. Expect both loss plots of the current run to follow the clipgrad_partialtraj run.
  • Run testing with python DTW_test.py. Only the dream test mode is implemented for now; a simplified DTW sketch is shown after this list.
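For reference, the DTW comparison performed by DTW_test.py can be pictured with the plain dynamic-programming sketch below; representing each trajectory as an array of per-timestep features is an assumption made for illustration.

```python
import numpy as np

def dtw_distance(traj_a, traj_b):
    """Plain dynamic-programming DTW between two trajectories of shape [T, D];
    a simplified stand-in for the comparison DTW_test.py performs."""
    n, m = len(traj_a), len(traj_b)
    cost = np.full((n + 1, m + 1), np.inf)
    cost[0, 0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            d = np.linalg.norm(traj_a[i - 1] - traj_b[j - 1])  # local distance
            cost[i, j] = d + min(cost[i - 1, j],      # skip a step in traj_a
                                 cost[i, j - 1],      # skip a step in traj_b
                                 cost[i - 1, j - 1])  # match both steps
    return cost[n, m]

# e.g. comparing a hallucinated trajectory against an expert one:
# score = dtw_distance(hallucinated_traj, expert_traj)
```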
