Chenjia Bai et al., IEEE Transactions on Neural Networks and Learning Systems (TNNLS), 2021
This is a TensorFlow-based implementation of our paper "Variational Dynamic for Self-Supervised Exploration in Deep Reinforcement Learning".
VDM requires Python 3.6, tensorflow-gpu 1.13 or 1.14, tensorflow-probability 0.6.0, OpenAI Baselines, OpenAI Gym, and OpenAI Retro.
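To verify that the pinned TensorFlow stack is picked up correctly, a quick check such as the following can help (optional, not part of the repository):

    # Optional sanity check for the dependency versions listed above.
    import tensorflow as tf
    import tensorflow_probability as tfp

    print(tf.__version__)   # expected: 1.13.x or 1.14.x
    print(tfp.__version__)  # expected: 0.6.0
    assert tf.test.is_gpu_available(), "tensorflow-gpu did not find a usable GPU"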
The following commands should train the variational dynamic model (VDM) for the "noise MNIST" MDP.
cd dvae_model/noise_mnist
python noise_mnist_model.py
This command trains VDM for 500 epochs; in practice, 200 epochs are enough to obtain good results. The trained VDM weights are saved in model/. Then use the following command to perform the conditional generation process and reproduce the figure in our paper.
python noise_mnist_test.py
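For intuition, the following is a minimal, hypothetical sketch of the kind of conditional variational dynamics model that VDM trains on this MDP: a latent z is inferred from (s_t, a_t, s_{t+1}), a conditional prior is computed from (s_t, a_t), and the next frame is reconstructed from (s_t, a_t, z). Layer sizes, names, and shapes below are illustrative assumptions, not code from dvae_model/noise_mnist.

    # Illustrative conditional variational dynamics model (not the repository code).
    import tensorflow as tf
    import tensorflow_probability as tfp
    tfd = tfp.distributions

    def gaussian_head(h, dim, name):
        # Map features h to a diagonal Gaussian.
        mu = tf.layers.dense(h, dim, name=name + '_mu')
        sigma = tf.layers.dense(h, dim, tf.nn.softplus, name=name + '_sigma') + 1e-4
        return tfd.Normal(loc=mu, scale=sigma)

    obs = tf.placeholder(tf.float32, [None, 784])       # current frame s_t (flattened MNIST)
    act = tf.placeholder(tf.float32, [None, 10])        # one-hot action a_t
    next_obs = tf.placeholder(tf.float32, [None, 784])  # next frame s_{t+1}

    # Posterior q(z | s_t, a_t, s_{t+1}) and conditional prior p(z | s_t, a_t).
    posterior = gaussian_head(
        tf.layers.dense(tf.concat([obs, act, next_obs], -1), 256, tf.nn.relu), 32, 'post')
    prior = gaussian_head(
        tf.layers.dense(tf.concat([obs, act], -1), 256, tf.nn.relu), 32, 'prior')

    # Decoder p(s_{t+1} | s_t, a_t, z), trained with the usual ELBO.
    z = posterior.sample()
    logits = tf.layers.dense(
        tf.layers.dense(tf.concat([obs, act, z], -1), 256, tf.nn.relu), 784, name='dec')
    recon = tf.reduce_sum(
        tf.nn.sigmoid_cross_entropy_with_logits(labels=next_obs, logits=logits), -1)
    kl = tf.reduce_sum(tfd.kl_divergence(posterior, prior), -1)
    train_op = tf.train.AdamOptimizer(1e-4).minimize(tf.reduce_mean(recon + kl))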
The following command should train a pure exploration agent on "Breakout" with default experiment parameters.
python run.py --env BreakoutNoFrameskip-v4
The following command should train a pure exploration agent on "sticky Breakout" with a probability of 0.25
python run.py --env BreakoutNoFrameskip-v4 --stickyAtari
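Under the hood, sticky actions simply repeat the previous action with a fixed probability. A minimal sketch of such a wrapper (not necessarily the one used by --stickyAtari) looks like this:

    # Minimal sticky-action wrapper sketch: with probability p, repeat the last action.
    import gym
    import numpy as np

    class StickyActionEnv(gym.Wrapper):
        def __init__(self, env, p=0.25):
            super(StickyActionEnv, self).__init__(env)
            self.p = p
            self.last_action = 0

        def reset(self, **kwargs):
            self.last_action = 0
            return self.env.reset(**kwargs)

        def step(self, action):
            if np.random.uniform() < self.p:
                action = self.last_action  # ignore the agent and repeat the previous action
            self.last_action = action
            return self.env.step(action)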
Download the ROM of Super Mario from Google Drive, unzip it, and run the following commands to import the Mario ROM.
cd mario
python -m retro.import .
There are several levels in Super Mario. The level is specified in the function make_mario_env of wrappers.py.
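For reference, a hypothetical way to select a level with Gym Retro is shown below; the actual logic lives in make_mario_env in wrappers.py, and the game/state names here are assumptions about the imported ROM.

    # Hypothetical level selection via Gym Retro (illustrative, not the repo's wrapper).
    import retro

    def make_mario_env(level='Level1-1'):
        # Each Retro "state" corresponds to a starting level of Super Mario Bros.
        return retro.make(game='SuperMarioBros-Nes', state=level)

    env = make_mario_env('Level1-1')
    obs = env.reset()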
The following command should train a pure exploration agent in level 1 of Super Mario with default experiment parameters.
python run.py --env mario --env_kind mario
Download the ROM of Two-player Pong from Google Drive, unzip it, and run the following commands to import the Two-player Pong ROM.
cd multi-pong
python -m retro.import .
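For orientation, a hypothetical two-player Retro environment can be created as below; the repository's own wrapper may differ. With players=2, the sampled action covers both paddles and a reward is returned for each player.

    # Hypothetical two-player Pong environment (illustrative only).
    import retro

    env = retro.make(game='Pong-Atari2600', players=2)
    obs = env.reset()
    obs, rew, done, info = env.step(env.action_space.sample())  # rew has one entry per player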
We use VDM to train a real UR5 robot arm.
We develop a robot environment based on gym.Env to provide a Gym-like interface.
Our system uses RGB-D images taken by an Intel® RealSense™ D435 camera. We use a lightweight C++ executable built with librealsense SDK 2.0. The camera configuration process is as follows:
- Download and install librealsense SDK 2.0.
- Navigate to realsense and compile realsense.cpp:
  cd realsense
  cmake .
  make
- Connect your RealSense camera with a USB 3.0 compliant cable.
- To start the TCP server and RGB-D streaming, run the following (a hypothetical client-side reader is sketched after this list):
  ./realsense
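The robot environment then reads frames from this server over TCP. The snippet below is a hypothetical client-side sketch only: the host, port, resolution, and framing are assumptions, and the real protocol is defined in realsense.cpp.

    # Hypothetical client reading one RGB-D frame from the ./realsense TCP server.
    import socket
    import numpy as np

    HOST, PORT = '127.0.0.1', 50000   # assumed address of the ./realsense server
    H, W = 480, 640                   # assumed D435 stream resolution

    def recv_exact(sock, n):
        # Read exactly n bytes from the socket.
        buf = b''
        while len(buf) < n:
            chunk = sock.recv(n - len(buf))
            if not chunk:
                raise ConnectionError('TCP stream closed early')
            buf += chunk
        return buf

    with socket.create_connection((HOST, PORT)) as sock:
        color = np.frombuffer(recv_exact(sock, H * W * 3), dtype=np.uint8).reshape(H, W, 3)
        depth = np.frombuffer(recv_exact(sock, H * W * 2), dtype=np.uint16).reshape(H, W)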
We develop a robot environment and package it into Gym through the following steps:
- Download and install OpenAI Gym.
- Clone user into gym/envs.
- Add the following code to __init__.py:
  register(
      id='GymRobot-v1',
      entry_point='gym.envs.user:GymRobotPushEnv',
      max_episode_steps=1000,
  )
- After configuring the TCP connection, setting the action space, and connecting the camera, you can test the environment with the following commands; a skeletal sketch of the registered environment class is given after this list:
  import gym, baselines
  env = gym.make('GymRobot-v1')
  obs = env.reset()
  action = env.action_space.sample()
  next_obs, rew, done, info = env.step(action)
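Below is a skeletal, hypothetical illustration of the kind of gym.Env subclass that the registration above points to; the observation/action spaces and method bodies are placeholders, not the actual gym/envs/user code.

    # Skeletal placeholder for a registered robot environment (illustrative only).
    import gym
    import numpy as np
    from gym import spaces

    class GymRobotPushEnv(gym.Env):
        def __init__(self):
            # Assumed RGB-D-style observation and a small continuous action space.
            self.observation_space = spaces.Box(0, 255, shape=(84, 84, 4), dtype=np.uint8)
            self.action_space = spaces.Box(-1.0, 1.0, shape=(4,), dtype=np.float32)

        def reset(self):
            # Move the UR5 to its home pose and return the first camera observation.
            return self.observation_space.sample()

        def step(self, action):
            # Send the action to the UR5 controller, then grab a new RGB-D frame.
            obs = self.observation_space.sample()
            return obs, 0.0, False, {}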
The training code for the robot arm differs slightly from this repository because of the action type and the Gym wrapper. The code can be downloaded here.
The following command should train a pure exploration agent on UR5 robot arm.
python run.py --env GymRobot-v1 --env_kind GymRobot-v1
In every run, the robot starts with 3 objects placed in front of it. If either the robot completes 100 interactions or there are no objects in front of it, the objects are replaced manually. We save the model every 1000 interactions.
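A hypothetical sketch of this interaction procedure is shown below; the random policy and the checkpoint stub are placeholders, not the repository's training loop.

    # Illustrative interaction loop for the manual-reset procedure described above.
    import gym

    env = gym.make('GymRobot-v1')
    obs = env.reset()
    for step in range(1, 10001):
        action = env.action_space.sample()   # placeholder for the exploration policy
        obs, rew, done, info = env.step(action)
        if done or step % 100 == 0:          # done ~ no objects left in front of the robot
            input('Replace the 3 objects, then press Enter to continue...')
            obs = env.reset()
        if step % 1000 == 0:
            pass                             # save the model checkpoint here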
We use "Self-Supervised Exploration via Disagreement" (ICML 2019) as a baseline. The official code has been slightly modified to run on our robot arm.
- ICM: We use the official code of "Curiosity-driven Exploration by Self-supervised Prediction" (ICML 2017) and "Large-Scale Study of Curiosity-Driven Learning" (ICLR 2019).
- RFM: We use the official code of "Large-Scale Study of Curiosity-Driven Learning" (ICLR 2019).
- Disagreement: We use the official code of "Self-Supervised Exploration via Disagreement" (ICML 2019).