
Variational Dynamic for Self-Supervised Exploration in Deep Reinforcement Learning

Chenjia Bai et al., IEEE Transactions on Neural Networks and Learning Systems (TNNLS), 2021

Website

Introduction

This is a TensorFlow-based implementation of our paper "Variational Dynamic for Self-Supervised Exploration in Deep Reinforcement Learning".

Prerequisites

VDM requires Python 3.6, tensorflow-gpu 1.13 or 1.14, tensorflow-probability 0.6.0, OpenAI Baselines, OpenAI Gym, and OpenAI Retro.
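
For example, one possible setup with pip (version pins follow the requirements above; Baselines is installed from its GitHub repository):

pip install tensorflow-gpu==1.14 tensorflow-probability==0.6.0
pip install gym gym-retro
pip install git+https://github.com/openai/baselines.git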

Installation and Usage

Noise MNIST

The following commands should train the variational dynamic model (VDM) for the "noise MNIST" MDP.

cd dvae_model/noise_mnist
python noise_mnist_model.py

This command trains VDM for 500 epochs; in practice, 200 epochs is enough to get good results. The VDM weights are saved in model/. Then use the following command to run the conditional generation process and reproduce the figure in our paper.

python noise_mnist_test.py

Atari games

The following command should train a pure exploration agent on "Breakout" with default experiment parameters.

python run.py --env BreakoutNoFrameskip-v4

Atari games with sticky actions

The following command should train a pure exploration agent on "sticky Breakout" with a probability of 0.25

python run.py --env BreakoutNoFrameskip-v4 --stickyAtari
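
For intuition, sticky actions mean that at each step the environment repeats the previous action with probability 0.25 instead of executing the agent's choice. A minimal sketch of such a wrapper (illustrative only; not the repo's actual --stickyAtari implementation):

import gym
import numpy as np

class StickyActionWrapper(gym.Wrapper):
    """At each step, repeat the previous action with probability p."""

    def __init__(self, env, p=0.25):
        super().__init__(env)
        self.p = p
        self.last_action = 0

    def reset(self, **kwargs):
        self.last_action = 0
        return self.env.reset(**kwargs)

    def step(self, action):
        if np.random.uniform() < self.p:
            action = self.last_action   # the agent's chosen action is ignored this step
        self.last_action = action
        return self.env.step(action)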

Super Mario

Download the ROM of Super Mario from Google Drive, unzip it, and run the following commands to import the ROM of Mario.

cd mario 
python -m retro.import .

There are several levels in Super Mario. The level is specified in the function make_mario_env in wrappers.py.
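
For reference, level selection in Gym Retro looks roughly like the sketch below (the game id and state name are the standard gym-retro ones; check which states your imported ROM provides):

import retro

# The `state` argument selects the savestate (i.e., the level) to start from.
env = retro.make(game='SuperMarioBros-Nes', state='Level1-1')
obs = env.reset()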

The following command should train a pure exploration agent in level 1 of Super Mario with default experiment parameters.

python run.py --env mario --env_kind mario

Two-player Pong

Download the ROM of Two-player Pong from Google Drive, unzip it, and run the following commands to import the ROM of Two-player Pong.

cd multi-pong 
python -m retro.import .

Running on a Real Robot (UR5)

We use VDM to train a real UR5 robot arm. We develop a robot environment based on gym.Env to provide the same interface as Gym.
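
As a minimal sketch of that interface (the spaces, shapes, and method bodies below are illustrative placeholders, not the actual robot code, which is in the separate download described under "Run VDM"):

import gym
import numpy as np

class GymRobotPushEnv(gym.Env):
    """Illustrative Gym-style skeleton for the UR5 pushing task."""

    def __init__(self):
        # Hypothetical spaces: an RGB-D observation and a continuous push action.
        self.observation_space = gym.spaces.Box(low=0, high=255, shape=(84, 84, 4), dtype=np.uint8)
        self.action_space = gym.spaces.Box(low=-1.0, high=1.0, shape=(4,), dtype=np.float32)

    def reset(self):
        # Move the arm back to its home pose, then return an observation.
        return self._get_obs()

    def step(self, action):
        # Execute the push primitive on the robot, then observe the result.
        obs = self._get_obs()
        return obs, 0.0, False, {}   # pure exploration: the extrinsic reward is unused

    def _get_obs(self):
        # Placeholder: in the real environment this fetches an RGB-D frame
        # from the camera TCP server described below.
        return np.zeros(self.observation_space.shape, dtype=np.uint8)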

Setting up Camera System

Our system uses the RGB-D images taken by an Intel® RealSense™ D435 camera. We use a lightweight C++ executable built on librealsense SDK 2.0. The camera is configured as follows.

  1. Download and install librealsense SDK 2.0

  2. Navigate to realsense and compile realsense.cpp:

    cd realsense
    cmake .
    make
  3. Connect your RealSense camera with a USB 3.0 compliant cable

  4. To start the TCP server and RGB-D streaming, run the following:

    ./realsense
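
For illustration only, a client could consume the stream roughly as follows; the port and frame layout here are hypothetical placeholders and must match what realsense.cpp actually sends:

import socket
import numpy as np

HOST, PORT = '127.0.0.1', 50000   # hypothetical address of the realsense TCP server
FRAME_BYTES = 640 * 480 * 4       # hypothetical size of one raw RGB-D frame

with socket.create_connection((HOST, PORT)) as sock:
    buf = b''
    while len(buf) < FRAME_BYTES:              # read until one full frame arrives
        chunk = sock.recv(FRAME_BYTES - len(buf))
        if not chunk:
            raise ConnectionError('stream closed before a full frame was received')
        buf += chunk
    frame = np.frombuffer(buf, dtype=np.uint8).reshape(480, 640, 4)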

Setting up Robot-Gym

We develop a robot environment and package it in Gym through the following steps:

  1. Download and install OpenAI Gym

  2. Clone the user folder into gym/envs

  3. Add the following code to gym/envs/__init__.py:

     register(
         id='GymRobot-v1',
         entry_point='gym.envs.user:GymRobotPushEnv',
         max_episode_steps=1000,
     )
  4. After configuring the TCP connection, setting the action space, and connecting the camera, you can test the environment with the following code:

    import gym                            # GymRobot-v1 was registered above
    env = gym.make('GymRobot-v1')
    obs = env.reset()                     # reset the arm and get an initial observation
    action = env.action_space.sample()    # random action for a quick smoke test
    next_obs, rew, done, info = env.step(action)

Run VDM

The training code for the robot arm differs slightly from this repository because of the action type and the Gym wrapper. The code can be downloaded here.

The following command should train a pure exploration agent on the UR5 robot arm.

python run.py --env GymRobot-v1 --env_kind GymRobot-v1

In every run, the robot starts with 3 objects placed in front of it. If the robot completes 100 interactions or no objects remain in front of it, the objects are replaced manually. We save the model every 1000 interactions.

We use "Self-Supervised Exploration via Disagreement" (ICML 2019) as a baseline. The official code has been slightly modified to run on our robot arm.

Baselines

  • ICM: We use the official code of "Curiosity-driven Exploration by Self-supervised Prediction" (ICML 2017) and "Large-Scale Study of Curiosity-Driven Learning" (ICLR 2019).
  • RFM: We use the official code of "Large-Scale Study of Curiosity-Driven Learning" (ICLR 2019).
  • Disagreement: We use the official code of "Self-Supervised Exploration via Disagreement" (ICML 2019).
