
Variational Dynamic for Self-Supervised Exploration in Deep Reinforcement Learning

Chenjia Bai et al., IEEE Transactions on Neural Networks and Learning Systems (TNNLS), 2021

Website

Introduction

This is a TensorFlow-based implementation of our paper "Variational Dynamic for Self-Supervised Exploration in Deep Reinforcement Learning".

Prerequisites

VDM requires Python 3.6, tensorflow-gpu 1.13 or 1.14, tensorflow-probability 0.6.0, OpenAI Baselines, OpenAI Gym, and OpenAI Retro.
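
For example, one possible setup with pip (version pins follow the requirements above; Baselines is installed from its GitHub repository):

pip install tensorflow-gpu==1.14 tensorflow-probability==0.6.0
pip install gym gym-retro
pip install git+https://github.com/openai/baselines.git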

Installation and Usage

Noise MNIST

The following commands should train the variational dynamic model (VDM) for the "noise MNIST" MDP.

cd dvae_model/noise_mnist
python noise_mnist_model.py

This command trains VDM for 500 epochs; in practice, 200 epochs is enough to get good results. The VDM weights are saved in model/. Then use the following command to run the conditional generation process and reproduce the figure in our paper.

python noise_mnist_test.py

Atari games

The following command should train a pure exploration agent on "Breakout" with default experiment parameters.

python run.py --env BreakoutNoFrameskip-v4

Atari games with sticky actions

The following command should train a pure exploration agent on "sticky Breakout" with a probability of 0.25

python run.py --env BreakoutNoFrameskip-v4 --stickyAtari
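
For intuition, sticky actions mean that at each step the environment repeats the previous action with probability 0.25 instead of executing the agent's choice. A minimal sketch of such a wrapper (illustrative only; not the repo's actual --stickyAtari implementation):

import gym
import numpy as np

class StickyActionWrapper(gym.Wrapper):
    """At each step, repeat the previous action with probability p."""

    def __init__(self, env, p=0.25):
        super().__init__(env)
        self.p = p
        self.last_action = 0

    def reset(self, **kwargs):
        self.last_action = 0
        return self.env.reset(**kwargs)

    def step(self, action):
        if np.random.uniform() < self.p:
            action = self.last_action   # the agent's chosen action is ignored this step
        self.last_action = action
        return self.env.step(action)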

Super Mario

Download the ROM of Super Mario from Google Drive, unzip it, and run the following commands to import the ROM of Mario.

cd mario 
python -m retro.import .

There are several levels in Super Mario. The level is specified in the function make_mario_env in wrappers.py.
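
For reference, level selection in Gym Retro looks roughly like the sketch below (the game id and state name are the standard gym-retro ones; check which states your imported ROM provides):

import retro

# The `state` argument selects the savestate (i.e., the level) to start from.
env = retro.make(game='SuperMarioBros-Nes', state='Level1-1')
obs = env.reset()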

The following command should train a pure exploration agent in level 1 of Super Mario with default experiment parameters.

python run.py --env mario --env_kind mario

Two-player Pong

Download the ROM of Two-player Pong from Google Drive, unzip it, and run the following commands to import the ROM of Two-player Pong.

cd multi-pong 
python -m retro.import .

Running on a Real Robot (UR5)

We use VDM to train a real UR5 robot arm. We develop a robot environment based on gym.Env to provide the same interface as Gym.
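
As a minimal sketch of that interface (the spaces, shapes, and method bodies below are illustrative placeholders, not the actual robot code, which is in the separate download described under "Run VDM"):

import gym
import numpy as np

class GymRobotPushEnv(gym.Env):
    """Illustrative Gym-style skeleton for the UR5 pushing task."""

    def __init__(self):
        # Hypothetical spaces: an RGB-D observation and a continuous push action.
        self.observation_space = gym.spaces.Box(low=0, high=255, shape=(84, 84, 4), dtype=np.uint8)
        self.action_space = gym.spaces.Box(low=-1.0, high=1.0, shape=(4,), dtype=np.float32)

    def reset(self):
        # Move the arm back to its home pose, then return an observation.
        return self._get_obs()

    def step(self, action):
        # Execute the push primitive on the robot, then observe the result.
        obs = self._get_obs()
        return obs, 0.0, False, {}   # pure exploration: the extrinsic reward is unused

    def _get_obs(self):
        # Placeholder: in the real environment this fetches an RGB-D frame
        # from the camera TCP server described below.
        return np.zeros(self.observation_space.shape, dtype=np.uint8)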

Setting up Camera System

Our system uses the RGB-D images taken by an Intel® RealSense™ D435 camera. We use a lightweight C++ executable built on librealsense SDK 2.0. The camera is configured as follows.

  1. Download and install librealsense SDK 2.0

  2. Navigate to realsense and compile realsense.cpp:

    cd realsense
    cmake .
    make
  3. Connect your RealSense camera with a USB 3.0 compliant cable

  4. To start the TCP server and RGB-D streaming, run the following:

    ./realsense
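
For illustration only, a client could consume the stream roughly as follows; the port and frame layout here are hypothetical placeholders and must match what realsense.cpp actually sends:

import socket
import numpy as np

HOST, PORT = '127.0.0.1', 50000   # hypothetical address of the realsense TCP server
FRAME_BYTES = 640 * 480 * 4       # hypothetical size of one raw RGB-D frame

with socket.create_connection((HOST, PORT)) as sock:
    buf = b''
    while len(buf) < FRAME_BYTES:              # read until one full frame arrives
        chunk = sock.recv(FRAME_BYTES - len(buf))
        if not chunk:
            raise ConnectionError('stream closed before a full frame was received')
        buf += chunk
    frame = np.frombuffer(buf, dtype=np.uint8).reshape(480, 640, 4)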

Setting up Robot-Gym

We develop a robot environment and package it in Gym through the following steps:

  1. Download and install OpenAI Gym

  2. Clone the user folder into gym/envs

  3. Add the following code to gym/envs/__init__.py:

     register(
         id='GymRobot-v1',
         entry_point='gym.envs.user:GymRobotPushEnv',
         max_episode_steps=1000,
     )
  4. After configuring the TCP connection, setting the action space, and connecting the camera, you can test the environment with the following code:

    import gym                            # GymRobot-v1 was registered above
    env = gym.make('GymRobot-v1')
    obs = env.reset()                     # reset the arm and get an initial observation
    action = env.action_space.sample()    # random action for a quick smoke test
    next_obs, rew, done, info = env.step(action)

Run VDM

The training code for the robot arm differs slightly from this repository because of the action type and the Gym wrapper. The code can be downloaded here.

The following command should train a pure exploration agent on the UR5 robot arm.

python run.py --env GymRobot-v1 --env_kind GymRobot-v1

In every run, the robot starts with 3 objects placed in front of it. If the robot completes 100 interactions or no objects remain in front of it, the objects are replaced manually. We save the model every 1000 interactions.

We use "Self-Supervised Exploration via Disagreement" (ICML 2019) as a baseline. The official code has been slightly modified to run on our robot arm.

Baselines

  • ICM: We use the official code of "Curiosity-driven Exploration by Self-supervised Prediction" (ICML 2017) and "Large-Scale Study of Curiosity-Driven Learning" (ICLR 2019).
  • RFM: We use the official code of "Large-Scale Study of Curiosity-Driven Learning" (ICLR 2019).
  • Disagreement: We use the official code of "Self-Supervised Exploration via Disagreement" (ICML 2019).
