
Implementation of IMPALA with Distributed Tensorflow

Information

  • These results were obtained with only 32 actor threads.
  • A total of 32 CPUs were used: 4 environments per game and 8 games trained in total.
  • TensorFlow implementation.
  • A DQN-style model is used for action inference.
  • Distributed TensorFlow is used to implement the actors.
  • Trained for 1 day.
  • Hyperparameters match the paper (see the config sketch after this list):
start learning rate = 0.0006
end learning rate = 0
learning frames = 1e6
gradient clip norm = 40
trajectory length = 20
batch size = 32
reward clipping = [-1, 1]
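
For reference, the learning rate schedule above can be sketched in TensorFlow 1.x. The variable names are illustrative, and the linear (polynomial) decay from the start to the end learning rate over the learning frames is an assumption in line with the IMPALA paper:

import tensorflow as tf

# Hypothetical names; values taken from the list above.
START_LEARNING_RATE = 0.0006
END_LEARNING_RATE = 0.0
LEARNING_FRAMES = int(1e6)

# The global step acts as the frame counter in this sketch.
num_frames = tf.train.get_or_create_global_step()
# Anneal the learning rate linearly from 0.0006 to 0 over 1e6 frames.
learning_rate = tf.train.polynomial_decay(
    START_LEARNING_RATE, num_frames, LEARNING_FRAMES, END_LEARNING_RATE)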

Dependencies

tensorflow==1.14.0
gym[atari]
numpy
tensorboardX
opencv-python

Overall Schema

Model Architecture

How to Run

  • See start.sh.
  • 8 games are trained at once, each using 4 environments (a sketch of the distributed setup follows).
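
A minimal sketch of the distributed TensorFlow setup behind start.sh; the job names, ports, and actor count below are assumptions, not the repo's exact configuration:

import tensorflow as tf

NUM_ACTORS = 4  # hypothetical actor count for this sketch

# One learner process and several actor processes share a cluster spec.
cluster = tf.train.ClusterSpec({
    'learner': ['localhost:2222'],
    'actor': ['localhost:{}'.format(2223 + i) for i in range(NUM_ACTORS)],
})

# Each process starts its own server, e.g. actor 0:
server = tf.train.Server(cluster, job_name='actor', task_index=0)

# Shared variables live on the learner; actors read them to act, while the
# learner trains on trajectories collected by the actors.
with tf.device('/job:learner/task:0'):
    global_step = tf.train.get_or_create_global_step()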

Results

Video

(Gameplay videos: Breakout, Pong, Seaquest, Space-Invader, Boxing, Star-Gunner, Kung-Fu, Demon.)

Plotting

(Training reward curves with abs_one reward clipping.)

Comparison of reward clipping methods

Video

(Pong gameplay videos: abs_one vs. soft_asymmetric reward clipping.)

Plotting

(Training curves for abs_one and soft_asymmetric reward clipping.)
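
The two clipping schemes compared above can be sketched as follows. The soft_asymmetric form shown here follows deepmind/scalable_agent (reference 2), so treat it as an assumption about this repo's exact implementation:

import tensorflow as tf

def clip_rewards(rewards, mode='abs_one'):
    # abs_one: hard clipping to [-1, 1].
    if mode == 'abs_one':
        return tf.clip_by_value(rewards, -1.0, 1.0)
    # soft_asymmetric: squash rewards with tanh and down-weight negatives.
    if mode == 'soft_asymmetric':
        squeezed = tf.tanh(rewards / 5.0)
        return tf.where(rewards < 0, 0.3 * squeezed, squeezed) * 5.0
    raise ValueError('unknown reward clipping mode: {}'.format(mode))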

Is Attention Really Working?

(Attention visualization over a game frame.)
  • The blocks at the top are ignored.
  • The ball and the paddle receive attention.
  • Some empty space also receives attention, likely because the model is under-trained.

Todo

  • CPU-only training method
  • Distributed TensorFlow
  • Model fix to prevent collapse
  • Reward clipping experiment
  • Parameter copying from the global learner (see the sketch after this list)
  • Add relational reinforcement learning
  • Add action information to the model
  • Multi-task learning
  • Add a recurrent model
  • Training on GPU, inference on CPU
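
For the parameter copying item, one common pattern in TF 1.x is to build assign ops from the learner's variables onto an actor's local copies; the scope names below are hypothetical:

import tensorflow as tf

def build_copy_ops(learner_scope='learner', actor_scope='actor'):
    # Hypothetical scope names: pair variables by position (sorted by name)
    # and copy learner weights into the actor's local network.
    learner_vars = sorted(
        tf.get_collection(tf.GraphKeys.TRAINABLE_VARIABLES, learner_scope),
        key=lambda v: v.name)
    actor_vars = sorted(
        tf.get_collection(tf.GraphKeys.TRAINABLE_VARIABLES, actor_scope),
        key=lambda v: v.name)
    return tf.group(*[tf.assign(dst, src)
                      for src, dst in zip(learner_vars, actor_vars)])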

References

  1. IMPALA: Scalable Distributed Deep-RL with Importance Weighted Actor-Learner Architectures
  2. deepmind/scalable_agent
  3. Asynchronous Advantage Actor-Critic (A3C)