- These results were obtained using only 32 threads.
- A total of 32 CPUs were used, with 4 environments configured per game and 8 games trained in total.
- TensorFlow implementation
- Uses a DQN model to infer actions
- Uses distributed TensorFlow to implement the actors
- Trained for 1 day
- Same hyperparameters as the paper:
start learning rate = 0.0006
end learning rate = 0
learning frame = 1e6
gradient clip norm = 40
trajectory = 20
batch size = 32
reward clipping = -1 ~ 1
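A minimal sketch, assuming TensorFlow 1.x graph mode, of how the parameters above could be wired together; the stand-in network, placeholder names, and the RMSProp optimizer (as used in the IMPALA paper) are illustrative assumptions rather than this repository's exact code.

```python
import tensorflow as tf

start_learning_rate = 0.0006
end_learning_rate = 0.0
learning_frame = int(1e6)
clip_norm = 40.0
batch_size = 32

global_step = tf.train.get_or_create_global_step()

# Learning rate decays linearly from 0.0006 to 0 over 1e6 frames.
learning_rate = tf.train.polynomial_decay(
    start_learning_rate, global_step,
    learning_frame, end_learning_rate, power=1.0)

# Rewards are clipped to [-1, 1] (abs_one clipping).
rewards = tf.placeholder(tf.float32, [batch_size])
clipped_rewards = tf.clip_by_value(rewards, -1.0, 1.0)

# Stand-in network and loss, used only to show gradient clipping by norm 40.
states = tf.placeholder(tf.float32, [batch_size, 4])
values = tf.squeeze(tf.layers.dense(states, 1), axis=1)
loss = tf.reduce_mean(tf.square(values - clipped_rewards))

optimizer = tf.train.RMSPropOptimizer(learning_rate)
grads, params = zip(*optimizer.compute_gradients(loss))
clipped_grads, _ = tf.clip_by_global_norm(grads, clip_norm)
train_op = optimizer.apply_gradients(zip(clipped_grads, params),
                                     global_step=global_step)
```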
tensorflow==1.14.0
gym[atari]
numpy
tensorboardX
opencv-python
- See start.sh for how training is started.
- Learns 8 games at a time, with 4 environments per game.
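A sketch of how the 8 games with 4 environments each could map onto 32 actor tasks plus one learner in distributed TensorFlow; the port numbers, job names, and Gym environment ids below are illustrative assumptions, not taken from start.sh.

```python
import tensorflow as tf

# 8 games, each assigned to 4 actor tasks -> 32 actors in total.
games = [
    'BreakoutDeterministic-v4', 'PongDeterministic-v4',
    'SeaquestDeterministic-v4', 'SpaceInvadersDeterministic-v4',
    'BoxingDeterministic-v4', 'StarGunnerDeterministic-v4',
    'KungFuMasterDeterministic-v4', 'DemonAttackDeterministic-v4',
]
num_envs_per_game = 4

actor_hosts = ['localhost:{}'.format(2223 + i)
               for i in range(len(games) * num_envs_per_game)]
cluster = tf.train.ClusterSpec({
    'learner': ['localhost:2222'],  # single global learner
    'actor': actor_hosts,           # 32 actor tasks
})

def run_actor(task_index):
    """Join the cluster as one actor and pick its game by task index."""
    server = tf.train.Server(cluster, job_name='actor', task_index=task_index)
    env_name = games[task_index // num_envs_per_game]
    # From here, build the inference graph, create the Gym environment
    # for env_name, and stream trajectories to the learner.
    return server, env_name
```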
- Per-game result plots: Breakout, Pong, Seaquest, Space-Invader, Boxing, Star-Gunner, Kung-Fu, Demon.
- Comparison plots for the two reward clipping schemes: abs_one and soft_asymmetric.
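The two reward clipping schemes being compared can be written in a few lines; the abs_one form is the -1 ~ 1 clipping listed in the hyperparameters, and the soft_asymmetric form shown here follows the formulation from DeepMind's open-source IMPALA (scalable_agent) code, which this repository is assumed to mirror.

```python
import tensorflow as tf

def clip_rewards(rewards, mode='abs_one'):
    """Sketch of the two reward clipping variants compared above."""
    if mode == 'abs_one':
        # Hard clip into [-1, 1].
        return tf.clip_by_value(rewards, -1.0, 1.0)
    if mode == 'soft_asymmetric':
        # Squash with tanh and give negative rewards less weight.
        squeezed = tf.tanh(rewards / 5.0)
        return tf.where(rewards < 0.0, 0.3 * squeezed, squeezed) * 5.0
    raise ValueError('unknown reward clipping mode: {}'.format(mode))
```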
- The blocks above are ignored.
- The ball and the bar (paddle) receive attention.
- Empty space also receives attention because it is less trained.
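A minimal sketch of the kind of self-attention over convolutional features that produces attention maps like the ones described above, roughly in the style of relational reinforcement learning; the single attention head and layer sizes are assumptions.

```python
import tensorflow as tf

def relational_attention(conv_features, key_dim=64):
    """conv_features: [batch, height, width, channels] feature map."""
    shape = tf.shape(conv_features)
    batch, height, width = shape[0], shape[1], shape[2]
    channels = conv_features.get_shape().as_list()[-1]

    # Treat every spatial position as an entity.
    entities = tf.reshape(conv_features, [batch, height * width, channels])

    queries = tf.layers.dense(entities, key_dim)
    keys = tf.layers.dense(entities, key_dim)
    values = tf.layers.dense(entities, key_dim)

    # Scaled dot-product attention; `weights` is what gets visualised
    # as an attention map over the screen.
    logits = tf.matmul(queries, keys, transpose_b=True)
    logits /= tf.sqrt(tf.cast(key_dim, tf.float32))
    weights = tf.nn.softmax(logits)
    attended = tf.matmul(weights, values)
    return attended, weights
```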
- CPU-only training method
- Distributed TensorFlow
- Model fix to prevent collapse
- Reward Clipping Experiment
- Parameter copying from the global learner (see the sketch after this list)
- Add Relational Reinforcement Learning
- Add Action Information to the Model
- Multi-Task Learning
- Add Recurrent Model
- Training on GPU, Inference on CPU
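For the parameter-copying item above, a sketch under assumed variable scope names ('learner' for the global learner, 'actor' for a local copy); it relies on both scopes building their variables in the same order, and the real code may organize variables differently.

```python
import tensorflow as tf

def build_copy_op(learner_scope='learner', actor_scope='actor'):
    """Assign the global learner's weights into a local actor's copies."""
    learner_vars = tf.get_collection(tf.GraphKeys.TRAINABLE_VARIABLES,
                                     scope=learner_scope)
    actor_vars = tf.get_collection(tf.GraphKeys.TRAINABLE_VARIABLES,
                                   scope=actor_scope)
    # Variables are paired by creation order within each scope.
    copy_ops = [dst.assign(src) for src, dst in zip(learner_vars, actor_vars)]
    return tf.group(*copy_ops)
```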