Hello,
I am currently trying to apply the IMPALA algorithm to the Pong-v0 environment.
I first tested my own V-trace code on CartPole-v0 and obtained the maximum score (https://github.com/kimbring2/minecraft_ai/blob/master/CartPole-v0_IMPALA.ipynb).
The problem is that when I use the same code on Pong-v0 (https://github.com/kimbring2/minecraft_ai/blob/master/Pong-v0_IMPALA.ipynb), I cannot yet reach the score reported in the IMPALA paper.
For comparison, it usually takes 1 hour to reach the maximum score when I use a standard A2C algorithm.
I assume I need to change some hyperparameters. Could you give me some hints about which ones?
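For context, here is a minimal NumPy sketch of the V-trace target computation as described in the IMPALA paper. It is not the exact code from the notebooks linked above. The function name `vtrace_targets`, its argument names, and the default clipping thresholds of 1.0 are assumptions made for illustration only.

```python
import numpy as np

def vtrace_targets(log_rhos, discounts, rewards, values, bootstrap_value,
                   clip_rho=1.0, clip_c=1.0):
    """Hypothetical helper: V-trace targets for one trajectory of length T.

    log_rhos: log(pi(a|x) / mu(a|x)) for each step, shape [T]
    discounts: per-step discount (0 at episode ends), shape [T]
    rewards, values: shape [T]; bootstrap_value: V(x_T), a scalar
    """
    log_rhos = np.asarray(log_rhos, dtype=float)
    discounts = np.asarray(discounts, dtype=float)
    rewards = np.asarray(rewards, dtype=float)
    values = np.asarray(values, dtype=float)

    rhos = np.exp(log_rhos)
    clipped_rhos = np.minimum(clip_rho, rhos)  # truncated weights for the TD error
    cs = np.minimum(clip_c, rhos)              # truncated "trace cutting" weights

    # V(x_{t+1}) for every step, using the bootstrap value after the last step.
    values_tp1 = np.append(values[1:], bootstrap_value)
    deltas = clipped_rhos * (rewards + discounts * values_tp1 - values)

    # Backward recursion: vs_t - V(x_t) = delta_t + gamma_t * c_t * (vs_{t+1} - V(x_{t+1}))
    acc = 0.0
    vs_minus_v = np.zeros_like(values)
    for t in reversed(range(len(values))):
        acc = deltas[t] + discounts[t] * cs[t] * acc
        vs_minus_v[t] = acc
    vs = values + vs_minus_v

    # Policy-gradient advantage uses vs_{t+1} as the bootstrap target.
    vs_tp1 = np.append(vs[1:], bootstrap_value)
    pg_advantages = clipped_rhos * (rewards + discounts * vs_tp1 - values)
    return vs, pg_advantages
```

When the behaviour and target policies are identical (`log_rhos` all zero), the targets reduce to ordinary n-step returns, which is a quick sanity check before tuning anything else.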
Thank you