diff --git a/README.MD b/README.MD
index 031baa2..25e11a4 100644
--- a/README.MD
+++ b/README.MD
@@ -72,7 +72,7 @@ python main.py --env Pong-v0 --workers 32
 #A3C-GPU
 *training using machine with 4 V100 GPUs and 20core CPU for PongDeterministic-v4 took 10 minutes to converge*
 
-To train agent in PongDeterministic-v4 environment with 32 different worker threads with new A3C-GPU:
+To train agent in PongDeterministic-v4 environment with 32 different worker threads on 4 GPUs with new A3G:
 
 ```
 python main.py --env PongDeterministic-v4 --workers 32 --gpu-ids 0 1 2 3 --amsgrad True
@@ -88,7 +88,7 @@ To run a 100 episode gym evaluation with trained model
 ```
 python gym_eval.py --env Pong-v0 --num-episodes 100
 ```
-*Notice BeamRiderNoFrameskip-v4 reaches scores over 50,000 in less than 3hrs of training compared to the gym v0 version this shows the difficulty of those versions but also the timelimit being a major factor in score level*
+*Notice BeamRiderNoFrameskip-v4 reaches scores over 50,000 in less than 2hrs of training compared to the gym v0 version this shows the difficulty of those versions but also the timelimit being a major factor in score level*
 
 *These training charts were done on a DGX Station using 4GPUs and 20core Cpu. I used 36 worker agents and a tau of 0.92 which is the lambda in Generalized Advantage Estimation equation to introduce more variance due to the more deterministic nature of using just a 4 frame skip environment and a 0-30 NoOp start*
 ![BeamRider Training](https://github.com/dgriff777/rl_a3c_pytorch/blob/master/demo/Figure_2-1.png)
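
The training-charts note in the second hunk says the tau of 0.92 is the lambda in the Generalized Advantage Estimation equation, used to introduce more variance. As a rough illustration of what that parameter controls, here is a minimal GAE sketch; the function and variable names are made up for this example and are not code from the repo:

```
def gae_advantages(rewards, values, gamma=0.99, tau=0.92):
    # Hypothetical helper for illustration only -- not this repo's implementation.
    # `values` holds V(s_t) for t = 0..T, so it has one more entry than `rewards`.
    # tau plays the role of lambda in GAE: higher tau keeps more of the long,
    # high-variance multi-step returns; lower tau leans on the value estimates.
    advantages = [0.0] * len(rewards)
    gae = 0.0
    for t in reversed(range(len(rewards))):
        delta = rewards[t] + gamma * values[t + 1] - values[t]  # one-step TD residual
        gae = delta + gamma * tau * gae
        advantages[t] = gae
    return advantages
```

Pushing tau toward 1 keeps more of the longer multi-step returns in the advantage estimate, which is the extra variance the note refers to for the more deterministic 4-frame-skip, 0-30 NoOp-start environment.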