Commit

update
dgriff777 authored Feb 24, 2018
1 parent 50414e7 commit 10f19fa
Showing 1 changed file with 2 additions and 2 deletions.
4 changes: 2 additions & 2 deletions README.MD
@@ -72,7 +72,7 @@ python main.py --env Pong-v0 --workers 32
#A3C-GPU
*Training on a machine with 4 V100 GPUs and a 20-core CPU, PongDeterministic-v4 took 10 minutes to converge*

- To train agent in PongDeterministic-v4 environment with 32 different worker threads with new A3C-GPU:
+ To train agent in PongDeterministic-v4 environment with 32 different worker threads on 4 GPUs with new A3G:

```
python main.py --env PongDeterministic-v4 --workers 32 --gpu-ids 0 1 2 3 --amsgrad True
```

@@ -88,7 +88,7 @@ To run a 100 episode gym evaluation with trained model
```
python gym_eval.py --env Pong-v0 --num-episodes 100
```
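The 100-episode evaluation above can be sketched as a plain reset/step loop. This is a hypothetical illustration, not the repo's actual `gym_eval.py`: a stub environment stands in for `gym.make(...)` so the sketch runs without gym installed, and `agent_act` is a placeholder policy.

```python
class StubEnv:
    """Minimal stand-in for a gym environment (classic reset/step API)."""
    def __init__(self, episode_len=10):
        self.episode_len = episode_len
        self.t = 0

    def reset(self):
        self.t = 0
        return 0  # observation

    def step(self, action):
        self.t += 1
        done = self.t >= self.episode_len
        return 0, 1.0, done, {}  # obs, reward, done, info

def evaluate(env, agent_act, num_episodes=100):
    """Run num_episodes episodes and return the mean score."""
    scores = []
    for _ in range(num_episodes):
        obs, done, total = env.reset(), False, 0.0
        while not done:
            obs, reward, done, _ = env.step(agent_act(obs))
            total += reward
        scores.append(total)
    return sum(scores) / len(scores)
```

With a real environment, `StubEnv()` would be replaced by `gym.make("Pong-v0")` and `agent_act` by the trained model's greedy action.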
- *Notice BeamRiderNoFrameskip-v4 reaches scores over 50,000 in less than 3hrs of training compared to the gym v0 version this shows the difficulty of those versions but also the timelimit being a major factor in score level*
+ *Notice BeamRiderNoFrameskip-v4 reaches scores over 50,000 in less than 2hrs of training compared to the gym v0 version this shows the difficulty of those versions but also the timelimit being a major factor in score level*

*These training charts were done on a DGX Station using 4 GPUs and a 20-core CPU. I used 36 worker agents and a tau of 0.92 (the lambda in the Generalized Advantage Estimation equation) to introduce more variance, due to the more deterministic nature of a 4-frame-skip environment with a 0-30 NoOp start*
![BeamRider Training](https://github.com/dgriff777/rl_a3c_pytorch/blob/master/demo/Figure_2-1.png)
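The tau mentioned above is the lambda parameter of Generalized Advantage Estimation. As a minimal sketch (not this repo's actual implementation), GAE accumulates discounted TD residuals backwards through a rollout; `rewards` and `values` are hypothetical per-step lists, with `values` carrying one extra bootstrap entry for the final state:

```python
def gae(rewards, values, gamma=0.99, tau=0.92):
    """Compute per-step advantages with Generalized Advantage Estimation.

    values must have len(rewards) + 1 entries: the last one is the
    bootstrap value estimate of the state after the final step.
    tau (lambda) trades bias for variance; tau=1 recovers plain
    discounted returns minus the value baseline, tau=0 the one-step TD error.
    """
    advantages = []
    gae_acc = 0.0
    for t in reversed(range(len(rewards))):
        # One-step TD residual at time t.
        delta = rewards[t] + gamma * values[t + 1] - values[t]
        # Exponentially weighted sum of residuals, decayed by gamma * tau.
        gae_acc = delta + gamma * tau * gae_acc
        advantages.insert(0, gae_acc)
    return advantages
```

A higher tau such as the 0.92 used here keeps more of the longer-horizon residuals, which adds variance, the stated goal for the more deterministic frame-skip environment.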
