Commit

update
dgriff777 authored Feb 24, 2018
1 parent 50414e7 commit 10f19fa
Showing 1 changed file with 2 additions and 2 deletions.
4 changes: 2 additions & 2 deletions README.MD
@@ -72,7 +72,7 @@ python main.py --env Pong-v0 --workers 32
#A3C-GPU
*Training on a machine with 4 V100 GPUs and a 20-core CPU, PongDeterministic-v4 took 10 minutes to converge*

- To train agent in PongDeterministic-v4 environment with 32 different worker threads with new A3C-GPU:
+ To train agent in PongDeterministic-v4 environment with 32 different worker threads on 4 GPUs with new A3G:

```
python main.py --env PongDeterministic-v4 --workers 32 --gpu-ids 0 1 2 3 --amsgrad True
```

@@ -88,7 +88,7 @@ To run a 100 episode gym evaluation with trained model
```
python gym_eval.py --env Pong-v0 --num-episodes 100
```
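The 100-episode evaluation above can be sketched as a plain reset/step loop. This is a hypothetical illustration, not the repo's actual `gym_eval.py`: a stub environment stands in for `gym.make(...)` so the sketch runs without gym installed, and `agent_act` is a placeholder policy.

```python
class StubEnv:
    """Minimal stand-in for a gym environment (classic reset/step API)."""
    def __init__(self, episode_len=10):
        self.episode_len = episode_len
        self.t = 0

    def reset(self):
        self.t = 0
        return 0  # observation

    def step(self, action):
        self.t += 1
        done = self.t >= self.episode_len
        return 0, 1.0, done, {}  # obs, reward, done, info

def evaluate(env, agent_act, num_episodes=100):
    """Run num_episodes episodes and return the mean score."""
    scores = []
    for _ in range(num_episodes):
        obs, done, total = env.reset(), False, 0.0
        while not done:
            obs, reward, done, _ = env.step(agent_act(obs))
            total += reward
        scores.append(total)
    return sum(scores) / len(scores)
```

With a real environment, `StubEnv()` would be replaced by `gym.make("Pong-v0")` and `agent_act` by the trained model's greedy action.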
- *Notice BeamRiderNoFrameskip-v4 reaches scores over 50,000 in less than 3hrs of training compared to the gym v0 version this shows the difficulty of those versions but also the timelimit being a major factor in score level*
+ *Notice BeamRiderNoFrameskip-v4 reaches scores over 50,000 in less than 2hrs of training compared to the gym v0 version this shows the difficulty of those versions but also the timelimit being a major factor in score level*

*These training charts were done on a DGX Station using 4 GPUs and a 20-core CPU. I used 36 worker agents and a tau of 0.92 (the lambda in the Generalized Advantage Estimation equation) to introduce more variance, due to the more deterministic nature of a 4-frame-skip environment with a 0-30 NoOp start*
![BeamRider Training](https://github.com/dgriff777/rl_a3c_pytorch/blob/master/demo/Figure_2-1.png)
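The tau mentioned above is the lambda parameter of Generalized Advantage Estimation. As a minimal sketch (not this repo's actual implementation), GAE accumulates discounted TD residuals backwards through a rollout; `rewards` and `values` are hypothetical per-step lists, with `values` carrying one extra bootstrap entry for the final state:

```python
def gae(rewards, values, gamma=0.99, tau=0.92):
    """Compute per-step advantages with Generalized Advantage Estimation.

    values must have len(rewards) + 1 entries: the last one is the
    bootstrap value estimate of the state after the final step.
    tau (lambda) trades bias for variance; tau=1 recovers plain
    discounted returns minus the value baseline, tau=0 the one-step TD error.
    """
    advantages = []
    gae_acc = 0.0
    for t in reversed(range(len(rewards))):
        # One-step TD residual at time t.
        delta = rewards[t] + gamma * values[t + 1] - values[t]
        # Exponentially weighted sum of residuals, decayed by gamma * tau.
        gae_acc = delta + gamma * tau * gae_acc
        advantages.insert(0, gae_acc)
    return advantages
```

A higher tau such as the 0.92 used here keeps more of the longer-horizon residuals, which adds variance, the stated goal for the more deterministic frame-skip environment.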
