Reproducing Atari FPS #280
Comments
Hi @djbyrne! First of all, see this section in the documentation: https://www.samplefactory.dev/09-environment-integrations/vizdoom/#reproducing-paper-results It's about VizDoom, but I bet you can use similar configurations to reach very high throughput.
Replace the Doom-related params with Atari equivalents and you should be good to go. The most important parameters for throughput are sketched below.
You would also need to increase the batch size to accommodate that much data. Start in the 2048-4096 range and go from there.
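To make the flags concrete, here is a rough sketch of what such a launch command could look like. The entry-point module, env name, and the specific values are assumptions to adapt rather than exact reproduction settings; the docs linked above have the authoritative invocation for your Sample Factory version.

```sh
# Hypothetical launch command (entry point, env name, and values are assumptions):
#   --num_workers          roughly one rollout worker per physical CPU core
#   --num_envs_per_worker  more envs per worker amortizes policy inference batching
#   --batch_size           2048-4096+ so the learner keeps up with the sampling rate
python -m sf_examples.atari.train_atari \
    --env=atari_breakout \
    --num_workers=16 \
    --num_envs_per_worker=8 \
    --batch_size=4096
```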
That said, there's actually a better way to work with Atari: https://www.samplefactory.dev/09-environment-integrations/envpool/ Envpool is a C++ vectorized env runner that supports Atari and some other envs. It is even faster than running many envs in Python multiprocessing.
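As a quick illustration of why envpool helps, here is a minimal sketch of stepping its batched Atari envs directly, with no learner attached. It assumes envpool is installed with Atari support and uses its gym-style interface; the env id, no-op actions, and the loop are illustrative only.

```python
# Minimal sketch: raw env throughput with envpool's batched Atari envs (no learner).
# Assumes `pip install envpool`; all envs are stepped in C++ worker threads.
import time
import numpy as np
import envpool

num_envs = 64
env = envpool.make("Breakout-v5", env_type="gym", num_envs=num_envs)

obs = env.reset()
actions = np.zeros(num_envs, dtype=np.int32)  # a fixed no-op action for every env

start, steps = time.time(), 0
while steps < 200_000:
    obs, rew, done, info = env.step(actions)
    steps += num_envs

print(f"~{steps / (time.time() - start):,.0f} env steps/sec (no-op policy)")
```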
Hey @alex-petrenko, thank you for the insight! Apologies, I did not think to look at the other environments for this config 🙈 I will run with what you have given above 😄 Yes, I have worked with envpool before; this is what I will try next. Have you done a benchmark comparison between envpool and standard Atari on Sample Factory yet? I would imagine it gets a speed-up similar to the one seen in the Sebulba PodRacer architecture, as that also uses a C++ based implementation for vectorising the environments.
I haven't really done comparisons, but I know Costa did. There's also some info in their paper and repo: https://arxiv.org/abs/2206.10558 My guess is that you should be able to get 100K+ easily with or without envpool, because you're probably going to be bottlenecked by the convnet backprop.
Hi guys, I'm really liking the repo and found the paper very insightful! Very excited to see the potential of single node RL experimentation 😄
I am trying to reproduce the throughput shown in the paper, ~45K for System 1 and ~130K for System 2. However, I am currently plateauing at ~20K on a machine that surpasses System 2.
Would it be possible to share the optimal config for reproducing the max throughput?
Thanks so much,
Donal