Multi-GPU learner #45
Comments
@tushartk I assigned you for now; please let me know if you're interested in working on this. Could be a really cool feature to have.
Sure, this looks interesting to work on.
This issue is stale because it has been open for 30 days with no activity.
This issue was closed because it has been inactive for 14 days since being marked as stale.
This is a very desirable feature, especially to push the throughput of single-agent training to 200K FPS and beyond.
Plan: use NCCL and/or Torch DistributedDataParallel.
We can spawn one learner process per GPU and then split the data equally between them (e.g. with three learners, learner #0 gets all trajectories with index % 3 == 0).
Then we average the gradients. This will also help to parallelize the batching since there will be multiple processes doing this.
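The sharding and averaging described above can be sketched in plain Python. This is only an illustration of the scheme, not the actual implementation: the function names are hypothetical, and in a real multi-GPU learner the element-wise mean would be an NCCL all-reduce over GPU tensors rather than a Python loop.

```python
def shard_for_trajectory(traj_index: int, num_learners: int) -> int:
    """Round-robin assignment: learner k receives every trajectory
    whose index satisfies traj_index % num_learners == k."""
    return traj_index % num_learners

def average_gradients(grads_per_learner):
    """Element-wise mean over per-learner gradient vectors, mimicking
    an all-reduce with mean reduction across learner processes."""
    n = len(grads_per_learner)
    return [sum(parts) / n for parts in zip(*grads_per_learner)]

# With 3 learners, trajectories 0..5 are dealt out 0,1,2,0,1,2,
# so each learner also does a third of the batching work.
assignments = [shard_for_trajectory(i, 3) for i in range(6)]
```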
An alternative is to spawn the learner process (one per policy) and then have it spawn child processes for individual GPUs. This can be easier to implement.
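A minimal sketch of this alternative, using the standard-library `multiprocessing` module as a stand-in: one per-policy learner spawns a child per GPU and collects their results. The names are hypothetical; a real version would use `torch.multiprocessing`, pin each child to its GPU, and communicate gradients via NCCL rather than a queue.

```python
import multiprocessing as mp

def _gpu_worker(gpu_rank: int, result_queue) -> None:
    # In the real implementation this child would bind to GPU `gpu_rank`
    # and run forward/backward passes on its shard of trajectories.
    result_queue.put(gpu_rank)

def learner_process(num_gpus: int):
    """Hypothetical per-policy learner that spawns one child per GPU.
    Assumes a POSIX platform with the fork start method available."""
    ctx = mp.get_context("fork")
    queue = ctx.Queue()
    children = [ctx.Process(target=_gpu_worker, args=(rank, queue))
                for rank in range(num_gpus)]
    for child in children:
        child.start()
    ranks = sorted(queue.get() for _ in children)
    for child in children:
        child.join()
    return ranks
```

Keeping the spawning logic inside the existing learner means the rest of the system still sees one learner per policy, which is why this variant can be easier to implement.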
To take full advantage of this, we also need to support policy workers on multiple GPUs. This requires exchanging the parameter vectors between learner and policy worker through CPU memory, rather than shared GPU memory. This can be step 1 of the implementation.
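A sketch of the CPU-memory parameter exchange, using the standard-library `multiprocessing.shared_memory` module (Python 3.8+). The helper names are hypothetical; the point is only that the flattened parameter vector lives in a CPU shared-memory block that any policy worker can attach to, regardless of which GPU it runs on.

```python
from multiprocessing import shared_memory
import struct

def publish_params(params):
    """Learner side: serialize a flattened parameter vector into a
    CPU shared-memory block (8 bytes per float64 element)."""
    shm = shared_memory.SharedMemory(create=True, size=8 * len(params))
    struct.pack_into(f"{len(params)}d", shm.buf, 0, *params)
    return shm

def fetch_params(shm, count):
    """Policy-worker side: read the vector back. A worker in another
    process would attach via SharedMemory(name=shm.name) instead."""
    return list(struct.unpack_from(f"{count}d", shm.buf, 0))
```

After the workers are done, the learner releases the block with `shm.close()` and `shm.unlink()`.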