Description
Describe the problem
The use of sgd_batch_size and train_batch_size in multi_gpu_optimizer.py is misleading. As per the discussion, the intended behavior is to run a number of SGD epochs, with minibatches (M) sampled from the training batch, within each outer-loop iteration (currently used only by the actor-critic-style PPO implementation).
The minibatch size (set through the sgd_minibatch_size config parameter) is supposed to be << train_batch_size, but in the current implementation of LocalMultiGPUOptimizer the train_batch_size is more or less ignored by the optimizer interface (?) and sgd_minibatch_size is used as the batch size for num_sgd_iter SGD updates.
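For reference, here is a minimal sketch of the semantics I understand to be intended (this is not RLlib's actual code; `sgd_step` is a hypothetical per-minibatch update callback): collect a full train batch, then run `num_sgd_iter` epochs over it, each epoch split into minibatches of size `sgd_minibatch_size` << the train batch size.

```python
import numpy as np

def minibatch_sgd_epochs(train_batch, sgd_minibatch_size, num_sgd_iter, sgd_step):
    """Intended semantics (sketch): num_sgd_iter epochs over the collected
    train batch, each epoch split into shuffled minibatches of size
    sgd_minibatch_size."""
    n = len(train_batch)
    # This is the check that the current optimizer does not seem to enforce:
    assert sgd_minibatch_size <= n, "sgd_minibatch_size should not exceed train_batch_size"
    for _ in range(num_sgd_iter):
        perm = np.random.permutation(n)
        for start in range(0, n, sgd_minibatch_size):
            minibatch = train_batch[perm[start:start + sgd_minibatch_size]]
            sgd_step(minibatch)

# Dummy usage: 4096 stands in for train_batch_size samples, sgd_step is a no-op.
minibatch_sgd_epochs(np.arange(4096), sgd_minibatch_size=128, num_sgd_iter=30,
                     sgd_step=lambda mb: None)
```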
Minimal example to illustrate the misuse:
```bash
python ray/rllib/train.py --env=PongDeterministic-v4 --run=PPO --config '{"num_workers": 2, "sample_batch_size": 2, "sgd_minibatch_size": 16, "train_batch_size": 4, "num_gpus": 2}'
```
Note that "sgd_minibatch_size": 16 is >> "train_batch_size": 4, yet RLlib trains with no complaints, and the training batch size actually used per iteration is not 64. This makes it difficult to compare performance.
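One possible mitigation (just a sketch of the kind of guard I have in mind, not existing RLlib code) would be to reject such configs up front instead of silently training with an inconsistent batch layout:

```python
def validate_ppo_batch_config(config):
    """Hypothetical sanity check: fail fast when the SGD minibatch is larger
    than the collected train batch, which currently passes silently."""
    mb = config["sgd_minibatch_size"]
    tb = config["train_batch_size"]
    if mb > tb:
        raise ValueError(
            "sgd_minibatch_size ({}) must be <= train_batch_size ({})".format(mb, tb))

# The config from the repro command above would raise ValueError here:
validate_ppo_batch_config({"sgd_minibatch_size": 16, "train_batch_size": 4})
```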