Description
Describe the problem
The use of sgd_batch_size and train_batch_size in multi_gpu_optimizer.py is misleading. As per the discussion, the intended behavior is to run a number of SGD epochs, with minibatches (M) sampled from the training batch, within each outer-loop iteration (currently used only by the actor-critic-style PPO implementation).
The minibatch size (set through the sgd_minibatch_size config parameter) is supposed to be << train_batch_size, but in the current implementation of LocalMultiGPUOptimizer the train_batch_size is more or less ignored by the optimizer interface (?) and sgd_minibatch_size is used as the batch size for num_sgd_iter SGD updates.
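For reference, here is a minimal sketch of the semantics I understand to be intended (this is not RLlib's actual code; `sgd_step` is a hypothetical per-minibatch update callback): collect a full train batch, then run `num_sgd_iter` epochs over it, each epoch split into minibatches of size `sgd_minibatch_size` << the train batch size.

```python
import numpy as np

def minibatch_sgd_epochs(train_batch, sgd_minibatch_size, num_sgd_iter, sgd_step):
    """Intended semantics (sketch): num_sgd_iter epochs over the collected
    train batch, each epoch split into shuffled minibatches of size
    sgd_minibatch_size."""
    n = len(train_batch)
    # This is the check that the current optimizer does not seem to enforce:
    assert sgd_minibatch_size <= n, "sgd_minibatch_size should not exceed train_batch_size"
    for _ in range(num_sgd_iter):
        perm = np.random.permutation(n)
        for start in range(0, n, sgd_minibatch_size):
            minibatch = train_batch[perm[start:start + sgd_minibatch_size]]
            sgd_step(minibatch)

# Dummy usage: 4096 stands in for train_batch_size samples, sgd_step is a no-op.
minibatch_sgd_epochs(np.arange(4096), sgd_minibatch_size=128, num_sgd_iter=30,
                     sgd_step=lambda mb: None)
```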
Minimal example to illustrate the misuse:
```bash
python ray/rllib/train.py --env=PongDeterministic-v4 --run=PPO --config '{"num_workers": 2, "sample_batch_size": 2, "sgd_minibatch_size": 16, "train_batch_size": 4, "num_gpus": 2}'
```
Note that "sgd_minibatch_size": 16 is >> "train_batch_size": 4, yet RLlib trains with no complaints, and the training batch size actually used per iteration is not 64. This makes it difficult to compare performance.
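One possible mitigation (just a sketch of the kind of guard I have in mind, not existing RLlib code) would be to reject such configs up front instead of silently training with an inconsistent batch layout:

```python
def validate_ppo_batch_config(config):
    """Hypothetical sanity check: fail fast when the SGD minibatch is larger
    than the collected train batch, which currently passes silently."""
    mb = config["sgd_minibatch_size"]
    tb = config["train_batch_size"]
    if mb > tb:
        raise ValueError(
            "sgd_minibatch_size ({}) must be <= train_batch_size ({})".format(mb, tb))

# The config from the repro command above would raise ValueError here:
validate_ppo_batch_config({"sgd_minibatch_size": 16, "train_batch_size": 4})
```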