
[rllib] Misleading use of sgd_batch_size & train_batch_size in multi_gpu_optimizer #2957


Description

@praveen-palanisamy

Describe the problem

The use of sgd_batch_size and train_batch_size in multi_gpu_optimizer.py is misleading. As per the discussion, the intended behavior is to run a number of SGD epochs over minibatches (M) sampled from the training batch within each outer-loop iteration (currently used only by the Actor-Critic style PPO implementation).

The minibatch size (set via the sgd_minibatch_size config parameter) is supposed to be << train_batch_size, but in the current implementation of LocalMultiGPUOptimizer the train_batch_size is more or less ignored by the optimizer interface (?) and sgd_minibatch_size is used as the batch size for the num_sgd_iter SGD updates.
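
For concreteness, here is a minimal Python sketch of the intended interplay between the two parameters; sample_fn and sgd_step_fn are hypothetical placeholders, not RLlib APIs:

import numpy as np

def train_one_iteration(sample_fn, sgd_step_fn, train_batch_size,
                        sgd_minibatch_size, num_sgd_iter):
    # Collect one full training batch per outer-loop iteration.
    batch = sample_fn(train_batch_size)  # dict of arrays, each of length train_batch_size
    for _ in range(num_sgd_iter):
        # One pass (epoch) over the training batch in shuffled minibatches.
        perm = np.random.permutation(train_batch_size)
        for start in range(0, train_batch_size, sgd_minibatch_size):
            idx = perm[start:start + sgd_minibatch_size]
            minibatch = {k: v[idx] for k, v in batch.items()}
            sgd_step_fn(minibatch)  # one SGD update on at most sgd_minibatch_size samples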

Minimal example to illustrate the misuse:

python ray/rllib/train.py --env=PongDeterministic-v4 --run=PPO --config '{"num_workers": 2, "sample_batch_size": 2, "sgd_minibatch_size": 16, "train_batch_size": 4, "num_gpus": 2}'

Note that "sgd_minibatch_size": 16 is >> "train_batch_size": 4, yet RLlib trains with no complaints, and the training batch size actually used in an iteration is not 64. This makes it difficult to compare performance across configurations.
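
One way to catch this class of misconfiguration early would be a sanity check at config-validation time. The validate_config helper below is a hypothetical sketch, not an existing RLlib function:

def validate_config(config):
    # Reject configs where the SGD minibatch cannot fit inside the training batch.
    if config["sgd_minibatch_size"] > config["train_batch_size"]:
        raise ValueError(
            "sgd_minibatch_size ({}) must be <= train_batch_size ({})".format(
                config["sgd_minibatch_size"], config["train_batch_size"]))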
