diff --git a/README.md b/README.md
index dc872618..aed0e9d9 100644
--- a/README.md
+++ b/README.md
@@ -166,6 +166,7 @@ torchrun --nproc_per_node=4 --master_port= train.py \
 ```
 Note the given training script is meant to be simple and easy to use, and is not particularly optimized.
+To run on more GPUs, you may want to reduce `gradient_accumulation_steps` to keep the global batch size at 128. The global batch size of 128 has not been tested for optimality.
 
 ### Authors
 
 All grad students below contributed equally and the order is determined by random draw.
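For reference on the added note: the global batch size here is `nproc_per_node × per_device_train_batch_size × gradient_accumulation_steps`, so doubling the GPU count means halving `gradient_accumulation_steps`. Below is a minimal sketch of that arithmetic; the per-device batch size of 4 is an assumption for illustration, and the flag names mirror HuggingFace-style training arguments rather than anything confirmed in `train.py`.

```python
# Sketch: scale gradient_accumulation_steps with GPU count so the
# global batch size stays at 128. PER_DEVICE_BATCH_SIZE = 4 is an
# assumed value of --per_device_train_batch_size, not taken from train.py.
GLOBAL_BATCH_SIZE = 128
PER_DEVICE_BATCH_SIZE = 4

for num_gpus in (4, 8, 16):  # candidate --nproc_per_node values
    grad_accum_steps, remainder = divmod(
        GLOBAL_BATCH_SIZE, num_gpus * PER_DEVICE_BATCH_SIZE
    )
    # The global batch size must divide evenly across GPUs and micro-batches.
    assert remainder == 0, "global batch size must divide evenly"
    print(f"{num_gpus} GPUs -> --gradient_accumulation_steps {grad_accum_steps}")
```

With these assumed values, 4 GPUs give 8 accumulation steps, 8 GPUs give 4, and 16 GPUs give 2, each preserving the global batch size of 128.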