[BUG] Incorrect batch_size used for ThroughputTimer #2498
Closed
Description
Describe the bug
ThroughputTimer is started/stopped for model steps, but in runtime/engine it uses only the micro_batch_size: here
Compare the above line to the following in runtime/pipe/engine, which uses the right value: here
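To illustrate the effect, here is a minimal sketch with assumed numbers (not DeepSpeed code): when the timer measures a full optimizer step but is only credited with a single micro batch, the reported throughput is low by a factor of gradient_accumulation_steps.

```python
# Minimal sketch with assumed numbers (not DeepSpeed code).
micro_batch_size = 4              # train_micro_batch_size_per_gpu()
gradient_accumulation_steps = 8
step_time_s = 0.5                 # assumed wall time of one full optimizer step

# What the timer currently credits the step with vs. what was actually processed.
reported_samples_per_sec = micro_batch_size / step_time_s
actual_samples_per_sec = micro_batch_size * gradient_accumulation_steps / step_time_s

print(f"reported: {reported_samples_per_sec:.0f} samples/s")  # 8
print(f"actual:   {actual_samples_per_sec:.0f} samples/s")    # 64
```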
To Reproduce
- Enable wall_clock_breakdown
Expected behavior
This timer should use self.train_micro_batch_size_per_gpu() * self.gradient_accumulation_steps().
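With that value, the timer's samples/sec would match what each GPU actually processes per step (continuing the assumed numbers from the sketch above):

```python
# Continuing the assumed numbers above: micro batch 4, 8 accumulation steps, 0.5 s per step.
effective_batch_per_gpu = 4 * 8   # train_micro_batch_size_per_gpu() * gradient_accumulation_steps()
samples_per_sec = effective_batch_per_gpu / 0.5
print(samples_per_sec)            # 64.0, matching the samples actually processed per second
```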
ds_report output
N/A
Screenshots
N/A
System info (please complete the following information):
- OS: Amazon Linux 2
- GPU count and types: 8 x A100
- Interconnects (if applicable): 4 x 100 Gbps
- Python version: 3.8
- Any other relevant info about your setup:
Launcher context
Launching the experiment with the deepspeed launcher.
Docker context
Can't share
Additional context
DeepSpeed v0.7.3