
Conversation

@jeffra (Collaborator) commented Jul 29, 2021

  • Correctness fix for PP+ZeRO gradient accumulation (the gas-boundary contract is sketched below)
  • Cherry-picked round-robin gradient partitioning fixes from master
  • Cherry-picked from master: ignore overlap/contiguous gradient settings for ZeRO-1
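
For context, here is a minimal sketch of the gradient-accumulation ("gas") boundary contract the first fix depends on. The loop and the `data_loader` name are illustrative, not code from this PR; `engine` stands in for a `deepspeed.DeepSpeedEngine`:

    # Each step() call is a no-op until the engine reaches a gas boundary,
    # i.e. until is_gradient_accumulation_boundary() returns True. ZeRO must
    # defer gradient reduction/partitioning on non-boundary micro-batches.
    for micro_batch in data_loader:
        loss = engine(micro_batch)
        engine.backward(loss)
        engine.step()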

jeffra and others added 4 commits July 28, 2021 18:37
Co-authored-by: Olatunji Ruwase <olruwase@microsoft.com>
* Make round robin gradient partitioning configurable (default False)

* Use the correct default

* Log config setting
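
The round-robin flag mentioned in the first commit would be toggled through the ZeRO section of the DeepSpeed config. A sketch, assuming the flag is exposed as `round_robin_gradients` (the name later DeepSpeed releases use) and that `model` is already constructed:

    import deepspeed

    ds_config = {
        "train_micro_batch_size_per_gpu": 4,
        "gradient_accumulation_steps": 8,
        "zero_optimization": {
            "stage": 1,
            # Off by default, per the commit above; enable explicitly to
            # round-robin gradient partitions across ranks.
            "round_robin_gradients": False,
        },
    }

    engine, _, _, _ = deepspeed.initialize(
        model=model,
        model_parameters=model.parameters(),
        config=ds_config)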

def allreduce_gradients(self, bucket_size=MEMORY_OPT_ALLREDUCE_SIZE):
    # Pass (PP) gas boundary flag to optimizer (required for zero)
    self.optimizer.is_gradient_accumulation_boundary = self.is_gradient_accumulation_boundary()
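
For readers outside the codebase, a hypothetical sketch of how a ZeRO-style optimizer could consume the flag set above; the class and method names below are illustrative, not DeepSpeed's actual internals:

    class ZeroOptimizerSketch:
        def __init__(self):
            # Overwritten by the engine on every allreduce_gradients() call.
            self.is_gradient_accumulation_boundary = True

        def reduce_gradients(self):
            if not self.is_gradient_accumulation_boundary:
                # Mid-accumulation: keep gradients local, skip communication.
                return
            self._reduce_and_partition()  # hypothetical helper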
Contributor commented:

Is self.optimizer guaranteed to have is_gradient_accumulation_boundary attribute?
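
One defensive option (a sketch of an alternative, not what the PR does) would be to guard the assignment so optimizers that never read the flag are left untouched:

    # Plain optimizers (e.g. torch.optim.Adam) accept the attribute
    # assignment but never read it; a guard makes the ZeRO dependency explicit.
    if hasattr(self.optimizer, 'is_gradient_accumulation_boundary'):
        self.optimizer.is_gradient_accumulation_boundary = \
            self.is_gradient_accumulation_boundary()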

Contributor commented:

We traced the break to this line; the crash went away after commenting it out.

@jeffra merged commit f93e22b into big-science on Jul 30, 2021
@jeffra deleted the jeffra/big-science-patches branch on July 30, 2021 at 22:58