[QEff. Finetune]: Added support to sync gradients across devices during backward step only. #477

quic-meetkuma · 2025-06-23T10:30:20Z

Disabling gradient synchronization is necessary when using gradient accumulation (more than one accumulation step) with DDP enabled.
Currently, gradients are synced at every loss.backward() call. With gradient accumulation, the weight update happens only every n > 1 steps, when opt.step() is called. Gradients across devices need to sync with each other only during that step.

The "with model.no_sync():" context manager solves this issue.
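
For reference, the context-manager approach could look roughly like the sketch below. It assumes `ddp_model` is a `torch.nn.parallel.DistributedDataParallel` instance; the helper name `forward_backward` and its parameters are hypothetical and not part of this PR. The forward pass is placed inside the context as well, since DDP decides whether to sync during forward.

```python
from contextlib import nullcontext


def forward_backward(ddp_model, inputs, targets, loss_fn, is_update_step):
    """Hypothetical helper: run one forward/backward pass.

    Assumes ddp_model exposes no_sync(), as DistributedDataParallel does.
    On accumulation-only steps the pass runs inside no_sync(), so gradients
    are accumulated locally without cross-device synchronization.
    """
    sync_ctx = nullcontext() if is_update_step else ddp_model.no_sync()
    with sync_ctx:
        # Forward and backward both run inside the chosen context.
        loss = loss_fn(ddp_model(inputs), targets)
        loss.backward()
    return loss
```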

Here, we do not use it; instead, we set ddp_model.require_backward_grad_sync to True or False depending on which step we are on.
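
A minimal sketch of the flag-based approach follows. It is an illustration under assumptions, not the code from this PR: the function name `train_with_grad_accumulation`, its parameters, and the choice to treat the final batch as an update step are all hypothetical. `ddp_model` is assumed to be a `DistributedDataParallel` instance, and the flag is set before the forward pass.

```python
def train_with_grad_accumulation(ddp_model, optimizer, batches, loss_fn, accum_steps):
    """Hypothetical loop: sync gradients only on steps that update weights.

    Assumes ddp_model has a require_backward_grad_sync attribute
    (as DistributedDataParallel does) and batches is a sequence of
    (inputs, targets) pairs.
    """
    num_batches = len(batches)
    for step, (inputs, targets) in enumerate(batches, start=1):
        # An update step is every accum_steps-th step, or the last batch
        # (assumption: leftover gradients are flushed at the end).
        is_update_step = step % accum_steps == 0 or step == num_batches

        # Toggle synchronization before forward/backward for this step.
        ddp_model.require_backward_grad_sync = is_update_step

        # Scale the loss so accumulated gradients average over the window.
        loss = loss_fn(ddp_model(inputs), targets) / accum_steps
        loss.backward()

        if is_update_step:
            optimizer.step()
            optimizer.zero_grad()
```

With accum_steps=2 and five batches, this sketch syncs on steps 2, 4, and 5 and skips synchronization on steps 1 and 3.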

Signed-off-by: Meet Patel <meetkuma@qti.qualcomm.com>

quic-meetkuma force-pushed the no_sync branch from 6b36ea7 to f5a350a Compare June 27, 2025 08:45

quic-meetkuma marked this pull request as ready for review June 27, 2025 08:50

quic-meetkuma requested review from quic-rishinr, ochougul, quic-hemagnih and quic-amitraj as code owners June 27, 2025 08:50

quic-meetkuma requested review from vbaddi, quic-mamta, quic-akuruvil and quic-swatia July 1, 2025 08:48

quic-meetkuma added 2 commits July 3, 2025 10:52

Updated training code to sync gradients only during backward step.

b21602c

Signed-off-by: Meet Patel <meetkuma@qti.qualcomm.com>

Fixed minor argument error.

e2a1d0b

Signed-off-by: Meet Patel <meetkuma@qti.qualcomm.com>

quic-meetkuma force-pushed the no_sync branch from f5a350a to e2a1d0b Compare July 3, 2025 05:22
