[QEff. Finetune]: Added support to sync gradients across devices during backward step only. #477


Open
wants to merge 2 commits into main
Conversation

quic-meetkuma (Contributor) commented Jun 23, 2025

Disabling gradient synchronization is necessary when using gradient accumulation (accumulation steps > 1) with DDP enabled.
Currently, we sync gradients at every loss.backward() call. With gradient accumulation, the weight update happens only once every n > 1 steps, when opt.step() is called. Only during that step should the gradients be synced across devices.

The model.no_sync() context manager solves this issue.
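
A minimal sketch of how no_sync() is typically used for this (not this PR's code; `model` is a DistributedDataParallel instance, and `loader`, `criterion`, `opt`, and `accum_steps` are assumed names):

```python
import contextlib

for step, (inputs, labels) in enumerate(loader):
    is_update_step = (step + 1) % accum_steps == 0
    # Suppress gradient all-reduce on every backward() except the one
    # immediately preceding opt.step().
    ctx = contextlib.nullcontext() if is_update_step else model.no_sync()
    with ctx:
        loss = criterion(model(inputs), labels) / accum_steps
        loss.backward()
    if is_update_step:
        opt.step()
        opt.zero_grad()
```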

Here, we do not use it; instead, we set ddp_model.require_backward_grad_sync to True or False depending on which step we are on.
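
A minimal sketch of the attribute-toggling approach described above (not the PR's exact code; `ddp_model` is a DistributedDataParallel instance, and `loader`, `criterion`, `opt`, and `accum_steps` are assumed names):

```python
for step, (inputs, labels) in enumerate(loader):
    is_update_step = (step + 1) % accum_steps == 0
    # All-reduce gradients across devices only on the step that calls opt.step();
    # intermediate accumulation steps keep gradients local to each device.
    ddp_model.require_backward_grad_sync = is_update_step
    loss = criterion(ddp_model(inputs), labels) / accum_steps
    loss.backward()
    if is_update_step:
        opt.step()
        opt.zero_grad()
```

This is the same flag that no_sync() toggles internally; setting it directly avoids wrapping the backward pass in a context manager.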

Signed-off-by: Meet Patel <meetkuma@qti.qualcomm.com>