@ShadenSmith
Contributor

No description provided.

@ShadenSmith
Contributor Author

This still needs FP32 and ZeRO support. And unit tests :-).

        self._config.gradient_accumulation_steps = new_gas

    def _compute_global_grad_norm(self):
        params = [p for p in self.module.parameters() if p.grad is not None]
Collaborator

@stas00 stas00 Aug 7, 2021

Would it be a bit more efficient if tied params were only counted once? Probably something like:

params = dict((p.data_ptr(), p) for p in self.module.parameters() if p.grad is not None).values()

Or it might help to have a wrapper for that, since this is a handy util.

But please verify that I got it right. Thanks.
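
A minimal sketch of the wrapper idea, for illustration only. The names `unique_params_with_grad` and `global_grad_norm` are hypothetical and not part of this PR; the code assumes only that each parameter exposes `.grad` and `.data_ptr()`, and that each gradient exposes `.norm(2)`, as `torch.nn.Parameter` and `torch.Tensor` do. One thing worth checking while verifying: `torch.nn.Module.parameters()` already skips duplicates of the same `Parameter` object, so keying on `data_ptr()` would only matter for distinct parameter objects that share storage.

```python
import math


def unique_params_with_grad(params):
    """Return parameters that have a gradient, one per underlying storage.

    Hypothetical helper: relies only on `.grad` and `.data_ptr()`.
    """
    unique = {}
    for p in params:
        if p.grad is None:
            continue
        # setdefault keeps the first parameter seen for each data pointer.
        unique.setdefault(p.data_ptr(), p)
    return list(unique.values())


def global_grad_norm(params):
    """L2 norm over all gradients, counting shared storage only once."""
    squared = (float(p.grad.norm(2)) ** 2 for p in unique_params_with_grad(params))
    return math.sqrt(sum(squared))
```

Under these assumptions, the method in the diff could call something like `global_grad_norm(self.module.parameters())`. This sketch does not cover the FP32 or ZeRO cases.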

@rocm-mici

Can one of the admins verify this patch?

@jeffra
Collaborator

jeffra commented Mar 24, 2023

This ended up being added in a later PR.

@jeffra jeffra closed this Mar 24, 2023
@jeffra jeffra deleted the grad-norm-query branch March 24, 2023 03:39