Labels: bug
Description
What happened?
`requires_grad` is set up before the parameters are created:

`self.__setup_requires_grad(model, config)`

This means `requires_grad` is still `True` even for frozen (filtered) parameters during the first training step. They aren't trained, but gradients are created for them anyway, so as much VRAM is used as would be needed without the fused backward pass.
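A minimal PyTorch sketch of the expected behavior (not the project's actual code): when a parameter's `requires_grad` is set to `False` *before* the backward pass, autograd allocates no gradient tensor for it, which is where the VRAM saving comes from. The `Linear` layer and the choice of freezing the bias are illustrative stand-ins for a filtered parameter:

```python
import torch

# Stand-in model; the bias plays the role of a frozen (filtered) parameter.
model = torch.nn.Linear(4, 4)

# Freeze it BEFORE the first backward pass.
model.bias.requires_grad_(False)

loss = model(torch.randn(2, 4)).sum()
loss.backward()

# Trainable parameter gets a gradient tensor; the frozen one does not,
# so no gradient memory is allocated for it.
assert model.weight.grad is not None
assert model.bias.grad is None
```

If the freeze happens only after gradients have already been produced (as in the ordering described above), the first step still pays the full gradient-memory cost.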
What did you expect would happen?
Relevant log output
No response