Closed
Labels
bug (Something isn't working) · needs triage (Waiting to be triaged by maintainers) · ver: 2.1.x
Description
Bug description
With the breaking change in the behaviour of the save_last flag in ModelCheckpoint (PR), it now seems to be no longer possible to do a very simple and obvious thing: resume training from the actual last epoch while also saving the top_k checkpoints.
Am I missing an obvious flag, or was this functionality really removed? I have already lost a few days of GPU time because of this.
I am filing this as a bug because I believe it is an unintended consequence of the above-mentioned change.
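For reference, a minimal sketch of the intended setup, assuming lightning.pytorch 2.x; the dirpath, monitor value, and the commented-out model/datamodule names are placeholders, not taken from the original report:

```python
import lightning.pytorch as pl
from lightning.pytorch.callbacks import ModelCheckpoint

# Keep the k best checkpoints by a monitored metric AND always write an
# up-to-date last.ckpt so training can be resumed from the most recent epoch.
checkpoint_cb = ModelCheckpoint(
    dirpath="checkpoints/",   # placeholder path
    monitor="val_loss",       # placeholder metric
    save_top_k=3,
    save_last=True,
)

trainer = pl.Trainer(max_epochs=100, callbacks=[checkpoint_cb])

# First run (model and dm are placeholders):
# trainer.fit(model, datamodule=dm)

# Resuming: the expectation is that training continues from the most recent
# epoch (last.ckpt), not from one of the top-k "best" checkpoints.
# trainer.fit(model, datamodule=dm, ckpt_path="checkpoints/last.ckpt")
```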
What version are you seeing the problem on?
v2.1
How to reproduce the bug
No response
Error messages and logs
# Error messages and logs here please
Environment
Current environment
#- Lightning Component (e.g. Trainer, LightningModule, LightningApp, LightningWork, LightningFlow):
#- PyTorch Lightning Version (e.g., 1.5.0):
#- Lightning App Version (e.g., 0.5.2):
#- PyTorch Version (e.g., 2.0):
#- Python version (e.g., 3.9):
#- OS (e.g., Linux):
#- CUDA/cuDNN version:
#- GPU models and configuration:
#- How you installed Lightning(`conda`, `pip`, source):
#- Running environment of LightningApp (e.g. local, cloud):
More info
No response