Option to save last checkpoint as copy instead of symlinking  #18995

@ad12

Description & Motivation

Saving last.ckpt as a symlink on local file systems makes a lot of sense for most workflows. However, in several cases, users back up their checkpoints to cloud storage (AWS, GCP, etc.). In these scenarios, symlinks are difficult to manage because uploading them is often all-or-nothing, i.e. we cannot choose which symlinks to upload without being highly prescriptive on upload.

Checkpoints, especially last.ckpt, are critical for resuming runs, fine-tuning, etc. So we often want to back these up. However, when last.ckpt is a symlink, the backup process to cloud becomes much more involved.

Pitch

Add an option `save_last="copy"`, where we save a copy of the last checkpoint instead of symlinking it.
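
To make the request concrete, here is a rough, library-independent sketch of the two behaviours using only the Python standard library. It is not Lightning's implementation: the helper name `update_last_checkpoint`, its `mode` argument, and the file names are hypothetical and only illustrate the difference between linking and copying.

```python
import os
import shutil
import tempfile
from pathlib import Path


def update_last_checkpoint(source: Path, last: Path, mode: str = "link") -> None:
    """Make `last` refer to `source`, as a symlink or as an independent copy.

    Hypothetical helper for illustration; not part of any library API.
    """
    if mode not in ("link", "copy"):
        raise ValueError(f"mode must be 'link' or 'copy', got {mode!r}")

    # Remove a previous last.ckpt, whether it is a regular file or a
    # (possibly dangling) symlink.
    if last.is_symlink() or last.exists():
        last.unlink()

    if mode == "link":
        # Relative link so the checkpoint directory can be moved as a whole.
        os.symlink(os.path.relpath(source, last.parent), last)
    else:
        # A real file: upload tools see ordinary bytes, not a link.
        shutil.copy2(source, last)


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        ckpt_dir = Path(tmp)
        newest = ckpt_dir / "epoch=3-step=400.ckpt"  # made-up file name
        newest.write_bytes(b"fake checkpoint bytes")
        last = ckpt_dir / "last.ckpt"

        update_last_checkpoint(newest, last, mode="link")
        print("link mode, is symlink:", last.is_symlink())

        update_last_checkpoint(newest, last, mode="copy")
        print("copy mode, is symlink:", last.is_symlink())
```

The trade-off is disk space: copy mode stores the last checkpoint twice, which is why it would make sense as an opt-in rather than the default.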

Alternatives

No response

Additional context

No response

cc @Borda @carmocca @awaelchli
