@wawltor commented Mar 26, 2021

PR types

New features

PR changes

APIs

Describe

Add checkpointing for pretrained models. The following state is saved to, and restored from, the checkpoint:

  1. optimizer state
  2. learning_rate state
  3. random state
  4. global_step and checkpoint_time
  5. training arguments
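A minimal sketch of what a checkpoint holding the state listed above might look like, written with only the Python standard library so it runs anywhere. It assumes `pickle` as a stand-in for the framework serializer (in Paddle, something like `paddle.save`/`paddle.load`), plain dicts in place of `optimizer.state_dict()` and the learning-rate scheduler state, and Python's `random` module as the only RNG. The function names `save_training_state`/`load_training_state` and the file name `training_state.pkl` are hypothetical, not this PR's API.

```python
import argparse
import os
import pickle
import random
import tempfile
import time

# Hypothetical file name; not taken from the PR.
STATE_FILE = "training_state.pkl"


def save_training_state(output_dir, optimizer_state, lr_state, global_step, args):
    """Bundle the non-weight training state into a single file."""
    state = {
        "optimizer": optimizer_state,       # stand-in for optimizer.state_dict()
        "lr_scheduler": lr_state,           # learning-rate schedule state
        "random_state": random.getstate(),  # only Python's RNG in this sketch
        "global_step": global_step,
        "checkpoint_time": time.time(),
        "args": vars(args),                 # training arguments as a plain dict
    }
    os.makedirs(output_dir, exist_ok=True)
    with open(os.path.join(output_dir, STATE_FILE), "wb") as f:
        pickle.dump(state, f)


def load_training_state(output_dir):
    """Load the bundle and restore the RNG so a resumed run draws the same numbers."""
    with open(os.path.join(output_dir, STATE_FILE), "rb") as f:
        state = pickle.load(f)
    random.setstate(state["random_state"])
    return state


if __name__ == "__main__":
    args = argparse.Namespace(learning_rate=5e-5, max_steps=1000)
    ckpt_dir = tempfile.mkdtemp()
    random.seed(0)
    save_training_state(ckpt_dir, {"moment1": [0.1]}, {"last_epoch": 10}, 10, args)
    expected = random.random()          # first draw after the checkpoint
    state = load_training_state(ckpt_dir)
    assert random.random() == expected  # RNG resumed from the saved point
    print(state["global_step"], state["args"]["max_steps"])  # 10 1000
```

Restoring the RNG state is what makes a resumed run reproduce the same data shuffling and dropout draws as an uninterrupted one; a real setup would also need to capture any other RNGs in use (NumPy, the framework's device RNG).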

@wawltor force-pushed the add_model_checkpoint branch from cfc86a1 to 601d2ae (March 26, 2021 06:35)
@wawltor force-pushed the add_model_checkpoint branch from ed063b0 to 929d90d (March 26, 2021 06:57)

@ZeyuChen left a comment

This needs further discussion; the current design doesn't feel very natural.

model_to_save = model._layers if isinstance(
    model, paddle.DataParallel) else model
model_to_save.save_pretrained(output_dir)
checkpoint.save_checkpoint(output_dir, global_step,
Member

The semantics of the checkpoint.save_checkpoint call don't read naturally.
model.save_checkpoint would make more sense; ideally the code should document itself.
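To illustrate the suggestion to call model.save_checkpoint, here is a hypothetical sketch in which checkpointing is a method on the model itself, so one call says what it does. `CheckpointableModel`, its `save_checkpoint` signature, and the file names are illustrative assumptions; weights and optimizer state are plain dicts serialized with `pickle` rather than any real framework API.

```python
import os
import pickle
import tempfile


class CheckpointableModel:
    """Hypothetical base class: the model owns both weight saving and checkpointing."""

    def __init__(self):
        self.weights = {"linear.weight": [0.5, -0.25]}

    def state_dict(self):
        return dict(self.weights)

    def save_pretrained(self, output_dir):
        # Weights only: what a downstream user needs for inference or fine-tuning.
        os.makedirs(output_dir, exist_ok=True)
        with open(os.path.join(output_dir, "model_state.pkl"), "wb") as f:
            pickle.dump(self.state_dict(), f)

    def save_checkpoint(self, output_dir, optimizer_state, global_step):
        # Weights plus training state: everything needed to resume training.
        self.save_pretrained(output_dir)
        with open(os.path.join(output_dir, "training_state.pkl"), "wb") as f:
            pickle.dump({"optimizer": optimizer_state, "global_step": global_step}, f)


if __name__ == "__main__":
    out = tempfile.mkdtemp()
    CheckpointableModel().save_checkpoint(out, {"moment1": [0.0, 0.0]}, global_step=100)
    print(sorted(os.listdir(out)))  # ['model_state.pkl', 'training_state.pkl']
```

The design choice here is that `save_pretrained` stays the lightweight export and `save_checkpoint` is a superset of it, so callers pick the method by intent rather than combining two objects.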

Member

The combination here feels very odd: the caller has to deal with both model_to_save and checkpoint.save.

Contributor Author

The checkpoint here mainly covers the optimizer, learning_rate, and random-related state.

@@ -0,0 +1,154 @@
# coding:utf-8
Member

I still feel that designing checkpointing as a standalone module, separate from a Trainer, isn't particularly natural.
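One way to read this point is that checkpointing belongs to a training-loop object that already holds the model, optimizer, and step counter. The sketch below is hypothetical: `Trainer`, `save_checkpoint`, and `resume` are illustrative names, `Wrapper` is a stand-in for a data-parallel wrapper that keeps the inner model in `_layers` (mirroring the `model._layers` unwrapping in the diff excerpt), the "model" is just a dict of weights, and everything is serialized with `pickle`.

```python
import os
import pickle
import random
import tempfile


class Wrapper:
    """Stand-in for a data-parallel wrapper that keeps the real model in `_layers`."""

    def __init__(self, layers):
        self._layers = layers


class Trainer:
    """Hypothetical trainer that owns all state needed to save and resume a run."""

    def __init__(self, model, optimizer_state):
        self.model = model                    # here: a dict of weights, maybe wrapped
        self.optimizer_state = optimizer_state
        self.global_step = 0

    def _unwrapped_model(self):
        # Unwrapping happens once, here, instead of at every call site.
        return self.model._layers if isinstance(self.model, Wrapper) else self.model

    def save_checkpoint(self, output_dir):
        os.makedirs(output_dir, exist_ok=True)
        state = {
            "model": self._unwrapped_model(),
            "optimizer": self.optimizer_state,
            "global_step": self.global_step,
            "random_state": random.getstate(),
        }
        with open(os.path.join(output_dir, "checkpoint.pkl"), "wb") as f:
            pickle.dump(state, f)

    def resume(self, output_dir):
        with open(os.path.join(output_dir, "checkpoint.pkl"), "rb") as f:
            state = pickle.load(f)
        self._unwrapped_model().update(state["model"])  # restore weights in place
        self.optimizer_state = state["optimizer"]
        self.global_step = state["global_step"]
        random.setstate(state["random_state"])


if __name__ == "__main__":
    out = tempfile.mkdtemp()
    trainer = Trainer(Wrapper({"w": 1.0}), {"moment1": 0.0})
    trainer.global_step = 42
    trainer.save_checkpoint(out)

    restored = Trainer(Wrapper({"w": 0.0}), {})
    restored.resume(out)
    print(restored.global_step, restored.model._layers)  # 42 {'w': 1.0}
```

With this shape the call site is a single `trainer.save_checkpoint(output_dir)`, and the `model_to_save` bookkeeping from the diff disappears into the trainer.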

Contributor Author

The checkpoint here mainly covers the optimizer, learning_rate, and random-related state.
