PT: keep the same checkpoint behavior as TF #3191

njzjz · 2024-01-27T22:49:16Z

Set the default save_ckpt to model.ckpt, which is used as a prefix. When a checkpoint is saved (for example at step 100), model.ckpt-100.pt is written and model.ckpt.pt is symlinked to model.ckpt-100.pt. A dedicated checkpoint file records model.ckpt-100.pt.

This matches the behavior of the TF backend. With the PT backend, one can run the following just as with the TF backend:

dp --pt train input.json
# one can cancel the training before it finishes
dp --pt freeze
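
The naming scheme described above can be illustrated with a minimal sketch. It is not the DeePMD-kit implementation: the function name `save_checkpoint_sketch`, the byte payload standing in for the model state, the copy fallback when symlinks are unavailable, and the one-line content of the `checkpoint` file are all assumptions made for illustration.

```python
import os
import shutil
import tempfile
from pathlib import Path


def save_checkpoint_sketch(
    workdir: Path, step: int, payload: bytes, prefix: str = "model.ckpt"
) -> Path:
    """Write <prefix>-<step>.pt, refresh <prefix>.pt, and record the file name."""
    target = workdir / f"{prefix}-{step}.pt"
    # Hypothetical stand-in for serializing the real model state.
    target.write_bytes(payload)

    latest = workdir / f"{prefix}.pt"
    try:
        latest.unlink()  # drop the link left by an earlier save
    except FileNotFoundError:
        pass  # first save: there is nothing to replace yet
    try:
        # Relative link, so the directory can be moved as a whole.
        os.symlink(target.name, latest)
    except (OSError, NotImplementedError):
        # Assumed fallback for platforms where symlinks are not permitted.
        shutil.copyfile(target, latest)

    # Assumed format: the record file holds only the latest file name.
    (workdir / "checkpoint").write_text(target.name)
    return target


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        save_checkpoint_sketch(Path(tmp), 100, b"state at step 100")
        print(sorted(p.name for p in Path(tmp).iterdir()))
```

Because `model.ckpt.pt` always points at the most recent numbered file, a later command can pick up the latest checkpoint without knowing the step at which training stopped.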

Set the default save_ckpt to `model.ckpt` as the prefix. When saving checkpoints, `model.ckpt-100.pt` will be saved, and `model.ckpt.pt` will be symlinked to `model.ckpt-100.pt`. A `checkpoint` file will be saved to record `model.ckpt-100.pt`. This keeps the same behavior as the TF backend.

Signed-off-by: Jinzhe Zeng <jinzhe.zeng@rutgers.edu>

deepmd/common.py

+        try:
+            # remove old one
+            os.remove(new_ff)
+        except OSError:



codecov · 2024-01-27T22:56:14Z

Codecov Report

Attention: 4 lines in your changes are missing coverage. Please review.

Comparison is base (3e4715f) 74.27% compared to head (968ae48) 74.27%.

Files	Patch %	Lines
deepmd/pt/entrypoints/main.py	0.00%	3 Missing ⚠️
deepmd/common.py	93.33%	1 Missing ⚠️

Additional details and impacted files

@@            Coverage Diff             @@
##            devel    #3191      +/-   ##
==========================================
- Coverage   74.27%   74.27%   -0.01%     
==========================================
  Files         343      343              
  Lines       31629    31634       +5     
  Branches     1592     1592              
==========================================
+ Hits        23494    23497       +3     
- Misses       7210     7212       +2     
  Partials      925      925



thangckt · 2024-05-03T08:53:56Z

Hi @njzjz,

May I ask why two different file extensions, .pth and .pt, are needed when using PyTorch?

The *.pt files are generated when running

dp --pt train input.json

and the *.pth file is generated when running

dp --pt freeze

Could we use just one of these extensions? That would be more convenient when collecting files in dpgen.

njzjz · 2024-05-03T20:33:45Z

No control flow is saved in the checkpoint file.

github-actions bot added the Python label Jan 27, 2024

github-advanced-security bot found potential problems Jan 27, 2024

deepmd/common.py

        try:
            # remove old one
            os.remove(new_ff)
        except OSError:

Check notice

Code scanning / CodeQL

Empty except

'except' clause does nothing but pass and there is no explanatory comment.
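
One common way to resolve this kind of notice is to make the intent explicit, either with an explanatory comment next to `pass` or with `contextlib.suppress`. The sketch below shows the second option; it is not necessarily what this PR did, and the helper name `remove_if_present` is hypothetical.

```python
import contextlib
import os
import tempfile


def remove_if_present(path: str) -> None:
    """Delete path, deliberately ignoring an OSError such as a missing file."""
    # A missing file is expected on the first save, so the error is ignored on purpose.
    with contextlib.suppress(OSError):
        os.remove(path)


if __name__ == "__main__":
    fd, tmp_path = tempfile.mkstemp()
    os.close(fd)
    remove_if_present(tmp_path)  # removes the existing file
    remove_if_present(tmp_path)  # second call is a no-op instead of raising
```

Suppressing `OSError` keeps the same scope as the `except OSError:` clause in the snippet above; narrowing it to `FileNotFoundError` would be stricter but changes which failures are silenced.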

update docs

43d271d

Signed-off-by: Jinzhe Zeng <jinzhe.zeng@rutgers.edu>

forgot to push

968ae48

Signed-off-by: Jinzhe Zeng <jinzhe.zeng@rutgers.edu>

njzjz mentioned this pull request Jan 27, 2024

[Feature Request] Support different backends for DeePMD-kit deepmodeling/dpgen#1462

Closed

wanghan-iapcm approved these changes Jan 28, 2024

wanghan-iapcm merged commit a8168b5 into deepmodeling:devel Jan 28, 2024

njzjz mentioned this pull request Apr 2, 2024

[TYPO] #3635

Closed
