fix: prevent second save in the end of training if last step was saved already #36219

NosimusAI · 2025-02-16T14:59:22Z

What does this PR do?

Fixes # (issue)

To resolve the issue where the model is saved twice when using save_strategy="epoch", we need to prevent the redundant save at the end of training. The save triggered by the end of the last epoch is sufficient, so we skip the final save when the strategy is set to epoch.
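The behavior described above can be illustrated with a minimal, self-contained sketch. It is a toy loop, not the transformers `Trainer`; `ToyTrainer`, `skip_redundant_final_save`, and `saved_at` are hypothetical names introduced only for illustration, and the actual change in this PR may be implemented differently.

```python
from dataclasses import dataclass, field


@dataclass
class ToyTrainer:
    """Hypothetical stand-in for a training loop (not the transformers Trainer)."""

    num_epochs: int
    steps_per_epoch: int
    save_strategy: str = "epoch"  # this sketch supports only "epoch" and "steps"
    save_steps: int = 1  # used only when save_strategy == "steps"
    skip_redundant_final_save: bool = True  # assumed flag modelling the fix
    saved_at: list = field(init=False, default_factory=list)

    def __post_init__(self):
        if self.save_strategy not in ("epoch", "steps"):
            raise ValueError(f"unsupported save_strategy: {self.save_strategy!r}")

    def train(self):
        """Run the toy loop and return the global steps at which a save happened."""
        self.saved_at = []
        step = 0
        for _ in range(self.num_epochs):
            for _ in range(self.steps_per_epoch):
                step += 1
                # Step-based strategy: save every `save_steps` steps.
                if self.save_strategy == "steps" and step % self.save_steps == 0:
                    self.saved_at.append(step)
            # Epoch-based strategy: save at the end of every epoch.
            if self.save_strategy == "epoch":
                self.saved_at.append(step)

        # End-of-training save: skip it when the last step was already saved.
        already_saved = bool(self.saved_at) and self.saved_at[-1] == step
        if not (self.skip_redundant_final_save and already_saved):
            self.saved_at.append(step)
        return self.saved_at


if __name__ == "__main__":
    # Without the guard, the last step is saved twice.
    print(ToyTrainer(2, 3, skip_redundant_final_save=False).train())
    # With the guard, each step is saved at most once.
    print(ToyTrainer(2, 3).train())
```

In this sketch the final save still runs when the last step was not covered by a regular save, for example with a step-based strategy whose interval does not divide the total number of steps.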

Before submitting

This PR fixes a typo or improves the docs (you can dismiss the other checks if that's the case).
Did you read the contributor guideline, Pull Request section?
Was this discussed/approved via a GitHub issue or the forum? Please add a link to it if that's the case.
Did you make sure to update the documentation with your changes? Here are the documentation guidelines, and here are tips on formatting docstrings.
Did you write any new necessary tests?

Who can review?

Anyone in the community is free to review the PR once the tests have passed. Feel free to tag
members/contributors who may be interested in your PR.

Rocketknight1 · 2025-02-17T15:35:40Z

cc @muellerzr @SunMarc

SunMarc

LGTM !

NosimusAI · 2025-02-18T19:07:02Z

Hi, these tests are flaky. Can we restart the pipeline?

NosimusAI · 2025-02-19T08:48:59Z

@shethaadit @SunMarc can you merge? It says that 2 workflows are awaiting approval, but they have completed.

SunMarc · 2025-02-19T12:07:37Z

Just waiting for a last review.

SunMarc · 2025-02-19T12:27:02Z

Could you also try to add a test for this? That would be nice.

muellerzr

Thanks for the fix, all green from me after we add a test please!

HuggingFaceDocBuilderDev · 2025-02-19T17:43:35Z

The docs for this PR live here. All of your documentation changes will be reflected on that endpoint. The docs are available until 30 days after the last update.

NosimusAI · 2025-02-20T11:20:39Z

@SunMarc @muellerzr @shethaadit could you approve, please? I added a unit test.

SunMarc

Thanks a lot !

* ci: try to fix test-full also use 3.12 to run full tests
* fix mypy errors
* hf: fix deprecated arguments in transformers.TrainingArguments
* tests: adjust assertions in test_huggingface_log_model

Due to huggingface/transformers#36219.

SunMarc approved these changes Feb 18, 2025

NosimusAI mentioned this pull request Feb 18, 2025

checkpoint will be saved twice at the end of training when save_strategy is epoch #36203

Closed

shethaadit approved these changes Feb 18, 2025

JaktensTid added 2 commits February 19, 2025 02:48

fix: prevent second save in the end of training

7aca13e

fix: prevent second save in the end of training

46a8f22

NosimusAI force-pushed the fix/second-save-on-steps-save branch from a03067b to 46a8f22 February 18, 2025 22:48

SunMarc requested a review from muellerzr February 19, 2025 12:07

muellerzr approved these changes Feb 19, 2025

JaktensTid and others added 4 commits February 20, 2025 14:59

test: added test for no duplicate save on epoch save strategy

49210de

fix: removed TrainerControl

dfd2651

Merge branch 'main' into fix/second-save-on-steps-save

72f9900

chore: style formatting

a9ffe50

NosimusAI requested review from SunMarc, muellerzr and shethaadit February 20, 2025 15:10

shethaadit approved these changes Feb 20, 2025

SunMarc approved these changes Feb 20, 2025

SunMarc merged commit effaef3 into huggingface:main Feb 20, 2025
21 checks passed

skshetry added a commit to treeverse/dvclive that referenced this pull request Apr 29, 2025

tests: adjust assertions in test_huggingface_log_model

2c6d7b8

Due to huggingface/transformers#36219.

7 participants
