[Pretrain] Fix eval during pretrain #7827

DesmonDay · 2024-01-11T04:49:14Z

PR types

Bug fixes

PR changes

Others

Description

Change paddle.to_tensor to copy.deepcopy.

paddle-bot · 2024-01-11T04:49:19Z

Thanks for your contribution!

ZHUI · 2024-01-11T05:09:59Z

llm/run_pretrain.py

@@ -261,7 +262,7 @@ def print_dataset(data, mode="train"):
    def _collate_data(data, stack_fn=Stack()):
        tokens_ = stack_fn([x["text"] for x in data])

-        labels = tokens_[:, 1:]
+        labels = copy.deepcopy(tokens_)[:, 1:]


Should the earlier to_tensor change be reverted?

PaddleNLP/llm/run_pretrain.py, lines 267 to 270 in 4069f22:

    return {
        "input_ids": paddle.to_tensor(tokens),
        "labels": paddle.to_tensor(labels),
    }
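
The PR does not spell out why the copy is needed, so the following is only a sketch of one plausible reading. It assumes that tokens_ is a NumPy array (the array below is a hypothetical stand-in for the output of stack_fn) and that the inputs are taken as tokens_[:, :-1], which is not shown in the diff. Under those assumptions, plain slicing gives labels that share a buffer with the inputs, while slicing a deep copy does not:

```python
import copy

import numpy as np

# Hypothetical stand-in for the output of stack_fn:
# a batch of 2 sequences with 5 token ids each.
tokens_ = np.arange(10).reshape(2, 5)

# Plain slicing returns views onto the same buffer as tokens_.
labels_view = tokens_[:, 1:]
tokens = tokens_[:, :-1]  # assumed definition of the inputs

# Slicing a deep copy yields labels backed by a separate buffer.
labels_copy = copy.deepcopy(tokens_)[:, 1:]

shares_view = np.shares_memory(labels_view, tokens)
shares_copy = np.shares_memory(labels_copy, tokens)

# An in-place write to the inputs leaks into the view but not into the copy.
tokens[0, 1] = -1
print(shares_view, shares_copy, labels_view[0, 0], labels_copy[0, 0])
```

Whether this aliasing is what actually broke evaluation is not confirmed by the PR; the sketch only shows the behavioral difference between the removed and the added line.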

ZHUI

LGTM

codecov · 2024-01-11T05:30:01Z

Codecov Report

All modified and coverable lines are covered by tests ✅

Comparison is base (4069f22) 56.95% compared to head (2c6b9c8) 56.95%.

Additional details and impacted files

@@           Coverage Diff            @@
##           develop    #7827   +/-   ##
========================================
  Coverage    56.95%   56.95%           
========================================
  Files          587      587           
  Lines        88628    88628           
========================================
  Hits         50480    50480           
  Misses       38148    38148

DesmonDay added 4 commits January 2, 2024 16:18

add unified checkpoint training args doc

4c0c1cb

Merge branch 'develop' of https://github.com/PaddlePaddle/PaddleNLP i…

89bfc31

…nto develop

fix eval during pretrain

6042b80

fix

64ca131

DesmonDay requested a review from ZHUI January 11, 2024 04:50

JunnYu previously approved these changes Jan 11, 2024

ZHUI reviewed Jan 11, 2024

Merge branch 'develop' of https://github.com/PaddlePaddle/PaddleNLP i…

2c6b9c8

…nto fix_eval_during_pretrain

DesmonDay dismissed JunnYu’s stale review via 2c6b9c8 January 11, 2024 05:17

DesmonDay force-pushed the fix_eval_during_pretrain branch from f640fc4 to 2c6b9c8 Compare January 11, 2024 05:17

ZHUI approved these changes Jan 11, 2024

wawltor merged commit ee4b9dd into PaddlePaddle:develop Jan 11, 2024
8 of 9 checks passed

JunnYu pushed a commit that referenced this pull request Jan 12, 2024

[Pretrain] Fix eval during pretrain (#7827)

331afb2

* add unified checkpoint training args doc
* fix eval during pretrain
* fix
