[AutoParallel] add pipeline.auto_parallel_profiler to auto_config #7343
Merged: From00 merged 22 commits into PaddlePaddle:develop from AndSonder:auto_parallel_profiler on Dec 15, 2023.
Commits (22 total; changes shown from 9 commits):
- 0d7e0f2 update (AndSonder)
- fd28c62 update (AndSonder)
- b312762 Fix init weight for llama modeling auto (From00)
- e2420f0 update (AndSonder)
- ab16499 Merge branch 'develop' of https://github.com/PaddlePaddle/PaddleNLP i… (AndSonder)
- 9bc05a9 Merge branch 'develop' of https://github.com/PaddlePaddle/PaddleNLP i… (AndSonder)
- 718a27b Merge branch 'fix-init-weight-for-lla-modeling-auto' of https://githu… (AndSonder)
- d9f7523 add support for Llama2 (AndSonder)
- 4600c8d recover codes (AndSonder)
- f82ae55 remove training_args (AndSonder)
- 950843b fix (AndSonder)
- 9b5c9e6 remove test config (AndSonder)
- 96ea233 use guard (AndSonder)
- 2d9e784 Merge branch 'develop' of https://github.com/PaddlePaddle/PaddleNLP i… (AndSonder)
- 40506d4 change var name (AndSonder)
- ea44d2a add import (AndSonder)
- 66cdde7 Merge branch 'develop' into auto_parallel_profiler (AndSonder)
- 9b240f4 merge from develop (AndSonder)
- 3e28ccb merge from remote (AndSonder)
- 4e3619c fix import error (AndSonder)
- 7dbd960 fix import error in run_pretrain_auto.py (AndSonder)
- c4efefe Merge branch 'develop' into auto_parallel_profiler (AndSonder)
@@ -325,6 +325,10 @@
            Whether to skip the profile timer; the timer records the time usage of forward/backward/step, etc.
        distributed_dataloader (`bool`, *optional*):
            Whether to use distributed dataloader. Default is `False`.
        job_schedule_profiler_start (`int`, *optional*):
            The start step of the job schedule profiler. Default is `-1`.
        job_schedule_profiler_end (`int`, *optional*):
            The end step of the job schedule profiler. Default is `-1`.
        """

    output_dir: str = field(

Review comment on `job_schedule_profiler_start`:

Reviewer: The command-line argument could be reused here.

Author reply: That doesn't seem to work; `pipeline_parallel_config` can only hold boolean values.
@@ -706,6 +710,16 @@
        metadata={"help": "Whether to unify hybrid parallel checkpoint."},
    )

    job_schedule_profiler_start: Optional[int] = field(
        default=-1,
        metadata={"help": "The start step of job schedule profiler."},
    )

    job_schedule_profiler_end: Optional[int] = field(
        default=-1,
        metadata={"help": "The end step of job schedule profiler."},
    )

    def __post_init__(self):
        env_local_rank = int(os.environ.get("PADDLE_RANK_IN_NODE", -1))
        if env_local_rank != -1 and env_local_rank != self.local_rank and paddle.distributed.get_world_size() > 1:
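For context, the two new fields follow the standard `dataclasses.field` pattern used throughout `TrainingArguments`. A minimal standalone sketch (the class name `ProfilerArgs` is hypothetical, used here only to isolate the two fields):

```python
from dataclasses import dataclass, field
from typing import Optional

@dataclass
class ProfilerArgs:
    # Stand-in for the two TrainingArguments fields added in this PR.
    # Both default to -1, which leaves the job schedule profiler disabled.
    job_schedule_profiler_start: Optional[int] = field(
        default=-1,
        metadata={"help": "The start step of job schedule profiler."},
    )
    job_schedule_profiler_end: Optional[int] = field(
        default=-1,
        metadata={"help": "The end step of job schedule profiler."},
    )

args = ProfilerArgs()
print(args.job_schedule_profiler_start, args.job_schedule_profiler_end)  # -1 -1
```

The `metadata={"help": ...}` entries are what argument parsers built on top of dataclass fields surface as command-line help text.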
@@ -1085,6 +1099,8 @@
        pipeline.accumulate_steps = self.gradient_accumulation_steps
        pipeline.micro_batch_size = self.per_device_train_batch_size
        pipeline.schedule_mode = "1F1B"
        pipeline.job_schedule_profiler_start = self.job_schedule_profiler_start
        pipeline.job_schedule_profiler_end = self.job_schedule_profiler_end

        if self.amp_master_grad:
            warnings.warn("`amp_master_grad` is not supported NOW in AutoParallel!")
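The diff only copies the two bounds onto the pipeline strategy; it does not show how the trainer decides whether the profiler is active at a given step. As an illustration only (the helper name and the half-open-interval semantics are assumptions, not taken from the PR), such a check might look like:

```python
def profiler_active(step: int, start: int = -1, end: int = -1) -> bool:
    """Assumed semantics: profile steps in [start, end); -1 means unset.

    With both defaults left at -1, the profiler never activates.
    """
    if start < 0:
        return False
    return step >= start and (end < 0 or step < end)

print(profiler_active(5, start=3, end=8))  # True
print(profiler_active(2, start=3, end=8))  # False
print(profiler_active(5))                  # False (defaults: disabled)
```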
Review discussion:

Reviewer: Could this be written as a guard, similar to `nvprof_guard`?

Author reply: I considered implementing it the way `nvprof_guard` does, but that implementation calls the C++ API directly, since it only needs to push and pop. Here we need to start the profiler after the start step by changing the arguments that are passed in, so the nvprof approach doesn't really fit.
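To make the trade-off in that discussion concrete: an `nvprof_guard`-style guard is a context manager that pushes profiling state on entry and pops it on exit. A sketch of what a step-gated guard could look like, assuming the half-open-interval semantics above (all names here are hypothetical, and a plain dict stands in for the pipeline strategy; the actual PaddleNLP implementation differs, as the author explains):

```python
from contextlib import contextmanager

@contextmanager
def job_schedule_profiler_guard(step, start, end, strategy):
    # Hypothetical guard: enable a flag on a stand-in strategy object
    # only while executing a step inside [start, end), then reset it.
    enabled = start != -1 and step >= start and (end == -1 or step < end)
    strategy["job_schedule_profiler_running"] = enabled
    try:
        yield enabled
    finally:
        strategy["job_schedule_profiler_running"] = False

strategy = {}
with job_schedule_profiler_guard(step=5, start=3, end=8, strategy=strategy) as on:
    print(on, strategy["job_schedule_profiler_running"])  # True True
print(strategy["job_schedule_profiler_running"])          # False
```

The push/pop shape works when the profiler is toggled by local state; as the author notes, it fits less well when activation requires changing the arguments fed into an already-running schedule.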