Fit sharding optimization for auto parallel llama #8021
Conversation
Thanks for your contribution!
Codecov Report
Attention: Patch coverage is …

Additional details and impacted files

@@            Coverage Diff            @@
##           develop    #8021    +/-   ##
===========================================
- Coverage    56.55%   56.54%    -0.01%
===========================================
  Files          592      592
  Lines        91036    91067       +31
===========================================
+ Hits         51484    51493        +9
- Misses       39552    39574       +22

☔ View full report in Codecov by Sentry.
LGTM
When you have time, please also update the Chinese documentation here: https://github.com/PaddlePaddle/PaddleNLP/blob/develop/docs/trainer.md?plain=1#L540-L553 . Approving for now.
LGTM
PR types
Bug fixes
PR changes
Models
Description
Adapts the static semi-auto-parallel Llama model to the sharding-optimization switches, including:

- data_parallel_config: adds DP-related optimization switches. Currently the enable_allreduce_avg_in_gradinent_scale option is supported, which uses the allreduce_avg communication op for DP gradient synchronization. The framework-side implementation is in PaddlePaddle/Paddle#61622.
- sharding_parallel_config: adapts the enable_stage1_tensor_fusion, enable_stage1_overlap, and enable_stage2_overlap options. enable_stage1_tensor_fusion enables sharding communication fusion, while enable_stage1_overlap and enable_stage2_overlap enable sharding communication overlap. Under static semi-auto parallel, both the fusion and overlap strategies currently take effect only in stage 2; stage 1 has no corresponding implementation (see the configuration sketch below).
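For reference, here is a minimal sketch of how such space-separated option strings might be turned into boolean switches. The option names come from this PR; the parse_parallel_config helper and the surrounding variable handling are hypothetical illustrations, not PaddleNLP's actual trainer code.

```python
# Hypothetical sketch only: shows how space-separated option strings such as
# data_parallel_config / sharding_parallel_config could be parsed into
# boolean switches before building the distributed strategy.

def parse_parallel_config(config: str) -> set:
    """Split a space-separated config string into a set of enabled options."""
    return set(config.split())

# Example values using the options described in this PR.
data_parallel_config = "enable_allreduce_avg_in_gradinent_scale"
sharding_parallel_config = "enable_stage1_tensor_fusion enable_stage2_overlap"

dp_opts = parse_parallel_config(data_parallel_config)
sharding_opts = parse_parallel_config(sharding_parallel_config)

# DP gradient synchronization uses the allreduce_avg op when this is enabled.
use_allreduce_avg = "enable_allreduce_avg_in_gradinent_scale" in dp_opts

# Sharding communication fusion / overlap switches; per the description above,
# under static semi-auto parallel these currently take effect only in stage 2.
enable_tensor_fusion = "enable_stage1_tensor_fusion" in sharding_opts
enable_comm_overlap = (
    "enable_stage1_overlap" in sharding_opts
    or "enable_stage2_overlap" in sharding_opts
)

print(use_allreduce_avg, enable_tensor_fusion, enable_comm_overlap)
```

In practice these options are typically supplied as space-separated strings through the trainer's launch arguments (for example, --sharding_parallel_config "enable_stage1_tensor_fusion enable_stage1_overlap"), as described in the trainer documentation linked above.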