add fast_rmsnorm #8680

Merged
merged 5 commits into from
Jul 4, 2024

deepllz (Contributor) commented Jun 28, 2024

PR types

Performance optimization

PR changes

Others

Description

Adds fast_rms_norm, built on top of fast_ln.
Performance impact:
The rms_norm operator becomes about 2x faster. Model throughput:

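| Model | Parallel strategy | Throughput before PR | Throughput after PR |
| --- | --- | --- | --- |
| Llama-2 7B | gbs8, sharding8-mbs1-acc1 | 4454.693 | 4490.384 |
| Llama-2 13B | gbs8, pp4sharding2-vpp5-mbs1-acc4 | 2229.921 | 2252.541 |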
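To put the throughput table in relative terms, the short sketch below computes the end-to-end gain per model. The numbers are copied from the table; the helper name `relative_gain_percent` is introduced here for illustration only, and the unit of the throughput figures is not stated in the PR.

```python
# Throughput figures copied from the table above: (before PR, after PR).
throughput = {
    "Llama-2 7B": (4454.693, 4490.384),
    "Llama-2 13B": (2229.921, 2252.541),
}


def relative_gain_percent(before, after):
    """Relative change from `before` to `after`, in percent."""
    return (after - before) / before * 100.0


gains = {
    model: relative_gain_percent(before, after)
    for model, (before, after) in throughput.items()
}

for model, gain in gains.items():
    print(f"{model}: {gain:.2f}% end-to-end throughput gain")
```

The end-to-end gain is around 1% for both models, far smaller than the operator-level speedup, which is what one would expect if rms_norm accounts for only a small share of each training step.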

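Accuracy impact:
The change leaves the results of fast_ln unchanged.
The test printed the md5sum of this operator's forward and backward outputs, and the values did not change. Details: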
218893447940b45c822ff7d49832b50b
Result before the PR:
098c78b8100bb4f62f01c983746e4564
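The md5sum comparison described above can be sketched as follows. This is a minimal illustration, not the script used in the PR: it assumes the tensor is flattened and serialized as little-endian float32 bytes, and the helper name `tensor_md5` and the sample values are hypothetical.

```python
import hashlib
import struct


def tensor_md5(values):
    """md5 hex digest of a flat list of numbers packed as little-endian float32."""
    raw = struct.pack(f"<{len(values)}f", *values)
    return hashlib.md5(raw).hexdigest()


# Hypothetical operator outputs before and after a change.
before = [0.5, -1.25, 3.0]
after_same = [0.5, -1.25, 3.0]
after_diff = [0.5, -1.25, 3.000001]  # one element differs slightly

print(tensor_md5(before) == tensor_md5(after_same))  # identical bytes, identical digest
print(tensor_md5(before) == tensor_md5(after_diff))  # any bit change alters the digest
```

A digest match proves bitwise equality of the serialized data, while a mismatch says nothing about how large the difference is, so it has to be followed by an element-wise comparison.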

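fast_rms_norm and fused_rms_norm cannot be made bitwise identical, but this does not affect convergence. Convergence was verified through TE, which uses fast_rms_norm; with bf16 precision, turning TE on or off is known not to affect convergence.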
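The PR does not say why the two kernels differ bitwise. One common cause, offered here only as an assumption, is that floating-point addition is not associative, so two kernels that reduce the sum of squares in a different order can round differently. The sketch below is a plain-Python reference of the standard RMSNorm formula, `y = x / sqrt(mean(x^2) + eps) * weight`, with a pluggable reduction; the function names and the `eps` default are illustrative and are not taken from either kernel.

```python
import math


def sequential_sum(values):
    """Left-to-right accumulation."""
    total = 0.0
    for v in values:
        total += v
    return total


def pairwise_sum(values):
    """Tree-style accumulation, closer to how a parallel reduction combines partial sums."""
    if len(values) <= 2:
        return sequential_sum(values)
    mid = len(values) // 2
    return pairwise_sum(values[:mid]) + pairwise_sum(values[mid:])


def rms_norm(x, weight, eps=1e-6, reduce=sequential_sum):
    """Reference RMSNorm over one row; `reduce` selects the summation order."""
    mean_sq = reduce([v * v for v in x]) / len(x)
    inv_rms = 1.0 / math.sqrt(mean_sq + eps)
    return [v * inv_rms * w for v, w in zip(x, weight)]


# Synthetic input row with the hidden size used in the PR's test (4096).
x = [math.sin(i) * 3.0 for i in range(4096)]
w = [1.0] * len(x)

y_seq = rms_norm(x, w, reduce=sequential_sum)
y_pair = rms_norm(x, w, reduce=pairwise_sum)
max_abs_diff = max(abs(a - b) for a, b in zip(y_seq, y_pair))

# Non-associativity in its smallest form.
print((0.1 + 0.2) + 0.3 == 0.1 + (0.2 + 0.3))
print("max abs diff between reduction orders:", max_abs_diff)
```

Python floats are 64-bit, so any difference here is many orders of magnitude smaller than it would be in bf16 or fp32; the point is only that the results are numerically equivalent without being guaranteed bit-identical.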
Detailed accuracy test results:
8eb72b6772b2050055a2e963a75d1fda
image
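The forward and backward md5sum values do not match, so the tensor values are not exactly identical. The diff shows that the two sides are nearly equal: for an output tensor of shape=[10, 4096], print(paddle.nonzero(output1 - output2)) shows that 462 elements differ, about 1.1% of the tensor, with differences at the 1e-4 level. The backward pass behaves the same way.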
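The mismatch statistic quoted above (462 of 10 × 4096 elements, about 1.1%) can be reproduced on synthetic data with the sketch below. It uses nested Python lists in place of Paddle tensors so that it runs without a GPU; the helper `count_mismatches` and the synthetic perturbation are assumptions for illustration, not the PR's test data.

```python
def count_mismatches(a, b):
    """Return (mismatched, total, max_abs_diff) for two equally shaped 2-D lists."""
    total = 0
    mismatched = 0
    max_abs_diff = 0.0
    for row_a, row_b in zip(a, b):
        for va, vb in zip(row_a, row_b):
            total += 1
            diff = abs(va - vb)
            if diff != 0.0:
                mismatched += 1
                max_abs_diff = max(max_abs_diff, diff)
    return mismatched, total, max_abs_diff


rows, cols = 10, 4096
out1 = [[1.0 for _ in range(cols)] for _ in range(rows)]
out2 = [row[:] for row in out1]

# Perturb 462 distinct elements by 1e-4 to mimic the reported discrepancy.
for k in range(462):
    r, c = divmod(k * 88, cols)
    out2[r][c] += 1e-4

mismatched, total, max_abs_diff = count_mismatches(out1, out2)
fraction = mismatched / total
print(f"{mismatched}/{total} elements differ ({fraction:.2%}), max abs diff {max_abs_diff:.1e}")
```

Reporting the maximum absolute difference alongside the count makes it easier to tell rounding-level noise from a real numerical bug.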

End-to-end impact:
With identical inputs and parameter initialization:
d0cea16953536549dbd904fdd2689c24
1303be85e1cf8dc5fcd50c3ddfa349eb
Looking only at the first loss, the absolute error is 1e-3 and the relative error is around 1e-5.
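For reference, absolute and relative error between two loss values are usually computed as below. The loss values are hypothetical placeholders, not the ones from the PR's runs, and the helper names are introduced here for illustration.

```python
def absolute_error(reference, value):
    """|value - reference|"""
    return abs(value - reference)


def relative_error(reference, value):
    """|value - reference| / |reference|; the reference must be non-zero."""
    return abs(value - reference) / abs(reference)


# Hypothetical first-step losses from two runs with identical inputs and initialization.
loss_reference = 10.0
loss_candidate = 10.001

abs_err = absolute_error(loss_reference, loss_candidate)
rel_err = relative_error(loss_reference, loss_candidate)
print(f"absolute error: {abs_err:.1e}, relative error: {rel_err:.1e}")
```

Relative error scales the difference by the magnitude of the loss, which makes it the more useful figure when comparing runs whose losses sit at different scales.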


paddle-bot bot commented Jun 28, 2024

Thanks for your contribution!


codecov bot commented Jun 28, 2024

Codecov Report

Attention: Patch coverage is 22.22222% with 7 lines in your changes missing coverage. Please review.

Project coverage is 55.74%. Comparing base (c574d6d) to head (0a7af50).
Report is 222 commits behind head on develop.

| Files with missing lines | Patch % | Lines |
| --- | --- | --- |
| paddlenlp/transformers/llama/fusion_ops.py | 25.00% | 6 Missing ⚠️ |
| paddlenlp/transformers/llama/modeling.py | 0.00% | 1 Missing ⚠️ |
Additional details and impacted files
@@           Coverage Diff            @@
##           develop    #8680   +/-   ##
========================================
  Coverage    55.74%   55.74%           
========================================
  Files          623      623           
  Lines        97454    97457    +3     
========================================
+ Hits         54323    54331    +8     
+ Misses       43131    43126    -5     


ZHUI (Collaborator) commented Jul 1, 2024

Please show the accuracy test results in the PR.

@deepllz deepllz closed this Jul 1, 2024
@deepllz deepllz reopened this Jul 1, 2024
@DesmonDay DesmonDay self-requested a review July 3, 2024 04:57
ZHUI (Collaborator) left a comment

LGTM

@ZHUI ZHUI merged commit fd01043 into PaddlePaddle:develop Jul 4, 2024
8 of 11 checks passed