@jeng1220
Collaborator

PR types

Bug fixes

PR changes

OPs

Description

Fix #56100 and turn on fast_ln_fwd_kernel

@paddle-bot

paddle-bot bot commented Aug 18, 2023

Your PR has been submitted successfully. Thank you for contributing to this open-source project!
Please watch for the results of the automated CI tests that follow. See the Paddle CI Manual for details.

@paddle-bot paddle-bot bot added contributor External developers status: proposed labels Aug 18, 2023
}

if (WARPS_N > 1) {
  __syncthreads();
Contributor

Is this sync needed, or is the one at line 302 enough?

Collaborator Author

@jeng1220 jeng1220 Aug 18, 2023

It is necessary. Without this sync, some threads can still be reading smem at L274 while others are already writing smem at L300.

compute-sanitizer can also point out that these accesses have a read-write race.
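
To illustrate the hazard described above, here is a minimal sketch. It is not the actual fast_ln_fwd_kernel code: the names `block_sum`, `two_reductions`, `run_two_reductions`, and `THREADS`, as well as the value of `WARPS_N`, are hypothetical. It assumes a layer-norm-like pattern in which the same shared-memory buffer is reused for two consecutive cross-warp reductions (mean, then variance).

```cuda
#include <cuda_runtime.h>

constexpr int WARPS_N = 4;             // hypothetical: warps cooperating on one row
constexpr int THREADS = WARPS_N * 32;  // one block of WARPS_N full warps

// Sums `v` over the whole block, using one shared-memory slot per warp.
__device__ float block_sum(float v, float* smem) {
  const int lane = threadIdx.x % 32;
  const int warp = threadIdx.x / 32;

  // Intra-warp reduction: lane 0 ends up holding the warp's partial sum.
  for (int offset = 16; offset > 0; offset >>= 1) {
    v += __shfl_down_sync(0xffffffffu, v, offset);
  }

  if (WARPS_N > 1) {
    // Barrier A: a slower warp may still be reading smem from the previous
    // call. Without this barrier, the write below could overwrite values
    // that warp has not read yet (read-write race).
    __syncthreads();
  }

  if (lane == 0) smem[warp] = v;

  // Barrier B: makes the partial sums visible before anyone reads them.
  // It does not protect the previous call's reads, so it cannot replace A.
  __syncthreads();

  float total = 0.f;
  for (int w = 0; w < WARPS_N; ++w) total += smem[w];
  return total;
}

// Computes mean and (biased) variance of THREADS values, reusing smem twice.
__global__ void two_reductions(const float* x, float* out) {
  __shared__ float smem[WARPS_N];
  const float v = x[threadIdx.x];
  const float mean = block_sum(v, smem) / THREADS;     // first use of smem
  const float d = v - mean;
  const float var = block_sum(d * d, smem) / THREADS;  // second use of smem
  if (threadIdx.x == 0) {
    out[0] = mean;
    out[1] = var;
  }
}

// Host helper: runs the kernel on the inputs i % 8 and copies back {mean, var}.
cudaError_t run_two_reductions(float out[2]) {
  float h_x[THREADS];
  for (int i = 0; i < THREADS; ++i) h_x[i] = static_cast<float>(i % 8);

  float* d_x = nullptr;
  float* d_out = nullptr;
  cudaError_t err = cudaMalloc(&d_x, sizeof(h_x));
  if (err != cudaSuccess) return err;
  err = cudaMalloc(&d_out, 2 * sizeof(float));
  if (err != cudaSuccess) {
    cudaFree(d_x);
    return err;
  }

  cudaMemcpy(d_x, h_x, sizeof(h_x), cudaMemcpyHostToDevice);
  two_reductions<<<1, THREADS>>>(d_x, d_out);
  // This copy waits for the kernel and also surfaces any launch error.
  err = cudaMemcpy(out, d_out, 2 * sizeof(float), cudaMemcpyDeviceToHost);

  cudaFree(d_x);
  cudaFree(d_out);
  return err;
}
```

In this sketch, barrier B plays the role of the existing sync after the write, and barrier A plays the role of the sync under discussion. If barrier A is removed, the result can become timing-dependent, and running the program under `compute-sanitizer --tool racecheck` would be expected to flag the shared-memory hazard.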

Contributor

@zhaoyinglia zhaoyinglia left a comment

LGTM
The unit test from #56100 passes.

@jeng1220
Collaborator Author

PR-CI-Coverage failed, but the failure is unrelated to this PR.

Log:

2023-08-18 18:36:15 [ 81%] Linking CXX executable prim_op_test
2023-08-18 18:37:05 collect2: fatal error: ld terminated with signal 9 [Killed]
2023-08-18 18:37:05 compilation terminated.
2023-08-18 18:37:05 paddle/fluid/distributed/fleet_executor/test/CMakeFiles/compute_interceptor_run_op_test.dir/build.make:537: recipe for target 'paddle/fluid/distributed/fleet_executor/test/compute_interceptor_run_op_test' failed
2023-08-18 18:37:05 make[2]: *** [paddle/fluid/distributed/fleet_executor/test/compute_interceptor_run_op_test] Error 1

@jeng1220
Collaborator Author

jeng1220 commented Aug 21, 2023

@zhiqiu

All CI pipelines have passed. This PR is ready to be merged.

@zhaoyinglia zhaoyinglia merged commit 1f987a7 into PaddlePaddle:develop Aug 21, 2023
BeingGod pushed a commit to BeingGod/Paddle that referenced this pull request Sep 9, 2023
@jeng1220 jeng1220 deleted the bugfix_github_fast_ln_fwd_race branch September 12, 2023 09:19
hitywt pushed a commit to hitywt/Paddle that referenced this pull request Jan 10, 2024
Labels

contributor External developers NVIDIA

Development

Successfully merging this pull request may close these issues.

fast layer norm has non-deterministic problem

5 participants