@JamesLim-sy JamesLim-sy commented Mar 21, 2023

PR types

Performance optimization

PR changes

APIs

Describe

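  • Feature:

    1. Remove unnecessary __syncthreads() calls.
    2. Avoid unnecessary cudaMemcpyAsync() calls. In the AF2 (AlphaFold2) large model, the cost of the DropoutNd OP is mainly host-side overhead. PR51479 already optimized the kernel but did not address the host overhead. This PR mainly optimizes the host side, which unlocks the performance gain of that earlier PR.
  • Performance:

    1. After this optimization, single-GPU performance of the AlphaFold2 model improves by 0.55%.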
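
For readers unfamiliar with the two techniques named above, here is a minimal sketch of one common way to apply them. It is not the code from this PR: the names `BroadcastDims`, `DropoutNdSketch`, and `RunDropoutNdSketch` are hypothetical, and the assumption is that the host overhead comes from staging small shape/stride metadata on the device with a copy before each launch. Passing that metadata by value as a kernel argument removes the extra copy, and a kernel whose threads share no data needs no `__syncthreads()`.

```cuda
#include <cuda_runtime.h>
#include <vector>

// Hypothetical fixed-capacity descriptor. It is passed by value as a kernel
// argument, so no separate cudaMemcpyAsync is needed to stage it on the device.
constexpr int kMaxRank = 4;
struct BroadcastDims {
  int rank;
  int x_dims[kMaxRank];        // shape of the input tensor
  int mask_strides[kMaxRank];  // mask stride per axis; 0 on broadcast axes
};

__global__ void DropoutNdSketch(const float* x, const unsigned char* mask,
                                float* out, int n, float scale,
                                BroadcastDims d) {
  int i = blockIdx.x * blockDim.x + threadIdx.x;
  if (i >= n) return;
  // Decompose the flat index into coordinates (innermost axis first) and
  // accumulate the offset into the broadcast mask.
  int rem = i;
  int mask_idx = 0;
  for (int a = d.rank - 1; a >= 0; --a) {
    int c = rem % d.x_dims[a];
    rem /= d.x_dims[a];
    mask_idx += c * d.mask_strides[a];
  }
  // Each thread reads and writes only its own element and no shared memory
  // is used, so no __syncthreads() is required here.
  out[i] = mask[mask_idx] ? x[i] * scale : 0.0f;
}

// Runs the sketch on a 3x4 input with one keep flag per row.
// Returns the number of mismatches against a CPU reference, or -1 on a CUDA error.
int RunDropoutNdSketch() {
  const int rows = 3, cols = 4, n = rows * cols;
  const float scale = 2.0f;  // 1 / (1 - p) for p = 0.5
  std::vector<float> h_x(n), h_out(n, -1.0f);
  for (int i = 0; i < n; ++i) h_x[i] = static_cast<float>(i + 1);
  const unsigned char h_mask[rows] = {1, 0, 1};  // broadcast across columns

  BroadcastDims d = {};
  d.rank = 2;
  d.x_dims[0] = rows;
  d.x_dims[1] = cols;
  d.mask_strides[0] = 1;  // mask varies along rows
  d.mask_strides[1] = 0;  // and is broadcast along columns

  float* d_x = nullptr;
  float* d_out = nullptr;
  unsigned char* d_mask = nullptr;
  bool ok = cudaMalloc(&d_x, n * sizeof(float)) == cudaSuccess &&
            cudaMalloc(&d_out, n * sizeof(float)) == cudaSuccess &&
            cudaMalloc(&d_mask, rows) == cudaSuccess;
  // The tensor data itself is copied only because this demo starts on the
  // host; the metadata in `d` travels with the launch instead.
  ok = ok &&
       cudaMemcpy(d_x, h_x.data(), n * sizeof(float),
                  cudaMemcpyHostToDevice) == cudaSuccess &&
       cudaMemcpy(d_mask, h_mask, rows, cudaMemcpyHostToDevice) == cudaSuccess;
  if (ok) {
    DropoutNdSketch<<<1, 32>>>(d_x, d_mask, d_out, n, scale, d);
    ok = cudaGetLastError() == cudaSuccess &&
         cudaMemcpy(h_out.data(), d_out, n * sizeof(float),
                    cudaMemcpyDeviceToHost) == cudaSuccess;
  }
  cudaFree(d_x);
  cudaFree(d_out);
  cudaFree(d_mask);
  if (!ok) return -1;

  int mismatches = 0;
  for (int i = 0; i < n; ++i) {
    float expected = h_mask[i / cols] ? h_x[i] * scale : 0.0f;
    if (h_out[i] != expected) ++mismatches;
  }
  return mismatches;
}
```

Calling `RunDropoutNdSketch()` from a `main` on a machine with a CUDA device should return 0: rows 0 and 2 are scaled by 2 and row 1 is zeroed. The by-value struct only works while the metadata fits within the kernel parameter size limit, which is why the sketch caps the rank at a small constant.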

@JamesLim-sy JamesLim-sy requested a review from zhangboSJTU March 22, 2023 07:01
@JamesLim-sy JamesLim-sy changed the title Optimization for DropoutNd Optimization for DropoutNd on host Mar 22, 2023
@JamesLim-sy JamesLim-sy changed the title Optimization for DropoutNd on host Optimization for DropoutNd on Host Side Mar 22, 2023
@JamesLim-sy JamesLim-sy changed the title Optimization for DropoutNd on Host Side Optimization for DropoutNd on Host side Mar 22, 2023

@ZzSean ZzSean left a comment

LGTM

@JamesLim-sy JamesLim-sy merged commit 101c9bb into PaddlePaddle:develop Mar 23, 2023