@Aurelius84 Aurelius84 commented Jun 28, 2021

PR types

Others

PR changes

Others

Describe

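PR content

  1. Optimized the Fake_vars logic in valid_vars: placeholder variables are now forced to be empty force_cpu tensors, which avoids introducing an extra CUDA memcpy synchronization (which would block kernel launches).
  2. Added a stop_gradient check on the entry function's input Tensors to reduce the computation done by grad ops (e.g. conv2d_grad_op).
  3. Optimized the creation of temp_scope_var: it is now created once in __init__, which reduces forward overhead.
  4. Optimized the valid_vars logic for grad_var.
  5. Removed the inheritance from nn.Layer; the object is now invoked directly through __call__.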
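
Items 2, 3 and 5 can be illustrated with a small, framework-free sketch. Every name here (`FakeTensor`, `PartialProgramSketch`, `_tmp_scope`) is a hypothetical stand-in, and this is not the actual PartialProgramLayer code from the PR; it only shows the general pattern of a plain callable that allocates its scratch state once and skips gradient work for inputs marked stop_gradient.

```python
class FakeTensor:
    """Hypothetical stand-in for a framework tensor; only the flag matters here."""

    def __init__(self, name, stop_gradient=True):
        self.name = name
        self.stop_gradient = stop_gradient


class PartialProgramSketch:
    """A plain callable with no nn.Layer base class (pattern of item 5)."""

    def __init__(self):
        # Pattern of item 3: build the scratch scope once at construction
        # time instead of on every forward call.
        self._tmp_scope = {}

    def __call__(self, inputs):
        # Pattern of item 2: only inputs with stop_gradient=False need a
        # gradient, so the rest can be left out of the backward pass.
        needs_grad = [t.name for t in inputs if not t.stop_gradient]

        # Reuse the same scratch scope object; clear it rather than
        # allocating a new one.
        self._tmp_scope.clear()
        self._tmp_scope["inputs"] = [t.name for t in inputs]
        return needs_grad


layer = PartialProgramSketch()
image = FakeTensor("image", stop_gradient=True)
weight = FakeTensor("weight", stop_gradient=False)
names_needing_grad = layer([image, weight])
```

In this sketch only `weight` is reported as needing a gradient, and repeated calls keep reusing the single `_tmp_scope` object created in `__init__`.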

Performance gains

Model            Before   After   Improvement
ResNet50_bs32    302      312     3.3%
ResNet50_bs128   340      345     1.4%
ResNet152_bs32   136      138     1.4%
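
The improvement column can be reproduced from the before and after figures. This is a minimal sketch, assuming the improvement is the relative gain (after - before) / before; the PR does not state the unit of the measurements, and the helper name `relative_gain_percent` is made up for illustration.

```python
# Before/after figures copied from the table in the PR description.
results = {
    "ResNet50_bs32": (302, 312),
    "ResNet50_bs128": (340, 345),
    "ResNet152_bs32": (136, 138),
}


def relative_gain_percent(before, after):
    """Relative gain of `after` over `before`, in percent."""
    return (after - before) / before * 100.0


for model, (before, after) in results.items():
    print(f"{model}: {relative_gain_percent(before, after):.2f}%")
```

Under that assumption the gains are about 3.31%, 1.47% and 1.47%, so the two 1.4% entries match when the value is truncated to one decimal place.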

@paddle-bot-old

Thanks for your contribution!
Please wait for the CI results first. See the Paddle CI Manual for details.

@Aurelius84 Aurelius84 requested a review from zhhsplendid June 29, 2021 12:30
@Aurelius84 Aurelius84 changed the title from "[Dy2Stat] Refine temp_scope_vec logic" to "[Dy2Stat] Refine PartialProgramLayer logic" Jun 29, 2021

@zhhsplendid zhhsplendid left a comment

LGTM
