SE-ResNeXt Optimization #8990

Closed
@jacquesqiao

Description

Background

Project: https://github.com/PaddlePaddle/Paddle/projects/55
Profiling script:

Optimization methods and results

  1. Free unused GPU memory during training.
  2. Remove program.clone in Executor (25% speedup). [Speed]speed up python executor in fluid #8729
  3. Initialize NCCL once instead of at every step (5%~6% speedup). [Speed]Avoid init_nccl for every steps. #8758
  4. Use constant folding at compile time to reduce the number of elementwise_mul op calls during optimization (5%~10% speedup). optimize optimizer learning rate #8873
  5. Optimize elementwise-related ops: use our own implementations and no longer depend on Eigen (10x speedup for a single op). [Speed] Optimize elementwise_mul_op gradient functor #8811
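
To illustrate item 4, here is a minimal sketch of build-time constant folding for per-parameter learning rates. It is not the code from #8873. `ToyProgram`, `param_lr_naive`, `param_lr_folded`, and the parameter names are hypothetical stand-ins, and the sketch assumes each parameter's learning rate is the global learning rate times a constant multiplier that is known when the program is built.

```python
from dataclasses import dataclass, field


@dataclass
class ToyProgram:
    """Hypothetical stand-in for a program: just a list of op descriptions."""
    ops: list = field(default_factory=list)

    def append_mul(self, x, scale):
        # Record one elementwise_mul op and return the name of its output.
        out = f"{x}_x{scale}"
        self.ops.append(("elementwise_mul", x, scale, out))
        return out


def param_lr_naive(program, global_lr, multipliers):
    """Emit one multiply per parameter, even when the multiplier is 1.0."""
    return {name: program.append_mul(global_lr, m)
            for name, m in multipliers.items()}


def param_lr_folded(program, global_lr, multipliers):
    """Decide at build time which multiplies are actually needed.

    A multiplier of 1.0 needs no op at all, and parameters that share a
    multiplier share a single scaled learning-rate variable.
    """
    cache = {}
    result = {}
    for name, m in multipliers.items():
        if m == 1.0:
            result[name] = global_lr  # reuse the global variable directly
            continue
        if m not in cache:
            cache[m] = program.append_mul(global_lr, m)
        result[name] = cache[m]
    return result


# Hypothetical parameters: weights use the global rate, biases use twice it.
multipliers = {"conv1.w": 1.0, "conv1.b": 2.0, "fc.w": 1.0, "fc.b": 2.0}

naive_program = ToyProgram()
naive_lrs = param_lr_naive(naive_program, "lr", multipliers)

folded_program = ToyProgram()
folded_lrs = param_lr_folded(folded_program, "lr", multipliers)

print(len(naive_program.ops), len(folded_program.ops))
```

Because the decision is made once while the program is built, the ops that are skipped never run at any training step, which is where a per-step saving would come from.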

Status

  1. Multi-card training has not been fully tested.
  2. The acceleration ratio for multi-card training still needs to be profiled.
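
For the multi-card profiling, one common way to report the result is as an acceleration ratio plus a scaling efficiency. The helper below is a sketch under the assumption that throughput (for example, images per second) is measured for the same model and per-card batch size on one card and on N cards. The function names are hypothetical and the numbers in the usage lines are placeholders, not measurements.

```python
def acceleration_ratio(single_card_throughput, multi_card_throughput):
    """Return how many times faster the multi-card run is than one card."""
    if single_card_throughput <= 0 or multi_card_throughput <= 0:
        raise ValueError("throughput values must be positive")
    return multi_card_throughput / single_card_throughput


def scaling_efficiency(single_card_throughput, multi_card_throughput, num_cards):
    """Return the acceleration ratio divided by the card count (1.0 is ideal)."""
    if num_cards < 1:
        raise ValueError("num_cards must be at least 1")
    ratio = acceleration_ratio(single_card_throughput, multi_card_throughput)
    return ratio / num_cards


# Placeholder numbers for illustration only, not profiling results.
ratio = acceleration_ratio(100.0, 360.0)
efficiency = scaling_efficiency(100.0, 360.0, num_cards=4)
print(f"acceleration ratio: {ratio:.2f}, efficiency: {efficiency:.2f}")
```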

Plan

Provide an overall profile after all the optimizations are merged (@chengduoZH).
