Blas optimized elementwise_add forward and backward passes #10913

tpatejko · 2018-05-24T13:18:12Z

This PR optimizes the forward and backward passes of elementwise_add.
For the forward pass, it includes:

MKL VML-based optimization with v?Add when MKL/MKLDNN is used
BLAS-based optimization with VCopy and SAXPY operations when MKL is disabled
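
The copy-then-axpy idea for the non-MKL forward pass can be sketched as below. This is only an illustration under assumptions: `vcopy`, `axpy` and `elementwise_add_forward` are hypothetical stand-ins written as plain loops so the snippet compiles without a BLAS library, the choice of which operand is copied first is a guess, and it covers only the same-shape float case. With a real BLAS the two helpers would correspond to `cblas_scopy` and `cblas_saxpy`, and with MKL VML the whole function would collapse into a single `vsAdd` call.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Stand-in for a BLAS level 1 copy (e.g. cblas_scopy): dst[i] = src[i].
inline void vcopy(std::size_t n, const float* src, float* dst) {
  for (std::size_t i = 0; i < n; ++i) dst[i] = src[i];
}

// Stand-in for a BLAS level 1 axpy (e.g. cblas_saxpy): y[i] += alpha * x[i].
inline void axpy(std::size_t n, float alpha, const float* x, float* y) {
  for (std::size_t i = 0; i < n; ++i) y[i] += alpha * x[i];
}

// Forward pass z = x + y, expressed as a copy followed by an axpy with alpha = 1.
inline std::vector<float> elementwise_add_forward(const std::vector<float>& x,
                                                  const std::vector<float>& y) {
  assert(x.size() == y.size());  // sketch handles the same-shape case only
  std::vector<float> z(x.size());
  vcopy(x.size(), x.data(), z.data());       // z = x
  axpy(y.size(), 1.0f, y.data(), z.data());  // z = z + 1 * y
  return z;
}
```
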

For backward pass:

Blas level 1 VCopy is used for copying dx and dy vectors.
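
Since the derivative of x + y with respect to either input is 1, the backward pass for same-shape inputs reduces to copying the output gradient into dx and dy. A minimal sketch follows, assuming float data and a hypothetical helper name `elementwise_add_backward`; `std::copy` stands in for the BLAS level 1 copy, and the null checks mirror the commit note that gradient inputs are copied only when they are not null.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>

// Hypothetical helper: copies the output gradient dout into each requested
// input gradient buffer. A null dx or dy means that gradient is not needed.
inline void elementwise_add_backward(std::size_t n, const float* dout,
                                     float* dx, float* dy) {
  assert(dout != nullptr || n == 0);
  if (dx != nullptr) {
    std::copy(dout, dout + n, dx);  // dx = dout (stand-in for a BLAS copy)
  }
  if (dy != nullptr) {
    std::copy(dout, dout + n, dy);  // dy = dout
  }
}
```
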

When integral or float16 types are used, or the op runs on a GPU device, the implementation falls back to the default (generic) elementwise_add operation.
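
One way to picture this fallback is compile-time dispatch on the element type plus a runtime check for the device. The sketch below is an assumption about the shape of such a dispatch, not the PR's actual code: `AddPath`, `add_impl`, `elementwise_add` and the `on_gpu` flag are all hypothetical names, and the "fast" branch is a plain loop standing in for the MKL/BLAS call. A custom float16 class would not satisfy `std::is_floating_point`, so it would take the generic branch here as well.

```cpp
#include <cassert>
#include <cstddef>
#include <type_traits>
#include <vector>

enum class AddPath { kBlas, kGeneric };

// Fast path, selected for float/double; a real build would call MKL or BLAS here.
template <typename T>
AddPath add_impl(const std::vector<T>& x, const std::vector<T>& y,
                 std::vector<T>* z, std::true_type /*use_blas*/) {
  z->assign(x.begin(), x.end());
  for (std::size_t i = 0; i < y.size(); ++i) (*z)[i] += y[i];
  return AddPath::kBlas;
}

// Generic path, selected for integral and other non-floating-point types.
template <typename T>
AddPath add_impl(const std::vector<T>& x, const std::vector<T>& y,
                 std::vector<T>* z, std::false_type /*use_blas*/) {
  z->resize(x.size());
  for (std::size_t i = 0; i < x.size(); ++i) (*z)[i] = x[i] + y[i];
  return AddPath::kGeneric;
}

// Entry point: GPU execution (modelled as a flag) always takes the generic path.
template <typename T>
AddPath elementwise_add(const std::vector<T>& x, const std::vector<T>& y,
                        std::vector<T>* z, bool on_gpu = false) {
  assert(x.size() == y.size());
  if (on_gpu) return add_impl(x, y, z, std::false_type{});
  return add_impl(x, y, z, std::is_floating_point<T>{});
}
```
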

tpatejko · 2018-05-24T21:23:01Z

This PR addresses issue #10786.

luotao1

LGTM! Thanks for this speedup. I tested it on the OCR CRNN_CTC model: the total elapsed time of the elementwise_add op (over 100 repetitions of the model) dropped from 467 ms to 428 ms.

tpatejko · 2018-05-25T07:42:25Z

@luotao1 Thanks for this information. Does the model converge?

luotao1 · 2018-05-25T07:50:08Z

@tpatejko I only tested the inference speed, but the unit tests in https://github.com/PaddlePaddle/Paddle/tree/develop/python/paddle/fluid/tests/book check that the models converge.

Tomasz Patejko added 7 commits May 24, 2018 15:16

MKL elementwise add: elementwise_add uses vAdd VML function when MKL is used

e43c8f3

MKL elementwise_add: BLAS version compiles with integral types

6f93248

MKL elementwise add: default implementation used for integral types, float16 and/or GPU

01fb2be

MKL elementwise add backward: Initial implementation with vector copy

5a622c2

MKL optimized elementwise add backward: coding style fixes

996d12f

MKL elementwise add backward: grad inputs copied when they are not null

fde47aa

MKL elementwise add backward: backward works for integral types with fall back to default impl

9241011

tpatejko requested review from luotao1 and tensor-tang May 24, 2018 13:18

tpatejko added the Intel label May 24, 2018

MKL optimized elementwise add: fix style check

3e876b3

luotao1 approved these changes May 25, 2018

luotao1 merged commit bab1196 into PaddlePaddle:develop May 25, 2018
