[FEATURE] Fuses FC + elemwise_add operators for oneDNN #20821
Convolution uses convention data_name_[min|max] which is object oriented and more readable.
Hey @anko-intel, thanks for submitting the PR
CI supported jobs: [windows-gpu, unix-gpu, centos-gpu, clang, miscellaneous, unix-cpu, windows-cpu, website, sanity, centos-cpu, edge]
278905a to 88170d0
@mxnet-bot run ci [unix-cpu, windows-gpu]
Jenkins CI successfully triggered : [windows-gpu, unix-cpu]
@mxnet-bot run ci [windows-gpu]
Jenkins CI successfully triggered : [windows-gpu]
@mxnet-bot run ci [centos-cpu]
Jenkins CI successfully triggered : [centos-cpu]
@DominikaJedynak, @PawelGlomski-Intel please review
@mxnet-bot run ci[all]
Jenkins CI successfully triggered : [unix-cpu, unix-gpu, edge, windows-gpu, website, windows-cpu, centos-gpu, centos-cpu, sanity, miscellaneous, clang]
@mxnet-bot run ci[windows-gpu]
Jenkins CI successfully triggered : [windows-gpu]
@mxnet-bot run ci[windows-gpu]
Jenkins CI successfully triggered : [windows-gpu]
Added a fix for fusing an already fused FC + relu/activation in the floating-point path. Fusing elemwise_add into an FC that already has a fused relu/activation is blocked due to accuracy issues.
Description
This change fuses the FullyConnected operator with the elemwise_add that follows it, when possible. This is done for both the float and quantized paths.
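As a rough illustration of what this fusion means numerically, the sketch below compares FullyConnected followed by elemwise_add against a single pass that accumulates the FC result directly onto the other addend. It is plain-Python reference math, not the MXNet or oneDNN code from this PR; the helper names `fully_connected`, `elemwise_add`, and `fused_fc_add` and the sample values are hypothetical, and only the float path is shown.

```python
# Illustrative sketch only: reference math in plain Python,
# not the MXNet/oneDNN implementation touched by this PR.

def fully_connected(x, weight, bias):
    """y[i][j] = sum_k x[i][k] * weight[j][k] + bias[j]; weight is (out, in)."""
    return [
        [sum(xv * wv for xv, wv in zip(row, w_row)) + b
         for w_row, b in zip(weight, bias)]
        for row in x
    ]

def elemwise_add(a, b):
    """Element-wise sum of two equally shaped 2-D lists."""
    return [[av + bv for av, bv in zip(ra, rb)] for ra, rb in zip(a, b)]

def fused_fc_add(x, weight, bias, residual):
    """Hypothetical fused kernel: adds the FC output onto `residual` in one pass,
    so no intermediate FC output tensor is materialized."""
    out = [list(r) for r in residual]  # copy so the caller's data is untouched
    for i, row in enumerate(x):
        for j, (w_row, b) in enumerate(zip(weight, bias)):
            out[i][j] += sum(xv * wv for xv, wv in zip(row, w_row)) + b
    return out

# Made-up sample data: batch of 2, 2 input features, 3 hidden units.
x = [[1.0, 2.0], [3.0, 4.0]]
weight = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
bias = [0.5, -0.5, 0.0]
residual = [[10.0, 20.0, 30.0], [40.0, 50.0, 60.0]]

unfused = elemwise_add(fully_connected(x, weight, bias), residual)
fused = fused_fc_add(x, weight, bias, residual)
print(unfused == fused)  # both paths give the same values for this input
```

The fused variant skips the intermediate FC output and the second pass over memory, which is the kind of saving the fusion is after. In the quantized path, scales and rounding also come into play, which this sketch does not model.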
The change optimizes computation well for quantized graphs in full quantization mode. Below are the measured results of the following command:
benchmark/python/dnnl/run.sh benchmark/python/dnnl/fc_add.py
run before and after this PR. Measurements were taken on an AWS EC2 c6i.16xlarge instance (Xeon(R) Platinum 8375C CPU).
elemwise_add, float
elemwise_add, mode = smart, granularity = tensor-wise
elemwise_add, mode = smart, granularity = channel-wise
elemwise_add, mode = full, granularity = tensor-wise
elemwise_add, mode = full, granularity = channel-wise
* - before this PR, fusing FC with add in full quantize mode was broken, so the results are taken from the first commit (0c38ca7) of this PR, which fixes the issue
Checklist
Essentials