This repository has been archived by the owner on Nov 17, 2023. It is now read-only.

[FEATURE] Fuses FC + elemwise_add operators for oneDNN #20821

Merged
merged 6 commits into from
Jan 28, 2022


anko-intel
Contributor

@anko-intel commented Jan 14, 2022

Description

This change fuses the FullyConnected operator with a following elemwise_add operator where possible. The fusion is done for both the float and the quantized paths.
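As a rough illustration of what the fusion computes, the pure-Python sketch below compares an unfused FullyConnected followed by elemwise_add with a single routine that adds the second operand while each FC output element is produced (the idea behind oneDNN's "sum" post-op, which this kind of fusion typically relies on). The function names are hypothetical and the code is not the oneDNN kernel; it only shows that both paths give the same result while the fused path never materializes the intermediate FC output.

```python
from typing import List

Matrix = List[List[float]]

def fully_connected(x: Matrix, w: Matrix, b: List[float]) -> Matrix:
    """y = x @ w^T + b, with w of shape (num_hidden, in_features)."""
    return [
        [sum(xi * wi for xi, wi in zip(row, w_row)) + bias for w_row, bias in zip(w, b)]
        for row in x
    ]

def elemwise_add(a: Matrix, c: Matrix) -> Matrix:
    """Element-wise sum of two equally shaped matrices."""
    return [[ai + ci for ai, ci in zip(ra, rc)] for ra, rc in zip(a, c)]

def fused_fc_add(x: Matrix, w: Matrix, b: List[float], addend: Matrix) -> Matrix:
    """Hypothetical fused op: the addend is accumulated into each output element
    as it is computed, so no intermediate FC result matrix is stored."""
    return [
        [
            sum(xi * wi for xi, wi in zip(row, w_row)) + bias + a
            for w_row, bias, a in zip(w, b, add_row)
        ]
        for row, add_row in zip(x, addend)
    ]

x = [[1.0, 2.0, 3.0], [0.5, -1.0, 2.0]]   # batch of 2, 3 input features
w = [[0.1, 0.2, 0.3], [-0.3, 0.0, 1.0]]   # num_hidden = 2
b = [0.5, -0.5]
residual = [[1.0, 1.0], [2.0, -2.0]]      # second input of elemwise_add

unfused = elemwise_add(fully_connected(x, w, b), residual)
fused = fused_fc_add(x, w, b, residual)
print(unfused == fused)
```

Both paths perform the additions in the same order, so the results match exactly here; in a real kernel the saving comes from skipping one write and one read of the FC output tensor.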

The change gives the largest speedups on quantized graphs in full quantization mode. Below are the results of running the following command before and after this PR:
benchmark/python/dnnl/run.sh benchmark/python/dnnl/fc_add.py
Measurements were taken on an AWS EC2 c6i.16xlarge instance (Xeon(R) Platinum 8375C CPU).
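The benchmark script itself is not reproduced here. For taking similar before/after latency numbers on other workloads, a minimal timing harness could look like the sketch below; `time_ms` is a hypothetical helper, not part of benchmark/python/dnnl, and the warm-up and repeat counts are arbitrary assumptions.

```python
import time
from typing import Callable

def time_ms(fn: Callable[[], object], warmup: int = 10, repeat: int = 100) -> float:
    """Return the mean wall-clock time of fn() in milliseconds.

    Warm-up calls are excluded so one-time costs (for example primitive
    creation or memory allocation) do not skew the average.
    """
    for _ in range(warmup):
        fn()
    start = time.perf_counter()
    for _ in range(repeat):
        fn()
    return (time.perf_counter() - start) * 1000.0 / repeat

if __name__ == "__main__":
    print(f"{time_ms(lambda: sum(range(10_000))):.3f} ms")
```

With MXNet's asynchronous execution engine, the timed callable should block until results are ready (for example by calling mx.nd.waitall()); otherwise only the time to enqueue the work is measured.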

elemwise_add, float

| Shape | Hidden | Before [ms] | After [ms] | Improvement |
|---|---|---|---|---|
| (1, 224) | 512 | 0.165 | 0.151 | 8% |
| (1, 224) | 4096 | 0.169 | 0.150 | 11% |
| (16, 1024) | 1024 | 0.274 | 0.245 | 11% |
| (32, 4096) | 1024 | 0.634 | 0.611 | 4% |
| (32, 4096) | 4096 | 2.352 | 2.299 | 2% |
| (512, 512) | 4096 | 1.517 | 1.414 | 7% |
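The Improvement column appears to be the relative reduction in run time, (before - after) / before, rounded to the nearest percent; this is inferred from the numbers rather than stated in the PR. The check below recomputes the column for the float table:

```python
# Rows of the "elemwise_add, float" table: (before_ms, after_ms, reported_pct)
rows = [
    (0.165, 0.151, 8),
    (0.169, 0.150, 11),
    (0.274, 0.245, 11),
    (0.634, 0.611, 4),
    (2.352, 2.299, 2),
    (1.517, 1.414, 7),
]

def improvement_pct(before_ms: float, after_ms: float) -> int:
    """Relative run-time reduction in percent, rounded to an integer."""
    return round(100.0 * (before_ms - after_ms) / before_ms)

for before, after, reported in rows:
    assert improvement_pct(before, after) == reported, (before, after)
print("all rows match")
```

The same formula also reproduces the quantized tables, for example 0.873 ms to 0.652 ms gives 25%.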

elemwise_add, mode = smart, granularity = tensor-wise

| Shape | Hidden | Before [ms] | After [ms] | Improvement |
|---|---|---|---|---|
| (1, 224) | 512 | 0.182 | 0.173 | 5% |
| (1, 224) | 4096 | 0.184 | 0.169 | 8% |
| (16, 1024) | 1024 | 0.246 | 0.235 | 4% |
| (32, 4096) | 1024 | 0.328 | 0.317 | 3% |
| (32, 4096) | 4096 | 0.573 | 0.571 | 0% |
| (512, 512) | 4096 | 0.819 | 0.730 | 11% |

elemwise_add, mode = smart, granularity = channel-wise

| Shape | Hidden | Before [ms] | After [ms] | Improvement |
|---|---|---|---|---|
| (1, 224) | 512 | 0.164 | 0.138 | 16% |
| (1, 224) | 4096 | 0.152 | 0.143 | 6% |
| (16, 1024) | 1024 | 0.213 | 0.199 | 7% |
| (32, 4096) | 1024 | 0.300 | 0.285 | 5% |
| (32, 4096) | 4096 | 0.545 | 0.542 | 1% |
| (512, 512) | 4096 | 0.778 | 0.689 | 11% |

elemwise_add, mode = full, granularity = tensor-wise

| Shape | Hidden | Before* [ms] | After [ms] | Improvement |
|---|---|---|---|---|
| (1, 224) | 512 | 0.169 | 0.154 | 9% |
| (1, 224) | 4096 | 0.208 | 0.158 | 24% |
| (16, 1024) | 1024 | 0.270 | 0.212 | 21% |
| (32, 4096) | 1024 | 0.359 | 0.293 | 18% |
| (32, 4096) | 4096 | 0.602 | 0.542 | 10% |
| (512, 512) | 4096 | 0.873 | 0.652 | 25% |

elemwise_add, mode = full, granularity = channel-wise

| Shape | Hidden | Before* [ms] | After [ms] | Improvement |
|---|---|---|---|---|
| (1, 224) | 512 | 0.135 | 0.125 | 7% |
| (1, 224) | 4096 | 0.178 | 0.130 | 27% |
| (16, 1024) | 1024 | 0.239 | 0.180 | 25% |
| (32, 4096) | 1024 | 0.327 | 0.262 | 20% |
| (32, 4096) | 4096 | 0.575 | 0.512 | 11% |
| (512, 512) | 4096 | 0.852 | 0.625 | 27% |

* Before this PR, fusing FC with elemwise_add in full quantization mode was broken, so the "Before" results are taken from the first commit of this PR (0c38ca7), which fixes that issue.
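The two granularity settings in the tables above differ in how many scales are used to quantize the FC weights: a single scale for the whole tensor, or one scale per output channel (one row of the weight matrix). The sketch below assumes simple symmetric int8 scaling (scale = 127 / max|w|) purely for illustration; MXNet's calibration code may compute scales differently, and the helper names are hypothetical.

```python
from typing import List

Matrix = List[List[float]]
INT8_MAX = 127

def tensor_wise_scale(w: Matrix) -> float:
    """One symmetric int8 scale shared by the whole weight tensor."""
    return INT8_MAX / max(abs(v) for row in w for v in row)

def channel_wise_scales(w: Matrix) -> List[float]:
    """One symmetric int8 scale per output channel (row of the FC weight)."""
    return [INT8_MAX / max(abs(v) for v in row) for row in w]

def quantize(row: List[float], scale: float) -> List[int]:
    """Scale, round and clamp values to the symmetric int8 range [-127, 127]."""
    return [max(-INT8_MAX, min(INT8_MAX, round(v * scale))) for v in row]

w = [[0.6, -1.0, 0.25],     # channel with a large value range
     [0.01, 0.02, -0.04]]   # channel with a small value range

t_scale = tensor_wise_scale(w)
c_scales = channel_wise_scales(w)
print([quantize(row, t_scale) for row in w])
print([quantize(row, s) for row, s in zip(w, c_scales)])
```

With a shared scale, the small-range channel only uses a handful of int8 levels, while the per-channel scale lets it span the full range, which is the usual accuracy argument for channel-wise quantization.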

Checklist

Essentials

  • PR's title starts with a category (e.g. [BUGFIX], [MODEL], [TUTORIAL], [FEATURE], [DOC], etc)
  • Changes are complete (i.e. I finished coding on this PR)
  • All changes have test coverage
  • Code is well-documented

Convolution uses the data_name_[min|max] naming convention, which is object-oriented and more readable.
@mxnet-bot

Hey @anko-intel, thanks for submitting the PR.
All tests are already queued to run once. If tests fail, you can trigger one or more tests again with the following commands:

  • To trigger all jobs: @mxnet-bot run ci [all]
  • To trigger specific jobs: @mxnet-bot run ci [job1, job2]

CI supported jobs: [windows-gpu, unix-gpu, centos-gpu, clang, miscellaneous, unix-cpu, windows-cpu, website, sanity, centos-cpu, edge]


Note:
Only the following 3 categories can trigger CI: PR Author, MXNet Committer, Jenkins Admin.
All CI tests must pass before the PR can be merged.

@mseth10 added the pr-awaiting-testing and pr-work-in-progress labels and removed the pr-awaiting-testing and pr-work-in-progress labels Jan 14, 2022
@anko-intel changed the title from "[FEATURE] Fuses FC + elemwise_add operators for oneDNN" to "[WIP] [FEATURE] Fuses FC + elemwise_add operators for oneDNN" Jan 14, 2022
@anko-intel
Contributor Author

@mxnet-bot run ci [unix-cpu, windows-gpu]

@mxnet-bot

Jenkins CI successfully triggered : [windows-gpu, unix-cpu]

@anko-intel
Contributor Author

@mxnet-bot run ci [windows-gpu]

@mxnet-bot

Jenkins CI successfully triggered : [windows-gpu]

@anko-intel changed the title from "[WIP] [FEATURE] Fuses FC + elemwise_add operators for oneDNN" to "[FEATURE] Fuses FC + elemwise_add operators for oneDNN" Jan 18, 2022
@mseth10 added the pr-awaiting-testing and pr-work-in-progress labels and removed the pr-work-in-progress and pr-awaiting-testing labels Jan 18, 2022
@anko-intel
Contributor Author

@mxnet-bot run ci [centos-cpu]

@mxnet-bot

Jenkins CI successfully triggered : [centos-cpu]

@mseth10 added the pr-awaiting-testing and pr-work-in-progress labels and removed the pr-work-in-progress and pr-awaiting-testing labels Jan 18, 2022
@anko-intel
Contributor Author

@DominikaJedynak, @PawelGlomski-Intel, please review.

@bgawrych
Contributor

@mxnet-bot run ci[all]

@mxnet-bot

Jenkins CI successfully triggered : [unix-cpu, unix-gpu, edge, windows-gpu, website, windows-cpu, centos-gpu, centos-cpu, sanity, miscellaneous, clang]

@mseth10 added the pr-awaiting-testing and pr-work-in-progress labels and removed the pr-awaiting-review and pr-awaiting-testing labels Jan 25, 2022
@bgawrych
Contributor

@mxnet-bot run ci[windows-gpu]

@mxnet-bot

Jenkins CI successfully triggered : [windows-gpu]

@mseth10 added the pr-awaiting-testing label and removed the pr-work-in-progress label Jan 26, 2022
@anko-intel
Contributor Author

@mxnet-bot run ci[windows-gpu]

@mxnet-bot

Jenkins CI successfully triggered : [windows-gpu]

@mseth10 added the pr-work-in-progress, pr-awaiting-testing and pr-awaiting-review labels and removed the pr-awaiting-testing and pr-work-in-progress labels Jan 26, 2022
A fix is added for fusing an already fused FC + relu/activation in the floating-point path.
Fusing elemwise_add into an FC that already has a fused relu/activation is blocked due to accuracy issues.
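The restriction in this commit message can be pictured as a guard in a graph-rewriting pass. The sketch below uses a hypothetical, simplified node type (`Node`, `try_fuse_fc_add`); it is not MXNet's subgraph property API and only encodes the rule stated here: an elemwise_add that consumes an FC output is folded into that FC, unless the FC already carries a fused relu/activation.

```python
from dataclasses import dataclass, field
from typing import List, Optional

@dataclass
class Node:
    op: str                                   # e.g. "FullyConnected", "elemwise_add"
    inputs: List["Node"] = field(default_factory=list)
    fused_activation: Optional[str] = None    # e.g. "relu" if already fused into the FC
    fused_sum: bool = False                   # True once an add has been folded in

def try_fuse_fc_add(add: Node) -> Node:
    """Return a fused FC node if `add` can be folded into one of its FC inputs,
    otherwise return `add` unchanged."""
    if add.op != "elemwise_add" or len(add.inputs) != 2:
        return add
    for i, inp in enumerate(add.inputs):
        if inp.op != "FullyConnected" or inp.fused_sum:
            continue
        if inp.fused_activation is not None:
            # Blocked: FC with an already fused relu/activation (accuracy issues).
            continue
        other = add.inputs[1 - i]
        # The second operand of the add becomes an extra input of the fused FC.
        return Node("FullyConnected", inp.inputs + [other], fused_activation=None, fused_sum=True)
    return add

data, residual = Node("data"), Node("residual")
fc = Node("FullyConnected", [data])
fc_relu = Node("FullyConnected", [data], fused_activation="relu")

print(try_fuse_fc_add(Node("elemwise_add", [fc, residual])).op)       # fused
print(try_fuse_fc_add(Node("elemwise_add", [fc_relu, residual])).op)  # left as-is
```

Checking both inputs lets the pass fuse regardless of which operand of elemwise_add comes from the FC; a real pass would also need to verify that the FC output has no other consumers, which is outside the scope of this sketch.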
@mseth10 added the pr-awaiting-testing and pr-awaiting-review labels and removed the pr-awaiting-review and pr-awaiting-testing labels Jan 27, 2022
@bgawrych merged commit e9840b8 into apache:master Jan 28, 2022
Labels
pr-awaiting-review PR is waiting for code review

6 participants