[FEAUTURE] Fuses FC + elemwise_add operators for oneDNN #20821

anko-intel · 2022-01-14T13:08:20Z

Description

This change fuses the FullyConnected operator with an elemwise_add that directly follows it, where possible. The fusion applies to both the float and the quantized paths.
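As a conceptual illustration of what the fusion means numerically, here is a minimal sketch in plain Python. It is not MXNet or oneDNN code: the helper names (`fully_connected`, `elemwise_add`, `fused_fc_add`) and the toy data are made up for this example. It only shows that adding the second operand while each FC output is produced gives the same result as running the two operators one after the other, without materialising the intermediate FC output.

```python
# Conceptual sketch only: plain-Python stand-ins, not MXNet or oneDNN code.
# All function names and data below are hypothetical.

def fully_connected(x, weight, bias):
    """Unfused FC: out[i][j] = sum_k x[i][k] * weight[j][k] + bias[j]."""
    return [
        [sum(xk * wk for xk, wk in zip(row, w_row)) + b
         for w_row, b in zip(weight, bias)]
        for row in x
    ]


def elemwise_add(a, b):
    """Unfused add: a second pass over the already computed FC output."""
    return [[u + v for u, v in zip(ra, rb)] for ra, rb in zip(a, b)]


def fused_fc_add(x, weight, bias, addend):
    """Fused variant: the addend is added while each output element is
    produced, so no intermediate FC tensor is stored."""
    return [
        [sum(xk * wk for xk, wk in zip(row, w_row)) + b + s
         for w_row, b, s in zip(weight, bias, add_row)]
        for row, add_row in zip(x, addend)
    ]


x = [[1.0, 2.0], [3.0, 4.0]]                    # batch of 2, 2 input features
weight = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]   # 3 hidden units
bias = [0.5, 0.5, 0.5]
addend = [[10.0, 20.0, 30.0], [40.0, 50.0, 60.0]]

unfused = elemwise_add(fully_connected(x, weight, bias), addend)
fused = fused_fc_add(x, weight, bias, addend)
print(unfused == fused)
```

In the real operator the saving comes from skipping the extra memory pass and the separate kernel launch; the toy version above only demonstrates the numerical equivalence.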

The change works especially well for quantized graphs in full quantization mode. The results below were measured by running the following command before and after this PR:
benchmark/python/dnnl/run.sh benchmark/python/dnnl/fc_add.py
Measurements were taken on an AWS EC2 c6i.16xlarge instance (Xeon(R) Platinum 8375C CPU).

elemwise_add, float

Shape	Hidden	Before [ms]	After [ms]	Improvement
( 1, 224)	512	0.165	0.151	8%
( 1, 224)	4096	0.169	0.150	11%
( 16,1024)	1024	0.274	0.245	11%
( 32,4096)	1024	0.634	0.611	4%
( 32,4096)	4096	2.352	2.299	2%
( 512, 512)	4096	1.517	1.414	7%

elemwise_add, mode = smart, granularity = tensor-wise

Shape	Hidden	Before [ms]	After [ms]	Improvement
( 1, 224)	512	0.182	0.173	5%
( 1, 224)	4096	0.184	0.169	8%
( 16,1024)	1024	0.246	0.235	4%
( 32,4096)	1024	0.328	0.317	3%
( 32,4096)	4096	0.573	0.571	0%
( 512, 512)	4096	0.819	0.730	11%

elemwise_add, mode = smart, granularity = channel-wise

Shape	Hidden	Before [ms]	After [ms]	Improvement
( 1, 224)	512	0.164	0.138	16%
( 1, 224)	4096	0.152	0.143	6%
( 16,1024)	1024	0.213	0.199	7%
( 32,4096)	1024	0.300	0.285	5%
( 32,4096)	4096	0.545	0.542	1%
( 512, 512)	4096	0.778	0.689	11%

elemwise_add, mode = full, granularity = tensor-wise

Shape	Hidden	Before* [ms]	After [ms]	Improvement
( 1, 224)	512	0.169	0.154	9%
( 1, 224)	4096	0.208	0.158	24%
( 16,1024)	1024	0.270	0.212	21%
( 32,4096)	1024	0.359	0.293	18%
( 32,4096)	4096	0.602	0.542	10%
( 512, 512)	4096	0.873	0.652	25%

elemwise_add, mode = full, granularity = channel-wise

Shape	Hidden	Before* [ms]	After [ms]	Improvement
( 1, 224)	512	0.135	0.125	7%
( 1, 224)	4096	0.178	0.130	27%
( 16,1024)	1024	0.239	0.180	25%
( 32,4096)	1024	0.327	0.262	20%
( 32,4096)	4096	0.575	0.512	11%
( 512, 512)	4096	0.852	0.625	27%

* Before this PR, fusing FC with add in full quantization mode was broken, so the "Before" results are taken from the first commit (0c38ca7) of this PR, which fixes the issue.
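The Improvement column in the tables above is consistent with the relative latency reduction, (before - after) / before, rounded to the nearest whole percent. That formula is an assumption inferred from the numbers, not something stated in the PR. A minimal sketch that recomputes three rows (the function name `improvement_percent` is made up for this example):

```python
def improvement_percent(before_ms, after_ms):
    """Relative latency reduction, rounded to the nearest whole percent.

    Assumed formula; it matches the rows of the tables in this PR.
    """
    return round((before_ms - after_ms) / before_ms * 100)


# (label, Before [ms], After [ms]) copied from the tables above.
rows = [
    ("float, (1, 224), hidden 512", 0.165, 0.151),
    ("full / tensor-wise, (512, 512), hidden 4096", 0.873, 0.652),
    ("full / channel-wise, (1, 224), hidden 4096", 0.178, 0.130),
]

for label, before_ms, after_ms in rows:
    print(f"{label}: {improvement_percent(before_ms, after_ms)}%")
```

These three rows come out at 8%, 25% and 27%, matching the reported values.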

Checklist

Essentials

PR's title starts with a category (e.g. [BUGFIX], [MODEL], [TUTORIAL], [FEATURE], [DOC], etc)
Changes are complete (i.e. I finished coding on this PR)
All changes have test coverage
Code is well-documented

mxnet-bot · 2022-01-14T13:08:24Z

Hey @anko-intel , Thanks for submitting the PR
All tests are already queued to run once. If tests fail, you can trigger one or more tests again with the following commands:

To trigger all jobs: @mxnet-bot run ci [all]
To trigger specific jobs: @mxnet-bot run ci [job1, job2]

CI supported jobs: [windows-gpu, unix-gpu, centos-gpu, clang, miscellaneous, unix-cpu, windows-cpu, website, sanity, centos-cpu, edge]

Note:
Only following 3 categories can trigger CI :PR Author, MXNet Committer, Jenkins Admin.
All CI tests must pass before the PR can be merged.

anko-intel · 2022-01-17T21:25:36Z

@mxnet-bot run ci [unix-cpu, windows-gpu]

mxnet-bot · 2022-01-17T21:25:43Z

Jenkins CI successfully triggered : [windows-gpu, unix-cpu]

anko-intel · 2022-01-18T08:37:10Z

@mxnet-bot run ci [windows-gpu]

mxnet-bot · 2022-01-18T08:37:16Z

Jenkins CI successfully triggered : [windows-gpu]

anko-intel · 2022-01-18T12:32:15Z

@mxnet-bot run ci [centos-cpu]

mxnet-bot · 2022-01-18T12:32:22Z

Jenkins CI successfully triggered : [centos-cpu]

anko-intel · 2022-01-18T14:55:41Z

@DominikaJedynak , @PawelGlomski-Intel please review

bgawrych · 2022-01-25T13:19:27Z

@mxnet-bot run ci[all]

mxnet-bot · 2022-01-25T13:19:37Z

Jenkins CI successfully triggered : [unix-cpu, unix-gpu, edge, windows-gpu, website, windows-cpu, centos-gpu, centos-cpu, sanity, miscellaneous, clang]

bgawrych · 2022-01-26T06:58:22Z

@mxnet-bot run ci[windows-gpu]

mxnet-bot · 2022-01-26T06:58:27Z

Jenkins CI successfully triggered : [windows-gpu]

anko-intel · 2022-01-26T11:40:17Z

@mxnet-bot run ci[windows-gpu]

mxnet-bot · 2022-01-26T11:40:23Z

Jenkins CI successfully triggered : [windows-gpu]

anko-intel added 3 commits January 13, 2022 12:58

Fix elemwise_add post quantization pass

0c38ca7

Align naming convention with convolution operator

d7d537d

Convolution uses convention data_name_[min|max] which is object oriented and more readable.

Fuse FC with elemwise_add

af045b8

mseth10 added pr-awaiting-testing, pr-work-in-progress and removed pr-awaiting-testing, pr-work-in-progress labels Jan 14, 2022

anko-intel changed the title ~~[FEAUTURE] Fuses FC + elemwise_add operators for oneDNN~~ [WIP] [FEAUTURE] Fuses FC + elemwise_add operators for oneDNN Jan 14, 2022

anko-intel added 2 commits January 17, 2022 15:49

[TEST] Add functional tests for FC + add operators fusion

d8fad4e

[TESTS] Disable quantization check for not supported cases

88170d0

anko-intel force-pushed the anko_FC_add branch from 278905a to 88170d0 January 17, 2022 14:50

anko-intel changed the title ~~[WIP] [FEAUTURE] Fuses FC + elemwise_add operators for oneDNN~~ [FEAUTURE] Fuses FC + elemwise_add operators for oneDNN Jan 18, 2022

mseth10 added pr-awaiting-testing, pr-work-in-progress and removed pr-work-in-progress, pr-awaiting-testing labels Jan 18, 2022

PawelGlomski-Intel reviewed Jan 19, 2022

src/operator/subgraph/build_subgraph.cc (review thread resolved)

PawelGlomski-Intel approved these changes Jan 19, 2022

DominikaJedynak reviewed Jan 21, 2022

src/operator/subgraph/dnnl/dnnl_fc.cc (review thread resolved)

DominikaJedynak approved these changes Jan 24, 2022

mseth10 added pr-awaiting-testing, pr-work-in-progress and removed pr-awaiting-review, pr-awaiting-testing labels Jan 25, 2022

mseth10 added pr-awaiting-testing and removed pr-work-in-progress labels Jan 26, 2022

bgawrych approved these changes Jan 27, 2022

Take into account already fused elemwise operation

af70198

Added a fix for fusing an FC that already has a fused relu/activation in the floating-point path. Fusing elemwise_add with an FC that already has a fused relu/activation is blocked due to accuracy issues.

mseth10 added pr-awaiting-testing, pr-awaiting-review and removed pr-awaiting-review, pr-awaiting-testing labels Jan 27, 2022

bgawrych merged commit e9840b8 into apache:master Jan 28, 2022
