DNN: Further optimization of Conv2D #22401

zihaomu · 2022-08-19T03:54:35Z

Optimization points of this PR:

Bring in and adapt the latest Ficus OpConv.fx code.
Fuse Conv+Add+Activation. (Currently only for Conv2D. In the future, when FastConv supports Conv1D and Conv3D, the fusion will cover both as well.)
The FastConv branch still reuses weightsMat; fastWeights is removed.
Optimize Winograd_F63: pack tile 12 and adjust the data pipeline. (About 1 ms speedup on the M1 chip; test model: ResNet50.)
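To illustrate the idea behind the fused Conv+Add+Activation path, here is a minimal pure-Python sketch. It is not OpenCV's implementation: it assumes a single-channel input, stride 1, no padding, a residual tensor with the same shape as the convolution output, and ReLU as the activation. The function names `conv2d`, `conv_add_relu_unfused`, and `conv_add_relu_fused` are hypothetical.

```python
def conv2d(x, w, bias=0.0):
    """Naive valid 2D cross-correlation (stride 1, no padding)."""
    kh, kw = len(w), len(w[0])
    oh, ow = len(x) - kh + 1, len(x[0]) - kw + 1
    out = [[0.0] * ow for _ in range(oh)]
    for i in range(oh):
        for j in range(ow):
            acc = bias
            for a in range(kh):
                for b in range(kw):
                    acc += x[i + a][j + b] * w[a][b]
            out[i][j] = acc
    return out


def conv_add_relu_unfused(x, w, residual, bias=0.0):
    """Reference path: three separate passes over the output tensor."""
    y = conv2d(x, w, bias)                                   # pass 1: convolution
    y = [[y[i][j] + residual[i][j] for j in range(len(y[0]))]
         for i in range(len(y))]                             # pass 2: element-wise add
    return [[max(v, 0.0) for v in row] for row in y]         # pass 3: ReLU


def conv_add_relu_fused(x, w, residual, bias=0.0):
    """Fused path: add and ReLU are applied before the result is stored."""
    kh, kw = len(w), len(w[0])
    oh, ow = len(x) - kh + 1, len(x[0]) - kw + 1
    out = [[0.0] * ow for _ in range(oh)]
    for i in range(oh):
        for j in range(ow):
            acc = bias
            for a in range(kh):
                for b in range(kw):
                    acc += x[i + a][j + b] * w[a][b]
            acc += residual[i][j]        # fused element-wise add
            out[i][j] = max(acc, 0.0)    # fused ReLU, single write per output
    return out
```

Both paths compute the same values. The fused version writes each output element once instead of reading and rewriting the whole tensor two more times, which is where the saving in a real implementation would be expected to come from.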

TODO List:

~~Find out why MobileNetv2 is slow.~~ It is caused by the difference between OpenMP and GCD.

Performance Test on ARM (Apple M1 chip, 4 threads)

| Model Name | Without Patch | With Patch |
| --- | --- | --- |
| ResNet50 | 26.8 ms | 24.6 ms |
| MobileNetv2 | 5.43 ms | GCD: 5.5 ms, OpenMP: 4.7 ms |

Performance Test on ARM (Raspberry Pi 4, A72, 4 threads)

| Model Name | Without Patch | With Patch | NCNN's Benchmark |
| --- | --- | --- | --- |
| ResNet50 | 440.90 ms | 400.6 ms (10% faster) | 330 ms |
| MobileNetv2 | 51.64 ms | 51.2 ms | 71 ms |
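The "10% faster" note can be read either as a throughput gain or as a reduction in time per inference. The short sketch below computes both from the ResNet50 timings reported above; the helper names `speedup_pct` and `time_saved_pct` are hypothetical and only serve this illustration.

```python
def speedup_pct(before_ms, after_ms):
    """Throughput gain of the patched build over the baseline, in percent."""
    return (before_ms / after_ms - 1.0) * 100.0


def time_saved_pct(before_ms, after_ms):
    """Reduction in time per inference relative to the baseline, in percent."""
    return (before_ms - after_ms) / before_ms * 100.0


# ResNet50 timings copied from the two tables (milliseconds).
results = {
    "ResNet50, Raspberry Pi 4": (440.90, 400.6),
    "ResNet50, Apple M1": (26.8, 24.6),
}

for name, (before, after) in results.items():
    print(f"{name}: {speedup_pct(before, after):.1f}% faster, "
          f"{time_saved_pct(before, after):.1f}% less time")
```

For the Raspberry Pi 4 numbers this gives roughly a 10.1% throughput gain, or about 9.1% less time per inference, consistent with the "10% faster" note. For the M1 numbers the corresponding figures are about 8.9% and 8.2%.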

Pull Request Readiness Checklist

See details at https://github.com/opencv/opencv/wiki/How_to_contribute#making-a-good-pull-request

I agree to contribute to the project under Apache 2 License.
To the best of my knowledge, the proposed patch is not based on a code under GPL or another license that is incompatible with OpenCV
The PR is proposed to the proper branch
There is a reference to the original bug report and related work
There is accuracy test, performance test and test data in opencv_extra repository, if applicable
Patch to opencv_extra has the same branch name.
The feature is well documented and sample code can be built with the project CMake

asenyaev · 2022-08-29T07:21:50Z

It seems that after merging this PR, a build with coverage started to fail (log).

zihaomu · 2022-08-29T07:26:27Z

> It seems that after merging this PR, a build with coverage started to fail (log).

Thanks, @asenyaev. I will submit a PR to fix it.

asenyaev · 2022-08-29T07:30:07Z

@zihaomu, thank you!

zihaomu requested a review from vpisarev August 19, 2022 03:54

zihaomu force-pushed the conv2d_optimize branch from c21c1b9 to 33a2bb6 Compare August 19, 2022 05:44

zihaomu added category: dnn optimization labels Aug 19, 2022

zihaomu mentioned this pull request Aug 19, 2022

DNN: FP16 support on Convolution 2D #22275

Merged

zihaomu changed the title ~~DNN: Further optimization of Conv2D, fused Conv_Add_Activation.~~ DNN: Further optimization of Conv2D Aug 19, 2022

zihaomu force-pushed the conv2d_optimize branch from f78ec6f to 63e83aa Compare August 19, 2022 09:46

zihaomu marked this pull request as ready for review August 23, 2022 11:09

zihaomu force-pushed the conv2d_optimize branch from 8bf0661 to 0f8212e Compare August 23, 2022 12:49

Further optimization of Conv2D, fused Conv_Add_Activation, bring latest code from ficus OpConv.fx.

a1ba979

zihaomu force-pushed the conv2d_optimize branch from 0f8212e to a1ba979 Compare August 26, 2022 02:06

vpisarev approved these changes Aug 26, 2022

vpisarev merged commit bb64db9 into opencv:4.x Aug 26, 2022

zihaomu mentioned this pull request Aug 29, 2022

DNN: replace v_add with plus #22440

Merged

alalek mentioned this pull request Aug 31, 2022

DNN: Try to be compatible with win32 #22454

Merged

alalek mentioned this pull request Jan 8, 2023

(5.x) Merge 4.x #23113

Merged

asmorkalov added this to the 4.7.0 milestone Jan 23, 2023

a-sajjad72 pushed a commit to a-sajjad72/opencv that referenced this pull request Mar 30, 2023

Further optimization of Conv2D, fused Conv_Add_Activation, bring latest code from ficus OpConv.fx. (opencv#22401)

40c9927

opencv-alalek mentioned this pull request Aug 15, 2023

DNN: fix the issue in layer_fuse #24156

Merged
