[ARM CPU] Add ACL FC executor for FP32/FP16 precision #24123
Conversation
Force-pushed from db355f7 to 0482f4a
Force-pushed from f946770 to 9004e85
Force-pushed from 7d3ba52 to 4f4e832
@EgorDuplensky Could you please start the review? Thanks!
Review threads resolved on:
- src/plugins/intel_cpu/tests/functional/custom/single_layer_tests/classes/matmul.cpp
- src/plugins/intel_cpu/src/nodes/executors/acl/acl_fullyconnected.cpp (several outdated threads)
- src/plugins/intel_cpu/src/nodes/executors/fullyconnected_implementations.cpp (several outdated threads)
aclMemoryInfoMap[ARG_WEI]->set_tensor_shape(temp_weights_shape);
}

tensorsInfoValidateStatus = arm_compute::NEFullyConnectedLayer::validate(
Doesn't oneDNN use the weights packing feature for its ACL integration?
https://arm-software.github.io/ComputeLibrary/v23.02.1/classarm__compute_1_1_n_e_fully_connected_layer.xhtml#a19aa329510cbef84acc16335c2099908
Just asking, because if not, we'd better try to use it ourselves later.
Discussed. oneDNN does use the has_opt_impl feature (basically weights packing),
so the oneDNN logic needs to be replicated in ACLFCExecutor to ensure there is no performance drop.
We can merge the PR without weights packing support as soon as all the tests pass, but the ACLFCExecutor should stay completely disabled for now.
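For reference, a hedged sketch of how a packed-weight kernel could be queried from ACL, the way oneDNN's integration does it. The tensor-info variable names are illustrative (assumed to be prepared `arm_compute::TensorInfo` objects), and the exact `has_opt_impl` signature may differ between ACL versions:

```cpp
#include <arm_compute/runtime/NEON/functions/NEFullyConnectedLayer.h>

// Illustrative only: ask ACL whether a fixed-format (pre-packed) weights kernel
// exists for these shapes before configuring the layer.
arm_compute::WeightFormat expectedWeightFormat = arm_compute::WeightFormat::ANY;
arm_compute::WeightsInfo weightsInfo(false, 1, 1, weiTensorInfo.dimension(1),
                                     false, expectedWeightFormat);

const arm_compute::Status status = arm_compute::NEFullyConnectedLayer::has_opt_impl(
    expectedWeightFormat, &srcTensorInfo, &weiTensorInfo, &biasTensorInfo,
    &dstTensorInfo, fullyConnectedLayerInfo, weightsInfo);

if (status.error_code() == arm_compute::ErrorCode::OK &&
    expectedWeightFormat != arm_compute::WeightFormat::ANY) {
    // A packed kernel is available: reorder the weights into expectedWeightFormat
    // once at preparation time and pass the packed tensor to configure()/run().
}
```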
Force-pushed from 18b2f19 to 3a13983
Force-pushed from 0ba1be0 to e0b96ec
Review threads resolved on:
- src/plugins/intel_cpu/src/nodes/executors/acl/acl_common_executor.hpp (outdated)
- src/plugins/intel_cpu/src/nodes/executors/acl/acl_fullyconnected.cpp (outdated)
OperationType::FullyConnected,
ShapeTolerance::Agnostic,
// supports
[](const FCConfig& config) -> bool {
Let's make sure the tests pass and keep the executor disabled for now.
There is no rush to enable it and replace the oneDNN one;
we need to confirm there are no performance degradations first.
@EgorDuplensky I'll disable it once the review is finished.
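A minimal sketch of the temporary disable discussed here, assuming the registration structure from the snippet above (the predicate body is illustrative, not the PR's final code):

```cpp
// Illustrative only: keep the ACL FC executor registered but never selected,
// so the oneDNN implementation stays active until parity is confirmed.
[](const FCConfig& config) -> bool {
    return false;  // temporarily disabled, see the discussion above
},
```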
const std::vector<ShapeRelatedParams> IS = {
    {static_shapes_to_test_representation({{1, 2, 32, 120}, {120, 5}}), {false, false}},
    {static_shapes_to_test_representation({{1, 2, 32, 120}, {120, 5}}), {true, false}},
    {static_shapes_to_test_representation({{1, 2, 32, 120}, {120, 5}}), {false, true}},
    {static_shapes_to_test_representation({{1, 2, 32, 120}, {120, 5}}), {true, true}},
We still need to complete the test refactoring for the FullyConnected node (to add common tests, etc.). Let's do it in the scope of a follow-up PR.
@EgorDuplensky created issue CVS-145273
Review threads resolved on:
- src/plugins/intel_cpu/src/nodes/executors/acl/acl_fullyconnected.cpp (three outdated threads)
- src/plugins/intel_cpu/src/nodes/executors/fullyconnected_implementations.cpp (two threads, one outdated)
Force-pushed from ca90d15 to 8d548b5
// Issue: CVS-123514
static arm_compute::Mutex mtx_config;
arm_compute::lock_guard<arm_compute::Mutex> _lock{mtx_config};
config();
}
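The lines above are the body of a mutex-guarded configuration helper: ACL layer configuration is not thread-safe in this context (CVS-123514), so every configure() call is funneled through one global mutex. A hedged usage sketch follows; the helper name `configureThreadSafe` and the layer/tensor names are assumptions:

```cpp
// Illustrative usage of the serialized-configuration pattern shown above:
// the actual ACL configure() call runs under the global mutex.
configureThreadSafe([&] {
    fcLayer.configure(&srcTensor, &weiTensor, &biasTensor, &dstTensor,
                      fullyConnectedLayerInfo);
});
```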
bool getActivationLayerInfo(Algorithm algorithm,
The getActivationLayerInfo function returns a boolean value and updates an arm_compute::ActivationLayerInfo instance. Single responsibility: please separate it into two different functions:
- getActivationLayerInfo has to return an arm_compute::ActivationLayerInfo instance;
- checkActivationLayerInfo should check the element-wise operation type.
Note, please, that arm_compute::GEMMInfo has a different interface from arm_compute::FullyConnectedLayerInfo, so it is not possible to use it in the suggested way. Additionally, after the fix you won't need to create a temporary instance in the ACLFullyConnected::supports method.
@eshoguli corrected
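For illustration, a minimal sketch of the suggested split. The two function names come from the review itself; the supported-algorithm set and the bodies are assumptions:

```cpp
// checkActivationLayerInfo: only validates that ACL can fuse this eltwise type.
bool checkActivationLayerInfo(Algorithm algorithm) {
    switch (algorithm) {
        case Algorithm::EltwiseRelu:
        case Algorithm::EltwiseTanh:
        case Algorithm::EltwiseSigmoid:
            return true;
        default:
            return false;
    }
}

// getActivationLayerInfo: maps an already-validated algorithm to ACL's
// descriptor, so callers no longer need a temporary out-parameter instance.
arm_compute::ActivationLayerInfo getActivationLayerInfo(Algorithm algorithm) {
    using ActFn = arm_compute::ActivationLayerInfo::ActivationFunction;
    switch (algorithm) {
        case Algorithm::EltwiseRelu:
            return arm_compute::ActivationLayerInfo(ActFn::RELU);
        case Algorithm::EltwiseTanh:
            return arm_compute::ActivationLayerInfo(ActFn::TANH, 1.f, 1.f);
        case Algorithm::EltwiseSigmoid:
            return arm_compute::ActivationLayerInfo(ActFn::LOGISTIC);
        default:
            OPENVINO_THROW("Unsupported activation type for ACL");
    }
}
```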
VERIFY(one_of(srcType(config), ov::element::f16, ov::element::f32), UNSUPPORTED_SRC_PRECISIONS);
VERIFY(one_of(weiType(config), ov::element::f16, ov::element::f32), UNSUPPORTED_WEI_PRECISIONS);
VERIFY(postOpsNumbers(config) < 2, UNSUPPORTED_NUMBER_OF_POSTOPS);
VERIFY(checkAndInitPostOps(config.postOps, tmpFullyConnectedLayerInfo), UNSUPPORTED_TYPE_OF_POSTOPS);
Is there a way to report the supported activations before activation fusing? If not, does it mean that we use the reference FullyConnected implementation when an unsupported activation is fused, instead of avoiding the fusion, using the ACL FullyConnected implementation, and executing the unsupported activation separately?
@eshoguli corrected
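A hedged sketch of how the supports() checks above could reject configs with unsupported fused activations, so the fusion pass keeps such activations as standalone nodes. `checkActivationLayerInfo` is the function suggested earlier in the review; `eltwiseAlgorithm` is a hypothetical accessor:

```cpp
// Illustrative: fail supports() early for post-ops ACL cannot fuse, instead of
// silently selecting a slower fallback after the fusion has already happened.
for (const auto& postOp : config.postOps) {
    if (!checkActivationLayerInfo(eltwiseAlgorithm(postOp)))  // hypothetical accessor
        return false;
}
return true;
```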
Force-pushed from d78cb87 to 948edf2
Force-pushed from aa77008 to fef6788
Force-pushed from fef6788 to 972a62f
Tickets: CVS-138509, CVS-137575, CVS-147625, CVS-148130