[ARM CPU] Add ACL FC executor for FP32/FP16 precision #24123
Conversation
Force-pushed from db355f7 to 0482f4a
Force-pushed from f946770 to 9004e85
Force-pushed from 7d3ba52 to 4f4e832
@EgorDuplensky Could you please start the review? Thanks!
Review threads resolved on:
- src/plugins/intel_cpu/tests/functional/custom/single_layer_tests/classes/matmul.cpp
- src/plugins/intel_cpu/src/nodes/executors/acl/acl_fullyconnected.cpp (several outdated threads)
- src/plugins/intel_cpu/src/nodes/executors/fullyconnected_implementations.cpp (several outdated threads)
aclMemoryInfoMap[ARG_WEI]->set_tensor_shape(temp_weights_shape);
}

tensorsInfoValidateStatus = arm_compute::NEFullyConnectedLayer::validate(
Doesn't oneDNN use the weights packing feature for its ACL integration?
https://arm-software.github.io/ComputeLibrary/v23.02.1/classarm__compute_1_1_n_e_fully_connected_layer.xhtml#a19aa329510cbef84acc16335c2099908
Just asking, because if not, we'd better try to use it ourselves later.
Discussed. oneDNN does use the has_opt_impl feature (basically weights packing),
so the oneDNN logic needs to be replicated in ACLFCExecutor to ensure there is no performance drop.
We can merge the PR without weights packing support as soon as all the tests pass, but the ACLFCExecutor should stay completely disabled for now.
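For reference, a hedged sketch of how a packed-weight kernel could be queried from ACL, the way oneDNN's integration does it. The tensor-info variable names are illustrative (assumed to be prepared `arm_compute::TensorInfo` objects), and the exact `has_opt_impl` signature may differ between ACL versions:

```cpp
#include <arm_compute/runtime/NEON/functions/NEFullyConnectedLayer.h>

// Illustrative only: ask ACL whether a fixed-format (pre-packed) weights kernel
// exists for these shapes before configuring the layer.
arm_compute::WeightFormat expectedWeightFormat = arm_compute::WeightFormat::ANY;
arm_compute::WeightsInfo weightsInfo(false, 1, 1, weiTensorInfo.dimension(1),
                                     false, expectedWeightFormat);

const arm_compute::Status status = arm_compute::NEFullyConnectedLayer::has_opt_impl(
    expectedWeightFormat, &srcTensorInfo, &weiTensorInfo, &biasTensorInfo,
    &dstTensorInfo, fullyConnectedLayerInfo, weightsInfo);

if (status.error_code() == arm_compute::ErrorCode::OK &&
    expectedWeightFormat != arm_compute::WeightFormat::ANY) {
    // A packed kernel is available: reorder the weights into expectedWeightFormat
    // once at preparation time and pass the packed tensor to configure()/run().
}
```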
Force-pushed from 18b2f19 to 3a13983
Force-pushed from 0ba1be0 to e0b96ec
Review threads resolved on:
- src/plugins/intel_cpu/src/nodes/executors/acl/acl_common_executor.hpp (outdated)
- src/plugins/intel_cpu/src/nodes/executors/acl/acl_fullyconnected.cpp (outdated)
OperationType::FullyConnected,
ShapeTolerance::Agnostic,
// supports
[](const FCConfig& config) -> bool {
Let's make sure the tests pass and keep the executor disabled for now.
There is no rush to enable it and replace the oneDNN one;
we need to confirm there are no performance degradations first.
@EgorDuplensky I'll disable it once the review is finished.
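A minimal sketch of the temporary disable discussed here, assuming the registration structure from the snippet above (the predicate body is illustrative, not the PR's final code):

```cpp
// Illustrative only: keep the ACL FC executor registered but never selected,
// so the oneDNN implementation stays active until parity is confirmed.
[](const FCConfig& config) -> bool {
    return false;  // temporarily disabled, see the discussion above
},
```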
const std::vector<ShapeRelatedParams> IS = {
    {static_shapes_to_test_representation({{1, 2, 32, 120}, {120, 5}}), {false, false}},
    {static_shapes_to_test_representation({{1, 2, 32, 120}, {120, 5}}), {true, false}},
    {static_shapes_to_test_representation({{1, 2, 32, 120}, {120, 5}}), {false, true}},
    {static_shapes_to_test_representation({{1, 2, 32, 120}, {120, 5}}), {true, true}},
We still need to complete the test refactoring for the FullyConnected node (to add common tests, etc.). Let's do it in the scope of a follow-up PR.
@EgorDuplensky created issue CVS-145273
Review threads resolved on:
- src/plugins/intel_cpu/src/nodes/executors/acl/acl_fullyconnected.cpp (three outdated threads)
- src/plugins/intel_cpu/src/nodes/executors/fullyconnected_implementations.cpp (two threads, one outdated)
Force-pushed from ca90d15 to 8d548b5
// Issue: CVS-123514
static arm_compute::Mutex mtx_config;
arm_compute::lock_guard<arm_compute::Mutex> _lock{mtx_config};
config();
}
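The lines above are the body of a mutex-guarded configuration helper: ACL layer configuration is not thread-safe in this context (CVS-123514), so every configure() call is funneled through one global mutex. A hedged usage sketch follows; the helper name `configureThreadSafe` and the layer/tensor names are assumptions:

```cpp
// Illustrative usage of the serialized-configuration pattern shown above:
// the actual ACL configure() call runs under the global mutex.
configureThreadSafe([&] {
    fcLayer.configure(&srcTensor, &weiTensor, &biasTensor, &dstTensor,
                      fullyConnectedLayerInfo);
});
```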
bool getActivationLayerInfo(Algorithm algorithm,
The getActivationLayerInfo function returns a boolean value and updates an arm_compute::ActivationLayerInfo instance. Single responsibility: please separate it into two different functions:
- getActivationLayerInfo has to return an arm_compute::ActivationLayerInfo instance;
- checkActivationLayerInfo should check the element-wise operation type.
Note, please, that arm_compute::GEMMInfo has a different interface from arm_compute::FullyConnectedLayerInfo, so it is not possible to use it in the suggested way. Additionally, after the fix you won't need to create a temporary instance in the ACLFullyConnected::supports method.
@eshoguli corrected
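For illustration, a minimal sketch of the suggested split. The two function names come from the review itself; the supported-algorithm set and the bodies are assumptions:

```cpp
// checkActivationLayerInfo: only validates that ACL can fuse this eltwise type.
bool checkActivationLayerInfo(Algorithm algorithm) {
    switch (algorithm) {
        case Algorithm::EltwiseRelu:
        case Algorithm::EltwiseTanh:
        case Algorithm::EltwiseSigmoid:
            return true;
        default:
            return false;
    }
}

// getActivationLayerInfo: maps an already-validated algorithm to ACL's
// descriptor, so callers no longer need a temporary out-parameter instance.
arm_compute::ActivationLayerInfo getActivationLayerInfo(Algorithm algorithm) {
    using ActFn = arm_compute::ActivationLayerInfo::ActivationFunction;
    switch (algorithm) {
        case Algorithm::EltwiseRelu:
            return arm_compute::ActivationLayerInfo(ActFn::RELU);
        case Algorithm::EltwiseTanh:
            return arm_compute::ActivationLayerInfo(ActFn::TANH, 1.f, 1.f);
        case Algorithm::EltwiseSigmoid:
            return arm_compute::ActivationLayerInfo(ActFn::LOGISTIC);
        default:
            OPENVINO_THROW("Unsupported activation type for ACL");
    }
}
```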
VERIFY(one_of(srcType(config), ov::element::f16, ov::element::f32), UNSUPPORTED_SRC_PRECISIONS);
VERIFY(one_of(weiType(config), ov::element::f16, ov::element::f32), UNSUPPORTED_WEI_PRECISIONS);
VERIFY(postOpsNumbers(config) < 2, UNSUPPORTED_NUMBER_OF_POSTOPS);
VERIFY(checkAndInitPostOps(config.postOps, tmpFullyConnectedLayerInfo), UNSUPPORTED_TYPE_OF_POSTOPS);
Is there a way to report the supported activations before activation fusing? If not, does it mean that we use the reference FullyConnected implementation when an unsupported activation is fused, instead of avoiding the fusion, using the ACL FullyConnected implementation, and executing the unsupported activation separately?
@eshoguli corrected
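A hedged sketch of how the supports() checks above could reject configs with unsupported fused activations, so the fusion pass keeps such activations as standalone nodes. `checkActivationLayerInfo` is the function suggested earlier in the review; `eltwiseAlgorithm` is a hypothetical accessor:

```cpp
// Illustrative: fail supports() early for post-ops ACL cannot fuse, instead of
// silently selecting a slower fallback after the fusion has already happened.
for (const auto& postOp : config.postOps) {
    if (!checkActivationLayerInfo(eltwiseAlgorithm(postOp)))  // hypothetical accessor
        return false;
}
return true;
```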
Force-pushed from d78cb87 to 948edf2
Force-pushed from aa77008 to fef6788
Force-pushed from fef6788 to 972a62f
Tickets: CVS-138509, CVS-137575, CVS-147625, CVS-148130