Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[ARM CPU] Add ACL FC executor for FP32/FP16 precision #24123

Merged
merged 6 commits into from
Aug 13, 2024

Conversation

allnes
Copy link
Contributor

@allnes allnes commented Apr 18, 2024

@github-actions github-actions bot added the category: CPU OpenVINO CPU plugin label Apr 18, 2024
@allnes allnes added platform: arm OpenVINO on ARM / ARM64 no_stale Do not mark as stale labels Apr 23, 2024
@allnes allnes marked this pull request as ready for review May 13, 2024 13:33
@allnes allnes requested review from a team as code owners May 13, 2024 13:33
@allnes allnes requested review from alvoron and EgorDuplensky May 13, 2024 13:33
@allnes allnes force-pushed the an/fc_acl_executor branch from db355f7 to 0482f4a Compare May 13, 2024 13:36
@allnes allnes force-pushed the an/fc_acl_executor branch 2 times, most recently from f946770 to 9004e85 Compare May 14, 2024 14:38
@dmitry-gorokhov dmitry-gorokhov added this to the 2024.3 milestone May 22, 2024
@allnes allnes force-pushed the an/fc_acl_executor branch from 7d3ba52 to 4f4e832 Compare May 24, 2024 17:47
@dmitry-gorokhov
Copy link
Contributor

@EgorDuplensky Could you please start the review? Thanks!

@allnes allnes requested a review from eshoguli June 17, 2024 17:58
@allnes allnes requested a review from alvoron June 19, 2024 08:40
aclMemoryInfoMap[ARG_WEI]->set_tensor_shape(temp_weights_shape);
}

tensorsInfoValidateStatus = arm_compute::NEFullyConnectedLayer::validate(
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Does not oneDNN use weights packing feature for ACL integration?
https://arm-software.github.io/ComputeLibrary/v23.02.1/classarm__compute_1_1_n_e_fully_connected_layer.xhtml#a19aa329510cbef84acc16335c2099908
Just asking.
Because, if not, later we better to try to use it by ourselves.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Discussed. oneDNN does use has_opt_impl feature (basically weights packing).
So, the oneDNN logic needs to be replicated for ACLFCExecutor to ensure no performance drop.
We can merge the PR with no weights packing support, as soon as all the tests are passed, but completely disable the ACLFCExecutor for now.

@allnes allnes force-pushed the an/fc_acl_executor branch 2 times, most recently from 18b2f19 to 3a13983 Compare June 21, 2024 19:41
@allnes allnes force-pushed the an/fc_acl_executor branch 2 times, most recently from 0ba1be0 to e0b96ec Compare June 25, 2024 14:13
OperationType::FullyConnected,
ShapeTolerance::Agnostic,
// supports
[](const FCConfig& config) -> bool {
Copy link
Contributor

@EgorDuplensky EgorDuplensky Jun 26, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

let's ensure the tests are passed and disable the executor for now.
There is no rush to enable it and replace the oneDNN one.
We need to make sure we don't have degradations first.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@EgorDuplensky I'll disable it when review will be ended

Comment on lines +95 to +104
const std::vector<ShapeRelatedParams> IS = {
{static_shapes_to_test_representation({{1, 2, 32, 120}, {120, 5}}), {false, false}},
{static_shapes_to_test_representation({{1, 2, 32, 120}, {120, 5}}), {true, false}},
{static_shapes_to_test_representation({{1, 2, 32, 120}, {120, 5}}), {false, true}},
{static_shapes_to_test_representation({{1, 2, 32, 120}, {120, 5}}), {true, true}},
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We need to actually complete the tests refactoring for FullyConnected node (to add common tests, etc). Let's do it in scope of a follow up PR.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@EgorDuplensky created issue CVS-145273

@allnes allnes requested a review from maxnick June 26, 2024 20:07
@allnes allnes assigned dmitry-gorokhov and unassigned maxnick Aug 1, 2024
@allnes allnes requested a review from dmitry-gorokhov August 1, 2024 12:32
@allnes allnes force-pushed the an/fc_acl_executor branch from ca90d15 to 8d548b5 Compare August 5, 2024 12:06
@allnes allnes requested a review from dmitry-gorokhov August 5, 2024 13:13
// Issue: CVS-123514
static arm_compute::Mutex mtx_config;
arm_compute::lock_guard<arm_compute::Mutex> _lock{mtx_config};
config();
}

bool getActivationLayerInfo(Algorithm algorithm,
Copy link
Contributor

@eshoguli eshoguli Aug 6, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

getActivationLayerInfo function returns boolean value and updates arm_compute::ActivationLayerInfo instance.
Single responsibility: separate to two different functions, please:

  • getActivationLayerInfo function has to return arm_compute::ActivationLayerInfo instance
  • checkActivationLayerInfo should check element-wise operation type.

Note, please, arm_compute::GEMMInfo has different from arm_compute::FullyConnectedLayerInfo, not possible to use in suggested way. Additionally, after fix you don't need to create temporary instance in ACLFullyConnected::supports method.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@eshoguli corrected

VERIFY(one_of(srcType(config), ov::element::f16, ov::element::f32), UNSUPPORTED_SRC_PRECISIONS);
VERIFY(one_of(weiType(config), ov::element::f16, ov::element::f32), UNSUPPORTED_WEI_PRECISIONS);
VERIFY(postOpsNumbers(config) < 2, UNSUPPORTED_NUMBER_OF_POSTOPS);
VERIFY(checkAndInitPostOps(config.postOps, tmpFullyConnectedLayerInfo), UNSUPPORTED_TYPE_OF_POSTOPS);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is there a way to provide supported activations before activation fusing? If not, those it mean, that we use reference FullyConnected implementation if not supported activation is fused, instead avoid fusing and use ACL FullyConnected implementation and execute not supported activation separatelly?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@eshoguli corrected

@allnes allnes force-pushed the an/fc_acl_executor branch from d78cb87 to 948edf2 Compare August 6, 2024 16:41
@allnes allnes requested review from eshoguli August 6, 2024 16:43
@allnes allnes force-pushed the an/fc_acl_executor branch 5 times, most recently from aa77008 to fef6788 Compare August 6, 2024 17:17
@allnes allnes force-pushed the an/fc_acl_executor branch from fef6788 to 972a62f Compare August 6, 2024 18:51
@dmitry-gorokhov dmitry-gorokhov added this pull request to the merge queue Aug 13, 2024
Merged via the queue into openvinotoolkit:master with commit 8d1cd4e Aug 13, 2024
132 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
category: CPU OpenVINO CPU plugin no_stale Do not mark as stale platform: arm OpenVINO on ARM / ARM64
Projects
None yet
Development

Successfully merging this pull request may close these issues.

6 participants