@andrula-song
Contributor

Add a hifi3 & hifi4 implementation of the mixer processing functions.
The HiFi versions use at least 47% fewer cycles than the C version.

Signed-off-by: Andrula Song <xiaoyuan.song@intel.com>

@andrula-song
Contributor Author

Since hifi3 and hifi4 use the same instructions, the HiFi version of the mixer is named mixer_hifi.c.
Compared with the original C version, the functions use at least 47% fewer cycles. Here are the results:
mix_n_s16 uses about 67% fewer cycles than the C version;
mix_n_s24 uses about 51% fewer cycles than the C version;
mix_n_s32 uses about 47% fewer cycles than the C version.
[image: mixer-new]
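
For readers unfamiliar with the operation being measured, here is a minimal sketch of an n-source 16-bit mix in portable C. It is not the code from this PR: the name mix_n_s16_ref, its signature, and the flat-array layout are assumptions made for illustration, and the real mixer functions operate on the project's own stream structures. The sketch only shows the per-sample work (accumulate in a wider type, then saturate) that a HiFi implementation would perform on several samples at once.

```c
#include <stddef.h>
#include <stdint.h>

/* Clamp a 32-bit accumulator to the signed 16-bit range. */
static int16_t sat_s16(int32_t x)
{
	if (x > INT16_MAX)
		return INT16_MAX;
	if (x < INT16_MIN)
		return INT16_MIN;
	return (int16_t)x;
}

/*
 * Hypothetical reference mixer (illustration only, not the PR's API):
 * sums num_sources s16 streams sample by sample into dst.
 * Assumes num_sources is small enough that the 32-bit sum cannot overflow.
 */
static void mix_n_s16_ref(int16_t *dst, const int16_t *const *srcs,
			  size_t num_sources, size_t samples)
{
	for (size_t i = 0; i < samples; i++) {
		int32_t acc = 0;

		/* Accumulate in 32 bits so intermediate sums do not wrap. */
		for (size_t j = 0; j < num_sources; j++)
			acc += srcs[j][i];

		dst[i] = sat_s16(acc);
	}
}
```

The inner loop over sources and the saturation step are what make the loop body less trivial than a plain element-wise add.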

@andrula-song force-pushed the mixer branch 2 times, most recently from bbf9476 to c6a9f5c on August 12, 2022 07:16
@XiaoyunWu6666
Contributor

SOFCI_TEST

@XiaoyunWu6666
Contributor

SOFCI TEST

Contributor

@ShriramShastry left a comment

Is it possible to add check for __Pragma("no unroll"), __Pragma("no reorder"), and __Pragma("no simd") in the for loop?

@lgirdwood
Member

> Is it possible to add check for __Pragma("no unroll"), __Pragma("no reorder"), and __Pragma("no simd") in the for loop?

Why ?

@ShriramShastry
Contributor

> > Is it possible to add check for __Pragma("no unroll"), __Pragma("no reorder"), and __Pragma("no simd") in the for loop?
>
> Why ?

It appears to aid optimization by providing additional information to the compiler.

[ From HiFi User Guide documentation ]
3.4 Standard C/C++ Auto-Vectorization
Auto-vectorization of scalar C code can produce effective results on simple loop nests, but has its limits. It can be improved through the use of compiler pragmas and options, and effective data marshalling to make data accesses (loads and stores) regular and aligned.

Pragmas are widely used in the NatureDSP library functions.

@lgirdwood
Member

> > > Is it possible to add check for __Pragma("no unroll"), __Pragma("no reorder"), and __Pragma("no simd") in the for loop?
> >
> > Why ?
>
> It appears to aid optimization by providing additional information to the compiler.

That's correct, but in this case @andrula-song is hand-writing the intrinsics and the loops are complex. The autovectorizer works best on simple, small loops, so the pragma suggestions above are not applicable here (and would probably make performance worse).

@andrula-song marked this pull request as ready for review on August 17, 2022 07:21
Collaborator

@singalsu left a comment

Great HiFi optimization work! Some small issues only:

@lyakh
Collaborator

lyakh commented Aug 18, 2022

Approval is conditional on addressing the comments from @singalsu, of course.

@andrula-song
Contributor Author

Hi @wszypelt, can you help check the internal CI? Thanks.

@lgirdwood
Member

@lrudyX can you check CI? It's showing a blank page. Thanks!

@andrula-song
Contributor Author

SOFCI_TEST

@lgirdwood merged commit 100144a into thesofproject:main on Aug 24, 2022
@andrula-song deleted the mixer branch on October 19, 2022 06:30