-
Notifications
You must be signed in to change notification settings - Fork 314
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Audio Math: Improve audio format saturation functions. #9344
Audio Math: Improve audio format saturation functions. #9344
Conversation
Performance results using Generic HiFi compilers
example usage in FW:
|
Each line should be it's own commit, please split this. Please read the section on atomic comic convention here |
098e7df
to
5e6e50c
Compare
Done ! Could you please take another look? |
@ShriramShastry can you add data that shows the before results (I think these are all "after" data). That ways its easier to see the benefits. Thanks ! |
Thank you; please look for the updated results table.
Summary |
@singalsu good performance updates from @ShriramShastry pls review. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Approved the C code changes, @singalsu will need to approve the maths.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I didn't see performance improvement from this with MTL xcc builds of EQs, volume, DRC, TDFB. But also there wasn't any degrade of performance, same results within a fraction of MCPS. I think it's because most of the code is sat_int32() that wasn't improved (at least much) by looks of the code though you reported some improve. Can you check especially the s24 case.
src/include/sof/audio/format_hifi3.h
Outdated
#include <xtensa/config/defs.h> | ||
#include <xtensa/tie/xt_hifi3.h> | ||
#include <stdint.h> | ||
#include <xtensa/tie/xt_hifi2.h> |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I wonder what here needs HiFi2 headers? There should be HiFi3 operation available for that.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
C Syntax recommendation
#include <xtensa/tie/xt_hifi2.h>
I am OK, to remove.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
If it compiles without that, I think better to remove to avoid confusion.
static inline int32_t sat_int32(int64_t x) | ||
{ | ||
return (ae_int32)AE_ROUND32F48SSYM(AE_SLAI64(x, 16)); | ||
/* Shift left by 32 bits with saturation */ |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This fixes an overflow with very large input values. But wouldn't this be a simpler fix:
return (ae_int32)AE_ROUND32F48SSYM(AE_SLAI64S(x, 16));
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Also, do you have idea why your version is faster in your test?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
https://github.com/thesofproject/sof/issues/9285
With input values of -9223372036854775808 and 9223372036854775807 . It is expected to return -2147483648 and 2147483647 but we are observing 0 and -1
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes, there is possible overflow with AE_SLAI64() instead of AE_SLAI64S(). The proper link to issue is: #9285.
Not sure, but without looking disasembly I'm guessing your version has the small speed advantage from easier type casts. Maybe (int32_t)AE_MOVINT32_FROMINT64() is just a normal store operation. I'm OK with this if you can change the typecast from (int) to (int32_t), if the code still compiles, since the that's type declared type of the function.
src/include/sof/audio/format_hifi3.h
Outdated
|
||
/* Convert 64-bit integers to ae_f64 format */ | ||
d0 = int64_rtor_ae_f64(x); | ||
d1 = int64_rtor_ae_f64(y); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Would just C typecast d0 = (ae_f64)x;
work ?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yup, (ae_f64)
is similar int64_rtor_ae_f64
.
HiFi-5
d0 = (ae_f64)x;
d1 = (ae_f64)y;
Performance Comparison:
Function | Array Size | Branching Cycles | HiFi Cycles | Bit True? |
-------------------+--------------+-------------------+-------------------+------------+
review_sat_int32 | 32 | 214 | 105 | Yes |
d0 = int64_rtor_ae_f64(x);
d1 = int64_rtor_ae_f64(y);
Performance Comparison:
Function | Array Size | Branching Cycles | HiFi Cycles | Bit True? |
-------------------+--------------+-------------------+-------------------+------------+
review_sat_int32 | 32 | 241 | 105 | Yes |
Thanks !!
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yep, I'd prefer that since it's more readable version. int64_rtor_ae_f64() makes readers wonder what it does.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
What is the "Branching Cycles" and why does it differ for these ways? Is (ae_f64) typecast better with smaller number? Still the HiFi cycles are the same. Not critical but if typecast way is good for you, I prefer it.
Its probably difficult to see the gain from logs if its hidden in larger infra especially as we as saving single instructions in some cases. If logs are marginal AND maths is correct then we should trust the tooling that these are valid improvements and merge as future use cases may hit these APIs harder where benefits in the logs will be more obvious. |
There is no answers to my questions yet, not removing my reject. I have especially concern about 24 bit saturate intrinsics compatibility with the tricky S24_4LE format in SOF. I can check it today myself of course. |
src/include/sof/audio/format_hifi3.h
Outdated
ae_f64 shifted = AE_SLAI64S(x, 32); | ||
|
||
/* Shift right by 32 bits */ | ||
return (int)AE_MOVINT32_FROMINT64(AE_SRAI64(shifted, 32)); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
the typecast should be (int32_t).
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thank you so much for examining the code. I've included my responses. Please have a look.
static inline int32_t sat_int32(int64_t x) | ||
{ | ||
return (ae_int32)AE_ROUND32F48SSYM(AE_SLAI64(x, 16)); | ||
/* Shift left by 32 bits with saturation */ |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
https://github.com/thesofproject/sof/issues/9285
With input values of -9223372036854775808 and 9223372036854775807 . It is expected to return -2147483648 and 2147483647 but we are observing 0 and -1
src/include/sof/audio/format_hifi3.h
Outdated
#include <xtensa/config/defs.h> | ||
#include <xtensa/tie/xt_hifi3.h> | ||
#include <stdint.h> | ||
#include <xtensa/tie/xt_hifi2.h> |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
C Syntax recommendation
#include <xtensa/tie/xt_hifi2.h>
I am OK, to remove.
src/include/sof/audio/format_hifi3.h
Outdated
|
||
/* Convert 64-bit integers to ae_f64 format */ | ||
d0 = int64_rtor_ae_f64(x); | ||
d1 = int64_rtor_ae_f64(y); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yup, (ae_f64)
is similar int64_rtor_ae_f64
.
HiFi-5
d0 = (ae_f64)x;
d1 = (ae_f64)y;
Performance Comparison:
Function | Array Size | Branching Cycles | HiFi Cycles | Bit True? |
-------------------+--------------+-------------------+-------------------+------------+
review_sat_int32 | 32 | 214 | 105 | Yes |
d0 = int64_rtor_ae_f64(x);
d1 = int64_rtor_ae_f64(y);
Performance Comparison:
Function | Array Size | Branching Cycles | HiFi Cycles | Bit True? |
-------------------+--------------+-------------------+-------------------+------------+
review_sat_int32 | 32 | 241 | 105 | Yes |
Thanks !!
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks for responses, many resolved issues. Some still left, mainly commit text improvement need I think
src/include/sof/audio/format_hifi3.h
Outdated
#include <xtensa/config/defs.h> | ||
#include <xtensa/tie/xt_hifi3.h> | ||
#include <stdint.h> | ||
#include <xtensa/tie/xt_hifi2.h> |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
If it compiles without that, I think better to remove to avoid confusion.
static inline int32_t sat_int32(int64_t x) | ||
{ | ||
return (ae_int32)AE_ROUND32F48SSYM(AE_SLAI64(x, 16)); | ||
/* Shift left by 32 bits with saturation */ |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes, there is possible overflow with AE_SLAI64() instead of AE_SLAI64S(). The proper link to issue is: #9285.
Not sure, but without looking disasembly I'm guessing your version has the small speed advantage from easier type casts. Maybe (int32_t)AE_MOVINT32_FROMINT64() is just a normal store operation. I'm OK with this if you can change the typecast from (int) to (int32_t), if the code still compiles, since the that's type declared type of the function.
src/include/sof/audio/format_hifi3.h
Outdated
|
||
/* Convert 64-bit integers to ae_f64 format */ | ||
d0 = int64_rtor_ae_f64(x); | ||
d1 = int64_rtor_ae_f64(y); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yep, I'd prefer that since it's more readable version. int64_rtor_ae_f64() makes readers wonder what it does.
src/include/sof/audio/format_hifi3.h
Outdated
|
||
/* Convert 64-bit integers to ae_f64 format */ | ||
d0 = int64_rtor_ae_f64(x); | ||
d1 = int64_rtor_ae_f64(y); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
What is the "Branching Cycles" and why does it differ for these ways? Is (ae_f64) typecast better with smaller number? Still the HiFi cycles are the same. Not critical but if typecast way is good for you, I prefer it.
- Changed year to 2024 and added author information. - Updated the license identifier to BSD-3-Clause. Signed-off-by: Shriram Shastry <malladi.sastry@intel.com>
- Reshuffle standard headers order. Signed-off-by: Shriram Shastry <malladi.sastry@intel.com>
- Enhanced function documentation. - Replaced AE_SLAI64 with AE_SLAI64S for better saturation control. - Added intermediary shift left by 32 bits with saturation. - Final shift right by 32 bits for accurate rounding and conversion. - No performance degradation observed. This check-in ensures more accurate and reliable results by using more appropriate intrinsic functions and thorough step-by-step operations. Signed-off-by: Shriram Shastry <malladi.sastry@intel.com>
- Added to saturate and pack two 64-bit integers into 32x2 vector. Signed-off-by: Shriram Shastry <malladi.sastry@intel.com>
- Added detailed function documentation. - Changed from AE_SRAI32(AE_SLAI32S(x, 8), 8) to AE_SAT24S for more accurate 24-bit saturation. - Used ae_f32x2 type to ensure correct input handling. - No performance degradation observed. This check-in enhances the sat_int24 function by improving its logic and documentation to ensure more accurate 32-bit to 24-bit conversion. Signed-off-by: Shriram Shastry <malladi.sastry@intel.com>
- Added to saturate and pack two 32-bit integers into a 32x2 vector. Signed-off-by: Shriram Shastry <malladi.sastry@intel.com>
- Added detailed function documentation. - Changed from AE_SAT16X4(x, x) to a more descriptive process: - Moved the 32-bit value to ae_int32x2 type for accurate input handling. - Used AE_SAT16X4 to saturate and pack values. - Extracted the first 16-bit value with AE_MOVAD16_0. - No performance degradation observed. This check-in enhances the sat_int16 function by improving its logic and documentation to ensure more accurate 32-bit to 16-bit conversion. Signed-off-by: Shriram Shastry <malladi.sastry@intel.com>
- Added to saturate and pack four 32-bit integers into 16x4 vector. Signed-off-by: Shriram Shastry <malladi.sastry@intel.com>
- Added detailed function documentation. - Changed from if-else checks to bitwise operations for efficiency: - Shifted left by 24 bits to prepare for saturation. - Shifted right by 24 bits to sign-extend the value. - Extracted the 8-bit value using AE_MOVAD32_L. - No performance degradation observed. This check-in refines the sat_int8 function by improving its logic and documentation to ensure a more efficient and accurate 32-bit to 8-bit conversion process. Signed-off-by: Shriram Shastry <malladi.sastry@intel.com>
- Added to pack two 32-bit integers in an 8x2 vector. Signed-off-by: Shriram Shastry <malladi.sastry@intel.com>
8e2b48f
to
c6af29a
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I have address the review comments. Please take look,
Thank you.
@singalsu good for you ? |
@@ -53,9 +53,23 @@ static inline ae_int32x2 vec_sat_int32x2(int64_t x, int64_t y) | |||
/* Round and saturate both 64-bit values to 32-bit and pack them */ | |||
return (ae_int32x2)AE_ROUND32X2F64SSYM(d0, d1); | |||
} | |||
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Nit about commit text, the new version isn't more accurate, it's mote efficient.
sat_int32
to support 64-bit to 32-bit saturation with proper bitwise shifting.vec_sat_int32x2
function to saturate and pack two 64-bit integers into32x2
vector.sat_int24
for 32-bit to 24-bit saturation.vec_sat_int24x2
to saturate and pack two 32-bit integers into a32x2
vector.sat_int16
to return a single 16-bit saturated integer.vec_sat_int16x4
function to saturate and pack four 32-bit integers into16x4
vector.sat_int8
to properly extract the lower 8 bits of a 32-bit integer.vec_sat_int8x2
to pack two 32-bit integers in an8x2
vector.