common/trinary: Accelerate trit tryte conversion with SIMD SSE4.2 #1443
base: develop
@semenov-vladyslav, can you help review this pull request? |
The CI tests failed. |
the test failed on x86_64 toolchain, you can run it by |
The command
Can I ask for more information about this command? |
|
I fixed a bug and refactored some code in the updated branch. The bug was not detected since the input data size of Here comes with 2 questions:
|
yes, please.
It's caused by mysql/mariadb, which may not be installed on your system. Since you are working on You can run binaries in the
|
I found another bug and fixed it in the updated branch. I also increased the input data size of trit tryte conversion for testing SIMD SSE4.2 acceleration. After the bug is fixed, most of the errors disappear.
with the testing command
and error log
|
you can test the here is output on my system:
|
These two cases also fail on the system toolchain when I run
|
If I change the intrinsic function from The main difference between the intrinsic functions is whether the address must be aligned to a 16-byte boundary. @oopsmonk thank you for your advice and debugging help. |
Conclusion: The load and store intrinsic functions are unified to the unaligned variants, since we cannot control the input pointer value.
The input pointers in these two cases are 8-byte aligned but not 16-byte aligned, which causes the I believe the unaligned intrinsic functions are the better choice. The pull request can be merged without any problem. |
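The alignment issue discussed above can be illustrated with a minimal sketch. It is not the code from this pull request: the helper names (`is_aligned_16`, `copy16_aligned`, `copy16_unaligned`, `demo_unaligned_copy`) are hypothetical, and only the SSE2 intrinsics `_mm_load_si128`, `_mm_loadu_si128`, `_mm_store_si128` and `_mm_storeu_si128` are real. It assumes an x86_64 compiler with C11 support.

```c
#include <assert.h>
#include <emmintrin.h> /* SSE2 load/store intrinsics, also usable with -msse4.2 */
#include <stdint.h>
#include <string.h>

/* Returns 1 if p lies on a 16-byte boundary, 0 otherwise. */
static int is_aligned_16(const void *p) {
  return ((uintptr_t)p & 0xF) == 0;
}

/* Aligned variant: the caller must guarantee 16-byte alignment on both
 * sides; the assert documents that precondition. */
static void copy16_aligned(int8_t *dst, const int8_t *src) {
  assert(is_aligned_16(src) && is_aligned_16(dst));
  _mm_store_si128((__m128i *)dst, _mm_load_si128((const __m128i *)src));
}

/* Unaligned variant: valid for any address. */
static void copy16_unaligned(int8_t *dst, const int8_t *src) {
  _mm_storeu_si128((__m128i *)dst, _mm_loadu_si128((const __m128i *)src));
}

/* Hypothetical demo: copies 16 bytes from an offset that is 8-byte aligned
 * but not 16-byte aligned. Returns 1 if both copies match their source. */
int demo_unaligned_copy(void) {
  _Alignas(16) int8_t buf[32];
  _Alignas(16) int8_t aligned_out[16];
  int8_t out[16];
  for (int i = 0; i < 32; i++) buf[i] = (int8_t)i;

  copy16_aligned(aligned_out, buf); /* both pointers are 16-byte aligned */

  const int8_t *src = buf + 8;      /* 8-byte aligned only */
  if (is_aligned_16(src)) return 0;
  copy16_unaligned(out, src);       /* the aligned variant is not safe here */

  return memcmp(aligned_out, buf, 16) == 0 && memcmp(out, src, 16) == 0;
}
```

Because library callers can pass a pointer at any offset, the unaligned variants are the safe default when the input address is not under the function's control.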
Please use snake_case for variable names for consistency.
@marktwtn Could you write benchmarks for |
It seems there are no benchmarks in the other directories for me to reference. |
yep, printing out the time consumed by functions looks good, like DLTcollab/dcurl#92 (comment) |
Hey, @marktwtn how are you? do you need any help on Bazel or else? |
@oopsmonk I found out that the performance might not behave as I expected. The input size and testing scenario affect the performance result, so I am still working on it. |
@oopsmonk I will force-push later. The modification includes:
|
The acceleration is enabled when the compiler option -msse4.2 is used and the trit/tryte input for conversion is larger than 384/128 bits. 128 bits is the basic operation unit of most SSE instructions. The implementation is complex, but it does speed up the conversion for large inputs.
In the original tests, the trit/tryte input data was too small to exercise the SIMD SSE4.2 trit tryte conversion acceleration.
trytes_to_trits minimum requirement: 128-bit = 16-tryte
trits_to_trytes minimum requirement: 384-bit = 48-trit
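The size gate described above can be sketched as follows. This is an illustration under stated assumptions, not the code in this pull request: the type aliases, `TRITS_PER_BLOCK`, `use_simd_path` and `trits_to_trytes_dispatch` are hypothetical names, the SIMD kernel itself is replaced by a placeholder call to the scalar routine, and the input length is assumed to be a multiple of 3. The gate uses "at least 48 trits", matching the minimum requirement listed above.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

typedef int8_t trit_t; /* assumed: one trit per byte, values -1, 0, 1 */
typedef char tryte_t;  /* assumed: one tryte per byte */

#define TRITS_PER_BLOCK 48 /* 384 bits: three 128-bit registers of trits */

static const char ALPHABET[] = "9ABCDEFGHIJKLMNOPQRSTUVWXYZ";

/* Scalar reference: three balanced trits (least significant first) make
 * one tryte. Assumes num_trits is a multiple of 3. */
static void trits_to_trytes_scalar(const trit_t *trits, tryte_t *trytes,
                                   size_t num_trits) {
  assert(num_trits % 3 == 0);
  for (size_t i = 0; i < num_trits; i += 3) {
    int v = trits[i] + 3 * trits[i + 1] + 9 * trits[i + 2]; /* -13..13 */
    trytes[i / 3] = ALPHABET[v < 0 ? v + 27 : v];
  }
}

/* Returns 1 when the build has SSE4.2 and the input fills at least one
 * 48-trit block; otherwise 0. */
static int use_simd_path(size_t num_trits) {
#if defined(__SSE4_2__)
  return num_trits >= TRITS_PER_BLOCK;
#else
  (void)num_trits;
  return 0;
#endif
}

void trits_to_trytes_dispatch(const trit_t *trits, tryte_t *trytes,
                              size_t num_trits) {
  size_t done = 0;
  if (use_simd_path(num_trits)) {
    done = num_trits - (num_trits % TRITS_PER_BLOCK);
    /* Placeholder: a real SIMD kernel would convert these full blocks. */
    trits_to_trytes_scalar(trits, trytes, done);
  }
  /* The tail (or the whole input on the scalar path) is converted here. */
  trits_to_trytes_scalar(trits + done, trytes + done / 3, num_trits - done);
}
```

Splitting the input into full blocks plus a scalar tail keeps the output identical on both paths, which is what the correctness tests rely on.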
The benchmark displays the minimum, maximum and average execution time of the trit tryte conversion functions for different input sizes. The range of input sizes can be modified in bench_trit_tryte.c. The default input/output tryte size range is 16 ~ 2048.
The threshold value is determined by the execution time difference, which should be at least 500 nanoseconds. The threshold experiment was run on the CPU: AMD Ryzen 5 2400G with Radeon Vega Graphics. TODO: trytes_to_trits() is occasionally slower when using SSE4.2 acceleration with large input.
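A minimal sketch of such a measurement loop is shown below. It is not the content of bench_trit_tryte.c: `bench_stats_t`, `bench_run`, `sample_workload` and `simd_worth_it` are hypothetical names, and the sketch assumes a POSIX system that provides `clock_gettime` with `CLOCK_MONOTONIC`.

```c
#ifndef _POSIX_C_SOURCE
#define _POSIX_C_SOURCE 199309L /* for clock_gettime under strict C modes */
#endif
#include <assert.h>
#include <stdint.h>
#include <time.h>

typedef struct {
  uint64_t min_ns;
  uint64_t max_ns;
  double avg_ns;
} bench_stats_t;

/* Current monotonic time in nanoseconds. */
static uint64_t now_ns(void) {
  struct timespec ts;
  clock_gettime(CLOCK_MONOTONIC, &ts);
  return (uint64_t)ts.tv_sec * 1000000000ull + (uint64_t)ts.tv_nsec;
}

/* Calls fn(arg) `runs` times and records min, max and average duration. */
bench_stats_t bench_run(void (*fn)(void *), void *arg, int runs) {
  assert(runs > 0);
  bench_stats_t s = {UINT64_MAX, 0, 0.0};
  uint64_t total = 0;
  for (int i = 0; i < runs; i++) {
    uint64_t start = now_ns();
    fn(arg);
    uint64_t elapsed = now_ns() - start;
    if (elapsed < s.min_ns) s.min_ns = elapsed;
    if (elapsed > s.max_ns) s.max_ns = elapsed;
    total += elapsed;
  }
  s.avg_ns = (double)total / runs;
  return s;
}

/* Hypothetical workload standing in for a conversion call. */
void sample_workload(void *arg) {
  int n = *(int *)arg;
  volatile unsigned sink = 0;
  for (int i = 0; i < n; i++) sink += (unsigned)i;
}

/* Threshold rule from the text: SIMD counts as a win only when it saves
 * at least 500 ns on average. */
int simd_worth_it(double scalar_avg_ns, double simd_avg_ns) {
  return scalar_avg_ns - simd_avg_ns >= 500.0;
}
```

Running `bench_run` once per input size for the scalar and the SIMD build, then applying `simd_worth_it` to the two averages, gives one way to locate the smallest input size at which the accelerated path pays off.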
A dedicated test for the trit tryte SSE4.2 acceleration is added because of the threshold. Without it, the accelerated path would not be exercised unless the test input data were larger than the threshold value.
The acceleration is enabled when the compiler option -msse4.2 is used
and the input size of trit/tryte for conversion is larger than 128-bit,
which is the basic operation unit of most SSE instructions.
The implementation is complex but it does accelerate the conversion,
as the following experiment results show:
trits_to_trytes()
Input size, Without SIMD SSE4.2 (avg nsec), With SIMD SSE4.2 (avg nsec)
81, 406.3, 261.5
243, 444.6, 162.7
6561, 53166.93, 27290.99

trytes_to_trits()
Input size, Without SIMD SSE4.2 (avg nsec), With SIMD SSE4.2 (avg nsec)
81, 355.6, 167.1
2592, 5752.3, 1751.8
2673, 6273.0, 2098.8
For more detailed experiment results, please refer to:
DLTcollab/dcurl#92
Test Plan:
$ bazel test --test_output=all --copt=-msse4.2 //common/trinary/tests/...
This command verifies the correctness of the conversion.