portable implementation of crc64ecma for both x86_64/AArch64 using SIMD (SSE/NEON) #697

Merged
lihuiba merged 1 commit into alibaba:main from crc64-hw-cpp on Jan 16, 2025

@lihuiba (Collaborator) commented Jan 15, 2025

Translated from the assembly implementation of ISA-L.
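For context, the ISA-L kernels named in the logs below (`crc64_ecma_refl_by8` on x86_64, `crc64_ecma_refl_pmull` on AArch64) fold the input using 64x64 -> 128-bit carry-less multiplication, which SSE exposes as PCLMULQDQ and NEON as PMULL (`vmull_p64`). The sketch below is only a slow, portable emulation of that one primitive, which can be handy for unit-testing folding constants off-target; `U128` and `clmul64` are hypothetical names, not part of this PR.

```cpp
#include <cstdint>

// Hypothetical helper type: the 128-bit carry-less product, split into halves.
struct U128 {
    uint64_t hi;
    uint64_t lo;
};

// Portable emulation of the 64x64 -> 128-bit carry-less multiply that
// PCLMULQDQ (x86_64) and PMULL / vmull_p64 (AArch64) perform in a single
// instruction. Partial products are combined with XOR instead of ADD,
// so no carries propagate between bit positions (this is polynomial
// multiplication over GF(2)).
inline U128 clmul64(uint64_t a, uint64_t b) {
    U128 r{0, 0};
    for (int i = 0; i < 64; ++i) {
        if ((b >> i) & 1) {
            r.lo ^= a << i;
            // Bits of `a` shifted past bit 63 land in the high half.
            // i == 0 is skipped because a >> 64 is undefined behavior.
            if (i != 0) r.hi ^= a >> (64 - i);
        }
    }
    return r;
}

// Examples:
//   clmul64(0x3, 0x3)      == {0, 0x5}  since (x + 1)^2 = x^2 + 1 over GF(2)
//   clmul64(1ULL << 63, 2) == {1, 0}    since x^63 * x = x^64
```

The real kernels obviously use the hardware instruction; a loop like this is only worth keeping as a test oracle.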

Its performance is similar to the asm implementations on x86_64 and Neoverse-N1, and about 1.75x faster (roughly 2x) on Apple M1 (40.12 vs. 22.97 GB/s).
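A bit-at-a-time reference is useful for cross-checking SIMD output on arbitrary lengths and alignments. This sketch assumes `crc64ecma` follows the same convention as (to our understanding) ISA-L's `crc64_ecma_refl`: reflected ECMA-182 polynomial, running value inverted on entry and exit, and seed 0 for a fresh CRC, which makes a fresh call equal to CRC-64/XZ. `crc64ecma_bitwise` is a hypothetical name.

```cpp
#include <cstddef>
#include <cstdint>

// Reflected (bit-reversed) form of the ECMA-182 polynomial 0x42F0E1EBA9EA3693.
constexpr uint64_t kCrc64EcmaReflPoly = 0xC96C5795D7870F42ULL;

// Hypothetical slow reference implementation (assumed ISA-L-style convention):
// the value is inverted on entry and exit, so a fresh computation starts from
// seed = 0 and chained calls pass the previous result back in as `seed`.
uint64_t crc64ecma_bitwise(uint64_t seed, const void* data, size_t len) {
    const uint8_t* p = static_cast<const uint8_t*>(data);
    uint64_t crc = ~seed;
    for (size_t i = 0; i < len; ++i) {
        crc ^= p[i];  // reflected CRC: bytes enter at the low end, LSB first
        for (int k = 0; k < 8; ++k) {
            // Shift right; XOR in the polynomial when a 1 bit falls off.
            crc = (crc >> 1) ^ ((crc & 1) ? kCrc64EcmaReflPoly : 0);
        }
    }
    return ~crc;
}
```

With this convention, `crc64ecma_bitwise(0, "123456789", 9)` should return the CRC-64/XZ check value 0x995DC9BBDF1939FA, and feeding a partial result back in as `seed` continues the CRC over the next chunk.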

On a MacBook Pro with an Apple M1 (ARM) CPU:

Note: Google Test filter = TestChecksumBig.crc64*
[==========] Running 2 tests from 1 test suite.
[----------] Global test environment set-up.
[----------] 2 tests from TestChecksumBig
[ RUN      ] TestChecksumBig.crc64ecma_hw
crc64ecma_hw_portable time spent: 1246389 us (40.12 GB/s)
[       OK ] TestChecksumBig.crc64ecma_hw (1337 ms)
[ RUN      ] TestChecksumBig.crc64ecma_hw_asm
crc64ecma_hw_asm(crc64_ecma_refl_pmull) time spent: 2177113 us (22.97 GB/s)
[       OK ] TestChecksumBig.crc64ecma_hw_asm (2182 ms)
[----------] 2 tests from TestChecksumBig (3520 ms total)

[----------] Global test environment tear-down
[==========] 2 tests from 1 test suite ran. (3520 ms total)
[  PASSED  ] 2 tests.

On a server with an Intel Xeon (x86_64) CPU:

Note: Google Test filter = TestChecksumBig.crc64*
[==========] Running 2 tests from 1 test case.
[----------] Global test environment set-up.
[----------] 2 tests from TestChecksumBig
[ RUN      ] TestChecksumBig.crc64ecma_hw
crc64ecma_hw_portable time spent: 3841106 us (13.02 GB/s)
[       OK ] TestChecksumBig.crc64ecma_hw (3919 ms)
[ RUN      ] TestChecksumBig.crc64ecma_hw_asm
crc64ecma_hw_asm(crc64_ecma_refl_by8) time spent: 4064074 us (12.30 GB/s)
[       OK ] TestChecksumBig.crc64ecma_hw_asm (4102 ms)
[----------] 2 tests from TestChecksumBig (8021 ms total)

[----------] Global test environment tear-down
[==========] 2 tests from 1 test case ran. (8021 ms total)
[  PASSED  ] 2 tests.

On a server with an Arm Neoverse-N1 CPU:

Note: Google Test filter = TestChecksumBig.crc64*
[==========] Running 2 tests from 1 test suite.
[----------] Global test environment set-up.
[----------] 2 tests from TestChecksumBig
[ RUN      ] TestChecksumBig.crc64ecma_hw
crc64ecma_hw_portable time spent: 5248720 us (9.53 GB/s)
[       OK ] TestChecksumBig.crc64ecma_hw (5393 ms)
[ RUN      ] TestChecksumBig.crc64ecma_hw_asm
crc64ecma_hw_asm(crc64_ecma_refl_pmull) time spent: 5277467 us (9.47 GB/s)
[       OK ] TestChecksumBig.crc64ecma_hw_asm (5287 ms)
[----------] 2 tests from TestChecksumBig (10681 ms total)

[----------] Global test environment tear-down
[==========] 2 tests from 1 test suite ran. (10681 ms total)
[  PASSED  ] 2 tests.
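The three logs are easy to sanity-check: throughput times elapsed time gives the data volume per run, and the ratio of elapsed times gives the portable/asm speedup. The sketch below simply redoes that arithmetic on the logged numbers (`Run`, `implied_gb`, `speedup` and `report` are illustrative names). The logs do not say whether "GB" means 10^9 or 2^30 bytes, so the volume is left in the log's own unit.

```cpp
#include <cstdio>

// Figures copied from the three runs above.
struct Run {
    const char* cpu;
    double portable_us, portable_gbps;  // crc64ecma_hw_portable
    double asm_us, asm_gbps;            // crc64ecma_hw_asm
};

// Data volume implied by one run: throughput x elapsed seconds,
// in whatever "GB" unit the test prints.
double implied_gb(double us, double gbps) { return gbps * us / 1e6; }

// Speedup of the portable version over the asm version, from wall time.
double speedup(const Run& r) { return r.asm_us / r.portable_us; }

void report() {
    const Run runs[] = {
        {"Apple M1",    1246389, 40.12, 2177113, 22.97},
        {"Intel Xeon",  3841106, 13.02, 4064074, 12.30},
        {"Neoverse-N1", 5248720,  9.53, 5277467,  9.47},
    };
    for (const Run& r : runs) {
        std::printf("%-12s ~%.1f GB per run, portable/asm speedup %.2fx\n",
                    r.cpu, implied_gb(r.portable_us, r.portable_gbps), speedup(r));
    }
}
```

Each run works out to about 50 GB, and the speedups come to roughly 1.75x on M1, 1.06x on Xeon and 1.01x on Neoverse-N1.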

@lihuiba lihuiba requested review from beef9999 and Coldwings January 15, 2025 16:03
@lihuiba lihuiba force-pushed the crc64-hw-cpp branch 3 times, most recently from 9a50f0b to 335b005 January 15, 2025 16:37
@Coldwings (Collaborator) left a comment

LGTM

 elseif (${ARCH} STREQUAL aarch64)
-set(CMAKE_CXX_FLAGS "${CMAKE_CXX_FLAGS} -mcpu=generic+crc -fsigned-char -fno-stack-protector -fomit-frame-pointer")
+set(CMAKE_CXX_FLAGS "${CMAKE_CXX_FLAGS} -mcpu=native -fsigned-char -fno-stack-protector -fomit-frame-pointer")
Collaborator

Maybe we should make the -mcpu-like arguments a configurable option, so users can make their own choice.

Collaborator Author

Definitely! You can do that later, along with an option for the assembly implementations and nasm in test-checksum.

@lihuiba lihuiba merged commit eae4779 into alibaba:main Jan 16, 2025
12 checks passed