-
Notifications
You must be signed in to change notification settings - Fork 1.2k
crypto: add dispatcher and implement hardware acceleration for Echo512 and Shavite512, improve benchmark stability #6852
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
✅ No Merge Conflicts DetectedThis PR currently has no conflicts with other open PRs. |
ec733b5 to
58965d4
Compare
WalkthroughAdds an X11 crypto subsystem with runtime dispatch (SapphireAutoDetect) and many architecture-specific backends (ARM AES/NEON, x86 SSSE3/SSE4.1/AVX2/AES-NI/SHA-NI). Build system updates (configure.ac, Makefile.am) probe CPU features and expose per-feature flags and SPHLIB_FLAGS. Several C crypto sources were replaced or moved to C++ (ECHO and SHAvite reworked), new AES utilities and T‑boxes added, and multiple optimized implementations (echo/shavite) for different ISAs introduced. Public APIs for ECHO/SHAvite contexts were tightened to typed context structs. Bench, init, and tests call SapphireAutoDetect; lint excludes the new crypto/x11 tree. Estimated code review effort🎯 5 (Critical) | ⏱️ ~120 minutes
Pre-merge checks and finishing touches❌ Failed checks (1 warning)
✅ Passed checks (2 passed)
✨ Finishing touches
🧪 Generate unit tests
Tip 👮 Agentic pre-merge checks are now available in preview!Pro plan users can now enable pre-merge checks in their settings to enforce checklists before merging PRs.
Please see the documentation for more information. Example: reviews:
pre_merge_checks:
custom_checks:
- name: "Undocumented Breaking Changes"
mode: "warning"
instructions: |
Pass/fail criteria: All breaking changes to public APIs, CLI flags, environment variables, configuration keys, database schemas, or HTTP/GraphQL endpoints must be documented in the "Breaking Change" section of the PR description and in CHANGELOG.md. Exclude purely internal or private changes (e.g., code not exported from package entry points or explicitly marked as internal).Please share your feedback with us on this Discord post. Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out. Comment |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Actionable comments posted: 6
🧹 Nitpick comments (12)
test/lint/lint-whitespace.py (1)
21-21: Add trailing slash to the new exclude and confirm scope.Use a trailing "/" for consistency with most entries and to avoid pathspec edge cases. Also, excluding the entire directory may hide regressions; consider limiting exclusion to the specific subdirs/files that require it, or split tab vs. trailing-whitespace excludes if that was the intent.
Apply this diff:
- "src/crypto/x11", + "src/crypto/x11/",src/crypto/x11/sph_types.h (1)
336-342: Tighten the gating condition (portability/readability).Use
#if defined(SPH_UPTR) && defined(SPH_UNALIGNED) && SPH_UNALIGNEDto avoid relying on undefined-macro-to-0 semantics. Also consider#if __has_attribute(no_sanitize)around the expansion site so this doesn’t expand on compilers lacking the attribute.src/bench/bench_bitcoin.cpp (1)
65-66: Consider adding error handling for auto-detection.While the current implementation is functional, consider adding error handling or logging for the auto-detection process, similar to how SHA256AutoDetect returns a string describing the selected implementation.
Consider enhancing with diagnostic output:
- SapphireAutoDetect(); - SHA256AutoDetect(); + SapphireAutoDetect(); + std::string sha256_algo = SHA256AutoDetect(); + if (!argsman.GetBoolArg("-sanity-check", false)) { + tfm::format(std::cout, "Using the '%s' SHA256 implementation\n", sha256_algo); + }This would provide visibility into which implementations are being benchmarked, helpful for performance analysis.
src/crypto/x11/dispatch.cpp (1)
111-113: Consider validating both SSE4.1 and AES-NI are actually available togetherWhile the code checks both bits, it might be worth adding a comment explaining that AES-NI requires SSE4.1 as a prerequisite on x86 platforms.
#if defined(ENABLE_SSE41) && defined(ENABLE_X86_AESNI) const bool use_sse_4_1 = ((ecx >> 19) & 1); const bool use_aes_ni = ((ecx >> 25) & 1); + // AES-NI instructions require SSE4.1 support as a prerequisite if (use_sse_4_1 && use_aes_ni) {src/crypto/x11/arm_neon/echo.cpp (2)
53-79: Consider using std::swap for clarity in shift operationsWhile the current implementation is correct, using std::swap could make the circular shift operations more readable and potentially allow for compiler optimizations.
+#include <utility> + void ALWAYS_INLINE ShiftRow1(uint8x16_t& Wa, uint8x16_t& Wb, uint8x16_t& Wc, uint8x16_t& Wd) { - uint8x16_t tmp = Wa; - Wa = Wb; - Wb = Wc; - Wc = Wd; - Wd = tmp; + uint8x16_t tmp = std::move(Wa); + Wa = std::move(Wb); + Wb = std::move(Wc); + Wc = std::move(Wd); + Wd = std::move(tmp); }
86-127: Consider loop-based load/store to reduce code duplicationThe repetitive load and store operations could be simplified with a loop, improving maintainability without sacrificing performance.
- w[0] = vreinterpretq_u8_u64(vld1q_u64(&W[0][0])); - w[1] = vreinterpretq_u8_u64(vld1q_u64(&W[1][0])); - // ... 14 more lines - w[15] = vreinterpretq_u8_u64(vld1q_u64(&W[15][0])); + for (int i = 0; i < 16; i++) { + w[i] = vreinterpretq_u8_u64(vld1q_u64(&W[i][0])); + } // ... transformation code ... - vst1q_u64(&W[0][0], vreinterpretq_u64_u8(w[0])); - vst1q_u64(&W[1][0], vreinterpretq_u64_u8(w[1])); - // ... 14 more lines - vst1q_u64(&W[15][0], vreinterpretq_u64_u8(w[15])); + for (int i = 0; i < 16; i++) { + vst1q_u64(&W[i][0], vreinterpretq_u64_u8(w[i])); + }src/crypto/x11/ssse3/echo.cpp (1)
86-101: Consider using loops to reduce code duplicationSimilar to the ARM NEON version, the repetitive load/store operations could benefit from loop-based implementation.
- w[0] = _mm_load_si128((const __m128i*)&W[0][0]); - w[1] = _mm_load_si128((const __m128i*)&W[1][0]); - // ... repeated 14 more times - w[15] = _mm_load_si128((const __m128i*)&W[15][0]); + for (int i = 0; i < 16; i++) { + w[i] = _mm_load_si128((const __m128i*)&W[i][0]); + } // ... transformation code ... - _mm_store_si128((__m128i*)&W[0][0], w[0]); - _mm_store_si128((__m128i*)&W[1][0], w[1]); - // ... repeated 14 more times - _mm_store_si128((__m128i*)&W[15][0], w[15]); + for (int i = 0; i < 16; i++) { + _mm_store_si128((__m128i*)&W[i][0], w[i]); + }Also applies to: 112-127
src/crypto/x11/dispatch.h (1)
5-6: Header guard naming conventionThe header guard uses
BITCOIN_prefix but this is Dash Core code. Consider usingDASH_prefix for consistency.-#ifndef BITCOIN_CRYPTO_X11_DISPATCH_H -#define BITCOIN_CRYPTO_X11_DISPATCH_H +#ifndef DASH_CRYPTO_X11_DISPATCH_H +#define DASH_CRYPTO_X11_DISPATCH_HAnd at the end:
-#endif // BITCOIN_CRYPTO_X11_DISPATCH_H +#endif // DASH_CRYPTO_X11_DISPATCH_Hsrc/crypto/x11/shavite.cpp (1)
102-106: Minor: namespace closing comment typoThe closing comment says // namespace shavite_soft but the namespace is soft_shavite. Harmless, but please fix to avoid confusion in future diffs.
Also applies to: 246-248
src/crypto/x11/util/consts_aes.hpp (1)
65-77: T-tables look correct; one small robustness nitUsing std::rotl on uint32_t words to build all four tables is neat. Consider documenting the polynomial (0x11B) and generator (3) choice near gf8_pow/log for maintainers. No change required.
Also applies to: 78-82
configure.ac (1)
680-697: Feature probes look fine; gate cross-compilation assumptionsARM AES/SHA and CRC tests include intrinsics that require specific -march flags. When cross-compiling, these probes can mis-detect if the target flags are incompatible with the build host. Consider using AC_TRY_LINK with explicit -target or honoring user-provided CXXFLAGS to bypass probes when cross-compiling.
Also applies to: 739-755
src/crypto/x11/util/util.hpp (1)
34-48: Consider adding static_assert for type sizes.While the implementations look correct, adding compile-time assertions would provide additional safety guarantees for the SIMD types being used.
Add static assertions after the includes to verify expected sizes:
#if defined(ENABLE_ARM_AES) || defined(ENABLE_ARM_NEON) #include <arm_neon.h> +static_assert(sizeof(uint8x16_t) == 16, "uint8x16_t must be 16 bytes"); +static_assert(sizeof(uint32x4_t) == 16, "uint32x4_t must be 16 bytes"); #endif // ENABLE_ARM_AES || ENABLE_ARM_NEON #if defined(ENABLE_SSSE3) || (defined(ENABLE_SSE41) && defined(ENABLE_X86_AESNI)) #include <immintrin.h> +static_assert(sizeof(__m128i) == 16, "__m128i must be 16 bytes"); #endif // ENABLE_SSSE3 || (ENABLE_SSE41 && ENABLE_X86_AESNI)Also applies to: 67-83
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (26)
configure.ac(7 hunks)src/Makefile.am(4 hunks)src/bench/bench_bitcoin.cpp(2 hunks)src/bench/pow_hash.cpp(12 hunks)src/crypto/x11/aes.cpp(1 hunks)src/crypto/x11/aes.h(1 hunks)src/crypto/x11/aes_helper.c(0 hunks)src/crypto/x11/arm_crypto/echo.cpp(1 hunks)src/crypto/x11/arm_crypto/shavite.cpp(1 hunks)src/crypto/x11/arm_neon/echo.cpp(1 hunks)src/crypto/x11/dispatch.cpp(1 hunks)src/crypto/x11/dispatch.h(1 hunks)src/crypto/x11/echo.cpp(4 hunks)src/crypto/x11/shavite.c(0 hunks)src/crypto/x11/shavite.cpp(1 hunks)src/crypto/x11/sph_echo.h(4 hunks)src/crypto/x11/sph_shavite.h(4 hunks)src/crypto/x11/sph_types.h(9 hunks)src/crypto/x11/ssse3/echo.cpp(1 hunks)src/crypto/x11/util/consts_aes.hpp(1 hunks)src/crypto/x11/util/util.hpp(1 hunks)src/crypto/x11/x86_aesni/echo.cpp(1 hunks)src/crypto/x11/x86_aesni/shavite.cpp(1 hunks)src/init/common.cpp(2 hunks)src/test/util/setup_common.cpp(2 hunks)test/lint/lint-whitespace.py(1 hunks)
💤 Files with no reviewable changes (2)
- src/crypto/x11/shavite.c
- src/crypto/x11/aes_helper.c
🧰 Additional context used
📓 Path-based instructions (4)
src/crypto/{ctaes,x11}/**
📄 CodeRabbit inference engine (CLAUDE.md)
Do not make changes under any circumstances to vendored dependencies in src/crypto/ctaes and src/crypto/x11
Files:
src/crypto/x11/aes.cppsrc/crypto/x11/dispatch.cppsrc/crypto/x11/x86_aesni/shavite.cppsrc/crypto/x11/arm_crypto/echo.cppsrc/crypto/x11/dispatch.hsrc/crypto/x11/arm_crypto/shavite.cppsrc/crypto/x11/arm_neon/echo.cppsrc/crypto/x11/sph_echo.hsrc/crypto/x11/aes.hsrc/crypto/x11/x86_aesni/echo.cppsrc/crypto/x11/sph_shavite.hsrc/crypto/x11/echo.cppsrc/crypto/x11/util/util.hppsrc/crypto/x11/ssse3/echo.cppsrc/crypto/x11/shavite.cppsrc/crypto/x11/util/consts_aes.hppsrc/crypto/x11/sph_types.h
src/**/*.{cpp,h,cc,cxx,hpp}
📄 CodeRabbit inference engine (CLAUDE.md)
src/**/*.{cpp,h,cc,cxx,hpp}: Dash Core C++ codebase must be written in C++20 and require at least Clang 16 or GCC 11.1
Dash uses unordered_lru_cache for efficient caching with LRU eviction
Files:
src/crypto/x11/aes.cppsrc/bench/pow_hash.cppsrc/crypto/x11/dispatch.cppsrc/crypto/x11/x86_aesni/shavite.cppsrc/crypto/x11/arm_crypto/echo.cppsrc/bench/bench_bitcoin.cppsrc/crypto/x11/dispatch.hsrc/crypto/x11/arm_crypto/shavite.cppsrc/init/common.cppsrc/test/util/setup_common.cppsrc/crypto/x11/arm_neon/echo.cppsrc/crypto/x11/sph_echo.hsrc/crypto/x11/aes.hsrc/crypto/x11/x86_aesni/echo.cppsrc/crypto/x11/sph_shavite.hsrc/crypto/x11/echo.cppsrc/crypto/x11/util/util.hppsrc/crypto/x11/ssse3/echo.cppsrc/crypto/x11/shavite.cppsrc/crypto/x11/util/consts_aes.hppsrc/crypto/x11/sph_types.h
src/bench/**/*.{cpp,h,cc,cxx,hpp}
📄 CodeRabbit inference engine (CLAUDE.md)
Performance benchmarks should be placed in src/bench/ and use nanobench
Files:
src/bench/pow_hash.cppsrc/bench/bench_bitcoin.cpp
src/{test,wallet/test,qt/test}/**/*.{cpp,h,cc,cxx,hpp}
📄 CodeRabbit inference engine (CLAUDE.md)
Unit tests for C++ code should be placed in src/test/, src/wallet/test/, or src/qt/test/ and use Boost::Test or Qt 5 for GUI tests
Files:
src/test/util/setup_common.cpp
🧠 Learnings (1)
📚 Learning: 2025-01-06T09:51:03.167Z
Learnt from: kwvg
PR: dashpay/dash#6516
File: depends/patches/gmp/include_ldflags_in_configure.patch:557-621
Timestamp: 2025-01-06T09:51:03.167Z
Learning: The `GMP_GCC_ARM_UMODSI` macro checks only the compiler version, and `GMP_GCC_MIPS_O32` relies on the `-mabi=32` flag. Therefore, `$LDFLAGS` is irrelevant to these tests.
Applied to files:
configure.ac
🧬 Code graph analysis (17)
src/crypto/x11/dispatch.cpp (3)
src/crypto/x11/echo.cpp (4)
FullStateRound(123-145)FullStateRound(123-123)ShiftAndMix(147-154)ShiftAndMix(147-147)src/crypto/x11/shavite.cpp (2)
Compress(106-246)Compress(106-106)src/compat/cpuid.h (1)
GetCPUID(16-23)
src/crypto/x11/x86_aesni/shavite.cpp (2)
src/crypto/x11/arm_crypto/shavite.cpp (2)
CompressElement(16-30)CompressElement(16-17)src/crypto/x11/shavite.cpp (2)
CompressElement(72-99)CompressElement(72-73)
src/crypto/x11/arm_crypto/echo.cpp (1)
src/crypto/x11/x86_aesni/echo.cpp (2)
StateRound(15-30)StateRound(15-15)
src/bench/bench_bitcoin.cpp (1)
src/crypto/x11/dispatch.cpp (2)
SapphireAutoDetect(100-180)SapphireAutoDetect(100-100)
src/crypto/x11/dispatch.h (1)
src/crypto/x11/dispatch.cpp (2)
SapphireAutoDetect(100-180)SapphireAutoDetect(100-100)
src/crypto/x11/arm_crypto/shavite.cpp (2)
src/crypto/x11/shavite.cpp (2)
CompressElement(72-99)CompressElement(72-73)src/crypto/x11/x86_aesni/shavite.cpp (2)
CompressElement(17-30)CompressElement(17-18)
src/init/common.cpp (1)
src/crypto/x11/dispatch.cpp (2)
SapphireAutoDetect(100-180)SapphireAutoDetect(100-100)
src/test/util/setup_common.cpp (1)
src/crypto/x11/dispatch.cpp (2)
SapphireAutoDetect(100-180)SapphireAutoDetect(100-100)
src/crypto/x11/arm_neon/echo.cpp (1)
src/crypto/x11/util/util.hpp (4)
Xor(34-34)Xor(34-34)Xor(67-67)Xor(67-67)
src/crypto/x11/sph_echo.h (1)
src/crypto/x11/echo.cpp (6)
sph_echo512_init(327-331)sph_echo512_init(328-328)sph_echo512(334-338)sph_echo512(335-335)sph_echo512_close(341-345)sph_echo512_close(342-342)
src/crypto/x11/aes.h (1)
src/crypto/x11/dispatch.h (1)
sapphire(12-19)
src/crypto/x11/x86_aesni/echo.cpp (1)
src/crypto/x11/arm_crypto/echo.cpp (2)
StateRound(14-29)StateRound(14-14)
src/crypto/x11/sph_shavite.h (1)
src/crypto/x11/shavite.cpp (6)
sph_shavite512_init(342-346)sph_shavite512_init(343-343)sph_shavite512(349-353)sph_shavite512(350-350)sph_shavite512_close(356-361)sph_shavite512_close(357-357)
src/crypto/x11/echo.cpp (4)
src/crypto/x11/ssse3/echo.cpp (4)
MixColumn(28-51)MixColumn(28-28)ShiftAndMix(83-128)ShiftAndMix(83-83)src/crypto/x11/aes.cpp (4)
Round(12-21)Round(12-14)RoundKeyless(23-27)RoundKeyless(23-24)src/crypto/x11/dispatch.cpp (6)
FullStateRound(43-43)FullStateRound(64-64)FullStateRound(73-73)ShiftAndMix(52-52)ShiftAndMix(58-58)ShiftAndMix(74-74)src/crypto/x11/x86_aesni/echo.cpp (2)
FullStateRound(34-68)FullStateRound(34-34)
src/crypto/x11/ssse3/echo.cpp (2)
src/crypto/x11/util/util.hpp (4)
Xor(34-34)Xor(34-34)Xor(67-67)Xor(67-67)src/crypto/x11/echo.cpp (2)
MixColumn(56-77)MixColumn(56-56)
src/crypto/x11/shavite.cpp (2)
src/crypto/x11/aes.cpp (2)
RoundKeyless(23-27)RoundKeyless(23-24)src/crypto/x11/sph_types.h (1)
sph_enc32le(605-633)
src/crypto/x11/util/consts_aes.hpp (1)
src/crypto/x11/util/util.hpp (6)
pack_le(24-30)pack_le(24-24)pack_le(36-39)pack_le(36-36)pack_le(72-75)pack_le(72-72)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (2)
- GitHub Check: linux64_tsan-test / Test source
- GitHub Check: linux64_multiprocess-build / Build source
🔇 Additional comments (44)
src/crypto/x11/sph_types.h (1)
485-486: Attribute placement is valid — UBSan CI uses Clangci/test/00_setup_env_native_ubsan.sh sets BITCOIN_CONFIG with --with-sanitizers=undefined and CC=clang-18 CXX=clang++-18 (line 14), so UBSan jobs in CI use Clang; no GCC-based UBSan job was found in the inspected CI scripts.
src/bench/pow_hash.cpp (1)
31-31: Good addition for benchmark stability.The consistent addition of
minEpochIterations(20)across all benchmark functions is a solid improvement for measurement stability. This ensures each benchmark runs for at least 20 epochs before collecting results, which helps reduce variance from startup effects and provides more reliable measurements.Also applies to: 41-41, 53-53, 65-65, 77-77, 89-89, 101-101, 113-113, 125-125, 137-137, 149-149, 161-161
src/test/util/setup_common.cpp (1)
16-16: LGTM! Consistent initialization of Sapphire dispatch.The addition of
SapphireAutoDetect()beforeSHA256AutoDetect()properly initializes the X11 crypto dispatch mechanism for tests. The ordering is appropriate as it mirrors the production initialization sequence.Also applies to: 182-182
src/init/common.cpp (1)
12-12: LGTM! Proper initialization order for crypto dispatchers.The addition of
SapphireAutoDetect()at the beginning ofSetGlobals()ensures X11 crypto optimizations are detected and initialized early in the startup sequence, before SHA256 detection. This is the correct initialization order.Also applies to: 30-30
src/bench/bench_bitcoin.cpp (1)
9-9: LGTM! Consistent initialization across all entry points.The addition of
SapphireAutoDetect()beforeSHA256AutoDetect()ensures benchmarks use the same optimized crypto implementations as production code. This is essential for accurate performance measurements.Also applies to: 65-65
src/crypto/x11/sph_shavite.h (2)
1-117: Extern "C" linkage removed — verify ABI compatibilityThe header src/crypto/x11/sph_shavite.h no longer wraps declarations with extern "C"; this introduces C++ name mangling and can break any C consumers or public-API users.
- Confirm no C consumers: grep -R --include='*.c' -n 'sph_shavite.h' .
- Confirm no references to these symbols anywhere (repo or installed/public headers): rg -n 'sph_shavite512|sph_shavite512_init|sph_shavite512_close|sph_shavite512_addbits_and_close' . || true
- If any C or external/public users exist, re-add the C linkage wrapper:
#ifdef __cplusplus
extern "C" {
#endif
...declarations...
#ifdef __cplusplus
}
#endif
78-78: Type-safety change accepted — callers not verifiedChange from void* to sph_shavite512_context* is correct. The provided rg produced no output; unable to confirm all call sites were updated. Re-run these checks locally and paste results:
rg -n -C3 -uu -P "sph_shavite512_(init|close|addbits_and_close)\s*\(" rg -n -uu "sph_shavite512_" rg -n -uu "sph_shavite512_context"src/crypto/x11/arm_crypto/echo.cpp (1)
1-72: Vendored crypto implementation - no changes allowedPer the coding guidelines, files in
src/crypto/x11/**are vendored dependencies and should not be modified. This restriction applies to all crypto implementations in this directory.src/crypto/x11/aes.cpp (1)
1-30: Vendored crypto implementation - no changes allowedPer the coding guidelines, files in
src/crypto/x11/**are vendored dependencies and should not be modified.src/crypto/x11/aes.h (1)
1-21: Vendored crypto implementation - no changes allowedPer the coding guidelines, files in
src/crypto/x11/**are vendored dependencies and should not be modified.src/crypto/x11/x86_aesni/shavite.cpp (1)
1-186: Vendored crypto implementation - no changes allowedPer the coding guidelines, files in
src/crypto/x11/**are vendored dependencies and should not be modified.src/crypto/x11/x86_aesni/echo.cpp (1)
1-73: Vendored crypto implementation - no changes allowedPer the coding guidelines, files in
src/crypto/x11/**are vendored dependencies and should not be modified.src/crypto/x11/dispatch.cpp (5)
1-2: Copyright year inconsistency flaggedThe copyright year states 2025, but based on the PR summary (created 2025-09-21) and current context (September 2025), this appears to be accurate. However, I want to verify this is intentional.
85-90: Good error handling for sysctlbynameThe implementation correctly checks for sysctlbyname success and verifies the value is non-zero, providing proper feature detection on macOS.
100-180: Well-structured platform detection and dispatch initializationThe auto-detection function properly initializes with soft implementations as defaults, then conditionally upgrades to optimized versions based on runtime feature detection. The platform-specific detection code is properly guarded with appropriate preprocessor directives.
132-133: Good coverage for macOS NEON detection quirksThe code properly handles the macOS sysctl naming inconsistency for NEON/AdvSIMD detection, including the less common "hw.optional.arm.AdvSIMD" key. The reference to the cpu_features issue #390 provides valuable context.
96-99: Incorrect — definitions exist for these externs; do not change to staticDefinitions found: src/crypto/x11/echo.cpp (lines 158–159) define echo_round and echo_shift_mix; src/crypto/x11/shavite.cpp (line 250) defines shavite_c512. Leave the externs as-is.
Likely an incorrect or invalid review comment.
src/crypto/x11/arm_neon/echo.cpp (2)
15-26: Correct GF(2^8) multiplication implementationThe implementation correctly performs multiplication by x (i.e., multiplication by 2) in GF(2^8) with the reduction polynomial x^8 + x^4 + x^3 + x + 1 (represented by 0x1b). The bit manipulation and conditional XOR are properly implemented.
28-51: Well-optimized MixColumn operationThe MixColumn implementation efficiently uses NEON intrinsics and follows the correct AES MixColumns-like transformation pattern. The operations are properly sequenced and the XOR operations are correctly nested.
src/crypto/x11/ssse3/echo.cpp (2)
15-26: Efficient GF(2^8) multiplication using SSEGood use of
_mm_add_epi8for left shift (doubling) instead of a shift instruction, which is efficient for byte-wise operations in SSE.
86-127: Resolved — W is 16‑byte aligned where allocated.The state array is declared with alignas(16) (DECL_STATE_BIG → src/crypto/x11/echo.cpp:162), so uses of _mm_load_si128/_mm_store_si128 are safe.
src/crypto/x11/dispatch.h (1)
14-17: Function pointer typedefs are well-definedThe dispatch function pointer types properly match the expected signatures for the Echo and Shavite operations. Good use of typedefs for clarity.
src/crypto/x11/arm_crypto/shavite.cpp (4)
42-55: Proper endianness handlingThe code correctly handles both little-endian and big-endian systems, with appropriate byte-order conversion when needed.
58-119: Complex key schedule generation with proper state injectionThe key schedule generation correctly implements the Shavite-512 specification with appropriate counter injections at rounds 32, 164, 316, and 440. The logic matches the reference implementation.
149-155: Macro usage for WROT is appropriate hereWhile macros are generally discouraged in modern C++, the WROT macro is well-scoped (defined and undefined within the function) and provides clear intent for the rotation operation.
121-123: Verify alignment of vld1q_u32 operationThe
vld1q_u32operation requires proper alignment. Sincerkis declared with alignas(16), this should be fine, but verify that the indexing maintains alignment.The code looks correct since
rkis 16-byte aligned and we're loading at 4-word (16-byte) boundaries.src/crypto/x11/shavite.cpp (2)
250-251: Nice: clear dispatch hook for software backendBinding shavite_c512 to sapphire::soft_shavite::Compress cleanly separates dispatch from the implementation.
299-339: UBSan alignment: write path uses sph_enc32le; confirm alignment suppression matches SPH_UPTR policyGiven the PR notes about UBSan alignment, this close path writes into a byte buffer via sph_enc32le. That’s fine, but please confirm the build toggles (SPH_UPTR/SPH_UNALIGNED/UBSan suppress) are consistently defined so sanitizer builds won’t flag false positives on platforms that allow unaligned accesses.
src/crypto/x11/util/consts_aes.hpp (1)
48-64: Good: constexpr S-box generation with affine transformComputing the S-box at compile time and validating with a static_assert improves trust and portability.
configure.ac (2)
381-394: SPHLIB_FLAGS introduced; verify it’s actually used for x11/sph sourcesYou set SPHLIB_FLAGS (e.g., -O3) and export it via AC_SUBST, but ensure src/Makefile.am consumes it for the relevant objects; otherwise it’s dead.
Also applies to: 1957-1969
1896-1906: AM_CONDITIONALs exported — ensure Makefile.am uses them to split per-ISA objectsThese conditionals are defined; verify the per-backend objects/libraries are actually wrapped in the corresponding if ENABLE_* blocks so we don’t compile incompatible TU sets by default.
src/crypto/x11/echo.cpp (4)
123-145: Round/key schedule update logic looks correctTwo AES-based rounds per state element with T32 counter increments per block aligns with ECHO’s spec. Good use of soft_aes::Round/Keyless.
147-155: State mixing is clear; alignment handledShiftRows + MixColumn on alignas(16) W is straightforward, and the dispatch indirection via echo_round/echo_shift_mix is clean.
Also applies to: 161-178
164-173: Sanity: FINAL_BIG uses sph_dec64le_aligned on sc->buf — confirm SPH_ macros guard unaligned reads*This mirrors SPH’s original pattern. Given the PR’s UBSan notes, please confirm sph_dec64le_aligned resolves to a safe path on platforms without guaranteed alignment.
Also applies to: 180-189
328-352: API refactor to typed context is consistent with headerThe updated definitions match sph_echo.h’s typed signatures and avoid void*.
src/crypto/x11/sph_echo.h (1)
79-100: ABI change: C linkage removed and signatures tightened — confirm external C consumerssph_echo512 is defined in src/crypto/x11/echo.cpp and used only from C++ in this repo (src/hash_x11.h, src/bench/pow_hash.cpp); no .c callers found. If this header is part of your public C ABI, restore extern "C" wrappers or document the ABI break in the release notes.
src/crypto/x11/util/util.hpp (3)
24-30: LGTM! Portable byte packing utility is well-implemented.The constexpr
pack_lefunction correctly implements little-endian packing with appropriate type casts to avoid implicit conversion warnings.
50-62: Efficient ARM AES round implementation.The implementation correctly emulates x86 AESENC using ARM intrinsics as described in the referenced blog post. The no-key variant (
aes_round_nk) appropriately optimizes by skipping the XOR operation.
72-75: Ordering verified — no change required.
_mm_set_epi32(w3, w2, w1, w0) produces 32-bit lanes 0..3 = w0..w3; ARM's uint32x4_t{w0,w1,w2,w3} and the corresponding unpack (extract/lane 0..3) match, so behaviour is consistent across implementations.src/Makefile.am (5)
747-749: Correct usage of SPHLIB_FLAGS for small footprint compilation.The SPH crypto library is correctly configured with SPHLIB_FLAGS and small footprint options for CubeHash and JH, which is appropriate for resource-constrained environments.
786-794: Well-structured platform-specific crypto backends.The new crypto backend libraries are properly configured with:
- Correct static linking flags
- Appropriate platform-specific compiler flags (ARM_AES, ARM_NEON, SSSE3, X86_AESNI)
- Proper preprocessor definitions for feature detection
The separation of concerns with distinct libraries per instruction set is clean and maintainable.
Also applies to: 797-804, 807-814, 817-824
1090-1094: DISABLE_OPTIMIZED_SHA256 correctly applied to libdashconsensus.The consensus library appropriately disables optimizations to ensure deterministic behavior across platforms, which is critical for consensus code.
754-782: Resolve: X11 crypto sources converted to .cpp — no leftover .c references found.Search shows aes.cpp, echo.cpp, shavite.cpp under src/crypto/x11 (and arch-specific subdirs); the only regex hits were echo.capnp files (unrelated).
78-105: Verify conditional compilation hierarchy for crypto extensions.configure.ac declares independent AM_CONDITIONALs for ENABLE_X86_AESNI and ENABLE_X86_SHANI (and matching src/crypto/x11 modules), but src/Makefile.am nests AESNI/SHANI inside the ENABLE_SSE41 block — this mismatch can prevent AESNI/SHANI from being built when configure enables them but SSE4.1 is disabled. Align intent: either lift ENABLE_X86_AESNI/ENABLE_X86_SHANI to top-level conditionals in Makefile.am or make configure.ac only enable them when SSE4.1 is present.
Location: src/Makefile.am (lines 78–105); configure.ac AM_CONDITIONALs.
| alignas(16) uint8x16_t rk_words[448/4]; | ||
| alignas(16) uint32_t rk[448]; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Large stack allocations may cause issues
The function allocates ~7KB on the stack (448 * 16 bytes for rk_words + 448 * 4 bytes for rk). This could cause stack overflow on embedded systems or threads with limited stack space.
Consider using heap allocation or reducing the stack footprint:
- alignas(16) uint8x16_t rk_words[448/4];
- alignas(16) uint32_t rk[448];
+ // Consider using std::vector or std::unique_ptr for large allocations
+ alignas(16) std::array<uint8x16_t, 448/4> rk_words;
+ alignas(16) std::array<uint32_t, 448> rk;Committable suggestion skipped: line range outside the PR's diff.
🤖 Prompt for AI Agents
In src/crypto/x11/arm_crypto/shavite.cpp around lines 39-40, the two large stack
arrays (alignas(16) uint8x16_t rk_words[448/4] and alignas(16) uint32_t rk[448])
allocate roughly 7KB on the stack which can overflow constrained stacks; change
them to heap or static storage and preserve alignment: allocate aligned memory
on the heap (e.g., posix_memalign/aligned_alloc or aligned new /
std::aligned_alloc wrapped in RAII/unique_ptr) or make them function-static if
thread-safety/recency allows, and ensure you free or use RAII to avoid leaks;
update usages to reference the allocated buffers and handle allocation failure
paths.
| #include <crypto/x11/aes.h> | ||
| #include <crypto/x11/dispatch.h> | ||
|
|
||
| #include "sph_echo.h" | ||
| #include <attributes.h> | ||
|
|
||
| #ifdef __cplusplus | ||
| extern "C"{ | ||
| #endif | ||
| #include <cstddef> | ||
| #include <cstring> | ||
|
|
||
| #if SPH_SMALL_FOOTPRINT && !defined SPH_SMALL_FOOTPRINT_ECHO | ||
| #define SPH_SMALL_FOOTPRINT_ECHO 1 | ||
| #endif | ||
| #include "sph_echo.h" |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Policy concern: modifying vendored x11 code
Per coding guidelines, src/crypto/x11 is treated as vendored. This file significantly refactors that code. If this is an intentional fork, please document the divergence and ensure future upstream sync isn’t expected, or move wrappers/dispatch into a non-vendored path while keeping upstream files pristine.
Also applies to: 53-57
🤖 Prompt for AI Agents
In src/crypto/x11/echo.cpp around lines 33-41 (and similarly for changes at
lines 53-57), you refactored code inside the vendored src/crypto/x11 tree;
either treat this as an intentional fork and document the divergence or move
your changes out of the vendored path into a non-vendored wrapper/dispatch
layer. Fix by restoring the original upstream echo.cpp in src/crypto/x11, create
a new file under a non-vendored directory (e.g., src/crypto/x11_wrappers/ or
src/crypto/dispatch/) that implements your refactor/dispatch, update build files
to prefer the wrapper, and add a short note in the repo docs (README or
CONTRIBUTORS) describing the fork/divergence and maintenance expectations if you
intend to keep the vendored file modified.
| #if defined(__clang__) | ||
| #define SPH_NO_SANITIZE(...) __attribute__((no_sanitize(__VA_ARGS__))) | ||
| #else | ||
| #define SPH_NO_SANITIZE(...) | ||
| #endif // __clang__ | ||
|
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Do not patch vendored x11 headers; handle UBSan at build level instead.
This lives under src/crypto/x11 (vendored). Per guidelines, we should not modify vendored code. Please revert these additions and suppress alignment UBSan outside this header.
Safer alternatives:
- Preferred: compile only the SPH/X11 TUs with -fno-sanitize=alignment when building with UBSan.
- Or: in sanitized builds, force -DSPH_UNALIGNED=0 for those TUs to take the bytewise paths (slower but sanitizer-clean).
If this change is nevertheless kept (not recommended here), at minimum: guard with #if __has_attribute(no_sanitize) and #ifndef SPH_NO_SANITIZE to avoid overriding an externally provided definition and to cover older Clang variants.
🤖 Prompt for AI Agents
In src/crypto/x11/sph_types.h around lines 330 to 335, revert the added
SPH_NO_SANITIZE macro (do not modify vendored x11 headers) and instead handle
UBSan at build time: either compile the SPH/X11 translation units with
-fno-sanitize=alignment when UBSan is enabled, or for sanitized builds add a
TU-level define -DSPH_UNALIGNED=0 to force bytewise (unaligned-safe) code paths;
if you must keep a macro in-source (not recommended), ensure you wrap it with
checks for __has_attribute(no_sanitize) and guard with #ifndef SPH_NO_SANITIZE
so you don’t override external definitions.
| crypto_libbitcoin_crypto_x86_aesni_la_CXXFLAGS = $(AM_CXXFLAGS) $(PIE_FLAGS) -static | ||
| crypto_libbitcoin_crypto_x86_aesni_la_CPPFLAGS = $(AM_CPPFLAGS) | ||
| crypto_libbitcoin_crypto_x86_aesni_la_CXXFLAGS += $(X86_AESNI_CXXFLAGS) | ||
| crypto_libbitcoin_crypto_x86_aesni_la_CPPFLAGS += -DENABLE_SSE41 -DENABLE_X86_AESNI |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
🛠️ Refactor suggestion
🧩 Analysis chain
Compound preprocessor definition needs attention.
The X86 AES-NI module defines both ENABLE_SSE41 and ENABLE_X86_AESNI, indicating a dependency on SSE4.1 for the AES-NI implementation. This is architecturally correct as AES-NI requires SSE4.1, but ensure this dependency is documented.
🏁 Script executed:
#!/bin/bash
# Verify SSE4.1 is properly checked before enabling X86 AES-NI in configure
echo "Checking configure.ac for SSE4.1/AES-NI dependency..."
rg -A10 -B10 "X86_AESNI" configure.ac | grep -E "(SSE41|SSE4\.1)"Length of output: 227
🏁 Script executed:
#!/bin/bash
set -euo pipefail
echo "Search repo for SSE4/AES-NI mentions in docs (README, INSTALL, docs/, doc/, CHANGELOG, NEWS):"
rg -n -S -i 'sse4|sse4\.1|sse41|aes-?ni|aesni' README* INSTALL* docs/** doc/** CHANGELOG* NEWS* || true
echo
echo "Search whole repo (excluding common noise dirs) for SSE4/AES-NI:"
rg -n -S -i 'sse4|sse4\.1|sse41|aes-?ni|aesni' --hidden --glob '!**/node_modules/**' --glob '!**/dist/**' --glob '!**/build/**' || true
echo
echo "Show configure.ac context for SSE41/X86_AESNI:"
rg -n -C3 -S -i 'x86_aesni|aesni|aes-?ni|sse4|sse4\.1|sse41' configure.ac || true
echo
echo "Show src/Makefile.am lines 800-840:"
sed -n '800,840p' src/Makefile.am || trueLength of output: 7068
Document SSE4.1 dependency for AES-NI support
Configure.ac correctly checks for SSE4.1 intrinsics before enabling AES-NI (AX_CHECK_COMPILE_FLAG for -msse4.1 then AES-NI check with -msse4.1 -maes), but I found no mention of this requirement in INSTALL or README. Add a note that building with AES-NI support requires SSE4.1 (compiler and hardware support).
🤖 Prompt for AI Agents
In src/Makefile.am around line 821, the build enables AES-NI but Configure.ac
only enables that when SSE4.1 intrinsics are available; update project
documentation to state that building with AES-NI support requires SSE4.1 (both
compiler support and CPU hardware), and show the relevant compiler flag and
configure behavior. Edit INSTALL and README to add a short note that AES-NI
support requires SSE4.1 (e.g., compilers must accept -msse4.1 and CPUs must
expose SSE4.1), mention that the autoconf checks will pass -msse4.1 during
detection and that builds will fall back to non-AES-NI if SSE4.1 is absent, and
include guidance on forcing or disabling AES-NI via configure flags or
environment variables.
Also lets us get rid of an extra unnecessary pack/unpack
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Actionable comments posted: 1
🧹 Nitpick comments (4)
src/crypto/x11/x86_aesni/echo.cpp (2)
15-30: Consider loop unrolling for better performance.The StateRound helper function is well-implemented. However, since you're already unrolling the loop in FullStateRound (lines 36-67), you might consider using compiler pragmas or attributes to further optimize the AES operations. Benchmark the current implementation against a version with vectorization hints to confirm whether additional optimization is beneficial.
34-68: Consider reducing code duplication with a macro or template.The unrolled loop pattern (pack_le → StateRound) is repeated 16 times. While manual unrolling can improve performance, it makes the code harder to maintain.
Consider using a macro or constexpr loop to reduce duplication while maintaining performance:
void FullStateRound(uint64_t W[16][2], uint32_t& k0, uint32_t& k1, uint32_t& k2, uint32_t& k3) { - __m128i key = util::pack_le(k0, k1, k2, k3); - StateRound(W, 0, key, k0, k1, k2, k3); - key = util::pack_le(k0, k1, k2, k3); - StateRound(W, 1, key, k0, k1, k2, k3); - // ... repeated 14 more times + #define PROCESS_ROUND(idx) \ + do { \ + __m128i key = util::pack_le(k0, k1, k2, k3); \ + StateRound(W, idx, key, k0, k1, k2, k3); \ + } while(0) + + PROCESS_ROUND(0); + PROCESS_ROUND(1); + PROCESS_ROUND(2); + // ... etc + #undef PROCESS_ROUND }src/crypto/x11/dispatch.cpp (1)
148-159: Verify FreeBSD elf_aux_info error handling pattern.The FreeBSD code correctly checks the return value of elf_aux_info. However, the pattern differs slightly between arm and aarch64 - the arm version combines the check in one line while aarch64 splits it. Both are correct but consider using consistent style.
Consider using the same pattern for both architectures for consistency:
#if defined(__arm__) - have_arm_aes = ((::elf_aux_info(AT_HWCAP2, &hwcap2, sizeof(hwcap2)) == 0) && ((hwcap2 & HWCAP2_AES) != 0)); - have_arm_neon = ((::elf_aux_info(AT_HWCAP, &hwcap, sizeof(hwcap)) == 0) && ((hwcap & HWCAP_NEON) != 0)); + if (::elf_aux_info(AT_HWCAP2, &hwcap2, sizeof(hwcap2)) == 0) { + have_arm_aes = ((hwcap2 & HWCAP2_AES) != 0); + } + if (::elf_aux_info(AT_HWCAP, &hwcap, sizeof(hwcap)) == 0) { + have_arm_neon = ((hwcap & HWCAP_NEON) != 0); + } #endif // __arm__src/crypto/x11/arm_crypto/shavite.cpp (1)
110-119: Potential off-by-one in loop termination check.The inner loop has a redundant break condition at line 116-117 that will never be reached since the outer loop already breaks when u == 448 at line 109.
The inner break is unreachable. Consider removing it:
for (int s = 0; s < 8; s++) { rk[u + 0] = rk[u - 32] ^ rk[u - 7]; rk[u + 1] = rk[u - 31] ^ rk[u - 6]; rk[u + 2] = rk[u - 30] ^ rk[u - 5]; rk[u + 3] = rk[u - 29] ^ rk[u - 4]; u += 4; - if (u == 448) - break; }
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (24)
configure.ac(7 hunks)src/Makefile.am(4 hunks)src/bench/bench_bitcoin.cpp(2 hunks)src/crypto/x11/aes.cpp(1 hunks)src/crypto/x11/aes.h(1 hunks)src/crypto/x11/aes_helper.c(0 hunks)src/crypto/x11/arm_crypto/echo.cpp(1 hunks)src/crypto/x11/arm_crypto/shavite.cpp(1 hunks)src/crypto/x11/arm_neon/echo.cpp(1 hunks)src/crypto/x11/dispatch.cpp(1 hunks)src/crypto/x11/dispatch.h(1 hunks)src/crypto/x11/echo.cpp(4 hunks)src/crypto/x11/shavite.c(0 hunks)src/crypto/x11/shavite.cpp(1 hunks)src/crypto/x11/sph_echo.h(4 hunks)src/crypto/x11/sph_shavite.h(4 hunks)src/crypto/x11/ssse3/echo.cpp(1 hunks)src/crypto/x11/util/consts_aes.hpp(1 hunks)src/crypto/x11/util/util.hpp(1 hunks)src/crypto/x11/x86_aesni/echo.cpp(1 hunks)src/crypto/x11/x86_aesni/shavite.cpp(1 hunks)src/init/common.cpp(2 hunks)src/test/util/setup_common.cpp(2 hunks)test/lint/lint-whitespace.py(1 hunks)
💤 Files with no reviewable changes (2)
- src/crypto/x11/shavite.c
- src/crypto/x11/aes_helper.c
🚧 Files skipped from review as they are similar to previous changes (12)
- src/test/util/setup_common.cpp
- src/bench/bench_bitcoin.cpp
- src/init/common.cpp
- src/crypto/x11/aes.h
- src/crypto/x11/x86_aesni/shavite.cpp
- src/crypto/x11/util/consts_aes.hpp
- src/crypto/x11/aes.cpp
- src/crypto/x11/arm_neon/echo.cpp
- src/crypto/x11/arm_crypto/echo.cpp
- src/crypto/x11/dispatch.h
- src/crypto/x11/sph_shavite.h
- test/lint/lint-whitespace.py
🧰 Additional context used
📓 Path-based instructions (2)
src/crypto/{ctaes,x11}/**
📄 CodeRabbit inference engine (CLAUDE.md)
Do not make changes under any circumstances to vendored dependencies in src/crypto/ctaes and src/crypto/x11
Files:
src/crypto/x11/arm_crypto/shavite.cppsrc/crypto/x11/x86_aesni/echo.cppsrc/crypto/x11/echo.cppsrc/crypto/x11/util/util.hppsrc/crypto/x11/dispatch.cppsrc/crypto/x11/shavite.cppsrc/crypto/x11/sph_echo.hsrc/crypto/x11/ssse3/echo.cpp
src/**/*.{cpp,h,cc,cxx,hpp}
📄 CodeRabbit inference engine (CLAUDE.md)
src/**/*.{cpp,h,cc,cxx,hpp}: Dash Core C++ codebase must be written in C++20 and require at least Clang 16 or GCC 11.1
Dash uses unordered_lru_cache for efficient caching with LRU eviction
Files:
src/crypto/x11/arm_crypto/shavite.cppsrc/crypto/x11/x86_aesni/echo.cppsrc/crypto/x11/echo.cppsrc/crypto/x11/util/util.hppsrc/crypto/x11/dispatch.cppsrc/crypto/x11/shavite.cppsrc/crypto/x11/sph_echo.hsrc/crypto/x11/ssse3/echo.cpp
🧠 Learnings (2)
📚 Learning: 2025-07-20T18:42:49.794Z
Learnt from: CR
PR: dashpay/dash#0
File: CLAUDE.md:0-0
Timestamp: 2025-07-20T18:42:49.794Z
Learning: Applies to src/crypto/{ctaes,x11}/** : Do not make changes under any circumstances to vendored dependencies in src/crypto/ctaes and src/crypto/x11
Applied to files:
src/crypto/x11/echo.cpp
📚 Learning: 2025-01-06T09:51:03.167Z
Learnt from: kwvg
PR: dashpay/dash#6516
File: depends/patches/gmp/include_ldflags_in_configure.patch:557-621
Timestamp: 2025-01-06T09:51:03.167Z
Learning: The `GMP_GCC_ARM_UMODSI` macro checks only the compiler version, and `GMP_GCC_MIPS_O32` relies on the `-mabi=32` flag. Therefore, `$LDFLAGS` is irrelevant to these tests.
Applied to files:
configure.ac
🧬 Code graph analysis (7)
src/crypto/x11/arm_crypto/shavite.cpp (2)
src/crypto/x11/shavite.cpp (2)
CompressElement(72-99)CompressElement(72-73)src/crypto/x11/x86_aesni/shavite.cpp (2)
CompressElement(17-30)CompressElement(17-18)
src/crypto/x11/x86_aesni/echo.cpp (1)
src/crypto/x11/arm_crypto/echo.cpp (2)
StateRound(14-29)StateRound(14-14)
src/crypto/x11/echo.cpp (4)
src/crypto/x11/ssse3/echo.cpp (4)
MixColumn(28-51)MixColumn(28-28)ShiftAndMix(83-128)ShiftAndMix(83-83)src/crypto/x11/aes.cpp (4)
Round(12-21)Round(12-14)RoundKeyless(23-27)RoundKeyless(23-24)src/crypto/x11/dispatch.cpp (6)
FullStateRound(43-43)FullStateRound(64-64)FullStateRound(73-73)ShiftAndMix(52-52)ShiftAndMix(58-58)ShiftAndMix(74-74)src/crypto/x11/x86_aesni/echo.cpp (2)
FullStateRound(34-68)FullStateRound(34-34)
src/crypto/x11/dispatch.cpp (3)
src/crypto/x11/echo.cpp (4)
FullStateRound(123-145)FullStateRound(123-123)ShiftAndMix(147-154)ShiftAndMix(147-147)src/crypto/x11/shavite.cpp (2)
Compress(106-246)Compress(106-106)src/compat/cpuid.h (1)
GetCPUID(16-23)
src/crypto/x11/shavite.cpp (3)
src/crypto/x11/aes.cpp (2)
RoundKeyless(23-27)RoundKeyless(23-24)src/crypto/x11/arm_crypto/shavite.cpp (2)
Compress(34-181)Compress(34-34)src/crypto/x11/sph_types.h (1)
sph_enc32le(605-633)
src/crypto/x11/sph_echo.h (1)
src/crypto/x11/echo.cpp (6)
sph_echo512_init(327-331)sph_echo512_init(328-328)sph_echo512(334-338)sph_echo512(335-335)sph_echo512_close(341-345)sph_echo512_close(342-342)
src/crypto/x11/ssse3/echo.cpp (2)
src/crypto/x11/util/util.hpp (4)
Xor(34-34)Xor(34-34)Xor(67-67)Xor(67-67)src/crypto/x11/echo.cpp (2)
MixColumn(56-77)MixColumn(56-56)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (7)
- GitHub Check: x86_64-pc-linux-gnu_multiprocess / Build depends
- GitHub Check: x86_64-apple-darwin / Build depends
- GitHub Check: x86_64-w64-mingw32 / Build depends
- GitHub Check: Lint / Run linters
- GitHub Check: x86_64-pc-linux-gnu_nowallet / Build depends
- GitHub Check: x86_64-pc-linux-gnu / Build depends
- GitHub Check: arm-linux-gnueabihf / Build depends
🔇 Additional comments (26)
src/crypto/x11/x86_aesni/echo.cpp (1)
1-12: LGTM! Well-structured X86 AES-NI implementation.The file header, includes, and guard conditions are properly set up for the X86 AES-NI backend.
src/crypto/x11/util/util.hpp (4)
66-66: [Past review issue still present] Fix missing parentheses in preprocessor condition.The condition is missing parentheses around the && expression, which could lead to unexpected evaluation.
-#if defined(ENABLE_SSSE3) || defined(ENABLE_SSE41) && defined(ENABLE_X86_AESNI) +#if defined(ENABLE_SSSE3) || (defined(ENABLE_SSE41) && defined(ENABLE_X86_AESNI))
13-20: LGTM! Proper platform-specific header inclusion.The conditional inclusion of ARM and x86 headers is well-structured with appropriate feature guards.
24-30: Well-implemented generic byte packing utility.The constexpr pack_le function provides a clean, portable implementation for little-endian byte packing that can be evaluated at compile time when possible.
50-63: Excellent ARM AES intrinsics implementation.The ARM-specific AES round functions correctly follow the emulation pattern for x86 AES intrinsics on ARMv8-A as documented in the referenced blog post. The separate
aes_round_nkoptimization for keyless rounds is a nice touch.configure.ac (4)
538-547: LGTM! Feature flags properly initialized.The new per-feature enable flags are correctly initialized to "no" as a safe default before probing.
718-737: Well-structured ARM NEON detection loop.The loop correctly tries multiple compiler flag combinations to detect NEON support on both ARMv8 and ARMv7 platforms, which improves compatibility across different ARM architectures.
1896-1905: Proper exposure of feature flags via AM_CONDITIONAL.All the new architecture-specific feature flags are correctly exposed for use in Makefile.am, maintaining consistency with the existing pattern.
558-560: [Past review issue still present] Syntax bug: missing comma in AX_CHECK_COMPILE_FLAG for AES-NIThere's a missing comma before the third argument which will break autoconf parsing.
Apply this diff:
-AX_CHECK_COMPILE_FLAG([-msse4.1 -maes], [X86_AESNI_CXXFLAGS="-msse4.1 -maes"], [], [$CXXFLAG_WERROR]) +AX_CHECK_COMPILE_FLAG([-msse4.1 -maes], [X86_AESNI_CXXFLAGS="-msse4.1 -maes"], [], [$CXXFLAG_WERROR])src/crypto/x11/dispatch.cpp (3)
85-91: LGTM! Clean platform-specific helper for macOS.The IsSysCtlNonZero helper provides a clean abstraction for checking macOS system control values.
100-124: LGTM! Well-structured runtime dispatch with proper fallback chain.The SapphireAutoDetect function correctly:
- Initializes with safe software fallbacks
- Performs runtime CPU detection
- Progressively upgrades to optimized implementations when available
- Uses proper feature detection through CPUID
The dispatch order is correct: more specific implementations (AES-NI) override before less specific ones (SSSE3).
131-133: Good coverage of macOS NEON detection keys.Including multiple sysctl keys (hw.optional.AdvSIMD variations) ensures compatibility across different macOS versions and architectures as noted in the cpu_features issue reference.
src/crypto/x11/arm_crypto/shavite.cpp (4)
16-30: LGTM! Clean ARM NEON implementation of CompressElement.The function correctly implements the compression element using ARM NEON intrinsics, properly handling the AES rounds and result extraction.
65-69: Good use of SIMD for AES round computation.The code efficiently uses ARM NEON intrinsics to perform AES rounds and extract results. The pattern is consistent and well-structured.
Also applies to: 89-93
149-162: LGTM! Clean macro usage for state rotation.The WROT macro is properly scoped and undefined after use, following best practices for macro hygiene.
39-40: [Past review issue still present] Large stack allocations may cause issues.The function allocates ~7KB on the stack which could cause stack overflow on embedded systems or threads with limited stack space.
Consider using heap allocation or reducing the stack footprint. Here's an approach using std::unique_ptr with aligned allocation:
- alignas(16) uint8x16_t rk_words[448/4]; - alignas(16) uint32_t rk[448]; + // Use aligned heap allocation to avoid stack overflow + struct AlignedArrays { + alignas(16) uint8x16_t rk_words[448/4]; + alignas(16) uint32_t rk[448]; + }; + auto arrays = std::make_unique<AlignedArrays>(); + auto& rk_words = arrays->rk_words; + auto& rk = arrays->rk;src/crypto/x11/ssse3/echo.cpp (1)
1-133: Implementation looks correct and follows best practices.This SSSE3-optimized implementation of the Echo algorithm correctly:
- Guards compilation with
ENABLE_SSSE3check- Uses proper alignment directives for SIMD operations
- Implements the same algorithmic structure as the reference implementation
- Uses appropriate SIMD intrinsics for GF(8) multiplication and matrix operations
The GF(8) multiplication implementation matches the reference code's logic:
((x & 0x80) >> 7) * 27 ^ ((x & 0x7F) << 1).src/crypto/x11/sph_echo.h (1)
79-116: API improvements enhance type safety.The changes from
void*to strongly-typedsph_echo512_context*parameters are excellent improvements that:
- Provide compile-time type checking
- Improve code clarity and maintainability
- Align with modern C++ best practices
The removal of
extern "C"wrappers is appropriate for internal C++ code.src/crypto/x11/shavite.cpp (2)
250-250: Global dispatch pointer initialization looks correct.The initialization of
shavite_c512with the soft implementation serves as a default that can be overridden by the runtime dispatcher based on CPU capabilities.
66-68: Incorrect — RoundKeyless signature matches the usageRoundKeyless is declared as void RoundKeyless(uint32_t x0, uint32_t x1, uint32_t x2, uint32_t x3, uint32_t& y0, uint32_t& y1, uint32_t& y2, uint32_t& y3) (src/crypto/x11/aes.h) and defined in src/crypto/x11/aes.cpp; calling RoundKeyless(x0, x1, x2, x3, x0, x1, x2, x3) is valid (inputs by value, outputs by reference).
Likely an incorrect or invalid review comment.
src/crypto/x11/echo.cpp (2)
33-41: Vendored dependency modification needs documentation.Previous review identified this concern: modifying files within
src/crypto/x11/violates the project's guideline to not change vendored dependencies. The extensive refactoring from C to C++ with namespaces represents a significant divergence from upstream.If this is an intentional fork, please document:
- The rationale for forking vs. wrapping
- Maintenance strategy going forward
- Any upstream compatibility considerations
158-159: Dispatch function pointers correctly initialized.The global function pointers
echo_roundandecho_shift_mixare properly initialized with the software implementations, allowing runtime override by the dispatcher based on CPU features.src/Makefile.am (4)
78-105: Hardware acceleration backends properly structured.The conditional compilation of architecture-specific crypto libraries is well-organized:
- Proper dependency chains (e.g., X86_AESNI requires SSE41)
- Clear separation of concerns between different SIMD instruction sets
- Consistent naming convention for library targets
821-821: SSE4.1 dependency for AES-NI requires documentation.The compound preprocessor definition correctly enforces that AES-NI requires SSE4.1 support, but this dependency should be documented in user-facing documentation.
747-749: Build flags properly applied for SPH library.The addition of
SPHLIB_FLAGSto both CXXFLAGS and CFLAGS ensures that the appropriate compiler optimizations are applied to the SPH crypto library components.
1094-1094: Document DISABLE_OPTIMIZED_SHA256 and its performance impactlibdashconsensus is built with -DDISABLE_OPTIMIZED_SHA256 (src/Makefile.am:1094). That macro gates optimized SHA256 code in src/crypto/sha256.cpp and src/crypto/x11/{util/util.hpp,dispatch.cpp} and therefore can disable hardware-accelerated/optimized SHA256, affecting consensus verification throughput. Add brief documentation describing the rationale (upstream compatibility), expected performance impact, and how downstreams can enable optimized builds — e.g., README.md or docs/build.md and an inline comment beside the CPPFLAGS.
| extern sapphire::dispatch::EchoShiftMix echo_shift_mix; | ||
| extern sapphire::dispatch::EchoRoundFn echo_round; | ||
| extern sapphire::dispatch::ShaviteCompressFn shavite_c512; | ||
|
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
🛠️ Refactor suggestion
🧩 Analysis chain
Verify extern declaration placement and linkage.
The extern declarations for the dispatch function pointers should ideally be in a header file (dispatch.h) to ensure consistent signatures across translation units.
🏁 Script executed:
#!/bin/bash
# Check if these dispatch symbols are properly declared in the header
echo "Checking for dispatch declarations in header..."
rg -n "extern.*echo_shift_mix|extern.*echo_round|extern.*shavite_c512" src/crypto/x11/dispatch.h
# Verify the symbols are defined somewhere
echo -e "\nChecking for dispatch definitions..."
rg -n "dispatch::EchoShiftMix echo_shift_mix|dispatch::EchoRoundFn echo_round|dispatch::ShaviteCompressFn shavite_c512" --type cppLength of output: 684
Move extern declarations into src/crypto/x11/dispatch.h and include it where needed.
Currently the externs are declared only in src/crypto/x11/dispatch.cpp (lines 96–98) while the definitions live in src/crypto/x11/echo.cpp (158–159) and src/crypto/x11/shavite.cpp (250); place the extern declarations in dispatch.h to ensure consistent signatures and linkage across translation units.
🤖 Prompt for AI Agents
In src/crypto/x11/dispatch.cpp around lines 96–99 the extern declarations for
echo_shift_mix, echo_round and shavite_c512 are placed in the .cpp; move these
extern declarations into the header src/crypto/x11/dispatch.h (add them if
missing) so all translation units share the same declarations, then include
"dispatch.h" where those externs are needed (e.g., dispatch.cpp, echo.cpp,
shavite.cpp). Ensure the header uses proper include guards or #pragma once and
that the extern signatures exactly match the definitions in echo.cpp and
shavite.cpp so linkage is consistent.
|
my m1 pro develop: This PR |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Seems to be working
utACK/light-ACK 005967b
|
What is |
|
@knst, It's |
Please DO NOT recommend this option |
|
Updated instruction in PR how to get build after this PR. I need it pretty often when I am doing bisects
|
Additional Information
Important
Due to the build system changes done in this PR, switching between commits in this branch or between this branch and any other branch (like
develop) requires removing temporary files for x11rm -r src/crypto/x11/.deps/.As the dispatcher doesn't kick in until after genesis block validation, the correctness of the implementation can be validated by applying a commit that would retrigger the validation in the benchmark (commit). This is not included in this PR as it negatively impacts benchmark results in some builds (comment).
While working on the ARMv7/ARMv8 backends, it was noticed that the
pow_hashbenchmark's error rate on Apple Silicon can go as high as ~15%, even on repeated runs, which reduces the reliability of benchmark results and hinders decision-making. To mitigate this,minEpochIterationshas been reintroduced and set to 20 (up from pre-dash#6752's 10).Baseline measurements were taken against this change before large footprint removal.
Misc.
libdashconsensusis built with hardware optimizations disabled, which is currently set withDISABLE_OPTIMIZED_SHA256(introduced in bitcoin#29180). To align with upstream behavior, our platform-specific code is disabled in the library as well using the macro to mean "disable optimizations".Despite not being specified in Apple's documentation (source), some versions of macOS report NEON (a.k.a. advanced SIMD) using
hw.optional.arm.AdvSIMDinstead ofhw.optional.AdvSIMD, so we check for thatsysctlas well (see google/cpufeatures#390)Unaligned memory accesses are hardware-supported by platforms like x86_64 and sphlib utilizes them to improve performance (source). When switching to the small footprint implementation of Shavite512, this triggers a UBSan error (build).
As this is intentional behavior, we have opted to suppress the error by conditionally suppressing alignment sanitization if the target platform supports it and it has been enabled (i.e.
SPH_UPTRis set, which it is by default).Benchmarks
Apple M1, macOS Sequoia 15.7
MULTIPROCESS=1 CC=clang CXX=clang++ HOST=aarch64-apple-darwinCC=clang CXX=clang++ CFLAGS="-O2 -g" LDFLAGS="-Wl,-O2" ./configure --prefix=$(pwd)/depends/aarch64-apple-darwin --enable-reduce-exports --without-gui --disable-fuzz-binary --disable-maintainer-mode --disable-dependency-tracking --disable-ccacheBaseline
Echo512:
Shavite512:
X11:
Optimized
Echo512:
Shavite512:
X11:
AMD Ryzen 5 5600G, Ubuntu 24.04 (Noble)
MULTIPROCESS=1 CC=clang-18 CXX=clang++-18 HOST=x86_64-pc-linux-gnuCC=clang-18 CXX=clang++-18 CFLAGS="-O2 -g" LDFLAGS="-Wl,--as-needed -Wl,-O2" ./configure --prefix=$(pwd)/depends/x86_64-pc-linux-gnu --enable-reduce-exports --without-gui --disable-fuzz-binary --disable-maintainer-mode --disable-dependency-tracking --disable-ccacheBaseline
Echo512:
Shavite512:
X11:
Optimized
Echo512:
Shavite512:
X11:
Breaking Changes
Platforms that partially support the set of extensions for x86_64 (SSSE3, SSE4.1 and AES-NI) or ARMv7/ARMv8 (NEON and crypto extensions for AES) may experience different performance characteristics.
The following cases may experience performance degradation due to moving to a small-footprint variant and the overhead (and lost optimization opportunity) incurred by runtime dispatching.
libdashconsensus(optimizations are disabled wholesale for the library following upstream behavior)Checklist