Commit c4a9d82

Merge #6852: crypto: add dispatcher and implement hardware acceleration for Echo512 and Shavite512, improve benchmark stability
005967b crypto: drop naive AES backends (Kittywhiskers Van Gogh)
d131fcc crypto: implement Shavite512's full `Compress()` routine (Kittywhiskers Van Gogh)
31f93ad crypto: unroll Echo512's `FullStateRound()` (Kittywhiskers Van Gogh)
fa68c70 crypto: implement ARM NEON backend for Echo512's `ShiftAndMix()` (Kittywhiskers Van Gogh)
963215b crypto: implement ARM AES backend for Shavite512's `CompressElement()` (Kittywhiskers Van Gogh)
959c9ee crypto: implement ARM AES backend for Echo512's `FullStateRound()` (Kittywhiskers Van Gogh)
76bd236 crypto: implement naive ARM AES backend for simple rounds (Kittywhiskers Van Gogh)
f2ececc crypto: implement AES-NI backend for Shavite512's `CompressElement()` (Kittywhiskers Van Gogh)
e1bbec4 crypto: avoid extra load/store in Echo512's `ShiftAndMix()` (Kittywhiskers Van Gogh)
250dcce crypto: combine Echo512's Shift and Mix operations (Kittywhiskers Van Gogh)
71d6ef9 crypto: implement SSSE3 backend for Echo512's `ShiftRows()` (Kittywhiskers Van Gogh)
da38871 crypto: implement SSSE3 backend for Echo512's `MixColumns()` (Kittywhiskers Van Gogh)
31a6732 crypto: implement AES-NI backend for Echo512's `FullStateRound()` (Kittywhiskers Van Gogh)
1f2eb21 crypto: implement naive AES-NI backend for simple rounds (Kittywhiskers Van Gogh)
cba39c5 const: use function pointer to allow for switching implementation (Kittywhiskers Van Gogh)
d6d3518 crypto: replace hardcoded AES transform tables with constexpr tables (Kittywhiskers Van Gogh)
20fc998 refactor: move software AES round to header (Kittywhiskers Van Gogh)
386742b refactor: remove large footprint Shavite512 impl, switch to C++ (Kittywhiskers Van Gogh)
7e8607e refactor: remove large footprint Echo512 impl, switch to C++ (Kittywhiskers Van Gogh)
c4e7a40 fix: suppress unaligned memory UBSan warnings if supported by arch (Kittywhiskers Van Gogh)
830928c bench: set minimum epoch iterations to improve `pow_hash` stability (Kittywhiskers Van Gogh)

Pull request description:

## Additional Information

### Important

* Due to the build system changes in this PR, switching between commits in this branch, or between this branch and any other branch (like `develop`), will likely require `git clean -dfx src/bench src/crypto src/primitives` and re-running `./autogen.sh` and `./configure`.

* As the dispatcher doesn't kick in until _after_ genesis block validation, the correctness of the implementation can be validated by applying a commit that would retrigger the validation in the benchmark ([commit](23d15b2)). This is not included in this PR as it negatively impacts benchmark results in some builds ([comment](#6796 (comment))).

* While working on the ARMv7/ARMv8 backends, it was noticed that the `pow_hash` benchmark's error rate on Apple Silicon can go as high as ~15%, even on repeated runs, which reduces the reliability of benchmark results and hinders decision-making. To mitigate this, `minEpochIterations` has been reintroduced and set to 20 (up from 10 before [dash#6752](#6752)). Baseline measurements were taken against this change _before_ large footprint removal.

### Misc.

* `libdashconsensus` is built with hardware optimizations disabled, which is currently controlled by `DISABLE_OPTIMIZED_SHA256` (introduced in [bitcoin#29180](bitcoin#29180)). To align with upstream behavior, our platform-specific code is also disabled in the library, reusing that macro to mean "disable optimizations".

* Despite not being specified in Apple's documentation ([source](https://developer.apple.com/documentation/kernel/1387446-sysctlbyname/determining_instruction_set_characteristics)), some versions of macOS report NEON (a.k.a. advanced SIMD) using `hw.optional.arm.AdvSIMD` instead of `hw.optional.AdvSIMD`, so we check for that `sysctl` as well (see [google/cpufeatures#390](google/cpu_features#390)).

* Unaligned memory accesses are hardware-supported by platforms like x86_64, and sphlib utilizes them to improve performance ([source](https://github.com/dashpay/dash/blob/893b46a000c5088ce92f8625e74d7e3c126e0cdb/src/crypto/x11/sph_types.h#L117-L132)). When switching to the small footprint implementation of Shavite512, this triggers a UBSan error ([build](https://github.com/dashpay/dash/actions/runs/17896732544/job/50884688479#step:10:292)). As this is intentional behavior, we suppress the error by conditionally disabling alignment sanitization _if_ the target platform supports unaligned access and it has been enabled (i.e. `SPH_UPTR` is set, which it is by default).

## Benchmarks

### Apple M1, macOS Sequoia 15.7

* **Build Information:** Apple Clang 17 (clang-1700.3.19.1), Xcode 26.0 (17A324)
* **Depends Config:** `MULTIPROCESS=1 CC=clang CXX=clang++ HOST=aarch64-apple-darwin`
* **Build Config:** `CC=clang CXX=clang++ CFLAGS="-O2 -g" LDFLAGS="-Wl,-O2" ./configure --prefix=$(pwd)/depends/aarch64-apple-darwin --enable-reduce-exports --without-gui --disable-fuzz-binary --disable-maintainer-mode --disable-dependency-tracking --disable-ccache`

**Baseline**

* <details>

  <summary>Echo512:</summary>

  ```
  | ns/byte | byte/s | err% | total | benchmark
  |--------------------:|--------------------:|--------:|----------:|:----------
  | 39.68 | 25,201,342.83 | 0.2% | 0.01 | `Pow_Echo512_0032b`
  | 16.09 | 62,143,122.31 | 1.2% | 0.01 | `Pow_Echo512_0080b`
  | 19.63 | 50,943,039.85 | 0.2% | 0.01 | `Pow_Echo512_0128b`
  | 12.24 | 81,674,975.07 | 0.9% | 0.01 | `Pow_Echo512_0512b`
  | 10.94 | 91,442,178.90 | 0.4% | 0.01 | `Pow_Echo512_1024b`
  | 10.30 | 97,063,958.41 | 0.3% | 0.01 | `Pow_Echo512_2048b`
  | 9.72 | 102,906,236.78 | 0.4% | 0.11 | `Pow_Echo512_1M`
  ```

  </details>

* <details>

  <summary>Shavite512:</summary>

  ```
  | ns/byte | byte/s | err% | total | benchmark
  |--------------------:|--------------------:|--------:|----------:|:----------
  | 15.18 | 65,869,311.23 | 0.5% | 0.01 | `Pow_Shavite512_0032b`
  | 6.08 | 164,380,032.21 | 0.5% | 0.01 | `Pow_Shavite512_0080b`
  | 7.50 | 133,291,221.55 | 0.4% | 0.01 | `Pow_Shavite512_0128b`
  | 4.69 | 213,338,740.34 | 0.6% | 0.01 | `Pow_Shavite512_0512b`
  | 4.25 | 235,339,069.77 | 0.8% | 0.01 | `Pow_Shavite512_1024b`
  | 4.02 | 248,784,999.42 | 0.7% | 0.01 | `Pow_Shavite512_2048b`
  | 3.75 | 266,841,572.42 | 0.4% | 0.04 | `Pow_Shavite512_1M`
  ```

  </details>

* <details>

  <summary>X11:</summary>

  ```
  | ns/byte | byte/s | err% | total | benchmark
  |--------------------:|--------------------:|--------:|----------:|:----------
  | 249.10 | 4,014,450.39 | 0.2% | 0.01 | `Pow_X11_0032b`
  | 100.76 | 9,924,687.87 | 0.7% | 0.01 | `Pow_X11_0080b`
  | 63.72 | 15,694,749.39 | 0.4% | 0.01 | `Pow_X11_0128b`
  | 16.99 | 58,861,757.72 | 0.3% | 0.01 | `Pow_X11_0512b`
  | 9.27 | 107,817,565.81 | 0.2% | 0.01 | `Pow_X11_1024b`
  | 5.28 | 189,479,026.11 | 0.8% | 0.01 | `Pow_X11_2048b`
  | 1.37 | 731,462,009.69 | 0.7% | 0.01 | `Pow_X11_1M`
  ```

  </details>

**Optimized**

* <details>

  <summary>Echo512:</summary>

  ```
  | ns/byte | byte/s | err% | total | benchmark
  |--------------------:|--------------------:|--------:|----------:|:----------
  | 14.34 | 69,741,448.00 | 0.8% | 0.01 | `Pow_Echo512_0032b`
  | 5.68 | 176,124,721.60 | 0.0% | 0.01 | `Pow_Echo512_0080b`
  | 7.32 | 136,556,992.52 | 1.5% | 0.01 | `Pow_Echo512_0128b`
  | 4.31 | 232,035,714.32 | 0.1% | 0.01 | `Pow_Echo512_0512b`
  | 3.97 | 251,961,418.41 | 0.8% | 0.01 | `Pow_Echo512_1024b`
  | 3.62 | 276,147,242.57 | 0.1% | 0.01 | `Pow_Echo512_2048b`
  | 3.43 | 291,830,314.57 | 0.4% | 0.82 | `Pow_Echo512_1M`
  ```

  </details>

* <details>

  <summary>Shavite512:</summary>

  ```
  | ns/byte | byte/s | err% | total | benchmark
  |--------------------:|--------------------:|--------:|----------:|:----------
  | 17.16 | 58,273,414.75 | 0.9% | 0.01 | `Pow_Shavite512_0032b`
  | 6.81 | 146,755,834.13 | 0.2% | 0.01 | `Pow_Shavite512_0080b`
  | 8.34 | 119,878,248.65 | 1.2% | 0.01 | `Pow_Shavite512_0128b`
  | 5.05 | 197,863,716.10 | 0.7% | 0.01 | `Pow_Shavite512_0512b`
  | 4.55 | 219,843,493.16 | 0.8% | 0.01 | `Pow_Shavite512_1024b`
  | 4.40 | 227,519,688.25 | 0.5% | 0.01 | `Pow_Shavite512_2048b`
  | 3.79 | 263,911,928.82 | 0.1% | 0.91 | `Pow_Shavite512_1M`
  ```

  </details>

* <details>

  <summary>X11:</summary>

  ```
  | ns/byte | byte/s | err% | total | benchmark
  |--------------------:|--------------------:|--------:|----------:|:----------
  | 224.98 | 4,444,913.98 | 1.1% | 0.01 | `Pow_X11_0032b`
  | 90.76 | 11,017,721.52 | 1.7% | 0.01 | `Pow_X11_0080b`
  | 57.66 | 17,343,058.47 | 0.6% | 0.01 | `Pow_X11_0128b`
  | 15.79 | 63,347,591.25 | 0.3% | 0.01 | `Pow_X11_0512b`
  | 8.39 | 119,204,486.51 | 0.7% | 0.01 | `Pow_X11_1024b`
  | 4.98 | 200,909,248.89 | 1.1% | 0.01 | `Pow_X11_2048b`
  | 1.33 | 751,189,042.44 | 0.2% | 0.32 | `Pow_X11_1M`
  ```

  </details>

### AMD Ryzen 5 5600G, Ubuntu 24.04 (Noble)

* **Build Information:** Ubuntu Clang 18.1.8
* **Depends Config:** `MULTIPROCESS=1 CC=clang-18 CXX=clang++-18 HOST=x86_64-pc-linux-gnu`
* **Build Config:** `CC=clang-18 CXX=clang++-18 CFLAGS="-O2 -g" LDFLAGS="-Wl,--as-needed -Wl,-O2" ./configure --prefix=$(pwd)/depends/x86_64-pc-linux-gnu --enable-reduce-exports --without-gui --disable-fuzz-binary --disable-maintainer-mode --disable-dependency-tracking --disable-ccache`

**Baseline**

* <details>

  <summary>Echo512:</summary>

  ```
  | ns/byte | byte/s | err% | ins/byte | cyc/byte | IPC | bra/byte | miss% | total | benchmark
  |--------------------:|--------------------:|--------:|----------------:|----------------:|-------:|---------------:|--------:|----------:|:----------
  | 51.81 | 19,300,083.40 | 0.2% | 916.31 | 230.13 | 3.982 | 6.19 | 0.5% | 0.01 | `Pow_Echo512_0032b`
  | 20.72 | 48,266,658.57 | 0.1% | 366.56 | 92.01 | 3.984 | 2.49 | 0.5% | 0.01 | `Pow_Echo512_0080b`
  | 25.67 | 38,953,174.42 | 0.1% | 456.97 | 113.97 | 4.009 | 2.95 | 0.5% | 0.01 | `Pow_Echo512_0128b`
  | 16.00 | 62,491,477.70 | 0.1% | 285.25 | 71.06 | 4.014 | 1.81 | 0.6% | 0.01 | `Pow_Echo512_0512b`
  | 14.38 | 69,562,745.47 | 0.1% | 256.63 | 63.83 | 4.020 | 1.63 | 0.6% | 0.01 | `Pow_Echo512_1024b`
  | 13.65 | 73,272,852.80 | 0.1% | 242.32 | 60.27 | 4.020 | 1.53 | 0.6% | 0.01 | `Pow_Echo512_2048b`
  | 12.84 | 77,877,131.11 | 0.0% | 228.02 | 56.70 | 4.022 | 1.44 | 0.6% | 3.07 | `Pow_Echo512_1M`
  ```

  </details>

* <details>

  <summary>Shavite512:</summary>

  ```
  | ns/byte | byte/s | err% | ins/byte | cyc/byte | IPC | bra/byte | miss% | total | benchmark
  |--------------------:|--------------------:|--------:|----------------:|----------------:|-------:|---------------:|--------:|----------:|:----------
  | 26.75 | 37,379,255.99 | 0.0% | 423.31 | 118.82 | 3.563 | 1.41 | 0.0% | 0.01 | `Pow_Shavite512_0032b`
  | 10.71 | 93,360,115.43 | 0.0% | 169.35 | 47.57 | 3.560 | 0.58 | 0.0% | 0.01 | `Pow_Shavite512_0080b`
  | 13.26 | 75,436,781.19 | 0.0% | 210.04 | 58.87 | 3.568 | 0.45 | 0.0% | 0.01 | `Pow_Shavite512_0128b`
  | 8.26 | 121,051,101.55 | 0.0% | 130.99 | 36.69 | 3.570 | 0.27 | 0.0% | 0.01 | `Pow_Shavite512_0512b`
  | 7.44 | 134,444,187.57 | 0.0% | 117.82 | 33.04 | 3.566 | 0.24 | 0.4% | 0.01 | `Pow_Shavite512_1024b`
  | 7.06 | 141,686,987.23 | 0.2% | 111.23 | 31.18 | 3.568 | 0.23 | 0.2% | 0.01 | `Pow_Shavite512_2048b`
  | 6.65 | 150,409,844.26 | 0.1% | 104.65 | 29.37 | 3.563 | 0.21 | 0.0% | 1.60 | `Pow_Shavite512_1M`
  ```

  </details>

* <details>

  <summary>X11:</summary>

  ```
  | ns/byte | byte/s | err% | ins/byte | cyc/byte | IPC | bra/byte | miss% | total | benchmark
  |--------------------:|--------------------:|--------:|----------------:|----------------:|-------:|---------------:|--------:|----------:|:----------
  | 326.13 | 3,066,305.67 | 0.0% | 5,979.73 | 1,448.47 | 4.128 | 41.00 | 0.1% | 0.01 | `Pow_X11_0032b`
  | 130.47 | 7,664,483.17 | 0.0% | 2,391.79 | 579.50 | 4.127 | 16.38 | 0.1% | 0.01 | `Pow_X11_0080b`
  | 83.00 | 12,048,837.15 | 0.0% | 1,516.78 | 368.62 | 4.115 | 10.30 | 0.1% | 0.01 | `Pow_X11_0128b`
  | 21.78 | 45,920,017.94 | 0.0% | 395.29 | 96.72 | 4.087 | 2.64 | 0.1% | 0.01 | `Pow_X11_0512b`
  | 11.50 | 86,977,849.93 | 0.0% | 208.38 | 51.07 | 4.081 | 1.36 | 0.1% | 0.01 | `Pow_X11_1024b`
  | 6.38 | 156,826,545.04 | 0.0% | 114.92 | 28.16 | 4.081 | 0.72 | 0.1% | 0.01 | `Pow_X11_2048b`
  | 1.21 | 827,082,388.48 | 0.1% | 21.65 | 5.34 | 4.052 | 0.09 | 0.0% | 0.29 | `Pow_X11_1M`
  ```

  </details>

**Optimized**

* <details>

  <summary>Echo512:</summary>

  ```
  | ns/byte | byte/s | err% | ins/byte | cyc/byte | IPC | bra/byte | miss% | total | benchmark
  |--------------------:|--------------------:|--------:|----------------:|----------------:|-------:|---------------:|--------:|----------:|:----------
  | 17.52 | 57,085,952.31 | 0.0% | 166.19 | 77.77 | 2.137 | 7.88 | 0.0% | 0.01 | `Pow_Echo512_0032b`
  | 7.01 | 142,732,714.40 | 0.1% | 66.51 | 31.10 | 2.139 | 3.16 | 0.0% | 0.01 | `Pow_Echo512_0080b`
  | 8.69 | 115,084,427.41 | 0.1% | 81.58 | 38.56 | 2.116 | 3.76 | 0.0% | 0.01 | `Pow_Echo512_0128b`
  | 5.36 | 186,499,101.72 | 0.1% | 50.51 | 23.79 | 2.123 | 2.31 | 0.1% | 0.01 | `Pow_Echo512_0512b`
  | 4.79 | 208,614,198.78 | 0.1% | 45.33 | 21.29 | 2.129 | 2.07 | 0.0% | 0.01 | `Pow_Echo512_1024b`
  | 4.51 | 221,584,256.28 | 0.0% | 42.74 | 20.04 | 2.133 | 1.95 | 0.0% | 0.01 | `Pow_Echo512_2048b`
  | 4.27 | 234,154,680.31 | 0.0% | 40.15 | 18.95 | 2.119 | 1.83 | 0.0% | 1.02 | `Pow_Echo512_1M`
  ```

  </details>

* <details>

  <summary>Shavite512:</summary>

  ```
  | ns/byte | byte/s | err% | ins/byte | cyc/byte | IPC | bra/byte | miss% | total | benchmark
  |--------------------:|--------------------:|--------:|----------------:|----------------:|-------:|---------------:|--------:|----------:|:----------
  | 10.12 | 98,853,454.39 | 0.5% | 69.47 | 44.93 | 1.546 | 8.38 | 0.0% | 0.01 | `Pow_Shavite512_0032b`
  | 3.98 | 251,044,756.51 | 0.1% | 27.81 | 17.69 | 1.572 | 3.36 | 0.0% | 0.01 | `Pow_Shavite512_0080b`
  | 4.89 | 204,696,173.21 | 0.1% | 32.78 | 21.69 | 1.511 | 3.90 | 0.0% | 0.01 | `Pow_Shavite512_0128b`
  | 3.01 | 332,315,588.24 | 0.1% | 20.08 | 13.37 | 1.502 | 2.42 | 0.1% | 0.01 | `Pow_Shavite512_0512b`
  | 2.70 | 369,999,483.82 | 0.2% | 17.96 | 12.00 | 1.496 | 2.17 | 0.0% | 0.01 | `Pow_Shavite512_1024b`
  | 2.55 | 392,427,831.38 | 0.0% | 16.90 | 11.32 | 1.493 | 2.05 | 0.0% | 0.01 | `Pow_Shavite512_2048b`
  | 2.39 | 418,541,658.56 | 0.0% | 15.84 | 10.60 | 1.494 | 1.92 | 0.0% | 0.57 | `Pow_Shavite512_1M`
  ```

  </details>

* <details>

  <summary>X11:</summary>

  ```
  | ns/byte | byte/s | err% | ins/byte | cyc/byte | IPC | bra/byte | miss% | total | benchmark
  |--------------------:|--------------------:|--------:|----------------:|----------------:|-------:|---------------:|--------:|----------:|:----------
  | 271.74 | 3,679,981.60 | 0.1% | 4,875.78 | 1,206.89 | 4.040 | 49.66 | 0.0% | 0.01 | `Pow_X11_0032b`
  | 108.70 | 9,199,814.12 | 0.2% | 1,950.20 | 482.28 | 4.044 | 19.84 | 0.0% | 0.01 | `Pow_X11_0080b`
  | 69.40 | 14,408,573.10 | 0.2% | 1,240.79 | 308.15 | 4.027 | 12.47 | 0.0% | 0.01 | `Pow_X11_0128b`
  | 18.33 | 54,546,702.44 | 0.1% | 326.29 | 81.43 | 4.007 | 3.18 | 0.0% | 0.01 | `Pow_X11_0512b`
  | 9.78 | 102,262,967.62 | 0.2% | 173.88 | 43.39 | 4.007 | 1.63 | 0.0% | 0.01 | `Pow_X11_1024b`
  | 5.48 | 182,512,076.76 | 0.1% | 97.67 | 24.33 | 4.015 | 0.86 | 0.0% | 0.01 | `Pow_X11_2048b`
  | 1.20 | 835,292,126.52 | 0.0% | 21.62 | 5.31 | 4.068 | 0.09 | 0.0% | 0.29 | `Pow_X11_1M`
  ```

  </details>

## Breaking Changes

Platforms that only partially support the set of extensions for x86_64 (SSSE3, SSE4.1 and AES-NI) or ARMv7/ARMv8 (NEON and crypto extensions for AES) may experience different performance characteristics.

The following cases may see performance degradation due to the move to a small-footprint variant and the overhead (and lost optimization opportunity) incurred by runtime dispatching.

* Platforms other than x86_64 or ARMv7 and above (there are no backends that implement optimized routines for them)
* x86_64 or ARMv7/ARMv8 without extensions (i.e. older chips, OS/Hypervisor-level disablement of extensions)
* Operating systems other than Windows, macOS, Linux and FreeBSD (there are no dispatch routines for other platforms)
* Using `libdashconsensus` (optimizations are disabled wholesale for the library following upstream behavior)

## Checklist

- [x] I have performed a self-review of my own code
- [x] I have commented my code, particularly in hard-to-understand areas
- [x] I have added or updated relevant unit/integration/functional/e2e tests **(note: N/A)**
- [x] I have made corresponding changes to the documentation **(note: N/A)**
- [x] I have assigned this pull request to a milestone _(for repository code-owners and collaborators only)_

ACKs for top commit:
  UdjinM6:
    utACK/light-ACK 005967b

Tree-SHA512: 8b23f8e4e591faa4fbdb75ef85a64fed5f0429c804e650e8389789468d7e594998b11228794d650adfaeaf65085d157f98533983a7893d8db5ba587341c86e44
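
Illustrative note (not part of the commit): the macOS NEON check described under "Misc." could be sketched roughly as below. The two `sysctl` names come from the description above; the helper names `SysctlFlagSet` and `HaveNeonMacOS` are hypothetical, and reading the value as an `int` is an assumption. The dispatcher in `crypto/x11/dispatch.cpp` may do this differently.

```cpp
#include <cstddef>
#if defined(__APPLE__)
#include <sys/types.h>
#include <sys/sysctl.h>
#endif

// Hypothetical helper: true if the named sysctl exists and reports a
// non-zero integer value. On non-Apple platforms it always returns false.
static bool SysctlFlagSet(const char* name)
{
#if defined(__APPLE__)
    int value = 0;               // assumption: the flag is reported as an int
    size_t len = sizeof(value);
    if (sysctlbyname(name, &value, &len, nullptr, 0) != 0) {
        return false;            // key missing or unreadable
    }
    return value != 0;
#else
    (void)name;
    return false;
#endif
}

// Hypothetical wrapper: accept either spelling, since some macOS versions
// only expose the "arm." variant.
bool HaveNeonMacOS()
{
    return SysctlFlagSet("hw.optional.AdvSIMD") ||
           SysctlFlagSet("hw.optional.arm.AdvSIMD");
}
```

Checking both keys keeps the probe working across macOS versions without needing to know which release introduced which name.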
2 parents 3a27f30 + 005967b commit c4a9d82

26 files changed: +1862 -1537 lines changed

configure.ac

Lines changed: 84 additions & 3 deletions
@@ -379,7 +379,7 @@ if test "$enable_debug" = "yes"; then
 AX_CHECK_COMPILE_FLAG([-ftrapv], [DEBUG_CXXFLAGS="$DEBUG_CXXFLAGS -ftrapv"], [], [$CXXFLAG_WERROR])
 else
 dnl If not debugging, enable more aggressive optimizations for sphlib sources
-AX_CHECK_COMPILE_FLAG([-O3], [SPHLIB_CFLAGS="$SPHLIB_CFLAGS -O3"], [], [$CXXFLAG_WERROR])
+AX_CHECK_COMPILE_FLAG([-O3], [SPHLIB_FLAGS="$SPHLIB_FLAGS -O3"], [], [$CXXFLAG_WERROR])
 
 # We always enable at at least -g1 debug info to support proper stacktraces in crash infos
 # Stacktraces will be suboptimal due to optimization, but better than nothing. Also, -fno-omit-frame-pointer
@@ -535,21 +535,27 @@ dnl https://gcc.gnu.org/bugzilla/show_bug.cgi?id=111843. To work around that, se
 dnl -fstack-reuse=none for all gcc builds. (Only gcc understands this flag)
 AX_CHECK_COMPILE_FLAG([-fstack-reuse=none], [CORE_CXXFLAGS="$CORE_CXXFLAGS -fstack-reuse=none"])
 
+enable_arm_aes=no
 enable_arm_crc=no
+enable_arm_neon=no
 enable_arm_shani=no
+enable_ssse3=no
 enable_sse42=no
 enable_sse41=no
 enable_avx2=no
+enable_x86_aesni=no
 enable_x86_shani=no
 
 dnl Check for optional instruction set support. Enabling these does _not_ imply that all code will
 dnl be compiled with them, rather that specific objects/libs may use them after checking for runtime
 dnl compatibility.
 
 dnl x86
+AX_CHECK_COMPILE_FLAG([-mssse3], [SSSE3_CXXFLAGS="-mssse3"], [], [$CXXFLAG_WERROR])
 AX_CHECK_COMPILE_FLAG([-msse4.2], [SSE42_CXXFLAGS="-msse4.2"], [], [$CXXFLAG_WERROR])
 AX_CHECK_COMPILE_FLAG([-msse4.1], [SSE41_CXXFLAGS="-msse4.1"], [], [$CXXFLAG_WERROR])
 AX_CHECK_COMPILE_FLAG([-mavx -mavx2], [AVX2_CXXFLAGS="-mavx -mavx2"], [], [$CXXFLAG_WERROR])
+AX_CHECK_COMPILE_FLAG([-msse4.1 -maes], [X86_AESNI_CXXFLAGS="-msse4.1 -maes"], [], [$CXXFLAG_WERROR])
 AX_CHECK_COMPILE_FLAG([-msse4 -msha], [X86_SHANI_CXXFLAGS="-msse4 -msha"], [], [$CXXFLAG_WERROR])
 
 enable_clmul=
@@ -570,6 +576,20 @@ if test "$enable_clmul" = "yes"; then
 AC_DEFINE([HAVE_CLMUL], [1], [Define this symbol if clmul instructions can be used])
 fi
 
+TEMP_CXXFLAGS="$CXXFLAGS"
+CXXFLAGS="$SSSE3_CXXFLAGS $CXXFLAGS"
+AC_MSG_CHECKING([for SSSE3 intrinsics])
+AC_COMPILE_IFELSE([AC_LANG_PROGRAM([[
+#include <tmmintrin.h>
+]],[[
+__m64 x = _mm_abs_pi32(_m_from_int(0));
+return 0;
+]])],
+[ AC_MSG_RESULT([yes]); enable_ssse3=yes; AC_DEFINE([ENABLE_SSSE3], [1], [Define this symbol to build code that uses SSSE3 intrinsics]) ],
+[ AC_MSG_RESULT([no])]
+)
+CXXFLAGS="$TEMP_CXXFLAGS"
+
 TEMP_CXXFLAGS="$CXXFLAGS"
 CXXFLAGS="$SSE42_CXXFLAGS $CXXFLAGS"
 AC_MSG_CHECKING([for SSE4.2 intrinsics])
@@ -640,9 +660,41 @@ AC_COMPILE_IFELSE([AC_LANG_PROGRAM([[
 )
 CXXFLAGS="$TEMP_CXXFLAGS"
 
+TEMP_CXXFLAGS="$CXXFLAGS"
+CXXFLAGS="$X86_AESNI_CXXFLAGS $CXXFLAGS"
+AC_MSG_CHECKING([for x86 AES-NI intrinsics])
+AC_COMPILE_IFELSE([AC_LANG_PROGRAM([[
+#include <stdint.h>
+#include <immintrin.h>
+#include <wmmintrin.h>
+]],[[
+__m128i x = _mm_setzero_si128();
+x = _mm_aesenc_si128(x, _mm_setzero_si128());
+return _mm_extract_epi32(x, 0);
+]])],
+[ AC_MSG_RESULT([yes]); enable_x86_aesni=yes; AC_DEFINE([ENABLE_X86_AESNI], [1], [Define this symbol to build code that uses x86 AES-NI intrinsics]) ],
+[ AC_MSG_RESULT([no])]
+)
+CXXFLAGS="$TEMP_CXXFLAGS"
+
 # ARM
 AX_CHECK_COMPILE_FLAG([-march=armv8-a+crc+crypto], [ARM_CRC_CXXFLAGS="-march=armv8-a+crc+crypto"], [], [$CXXFLAG_WERROR])
-AX_CHECK_COMPILE_FLAG([-march=armv8-a+crypto], [ARM_SHANI_CXXFLAGS="-march=armv8-a+crypto"], [], [$CXXFLAG_WERROR])
+AX_CHECK_COMPILE_FLAG([-march=armv8-a+crypto], [ARM_AES_CXXFLAGS="-march=armv8-a+crypto"; ARM_SHANI_CXXFLAGS="-march=armv8-a+crypto"], [], [$CXXFLAG_WERROR])
+
+TEMP_CXXFLAGS="$CXXFLAGS"
+CXXFLAGS="$ARM_AES_CXXFLAGS $CXXFLAGS"
+AC_MSG_CHECKING([for ARMv8 AES intrinsics])
+AC_COMPILE_IFELSE([AC_LANG_PROGRAM([[
+#include <arm_neon.h>
+]],[[
+uint8x16_t a, b;
+vaesmcq_u8(vaeseq_u8(a, b));
+return 0;
+]])],
+[ AC_MSG_RESULT([yes]); enable_arm_aes=yes; AC_DEFINE([ENABLE_ARM_AES], [1], [Define this symbol to build code that uses ARMv8 AES intrinsics]) ],
+[ AC_MSG_RESULT([no])]
+)
+CXXFLAGS="$TEMP_CXXFLAGS"
 
 TEMP_CXXFLAGS="$CXXFLAGS"
 CXXFLAGS="$ARM_CRC_CXXFLAGS $CXXFLAGS"
@@ -663,6 +715,27 @@ AC_COMPILE_IFELSE([AC_LANG_PROGRAM([[
 )
 CXXFLAGS="$TEMP_CXXFLAGS"
 
+ARM_NEON_CXXFLAGS=""
+TEMP_CXXFLAGS="$CXXFLAGS"
+for flag in "-march=armv8-a" "-march=armv7-a -mfpu=neon"; do
+AX_CHECK_COMPILE_FLAG([$flag], [
+CXXFLAGS="$CXXFLAGS $flag"
+AC_COMPILE_IFELSE([AC_LANG_PROGRAM([[
+#include <arm_neon.h>
+]], [[
+float32x4_t f = vdupq_n_f32(0.0);
+return 0;
+]])], [
+ARM_NEON_CXXFLAGS="$flag"
+enable_arm_neon=yes
+AC_DEFINE([ENABLE_ARM_NEON], [1], [Define this symbol to build code that uses ARM NEON intrinsics])
+break
+])
+CXXFLAGS="$TEMP_CXXFLAGS"
+])
+done
+CXXFLAGS="$TEMP_CXXFLAGS"
+
 TEMP_CXXFLAGS="$CXXFLAGS"
 CXXFLAGS="$ARM_SHANI_CXXFLAGS $CXXFLAGS"
 AC_MSG_CHECKING([for ARMv8 SHA-NI intrinsics])
@@ -1820,11 +1893,15 @@ AM_CONDITIONAL([USE_QRCODE], [test "$use_qr" = "yes"])
 AM_CONDITIONAL([USE_LCOV], [test "$use_lcov" = "yes"])
 AM_CONDITIONAL([USE_LIBEVENT], [test "$use_libevent" = "yes"])
 AM_CONDITIONAL([HARDEN], [test "$use_hardening" = "yes"])
+AM_CONDITIONAL([ENABLE_SSSE3], [test "$enable_ssse3" = "yes"])
 AM_CONDITIONAL([ENABLE_SSE42], [test "$enable_sse42" = "yes"])
 AM_CONDITIONAL([ENABLE_SSE41], [test "$enable_sse41" = "yes"])
 AM_CONDITIONAL([ENABLE_AVX2], [test "$enable_avx2" = "yes"])
+AM_CONDITIONAL([ENABLE_X86_AESNI], [test "$enable_x86_aesni" = "yes"])
 AM_CONDITIONAL([ENABLE_X86_SHANI], [test "$enable_x86_shani" = "yes"])
+AM_CONDITIONAL([ENABLE_ARM_AES], [test "$enable_arm_aes" = "yes"])
 AM_CONDITIONAL([ENABLE_ARM_CRC], [test "$enable_arm_crc" = "yes"])
+AM_CONDITIONAL([ENABLE_ARM_NEON], [test "$enable_arm_neon" = "yes"])
 AM_CONDITIONAL([ENABLE_ARM_SHANI], [test "$enable_arm_shani" = "yes"])
 AM_CONDITIONAL([WORDS_BIGENDIAN], [test "$ac_cv_c_bigendian" = "yes"])
 AM_CONDITIONAL([USE_NATPMP], [test "$use_natpmp" = "yes"])
@@ -1877,13 +1954,17 @@ AC_SUBST(PIC_FLAGS)
 AC_SUBST(PIE_FLAGS)
 AC_SUBST(SANITIZER_CXXFLAGS)
 AC_SUBST(SANITIZER_LDFLAGS)
-AC_SUBST(SPHLIB_CFLAGS)
+AC_SUBST(SPHLIB_FLAGS)
+AC_SUBST(SSSE3_CXXFLAGS)
 AC_SUBST(SSE42_CXXFLAGS)
 AC_SUBST(SSE41_CXXFLAGS)
 AC_SUBST(CLMUL_CXXFLAGS)
 AC_SUBST(AVX2_CXXFLAGS)
+AC_SUBST(X86_AESNI_CXXFLAGS)
 AC_SUBST(X86_SHANI_CXXFLAGS)
+AC_SUBST(ARM_AES_CXXFLAGS)
 AC_SUBST(ARM_CRC_CXXFLAGS)
+AC_SUBST(ARM_NEON_CXXFLAGS)
 AC_SUBST(ARM_SHANI_CXXFLAGS)
 AC_SUBST(LIBTOOL_APP_LDFLAGS)
 AC_SUBST(USE_SQLITE)
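
Illustrative note (not part of the commit): as the `configure.ac` comment says, the `ENABLE_*` probes only establish that the compiler can build the intrinsics; whether the CPU supports them still has to be checked at runtime. A minimal sketch of that two-step gate on x86 with GCC/Clang follows. The struct and function names are hypothetical, and the bit positions are the standard CPUID leaf 1 `ECX` flags (SSSE3 bit 9, SSE4.1 bit 19, AES-NI bit 25); the PR's own detection code may be organized differently.

```cpp
#if (defined(__x86_64__) || defined(__i386__)) && (defined(__GNUC__) || defined(__clang__))
#include <cpuid.h> // GCC/Clang wrapper around the CPUID instruction
#define SKETCH_HAVE_CPUID 1
#endif

// Hypothetical container for the features the new x86 backends rely on.
struct X86Features {
    bool ssse3{false};
    bool sse41{false};
    bool aesni{false};
};

// Runtime half of the gate: ask the CPU what it supports.
X86Features DetectX86Features()
{
    X86Features f;
#if defined(SKETCH_HAVE_CPUID)
    unsigned int eax = 0, ebx = 0, ecx = 0, edx = 0;
    if (__get_cpuid(1, &eax, &ebx, &ecx, &edx)) {
        f.ssse3 = ((ecx >> 9) & 1U) != 0;
        f.sse41 = ((ecx >> 19) & 1U) != 0;
        f.aesni = ((ecx >> 25) & 1U) != 0;
    }
#endif
    return f;
}

// Compile-time half of the gate: the backend is usable only if configure
// defined ENABLE_X86_AESNI *and* the CPU reports the needed extensions.
bool CanUseAesniBackend(const X86Features& f)
{
#if defined(ENABLE_X86_AESNI)
    return f.sse41 && f.aesni;
#else
    (void)f;
    return false;
#endif
}
```

Requiring SSE4.1 alongside AES-NI mirrors `X86_AESNI_CXXFLAGS="-msse4.1 -maes"` in the hunk above.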

src/Makefile.am

Lines changed: 69 additions & 6 deletions
@@ -75,9 +75,17 @@ endif
 LIBBITCOIN_CRYPTO = $(LIBBITCOIN_CRYPTO_BASE)
 LIBBITCOIN_CRYPTO_SPH = crypto/libbitcoin_crypto_sph.la
 LIBBITCOIN_CRYPTO += $(LIBBITCOIN_CRYPTO_SPH)
+if ENABLE_SSSE3
+LIBBITCOIN_CRYPTO_SSSE3 = crypto/libbitcoin_crypto_ssse3.la
+LIBBITCOIN_CRYPTO += $(LIBBITCOIN_CRYPTO_SSSE3)
+endif
 if ENABLE_SSE41
 LIBBITCOIN_CRYPTO_SSE41 = crypto/libbitcoin_crypto_sse41.la
 LIBBITCOIN_CRYPTO += $(LIBBITCOIN_CRYPTO_SSE41)
+if ENABLE_X86_AESNI
+LIBBITCOIN_CRYPTO_X86_AESNI = crypto/libbitcoin_crypto_x86_aesni.la
+LIBBITCOIN_CRYPTO += $(LIBBITCOIN_CRYPTO_X86_AESNI)
+endif
 if ENABLE_X86_SHANI
 LIBBITCOIN_CRYPTO_X86_SHANI = crypto/libbitcoin_crypto_x86_shani.la
 LIBBITCOIN_CRYPTO += $(LIBBITCOIN_CRYPTO_X86_SHANI)
@@ -87,6 +95,14 @@ if ENABLE_AVX2
 LIBBITCOIN_CRYPTO_AVX2 = crypto/libbitcoin_crypto_avx2.la
 LIBBITCOIN_CRYPTO += $(LIBBITCOIN_CRYPTO_AVX2)
 endif
+if ENABLE_ARM_AES
+LIBBITCOIN_CRYPTO_ARM_AES = crypto/libbitcoin_crypto_arm_aes.la
+LIBBITCOIN_CRYPTO += $(LIBBITCOIN_CRYPTO_ARM_AES)
+endif
+if ENABLE_ARM_NEON
+LIBBITCOIN_CRYPTO_ARM_NEON = crypto/libbitcoin_crypto_arm_neon.la
+LIBBITCOIN_CRYPTO += $(LIBBITCOIN_CRYPTO_ARM_NEON)
+endif
 if ENABLE_ARM_SHANI
 LIBBITCOIN_CRYPTO_ARM_SHANI = crypto/libbitcoin_crypto_arm_shani.la
 LIBBITCOIN_CRYPTO += $(LIBBITCOIN_CRYPTO_ARM_SHANI)
@@ -730,23 +746,26 @@ crypto_libbitcoin_crypto_avx2_la_SOURCES = crypto/sha256_avx2.cpp
 # See explanation for -static in crypto_libbitcoin_crypto_base_la's LDFLAGS and
 # CXXFLAGS above
 crypto_libbitcoin_crypto_sph_la_LDFLAGS = $(AM_LDFLAGS) -static
-crypto_libbitcoin_crypto_sph_la_CXXFLAGS = $(AM_CXXFLAGS) $(PIE_FLAGS) -static
+crypto_libbitcoin_crypto_sph_la_CXXFLAGS = $(AM_CXXFLAGS) $(SPHLIB_FLAGS) $(PIE_FLAGS) -static
 crypto_libbitcoin_crypto_sph_la_CPPFLAGS = $(AM_CPPFLAGS)
-crypto_libbitcoin_crypto_sph_la_CFLAGS = $(SPHLIB_CFLAGS)
+crypto_libbitcoin_crypto_sph_la_CFLAGS = $(SPHLIB_FLAGS)
 crypto_libbitcoin_crypto_sph_la_CPPFLAGS += \
 -DSPH_SMALL_FOOTPRINT_CUBEHASH=1 \
 -DSPH_SMALL_FOOTPRINT_JH=1
 crypto_libbitcoin_crypto_sph_la_SOURCES = \
-crypto/x11/aes_helper.c \
+crypto/x11/aes.cpp \
+crypto/x11/aes.h \
 crypto/x11/blake.c \
 crypto/x11/bmw.c \
 crypto/x11/cubehash.c \
-crypto/x11/echo.c \
+crypto/x11/dispatch.cpp \
+crypto/x11/dispatch.h \
+crypto/x11/echo.cpp \
 crypto/x11/groestl.c \
 crypto/x11/jh.c \
 crypto/x11/keccak.c \
 crypto/x11/luffa.c \
-crypto/x11/shavite.c \
+crypto/x11/shavite.cpp \
 crypto/x11/simd.c \
 crypto/x11/skein.c \
 crypto/x11/sph_blake.h \
@@ -760,7 +779,51 @@ crypto_libbitcoin_crypto_sph_la_SOURCES = \
 crypto/x11/sph_shavite.h \
 crypto/x11/sph_simd.h \
 crypto/x11/sph_skein.h \
-crypto/x11/sph_types.h
+crypto/x11/sph_types.h \
+crypto/x11/util/consts_aes.hpp \
+crypto/x11/util/util.hpp
+
+# See explanation for -static in crypto_libbitcoin_crypto_base_la's LDFLAGS and
+# CXXFLAGS above
+crypto_libbitcoin_crypto_arm_aes_la_LDFLAGS = $(AM_LDFLAGS) -static
+crypto_libbitcoin_crypto_arm_aes_la_CXXFLAGS = $(AM_CXXFLAGS) $(PIE_FLAGS) -static
+crypto_libbitcoin_crypto_arm_aes_la_CPPFLAGS = $(AM_CPPFLAGS)
+crypto_libbitcoin_crypto_arm_aes_la_CXXFLAGS += $(ARM_AES_CXXFLAGS)
+crypto_libbitcoin_crypto_arm_aes_la_CPPFLAGS += -DENABLE_ARM_AES
+crypto_libbitcoin_crypto_arm_aes_la_SOURCES = \
+crypto/x11/arm_crypto/echo.cpp \
+crypto/x11/arm_crypto/shavite.cpp
+
+# See explanation for -static in crypto_libbitcoin_crypto_base_la's LDFLAGS and
+# CXXFLAGS above
+crypto_libbitcoin_crypto_arm_neon_la_LDFLAGS = $(AM_LDFLAGS) -static
+crypto_libbitcoin_crypto_arm_neon_la_CXXFLAGS = $(AM_CXXFLAGS) $(PIE_FLAGS) -static
+crypto_libbitcoin_crypto_arm_neon_la_CPPFLAGS = $(AM_CPPFLAGS)
+crypto_libbitcoin_crypto_arm_neon_la_CXXFLAGS += $(ARM_NEON_CXXFLAGS)
+crypto_libbitcoin_crypto_arm_neon_la_CPPFLAGS += -DENABLE_ARM_NEON
+crypto_libbitcoin_crypto_arm_neon_la_SOURCES = \
+crypto/x11/arm_neon/echo.cpp
+
+# See explanation for -static in crypto_libbitcoin_crypto_base_la's LDFLAGS and
+# CXXFLAGS above
+crypto_libbitcoin_crypto_ssse3_la_LDFLAGS = $(AM_LDFLAGS) -static
+crypto_libbitcoin_crypto_ssse3_la_CXXFLAGS = $(AM_CXXFLAGS) $(PIE_FLAGS) -static
+crypto_libbitcoin_crypto_ssse3_la_CPPFLAGS = $(AM_CPPFLAGS)
+crypto_libbitcoin_crypto_ssse3_la_CXXFLAGS += $(SSSE3_CXXFLAGS)
+crypto_libbitcoin_crypto_ssse3_la_CPPFLAGS += -DENABLE_SSSE3
+crypto_libbitcoin_crypto_ssse3_la_SOURCES = \
+crypto/x11/ssse3/echo.cpp
+
+# See explanation for -static in crypto_libbitcoin_crypto_base_la's LDFLAGS and
+# CXXFLAGS above
+crypto_libbitcoin_crypto_x86_aesni_la_LDFLAGS = $(AM_LDFLAGS) -static
+crypto_libbitcoin_crypto_x86_aesni_la_CXXFLAGS = $(AM_CXXFLAGS) $(PIE_FLAGS) -static
+crypto_libbitcoin_crypto_x86_aesni_la_CPPFLAGS = $(AM_CPPFLAGS)
+crypto_libbitcoin_crypto_x86_aesni_la_CXXFLAGS += $(X86_AESNI_CXXFLAGS)
+crypto_libbitcoin_crypto_x86_aesni_la_CPPFLAGS += -DENABLE_SSE41 -DENABLE_X86_AESNI
+crypto_libbitcoin_crypto_x86_aesni_la_SOURCES = \
+crypto/x11/x86_aesni/echo.cpp \
+crypto/x11/x86_aesni/shavite.cpp
 
 # See explanation for -static in crypto_libbitcoin_crypto_base_la's LDFLAGS and
 # CXXFLAGS above
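
Illustrative note (not part of the commit): each backend library above is compiled with its own instruction-set flags and an `-DENABLE_*` define, so intrinsics stay confined to translation units that are allowed to use them. A minimal sketch of what such a guarded unit could look like for AES-NI is shown below. `AesRoundHw` is a hypothetical name, not a function from `crypto/x11/x86_aesni/`; the intrinsics themselves (`_mm_loadu_si128`, `_mm_aesenc_si128`, `_mm_storeu_si128`) are the standard ones.

```cpp
#include <cstdint>
#if defined(ENABLE_X86_AESNI)
#include <emmintrin.h> // _mm_loadu_si128 / _mm_storeu_si128 (SSE2)
#include <wmmintrin.h> // _mm_aesenc_si128 (AES-NI)
#endif

// Hypothetical function: performs one AES encryption round in place
// (SubBytes, ShiftRows, MixColumns, then XOR with round_key).
// Returns false, leaving state untouched, when the unit was built
// without -DENABLE_X86_AESNI (and therefore without -maes).
bool AesRoundHw(uint8_t state[16], const uint8_t round_key[16])
{
#if defined(ENABLE_X86_AESNI)
    __m128i s = _mm_loadu_si128(reinterpret_cast<const __m128i*>(state));
    const __m128i k = _mm_loadu_si128(reinterpret_cast<const __m128i*>(round_key));
    s = _mm_aesenc_si128(s, k);
    _mm_storeu_si128(reinterpret_cast<__m128i*>(state), s);
    return true;
#else
    (void)state;
    (void)round_key;
    return false;
#endif
}
```

Keeping the intrinsics behind the macro means the rest of the tree still builds with baseline flags, and a caller only reaches this code after a runtime capability check.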

src/bench/bench_bitcoin.cpp

Lines changed: 2 additions & 0 deletions
@@ -6,6 +6,7 @@
 
 #include <clientversion.h>
 #include <crypto/sha256.h>
+#include <crypto/x11/dispatch.h>
 #include <fs.h>
 #include <util/strencodings.h>
 #include <util/system.h>
@@ -61,6 +62,7 @@ int main(int argc, char** argv)
 {
 ArgsManager argsman;
 SetupBenchArgs(argsman);
+SapphireAutoDetect();
 SHA256AutoDetect();
 std::string error;
 if (!argsman.ParseParameters(argc, argv, error)) {
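
Illustrative note (not part of the commit): the hunk above calls `SapphireAutoDetect()` once at startup, before any benchmark runs. The body of that function is not shown in this part of the diff, so the following is only a sketch of the general function-pointer dispatch pattern the commit list refers to ("use function pointer to allow for switching implementation"). All names here (`RoundFn`, `RoundPortable`, `RoundWide`, `ExampleAutoDetect`) are hypothetical, and the "round" is a placeholder XOR, not AES.

```cpp
#include <cstdint>
#include <cstring>

// Signature shared by every backend of the placeholder operation.
using RoundFn = void (*)(uint8_t* state, const uint8_t* key);

namespace {
// Portable fallback: byte-wise XOR of a 16-byte key into the state.
void RoundPortable(uint8_t* state, const uint8_t* key)
{
    for (int i = 0; i < 16; ++i) state[i] ^= key[i];
}

// Stand-in for an accelerated backend: same result, computed 64 bits at
// a time. memcpy avoids unaligned-access undefined behavior.
void RoundWide(uint8_t* state, const uint8_t* key)
{
    uint64_t s[2], k[2];
    std::memcpy(s, state, 16);
    std::memcpy(k, key, 16);
    s[0] ^= k[0];
    s[1] ^= k[1];
    std::memcpy(state, s, 16);
}

// Defaults to the portable version so calls made before detection
// (e.g. during early startup) still work.
RoundFn g_round = RoundPortable;
} // namespace

// Called once at startup; the flag would come from CPU feature detection.
void ExampleAutoDetect(bool have_accelerated_backend)
{
    g_round = have_accelerated_backend ? RoundWide : RoundPortable;
}

// Hot-path entry point: one indirect call, no per-call feature checks.
void Round(uint8_t* state, const uint8_t* key)
{
    g_round(state, key);
}
```

Defaulting the pointer to the portable routine is what makes it safe for code that runs before detection; the cost is one indirect call per invocation, which is the dispatch overhead mentioned under "Breaking Changes".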
