crypto: add dispatcher and implement hardware acceleration for Echo512 and Shavite512, improve benchmark stability #6852

kwvg · 2025-09-21T17:24:46Z

Additional Information

Important

Due to the build system changes done in this PR, switching between commits in this branch or between this branch and any other branch (like develop) requires removing temporary files for x11 rm -r src/crypto/x11/.deps/.
As the dispatcher doesn't kick in until after genesis block validation, the correctness of the implementation can be validated by applying a commit that would retrigger the validation in the benchmark (commit). This is not included in this PR as it negatively impacts benchmark results in some builds (comment).
While working on the ARMv7/ARMv8 backends, it was noticed that the pow_hash benchmark's error rate on Apple Silicon can go as high as ~15%, even on repeated runs, which reduces the reliability of benchmark results and hinders decision-making. To mitigate this, minEpochIterations has been reintroduced and set to 20 (up from pre-dash#6752's 10).

Baseline measurements were taken against this change before large footprint removal.

Misc.

libdashconsensus is built with hardware optimizations disabled, which is currently set with DISABLE_OPTIMIZED_SHA256 (introduced in bitcoin#29180). To align with upstream behavior, our platform-specific code is disabled in the library as well using the macro to mean "disable optimizations".
Despite not being specified in Apple's documentation (source), some versions of macOS report NEON (a.k.a. advanced SIMD) using hw.optional.arm.AdvSIMD instead of hw.optional.AdvSIMD, so we check for that sysctl as well (see google/cpufeatures#390)
Unaligned memory accesses are hardware-supported by platforms like x86_64 and sphlib utilizes them to improve performance (source). When switching to the small footprint implementation of Shavite512, this triggers a UBSan error (build).

As this is intentional behavior, we have opted to suppress the error by conditionally suppressing alignment sanitization if the target platform supports it and it has been enabled (i.e. SPH_UPTR is set, which it is by default).

Benchmarks

Apple M1, macOS Sequoia 15.7

Build Information: Apple Clang 17 (clang-1700.3.19.1), Xcode 26.0 (17A324)
Depends Config: MULTIPROCESS=1 CC=clang CXX=clang++ HOST=aarch64-apple-darwin
Build Config: CC=clang CXX=clang++ CFLAGS="-O2 -g" LDFLAGS="-Wl,-O2" ./configure --prefix=$(pwd)/depends/aarch64-apple-darwin --enable-reduce-exports --without-gui --disable-fuzz-binary --disable-maintainer-mode --disable-dependency-tracking --disable-ccache

Baseline

Echo512:

|             ns/byte |              byte/s |    err% |     total | benchmark
|--------------------:|--------------------:|--------:|----------:|:----------
|               39.68 |       25,201,342.83 |    0.2% |      0.01 | `Pow_Echo512_0032b`
|               16.09 |       62,143,122.31 |    1.2% |      0.01 | `Pow_Echo512_0080b`
|               19.63 |       50,943,039.85 |    0.2% |      0.01 | `Pow_Echo512_0128b`
|               12.24 |       81,674,975.07 |    0.9% |      0.01 | `Pow_Echo512_0512b`
|               10.94 |       91,442,178.90 |    0.4% |      0.01 | `Pow_Echo512_1024b`
|               10.30 |       97,063,958.41 |    0.3% |      0.01 | `Pow_Echo512_2048b`
|                9.72 |      102,906,236.78 |    0.4% |      0.11 | `Pow_Echo512_1M`

Shavite512:

|             ns/byte |              byte/s |    err% |     total | benchmark
|--------------------:|--------------------:|--------:|----------:|:----------
|               15.18 |       65,869,311.23 |    0.5% |      0.01 | `Pow_Shavite512_0032b`
|                6.08 |      164,380,032.21 |    0.5% |      0.01 | `Pow_Shavite512_0080b`
|                7.50 |      133,291,221.55 |    0.4% |      0.01 | `Pow_Shavite512_0128b`
|                4.69 |      213,338,740.34 |    0.6% |      0.01 | `Pow_Shavite512_0512b`
|                4.25 |      235,339,069.77 |    0.8% |      0.01 | `Pow_Shavite512_1024b`
|                4.02 |      248,784,999.42 |    0.7% |      0.01 | `Pow_Shavite512_2048b`
|                3.75 |      266,841,572.42 |    0.4% |      0.04 | `Pow_Shavite512_1M`

X11:

|             ns/byte |              byte/s |    err% |     total | benchmark
|--------------------:|--------------------:|--------:|----------:|:----------
|              249.10 |        4,014,450.39 |    0.2% |      0.01 | `Pow_X11_0032b`
|              100.76 |        9,924,687.87 |    0.7% |      0.01 | `Pow_X11_0080b`
|               63.72 |       15,694,749.39 |    0.4% |      0.01 | `Pow_X11_0128b`
|               16.99 |       58,861,757.72 |    0.3% |      0.01 | `Pow_X11_0512b`
|                9.27 |      107,817,565.81 |    0.2% |      0.01 | `Pow_X11_1024b`
|                5.28 |      189,479,026.11 |    0.8% |      0.01 | `Pow_X11_2048b`
|                1.37 |      731,462,009.69 |    0.7% |      0.01 | `Pow_X11_1M`

Optimized

Echo512:

|             ns/byte |              byte/s |    err% |     total | benchmark
|--------------------:|--------------------:|--------:|----------:|:----------
|               14.34 |       69,741,448.00 |    0.8% |      0.01 | `Pow_Echo512_0032b`
|                5.68 |      176,124,721.60 |    0.0% |      0.01 | `Pow_Echo512_0080b`
|                7.32 |      136,556,992.52 |    1.5% |      0.01 | `Pow_Echo512_0128b`
|                4.31 |      232,035,714.32 |    0.1% |      0.01 | `Pow_Echo512_0512b`
|                3.97 |      251,961,418.41 |    0.8% |      0.01 | `Pow_Echo512_1024b`
|                3.62 |      276,147,242.57 |    0.1% |      0.01 | `Pow_Echo512_2048b`
|                3.43 |      291,830,314.57 |    0.4% |      0.82 | `Pow_Echo512_1M`

Shavite512:

|             ns/byte |              byte/s |    err% |     total | benchmark
|--------------------:|--------------------:|--------:|----------:|:----------
|               17.16 |       58,273,414.75 |    0.9% |      0.01 | `Pow_Shavite512_0032b`
|                6.81 |      146,755,834.13 |    0.2% |      0.01 | `Pow_Shavite512_0080b`
|                8.34 |      119,878,248.65 |    1.2% |      0.01 | `Pow_Shavite512_0128b`
|                5.05 |      197,863,716.10 |    0.7% |      0.01 | `Pow_Shavite512_0512b`
|                4.55 |      219,843,493.16 |    0.8% |      0.01 | `Pow_Shavite512_1024b`
|                4.40 |      227,519,688.25 |    0.5% |      0.01 | `Pow_Shavite512_2048b`
|                3.79 |      263,911,928.82 |    0.1% |      0.91 | `Pow_Shavite512_1M`

X11:

|             ns/byte |              byte/s |    err% |     total | benchmark
|--------------------:|--------------------:|--------:|----------:|:----------
|              224.98 |        4,444,913.98 |    1.1% |      0.01 | `Pow_X11_0032b`
|               90.76 |       11,017,721.52 |    1.7% |      0.01 | `Pow_X11_0080b`
|               57.66 |       17,343,058.47 |    0.6% |      0.01 | `Pow_X11_0128b`
|               15.79 |       63,347,591.25 |    0.3% |      0.01 | `Pow_X11_0512b`
|                8.39 |      119,204,486.51 |    0.7% |      0.01 | `Pow_X11_1024b`
|                4.98 |      200,909,248.89 |    1.1% |      0.01 | `Pow_X11_2048b`
|                1.33 |      751,189,042.44 |    0.2% |      0.32 | `Pow_X11_1M`

AMD Ryzen 5 5600G, Ubuntu 24.04 (Noble)

Build Information: Ubuntu Clang 18.1.8
Depends Config: MULTIPROCESS=1 CC=clang-18 CXX=clang++-18 HOST=x86_64-pc-linux-gnu
Build Config: CC=clang-18 CXX=clang++-18 CFLAGS="-O2 -g" LDFLAGS="-Wl,--as-needed -Wl,-O2" ./configure --prefix=$(pwd)/depends/x86_64-pc-linux-gnu --enable-reduce-exports --without-gui --disable-fuzz-binary --disable-maintainer-mode --disable-dependency-tracking --disable-ccache

Baseline

Echo512:

|             ns/byte |              byte/s |    err% |        ins/byte |        cyc/byte |    IPC |       bra/byte |   miss% |     total | benchmark
|--------------------:|--------------------:|--------:|----------------:|----------------:|-------:|---------------:|--------:|----------:|:----------
|               51.81 |       19,300,083.40 |    0.2% |          916.31 |          230.13 |  3.982 |           6.19 |    0.5% |      0.01 | `Pow_Echo512_0032b`
|               20.72 |       48,266,658.57 |    0.1% |          366.56 |           92.01 |  3.984 |           2.49 |    0.5% |      0.01 | `Pow_Echo512_0080b`
|               25.67 |       38,953,174.42 |    0.1% |          456.97 |          113.97 |  4.009 |           2.95 |    0.5% |      0.01 | `Pow_Echo512_0128b`
|               16.00 |       62,491,477.70 |    0.1% |          285.25 |           71.06 |  4.014 |           1.81 |    0.6% |      0.01 | `Pow_Echo512_0512b`
|               14.38 |       69,562,745.47 |    0.1% |          256.63 |           63.83 |  4.020 |           1.63 |    0.6% |      0.01 | `Pow_Echo512_1024b`
|               13.65 |       73,272,852.80 |    0.1% |          242.32 |           60.27 |  4.020 |           1.53 |    0.6% |      0.01 | `Pow_Echo512_2048b`
|               12.84 |       77,877,131.11 |    0.0% |          228.02 |           56.70 |  4.022 |           1.44 |    0.6% |      3.07 | `Pow_Echo512_1M`

Shavite512:

|             ns/byte |              byte/s |    err% |        ins/byte |        cyc/byte |    IPC |       bra/byte |   miss% |     total | benchmark
|--------------------:|--------------------:|--------:|----------------:|----------------:|-------:|---------------:|--------:|----------:|:----------
|               26.75 |       37,379,255.99 |    0.0% |          423.31 |          118.82 |  3.563 |           1.41 |    0.0% |      0.01 | `Pow_Shavite512_0032b`
|               10.71 |       93,360,115.43 |    0.0% |          169.35 |           47.57 |  3.560 |           0.58 |    0.0% |      0.01 | `Pow_Shavite512_0080b`
|               13.26 |       75,436,781.19 |    0.0% |          210.04 |           58.87 |  3.568 |           0.45 |    0.0% |      0.01 | `Pow_Shavite512_0128b`
|                8.26 |      121,051,101.55 |    0.0% |          130.99 |           36.69 |  3.570 |           0.27 |    0.0% |      0.01 | `Pow_Shavite512_0512b`
|                7.44 |      134,444,187.57 |    0.0% |          117.82 |           33.04 |  3.566 |           0.24 |    0.4% |      0.01 | `Pow_Shavite512_1024b`
|                7.06 |      141,686,987.23 |    0.2% |          111.23 |           31.18 |  3.568 |           0.23 |    0.2% |      0.01 | `Pow_Shavite512_2048b`
|                6.65 |      150,409,844.26 |    0.1% |          104.65 |           29.37 |  3.563 |           0.21 |    0.0% |      1.60 | `Pow_Shavite512_1M`

X11:

|             ns/byte |              byte/s |    err% |        ins/byte |        cyc/byte |    IPC |       bra/byte |   miss% |     total | benchmark
|--------------------:|--------------------:|--------:|----------------:|----------------:|-------:|---------------:|--------:|----------:|:----------
|              326.13 |        3,066,305.67 |    0.0% |        5,979.73 |        1,448.47 |  4.128 |          41.00 |    0.1% |      0.01 | `Pow_X11_0032b`
|              130.47 |        7,664,483.17 |    0.0% |        2,391.79 |          579.50 |  4.127 |          16.38 |    0.1% |      0.01 | `Pow_X11_0080b`
|               83.00 |       12,048,837.15 |    0.0% |        1,516.78 |          368.62 |  4.115 |          10.30 |    0.1% |      0.01 | `Pow_X11_0128b`
|               21.78 |       45,920,017.94 |    0.0% |          395.29 |           96.72 |  4.087 |           2.64 |    0.1% |      0.01 | `Pow_X11_0512b`
|               11.50 |       86,977,849.93 |    0.0% |          208.38 |           51.07 |  4.081 |           1.36 |    0.1% |      0.01 | `Pow_X11_1024b`
|                6.38 |      156,826,545.04 |    0.0% |          114.92 |           28.16 |  4.081 |           0.72 |    0.1% |      0.01 | `Pow_X11_2048b`
|                1.21 |      827,082,388.48 |    0.1% |           21.65 |            5.34 |  4.052 |           0.09 |    0.0% |      0.29 | `Pow_X11_1M`

Optimized

Echo512:

|             ns/byte |              byte/s |    err% |        ins/byte |        cyc/byte |    IPC |       bra/byte |   miss% |     total | benchmark
|--------------------:|--------------------:|--------:|----------------:|----------------:|-------:|---------------:|--------:|----------:|:----------
|               17.52 |       57,085,952.31 |    0.0% |          166.19 |           77.77 |  2.137 |           7.88 |    0.0% |      0.01 | `Pow_Echo512_0032b`
|                7.01 |      142,732,714.40 |    0.1% |           66.51 |           31.10 |  2.139 |           3.16 |    0.0% |      0.01 | `Pow_Echo512_0080b`
|                8.69 |      115,084,427.41 |    0.1% |           81.58 |           38.56 |  2.116 |           3.76 |    0.0% |      0.01 | `Pow_Echo512_0128b`
|                5.36 |      186,499,101.72 |    0.1% |           50.51 |           23.79 |  2.123 |           2.31 |    0.1% |      0.01 | `Pow_Echo512_0512b`
|                4.79 |      208,614,198.78 |    0.1% |           45.33 |           21.29 |  2.129 |           2.07 |    0.0% |      0.01 | `Pow_Echo512_1024b`
|                4.51 |      221,584,256.28 |    0.0% |           42.74 |           20.04 |  2.133 |           1.95 |    0.0% |      0.01 | `Pow_Echo512_2048b`
|                4.27 |      234,154,680.31 |    0.0% |           40.15 |           18.95 |  2.119 |           1.83 |    0.0% |      1.02 | `Pow_Echo512_1M`

Shavite512:

|             ns/byte |              byte/s |    err% |        ins/byte |        cyc/byte |    IPC |       bra/byte |   miss% |     total | benchmark
|--------------------:|--------------------:|--------:|----------------:|----------------:|-------:|---------------:|--------:|----------:|:----------
|               10.12 |       98,853,454.39 |    0.5% |           69.47 |           44.93 |  1.546 |           8.38 |    0.0% |      0.01 | `Pow_Shavite512_0032b`
|                3.98 |      251,044,756.51 |    0.1% |           27.81 |           17.69 |  1.572 |           3.36 |    0.0% |      0.01 | `Pow_Shavite512_0080b`
|                4.89 |      204,696,173.21 |    0.1% |           32.78 |           21.69 |  1.511 |           3.90 |    0.0% |      0.01 | `Pow_Shavite512_0128b`
|                3.01 |      332,315,588.24 |    0.1% |           20.08 |           13.37 |  1.502 |           2.42 |    0.1% |      0.01 | `Pow_Shavite512_0512b`
|                2.70 |      369,999,483.82 |    0.2% |           17.96 |           12.00 |  1.496 |           2.17 |    0.0% |      0.01 | `Pow_Shavite512_1024b`
|                2.55 |      392,427,831.38 |    0.0% |           16.90 |           11.32 |  1.493 |           2.05 |    0.0% |      0.01 | `Pow_Shavite512_2048b`
|                2.39 |      418,541,658.56 |    0.0% |           15.84 |           10.60 |  1.494 |           1.92 |    0.0% |      0.57 | `Pow_Shavite512_1M`

X11:

|             ns/byte |              byte/s |    err% |        ins/byte |        cyc/byte |    IPC |       bra/byte |   miss% |     total | benchmark
|--------------------:|--------------------:|--------:|----------------:|----------------:|-------:|---------------:|--------:|----------:|:----------
|              271.74 |        3,679,981.60 |    0.1% |        4,875.78 |        1,206.89 |  4.040 |          49.66 |    0.0% |      0.01 | `Pow_X11_0032b`
|              108.70 |        9,199,814.12 |    0.2% |        1,950.20 |          482.28 |  4.044 |          19.84 |    0.0% |      0.01 | `Pow_X11_0080b`
|               69.40 |       14,408,573.10 |    0.2% |        1,240.79 |          308.15 |  4.027 |          12.47 |    0.0% |      0.01 | `Pow_X11_0128b`
|               18.33 |       54,546,702.44 |    0.1% |          326.29 |           81.43 |  4.007 |           3.18 |    0.0% |      0.01 | `Pow_X11_0512b`
|                9.78 |      102,262,967.62 |    0.2% |          173.88 |           43.39 |  4.007 |           1.63 |    0.0% |      0.01 | `Pow_X11_1024b`
|                5.48 |      182,512,076.76 |    0.1% |           97.67 |           24.33 |  4.015 |           0.86 |    0.0% |      0.01 | `Pow_X11_2048b`
|                1.20 |      835,292,126.52 |    0.0% |           21.62 |            5.31 |  4.068 |           0.09 |    0.0% |      0.29 | `Pow_X11_1M`

Breaking Changes

Platforms that partially support the set of extensions for x86_64 (SSSE3, SSE4.1 and AES-NI) or ARMv7/ARMv8 (NEON and crypto extensions for AES) may experience different performance characteristics.

The following cases may experience performance degradation due to moving to a small-footprint variant and the overhead (and lost optimization opportunity) incurred by runtime dispatching.

Platforms other than x86_64 or ARMv7 and above (there are no backends that implement optimized routines for them)
x86_64 or ARMv7/ARMv8, without extensions (i.e. older chips, OS/Hypervisor-level disablement of extensions)
Operating systems other than Windows, macOS, Linux and FreeBSD (there are no dispatch routines for other platforms)
Using libdashconsensus (optimizations are disabled wholesale for the library following upstream behavior)

Checklist

I have performed a self-review of my own code
I have commented my code, particularly in hard-to-understand areas
I have added or updated relevant unit/integration/functional/e2e tests (note: N/A)
I have made corresponding changes to the documentation (note: N/A)
I have assigned this pull request to a milestone (for repository code-owners and collaborators only)

github-actions · 2025-09-21T17:25:15Z

✅ No Merge Conflicts Detected

This PR currently has no conflicts with other open PRs.

reverts: - f300277 (logically)

coderabbitai · 2025-09-22T09:16:23Z

Walkthrough

Adds an X11 crypto subsystem with runtime dispatch (SapphireAutoDetect) and many architecture-specific backends (ARM AES/NEON, x86 SSSE3/SSE4.1/AVX2/AES-NI/SHA-NI). Build system updates (configure.ac, Makefile.am) probe CPU features and expose per-feature flags and SPHLIB_FLAGS. Several C crypto sources were replaced or moved to C++ (ECHO and SHAvite reworked), new AES utilities and T‑boxes added, and multiple optimized implementations (echo/shavite) for different ISAs introduced. Public APIs for ECHO/SHAvite contexts were tightened to typed context structs. Bench, init, and tests call SapphireAutoDetect; lint excludes the new crypto/x11 tree.

Estimated code review effort

🎯 5 (Critical) | ⏱️ ~120 minutes

High heterogeneity: build-system probes, many new feature flags, and large cross-cutting changes.
Dense, security‑sensitive logic: AES/T-box generation, SHAvite/ECHO algorithm rewrites, endian/packing handling.
Multiple ISA‑specific, intrinsics‑heavy files (ARM NEON, ARM AES, SSSE3, AES‑NI) requiring platform knowledge.
API/signature changes and file removals (C→C++ conversions) affect public headers and linkage; needs compatibility and ABI checks.

Pre-merge checks and finishing touches

❌ Failed checks (1 warning)

Check name	Status	Explanation	Resolution
Docstring Coverage	⚠️ Warning	Docstring coverage is 16.67% which is insufficient. The required threshold is 80.00%.	You can run `@coderabbitai generate docstrings` to improve docstring coverage.

✅ Passed checks (2 passed)

Check name	Status	Explanation
Title Check	✅ Passed	The title accurately summarizes the primary changes in the branch: it adds a runtime dispatcher, implements hardware-accelerated Echo512 and Shavite512 code paths, and notes benchmark-stability adjustments; this directly matches the changeset and is specific enough for a reviewer to understand the main intent. It is a single clear sentence without irrelevant noise.
Description Check	✅ Passed	The pull request description is directly related to the changeset: it documents build-system/configure changes, the runtime dispatcher, hardware-accelerated Echo512/Shavite512 implementations, UBSan alignment handling, macOS detection quirks, benchmark results, and known breaking/performance impacts, all of which appear in the file summaries. The level of detail is specific and actionable and clearly maps to the modified files and objectives.

✨ Finishing touches

📝 Generate Docstrings

🧪 Generate unit tests

Create PR with unit tests
Post copyable unit tests in a comment

Tip

👮 Agentic pre-merge checks are now available in preview!

Pro plan users can now enable pre-merge checks in their settings to enforce checklists before merging PRs.

Built-in checks – Quickly apply ready-made checks to enforce title conventions, require pull request descriptions that follow templates, validate linked issues for compliance, and more.
Custom agentic checks – Define your own rules using CodeRabbit’s advanced agentic capabilities to enforce organization-specific policies and workflows. For example, you can instruct CodeRabbit’s agent to verify that API documentation is updated whenever API schema files are modified in a PR. Note: Upto 5 custom checks are currently allowed during the preview period. Pricing for this feature will be announced in a few weeks.

Please see the documentation for more information.

Example:

reviews:
  pre_merge_checks:
    custom_checks:
      - name: "Undocumented Breaking Changes"
        mode: "warning"
        instructions: |
          Pass/fail criteria: All breaking changes to public APIs, CLI flags, environment variables, configuration keys, database schemas, or HTTP/GraphQL endpoints must be documented in the "Breaking Change" section of the PR description and in CHANGELOG.md. Exclude purely internal or private changes (e.g., code not exported from package entry points or explicitly marked as internal).

Please share your feedback with us on this Discord post.

Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out.

❤️ Share

_{Comment @coderabbitai help to get the list of available commands and usage tips.}

coderabbitai

Actionable comments posted: 6

🧹 Nitpick comments (12)

test/lint/lint-whitespace.py (1)
21-21: Add trailing slash to the new exclude and confirm scope.

Use a trailing "/" for consistency with most entries and to avoid pathspec edge cases. Also, excluding the entire directory may hide regressions; consider limiting exclusion to the specific subdirs/files that require it, or split tab vs. trailing-whitespace excludes if that was the intent.

Apply this diff:
-                 "src/crypto/x11",
+                 "src/crypto/x11/",
src/crypto/x11/sph_types.h (1)

336-342: Tighten the gating condition (portability/readability).

Use #if defined(SPH_UPTR) && defined(SPH_UNALIGNED) && SPH_UNALIGNED to avoid relying on undefined-macro-to-0 semantics. Also consider #if __has_attribute(no_sanitize) around the expansion site so this doesn’t expand on compilers lacking the attribute.
src/bench/bench_bitcoin.cpp (1)
65-66: Consider adding error handling for auto-detection.

While the current implementation is functional, consider adding error handling or logging for the auto-detection process, similar to how SHA256AutoDetect returns a string describing the selected implementation.

Consider enhancing with diagnostic output:
-    SapphireAutoDetect();
-    SHA256AutoDetect();
+    SapphireAutoDetect();
+    std::string sha256_algo = SHA256AutoDetect();
+    if (!argsman.GetBoolArg("-sanity-check", false)) {
+        tfm::format(std::cout, "Using the '%s' SHA256 implementation\n", sha256_algo);
+    }
This would provide visibility into which implementations are being benchmarked, helpful for performance analysis.
src/crypto/x11/dispatch.cpp (1)
111-113: Consider validating both SSE4.1 and AES-NI are actually available together

While the code checks both bits, it might be worth adding a comment explaining that AES-NI requires SSE4.1 as a prerequisite on x86 platforms.
 #if defined(ENABLE_SSE41) && defined(ENABLE_X86_AESNI)
     const bool use_sse_4_1 = ((ecx >> 19) & 1);
     const bool use_aes_ni = ((ecx >> 25) & 1);
+    // AES-NI instructions require SSE4.1 support as a prerequisite
     if (use_sse_4_1 && use_aes_ni) {
src/crypto/x11/arm_neon/echo.cpp (2)
53-79: Consider using std::swap for clarity in shift operations

While the current implementation is correct, using std::swap could make the circular shift operations more readable and potentially allow for compiler optimizations.
+#include <utility>
+
 void ALWAYS_INLINE ShiftRow1(uint8x16_t& Wa, uint8x16_t& Wb, uint8x16_t& Wc, uint8x16_t& Wd)
 {
-    uint8x16_t tmp = Wa;
-    Wa = Wb;
-    Wb = Wc;
-    Wc = Wd;
-    Wd = tmp;
+    uint8x16_t tmp = std::move(Wa);
+    Wa = std::move(Wb);
+    Wb = std::move(Wc);
+    Wc = std::move(Wd);
+    Wd = std::move(tmp);
 }
86-127: Consider loop-based load/store to reduce code duplication

The repetitive load and store operations could be simplified with a loop, improving maintainability without sacrificing performance.
-    w[0] = vreinterpretq_u8_u64(vld1q_u64(&W[0][0]));
-    w[1] = vreinterpretq_u8_u64(vld1q_u64(&W[1][0]));
-    // ... 14 more lines
-    w[15] = vreinterpretq_u8_u64(vld1q_u64(&W[15][0]));
+    for (int i = 0; i < 16; i++) {
+        w[i] = vreinterpretq_u8_u64(vld1q_u64(&W[i][0]));
+    }

     // ... transformation code ...

-    vst1q_u64(&W[0][0], vreinterpretq_u64_u8(w[0]));
-    vst1q_u64(&W[1][0], vreinterpretq_u64_u8(w[1]));
-    // ... 14 more lines
-    vst1q_u64(&W[15][0], vreinterpretq_u64_u8(w[15]));
+    for (int i = 0; i < 16; i++) {
+        vst1q_u64(&W[i][0], vreinterpretq_u64_u8(w[i]));
+    }
src/crypto/x11/ssse3/echo.cpp (1)
86-101: Consider using loops to reduce code duplication

Similar to the ARM NEON version, the repetitive load/store operations could benefit from loop-based implementation.
-    w[0] = _mm_load_si128((const __m128i*)&W[0][0]);
-    w[1] = _mm_load_si128((const __m128i*)&W[1][0]);
-    // ... repeated 14 more times
-    w[15] = _mm_load_si128((const __m128i*)&W[15][0]);
+    for (int i = 0; i < 16; i++) {
+        w[i] = _mm_load_si128((const __m128i*)&W[i][0]);
+    }

     // ... transformation code ...

-    _mm_store_si128((__m128i*)&W[0][0], w[0]);
-    _mm_store_si128((__m128i*)&W[1][0], w[1]);
-    // ... repeated 14 more times
-    _mm_store_si128((__m128i*)&W[15][0], w[15]);
+    for (int i = 0; i < 16; i++) {
+        _mm_store_si128((__m128i*)&W[i][0], w[i]);
+    }
Also applies to: 112-127
src/crypto/x11/dispatch.h (1)
5-6: Header guard naming convention

The header guard uses BITCOIN_ prefix but this is Dash Core code. Consider using DASH_ prefix for consistency.
-#ifndef BITCOIN_CRYPTO_X11_DISPATCH_H
-#define BITCOIN_CRYPTO_X11_DISPATCH_H
+#ifndef DASH_CRYPTO_X11_DISPATCH_H
+#define DASH_CRYPTO_X11_DISPATCH_H
And at the end:
-#endif // BITCOIN_CRYPTO_X11_DISPATCH_H
+#endif // DASH_CRYPTO_X11_DISPATCH_H
src/crypto/x11/shavite.cpp (1)

102-106: Minor: namespace closing comment typo

The closing comment says // namespace shavite_soft but the namespace is soft_shavite. Harmless, but please fix to avoid confusion in future diffs.

Also applies to: 246-248

src/crypto/x11/util/consts_aes.hpp (1)

65-77: T-tables look correct; one small robustness nit

Using std::rotl on uint32_t words to build all four tables is neat. Consider documenting the polynomial (0x11B) and generator (3) choice near gf8_pow/log for maintainers. No change required.

Also applies to: 78-82

configure.ac (1)

680-697: Feature probes look fine; gate cross-compilation assumptions

ARM AES/SHA and CRC tests include intrinsics that require specific -march flags. When cross-compiling, these probes can mis-detect if the target flags are incompatible with the build host. Consider using AC_TRY_LINK with explicit -target or honoring user-provided CXXFLAGS to bypass probes when cross-compiling.

Also applies to: 739-755
src/crypto/x11/util/util.hpp (1)
34-48: Consider adding static_assert for type sizes.

While the implementations look correct, adding compile-time assertions would provide additional safety guarantees for the SIMD types being used.

Add static assertions after the includes to verify expected sizes:
 #if defined(ENABLE_ARM_AES) || defined(ENABLE_ARM_NEON)
 #include <arm_neon.h>
+static_assert(sizeof(uint8x16_t) == 16, "uint8x16_t must be 16 bytes");
+static_assert(sizeof(uint32x4_t) == 16, "uint32x4_t must be 16 bytes");
 #endif // ENABLE_ARM_AES || ENABLE_ARM_NEON
 
 #if defined(ENABLE_SSSE3) || (defined(ENABLE_SSE41) && defined(ENABLE_X86_AESNI))
 #include <immintrin.h>
+static_assert(sizeof(__m128i) == 16, "__m128i must be 16 bytes");
 #endif // ENABLE_SSSE3 || (ENABLE_SSE41 && ENABLE_X86_AESNI)
Also applies to: 67-83

📜 Review details

Configuration used: CodeRabbit UI

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 893b46a and 58965d4.

📒 Files selected for processing (26)

configure.ac (7 hunks)
src/Makefile.am (4 hunks)
src/bench/bench_bitcoin.cpp (2 hunks)
src/bench/pow_hash.cpp (12 hunks)
src/crypto/x11/aes.cpp (1 hunks)
src/crypto/x11/aes.h (1 hunks)
src/crypto/x11/aes_helper.c (0 hunks)
src/crypto/x11/arm_crypto/echo.cpp (1 hunks)
src/crypto/x11/arm_crypto/shavite.cpp (1 hunks)
src/crypto/x11/arm_neon/echo.cpp (1 hunks)
src/crypto/x11/dispatch.cpp (1 hunks)
src/crypto/x11/dispatch.h (1 hunks)
src/crypto/x11/echo.cpp (4 hunks)
src/crypto/x11/shavite.c (0 hunks)
src/crypto/x11/shavite.cpp (1 hunks)
src/crypto/x11/sph_echo.h (4 hunks)
src/crypto/x11/sph_shavite.h (4 hunks)
src/crypto/x11/sph_types.h (9 hunks)
src/crypto/x11/ssse3/echo.cpp (1 hunks)
src/crypto/x11/util/consts_aes.hpp (1 hunks)
src/crypto/x11/util/util.hpp (1 hunks)
src/crypto/x11/x86_aesni/echo.cpp (1 hunks)
src/crypto/x11/x86_aesni/shavite.cpp (1 hunks)
src/init/common.cpp (2 hunks)
src/test/util/setup_common.cpp (2 hunks)
test/lint/lint-whitespace.py (1 hunks)

💤 Files with no reviewable changes (2)

src/crypto/x11/shavite.c
src/crypto/x11/aes_helper.c

🧰 Additional context used

📓 Path-based instructions (4)

src/crypto/{ctaes,x11}/**

📄 CodeRabbit inference engine (CLAUDE.md)

Do not make changes under any circumstances to vendored dependencies in src/crypto/ctaes and src/crypto/x11

Files:

src/crypto/x11/aes.cpp
src/crypto/x11/dispatch.cpp
src/crypto/x11/x86_aesni/shavite.cpp
src/crypto/x11/arm_crypto/echo.cpp
src/crypto/x11/dispatch.h
src/crypto/x11/arm_crypto/shavite.cpp
src/crypto/x11/arm_neon/echo.cpp
src/crypto/x11/sph_echo.h
src/crypto/x11/aes.h
src/crypto/x11/x86_aesni/echo.cpp
src/crypto/x11/sph_shavite.h
src/crypto/x11/echo.cpp
src/crypto/x11/util/util.hpp
src/crypto/x11/ssse3/echo.cpp
src/crypto/x11/shavite.cpp
src/crypto/x11/util/consts_aes.hpp
src/crypto/x11/sph_types.h

src/**/*.{cpp,h,cc,cxx,hpp}

📄 CodeRabbit inference engine (CLAUDE.md)

src/**/*.{cpp,h,cc,cxx,hpp}: Dash Core C++ codebase must be written in C++20 and require at least Clang 16 or GCC 11.1
Dash uses unordered_lru_cache for efficient caching with LRU eviction

Files:

src/crypto/x11/aes.cpp
src/bench/pow_hash.cpp
src/crypto/x11/dispatch.cpp
src/crypto/x11/x86_aesni/shavite.cpp
src/crypto/x11/arm_crypto/echo.cpp
src/bench/bench_bitcoin.cpp
src/crypto/x11/dispatch.h
src/crypto/x11/arm_crypto/shavite.cpp
src/init/common.cpp
src/test/util/setup_common.cpp
src/crypto/x11/arm_neon/echo.cpp
src/crypto/x11/sph_echo.h
src/crypto/x11/aes.h
src/crypto/x11/x86_aesni/echo.cpp
src/crypto/x11/sph_shavite.h
src/crypto/x11/echo.cpp
src/crypto/x11/util/util.hpp
src/crypto/x11/ssse3/echo.cpp
src/crypto/x11/shavite.cpp
src/crypto/x11/util/consts_aes.hpp
src/crypto/x11/sph_types.h

src/bench/**/*.{cpp,h,cc,cxx,hpp}

📄 CodeRabbit inference engine (CLAUDE.md)

Performance benchmarks should be placed in src/bench/ and use nanobench

Files:

src/bench/pow_hash.cpp
src/bench/bench_bitcoin.cpp

src/{test,wallet/test,qt/test}/**/*.{cpp,h,cc,cxx,hpp}

📄 CodeRabbit inference engine (CLAUDE.md)

Unit tests for C++ code should be placed in src/test/, src/wallet/test/, or src/qt/test/ and use Boost::Test or Qt 5 for GUI tests

Files:

src/test/util/setup_common.cpp

🧠 Learnings (1)

📚 Learning: 2025-01-06T09:51:03.167Z

Learnt from: kwvg
PR: dashpay/dash#6516
File: depends/patches/gmp/include_ldflags_in_configure.patch:557-621
Timestamp: 2025-01-06T09:51:03.167Z
Learning: The `GMP_GCC_ARM_UMODSI` macro checks only the compiler version, and `GMP_GCC_MIPS_O32` relies on the `-mabi=32` flag. Therefore, `$LDFLAGS` is irrelevant to these tests.

Applied to files:

configure.ac

🧬 Code graph analysis (17)

src/crypto/x11/dispatch.cpp (3)

src/crypto/x11/echo.cpp (4)

FullStateRound (123-145)

FullStateRound (123-123)

ShiftAndMix (147-154)

ShiftAndMix (147-147)

src/crypto/x11/shavite.cpp (2)

Compress (106-246)

Compress (106-106)

src/compat/cpuid.h (1)

GetCPUID (16-23)

src/crypto/x11/x86_aesni/shavite.cpp (2)

src/crypto/x11/arm_crypto/shavite.cpp (2)

CompressElement (16-30)

CompressElement (16-17)

src/crypto/x11/shavite.cpp (2)

CompressElement (72-99)

CompressElement (72-73)

src/crypto/x11/arm_crypto/echo.cpp (1)

src/crypto/x11/x86_aesni/echo.cpp (2)

StateRound (15-30)

StateRound (15-15)

src/bench/bench_bitcoin.cpp (1)

src/crypto/x11/dispatch.cpp (2)

SapphireAutoDetect (100-180)

SapphireAutoDetect (100-100)

src/crypto/x11/dispatch.h (1)

src/crypto/x11/dispatch.cpp (2)

SapphireAutoDetect (100-180)

SapphireAutoDetect (100-100)

src/crypto/x11/arm_crypto/shavite.cpp (2)

src/crypto/x11/shavite.cpp (2)

CompressElement (72-99)

CompressElement (72-73)

src/crypto/x11/x86_aesni/shavite.cpp (2)

CompressElement (17-30)

CompressElement (17-18)

src/init/common.cpp (1)

src/crypto/x11/dispatch.cpp (2)

SapphireAutoDetect (100-180)

SapphireAutoDetect (100-100)

src/test/util/setup_common.cpp (1)

src/crypto/x11/dispatch.cpp (2)

SapphireAutoDetect (100-180)

SapphireAutoDetect (100-100)

src/crypto/x11/arm_neon/echo.cpp (1)

src/crypto/x11/util/util.hpp (4)

Xor (34-34)

Xor (34-34)

Xor (67-67)

Xor (67-67)

src/crypto/x11/sph_echo.h (1)

src/crypto/x11/echo.cpp (6)

sph_echo512_init (327-331)

sph_echo512_init (328-328)

sph_echo512 (334-338)

sph_echo512 (335-335)

sph_echo512_close (341-345)

sph_echo512_close (342-342)

src/crypto/x11/aes.h (1)

src/crypto/x11/dispatch.h (1)

sapphire (12-19)

src/crypto/x11/x86_aesni/echo.cpp (1)

src/crypto/x11/arm_crypto/echo.cpp (2)

StateRound (14-29)

StateRound (14-14)

src/crypto/x11/sph_shavite.h (1)

src/crypto/x11/shavite.cpp (6)

sph_shavite512_init (342-346)

sph_shavite512_init (343-343)

sph_shavite512 (349-353)

sph_shavite512 (350-350)

sph_shavite512_close (356-361)

sph_shavite512_close (357-357)

src/crypto/x11/echo.cpp (4)

src/crypto/x11/ssse3/echo.cpp (4)

MixColumn (28-51)

MixColumn (28-28)

ShiftAndMix (83-128)

ShiftAndMix (83-83)

src/crypto/x11/aes.cpp (4)

Round (12-21)

Round (12-14)

RoundKeyless (23-27)

RoundKeyless (23-24)

src/crypto/x11/dispatch.cpp (6)

FullStateRound (43-43)

FullStateRound (64-64)

FullStateRound (73-73)

ShiftAndMix (52-52)

ShiftAndMix (58-58)

ShiftAndMix (74-74)

src/crypto/x11/x86_aesni/echo.cpp (2)

FullStateRound (34-68)

FullStateRound (34-34)

src/crypto/x11/ssse3/echo.cpp (2)

src/crypto/x11/util/util.hpp (4)

Xor (34-34)

Xor (34-34)

Xor (67-67)

Xor (67-67)

src/crypto/x11/echo.cpp (2)

MixColumn (56-77)

MixColumn (56-56)

src/crypto/x11/shavite.cpp (2)

src/crypto/x11/aes.cpp (2)

RoundKeyless (23-27)

RoundKeyless (23-24)

src/crypto/x11/sph_types.h (1)

sph_enc32le (605-633)

src/crypto/x11/util/consts_aes.hpp (1)

src/crypto/x11/util/util.hpp (6)

pack_le (24-30)

pack_le (24-24)

pack_le (36-39)

pack_le (36-36)

pack_le (72-75)

pack_le (72-72)

⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (2)

GitHub Check: linux64_tsan-test / Test source
GitHub Check: linux64_multiprocess-build / Build source

🔇 Additional comments (44)

src/crypto/x11/sph_types.h (1)

485-486: Attribute placement is valid — UBSan CI uses Clang

ci/test/00_setup_env_native_ubsan.sh sets BITCOIN_CONFIG with --with-sanitizers=undefined and CC=clang-18 CXX=clang++-18 (line 14), so UBSan jobs in CI use Clang; no GCC-based UBSan job was found in the inspected CI scripts.

src/bench/pow_hash.cpp (1)

31-31: Good addition for benchmark stability.

The consistent addition of minEpochIterations(20) across all benchmark functions is a solid improvement for measurement stability. This ensures each benchmark runs for at least 20 epochs before collecting results, which helps reduce variance from startup effects and provides more reliable measurements.

Also applies to: 41-41, 53-53, 65-65, 77-77, 89-89, 101-101, 113-113, 125-125, 137-137, 149-149, 161-161

src/test/util/setup_common.cpp (1)

16-16: LGTM! Consistent initialization of Sapphire dispatch.

The addition of SapphireAutoDetect() before SHA256AutoDetect() properly initializes the X11 crypto dispatch mechanism for tests. The ordering is appropriate as it mirrors the production initialization sequence.

Also applies to: 182-182

src/init/common.cpp (1)

12-12: LGTM! Proper initialization order for crypto dispatchers.

The addition of SapphireAutoDetect() at the beginning of SetGlobals() ensures X11 crypto optimizations are detected and initialized early in the startup sequence, before SHA256 detection. This is the correct initialization order.

Also applies to: 30-30

src/bench/bench_bitcoin.cpp (1)

9-9: LGTM! Consistent initialization across all entry points.

The addition of SapphireAutoDetect() before SHA256AutoDetect() ensures benchmarks use the same optimized crypto implementations as production code. This is essential for accurate performance measurements.

Also applies to: 65-65
src/crypto/x11/sph_shavite.h (2)
1-117: Extern "C" linkage removed — verify ABI compatibility

The header src/crypto/x11/sph_shavite.h no longer wraps declarations with extern "C"; this introduces C++ name mangling and can break any C consumers or public-API users.

Confirm no C consumers: grep -R --include='*.c' -n 'sph_shavite.h' .

Confirm no references to these symbols anywhere (repo or installed/public headers): rg -n 'sph_shavite512|sph_shavite512_init|sph_shavite512_close|sph_shavite512_addbits_and_close' . || true

If any C or external/public users exist, re-add the C linkage wrapper:
#ifdef __cplusplus
extern "C" {
#endif
...declarations...
#ifdef __cplusplus
}
#endif

78-78: Type-safety change accepted — callers not verified

Change from void* to sph_shavite512_context* is correct. The provided rg produced no output; unable to confirm all call sites were updated. Re-run these checks locally and paste results:
rg -n -C3 -uu -P "sph_shavite512_(init|close|addbits_and_close)\s*\("
rg -n -uu "sph_shavite512_"
rg -n -uu "sph_shavite512_context"
src/crypto/x11/arm_crypto/echo.cpp (1)

1-72: Vendored crypto implementation - no changes allowed

Per the coding guidelines, files in src/crypto/x11/** are vendored dependencies and should not be modified. This restriction applies to all crypto implementations in this directory.

src/crypto/x11/aes.cpp (1)

1-30: Vendored crypto implementation - no changes allowed

Per the coding guidelines, files in src/crypto/x11/** are vendored dependencies and should not be modified.

src/crypto/x11/aes.h (1)

1-21: Vendored crypto implementation - no changes allowed

Per the coding guidelines, files in src/crypto/x11/** are vendored dependencies and should not be modified.

src/crypto/x11/x86_aesni/shavite.cpp (1)

1-186: Vendored crypto implementation - no changes allowed

Per the coding guidelines, files in src/crypto/x11/** are vendored dependencies and should not be modified.

src/crypto/x11/x86_aesni/echo.cpp (1)

1-73: Vendored crypto implementation - no changes allowed

Per the coding guidelines, files in src/crypto/x11/** are vendored dependencies and should not be modified.

src/crypto/x11/dispatch.cpp (5)

1-2: Copyright year inconsistency flagged

The copyright year states 2025, but based on the PR summary (created 2025-09-21) and current context (September 2025), this appears to be accurate. However, I want to verify this is intentional.

85-90: Good error handling for sysctlbyname

The implementation correctly checks for sysctlbyname success and verifies the value is non-zero, providing proper feature detection on macOS.

100-180: Well-structured platform detection and dispatch initialization

The auto-detection function properly initializes with soft implementations as defaults, then conditionally upgrades to optimized versions based on runtime feature detection. The platform-specific detection code is properly guarded with appropriate preprocessor directives.

132-133: Good coverage for macOS NEON detection quirks

The code properly handles the macOS sysctl naming inconsistency for NEON/AdvSIMD detection, including the less common "hw.optional.arm.AdvSIMD" key. The reference to the cpu_features issue #390 provides valuable context.

96-99: Incorrect — definitions exist for these externs; do not change to static

Definitions found: src/crypto/x11/echo.cpp (lines 158–159) define echo_round and echo_shift_mix; src/crypto/x11/shavite.cpp (line 250) defines shavite_c512. Leave the externs as-is.

Likely an incorrect or invalid review comment.

src/crypto/x11/arm_neon/echo.cpp (2)

15-26: Correct GF(2^8) multiplication implementation

The implementation correctly performs multiplication by x (i.e., multiplication by 2) in GF(2^8) with the reduction polynomial x^8 + x^4 + x^3 + x + 1 (represented by 0x1b). The bit manipulation and conditional XOR are properly implemented.

28-51: Well-optimized MixColumn operation

The MixColumn implementation efficiently uses NEON intrinsics and follows the correct AES MixColumns-like transformation pattern. The operations are properly sequenced and the XOR operations are correctly nested.

src/crypto/x11/ssse3/echo.cpp (2)

15-26: Efficient GF(2^8) multiplication using SSE

Good use of _mm_add_epi8 for left shift (doubling) instead of a shift instruction, which is efficient for byte-wise operations in SSE.

86-127: Resolved — W is 16‑byte aligned where allocated.

The state array is declared with alignas(16) (DECL_STATE_BIG → src/crypto/x11/echo.cpp:162), so uses of _mm_load_si128/_mm_store_si128 are safe.

src/crypto/x11/dispatch.h (1)

14-17: Function pointer typedefs are well-defined

The dispatch function pointer types properly match the expected signatures for the Echo and Shavite operations. Good use of typedefs for clarity.

src/crypto/x11/arm_crypto/shavite.cpp (4)

42-55: Proper endianness handling

The code correctly handles both little-endian and big-endian systems, with appropriate byte-order conversion when needed.

58-119: Complex key schedule generation with proper state injection

The key schedule generation correctly implements the Shavite-512 specification with appropriate counter injections at rounds 32, 164, 316, and 440. The logic matches the reference implementation.

149-155: Macro usage for WROT is appropriate here

While macros are generally discouraged in modern C++, the WROT macro is well-scoped (defined and undefined within the function) and provides clear intent for the rotation operation.

121-123: Verify alignment of vld1q_u32 operation

The vld1q_u32 operation requires proper alignment. Since rk is declared with alignas(16), this should be fine, but verify that the indexing maintains alignment.

The code looks correct since rk is 16-byte aligned and we're loading at 4-word (16-byte) boundaries.

src/crypto/x11/shavite.cpp (2)

250-251: Nice: clear dispatch hook for software backend

Binding shavite_c512 to sapphire::soft_shavite::Compress cleanly separates dispatch from the implementation.

299-339: UBSan alignment: write path uses sph_enc32le; confirm alignment suppression matches SPH_UPTR policy

Given the PR notes about UBSan alignment, this close path writes into a byte buffer via sph_enc32le. That’s fine, but please confirm the build toggles (SPH_UPTR/SPH_UNALIGNED/UBSan suppress) are consistently defined so sanitizer builds won’t flag false positives on platforms that allow unaligned accesses.

src/crypto/x11/util/consts_aes.hpp (1)

48-64: Good: constexpr S-box generation with affine transform

Computing the S-box at compile time and validating with a static_assert improves trust and portability.

configure.ac (2)

381-394: SPHLIB_FLAGS introduced; verify it’s actually used for x11/sph sources

You set SPHLIB_FLAGS (e.g., -O3) and export it via AC_SUBST, but ensure src/Makefile.am consumes it for the relevant objects; otherwise it’s dead.

Also applies to: 1957-1969

1896-1906: AM_CONDITIONALs exported — ensure Makefile.am uses them to split per-ISA objects

These conditionals are defined; verify the per-backend objects/libraries are actually wrapped in the corresponding if ENABLE_* blocks so we don’t compile incompatible TU sets by default.

src/crypto/x11/echo.cpp (4)

123-145: Round/key schedule update logic looks correct

Two AES-based rounds per state element with T32 counter increments per block aligns with ECHO’s spec. Good use of soft_aes::Round/Keyless.

147-155: State mixing is clear; alignment handled

ShiftRows + MixColumn on alignas(16) W is straightforward, and the dispatch indirection via echo_round/echo_shift_mix is clean.

Also applies to: 161-178

164-173: Sanity: FINAL_BIG uses sph_dec64le_aligned on sc->buf — confirm SPH_ macros guard unaligned reads*

This mirrors SPH’s original pattern. Given the PR’s UBSan notes, please confirm sph_dec64le_aligned resolves to a safe path on platforms without guaranteed alignment.

Also applies to: 180-189

328-352: API refactor to typed context is consistent with header

The updated definitions match sph_echo.h’s typed signatures and avoid void*.

src/crypto/x11/sph_echo.h (1)

79-100: ABI change: C linkage removed and signatures tightened — confirm external C consumers

sph_echo512 is defined in src/crypto/x11/echo.cpp and used only from C++ in this repo (src/hash_x11.h, src/bench/pow_hash.cpp); no .c callers found. If this header is part of your public C ABI, restore extern "C" wrappers or document the ABI break in the release notes.

src/crypto/x11/util/util.hpp (3)

24-30: LGTM! Portable byte packing utility is well-implemented.

The constexpr pack_le function correctly implements little-endian packing with appropriate type casts to avoid implicit conversion warnings.

50-62: Efficient ARM AES round implementation.

The implementation correctly emulates x86 AESENC using ARM intrinsics as described in the referenced blog post. The no-key variant (aes_round_nk) appropriately optimizes by skipping the XOR operation.

72-75: Ordering verified — no change required.
_mm_set_epi32(w3, w2, w1, w0) produces 32-bit lanes 0..3 = w0..w3; ARM's uint32x4_t{w0,w1,w2,w3} and the corresponding unpack (extract/lane 0..3) match, so behaviour is consistent across implementations.

src/Makefile.am (5)

747-749: Correct usage of SPHLIB_FLAGS for small footprint compilation.

The SPH crypto library is correctly configured with SPHLIB_FLAGS and small footprint options for CubeHash and JH, which is appropriate for resource-constrained environments.

786-794: Well-structured platform-specific crypto backends.

The new crypto backend libraries are properly configured with:

Correct static linking flags

Appropriate platform-specific compiler flags (ARM_AES, ARM_NEON, SSSE3, X86_AESNI)

Proper preprocessor definitions for feature detection

The separation of concerns with distinct libraries per instruction set is clean and maintainable.

Also applies to: 797-804, 807-814, 817-824

1090-1094: DISABLE_OPTIMIZED_SHA256 correctly applied to libdashconsensus.

The consensus library appropriately disables optimizations to ensure deterministic behavior across platforms, which is critical for consensus code.

754-782: Resolve: X11 crypto sources converted to .cpp — no leftover .c references found.

Search shows aes.cpp, echo.cpp, shavite.cpp under src/crypto/x11 (and arch-specific subdirs); the only regex hits were echo.capnp files (unrelated).

78-105: Verify conditional compilation hierarchy for crypto extensions.

configure.ac declares independent AM_CONDITIONALs for ENABLE_X86_AESNI and ENABLE_X86_SHANI (and matching src/crypto/x11 modules), but src/Makefile.am nests AESNI/SHANI inside the ENABLE_SSE41 block — this mismatch can prevent AESNI/SHANI from being built when configure enables them but SSE4.1 is disabled. Align intent: either lift ENABLE_X86_AESNI/ENABLE_X86_SHANI to top-level conditionals in Makefile.am or make configure.ac only enable them when SSE4.1 is present.

Location: src/Makefile.am (lines 78–105); configure.ac AM_CONDITIONALs.

configure.ac

coderabbitai · 2025-09-22T09:16:27Z

src/crypto/x11/arm_crypto/shavite.cpp

+    alignas(16) uint8x16_t rk_words[448/4];
+    alignas(16) uint32_t rk[448];


⚠️ Potential issue

Large stack allocations may cause issues

The function allocates ~7KB on the stack (448 * 16 bytes for rk_words + 448 * 4 bytes for rk). This could cause stack overflow on embedded systems or threads with limited stack space.

Consider using heap allocation or reducing the stack footprint:

- alignas(16) uint8x16_t rk_words[448/4]; - alignas(16) uint32_t rk[448]; + // Consider using std::vector or std::unique_ptr for large allocations + alignas(16) std::array<uint8x16_t, 448/4> rk_words; + alignas(16) std::array<uint32_t, 448> rk;

Committable suggestion skipped: line range outside the PR's diff.

🤖 Prompt for AI Agents

In src/crypto/x11/arm_crypto/shavite.cpp around lines 39-40, the two large stack arrays (alignas(16) uint8x16_t rk_words[448/4] and alignas(16) uint32_t rk[448]) allocate roughly 7KB on the stack which can overflow constrained stacks; change them to heap or static storage and preserve alignment: allocate aligned memory on the heap (e.g., posix_memalign/aligned_alloc or aligned new / std::aligned_alloc wrapped in RAII/unique_ptr) or make them function-static if thread-safety/recency allows, and ensure you free or use RAII to avoid leaks; update usages to reference the allocated buffers and handle allocation failure paths.

coderabbitai · 2025-09-22T09:16:27Z

src/crypto/x11/echo.cpp

+#include <crypto/x11/aes.h>
+#include <crypto/x11/dispatch.h>

-#include "sph_echo.h"
+#include <attributes.h>

-#ifdef __cplusplus
-extern "C"{
-#endif
+#include <cstddef>
+#include <cstring>

-#if SPH_SMALL_FOOTPRINT && !defined SPH_SMALL_FOOTPRINT_ECHO
-#define SPH_SMALL_FOOTPRINT_ECHO   1
-#endif
+#include "sph_echo.h"


⚠️ Potential issue

Policy concern: modifying vendored x11 code

Per coding guidelines, src/crypto/x11 is treated as vendored. This file significantly refactors that code. If this is an intentional fork, please document the divergence and ensure future upstream sync isn’t expected, or move wrappers/dispatch into a non-vendored path while keeping upstream files pristine.

Also applies to: 53-57

🤖 Prompt for AI Agents

In src/crypto/x11/echo.cpp around lines 33-41 (and similarly for changes at lines 53-57), you refactored code inside the vendored src/crypto/x11 tree; either treat this as an intentional fork and document the divergence or move your changes out of the vendored path into a non-vendored wrapper/dispatch layer. Fix by restoring the original upstream echo.cpp in src/crypto/x11, create a new file under a non-vendored directory (e.g., src/crypto/x11_wrappers/ or src/crypto/dispatch/) that implements your refactor/dispatch, update build files to prefer the wrapper, and add a short note in the repo docs (README or CONTRIBUTORS) describing the fork/divergence and maintenance expectations if you intend to keep the vendored file modified.

coderabbitai · 2025-09-22T09:16:27Z

src/crypto/x11/sph_types.h

+#if defined(__clang__)
+#define SPH_NO_SANITIZE(...) __attribute__((no_sanitize(__VA_ARGS__)))
+#else
+#define SPH_NO_SANITIZE(...)
+#endif // __clang__
+


⚠️ Potential issue

Do not patch vendored x11 headers; handle UBSan at build level instead.

This lives under src/crypto/x11 (vendored). Per guidelines, we should not modify vendored code. Please revert these additions and suppress alignment UBSan outside this header.

Safer alternatives:

Preferred: compile only the SPH/X11 TUs with -fno-sanitize=alignment when building with UBSan.

Or: in sanitized builds, force -DSPH_UNALIGNED=0 for those TUs to take the bytewise paths (slower but sanitizer-clean).

If this change is nevertheless kept (not recommended here), at minimum: guard with #if __has_attribute(no_sanitize) and #ifndef SPH_NO_SANITIZE to avoid overriding an externally provided definition and to cover older Clang variants.

🤖 Prompt for AI Agents

In src/crypto/x11/sph_types.h around lines 330 to 335, revert the added SPH_NO_SANITIZE macro (do not modify vendored x11 headers) and instead handle UBSan at build time: either compile the SPH/X11 translation units with -fno-sanitize=alignment when UBSan is enabled, or for sanitized builds add a TU-level define -DSPH_UNALIGNED=0 to force bytewise (unaligned-safe) code paths; if you must keep a macro in-source (not recommended), ensure you wrap it with checks for __has_attribute(no_sanitize) and guard with #ifndef SPH_NO_SANITIZE so you don’t override external definitions.

src/crypto/x11/util/util.hpp

coderabbitai · 2025-09-22T09:16:27Z

src/Makefile.am

+crypto_libbitcoin_crypto_x86_aesni_la_CXXFLAGS = $(AM_CXXFLAGS) $(PIE_FLAGS) -static
+crypto_libbitcoin_crypto_x86_aesni_la_CPPFLAGS = $(AM_CPPFLAGS)
+crypto_libbitcoin_crypto_x86_aesni_la_CXXFLAGS += $(X86_AESNI_CXXFLAGS)
+crypto_libbitcoin_crypto_x86_aesni_la_CPPFLAGS += -DENABLE_SSE41 -DENABLE_X86_AESNI


🛠️ Refactor suggestion

🧩 Analysis chain

Compound preprocessor definition needs attention.

The X86 AES-NI module defines both ENABLE_SSE41 and ENABLE_X86_AESNI, indicating a dependency on SSE4.1 for the AES-NI implementation. This is architecturally correct as AES-NI requires SSE4.1, but ensure this dependency is documented.

🏁 Script executed:

#!/bin/bash # Verify SSE4.1 is properly checked before enabling X86 AES-NI in configure echo "Checking configure.ac for SSE4.1/AES-NI dependency..." rg -A10 -B10 "X86_AESNI" configure.ac | grep -E "(SSE41|SSE4\.1)"

Length of output: 227

🏁 Script executed:

#!/bin/bash set -euo pipefail echo "Search repo for SSE4/AES-NI mentions in docs (README, INSTALL, docs/, doc/, CHANGELOG, NEWS):" rg -n -S -i 'sse4|sse4\.1|sse41|aes-?ni|aesni' README* INSTALL* docs/** doc/** CHANGELOG* NEWS* || true echo echo "Search whole repo (excluding common noise dirs) for SSE4/AES-NI:" rg -n -S -i 'sse4|sse4\.1|sse41|aes-?ni|aesni' --hidden --glob '!**/node_modules/**' --glob '!**/dist/**' --glob '!**/build/**' || true echo echo "Show configure.ac context for SSE41/X86_AESNI:" rg -n -C3 -S -i 'x86_aesni|aesni|aes-?ni|sse4|sse4\.1|sse41' configure.ac || true echo echo "Show src/Makefile.am lines 800-840:" sed -n '800,840p' src/Makefile.am || true

Length of output: 7068

Document SSE4.1 dependency for AES-NI support
Configure.ac correctly checks for SSE4.1 intrinsics before enabling AES-NI (AX_CHECK_COMPILE_FLAG for -msse4.1 then AES-NI check with -msse4.1 -maes), but I found no mention of this requirement in INSTALL or README. Add a note that building with AES-NI support requires SSE4.1 (compiler and hardware support).

🤖 Prompt for AI Agents

In src/Makefile.am around line 821, the build enables AES-NI but Configure.ac only enables that when SSE4.1 intrinsics are available; update project documentation to state that building with AES-NI support requires SSE4.1 (both compiler support and CPU hardware), and show the relevant compiler flag and configure behavior. Edit INSTALL and README to add a short note that AES-NI support requires SSE4.1 (e.g., compilers must accept -msse4.1 and CPUs must expose SSE4.1), mention that the autoconf checks will pass -msse4.1 during detection and that builds will fall back to non-AES-NI if SSE4.1 is absent, and include guidance on forcing or disabling AES-NI via configure flags or environment variables.

Also lets us get rid of an extra unnecessary pack/unpack

coderabbitai

Actionable comments posted: 1

🧹 Nitpick comments (4)

src/crypto/x11/x86_aesni/echo.cpp (2)
15-30: Consider loop unrolling for better performance.

The StateRound helper function is well-implemented. However, since you're already unrolling the loop in FullStateRound (lines 36-67), you might consider using compiler pragmas or attributes to further optimize the AES operations. Benchmark the current implementation against a version with vectorization hints to confirm whether additional optimization is beneficial.

34-68: Consider reducing code duplication with a macro or template.

The unrolled loop pattern (pack_le → StateRound) is repeated 16 times. While manual unrolling can improve performance, it makes the code harder to maintain.

Consider using a macro or constexpr loop to reduce duplication while maintaining performance:
 void FullStateRound(uint64_t W[16][2], uint32_t& k0, uint32_t& k1, uint32_t& k2, uint32_t& k3)
 {
-    __m128i key = util::pack_le(k0, k1, k2, k3);
-    StateRound(W, 0, key, k0, k1, k2, k3);
-    key = util::pack_le(k0, k1, k2, k3);
-    StateRound(W, 1, key, k0, k1, k2, k3);
-    // ... repeated 14 more times
+    #define PROCESS_ROUND(idx) \
+        do { \
+            __m128i key = util::pack_le(k0, k1, k2, k3); \
+            StateRound(W, idx, key, k0, k1, k2, k3); \
+        } while(0)
+    
+    PROCESS_ROUND(0);
+    PROCESS_ROUND(1);
+    PROCESS_ROUND(2);
+    // ... etc
+    #undef PROCESS_ROUND
 }
src/crypto/x11/dispatch.cpp (1)
148-159: Verify FreeBSD elf_aux_info error handling pattern.

The FreeBSD code correctly checks the return value of elf_aux_info. However, the pattern differs slightly between arm and aarch64 - the arm version combines the check in one line while aarch64 splits it. Both are correct but consider using consistent style.

Consider using the same pattern for both architectures for consistency:
 #if defined(__arm__)
-    have_arm_aes = ((::elf_aux_info(AT_HWCAP2, &hwcap2, sizeof(hwcap2)) == 0) && ((hwcap2 & HWCAP2_AES) != 0));
-    have_arm_neon = ((::elf_aux_info(AT_HWCAP, &hwcap, sizeof(hwcap)) == 0) && ((hwcap & HWCAP_NEON) != 0));
+    if (::elf_aux_info(AT_HWCAP2, &hwcap2, sizeof(hwcap2)) == 0) {
+        have_arm_aes = ((hwcap2 & HWCAP2_AES) != 0);
+    }
+    if (::elf_aux_info(AT_HWCAP, &hwcap, sizeof(hwcap)) == 0) {
+        have_arm_neon = ((hwcap & HWCAP_NEON) != 0);
+    }
 #endif // __arm__
src/crypto/x11/arm_crypto/shavite.cpp (1)
110-119: Potential off-by-one in loop termination check.

The inner loop has a redundant break condition at line 116-117 that will never be reached since the outer loop already breaks when u == 448 at line 109.

The inner break is unreachable. Consider removing it:
         for (int s = 0; s < 8; s++) {
             rk[u + 0] = rk[u - 32] ^ rk[u - 7];
             rk[u + 1] = rk[u - 31] ^ rk[u - 6];
             rk[u + 2] = rk[u - 30] ^ rk[u - 5];
             rk[u + 3] = rk[u - 29] ^ rk[u - 4];
             u += 4;
-            if (u == 448)
-                break;
         }

📜 Review details

Configuration used: CodeRabbit UI

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 58965d4 and 005967b.

📒 Files selected for processing (24)

configure.ac (7 hunks)
src/Makefile.am (4 hunks)
src/bench/bench_bitcoin.cpp (2 hunks)
src/crypto/x11/aes.cpp (1 hunks)
src/crypto/x11/aes.h (1 hunks)
src/crypto/x11/aes_helper.c (0 hunks)
src/crypto/x11/arm_crypto/echo.cpp (1 hunks)
src/crypto/x11/arm_crypto/shavite.cpp (1 hunks)
src/crypto/x11/arm_neon/echo.cpp (1 hunks)
src/crypto/x11/dispatch.cpp (1 hunks)
src/crypto/x11/dispatch.h (1 hunks)
src/crypto/x11/echo.cpp (4 hunks)
src/crypto/x11/shavite.c (0 hunks)
src/crypto/x11/shavite.cpp (1 hunks)
src/crypto/x11/sph_echo.h (4 hunks)
src/crypto/x11/sph_shavite.h (4 hunks)
src/crypto/x11/ssse3/echo.cpp (1 hunks)
src/crypto/x11/util/consts_aes.hpp (1 hunks)
src/crypto/x11/util/util.hpp (1 hunks)
src/crypto/x11/x86_aesni/echo.cpp (1 hunks)
src/crypto/x11/x86_aesni/shavite.cpp (1 hunks)
src/init/common.cpp (2 hunks)
src/test/util/setup_common.cpp (2 hunks)
test/lint/lint-whitespace.py (1 hunks)

💤 Files with no reviewable changes (2)

src/crypto/x11/shavite.c
src/crypto/x11/aes_helper.c

🚧 Files skipped from review as they are similar to previous changes (12)

src/test/util/setup_common.cpp
src/bench/bench_bitcoin.cpp
src/init/common.cpp
src/crypto/x11/aes.h
src/crypto/x11/x86_aesni/shavite.cpp
src/crypto/x11/util/consts_aes.hpp
src/crypto/x11/aes.cpp
src/crypto/x11/arm_neon/echo.cpp
src/crypto/x11/arm_crypto/echo.cpp
src/crypto/x11/dispatch.h
src/crypto/x11/sph_shavite.h
test/lint/lint-whitespace.py

🧰 Additional context used

📓 Path-based instructions (2)

src/crypto/{ctaes,x11}/**

📄 CodeRabbit inference engine (CLAUDE.md)

Do not make changes under any circumstances to vendored dependencies in src/crypto/ctaes and src/crypto/x11

Files:

src/crypto/x11/arm_crypto/shavite.cpp
src/crypto/x11/x86_aesni/echo.cpp
src/crypto/x11/echo.cpp
src/crypto/x11/util/util.hpp
src/crypto/x11/dispatch.cpp
src/crypto/x11/shavite.cpp
src/crypto/x11/sph_echo.h
src/crypto/x11/ssse3/echo.cpp

src/**/*.{cpp,h,cc,cxx,hpp}

📄 CodeRabbit inference engine (CLAUDE.md)

src/**/*.{cpp,h,cc,cxx,hpp}: Dash Core C++ codebase must be written in C++20 and require at least Clang 16 or GCC 11.1
Dash uses unordered_lru_cache for efficient caching with LRU eviction

Files:

src/crypto/x11/arm_crypto/shavite.cpp
src/crypto/x11/x86_aesni/echo.cpp
src/crypto/x11/echo.cpp
src/crypto/x11/util/util.hpp
src/crypto/x11/dispatch.cpp
src/crypto/x11/shavite.cpp
src/crypto/x11/sph_echo.h
src/crypto/x11/ssse3/echo.cpp

🧠 Learnings (2)

📚 Learning: 2025-07-20T18:42:49.794Z

Learnt from: CR
PR: dashpay/dash#0
File: CLAUDE.md:0-0
Timestamp: 2025-07-20T18:42:49.794Z
Learning: Applies to src/crypto/{ctaes,x11}/** : Do not make changes under any circumstances to vendored dependencies in src/crypto/ctaes and src/crypto/x11

Applied to files:

src/crypto/x11/echo.cpp

📚 Learning: 2025-01-06T09:51:03.167Z

Learnt from: kwvg
PR: dashpay/dash#6516
File: depends/patches/gmp/include_ldflags_in_configure.patch:557-621
Timestamp: 2025-01-06T09:51:03.167Z
Learning: The `GMP_GCC_ARM_UMODSI` macro checks only the compiler version, and `GMP_GCC_MIPS_O32` relies on the `-mabi=32` flag. Therefore, `$LDFLAGS` is irrelevant to these tests.

Applied to files:

configure.ac

🧬 Code graph analysis (7)

src/crypto/x11/arm_crypto/shavite.cpp (2)

src/crypto/x11/shavite.cpp (2)

CompressElement (72-99)

CompressElement (72-73)

src/crypto/x11/x86_aesni/shavite.cpp (2)

CompressElement (17-30)

CompressElement (17-18)

src/crypto/x11/x86_aesni/echo.cpp (1)

src/crypto/x11/arm_crypto/echo.cpp (2)

StateRound (14-29)

StateRound (14-14)

src/crypto/x11/echo.cpp (4)

src/crypto/x11/ssse3/echo.cpp (4)

MixColumn (28-51)

MixColumn (28-28)

ShiftAndMix (83-128)

ShiftAndMix (83-83)

src/crypto/x11/aes.cpp (4)

Round (12-21)

Round (12-14)

RoundKeyless (23-27)

RoundKeyless (23-24)

src/crypto/x11/dispatch.cpp (6)

FullStateRound (43-43)

FullStateRound (64-64)

FullStateRound (73-73)

ShiftAndMix (52-52)

ShiftAndMix (58-58)

ShiftAndMix (74-74)

src/crypto/x11/x86_aesni/echo.cpp (2)

FullStateRound (34-68)

FullStateRound (34-34)

src/crypto/x11/dispatch.cpp (3)

src/crypto/x11/echo.cpp (4)

FullStateRound (123-145)

FullStateRound (123-123)

ShiftAndMix (147-154)

ShiftAndMix (147-147)

src/crypto/x11/shavite.cpp (2)

Compress (106-246)

Compress (106-106)

src/compat/cpuid.h (1)

GetCPUID (16-23)

src/crypto/x11/shavite.cpp (3)

src/crypto/x11/aes.cpp (2)

RoundKeyless (23-27)

RoundKeyless (23-24)

src/crypto/x11/arm_crypto/shavite.cpp (2)

Compress (34-181)

Compress (34-34)

src/crypto/x11/sph_types.h (1)

sph_enc32le (605-633)

src/crypto/x11/sph_echo.h (1)

src/crypto/x11/echo.cpp (6)

sph_echo512_init (327-331)

sph_echo512_init (328-328)

sph_echo512 (334-338)

sph_echo512 (335-335)

sph_echo512_close (341-345)

sph_echo512_close (342-342)

src/crypto/x11/ssse3/echo.cpp (2)

src/crypto/x11/util/util.hpp (4)

Xor (34-34)

Xor (34-34)

Xor (67-67)

Xor (67-67)

src/crypto/x11/echo.cpp (2)

MixColumn (56-77)

MixColumn (56-56)

⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (7)

GitHub Check: x86_64-pc-linux-gnu_multiprocess / Build depends
GitHub Check: x86_64-apple-darwin / Build depends
GitHub Check: x86_64-w64-mingw32 / Build depends
GitHub Check: Lint / Run linters
GitHub Check: x86_64-pc-linux-gnu_nowallet / Build depends
GitHub Check: x86_64-pc-linux-gnu / Build depends
GitHub Check: arm-linux-gnueabihf / Build depends

🔇 Additional comments (26)

src/crypto/x11/x86_aesni/echo.cpp (1)

1-12: LGTM! Well-structured X86 AES-NI implementation.

The file header, includes, and guard conditions are properly set up for the X86 AES-NI backend.
src/crypto/x11/util/util.hpp (4)
66-66: [Past review issue still present] Fix missing parentheses in preprocessor condition.

The condition is missing parentheses around the && expression, which could lead to unexpected evaluation.
-#if defined(ENABLE_SSSE3) || defined(ENABLE_SSE41) && defined(ENABLE_X86_AESNI)
+#if defined(ENABLE_SSSE3) || (defined(ENABLE_SSE41) && defined(ENABLE_X86_AESNI))
13-20: LGTM! Proper platform-specific header inclusion.

The conditional inclusion of ARM and x86 headers is well-structured with appropriate feature guards.

24-30: Well-implemented generic byte packing utility.

The constexpr pack_le function provides a clean, portable implementation for little-endian byte packing that can be evaluated at compile time when possible.

50-63: Excellent ARM AES intrinsics implementation.

The ARM-specific AES round functions correctly follow the emulation pattern for x86 AES intrinsics on ARMv8-A as documented in the referenced blog post. The separate aes_round_nk optimization for keyless rounds is a nice touch.
configure.ac (4)
538-547: LGTM! Feature flags properly initialized.

The new per-feature enable flags are correctly initialized to "no" as a safe default before probing.

718-737: Well-structured ARM NEON detection loop.

The loop correctly tries multiple compiler flag combinations to detect NEON support on both ARMv8 and ARMv7 platforms, which improves compatibility across different ARM architectures.

1896-1905: Proper exposure of feature flags via AM_CONDITIONAL.

All the new architecture-specific feature flags are correctly exposed for use in Makefile.am, maintaining consistency with the existing pattern.

558-560: [Past review issue still present] Syntax bug: missing comma in AX_CHECK_COMPILE_FLAG for AES-NI

There's a missing comma before the third argument which will break autoconf parsing.

Apply this diff:
-AX_CHECK_COMPILE_FLAG([-msse4.1 -maes], [X86_AESNI_CXXFLAGS="-msse4.1 -maes"], [], [$CXXFLAG_WERROR])
+AX_CHECK_COMPILE_FLAG([-msse4.1 -maes], [X86_AESNI_CXXFLAGS="-msse4.1 -maes"], [], [$CXXFLAG_WERROR])
src/crypto/x11/dispatch.cpp (3)

85-91: LGTM! Clean platform-specific helper for macOS.

The IsSysCtlNonZero helper provides a clean abstraction for checking macOS system control values.

100-124: LGTM! Well-structured runtime dispatch with proper fallback chain.

The SapphireAutoDetect function correctly:

Initializes with safe software fallbacks

Performs runtime CPU detection

Progressively upgrades to optimized implementations when available

Uses proper feature detection through CPUID

The dispatch order is correct: more specific implementations (AES-NI) override before less specific ones (SSSE3).

131-133: Good coverage of macOS NEON detection keys.

Including multiple sysctl keys (hw.optional.AdvSIMD variations) ensures compatibility across different macOS versions and architectures as noted in the cpu_features issue reference.
src/crypto/x11/arm_crypto/shavite.cpp (4)
16-30: LGTM! Clean ARM NEON implementation of CompressElement.

The function correctly implements the compression element using ARM NEON intrinsics, properly handling the AES rounds and result extraction.

65-69: Good use of SIMD for AES round computation.

The code efficiently uses ARM NEON intrinsics to perform AES rounds and extract results. The pattern is consistent and well-structured.

Also applies to: 89-93

149-162: LGTM! Clean macro usage for state rotation.

The WROT macro is properly scoped and undefined after use, following best practices for macro hygiene.

39-40: [Past review issue still present] Large stack allocations may cause issues.

The function allocates ~7KB on the stack which could cause stack overflow on embedded systems or threads with limited stack space.

Consider using heap allocation or reducing the stack footprint. Here's an approach using std::unique_ptr with aligned allocation:
-    alignas(16) uint8x16_t rk_words[448/4];
-    alignas(16) uint32_t rk[448];
+    // Use aligned heap allocation to avoid stack overflow
+    struct AlignedArrays {
+        alignas(16) uint8x16_t rk_words[448/4];
+        alignas(16) uint32_t rk[448];
+    };
+    auto arrays = std::make_unique<AlignedArrays>();
+    auto& rk_words = arrays->rk_words;
+    auto& rk = arrays->rk;
src/crypto/x11/ssse3/echo.cpp (1)

1-133: Implementation looks correct and follows best practices.

This SSSE3-optimized implementation of the Echo algorithm correctly:

Guards compilation with ENABLE_SSSE3 check

Uses proper alignment directives for SIMD operations

Implements the same algorithmic structure as the reference implementation

Uses appropriate SIMD intrinsics for GF(8) multiplication and matrix operations

The GF(8) multiplication implementation matches the reference code's logic: ((x & 0x80) >> 7) * 27 ^ ((x & 0x7F) << 1).

src/crypto/x11/sph_echo.h (1)

79-116: API improvements enhance type safety.

The changes from void* to strongly-typed sph_echo512_context* parameters are excellent improvements that:

Provide compile-time type checking

Improve code clarity and maintainability

Align with modern C++ best practices

The removal of extern "C" wrappers is appropriate for internal C++ code.

src/crypto/x11/shavite.cpp (2)

250-250: Global dispatch pointer initialization looks correct.

The initialization of shavite_c512 with the soft implementation serves as a default that can be overridden by the runtime dispatcher based on CPU capabilities.

66-68: Incorrect — RoundKeyless signature matches the usage

RoundKeyless is declared as void RoundKeyless(uint32_t x0, uint32_t x1, uint32_t x2, uint32_t x3, uint32_t& y0, uint32_t& y1, uint32_t& y2, uint32_t& y3) (src/crypto/x11/aes.h) and defined in src/crypto/x11/aes.cpp; calling RoundKeyless(x0, x1, x2, x3, x0, x1, x2, x3) is valid (inputs by value, outputs by reference).

Likely an incorrect or invalid review comment.

src/crypto/x11/echo.cpp (2)

33-41: Vendored dependency modification needs documentation.

Previous review identified this concern: modifying files within src/crypto/x11/ violates the project's guideline to not change vendored dependencies. The extensive refactoring from C to C++ with namespaces represents a significant divergence from upstream.

If this is an intentional fork, please document:

The rationale for forking vs. wrapping

Maintenance strategy going forward

Any upstream compatibility considerations

158-159: Dispatch function pointers correctly initialized.

The global function pointers echo_round and echo_shift_mix are properly initialized with the software implementations, allowing runtime override by the dispatcher based on CPU features.

src/Makefile.am (4)

78-105: Hardware acceleration backends properly structured.

The conditional compilation of architecture-specific crypto libraries is well-organized:

Proper dependency chains (e.g., X86_AESNI requires SSE41)

Clear separation of concerns between different SIMD instruction sets

Consistent naming convention for library targets

821-821: SSE4.1 dependency for AES-NI requires documentation.

The compound preprocessor definition correctly enforces that AES-NI requires SSE4.1 support, but this dependency should be documented in user-facing documentation.

747-749: Build flags properly applied for SPH library.

The addition of SPHLIB_FLAGS to both CXXFLAGS and CFLAGS ensures that the appropriate compiler optimizations are applied to the SPH crypto library components.

1094-1094: Document DISABLE_OPTIMIZED_SHA256 and its performance impact

libdashconsensus is built with -DDISABLE_OPTIMIZED_SHA256 (src/Makefile.am:1094). That macro gates optimized SHA256 code in src/crypto/sha256.cpp and src/crypto/x11/{util/util.hpp,dispatch.cpp} and therefore can disable hardware-accelerated/optimized SHA256, affecting consensus verification throughput. Add brief documentation describing the rationale (upstream compatibility), expected performance impact, and how downstreams can enable optimized builds — e.g., README.md or docs/build.md and an inline comment beside the CPPFLAGS.

coderabbitai · 2025-09-22T10:09:54Z

src/crypto/x11/dispatch.cpp

+extern sapphire::dispatch::EchoShiftMix echo_shift_mix;
+extern sapphire::dispatch::EchoRoundFn echo_round;
+extern sapphire::dispatch::ShaviteCompressFn shavite_c512;
+


🛠️ Refactor suggestion

🧩 Analysis chain

Verify extern declaration placement and linkage.

The extern declarations for the dispatch function pointers should ideally be in a header file (dispatch.h) to ensure consistent signatures across translation units.

🏁 Script executed:

#!/bin/bash # Check if these dispatch symbols are properly declared in the header echo "Checking for dispatch declarations in header..." rg -n "extern.*echo_shift_mix|extern.*echo_round|extern.*shavite_c512" src/crypto/x11/dispatch.h # Verify the symbols are defined somewhere echo -e "\nChecking for dispatch definitions..." rg -n "dispatch::EchoShiftMix echo_shift_mix|dispatch::EchoRoundFn echo_round|dispatch::ShaviteCompressFn shavite_c512" --type cpp

Length of output: 684

Move extern declarations into src/crypto/x11/dispatch.h and include it where needed.

Currently the externs are declared only in src/crypto/x11/dispatch.cpp (lines 96–98) while the definitions live in src/crypto/x11/echo.cpp (158–159) and src/crypto/x11/shavite.cpp (250); place the extern declarations in dispatch.h to ensure consistent signatures and linkage across translation units.

🤖 Prompt for AI Agents

In src/crypto/x11/dispatch.cpp around lines 96–99 the extern declarations for echo_shift_mix, echo_round and shavite_c512 are placed in the .cpp; move these extern declarations into the header src/crypto/x11/dispatch.h (add them if missing) so all translation units share the same declarations, then include "dispatch.h" where those externs are needed (e.g., dispatch.cpp, echo.cpp, shavite.cpp). Ensure the header uses proper include guards or #pragma once and that the extern signatures exactly match the definitions in echo.cpp and shavite.cpp so linkage is consistent.

UdjinM6 · 2025-09-23T15:09:28Z

my m1 pro

develop:

|             ns/byte |              byte/s |    err% |     total | benchmark
|--------------------:|--------------------:|--------:|----------:|:----------
|               38.68 |       25,854,892.97 |    0.2% |      0.01 | `Pow_Echo512_0032b`
|               15.50 |       64,514,821.45 |    0.3% |      0.01 | `Pow_Echo512_0080b`
|               19.09 |       52,377,329.11 |    0.3% |      0.01 | `Pow_Echo512_0128b`
|               11.86 |       84,327,262.25 |    0.3% |      0.01 | `Pow_Echo512_0512b`
|               10.64 |       93,944,123.41 |    0.3% |      0.01 | `Pow_Echo512_1024b`
|                9.48 |      105,478,278.07 |    0.4% |      0.10 | `Pow_Echo512_1M`
|               10.04 |       99,645,022.72 |    0.2% |      0.01 | `Pow_Echo512_2048b`
|               14.76 |       67,735,015.77 |    0.3% |      0.01 | `Pow_Shavite512_0032b`
|                5.91 |      169,179,963.96 |    0.3% |      0.01 | `Pow_Shavite512_0080b`
|                7.37 |      135,687,225.20 |    1.3% |      0.01 | `Pow_Shavite512_0128b`
|                4.55 |      219,622,024.53 |    0.5% |      0.01 | `Pow_Shavite512_0512b`
|                4.09 |      244,470,113.26 |    0.3% |      0.01 | `Pow_Shavite512_1024b`
|                3.64 |      274,690,711.99 |    0.3% |      0.04 | `Pow_Shavite512_1M`
|                3.87 |      258,498,729.78 |    0.3% |      0.01 | `Pow_Shavite512_2048b`
|              242.81 |        4,118,404.12 |    0.3% |      0.01 | `Pow_X11_0032b`
|               97.08 |       10,300,429.18 |    0.1% |      0.01 | `Pow_X11_0080b`
|               62.38 |       16,031,411.98 |    0.6% |      0.01 | `Pow_X11_0128b`
|               16.55 |       60,430,130.85 |    0.5% |      0.01 | `Pow_X11_0512b`
|                8.92 |      112,088,754.10 |    0.3% |      0.01 | `Pow_X11_1024b`
|                1.32 |      760,383,993.92 |    0.5% |      0.01 | `Pow_X11_1M`
|                5.11 |      195,511,195.68 |    0.3% |      0.01 | `Pow_X11_2048b`

This PR

|             ns/byte |              byte/s |    err% |     total | benchmark
|--------------------:|--------------------:|--------:|----------:|:----------
|               14.02 |       71,320,973.64 |    0.1% |      0.01 | `Pow_Echo512_0032b`
|                5.61 |      178,404,178.39 |    0.0% |      0.01 | `Pow_Echo512_0080b`
|                7.05 |      141,902,167.48 |    0.3% |      0.01 | `Pow_Echo512_0128b`
|                4.27 |      234,067,792.39 |    0.0% |      0.01 | `Pow_Echo512_0512b`
|                3.82 |      261,697,329.73 |    0.0% |      0.01 | `Pow_Echo512_1024b`
|                3.39 |      295,073,871.29 |    0.1% |      0.81 | `Pow_Echo512_1M`
|                3.61 |      277,369,449.15 |    0.1% |      0.01 | `Pow_Echo512_2048b`
|               16.83 |       59,434,919.83 |    0.2% |      0.01 | `Pow_Shavite512_0032b`
|                6.74 |      148,278,461.93 |    0.3% |      0.01 | `Pow_Shavite512_0080b`
|                8.13 |      123,007,772.59 |    1.8% |      0.01 | `Pow_Shavite512_0128b`
|                4.97 |      201,315,689.98 |    0.1% |      0.01 | `Pow_Shavite512_0512b`
|                4.48 |      223,427,193.41 |    0.2% |      0.01 | `Pow_Shavite512_1024b`
|                3.74 |      267,231,715.66 |    0.1% |      0.90 | `Pow_Shavite512_1M`
|                4.23 |      236,497,193.26 |    0.2% |      0.01 | `Pow_Shavite512_2048b`
|              221.75 |        4,509,573.65 |    0.2% |      0.01 | `Pow_X11_0032b`
|               88.63 |       11,282,557.55 |    0.2% |      0.01 | `Pow_X11_0080b`
|               56.80 |       17,606,348.44 |    0.4% |      0.01 | `Pow_X11_0128b`
|               15.20 |       65,771,599.76 |    0.3% |      0.01 | `Pow_X11_0512b`
|                8.27 |      120,903,071.40 |    0.3% |      0.01 | `Pow_X11_1024b`
|                1.32 |      756,867,000.23 |    0.2% |      0.32 | `Pow_X11_1M`
|                4.77 |      209,460,293.94 |    0.0% |      0.01 | `Pow_X11_2048b`

UdjinM6

Seems to be working

utACK/light-ACK 005967b

knst · 2025-09-27T09:07:17Z

What is git -dfx? I guess it's your own alias?

kwvg · 2025-09-27T12:27:00Z

@knst, It's git clean -dfx, meant to remove build files as the build system has difficulty with files renamed from .c to .cpp. It has been corrected in the PR description.

knst · 2025-09-27T13:06:27Z

git clean -dfx

    -f, --[no-]force      force

Please DO NOT recommend this option

knst · 2025-10-24T08:51:49Z

Updated instruction in PR how to get build after this PR. I need it pretty often when I am doing bisects

rm -r src/crypto/x11/.deps/

kwvg added this to the 23 milestone Sep 21, 2025

kwvg added 2 commits September 22, 2025 05:44

bench: set minimum epoch iterations to improve pow_hash stability

830928c

reverts: - f300277 (logically)

fix: suppress unaligned memory UBSan warnings if supported by arch

c4e7a40

kwvg force-pushed the sphlib_p2 branch 2 times, most recently from ec733b5 to 58965d4 Compare September 22, 2025 08:01

kwvg marked this pull request as ready for review September 22, 2025 08:59

kwvg requested review from PastaPastaPasta, UdjinM6 and knst September 22, 2025 08:59

coderabbitai bot reviewed Sep 22, 2025

View reviewed changes

kwvg added 18 commits September 22, 2025 09:38

refactor: remove large footprint Echo512 impl, switch to C++

7e8607e

refactor: remove large footprint Shavite512 impl, switch to C++

386742b

refactor: move software AES round to header

20fc998

crypto: replace hardcoded AES transform tables with constexpr tables

d6d3518

const: use function pointer to allow for switching implementation

cba39c5

crypto: implement naive AES-NI backend for simple rounds

1f2eb21

crypto: implement AES-NI backend for Echo512's FullStateRound()

31a6732

crypto: implement SSSE3 backend for Echo512's MixColumns()

da38871

crypto: implement SSSE3 backend for Echo512's ShiftRows()

71d6ef9

crypto: combine Echo512's Shift and Mix operations

250dcce

crypto: avoid extra load/store in Echo512's ShiftAndMix()

e1bbec4

crypto: implement AES-NI backend for Shavite512's CompressElement()

f2ececc

crypto: implement naive ARM AES backend for simple rounds

76bd236

crypto: implement ARM AES backend for Echo512's FullStateRound()

959c9ee

crypto: implement ARM AES backend for Shavite512's CompressElement()

963215b

crypto: implement ARM NEON backend for Echo512's ShiftAndMix()

fa68c70

crypto: unroll Echo512's FullStateRound()

31f93ad

Also lets us get rid of an extra unnecessary pack/unpack

crypto: implement Shavite512's full Compress() routine

d131fcc

crypto: drop naive AES backends

005967b

kwvg force-pushed the sphlib_p2 branch from 58965d4 to 005967b Compare September 22, 2025 09:56

coderabbitai bot reviewed Sep 22, 2025

View reviewed changes

UdjinM6 approved these changes Sep 23, 2025

View reviewed changes

PastaPastaPasta merged commit c4a9d82 into dashpay:develop Sep 25, 2025
36 of 44 checks passed

UdjinM6 modified the milestones: 23, 23.1 Oct 26, 2025

		alignas(16) uint8x16_t rk_words[448/4];
		alignas(16) uint32_t rk[448];

crypto: add dispatcher and implement hardware acceleration for Echo512 and Shavite512, improve benchmark stability #6852

crypto: add dispatcher and implement hardware acceleration for Echo512 and Shavite512, improve benchmark stability #6852

Uh oh!

Conversation

kwvg commented Sep 21, 2025 • edited by knst Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Additional Information

Important

Misc.

Benchmarks

Apple M1, macOS Sequoia 15.7

AMD Ryzen 5 5600G, Ubuntu 24.04 (Noble)

Breaking Changes

Checklist

Uh oh!

github-actions bot commented Sep 21, 2025 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

✅ No Merge Conflicts Detected

Uh oh!

coderabbitai bot commented Sep 22, 2025 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Walkthrough

Estimated code review effort

Pre-merge checks and finishing touches

Uh oh!

coderabbitai bot left a comment

Choose a reason for hiding this comment

Uh oh!

Uh oh!

coderabbitai bot Sep 22, 2025

Choose a reason for hiding this comment

Uh oh!

coderabbitai bot Sep 22, 2025

Choose a reason for hiding this comment

Uh oh!

coderabbitai bot Sep 22, 2025

Choose a reason for hiding this comment

Uh oh!

Uh oh!

coderabbitai bot Sep 22, 2025

Choose a reason for hiding this comment

Uh oh!

coderabbitai bot left a comment

Choose a reason for hiding this comment

Uh oh!

coderabbitai bot Sep 22, 2025

Choose a reason for hiding this comment

Uh oh!

UdjinM6 commented Sep 23, 2025

Uh oh!

UdjinM6 left a comment

Choose a reason for hiding this comment

Uh oh!

Uh oh!

knst commented Sep 27, 2025

Uh oh!

kwvg commented Sep 27, 2025

Uh oh!

knst commented Sep 27, 2025

Uh oh!

knst commented Oct 24, 2025

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

4 participants

kwvg commented Sep 21, 2025 •

edited by knst

Loading

github-actions bot commented Sep 21, 2025 •

edited

Loading

coderabbitai bot commented Sep 22, 2025 •

edited

Loading