@eddyashton
Member

See discussion in this thread:
https://github.com/microsoft/CCF/pull/7251/files#r2322012456

I think this is much nicer, but it does seem to be slightly slower (checking a bool for thread_local initialisation on every call), so this is debatable.
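For anyone skimming, a minimal sketch of the three strategies being compared. It uses a hypothetical `Ctx` type as a stand-in for the cached digest context; none of these names come from the CCF code, and the real contexts are OpenSSL objects rather than this placeholder.

```cpp
#include <cstddef>

// Hypothetical stand-in for an expensive-to-create digest context.
struct Ctx
{
  static inline int constructed = 0; // counts constructions, for illustration only
  Ctx() { ++constructed; }
  std::size_t digest(std::size_t n) { return n * 31u; } // placeholder work
};

// 1. No caching: a fresh context is built on every call.
std::size_t digest_nocache(std::size_t n)
{
  Ctx ctx;
  return ctx.digest(n);
}

// 2. Explicit init: each thread must call init_ctx() before digesting.
//    A raw pointer keeps the thread_local trivially initialised, so reading
//    it needs no lazy-construction check.
thread_local Ctx* preinit_ctx = nullptr;
void init_ctx() { preinit_ctx = new Ctx(); }
void shutdown_ctx() { delete preinit_ctx; preinit_ctx = nullptr; }
std::size_t digest_preinit(std::size_t n)
{
  return preinit_ctx->digest(n); // dereferences null if init_ctx() was skipped
}

// 3. Lazy thread_local: constructed on first use in each thread. Every call
//    first checks whether construction has already happened.
std::size_t digest_tl(std::size_t n)
{
  thread_local Ctx ctx;
  return ctx.digest(n);
}
```

Option 3 removes the need for callers to remember an init call, at the price of the per-call check discussed above.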

Comment on lines 404 to 418
PICOBENCH_SUITE("digest sha256");
namespace SHA256_bench
{
const std::vector<int> sha256_shifts = {
2 << 4, 2 << 6, 2 << 8, 2 << 10, 2 << 12, 2 << 14, 2 << 16};

auto openssl_sha256_preinit = sha256_bench;
PICOBENCH(openssl_sha256_preinit).iterations(sha256_shifts).baseline();

DEFINE_SHA256_BENCH(6)
DEFINE_SHA256_BENCH(8)
DEFINE_SHA256_BENCH(10)
DEFINE_SHA256_BENCH(12)
DEFINE_SHA256_BENCH(14)
DEFINE_SHA256_BENCH(16)
auto openssl_sha256_tl_init = sha256_bench_;
PICOBENCH(openssl_sha256_tl_init).iterations(sha256_shifts);

auto openssl_sha256_nocache = sha256_noopt_bench;
PICOBENCH(openssl_sha256_nocache).iterations(sha256_shifts);
}
Member Author

This refactor is worth doing separately, even if we don't use this initialisation. It moves all these benchmarks into a single suite, and uses .iterations (the Dim column in the output) for the actual digest size, so that the ns/op and Ops/second columns report per-byte costs, rather than some abstract multiple.

Before:

## digest sha256 (2 << 6):

 Name (* = baseline)      |   Dim   |  Total ms |  ns/op  |Baseline| Ops/second
--------------------------|--------:|----------:|--------:|-------:|----------:
 openssl_sha256_6_no *    |      10 |     0.004 |     442 |      - |  2259887.0
 openssl_sha256_6         |      10 |     0.003 |     279 |  0.633 |  3572704.5

## digest sha256 (2 << 8):

 Name (* = baseline)      |   Dim   |  Total ms |  ns/op  |Baseline| Ops/second
--------------------------|--------:|----------:|--------:|-------:|----------:
 openssl_sha256_8_no *    |      10 |     0.007 |     708 |      - |  1412429.4
 openssl_sha256_8         |      10 |     0.005 |     498 |  0.704 |  2005615.7

## digest sha256 (2 << 10):

 Name (* = baseline)      |   Dim   |  Total ms |  ns/op  |Baseline| Ops/second
--------------------------|--------:|----------:|--------:|-------:|----------:
 openssl_sha256_10_no *   |      10 |     0.017 |    1657 |      - |   603281.9
 openssl_sha256_10        |      10 |     0.015 |    1512 |  0.913 |   661025.9

## digest sha256 (2 << 12):

 Name (* = baseline)      |   Dim   |  Total ms |  ns/op  |Baseline| Ops/second
--------------------------|--------:|----------:|--------:|-------:|----------:
 openssl_sha256_12_no *   |      10 |     0.056 |    5587 |      - |   178980.5
 openssl_sha256_12        |      10 |     0.054 |    5418 |  0.970 |   184559.7

## digest sha256 (2 << 14):

 Name (* = baseline)      |   Dim   |  Total ms |  ns/op  |Baseline| Ops/second
--------------------------|--------:|----------:|--------:|-------:|----------:
 openssl_sha256_14_no *   |      10 |     0.214 |   21420 |      - |    46684.7
 openssl_sha256_14        |      10 |     0.212 |   21208 |  0.990 |    47151.1

## digest sha256 (2 << 16):

 Name (* = baseline)      |   Dim   |  Total ms |  ns/op  |Baseline| Ops/second
--------------------------|--------:|----------:|--------:|-------:|----------:
 openssl_sha256_16_no *   |      10 |     0.846 |   84614 |      - |    11818.3
 openssl_sha256_16        |      10 |     0.843 |   84281 |  0.996 |    11865.1

After:

## digest sha256:

Name (* = baseline)      |   Dim   |  Total ms |  ns/op  |Baseline| Ops/second
--------------------------|--------:|----------:|--------:|-------:|----------:
openssl_sha256_noopt *   |      32 |     0.000 |      12 |      - | 80200501.3
openssl_sha256_opt       |      32 |     0.000 |       5 |  0.439 |182857142.9
openssl_sha256_noopt *   |     128 |     0.000 |       3 |      - |272921108.7
openssl_sha256_opt       |     128 |     0.000 |       2 |  0.557 |490421455.9
openssl_sha256_noopt *   |     512 |     0.001 |       1 |      - |665799739.9
openssl_sha256_opt       |     512 |     0.001 |       1 |  0.700 |951672862.5
openssl_sha256_noopt *   |    2048 |     0.002 |       0 |      - |1137146030.0
openssl_sha256_opt       |    2048 |     0.002 |       0 |  0.890 |1277604491.6
openssl_sha256_noopt *   |    8192 |     0.006 |       0 |      - |1362608117.1
openssl_sha256_opt       |    8192 |     0.006 |       0 |  0.934 |1458170167.3
openssl_sha256_noopt *   |   32768 |     0.022 |       0 |      - |1481977296.4
openssl_sha256_opt       |   32768 |     0.022 |       0 |  1.005 |1475105789.1
openssl_sha256_noopt *   |  131072 |     0.088 |       0 |      - |1490792870.9
openssl_sha256_opt       |  131072 |     0.090 |       0 |  1.028 |1450537289.3

The evidence that these are "the same numbers" is that the ratio in the Baseline column is (roughly, within noise) the same for each size. The difference is that the other columns are now more readable: Total ms is for one digest rather than 10 and, as mentioned above, the "op" unit is "byte".

(I've also added 2 << 4, because I was interested)

(Bonus, you can now run this directly with ./crypto_bench --run-suite="digest sha256", because it's a single suite)
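
As a quick sanity check on the "same numbers" claim, the old per-digest timings can be converted to per-byte timings by dividing by the input size, 2 << n bytes. The helper names below are illustrative only and do not appear in the benchmark code.

```cpp
#include <cstddef>

// Input size used by the benchmarks: 2 << shift bytes (so 2 << 6 is 128).
constexpr std::size_t digest_size(unsigned shift)
{
  return std::size_t{2} << shift;
}

// Convert a per-digest time (the old "ns/op") into a per-byte time
// (the new "ns/op").
constexpr double ns_per_byte(double ns_per_digest, unsigned shift)
{
  return ns_per_digest / static_cast<double>(digest_size(shift));
}

// Baseline column: a row's time relative to the baseline row's time.
constexpr double baseline_ratio(double ns, double baseline_ns)
{
  return ns / baseline_ns;
}
```

For the 2 << 6 rows in the Before output, 442 ns and 279 ns per 128-byte digest come to roughly 3.5 and 2.2 ns per byte, in line with the 3 and 2 shown at Dim 128 in the After table (which prints whole nanoseconds). Their ratio is about 0.63, matching the Before Baseline column.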

@eddyashton
Member Author

Is this actually faster?

If we only run a single digest, then the benchmarks say no: this is faster than not caching the contexts, but slower than the explicit _init call, since this approach introduces an implicit check, on every call, of whether the thread_local has been constructed.

## digest sha256:

 Name (* = baseline)      |   Dim   |  Total ms |  ns/op  |Baseline| Ops/second
--------------------------|--------:|----------:|--------:|-------:|----------:
 openssl_sha256_noopt *   |      32 |     0.000 |      11 |      - | 84880636.6
 openssl_sha256_opt       |      32 |     0.000 |       5 |  0.440 |192771084.3
 openssl_sha256_tl        |      32 |     0.000 |       7 |  0.599 |141592920.4
 openssl_sha256_noopt *   |     128 |     0.001 |       5 |      - |184971098.3
 openssl_sha256_opt       |     128 |     0.000 |       2 |  0.382 |484848484.8
 openssl_sha256_tl        |     128 |     0.000 |       2 |  0.403 |458781362.0
 openssl_sha256_noopt *   |     512 |     0.001 |       1 |      - |596041909.2
 openssl_sha256_opt       |     512 |     0.001 |       1 |  0.629 |948148148.1
 openssl_sha256_tl        |     512 |     0.001 |       1 |  0.698 |853333333.3
 openssl_sha256_noopt *   |    2048 |     0.002 |       1 |      - |938158497.5
 openssl_sha256_opt       |    2048 |     0.002 |       0 |  0.770 |1219047619.0
 openssl_sha256_tl        |    2048 |     0.002 |       0 |  0.737 |1272840273.5
 openssl_sha256_noopt *   |    8192 |     0.006 |       0 |      - |1348255431.2
 openssl_sha256_opt       |    8192 |     0.006 |       0 |  0.952 |1416320885.2
 openssl_sha256_tl        |    8192 |     0.006 |       0 |  0.960 |1404423109.9
 openssl_sha256_noopt *   |   32768 |     0.023 |       0 |      - |1434173669.5
 openssl_sha256_opt       |   32768 |     0.023 |       0 |  0.990 |1449013885.2
 openssl_sha256_tl        |   32768 |     0.023 |       0 |  0.991 |1447030249.5
 openssl_sha256_noopt *   |  131072 |     0.090 |       0 |      - |1453239164.9
 openssl_sha256_opt       |  131072 |     0.090 |       0 |  0.999 |1454303371.9
 openssl_sha256_tl        |  131072 |     0.090 |       0 |  1.001 |1451726162.1

But that's slightly artificial. What about the minor "warm cache" (/branch predictor) win we'd expect if we digest multiple things in quick succession (simulated by "calling the function 10 times")? Then this approach is near-identical:

## digest sha256:

 Name (* = baseline)      |   Dim   |  Total ms |  ns/op  |Baseline| Ops/second
--------------------------|--------:|----------:|--------:|-------:|----------:
 openssl_sha256_noopt *   |      32 |     0.003 |     103 |      - |  9685230.0
 openssl_sha256_opt       |      32 |     0.001 |      42 |  0.408 | 23721275.0
 openssl_sha256_tl        |      32 |     0.001 |      44 |  0.433 | 22346368.7
 openssl_sha256_noopt *   |     128 |     0.004 |      34 |      - | 29317453.0
 openssl_sha256_opt       |     128 |     0.002 |      17 |  0.516 | 56863616.2
 openssl_sha256_tl        |     128 |     0.002 |      17 |  0.518 | 56587091.1
 openssl_sha256_noopt *   |     512 |     0.007 |      13 |      - | 75095335.9
 openssl_sha256_opt       |     512 |     0.005 |       9 |  0.721 |104086196.4
 openssl_sha256_tl        |     512 |     0.005 |       9 |  0.727 |103350827.6
 openssl_sha256_noopt *   |    2048 |     0.018 |       8 |      - |116542423.0
 openssl_sha256_opt       |    2048 |     0.015 |       7 |  0.876 |133090720.0
 openssl_sha256_tl        |    2048 |     0.016 |       7 |  0.883 |131958762.9
 openssl_sha256_noopt *   |    8192 |     0.060 |       7 |      - |136688246.7
 openssl_sha256_opt       |    8192 |     0.057 |       7 |  0.959 |142506740.9
 openssl_sha256_tl        |    8192 |     0.057 |       7 |  0.959 |142561300.3
 openssl_sha256_noopt *   |   32768 |     0.228 |       6 |      - |143992758.2
 openssl_sha256_opt       |   32768 |     0.226 |       6 |  0.992 |145198025.5
 openssl_sha256_tl        |   32768 |     0.226 |       6 |  0.994 |144806618.1
 openssl_sha256_noopt *   |  131072 |     0.900 |       6 |      - |145589128.8
 openssl_sha256_opt       |  131072 |     0.900 |       6 |  0.999 |145704360.4
 openssl_sha256_tl        |  131072 |     0.900 |       6 |  1.000 |145615331.2
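
One way to read this run: each measured run digests the same buffer 10 times back to back, while the reported "op" is still one byte, which would explain ns/op being roughly 10x the single-digest table. A rough stdlib-only sketch of that shape follows. It is not the picobench code from this PR, and `timed_ns_per_byte` is a hypothetical helper.

```cpp
#include <chrono>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical harness: call `digest` on the same buffer `repeats` times in
// quick succession, then report elapsed nanoseconds per byte of the buffer.
// Assumes `data` is non-empty.
template <typename F>
double timed_ns_per_byte(
  F&& digest, const std::vector<std::uint8_t>& data, std::size_t repeats)
{
  using clock = std::chrono::steady_clock;
  const auto start = clock::now();
  for (std::size_t r = 0; r < repeats; ++r)
  {
    digest(data); // later calls run with warm caches and branch predictors
  }
  const std::chrono::duration<double, std::nano> elapsed =
    clock::now() - start;
  return elapsed.count() / static_cast<double>(data.size());
}
```

With `repeats` set to 10, any one-off cost on the first call (such as constructing a thread_local) is spread over ten digests, so it contributes less to the per-byte figure.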

I think that's close enough that it's worth comparing e2e numbers, so I'll try that.

@maxtropets
Collaborator

Thanks for benchmarking, picking that up in #7251

@maxtropets closed this Sep 5, 2025