8322179: RISC-V: Implement SHA-1 intrinsic #17130

Hamlin-Li · 2023-12-15T17:46:46Z

Hi,
Can you review this patch to implement SHA-1 intrinsic for riscv?
Thanks!

Test

Functionality

tests under test/hotspot/jtreg/compiler/intrinsics/sha
tests found via find test/jdk -iname "*SHA1*.java"

Performance

tested on T-HEAD Light Lichee Pi 4A

JMH_PARAMS="-f 1 -wi 10 -i 20" // for every loop of jmh test

benchmark tests MessageDigests.java GetMessageDigest.java MessageDigestBench.java MacBench.java which are under test/micro/org/openjdk/bench/, more spcifically TESTS="MessageDigests.digest MessageDigests.getAndDigest MessageDigestBench.digest"

// After
o.o.b.java.security.MessageDigests.digest                   N/A         N/A           SHA-1        64     DEFAULT  avgt   20      1845.446 ?     27.052  ns/op
o.o.b.java.security.MessageDigests.digest                   N/A         N/A           SHA-1     16384     DEFAULT  avgt   20    181455.350 ?    532.258  ns/op
o.o.b.java.security.MessageDigests.getAndDigest             N/A         N/A           SHA-1        64     DEFAULT  avgt   20      2447.674 ?     10.239  ns/op
o.o.b.java.security.MessageDigests.getAndDigest             N/A         N/A           SHA-1     16384     DEFAULT  avgt   20    182896.083 ?   1242.774  ns/op
o.o.b.javax.crypto.small.MessageDigestBench.digest         SHA1     1048576             N/A       N/A              avgt   20  11599227.792 ? 121442.390  ns/op
// Before
o.o.b.java.security.MessageDigests.digest                   N/A         N/A           SHA-1        64     DEFAULT  avgt   20      2352.475 ?    11.198  ns/op
o.o.b.java.security.MessageDigests.digest                   N/A         N/A           SHA-1     16384     DEFAULT  avgt   20    188495.684 ?  1467.942  ns/op
o.o.b.java.security.MessageDigests.getAndDigest             N/A         N/A           SHA-1        64     DEFAULT  avgt   20      2437.347 ?     6.398  ns/op
o.o.b.java.security.MessageDigests.getAndDigest             N/A         N/A           SHA-1     16384     DEFAULT  avgt   20    196086.570 ?  1140.998  ns/op
o.o.b.javax.crypto.small.MessageDigestBench.digest         SHA1     1048576             N/A       N/A              avgt   20  12362160.119 ? 38788.109  ns/op

getAndDigest when size == 64
The data is not stable for test getAndDigest when size == 64, which I think is introduced by j.s.MessageDigest.getInstance itself, which we don't touch in this patch.
Check more details at 1

loop ... 1
// After
o.o.b.java.security.MessageDigests.getAndDigest             N/A         N/A           SHA-1        64     DEFAULT  avgt   20      2447.674 ?     10.239  ns/op
// Before
o.o.b.java.security.MessageDigests.getAndDigest             N/A         N/A           SHA-1        64     DEFAULT  avgt   20      2437.347 ?     6.398  ns/op

loop ... 2
// After
o.o.b.java.security.MessageDigests.getAndDigest             N/A         N/A           SHA-1        64     DEFAULT  avgt   20      2501.846 ?     23.502  ns/op
// Before
o.o.b.java.security.MessageDigests.getAndDigest             N/A         N/A           SHA-1        64     DEFAULT  avgt   20      3148.071 ?    14.844  ns/op

loop ... 3
// After
o.o.b.java.security.MessageDigests.getAndDigest             N/A         N/A           SHA-1        64     DEFAULT  avgt   20      2759.030 ?     13.484  ns/op
// Before
o.o.b.java.security.MessageDigests.getAndDigest             N/A         N/A           SHA-1        64     DEFAULT  avgt   20      2584.943 ?     4.416  ns/op

loop ... 4
// After
o.o.b.java.security.MessageDigests.getAndDigest             N/A         N/A           SHA-1        64     DEFAULT  avgt   20      2463.703 ?     13.599  ns/op
// Before
o.o.b.java.security.MessageDigests.getAndDigest             N/A         N/A           SHA-1        64     DEFAULT  avgt   20      2429.879 ?    10.882  ns/op

loop ... 5
// After
o.o.b.java.security.MessageDigests.getAndDigest             N/A         N/A           SHA-1        64     DEFAULT  avgt   20      2707.707 ?     13.850  ns/op
// Before
o.o.b.java.security.MessageDigests.getAndDigest             N/A         N/A           SHA-1        64     DEFAULT  avgt   20      2449.757 ?      8.352  ns/op

loop ... 6
// After
o.o.b.java.security.MessageDigests.getAndDigest             N/A         N/A           SHA-1        64     DEFAULT  avgt   20      2595.890 ?     12.611  ns/op
// Before
o.o.b.java.security.MessageDigests.getAndDigest             N/A         N/A           SHA-1        64     DEFAULT  avgt   20      2422.945 ?     7.675  ns/op

loop ... 7
// After
o.o.b.java.security.MessageDigests.getAndDigest             N/A         N/A           SHA-1        64     DEFAULT  avgt   20      2404.980 ?     14.419  ns/op
// Before
o.o.b.java.security.MessageDigests.getAndDigest             N/A         N/A           SHA-1        64     DEFAULT  avgt   20      2461.898 ?     6.942  ns/op

loop ... 8
// After
o.o.b.java.security.MessageDigests.getAndDigest             N/A         N/A           SHA-1        64     DEFAULT  avgt   20      2463.224 ?     11.629  ns/op
// Before
o.o.b.java.security.MessageDigests.getAndDigest             N/A         N/A           SHA-1        64     DEFAULT  avgt   20      2446.619 ?     9.804  ns/op

loop ... 9
// After
o.o.b.java.security.MessageDigests.getAndDigest             N/A         N/A           SHA-1        64     DEFAULT  avgt   20      2386.217 ?     15.932  ns/op
// Before
o.o.b.java.security.MessageDigests.getAndDigest             N/A         N/A           SHA-1        64     DEFAULT  avgt   20      2466.078 ?     6.931  ns/op

loop ... 10
// After
o.o.b.java.security.MessageDigests.getAndDigest             N/A         N/A           SHA-1        64     DEFAULT  avgt   20      2591.705 ?     12.051  ns/op
// Before
o.o.b.java.security.MessageDigests.getAndDigest             N/A         N/A           SHA-1        64     DEFAULT  avgt   20      2400.250 ?     7.660  ns/op

Progress

Change must be properly reviewed (1 review required, with at least 1 Reviewer)
Change must not contain extraneous whitespace
Commit message must refer to an issue

Issue

JDK-8322179: RISC-V: Implement SHA-1 intrinsic (Enhancement - P4)

Reviewers

Antonios Printezis (@gctony - Reviewer)
Fei Yang (@RealFYang - Reviewer)

Reviewing

Using git

Checkout this PR locally:
$ git fetch https://git.openjdk.org/jdk.git pull/17130/head:pull/17130
$ git checkout pull/17130

Update a local copy of the PR:
$ git checkout pull/17130
$ git pull https://git.openjdk.org/jdk.git pull/17130/head

Using Skara CLI tools

Checkout this PR locally:
$ git pr checkout 17130

View PR using the GUI difftool:
$ git pr show -t 17130

Using diff file

Download this PR as a diff file:
https://git.openjdk.org/jdk/pull/17130.diff

Webrev

Link to Webrev Comment

bridgekeeper · 2023-12-15T17:48:29Z

👋 Welcome back mli! A progress list of the required criteria for merging this PR into master will be added to the body of your pull request. There are additional pull request commands available for use with this pull request.

openjdk · 2023-12-15T17:50:20Z

@Hamlin-Li The following label will be automatically applied to this pull request:

hotspot

When this pull request is ready to be reviewed, an "RFR" email will be sent to the corresponding mailing list. If you would like to change these labels, use the /label pull request command.

mlbridge · 2023-12-15T17:52:43Z

Webrevs

gctony

I'll finish the review tomorrow. But posting the comments I have so far.

src/hotspot/cpu/riscv/stubGenerator_riscv.cpp

gctony · 2023-12-18T13:40:09Z

src/hotspot/cpu/riscv/stubGenerator_riscv.cpp

+    __ enter();
+
+    // c_rarg0 - c_rarg3: x10 - x13
+    Register buf    = c_rarg0;


You could copy the four arguments to a different set of registers and use a0 -> a3 for some of the other values to see if you can increase the number of compressed instructions that can be used. Unclear whether it's worth it or not.

Yeh, I'm not sure if we should take this approach.
Good side might be some code size reduction, bad side might be it's a bit confusing to read and maintain the code.

gctony · 2023-12-18T13:45:36Z

src/hotspot/cpu/riscv/stubGenerator_riscv.cpp

+    __ add(e, e, prev_e);
+  }
+
+  void sha1_reserve_prev_abcde(Register a, Register b, Register c, Register d, Register e,


I think it's safe to just use t0 and t1 for intermediate results without passing them as args. I used to pass them as args too, but I changed that for md5.

Yes, it is.
But I found out it's not that clear in some situation, and error-prone, e.g. when the code is a bit complicated.
So, my policy here is that I only use built-in register names in generate_sha1_implCompress, in all other places, I just use the renamed ones and use them explicitly.

src/hotspot/cpu/riscv/stubGenerator_riscv.cpp

RealFYang · 2024-01-10T03:02:09Z

Simply ran micro:java.security.MessageDigests JMH on my Lichee-pi-4a board, seems there is a small regression for the MessageDigests.getAndDigest (length = 64) case:

Before:
MessageDigests.digest                 SHA-1        64     DEFAULT  thrpt   15   417.311 ?  2.686  ops/ms
MessageDigests.digest                 SHA-1     16384     DEFAULT  thrpt   15     5.206 ?  0.008  ops/ms
MessageDigests.getAndDigest           SHA-1        64     DEFAULT  thrpt   15   404.769 ? 14.810  ops/ms
MessageDigests.getAndDigest           SHA-1     16384     DEFAULT  thrpt   15     5.106 ?  0.046  ops/ms

After:
MessageDigests.digest                 SHA-1        64     DEFAULT  thrpt   15   518.057 ?  5.935  ops/ms
MessageDigests.digest                 SHA-1     16384     DEFAULT  thrpt   15     5.569 ?  0.009  ops/ms
MessageDigests.getAndDigest           SHA-1        64     DEFAULT  thrpt   15   378.184 ? 37.425  ops/ms
MessageDigests.getAndDigest           SHA-1     16384     DEFAULT  thrpt   15     5.515 ?  0.017  ops/ms

openjdk · 2024-01-10T09:10:54Z

@Hamlin-Li this pull request can not be integrated into master due to one or more merge conflicts. To resolve these merge conflicts and update this pull request you can run the following commands in the local repository for your personal fork:

git checkout sha-1
git fetch https://git.openjdk.org/jdk.git master
git merge FETCH_HEAD
# resolve conflicts and follow the instructions given by git merge
git commit -m "Merge master"
git push

Hamlin-Li · 2024-01-10T10:57:26Z

Simply ran micro:java.security.MessageDigests JMH on my Lichee-pi-4a board, seems there is a small regression for the MessageDigests.getAndDigest (length = 64) case:

Before:
MessageDigests.digest                 SHA-1        64     DEFAULT  thrpt   15   417.311 ?  2.686  ops/ms
MessageDigests.digest                 SHA-1     16384     DEFAULT  thrpt   15     5.206 ?  0.008  ops/ms
MessageDigests.getAndDigest           SHA-1        64     DEFAULT  thrpt   15   404.769 ? 14.810  ops/ms
MessageDigests.getAndDigest           SHA-1     16384     DEFAULT  thrpt   15     5.106 ?  0.046  ops/ms

After:
MessageDigests.digest                 SHA-1        64     DEFAULT  thrpt   15   518.057 ?  5.935  ops/ms
MessageDigests.digest                 SHA-1     16384     DEFAULT  thrpt   15     5.569 ?  0.009  ops/ms
MessageDigests.getAndDigest           SHA-1        64     DEFAULT  thrpt   15   378.184 ? 37.425  ops/ms
MessageDigests.getAndDigest           SHA-1     16384     DEFAULT  thrpt   15     5.515 ?  0.017  ops/ms

Hey,
Thanks for testing.
The data is interesting, seems there is just a little improvement when data is 16384, compared to my test result more than 7X speedup. I will try different test environment. At the same time, can you help to double check your env whether the sha-1 intrinsic is enabled/disabled as expected?

Hamlin-Li · 2024-01-11T10:12:36Z

@RealFYang My bad, I think I added some options in my test scripts accidently, which cause the perf data is not right.

Hamlin-Li · 2024-01-11T10:25:48Z

The reason why there is some regression in getAndDigest when size == 64 is,

for test MessageDigests.getAndDigest, it's actually j.s.MessageDigest.getInstance + j.s.MessageDigest.digest;
for j.s.MessageDigest.getInstance, we don't make any improvement, during the jmh test the performance jitter is kind of big, which I show below with dozens of runs 1.
for j.s.MessageDigest.digest, we made the improvement, but when the size == 64, the improvement is not big enough to fill the gap introduced by performance jitter introduced by j.s.MessageDigest.getInstance.
so, combine above together, the performance "regression" in MessageDigests.getAndDigest when size == 64, should be performance jitter introduced by j.s.MessageDigest.getInstance, which I also show below with dozens of runs 2.

When the data size is bigger enough, i.e. 16384, the performance gain is more clear in test MessageDigests.getAndDigest.

Hamlin-Li · 2024-01-11T10:26:54Z

performance jitter of j.s.MessageDigest.getInstance

loop ... 1
GetMessageDigest.getInstance                       SHA-1       N/A         N/A  avgt   20     674.589 ?   21.290  ns/op
GetMessageDigest.getInstance                       SHA-1       N/A         N/A  avgt   20     715.351 ?   16.302  ns/op
loop ... 2
GetMessageDigest.getInstance                       SHA-1       N/A         N/A  avgt   20     744.618 ?   14.846  ns/op
GetMessageDigest.getInstance                       SHA-1       N/A         N/A  avgt   20     663.602 ?   15.462  ns/op
loop ... 3
GetMessageDigest.getInstance                       SHA-1       N/A         N/A  avgt   20     695.022 ?   17.110  ns/op
GetMessageDigest.getInstance                       SHA-1       N/A         N/A  avgt   20     684.499 ?   17.883  ns/op
loop ... 4
GetMessageDigest.getInstance                       SHA-1       N/A         N/A  avgt   20     680.238 ?   15.415  ns/op
GetMessageDigest.getInstance                       SHA-1       N/A         N/A  avgt   20     696.098 ?   13.663  ns/op
loop ... 5
GetMessageDigest.getInstance                       SHA-1       N/A         N/A  avgt   20     676.999 ?   20.736  ns/op
GetMessageDigest.getInstance                       SHA-1       N/A         N/A  avgt   20     678.875 ?   14.123  ns/op
loop ... 6
GetMessageDigest.getInstance                       SHA-1       N/A         N/A  avgt   20     680.661 ?   15.219  ns/op
GetMessageDigest.getInstance                       SHA-1       N/A         N/A  avgt   20     675.950 ?   16.287  ns/op
loop ... 7
GetMessageDigest.getInstance                       SHA-1       N/A         N/A  avgt   20     666.744 ?   17.272  ns/op
GetMessageDigest.getInstance                       SHA-1       N/A         N/A  avgt   20     690.439 ?   17.091  ns/op
loop ... 8
GetMessageDigest.getInstance                       SHA-1       N/A         N/A  avgt   20     684.327 ?   16.777  ns/op
GetMessageDigest.getInstance                       SHA-1       N/A         N/A  avgt   20     663.682 ?   16.646  ns/op
loop ... 9
GetMessageDigest.getInstance                       SHA-1       N/A         N/A  avgt   20     688.047 ?   14.885  ns/op
GetMessageDigest.getInstance                       SHA-1       N/A         N/A  avgt   20     677.252 ?   17.249  ns/op
loop ... 10
GetMessageDigest.getInstance                       SHA-1       N/A         N/A  avgt   20     684.169 ?   15.243  ns/op
GetMessageDigest.getInstance                       SHA-1       N/A         N/A  avgt   20     756.130 ?    6.580  ns/op

loop ... 1
GetMessageDigest.getInstance                       SHA-1       N/A         N/A  avgt   20     680.255 ?    2.605  ns/op
GetMessageDigest.getInstance                       SHA-1       N/A         N/A  avgt   20     673.618 ?    4.099  ns/op
loop ... 2
GetMessageDigest.getInstance                       SHA-1       N/A         N/A  avgt   20     672.097 ?    2.443  ns/op
GetMessageDigest.getInstance                       SHA-1       N/A         N/A  avgt   20     686.033 ?    2.162  ns/op
loop ... 3
GetMessageDigest.getInstance                       SHA-1       N/A         N/A  avgt   20     662.741 ?    4.445  ns/op
GetMessageDigest.getInstance                       SHA-1       N/A         N/A  avgt   20     670.579 ?    1.971  ns/op
loop ... 4
GetMessageDigest.getInstance                       SHA-1       N/A         N/A  avgt   20     670.821 ?   19.279  ns/op
GetMessageDigest.getInstance                       SHA-1       N/A         N/A  avgt   20     677.139 ?    2.115  ns/op
loop ... 5
GetMessageDigest.getInstance                       SHA-1       N/A         N/A  avgt   20     667.951 ?   23.085  ns/op
GetMessageDigest.getInstance                       SHA-1       N/A         N/A  avgt   20     676.130 ?    2.538  ns/op
loop ... 6
GetMessageDigest.getInstance                       SHA-1       N/A         N/A  avgt   20     673.076 ?    5.673  ns/op
GetMessageDigest.getInstance                       SHA-1       N/A         N/A  avgt   20     677.080 ?   13.638  ns/op
loop ... 7
GetMessageDigest.getInstance                       SHA-1       N/A         N/A  avgt   20     673.471 ?   14.564  ns/op
GetMessageDigest.getInstance                       SHA-1       N/A         N/A  avgt   20     683.718 ?    1.804  ns/op
loop ... 8
GetMessageDigest.getInstance                       SHA-1       N/A         N/A  avgt   20     665.747 ?    2.983  ns/op
GetMessageDigest.getInstance                       SHA-1       N/A         N/A  avgt   20     667.895 ?    3.142  ns/op
loop ... 9
GetMessageDigest.getInstance                       SHA-1       N/A         N/A  avgt   20     808.671 ?   15.151  ns/op
GetMessageDigest.getInstance                       SHA-1       N/A         N/A  avgt   20     671.326 ?    9.426  ns/op
loop ... 10
GetMessageDigest.getInstance                       SHA-1       N/A         N/A  avgt   20     666.879 ?    2.342  ns/op
GetMessageDigest.getInstance                       SHA-1       N/A         N/A  avgt   20     670.161 ?    3.510  ns/op

Hamlin-Li · 2024-01-11T10:29:28Z

performance jitter of MessageDigests.getAndDigest (when size == 64) introduced by j.s.MessageDigest.getInstance

loop ... 1
MessageDigests.getAndDigest                        SHA-1        64     DEFAULT  avgt   20    2521.461 ?   16.450  ns/op
MessageDigests.getAndDigest                        SHA-1        64     DEFAULT  avgt   20    2428.247 ?   10.967  ns/op
loop ... 2
MessageDigests.getAndDigest                        SHA-1        64     DEFAULT  avgt   20    2436.408 ?   32.545  ns/op
MessageDigests.getAndDigest                        SHA-1        64     DEFAULT  avgt   20    2424.811 ?   11.521  ns/op
loop ... 3
MessageDigests.getAndDigest                        SHA-1        64     DEFAULT  avgt   20    2484.609 ?    9.321  ns/op
MessageDigests.getAndDigest                        SHA-1        64     DEFAULT  avgt   20    2462.019 ?   16.622  ns/op
loop ... 4
MessageDigests.getAndDigest                        SHA-1        64     DEFAULT  avgt   20    2550.315 ?   32.915  ns/op
MessageDigests.getAndDigest                        SHA-1        64     DEFAULT  avgt   20    2459.512 ?   95.334  ns/op
loop ... 5
MessageDigests.getAndDigest                        SHA-1        64     DEFAULT  avgt   20    2852.858 ?   45.638  ns/op
MessageDigests.getAndDigest                        SHA-1        64     DEFAULT  avgt   20    2472.366 ?   87.208  ns/op
loop ... 6
MessageDigests.getAndDigest                        SHA-1        64     DEFAULT  avgt   20    2580.812 ?   41.330  ns/op
MessageDigests.getAndDigest                        SHA-1        64     DEFAULT  avgt   20    2452.006 ?   86.735  ns/op
loop ... 7
MessageDigests.getAndDigest                        SHA-1        64     DEFAULT  avgt   20    2427.592 ?   44.410  ns/op
MessageDigests.getAndDigest                        SHA-1        64     DEFAULT  avgt   20    2437.358 ?   92.754  ns/op
loop ... 8
MessageDigests.getAndDigest                        SHA-1        64     DEFAULT  avgt   20    2438.203 ?   20.321  ns/op
MessageDigests.getAndDigest                        SHA-1        64     DEFAULT  avgt   20    2633.727 ?   72.077  ns/op
loop ... 9
MessageDigests.getAndDigest                        SHA-1        64     DEFAULT  avgt   20    2424.485 ?   24.730  ns/op
MessageDigests.getAndDigest                        SHA-1        64     DEFAULT  avgt   20    2464.582 ?   14.058  ns/op
loop ... 10
MessageDigests.getAndDigest                        SHA-1        64     DEFAULT  avgt   20    2408.087 ?   34.954  ns/op
MessageDigests.getAndDigest                        SHA-1        64     DEFAULT  avgt   20    2464.284 ?   65.394  ns/op

loop ... 1
MessageDigests.getAndDigest                        SHA-1        64     DEFAULT  avgt   20    2452.427 ?   12.069  ns/op
MessageDigests.getAndDigest                        SHA-1        64     DEFAULT  avgt   20    2451.976 ?    6.551  ns/op
loop ... 2
MessageDigests.getAndDigest                        SHA-1        64     DEFAULT  avgt   20    2428.869 ?    8.818  ns/op
MessageDigests.getAndDigest                        SHA-1        64     DEFAULT  avgt   20    2520.389 ?    8.184  ns/op
loop ... 3
MessageDigests.getAndDigest                        SHA-1        64     DEFAULT  avgt   20    2511.975 ?   16.673  ns/op
MessageDigests.getAndDigest                        SHA-1        64     DEFAULT  avgt   20    2509.494 ?   21.087  ns/op
loop ... 4
MessageDigests.getAndDigest                        SHA-1        64     DEFAULT  avgt   20    2483.784 ?   16.029  ns/op
MessageDigests.getAndDigest                        SHA-1        64     DEFAULT  avgt   20    2512.870 ?    9.201  ns/op
loop ... 5
MessageDigests.getAndDigest                        SHA-1        64     DEFAULT  avgt   20    2441.026 ?    9.222  ns/op
MessageDigests.getAndDigest                        SHA-1        64     DEFAULT  avgt   20    2431.875 ?    9.295  ns/op
loop ... 6
MessageDigests.getAndDigest                        SHA-1        64     DEFAULT  avgt   20    2402.302 ?    9.737  ns/op
MessageDigests.getAndDigest                        SHA-1        64     DEFAULT  avgt   20    2474.198 ?    6.352  ns/op
loop ... 7
MessageDigests.getAndDigest                        SHA-1        64     DEFAULT  avgt   20    2484.982 ?   13.996  ns/op
MessageDigests.getAndDigest                        SHA-1        64     DEFAULT  avgt   20    2444.898 ?   14.270  ns/op
loop ... 8
MessageDigests.getAndDigest                        SHA-1        64     DEFAULT  avgt   20    2565.433 ?   10.722  ns/op
MessageDigests.getAndDigest                        SHA-1        64     DEFAULT  avgt   20    2469.290 ?   32.165  ns/op
loop ... 9
MessageDigests.getAndDigest                        SHA-1        64     DEFAULT  avgt   20    2527.289 ?   18.710  ns/op
MessageDigests.getAndDigest                        SHA-1        64     DEFAULT  avgt   20    2480.432 ?    9.804  ns/op
loop ... 10
MessageDigests.getAndDigest                        SHA-1        64     DEFAULT  avgt   20    2474.362 ?   23.856  ns/op
MessageDigests.getAndDigest                        SHA-1        64     DEFAULT  avgt   20    2433.547 ?   30.935  ns/op

RealFYang · 2024-01-16T07:10:23Z

FYI: The performance numbers seems more stable on other platforms like Unmatched board (JMH AverageTime mode):

Before:
MessageDigests.digest                 SHA-1        64     DEFAULT  avgt   15     3974.419 ?    28.954  ns/op
MessageDigests.digest                 SHA-1     16384     DEFAULT  avgt   15   411073.165 ?  3731.988  ns/op
MessageDigests.getAndDigest           SHA-1        64     DEFAULT  avgt   15     7136.679 ?   480.850  ns/op
MessageDigests.getAndDigest           SHA-1     16384     DEFAULT  avgt   15   429881.929 ?  1265.110  ns/op

MessageDigests.digest                 SHA-1        64     DEFAULT  avgt   15     3993.060 ?     6.265  ns/op
MessageDigests.digest                 SHA-1     16384     DEFAULT  avgt   15   410724.751 ?  2075.018  ns/op
MessageDigests.getAndDigest           SHA-1        64     DEFAULT  avgt   15     7085.596 ?   496.358  ns/op
MessageDigests.getAndDigest           SHA-1     16384     DEFAULT  avgt   15   430184.356 ?  1052.236  ns/op

MessageDigests.digest                 SHA-1        64     DEFAULT  avgt   15     4016.232 ?    48.074  ns/op
MessageDigests.digest                 SHA-1     16384     DEFAULT  avgt   15   417735.231 ?  7001.640  ns/op
MessageDigests.getAndDigest           SHA-1        64     DEFAULT  avgt   15     7114.528 ?   504.775  ns/op
MessageDigests.getAndDigest           SHA-1     16384     DEFAULT  avgt   15   438041.321 ? 20056.313  ns/op

After:
MessageDigests.digest                 SHA-1        64     DEFAULT  avgt   15     3685.514 ?     5.401  ns/op
MessageDigests.digest                 SHA-1     16384     DEFAULT  avgt   15   364406.355 ?   217.797  ns/op
MessageDigests.getAndDigest           SHA-1        64     DEFAULT  avgt   15     5427.864 ?    41.520  ns/op
MessageDigests.getAndDigest           SHA-1     16384     DEFAULT  avgt   15   367995.806 ?   228.853  ns/op

MessageDigests.digest                 SHA-1        64     DEFAULT  avgt   15     3681.851 ?     6.591  ns/op
MessageDigests.digest                 SHA-1     16384     DEFAULT  avgt   15   364433.610 ?   226.146  ns/op
MessageDigests.getAndDigest           SHA-1        64     DEFAULT  avgt   15     5483.575 ?    46.445  ns/op
MessageDigests.getAndDigest           SHA-1     16384     DEFAULT  avgt   15   367713.143 ?   348.944  ns/op

MessageDigests.digest                 SHA-1        64     DEFAULT  avgt   15     3686.556 ?     6.273  ns/op
MessageDigests.digest                 SHA-1     16384     DEFAULT  avgt   15   364631.822 ?  1265.576  ns/op
MessageDigests.getAndDigest           SHA-1        64     DEFAULT  avgt   15     5496.395 ?    66.473  ns/op
MessageDigests.getAndDigest           SHA-1     16384     DEFAULT  avgt   15   367870.983 ?   296.836  ns/op

Hamlin-Li · 2024-01-16T18:48:21Z

@RealFYang Thanks for testing. Yes, also found that on VF 2 test result is more stable than Lichee Pi 4A.

Not sure if we need to further investigate the performance jitter on Lichee Pi 4A, seems it's not related to this implementation. How do yo think about it?

Hamlin-Li · 2024-01-23T10:49:37Z

( NOTE: in above graph, column master is the data for jdk master; column sha-disable is the data for intrinsic disabled, column sha-enable is the data for intrinsic enabled. )

On T-HEAD Light Lichee Pi 4A, by running jmh 10 rounds (each round with JMH_PARAMS="-f 1 -wi 5 -i 10"), it shows there is no performance regression for MessageDigests.getAndDigest(size == 64).

More details
For MessageDigests.getAndDigest(size == 64), the performance data in master also shows more severe jitter than MessageDigests.getAndDigest(size == 16384). So, the performance jitter in MessageDigests.getAndDigest(size == 64) is not introduced by this implementation.
And, more common use case should be 1 getInstance invocation + N digest invocation, rather than 1 getInstance + 1 digest?

Hamlin-Li · 2024-01-28T09:26:18Z

Hey,
Can I get more reviews? Thanks

RealFYang · 2024-01-29T08:08:03Z

Hey, Can I get more reviews? Thanks

Hi, will take another look. Can you merge with master? I see bot added merge-conflicts label. Thanks.

Hamlin-Li · 2024-01-29T12:20:20Z

Hey, Can I get more reviews? Thanks

Hi, will take another look. Can you merge with master? I see bot added merge-conflicts label. Thanks.

Updated! Thanks!

src/hotspot/cpu/riscv/stubGenerator_riscv.cpp

RealFYang

Thanks for the update. A few more minor comments.

test/hotspot/jtreg/compiler/testlibrary/sha/predicate/IntrinsicPredicates.java

src/hotspot/cpu/riscv/stubGenerator_riscv.cpp

Hamlin-Li · 2024-02-07T15:22:48Z

Hey, I'll be on vacation for a while, so maybe slow in response.
But please leave your comments freely as usual.

RealFYang

Hi, Thanks for the quick update. A couple of minor comments remain after another look.

src/hotspot/cpu/riscv/stubGenerator_riscv.cpp

RealFYang · 2024-02-17T04:06:18Z

src/hotspot/cpu/riscv/stubGenerator_riscv.cpp

+    sha1_calculate_im_hash(a, b, c, d, e, prev_ab, prev_cd, prev_e);
+
+    if (multi_block) {
+      __ bge(limit, buf, L_sha1_loop, true);


A small question: Is it OK to continue the loop when limit equals buf?

I think the answer to your question is at https://github.com/openjdk/jdk/blob/master/src/java.base/share/classes/sun/security/provider/DigestBase.java#L130, the limit is indeed limit - blockSize, hope this answer your question?

RealFYang · 2024-02-17T13:27:19Z

src/hotspot/cpu/riscv/stubGenerator_riscv.cpp

+      __ add(t, t, cur_w);
+      __ rolw_imm(cur_w, a, 5, t0);
+      // as pointed above, we can use cur_w as temporary register here.
+      sha1_f(round, tmp, b, c, d);


Could we also reuse cur_k in this function where input param tmp is used? I see cur_k will also be recalculated at the beginning of each round like cur_w. Hope this could help eliminate tmp param and finally free t2.

Seems not, as cur_k is only calculated when (round % 20) == 0.

Other comments are all resolved.

Thanks. Could you please move int round as the last formal param for other assembler functions so that it will be more consistent? They are sha1_prepare_w, sha1_f, and sha1_process_round. BTW: I think it will be safer to add scratch register t0, t1 to the list for assert_different_registers if they are used in those assembler functions.

Sure, it's updated. Thanks.

src/hotspot/cpu/riscv/stubGenerator_riscv.cpp

gctony

Overall, pretty good! Only a few minor suggestions for your consideration!

src/hotspot/cpu/riscv/stubGenerator_riscv.cpp

gctony · 2024-02-23T22:16:21Z

src/hotspot/cpu/riscv/stubGenerator_riscv.cpp

+
+    if (round == 16) {
+      int64_t block_bytes = round * 4;
+      __ addi(buf, buf, block_bytes);


Does buf need to be incremented here? And why at round == 16 and not after all the rounds are done? Maybe you can do this in the loop in generate_sha1_implCompress to have the code that initializes and increments it in the same function? Also, does it need to be incremented if !multi_block?

Thanks, Good catch!

And other comments are also resolved.

src/hotspot/cpu/riscv/stubGenerator_riscv.cpp

gctony

Thanks for applying the changes! PR LGTM.

openjdk · 2024-02-26T14:37:38Z

@Hamlin-Li This change now passes all automated pre-integration checks.

ℹ️ This project also has non-automated pre-integration requirements. Please see the file CONTRIBUTING.md for details.

After integration, the commit message for the final commit will be:

8322179: RISC-V: Implement SHA-1 intrinsic

Reviewed-by: tonyp, fyang

You can use pull request commands such as /summary, /contributor and /issue to adjust it as needed.

At the time when this comment was updated there had been 347 new commits pushed to the master branch:

0901ded: 8326433: Make file-local functions static in src/jdk.jdwp.agent/unix/native/libjdwp/exec_md.c
d22d890: 8325898: ChoiceFormat returns erroneous result when formatting bad pattern
93feda3: 8322962: Upcall stub might go undetected when freezing frames
fc67c2b: 8325530: Vague error message when com.sun.tools.attach.VirtualMachine fails to load agent library
b87d9cf: 8325506: Ensure randomness is only read from provided SecureRandom object
0963a4e: 8326699: Problemlist CertMsgCheck.java
bf13a4e: 8322881: java/nio/file/Files/CopyMoveVariations.java fails with AccessDeniedException due to permissions of files in /tmp
f62b578: 8311644: Server should not send bad_certificate alert when the client does not send any certificates
9a9cfbe: 8325340: Add ASCII fast-path to Data-/ObjectInputStream.readUTF
3780ad3: 8326587: Separate out Microsoft toolchain linking
... and 337 more: https://git.openjdk.org/jdk/compare/7a300b63b5ca22dfe3e831e641f7a11b9c719b30...master

As there are no conflicts, your changes will automatically be rebased on top of these commits when integrating. If you prefer to avoid this automatic rebasing, please check the documentation for the /integrate command for further details.

➡️ To integrate this PR with the above commit message to the master branch, type /integrate in a new comment.

RealFYang

Looks good. Are there any changes on the performance numbers for the latest version? Thanks.

Hamlin-Li · 2024-02-27T08:16:02Z

Looks good. Are there any changes on the performance numbers for the latest version? Thanks.

I did not compare between all versions, but yes, there is further improvement especially when data size grow, e.g. comparing with abbf7c2, final version as below

# final
Benchmark                                           (algorithm)  (dataSize)  (digesterName)  (length)  (provider)  Mode  Cnt         Score        Error  Units
o.o.b.java.security.MessageDigests.digest                   N/A         N/A           SHA-1        64     DEFAULT  avgt   10      1845.049 ?     46.909  ns/op
o.o.b.java.security.MessageDigests.digest                   N/A         N/A           SHA-1     16384     DEFAULT  avgt   10    174383.296 ?    605.029  ns/op
o.o.b.java.security.MessageDigests.getAndDigest             N/A         N/A           SHA-1        64     DEFAULT  avgt   10      2478.380 ?     19.147  ns/op
o.o.b.java.security.MessageDigests.getAndDigest             N/A         N/A           SHA-1     16384     DEFAULT  avgt   10    176391.224 ?    489.730  ns/op
o.o.b.javax.crypto.small.MessageDigestBench.digest         SHA1     1048576             N/A       N/A              avgt   10  11591613.058 ? 310656.422  ns/op

# abbf7c2
Benchmark                                           (algorithm)  (dataSize)  (digesterName)  (length)  (provider)  Mode  Cnt         Score        Error  Units
o.o.b.java.security.MessageDigests.digest                   N/A         N/A           SHA-1        64     DEFAULT  avgt   10      1856.090 ?     34.903  ns/op
o.o.b.java.security.MessageDigests.digest                   N/A         N/A           SHA-1     16384     DEFAULT  avgt   10    181397.109 ?    446.396  ns/op
o.o.b.java.security.MessageDigests.getAndDigest             N/A         N/A           SHA-1        64     DEFAULT  avgt   10      2469.551 ?     23.858  ns/op
o.o.b.java.security.MessageDigests.getAndDigest             N/A         N/A           SHA-1     16384     DEFAULT  avgt   10    183116.724 ?    343.619  ns/op
o.o.b.javax.crypto.small.MessageDigestBench.digest         SHA1     1048576             N/A       N/A              avgt   10  12059508.183 ? 285536.713  ns/op

   benchmark                                        algorithm  dataSize  digesterName  length  provider        base                  comp                        diff
1  o.o.b.java.security.MessageDigests.digest              N/A       N/A         SHA-1      64   DEFAULT    1856.090   ±34.903    1845.049   ±46.909  ns/op    -11.041  -0.59%
2  o.o.b.java.security.MessageDigests.digest              N/A       N/A         SHA-1   16384   DEFAULT  181397.109  ±446.396  174383.296  ±605.029  ns/op  -7013.813  -3.87%  ✓
3  o.o.b.java.security.MessageDigests.getAndDigest        N/A       N/A         SHA-1      64   DEFAULT    2469.551   ±23.858    2478.380   ±19.147  ns/op     +8.829  +0.36%
4  o.o.b.java.security.MessageDigests.getAndDigest        N/A       N/A         SHA-1   16384   DEFAULT  183116.724  ±343.619  176391.224  ±489.730  ns/op  -6725.500  -3.67%  ✓

Hamlin-Li · 2024-02-27T08:16:55Z

Thanks @RealFYang @gctony for your detailed reviewing.

/integrate

openjdk · 2024-02-27T08:17:35Z

Going to push as commit a48f596.
Since your change was applied there have been 350 commits pushed to the master branch:

5daf63b: 8326636: Problem list StartupOutput.java due to 8326615
97aca09: 8326717: Disable stringop-overflow in shenandoahLock.cpp
ec9437d: 8325247: Memory leak in SessionKeyRef class def when using PKCS11 security provider
0901ded: 8326433: Make file-local functions static in src/jdk.jdwp.agent/unix/native/libjdwp/exec_md.c
d22d890: 8325898: ChoiceFormat returns erroneous result when formatting bad pattern
93feda3: 8322962: Upcall stub might go undetected when freezing frames
fc67c2b: 8325530: Vague error message when com.sun.tools.attach.VirtualMachine fails to load agent library
b87d9cf: 8325506: Ensure randomness is only read from provided SecureRandom object
0963a4e: 8326699: Problemlist CertMsgCheck.java
bf13a4e: 8322881: java/nio/file/Files/CopyMoveVariations.java fails with AccessDeniedException due to permissions of files in /tmp
... and 340 more: https://git.openjdk.org/jdk/compare/7a300b63b5ca22dfe3e831e641f7a11b9c719b30...master

Your commit was automatically rebased without conflicts.

openjdk · 2024-02-27T08:17:43Z

@Hamlin-Li Pushed as commit a48f596.

💡 You may see a message that your pull request was closed with unmerged commits. This can be safely ignored.

Initial commit

505eca0

openjdk bot added the rfr Pull request is ready for review label Dec 15, 2023

openjdk bot added the hotspot hotspot-dev@openjdk.org label Dec 15, 2023

Add some comments

c4dc07b

gctony reviewed Dec 18, 2023

View reviewed changes

gctony reviewed Dec 19, 2023

View reviewed changes

src/hotspot/cpu/riscv/stubGenerator_riscv.cpp Outdated Show resolved Hide resolved

src/hotspot/cpu/riscv/stubGenerator_riscv.cpp Outdated Show resolved Hide resolved

src/hotspot/cpu/riscv/stubGenerator_riscv.cpp Outdated Show resolved Hide resolved

hamlin added 3 commits December 19, 2023 17:36

round 1 review

42f838a

refine code

43299e9

remove tp/gp

eb020a8

openjdk bot added the merge-conflict Pull request has merge conflict with target branch label Jan 10, 2024

merge master; refactor UseSHAxxx

7de9b6a

openjdk bot removed the merge-conflict Pull request has merge conflict with target branch label Jan 29, 2024

Hamlin-Li and others added 2 commits January 29, 2024 11:44

Delete src/.vscode/settings.json

2c79ea0

revert string change

7224b49

RealFYang reviewed Feb 1, 2024

View reviewed changes

src/hotspot/cpu/riscv/stubGenerator_riscv.cpp Show resolved Hide resolved

use x9 as src

50b4e11

RealFYang reviewed Feb 5, 2024

View reviewed changes

src/hotspot/cpu/riscv/stubGenerator_riscv.cpp Outdated Show resolved Hide resolved

not pass in temp register explicitly

e8b01b6

RealFYang reviewed Feb 6, 2024

View reviewed changes

test/hotspot/jtreg/compiler/testlibrary/sha/predicate/IntrinsicPredicates.java Outdated Show resolved Hide resolved

src/hotspot/cpu/riscv/stubGenerator_riscv.cpp Outdated Show resolved Hide resolved

src/hotspot/cpu/riscv/stubGenerator_riscv.cpp Outdated Show resolved Hide resolved

refine register usage and comment

f297659

RealFYang reviewed Feb 19, 2024

View reviewed changes

hamlin added 3 commits February 21, 2024 16:06

refine code

abbf7c2

optimize code

2c075a4

assert register; move round param

3bba2b8

gctony reviewed Feb 23, 2024

View reviewed changes

refine code

7278076

gctony approved these changes Feb 26, 2024

View reviewed changes

openjdk bot added the ready Pull request is ready to be integrated label Feb 26, 2024

RealFYang approved these changes Feb 27, 2024

View reviewed changes

openjdk bot added the integrated Pull request has been integrated label Feb 27, 2024

openjdk bot closed this Feb 27, 2024

openjdk bot removed ready Pull request is ready to be integrated rfr Pull request is ready for review labels Feb 27, 2024

Hamlin-Li deleted the sha-1 branch February 27, 2024 08:32

8322179: RISC-V: Implement SHA-1 intrinsic #17130

8322179: RISC-V: Implement SHA-1 intrinsic #17130

Conversation

Hamlin-Li commented Dec 15, 2023 • edited by openjdk bot Loading

Test

Functionality

Performance

Progress

Issue

Reviewers

Reviewing

Webrev

bridgekeeper bot commented Dec 15, 2023

openjdk bot commented Dec 15, 2023

mlbridge bot commented Dec 15, 2023 • edited Loading

Webrevs

gctony left a comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

RealFYang commented Jan 10, 2024 • edited Loading

openjdk bot commented Jan 10, 2024

Hamlin-Li commented Jan 10, 2024

Hamlin-Li commented Jan 11, 2024

Hamlin-Li commented Jan 11, 2024 • edited Loading

Hamlin-Li commented Jan 11, 2024 • edited Loading

Hamlin-Li commented Jan 11, 2024 • edited Loading

RealFYang commented Jan 16, 2024 • edited Loading

Hamlin-Li commented Jan 16, 2024

Hamlin-Li commented Jan 23, 2024 • edited Loading

Hamlin-Li commented Jan 28, 2024

RealFYang commented Jan 29, 2024

Hamlin-Li commented Jan 29, 2024

RealFYang left a comment

Choose a reason for hiding this comment

Hamlin-Li commented Feb 7, 2024

RealFYang left a comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

RealFYang Feb 23, 2024 • edited Loading

Choose a reason for hiding this comment

Choose a reason for hiding this comment

gctony left a comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

gctony left a comment

Choose a reason for hiding this comment

openjdk bot commented Feb 26, 2024 • edited Loading

RealFYang left a comment • edited Loading

Choose a reason for hiding this comment

Hamlin-Li commented Feb 27, 2024 • edited Loading

Hamlin-Li commented Feb 27, 2024

openjdk bot commented Feb 27, 2024

openjdk bot commented Feb 27, 2024

Hamlin-Li commented Dec 15, 2023 •

edited by openjdk bot

Loading

mlbridge bot commented Dec 15, 2023 •

edited

Loading

RealFYang commented Jan 10, 2024 •

edited

Loading

Hamlin-Li commented Jan 11, 2024 •

edited

Loading

Hamlin-Li commented Jan 11, 2024 •

edited

Loading

Hamlin-Li commented Jan 11, 2024 •

edited

Loading

RealFYang commented Jan 16, 2024 •

edited

Loading

Hamlin-Li commented Jan 23, 2024 •

edited

Loading

RealFYang Feb 23, 2024 •

edited

Loading

openjdk bot commented Feb 26, 2024 •

edited

Loading

RealFYang left a comment •

edited

Loading

Hamlin-Li commented Feb 27, 2024 •

edited

Loading