Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

8322179: RISC-V: Implement SHA-1 intrinsic #17130

Closed
wants to merge 15 commits into from

Conversation

Hamlin-Li
Copy link

@Hamlin-Li Hamlin-Li commented Dec 15, 2023

Hi,
Can you review this patch to implement SHA-1 intrinsic for riscv?
Thanks!

Test

Functionality

tests under test/hotspot/jtreg/compiler/intrinsics/sha
tests found via find test/jdk -iname "*SHA1*.java"

Performance

tested on T-HEAD Light Lichee Pi 4A

JMH_PARAMS="-f 1 -wi 10 -i 20" // for every loop of jmh test

benchmark tests MessageDigests.java GetMessageDigest.java MessageDigestBench.java MacBench.java which are under test/micro/org/openjdk/bench/, more spcifically TESTS="MessageDigests.digest MessageDigests.getAndDigest MessageDigestBench.digest"

// After
o.o.b.java.security.MessageDigests.digest                   N/A         N/A           SHA-1        64     DEFAULT  avgt   20      1845.446 ?     27.052  ns/op
o.o.b.java.security.MessageDigests.digest                   N/A         N/A           SHA-1     16384     DEFAULT  avgt   20    181455.350 ?    532.258  ns/op
o.o.b.java.security.MessageDigests.getAndDigest             N/A         N/A           SHA-1        64     DEFAULT  avgt   20      2447.674 ?     10.239  ns/op
o.o.b.java.security.MessageDigests.getAndDigest             N/A         N/A           SHA-1     16384     DEFAULT  avgt   20    182896.083 ?   1242.774  ns/op
o.o.b.javax.crypto.small.MessageDigestBench.digest         SHA1     1048576             N/A       N/A              avgt   20  11599227.792 ? 121442.390  ns/op
// Before
o.o.b.java.security.MessageDigests.digest                   N/A         N/A           SHA-1        64     DEFAULT  avgt   20      2352.475 ?    11.198  ns/op
o.o.b.java.security.MessageDigests.digest                   N/A         N/A           SHA-1     16384     DEFAULT  avgt   20    188495.684 ?  1467.942  ns/op
o.o.b.java.security.MessageDigests.getAndDigest             N/A         N/A           SHA-1        64     DEFAULT  avgt   20      2437.347 ?     6.398  ns/op
o.o.b.java.security.MessageDigests.getAndDigest             N/A         N/A           SHA-1     16384     DEFAULT  avgt   20    196086.570 ?  1140.998  ns/op
o.o.b.javax.crypto.small.MessageDigestBench.digest         SHA1     1048576             N/A       N/A              avgt   20  12362160.119 ? 38788.109  ns/op

getAndDigest when size == 64
The data is not stable for test getAndDigest when size == 64, which I think is introduced by j.s.MessageDigest.getInstance itself, which we don't touch in this patch.
Check more details at 1

loop ... 1
// After
o.o.b.java.security.MessageDigests.getAndDigest             N/A         N/A           SHA-1        64     DEFAULT  avgt   20      2447.674 ?     10.239  ns/op
// Before
o.o.b.java.security.MessageDigests.getAndDigest             N/A         N/A           SHA-1        64     DEFAULT  avgt   20      2437.347 ?     6.398  ns/op

loop ... 2
// After
o.o.b.java.security.MessageDigests.getAndDigest             N/A         N/A           SHA-1        64     DEFAULT  avgt   20      2501.846 ?     23.502  ns/op
// Before
o.o.b.java.security.MessageDigests.getAndDigest             N/A         N/A           SHA-1        64     DEFAULT  avgt   20      3148.071 ?    14.844  ns/op

loop ... 3
// After
o.o.b.java.security.MessageDigests.getAndDigest             N/A         N/A           SHA-1        64     DEFAULT  avgt   20      2759.030 ?     13.484  ns/op
// Before
o.o.b.java.security.MessageDigests.getAndDigest             N/A         N/A           SHA-1        64     DEFAULT  avgt   20      2584.943 ?     4.416  ns/op

loop ... 4
// After
o.o.b.java.security.MessageDigests.getAndDigest             N/A         N/A           SHA-1        64     DEFAULT  avgt   20      2463.703 ?     13.599  ns/op
// Before
o.o.b.java.security.MessageDigests.getAndDigest             N/A         N/A           SHA-1        64     DEFAULT  avgt   20      2429.879 ?    10.882  ns/op

loop ... 5
// After
o.o.b.java.security.MessageDigests.getAndDigest             N/A         N/A           SHA-1        64     DEFAULT  avgt   20      2707.707 ?     13.850  ns/op
// Before
o.o.b.java.security.MessageDigests.getAndDigest             N/A         N/A           SHA-1        64     DEFAULT  avgt   20      2449.757 ?      8.352  ns/op

loop ... 6
// After
o.o.b.java.security.MessageDigests.getAndDigest             N/A         N/A           SHA-1        64     DEFAULT  avgt   20      2595.890 ?     12.611  ns/op
// Before
o.o.b.java.security.MessageDigests.getAndDigest             N/A         N/A           SHA-1        64     DEFAULT  avgt   20      2422.945 ?     7.675  ns/op

loop ... 7
// After
o.o.b.java.security.MessageDigests.getAndDigest             N/A         N/A           SHA-1        64     DEFAULT  avgt   20      2404.980 ?     14.419  ns/op
// Before
o.o.b.java.security.MessageDigests.getAndDigest             N/A         N/A           SHA-1        64     DEFAULT  avgt   20      2461.898 ?     6.942  ns/op

loop ... 8
// After
o.o.b.java.security.MessageDigests.getAndDigest             N/A         N/A           SHA-1        64     DEFAULT  avgt   20      2463.224 ?     11.629  ns/op
// Before
o.o.b.java.security.MessageDigests.getAndDigest             N/A         N/A           SHA-1        64     DEFAULT  avgt   20      2446.619 ?     9.804  ns/op

loop ... 9
// After
o.o.b.java.security.MessageDigests.getAndDigest             N/A         N/A           SHA-1        64     DEFAULT  avgt   20      2386.217 ?     15.932  ns/op
// Before
o.o.b.java.security.MessageDigests.getAndDigest             N/A         N/A           SHA-1        64     DEFAULT  avgt   20      2466.078 ?     6.931  ns/op

loop ... 10
// After
o.o.b.java.security.MessageDigests.getAndDigest             N/A         N/A           SHA-1        64     DEFAULT  avgt   20      2591.705 ?     12.051  ns/op
// Before
o.o.b.java.security.MessageDigests.getAndDigest             N/A         N/A           SHA-1        64     DEFAULT  avgt   20      2400.250 ?     7.660  ns/op

Progress

  • Change must be properly reviewed (1 review required, with at least 1 Reviewer)
  • Change must not contain extraneous whitespace
  • Commit message must refer to an issue

Issue

  • JDK-8322179: RISC-V: Implement SHA-1 intrinsic (Enhancement - P4)

Reviewers

Reviewing

Using git

Checkout this PR locally:
$ git fetch https://git.openjdk.org/jdk.git pull/17130/head:pull/17130
$ git checkout pull/17130

Update a local copy of the PR:
$ git checkout pull/17130
$ git pull https://git.openjdk.org/jdk.git pull/17130/head

Using Skara CLI tools

Checkout this PR locally:
$ git pr checkout 17130

View PR using the GUI difftool:
$ git pr show -t 17130

Using diff file

Download this PR as a diff file:
https://git.openjdk.org/jdk/pull/17130.diff

Webrev

Link to Webrev Comment

@bridgekeeper
Copy link

bridgekeeper bot commented Dec 15, 2023

👋 Welcome back mli! A progress list of the required criteria for merging this PR into master will be added to the body of your pull request. There are additional pull request commands available for use with this pull request.

@openjdk openjdk bot added the rfr Pull request is ready for review label Dec 15, 2023
@openjdk
Copy link

openjdk bot commented Dec 15, 2023

@Hamlin-Li The following label will be automatically applied to this pull request:

  • hotspot

When this pull request is ready to be reviewed, an "RFR" email will be sent to the corresponding mailing list. If you would like to change these labels, use the /label pull request command.

@openjdk openjdk bot added the hotspot hotspot-dev@openjdk.org label Dec 15, 2023
@mlbridge
Copy link

mlbridge bot commented Dec 15, 2023

Copy link
Contributor

@gctony gctony left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'll finish the review tomorrow. But posting the comments I have so far.

src/hotspot/cpu/riscv/stubGenerator_riscv.cpp Outdated Show resolved Hide resolved
src/hotspot/cpu/riscv/stubGenerator_riscv.cpp Show resolved Hide resolved
src/hotspot/cpu/riscv/stubGenerator_riscv.cpp Show resolved Hide resolved
__ enter();

// c_rarg0 - c_rarg3: x10 - x13
Register buf = c_rarg0;
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You could copy the four arguments to a different set of registers and use a0 -> a3 for some of the other values to see if you can increase the number of compressed instructions that can be used. Unclear whether it's worth it or not.

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeh, I'm not sure if we should take this approach.
Good side might be some code size reduction, bad side might be it's a bit confusing to read and maintain the code.

__ add(e, e, prev_e);
}

void sha1_reserve_prev_abcde(Register a, Register b, Register c, Register d, Register e,
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think it's safe to just use t0 and t1 for intermediate results without passing them as args. I used to pass them as args too, but I changed that for md5.

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, it is.
But I found out it's not that clear in some situation, and error-prone, e.g. when the code is a bit complicated.
So, my policy here is that I only use built-in register names in generate_sha1_implCompress, in all other places, I just use the renamed ones and use them explicitly.

src/hotspot/cpu/riscv/stubGenerator_riscv.cpp Outdated Show resolved Hide resolved
src/hotspot/cpu/riscv/stubGenerator_riscv.cpp Outdated Show resolved Hide resolved
src/hotspot/cpu/riscv/stubGenerator_riscv.cpp Outdated Show resolved Hide resolved
src/hotspot/cpu/riscv/stubGenerator_riscv.cpp Outdated Show resolved Hide resolved
src/hotspot/cpu/riscv/stubGenerator_riscv.cpp Outdated Show resolved Hide resolved
src/hotspot/cpu/riscv/stubGenerator_riscv.cpp Outdated Show resolved Hide resolved
@RealFYang
Copy link
Member

RealFYang commented Jan 10, 2024

Simply ran micro:java.security.MessageDigests JMH on my Lichee-pi-4a board, seems there is a small regression for the MessageDigests.getAndDigest (length = 64) case:

Before:
MessageDigests.digest                 SHA-1        64     DEFAULT  thrpt   15   417.311 ?  2.686  ops/ms
MessageDigests.digest                 SHA-1     16384     DEFAULT  thrpt   15     5.206 ?  0.008  ops/ms
MessageDigests.getAndDigest           SHA-1        64     DEFAULT  thrpt   15   404.769 ? 14.810  ops/ms
MessageDigests.getAndDigest           SHA-1     16384     DEFAULT  thrpt   15     5.106 ?  0.046  ops/ms

After:
MessageDigests.digest                 SHA-1        64     DEFAULT  thrpt   15   518.057 ?  5.935  ops/ms
MessageDigests.digest                 SHA-1     16384     DEFAULT  thrpt   15     5.569 ?  0.009  ops/ms
MessageDigests.getAndDigest           SHA-1        64     DEFAULT  thrpt   15   378.184 ? 37.425  ops/ms
MessageDigests.getAndDigest           SHA-1     16384     DEFAULT  thrpt   15     5.515 ?  0.017  ops/ms

@openjdk
Copy link

openjdk bot commented Jan 10, 2024

@Hamlin-Li this pull request can not be integrated into master due to one or more merge conflicts. To resolve these merge conflicts and update this pull request you can run the following commands in the local repository for your personal fork:

git checkout sha-1
git fetch https://git.openjdk.org/jdk.git master
git merge FETCH_HEAD
# resolve conflicts and follow the instructions given by git merge
git commit -m "Merge master"
git push

@openjdk openjdk bot added the merge-conflict Pull request has merge conflict with target branch label Jan 10, 2024
@Hamlin-Li
Copy link
Author

Simply ran micro:java.security.MessageDigests JMH on my Lichee-pi-4a board, seems there is a small regression for the MessageDigests.getAndDigest (length = 64) case:

Before:
MessageDigests.digest                 SHA-1        64     DEFAULT  thrpt   15   417.311 ?  2.686  ops/ms
MessageDigests.digest                 SHA-1     16384     DEFAULT  thrpt   15     5.206 ?  0.008  ops/ms
MessageDigests.getAndDigest           SHA-1        64     DEFAULT  thrpt   15   404.769 ? 14.810  ops/ms
MessageDigests.getAndDigest           SHA-1     16384     DEFAULT  thrpt   15     5.106 ?  0.046  ops/ms

After:
MessageDigests.digest                 SHA-1        64     DEFAULT  thrpt   15   518.057 ?  5.935  ops/ms
MessageDigests.digest                 SHA-1     16384     DEFAULT  thrpt   15     5.569 ?  0.009  ops/ms
MessageDigests.getAndDigest           SHA-1        64     DEFAULT  thrpt   15   378.184 ? 37.425  ops/ms
MessageDigests.getAndDigest           SHA-1     16384     DEFAULT  thrpt   15     5.515 ?  0.017  ops/ms

Hey,
Thanks for testing.
The data is interesting, seems there is just a little improvement when data is 16384, compared to my test result more than 7X speedup. I will try different test environment. At the same time, can you help to double check your env whether the sha-1 intrinsic is enabled/disabled as expected?

@Hamlin-Li
Copy link
Author

@RealFYang My bad, I think I added some options in my test scripts accidently, which cause the perf data is not right.

@Hamlin-Li
Copy link
Author

Hamlin-Li commented Jan 11, 2024

The reason why there is some regression in getAndDigest when size == 64 is,

  1. for test MessageDigests.getAndDigest, it's actually j.s.MessageDigest.getInstance + j.s.MessageDigest.digest;
  2. for j.s.MessageDigest.getInstance, we don't make any improvement, during the jmh test the performance jitter is kind of big, which I show below with dozens of runs 1.
  3. for j.s.MessageDigest.digest, we made the improvement, but when the size == 64, the improvement is not big enough to fill the gap introduced by performance jitter introduced by j.s.MessageDigest.getInstance.
  4. so, combine above together, the performance "regression" in MessageDigests.getAndDigest when size == 64, should be performance jitter introduced by j.s.MessageDigest.getInstance, which I also show below with dozens of runs 2.

When the data size is bigger enough, i.e. 16384, the performance gain is more clear in test MessageDigests.getAndDigest.

@Hamlin-Li
Copy link
Author

Hamlin-Li commented Jan 11, 2024

performance jitter of j.s.MessageDigest.getInstance

loop ... 1
GetMessageDigest.getInstance                       SHA-1       N/A         N/A  avgt   20     674.589 ?   21.290  ns/op
GetMessageDigest.getInstance                       SHA-1       N/A         N/A  avgt   20     715.351 ?   16.302  ns/op
loop ... 2
GetMessageDigest.getInstance                       SHA-1       N/A         N/A  avgt   20     744.618 ?   14.846  ns/op
GetMessageDigest.getInstance                       SHA-1       N/A         N/A  avgt   20     663.602 ?   15.462  ns/op
loop ... 3
GetMessageDigest.getInstance                       SHA-1       N/A         N/A  avgt   20     695.022 ?   17.110  ns/op
GetMessageDigest.getInstance                       SHA-1       N/A         N/A  avgt   20     684.499 ?   17.883  ns/op
loop ... 4
GetMessageDigest.getInstance                       SHA-1       N/A         N/A  avgt   20     680.238 ?   15.415  ns/op
GetMessageDigest.getInstance                       SHA-1       N/A         N/A  avgt   20     696.098 ?   13.663  ns/op
loop ... 5
GetMessageDigest.getInstance                       SHA-1       N/A         N/A  avgt   20     676.999 ?   20.736  ns/op
GetMessageDigest.getInstance                       SHA-1       N/A         N/A  avgt   20     678.875 ?   14.123  ns/op
loop ... 6
GetMessageDigest.getInstance                       SHA-1       N/A         N/A  avgt   20     680.661 ?   15.219  ns/op
GetMessageDigest.getInstance                       SHA-1       N/A         N/A  avgt   20     675.950 ?   16.287  ns/op
loop ... 7
GetMessageDigest.getInstance                       SHA-1       N/A         N/A  avgt   20     666.744 ?   17.272  ns/op
GetMessageDigest.getInstance                       SHA-1       N/A         N/A  avgt   20     690.439 ?   17.091  ns/op
loop ... 8
GetMessageDigest.getInstance                       SHA-1       N/A         N/A  avgt   20     684.327 ?   16.777  ns/op
GetMessageDigest.getInstance                       SHA-1       N/A         N/A  avgt   20     663.682 ?   16.646  ns/op
loop ... 9
GetMessageDigest.getInstance                       SHA-1       N/A         N/A  avgt   20     688.047 ?   14.885  ns/op
GetMessageDigest.getInstance                       SHA-1       N/A         N/A  avgt   20     677.252 ?   17.249  ns/op
loop ... 10
GetMessageDigest.getInstance                       SHA-1       N/A         N/A  avgt   20     684.169 ?   15.243  ns/op
GetMessageDigest.getInstance                       SHA-1       N/A         N/A  avgt   20     756.130 ?    6.580  ns/op
loop ... 1
GetMessageDigest.getInstance                       SHA-1       N/A         N/A  avgt   20     680.255 ?    2.605  ns/op
GetMessageDigest.getInstance                       SHA-1       N/A         N/A  avgt   20     673.618 ?    4.099  ns/op
loop ... 2
GetMessageDigest.getInstance                       SHA-1       N/A         N/A  avgt   20     672.097 ?    2.443  ns/op
GetMessageDigest.getInstance                       SHA-1       N/A         N/A  avgt   20     686.033 ?    2.162  ns/op
loop ... 3
GetMessageDigest.getInstance                       SHA-1       N/A         N/A  avgt   20     662.741 ?    4.445  ns/op
GetMessageDigest.getInstance                       SHA-1       N/A         N/A  avgt   20     670.579 ?    1.971  ns/op
loop ... 4
GetMessageDigest.getInstance                       SHA-1       N/A         N/A  avgt   20     670.821 ?   19.279  ns/op
GetMessageDigest.getInstance                       SHA-1       N/A         N/A  avgt   20     677.139 ?    2.115  ns/op
loop ... 5
GetMessageDigest.getInstance                       SHA-1       N/A         N/A  avgt   20     667.951 ?   23.085  ns/op
GetMessageDigest.getInstance                       SHA-1       N/A         N/A  avgt   20     676.130 ?    2.538  ns/op
loop ... 6
GetMessageDigest.getInstance                       SHA-1       N/A         N/A  avgt   20     673.076 ?    5.673  ns/op
GetMessageDigest.getInstance                       SHA-1       N/A         N/A  avgt   20     677.080 ?   13.638  ns/op
loop ... 7
GetMessageDigest.getInstance                       SHA-1       N/A         N/A  avgt   20     673.471 ?   14.564  ns/op
GetMessageDigest.getInstance                       SHA-1       N/A         N/A  avgt   20     683.718 ?    1.804  ns/op
loop ... 8
GetMessageDigest.getInstance                       SHA-1       N/A         N/A  avgt   20     665.747 ?    2.983  ns/op
GetMessageDigest.getInstance                       SHA-1       N/A         N/A  avgt   20     667.895 ?    3.142  ns/op
loop ... 9
GetMessageDigest.getInstance                       SHA-1       N/A         N/A  avgt   20     808.671 ?   15.151  ns/op
GetMessageDigest.getInstance                       SHA-1       N/A         N/A  avgt   20     671.326 ?    9.426  ns/op
loop ... 10
GetMessageDigest.getInstance                       SHA-1       N/A         N/A  avgt   20     666.879 ?    2.342  ns/op
GetMessageDigest.getInstance                       SHA-1       N/A         N/A  avgt   20     670.161 ?    3.510  ns/op

@Hamlin-Li
Copy link
Author

Hamlin-Li commented Jan 11, 2024

performance jitter of MessageDigests.getAndDigest (when size == 64) introduced by j.s.MessageDigest.getInstance

loop ... 1
MessageDigests.getAndDigest                        SHA-1        64     DEFAULT  avgt   20    2521.461 ?   16.450  ns/op
MessageDigests.getAndDigest                        SHA-1        64     DEFAULT  avgt   20    2428.247 ?   10.967  ns/op
loop ... 2
MessageDigests.getAndDigest                        SHA-1        64     DEFAULT  avgt   20    2436.408 ?   32.545  ns/op
MessageDigests.getAndDigest                        SHA-1        64     DEFAULT  avgt   20    2424.811 ?   11.521  ns/op
loop ... 3
MessageDigests.getAndDigest                        SHA-1        64     DEFAULT  avgt   20    2484.609 ?    9.321  ns/op
MessageDigests.getAndDigest                        SHA-1        64     DEFAULT  avgt   20    2462.019 ?   16.622  ns/op
loop ... 4
MessageDigests.getAndDigest                        SHA-1        64     DEFAULT  avgt   20    2550.315 ?   32.915  ns/op
MessageDigests.getAndDigest                        SHA-1        64     DEFAULT  avgt   20    2459.512 ?   95.334  ns/op
loop ... 5
MessageDigests.getAndDigest                        SHA-1        64     DEFAULT  avgt   20    2852.858 ?   45.638  ns/op
MessageDigests.getAndDigest                        SHA-1        64     DEFAULT  avgt   20    2472.366 ?   87.208  ns/op
loop ... 6
MessageDigests.getAndDigest                        SHA-1        64     DEFAULT  avgt   20    2580.812 ?   41.330  ns/op
MessageDigests.getAndDigest                        SHA-1        64     DEFAULT  avgt   20    2452.006 ?   86.735  ns/op
loop ... 7
MessageDigests.getAndDigest                        SHA-1        64     DEFAULT  avgt   20    2427.592 ?   44.410  ns/op
MessageDigests.getAndDigest                        SHA-1        64     DEFAULT  avgt   20    2437.358 ?   92.754  ns/op
loop ... 8
MessageDigests.getAndDigest                        SHA-1        64     DEFAULT  avgt   20    2438.203 ?   20.321  ns/op
MessageDigests.getAndDigest                        SHA-1        64     DEFAULT  avgt   20    2633.727 ?   72.077  ns/op
loop ... 9
MessageDigests.getAndDigest                        SHA-1        64     DEFAULT  avgt   20    2424.485 ?   24.730  ns/op
MessageDigests.getAndDigest                        SHA-1        64     DEFAULT  avgt   20    2464.582 ?   14.058  ns/op
loop ... 10
MessageDigests.getAndDigest                        SHA-1        64     DEFAULT  avgt   20    2408.087 ?   34.954  ns/op
MessageDigests.getAndDigest                        SHA-1        64     DEFAULT  avgt   20    2464.284 ?   65.394  ns/op
loop ... 1
MessageDigests.getAndDigest                        SHA-1        64     DEFAULT  avgt   20    2452.427 ?   12.069  ns/op
MessageDigests.getAndDigest                        SHA-1        64     DEFAULT  avgt   20    2451.976 ?    6.551  ns/op
loop ... 2
MessageDigests.getAndDigest                        SHA-1        64     DEFAULT  avgt   20    2428.869 ?    8.818  ns/op
MessageDigests.getAndDigest                        SHA-1        64     DEFAULT  avgt   20    2520.389 ?    8.184  ns/op
loop ... 3
MessageDigests.getAndDigest                        SHA-1        64     DEFAULT  avgt   20    2511.975 ?   16.673  ns/op
MessageDigests.getAndDigest                        SHA-1        64     DEFAULT  avgt   20    2509.494 ?   21.087  ns/op
loop ... 4
MessageDigests.getAndDigest                        SHA-1        64     DEFAULT  avgt   20    2483.784 ?   16.029  ns/op
MessageDigests.getAndDigest                        SHA-1        64     DEFAULT  avgt   20    2512.870 ?    9.201  ns/op
loop ... 5
MessageDigests.getAndDigest                        SHA-1        64     DEFAULT  avgt   20    2441.026 ?    9.222  ns/op
MessageDigests.getAndDigest                        SHA-1        64     DEFAULT  avgt   20    2431.875 ?    9.295  ns/op
loop ... 6
MessageDigests.getAndDigest                        SHA-1        64     DEFAULT  avgt   20    2402.302 ?    9.737  ns/op
MessageDigests.getAndDigest                        SHA-1        64     DEFAULT  avgt   20    2474.198 ?    6.352  ns/op
loop ... 7
MessageDigests.getAndDigest                        SHA-1        64     DEFAULT  avgt   20    2484.982 ?   13.996  ns/op
MessageDigests.getAndDigest                        SHA-1        64     DEFAULT  avgt   20    2444.898 ?   14.270  ns/op
loop ... 8
MessageDigests.getAndDigest                        SHA-1        64     DEFAULT  avgt   20    2565.433 ?   10.722  ns/op
MessageDigests.getAndDigest                        SHA-1        64     DEFAULT  avgt   20    2469.290 ?   32.165  ns/op
loop ... 9
MessageDigests.getAndDigest                        SHA-1        64     DEFAULT  avgt   20    2527.289 ?   18.710  ns/op
MessageDigests.getAndDigest                        SHA-1        64     DEFAULT  avgt   20    2480.432 ?    9.804  ns/op
loop ... 10
MessageDigests.getAndDigest                        SHA-1        64     DEFAULT  avgt   20    2474.362 ?   23.856  ns/op
MessageDigests.getAndDigest                        SHA-1        64     DEFAULT  avgt   20    2433.547 ?   30.935  ns/op

@RealFYang
Copy link
Member

RealFYang commented Jan 16, 2024

FYI: The performance numbers seems more stable on other platforms like Unmatched board (JMH AverageTime mode):

Before:
MessageDigests.digest                 SHA-1        64     DEFAULT  avgt   15     3974.419 ?    28.954  ns/op
MessageDigests.digest                 SHA-1     16384     DEFAULT  avgt   15   411073.165 ?  3731.988  ns/op
MessageDigests.getAndDigest           SHA-1        64     DEFAULT  avgt   15     7136.679 ?   480.850  ns/op
MessageDigests.getAndDigest           SHA-1     16384     DEFAULT  avgt   15   429881.929 ?  1265.110  ns/op

MessageDigests.digest                 SHA-1        64     DEFAULT  avgt   15     3993.060 ?     6.265  ns/op
MessageDigests.digest                 SHA-1     16384     DEFAULT  avgt   15   410724.751 ?  2075.018  ns/op
MessageDigests.getAndDigest           SHA-1        64     DEFAULT  avgt   15     7085.596 ?   496.358  ns/op
MessageDigests.getAndDigest           SHA-1     16384     DEFAULT  avgt   15   430184.356 ?  1052.236  ns/op

MessageDigests.digest                 SHA-1        64     DEFAULT  avgt   15     4016.232 ?    48.074  ns/op
MessageDigests.digest                 SHA-1     16384     DEFAULT  avgt   15   417735.231 ?  7001.640  ns/op
MessageDigests.getAndDigest           SHA-1        64     DEFAULT  avgt   15     7114.528 ?   504.775  ns/op
MessageDigests.getAndDigest           SHA-1     16384     DEFAULT  avgt   15   438041.321 ? 20056.313  ns/op

After:
MessageDigests.digest                 SHA-1        64     DEFAULT  avgt   15     3685.514 ?     5.401  ns/op
MessageDigests.digest                 SHA-1     16384     DEFAULT  avgt   15   364406.355 ?   217.797  ns/op
MessageDigests.getAndDigest           SHA-1        64     DEFAULT  avgt   15     5427.864 ?    41.520  ns/op
MessageDigests.getAndDigest           SHA-1     16384     DEFAULT  avgt   15   367995.806 ?   228.853  ns/op

MessageDigests.digest                 SHA-1        64     DEFAULT  avgt   15     3681.851 ?     6.591  ns/op
MessageDigests.digest                 SHA-1     16384     DEFAULT  avgt   15   364433.610 ?   226.146  ns/op
MessageDigests.getAndDigest           SHA-1        64     DEFAULT  avgt   15     5483.575 ?    46.445  ns/op
MessageDigests.getAndDigest           SHA-1     16384     DEFAULT  avgt   15   367713.143 ?   348.944  ns/op

MessageDigests.digest                 SHA-1        64     DEFAULT  avgt   15     3686.556 ?     6.273  ns/op
MessageDigests.digest                 SHA-1     16384     DEFAULT  avgt   15   364631.822 ?  1265.576  ns/op
MessageDigests.getAndDigest           SHA-1        64     DEFAULT  avgt   15     5496.395 ?    66.473  ns/op
MessageDigests.getAndDigest           SHA-1     16384     DEFAULT  avgt   15   367870.983 ?   296.836  ns/op

@Hamlin-Li
Copy link
Author

@RealFYang Thanks for testing. Yes, also found that on VF 2 test result is more stable than Lichee Pi 4A.

Not sure if we need to further investigate the performance jitter on Lichee Pi 4A, seems it's not related to this implementation. How do yo think about it?

@Hamlin-Li
Copy link
Author

Hamlin-Li commented Jan 23, 2024

image

( NOTE: in above graph, column master is the data for jdk master; column sha-disable is the data for intrinsic disabled, column sha-enable is the data for intrinsic enabled. )

On T-HEAD Light Lichee Pi 4A, by running jmh 10 rounds (each round with JMH_PARAMS="-f 1 -wi 5 -i 10"), it shows there is no performance regression for MessageDigests.getAndDigest(size == 64).

More details
For MessageDigests.getAndDigest(size == 64), the performance data in master also shows more severe jitter than MessageDigests.getAndDigest(size == 16384). So, the performance jitter in MessageDigests.getAndDigest(size == 64) is not introduced by this implementation.
And, more common use case should be 1 getInstance invocation + N digest invocation, rather than 1 getInstance + 1 digest?

@Hamlin-Li
Copy link
Author

Hey,
Can I get more reviews? Thanks

@RealFYang
Copy link
Member

Hey, Can I get more reviews? Thanks

Hi, will take another look. Can you merge with master? I see bot added merge-conflicts label. Thanks.

@openjdk openjdk bot removed the merge-conflict Pull request has merge conflict with target branch label Jan 29, 2024
@Hamlin-Li
Copy link
Author

Hey, Can I get more reviews? Thanks

Hi, will take another look. Can you merge with master? I see bot added merge-conflicts label. Thanks.

Updated! Thanks!

Copy link
Member

@RealFYang RealFYang left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for the update. A few more minor comments.

src/hotspot/cpu/riscv/stubGenerator_riscv.cpp Outdated Show resolved Hide resolved
src/hotspot/cpu/riscv/stubGenerator_riscv.cpp Outdated Show resolved Hide resolved
@Hamlin-Li
Copy link
Author

Hey, I'll be on vacation for a while, so maybe slow in response.
But please leave your comments freely as usual.

Copy link
Member

@RealFYang RealFYang left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hi, Thanks for the quick update. A couple of minor comments remain after another look.

src/hotspot/cpu/riscv/stubGenerator_riscv.cpp Outdated Show resolved Hide resolved
src/hotspot/cpu/riscv/stubGenerator_riscv.cpp Outdated Show resolved Hide resolved
src/hotspot/cpu/riscv/stubGenerator_riscv.cpp Outdated Show resolved Hide resolved
src/hotspot/cpu/riscv/stubGenerator_riscv.cpp Outdated Show resolved Hide resolved
sha1_calculate_im_hash(a, b, c, d, e, prev_ab, prev_cd, prev_e);

if (multi_block) {
__ bge(limit, buf, L_sha1_loop, true);
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

A small question: Is it OK to continue the loop when limit equals buf?

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think the answer to your question is at https://github.com/openjdk/jdk/blob/master/src/java.base/share/classes/sun/security/provider/DigestBase.java#L130, the limit is indeed limit - blockSize, hope this answer your question?

__ add(t, t, cur_w);
__ rolw_imm(cur_w, a, 5, t0);
// as pointed above, we can use cur_w as temporary register here.
sha1_f(round, tmp, b, c, d);
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Could we also reuse cur_k in this function where input param tmp is used? I see cur_k will also be recalculated at the beginning of each round like cur_w. Hope this could help eliminate tmp param and finally free t2.

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Seems not, as cur_k is only calculated when (round % 20) == 0.

Other comments are all resolved.

Copy link
Member

@RealFYang RealFYang Feb 23, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks. Could you please move int round as the last formal param for other assembler functions so that it will be more consistent? They are sha1_prepare_w, sha1_f, and sha1_process_round. BTW: I think it will be safer to add scratch register t0, t1 to the list for assert_different_registers if they are used in those assembler functions.

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sure, it's updated. Thanks.

src/hotspot/cpu/riscv/stubGenerator_riscv.cpp Outdated Show resolved Hide resolved
src/hotspot/cpu/riscv/stubGenerator_riscv.cpp Show resolved Hide resolved
Copy link
Contributor

@gctony gctony left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Overall, pretty good! Only a few minor suggestions for your consideration!

src/hotspot/cpu/riscv/stubGenerator_riscv.cpp Outdated Show resolved Hide resolved
src/hotspot/cpu/riscv/stubGenerator_riscv.cpp Outdated Show resolved Hide resolved
src/hotspot/cpu/riscv/stubGenerator_riscv.cpp Outdated Show resolved Hide resolved

if (round == 16) {
int64_t block_bytes = round * 4;
__ addi(buf, buf, block_bytes);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Does buf need to be incremented here? And why at round == 16 and not after all the rounds are done? Maybe you can do this in the loop in generate_sha1_implCompress to have the code that initializes and increments it in the same function? Also, does it need to be incremented if !multi_block?

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks, Good catch!

And other comments are also resolved.

src/hotspot/cpu/riscv/stubGenerator_riscv.cpp Outdated Show resolved Hide resolved
Copy link
Contributor

@gctony gctony left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for applying the changes! PR LGTM.

@openjdk
Copy link

openjdk bot commented Feb 26, 2024

@Hamlin-Li This change now passes all automated pre-integration checks.

ℹ️ This project also has non-automated pre-integration requirements. Please see the file CONTRIBUTING.md for details.

After integration, the commit message for the final commit will be:

8322179: RISC-V: Implement SHA-1 intrinsic

Reviewed-by: tonyp, fyang

You can use pull request commands such as /summary, /contributor and /issue to adjust it as needed.

At the time when this comment was updated there had been 347 new commits pushed to the master branch:

  • 0901ded: 8326433: Make file-local functions static in src/jdk.jdwp.agent/unix/native/libjdwp/exec_md.c
  • d22d890: 8325898: ChoiceFormat returns erroneous result when formatting bad pattern
  • 93feda3: 8322962: Upcall stub might go undetected when freezing frames
  • fc67c2b: 8325530: Vague error message when com.sun.tools.attach.VirtualMachine fails to load agent library
  • b87d9cf: 8325506: Ensure randomness is only read from provided SecureRandom object
  • 0963a4e: 8326699: Problemlist CertMsgCheck.java
  • bf13a4e: 8322881: java/nio/file/Files/CopyMoveVariations.java fails with AccessDeniedException due to permissions of files in /tmp
  • f62b578: 8311644: Server should not send bad_certificate alert when the client does not send any certificates
  • 9a9cfbe: 8325340: Add ASCII fast-path to Data-/ObjectInputStream.readUTF
  • 3780ad3: 8326587: Separate out Microsoft toolchain linking
  • ... and 337 more: https://git.openjdk.org/jdk/compare/7a300b63b5ca22dfe3e831e641f7a11b9c719b30...master

As there are no conflicts, your changes will automatically be rebased on top of these commits when integrating. If you prefer to avoid this automatic rebasing, please check the documentation for the /integrate command for further details.

➡️ To integrate this PR with the above commit message to the master branch, type /integrate in a new comment.

@openjdk openjdk bot added the ready Pull request is ready to be integrated label Feb 26, 2024
Copy link
Member

@RealFYang RealFYang left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks good. Are there any changes on the performance numbers for the latest version? Thanks.

@Hamlin-Li
Copy link
Author

Hamlin-Li commented Feb 27, 2024

Looks good. Are there any changes on the performance numbers for the latest version? Thanks.

I did not compare between all versions, but yes, there is further improvement especially when data size grow, e.g. comparing with abbf7c2, final version as below

# final
Benchmark                                           (algorithm)  (dataSize)  (digesterName)  (length)  (provider)  Mode  Cnt         Score        Error  Units
o.o.b.java.security.MessageDigests.digest                   N/A         N/A           SHA-1        64     DEFAULT  avgt   10      1845.049 ?     46.909  ns/op
o.o.b.java.security.MessageDigests.digest                   N/A         N/A           SHA-1     16384     DEFAULT  avgt   10    174383.296 ?    605.029  ns/op
o.o.b.java.security.MessageDigests.getAndDigest             N/A         N/A           SHA-1        64     DEFAULT  avgt   10      2478.380 ?     19.147  ns/op
o.o.b.java.security.MessageDigests.getAndDigest             N/A         N/A           SHA-1     16384     DEFAULT  avgt   10    176391.224 ?    489.730  ns/op
o.o.b.javax.crypto.small.MessageDigestBench.digest         SHA1     1048576             N/A       N/A              avgt   10  11591613.058 ? 310656.422  ns/op

# abbf7c2
Benchmark                                           (algorithm)  (dataSize)  (digesterName)  (length)  (provider)  Mode  Cnt         Score        Error  Units
o.o.b.java.security.MessageDigests.digest                   N/A         N/A           SHA-1        64     DEFAULT  avgt   10      1856.090 ?     34.903  ns/op
o.o.b.java.security.MessageDigests.digest                   N/A         N/A           SHA-1     16384     DEFAULT  avgt   10    181397.109 ?    446.396  ns/op
o.o.b.java.security.MessageDigests.getAndDigest             N/A         N/A           SHA-1        64     DEFAULT  avgt   10      2469.551 ?     23.858  ns/op
o.o.b.java.security.MessageDigests.getAndDigest             N/A         N/A           SHA-1     16384     DEFAULT  avgt   10    183116.724 ?    343.619  ns/op
o.o.b.javax.crypto.small.MessageDigestBench.digest         SHA1     1048576             N/A       N/A              avgt   10  12059508.183 ? 285536.713  ns/op
   benchmark                                        algorithm  dataSize  digesterName  length  provider        base                  comp                        diff
1  o.o.b.java.security.MessageDigests.digest              N/A       N/A         SHA-1      64   DEFAULT    1856.090   ±34.903    1845.049   ±46.909  ns/op    -11.041  -0.59%
2  o.o.b.java.security.MessageDigests.digest              N/A       N/A         SHA-1   16384   DEFAULT  181397.109  ±446.396  174383.296  ±605.029  ns/op  -7013.813  -3.87%  ✓
3  o.o.b.java.security.MessageDigests.getAndDigest        N/A       N/A         SHA-1      64   DEFAULT    2469.551   ±23.858    2478.380   ±19.147  ns/op     +8.829  +0.36%
4  o.o.b.java.security.MessageDigests.getAndDigest        N/A       N/A         SHA-1   16384   DEFAULT  183116.724  ±343.619  176391.224  ±489.730  ns/op  -6725.500  -3.67%  ✓

@Hamlin-Li
Copy link
Author

Thanks @RealFYang @gctony for your detailed reviewing.

/integrate

@openjdk
Copy link

openjdk bot commented Feb 27, 2024

Going to push as commit a48f596.
Since your change was applied there have been 350 commits pushed to the master branch:

  • 5daf63b: 8326636: Problem list StartupOutput.java due to 8326615
  • 97aca09: 8326717: Disable stringop-overflow in shenandoahLock.cpp
  • ec9437d: 8325247: Memory leak in SessionKeyRef class def when using PKCS11 security provider
  • 0901ded: 8326433: Make file-local functions static in src/jdk.jdwp.agent/unix/native/libjdwp/exec_md.c
  • d22d890: 8325898: ChoiceFormat returns erroneous result when formatting bad pattern
  • 93feda3: 8322962: Upcall stub might go undetected when freezing frames
  • fc67c2b: 8325530: Vague error message when com.sun.tools.attach.VirtualMachine fails to load agent library
  • b87d9cf: 8325506: Ensure randomness is only read from provided SecureRandom object
  • 0963a4e: 8326699: Problemlist CertMsgCheck.java
  • bf13a4e: 8322881: java/nio/file/Files/CopyMoveVariations.java fails with AccessDeniedException due to permissions of files in /tmp
  • ... and 340 more: https://git.openjdk.org/jdk/compare/7a300b63b5ca22dfe3e831e641f7a11b9c719b30...master

Your commit was automatically rebased without conflicts.

@openjdk openjdk bot added the integrated Pull request has been integrated label Feb 27, 2024
@openjdk openjdk bot closed this Feb 27, 2024
@openjdk openjdk bot removed ready Pull request is ready to be integrated rfr Pull request is ready for review labels Feb 27, 2024
@openjdk
Copy link

openjdk bot commented Feb 27, 2024

@Hamlin-Li Pushed as commit a48f596.

💡 You may see a message that your pull request was closed with unmerged commits. This can be safely ignored.

@Hamlin-Li Hamlin-Li deleted the sha-1 branch February 27, 2024 08:32
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
hotspot hotspot-dev@openjdk.org integrated Pull request has been integrated
Development

Successfully merging this pull request may close these issues.

3 participants