-
Notifications
You must be signed in to change notification settings - Fork 5.5k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
8322179: RISC-V: Implement SHA-1 intrinsic #17130
Conversation
👋 Welcome back mli! A progress list of the required criteria for merging this PR into |
@Hamlin-Li The following label will be automatically applied to this pull request:
When this pull request is ready to be reviewed, an "RFR" email will be sent to the corresponding mailing list. If you would like to change these labels, use the /label pull request command. |
Webrevs
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'll finish the review tomorrow. But posting the comments I have so far.
__ enter(); | ||
|
||
// c_rarg0 - c_rarg3: x10 - x13 | ||
Register buf = c_rarg0; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
You could copy the four arguments to a different set of registers and use a0 -> a3 for some of the other values to see if you can increase the number of compressed instructions that can be used. Unclear whether it's worth it or not.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yeh, I'm not sure if we should take this approach.
Good side might be some code size reduction, bad side might be it's a bit confusing to read and maintain the code.
__ add(e, e, prev_e); | ||
} | ||
|
||
void sha1_reserve_prev_abcde(Register a, Register b, Register c, Register d, Register e, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think it's safe to just use t0 and t1 for intermediate results without passing them as args. I used to pass them as args too, but I changed that for md5.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes, it is.
But I found out it's not that clear in some situation, and error-prone, e.g. when the code is a bit complicated.
So, my policy here is that I only use built-in register names in generate_sha1_implCompress, in all other places, I just use the renamed ones and use them explicitly.
Simply ran
|
@Hamlin-Li this pull request can not be integrated into git checkout sha-1
git fetch https://git.openjdk.org/jdk.git master
git merge FETCH_HEAD
# resolve conflicts and follow the instructions given by git merge
git commit -m "Merge master"
git push |
Hey, |
@RealFYang My bad, I think I added some options in my test scripts accidently, which cause the perf data is not right. |
The reason why there is some regression in getAndDigest when size == 64 is,
When the data size is bigger enough, i.e. 16384, the performance gain is more clear in test MessageDigests.getAndDigest. |
performance jitter of j.s.MessageDigest.getInstance
|
performance jitter of MessageDigests.getAndDigest (when size == 64) introduced by j.s.MessageDigest.getInstance
|
FYI: The performance numbers seems more stable on other platforms like Unmatched board (JMH AverageTime mode):
|
@RealFYang Thanks for testing. Yes, also found that on VF 2 test result is more stable than Lichee Pi 4A. Not sure if we need to further investigate the performance jitter on Lichee Pi 4A, seems it's not related to this implementation. How do yo think about it? |
( NOTE: in above graph, column On More details |
Hey, |
Hi, will take another look. Can you merge with master? I see bot added merge-conflicts label. Thanks. |
Updated! Thanks! |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks for the update. A few more minor comments.
test/hotspot/jtreg/compiler/testlibrary/sha/predicate/IntrinsicPredicates.java
Outdated
Show resolved
Hide resolved
Hey, I'll be on vacation for a while, so maybe slow in response. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Hi, Thanks for the quick update. A couple of minor comments remain after another look.
sha1_calculate_im_hash(a, b, c, d, e, prev_ab, prev_cd, prev_e); | ||
|
||
if (multi_block) { | ||
__ bge(limit, buf, L_sha1_loop, true); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
A small question: Is it OK to continue the loop when limit
equals buf
?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think the answer to your question is at https://github.com/openjdk/jdk/blob/master/src/java.base/share/classes/sun/security/provider/DigestBase.java#L130, the limit
is indeed limit - blockSize
, hope this answer your question?
__ add(t, t, cur_w); | ||
__ rolw_imm(cur_w, a, 5, t0); | ||
// as pointed above, we can use cur_w as temporary register here. | ||
sha1_f(round, tmp, b, c, d); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Could we also reuse cur_k
in this function where input param tmp
is used? I see cur_k
will also be recalculated at the beginning of each round like cur_w
. Hope this could help eliminate tmp
param and finally free t2
.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Seems not, as cur_k is only calculated when (round % 20) == 0
.
Other comments are all resolved.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks. Could you please move int round
as the last formal param for other assembler functions so that it will be more consistent? They are sha1_prepare_w
, sha1_f
, and sha1_process_round
. BTW: I think it will be safer to add scratch register t0
, t1
to the list for assert_different_registers
if they are used in those assembler functions.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Sure, it's updated. Thanks.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Overall, pretty good! Only a few minor suggestions for your consideration!
|
||
if (round == 16) { | ||
int64_t block_bytes = round * 4; | ||
__ addi(buf, buf, block_bytes); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Does buf
need to be incremented here? And why at round == 16
and not after all the rounds are done? Maybe you can do this in the loop in generate_sha1_implCompress
to have the code that initializes and increments it in the same function? Also, does it need to be incremented if !multi_block
?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks, Good catch!
And other comments are also resolved.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks for applying the changes! PR LGTM.
@Hamlin-Li This change now passes all automated pre-integration checks. ℹ️ This project also has non-automated pre-integration requirements. Please see the file CONTRIBUTING.md for details. After integration, the commit message for the final commit will be:
You can use pull request commands such as /summary, /contributor and /issue to adjust it as needed. At the time when this comment was updated there had been 347 new commits pushed to the
As there are no conflicts, your changes will automatically be rebased on top of these commits when integrating. If you prefer to avoid this automatic rebasing, please check the documentation for the /integrate command for further details. ➡️ To integrate this PR with the above commit message to the |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Looks good. Are there any changes on the performance numbers for the latest version? Thanks.
I did not compare between all versions, but yes, there is further improvement especially when data size grow, e.g. comparing with
|
Thanks @RealFYang @gctony for your detailed reviewing. /integrate |
Going to push as commit a48f596.
Your commit was automatically rebased without conflicts. |
@Hamlin-Li Pushed as commit a48f596. 💡 You may see a message that your pull request was closed with unmerged commits. This can be safely ignored. |
Hi,
Can you review this patch to implement SHA-1 intrinsic for riscv?
Thanks!
Test
Functionality
tests under
test/hotspot/jtreg/compiler/intrinsics/sha
tests found via
find test/jdk -iname "*SHA1*.java"
Performance
tested on
T-HEAD Light Lichee Pi 4A
JMH_PARAMS="-f 1 -wi 10 -i 20" // for every loop of jmh test
benchmark tests
MessageDigests.java GetMessageDigest.java MessageDigestBench.java MacBench.java
which are undertest/micro/org/openjdk/bench/
, more spcificallyTESTS="MessageDigests.digest MessageDigests.getAndDigest MessageDigestBench.digest"
getAndDigest when size == 64
The data is not stable for test getAndDigest when size == 64, which I think is introduced by j.s.MessageDigest.getInstance itself, which we don't touch in this patch.
Check more details at 1
Progress
Issue
Reviewers
Reviewing
Using
git
Checkout this PR locally:
$ git fetch https://git.openjdk.org/jdk.git pull/17130/head:pull/17130
$ git checkout pull/17130
Update a local copy of the PR:
$ git checkout pull/17130
$ git pull https://git.openjdk.org/jdk.git pull/17130/head
Using Skara CLI tools
Checkout this PR locally:
$ git pr checkout 17130
View PR using the GUI difftool:
$ git pr show -t 17130
Using diff file
Download this PR as a diff file:
https://git.openjdk.org/jdk/pull/17130.diff
Webrev
Link to Webrev Comment