crypto.sha3: rewrite and optimize kaccak_p_1600_24() engine, update tests #26524
Open
tankf33der wants to merge 1 commit into vlang:master
@blackshirt take a look. Of course I've tested it with your pslhdsa implementation.
@kimshrier, take a look. What do you think?
I finally want to show the patch for accelerating sha3 performance.
This is roughly the fourth generation of a patch that came out of several weeks of development (and fun).
It started as a patch that gave a 10% speedup and ended up as a multi-fold speedup for both tcc and gcc.
If you profile my standard sha3 performance test file, you can see multiple function calls inside the rounds. Once I got rid of those, the rest was just a matter of technique.
Even when you check that the compiler inlined them, they still turn out to be costly.
Besides, the official site suggests merging several of these functions into one, and then the separate functions are not needed at all.
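To illustrate what merging the round steps means, here is a minimal sketch in Python (not the V code from this patch). It assumes the standard Keccak-f[1600] step definitions from FIPS 202 and performs theta, rho, pi, chi and iota inside a single loop body with no per-step helper calls. The names `keccak_f1600`, `ROT`, `DEST` and `sha3_256` are made up for this example.

```python
import hashlib  # used only for the cross-check at the bottom

MASK = (1 << 64) - 1

# Standard Keccak-f[1600] round constants (iota step), one per round.
RC = (
    0x0000000000000001, 0x0000000000008082, 0x800000000000808A, 0x8000000080008000,
    0x000000000000808B, 0x0000000080000001, 0x8000000080008081, 0x8000000000008009,
    0x000000000000008A, 0x0000000000000088, 0x0000000080008009, 0x000000008000000A,
    0x000000008000808B, 0x800000000000008B, 0x8000000000008089, 0x8000000000008003,
    0x8000000000008002, 0x8000000000000080, 0x000000000000800A, 0x800000008000000A,
    0x8000000080008081, 0x8000000000008080, 0x0000000080000001, 0x8000000080008008,
)

# Precompute rho rotation offsets and pi destinations once, so the round
# body only does table lookups. Lane (x, y) lives at index x + 5 * y.
ROT = [0] * 25
DEST = [0] * 25
x, y = 1, 0
for t in range(24):
    ROT[x + 5 * y] = ((t + 1) * (t + 2) // 2) % 64
    x, y = y, (2 * x + 3 * y) % 5
for x in range(5):
    for y in range(5):
        DEST[x + 5 * y] = y + 5 * ((2 * x + 3 * y) % 5)


def keccak_f1600(a):
    """Apply 24 rounds in place to a list of 25 64-bit lanes."""
    for rc in RC:
        # theta: column parities and the per-column correction d[x]
        c = [a[i] ^ a[i + 5] ^ a[i + 10] ^ a[i + 15] ^ a[i + 20] for i in range(5)]
        d = [c[(i + 4) % 5] ^ (((c[(i + 1) % 5] << 1) | (c[(i + 1) % 5] >> 63)) & MASK)
             for i in range(5)]
        # theta is applied on the fly while doing rho (rotate) and pi (permute)
        b = [0] * 25
        for i in range(25):
            v = a[i] ^ d[i % 5]
            r = ROT[i]
            b[DEST[i]] = ((v << r) | (v >> (64 - r))) & MASK
        # chi: non-linear mix along each row
        for row in range(0, 25, 5):
            for i in range(5):
                a[row + i] = b[row + i] ^ (~b[row + (i + 1) % 5] & b[row + (i + 2) % 5])
        # iota: inject the round constant
        a[0] ^= rc


def sha3_256(data: bytes) -> bytes:
    """SHA3-256 built on the permutation above (rate 136 bytes, 0x06 padding)."""
    rate = 136
    padded = bytearray(data) + bytes(rate - len(data) % rate)
    padded[len(data)] ^= 0x06
    padded[-1] ^= 0x80
    state = [0] * 25
    for off in range(0, len(padded), rate):
        for i in range(rate // 8):
            state[i] ^= int.from_bytes(padded[off + 8 * i: off + 8 * i + 8], "little")
        keccak_f1600(state)
    return b"".join(lane.to_bytes(8, "little") for lane in state[:4])


# Cross-check against the standard library's SHA3-256.
assert sha3_256(b"abc") == hashlib.sha3_256(b"abc").digest()
```

A compiled implementation would go further and replace the table lookups with fully unrolled straight-line code. This sketch only shows the steps fused into one round body.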
The latest generation of the patch simply unrolls the loops and makes them cheaper.
I had to tinker with it.
I have my own tests with full coverage against files of test vectors and against openssl calls, so I'm not worried.
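As a rough illustration of that kind of check (not the actual test suite from this patch), a known-answer test could look like the Python sketch below. It uses two published SHA3-256 digests, for the empty message and for "abc". `KNOWN_ANSWERS` and `check_known_answers` are hypothetical names, and `hashlib.sha3_256` only stands in for the implementation under test.

```python
import hashlib

# Known-answer vectors for SHA3-256: message -> expected hex digest.
KNOWN_ANSWERS = {
    b"": "a7ffc6f8bf1ed76651c14756a061d662f580ff4de43b49fa82d80a4b80f8434a",
    b"abc": "3a985da74fe225b2045c172d6bd390bd855f086e3e9d525b46bfe24511431532",
}


def check_known_answers(digest_fn):
    """Return the messages for which digest_fn produces a wrong digest."""
    return [msg for msg, want in KNOWN_ANSWERS.items()
            if digest_fn(msg).hex() != want]


# Stand-in for the implementation under test.
failures = check_known_answers(lambda m: hashlib.sha3_256(m).digest())
```

An empty `failures` list means every vector matched. A rewritten permutation that gets even one rotation offset wrong would fail on the first vector.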
Now the profiler shows normal metrics:
I had to sacrifice some tests because they became impossible: the code they relied on simply no longer exists.
Speedup: tcc ~4.5x or more, gcc ~3x or more.