-
Notifications
You must be signed in to change notification settings - Fork 50
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Joserochh/ntt avoid memcpy #72
Conversation
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Great start. Unless there's a performance hit, I think it'd be cleaner overall to have one out-of-place function for the butterflies / FwdT8 / ..., rather than separate in-place / out-of-place functions.
These are the performance results so far. |
Performance looks good - 4-8% speedup in the FwdNTTCopy / InvNTTCopy benchmarks |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Looks pretty good. It would be best to fold the first loop iteration back into the loop to avoid so much duplicate code.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Looks good to me, thanks @joserochh!
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LG
* Fix 32-bit AVX512DQ InvNTT * Refactor NTT tests for better coverage
b61a5b4
to
f0305ed
Compare
Avoiding memcpy calls on NTT * Avoiding memcpys on Fwd NTT * Avoiding memcpy on INV NTT * Fixing some lines length * using only one out-of-place on first passes * Adding out-of-place for raddix 4 NTT * Adding gpg issue * Adding test cases for out place NTT * Removing commented code and testing GPG Signing * Fboemer/fix 32 bit invntt (#73) * Fix 32-bit AVX512DQ InvNT * Refactor NTT tests for better coverage * Added performance tips to README (#74) * small fix on test case (missed during merge) Co-authored-by: Fabian Boemer <fabian.boemer@intel.com>
Draft for changes to avoid memcpy operations on NTT