Joserochh/ntt avoid memcpy #72

Merged
merged 12 commits into from
Oct 11, 2021

Conversation

joserochh
Contributor

Draft of changes to avoid memcpy operations in the NTT.

@joserochh joserochh marked this pull request as ready for review October 4, 2021 16:43
@joserochh joserochh requested a review from a team as a code owner October 4, 2021 16:43
Contributor

@fboemer fboemer left a comment

Great start. Unless there's a performance hit, I think it'd be cleaner overall to have one out-of-place function for the butterflies / FwdT8 / ..., rather than separate in-place / out-of-place functions.
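The suggestion above can be sketched as follows. This is a minimal illustration, not HEXL's actual butterfly: the name `FwdButterflyOutOfPlace` and its signature are hypothetical, and it uses plain modular arithmetic (assuming `q < 2^32` and inputs already reduced below `q`) rather than the lazy reduction a production NTT would likely use. The point is only that an out-of-place butterfly which reads both inputs before writing either output also covers the in-place case.

```cpp
#include <cstdint>

// Hypothetical helper for illustration; not part of HEXL's API.
// Computes (x + w*y) mod q and (x - w*y) mod q.
// Assumes q < 2^32 and *x, *y, w < q, so every product fits in 64 bits.
inline void FwdButterflyOutOfPlace(const uint64_t* x, const uint64_t* y,
                                   uint64_t* out_x, uint64_t* out_y,
                                   uint64_t w, uint64_t q) {
  // Read both inputs before writing any output, so that passing
  // out_x == x and out_y == y (the in-place case) is safe.
  const uint64_t a = *x;
  const uint64_t wy = (*y * w) % q;
  *out_x = (a + wy) % q;
  *out_y = (a + q - wy) % q;
}
```

A caller that wants in-place behavior passes the same pointers for inputs and outputs, so no separate in-place variant is needed under these assumptions.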

hexl/ntt/fwd-ntt-avx512.cpp (outdated, resolved)
hexl/ntt/fwd-ntt-avx512.cpp (outdated, resolved)
hexl/ntt/ntt-default.hpp (outdated, resolved)
hexl/ntt/ntt-radix-2.cpp (outdated, resolved)
hexl/ntt/ntt-radix-2.cpp (outdated, resolved)
@joserochh joserochh marked this pull request as draft October 4, 2021 17:03
@joserochh
Contributor Author

These are the performance results so far.
Performance.xlsx

@fboemer
Contributor

fboemer commented Oct 4, 2021

> These are the performance results so far. Performance.xlsx

Performance looks good - 4-8% speedup in the FwdNTTCopy / InvNTTCopy benchmarks

@joserochh
Contributor Author

Last Performance Checks

@joserochh joserochh marked this pull request as ready for review October 5, 2021 22:39
Contributor

@fboemer fboemer left a comment

Looks pretty good. It would be best to fold the first loop iteration back into the loop to avoid so much duplicate code.
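One way to fold the first pass into the loop, sketched under stated assumptions: the function name `FwdNttOutOfPlaceSketch`, its signature, and the `root_powers` layout (entry `m + i` used for block `i` of the pass with `m` blocks) are illustrative and are not taken from the PR. Arithmetic is plain modular reduction with `q < 2^32`, and `n` is assumed to be a power of two with `n >= 2`. The only difference between the first pass and the rest is where the data is read from, so a single source pointer removes the need for a duplicated first iteration.

```cpp
#include <cstddef>
#include <cstdint>

// Illustrative radix-2 forward pass structure; names are hypothetical.
// Assumes n is a power of two with n >= 2, q < 2^32, and inputs < q.
void FwdNttOutOfPlaceSketch(uint64_t* result, const uint64_t* operand,
                            std::size_t n, uint64_t q,
                            const uint64_t* root_powers) {
  // The first pass reads from operand; every later pass reads what the
  // previous pass wrote to result. Tracking the source in one pointer
  // lets the first pass share the loop body with the others.
  const uint64_t* input = operand;
  std::size_t t = n >> 1;  // distance between butterfly partners
  for (std::size_t m = 1; m < n; m <<= 1) {
    std::size_t j1 = 0;
    for (std::size_t i = 0; i < m; ++i) {
      const uint64_t w = root_powers[m + i];
      for (std::size_t j = j1; j < j1 + t; ++j) {
        // Read both inputs before writing, so result == operand is safe.
        const uint64_t x = input[j];
        const uint64_t wy = (input[j + t] * w) % q;
        result[j] = (x + wy) % q;
        result[j + t] = (x + q - wy) % q;
      }
      j1 += 2 * t;
    }
    input = result;  // later passes operate on result in place
    t >>= 1;
  }
}
```

Calling it with `result == operand` gives the in-place transform, and calling it with distinct buffers leaves `operand` untouched without a preliminary memcpy.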

hexl/ntt/ntt-default.hpp (outdated, resolved)
hexl/ntt/ntt-default.hpp (resolved)
hexl/ntt/ntt-default.hpp (resolved)
hexl/ntt/ntt-radix-2.cpp (outdated, resolved)
hexl/ntt/ntt-radix-4.cpp (outdated, resolved)
hexl/ntt/ntt-radix-2.cpp (resolved)
Contributor

@fboemer fboemer left a comment

Looks good to me, thanks @joserochh!

Contributor

@GelilaSeifu GelilaSeifu left a comment

LG

@joserochh joserochh merged commit a1c6132 into intel:main Oct 11, 2021
fboemer added a commit that referenced this pull request Nov 8, 2021
Avoiding memcpy calls on NTT

* Avoiding memcpys on Fwd NTT
* Avoiding memcpy on INV NTT
* Fixing some line lengths
* using only one out-of-place on first passes
* Adding out-of-place for radix 4 NTT
* Adding gpg issue
* Adding test cases for out-of-place NTT
* Removing commented code and testing GPG Signing
* Fboemer/fix 32 bit invntt (#73)
* Fix 32-bit AVX512DQ InvNTT
* Refactor NTT tests for better coverage
* Added performance tips to README (#74)
* small fix on test case (missed during merge)

Co-authored-by: Fabian Boemer <fabian.boemer@intel.com>
3 participants