Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Use hardware accelerated Aes on x64 #5196

Merged
merged 1 commit into from
Jan 23, 2023
Merged

Conversation

benaadams
Copy link
Member

@benaadams benaadams commented Jan 22, 2023

Changes

Types of changes

Generates clean hardware accelerated asm for the encypt+decrypt methods compared to the software version. E.g. the encrypt 256 is below

; Method Encode256:ProcessRounds(byref):this
G_M000_IG01:                ;; offset=0000H
       sub      rsp, 40
       vzeroupper 

G_M000_IG02:                ;; offset=0007H
       mov      rax, gword ptr [rcx+10H]
       mov      ecx, dword ptr [rax+08H]
       cmp      ecx, 14
       jbe      G_M000_IG04
       vmovupd  xmm0, xmmword ptr [rdx]
       vpxor    xmm0, xmm0, xmmword ptr [rax+10H]
       vaesenc  xmm0, xmm0, xmmword ptr [rax+20H]
       vaesenc  xmm0, xmm0, xmmword ptr [rax+30H]
       vaesenc  xmm0, xmm0, xmmword ptr [rax+40H]
       vaesenc  xmm0, xmm0, xmmword ptr [rax+50H]
       vaesenc  xmm0, xmm0, xmmword ptr [rax+60H]
       vaesenc  xmm0, xmm0, xmmword ptr [rax+70H]
       vaesenc  xmm0, xmm0, xmmword ptr [rax+80H]
       vaesenc  xmm0, xmm0, xmmword ptr [rax+90H]
       vaesenc  xmm0, xmm0, xmmword ptr [rax+A0H]
       vaesenc  xmm0, xmm0, xmmword ptr [rax+B0H]
       vaesenc  xmm0, xmm0, xmmword ptr [rax+C0H]
       vaesenc  xmm0, xmm0, xmmword ptr [rax+D0H]
       vaesenc  xmm0, xmm0, xmmword ptr [rax+E0H]
       vaesenclast xmm0, xmm0, xmmword ptr [rax+F0H]
       vmovupd  xmmword ptr [rdx], xmm0

G_M000_IG03:                ;; offset=0090H
       add      rsp, 40
       ret      

G_M000_IG04:                ;; offset=0095H
       call     CORINFO_HELP_RNGCHKFAIL
       int3     
; Total bytes of code: 155

What types of changes does your code introduce?

  • Bugfix (a non-breaking change that fixes an issue)
  • New feature (a non-breaking change that adds functionality)
  • Breaking change (a fix or a feature that would cause existing functionality not to work as expected)
  • Documentation update
  • Code style update (formatting, renaming)
  • Refactoring (no functional or API changes)
  • Build-related changes
  • Other: performance

Testing

Confirmed all the Aes goes through the accelerated IBlockCipher on supported hardware (everything x64)

image

Requires testing

  • Yes
  • No

If yes, did you write tests?

  • Yes
  • No

@benaadams
Copy link
Member Author

Contributed the improvement upstream bcgit/bc-csharp#414

@rubo rubo self-requested a review January 23, 2023 08:55
@rubo
Copy link
Contributor

rubo commented Jan 23, 2023

This looks great. On the other hand, I'm considering replacing the BouncyCastle implementation with .NET's own one.

@benaadams
Copy link
Member Author

This looks great. On the other hand, I'm considering replacing the BouncyCastle implementation with .NET's own one.

Aren't necessarily incompatible; as I assume switching implementation will be a longer and more involved process due to different abstractions? So could do this change as quick win; then follow up with implementation change?

Copy link
Contributor

@rubo rubo left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Tested. No issues.

@rubo rubo merged commit f5bd53d into NethermindEth:master Jan 23, 2023
@benaadams benaadams deleted the Aes branch January 23, 2023 23:03
@rubo
Copy link
Contributor

rubo commented Feb 3, 2023

I'm surprised that this is way faster than the implementation of Aes.Create() on the x64 machine which makes me think that the underlying OS implementation doesn't use AES-NI.

@benaadams
Copy link
Member Author

I'm surprised that this is way faster than the implementation of Aes.Create() on the x64 machine which makes me think that the underlying OS implementation doesn't use AES-NI.

Glad to here, I did spend quite a while optimizing the inheritance chain and implementation to get it just so 😅

However everything else is just abstraction cruft; so if you can take this implementation (and the sealed implementation inherence; which the Jit can then switch to direct calls as only use one type size of key and either encryption or decryption but not both at each call site) and combine it with platform version (which will be better for Arm and fallback); rather than bouncy castle; should have a very fast implementation.

@LukaszRozmej
Copy link
Member

What is the speedup?

@benaadams
Copy link
Member Author

The implementation of Aes in OpenSsl and Windows side is very good; however for small payloads might pay in interop and abstractions to get there?

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants