This repository was archived by the owner on Jan 23, 2023. It is now read-only.


@erozenfeld
Member

 This change adds code to recognize rotation idioms and generate efficient instructions for them.

 Two new operators are added: GT_ROL and GT_ROR.

 The patterns recognized (>>> denotes a logical, i.e. unsigned, right shift):
 (x << c1) | (x >>> c2) => x rol c1
 (x >>> c1) | (x << c2) => x ror c1

 where c1 and c2 are constants and c1 + c2 == bitsize(x)

 (x << y) | (x >>> (N - y)) => x rol y
 (x >>> y) | (x << (N - y)) => x ror y

 where N == bitsize(x)

 (x << (y & M1)) | (x >>> ((N - y) & M2)) => x rol y
 (x >>> (y & M1)) | (x << ((N - y) & M2)) => x ror y

 where N == bitsize(x), the masks M1 and M2 apply to the shift counts, and
 (M1 & (N - 1)) == N - 1
 (M2 & (N - 1)) == N - 1

 For a simple benchmark with 4 rotation patterns in a tight loop,
 time goes from 7.324 to 2.600 (a 2.8x speedup).

 Rotations found and optimized in mscorlib:
 System.Security.Cryptography.SHA256Managed::RotateRight
 System.Security.Cryptography.SHA384Managed::RotateRight
 System.Security.Cryptography.SHA512Managed::RotateRight
 System.Security.Cryptography.RIPEMD160Managed::MDTransform (320 instances!)
 System.Diagnostics.Tracing.EventSource.Sha1ForNonSecretPurposes::Rol1
 System.Diagnostics.Tracing.EventSource.Sha1ForNonSecretPurposes::Rol5
 System.Diagnostics.Tracing.EventSource.Sha1ForNonSecretPurposes::Rol30
 System.Diagnostics.Tracing.EventSource.Sha1ForNonSecretPurposes::Drain
 (9 instances of Sha1ForNonSecretPurposes::Rol* inlined)

 Closes #1619.

@erozenfeld
Member Author

@sivarv PTAL

@CarolEidt

Reviewed offline - thanks for adding the test!
LGTM

@sivarv
Member

sivarv commented Oct 22, 2015

LGTM

erozenfeld added a commit that referenced this pull request Oct 22, 2015
     Generate efficient code for rotation patterns.
@erozenfeld erozenfeld merged commit 429bb1c into dotnet:master Oct 22, 2015
@erozenfeld erozenfeld deleted the RotateBits branch October 22, 2015 22:37
@jamesqo

jamesqo commented Jul 19, 2016

This does not appear to work on 32-bit x86. For this function:

[MethodImpl(MethodImplOptions.NoInlining)]
private static int Foo(int left)
{
    uint rol5 = ((uint)left << 5) | ((uint)left >> 27);
    return (int)rol5;
}

a rol is emitted on x64, but not on regular x86:

; x64
G_M30394_IG01:

G_M30394_IG02:
       8BC1                 mov      eax, ecx
       C1C005               rol      eax, 5

G_M30394_IG03:
       C3                   ret

; x86
G_M30394_IG01:
       55           push     ebp
       8BEC         mov      ebp, esp

G_M30394_IG02:
       8BC1         mov      eax, ecx
       C1E005       shl      eax, 5
       C1E91B       shr      ecx, 27
       0BC1         or       eax, ecx

G_M30394_IG03:
       5D           pop      ebp
       C3           ret

@erozenfeld
Member Author

Yes, this optimization was added only to RyuJIT (the default JIT for 64-bit), not to the current 32-bit JIT. Work on 32-bit RyuJIT is in progress, and it will support this optimization.

@jamesqo

jamesqo commented Jul 19, 2016

Ah, so that explains it. Thank you for clarifying 😄

@LMLB

LMLB commented Aug 25, 2016

In the case with the masks, does the N == bitsize(x) test check the masked or the unmasked value of N? For example, (x << (y & M1)) | (x >>> (-y & M2)) is also valid, and I have used it in the past in C# as (x << y) | (x >> -y).

@erozenfeld
Member Author

No, these patterns are not recognized by the current implementation.

picenka21 pushed a commit to picenka21/runtime that referenced this pull request Feb 18, 2022
     Generate efficient code for rotation patterns.

Commit migrated from dotnet/coreclr@429bb1c

7 participants