[Arm64] Crypto arm64 intrinsics implementation #16499

debayang · 2018-02-22T07:01:27Z

@dotnet/arm64-contrib

Arm64 intrinsics implementation for Aes, SHA1 and SHA256

API proposal:ARM64 Crypto HW Intrinsics

dnfclas · 2018-02-22T07:01:38Z

All CLA requirements met.

sdmaclea · 2018-02-22T15:15:13Z

@CarolEidt PTAL
@dotnet/jit-contrib @dotnet/arm64-contrib FYI

sdmaclea · 2018-02-22T15:21:29Z

tests/src/JIT/HardwareIntrinsics/Arm64/Crypto.cs

+using System.Runtime.InteropServices;
+using System.Runtime.Intrinsics;
+using System.Runtime.Intrinsics.Arm.Arm64;
+


@debayang Building these tests will fail until API is added to corefx.

If we want to merge this, it will need #if. Something like I did here https://github.com/dotnet/coreclr/blob/master/tests/src/JIT/HardwareIntrinsics/Arm64/Simd.cs#L6

Thanks @sdmaclea .
Fixed

sdmaclea · 2018-02-22T15:22:12Z

@briansull PTAL at the emitters

sdmaclea · 2018-02-22T15:24:48Z

@debayang The Ubuntu formatting job is failing, this is due to formatting issues. If you click on the details link, you can see a patch to apply to fix the formatting issues. You can also run install and run clang-format on source files in the src jit tree.

debayang · 2018-02-22T16:55:12Z

@sdmaclea Corrected formatting

sdmaclea

LGTM

sdmaclea · 2018-02-22T17:02:12Z

@debayang Looks like the test you added failed on Windows x86. Probably #if is not correct yet.

debayang · 2018-02-22T18:39:01Z

@sdmaclea Fixed test case.

briansull · 2018-02-22T19:16:07Z

src/jit/codegenarm64.cpp

+
+    bool     is16Byte = (node->gtSIMDSize > 8);
+    emitAttr attr     = is16Byte ? EA_16BYTE : EA_8BYTE;
+    insOpts  opt      = genGetSimdInsOpt(is16Byte, baseType);


I would like to see the signature for CodeGen::genGetSimdInsOpt(bool is16Byte, var_types elementType) changed to CodeGen::genGetSimdInsOpt(emitAttr size, var_types elementType) and an asset added at the start of that method : assert(isValidVectorDatasize(size);

All the callers to this method aleady use this unnecessary pattern to setup is16Byte
Also instead of attr we can use size, since we aren't going to have any GC flags for these instructions.

emitAttr size = is16Byte ? EA_16BYTE : EA_8BYTE;

I can do this for you on a follow up checkin.

@briansull Thanks. May be better to handle in a different checkin as this appears to be used in other places as well.

briansull · 2018-02-22T19:37:07Z

src/jit/emitarm64.cpp

@@ -427,6 +427,12 @@ void emitter::emitInsSanityCheck(instrDesc* id)
            assert(isValidImmCondFlags(emitGetInsSC(id)));
            break;

+        case IF_DR_2J: // DR_2J   ................ ......nnnnnddddd      Sd Sn


It is nice to add the instruction that use this encoding:

(sha1h)

briansull · 2018-02-22T19:43:43Z

src/jit/emitarm64.cpp

@@ -548,6 +554,7 @@ void emitter::emitInsSanityCheck(instrDesc* id)

        case IF_DV_2A: // DV_2A   .Q.......X...... ......nnnnnddddd      Vd Vn      (fabs, fcvt - vector)
        case IF_DV_2M: // DV_2M   .Q......XX...... ......nnnnnddddd      Vd Vn      (abs, neg   - vector)
+        case IF_DV_2P: // DV_2P   ................ ......nnnnnddddd      Vd Vn      (aes, sha1su1)


I would prefer adding a * to indicate that all forms of the aes instructions use this encoding:
(aes*, sha1su1)

briansull · 2018-02-22T19:47:17Z

src/jit/emitarm64.cpp

@@ -811,6 +835,9 @@ bool emitter::emitInsMayWriteToGCReg(instrDesc* id)
        case IF_DR_3C: // DR_3C   X..........mmmmm xxxsssnnnnnddddd      Rd Rn Rm ext(Rm) LSL imm(0-4)
        case IF_DR_3D: // DR_3D   X..........mmmmm cccc..nnnnnddddd      Rd Rn Rm cond
        case IF_DR_3E: // DR_3E   X........X.mmmmm ssssssnnnnnddddd      Rd Rn Rm imm(0-63)
+        case IF_DV_3F: // DV_3F   ...........mmmmm ......nnnnnddddd      Qd Sn Vm (vector) - Qd both source and dest
+        case IF_DV_3G: // DV_3G   ...........mmmmm ......nnnnnddddd      Vd Vn Vm (vector) - Vd both source and dest
+        case IF_DV_3H: // DV_3H   ...........mmmmm ......nnnnnddddd      Qd Qn Vm (vector) - Qd both source and dest


I think that DV_3F, DV_3G and DV_3H can all be collapsed to just one DV_3F

briansull · 2018-02-22T19:50:29Z

src/jit/emitarm64.cpp

+        case INS_sha1m:
+            assert(isValidVectorDatasize(size));
+            assert(isVectorRegister(reg1));
+            assert(isFloatReg(reg2));


isFloatRegister is implemented the same as a isVectorRegister.

briansull · 2018-02-22T20:00:08Z

src/jit/emitarm64.cpp

+            emitDispReg(id->idReg2(), size, true);
+            emitDispVectorReg(id->idReg3(), id->idInsOpt(), false);
+            break;
+


After collapsing (DV_3F, DV_3G and DV_3H) in order to print the correct register in the Asm Dump you will need to special case using either opcode or the binary bits of 'code'. (See IF_DR_3A for opcode or IF_DV_2O for 'code')

debayang · 2018-02-23T10:08:21Z

@briansull Made the requested changes in emitarm64.cpp.

CarolEidt · 2018-02-23T14:25:34Z

src/jit/hwintrinsicArm64.h

-        SimdUnaryOp,  // SIMD intrinsics which take one vector operand and return a vector
+        SimdBinaryOp,         // SIMD intrinsics which take two vector operands and return a vector
+        SimdUnaryOp,          // SIMD intrinsics which take one vector operand and return a vector
+        SimdBinaryOverlapOp,  // Same as SimdBinaryOp , with destination vector same as first source vector


Would it make sense to describe these as RMW (read-modify-write) Ops? That's the term used for xarch operations that overwrite their first source, and to me "Overlap" generally implies a more complex kind of operation that does a partial (re)write or something.

@CarolEidt I guess renaming them to "RMW" from "Overlap" should be OK. Will do that.

CarolEidt · 2018-02-23T14:33:01Z

src/jit/hwintrinsiclistArm64.h

@@ -23,7 +23,7 @@ HARDWARE_INTRINSIC_CLASS(JIT_FLAG_HAS_ARM64_JSCVT     , Jscvt    )
 HARDWARE_INTRINSIC_CLASS(JIT_FLAG_HAS_ARM64_LRCPC     , Lrcpc    )
 HARDWARE_INTRINSIC_CLASS(JIT_FLAG_HAS_ARM64_PMULL     , Pmull    )
 HARDWARE_INTRINSIC_CLASS(JIT_FLAG_HAS_ARM64_SHA1      , Sha1     )
-HARDWARE_INTRINSIC_CLASS(JIT_FLAG_HAS_ARM64_SHA2      , Sha2     )
+HARDWARE_INTRINSIC_CLASS(JIT_FLAG_HAS_ARM64_SHA256    , Sha256   )


I believe the same changes needs to be made to src/inc/corjitflags.h

@CarolEidt Missed it as the build passed.
I guess "pal/src/misc/jitsupport.cpp" also needs to be corrected.

Yes, I missed that. @BruceForstall - can you comment on whether this change needs to be made elsewhere?

The first parameter is the flag name, the second is the class which is controlled by the flag.

You can either:

Rename the class and leave the flag names alone (they match Linux Arm64 extension naming convention now)

Rename the class, rename the flags in JIT and rename the flags in the VM.

git grep HAS_ARM64_SHA2 src/inc/corjitflags.h: CORJIT_FLAG_HAS_ARM64_SHA2 = 55, // ID_AA64ISAR0_EL1.SHA2 is 1 or better src/jit/hwintrinsiclistArm64.h:HARDWARE_INTRINSIC_CLASS(JIT_FLAG_HAS_ARM64_SHA2 , Sha2 ) src/jit/jitee.h: JIT_FLAG_HAS_ARM64_SHA2 = 55, // ID_AA64ISAR0_EL1.SHA2 is 1 or better src/pal/src/misc/jitsupport.cpp: CPUCompileFlags.Set(CORJIT_FLAGS::CORJIT_FLAG_HAS_ARM64_SHA2);

I think either is acceptable. The flag names should be consistent in the JIT & VM. If you update the flag names, you probably need to update the JITEEVersionIdentifier GUID too.

can you comment on whether this change needs to be made elsewhere?

Looks like @sdmaclea found all the spots.

leave the flag names alone (they match Linux Arm64 extension naming convention now)

Well, CORJIT_FLAG_HAS_ARM64_SHA512 already breaks the mold kind-of like what is suggested here.

you probably need to update the JITEEVersionIdentifier GUID too

Since there's no data binary change (just internal #define name change), there should be no need.

CarolEidt · 2018-02-23T14:48:25Z

src/jit/lsraarm64.cpp

@@ -957,6 +957,12 @@ void LinearScan::BuildHWIntrinsic(GenTreeHWIntrinsic* intrinsicTree)

    switch (compiler->getHWIntrinsicInfo(intrinsicID).form)
    {
+        case HWIntrinsicInfo::Sha1HashOp:


In this method, presumably above where you are already separately handling the 2 and 3 operand cases, you need to set op2 (and op3 if present) as delay free, so that they won't be the same as the target register. Otherwise, you can't copy op1 to the target without possibly trashing another operand.
This may complicate the code a bit (though it will be potentially a bit simpler after #16517.
If none of the operands can be contained (i.e. used from memory, or an immediate that can be directly encoded), then instead of calling GetOperandInfo you would call getLocationInfo on those operands, e.g.

LocationInfoListNode* op2Info = getLocationInfo(op2); op2Info->info.isDelayFree = true; info->hasDelayFreeSrc = true;

And similar for op3 if present. If either of these can be contained, you need to handle those cases. You could possibly use CheckAndSetDelayFree from lsraxarch.cpp, moving it to lsrabuild.cpp. I hope that's not necessary as that code is actually going away and being handled another way in #16517.

@CarolEidt @sdmaclea getLocationInfo() causing assertion failure here . Below seems to work as intended . Does this look OK?

case HWIntrinsicInfo::Sha1HashOp: assert(info->srcCount == 3); info->setInternalCandidates(this, RBM_ALLFLOAT); info->internalFloatCount = 1; if (!op2->isContained()) { LocationInfoListNode* op2Info = useList.Begin()->Next(); op2Info->info.isDelayFree = true; LocationInfoListNode* op3Info = op2Info->Next(); op3Info->info.isDelayFree = true; info->hasDelayFreeSrc = true; } break; case HWIntrinsicInfo::SimdTernaryRMWOp: assert(info->srcCount == 3); if (!op2->isContained()) { LocationInfoListNode* op2Info = useList.Begin()->Next(); op2Info->info.isDelayFree = true; LocationInfoListNode* op3Info = op2Info->Next(); op3Info->info.isDelayFree = true; info->hasDelayFreeSrc = true; } break;

getLocationInfo() causing assertion failure here . Below seems to work as intended . Does this look OK?

Yes, my mistake. I had missed the fact that the operand info is retrieved above (n the calls to GetOperandInfo()).
However, here you are assuming that there is a one-to-one match between the LocationInfoListNodes and the operands, which is true as long as the operands are not contained. You check op2, but you should add an assert(!op3->isContained()) prior to accessing its info.

CarolEidt · 2018-02-23T14:50:56Z

src/jit/codegenarm64.cpp

+
+    getEmitter()->emitIns_R_R(INS_fmov, EA_4BYTE, tmpReg, elementReg);
+
+    if (targetReg != op1Reg)


You need to assert that neither op2Reg nor op3Reg are the same as targetReg, otherwise you will overwrite them here. See also my comments on lsraarm64.cpp - you need to set those operands as delay free so that this is ensured.

@CarolEidt @sdmaclea Will handle along with the above lsra modification which will make sure op2reg and op3reg are not same as op1Reg.
Also in this scenario - how should we ensure that tmpReg is not same as targetReg (e.g sha1p instruction below is wrong)

mov w0, #20
fmov s16, w0
mov v16.16b, v8.16b
sha1p q16, s16, v10.4s

That is handled similarly, you need to do:
info->isInternalRegDelayFree = true;

briansull

Looks Good

sdmaclea · 2018-02-28T17:08:10Z

@debayang I am confused. Did this merge? Why was it closed/deleted? What is different about #16657

debayang · 2018-02-28T17:12:18Z

@sdmaclea - I had recreated the forked branch and this got automatically closed.
So I raised a new PR (#16657) with the pending review comments @CarolEidt incorporated.

debayang changed the title ~~[ARM64] Crypto arm64 intrinsics implementation~~ [Arm64] Crypto arm64 intrinsics implementation Feb 22, 2018

sdmaclea reviewed Feb 22, 2018

View reviewed changes

debayang force-pushed the crypto_arm64_intrinsics branch from 5c3d064 to e31b5f0 Compare February 22, 2018 16:27

sdmaclea approved these changes Feb 22, 2018

View reviewed changes

debayang force-pushed the crypto_arm64_intrinsics branch from e31b5f0 to 1fa79f2 Compare February 22, 2018 17:26

briansull suggested changes Feb 22, 2018

View reviewed changes

debayang added 4 commits February 23, 2018 08:45

ARM64 Aes Crypto intrinsics implementation

e9f67f4

ARM64 SHA1 Crypto intrinsics implementation

25c3ced

ARM64 SHA256 Crypto intrinsics implementation

370684c

ARM64 Crypto intrinsics tests for AES/SHA1/SHA256

e1911dc

debayang force-pushed the crypto_arm64_intrinsics branch from 1fa79f2 to e1911dc Compare February 23, 2018 09:02

CarolEidt suggested changes Feb 23, 2018

View reviewed changes

briansull approved these changes Feb 23, 2018

View reviewed changes

debayang closed this Feb 28, 2018

debayang deleted the crypto_arm64_intrinsics branch February 28, 2018 16:55

debayang mentioned this pull request Feb 28, 2018

Crypto arm64 intrinsics #16657

Merged


		getEmitter()->emitIns_R_R(INS_fmov, EA_4BYTE, tmpReg, elementReg);

		if (targetReg != op1Reg)

[Arm64] Crypto arm64 intrinsics implementation #16499

[Arm64] Crypto arm64 intrinsics implementation #16499

Uh oh!

Conversation

debayang commented Feb 22, 2018 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

dnfclas commented Feb 22, 2018 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

sdmaclea commented Feb 22, 2018

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

sdmaclea commented Feb 22, 2018

Uh oh!

sdmaclea commented Feb 22, 2018

Uh oh!

debayang commented Feb 22, 2018

Uh oh!

sdmaclea left a comment

Choose a reason for hiding this comment

Uh oh!

sdmaclea commented Feb 22, 2018 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

debayang commented Feb 22, 2018

Uh oh!

briansull Feb 22, 2018 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

debayang commented Feb 23, 2018

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

sdmaclea Feb 26, 2018 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

briansull left a comment

Choose a reason for hiding this comment

Uh oh!

sdmaclea commented Feb 28, 2018

debayang commented Feb 22, 2018 •

edited

Loading

dnfclas commented Feb 22, 2018 •

edited

Loading

sdmaclea commented Feb 22, 2018 •

edited

Loading

briansull Feb 22, 2018 •

edited

Loading

sdmaclea Feb 26, 2018 •

edited

Loading