Skip to content
This repository was archived by the owner on Jan 23, 2023. It is now read-only.

[Arm64] Crypto arm64 intrinsics implementation #16499

Closed
wants to merge 4 commits into from

Conversation

debayang
Copy link

@debayang debayang commented Feb 22, 2018

@dotnet/arm64-contrib

Arm64 intrinsics implementation for Aes, SHA1 and SHA256

API proposal:ARM64 Crypto HW Intrinsics

@dnfclas
Copy link

dnfclas commented Feb 22, 2018

CLA assistant check
All CLA requirements met.

@debayang debayang changed the title [ARM64] Crypto arm64 intrinsics implementation [Arm64] Crypto arm64 intrinsics implementation Feb 22, 2018
@sdmaclea
Copy link

@CarolEidt PTAL
@dotnet/jit-contrib @dotnet/arm64-contrib FYI

using System.Runtime.InteropServices;
using System.Runtime.Intrinsics;
using System.Runtime.Intrinsics.Arm.Arm64;

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@debayang Building these tests will fail until API is added to corefx.

If we want to merge this, it will need #if. Something like I did here https://github.com/dotnet/coreclr/blob/master/tests/src/JIT/HardwareIntrinsics/Arm64/Simd.cs#L6

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks @sdmaclea .
Fixed

@sdmaclea
Copy link

@briansull PTAL at the emitters

@sdmaclea
Copy link

@debayang The Ubuntu formatting job is failing, this is due to formatting issues. If you click on the details link, you can see a patch to apply to fix the formatting issues. You can also run install and run clang-format on source files in the src jit tree.

@debayang debayang force-pushed the crypto_arm64_intrinsics branch from 5c3d064 to e31b5f0 Compare February 22, 2018 16:27
@debayang
Copy link
Author

@sdmaclea Corrected formatting

Copy link

@sdmaclea sdmaclea left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM

@sdmaclea
Copy link

sdmaclea commented Feb 22, 2018

@debayang Looks like the test you added failed on Windows x86. Probably #if is not correct yet.

@debayang debayang force-pushed the crypto_arm64_intrinsics branch from e31b5f0 to 1fa79f2 Compare February 22, 2018 17:26
@debayang
Copy link
Author

@sdmaclea Fixed test case.


bool is16Byte = (node->gtSIMDSize > 8);
emitAttr attr = is16Byte ? EA_16BYTE : EA_8BYTE;
insOpts opt = genGetSimdInsOpt(is16Byte, baseType);
Copy link

@briansull briansull Feb 22, 2018

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I would like to see the signature for CodeGen::genGetSimdInsOpt(bool is16Byte, var_types elementType) changed to CodeGen::genGetSimdInsOpt(emitAttr size, var_types elementType) and an asset added at the start of that method : assert(isValidVectorDatasize(size);

All the callers to this method aleady use this unnecessary pattern to setup is16Byte
Also instead of attr we can use size, since we aren't going to have any GC flags for these instructions.

emitAttr size = is16Byte ? EA_16BYTE : EA_8BYTE;

I can do this for you on a follow up checkin.

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@briansull Thanks. May be better to handle in a different checkin as this appears to be used in other places as well.

@@ -427,6 +427,12 @@ void emitter::emitInsSanityCheck(instrDesc* id)
assert(isValidImmCondFlags(emitGetInsSC(id)));
break;

case IF_DR_2J: // DR_2J ................ ......nnnnnddddd Sd Sn

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It is nice to add the instruction that use this encoding:

(sha1h)

@@ -548,6 +554,7 @@ void emitter::emitInsSanityCheck(instrDesc* id)

case IF_DV_2A: // DV_2A .Q.......X...... ......nnnnnddddd Vd Vn (fabs, fcvt - vector)
case IF_DV_2M: // DV_2M .Q......XX...... ......nnnnnddddd Vd Vn (abs, neg - vector)
case IF_DV_2P: // DV_2P ................ ......nnnnnddddd Vd Vn (aes, sha1su1)

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I would prefer adding a * to indicate that all forms of the aes instructions use this encoding:
(aes*, sha1su1)

@@ -811,6 +835,9 @@ bool emitter::emitInsMayWriteToGCReg(instrDesc* id)
case IF_DR_3C: // DR_3C X..........mmmmm xxxsssnnnnnddddd Rd Rn Rm ext(Rm) LSL imm(0-4)
case IF_DR_3D: // DR_3D X..........mmmmm cccc..nnnnnddddd Rd Rn Rm cond
case IF_DR_3E: // DR_3E X........X.mmmmm ssssssnnnnnddddd Rd Rn Rm imm(0-63)
case IF_DV_3F: // DV_3F ...........mmmmm ......nnnnnddddd Qd Sn Vm (vector) - Qd both source and dest
case IF_DV_3G: // DV_3G ...........mmmmm ......nnnnnddddd Vd Vn Vm (vector) - Vd both source and dest
case IF_DV_3H: // DV_3H ...........mmmmm ......nnnnnddddd Qd Qn Vm (vector) - Qd both source and dest

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think that DV_3F, DV_3G and DV_3H can all be collapsed to just one DV_3F

case INS_sha1m:
assert(isValidVectorDatasize(size));
assert(isVectorRegister(reg1));
assert(isFloatReg(reg2));

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

isFloatRegister is implemented the same as a isVectorRegister.

emitDispReg(id->idReg2(), size, true);
emitDispVectorReg(id->idReg3(), id->idInsOpt(), false);
break;

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

After collapsing (DV_3F, DV_3G and DV_3H) in order to print the correct register in the Asm Dump you will need to special case using either opcode or the binary bits of 'code'. (See IF_DR_3A for opcode or IF_DV_2O for 'code')

@debayang debayang force-pushed the crypto_arm64_intrinsics branch from 1fa79f2 to e1911dc Compare February 23, 2018 09:02
@debayang
Copy link
Author

@briansull Made the requested changes in emitarm64.cpp.

SimdUnaryOp, // SIMD intrinsics which take one vector operand and return a vector
SimdBinaryOp, // SIMD intrinsics which take two vector operands and return a vector
SimdUnaryOp, // SIMD intrinsics which take one vector operand and return a vector
SimdBinaryOverlapOp, // Same as SimdBinaryOp , with destination vector same as first source vector

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Would it make sense to describe these as RMW (read-modify-write) Ops? That's the term used for xarch operations that overwrite their first source, and to me "Overlap" generally implies a more complex kind of operation that does a partial (re)write or something.

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@CarolEidt I guess renaming them to "RMW" from "Overlap" should be OK. Will do that.

@@ -23,7 +23,7 @@ HARDWARE_INTRINSIC_CLASS(JIT_FLAG_HAS_ARM64_JSCVT , Jscvt )
HARDWARE_INTRINSIC_CLASS(JIT_FLAG_HAS_ARM64_LRCPC , Lrcpc )
HARDWARE_INTRINSIC_CLASS(JIT_FLAG_HAS_ARM64_PMULL , Pmull )
HARDWARE_INTRINSIC_CLASS(JIT_FLAG_HAS_ARM64_SHA1 , Sha1 )
HARDWARE_INTRINSIC_CLASS(JIT_FLAG_HAS_ARM64_SHA2 , Sha2 )
HARDWARE_INTRINSIC_CLASS(JIT_FLAG_HAS_ARM64_SHA256 , Sha256 )

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I believe the same changes needs to be made to src/inc/corjitflags.h

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@CarolEidt Missed it as the build passed.
I guess "pal/src/misc/jitsupport.cpp" also needs to be corrected.

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, I missed that. @BruceForstall - can you comment on whether this change needs to be made elsewhere?

Copy link

@sdmaclea sdmaclea Feb 26, 2018

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The first parameter is the flag name, the second is the class which is controlled by the flag.

You can either:

  • Rename the class and leave the flag names alone (they match Linux Arm64 extension naming convention now)
  • Rename the class, rename the flags in JIT and rename the flags in the VM.
git grep HAS_ARM64_SHA2
src/inc/corjitflags.h:        CORJIT_FLAG_HAS_ARM64_SHA2          = 55, // ID_AA64ISAR0_EL1.SHA2 is 1 or better
src/jit/hwintrinsiclistArm64.h:HARDWARE_INTRINSIC_CLASS(JIT_FLAG_HAS_ARM64_SHA2      , Sha2     )
src/jit/jitee.h:        JIT_FLAG_HAS_ARM64_SHA2          = 55, // ID_AA64ISAR0_EL1.SHA2 is 1 or better
src/pal/src/misc/jitsupport.cpp:        CPUCompileFlags.Set(CORJIT_FLAGS::CORJIT_FLAG_HAS_ARM64_SHA2);

I think either is acceptable. The flag names should be consistent in the JIT & VM. If you update the flag names, you probably need to update the JITEEVersionIdentifier GUID too.

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

can you comment on whether this change needs to be made elsewhere?

Looks like @sdmaclea found all the spots.

leave the flag names alone (they match Linux Arm64 extension naming convention now)

Well, CORJIT_FLAG_HAS_ARM64_SHA512 already breaks the mold kind-of like what is suggested here.

you probably need to update the JITEEVersionIdentifier GUID too

Since there's no data binary change (just internal #define name change), there should be no need.

@@ -957,6 +957,12 @@ void LinearScan::BuildHWIntrinsic(GenTreeHWIntrinsic* intrinsicTree)

switch (compiler->getHWIntrinsicInfo(intrinsicID).form)
{
case HWIntrinsicInfo::Sha1HashOp:

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

In this method, presumably above where you are already separately handling the 2 and 3 operand cases, you need to set op2 (and op3 if present) as delay free, so that they won't be the same as the target register. Otherwise, you can't copy op1 to the target without possibly trashing another operand.
This may complicate the code a bit (though it will be potentially a bit simpler after #16517.
If none of the operands can be contained (i.e. used from memory, or an immediate that can be directly encoded), then instead of calling GetOperandInfo you would call getLocationInfo on those operands, e.g.

LocationInfoListNode* op2Info = getLocationInfo(op2);
op2Info->info.isDelayFree = true;
info->hasDelayFreeSrc = true;

And similar for op3 if present. If either of these can be contained, you need to handle those cases. You could possibly use CheckAndSetDelayFree from lsraxarch.cpp, moving it to lsrabuild.cpp. I hope that's not necessary as that code is actually going away and being handled another way in #16517.

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@CarolEidt @sdmaclea getLocationInfo() causing assertion failure here . Below seems to work as intended . Does this look OK?

   case HWIntrinsicInfo::Sha1HashOp:
        assert(info->srcCount == 3);
        info->setInternalCandidates(this, RBM_ALLFLOAT);
        info->internalFloatCount = 1;
        if (!op2->isContained())
        {
            LocationInfoListNode* op2Info = useList.Begin()->Next();
            op2Info->info.isDelayFree     = true;
            LocationInfoListNode* op3Info = op2Info->Next();
            op3Info->info.isDelayFree     = true;
            info->hasDelayFreeSrc         = true;
        }
        break;
    case HWIntrinsicInfo::SimdTernaryRMWOp:
        assert(info->srcCount == 3);
        if (!op2->isContained())
        {
            LocationInfoListNode* op2Info = useList.Begin()->Next();
            op2Info->info.isDelayFree     = true;
            LocationInfoListNode* op3Info = op2Info->Next();
            op3Info->info.isDelayFree     = true;
            info->hasDelayFreeSrc         = true;
        }
        break;

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

getLocationInfo() causing assertion failure here . Below seems to work as intended . Does this look OK?

Yes, my mistake. I had missed the fact that the operand info is retrieved above (n the calls to GetOperandInfo()).
However, here you are assuming that there is a one-to-one match between the LocationInfoListNodes and the operands, which is true as long as the operands are not contained. You check op2, but you should add an assert(!op3->isContained()) prior to accessing its info.


getEmitter()->emitIns_R_R(INS_fmov, EA_4BYTE, tmpReg, elementReg);

if (targetReg != op1Reg)

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You need to assert that neither op2Reg nor op3Reg are the same as targetReg, otherwise you will overwrite them here. See also my comments on lsraarm64.cpp - you need to set those operands as delay free so that this is ensured.

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@CarolEidt @sdmaclea Will handle along with the above lsra modification which will make sure op2reg and op3reg are not same as op1Reg.
Also in this scenario - how should we ensure that tmpReg is not same as targetReg (e.g sha1p instruction below is wrong)

mov w0, #20
fmov s16, w0
mov v16.16b, v8.16b
sha1p q16, s16, v10.4s

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

That is handled similarly, you need to do:
info->isInternalRegDelayFree = true;

Copy link

@briansull briansull left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks Good

@debayang debayang closed this Feb 28, 2018
@debayang debayang deleted the crypto_arm64_intrinsics branch February 28, 2018 16:55
@sdmaclea
Copy link

@debayang I am confused. Did this merge? Why was it closed/deleted? What is different about #16657

@debayang
Copy link
Author

@sdmaclea - I had recreated the forked branch and this got automatically closed.
So I raised a new PR (#16657) with the pending review comments @CarolEidt incorporated.

Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

6 participants