-
Notifications
You must be signed in to change notification settings - Fork 2.6k
[Arm64] Crypto arm64 intrinsics implementation #16499
Conversation
@CarolEidt PTAL |
using System.Runtime.InteropServices; | ||
using System.Runtime.Intrinsics; | ||
using System.Runtime.Intrinsics.Arm.Arm64; | ||
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@debayang Building these tests will fail until API is added to corefx.
If we want to merge this, it will need #if
. Something like I did here https://github.com/dotnet/coreclr/blob/master/tests/src/JIT/HardwareIntrinsics/Arm64/Simd.cs#L6
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks @sdmaclea .
Fixed
@briansull PTAL at the emitters |
@debayang The Ubuntu formatting job is failing, this is due to formatting issues. If you click on the details link, you can see a patch to apply to fix the formatting issues. You can also run install and run clang-format on source files in the src jit tree. |
5c3d064
to
e31b5f0
Compare
@sdmaclea Corrected formatting |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM
@debayang Looks like the test you added failed on Windows x86. Probably #if is not correct yet. |
e31b5f0
to
1fa79f2
Compare
@sdmaclea Fixed test case. |
|
||
bool is16Byte = (node->gtSIMDSize > 8); | ||
emitAttr attr = is16Byte ? EA_16BYTE : EA_8BYTE; | ||
insOpts opt = genGetSimdInsOpt(is16Byte, baseType); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I would like to see the signature for CodeGen::genGetSimdInsOpt(bool is16Byte, var_types elementType)
changed to CodeGen::genGetSimdInsOpt(emitAttr size, var_types elementType)
and an asset added at the start of that method : assert(isValidVectorDatasize(size);
All the callers to this method aleady use this unnecessary pattern to setup is16Byte
Also instead of attr
we can use size
, since we aren't going to have any GC flags for these instructions.
emitAttr size = is16Byte ? EA_16BYTE : EA_8BYTE;
I can do this for you on a follow up checkin.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@briansull Thanks. May be better to handle in a different checkin as this appears to be used in other places as well.
src/jit/emitarm64.cpp
Outdated
@@ -427,6 +427,12 @@ void emitter::emitInsSanityCheck(instrDesc* id) | |||
assert(isValidImmCondFlags(emitGetInsSC(id))); | |||
break; | |||
|
|||
case IF_DR_2J: // DR_2J ................ ......nnnnnddddd Sd Sn |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It is nice to add the instruction that use this encoding:
(sha1h)
src/jit/emitarm64.cpp
Outdated
@@ -548,6 +554,7 @@ void emitter::emitInsSanityCheck(instrDesc* id) | |||
|
|||
case IF_DV_2A: // DV_2A .Q.......X...... ......nnnnnddddd Vd Vn (fabs, fcvt - vector) | |||
case IF_DV_2M: // DV_2M .Q......XX...... ......nnnnnddddd Vd Vn (abs, neg - vector) | |||
case IF_DV_2P: // DV_2P ................ ......nnnnnddddd Vd Vn (aes, sha1su1) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I would prefer adding a * to indicate that all forms of the aes instructions use this encoding:
(aes*, sha1su1)
src/jit/emitarm64.cpp
Outdated
@@ -811,6 +835,9 @@ bool emitter::emitInsMayWriteToGCReg(instrDesc* id) | |||
case IF_DR_3C: // DR_3C X..........mmmmm xxxsssnnnnnddddd Rd Rn Rm ext(Rm) LSL imm(0-4) | |||
case IF_DR_3D: // DR_3D X..........mmmmm cccc..nnnnnddddd Rd Rn Rm cond | |||
case IF_DR_3E: // DR_3E X........X.mmmmm ssssssnnnnnddddd Rd Rn Rm imm(0-63) | |||
case IF_DV_3F: // DV_3F ...........mmmmm ......nnnnnddddd Qd Sn Vm (vector) - Qd both source and dest | |||
case IF_DV_3G: // DV_3G ...........mmmmm ......nnnnnddddd Vd Vn Vm (vector) - Vd both source and dest | |||
case IF_DV_3H: // DV_3H ...........mmmmm ......nnnnnddddd Qd Qn Vm (vector) - Qd both source and dest |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think that DV_3F, DV_3G and DV_3H can all be collapsed to just one DV_3F
src/jit/emitarm64.cpp
Outdated
case INS_sha1m: | ||
assert(isValidVectorDatasize(size)); | ||
assert(isVectorRegister(reg1)); | ||
assert(isFloatReg(reg2)); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
isFloatRegister is implemented the same as a isVectorRegister.
src/jit/emitarm64.cpp
Outdated
emitDispReg(id->idReg2(), size, true); | ||
emitDispVectorReg(id->idReg3(), id->idInsOpt(), false); | ||
break; | ||
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
After collapsing (DV_3F, DV_3G and DV_3H) in order to print the correct register in the Asm Dump you will need to special case using either opcode or the binary bits of 'code'. (See IF_DR_3A for opcode or IF_DV_2O for 'code')
1fa79f2
to
e1911dc
Compare
@briansull Made the requested changes in emitarm64.cpp. |
SimdUnaryOp, // SIMD intrinsics which take one vector operand and return a vector | ||
SimdBinaryOp, // SIMD intrinsics which take two vector operands and return a vector | ||
SimdUnaryOp, // SIMD intrinsics which take one vector operand and return a vector | ||
SimdBinaryOverlapOp, // Same as SimdBinaryOp , with destination vector same as first source vector |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Would it make sense to describe these as RMW (read-modify-write) Ops? That's the term used for xarch operations that overwrite their first source, and to me "Overlap" generally implies a more complex kind of operation that does a partial (re)write or something.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@CarolEidt I guess renaming them to "RMW" from "Overlap" should be OK. Will do that.
@@ -23,7 +23,7 @@ HARDWARE_INTRINSIC_CLASS(JIT_FLAG_HAS_ARM64_JSCVT , Jscvt ) | |||
HARDWARE_INTRINSIC_CLASS(JIT_FLAG_HAS_ARM64_LRCPC , Lrcpc ) | |||
HARDWARE_INTRINSIC_CLASS(JIT_FLAG_HAS_ARM64_PMULL , Pmull ) | |||
HARDWARE_INTRINSIC_CLASS(JIT_FLAG_HAS_ARM64_SHA1 , Sha1 ) | |||
HARDWARE_INTRINSIC_CLASS(JIT_FLAG_HAS_ARM64_SHA2 , Sha2 ) | |||
HARDWARE_INTRINSIC_CLASS(JIT_FLAG_HAS_ARM64_SHA256 , Sha256 ) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I believe the same changes needs to be made to src/inc/corjitflags.h
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@CarolEidt Missed it as the build passed.
I guess "pal/src/misc/jitsupport.cpp" also needs to be corrected.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes, I missed that. @BruceForstall - can you comment on whether this change needs to be made elsewhere?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The first parameter is the flag name, the second is the class which is controlled by the flag.
You can either:
- Rename the class and leave the flag names alone (they match Linux Arm64 extension naming convention now)
- Rename the class, rename the flags in JIT and rename the flags in the VM.
git grep HAS_ARM64_SHA2
src/inc/corjitflags.h: CORJIT_FLAG_HAS_ARM64_SHA2 = 55, // ID_AA64ISAR0_EL1.SHA2 is 1 or better
src/jit/hwintrinsiclistArm64.h:HARDWARE_INTRINSIC_CLASS(JIT_FLAG_HAS_ARM64_SHA2 , Sha2 )
src/jit/jitee.h: JIT_FLAG_HAS_ARM64_SHA2 = 55, // ID_AA64ISAR0_EL1.SHA2 is 1 or better
src/pal/src/misc/jitsupport.cpp: CPUCompileFlags.Set(CORJIT_FLAGS::CORJIT_FLAG_HAS_ARM64_SHA2);
I think either is acceptable. The flag names should be consistent in the JIT & VM. If you update the flag names, you probably need to update the JITEEVersionIdentifier
GUID too.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
can you comment on whether this change needs to be made elsewhere?
Looks like @sdmaclea found all the spots.
leave the flag names alone (they match Linux Arm64 extension naming convention now)
Well, CORJIT_FLAG_HAS_ARM64_SHA512
already breaks the mold kind-of like what is suggested here.
you probably need to update the JITEEVersionIdentifier GUID too
Since there's no data binary change (just internal #define
name change), there should be no need.
@@ -957,6 +957,12 @@ void LinearScan::BuildHWIntrinsic(GenTreeHWIntrinsic* intrinsicTree) | |||
|
|||
switch (compiler->getHWIntrinsicInfo(intrinsicID).form) | |||
{ | |||
case HWIntrinsicInfo::Sha1HashOp: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
In this method, presumably above where you are already separately handling the 2 and 3 operand cases, you need to set op2 (and op3 if present) as delay free, so that they won't be the same as the target register. Otherwise, you can't copy op1 to the target without possibly trashing another operand.
This may complicate the code a bit (though it will be potentially a bit simpler after #16517.
If none of the operands can be contained (i.e. used from memory, or an immediate that can be directly encoded), then instead of calling GetOperandInfo
you would call getLocationInfo
on those operands, e.g.
LocationInfoListNode* op2Info = getLocationInfo(op2);
op2Info->info.isDelayFree = true;
info->hasDelayFreeSrc = true;
And similar for op3 if present. If either of these can be contained, you need to handle those cases. You could possibly use CheckAndSetDelayFree
from lsraxarch.cpp, moving it to lsrabuild.cpp. I hope that's not necessary as that code is actually going away and being handled another way in #16517.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@CarolEidt @sdmaclea getLocationInfo() causing assertion failure here . Below seems to work as intended . Does this look OK?
case HWIntrinsicInfo::Sha1HashOp:
assert(info->srcCount == 3);
info->setInternalCandidates(this, RBM_ALLFLOAT);
info->internalFloatCount = 1;
if (!op2->isContained())
{
LocationInfoListNode* op2Info = useList.Begin()->Next();
op2Info->info.isDelayFree = true;
LocationInfoListNode* op3Info = op2Info->Next();
op3Info->info.isDelayFree = true;
info->hasDelayFreeSrc = true;
}
break;
case HWIntrinsicInfo::SimdTernaryRMWOp:
assert(info->srcCount == 3);
if (!op2->isContained())
{
LocationInfoListNode* op2Info = useList.Begin()->Next();
op2Info->info.isDelayFree = true;
LocationInfoListNode* op3Info = op2Info->Next();
op3Info->info.isDelayFree = true;
info->hasDelayFreeSrc = true;
}
break;
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
getLocationInfo() causing assertion failure here . Below seems to work as intended . Does this look OK?
Yes, my mistake. I had missed the fact that the operand info is retrieved above (n the calls to GetOperandInfo()
).
However, here you are assuming that there is a one-to-one match between the LocationInfoListNode
s and the operands, which is true as long as the operands are not contained. You check op2
, but you should add an assert(!op3->isContained())
prior to accessing its info.
|
||
getEmitter()->emitIns_R_R(INS_fmov, EA_4BYTE, tmpReg, elementReg); | ||
|
||
if (targetReg != op1Reg) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
You need to assert that neither op2Reg
nor op3Reg
are the same as targetReg
, otherwise you will overwrite them here. See also my comments on lsraarm64.cpp - you need to set those operands as delay free so that this is ensured.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@CarolEidt @sdmaclea Will handle along with the above lsra modification which will make sure op2reg and op3reg are not same as op1Reg.
Also in this scenario - how should we ensure that tmpReg is not same as targetReg (e.g sha1p instruction below is wrong)
mov w0, #20
fmov s16, w0
mov v16.16b, v8.16b
sha1p q16, s16, v10.4s
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
That is handled similarly, you need to do:
info->isInternalRegDelayFree = true;
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Looks Good
@sdmaclea - I had recreated the forked branch and this got automatically closed. |
@dotnet/arm64-contrib
Arm64 intrinsics implementation for Aes, SHA1 and SHA256
API proposal:ARM64 Crypto HW Intrinsics