-
Notifications
You must be signed in to change notification settings - Fork 5k
ARM64-SVE: refactor lsra buildHWIntrinsic #107459
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
@kunalspathak - Still WIP. For now ignore anything outside of |
This PR is ready now. Requires #107084, #107180 and a workaround for #107537 in order for all the hwintrinsic tests to pass. Apologies, this is a large change to review, and the github diff is confused about functions I haven't touched. Probably best starting a review from the new version of I recommend this is not merged until after we've gone past the Net9 RC2 deadline. @dotnet/arm64-contrib I'll do a spmidiff next. |
Got some asmdiffs for the SVE tests. Spotted two differences, and one of them is due to issues in HEAD. I'll raise PRs to fix these (plus one for LoadAndInsertScalar), and then rebase this once merged. I'd like there to be no asmdiff differences in this PR
|
Change-Id: Id60f884b7281a9fae85a948a361511656c91357e
Rebased on top of the other fixes. As mentioned in #107786, fixed it so that |
No asm diffs now:
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Added some minor comments.
|
||
if (resultOpNum != 0) | ||
{ | ||
embeddedDelayFreeOp = embeddedOpNode->Op(resultOpNum); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
hhm, can you confirm if we just overwrite the embeddedDelayFreeOp
that was passed as the parameter to this function?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
hhm, can you confirm if we just overwrite the
embeddedDelayFreeOp
that was passed as the parameter to this function?
Yes. the value being passed into BuildEmbeddedOperandUses()
is the address of embeddedDelayFreeOp
. That value can be changed within BuildEmbeddedOperandUses()
and it won't effect any variables outside BuildEmbeddedOperandUses()
that were also same to the same value.
I think that's what you were asking for.
All fixed up as suggested. |
Looks like the reason for this is that in HEAD, assert(isRMW);
assert(intrin.op1->OperIs(GT_FIELD_LIST));
GenTreeFieldList* op1 = intrin.op1->AsFieldList();
assert(compiler->info.compNeedsConsecutiveRegisters);
for (GenTreeFieldList::Use& use : op1->Uses())
{
BuildDelayFreeUses(use.GetNode(), intrinsicTree);
srcCount++;
} My patch generates srcCount += BuildConsecutiveRegistersForUse(operand, delayFreeOp);
AIUI, |
With this patch there are no diffs again. It would be shame to put something so specific in this patch.
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM
* Add BuildConditionalSelectHWIntrinsic() * Add GetRMWOp() * Use GetDelayFreeOp() in BuildConditionalSelectWithEmbeddedOp() * simplify op2 handling * Add getVectorAddrOperand() * Add getConsecutiveRegistersOperand * Add BuildOperand() * Use BuildOperand for op1 * Add buildHWIntrinsicImmediate * Add getOperandCandidates() * Remove BuildOperand() * remove delayFreeMultiple * Fixes from other PRs to be removed * Fix formatting * Use BuildHWIntrinsicImmediate for conditional select * Remove IsRMW * Replace BuildConditionalSelectWithEmbeddedOp() with BuildEmbeddedOperandUses() * Revert "Fixes from other PRs to be removed" * Move functions * Move functions * Remove failing unary tests * Fix opNum type * Revert "Remove failing unary tests" * Remove cases from getDelayFreeOperand that are handled by default * review cleanups * Simplify masks in getOperandCandidates() * Remove IsMaskedOperation() * Check for optional embedded masks in getDelayFreeOperand * Only call BuildDelayFreeUses when register types match * Assert only on Arm64 * Comment fixups
Fixes #104842
The logic for hwintrisics has become convoluted. Refactor it, for both SVE and AdvSimd.
Add functions to get the operand (if any) for each requirement - delay slot, consecutive registers, address, etc.
Then use a simple for loop to iterate through each operand and build depending on which requirements match for that operand.
Tested by using stress_test.py on the entire HardwareIntrinsics_Arm set.