-
Notifications
You must be signed in to change notification settings - Fork 14.1k
[GISel][RISCV]Implement indirect parameter passing #95429
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
[GISel][RISCV]Implement indirect parameter passing #95429
Conversation
@llvm/pr-subscribers-backend-risc-v @llvm/pr-subscribers-llvm-globalisel Author: Gábor Spaits (spaits) ChangesSome targets like RISC-V pass scalars wider than 2×XLEN bits by reference, so those arguments are replaced in the argument list with an address (See RISC-V ABIs Specification 1.0 section 2.1) and also based on the behavior of the armv7a target, probably the same happens on 32 bit arm (https://godbolt.org/z/8frPb7b3Y). RISC-V specificIn RISC-Vs GlobalISel backend the passing of large scalars have been disabled by the commit f533e8c since it generated wrong code. For example in 32bit RISC-V when passing 128 bit value instead of using four registers, it used one registers and assigned value to that one register four times, so basically it just overwrote the data over and over again. And even if it were used four register it still wouldn't be compliant with the ABI's calling conventions. GlobalISel framework change and relation to SelectionDAGIn SelectionDAG, the assignments needed for a function call are determined in SelectionDAGBuilder.cpp SelectionDAGISel::LowerArguments (around line 11440 there beginning at SelectionDAG then calls the target specific Based on these observations, I think that the best way to implement indirect parameter passing is in In short, with this PR this can work for all targets, if the targets calling convention decides that it needs indirect parameter passing by marking a Another approachShould I rather extend the interface of CallLowering? We could add a callback named "shouldPassIndirectly" and it could decide whether it would be appropiate to do indirect parameter passing? If it is appropiate then we would call the passParameterIndirectly callback, that would do the copies stores and loads. So targets could disable indirect parameter passing even if a a What do you think? Is it a good idea, to implement indirect passing on GlobalISel framework level? If it is, then is my current approach correct or we should rather do the CallLowering interface change or you have another idea? If this functionality does not belong to the framework, then how should I proceed? Override the Results of the current approachFor this llvm IR on rv32 target: define i64 @<!-- -->fun(i128 %x, i128 %y ) {
%a = shl i128 %x, %y
%2 = trunc i128 %a to i64
ret i64 %2
}
define i32 @<!-- -->fun_caller( ) {
%1 = call i64 @<!-- -->fun(i128 1, i128 2)
%2 = trunc i64 %1 to i32
ret i32 %2
} We get this after the IRtranslator: Function Live Ins: $x10, $x11
bb.1 (%ir-block.0):
liveins: $x10, $x11
%2:_(p0) = COPY $x10
%0:_(s128) = G_LOAD %2:_(p0) :: (load (s128), align 1)
%3:_(p0) = COPY $x11
%1:_(s128) = G_LOAD %3:_(p0) :: (load (s128), align 1)
%4:_(s128) = G_SHL %0:_, %1:_(s128)
%5:_(s64) = G_TRUNC %4:_(s128)
%6:_(s32), %7:_(s32) = G_UNMERGE_VALUES %5:_(s64)
$x10 = COPY %6:_(s32)
$x11 = COPY %7:_(s32)
PseudoRET implicit $x10, implicit $x11
Caller Frame Objects:
fi#<!-- -->0: size=128, align=8, at location [SP]
fi#<!-- -->1: size=128, align=8, at location [SP]
bb.1 (%ir-block.0):
%1:_(s128) = G_CONSTANT i128 1
%2:_(s128) = G_CONSTANT i128 2
ADJCALLSTACKDOWN 0, 0, implicit-def $x2, implicit $x2
%3:_(p0) = G_FRAME_INDEX %stack.0
G_STORE %1:_(s128), %3:_(p0) :: (store (s128), align 8)
$x10 = COPY %3:_(p0)
%4:_(p0) = G_FRAME_INDEX %stack.1
G_STORE %2:_(s128), %4:_(p0) :: (store (s128), align 8)
$x11 = COPY %4:_(p0)
PseudoCALL target-flags(riscv-call) @<!-- -->fun, <regmask $vlenb $x0 $x1 $x8 $x9 $x18 $x19 $x20 $x21 $x22 $x23 $x24 $x25 $x26 $x27 $x8_x9 $x18_x19 $x20_x21 $x22_x23 $x24_x25 $x26_x27>, implicit-def $x1, implicit $x10, implicit $x11, implicit-def $x10, implicit-def $x11
ADJCALLSTACKUP 0, 0, implicit-def $x2, implicit $x2
%5:_(s32) = COPY $x10
%6:_(s32) = COPY $x11
%0:_(s64) = G_MERGE_VALUES %5:_(s32), %6:_(s32)
%7:_(s32) = G_TRUNC %0:_(s64)
$x10 = COPY %7:_(s32)
PseudoRET implicit $x10 Please help checking the PR and give your opinions and observation. I am open to implement any possible solution to solve this problem. Thank you for your time and patience in advance. Patch is 22.08 KiB, truncated to 20.00 KiB below, full version: https://github.com/llvm/llvm-project/pull/95429.diff 4 Files Affected:
diff --git a/llvm/lib/CodeGen/GlobalISel/CallLowering.cpp b/llvm/lib/CodeGen/GlobalISel/CallLowering.cpp
index 412cd0a21ad41..daa5e0c07e9b7 100644
--- a/llvm/lib/CodeGen/GlobalISel/CallLowering.cpp
+++ b/llvm/lib/CodeGen/GlobalISel/CallLowering.cpp
@@ -751,6 +751,7 @@ bool CallLowering::handleAssignments(ValueHandler &Handler,
const LLT NewLLT = Handler.isIncomingArgumentHandler() ? LocTy : ValTy;
const EVT OrigVT = EVT::getEVT(Args[i].Ty);
const LLT OrigTy = getLLTForType(*Args[i].Ty, DL);
+ const LLT PointerTy = LLT::pointer(0, DL.getPointerSizeInBits(0));
// Expected to be multiple regs for a single incoming arg.
// There should be Regs.size() ArgLocs per argument.
@@ -764,19 +765,28 @@ bool CallLowering::handleAssignments(ValueHandler &Handler,
// If we can't directly assign the register, we need one or more
// intermediate values.
Args[i].Regs.resize(NumParts);
-
- // For each split register, create and assign a vreg that will store
- // the incoming component of the larger value. These will later be
- // merged to form the final vreg.
- for (unsigned Part = 0; Part < NumParts; ++Part)
- Args[i].Regs[Part] = MRI.createGenericVirtualRegister(NewLLT);
+
+ // When we have indirect parameter passing we are receiving a pointer,
+ // that points to the actual value. In that case we need a pointer.
+ if (VA.getLocInfo() == CCValAssign::Indirect &&
+ Args[i].Flags[0].isSplit()) {
+ if (Handler.isIncomingArgumentHandler())
+ Args[i].Regs[0] = MRI.createGenericVirtualRegister(PointerTy);
+ } else {
+ // For each split register, create and assign a vreg that will store
+ // the incoming component of the larger value. These will later be
+ // merged to form the final vreg.
+ for (unsigned Part = 0; Part < NumParts; ++Part)
+ Args[i].Regs[Part] = MRI.createGenericVirtualRegister(NewLLT);
+ }
}
assert((j + (NumParts - 1)) < ArgLocs.size() &&
"Too many regs for number of args");
// Coerce into outgoing value types before register assignment.
- if (!Handler.isIncomingArgumentHandler() && OrigTy != ValTy) {
+ if (!Handler.isIncomingArgumentHandler() && OrigTy != ValTy &&
+ VA.getLocInfo() != CCValAssign::Indirect) {
assert(Args[i].OrigRegs.size() == 1);
buildCopyToRegs(MIRBuilder, Args[i].Regs, Args[i].OrigRegs[0], OrigTy,
ValTy, extendOpFromFlags(Args[i].Flags[0]));
@@ -790,6 +800,28 @@ bool CallLowering::handleAssignments(ValueHandler &Handler,
CCValAssign &VA = ArgLocs[j + Idx];
const ISD::ArgFlagsTy Flags = Args[i].Flags[Part];
+ // We found an indirect parameter passing and we are at the first part of
+ // the value being passed. In this case copy the incoming pointer into a
+ // virtual register so later we can load it.
+ if (VA.getLocInfo() == CCValAssign::Indirect && Flags.isSplit()) {
+ if (Handler.isIncomingArgumentHandler())
+ Handler.assignValueToReg(ArgReg, VA.getLocReg(), VA);
+ else {
+ MachineFrameInfo &MFI = MF.getFrameInfo();
+ int FrameIdx = MFI.CreateStackObject(OrigTy.getScalarSizeInBits(),
+ Align(8), false);
+
+ auto PointerToStackReg =
+ MIRBuilder.buildFrameIndex(PointerTy, FrameIdx)
+ ->getOperand(0)
+ .getReg();
+ Handler.assignValueToAddress(Args[i].OrigRegs[Part], PointerToStackReg,
+ OrigTy, MachinePointerInfo{}, VA);
+ Handler.assignValueToReg(PointerToStackReg, VA.getLocReg(), VA);
+ }
+ break;
+ }
+
if (VA.isMemLoc() && !Flags.isByVal()) {
// Individual pieces may have been spilled to the stack and others
// passed in registers.
@@ -866,11 +898,16 @@ bool CallLowering::handleAssignments(ValueHandler &Handler,
}
}
- // Now that all pieces have been assigned, re-pack the register typed values
- // into the original value typed registers.
- if (Handler.isIncomingArgumentHandler() && OrigVT != LocVT) {
- // Merge the split registers into the expected larger result vregs of
- // the original call.
+ // In case of indirect parameter passing load the value referred to by
+ // the argument.
+ if (Handler.isIncomingArgumentHandler() && OrigVT != LocVT &&
+ VA.getLocInfo() == CCValAssign::Indirect) {
+ Handler.assignValueToAddress(Args[i].OrigRegs[0], Args[i].Regs[0], OrigTy,
+ MachinePointerInfo{}, VA);
+
+ } else if (Handler.isIncomingArgumentHandler() && OrigVT != LocVT) {
+ // Now that all pieces have been assigned, re-pack the register typed values
+ // into the original value typed registers.
buildCopyFromRegs(MIRBuilder, Args[i].OrigRegs, Args[i].Regs, OrigTy,
LocTy, Args[i].Flags[0]);
}
diff --git a/llvm/lib/Target/RISCV/GISel/RISCVCallLowering.cpp b/llvm/lib/Target/RISCV/GISel/RISCVCallLowering.cpp
index 2bfee45852b20..b1f381f4b30ad 100644
--- a/llvm/lib/Target/RISCV/GISel/RISCVCallLowering.cpp
+++ b/llvm/lib/Target/RISCV/GISel/RISCVCallLowering.cpp
@@ -89,12 +89,13 @@ struct RISCVOutgoingValueHandler : public CallLowering::OutgoingValueHandler {
const MachinePointerInfo &MPO,
const CCValAssign &VA) override {
MachineFunction &MF = MIRBuilder.getMF();
- uint64_t LocMemOffset = VA.getLocMemOffset();
-
+ uint64_t Offset = 0;
+ if (VA.isMemLoc())
+ Offset = VA.getLocMemOffset();
+
// TODO: Move StackAlignment to subtarget and share with FrameLowering.
- auto MMO =
- MF.getMachineMemOperand(MPO, MachineMemOperand::MOStore, MemTy,
- commonAlignment(Align(16), LocMemOffset));
+ auto *MMO = MF.getMachineMemOperand(MPO, MachineMemOperand::MOStore, MemTy,
+ commonAlignment(Align(16), Offset));
Register ExtReg = extendRegister(ValVReg, VA);
MIRBuilder.buildStore(ExtReg, Addr, *MMO);
@@ -341,10 +342,8 @@ static bool isLegalElementTypeForRVV(Type *EltTy,
// TODO: Remove IsLowerArgs argument by adding support for vectors in lowerCall.
static bool isSupportedArgumentType(Type *T, const RISCVSubtarget &Subtarget,
bool IsLowerArgs = false) {
- // TODO: Integers larger than 2*XLen are passed indirectly which is not
- // supported yet.
if (T->isIntegerTy())
- return T->getIntegerBitWidth() <= Subtarget.getXLen() * 2;
+ return true;
if (T->isHalfTy() || T->isFloatTy() || T->isDoubleTy())
return true;
if (T->isPointerTy())
diff --git a/llvm/test/CodeGen/RISCV/GlobalISel/irtranslator/calling-conv-ilp32-ilp32f-ilp32d-common.ll b/llvm/test/CodeGen/RISCV/GlobalISel/irtranslator/calling-conv-ilp32-ilp32f-ilp32d-common.ll
index 1a3489521af19..92f5f6220f096 100644
--- a/llvm/test/CodeGen/RISCV/GlobalISel/irtranslator/calling-conv-ilp32-ilp32f-ilp32d-common.ll
+++ b/llvm/test/CodeGen/RISCV/GlobalISel/irtranslator/calling-conv-ilp32-ilp32f-ilp32d-common.ll
@@ -16,6 +16,173 @@
; Check that on RV32, i64 is passed in a pair of registers. Unlike
; the convention for varargs, this need not be an aligned pair.
+define i64 @callee_128i_in_regs(i128 %x, i128 %y ) {
+ ; RV32I-LABEL: name: callee_128i_in_regs
+ ; RV32I: bb.1 (%ir-block.0):
+ ; RV32I-NEXT: liveins: $x10, $x11
+ ; RV32I-NEXT: {{ $}}
+ ; RV32I-NEXT: [[COPY:%[0-9]+]]:_(p0) = COPY $x10
+ ; RV32I-NEXT: [[LOAD:%[0-9]+]]:_(s128) = G_LOAD [[COPY]](p0) :: (load (s128), align 1)
+ ; RV32I-NEXT: [[COPY1:%[0-9]+]]:_(p0) = COPY $x11
+ ; RV32I-NEXT: [[LOAD1:%[0-9]+]]:_(s128) = G_LOAD [[COPY1]](p0) :: (load (s128), align 1)
+ ; RV32I-NEXT: [[TRUNC:%[0-9]+]]:_(s64) = G_TRUNC [[LOAD]](s128)
+ ; RV32I-NEXT: [[UV:%[0-9]+]]:_(s32), [[UV1:%[0-9]+]]:_(s32) = G_UNMERGE_VALUES [[TRUNC]](s64)
+ ; RV32I-NEXT: $x10 = COPY [[UV]](s32)
+ ; RV32I-NEXT: $x11 = COPY [[UV1]](s32)
+ ; RV32I-NEXT: PseudoRET implicit $x10, implicit $x11
+ %2 = trunc i128 %x to i64
+ ret i64 %2
+}
+
+define i32 @caller_128i_in_regs( ) {
+ ; ILP32-LABEL: name: caller_128i_in_regs
+ ; ILP32: bb.1 (%ir-block.0):
+ ; ILP32-NEXT: [[C:%[0-9]+]]:_(s128) = G_CONSTANT i128 1
+ ; ILP32-NEXT: [[C1:%[0-9]+]]:_(s128) = G_CONSTANT i128 2
+ ; ILP32-NEXT: ADJCALLSTACKDOWN 0, 0, implicit-def $x2, implicit $x2
+ ; ILP32-NEXT: [[FRAME_INDEX:%[0-9]+]]:_(p0) = G_FRAME_INDEX %stack.0
+ ; ILP32-NEXT: G_STORE [[C]](s128), [[FRAME_INDEX]](p0) :: (store (s128))
+ ; ILP32-NEXT: $x10 = COPY [[FRAME_INDEX]](p0)
+ ; ILP32-NEXT: [[FRAME_INDEX1:%[0-9]+]]:_(p0) = G_FRAME_INDEX %stack.1
+ ; ILP32-NEXT: G_STORE [[C1]](s128), [[FRAME_INDEX1]](p0) :: (store (s128))
+ ; ILP32-NEXT: $x11 = COPY [[FRAME_INDEX1]](p0)
+ ; ILP32-NEXT: PseudoCALL target-flags(riscv-call) @callee_128i_in_regs, csr_ilp32_lp64, implicit-def $x1, implicit $x10, implicit $x11, implicit-def $x10, implicit-def $x11
+ ; ILP32-NEXT: ADJCALLSTACKUP 0, 0, implicit-def $x2, implicit $x2
+ ; ILP32-NEXT: [[COPY:%[0-9]+]]:_(s32) = COPY $x10
+ ; ILP32-NEXT: [[COPY1:%[0-9]+]]:_(s32) = COPY $x11
+ ; ILP32-NEXT: [[MV:%[0-9]+]]:_(s64) = G_MERGE_VALUES [[COPY]](s32), [[COPY1]](s32)
+ ; ILP32-NEXT: [[TRUNC:%[0-9]+]]:_(s32) = G_TRUNC [[MV]](s64)
+ ; ILP32-NEXT: $x10 = COPY [[TRUNC]](s32)
+ ; ILP32-NEXT: PseudoRET implicit $x10
+ ;
+ ; ILP32F-LABEL: name: caller_128i_in_regs
+ ; ILP32F: bb.1 (%ir-block.0):
+ ; ILP32F-NEXT: [[C:%[0-9]+]]:_(s128) = G_CONSTANT i128 1
+ ; ILP32F-NEXT: [[C1:%[0-9]+]]:_(s128) = G_CONSTANT i128 2
+ ; ILP32F-NEXT: ADJCALLSTACKDOWN 0, 0, implicit-def $x2, implicit $x2
+ ; ILP32F-NEXT: [[FRAME_INDEX:%[0-9]+]]:_(p0) = G_FRAME_INDEX %stack.0
+ ; ILP32F-NEXT: G_STORE [[C]](s128), [[FRAME_INDEX]](p0) :: (store (s128))
+ ; ILP32F-NEXT: $x10 = COPY [[FRAME_INDEX]](p0)
+ ; ILP32F-NEXT: [[FRAME_INDEX1:%[0-9]+]]:_(p0) = G_FRAME_INDEX %stack.1
+ ; ILP32F-NEXT: G_STORE [[C1]](s128), [[FRAME_INDEX1]](p0) :: (store (s128))
+ ; ILP32F-NEXT: $x11 = COPY [[FRAME_INDEX1]](p0)
+ ; ILP32F-NEXT: PseudoCALL target-flags(riscv-call) @callee_128i_in_regs, csr_ilp32f_lp64f, implicit-def $x1, implicit $x10, implicit $x11, implicit-def $x10, implicit-def $x11
+ ; ILP32F-NEXT: ADJCALLSTACKUP 0, 0, implicit-def $x2, implicit $x2
+ ; ILP32F-NEXT: [[COPY:%[0-9]+]]:_(s32) = COPY $x10
+ ; ILP32F-NEXT: [[COPY1:%[0-9]+]]:_(s32) = COPY $x11
+ ; ILP32F-NEXT: [[MV:%[0-9]+]]:_(s64) = G_MERGE_VALUES [[COPY]](s32), [[COPY1]](s32)
+ ; ILP32F-NEXT: [[TRUNC:%[0-9]+]]:_(s32) = G_TRUNC [[MV]](s64)
+ ; ILP32F-NEXT: $x10 = COPY [[TRUNC]](s32)
+ ; ILP32F-NEXT: PseudoRET implicit $x10
+ ;
+ ; ILP32D-LABEL: name: caller_128i_in_regs
+ ; ILP32D: bb.1 (%ir-block.0):
+ ; ILP32D-NEXT: [[C:%[0-9]+]]:_(s128) = G_CONSTANT i128 1
+ ; ILP32D-NEXT: [[C1:%[0-9]+]]:_(s128) = G_CONSTANT i128 2
+ ; ILP32D-NEXT: ADJCALLSTACKDOWN 0, 0, implicit-def $x2, implicit $x2
+ ; ILP32D-NEXT: [[FRAME_INDEX:%[0-9]+]]:_(p0) = G_FRAME_INDEX %stack.0
+ ; ILP32D-NEXT: G_STORE [[C]](s128), [[FRAME_INDEX]](p0) :: (store (s128))
+ ; ILP32D-NEXT: $x10 = COPY [[FRAME_INDEX]](p0)
+ ; ILP32D-NEXT: [[FRAME_INDEX1:%[0-9]+]]:_(p0) = G_FRAME_INDEX %stack.1
+ ; ILP32D-NEXT: G_STORE [[C1]](s128), [[FRAME_INDEX1]](p0) :: (store (s128))
+ ; ILP32D-NEXT: $x11 = COPY [[FRAME_INDEX1]](p0)
+ ; ILP32D-NEXT: PseudoCALL target-flags(riscv-call) @callee_128i_in_regs, csr_ilp32d_lp64d, implicit-def $x1, implicit $x10, implicit $x11, implicit-def $x10, implicit-def $x11
+ ; ILP32D-NEXT: ADJCALLSTACKUP 0, 0, implicit-def $x2, implicit $x2
+ ; ILP32D-NEXT: [[COPY:%[0-9]+]]:_(s32) = COPY $x10
+ ; ILP32D-NEXT: [[COPY1:%[0-9]+]]:_(s32) = COPY $x11
+ ; ILP32D-NEXT: [[MV:%[0-9]+]]:_(s64) = G_MERGE_VALUES [[COPY]](s32), [[COPY1]](s32)
+ ; ILP32D-NEXT: [[TRUNC:%[0-9]+]]:_(s32) = G_TRUNC [[MV]](s64)
+ ; ILP32D-NEXT: $x10 = COPY [[TRUNC]](s32)
+ ; ILP32D-NEXT: PseudoRET implicit $x10
+ %1 = call i64 @callee_128i_in_regs(i128 1, i128 2)
+ %2 = trunc i64 %1 to i32
+ ret i32 %2
+}
+
+define i64 @callee_256i_in_regs(i256 %x, i256 %y ) {
+
+ ; RV32I-LABEL: name: callee_256i_in_regs
+ ; RV32I: bb.1 (%ir-block.0):
+ ; RV32I-NEXT: liveins: $x10, $x11
+ ; RV32I-NEXT: {{ $}}
+ ; RV32I-NEXT: [[COPY:%[0-9]+]]:_(p0) = COPY $x10
+ ; RV32I-NEXT: [[LOAD:%[0-9]+]]:_(s256) = G_LOAD [[COPY]](p0) :: (load (s256), align 1)
+ ; RV32I-NEXT: [[COPY1:%[0-9]+]]:_(p0) = COPY $x11
+ ; RV32I-NEXT: [[LOAD1:%[0-9]+]]:_(s256) = G_LOAD [[COPY1]](p0) :: (load (s256), align 1)
+ ; RV32I-NEXT: [[TRUNC:%[0-9]+]]:_(s64) = G_TRUNC [[LOAD]](s256)
+ ; RV32I-NEXT: [[UV:%[0-9]+]]:_(s32), [[UV1:%[0-9]+]]:_(s32) = G_UNMERGE_VALUES [[TRUNC]](s64)
+ ; RV32I-NEXT: $x10 = COPY [[UV]](s32)
+ ; RV32I-NEXT: $x11 = COPY [[UV1]](s32)
+ ; RV32I-NEXT: PseudoRET implicit $x10, implicit $x11
+ %2 = trunc i256 %x to i64
+ ret i64 %2
+}
+
+define i32 @caller_256i_in_regs( ) {
+ ; ILP32-LABEL: name: caller_256i_in_regs
+ ; ILP32: bb.1 (%ir-block.0):
+ ; ILP32-NEXT: [[C:%[0-9]+]]:_(s256) = G_CONSTANT i256 1
+ ; ILP32-NEXT: [[C1:%[0-9]+]]:_(s256) = G_CONSTANT i256 2
+ ; ILP32-NEXT: ADJCALLSTACKDOWN 0, 0, implicit-def $x2, implicit $x2
+ ; ILP32-NEXT: [[FRAME_INDEX:%[0-9]+]]:_(p0) = G_FRAME_INDEX %stack.0
+ ; ILP32-NEXT: G_STORE [[C]](s256), [[FRAME_INDEX]](p0) :: (store (s256), align 16)
+ ; ILP32-NEXT: $x10 = COPY [[FRAME_INDEX]](p0)
+ ; ILP32-NEXT: [[FRAME_INDEX1:%[0-9]+]]:_(p0) = G_FRAME_INDEX %stack.1
+ ; ILP32-NEXT: G_STORE [[C1]](s256), [[FRAME_INDEX1]](p0) :: (store (s256), align 16)
+ ; ILP32-NEXT: $x11 = COPY [[FRAME_INDEX1]](p0)
+ ; ILP32-NEXT: PseudoCALL target-flags(riscv-call) @callee_256i_in_regs, csr_ilp32_lp64, implicit-def $x1, implicit $x10, implicit $x11, implicit-def $x10, implicit-def $x11
+ ; ILP32-NEXT: ADJCALLSTACKUP 0, 0, implicit-def $x2, implicit $x2
+ ; ILP32-NEXT: [[COPY:%[0-9]+]]:_(s32) = COPY $x10
+ ; ILP32-NEXT: [[COPY1:%[0-9]+]]:_(s32) = COPY $x11
+ ; ILP32-NEXT: [[MV:%[0-9]+]]:_(s64) = G_MERGE_VALUES [[COPY]](s32), [[COPY1]](s32)
+ ; ILP32-NEXT: [[TRUNC:%[0-9]+]]:_(s32) = G_TRUNC [[MV]](s64)
+ ; ILP32-NEXT: $x10 = COPY [[TRUNC]](s32)
+ ; ILP32-NEXT: PseudoRET implicit $x10
+ ;
+ ; ILP32F-LABEL: name: caller_256i_in_regs
+ ; ILP32F: bb.1 (%ir-block.0):
+ ; ILP32F-NEXT: [[C:%[0-9]+]]:_(s256) = G_CONSTANT i256 1
+ ; ILP32F-NEXT: [[C1:%[0-9]+]]:_(s256) = G_CONSTANT i256 2
+ ; ILP32F-NEXT: ADJCALLSTACKDOWN 0, 0, implicit-def $x2, implicit $x2
+ ; ILP32F-NEXT: [[FRAME_INDEX:%[0-9]+]]:_(p0) = G_FRAME_INDEX %stack.0
+ ; ILP32F-NEXT: G_STORE [[C]](s256), [[FRAME_INDEX]](p0) :: (store (s256), align 16)
+ ; ILP32F-NEXT: $x10 = COPY [[FRAME_INDEX]](p0)
+ ; ILP32F-NEXT: [[FRAME_INDEX1:%[0-9]+]]:_(p0) = G_FRAME_INDEX %stack.1
+ ; ILP32F-NEXT: G_STORE [[C1]](s256), [[FRAME_INDEX1]](p0) :: (store (s256), align 16)
+ ; ILP32F-NEXT: $x11 = COPY [[FRAME_INDEX1]](p0)
+ ; ILP32F-NEXT: PseudoCALL target-flags(riscv-call) @callee_256i_in_regs, csr_ilp32f_lp64f, implicit-def $x1, implicit $x10, implicit $x11, implicit-def $x10, implicit-def $x11
+ ; ILP32F-NEXT: ADJCALLSTACKUP 0, 0, implicit-def $x2, implicit $x2
+ ; ILP32F-NEXT: [[COPY:%[0-9]+]]:_(s32) = COPY $x10
+ ; ILP32F-NEXT: [[COPY1:%[0-9]+]]:_(s32) = COPY $x11
+ ; ILP32F-NEXT: [[MV:%[0-9]+]]:_(s64) = G_MERGE_VALUES [[COPY]](s32), [[COPY1]](s32)
+ ; ILP32F-NEXT: [[TRUNC:%[0-9]+]]:_(s32) = G_TRUNC [[MV]](s64)
+ ; ILP32F-NEXT: $x10 = COPY [[TRUNC]](s32)
+ ; ILP32F-NEXT: PseudoRET implicit $x10
+ ;
+ ; ILP32D-LABEL: name: caller_256i_in_regs
+ ; ILP32D: bb.1 (%ir-block.0):
+ ; ILP32D-NEXT: [[C:%[0-9]+]]:_(s256) = G_CONSTANT i256 1
+ ; ILP32D-NEXT: [[C1:%[0-9]+]]:_(s256) = G_CONSTANT i256 2
+ ; ILP32D-NEXT: ADJCALLSTACKDOWN 0, 0, implicit-def $x2, implicit $x2
+ ; ILP32D-NEXT: [[FRAME_INDEX:%[0-9]+]]:_(p0) = G_FRAME_INDEX %stack.0
+ ; ILP32D-NEXT: G_STORE [[C]](s256), [[FRAME_INDEX]](p0) :: (store (s256), align 16)
+ ; ILP32D-NEXT: $x10 = COPY [[FRAME_INDEX]](p0)
+ ; ILP32D-NEXT: [[FRAME_INDEX1:%[0-9]+]]:_(p0) = G_FRAME_INDEX %stack.1
+ ; ILP32D-NEXT: G_STORE [[C1]](s256), [[FRAME_INDEX1]](p0) :: (store (s256), align 16)
+ ; ILP32D-NEXT: $x11 = COPY [[FRAME_INDEX1]](p0)
+ ; ILP32D-NEXT: PseudoCALL target-flags(riscv-call) @callee_256i_in_regs, csr_ilp32d_lp64d, implicit-def $x1, implicit $x10, implicit $x11, implicit-def $x10, implicit-def $x11
+ ; ILP32D-NEXT: ADJCALLSTACKUP 0, 0, implicit-def $x2, implicit $x2
+ ; ILP32D-NEXT: [[COPY:%[0-9]+]]:_(s32) = COPY $x10
+ ; ILP32D-NEXT: [[COPY1:%[0-9]+]]:_(s32) = COPY $x11
+ ; ILP32D-NEXT: [[MV:%[0-9]+]]:_(s64) = G_MERGE_VALUES [[COPY]](s32), [[COPY1]](s32)
+ ; ILP32D-NEXT: [[TRUNC:%[0-9]+]]:_(s32) = G_TRUNC [[MV]](s64)
+ ; ILP32D-NEXT: $x10 = COPY [[TRUNC]](s32)
+ ; ILP32D-NEXT: PseudoRET implicit $x10
+ %1 = call i64 @callee_256i_in_regs(i256 1, i256 2)
+ %2 = trunc i64 %1 to i32
+ ret i32 %2
+}
+
define i32 @callee_i64_in_regs(i32 %a, i64 %b) nounwind {
; RV32I-LABEL: name: callee_i64_in_regs
; RV32I: bb.1 (%ir-block.0):
diff --git a/llvm/test/CodeGen/RISCV/GlobalISel/irtranslator/calling-conv-lp64-lp64f-lp64d-common.ll b/llvm/test/CodeGen/RISCV/GlobalISel/irtranslator/calling-conv-lp64-lp64f-lp64d-common.ll
index b175b8d92e6c9..0467bbe3d41bb 100644
--- a/llvm/test/CodeGen/RISCV/GlobalISel/irtranslator/calling-conv-lp64-lp64f-lp64d-common.ll
+++ b/llvm/test/CodeGen/RISCV/GlobalISel/irtranslator/calling-conv-lp64-lp64f-lp64d-common.ll
@@ -16,6 +16,85 @@
; Check that on RV64, i128 is passed in a pair of registers. Unlike
; the convention for varargs, this need not be an aligned pair.
+define i64 @callee_256i_in_regs(i256 %x, i256 %y ) {
+
+ ; RV64I-LABEL: name: callee_256i_in_regs
+ ; RV64I: bb.1 (%ir-block.0):
+ ; RV64I-NEXT: liveins: $x10, $x11
+ ; RV64I-NEXT: {{ $}}
+ ; RV64I-NEXT: [[COPY:%[0-9]+]]:_(p0) = COPY $x10
+ ; RV64I-NEXT: [[LOAD:%[0-9]+]]:_(s256) = G_LOAD [[COPY]](p0) :: (load (s256), align 1)
+ ; RV64I-NEXT: [[COPY1:%[0-9]+]]:_(p0) = COPY $x11
+ ; RV64I-NEXT: [[LOAD1:%[0-9]+]]:_(s256) = G_LOAD [[COPY1]](p0) :: (load (s256), align 1)
+ ; RV64I-NEXT: [[TRUNC:%[0-9]+]]:_(s64) = G_TRUNC [[LOAD]](s256)
+ ; RV64I-NEXT: $x10 = COPY [[TRUNC]](s64)
+ ; RV64I-NEXT: PseudoRET implicit $x10
+ %2 = trunc i256 %x to i64
+ ret i64 %2
+}
+
+define i32 @caller_256i_in_regs( ) {
+ ; LP64-LABEL: name: caller_256i_in_regs
+ ; LP64: bb.1 (%ir-block.0):
+ ; LP64-NEXT: [[C:%[0-9]+]]:_(s256) = G_CONSTANT i256 1
+ ; LP64-NEXT: [[C1:%[0-9]+]]:_(s256) = G_CONSTANT i256 2
+ ; LP64-NEXT: ADJCALLSTACKDOWN 0, 0, implicit-def $x2, implicit $x2
+ ; LP64-NEXT: [[FRAME_INDEX:%[0-9]+]]:_(p0) = G_FRAME_INDEX %stack.0
+ ; LP64-NEXT: G_STORE [[C]](s256), [[FRAME_INDEX]](p0) :: (store (s256), align 16)
+ ; LP64-NEXT: $x10 = COPY [[FRAME_INDEX]](p0)
+ ; LP64-NEXT: [[FRAME_INDEX1:%[0-9]+]]:_(p0) = G_FRAME_INDEX %stack.1
+ ; LP64-NEXT: G_STORE [[C1]](s256), [[FRAME_INDEX1]](p0) :: (store (s256), align 16)
+ ; LP64-NEXT: $x11 = COPY [[FRAME_INDEX1]](p0)
+ ; LP64-NEXT: PseudoCALL target-flags(riscv-call) @callee_256i_in_regs, csr_ilp32_lp64, implicit-def $x1, implicit $x10, implicit $x11, implicit-def $x10
+ ; LP64-NEXT: ADJCALLSTACKUP 0, 0, implicit-def $x2, implicit $x2
+ ; LP64-NEXT: [[COPY:%[0-9]+]]:_(s64) = COPY $x10
+ ; LP64-NEXT: [[TRUNC:%[0-9]+]]:_(s32) = G_TRUNC [[COPY]](s64)
+ ; LP64-NEXT: [[ANYEXT:%[0-9]+]]:_(s64) = G_ANYEXT [[TRUNC]](s32)
+ ; LP64-NEXT: $x10 = COPY [[ANYEXT]](s64)
+ ; LP64-NEXT: PseudoRET implicit $x10
+ ;
+ ; LP64F-LABEL: name: caller_256i_in_regs
+ ; LP64F: bb.1 (%ir-block.0):
+ ; LP64F-NEXT: [[C:%[0-9]+]]:_(s256) = G_CONSTANT i256 1
+ ; LP64F-NEXT: [[C1:%[0-9]+]]:_(s256) = G_CONSTANT i256 2
+ ; LP64F-NEXT: ADJCALLSTACKDOWN 0, 0, implicit-def $x2, implicit $x2
+ ; LP64F-NEXT: [[FRAME_INDEX:%[0-9]+]]:_(p0) = G_FRAME_INDEX %stack.0
+ ; LP64F-NEXT: G_STORE [[C]](s256), [[FRAME_INDEX]](p0) :: (store (s256), align 16)
+ ; LP64F-NEXT: $x10 = COPY [[FRAME_INDEX]](p0)
+ ; LP64F-NEXT: [[FRAME_INDEX1:%[0-9]+]]:_(p0) = G_FRAME_IN...
[truncated]
|
✅ With the latest revision this PR passed the C/C++ code formatter. |
Why don't just do the lowering in the frontend? I.e. return getNaturalAlignIndirect(...) if argument's size exceeds 2xXLEN |
I think we do. |
The IR has its own ABI, independent of whatever any particular frontend does. You still should handle any IR function signature regardless of what clang does |
In case of this code https://godbolt.org/z/bqY5Kd5n8 we don't do that in the frontend. This is the IR after the final step in before the irtranslator: define dso_local noundef i128 @fun(__int128, int)(i128 noundef %i, i32 noundef %a) {
entry:
%i.addr = alloca i128, align 16
%a.addr = alloca i32, align 4
%r = alloca i128, align 16
store i128 %i, ptr %i.addr, align 16
store i32 %a, ptr %a.addr, align 4
%0 = load i128, ptr %i.addr, align 16
%1 = load i32, ptr %a.addr, align 4
%sh_prom = zext i32 %1 to i128
%shl = shl i128 %0, %sh_prom
store i128 %shl, ptr %r, align 16
%2 = load i128, ptr %r, align 16
ret i128 %2
} In case of large struct, yes, the frontend wants to pass them as a reference. Code: https://godbolt.org/z/4M1cEa5hh define dso_local i128 @fun(LargeStruct, int)(%struct.LargeStruct* byval(%struct.LargeStruct) align 8 %0, i32 %a) {
entry:
%i = alloca %struct.LargeStruct, align 16
%a.addr = alloca i32, align 4
%r = alloca i128, align 16
%1 = bitcast %struct.LargeStruct* %i to i8*
%2 = bitcast %struct.LargeStruct* %0 to i8*
call void @llvm.memcpy.p0i8.p0i8.i32(i8* align 16 %1, i8* align 8 %2, i32 80, i1 false)
store i32 %a, i32* %a.addr, align 4
%a1 = getelementptr inbounds %struct.LargeStruct, %struct.LargeStruct* %i, i32 0, i32 0
%3 = load i128, i128* %a1, align 16
%4 = load i32, i32* %a.addr, align 4
%sh_prom = zext i32 %4 to i128
%shl = shl i128 %3, %sh_prom
store i128 %shl, i128* %r, align 16
%5 = load i128, i128* %r, align 16
ret i128 %5
} |
I think this is a bug in clang IRGen. The RISC-V code wasn't written to expect int128 for rv32. |
The reason why I did this PR is because one of my teammates was trying to compile Picolib or maybe compiler-rt for RISC-V and it has used 128bit bit shifts. So to compile the projects they needed the |
And they're trying to use GlobalISel too? |
No they don't. But shouldn't we convert the full back-end to globalisel? And if we want to do that we also need to pass 128 bit values I think. I know, that a front-end fix would solve it for GlobalISel, but then that front end fix should be introduced to other front-ends like the Rust front-end, the Zig front-end and so on. So I think fixing this in the back-end is better. But if you think, that this should really go to the front-end I would be glad to implement that too. |
Looks like the clang frontend intentionally defers indirect scalar handling to the backend. I thought maybe -fforce-enable-int128 had just been overlooked. I didn't think there were any scalars larger than 2*XLen for rv32 by default. But I forgot about fp128 and Large |
I wrote this assuming that lowering of large integer types is intentionally left to the backend, which doesn't appear so. |
Looks like we intentionally defer the handling for scalars to the backend. Except for _BitInt |
Could you change in the summary: |
Updated the description. I have also finished that sentence :) . |
llvm/test/CodeGen/RISCV/GlobalISel/irtranslator/calling-conv-ilp32-ilp32f-ilp32d-common.ll
Outdated
Show resolved
Hide resolved
llvm/test/CodeGen/RISCV/GlobalISel/irtranslator/calling-conv-ilp32-ilp32f-ilp32d-common.ll
Outdated
Show resolved
Hide resolved
llvm/test/CodeGen/RISCV/GlobalISel/irtranslator/calling-conv-ilp32-ilp32f-ilp32d-common.ll
Outdated
Show resolved
Hide resolved
llvm/test/CodeGen/RISCV/GlobalISel/irtranslator/calling-conv-lp64-lp64f-lp64d-common.ll
Show resolved
Hide resolved
llvm/test/CodeGen/RISCV/GlobalISel/irtranslator/calling-conv-ilp32-ilp32f-ilp32d-common.ll
Outdated
Show resolved
Hide resolved
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM
(I'd expect someone from RISC-V team to take a look too.) |
@michaelmaitland @topperc @preames Could you please also take a look at (or another look at) this PR? |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM as well
llvm/test/CodeGen/RISCV/GlobalISel/irtranslator/calling-conv-ilp32-ilp32f-ilp32d-common.ll
Outdated
Show resolved
Hide resolved
llvm/test/CodeGen/RISCV/GlobalISel/irtranslator/calling-conv-ilp32-ilp32f-ilp32d-common.ll
Outdated
Show resolved
Hide resolved
There is already an approve on this PR from the GlobalISel team (@s-barannikov ) and from the RISCV team (@michaelmaitland ). If there is no new review on this PR I will merge it in 12 hours. |
FWIW I am not |
Sorry for assuming. I thought that you are a GlobalISel team member, because you could help me with lots of stuff regarding it so you must understand the framework and its working very well. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM
Some targets like RISC-V pass scalars wider than 2×XLEN bits by reference, so those arguments are replaced in the argument list with an address (See RISC-V ABIs Specification 1.0 section 2.1). This commit implements this indirect parameter passing in GlobalISel. --------- Co-authored-by: Gabor Spaits <Gabor.Spaits@hightec-rt.com>
Some targets like RISC-V pass scalars wider than 2×XLEN bits by reference, so those arguments are replaced in the argument list with an address (See RISC-V ABIs Specification 1.0 section 2.1) and also based on the behavior of the armv7a target, probably the same happens on 32 bit arm (https://godbolt.org/z/8frPb7b3Y).
RISC-V specific
In RISC-Vs GlobalISel backend the passing of large scalars have been disabled by the commit f533e8c since it generated wrong code. For example in 32bit RISC-V when passing 128 bit value instead of using four registers, it used one registers and assigned value to that one register four times, so basically it just overwrote the data over and over again. And even if it were used four register it still wouldn't be compliant with the ABI's calling conventions.
GlobalISel framework change and relation to SelectionDAG
In SelectionDAG, the assignments needed for a function call are determined in SelectionDAGBuilder.cpp SelectionDAGISel::LowerArguments (around line 11440 there beginning at
TLI->getRegisterTypeForCallingConv
) in GlobalISel, very similar logic can be found inCallLowering::determineAssignments
.SelectionDAG then calls the target specific
TargetLowering::LowerFormalArguments
callbacks. Then in this callback based on the target dependent code calls generic SelectionDAG functions to complete the assignment handling. This part of the code is responsibel for making the indirect parameter passing in RISC-V (In RISCVISelLowering.cpp RISCVTargetLowering::LowerFormalArguments around line 19410). In a GlobalISel backend the generic mir building is handled byCallLowering::handleAssignments
. This is not a callback. This function is usually called byCallLowering::lowerCall
orCallLowering::lowerFormalArguments
.Based on these observations, I think that the best way to implement indirect parameter passing is in
CallLowering::handleAssignments
since it kind of generalizes the relevant part of the logic found inTargetLowering::LowerFormalArguments
so all the targets can have indirect parameter passing that useCCValAssign
isCCValAssign::Indirect
in the calling convention. Currently I put the code inCallLowering::handleAssignments
and I think it is general enough for all the targets to use.In short, with this PR this can work for all targets, if the targets calling convention decides that it needs indirect parameter passing by marking a
CCValAssign
withCCValAssign::Indirect
.Another approach
Should I rather extend the interface of CallLowering? We could add a callback named "shouldPassIndirectly" and it could decide whether it would be appropiate to do indirect parameter passing? If it is appropiate then we would call the passParameterIndirectly callback, that would do the copies stores and loads. So targets could disable indirect parameter passing even if a a
CCValAssign
isCCValAssign::Indirect
.What do you think? Is it a good idea, to implement indirect passing on GlobalISel framework level? If it is, then is my current approach correct or we should rather do the CallLowering interface change or you have another idea?
If this functionality does not belong to the framework, then how should I proceed? Override the
handleAssignments
functio for RISCVCallLowering? This would create code duplication, since most of the function is good for RISC-V but it misses the indirect passing for large scalars required by the standard.Results of the current approach
For this llvm IR on rv32 target:
We get this after the IRtranslator:
Callee
Caller
Please help checking the PR and give your opinions and observation. I am open to implement any possible solution to solve this problem.
Thank you for your time and patience in advance.