[GISel][RISCV] Implement indirect parameter passing #95429


Merged

Conversation

spaits
Contributor

@spaits spaits commented Jun 13, 2024

Some targets, like RISC-V, pass scalars wider than 2×XLEN bits by reference, so such arguments are replaced in the argument list with an address (see RISC-V ABIs Specification 1.0, section 2.1). Judging by the behavior of the armv7a target, the same probably happens on 32-bit Arm as well (https://godbolt.org/z/8frPb7b3Y).
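
To make the rule concrete, here is a small hand-written example (mine, not from the patch; on rv32, XLEN = 32, so a 128-bit scalar exceeds 2×XLEN, and __int128 needs -fforce-enable-int128):

// Illustrative only: on rv32, __int128 is wider than 2*XLEN = 64 bits, so
// the caller stores each such argument into its own stack frame and passes
// the address in an argument register instead of the value itself.
__int128 add128(__int128 a, __int128 b) { // receives &a in a0, &b in a1
  return a + b; // the callee loads the operands through the pointers
}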

RISC-V specific

In RISC-V's GlobalISel backend, the passing of large scalars was disabled by commit f533e8c because it generated wrong code. For example, on 32-bit RISC-V, when passing a 128-bit value, instead of using four registers it used a single register and assigned a value to that one register four times, so it just overwrote the data over and over again. And even if it had used four registers, that still would not have been compliant with the ABI's calling convention.

GlobalISel framework change and relation to SelectionDAG

In SelectionDAG, the assignments needed for a function call are determined in SelectionDAGISel::LowerArguments in SelectionDAGBuilder.cpp (around line 11440, beginning at TLI->getRegisterTypeForCallingConv). In GlobalISel, very similar logic can be found in CallLowering::determineAssignments.

SelectionDAG then calls the target-specific TargetLowering::LowerFormalArguments callback, which in turn calls generic SelectionDAG functions to complete the assignment handling. This is the part of the code responsible for indirect parameter passing in RISC-V (RISCVTargetLowering::LowerFormalArguments in RISCVISelLowering.cpp, around line 19410). In a GlobalISel backend, the generic MIR building is handled by CallLowering::handleAssignments. This is not a callback; it is usually called by CallLowering::lowerCall or CallLowering::lowerFormalArguments.

Based on these observations, I think the best place to implement indirect parameter passing is CallLowering::handleAssignments, since it generalizes the relevant part of the logic found in TargetLowering::LowerFormalArguments; that way, every target whose calling convention marks a CCValAssign as CCValAssign::Indirect gets indirect parameter passing. Currently I put the code in CallLowering::handleAssignments, and I think it is general enough for all targets to use.

In short, with this PR indirect passing can work for all targets: it applies whenever a target's calling convention decides that it needs indirect parameter passing by marking a CCValAssign with CCValAssign::Indirect (a sketch of how a calling convention could request that follows).
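
As a rough illustration (CC_MyTarget, the ArgGPRs list, and the hard-coded XLen are made up for this sketch; the CCState/CCValAssign calls mirror the real API):

// Hypothetical calling-convention hook; not part of this patch.
static bool CC_MyTarget(unsigned ValNo, MVT ValVT, MVT LocVT,
                        CCValAssign::LocInfo LocInfo,
                        ISD::ArgFlagsTy ArgFlags, CCState &State) {
  static const MCPhysReg ArgGPRs[] = {RISCV::X10, RISCV::X11, RISCV::X12,
                                      RISCV::X13, RISCV::X14, RISCV::X15,
                                      RISCV::X16, RISCV::X17};
  const unsigned XLen = 32; // assume an rv32-like target
  if (ValVT.isScalarInteger() && ValVT.getFixedSizeInBits() > 2 * XLen) {
    // Pass the address of a caller-made copy instead of the value itself.
    LocVT = MVT::getIntegerVT(XLen);
    LocInfo = CCValAssign::Indirect;
  }
  if (MCRegister Reg = State.AllocateReg(ArgGPRs)) {
    State.addLoc(CCValAssign::getReg(ValNo, ValVT, Reg, LocVT, LocInfo));
    return false; // location assigned
  }
  // Out of registers: place the (possibly indirect) value on the stack.
  int64_t Off = State.AllocateStack(XLen / 8, Align(XLen / 8));
  State.addLoc(CCValAssign::getMem(ValNo, ValVT, Off, LocVT, LocInfo));
  return false;
}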

Another approach

Should I instead extend the interface of CallLowering? We could add a callback named shouldPassIndirectly that decides whether indirect parameter passing is appropriate. If it is, we would call a passParameterIndirectly callback that does the copies, stores, and loads. That way, targets could disable indirect parameter passing even when a CCValAssign is CCValAssign::Indirect.
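
A rough sketch of that hypothetical interface (neither callback exists in CallLowering today; the signatures are illustrative only):

class CallLowering {
  // ... existing members ...
public:
  /// Hypothetical: let the target confirm or veto indirect passing for an
  /// argument that the calling convention marked CCValAssign::Indirect.
  virtual bool shouldPassIndirectly(const ArgInfo &Arg,
                                    const CCValAssign &VA) const {
    return VA.getLocInfo() == CCValAssign::Indirect;
  }

  /// Hypothetical: emit the pointer plumbing. For outgoing arguments, store
  /// the value into a fresh stack slot and pass its address; for incoming
  /// arguments, load the value through the received pointer.
  virtual bool passParameterIndirectly(MachineIRBuilder &MIRBuilder,
                                       ValueHandler &Handler, ArgInfo &Arg,
                                       const CCValAssign &VA) const;
};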

What do you think? Is it a good idea to implement indirect passing at the GlobalISel framework level? If so, is my current approach correct, should we rather do the CallLowering interface change, or do you have another idea?

If this functionality does not belong in the framework, how should I proceed? Override the handleAssignments function in RISCVCallLowering? That would create code duplication, since most of the function is fine for RISC-V; it only misses the indirect passing for large scalars required by the standard.

Results of the current approach

For this LLVM IR on an rv32 target:

define i64 @fun(i128 %x, i128 %y ) {
  %a = shl i128 %x, %y
  %2 = trunc i128 %a to i64
  ret i64 %2
}

define i32 @fun_caller( ) {
  %1 = call i64 @fun(i128 1, i128 2)
  %2 = trunc i64 %1 to i32
  ret i32 %2
}

We get this after the IRTranslator:
Callee

Function Live Ins: $x10, $x11

bb.1 (%ir-block.0):
  liveins: $x10, $x11
  %2:_(p0) = COPY $x10
  %0:_(s128) = G_LOAD %2:_(p0) :: (load (s128), align 1)
  %3:_(p0) = COPY $x11
  %1:_(s128) = G_LOAD %3:_(p0) :: (load (s128), align 1)
  %4:_(s128) = G_SHL %0:_, %1:_(s128)
  %5:_(s64) = G_TRUNC %4:_(s128)
  %6:_(s32), %7:_(s32) = G_UNMERGE_VALUES %5:_(s64)
  $x10 = COPY %6:_(s32)
  $x11 = COPY %7:_(s32)
  PseudoRET implicit $x10, implicit $x11

Caller

Frame Objects:
  fi#0: size=128, align=8, at location [SP]
  fi#1: size=128, align=8, at location [SP]

bb.1 (%ir-block.0):
  %1:_(s128) = G_CONSTANT i128 1
  %2:_(s128) = G_CONSTANT i128 2
  ADJCALLSTACKDOWN 0, 0, implicit-def $x2, implicit $x2
  %3:_(p0) = G_FRAME_INDEX %stack.0
  G_STORE %1:_(s128), %3:_(p0) :: (store (s128), align 8)
  $x10 = COPY %3:_(p0)
  %4:_(p0) = G_FRAME_INDEX %stack.1
  G_STORE %2:_(s128), %4:_(p0) :: (store (s128), align 8)
  $x11 = COPY %4:_(p0)
  PseudoCALL target-flags(riscv-call) @fun, <regmask $vlenb $x0 $x1 $x8 $x9 $x18 $x19 $x20 $x21 $x22 $x23 $x24 $x25 $x26 $x27 $x8_x9 $x18_x19 $x20_x21 $x22_x23 $x24_x25 $x26_x27>, implicit-def $x1, implicit $x10, implicit $x11, implicit-def $x10, implicit-def $x11
  ADJCALLSTACKUP 0, 0, implicit-def $x2, implicit $x2
  %5:_(s32) = COPY $x10
  %6:_(s32) = COPY $x11
  %0:_(s64) = G_MERGE_VALUES %5:_(s32), %6:_(s32)
  %7:_(s32) = G_TRUNC %0:_(s64)
  $x10 = COPY %7:_(s32)
  PseudoRET implicit $x10

Please help review the PR and share your opinions and observations. I am open to implementing any solution that resolves this problem.

Thank you in advance for your time and patience.

@llvmbot
Member

llvmbot commented Jun 13, 2024

@llvm/pr-subscribers-backend-risc-v

@llvm/pr-subscribers-llvm-globalisel

Author: Gábor Spaits (spaits)

Patch is 22.08 KiB, truncated to 20.00 KiB below, full version: https://github.com/llvm/llvm-project/pull/95429.diff

4 Files Affected:

  • (modified) llvm/lib/CodeGen/GlobalISel/CallLowering.cpp (+49-12)
  • (modified) llvm/lib/Target/RISCV/GISel/RISCVCallLowering.cpp (+7-8)
  • (modified) llvm/test/CodeGen/RISCV/GlobalISel/irtranslator/calling-conv-ilp32-ilp32f-ilp32d-common.ll (+167)
  • (modified) llvm/test/CodeGen/RISCV/GlobalISel/irtranslator/calling-conv-lp64-lp64f-lp64d-common.ll (+79)
diff --git a/llvm/lib/CodeGen/GlobalISel/CallLowering.cpp b/llvm/lib/CodeGen/GlobalISel/CallLowering.cpp
index 412cd0a21ad41..daa5e0c07e9b7 100644
--- a/llvm/lib/CodeGen/GlobalISel/CallLowering.cpp
+++ b/llvm/lib/CodeGen/GlobalISel/CallLowering.cpp
@@ -751,6 +751,7 @@ bool CallLowering::handleAssignments(ValueHandler &Handler,
     const LLT NewLLT = Handler.isIncomingArgumentHandler() ? LocTy : ValTy;
     const EVT OrigVT = EVT::getEVT(Args[i].Ty);
     const LLT OrigTy = getLLTForType(*Args[i].Ty, DL);
+    const LLT PointerTy = LLT::pointer(0, DL.getPointerSizeInBits(0));
 
     // Expected to be multiple regs for a single incoming arg.
     // There should be Regs.size() ArgLocs per argument.
@@ -764,19 +765,28 @@ bool CallLowering::handleAssignments(ValueHandler &Handler,
       // If we can't directly assign the register, we need one or more
       // intermediate values.
       Args[i].Regs.resize(NumParts);
-
-      // For each split register, create and assign a vreg that will store
-      // the incoming component of the larger value. These will later be
-      // merged to form the final vreg.
-      for (unsigned Part = 0; Part < NumParts; ++Part)
-        Args[i].Regs[Part] = MRI.createGenericVirtualRegister(NewLLT);
+    
+      // When we have indirect parameter passing we are receiving a pointer,
+      // that points to the actual value. In that case we need a pointer.
+      if (VA.getLocInfo() == CCValAssign::Indirect &&
+          Args[i].Flags[0].isSplit()) {
+        if (Handler.isIncomingArgumentHandler())
+          Args[i].Regs[0] = MRI.createGenericVirtualRegister(PointerTy);
+      } else {
+        // For each split register, create and assign a vreg that will store
+        // the incoming component of the larger value. These will later be
+        // merged to form the final vreg.
+        for (unsigned Part = 0; Part < NumParts; ++Part)
+          Args[i].Regs[Part] = MRI.createGenericVirtualRegister(NewLLT);
+      }
     }
 
     assert((j + (NumParts - 1)) < ArgLocs.size() &&
            "Too many regs for number of args");
 
     // Coerce into outgoing value types before register assignment.
-    if (!Handler.isIncomingArgumentHandler() && OrigTy != ValTy) {
+    if (!Handler.isIncomingArgumentHandler() && OrigTy != ValTy &&
+        VA.getLocInfo() != CCValAssign::Indirect) {
       assert(Args[i].OrigRegs.size() == 1);
       buildCopyToRegs(MIRBuilder, Args[i].Regs, Args[i].OrigRegs[0], OrigTy,
                       ValTy, extendOpFromFlags(Args[i].Flags[0]));
@@ -790,6 +800,28 @@ bool CallLowering::handleAssignments(ValueHandler &Handler,
       CCValAssign &VA = ArgLocs[j + Idx];
       const ISD::ArgFlagsTy Flags = Args[i].Flags[Part];
 
+      // We found an indirect parameter passing and we are at the first part of
+      // the value being passed. In this case copy the incoming pointer into a
+      // virtual register so later we can load it.
+      if (VA.getLocInfo() == CCValAssign::Indirect && Flags.isSplit()) {
+        if (Handler.isIncomingArgumentHandler())
+          Handler.assignValueToReg(ArgReg, VA.getLocReg(), VA);
+        else {
+          MachineFrameInfo &MFI = MF.getFrameInfo();
+          int FrameIdx = MFI.CreateStackObject(OrigTy.getScalarSizeInBits(),
+                                               Align(8), false);
+
+          auto PointerToStackReg =
+              MIRBuilder.buildFrameIndex(PointerTy, FrameIdx)
+                  ->getOperand(0)
+                  .getReg();
+          Handler.assignValueToAddress(Args[i].OrigRegs[Part], PointerToStackReg,
+                                       OrigTy, MachinePointerInfo{}, VA);
+          Handler.assignValueToReg(PointerToStackReg, VA.getLocReg(), VA);
+        }
+        break;
+      }
+
       if (VA.isMemLoc() && !Flags.isByVal()) {
         // Individual pieces may have been spilled to the stack and others
         // passed in registers.
@@ -866,11 +898,16 @@ bool CallLowering::handleAssignments(ValueHandler &Handler,
       }
     }
 
-    // Now that all pieces have been assigned, re-pack the register typed values
-    // into the original value typed registers.
-    if (Handler.isIncomingArgumentHandler() && OrigVT != LocVT) {
-      // Merge the split registers into the expected larger result vregs of
-      // the original call.
+    // In case of indirect parameter passing load the value referred to by
+    // the argument.
+    if (Handler.isIncomingArgumentHandler() && OrigVT != LocVT &&
+        VA.getLocInfo() == CCValAssign::Indirect) {
+      Handler.assignValueToAddress(Args[i].OrigRegs[0], Args[i].Regs[0], OrigTy,
+                                   MachinePointerInfo{}, VA);
+
+    } else if (Handler.isIncomingArgumentHandler() && OrigVT != LocVT) {
+      // Now that all pieces have been assigned, re-pack the register typed values
+      // into the original value typed registers.
       buildCopyFromRegs(MIRBuilder, Args[i].OrigRegs, Args[i].Regs, OrigTy,
                         LocTy, Args[i].Flags[0]);
     }
diff --git a/llvm/lib/Target/RISCV/GISel/RISCVCallLowering.cpp b/llvm/lib/Target/RISCV/GISel/RISCVCallLowering.cpp
index 2bfee45852b20..b1f381f4b30ad 100644
--- a/llvm/lib/Target/RISCV/GISel/RISCVCallLowering.cpp
+++ b/llvm/lib/Target/RISCV/GISel/RISCVCallLowering.cpp
@@ -89,12 +89,13 @@ struct RISCVOutgoingValueHandler : public CallLowering::OutgoingValueHandler {
                             const MachinePointerInfo &MPO,
                             const CCValAssign &VA) override {
     MachineFunction &MF = MIRBuilder.getMF();
-    uint64_t LocMemOffset = VA.getLocMemOffset();
-
+    uint64_t Offset = 0;
+    if (VA.isMemLoc())
+      Offset = VA.getLocMemOffset();
+    
     // TODO: Move StackAlignment to subtarget and share with FrameLowering.
-    auto MMO =
-        MF.getMachineMemOperand(MPO, MachineMemOperand::MOStore, MemTy,
-                                commonAlignment(Align(16), LocMemOffset));
+    auto *MMO = MF.getMachineMemOperand(MPO, MachineMemOperand::MOStore, MemTy,
+                                       commonAlignment(Align(16), Offset));
 
     Register ExtReg = extendRegister(ValVReg, VA);
     MIRBuilder.buildStore(ExtReg, Addr, *MMO);
@@ -341,10 +342,8 @@ static bool isLegalElementTypeForRVV(Type *EltTy,
 // TODO: Remove IsLowerArgs argument by adding support for vectors in lowerCall.
 static bool isSupportedArgumentType(Type *T, const RISCVSubtarget &Subtarget,
                                     bool IsLowerArgs = false) {
-  // TODO: Integers larger than 2*XLen are passed indirectly which is not
-  // supported yet.
   if (T->isIntegerTy())
-    return T->getIntegerBitWidth() <= Subtarget.getXLen() * 2;
+    return true;
   if (T->isHalfTy() || T->isFloatTy() || T->isDoubleTy())
     return true;
   if (T->isPointerTy())
diff --git a/llvm/test/CodeGen/RISCV/GlobalISel/irtranslator/calling-conv-ilp32-ilp32f-ilp32d-common.ll b/llvm/test/CodeGen/RISCV/GlobalISel/irtranslator/calling-conv-ilp32-ilp32f-ilp32d-common.ll
index 1a3489521af19..92f5f6220f096 100644
--- a/llvm/test/CodeGen/RISCV/GlobalISel/irtranslator/calling-conv-ilp32-ilp32f-ilp32d-common.ll
+++ b/llvm/test/CodeGen/RISCV/GlobalISel/irtranslator/calling-conv-ilp32-ilp32f-ilp32d-common.ll
@@ -16,6 +16,173 @@
 ; Check that on RV32, i64 is passed in a pair of registers. Unlike
 ; the convention for varargs, this need not be an aligned pair.
 
+define i64 @callee_128i_in_regs(i128 %x, i128 %y ) {
+  ; RV32I-LABEL: name: callee_128i_in_regs
+  ; RV32I: bb.1 (%ir-block.0):
+  ; RV32I-NEXT:   liveins: $x10, $x11
+  ; RV32I-NEXT: {{  $}}
+  ; RV32I-NEXT:   [[COPY:%[0-9]+]]:_(p0) = COPY $x10
+  ; RV32I-NEXT:   [[LOAD:%[0-9]+]]:_(s128) = G_LOAD [[COPY]](p0) :: (load (s128), align 1)
+  ; RV32I-NEXT:   [[COPY1:%[0-9]+]]:_(p0) = COPY $x11
+  ; RV32I-NEXT:   [[LOAD1:%[0-9]+]]:_(s128) = G_LOAD [[COPY1]](p0) :: (load (s128), align 1)
+  ; RV32I-NEXT:   [[TRUNC:%[0-9]+]]:_(s64) = G_TRUNC [[LOAD]](s128)
+  ; RV32I-NEXT:   [[UV:%[0-9]+]]:_(s32), [[UV1:%[0-9]+]]:_(s32) = G_UNMERGE_VALUES [[TRUNC]](s64)
+  ; RV32I-NEXT:   $x10 = COPY [[UV]](s32)
+  ; RV32I-NEXT:   $x11 = COPY [[UV1]](s32)
+  ; RV32I-NEXT:   PseudoRET implicit $x10, implicit $x11
+  %2 = trunc i128 %x to i64
+  ret i64 %2
+}
+
+define i32 @caller_128i_in_regs( ) {
+  ; ILP32-LABEL: name: caller_128i_in_regs
+  ; ILP32: bb.1 (%ir-block.0):
+  ; ILP32-NEXT:   [[C:%[0-9]+]]:_(s128) = G_CONSTANT i128 1
+  ; ILP32-NEXT:   [[C1:%[0-9]+]]:_(s128) = G_CONSTANT i128 2
+  ; ILP32-NEXT:   ADJCALLSTACKDOWN 0, 0, implicit-def $x2, implicit $x2
+  ; ILP32-NEXT:   [[FRAME_INDEX:%[0-9]+]]:_(p0) = G_FRAME_INDEX %stack.0
+  ; ILP32-NEXT:   G_STORE [[C]](s128), [[FRAME_INDEX]](p0) :: (store (s128))
+  ; ILP32-NEXT:   $x10 = COPY [[FRAME_INDEX]](p0)
+  ; ILP32-NEXT:   [[FRAME_INDEX1:%[0-9]+]]:_(p0) = G_FRAME_INDEX %stack.1
+  ; ILP32-NEXT:   G_STORE [[C1]](s128), [[FRAME_INDEX1]](p0) :: (store (s128))
+  ; ILP32-NEXT:   $x11 = COPY [[FRAME_INDEX1]](p0)
+  ; ILP32-NEXT:   PseudoCALL target-flags(riscv-call) @callee_128i_in_regs, csr_ilp32_lp64, implicit-def $x1, implicit $x10, implicit $x11, implicit-def $x10, implicit-def $x11
+  ; ILP32-NEXT:   ADJCALLSTACKUP 0, 0, implicit-def $x2, implicit $x2
+  ; ILP32-NEXT:   [[COPY:%[0-9]+]]:_(s32) = COPY $x10
+  ; ILP32-NEXT:   [[COPY1:%[0-9]+]]:_(s32) = COPY $x11
+  ; ILP32-NEXT:   [[MV:%[0-9]+]]:_(s64) = G_MERGE_VALUES [[COPY]](s32), [[COPY1]](s32)
+  ; ILP32-NEXT:   [[TRUNC:%[0-9]+]]:_(s32) = G_TRUNC [[MV]](s64)
+  ; ILP32-NEXT:   $x10 = COPY [[TRUNC]](s32)
+  ; ILP32-NEXT:   PseudoRET implicit $x10
+  ;
+  ; ILP32F-LABEL: name: caller_128i_in_regs
+  ; ILP32F: bb.1 (%ir-block.0):
+  ; ILP32F-NEXT:   [[C:%[0-9]+]]:_(s128) = G_CONSTANT i128 1
+  ; ILP32F-NEXT:   [[C1:%[0-9]+]]:_(s128) = G_CONSTANT i128 2
+  ; ILP32F-NEXT:   ADJCALLSTACKDOWN 0, 0, implicit-def $x2, implicit $x2
+  ; ILP32F-NEXT:   [[FRAME_INDEX:%[0-9]+]]:_(p0) = G_FRAME_INDEX %stack.0
+  ; ILP32F-NEXT:   G_STORE [[C]](s128), [[FRAME_INDEX]](p0) :: (store (s128))
+  ; ILP32F-NEXT:   $x10 = COPY [[FRAME_INDEX]](p0)
+  ; ILP32F-NEXT:   [[FRAME_INDEX1:%[0-9]+]]:_(p0) = G_FRAME_INDEX %stack.1
+  ; ILP32F-NEXT:   G_STORE [[C1]](s128), [[FRAME_INDEX1]](p0) :: (store (s128))
+  ; ILP32F-NEXT:   $x11 = COPY [[FRAME_INDEX1]](p0)
+  ; ILP32F-NEXT:   PseudoCALL target-flags(riscv-call) @callee_128i_in_regs, csr_ilp32f_lp64f, implicit-def $x1, implicit $x10, implicit $x11, implicit-def $x10, implicit-def $x11
+  ; ILP32F-NEXT:   ADJCALLSTACKUP 0, 0, implicit-def $x2, implicit $x2
+  ; ILP32F-NEXT:   [[COPY:%[0-9]+]]:_(s32) = COPY $x10
+  ; ILP32F-NEXT:   [[COPY1:%[0-9]+]]:_(s32) = COPY $x11
+  ; ILP32F-NEXT:   [[MV:%[0-9]+]]:_(s64) = G_MERGE_VALUES [[COPY]](s32), [[COPY1]](s32)
+  ; ILP32F-NEXT:   [[TRUNC:%[0-9]+]]:_(s32) = G_TRUNC [[MV]](s64)
+  ; ILP32F-NEXT:   $x10 = COPY [[TRUNC]](s32)
+  ; ILP32F-NEXT:   PseudoRET implicit $x10
+  ;
+  ; ILP32D-LABEL: name: caller_128i_in_regs
+  ; ILP32D: bb.1 (%ir-block.0):
+  ; ILP32D-NEXT:   [[C:%[0-9]+]]:_(s128) = G_CONSTANT i128 1
+  ; ILP32D-NEXT:   [[C1:%[0-9]+]]:_(s128) = G_CONSTANT i128 2
+  ; ILP32D-NEXT:   ADJCALLSTACKDOWN 0, 0, implicit-def $x2, implicit $x2
+  ; ILP32D-NEXT:   [[FRAME_INDEX:%[0-9]+]]:_(p0) = G_FRAME_INDEX %stack.0
+  ; ILP32D-NEXT:   G_STORE [[C]](s128), [[FRAME_INDEX]](p0) :: (store (s128))
+  ; ILP32D-NEXT:   $x10 = COPY [[FRAME_INDEX]](p0)
+  ; ILP32D-NEXT:   [[FRAME_INDEX1:%[0-9]+]]:_(p0) = G_FRAME_INDEX %stack.1
+  ; ILP32D-NEXT:   G_STORE [[C1]](s128), [[FRAME_INDEX1]](p0) :: (store (s128))
+  ; ILP32D-NEXT:   $x11 = COPY [[FRAME_INDEX1]](p0)
+  ; ILP32D-NEXT:   PseudoCALL target-flags(riscv-call) @callee_128i_in_regs, csr_ilp32d_lp64d, implicit-def $x1, implicit $x10, implicit $x11, implicit-def $x10, implicit-def $x11
+  ; ILP32D-NEXT:   ADJCALLSTACKUP 0, 0, implicit-def $x2, implicit $x2
+  ; ILP32D-NEXT:   [[COPY:%[0-9]+]]:_(s32) = COPY $x10
+  ; ILP32D-NEXT:   [[COPY1:%[0-9]+]]:_(s32) = COPY $x11
+  ; ILP32D-NEXT:   [[MV:%[0-9]+]]:_(s64) = G_MERGE_VALUES [[COPY]](s32), [[COPY1]](s32)
+  ; ILP32D-NEXT:   [[TRUNC:%[0-9]+]]:_(s32) = G_TRUNC [[MV]](s64)
+  ; ILP32D-NEXT:   $x10 = COPY [[TRUNC]](s32)
+  ; ILP32D-NEXT:   PseudoRET implicit $x10
+  %1 = call i64 @callee_128i_in_regs(i128 1, i128 2)
+  %2 = trunc i64 %1 to i32
+  ret i32 %2
+}
+
+define i64 @callee_256i_in_regs(i256 %x, i256 %y ) {
+
+  ; RV32I-LABEL: name: callee_256i_in_regs
+  ; RV32I: bb.1 (%ir-block.0):
+  ; RV32I-NEXT:   liveins: $x10, $x11
+  ; RV32I-NEXT: {{  $}}
+  ; RV32I-NEXT:   [[COPY:%[0-9]+]]:_(p0) = COPY $x10
+  ; RV32I-NEXT:   [[LOAD:%[0-9]+]]:_(s256) = G_LOAD [[COPY]](p0) :: (load (s256), align 1)
+  ; RV32I-NEXT:   [[COPY1:%[0-9]+]]:_(p0) = COPY $x11
+  ; RV32I-NEXT:   [[LOAD1:%[0-9]+]]:_(s256) = G_LOAD [[COPY1]](p0) :: (load (s256), align 1)
+  ; RV32I-NEXT:   [[TRUNC:%[0-9]+]]:_(s64) = G_TRUNC [[LOAD]](s256)
+  ; RV32I-NEXT:   [[UV:%[0-9]+]]:_(s32), [[UV1:%[0-9]+]]:_(s32) = G_UNMERGE_VALUES [[TRUNC]](s64)
+  ; RV32I-NEXT:   $x10 = COPY [[UV]](s32)
+  ; RV32I-NEXT:   $x11 = COPY [[UV1]](s32)
+  ; RV32I-NEXT:   PseudoRET implicit $x10, implicit $x11
+  %2 = trunc i256 %x to i64
+  ret i64 %2
+}
+
+define i32 @caller_256i_in_regs( ) {
+  ; ILP32-LABEL: name: caller_256i_in_regs
+  ; ILP32: bb.1 (%ir-block.0):
+  ; ILP32-NEXT:   [[C:%[0-9]+]]:_(s256) = G_CONSTANT i256 1
+  ; ILP32-NEXT:   [[C1:%[0-9]+]]:_(s256) = G_CONSTANT i256 2
+  ; ILP32-NEXT:   ADJCALLSTACKDOWN 0, 0, implicit-def $x2, implicit $x2
+  ; ILP32-NEXT:   [[FRAME_INDEX:%[0-9]+]]:_(p0) = G_FRAME_INDEX %stack.0
+  ; ILP32-NEXT:   G_STORE [[C]](s256), [[FRAME_INDEX]](p0) :: (store (s256), align 16)
+  ; ILP32-NEXT:   $x10 = COPY [[FRAME_INDEX]](p0)
+  ; ILP32-NEXT:   [[FRAME_INDEX1:%[0-9]+]]:_(p0) = G_FRAME_INDEX %stack.1
+  ; ILP32-NEXT:   G_STORE [[C1]](s256), [[FRAME_INDEX1]](p0) :: (store (s256), align 16)
+  ; ILP32-NEXT:   $x11 = COPY [[FRAME_INDEX1]](p0)
+  ; ILP32-NEXT:   PseudoCALL target-flags(riscv-call) @callee_256i_in_regs, csr_ilp32_lp64, implicit-def $x1, implicit $x10, implicit $x11, implicit-def $x10, implicit-def $x11
+  ; ILP32-NEXT:   ADJCALLSTACKUP 0, 0, implicit-def $x2, implicit $x2
+  ; ILP32-NEXT:   [[COPY:%[0-9]+]]:_(s32) = COPY $x10
+  ; ILP32-NEXT:   [[COPY1:%[0-9]+]]:_(s32) = COPY $x11
+  ; ILP32-NEXT:   [[MV:%[0-9]+]]:_(s64) = G_MERGE_VALUES [[COPY]](s32), [[COPY1]](s32)
+  ; ILP32-NEXT:   [[TRUNC:%[0-9]+]]:_(s32) = G_TRUNC [[MV]](s64)
+  ; ILP32-NEXT:   $x10 = COPY [[TRUNC]](s32)
+  ; ILP32-NEXT:   PseudoRET implicit $x10
+  ;
+  ; ILP32F-LABEL: name: caller_256i_in_regs
+  ; ILP32F: bb.1 (%ir-block.0):
+  ; ILP32F-NEXT:   [[C:%[0-9]+]]:_(s256) = G_CONSTANT i256 1
+  ; ILP32F-NEXT:   [[C1:%[0-9]+]]:_(s256) = G_CONSTANT i256 2
+  ; ILP32F-NEXT:   ADJCALLSTACKDOWN 0, 0, implicit-def $x2, implicit $x2
+  ; ILP32F-NEXT:   [[FRAME_INDEX:%[0-9]+]]:_(p0) = G_FRAME_INDEX %stack.0
+  ; ILP32F-NEXT:   G_STORE [[C]](s256), [[FRAME_INDEX]](p0) :: (store (s256), align 16)
+  ; ILP32F-NEXT:   $x10 = COPY [[FRAME_INDEX]](p0)
+  ; ILP32F-NEXT:   [[FRAME_INDEX1:%[0-9]+]]:_(p0) = G_FRAME_INDEX %stack.1
+  ; ILP32F-NEXT:   G_STORE [[C1]](s256), [[FRAME_INDEX1]](p0) :: (store (s256), align 16)
+  ; ILP32F-NEXT:   $x11 = COPY [[FRAME_INDEX1]](p0)
+  ; ILP32F-NEXT:   PseudoCALL target-flags(riscv-call) @callee_256i_in_regs, csr_ilp32f_lp64f, implicit-def $x1, implicit $x10, implicit $x11, implicit-def $x10, implicit-def $x11
+  ; ILP32F-NEXT:   ADJCALLSTACKUP 0, 0, implicit-def $x2, implicit $x2
+  ; ILP32F-NEXT:   [[COPY:%[0-9]+]]:_(s32) = COPY $x10
+  ; ILP32F-NEXT:   [[COPY1:%[0-9]+]]:_(s32) = COPY $x11
+  ; ILP32F-NEXT:   [[MV:%[0-9]+]]:_(s64) = G_MERGE_VALUES [[COPY]](s32), [[COPY1]](s32)
+  ; ILP32F-NEXT:   [[TRUNC:%[0-9]+]]:_(s32) = G_TRUNC [[MV]](s64)
+  ; ILP32F-NEXT:   $x10 = COPY [[TRUNC]](s32)
+  ; ILP32F-NEXT:   PseudoRET implicit $x10
+  ;
+  ; ILP32D-LABEL: name: caller_256i_in_regs
+  ; ILP32D: bb.1 (%ir-block.0):
+  ; ILP32D-NEXT:   [[C:%[0-9]+]]:_(s256) = G_CONSTANT i256 1
+  ; ILP32D-NEXT:   [[C1:%[0-9]+]]:_(s256) = G_CONSTANT i256 2
+  ; ILP32D-NEXT:   ADJCALLSTACKDOWN 0, 0, implicit-def $x2, implicit $x2
+  ; ILP32D-NEXT:   [[FRAME_INDEX:%[0-9]+]]:_(p0) = G_FRAME_INDEX %stack.0
+  ; ILP32D-NEXT:   G_STORE [[C]](s256), [[FRAME_INDEX]](p0) :: (store (s256), align 16)
+  ; ILP32D-NEXT:   $x10 = COPY [[FRAME_INDEX]](p0)
+  ; ILP32D-NEXT:   [[FRAME_INDEX1:%[0-9]+]]:_(p0) = G_FRAME_INDEX %stack.1
+  ; ILP32D-NEXT:   G_STORE [[C1]](s256), [[FRAME_INDEX1]](p0) :: (store (s256), align 16)
+  ; ILP32D-NEXT:   $x11 = COPY [[FRAME_INDEX1]](p0)
+  ; ILP32D-NEXT:   PseudoCALL target-flags(riscv-call) @callee_256i_in_regs, csr_ilp32d_lp64d, implicit-def $x1, implicit $x10, implicit $x11, implicit-def $x10, implicit-def $x11
+  ; ILP32D-NEXT:   ADJCALLSTACKUP 0, 0, implicit-def $x2, implicit $x2
+  ; ILP32D-NEXT:   [[COPY:%[0-9]+]]:_(s32) = COPY $x10
+  ; ILP32D-NEXT:   [[COPY1:%[0-9]+]]:_(s32) = COPY $x11
+  ; ILP32D-NEXT:   [[MV:%[0-9]+]]:_(s64) = G_MERGE_VALUES [[COPY]](s32), [[COPY1]](s32)
+  ; ILP32D-NEXT:   [[TRUNC:%[0-9]+]]:_(s32) = G_TRUNC [[MV]](s64)
+  ; ILP32D-NEXT:   $x10 = COPY [[TRUNC]](s32)
+  ; ILP32D-NEXT:   PseudoRET implicit $x10
+  %1 = call i64 @callee_256i_in_regs(i256 1, i256 2)
+  %2 = trunc i64 %1 to i32
+  ret i32 %2
+}
+
 define i32 @callee_i64_in_regs(i32 %a, i64 %b) nounwind {
   ; RV32I-LABEL: name: callee_i64_in_regs
   ; RV32I: bb.1 (%ir-block.0):
diff --git a/llvm/test/CodeGen/RISCV/GlobalISel/irtranslator/calling-conv-lp64-lp64f-lp64d-common.ll b/llvm/test/CodeGen/RISCV/GlobalISel/irtranslator/calling-conv-lp64-lp64f-lp64d-common.ll
index b175b8d92e6c9..0467bbe3d41bb 100644
--- a/llvm/test/CodeGen/RISCV/GlobalISel/irtranslator/calling-conv-lp64-lp64f-lp64d-common.ll
+++ b/llvm/test/CodeGen/RISCV/GlobalISel/irtranslator/calling-conv-lp64-lp64f-lp64d-common.ll
@@ -16,6 +16,85 @@
 ; Check that on RV64, i128 is passed in a pair of registers. Unlike
 ; the convention for varargs, this need not be an aligned pair.
 
+define i64 @callee_256i_in_regs(i256 %x, i256 %y ) {
+
+  ; RV64I-LABEL: name: callee_256i_in_regs
+  ; RV64I: bb.1 (%ir-block.0):
+  ; RV64I-NEXT:   liveins: $x10, $x11
+  ; RV64I-NEXT: {{  $}}
+  ; RV64I-NEXT:   [[COPY:%[0-9]+]]:_(p0) = COPY $x10
+  ; RV64I-NEXT:   [[LOAD:%[0-9]+]]:_(s256) = G_LOAD [[COPY]](p0) :: (load (s256), align 1)
+  ; RV64I-NEXT:   [[COPY1:%[0-9]+]]:_(p0) = COPY $x11
+  ; RV64I-NEXT:   [[LOAD1:%[0-9]+]]:_(s256) = G_LOAD [[COPY1]](p0) :: (load (s256), align 1)
+  ; RV64I-NEXT:   [[TRUNC:%[0-9]+]]:_(s64) = G_TRUNC [[LOAD]](s256)
+  ; RV64I-NEXT:   $x10 = COPY [[TRUNC]](s64)
+  ; RV64I-NEXT:   PseudoRET implicit $x10
+  %2 = trunc i256 %x to i64
+  ret i64 %2
+}
+
+define i32 @caller_256i_in_regs( ) {
+  ; LP64-LABEL: name: caller_256i_in_regs
+  ; LP64: bb.1 (%ir-block.0):
+  ; LP64-NEXT:   [[C:%[0-9]+]]:_(s256) = G_CONSTANT i256 1
+  ; LP64-NEXT:   [[C1:%[0-9]+]]:_(s256) = G_CONSTANT i256 2
+  ; LP64-NEXT:   ADJCALLSTACKDOWN 0, 0, implicit-def $x2, implicit $x2
+  ; LP64-NEXT:   [[FRAME_INDEX:%[0-9]+]]:_(p0) = G_FRAME_INDEX %stack.0
+  ; LP64-NEXT:   G_STORE [[C]](s256), [[FRAME_INDEX]](p0) :: (store (s256), align 16)
+  ; LP64-NEXT:   $x10 = COPY [[FRAME_INDEX]](p0)
+  ; LP64-NEXT:   [[FRAME_INDEX1:%[0-9]+]]:_(p0) = G_FRAME_INDEX %stack.1
+  ; LP64-NEXT:   G_STORE [[C1]](s256), [[FRAME_INDEX1]](p0) :: (store (s256), align 16)
+  ; LP64-NEXT:   $x11 = COPY [[FRAME_INDEX1]](p0)
+  ; LP64-NEXT:   PseudoCALL target-flags(riscv-call) @callee_256i_in_regs, csr_ilp32_lp64, implicit-def $x1, implicit $x10, implicit $x11, implicit-def $x10
+  ; LP64-NEXT:   ADJCALLSTACKUP 0, 0, implicit-def $x2, implicit $x2
+  ; LP64-NEXT:   [[COPY:%[0-9]+]]:_(s64) = COPY $x10
+  ; LP64-NEXT:   [[TRUNC:%[0-9]+]]:_(s32) = G_TRUNC [[COPY]](s64)
+  ; LP64-NEXT:   [[ANYEXT:%[0-9]+]]:_(s64) = G_ANYEXT [[TRUNC]](s32)
+  ; LP64-NEXT:   $x10 = COPY [[ANYEXT]](s64)
+  ; LP64-NEXT:   PseudoRET implicit $x10
+  ;
+  ; LP64F-LABEL: name: caller_256i_in_regs
+  ; LP64F: bb.1 (%ir-block.0):
+  ; LP64F-NEXT:   [[C:%[0-9]+]]:_(s256) = G_CONSTANT i256 1
+  ; LP64F-NEXT:   [[C1:%[0-9]+]]:_(s256) = G_CONSTANT i256 2
+  ; LP64F-NEXT:   ADJCALLSTACKDOWN 0, 0, implicit-def $x2, implicit $x2
+  ; LP64F-NEXT:   [[FRAME_INDEX:%[0-9]+]]:_(p0) = G_FRAME_INDEX %stack.0
+  ; LP64F-NEXT:   G_STORE [[C]](s256), [[FRAME_INDEX]](p0) :: (store (s256), align 16)
+  ; LP64F-NEXT:   $x10 = COPY [[FRAME_INDEX]](p0)
+  ; LP64F-NEXT:   [[FRAME_INDEX1:%[0-9]+]]:_(p0) = G_FRAME_IN...
[truncated]


github-actions bot commented Jun 13, 2024

✅ With the latest revision this PR passed the C/C++ code formatter.

@tschuett tschuett requested review from aemerson and arsenm June 13, 2024 16:44
@s-barannikov
Contributor

Why not just do the lowering in the frontend? I.e., return getNaturalAlignIndirect(...) if the argument's size exceeds 2×XLEN.
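
(For reference, a minimal sketch of what that frontend-side change could look like; the placement inside clang's RISCVABIInfo::classifyArgumentType is an assumption, while getNaturalAlignIndirect is the real ABIInfo helper.)

// Sketch only; assumed to sit in clang/lib/CodeGen/Targets/RISCV.cpp,
// inside RISCVABIInfo::classifyArgumentType.
uint64_t Size = getContext().getTypeSize(Ty); // argument size in bits
if (Ty->isIntegerType() && Size > 2 * XLen)
  // Scalars wider than 2*XLEN would be passed by reference, moving the
  // psABI rule into the frontend instead of the backend.
  return getNaturalAlignIndirect(Ty, /*ByVal=*/false);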

@michaelmaitland michaelmaitland self-requested a review June 13, 2024 16:54
@topperc
Collaborator

topperc commented Jun 13, 2024

Why not just do the lowering in the frontend? I.e., return getNaturalAlignIndirect(...) if the argument's size exceeds 2×XLEN.

I think we do.

Contributor

arsenm commented Jun 13, 2024

The IR has its own ABI, independent of whatever any particular frontend does. You should still handle any IR function signature regardless of what clang does.

@spaits
Contributor Author

spaits commented Jun 13, 2024

Why not just do the lowering in the frontend? I.e., return getNaturalAlignIndirect(...) if the argument's size exceeds 2×XLEN.

I think we do.

In the case of this code, https://godbolt.org/z/bqY5Kd5n8, we don't do that in the frontend. This is the IR after the final step before the IRTranslator:

define dso_local noundef i128 @fun(__int128, int)(i128 noundef %i, i32 noundef %a) {
entry:
  %i.addr = alloca i128, align 16
  %a.addr = alloca i32, align 4
  %r = alloca i128, align 16
  store i128 %i, ptr %i.addr, align 16
  store i32 %a, ptr %a.addr, align 4
  %0 = load i128, ptr %i.addr, align 16
  %1 = load i32, ptr %a.addr, align 4
  %sh_prom = zext i32 %1 to i128
  %shl = shl i128 %0, %sh_prom
  store i128 %shl, ptr %r, align 16
  %2 = load i128, ptr %r, align 16
  ret i128 %2
}

In the case of a large struct, yes, the frontend does pass it by reference. Code: https://godbolt.org/z/4M1cEa5hh
This is the IR after the final step before the IRTranslator:

define dso_local i128 @fun(LargeStruct, int)(%struct.LargeStruct* byval(%struct.LargeStruct) align 8 %0, i32 %a) {
entry:
  %i = alloca %struct.LargeStruct, align 16
  %a.addr = alloca i32, align 4
  %r = alloca i128, align 16
  %1 = bitcast %struct.LargeStruct* %i to i8*
  %2 = bitcast %struct.LargeStruct* %0 to i8*
  call void @llvm.memcpy.p0i8.p0i8.i32(i8* align 16 %1, i8* align 8 %2, i32 80, i1 false)
  store i32 %a, i32* %a.addr, align 4
  %a1 = getelementptr inbounds %struct.LargeStruct, %struct.LargeStruct* %i, i32 0, i32 0
  %3 = load i128, i128* %a1, align 16
  %4 = load i32, i32* %a.addr, align 4
  %sh_prom = zext i32 %4 to i128
  %shl = shl i128 %3, %sh_prom
  store i128 %shl, i128* %r, align 16
  %5 = load i128, i128* %r, align 16
  ret i128 %5
}

@topperc
Collaborator

topperc commented Jun 13, 2024

Why not just do the lowering in the frontend? I.e., return getNaturalAlignIndirect(...) if the argument's size exceeds 2×XLEN.

I think we do.

In the case of this code, https://godbolt.org/z/bqY5Kd5n8, we don't do that in the frontend. This is the IR after the final step before the IRTranslator:

I think this is a bug in clang IRGen. The RISC-V code wasn't written to expect int128 for rv32.

@spaits
Contributor Author

spaits commented Jun 13, 2024

Why not just do the lowering in the frontend? I.e., return getNaturalAlignIndirect(...) if the argument's size exceeds 2×XLEN.

I think we do.

In the case of this code, https://godbolt.org/z/bqY5Kd5n8, we don't do that in the frontend. This is the IR after the final step before the IRTranslator:

I think this is a bug in clang IRGen. The RISC-V code wasn't written to expect int128 for rv32.

The reason I made this PR is that one of my teammates was trying to compile Picolib (or maybe compiler-rt) for RISC-V, and it used 128-bit shifts, so to compile the project they needed the -fforce-enable-int128 flag. The standard also covers this scenario, so I decided to work on the issue.

@topperc
Collaborator

topperc commented Jun 13, 2024

Why not just do the lowering in the frontend? I.e., return getNaturalAlignIndirect(...) if the argument's size exceeds 2×XLEN.

I think we do.

In the case of this code, https://godbolt.org/z/bqY5Kd5n8, we don't do that in the frontend. This is the IR after the final step before the IRTranslator:

I think this is a bug in clang IRGen. The RISC-V code wasn't written to expect int128 for rv32.

The reason I made this PR is that one of my teammates was trying to compile Picolib (or maybe compiler-rt) for RISC-V, and it used 128-bit shifts, so to compile the project they needed the -fforce-enable-int128 flag. The standard also covers this scenario, so I decided to work on the issue.

And they're trying to use GlobalISel too?

@spaits
Contributor Author

spaits commented Jun 13, 2024

Why not just do the lowering in the frontend? I.e., return getNaturalAlignIndirect(...) if the argument's size exceeds 2×XLEN.

I think we do.

In the case of this code, https://godbolt.org/z/bqY5Kd5n8, we don't do that in the frontend. This is the IR after the final step before the IRTranslator:

I think this is a bug in clang IRGen. The RISC-V code wasn't written to expect int128 for rv32.

The reason I made this PR is that one of my teammates was trying to compile Picolib (or maybe compiler-rt) for RISC-V, and it used 128-bit shifts, so to compile the project they needed the -fforce-enable-int128 flag. The standard also covers this scenario, so I decided to work on the issue.

And they're trying to use GlobalISel too?

No, they don't. But shouldn't we convert the full backend to GlobalISel? And if we want to do that, we also need to be able to pass 128-bit values, I think. I know a frontend fix would solve it for GlobalISel, but then that fix would have to be introduced into other frontends, like the Rust frontend, the Zig frontend, and so on. So I think fixing this in the backend is better.

But if you think this should really go in the frontend, I would be glad to implement that too.

@topperc
Collaborator

topperc commented Jun 13, 2024

Why not just do the lowering in the frontend? I.e., return getNaturalAlignIndirect(...) if the argument's size exceeds 2×XLEN.

I think we do.

In the case of this code, https://godbolt.org/z/bqY5Kd5n8, we don't do that in the frontend. This is the IR after the final step before the IRTranslator:

I think this is a bug in clang IRGen. The RISC-V code wasn't written to expect int128 for rv32.

The reason I made this PR is that one of my teammates was trying to compile Picolib (or maybe compiler-rt) for RISC-V, and it used 128-bit shifts, so to compile the project they needed the -fforce-enable-int128 flag. The standard also covers this scenario, so I decided to work on the issue.

And they're trying to use GlobalISel too?

No, they don't. But shouldn't we convert the full backend to GlobalISel? And if we want to do that, we also need to be able to pass 128-bit values, I think. I know a frontend fix would solve it for GlobalISel, but then that fix would have to be introduced into other frontends, like the Rust frontend, the Zig frontend, and so on. So I think fixing this in the backend is better.

But if you think this should really go in the frontend, I would be glad to implement that too.

Looks like the clang frontend intentionally defers indirect scalar handling to the backend. I thought maybe -fforce-enable-int128 had just been overlooked. I didn't think there were any scalars larger than 2*XLen for rv32 by default, but I forgot about fp128 and _BitInt.

Large _BitInt does force indirect in the frontend, though exactly which sizes do is affected by -fforce-enable-int128, which is probably a separate bug.
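
(For reference, a tiny hand-written illustration of that distinction; the functions are hypothetical, and rv32 needs -fforce-enable-int128 for __int128:)

// clang already lowers wide _BitInt arguments to indirect on RISC-V, while
// an __int128 enabled via -fforce-enable-int128 reaches the backend as a
// plain i128 value and relies on the backend lowering this PR adds.
_BitInt(128) shl_bitint(_BitInt(128) x, int s) { return x << s; }
__int128 shl_int128(__int128 x, int s) { return x << s; }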

@s-barannikov
Contributor

Why not just do the lowering in the frontend? I.e., return getNaturalAlignIndirect(...) if the argument's size exceeds 2×XLEN.

The IR has its own ABI, independent of whatever any particular frontend does. You should still handle any IR function signature regardless of what clang does.

I wrote this assuming that lowering of large integer types is intentionally left to the backend, which doesn't appear to be the case.
If the frontend does the lowering, then i128 can only appear in function signatures if the IR was created by other means (such as middle-end optimizations or an external tool), in which case language ABI rules do not necessarily apply and i128 could just as well be passed in registers.
Supporting the Indirect flag in GlobalISel is not a bad thing and may be desired by other targets that prefer leaving some lowering to the backend. I just don't see much value in it for RISC-V.

@topperc
Collaborator

topperc commented Jun 13, 2024

I wrote this assuming that lowering of large integer types is intentionally left to the backend, which doesn't appear to be the case.

Looks like we intentionally defer the handling for scalars to the backend, except for _BitInt.

@tschuett

Could you change in the summary:
"Currently I put the code in TargetLowering::handleAssignments and I think it is general enough for all the targets to"
to
"Currently I put the code in CallLowering::handleAssignments and I think it is general enough for all the targets to"

@spaits
Contributor Author

spaits commented Jun 14, 2024

Could you change in the summary: "Currently I put the code in TargetLowering::handleAssignments and I think it is general enough for all the targets to" to "Currently I put the code in CallLowering::handleAssignments and I think it is general enough for all the targets to"

Updated the description. I have also finished that sentence. :)

Contributor

@s-barannikov s-barannikov left a comment


LGTM

@s-barannikov
Contributor

(I'd expect someone from the RISC-V team to take a look too.)

@spaits
Contributor Author

spaits commented Jun 17, 2024

@michaelmaitland @topperc @preames Could you please also take a look at (or another look at) this PR?

@spaits spaits requested a review from preames June 17, 2024 21:19
Contributor

@michaelmaitland michaelmaitland left a comment


LGTM as well

@spaits spaits requested a review from topperc June 18, 2024 09:13
@spaits
Contributor Author

spaits commented Jun 19, 2024

There is already an approval on this PR from the GlobalISel team (@s-barannikov) and from the RISC-V team (@michaelmaitland). If there is no new review on this PR, I will merge it in 12 hours.

@s-barannikov
Contributor

from the GlobalISel team (@s-barannikov)

FWIW I am not

@spaits
Contributor Author

spaits commented Jun 19, 2024

from the GlobalISel team (@s-barannikov)

FWIW I am not

Sorry for assuming. I thought you were a GlobalISel team member, because you were able to help me with so much regarding it that you must understand the framework and how it works very well.

Collaborator

@topperc topperc left a comment


LGTM

@spaits spaits merged commit f450408 into llvm:main Jun 19, 2024
7 checks passed
AlexisPerry pushed a commit to llvm-project-tlp/llvm-project that referenced this pull request Jul 9, 2024
Some targets, like RISC-V, pass scalars wider than 2×XLEN bits by reference, so such arguments are replaced in the argument list with an address (see RISC-V ABIs Specification 1.0, section 2.1).

This commit implements this indirect parameter passing in GlobalISel.

---------

Co-authored-by: Gabor Spaits <Gabor.Spaits@hightec-rt.com>