[GISel][RISCV] Implement indirect parameter passing #95429


Merged

Conversation

spaits
Contributor

@spaits spaits commented Jun 13, 2024

Some targets, like RISC-V, pass scalars wider than 2×XLEN bits by reference, so such arguments are replaced in the argument list with an address (see RISC-V ABIs Specification 1.0, section 2.1). Judging by the behavior of the armv7a target, the same probably happens on 32-bit Arm as well (https://godbolt.org/z/8frPb7b3Y).
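
To make the rule concrete, here is a small hand-written example (mine, not from the patch; on rv32, XLEN = 32, so a 128-bit scalar exceeds 2×XLEN, and __int128 needs -fforce-enable-int128):

// Illustrative only: on rv32, __int128 is wider than 2*XLEN = 64 bits, so
// the caller stores each such argument into its own stack frame and passes
// the address in an argument register instead of the value itself.
__int128 add128(__int128 a, __int128 b) { // receives &a in a0, &b in a1
  return a + b; // the callee loads the operands through the pointers
}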

RISC-V specific

In RISC-V's GlobalISel backend, the passing of large scalars was disabled by commit f533e8c because it generated wrong code. For example, on 32-bit RISC-V, when passing a 128-bit value, instead of using four registers it used a single register and assigned a value to that one register four times, so it just overwrote the data over and over again. And even if it had used four registers, that still would not have been compliant with the ABI's calling convention.

GlobalISel framework change and relation to SelectionDAG

In SelectionDAG, the assignments needed for a function call are determined in SelectionDAGISel::LowerArguments in SelectionDAGBuilder.cpp (around line 11440, beginning at TLI->getRegisterTypeForCallingConv). In GlobalISel, very similar logic can be found in CallLowering::determineAssignments.

SelectionDAG then calls the target-specific TargetLowering::LowerFormalArguments callback, which in turn calls generic SelectionDAG functions to complete the assignment handling. This is the part of the code responsible for indirect parameter passing in RISC-V (RISCVTargetLowering::LowerFormalArguments in RISCVISelLowering.cpp, around line 19410). In a GlobalISel backend, the generic MIR building is handled by CallLowering::handleAssignments. This is not a callback; it is usually called by CallLowering::lowerCall or CallLowering::lowerFormalArguments.

Based on these observations, I think the best place to implement indirect parameter passing is CallLowering::handleAssignments, since it generalizes the relevant part of the logic found in TargetLowering::LowerFormalArguments; that way, every target whose calling convention marks a CCValAssign as CCValAssign::Indirect gets indirect parameter passing. Currently I put the code in CallLowering::handleAssignments, and I think it is general enough for all targets to use.

In short, with this PR indirect passing can work for all targets: it applies whenever a target's calling convention decides that it needs indirect parameter passing by marking a CCValAssign with CCValAssign::Indirect (a sketch of how a calling convention could request that follows).
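
As a rough illustration (CC_MyTarget, the ArgGPRs list, and the hard-coded XLen are made up for this sketch; the CCState/CCValAssign calls mirror the real API):

// Hypothetical calling-convention hook; not part of this patch.
static bool CC_MyTarget(unsigned ValNo, MVT ValVT, MVT LocVT,
                        CCValAssign::LocInfo LocInfo,
                        ISD::ArgFlagsTy ArgFlags, CCState &State) {
  static const MCPhysReg ArgGPRs[] = {RISCV::X10, RISCV::X11, RISCV::X12,
                                      RISCV::X13, RISCV::X14, RISCV::X15,
                                      RISCV::X16, RISCV::X17};
  const unsigned XLen = 32; // assume an rv32-like target
  if (ValVT.isScalarInteger() && ValVT.getFixedSizeInBits() > 2 * XLen) {
    // Pass the address of a caller-made copy instead of the value itself.
    LocVT = MVT::getIntegerVT(XLen);
    LocInfo = CCValAssign::Indirect;
  }
  if (MCRegister Reg = State.AllocateReg(ArgGPRs)) {
    State.addLoc(CCValAssign::getReg(ValNo, ValVT, Reg, LocVT, LocInfo));
    return false; // location assigned
  }
  // Out of registers: place the (possibly indirect) value on the stack.
  int64_t Off = State.AllocateStack(XLen / 8, Align(XLen / 8));
  State.addLoc(CCValAssign::getMem(ValNo, ValVT, Off, LocVT, LocInfo));
  return false;
}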

Another approach

Should I instead extend the interface of CallLowering? We could add a callback named shouldPassIndirectly that decides whether indirect parameter passing is appropriate. If it is, we would call a passParameterIndirectly callback that does the copies, stores, and loads. That way, targets could disable indirect parameter passing even when a CCValAssign is CCValAssign::Indirect.
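
A rough sketch of that hypothetical interface (neither callback exists in CallLowering today; the signatures are illustrative only):

class CallLowering {
  // ... existing members ...
public:
  /// Hypothetical: let the target confirm or veto indirect passing for an
  /// argument that the calling convention marked CCValAssign::Indirect.
  virtual bool shouldPassIndirectly(const ArgInfo &Arg,
                                    const CCValAssign &VA) const {
    return VA.getLocInfo() == CCValAssign::Indirect;
  }

  /// Hypothetical: emit the pointer plumbing. For outgoing arguments, store
  /// the value into a fresh stack slot and pass its address; for incoming
  /// arguments, load the value through the received pointer.
  virtual bool passParameterIndirectly(MachineIRBuilder &MIRBuilder,
                                       ValueHandler &Handler, ArgInfo &Arg,
                                       const CCValAssign &VA) const;
};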

What do you think? Is it a good idea to implement indirect passing at the GlobalISel framework level? If so, is my current approach correct, should we rather do the CallLowering interface change, or do you have another idea?

If this functionality does not belong in the framework, how should I proceed? Override the handleAssignments function in RISCVCallLowering? That would create code duplication, since most of the function is fine for RISC-V; it only misses the indirect passing for large scalars required by the standard.

Results of the current approach

For this LLVM IR on an rv32 target:

define i64 @fun(i128 %x, i128 %y ) {
  %a = shl i128 %x, %y
  %2 = trunc i128 %a to i64
  ret i64 %2
}

define i32 @fun_caller( ) {
  %1 = call i64 @fun(i128 1, i128 2)
  %2 = trunc i64 %1 to i32
  ret i32 %2
}

We get this after the IRTranslator:
Callee

Function Live Ins: $x10, $x11

bb.1 (%ir-block.0):
  liveins: $x10, $x11
  %2:_(p0) = COPY $x10
  %0:_(s128) = G_LOAD %2:_(p0) :: (load (s128), align 1)
  %3:_(p0) = COPY $x11
  %1:_(s128) = G_LOAD %3:_(p0) :: (load (s128), align 1)
  %4:_(s128) = G_SHL %0:_, %1:_(s128)
  %5:_(s64) = G_TRUNC %4:_(s128)
  %6:_(s32), %7:_(s32) = G_UNMERGE_VALUES %5:_(s64)
  $x10 = COPY %6:_(s32)
  $x11 = COPY %7:_(s32)
  PseudoRET implicit $x10, implicit $x11

Caller

Frame Objects:
  fi#0: size=128, align=8, at location [SP]
  fi#1: size=128, align=8, at location [SP]

bb.1 (%ir-block.0):
  %1:_(s128) = G_CONSTANT i128 1
  %2:_(s128) = G_CONSTANT i128 2
  ADJCALLSTACKDOWN 0, 0, implicit-def $x2, implicit $x2
  %3:_(p0) = G_FRAME_INDEX %stack.0
  G_STORE %1:_(s128), %3:_(p0) :: (store (s128), align 8)
  $x10 = COPY %3:_(p0)
  %4:_(p0) = G_FRAME_INDEX %stack.1
  G_STORE %2:_(s128), %4:_(p0) :: (store (s128), align 8)
  $x11 = COPY %4:_(p0)
  PseudoCALL target-flags(riscv-call) @fun, <regmask $vlenb $x0 $x1 $x8 $x9 $x18 $x19 $x20 $x21 $x22 $x23 $x24 $x25 $x26 $x27 $x8_x9 $x18_x19 $x20_x21 $x22_x23 $x24_x25 $x26_x27>, implicit-def $x1, implicit $x10, implicit $x11, implicit-def $x10, implicit-def $x11
  ADJCALLSTACKUP 0, 0, implicit-def $x2, implicit $x2
  %5:_(s32) = COPY $x10
  %6:_(s32) = COPY $x11
  %0:_(s64) = G_MERGE_VALUES %5:_(s32), %6:_(s32)
  %7:_(s32) = G_TRUNC %0:_(s64)
  $x10 = COPY %7:_(s32)
  PseudoRET implicit $x10

Please help review the PR and share your opinions and observations. I am open to implementing any solution that resolves this problem.

Thank you in advance for your time and patience.

@llvmbot
Member

llvmbot commented Jun 13, 2024

@llvm/pr-subscribers-backend-risc-v

@llvm/pr-subscribers-llvm-globalisel

Author: Gábor Spaits (spaits)

Patch is 22.08 KiB, truncated to 20.00 KiB below, full version: https://github.com/llvm/llvm-project/pull/95429.diff

4 Files Affected:

  • (modified) llvm/lib/CodeGen/GlobalISel/CallLowering.cpp (+49-12)
  • (modified) llvm/lib/Target/RISCV/GISel/RISCVCallLowering.cpp (+7-8)
  • (modified) llvm/test/CodeGen/RISCV/GlobalISel/irtranslator/calling-conv-ilp32-ilp32f-ilp32d-common.ll (+167)
  • (modified) llvm/test/CodeGen/RISCV/GlobalISel/irtranslator/calling-conv-lp64-lp64f-lp64d-common.ll (+79)
diff --git a/llvm/lib/CodeGen/GlobalISel/CallLowering.cpp b/llvm/lib/CodeGen/GlobalISel/CallLowering.cpp
index 412cd0a21ad41..daa5e0c07e9b7 100644
--- a/llvm/lib/CodeGen/GlobalISel/CallLowering.cpp
+++ b/llvm/lib/CodeGen/GlobalISel/CallLowering.cpp
@@ -751,6 +751,7 @@ bool CallLowering::handleAssignments(ValueHandler &Handler,
     const LLT NewLLT = Handler.isIncomingArgumentHandler() ? LocTy : ValTy;
     const EVT OrigVT = EVT::getEVT(Args[i].Ty);
     const LLT OrigTy = getLLTForType(*Args[i].Ty, DL);
+    const LLT PointerTy = LLT::pointer(0, DL.getPointerSizeInBits(0));
 
     // Expected to be multiple regs for a single incoming arg.
     // There should be Regs.size() ArgLocs per argument.
@@ -764,19 +765,28 @@ bool CallLowering::handleAssignments(ValueHandler &Handler,
       // If we can't directly assign the register, we need one or more
       // intermediate values.
       Args[i].Regs.resize(NumParts);
-
-      // For each split register, create and assign a vreg that will store
-      // the incoming component of the larger value. These will later be
-      // merged to form the final vreg.
-      for (unsigned Part = 0; Part < NumParts; ++Part)
-        Args[i].Regs[Part] = MRI.createGenericVirtualRegister(NewLLT);
+    
+      // When we have indirect parameter passing we are receiving a pointer,
+      // that points to the actual value. In that case we need a pointer.
+      if (VA.getLocInfo() == CCValAssign::Indirect &&
+          Args[i].Flags[0].isSplit()) {
+        if (Handler.isIncomingArgumentHandler())
+          Args[i].Regs[0] = MRI.createGenericVirtualRegister(PointerTy);
+      } else {
+        // For each split register, create and assign a vreg that will store
+        // the incoming component of the larger value. These will later be
+        // merged to form the final vreg.
+        for (unsigned Part = 0; Part < NumParts; ++Part)
+          Args[i].Regs[Part] = MRI.createGenericVirtualRegister(NewLLT);
+      }
     }
 
     assert((j + (NumParts - 1)) < ArgLocs.size() &&
            "Too many regs for number of args");
 
     // Coerce into outgoing value types before register assignment.
-    if (!Handler.isIncomingArgumentHandler() && OrigTy != ValTy) {
+    if (!Handler.isIncomingArgumentHandler() && OrigTy != ValTy &&
+        VA.getLocInfo() != CCValAssign::Indirect) {
       assert(Args[i].OrigRegs.size() == 1);
       buildCopyToRegs(MIRBuilder, Args[i].Regs, Args[i].OrigRegs[0], OrigTy,
                       ValTy, extendOpFromFlags(Args[i].Flags[0]));
@@ -790,6 +800,28 @@ bool CallLowering::handleAssignments(ValueHandler &Handler,
       CCValAssign &VA = ArgLocs[j + Idx];
       const ISD::ArgFlagsTy Flags = Args[i].Flags[Part];
 
+      // We found an indirect parameter passing and we are at the first part of
+      // the value being passed. In this case copy the incoming pointer into a
+      // virtual register so later we can load it.
+      if (VA.getLocInfo() == CCValAssign::Indirect && Flags.isSplit()) {
+        if (Handler.isIncomingArgumentHandler())
+          Handler.assignValueToReg(ArgReg, VA.getLocReg(), VA);
+        else {
+          MachineFrameInfo &MFI = MF.getFrameInfo();
+          int FrameIdx = MFI.CreateStackObject(OrigTy.getScalarSizeInBits(),
+                                               Align(8), false);
+
+          auto PointerToStackReg =
+              MIRBuilder.buildFrameIndex(PointerTy, FrameIdx)
+                  ->getOperand(0)
+                  .getReg();
+          Handler.assignValueToAddress(Args[i].OrigRegs[Part], PointerToStackReg,
+                                       OrigTy, MachinePointerInfo{}, VA);
+          Handler.assignValueToReg(PointerToStackReg, VA.getLocReg(), VA);
+        }
+        break;
+      }
+
       if (VA.isMemLoc() && !Flags.isByVal()) {
         // Individual pieces may have been spilled to the stack and others
         // passed in registers.
@@ -866,11 +898,16 @@ bool CallLowering::handleAssignments(ValueHandler &Handler,
       }
     }
 
-    // Now that all pieces have been assigned, re-pack the register typed values
-    // into the original value typed registers.
-    if (Handler.isIncomingArgumentHandler() && OrigVT != LocVT) {
-      // Merge the split registers into the expected larger result vregs of
-      // the original call.
+    // In case of indirect parameter passing load the value referred to by
+    // the argument.
+    if (Handler.isIncomingArgumentHandler() && OrigVT != LocVT &&
+        VA.getLocInfo() == CCValAssign::Indirect) {
+      Handler.assignValueToAddress(Args[i].OrigRegs[0], Args[i].Regs[0], OrigTy,
+                                   MachinePointerInfo{}, VA);
+
+    } else if (Handler.isIncomingArgumentHandler() && OrigVT != LocVT) {
+      // Now that all pieces have been assigned, re-pack the register typed values
+      // into the original value typed registers.
       buildCopyFromRegs(MIRBuilder, Args[i].OrigRegs, Args[i].Regs, OrigTy,
                         LocTy, Args[i].Flags[0]);
     }
diff --git a/llvm/lib/Target/RISCV/GISel/RISCVCallLowering.cpp b/llvm/lib/Target/RISCV/GISel/RISCVCallLowering.cpp
index 2bfee45852b20..b1f381f4b30ad 100644
--- a/llvm/lib/Target/RISCV/GISel/RISCVCallLowering.cpp
+++ b/llvm/lib/Target/RISCV/GISel/RISCVCallLowering.cpp
@@ -89,12 +89,13 @@ struct RISCVOutgoingValueHandler : public CallLowering::OutgoingValueHandler {
                             const MachinePointerInfo &MPO,
                             const CCValAssign &VA) override {
     MachineFunction &MF = MIRBuilder.getMF();
-    uint64_t LocMemOffset = VA.getLocMemOffset();
-
+    uint64_t Offset = 0;
+    if (VA.isMemLoc())
+      Offset = VA.getLocMemOffset();
+    
     // TODO: Move StackAlignment to subtarget and share with FrameLowering.
-    auto MMO =
-        MF.getMachineMemOperand(MPO, MachineMemOperand::MOStore, MemTy,
-                                commonAlignment(Align(16), LocMemOffset));
+    auto *MMO = MF.getMachineMemOperand(MPO, MachineMemOperand::MOStore, MemTy,
+                                       commonAlignment(Align(16), Offset));
 
     Register ExtReg = extendRegister(ValVReg, VA);
     MIRBuilder.buildStore(ExtReg, Addr, *MMO);
@@ -341,10 +342,8 @@ static bool isLegalElementTypeForRVV(Type *EltTy,
 // TODO: Remove IsLowerArgs argument by adding support for vectors in lowerCall.
 static bool isSupportedArgumentType(Type *T, const RISCVSubtarget &Subtarget,
                                     bool IsLowerArgs = false) {
-  // TODO: Integers larger than 2*XLen are passed indirectly which is not
-  // supported yet.
   if (T->isIntegerTy())
-    return T->getIntegerBitWidth() <= Subtarget.getXLen() * 2;
+    return true;
   if (T->isHalfTy() || T->isFloatTy() || T->isDoubleTy())
     return true;
   if (T->isPointerTy())
diff --git a/llvm/test/CodeGen/RISCV/GlobalISel/irtranslator/calling-conv-ilp32-ilp32f-ilp32d-common.ll b/llvm/test/CodeGen/RISCV/GlobalISel/irtranslator/calling-conv-ilp32-ilp32f-ilp32d-common.ll
index 1a3489521af19..92f5f6220f096 100644
--- a/llvm/test/CodeGen/RISCV/GlobalISel/irtranslator/calling-conv-ilp32-ilp32f-ilp32d-common.ll
+++ b/llvm/test/CodeGen/RISCV/GlobalISel/irtranslator/calling-conv-ilp32-ilp32f-ilp32d-common.ll
@@ -16,6 +16,173 @@
 ; Check that on RV32, i64 is passed in a pair of registers. Unlike
 ; the convention for varargs, this need not be an aligned pair.
 
+define i64 @callee_128i_in_regs(i128 %x, i128 %y ) {
+  ; RV32I-LABEL: name: callee_128i_in_regs
+  ; RV32I: bb.1 (%ir-block.0):
+  ; RV32I-NEXT:   liveins: $x10, $x11
+  ; RV32I-NEXT: {{  $}}
+  ; RV32I-NEXT:   [[COPY:%[0-9]+]]:_(p0) = COPY $x10
+  ; RV32I-NEXT:   [[LOAD:%[0-9]+]]:_(s128) = G_LOAD [[COPY]](p0) :: (load (s128), align 1)
+  ; RV32I-NEXT:   [[COPY1:%[0-9]+]]:_(p0) = COPY $x11
+  ; RV32I-NEXT:   [[LOAD1:%[0-9]+]]:_(s128) = G_LOAD [[COPY1]](p0) :: (load (s128), align 1)
+  ; RV32I-NEXT:   [[TRUNC:%[0-9]+]]:_(s64) = G_TRUNC [[LOAD]](s128)
+  ; RV32I-NEXT:   [[UV:%[0-9]+]]:_(s32), [[UV1:%[0-9]+]]:_(s32) = G_UNMERGE_VALUES [[TRUNC]](s64)
+  ; RV32I-NEXT:   $x10 = COPY [[UV]](s32)
+  ; RV32I-NEXT:   $x11 = COPY [[UV1]](s32)
+  ; RV32I-NEXT:   PseudoRET implicit $x10, implicit $x11
+  %2 = trunc i128 %x to i64
+  ret i64 %2
+}
+
+define i32 @caller_128i_in_regs( ) {
+  ; ILP32-LABEL: name: caller_128i_in_regs
+  ; ILP32: bb.1 (%ir-block.0):
+  ; ILP32-NEXT:   [[C:%[0-9]+]]:_(s128) = G_CONSTANT i128 1
+  ; ILP32-NEXT:   [[C1:%[0-9]+]]:_(s128) = G_CONSTANT i128 2
+  ; ILP32-NEXT:   ADJCALLSTACKDOWN 0, 0, implicit-def $x2, implicit $x2
+  ; ILP32-NEXT:   [[FRAME_INDEX:%[0-9]+]]:_(p0) = G_FRAME_INDEX %stack.0
+  ; ILP32-NEXT:   G_STORE [[C]](s128), [[FRAME_INDEX]](p0) :: (store (s128))
+  ; ILP32-NEXT:   $x10 = COPY [[FRAME_INDEX]](p0)
+  ; ILP32-NEXT:   [[FRAME_INDEX1:%[0-9]+]]:_(p0) = G_FRAME_INDEX %stack.1
+  ; ILP32-NEXT:   G_STORE [[C1]](s128), [[FRAME_INDEX1]](p0) :: (store (s128))
+  ; ILP32-NEXT:   $x11 = COPY [[FRAME_INDEX1]](p0)
+  ; ILP32-NEXT:   PseudoCALL target-flags(riscv-call) @callee_128i_in_regs, csr_ilp32_lp64, implicit-def $x1, implicit $x10, implicit $x11, implicit-def $x10, implicit-def $x11
+  ; ILP32-NEXT:   ADJCALLSTACKUP 0, 0, implicit-def $x2, implicit $x2
+  ; ILP32-NEXT:   [[COPY:%[0-9]+]]:_(s32) = COPY $x10
+  ; ILP32-NEXT:   [[COPY1:%[0-9]+]]:_(s32) = COPY $x11
+  ; ILP32-NEXT:   [[MV:%[0-9]+]]:_(s64) = G_MERGE_VALUES [[COPY]](s32), [[COPY1]](s32)
+  ; ILP32-NEXT:   [[TRUNC:%[0-9]+]]:_(s32) = G_TRUNC [[MV]](s64)
+  ; ILP32-NEXT:   $x10 = COPY [[TRUNC]](s32)
+  ; ILP32-NEXT:   PseudoRET implicit $x10
+  ;
+  ; ILP32F-LABEL: name: caller_128i_in_regs
+  ; ILP32F: bb.1 (%ir-block.0):
+  ; ILP32F-NEXT:   [[C:%[0-9]+]]:_(s128) = G_CONSTANT i128 1
+  ; ILP32F-NEXT:   [[C1:%[0-9]+]]:_(s128) = G_CONSTANT i128 2
+  ; ILP32F-NEXT:   ADJCALLSTACKDOWN 0, 0, implicit-def $x2, implicit $x2
+  ; ILP32F-NEXT:   [[FRAME_INDEX:%[0-9]+]]:_(p0) = G_FRAME_INDEX %stack.0
+  ; ILP32F-NEXT:   G_STORE [[C]](s128), [[FRAME_INDEX]](p0) :: (store (s128))
+  ; ILP32F-NEXT:   $x10 = COPY [[FRAME_INDEX]](p0)
+  ; ILP32F-NEXT:   [[FRAME_INDEX1:%[0-9]+]]:_(p0) = G_FRAME_INDEX %stack.1
+  ; ILP32F-NEXT:   G_STORE [[C1]](s128), [[FRAME_INDEX1]](p0) :: (store (s128))
+  ; ILP32F-NEXT:   $x11 = COPY [[FRAME_INDEX1]](p0)
+  ; ILP32F-NEXT:   PseudoCALL target-flags(riscv-call) @callee_128i_in_regs, csr_ilp32f_lp64f, implicit-def $x1, implicit $x10, implicit $x11, implicit-def $x10, implicit-def $x11
+  ; ILP32F-NEXT:   ADJCALLSTACKUP 0, 0, implicit-def $x2, implicit $x2
+  ; ILP32F-NEXT:   [[COPY:%[0-9]+]]:_(s32) = COPY $x10
+  ; ILP32F-NEXT:   [[COPY1:%[0-9]+]]:_(s32) = COPY $x11
+  ; ILP32F-NEXT:   [[MV:%[0-9]+]]:_(s64) = G_MERGE_VALUES [[COPY]](s32), [[COPY1]](s32)
+  ; ILP32F-NEXT:   [[TRUNC:%[0-9]+]]:_(s32) = G_TRUNC [[MV]](s64)
+  ; ILP32F-NEXT:   $x10 = COPY [[TRUNC]](s32)
+  ; ILP32F-NEXT:   PseudoRET implicit $x10
+  ;
+  ; ILP32D-LABEL: name: caller_128i_in_regs
+  ; ILP32D: bb.1 (%ir-block.0):
+  ; ILP32D-NEXT:   [[C:%[0-9]+]]:_(s128) = G_CONSTANT i128 1
+  ; ILP32D-NEXT:   [[C1:%[0-9]+]]:_(s128) = G_CONSTANT i128 2
+  ; ILP32D-NEXT:   ADJCALLSTACKDOWN 0, 0, implicit-def $x2, implicit $x2
+  ; ILP32D-NEXT:   [[FRAME_INDEX:%[0-9]+]]:_(p0) = G_FRAME_INDEX %stack.0
+  ; ILP32D-NEXT:   G_STORE [[C]](s128), [[FRAME_INDEX]](p0) :: (store (s128))
+  ; ILP32D-NEXT:   $x10 = COPY [[FRAME_INDEX]](p0)
+  ; ILP32D-NEXT:   [[FRAME_INDEX1:%[0-9]+]]:_(p0) = G_FRAME_INDEX %stack.1
+  ; ILP32D-NEXT:   G_STORE [[C1]](s128), [[FRAME_INDEX1]](p0) :: (store (s128))
+  ; ILP32D-NEXT:   $x11 = COPY [[FRAME_INDEX1]](p0)
+  ; ILP32D-NEXT:   PseudoCALL target-flags(riscv-call) @callee_128i_in_regs, csr_ilp32d_lp64d, implicit-def $x1, implicit $x10, implicit $x11, implicit-def $x10, implicit-def $x11
+  ; ILP32D-NEXT:   ADJCALLSTACKUP 0, 0, implicit-def $x2, implicit $x2
+  ; ILP32D-NEXT:   [[COPY:%[0-9]+]]:_(s32) = COPY $x10
+  ; ILP32D-NEXT:   [[COPY1:%[0-9]+]]:_(s32) = COPY $x11
+  ; ILP32D-NEXT:   [[MV:%[0-9]+]]:_(s64) = G_MERGE_VALUES [[COPY]](s32), [[COPY1]](s32)
+  ; ILP32D-NEXT:   [[TRUNC:%[0-9]+]]:_(s32) = G_TRUNC [[MV]](s64)
+  ; ILP32D-NEXT:   $x10 = COPY [[TRUNC]](s32)
+  ; ILP32D-NEXT:   PseudoRET implicit $x10
+  %1 = call i64 @callee_128i_in_regs(i128 1, i128 2)
+  %2 = trunc i64 %1 to i32
+  ret i32 %2
+}
+
+define i64 @callee_256i_in_regs(i256 %x, i256 %y ) {
+
+  ; RV32I-LABEL: name: callee_256i_in_regs
+  ; RV32I: bb.1 (%ir-block.0):
+  ; RV32I-NEXT:   liveins: $x10, $x11
+  ; RV32I-NEXT: {{  $}}
+  ; RV32I-NEXT:   [[COPY:%[0-9]+]]:_(p0) = COPY $x10
+  ; RV32I-NEXT:   [[LOAD:%[0-9]+]]:_(s256) = G_LOAD [[COPY]](p0) :: (load (s256), align 1)
+  ; RV32I-NEXT:   [[COPY1:%[0-9]+]]:_(p0) = COPY $x11
+  ; RV32I-NEXT:   [[LOAD1:%[0-9]+]]:_(s256) = G_LOAD [[COPY1]](p0) :: (load (s256), align 1)
+  ; RV32I-NEXT:   [[TRUNC:%[0-9]+]]:_(s64) = G_TRUNC [[LOAD]](s256)
+  ; RV32I-NEXT:   [[UV:%[0-9]+]]:_(s32), [[UV1:%[0-9]+]]:_(s32) = G_UNMERGE_VALUES [[TRUNC]](s64)
+  ; RV32I-NEXT:   $x10 = COPY [[UV]](s32)
+  ; RV32I-NEXT:   $x11 = COPY [[UV1]](s32)
+  ; RV32I-NEXT:   PseudoRET implicit $x10, implicit $x11
+  %2 = trunc i256 %x to i64
+  ret i64 %2
+}
+
+define i32 @caller_256i_in_regs( ) {
+  ; ILP32-LABEL: name: caller_256i_in_regs
+  ; ILP32: bb.1 (%ir-block.0):
+  ; ILP32-NEXT:   [[C:%[0-9]+]]:_(s256) = G_CONSTANT i256 1
+  ; ILP32-NEXT:   [[C1:%[0-9]+]]:_(s256) = G_CONSTANT i256 2
+  ; ILP32-NEXT:   ADJCALLSTACKDOWN 0, 0, implicit-def $x2, implicit $x2
+  ; ILP32-NEXT:   [[FRAME_INDEX:%[0-9]+]]:_(p0) = G_FRAME_INDEX %stack.0
+  ; ILP32-NEXT:   G_STORE [[C]](s256), [[FRAME_INDEX]](p0) :: (store (s256), align 16)
+  ; ILP32-NEXT:   $x10 = COPY [[FRAME_INDEX]](p0)
+  ; ILP32-NEXT:   [[FRAME_INDEX1:%[0-9]+]]:_(p0) = G_FRAME_INDEX %stack.1
+  ; ILP32-NEXT:   G_STORE [[C1]](s256), [[FRAME_INDEX1]](p0) :: (store (s256), align 16)
+  ; ILP32-NEXT:   $x11 = COPY [[FRAME_INDEX1]](p0)
+  ; ILP32-NEXT:   PseudoCALL target-flags(riscv-call) @callee_256i_in_regs, csr_ilp32_lp64, implicit-def $x1, implicit $x10, implicit $x11, implicit-def $x10, implicit-def $x11
+  ; ILP32-NEXT:   ADJCALLSTACKUP 0, 0, implicit-def $x2, implicit $x2
+  ; ILP32-NEXT:   [[COPY:%[0-9]+]]:_(s32) = COPY $x10
+  ; ILP32-NEXT:   [[COPY1:%[0-9]+]]:_(s32) = COPY $x11
+  ; ILP32-NEXT:   [[MV:%[0-9]+]]:_(s64) = G_MERGE_VALUES [[COPY]](s32), [[COPY1]](s32)
+  ; ILP32-NEXT:   [[TRUNC:%[0-9]+]]:_(s32) = G_TRUNC [[MV]](s64)
+  ; ILP32-NEXT:   $x10 = COPY [[TRUNC]](s32)
+  ; ILP32-NEXT:   PseudoRET implicit $x10
+  ;
+  ; ILP32F-LABEL: name: caller_256i_in_regs
+  ; ILP32F: bb.1 (%ir-block.0):
+  ; ILP32F-NEXT:   [[C:%[0-9]+]]:_(s256) = G_CONSTANT i256 1
+  ; ILP32F-NEXT:   [[C1:%[0-9]+]]:_(s256) = G_CONSTANT i256 2
+  ; ILP32F-NEXT:   ADJCALLSTACKDOWN 0, 0, implicit-def $x2, implicit $x2
+  ; ILP32F-NEXT:   [[FRAME_INDEX:%[0-9]+]]:_(p0) = G_FRAME_INDEX %stack.0
+  ; ILP32F-NEXT:   G_STORE [[C]](s256), [[FRAME_INDEX]](p0) :: (store (s256), align 16)
+  ; ILP32F-NEXT:   $x10 = COPY [[FRAME_INDEX]](p0)
+  ; ILP32F-NEXT:   [[FRAME_INDEX1:%[0-9]+]]:_(p0) = G_FRAME_INDEX %stack.1
+  ; ILP32F-NEXT:   G_STORE [[C1]](s256), [[FRAME_INDEX1]](p0) :: (store (s256), align 16)
+  ; ILP32F-NEXT:   $x11 = COPY [[FRAME_INDEX1]](p0)
+  ; ILP32F-NEXT:   PseudoCALL target-flags(riscv-call) @callee_256i_in_regs, csr_ilp32f_lp64f, implicit-def $x1, implicit $x10, implicit $x11, implicit-def $x10, implicit-def $x11
+  ; ILP32F-NEXT:   ADJCALLSTACKUP 0, 0, implicit-def $x2, implicit $x2
+  ; ILP32F-NEXT:   [[COPY:%[0-9]+]]:_(s32) = COPY $x10
+  ; ILP32F-NEXT:   [[COPY1:%[0-9]+]]:_(s32) = COPY $x11
+  ; ILP32F-NEXT:   [[MV:%[0-9]+]]:_(s64) = G_MERGE_VALUES [[COPY]](s32), [[COPY1]](s32)
+  ; ILP32F-NEXT:   [[TRUNC:%[0-9]+]]:_(s32) = G_TRUNC [[MV]](s64)
+  ; ILP32F-NEXT:   $x10 = COPY [[TRUNC]](s32)
+  ; ILP32F-NEXT:   PseudoRET implicit $x10
+  ;
+  ; ILP32D-LABEL: name: caller_256i_in_regs
+  ; ILP32D: bb.1 (%ir-block.0):
+  ; ILP32D-NEXT:   [[C:%[0-9]+]]:_(s256) = G_CONSTANT i256 1
+  ; ILP32D-NEXT:   [[C1:%[0-9]+]]:_(s256) = G_CONSTANT i256 2
+  ; ILP32D-NEXT:   ADJCALLSTACKDOWN 0, 0, implicit-def $x2, implicit $x2
+  ; ILP32D-NEXT:   [[FRAME_INDEX:%[0-9]+]]:_(p0) = G_FRAME_INDEX %stack.0
+  ; ILP32D-NEXT:   G_STORE [[C]](s256), [[FRAME_INDEX]](p0) :: (store (s256), align 16)
+  ; ILP32D-NEXT:   $x10 = COPY [[FRAME_INDEX]](p0)
+  ; ILP32D-NEXT:   [[FRAME_INDEX1:%[0-9]+]]:_(p0) = G_FRAME_INDEX %stack.1
+  ; ILP32D-NEXT:   G_STORE [[C1]](s256), [[FRAME_INDEX1]](p0) :: (store (s256), align 16)
+  ; ILP32D-NEXT:   $x11 = COPY [[FRAME_INDEX1]](p0)
+  ; ILP32D-NEXT:   PseudoCALL target-flags(riscv-call) @callee_256i_in_regs, csr_ilp32d_lp64d, implicit-def $x1, implicit $x10, implicit $x11, implicit-def $x10, implicit-def $x11
+  ; ILP32D-NEXT:   ADJCALLSTACKUP 0, 0, implicit-def $x2, implicit $x2
+  ; ILP32D-NEXT:   [[COPY:%[0-9]+]]:_(s32) = COPY $x10
+  ; ILP32D-NEXT:   [[COPY1:%[0-9]+]]:_(s32) = COPY $x11
+  ; ILP32D-NEXT:   [[MV:%[0-9]+]]:_(s64) = G_MERGE_VALUES [[COPY]](s32), [[COPY1]](s32)
+  ; ILP32D-NEXT:   [[TRUNC:%[0-9]+]]:_(s32) = G_TRUNC [[MV]](s64)
+  ; ILP32D-NEXT:   $x10 = COPY [[TRUNC]](s32)
+  ; ILP32D-NEXT:   PseudoRET implicit $x10
+  %1 = call i64 @callee_256i_in_regs(i256 1, i256 2)
+  %2 = trunc i64 %1 to i32
+  ret i32 %2
+}
+
 define i32 @callee_i64_in_regs(i32 %a, i64 %b) nounwind {
   ; RV32I-LABEL: name: callee_i64_in_regs
   ; RV32I: bb.1 (%ir-block.0):
diff --git a/llvm/test/CodeGen/RISCV/GlobalISel/irtranslator/calling-conv-lp64-lp64f-lp64d-common.ll b/llvm/test/CodeGen/RISCV/GlobalISel/irtranslator/calling-conv-lp64-lp64f-lp64d-common.ll
index b175b8d92e6c9..0467bbe3d41bb 100644
--- a/llvm/test/CodeGen/RISCV/GlobalISel/irtranslator/calling-conv-lp64-lp64f-lp64d-common.ll
+++ b/llvm/test/CodeGen/RISCV/GlobalISel/irtranslator/calling-conv-lp64-lp64f-lp64d-common.ll
@@ -16,6 +16,85 @@
 ; Check that on RV64, i128 is passed in a pair of registers. Unlike
 ; the convention for varargs, this need not be an aligned pair.
 
+define i64 @callee_256i_in_regs(i256 %x, i256 %y ) {
+
+  ; RV64I-LABEL: name: callee_256i_in_regs
+  ; RV64I: bb.1 (%ir-block.0):
+  ; RV64I-NEXT:   liveins: $x10, $x11
+  ; RV64I-NEXT: {{  $}}
+  ; RV64I-NEXT:   [[COPY:%[0-9]+]]:_(p0) = COPY $x10
+  ; RV64I-NEXT:   [[LOAD:%[0-9]+]]:_(s256) = G_LOAD [[COPY]](p0) :: (load (s256), align 1)
+  ; RV64I-NEXT:   [[COPY1:%[0-9]+]]:_(p0) = COPY $x11
+  ; RV64I-NEXT:   [[LOAD1:%[0-9]+]]:_(s256) = G_LOAD [[COPY1]](p0) :: (load (s256), align 1)
+  ; RV64I-NEXT:   [[TRUNC:%[0-9]+]]:_(s64) = G_TRUNC [[LOAD]](s256)
+  ; RV64I-NEXT:   $x10 = COPY [[TRUNC]](s64)
+  ; RV64I-NEXT:   PseudoRET implicit $x10
+  %2 = trunc i256 %x to i64
+  ret i64 %2
+}
+
+define i32 @caller_256i_in_regs( ) {
+  ; LP64-LABEL: name: caller_256i_in_regs
+  ; LP64: bb.1 (%ir-block.0):
+  ; LP64-NEXT:   [[C:%[0-9]+]]:_(s256) = G_CONSTANT i256 1
+  ; LP64-NEXT:   [[C1:%[0-9]+]]:_(s256) = G_CONSTANT i256 2
+  ; LP64-NEXT:   ADJCALLSTACKDOWN 0, 0, implicit-def $x2, implicit $x2
+  ; LP64-NEXT:   [[FRAME_INDEX:%[0-9]+]]:_(p0) = G_FRAME_INDEX %stack.0
+  ; LP64-NEXT:   G_STORE [[C]](s256), [[FRAME_INDEX]](p0) :: (store (s256), align 16)
+  ; LP64-NEXT:   $x10 = COPY [[FRAME_INDEX]](p0)
+  ; LP64-NEXT:   [[FRAME_INDEX1:%[0-9]+]]:_(p0) = G_FRAME_INDEX %stack.1
+  ; LP64-NEXT:   G_STORE [[C1]](s256), [[FRAME_INDEX1]](p0) :: (store (s256), align 16)
+  ; LP64-NEXT:   $x11 = COPY [[FRAME_INDEX1]](p0)
+  ; LP64-NEXT:   PseudoCALL target-flags(riscv-call) @callee_256i_in_regs, csr_ilp32_lp64, implicit-def $x1, implicit $x10, implicit $x11, implicit-def $x10
+  ; LP64-NEXT:   ADJCALLSTACKUP 0, 0, implicit-def $x2, implicit $x2
+  ; LP64-NEXT:   [[COPY:%[0-9]+]]:_(s64) = COPY $x10
+  ; LP64-NEXT:   [[TRUNC:%[0-9]+]]:_(s32) = G_TRUNC [[COPY]](s64)
+  ; LP64-NEXT:   [[ANYEXT:%[0-9]+]]:_(s64) = G_ANYEXT [[TRUNC]](s32)
+  ; LP64-NEXT:   $x10 = COPY [[ANYEXT]](s64)
+  ; LP64-NEXT:   PseudoRET implicit $x10
+  ;
+  ; LP64F-LABEL: name: caller_256i_in_regs
+  ; LP64F: bb.1 (%ir-block.0):
+  ; LP64F-NEXT:   [[C:%[0-9]+]]:_(s256) = G_CONSTANT i256 1
+  ; LP64F-NEXT:   [[C1:%[0-9]+]]:_(s256) = G_CONSTANT i256 2
+  ; LP64F-NEXT:   ADJCALLSTACKDOWN 0, 0, implicit-def $x2, implicit $x2
+  ; LP64F-NEXT:   [[FRAME_INDEX:%[0-9]+]]:_(p0) = G_FRAME_INDEX %stack.0
+  ; LP64F-NEXT:   G_STORE [[C]](s256), [[FRAME_INDEX]](p0) :: (store (s256), align 16)
+  ; LP64F-NEXT:   $x10 = COPY [[FRAME_INDEX]](p0)
+  ; LP64F-NEXT:   [[FRAME_INDEX1:%[0-9]+]]:_(p0) = G_FRAME_IN...
[truncated]


github-actions bot commented Jun 13, 2024

✅ With the latest revision this PR passed the C/C++ code formatter.

@tschuett tschuett requested review from aemerson and arsenm June 13, 2024 16:44
@s-barannikov
Contributor

Why not just do the lowering in the frontend? I.e., return getNaturalAlignIndirect(...) if the argument's size exceeds 2×XLEN.
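
(For reference, a minimal sketch of what that frontend-side change could look like; the placement inside clang's RISCVABIInfo::classifyArgumentType is an assumption, while getNaturalAlignIndirect is the real ABIInfo helper.)

// Sketch only; assumed to sit in clang/lib/CodeGen/Targets/RISCV.cpp,
// inside RISCVABIInfo::classifyArgumentType.
uint64_t Size = getContext().getTypeSize(Ty); // argument size in bits
if (Ty->isIntegerType() && Size > 2 * XLen)
  // Scalars wider than 2*XLEN would be passed by reference, moving the
  // psABI rule into the frontend instead of the backend.
  return getNaturalAlignIndirect(Ty, /*ByVal=*/false);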

@michaelmaitland michaelmaitland self-requested a review June 13, 2024 16:54
@topperc
Collaborator

topperc commented Jun 13, 2024

Why not just do the lowering in the frontend? I.e., return getNaturalAlignIndirect(...) if the argument's size exceeds 2×XLEN.

I think we do.

Contributor

arsenm commented Jun 13, 2024

The IR has its own ABI, independent of whatever any particular frontend does. You should still handle any IR function signature regardless of what clang does.

@spaits
Contributor Author

spaits commented Jun 13, 2024

Why not just do the lowering in the frontend? I.e., return getNaturalAlignIndirect(...) if the argument's size exceeds 2×XLEN.

I think we do.

In the case of this code, https://godbolt.org/z/bqY5Kd5n8, we don't do that in the frontend. This is the IR after the final step before the IRTranslator:

define dso_local noundef i128 @fun(__int128, int)(i128 noundef %i, i32 noundef %a) {
entry:
  %i.addr = alloca i128, align 16
  %a.addr = alloca i32, align 4
  %r = alloca i128, align 16
  store i128 %i, ptr %i.addr, align 16
  store i32 %a, ptr %a.addr, align 4
  %0 = load i128, ptr %i.addr, align 16
  %1 = load i32, ptr %a.addr, align 4
  %sh_prom = zext i32 %1 to i128
  %shl = shl i128 %0, %sh_prom
  store i128 %shl, ptr %r, align 16
  %2 = load i128, ptr %r, align 16
  ret i128 %2
}

In the case of a large struct, yes, the frontend does pass it by reference. Code: https://godbolt.org/z/4M1cEa5hh
This is the IR after the final step before the IRTranslator:

define dso_local i128 @fun(LargeStruct, int)(%struct.LargeStruct* byval(%struct.LargeStruct) align 8 %0, i32 %a) {
entry:
  %i = alloca %struct.LargeStruct, align 16
  %a.addr = alloca i32, align 4
  %r = alloca i128, align 16
  %1 = bitcast %struct.LargeStruct* %i to i8*
  %2 = bitcast %struct.LargeStruct* %0 to i8*
  call void @llvm.memcpy.p0i8.p0i8.i32(i8* align 16 %1, i8* align 8 %2, i32 80, i1 false)
  store i32 %a, i32* %a.addr, align 4
  %a1 = getelementptr inbounds %struct.LargeStruct, %struct.LargeStruct* %i, i32 0, i32 0
  %3 = load i128, i128* %a1, align 16
  %4 = load i32, i32* %a.addr, align 4
  %sh_prom = zext i32 %4 to i128
  %shl = shl i128 %3, %sh_prom
  store i128 %shl, i128* %r, align 16
  %5 = load i128, i128* %r, align 16
  ret i128 %5
}

@topperc
Collaborator

topperc commented Jun 13, 2024

Why not just do the lowering in the frontend? I.e., return getNaturalAlignIndirect(...) if the argument's size exceeds 2×XLEN.

I think we do.

In the case of this code, https://godbolt.org/z/bqY5Kd5n8, we don't do that in the frontend. This is the IR after the final step before the IRTranslator:

I think this is a bug in clang IRGen. The RISC-V code wasn't written to expect int128 for rv32.

@spaits
Contributor Author

spaits commented Jun 13, 2024

Why not just do the lowering in the frontend? I.e., return getNaturalAlignIndirect(...) if the argument's size exceeds 2×XLEN.

I think we do.

In the case of this code, https://godbolt.org/z/bqY5Kd5n8, we don't do that in the frontend. This is the IR after the final step before the IRTranslator:

I think this is a bug in clang IRGen. The RISC-V code wasn't written to expect int128 for rv32.

The reason I made this PR is that one of my teammates was trying to compile Picolib (or maybe compiler-rt) for RISC-V, and it used 128-bit shifts, so to compile the project they needed the -fforce-enable-int128 flag. The standard also covers this scenario, so I decided to work on the issue.

@topperc
Collaborator

topperc commented Jun 13, 2024

Why not just do the lowering in the frontend? I.e., return getNaturalAlignIndirect(...) if the argument's size exceeds 2×XLEN.

I think we do.

In the case of this code, https://godbolt.org/z/bqY5Kd5n8, we don't do that in the frontend. This is the IR after the final step before the IRTranslator:

I think this is a bug in clang IRGen. The RISC-V code wasn't written to expect int128 for rv32.

The reason I made this PR is that one of my teammates was trying to compile Picolib (or maybe compiler-rt) for RISC-V, and it used 128-bit shifts, so to compile the project they needed the -fforce-enable-int128 flag. The standard also covers this scenario, so I decided to work on the issue.

And they're trying to use GlobalISel too?

@spaits
Contributor Author

spaits commented Jun 13, 2024

Why not just do the lowering in the frontend? I.e., return getNaturalAlignIndirect(...) if the argument's size exceeds 2×XLEN.

I think we do.

In the case of this code, https://godbolt.org/z/bqY5Kd5n8, we don't do that in the frontend. This is the IR after the final step before the IRTranslator:

I think this is a bug in clang IRGen. The RISC-V code wasn't written to expect int128 for rv32.

The reason I made this PR is that one of my teammates was trying to compile Picolib (or maybe compiler-rt) for RISC-V, and it used 128-bit shifts, so to compile the project they needed the -fforce-enable-int128 flag. The standard also covers this scenario, so I decided to work on the issue.

And they're trying to use GlobalISel too?

No, they don't. But shouldn't we convert the full backend to GlobalISel? And if we want to do that, we also need to be able to pass 128-bit values, I think. I know a frontend fix would solve it for GlobalISel, but then that fix would have to be introduced into other frontends, like the Rust frontend, the Zig frontend, and so on. So I think fixing this in the backend is better.

But if you think this should really go in the frontend, I would be glad to implement that too.

@topperc
Collaborator

topperc commented Jun 13, 2024

Why not just do the lowering in the frontend? I.e., return getNaturalAlignIndirect(...) if the argument's size exceeds 2×XLEN.

I think we do.

In the case of this code, https://godbolt.org/z/bqY5Kd5n8, we don't do that in the frontend. This is the IR after the final step before the IRTranslator:

I think this is a bug in clang IRGen. The RISC-V code wasn't written to expect int128 for rv32.

The reason I made this PR is that one of my teammates was trying to compile Picolib (or maybe compiler-rt) for RISC-V, and it used 128-bit shifts, so to compile the project they needed the -fforce-enable-int128 flag. The standard also covers this scenario, so I decided to work on the issue.

And they're trying to use GlobalISel too?

No, they don't. But shouldn't we convert the full backend to GlobalISel? And if we want to do that, we also need to be able to pass 128-bit values, I think. I know a frontend fix would solve it for GlobalISel, but then that fix would have to be introduced into other frontends, like the Rust frontend, the Zig frontend, and so on. So I think fixing this in the backend is better.

But if you think this should really go in the frontend, I would be glad to implement that too.

Looks like the clang frontend intentionally defers indirect scalar handling to the backend. I thought maybe -fforce-enable-int128 had just been overlooked. I didn't think there were any scalars larger than 2*XLen for rv32 by default, but I forgot about fp128 and _BitInt.

Large _BitInt does force indirect in the frontend, though exactly which sizes do is affected by -fforce-enable-int128, which is probably a separate bug.
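
(For reference, a tiny hand-written illustration of that distinction; the functions are hypothetical, and rv32 needs -fforce-enable-int128 for __int128:)

// clang already lowers wide _BitInt arguments to indirect on RISC-V, while
// an __int128 enabled via -fforce-enable-int128 reaches the backend as a
// plain i128 value and relies on the backend lowering this PR adds.
_BitInt(128) shl_bitint(_BitInt(128) x, int s) { return x << s; }
__int128 shl_int128(__int128 x, int s) { return x << s; }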

@s-barannikov
Contributor

Why not just do the lowering in the frontend? I.e., return getNaturalAlignIndirect(...) if the argument's size exceeds 2×XLEN.

The IR has its own ABI, independent of whatever any particular frontend does. You should still handle any IR function signature regardless of what clang does.

I wrote this assuming that lowering of large integer types is intentionally left to the backend, which doesn't appear to be the case.
If the frontend does the lowering, then i128 can only appear in function signatures if the IR was created by other means (such as middle-end optimizations or an external tool), in which case language ABI rules do not necessarily apply and i128 could just as well be passed in registers.
Supporting the Indirect flag in GlobalISel is not a bad thing and may be desired by other targets that prefer leaving some lowering to the backend. I just don't see much value in it for RISC-V.

@topperc
Collaborator

topperc commented Jun 13, 2024

I wrote this assuming that lowering of large integer types is intentionally left to the backend, which doesn't appear to be the case.

Looks like we intentionally defer the handling for scalars to the backend, except for _BitInt.

@tschuett

Could you change in the summary:
"Currently I put the code in TargetLowering::handleAssignments and I think it is general enough for all the targets to"
to
"Currently I put the code in CallLowering::handleAssignments and I think it is general enough for all the targets to"

@spaits
Contributor Author

spaits commented Jun 14, 2024

Could you change in the summary: "Currently I put the code in TargetLowering::handleAssignments and I think it is general enough for all the targets to" to "Currently I put the code in CallLowering::handleAssignments and I think it is general enough for all the targets to"

Updated the description. I have also finished that sentence. :)

Contributor

@s-barannikov s-barannikov left a comment


LGTM

@s-barannikov
Contributor

(I'd expect someone from the RISC-V team to take a look too.)

@spaits
Contributor Author

spaits commented Jun 17, 2024

@michaelmaitland @topperc @preames Could you please also take a look at (or another look at) this PR?

@spaits spaits requested a review from preames June 17, 2024 21:19
Contributor

@michaelmaitland michaelmaitland left a comment


LGTM as well

@spaits spaits requested a review from topperc June 18, 2024 09:13
@spaits
Contributor Author

spaits commented Jun 19, 2024

There is already an approval on this PR from the GlobalISel team (@s-barannikov) and from the RISC-V team (@michaelmaitland). If there is no new review on this PR, I will merge it in 12 hours.

@s-barannikov
Contributor

from the GlobalISel team (@s-barannikov)

FWIW I am not

@spaits
Contributor Author

spaits commented Jun 19, 2024

from the GlobalISel team (@s-barannikov)

FWIW I am not

Sorry for assuming. I thought you were a GlobalISel team member, because you were able to help me with so much regarding it that you must understand the framework and how it works very well.

Collaborator

@topperc topperc left a comment


LGTM

@spaits spaits merged commit f450408 into llvm:main Jun 19, 2024
7 checks passed
AlexisPerry pushed a commit to llvm-project-tlp/llvm-project that referenced this pull request Jul 9, 2024
Some targets, like RISC-V, pass scalars wider than 2×XLEN bits by reference, so such arguments are replaced in the argument list with an address (see RISC-V ABIs Specification 1.0, section 2.1).

This commit implements this indirect parameter passing in GlobalISel.

---------

Co-authored-by: Gabor Spaits <Gabor.Spaits@hightec-rt.com>