Skip to content

[Clang][CodeGen][X86] don't coerce int128 into {i64,i64} for SysV-like ABIs #135230

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Open
wants to merge 1 commit into
base: main
Choose a base branch
from

Conversation

T0b1-iOS
Copy link

Currently, clang coerces (u)int128_t to two i64 IR parameters when they are passed in registers. This leads to broken debug info for them after applying SROA+InstCombine. SROA generates IR like this (godbolt):

define dso_local { i64, i64 } @add(i64 noundef %a.coerce0, i64 noundef %a.coerce1)  {
entry:
  %a.sroa.2.0.insert.ext = zext i64 %a.coerce1 to i128
  %a.sroa.2.0.insert.shift = shl nuw i128 %a.sroa.2.0.insert.ext, 64
  %a.sroa.0.0.insert.ext = zext i64 %a.coerce0 to i128
  %a.sroa.0.0.insert.insert = or i128 %a.sroa.2.0.insert.shift, %a.sroa.0.0.insert.ext
    #dbg_value(i128 %a.sroa.0.0.insert.insert, !17, !DIExpression(), !18)
// ...
!17 = !DILocalVariable(name: "a", arg: 1, scope: !10, file: !11, line: 1, type: !14)
// ...

and InstCombine then removes the or, moving it into the DIExpression, and the shl at which point the debug info salvaging in Transforms/Local replaces the arguments with poison as it does not allow constants larger than 64 bit in DIExpressions.

I'm working under the assumption that there is interest in fixing this. If not, please tell me.
By not coercing int128_ts into {i64, i64} but keeping them as i128, the debug info stays intact and SelectionDAG then generates two DW_OP_LLVM_fragment expressions for the two corresponding argument registers.

Given that the ABI code for x64 seems to not coerce the argument when it is passed on the stack, it should not lead to any problems keeping it as an i128 when it is passed in registers.

Alternatively, this could be fixed by checking if a constant value fits in 64 bits in the debug info salvaging code and then extending the value on the expression stack to the necessary width. This fixes InstCombine breaking the debug info but then SelectionDAG removes the expression and that seems significantly more complex to debug.

Another fix may be to generate DW_OP_LLVM_fragment expressions when removing the or as it gets marked as disjoint by InstCombine. However, I don't know if the KnownBits information is still available at the time the or gets removed and it would probably require refactoring of the debug info salvaging code as that currently only seems to replace single expressions and is not designed to support generating new debug records.

Converting (u)int128_t arguments to i128 in the IR seems like the simpler solution, if it doesn't cause any ABI issues.

tagging potential reviewers: @nikic @phoebewang @rnk

Copy link

Thank you for submitting a Pull Request (PR) to the LLVM Project!

This PR will be automatically labeled and the relevant teams will be notified.

If you wish to, you can add reviewers by using the "Reviewers" section on this page.

If this is not working for you, it is probably because you do not have write permissions for the repository. In which case you can instead tag reviewers by name in a comment by using @ followed by their GitHub username.

If you have received no comments on your PR for a week, you can request a review by "ping"ing the PR by adding a comment “Ping”. The common courtesy "ping" rate is once a week. Please remember that you are asking for valuable time from other developers.

If you have further questions, they may be answered by the LLVM GitHub User Guide.

You can also ask questions in a comment on this PR, on the LLVM Discord or on the forums.

@llvmbot llvmbot added clang Clang issues not falling into any other category backend:X86 clang:codegen IR generation bugs: mangling, exceptions, etc. labels Apr 10, 2025
@llvmbot
Copy link
Member

llvmbot commented Apr 10, 2025

@llvm/pr-subscribers-clang-codegen
@llvm/pr-subscribers-backend-x86

@llvm/pr-subscribers-clang

Author: None (T0b1-iOS)

Changes

Currently, clang coerces (u)int128_t to two i64 IR parameters when they are passed in registers. This leads to broken debug info for them after applying SROA+InstCombine. SROA generates IR like this (godbolt):

define dso_local { i64, i64 } @<!-- -->add(i64 noundef %a.coerce0, i64 noundef %a.coerce1)  {
entry:
  %a.sroa.2.0.insert.ext = zext i64 %a.coerce1 to i128
  %a.sroa.2.0.insert.shift = shl nuw i128 %a.sroa.2.0.insert.ext, 64
  %a.sroa.0.0.insert.ext = zext i64 %a.coerce0 to i128
  %a.sroa.0.0.insert.insert = or i128 %a.sroa.2.0.insert.shift, %a.sroa.0.0.insert.ext
    #dbg_value(i128 %a.sroa.0.0.insert.insert, !17, !DIExpression(), !18)
// ...
!17 = !DILocalVariable(name: "a", arg: 1, scope: !10, file: !11, line: 1, type: !14)
// ...

and InstCombine then removes the or, moving it into the DIExpression, and the shl at which point the debug info salvaging in Transforms/Local replaces the arguments with poison as it does not allow constants larger than 64 bit in DIExpressions.

I'm working under the assumption that there is interest in fixing this. If not, please tell me.
By not coercing int128_ts into {i64, i64} but keeping them as i128, the debug info stays intact and SelectionDAG then generates two DW_OP_LLVM_fragment expressions for the two corresponding argument registers.

Given that the ABI code for x64 seems to not coerce the argument when it is passed on the stack, it should not lead to any problems keeping it as an i128 when it is passed in registers.

Alternatively, this could be fixed by checking if a constant value fits in 64 bits in the debug info salvaging code and then extending the value on the expression stack to the necessary width. This fixes InstCombine breaking the debug info but then SelectionDAG removes the expression and that seems significantly more complex to debug.

Another fix may be to generate DW_OP_LLVM_fragment expressions when removing the or as it gets marked as disjoint by InstCombine. However, I don't know if the KnownBits information is still available at the time the or gets removed and it would probably require refactoring of the debug info salvaging code as that currently only seems to replace single expressions and is not designed to support generating new debug records.

Converting (u)int128_t arguments to i128 in the IR seems like the simpler solution, if it doesn't cause any ABI issues.

tagging potential reviewers: @nikic @phoebewang @rnk


Full diff: https://github.com/llvm/llvm-project/pull/135230.diff

6 Files Affected:

  • (modified) clang/lib/CodeGen/Targets/X86.cpp (+18)
  • (added) clang/test/CodeGen/X86/i128-debuginfo.c (+10)
  • (modified) clang/test/CodeGen/X86/x86_64-arguments.c (+21)
  • (modified) clang/test/CodeGen/alloc-align-attr.c (+17-41)
  • (modified) clang/test/CodeGen/builtins.c (+3-15)
  • (modified) clang/test/CodeGen/extend-arg-64.c (+1-1)
diff --git a/clang/lib/CodeGen/Targets/X86.cpp b/clang/lib/CodeGen/Targets/X86.cpp
index b36a6e1396653..aa4ab1c2e5490 100644
--- a/clang/lib/CodeGen/Targets/X86.cpp
+++ b/clang/lib/CodeGen/Targets/X86.cpp
@@ -2595,6 +2595,14 @@ GetX86_64ByValArgumentPair(llvm::Type *Lo, llvm::Type *Hi,
 
 ABIArgInfo X86_64ABIInfo::
 classifyReturnType(QualType RetTy) const {
+  // return int128 as i128
+  if (const BuiltinType *BT = RetTy->getAs<BuiltinType>()) {
+    BuiltinType::Kind k = BT->getKind();
+    if (k == BuiltinType::Int128 || k == BuiltinType::UInt128) {
+      return ABIArgInfo::getDirect();
+    }
+  }
+
   // AMD64-ABI 3.2.3p4: Rule 1. Classify the return type with the
   // classification algorithm.
   X86_64ABIInfo::Class Lo, Hi;
@@ -2727,6 +2735,16 @@ X86_64ABIInfo::classifyArgumentType(QualType Ty, unsigned freeIntRegs,
                                     bool isNamedArg, bool IsRegCall) const {
   Ty = useFirstFieldIfTransparentUnion(Ty);
 
+  // represent int128 as i128
+  if (const BuiltinType *BT = Ty->getAs<BuiltinType>()) {
+    BuiltinType::Kind k = BT->getKind();
+    if (k == BuiltinType::Int128 || k == BuiltinType::UInt128) {
+      neededInt = 2;
+      neededSSE = 0;
+      return ABIArgInfo::getDirect();
+    }
+  }
+
   X86_64ABIInfo::Class Lo, Hi;
   classify(Ty, 0, Lo, Hi, isNamedArg, IsRegCall);
 
diff --git a/clang/test/CodeGen/X86/i128-debuginfo.c b/clang/test/CodeGen/X86/i128-debuginfo.c
new file mode 100644
index 0000000000000..4b865c1bed9f0
--- /dev/null
+++ b/clang/test/CodeGen/X86/i128-debuginfo.c
@@ -0,0 +1,10 @@
+// no autogeneration since update_cc_test_checks does not support -g
+// RUN: %clang_cc1 -triple x86_64-pc-linux -O1 -debug-info-kind=limited -emit-llvm -o - %s | FileCheck %s
+
+// CHECK-LABEL: define{{.*}} i128 @add(i128 noundef %a)
+// CHECK: #dbg_value(i128 %a, ![[DI:.*]], !DIExpression()
+__int128_t add(__int128_t a) {
+  return a + a;
+}
+
+// CHECK: ![[DI]] = !DILocalVariable(name: "a", arg: 1
diff --git a/clang/test/CodeGen/X86/x86_64-arguments.c b/clang/test/CodeGen/X86/x86_64-arguments.c
index 82845f0a2b31f..fd73fbaa8e54f 100644
--- a/clang/test/CodeGen/X86/x86_64-arguments.c
+++ b/clang/test/CodeGen/X86/x86_64-arguments.c
@@ -551,6 +551,27 @@ struct s68 {
 void f68(struct s68 x) {
 }
 
+// CHECK-LABEL: define{{.*}} i128 @f69(i128 noundef %a)
+__int128_t f69(__int128_t a) {
+  return a;
+}
+
+// CHECK-LABEL: define{{.*}} i128 @f70(i128 noundef %a)
+__uint128_t f70(__uint128_t a) {
+  return a;
+}
+
+// check that registers are correctly counted for (u)int128_t arguments
+struct s71 {
+  long long a, b;
+};
+// CHECK-LABEL: define{{.*}} void @f71(i128 noundef %a, i128 noundef %b, i64 noundef %c, ptr noundef byval(%struct.s71) align 8 %d)
+void f71(__int128_t a, __int128_t b, long long c, struct s71 d) {
+}
+// CHECK-LABEL: define{{.*}} void @f72(i128 noundef %a, i128 noundef %b, i64 %d.coerce0, i64 %d.coerce1)
+void f72(__int128_t a, __int128_t b, struct s71 d) {
+}
+
 /// The synthesized __va_list_tag does not have file/line fields.
 // CHECK:      = distinct !DICompositeType(tag: DW_TAG_structure_type, name: "__va_list_tag",
 // CHECK-NOT:  file:
diff --git a/clang/test/CodeGen/alloc-align-attr.c b/clang/test/CodeGen/alloc-align-attr.c
index 76e5d1041b19f..c4c4e76eaaa04 100644
--- a/clang/test/CodeGen/alloc-align-attr.c
+++ b/clang/test/CodeGen/alloc-align-attr.c
@@ -70,66 +70,42 @@ __INT32_TYPE__ test4(__SIZE_TYPE__ a) {
 
 struct Empty {};
 struct MultiArgs { __INT64_TYPE__ a, b;};
-// Struct parameter doesn't take up an IR parameter, 'i' takes up 2.
+// Struct parameter doesn't take up an IR parameter, 'i' takes up 1.
 // Truncation to i64 is permissible, since alignments of greater than 2^64 are insane.
 __INT32_TYPE__ *m3(struct Empty s, __int128_t i) __attribute__((alloc_align(2)));
 // CHECK-LABEL: @test5(
 // CHECK-NEXT:  entry:
-// CHECK-NEXT:    [[A:%.*]] = alloca i128, align 16
 // CHECK-NEXT:    [[A_ADDR:%.*]] = alloca i128, align 16
 // CHECK-NEXT:    [[E:%.*]] = alloca [[STRUCT_EMPTY:%.*]], align 1
-// CHECK-NEXT:    [[COERCE:%.*]] = alloca i128, align 16
-// CHECK-NEXT:    [[TMP0:%.*]] = getelementptr inbounds nuw { i64, i64 }, ptr [[A]], i32 0, i32 0
-// CHECK-NEXT:    store i64 [[A_COERCE0:%.*]], ptr [[TMP0]], align 16
-// CHECK-NEXT:    [[TMP1:%.*]] = getelementptr inbounds nuw { i64, i64 }, ptr [[A]], i32 0, i32 1
-// CHECK-NEXT:    store i64 [[A_COERCE1:%.*]], ptr [[TMP1]], align 8
-// CHECK-NEXT:    [[A1:%.*]] = load i128, ptr [[A]], align 16
-// CHECK-NEXT:    store i128 [[A1]], ptr [[A_ADDR]], align 16
-// CHECK-NEXT:    [[TMP2:%.*]] = load i128, ptr [[A_ADDR]], align 16
-// CHECK-NEXT:    store i128 [[TMP2]], ptr [[COERCE]], align 16
-// CHECK-NEXT:    [[TMP3:%.*]] = getelementptr inbounds nuw { i64, i64 }, ptr [[COERCE]], i32 0, i32 0
-// CHECK-NEXT:    [[TMP4:%.*]] = load i64, ptr [[TMP3]], align 16
-// CHECK-NEXT:    [[TMP5:%.*]] = getelementptr inbounds nuw { i64, i64 }, ptr [[COERCE]], i32 0, i32 1
-// CHECK-NEXT:    [[TMP6:%.*]] = load i64, ptr [[TMP5]], align 8
-// CHECK-NEXT:    [[CALL:%.*]] = call ptr @m3(i64 noundef [[TMP4]], i64 noundef [[TMP6]])
-// CHECK-NEXT:    [[CASTED_ALIGN:%.*]] = trunc i128 [[TMP2]] to i64
+// CHECK-NEXT:    store i128 [[A:%.*]], ptr [[A_ADDR]], align 16
+// CHECK-NEXT:    [[TMP0:%.*]] = load i128, ptr [[A_ADDR]], align 16
+// CHECK-NEXT:    [[CALL:%.*]] = call ptr @m3(i128 noundef [[TMP0]])
+// CHECK-NEXT:    [[CASTED_ALIGN:%.*]] = trunc i128 [[TMP0]] to i64
 // CHECK-NEXT:    call void @llvm.assume(i1 true) [ "align"(ptr [[CALL]], i64 [[CASTED_ALIGN]]) ]
-// CHECK-NEXT:    [[TMP7:%.*]] = load i32, ptr [[CALL]], align 4
-// CHECK-NEXT:    ret i32 [[TMP7]]
+// CHECK-NEXT:    [[TMP1:%.*]] = load i32, ptr [[CALL]], align 4
+// CHECK-NEXT:    ret i32 [[TMP1]]
 //
 __INT32_TYPE__ test5(__int128_t a) {
   struct Empty e;
   return *m3(e, a);
 }
-// Struct parameter takes up 2 parameters, 'i' takes up 2.
+// Struct parameter takes up 2 parameters, 'i' takes up 1.
 __INT32_TYPE__ *m4(struct MultiArgs s, __int128_t i) __attribute__((alloc_align(2)));
 // CHECK-LABEL: @test6(
 // CHECK-NEXT:  entry:
-// CHECK-NEXT:    [[A:%.*]] = alloca i128, align 16
 // CHECK-NEXT:    [[A_ADDR:%.*]] = alloca i128, align 16
 // CHECK-NEXT:    [[E:%.*]] = alloca [[STRUCT_MULTIARGS:%.*]], align 8
-// CHECK-NEXT:    [[COERCE:%.*]] = alloca i128, align 16
-// CHECK-NEXT:    [[TMP0:%.*]] = getelementptr inbounds nuw { i64, i64 }, ptr [[A]], i32 0, i32 0
-// CHECK-NEXT:    store i64 [[A_COERCE0:%.*]], ptr [[TMP0]], align 16
-// CHECK-NEXT:    [[TMP1:%.*]] = getelementptr inbounds nuw { i64, i64 }, ptr [[A]], i32 0, i32 1
-// CHECK-NEXT:    store i64 [[A_COERCE1:%.*]], ptr [[TMP1]], align 8
-// CHECK-NEXT:    [[A1:%.*]] = load i128, ptr [[A]], align 16
-// CHECK-NEXT:    store i128 [[A1]], ptr [[A_ADDR]], align 16
-// CHECK-NEXT:    [[TMP2:%.*]] = load i128, ptr [[A_ADDR]], align 16
-// CHECK-NEXT:    [[TMP3:%.*]] = getelementptr inbounds nuw { i64, i64 }, ptr [[E]], i32 0, i32 0
+// CHECK-NEXT:    store i128 [[A:%.*]], ptr [[A_ADDR]], align 16
+// CHECK-NEXT:    [[TMP0:%.*]] = load i128, ptr [[A_ADDR]], align 16
+// CHECK-NEXT:    [[TMP1:%.*]] = getelementptr inbounds nuw { i64, i64 }, ptr [[E]], i32 0, i32 0
+// CHECK-NEXT:    [[TMP2:%.*]] = load i64, ptr [[TMP1]], align 8
+// CHECK-NEXT:    [[TMP3:%.*]] = getelementptr inbounds nuw { i64, i64 }, ptr [[E]], i32 0, i32 1
 // CHECK-NEXT:    [[TMP4:%.*]] = load i64, ptr [[TMP3]], align 8
-// CHECK-NEXT:    [[TMP5:%.*]] = getelementptr inbounds nuw { i64, i64 }, ptr [[E]], i32 0, i32 1
-// CHECK-NEXT:    [[TMP6:%.*]] = load i64, ptr [[TMP5]], align 8
-// CHECK-NEXT:    store i128 [[TMP2]], ptr [[COERCE]], align 16
-// CHECK-NEXT:    [[TMP7:%.*]] = getelementptr inbounds nuw { i64, i64 }, ptr [[COERCE]], i32 0, i32 0
-// CHECK-NEXT:    [[TMP8:%.*]] = load i64, ptr [[TMP7]], align 16
-// CHECK-NEXT:    [[TMP9:%.*]] = getelementptr inbounds nuw { i64, i64 }, ptr [[COERCE]], i32 0, i32 1
-// CHECK-NEXT:    [[TMP10:%.*]] = load i64, ptr [[TMP9]], align 8
-// CHECK-NEXT:    [[CALL:%.*]] = call ptr @m4(i64 [[TMP4]], i64 [[TMP6]], i64 noundef [[TMP8]], i64 noundef [[TMP10]])
-// CHECK-NEXT:    [[CASTED_ALIGN:%.*]] = trunc i128 [[TMP2]] to i64
+// CHECK-NEXT:    [[CALL:%.*]] = call ptr @m4(i64 [[TMP2]], i64 [[TMP4]], i128 noundef [[TMP0]])
+// CHECK-NEXT:    [[CASTED_ALIGN:%.*]] = trunc i128 [[TMP0]] to i64
 // CHECK-NEXT:    call void @llvm.assume(i1 true) [ "align"(ptr [[CALL]], i64 [[CASTED_ALIGN]]) ]
-// CHECK-NEXT:    [[TMP11:%.*]] = load i32, ptr [[CALL]], align 4
-// CHECK-NEXT:    ret i32 [[TMP11]]
+// CHECK-NEXT:    [[TMP5:%.*]] = load i32, ptr [[CALL]], align 4
+// CHECK-NEXT:    ret i32 [[TMP5]]
 //
 __INT32_TYPE__ test6(__int128_t a) {
   struct MultiArgs e;
diff --git a/clang/test/CodeGen/builtins.c b/clang/test/CodeGen/builtins.c
index eda6c67fdad00..aa9965b815983 100644
--- a/clang/test/CodeGen/builtins.c
+++ b/clang/test/CodeGen/builtins.c
@@ -956,36 +956,24 @@ void test_builtin_os_log_errno(void) {
 void test_builtin_os_log_long_double(void *buf, long double ld) {
   // CHECK: %[[BUF_ADDR:.*]] = alloca ptr, align 8
   // CHECK: %[[LD_ADDR:.*]] = alloca x86_fp80, align 16
-  // CHECK: %[[COERCE:.*]] = alloca i128, align 16
   // CHECK: store ptr %[[BUF]], ptr %[[BUF_ADDR]], align 8
   // CHECK: store x86_fp80 %[[LD]], ptr %[[LD_ADDR]], align 16
   // CHECK: %[[V0:.*]] = load ptr, ptr %[[BUF_ADDR]], align 8
   // CHECK: %[[V1:.*]] = load x86_fp80, ptr %[[LD_ADDR]], align 16
   // CHECK: %[[V2:.*]] = bitcast x86_fp80 %[[V1]] to i80
   // CHECK: %[[V3:.*]] = zext i80 %[[V2]] to i128
-  // CHECK: store i128 %[[V3]], ptr %[[COERCE]], align 16
-  // CHECK: %[[V5:.*]] = getelementptr inbounds nuw { i64, i64 }, ptr %[[COERCE]], i32 0, i32 0
-  // CHECK: %[[V6:.*]] = load i64, ptr %[[V5]], align 16
-  // CHECK: %[[V7:.*]] = getelementptr inbounds nuw { i64, i64 }, ptr %[[COERCE]], i32 0, i32 1
-  // CHECK: %[[V8:.*]] = load i64, ptr %[[V7]], align 8
-  // CHECK: call void @__os_log_helper_1_0_1_16_0(ptr noundef %[[V0]], i64 noundef %[[V6]], i64 noundef %[[V8]])
+  // CHECK: call void @__os_log_helper_1_0_1_16_0(ptr noundef %[[V0]], i128 noundef %[[V3]])
 
   __builtin_os_log_format(buf, "%Lf", ld);
 }
 
 // CHECK-LABEL: define linkonce_odr hidden void @__os_log_helper_1_0_1_16_0
-// CHECK: (ptr noundef %[[BUFFER:.*]], i64 noundef %[[ARG0_COERCE0:.*]], i64 noundef %[[ARG0_COERCE1:.*]])
+// CHECK: (ptr noundef %[[BUFFER:.*]], i128 noundef %[[ARG0:.*]])
 
-// CHECK: %[[ARG0:.*]] = alloca i128, align 16
 // CHECK: %[[BUFFER_ADDR:.*]] = alloca ptr, align 8
 // CHECK: %[[ARG0_ADDR:.*]] = alloca i128, align 16
-// CHECK: %[[V1:.*]] = getelementptr inbounds nuw { i64, i64 }, ptr %[[ARG0]], i32 0, i32 0
-// CHECK: store i64 %[[ARG0_COERCE0]], ptr %[[V1]], align 16
-// CHECK: %[[V2:.*]] = getelementptr inbounds nuw { i64, i64 }, ptr %[[ARG0]], i32 0, i32 1
-// CHECK: store i64 %[[ARG0_COERCE1]], ptr %[[V2]], align 8
-// CHECK: %[[ARG01:.*]] = load i128, ptr %[[ARG0]], align 16
 // CHECK: store ptr %[[BUFFER]], ptr %[[BUFFER_ADDR]], align 8
-// CHECK: store i128 %[[ARG01]], ptr %[[ARG0_ADDR]], align 16
+// CHECK: store i128 %[[ARG0]], ptr %[[ARG0_ADDR]], align 16
 // CHECK: %[[BUF:.*]] = load ptr, ptr %[[BUFFER_ADDR]], align 8
 // CHECK: %[[SUMMARY:.*]] = getelementptr i8, ptr %[[BUF]], i64 0
 // CHECK: store i8 0, ptr %[[SUMMARY]], align 1
diff --git a/clang/test/CodeGen/extend-arg-64.c b/clang/test/CodeGen/extend-arg-64.c
index 2cb56d35af21d..8b99c01807ecc 100644
--- a/clang/test/CodeGen/extend-arg-64.c
+++ b/clang/test/CodeGen/extend-arg-64.c
@@ -84,7 +84,7 @@ int test(void) {
 #ifdef D128
   knr(i128);
   // CHECKEXT: load i128
-  // CHECKEXT: call{{.*}} void (i64, i64, ...) @knr
+  // CHECKEXT: call{{.*}} void (i128, ...) @knr
 #endif
 
   knr(u32, s32, u16, s16, u8, s8);

Copy link
Contributor

@nikic nikic left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I haven't checked the implementation, but conceptually this looks fine. Passing as i128 instead of {i64, i64} is better for optimization purposes. Rust already passes these as i128, so we know that works, and the change will improve x-lang LTO. (Clang also passes as i128 already, but only for ones that are known to be passed in memory -- go figure.)

@nikic nikic requested review from rnk and efriedma-quic April 10, 2025 19:04
@efriedma-quic
Copy link
Collaborator

efriedma-quic commented Apr 10, 2025

There isn't any reason we can't do this... but it doesn't really solve the issue in general. And there are a lot of other ways to trip over this. CC @dwblaikie @hvdijk @jmorse

Do we want to try to handle _BitInt, or a struct containing an i128?

@hvdijk
Copy link
Contributor

hvdijk commented Apr 10, 2025

Both agreed with @nikic that this PR is a good idea and with @efriedma-quic that this does not solve the problem in general. I don't think the latter should stop this from being merged but it would be nice to find something that addresses all cases.

Another fix may be to generate DW_OP_LLVM_fragment expressions when removing the or

Maybe a bad suggestion, I have not looked into the implementation, but at the point where the i128 gets split into two i64s, could we already update the #dbg_value there to use fragments? At that point we have all the required information, and it sounds like it would avoid the problem since there is then no longer any larger-than-i64 debug info to cause poison?

return ABIArgInfo::getDirect();
}
}

// AMD64-ABI 3.2.3p4: Rule 1. Classify the return type with the
// classification algorithm.
X86_64ABIInfo::Class Lo, Hi;
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Probably remove the handling for int128 in classify. It's better to use classify as the single entry, but I can't get any idea.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We still need to handle i128 in structs.

We could maybe extend GetINTEGERTypeAtOffset to allow it to return i128 when appropriate. (That code is pretty fragile, though; see #76017.)

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I suppose you could make GetINTEGERTypeAtOffset return an i128 if it sees an i128 as the IRType and IROffset is 0 but then you need to check if it did that in classifyArgumentType and return immediately or make GetINTEGERTypeAtOffset return nullptr if the IROffset is >0 to prevent it from creating an aggregate argument? The latter seems like something one could easily trip over though

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Something like that, yes. Or maybe it's simpler to do it in a separate function; whatever's more readable.

@jmorse
Copy link
Member

jmorse commented Apr 11, 2025

There's precedent for splitting the debug-info into fragments early, we do something similar for C++ structured-bindings in SROA and a very similar process happens when the i128 argument is split up in instruction-selection.

Split-up locations become brittle to optimisations though: any source-expression that gets salvaged into a DIExpression can't be decomposed into two fragments, and we have to mark them optimised out. Any salvaging into DIExpression for an i128 will almost certainly fail when the value gets split up in isel. Plus I believe DWARF doesn't support values on the expression stack larger than the machine-address-size, but that's an assumption that can be bent.

Presumably the desire here is to improve the variable availability of parameters rather than fully supporting i128's through LLVM debug-info? If so, the least invasive approach may be to teach SROA to decompose into fragments similar to structured-bindings for the first variable location in the function. Or, passing an i128 instead works alright, I have no opinion on ABI matters.

(Paging @OCHyams as he worked on structured bindings; and @slinder1 as this is another scenario that might be easier with a typed expression stack)

@T0b1-iOS
Copy link
Author

Maybe a bad suggestion, I have not looked into the implementation, but at the point where the i128 gets split into two i64s, could we already update the #dbg_value there to use fragments? At that point we have all the required information, and it sounds like it would avoid the problem since there is then no longer any larger-than-i64 debug info to cause poison?

I think would be possible but currently the code that inserts the #dbg_declare records (CodeGenFunction::EmitParmDecl) is oblivious to the splitting. It only gets handed the i128 value that is loaded from the temporary {i64, i64} alloca (created in CodeGenFunction::EmitFunctionProlog) where both i64 arguments are stored into. Thus you would need to add debug handling code into the part that creates the temporary alloca which would require a refactoring of the way debug info metadata is created because the code responsible for that currently creates the #dbg_declare records together with the metadata (c.f. CGDebugInfo::EmitDeclare).
Passing as i128 seems like the better choice in that case.

To solve this problem in general, it would probably(?) be best to make SROA create fragment #dbg_value records if it decomposes an alloca into multiple IR values but I have no idea how difficult that would be to accomplish.

Presumably the desire here is to improve the variable availability of parameters rather than fully supporting i128's through LLVM debug-info

Yes, at least my primary motivation is for me to be able to retrieve the registers the parameter is passed in without having to rely on any crude heuristics.

@jmorse
Copy link
Member

jmorse commented Apr 23, 2025

To solve this problem in general, it would probably(?) be best to make SROA create fragment #dbg_value records if it decomposes an alloca into multiple IR values but I have no idea how difficult that would be to accomplish.

An example of this happening today with C++ structured bindings is here: https://godbolt.org/z/rn3os13M7 . Specifically, the i64 return value from "f", a squashed-together pair of i32s, is decomposed into two #dbg_values corresponding to the two variables that the structured-binding assigns to. Each #dbg_value refers to different IR Values, which seems similar to what's desired here. The major difference is that structured bindings have independent source variables contained within one alloca; wheras the situation here is one source variable being split into fragments.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
backend:X86 clang:codegen IR generation bugs: mangling, exceptions, etc. clang Clang issues not falling into any other category
Projects
None yet
Development

Successfully merging this pull request may close these issues.

7 participants