LowerTypeTests: Shrink check size by 1 instruction on x86. #142887

Open
wants to merge 1 commit into
base: users/pcc/spr/main.lowertypetests-shrink-check-size-by-1-instruction-on-x86
from

Conversation

pcc
Contributor

@pcc pcc commented Jun 5, 2025

We currently generate code like this on x86 for a jump table with 5 elements,
assuming the call target is in rbx:

lea global_addr(%rip), %rax # initialize temporary rax with base address
mov %rbx, %rcx # initialize another temporary rcx for index (rbx will be used for the call, so it is still live)
sub %rax, %rcx # compute address - base
ror $0x3, %rcx # compute (address - base) ror 3 i.e. index
cmp $0x4, %rcx # check index <= 4
ja .Ltrap
[...]
.Ltrap:
ud1

A more efficient instruction sequence, one that needs only a single
temporary register and one fewer instruction, is possible by subtracting
the address we are testing from the fixed address instead of vice versa:

lea (global_addr + 4*8)(%rip), %rax # initialize temporary rax with address of last element
sub %rbx, %rax # compute last element - address
ror $0x3, %rax # compute (last element - address) ror 3 i.e. 4 - index
cmp $0x4, %rax # check 4 - index <= 4 (same as above)
ja .Ltrap
[...]
.Ltrap:
ud1
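
To see why the two sequences accept the same addresses, the check can be modelled in plain C++. This is only an illustrative sketch, not code from the patch: `rotr64`, `oldCheck`, `newCheck` and `checksAgreeNear` are hypothetical names, and the table is assumed to have 5 entries of 8 bytes each, as in the example above. A misaligned address leaves nonzero low bits that the rotate moves into the high bits, and an out-of-range address yields either a large index or a wrapped-around difference, so both forms reject it.

```cpp
#include <cassert>
#include <cstdint>

// Rotate right by s bits, like x86 `ror`; requires 0 < s < 64.
inline uint64_t rotr64(uint64_t v, unsigned s) {
  assert(s > 0 && s < 64);
  return (v >> s) | (v << (64 - s));
}

// Old form: ror(address - base, alignLog2) <= sizeM1.
inline bool oldCheck(uint64_t addr, uint64_t base, unsigned alignLog2,
                     uint64_t sizeM1) {
  return rotr64(addr - base, alignLog2) <= sizeM1;
}

// New form: ror(last - address, alignLog2) <= sizeM1, where `last` is the
// address of the final element.
inline bool newCheck(uint64_t addr, uint64_t base, unsigned alignLog2,
                     uint64_t sizeM1) {
  uint64_t last = base + (sizeM1 << alignLog2);
  return rotr64(last - addr, alignLog2) <= sizeM1;
}

// Compares both forms for every byte address near a 5-entry table of
// 8-byte entries (alignLog2 = 3, sizeM1 = 4).
inline bool checksAgreeNear(uint64_t base) {
  for (uint64_t addr = base - 64; addr != base + 128; ++addr)
    if (oldCheck(addr, base, 3, 4) != newCheck(addr, base, 3, 4))
      return false;
  return true;
}
```

Calling `checksAgreeNear(0x1000)` from a test harness exercises aligned, misaligned, below-range and above-range addresses around an assumed table base of `0x1000`.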

Change LowerTypeTests to generate that sequence. As a consequence, the
order of bits in the bitsets is reversed. Because it doesn't matter how we
do the subtraction on other architectures (to the best of my knowledge),
do so unconditionally.
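
The bit-order reversal can be illustrated with a simplified stand-in for the bitset. The sketch below is not LLVM's `BitSetInfo`: `MiniBitSet`, `buildMiniBitSet` and `runtimeIndex` are hypothetical names, and the alignment is passed in rather than derived from the offsets. It shows that storing bit `BitSize - 1 - index` keeps the stored bits consistent with the index the new runtime check computes, which counts down from the last element.

```cpp
#include <algorithm>
#include <cassert>
#include <cstdint>
#include <set>
#include <vector>

// Simplified model of a type-test bitset (assumption: not the real LLVM type).
struct MiniBitSet {
  std::set<uint64_t> Bits;  // stored in reversed order
  uint64_t ByteOffset = 0;  // offset of the first member
  uint64_t BitSize = 0;     // number of aligned slots covered
  unsigned AlignLog2 = 0;

  // Compile-time style query: is `Offset` a member of the set?
  bool containsGlobalOffset(uint64_t Offset) const {
    if (Offset < ByteOffset)
      return false;
    uint64_t Rel = Offset - ByteOffset;
    if (Rel & ((uint64_t(1) << AlignLog2) - 1))
      return false;  // misaligned
    uint64_t BitOffset = Rel >> AlignLog2;
    if (BitOffset >= BitSize)
      return false;
    return Bits.count(BitSize - 1 - BitOffset) != 0;
  }

  // Index the new runtime sequence computes: (last element - address),
  // scaled down by the alignment. Valid for in-range, aligned offsets.
  uint64_t runtimeIndex(uint64_t Offset) const {
    uint64_t Last = ByteOffset + ((BitSize - 1) << AlignLog2);
    return (Last - Offset) >> AlignLog2;
  }
};

// Builds the set from member offsets that are multiples of 1 << AlignLog2.
inline MiniBitSet buildMiniBitSet(const std::vector<uint64_t> &Offsets,
                                  unsigned AlignLog2) {
  assert(!Offsets.empty());
  auto MinMax = std::minmax_element(Offsets.begin(), Offsets.end());
  uint64_t Min = *MinMax.first, Max = *MinMax.second;

  MiniBitSet BSI;
  BSI.ByteOffset = Min;
  BSI.AlignLog2 = AlignLog2;
  BSI.BitSize = ((Max - Min) >> AlignLog2) + 1;
  for (uint64_t Offset : Offsets)
    BSI.Bits.insert(BSI.BitSize - 1 - ((Offset - Min) >> AlignLog2));
  return BSI;
}
```

For members at offsets 0, 16 and 32 with 8-byte alignment, the stored bits are 4, 2 and 0, and `runtimeIndex` maps each member offset back to its stored bit.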

Created using spr 1.3.6-beta.1
@pcc pcc requested a review from fmayer June 5, 2025 02:11
@llvmbot
Copy link

Member

llvmbot commented Jun 5, 2025

@llvm/pr-subscribers-llvm-transforms

Author: Peter Collingbourne (pcc)

Changes

Patch is 28.53 KiB, truncated to 20.00 KiB below, full version: https://github.com/llvm/llvm-project/pull/142887.diff

13 Files Affected:

  • (modified) llvm/lib/Transforms/IPO/LowerTypeTests.cpp (+19-7)
  • (modified) llvm/test/Transforms/LowerTypeTests/aarch64-jumptable.ll (+1-1)
  • (modified) llvm/test/Transforms/LowerTypeTests/export-allones.ll (+2-2)
  • (modified) llvm/test/Transforms/LowerTypeTests/export-bytearray.ll (+2-2)
  • (modified) llvm/test/Transforms/LowerTypeTests/export-icall.ll (+1-1)
  • (modified) llvm/test/Transforms/LowerTypeTests/export-inline.ll (+2-2)
  • (modified) llvm/test/Transforms/LowerTypeTests/function.ll (+20-22)
  • (modified) llvm/test/Transforms/LowerTypeTests/import.ll (+12-12)
  • (modified) llvm/test/Transforms/LowerTypeTests/simple.ll (+4-4)
  • (modified) llvm/test/Transforms/LowerTypeTests/simplify.ll (+1-1)
  • (modified) llvm/test/Transforms/MergeFunc/cfi-thunk-merging.ll (+1-1)
  • (modified) llvm/test/Transforms/SimplifyTypeTests/basic.ll (+1-1)
  • (modified) llvm/unittests/Transforms/IPO/LowerTypeTests.cpp (+4-4)
diff --git a/llvm/lib/Transforms/IPO/LowerTypeTests.cpp b/llvm/lib/Transforms/IPO/LowerTypeTests.cpp
index edddc52fa950f..156f26da982e7 100644
--- a/llvm/lib/Transforms/IPO/LowerTypeTests.cpp
+++ b/llvm/lib/Transforms/IPO/LowerTypeTests.cpp
@@ -143,7 +143,7 @@ bool BitSetInfo::containsGlobalOffset(uint64_t Offset) const {
   if (BitOffset >= BitSize)
     return false;
 
-  return Bits.count(BitOffset);
+  return Bits.count(BitSize - 1 - BitOffset);
 }
 
 void BitSetInfo::print(raw_ostream &OS) const {
@@ -188,7 +188,11 @@ BitSetInfo BitSetBuilder::build() {
   BSI.BitSize = ((Max - Min) >> BSI.AlignLog2) + 1;
   for (uint64_t Offset : Offsets) {
     Offset >>= BSI.AlignLog2;
-    BSI.Bits.insert(Offset);
+    // We invert the order of bits when adding them to the bitset. This is
+    // because the offset that we test against is computed by subtracting the
+    // address that we are testing from the global's address, which means that
+    // the offset increases as the tested address decreases.
+    BSI.Bits.insert(BSI.BitSize - 1 - Offset);
   }
 
   return BSI;
@@ -465,7 +469,8 @@ class LowerTypeTestsModule {
   struct TypeIdLowering {
     TypeTestResolution::Kind TheKind = TypeTestResolution::Unsat;
 
-    /// All except Unsat: the start address within the combined global.
+    /// All except Unsat: the address of the last element within the combined
+    /// global.
     Constant *OffsetedGlobal;
 
     /// ByteArray, Inline, AllOnes: log2 of the required global alignment
@@ -772,7 +777,11 @@ Value *LowerTypeTestsModule::lowerTypeTestCall(Metadata *TypeId, CallInst *CI,
   if (TIL.TheKind == TypeTestResolution::Single)
     return B.CreateICmpEQ(PtrAsInt, OffsetedGlobalAsInt);
 
-  Value *PtrOffset = B.CreateSub(PtrAsInt, OffsetedGlobalAsInt);
+  // Here we compute `last element - address`. The reason why we do this instead
+  // of computing `address - first element` is that it leads to a slightly
+  // shorter instruction sequence on x86. Because it doesn't matter how we do
+  // the subtraction on other architectures, we do so unconditionally.
+  Value *PtrOffset = B.CreateSub(OffsetedGlobalAsInt, PtrAsInt);
 
   // We need to check that the offset both falls within our range and is
   // suitably aligned. We can check both properties at the same time by
@@ -1154,8 +1163,11 @@ void LowerTypeTestsModule::lowerTypeTestCalls(
 
     ByteArrayInfo *BAI = nullptr;
     TypeIdLowering TIL;
+
+    uint64_t GlobalOffset =
+        BSI.ByteOffset + ((BSI.BitSize - 1) << BSI.AlignLog2);
     TIL.OffsetedGlobal = ConstantExpr::getGetElementPtr(
-        Int8Ty, CombinedGlobalAddr, ConstantInt::get(IntPtrTy, BSI.ByteOffset)),
+        Int8Ty, CombinedGlobalAddr, ConstantInt::get(IntPtrTy, GlobalOffset)),
     TIL.AlignLog2 = ConstantInt::get(IntPtrTy, BSI.AlignLog2);
     TIL.SizeM1 = ConstantInt::get(IntPtrTy, BSI.BitSize - 1);
     if (BSI.isAllOnes()) {
@@ -2531,9 +2543,9 @@ PreservedAnalyses SimplifyTypeTestsPass::run(Module &M,
         continue;
       for (Use &U : make_early_inc_range(CE->uses())) {
         auto *CE = dyn_cast<ConstantExpr>(U.getUser());
-        if (U.getOperandNo() == 1 && CE &&
+        if (U.getOperandNo() == 0 && CE &&
             CE->getOpcode() == Instruction::Sub &&
-            MaySimplifyInt(CE->getOperand(0))) {
+            MaySimplifyInt(CE->getOperand(1))) {
           // This is a computation of PtrOffset as generated by
           // LowerTypeTestsModule::lowerTypeTestCall above. If
           // isKnownTypeIdMember passes we just pretend it evaluated to 0. This
diff --git a/llvm/test/Transforms/LowerTypeTests/aarch64-jumptable.ll b/llvm/test/Transforms/LowerTypeTests/aarch64-jumptable.ll
index c932236dffacb..8a90174bb3ff1 100644
--- a/llvm/test/Transforms/LowerTypeTests/aarch64-jumptable.ll
+++ b/llvm/test/Transforms/LowerTypeTests/aarch64-jumptable.ll
@@ -41,7 +41,7 @@ define i1 @foo(ptr %p) {
 ; AARCH64-LABEL: define i1 @foo
 ; AARCH64-SAME: (ptr [[P:%.*]]) {
 ; AARCH64-NEXT:    [[TMP1:%.*]] = ptrtoint ptr [[P]] to i64
-; AARCH64-NEXT:    [[TMP2:%.*]] = sub i64 [[TMP1]], ptrtoint (ptr @.cfi.jumptable to i64)
+; AARCH64-NEXT:    [[TMP2:%.*]] = sub i64 ptrtoint (ptr getelementptr (i8, ptr @.cfi.jumptable, i64 8) to i64), [[TMP1]]
 ; AARCH64-NEXT:    [[TMP3:%.*]] = call i64 @llvm.fshr.i64(i64 [[TMP2]], i64 [[TMP2]], i64 3)
 ; AARCH64-NEXT:    [[TMP4:%.*]] = icmp ule i64 [[TMP3]], 1
 ; AARCH64-NEXT:    ret i1 [[TMP4]]
diff --git a/llvm/test/Transforms/LowerTypeTests/export-allones.ll b/llvm/test/Transforms/LowerTypeTests/export-allones.ll
index 908c9320b039a..b5b4f8c5360c1 100644
--- a/llvm/test/Transforms/LowerTypeTests/export-allones.ll
+++ b/llvm/test/Transforms/LowerTypeTests/export-allones.ll
@@ -141,11 +141,11 @@
 
 ; CHECK: [[G:@[0-9]+]] = private constant { [2048 x i8] } zeroinitializer
 
-; CHECK: @__typeid_typeid1_global_addr = hidden alias i8, ptr [[G]]
+; CHECK: @__typeid_typeid1_global_addr = hidden alias i8, getelementptr (i8, ptr [[G]], i64 2)
 ; X86: @__typeid_typeid1_align = hidden alias i8, inttoptr (i64 1 to ptr)
 ; X86: @__typeid_typeid1_size_m1 = hidden alias i8, inttoptr (i64 1 to ptr)
 
-; CHECK: @__typeid_typeid2_global_addr = hidden alias i8, getelementptr (i8, ptr [[G]], i64 4)
+; CHECK: @__typeid_typeid2_global_addr = hidden alias i8, getelementptr (i8, ptr [[G]], i64 516)
 ; X86: @__typeid_typeid2_align = hidden alias i8, inttoptr (i64 2 to ptr)
 ; X86: @__typeid_typeid2_size_m1 = hidden alias i8, inttoptr (i64 128 to ptr)
 
diff --git a/llvm/test/Transforms/LowerTypeTests/export-bytearray.ll b/llvm/test/Transforms/LowerTypeTests/export-bytearray.ll
index 0ef9f584f767c..89b3f0663be1d 100644
--- a/llvm/test/Transforms/LowerTypeTests/export-bytearray.ll
+++ b/llvm/test/Transforms/LowerTypeTests/export-bytearray.ll
@@ -14,13 +14,13 @@
 ; CHECK: [[G:@[0-9]+]] = private constant { [2048 x i8] } zeroinitializer
 ; CHECK: [[B:@[0-9]+]] = private constant [258 x i8] c"\03\00\00\00\00\00\00\00\00\00\00\00\00\00\00\00\00\00\00\00\00\00\00\00\00\00\00\00\00\00\00\00\00\00\00\00\00\00\00\00\00\00\00\00\00\00\00\00\00\00\00\00\00\00\00\00\00\00\00\00\00\00\00\00\00\02\00\00\00\00\00\00\00\00\00\00\00\00\00\00\00\00\00\00\00\00\00\00\00\00\00\00\00\00\00\00\00\00\00\00\00\00\00\00\00\00\00\00\00\00\00\00\00\00\00\00\00\00\00\00\00\00\00\00\00\00\00\00\00\00\00\00\00\00\00\00\00\00\00\00\00\00\00\00\00\00\00\00\00\00\00\00\00\00\00\00\00\00\00\00\00\00\00\00\00\00\00\00\00\00\00\00\00\00\00\00\00\00\00\00\00\00\00\00\00\00\00\00\00\00\00\00\00\00\00\00\00\00\00\00\00\00\00\00\00\00\00\00\00\00\00\00\00\00\00\00\00\00\00\00\00\00\00\00\00\00\00\00\00\00\00\00\00\00\00\00\00\00\00\00\00\00\00\00\00\00\00\00\00\00\00\00\00\00\00\00\00\01"
 
-; CHECK: @__typeid_typeid1_global_addr = hidden alias i8, ptr [[G]]
+; CHECK: @__typeid_typeid1_global_addr = hidden alias i8, getelementptr (i8, ptr [[G]], i64 130)
 ; X86: @__typeid_typeid1_align = hidden alias i8, inttoptr (i64 1 to ptr)
 ; X86: @__typeid_typeid1_size_m1 = hidden alias i8, inttoptr (i64 65 to ptr)
 ; CHECK: @__typeid_typeid1_byte_array = hidden alias i8, ptr @bits.1
 ; X86: @__typeid_typeid1_bit_mask = hidden alias i8, inttoptr (i8 2 to ptr)
 
-; CHECK: @__typeid_typeid2_global_addr = hidden alias i8, getelementptr (i8, ptr [[G]], i64 4)
+; CHECK: @__typeid_typeid2_global_addr = hidden alias i8, getelementptr (i8, ptr [[G]], i64 1032)
 ; X86: @__typeid_typeid2_align = hidden alias i8, inttoptr (i64 2 to ptr)
 ; X86: @__typeid_typeid2_size_m1 = hidden alias i8, inttoptr (i64 257 to ptr)
 ; CHECK: @__typeid_typeid2_byte_array = hidden alias i8, ptr @bits
diff --git a/llvm/test/Transforms/LowerTypeTests/export-icall.ll b/llvm/test/Transforms/LowerTypeTests/export-icall.ll
index abd4097865907..47156deb57de7 100644
--- a/llvm/test/Transforms/LowerTypeTests/export-icall.ll
+++ b/llvm/test/Transforms/LowerTypeTests/export-icall.ll
@@ -36,7 +36,7 @@ define void @f3(i32 %x) !type !8 {
 !8 = !{i64 0, !"typeid3"}
 
 
-; CHECK-DAG: @__typeid_typeid1_global_addr = hidden alias i8, ptr [[JT1:.*]]
+; CHECK-DAG: @__typeid_typeid1_global_addr = hidden alias i8, getelementptr (i8, ptr [[JT1:.*]], i64 32)
 ; CHECK-DAG: @__typeid_typeid1_align = hidden alias i8, inttoptr (i64 3 to ptr)
 ; CHECK-DAG: @__typeid_typeid1_size_m1 = hidden alias i8, inttoptr (i64 4 to ptr)
 
diff --git a/llvm/test/Transforms/LowerTypeTests/export-inline.ll b/llvm/test/Transforms/LowerTypeTests/export-inline.ll
index 23d96a0c86840..956f0e3bfbbf1 100644
--- a/llvm/test/Transforms/LowerTypeTests/export-inline.ll
+++ b/llvm/test/Transforms/LowerTypeTests/export-inline.ll
@@ -13,12 +13,12 @@
 
 ; CHECK: [[G:@[0-9]+]] = private constant { [2048 x i8] } zeroinitializer
 
-; CHECK: @__typeid_typeid1_global_addr = hidden alias i8, ptr [[G]]
+; CHECK: @__typeid_typeid1_global_addr = hidden alias i8, getelementptr (i8, ptr [[G]], i64 6)
 ; CHECK-X86: @__typeid_typeid1_align = hidden alias i8, inttoptr (i8 1 to ptr)
 ; CHECK-X86: @__typeid_typeid1_size_m1 = hidden alias i8, inttoptr (i64 3 to ptr)
 ; CHECK-X86: @__typeid_typeid1_inline_bits = hidden alias i8, inttoptr (i32 9 to ptr)
 
-; CHECK: @__typeid_typeid2_global_addr = hidden alias i8, getelementptr (i8, ptr [[G]], i64 4)
+; CHECK: @__typeid_typeid2_global_addr = hidden alias i8, getelementptr (i8, ptr [[G]], i64 136)
 ; CHECK-X86: @__typeid_typeid2_align = hidden alias i8, inttoptr (i8 2 to ptr)
 ; CHECK-X86: @__typeid_typeid2_size_m1 = hidden alias i8, inttoptr (i64 33 to ptr)
 ; CHECK-X86: @__typeid_typeid2_inline_bits = hidden alias i8, inttoptr (i64 8589934593 to ptr)
diff --git a/llvm/test/Transforms/LowerTypeTests/function.ll b/llvm/test/Transforms/LowerTypeTests/function.ll
index f80e99ebfba2c..5b0852c82ea68 100644
--- a/llvm/test/Transforms/LowerTypeTests/function.ll
+++ b/llvm/test/Transforms/LowerTypeTests/function.ll
@@ -1,21 +1,21 @@
-; RUN: opt -S -passes=lowertypetests -mtriple=i686-unknown-linux-gnu %s | FileCheck --check-prefixes=X86,X86-LINUX,NATIVE %s
-; RUN: opt -S -passes=lowertypetests -mtriple=x86_64-unknown-linux-gnu %s | FileCheck --check-prefixes=X86,X86-LINUX,NATIVE %s
-; RUN: opt -S -passes=lowertypetests -mtriple=i686-pc-win32 %s | FileCheck --check-prefixes=X86,X86-WIN32,NATIVE %s
-; RUN: opt -S -passes=lowertypetests -mtriple=x86_64-pc-win32 %s | FileCheck --check-prefixes=X86,X86-WIN32,NATIVE %s
-; RUN: opt -S -passes=lowertypetests -mtriple=riscv32-unknown-linux-gnu %s | FileCheck --check-prefixes=RISCV,NATIVE %s
-; RUN: opt -S -passes=lowertypetests -mtriple=riscv64-unknown-linux-gnu %s | FileCheck --check-prefixes=RISCV,NATIVE %s
+; RUN: opt -S -passes=lowertypetests -mtriple=i686-unknown-linux-gnu %s | FileCheck --check-prefixes=X86,X86-LINUX,NATIVE,JT8 %s
+; RUN: opt -S -passes=lowertypetests -mtriple=x86_64-unknown-linux-gnu %s | FileCheck --check-prefixes=X86,X86-LINUX,NATIVE,JT8 %s
+; RUN: opt -S -passes=lowertypetests -mtriple=i686-pc-win32 %s | FileCheck --check-prefixes=X86,X86-WIN32,NATIVE,JT8 %s
+; RUN: opt -S -passes=lowertypetests -mtriple=x86_64-pc-win32 %s | FileCheck --check-prefixes=X86,X86-WIN32,NATIVE,JT8 %s
+; RUN: opt -S -passes=lowertypetests -mtriple=riscv32-unknown-linux-gnu %s | FileCheck --check-prefixes=RISCV,NATIVE,JT8 %s
+; RUN: opt -S -passes=lowertypetests -mtriple=riscv64-unknown-linux-gnu %s | FileCheck --check-prefixes=RISCV,NATIVE,JT8 %s
 ; RUN: opt -S -passes=lowertypetests -mtriple=wasm32-unknown-unknown %s | FileCheck --check-prefix=WASM32 %s
-; RUN: opt -S -passes=lowertypetests -mtriple=loongarch64-unknown-linux-gnu %s | FileCheck --check-prefixes=LOONGARCH64,NATIVE %s
+; RUN: opt -S -passes=lowertypetests -mtriple=loongarch64-unknown-linux-gnu %s | FileCheck --check-prefixes=LOONGARCH64,NATIVE,JT8 %s
 
 ; The right format for Arm jump tables depends on the selected
 ; subtarget, so we can't get these tests right without the Arm target
 ; compiled in.
-; RUN: %if arm-registered-target %{ opt -S -passes=lowertypetests -mtriple=arm-unknown-linux-gnu %s | FileCheck --check-prefixes=ARM,NATIVE %s %}
-; RUN: %if arm-registered-target %{ opt -S -passes=lowertypetests -mtriple=thumbv7m-unknown-linux-gnu %s | FileCheck --check-prefixes=THUMB,NATIVE %s %}
-; RUN: %if arm-registered-target %{ opt -S -passes=lowertypetests -mtriple=thumbv8m.base-unknown-linux-gnu %s | FileCheck --check-prefixes=THUMB,NATIVE %s %}
-; RUN: %if arm-registered-target %{ opt -S -passes=lowertypetests -mtriple=thumbv6m-unknown-linux-gnu %s | FileCheck --check-prefixes=THUMBV6M,NATIVE %s %}
-; RUN: %if arm-registered-target %{ opt -S -passes=lowertypetests -mtriple=thumbv5-unknown-linux-gnu %s | FileCheck --check-prefixes=ARM,NATIVE %s %}
-; RUN: %if arm-registered-target %{ opt -S -passes=lowertypetests -mtriple=aarch64-unknown-linux-gnu %s | FileCheck --check-prefixes=ARM,NATIVE %s %}
+; RUN: %if arm-registered-target %{ opt -S -passes=lowertypetests -mtriple=arm-unknown-linux-gnu %s | FileCheck --check-prefixes=ARM,NATIVE,JT4 %s %}
+; RUN: %if arm-registered-target %{ opt -S -passes=lowertypetests -mtriple=thumbv7m-unknown-linux-gnu %s | FileCheck --check-prefixes=THUMB,NATIVE,JT4 %s %}
+; RUN: %if arm-registered-target %{ opt -S -passes=lowertypetests -mtriple=thumbv8m.base-unknown-linux-gnu %s | FileCheck --check-prefixes=THUMB,NATIVE,JT4 %s %}
+; RUN: %if arm-registered-target %{ opt -S -passes=lowertypetests -mtriple=thumbv6m-unknown-linux-gnu %s | FileCheck --check-prefixes=THUMBV6M,NATIVE,JT16 %s %}
+; RUN: %if arm-registered-target %{ opt -S -passes=lowertypetests -mtriple=thumbv5-unknown-linux-gnu %s | FileCheck --check-prefixes=ARM,NATIVE,JT4 %s %}
+; RUN: %if arm-registered-target %{ opt -S -passes=lowertypetests -mtriple=aarch64-unknown-linux-gnu %s | FileCheck --check-prefixes=ARM,NATIVE,JT4 %s %}
 
 ; Tests that we correctly handle bitsets containing 2 or more functions.
 
@@ -54,20 +54,18 @@ define internal void @g() !type !0 {
 declare i1 @llvm.type.test(ptr %ptr, metadata %bitset) noinline readnone
 
 define i1 @foo(ptr %p) {
-  ; NATIVE: sub i64 {{.*}}, ptrtoint (ptr @[[JT]] to i64)
-  ; WASM32: sub i64 {{.*}}, ptrtoint (ptr getelementptr (i8, ptr null, i64 1) to i64)
+  ; JT4: sub i64 ptrtoint (ptr getelementptr (i8, ptr @[[JT]], i64 4) to i64), {{.*}}
+  ; JT8: sub i64 ptrtoint (ptr getelementptr (i8, ptr @[[JT]], i64 8) to i64), {{.*}}
+  ; JT16: sub i64 ptrtoint (ptr getelementptr (i8, ptr @[[JT]], i64 16) to i64), {{.*}}
+  ; WASM32: sub i64 ptrtoint (ptr getelementptr (i8, ptr null, i64 2) to i64), {{.*}}
   ; WASM32: icmp ule i64 {{.*}}, 1
   %x = call i1 @llvm.type.test(ptr %p, metadata !"typeid1")
   ret i1 %x
 }
 
-; X86-LINUX:   define private void @[[JT]]() #[[ATTR:.*]] align 8 {
-; X86-WIN32:   define private void @[[JT]]() #[[ATTR:.*]] align 8 {
-; ARM:         define private void @[[JT]]() #[[ATTR:.*]] align 4 {
-; THUMB:       define private void @[[JT]]() #[[ATTR:.*]] align 4 {
-; THUMBV6M:    define private void @[[JT]]() #[[ATTR:.*]] align 16 {
-; RISCV:       define private void @[[JT]]() #[[ATTR:.*]] align 8 {
-; LOONGARCH64: define private void @[[JT]]() #[[ATTR:.*]] align 8 {
+; JT4:  define private void @[[JT]]() #[[ATTR:.*]] align 4 {
+; JT8:  define private void @[[JT]]() #[[ATTR:.*]] align 8 {
+; JT16: define private void @[[JT]]() #[[ATTR:.*]] align 16 {
 
 ; X86:      jmp ${0:c}@plt
 ; X86-SAME: int3
diff --git a/llvm/test/Transforms/LowerTypeTests/import.ll b/llvm/test/Transforms/LowerTypeTests/import.ll
index 31b4f20e6fd7c..819ede96f997e 100644
--- a/llvm/test/Transforms/LowerTypeTests/import.ll
+++ b/llvm/test/Transforms/LowerTypeTests/import.ll
@@ -36,7 +36,7 @@ define i1 @allones7(ptr %p) {
 ; X86-LABEL: define i1 @allones7(
 ; X86-SAME: ptr [[P:%.*]]) {
 ; X86-NEXT:    [[TMP1:%.*]] = ptrtoint ptr [[P]] to i64
-; X86-NEXT:    [[TMP2:%.*]] = sub i64 [[TMP1]], ptrtoint (ptr @__typeid_allones7_global_addr to i64)
+; X86-NEXT:    [[TMP2:%.*]] = sub i64 ptrtoint (ptr @__typeid_allones7_global_addr to i64), [[TMP1]]
 ; X86-NEXT:    [[TMP7:%.*]] = call i64 @llvm.fshr.i64(i64 [[TMP2]], i64 [[TMP2]], i64 ptrtoint (ptr @__typeid_allones7_align to i64))
 ; X86-NEXT:    [[TMP8:%.*]] = icmp ule i64 [[TMP7]], ptrtoint (ptr @__typeid_allones7_size_m1 to i64)
 ; X86-NEXT:    ret i1 [[TMP8]]
@@ -44,7 +44,7 @@ define i1 @allones7(ptr %p) {
 ; ARM-LABEL: define i1 @allones7(
 ; ARM-SAME: ptr [[P:%.*]]) {
 ; ARM-NEXT:    [[TMP1:%.*]] = ptrtoint ptr [[P]] to i64
-; ARM-NEXT:    [[TMP2:%.*]] = sub i64 [[TMP1]], ptrtoint (ptr @__typeid_allones7_global_addr to i64)
+; ARM-NEXT:    [[TMP2:%.*]] = sub i64 ptrtoint (ptr @__typeid_allones7_global_addr to i64), [[TMP1]]
 ; ARM-NEXT:    [[TMP5:%.*]] = call i64 @llvm.fshr.i64(i64 [[TMP2]], i64 [[TMP2]], i64 1)
 ; ARM-NEXT:    [[TMP6:%.*]] = icmp ule i64 [[TMP5]], 42
 ; ARM-NEXT:    ret i1 [[TMP6]]
@@ -57,7 +57,7 @@ define i1 @allones32(ptr %p) {
 ; X86-LABEL: define i1 @allones32(
 ; X86-SAME: ptr [[P:%.*]]) {
 ; X86-NEXT:    [[TMP1:%.*]] = ptrtoint ptr [[P]] to i64
-; X86-NEXT:    [[TMP2:%.*]] = sub i64 [[TMP1]], ptrtoint (ptr @__typeid_allones32_global_addr to i64)
+; X86-NEXT:    [[TMP2:%.*]] = sub i64 ptrtoint (ptr @__typeid_allones32_global_addr to i64), [[TMP1]]
 ; X86-NEXT:    [[TMP7:%.*]] = call i64 @llvm.fshr.i64(i64 [[TMP2]], i64 [[TMP2]], i64 ptrtoint (ptr @__typeid_allones32_align to i64))
 ; X86-NEXT:    [[TMP8:%.*]] = icmp ule i64 [[TMP7]], ptrtoint (ptr @__typeid_allones32_size_m1 to i64)
 ; X86-NEXT:    ret i1 [[TMP8]]
@@ -65,7 +65,7 @@ define i1 @allones32(ptr %p) {
 ; ARM-LABEL: define i1 @allones32(
 ; ARM-SAME: ptr [[P:%.*]]) {
 ; ARM-NEXT:    [[TMP1:%.*]] = ptrtoint ptr [[P]] to i64
-; ARM-NEXT:    [[TMP2:%.*]] = sub i64 [[TMP1]], ptrtoint (ptr @__typeid_allones32_global_addr to i64)
+; ARM-NEXT:    [[TMP2:%.*]] = sub i64 ptrtoint (ptr @__typeid_allones32_global_addr to i64), [[TMP1]]
 ; ARM-NEXT:    [[TMP5:%.*]] = call i64 @llvm.fshr.i64(i64 [[TMP2]], i64 [[TMP2]], i64 2)
 ; ARM-NEXT:    [[TMP6:%.*]] = icmp ule i64 [[TMP5]], 12345
 ; ARM-NEXT:    ret i1 [[TMP6]]
@@ -78,7 +78,7 @@ define i1 @bytearray7(ptr %p) {
 ; X86-LABEL: define i1 @bytearray7(
 ; X86-SAME: ptr [[P:%.*]]) {
 ; X86-NEXT:    [[TMP1:%.*]] = ptrtoint ptr [[P]] to i64
-; X86-NEXT:    [[TMP2:%.*]] = sub i64 [[TMP1]], ptrtoint (ptr @__typeid_bytearray7_global_addr to i64)
+; X86-NEXT:    [[TMP2:%.*]] = sub i64 ptrtoint (ptr @__typeid_bytearray7_global_addr to i64), [[TMP1]]
 ; X86-NEXT:    [[TMP7:%.*]] = call i64 @llvm.fshr.i64(i64 [[TMP2]], i64 [[TMP2]], i64 ptrtoint (ptr @__typeid_bytearray7_align to i64))
 ; X86-NEXT:    [[TMP8:%.*]] = icmp ule i64 [[TMP7]], ptrtoint (ptr @__typeid_bytearray7_size_m1 to i64)
 ; X86-NEXT:    br i1 [[TMP8]], label %[[TMP9:.*]], label %[[TMP14:.*]]
@@ -95,7 +95,7 @@ define i1 @bytearray7(ptr %p) {
 ; ARM-LABEL: define i1 @bytearray7(
 ; ARM-SAME: ptr [[P:%.*]]) {
 ; ARM-NEXT:    [[TMP1:%.*]] = ptrtoint ptr [[P]] to i64
-; ARM-NEXT:    [[TMP2:%.*]] = sub i64 [[TMP1]], ptrtoint (ptr @__typeid_bytearray7_global_addr to i64)
+; ARM-NEXT:    [[TMP2:%.*]] = sub i64 ptrtoint (ptr @__typeid_bytearray7_global_addr to i64), [[TMP1]]
 ; ARM-NEXT:    [[TMP5:%.*]] = call i64 @llvm.fshr.i64(i64 [[TMP2]], i64 [[TMP2]], i64 3)
 ; ARM-NEXT:    [[TMP6:%.*]] = icmp ule i64 [[TMP5]], 43
 ; ARM-NEXT:    br i1 [[TMP6]], label [[TMP7:%.*]], label [[TMP12:%.*]]
@@ -117,7 +117,7 @@ define i1 @bytearray32(ptr %p) {
 ; X86-LABEL: define i1 @bytearray32(
 ; X86-SAME: ptr [[P:%.*]]) {
 ; X86-NEXT:    [[TMP1:%.*]] = ptrtoint ptr [[P]] to i64
-; X86-NEXT:    [[TMP2:%.*]] = sub i64 [[TMP1]], ptrtoint (ptr @__typeid_bytearray32_global_addr to i64)
+; X86-NEXT:    [[TMP2:%.*]] = sub i64 ptrtoint (ptr @__typeid_bytearray32_global_addr to i64), [[TMP1]]
 ; X86-NEXT:    [[TMP7:%.*]] = call i64 @llvm.fshr.i64(i64 [[TMP2]], i64 [[TMP2]], i64 ptrtoint (ptr @__typeid_bytearray32_align to i64))
 ; X86-NEXT:    [[TMP8:%.*]] = icmp ule i64 [[TMP7]], ptrtoint (ptr @__typeid_bytearray32_size_m1 to i64)
 ; X86-NEXT:    br i1 [[TMP8]], label %[[TMP9:.*]], label %[[TMP14:.*]]
@@ -134,7 +134,7 @@ define i1 @bytearray32(ptr %p) {
 ; ARM-LABEL: define i1 @bytearray32(
 ; ARM-SAME: ptr [[P:%.*]]) {
 ; ARM-NEXT:    [[TMP1:%.*]] = ptrtoint ptr [[P]] to i64
-; ARM-NEXT:    [[TMP2:%.*]] = sub i64 [[TMP1]], ptrtoint (ptr @__typeid_bytearray32_global_addr to i64)
+; ARM-NEXT:    [[TMP2:%.*]] = sub i64 ptrtoint (ptr @__typeid_bytearray32_global_addr to i64), [[TMP1]]
 ; ARM-NEXT:    [[TMP5:%.*]] = call i64 @llvm.fshr.i64(i64 [[TMP2]], i64 [[TMP2]], i64 4)
 ; ARM-NEXT:    [[TMP6:%.*]] = icmp ule i64 [[TMP...
[truncated]
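
The new `global_addr` offsets expected by the export tests follow from the `GlobalOffset` expression in the patch, `ByteOffset + ((BitSize - 1) << AlignLog2)`. The sketch below recomputes them as a sanity check. The helper names `lastElementOffset` and `checkExportOffsets` are hypothetical, and the inputs are read off the test expectations shown in the diff: the old `global_addr` offset as `ByteOffset`, `size_m1 + 1` as `BitSize`, and the exported `align` value as `AlignLog2`.

```cpp
#include <cassert>
#include <cstdint>

// Byte offset of the last element within the combined global, mirroring the
// GlobalOffset expression in the patch.
constexpr uint64_t lastElementOffset(uint64_t byteOffset, uint64_t bitSize,
                                     unsigned alignLog2) {
  return byteOffset + ((bitSize - 1) << alignLog2);
}

// Recomputes the offsets that appear in the updated CHECK lines.
inline void checkExportOffsets() {
  // export-allones.ll
  assert(lastElementOffset(0, 2, 1) == 2);
  assert(lastElementOffset(4, 129, 2) == 516);
  // export-bytearray.ll
  assert(lastElementOffset(0, 66, 1) == 130);
  assert(lastElementOffset(4, 258, 2) == 1032);
  // export-icall.ll
  assert(lastElementOffset(0, 5, 3) == 32);
  // export-inline.ll
  assert(lastElementOffset(0, 4, 1) == 6);
  assert(lastElementOffset(4, 34, 2) == 136);
}
```

The `export-icall.ll` case matches the jump-table example in the description: 5 entries of 8 bytes put the last element at byte offset 32, i.e. `global_addr + 4*8`.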

@pcc pcc requested a review from vitalybuka June 5, 2025 02:12
Contributor

@fmayer fmayer left a comment

Could we have a test that demonstrates the new, better instruction sequence (by precommitting it to show the diff here)?
