Skip to content

InstCombine: Fix a crash in PointerReplacer when constructing a new PHI #130256

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged
merged 4 commits into from
Mar 10, 2025

Conversation

changpeng
Copy link
Contributor

@changpeng changpeng commented Mar 7, 2025

When constructing a PHI node in PointerReplacer::replace, the incoming operands are expected to have already been replaced and in the replacement map. However, when one of the incoming operands is a load, the search of the map is unsuccessful, and a nullptr is returned from getReplacement. The reason is that, when a load is replaced, all the uses of the load has been actually replaced by the new load. It is useless to insert the original load into the map. Instead, we should place the new load into the map to meet the expectation of the later map search.

Fixes: SWDEV-516420

  When constructing a PHI node in `PointerReplacer::replace`, the imcoming operands
are expected to have already been replaced and in the replacement map. However, when
one of the incoming operands is a load, the search of the map is unsuccessful, and a
nullptr is returned from `getReplacement`. The reason is that, when a load is replaced,
all the uses of the load has been actually replaced by the new load. It is useless to
insert the original load into the map. Instead, we should place the new load into the
map to meet the expectation of the later map search.

Fixes: SWDEV-516420.
@llvmbot
Copy link
Member

llvmbot commented Mar 7, 2025

@llvm/pr-subscribers-backend-amdgpu

Author: Changpeng Fang (changpeng)

Changes

When constructing a PHI node in PointerReplacer::replace, the imcoming operands are expected to have already been replaced and in the replacement map. However, when one of the incoming operands is a load, the search of the map is unsuccessful, and a nullptr is returned from getReplacement. The reason is that, when a load is replaced, all the uses of the load has been actually replaced by the new load. It is useless to insert the original load into the map. Instead, we should place the new load into the map to meet the expectation of the later map search.

Fixes: SWDEV-516420


Full diff: https://github.com/llvm/llvm-project/pull/130256.diff

2 Files Affected:

  • (modified) llvm/lib/Transforms/InstCombine/InstCombineLoadStoreAlloca.cpp (+4-1)
  • (added) llvm/test/Transforms/InstCombine/AMDGPU/phi-with-incoming-from-load.ll (+48)
diff --git a/llvm/lib/Transforms/InstCombine/InstCombineLoadStoreAlloca.cpp b/llvm/lib/Transforms/InstCombine/InstCombineLoadStoreAlloca.cpp
index c1f5e286ab3ed..c29cba6f675c5 100644
--- a/llvm/lib/Transforms/InstCombine/InstCombineLoadStoreAlloca.cpp
+++ b/llvm/lib/Transforms/InstCombine/InstCombineLoadStoreAlloca.cpp
@@ -360,7 +360,10 @@ void PointerReplacer::replace(Instruction *I) {
 
     IC.InsertNewInstWith(NewI, LT->getIterator());
     IC.replaceInstUsesWith(*LT, NewI);
-    WorkMap[LT] = NewI;
+    // LT has actually been replaced by NewI. It is useless to insert LT into
+    // the map. Instead, we insert NewI into the map to indicate this is the
+    // replacement (new value).
+    WorkMap[NewI] = NewI;
   } else if (auto *PHI = dyn_cast<PHINode>(I)) {
     Type *NewTy = getReplacement(PHI->getIncomingValue(0))->getType();
     auto *NewPHI = PHINode::Create(NewTy, PHI->getNumIncomingValues(),
diff --git a/llvm/test/Transforms/InstCombine/AMDGPU/phi-with-incoming-from-load.ll b/llvm/test/Transforms/InstCombine/AMDGPU/phi-with-incoming-from-load.ll
new file mode 100644
index 0000000000000..eb15db6f3e079
--- /dev/null
+++ b/llvm/test/Transforms/InstCombine/AMDGPU/phi-with-incoming-from-load.ll
@@ -0,0 +1,48 @@
+; NOTE: Assertions have been autogenerated by utils/update_test_checks.py UTC_ARGS: --version 5
+; RUN: opt -O1 -S -o - %s | FileCheck %s
+; REQUIRES: amdgpu-registered-target
+
+target triple = "amdgcn-amd-amdhsa"
+
+%"doube_double" = type { double, double}
+
+; Function Attrs: nocallback nofree nounwind willreturn memory(argmem: readwrite)
+declare void @llvm.memcpy.p0.p0.i64(ptr noalias nocapture writeonly, ptr noalias nocapture readonly, i64, i1 immarg)
+
+; Function Attrs: convergent mustprogress norecurse nounwind sanitize_address uwtable
+define amdgpu_kernel void @_test(ptr addrspace(4) noundef byref(%"doube_double") align 8 %0) #0  {
+; CHECK-LABEL: define amdgpu_kernel void @_test(
+; CHECK-SAME: ptr addrspace(4) noundef readonly byref([[DOUBE_DOUBLE:%.*]]) align 8 captures(none) [[TMP0:%.*]]) local_unnamed_addr #[[ATTR1:[0-9]+]] {
+; CHECK-NEXT:  [[ENTRY:.*]]:
+; CHECK-NEXT:    [[ALPHA_UNION:%.*]] = addrspacecast ptr addrspace(4) [[TMP0]] to ptr
+; CHECK-NEXT:    [[TMP1:%.*]] = load i8, ptr addrspace(5) null, align 2147483648
+; CHECK-NEXT:    [[LOADEDV:%.*]] = trunc i8 [[TMP1]] to i1
+; CHECK-NEXT:    br i1 [[LOADEDV]], label %[[COND_END:.*]], label %[[COND_FALSE:.*]]
+; CHECK:       [[COND_FALSE]]:
+; CHECK-NEXT:    [[TMP2:%.*]] = load ptr, ptr addrspace(4) [[TMP0]], align 8
+; CHECK-NEXT:    br label %[[COND_END]]
+; CHECK:       [[COND_END]]:
+; CHECK-NEXT:    [[COND1:%.*]] = phi ptr [ [[TMP2]], %[[COND_FALSE]] ], [ [[ALPHA_UNION]], %[[ENTRY]] ]
+; CHECK-NEXT:    tail call void @llvm.memcpy.p0.p0.i64(ptr noundef nonnull align 8 dereferenceable(16) poison, ptr noundef nonnull align 8 dereferenceable(16) [[COND1]], i64 16, i1 false)
+; CHECK-NEXT:    ret void
+;
+entry:
+  %coerce = alloca %"doube_double", align 8, addrspace(5)
+  %alpha_union = addrspacecast ptr addrspace(5) %coerce to ptr
+  %is_host_mode.addr.ascast = addrspacecast ptr addrspace(5) null to ptr
+  call void @llvm.memcpy.p0.p4.i64(ptr align 8 %alpha_union, ptr addrspace(4) align 8 %0, i64 16, i1 false)
+  %1 = load i8, ptr %is_host_mode.addr.ascast, align 1
+  %loadedv = trunc i8 %1 to i1
+  br i1 %loadedv, label %cond.end, label %cond.false
+
+cond.false:                                       ; preds = %entry
+  %2 = load ptr, ptr %alpha_union, align 8
+  br label %cond.end
+
+cond.end:                                         ; preds = %cond.false, %entry
+  %cond = phi ptr [ %2, %cond.false ], [ %alpha_union, %entry ]
+  call void @llvm.memcpy.p0.p0.i64(ptr align 8 poison, ptr align 8 %cond, i64 16, i1 false)
+  ret void
+}
+
+attributes #0 = { convergent mustprogress norecurse nounwind sanitize_address uwtable "amdgpu-flat-work-group-size"="1,128" "no-trapping-math"="true" "stack-protector-buffer-size"="8" "target-cpu"="gfx90a" "target-features"="+16-bit-insts,+atomic-buffer-global-pk-add-f16-insts,+atomic-fadd-rtn-insts,+ci-insts,+dl-insts,+dot1-insts,+dot10-insts,+dot2-insts,+dot3-insts,+dot4-insts,+dot5-insts,+dot6-insts,+dot7-insts,+dpp,+gfx8-insts,+gfx9-insts,+gfx90a-insts,+mai-insts,+s-memrealtime,+s-memtime-inst,+wavefrontsize64,+xnack" "uniform-work-group-size"="true" }

@llvmbot
Copy link
Member

llvmbot commented Mar 7, 2025

@llvm/pr-subscribers-llvm-transforms

Author: Changpeng Fang (changpeng)

Changes

When constructing a PHI node in PointerReplacer::replace, the imcoming operands are expected to have already been replaced and in the replacement map. However, when one of the incoming operands is a load, the search of the map is unsuccessful, and a nullptr is returned from getReplacement. The reason is that, when a load is replaced, all the uses of the load has been actually replaced by the new load. It is useless to insert the original load into the map. Instead, we should place the new load into the map to meet the expectation of the later map search.

Fixes: SWDEV-516420


Full diff: https://github.com/llvm/llvm-project/pull/130256.diff

2 Files Affected:

  • (modified) llvm/lib/Transforms/InstCombine/InstCombineLoadStoreAlloca.cpp (+4-1)
  • (added) llvm/test/Transforms/InstCombine/AMDGPU/phi-with-incoming-from-load.ll (+48)
diff --git a/llvm/lib/Transforms/InstCombine/InstCombineLoadStoreAlloca.cpp b/llvm/lib/Transforms/InstCombine/InstCombineLoadStoreAlloca.cpp
index c1f5e286ab3ed..c29cba6f675c5 100644
--- a/llvm/lib/Transforms/InstCombine/InstCombineLoadStoreAlloca.cpp
+++ b/llvm/lib/Transforms/InstCombine/InstCombineLoadStoreAlloca.cpp
@@ -360,7 +360,10 @@ void PointerReplacer::replace(Instruction *I) {
 
     IC.InsertNewInstWith(NewI, LT->getIterator());
     IC.replaceInstUsesWith(*LT, NewI);
-    WorkMap[LT] = NewI;
+    // LT has actually been replaced by NewI. It is useless to insert LT into
+    // the map. Instead, we insert NewI into the map to indicate this is the
+    // replacement (new value).
+    WorkMap[NewI] = NewI;
   } else if (auto *PHI = dyn_cast<PHINode>(I)) {
     Type *NewTy = getReplacement(PHI->getIncomingValue(0))->getType();
     auto *NewPHI = PHINode::Create(NewTy, PHI->getNumIncomingValues(),
diff --git a/llvm/test/Transforms/InstCombine/AMDGPU/phi-with-incoming-from-load.ll b/llvm/test/Transforms/InstCombine/AMDGPU/phi-with-incoming-from-load.ll
new file mode 100644
index 0000000000000..eb15db6f3e079
--- /dev/null
+++ b/llvm/test/Transforms/InstCombine/AMDGPU/phi-with-incoming-from-load.ll
@@ -0,0 +1,48 @@
+; NOTE: Assertions have been autogenerated by utils/update_test_checks.py UTC_ARGS: --version 5
+; RUN: opt -O1 -S -o - %s | FileCheck %s
+; REQUIRES: amdgpu-registered-target
+
+target triple = "amdgcn-amd-amdhsa"
+
+%"doube_double" = type { double, double}
+
+; Function Attrs: nocallback nofree nounwind willreturn memory(argmem: readwrite)
+declare void @llvm.memcpy.p0.p0.i64(ptr noalias nocapture writeonly, ptr noalias nocapture readonly, i64, i1 immarg)
+
+; Function Attrs: convergent mustprogress norecurse nounwind sanitize_address uwtable
+define amdgpu_kernel void @_test(ptr addrspace(4) noundef byref(%"doube_double") align 8 %0) #0  {
+; CHECK-LABEL: define amdgpu_kernel void @_test(
+; CHECK-SAME: ptr addrspace(4) noundef readonly byref([[DOUBE_DOUBLE:%.*]]) align 8 captures(none) [[TMP0:%.*]]) local_unnamed_addr #[[ATTR1:[0-9]+]] {
+; CHECK-NEXT:  [[ENTRY:.*]]:
+; CHECK-NEXT:    [[ALPHA_UNION:%.*]] = addrspacecast ptr addrspace(4) [[TMP0]] to ptr
+; CHECK-NEXT:    [[TMP1:%.*]] = load i8, ptr addrspace(5) null, align 2147483648
+; CHECK-NEXT:    [[LOADEDV:%.*]] = trunc i8 [[TMP1]] to i1
+; CHECK-NEXT:    br i1 [[LOADEDV]], label %[[COND_END:.*]], label %[[COND_FALSE:.*]]
+; CHECK:       [[COND_FALSE]]:
+; CHECK-NEXT:    [[TMP2:%.*]] = load ptr, ptr addrspace(4) [[TMP0]], align 8
+; CHECK-NEXT:    br label %[[COND_END]]
+; CHECK:       [[COND_END]]:
+; CHECK-NEXT:    [[COND1:%.*]] = phi ptr [ [[TMP2]], %[[COND_FALSE]] ], [ [[ALPHA_UNION]], %[[ENTRY]] ]
+; CHECK-NEXT:    tail call void @llvm.memcpy.p0.p0.i64(ptr noundef nonnull align 8 dereferenceable(16) poison, ptr noundef nonnull align 8 dereferenceable(16) [[COND1]], i64 16, i1 false)
+; CHECK-NEXT:    ret void
+;
+entry:
+  %coerce = alloca %"doube_double", align 8, addrspace(5)
+  %alpha_union = addrspacecast ptr addrspace(5) %coerce to ptr
+  %is_host_mode.addr.ascast = addrspacecast ptr addrspace(5) null to ptr
+  call void @llvm.memcpy.p0.p4.i64(ptr align 8 %alpha_union, ptr addrspace(4) align 8 %0, i64 16, i1 false)
+  %1 = load i8, ptr %is_host_mode.addr.ascast, align 1
+  %loadedv = trunc i8 %1 to i1
+  br i1 %loadedv, label %cond.end, label %cond.false
+
+cond.false:                                       ; preds = %entry
+  %2 = load ptr, ptr %alpha_union, align 8
+  br label %cond.end
+
+cond.end:                                         ; preds = %cond.false, %entry
+  %cond = phi ptr [ %2, %cond.false ], [ %alpha_union, %entry ]
+  call void @llvm.memcpy.p0.p0.i64(ptr align 8 poison, ptr align 8 %cond, i64 16, i1 false)
+  ret void
+}
+
+attributes #0 = { convergent mustprogress norecurse nounwind sanitize_address uwtable "amdgpu-flat-work-group-size"="1,128" "no-trapping-math"="true" "stack-protector-buffer-size"="8" "target-cpu"="gfx90a" "target-features"="+16-bit-insts,+atomic-buffer-global-pk-add-f16-insts,+atomic-fadd-rtn-insts,+ci-insts,+dl-insts,+dot1-insts,+dot10-insts,+dot2-insts,+dot3-insts,+dot4-insts,+dot5-insts,+dot6-insts,+dot7-insts,+dpp,+gfx8-insts,+gfx9-insts,+gfx90a-insts,+mai-insts,+s-memrealtime,+s-memtime-inst,+wavefrontsize64,+xnack" "uniform-work-group-size"="true" }

@changpeng changpeng requested a review from arsenm March 7, 2025 07:21
Copy link
Contributor

@antoniofrighetto antoniofrighetto left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why do we RAUW the load but not the store here? This might need to be done for the memcpy too, which looks dead by the time we insert it into the map.

@antoniofrighetto antoniofrighetto requested a review from dtcxzyw March 7, 2025 08:32
@changpeng
Copy link
Contributor Author

Why do we RAUW the load but not the store here? This might need to be done for the memcpy too, which looks dead by the time we insert it into the map.

My understanding is that it does not matter for memcpy and store because they are not used. We may possibly just do not insert them into the map. But need to additional research which is beyond this work.

@arsenm arsenm requested a review from yxsamliu March 7, 2025 08:57
@changpeng
Copy link
Contributor Author

ping

@changpeng changpeng requested a review from bcahoon March 8, 2025 06:19
// LT has actually been replaced by NewI. It is useless to insert LT into
// the map. Instead, we insert NewI into the map to indicate this is the
// replacement (new value).
WorkMap[NewI] = NewI;
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can you just not add it to the map? I'm not sure what the point of the identity mapping would be

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can you just not add it to the map? I'm not sure what the point of the identity mapping would be

The whole logic is that, when replacing a PHI instruction, all its operands are expected to have already been replaced and can be found in the map. So getReplacement (Operand) should always return the replacement (never a NULL).
Type *NewTy = getReplacement(PHI->getIncomingValue(0))->getType();

However, when a load is replaced, all the uses of the load has already been actually replaced by the new load:
IC.replaceInstUsesWith(*LT, NewI);

We will never search the map with the original load (LT). And we expect to find the replacement for the new load.
So we have to add the newI into the map instead.

@changpeng changpeng merged commit fa45bf4 into llvm:main Mar 10, 2025
11 checks passed
@changpeng changpeng deleted the phi branch March 10, 2025 03:22
searlmc1 pushed a commit to ROCm/llvm-project that referenced this pull request Mar 12, 2025
… PHI (llvm#130256)

When constructing a PHI node in `PointerReplacer::replace`, the incoming
operands are expected to have already been replaced and in the
replacement map. However, when one of the incoming operands is a load,
the search of the map is unsuccessful, and a nullptr is returned from
`getReplacement`. The reason is that, when a load is replaced, all the
uses of the load has been actually replaced by the new load. It is
useless to insert the original load into the map. Instead, we should
place the new load into the map to meet the expectation of the later map
search.

Fixes: SWDEV-516420
searlmc1 pushed a commit to ROCm/llvm-project that referenced this pull request Mar 12, 2025
jrbyrnes pushed a commit to jrbyrnes/llvm-project that referenced this pull request May 27, 2025
… PHI (llvm#130256)

When constructing a PHI node in `PointerReplacer::replace`, the incoming
operands are expected to have already been replaced and in the
replacement map. However, when one of the incoming operands is a load,
the search of the map is unsuccessful, and a nullptr is returned from
`getReplacement`. The reason is that, when a load is replaced, all the
uses of the load has been actually replaced by the new load. It is
useless to insert the original load into the map. Instead, we should
place the new load into the map to meet the expectation of the later map
search.

Fixes: SWDEV-516420
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

5 participants