-
Notifications
You must be signed in to change notification settings - Fork 13.6k
InstCombine: Fix a crash in PointerReplacer
when constructing a new PHI
#130256
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
When constructing a PHI node in `PointerReplacer::replace`, the imcoming operands are expected to have already been replaced and in the replacement map. However, when one of the incoming operands is a load, the search of the map is unsuccessful, and a nullptr is returned from `getReplacement`. The reason is that, when a load is replaced, all the uses of the load has been actually replaced by the new load. It is useless to insert the original load into the map. Instead, we should place the new load into the map to meet the expectation of the later map search. Fixes: SWDEV-516420.
@llvm/pr-subscribers-backend-amdgpu Author: Changpeng Fang (changpeng) ChangesWhen constructing a PHI node in Fixes: SWDEV-516420 Full diff: https://github.com/llvm/llvm-project/pull/130256.diff 2 Files Affected:
diff --git a/llvm/lib/Transforms/InstCombine/InstCombineLoadStoreAlloca.cpp b/llvm/lib/Transforms/InstCombine/InstCombineLoadStoreAlloca.cpp
index c1f5e286ab3ed..c29cba6f675c5 100644
--- a/llvm/lib/Transforms/InstCombine/InstCombineLoadStoreAlloca.cpp
+++ b/llvm/lib/Transforms/InstCombine/InstCombineLoadStoreAlloca.cpp
@@ -360,7 +360,10 @@ void PointerReplacer::replace(Instruction *I) {
IC.InsertNewInstWith(NewI, LT->getIterator());
IC.replaceInstUsesWith(*LT, NewI);
- WorkMap[LT] = NewI;
+ // LT has actually been replaced by NewI. It is useless to insert LT into
+ // the map. Instead, we insert NewI into the map to indicate this is the
+ // replacement (new value).
+ WorkMap[NewI] = NewI;
} else if (auto *PHI = dyn_cast<PHINode>(I)) {
Type *NewTy = getReplacement(PHI->getIncomingValue(0))->getType();
auto *NewPHI = PHINode::Create(NewTy, PHI->getNumIncomingValues(),
diff --git a/llvm/test/Transforms/InstCombine/AMDGPU/phi-with-incoming-from-load.ll b/llvm/test/Transforms/InstCombine/AMDGPU/phi-with-incoming-from-load.ll
new file mode 100644
index 0000000000000..eb15db6f3e079
--- /dev/null
+++ b/llvm/test/Transforms/InstCombine/AMDGPU/phi-with-incoming-from-load.ll
@@ -0,0 +1,48 @@
+; NOTE: Assertions have been autogenerated by utils/update_test_checks.py UTC_ARGS: --version 5
+; RUN: opt -O1 -S -o - %s | FileCheck %s
+; REQUIRES: amdgpu-registered-target
+
+target triple = "amdgcn-amd-amdhsa"
+
+%"doube_double" = type { double, double}
+
+; Function Attrs: nocallback nofree nounwind willreturn memory(argmem: readwrite)
+declare void @llvm.memcpy.p0.p0.i64(ptr noalias nocapture writeonly, ptr noalias nocapture readonly, i64, i1 immarg)
+
+; Function Attrs: convergent mustprogress norecurse nounwind sanitize_address uwtable
+define amdgpu_kernel void @_test(ptr addrspace(4) noundef byref(%"doube_double") align 8 %0) #0 {
+; CHECK-LABEL: define amdgpu_kernel void @_test(
+; CHECK-SAME: ptr addrspace(4) noundef readonly byref([[DOUBE_DOUBLE:%.*]]) align 8 captures(none) [[TMP0:%.*]]) local_unnamed_addr #[[ATTR1:[0-9]+]] {
+; CHECK-NEXT: [[ENTRY:.*]]:
+; CHECK-NEXT: [[ALPHA_UNION:%.*]] = addrspacecast ptr addrspace(4) [[TMP0]] to ptr
+; CHECK-NEXT: [[TMP1:%.*]] = load i8, ptr addrspace(5) null, align 2147483648
+; CHECK-NEXT: [[LOADEDV:%.*]] = trunc i8 [[TMP1]] to i1
+; CHECK-NEXT: br i1 [[LOADEDV]], label %[[COND_END:.*]], label %[[COND_FALSE:.*]]
+; CHECK: [[COND_FALSE]]:
+; CHECK-NEXT: [[TMP2:%.*]] = load ptr, ptr addrspace(4) [[TMP0]], align 8
+; CHECK-NEXT: br label %[[COND_END]]
+; CHECK: [[COND_END]]:
+; CHECK-NEXT: [[COND1:%.*]] = phi ptr [ [[TMP2]], %[[COND_FALSE]] ], [ [[ALPHA_UNION]], %[[ENTRY]] ]
+; CHECK-NEXT: tail call void @llvm.memcpy.p0.p0.i64(ptr noundef nonnull align 8 dereferenceable(16) poison, ptr noundef nonnull align 8 dereferenceable(16) [[COND1]], i64 16, i1 false)
+; CHECK-NEXT: ret void
+;
+entry:
+ %coerce = alloca %"doube_double", align 8, addrspace(5)
+ %alpha_union = addrspacecast ptr addrspace(5) %coerce to ptr
+ %is_host_mode.addr.ascast = addrspacecast ptr addrspace(5) null to ptr
+ call void @llvm.memcpy.p0.p4.i64(ptr align 8 %alpha_union, ptr addrspace(4) align 8 %0, i64 16, i1 false)
+ %1 = load i8, ptr %is_host_mode.addr.ascast, align 1
+ %loadedv = trunc i8 %1 to i1
+ br i1 %loadedv, label %cond.end, label %cond.false
+
+cond.false: ; preds = %entry
+ %2 = load ptr, ptr %alpha_union, align 8
+ br label %cond.end
+
+cond.end: ; preds = %cond.false, %entry
+ %cond = phi ptr [ %2, %cond.false ], [ %alpha_union, %entry ]
+ call void @llvm.memcpy.p0.p0.i64(ptr align 8 poison, ptr align 8 %cond, i64 16, i1 false)
+ ret void
+}
+
+attributes #0 = { convergent mustprogress norecurse nounwind sanitize_address uwtable "amdgpu-flat-work-group-size"="1,128" "no-trapping-math"="true" "stack-protector-buffer-size"="8" "target-cpu"="gfx90a" "target-features"="+16-bit-insts,+atomic-buffer-global-pk-add-f16-insts,+atomic-fadd-rtn-insts,+ci-insts,+dl-insts,+dot1-insts,+dot10-insts,+dot2-insts,+dot3-insts,+dot4-insts,+dot5-insts,+dot6-insts,+dot7-insts,+dpp,+gfx8-insts,+gfx9-insts,+gfx90a-insts,+mai-insts,+s-memrealtime,+s-memtime-inst,+wavefrontsize64,+xnack" "uniform-work-group-size"="true" }
|
@llvm/pr-subscribers-llvm-transforms Author: Changpeng Fang (changpeng) ChangesWhen constructing a PHI node in Fixes: SWDEV-516420 Full diff: https://github.com/llvm/llvm-project/pull/130256.diff 2 Files Affected:
diff --git a/llvm/lib/Transforms/InstCombine/InstCombineLoadStoreAlloca.cpp b/llvm/lib/Transforms/InstCombine/InstCombineLoadStoreAlloca.cpp
index c1f5e286ab3ed..c29cba6f675c5 100644
--- a/llvm/lib/Transforms/InstCombine/InstCombineLoadStoreAlloca.cpp
+++ b/llvm/lib/Transforms/InstCombine/InstCombineLoadStoreAlloca.cpp
@@ -360,7 +360,10 @@ void PointerReplacer::replace(Instruction *I) {
IC.InsertNewInstWith(NewI, LT->getIterator());
IC.replaceInstUsesWith(*LT, NewI);
- WorkMap[LT] = NewI;
+ // LT has actually been replaced by NewI. It is useless to insert LT into
+ // the map. Instead, we insert NewI into the map to indicate this is the
+ // replacement (new value).
+ WorkMap[NewI] = NewI;
} else if (auto *PHI = dyn_cast<PHINode>(I)) {
Type *NewTy = getReplacement(PHI->getIncomingValue(0))->getType();
auto *NewPHI = PHINode::Create(NewTy, PHI->getNumIncomingValues(),
diff --git a/llvm/test/Transforms/InstCombine/AMDGPU/phi-with-incoming-from-load.ll b/llvm/test/Transforms/InstCombine/AMDGPU/phi-with-incoming-from-load.ll
new file mode 100644
index 0000000000000..eb15db6f3e079
--- /dev/null
+++ b/llvm/test/Transforms/InstCombine/AMDGPU/phi-with-incoming-from-load.ll
@@ -0,0 +1,48 @@
+; NOTE: Assertions have been autogenerated by utils/update_test_checks.py UTC_ARGS: --version 5
+; RUN: opt -O1 -S -o - %s | FileCheck %s
+; REQUIRES: amdgpu-registered-target
+
+target triple = "amdgcn-amd-amdhsa"
+
+%"doube_double" = type { double, double}
+
+; Function Attrs: nocallback nofree nounwind willreturn memory(argmem: readwrite)
+declare void @llvm.memcpy.p0.p0.i64(ptr noalias nocapture writeonly, ptr noalias nocapture readonly, i64, i1 immarg)
+
+; Function Attrs: convergent mustprogress norecurse nounwind sanitize_address uwtable
+define amdgpu_kernel void @_test(ptr addrspace(4) noundef byref(%"doube_double") align 8 %0) #0 {
+; CHECK-LABEL: define amdgpu_kernel void @_test(
+; CHECK-SAME: ptr addrspace(4) noundef readonly byref([[DOUBE_DOUBLE:%.*]]) align 8 captures(none) [[TMP0:%.*]]) local_unnamed_addr #[[ATTR1:[0-9]+]] {
+; CHECK-NEXT: [[ENTRY:.*]]:
+; CHECK-NEXT: [[ALPHA_UNION:%.*]] = addrspacecast ptr addrspace(4) [[TMP0]] to ptr
+; CHECK-NEXT: [[TMP1:%.*]] = load i8, ptr addrspace(5) null, align 2147483648
+; CHECK-NEXT: [[LOADEDV:%.*]] = trunc i8 [[TMP1]] to i1
+; CHECK-NEXT: br i1 [[LOADEDV]], label %[[COND_END:.*]], label %[[COND_FALSE:.*]]
+; CHECK: [[COND_FALSE]]:
+; CHECK-NEXT: [[TMP2:%.*]] = load ptr, ptr addrspace(4) [[TMP0]], align 8
+; CHECK-NEXT: br label %[[COND_END]]
+; CHECK: [[COND_END]]:
+; CHECK-NEXT: [[COND1:%.*]] = phi ptr [ [[TMP2]], %[[COND_FALSE]] ], [ [[ALPHA_UNION]], %[[ENTRY]] ]
+; CHECK-NEXT: tail call void @llvm.memcpy.p0.p0.i64(ptr noundef nonnull align 8 dereferenceable(16) poison, ptr noundef nonnull align 8 dereferenceable(16) [[COND1]], i64 16, i1 false)
+; CHECK-NEXT: ret void
+;
+entry:
+ %coerce = alloca %"doube_double", align 8, addrspace(5)
+ %alpha_union = addrspacecast ptr addrspace(5) %coerce to ptr
+ %is_host_mode.addr.ascast = addrspacecast ptr addrspace(5) null to ptr
+ call void @llvm.memcpy.p0.p4.i64(ptr align 8 %alpha_union, ptr addrspace(4) align 8 %0, i64 16, i1 false)
+ %1 = load i8, ptr %is_host_mode.addr.ascast, align 1
+ %loadedv = trunc i8 %1 to i1
+ br i1 %loadedv, label %cond.end, label %cond.false
+
+cond.false: ; preds = %entry
+ %2 = load ptr, ptr %alpha_union, align 8
+ br label %cond.end
+
+cond.end: ; preds = %cond.false, %entry
+ %cond = phi ptr [ %2, %cond.false ], [ %alpha_union, %entry ]
+ call void @llvm.memcpy.p0.p0.i64(ptr align 8 poison, ptr align 8 %cond, i64 16, i1 false)
+ ret void
+}
+
+attributes #0 = { convergent mustprogress norecurse nounwind sanitize_address uwtable "amdgpu-flat-work-group-size"="1,128" "no-trapping-math"="true" "stack-protector-buffer-size"="8" "target-cpu"="gfx90a" "target-features"="+16-bit-insts,+atomic-buffer-global-pk-add-f16-insts,+atomic-fadd-rtn-insts,+ci-insts,+dl-insts,+dot1-insts,+dot10-insts,+dot2-insts,+dot3-insts,+dot4-insts,+dot5-insts,+dot6-insts,+dot7-insts,+dpp,+gfx8-insts,+gfx9-insts,+gfx90a-insts,+mai-insts,+s-memrealtime,+s-memtime-inst,+wavefrontsize64,+xnack" "uniform-work-group-size"="true" }
|
llvm/test/Transforms/InstCombine/AMDGPU/phi-with-incoming-from-load.ll
Outdated
Show resolved
Hide resolved
llvm/test/Transforms/InstCombine/AMDGPU/phi-with-incoming-from-load.ll
Outdated
Show resolved
Hide resolved
llvm/test/Transforms/InstCombine/AMDGPU/phi-with-incoming-from-load.ll
Outdated
Show resolved
Hide resolved
llvm/test/Transforms/InstCombine/AMDGPU/phi-with-incoming-from-load.ll
Outdated
Show resolved
Hide resolved
llvm/test/Transforms/InstCombine/AMDGPU/phi-with-incoming-from-load.ll
Outdated
Show resolved
Hide resolved
Update the LIT tests.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Why do we RAUW the load but not the store here? This might need to be done for the memcpy too, which looks dead by the time we insert it into the map.
My understanding is that it does not matter for memcpy and store because they are not used. We may possibly just do not insert them into the map. But need to additional research which is beyond this work. |
Update LIT test based on the comment.
llvm/test/Transforms/InstCombine/AMDGPU/phi-with-incoming-from-load.ll
Outdated
Show resolved
Hide resolved
Use all named variables in the LIT test
ping |
// LT has actually been replaced by NewI. It is useless to insert LT into | ||
// the map. Instead, we insert NewI into the map to indicate this is the | ||
// replacement (new value). | ||
WorkMap[NewI] = NewI; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can you just not add it to the map? I'm not sure what the point of the identity mapping would be
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can you just not add it to the map? I'm not sure what the point of the identity mapping would be
The whole logic is that, when replacing a PHI instruction, all its operands are expected to have already been replaced and can be found in the map. So getReplacement (Operand) should always return the replacement (never a NULL).
Type *NewTy = getReplacement(PHI->getIncomingValue(0))->getType();
However, when a load is replaced, all the uses of the load has already been actually replaced by the new load:
IC.replaceInstUsesWith(*LT, NewI);
We will never search the map with the original load (LT). And we expect to find the replacement for the new load.
So we have to add the newI into the map instead.
… PHI (llvm#130256) When constructing a PHI node in `PointerReplacer::replace`, the incoming operands are expected to have already been replaced and in the replacement map. However, when one of the incoming operands is a load, the search of the map is unsuccessful, and a nullptr is returned from `getReplacement`. The reason is that, when a load is replaced, all the uses of the load has been actually replaced by the new load. It is useless to insert the original load into the map. Instead, we should place the new load into the map to meet the expectation of the later map search. Fixes: SWDEV-516420
… PHI (llvm#130256) When constructing a PHI node in `PointerReplacer::replace`, the incoming operands are expected to have already been replaced and in the replacement map. However, when one of the incoming operands is a load, the search of the map is unsuccessful, and a nullptr is returned from `getReplacement`. The reason is that, when a load is replaced, all the uses of the load has been actually replaced by the new load. It is useless to insert the original load into the map. Instead, we should place the new load into the map to meet the expectation of the later map search. Fixes: SWDEV-516420
When constructing a PHI node in
PointerReplacer::replace
, the incoming operands are expected to have already been replaced and in the replacement map. However, when one of the incoming operands is a load, the search of the map is unsuccessful, and a nullptr is returned fromgetReplacement
. The reason is that, when a load is replaced, all the uses of the load has been actually replaced by the new load. It is useless to insert the original load into the map. Instead, we should place the new load into the map to meet the expectation of the later map search.Fixes: SWDEV-516420