Skip to content

[HEXAGON] [MachinePipeliner] Fix the DAG in case of dependent phis. #135925

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Open
wants to merge 8 commits into
base: main
Choose a base branch
from

Conversation

quic-asaravan
Copy link
Contributor

@quic-asaravan quic-asaravan commented Apr 16, 2025

%3:intregs = PHI %21:intregs, %bb.6, %7:intregs, %bb.1 - SU0 %7:intregs = PHI %21:intregs, %bb.6, %13:intregs, %bb.1 - SU1 %27:intregs = A2_zxtb %3:intregs - SU2
%13:intregs = C2_muxri %45:predregs, 0, %46:intreg

If we have dependent phis, SU0 should be the successor of SU1 not the other way around.

Authored by: Sumanth Gundapaneni

%3:intregs = PHI %21:intregs, %bb.6, %7:intregs, %bb.1 - SU0
%7:intregs = PHI %21:intregs, %bb.6, %13:intregs, %bb.1 - SU1
%27:intregs = A2_zxtb %3:intregs - SU2
%13:intregs = C2_muxri %45:predregs, 0, %46:intreg

If we have dependent phis, SU0 should be the successor of SU1 not
the other way around.
@quic-asaravan quic-asaravan marked this pull request as ready for review April 16, 2025 07:26
@llvmbot
Copy link
Member

llvmbot commented Apr 16, 2025

@llvm/pr-subscribers-backend-hexagon

Author: Abinaya Saravanan (quic-asaravan)

Changes

%3:intregs = PHI %21:intregs, %bb.6, %7:intregs, %bb.1 - SU0 %7:intregs = PHI %21:intregs, %bb.6, %13:intregs, %bb.1 - SU1 %27:intregs = A2_zxtb %3:intregs - SU2
%13:intregs = C2_muxri %45:predregs, 0, %46:intreg

If we have dependent phis, SU0 should be the successor of SU1 not the other way around.


Full diff: https://github.com/llvm/llvm-project/pull/135925.diff

3 Files Affected:

  • (modified) llvm/lib/CodeGen/MachinePipeliner.cpp (+19-1)
  • (added) llvm/test/CodeGen/Hexagon/swp-dependent-phis.ll (+96)
  • (modified) llvm/test/CodeGen/Hexagon/swp-epilog-phi9.ll (+4-2)
diff --git a/llvm/lib/CodeGen/MachinePipeliner.cpp b/llvm/lib/CodeGen/MachinePipeliner.cpp
index 6cb0299a30d7a..de10402fe7f48 100644
--- a/llvm/lib/CodeGen/MachinePipeliner.cpp
+++ b/llvm/lib/CodeGen/MachinePipeliner.cpp
@@ -962,8 +962,26 @@ void SwingSchedulerDAG::updatePhiDependences() {
               HasPhiDef = Reg;
               // Add a chain edge to a dependent Phi that isn't an existing
               // predecessor.
+
+              // %3:intregs = PHI %21:intregs, %bb.6, %7:intregs, %bb.1 - SU0
+              // %7:intregs = PHI %21:intregs, %bb.6, %13:intregs, %bb.1 - SU1
+              // %27:intregs = A2_zxtb %3:intregs - SU2
+              // %13:intregs = C2_muxri %45:predregs, 0, %46:intreg
+              // If we have dependent phis, SU0 should be the successor of SU1
+              // not the other way around. (it used to be SU1 is the successor
+              // of SU0). In some cases, SU0 is scheduled earlier than SU1
+              // resulting in bad IR as we do not have a value that can be used
+              // by SU2.
+
+              // Reachability check is to ensure that we do not violate DAG.
+              // %1:intregs = PHI %10:intregs, %bb.0, %3:intregs, %bb.1 - SU0
+              // %2:intregs = PHI %10:intregs, %bb.0, %1:intregs, %bb.1 - SU1
+              // %3:intregs = PHI %11:intregs, %bb.0, %2:intregs, %bb.1 - SU2
+              // S2_storerb_io %0:intregs, 0, %2:intregs
+              // Make sure we do not create an edge between SU2 and SU0.
+
               if (SU->NodeNum < I.NodeNum && !I.isPred(SU))
-                I.addPred(SDep(SU, SDep::Barrier));
+                SU->addPred(SDep(&I, SDep::Barrier));
             }
           }
         }
diff --git a/llvm/test/CodeGen/Hexagon/swp-dependent-phis.ll b/llvm/test/CodeGen/Hexagon/swp-dependent-phis.ll
new file mode 100644
index 0000000000000..bff89f764710a
--- /dev/null
+++ b/llvm/test/CodeGen/Hexagon/swp-dependent-phis.ll
@@ -0,0 +1,96 @@
+;RUN: llc -march=hexagon -mv71t -O2 < %s -o - 2>&1 > /dev/null
+
+; Validate that we do not crash while running this test.
+;%3:intregs = PHI %21:intregs, %bb.6, %7:intregs, %bb.1 - SU0
+;%7:intregs = PHI %21:intregs, %bb.6, %13:intregs, %bb.1 - SU1
+;%27:intregs = A2_zxtb %3:intregs - SU2
+;%13:intregs = C2_muxri %45:predregs, 0, %46:intreg
+;If we have dependent phis, SU0 should be the successor of SU1 not
+;the other way around. (it used to be SU1 is the successor of SU0).
+;In some cases, SU0 is scheduled earlier than SU1 resulting in bad
+;IR as we do not have a value that can be used by SU2.
+
+@global = common dso_local local_unnamed_addr global ptr null, align 4
+@global.1 = common dso_local local_unnamed_addr global i32 0, align 4
+@global.2 = common dso_local local_unnamed_addr global i16 0, align 2
+@global.3 = common dso_local local_unnamed_addr global i16 0, align 2
+@global.4 = common dso_local local_unnamed_addr global i32 0, align 4
+
+; Function Attrs: nofree norecurse nosync nounwind
+define dso_local i32 @wombat(i8 zeroext %arg) local_unnamed_addr #0 {
+bb:
+  %load = load ptr, ptr @global, align 4
+  %load1 = load i32, ptr @global.1, align 4
+  %add2 = add nsw i32 %load1, -1
+  store i32 %add2, ptr @global.1, align 4
+  %icmp = icmp eq i32 %load1, 0
+  br i1 %icmp, label %bb36, label %bb3
+
+bb3:                                              ; preds = %bb3, %bb
+  %phi = phi i32 [ %add30, %bb3 ], [ %add2, %bb ]
+  %phi4 = phi i8 [ %phi8, %bb3 ], [ %arg, %bb ]
+  %phi5 = phi i16 [ %select23, %bb3 ], [ undef, %bb ]
+  %phi6 = phi i16 [ %select26, %bb3 ], [ undef, %bb ]
+  %phi7 = phi i16 [ %select, %bb3 ], [ undef, %bb ]
+  %phi8 = phi i8 [ %select29, %bb3 ], [ %arg, %bb ]
+  %zext = zext i8 %phi4 to i32
+  %getelementptr = getelementptr inbounds i32, ptr %load, i32 %zext
+  %getelementptr9 = getelementptr inbounds i32, ptr %getelementptr, i32 2
+  %ptrtoint = ptrtoint ptr %getelementptr9 to i32
+  %trunc = trunc i32 %ptrtoint to i16
+  %sext10 = sext i16 %phi7 to i32
+  %shl11 = shl i32 %ptrtoint, 16
+  %ashr = ashr exact i32 %shl11, 16
+  %icmp12 = icmp slt i32 %ashr, %sext10
+  %select = select i1 %icmp12, i16 %trunc, i16 %phi7
+  %getelementptr13 = getelementptr inbounds i32, ptr %getelementptr, i32 3
+  %load14 = load i32, ptr %getelementptr13, align 4
+  %shl = shl i32 %load14, 8
+  %getelementptr15 = getelementptr inbounds i32, ptr %getelementptr, i32 1
+  %load16 = load i32, ptr %getelementptr15, align 4
+  %shl17 = shl i32 %load16, 16
+  %ashr18 = ashr exact i32 %shl17, 16
+  %add = add nsw i32 %ashr18, %load14
+  %lshr = lshr i32 %add, 8
+  %or = or i32 %lshr, %shl
+  %sub = sub i32 %or, %load16
+  %trunc19 = trunc i32 %sub to i16
+  %sext = sext i16 %phi5 to i32
+  %shl20 = shl i32 %sub, 16
+  %ashr21 = ashr exact i32 %shl20, 16
+  %icmp22 = icmp sgt i32 %ashr21, %sext
+  %select23 = select i1 %icmp22, i16 %trunc19, i16 %phi5
+  %sext24 = sext i16 %phi6 to i32
+  %icmp25 = icmp slt i32 %ashr21, %sext24
+  %select26 = select i1 %icmp25, i16 %trunc19, i16 %phi6
+  %icmp27 = icmp eq i8 %phi8, 0
+  %add28 = add i8 %phi8, -1
+  %select29 = select i1 %icmp27, i8 0, i8 %add28
+  %add30 = add nsw i32 %phi, -1
+  %icmp31 = icmp eq i32 %phi, 0
+  br i1 %icmp31, label %bb32, label %bb3
+
+bb32:                                             ; preds = %bb3
+  store i16 %trunc, ptr @global.2, align 2
+  store i16 %trunc19, ptr @global.3, align 2
+  store i32 -1, ptr @global.1, align 4
+  %sext33 = sext i16 %select to i32
+  %sext34 = sext i16 %select23 to i32
+  %sext35 = sext i16 %select26 to i32
+  br label %bb36
+
+bb36:                                             ; preds = %bb32, %bb
+  %phi37 = phi i32 [ %sext33, %bb32 ], [ 0, %bb ]
+  %phi38 = phi i32 [ %sext35, %bb32 ], [ 0, %bb ]
+  %phi39 = phi i32 [ %sext34, %bb32 ], [ 0, %bb ]
+  %sub40 = sub nsw i32 %phi39, %phi38
+  %icmp41 = icmp slt i32 %sub40, %phi37
+  br i1 %icmp41, label %bb43, label %bb42
+
+bb42:                                             ; preds = %bb36
+  store i32 0, ptr @global.4, align 4
+  br label %bb43
+
+bb43:                                             ; preds = %bb42, %bb36
+  ret i32 undef
+}
diff --git a/llvm/test/CodeGen/Hexagon/swp-epilog-phi9.ll b/llvm/test/CodeGen/Hexagon/swp-epilog-phi9.ll
index af1b848a8cf2d..07b129ffeb3e9 100644
--- a/llvm/test/CodeGen/Hexagon/swp-epilog-phi9.ll
+++ b/llvm/test/CodeGen/Hexagon/swp-epilog-phi9.ll
@@ -5,11 +5,13 @@
 ; 2, so the computation for the number of Phis needs to be adjusted when
 ; the incoming prolog block is from prolog 0 or prolog 1.
 ; Note: the pipeliner no longer generates a 3 stage pipeline for this test.
+; Note: the pipeliner has been generating a 4-stage pipelined loop.
 
 ; CHECK: loop0
 ; CHECK: [[REG0:r([0-9]+)]] = add(r{{[0-8]+}},#8)
+; CHECK: r{{[0-9]+}} = [[REG0]]
 ; CHECK: endloop0
-; CHECK: [[REG0]] = add(r{{[0-9]+}},#8)
+; CHECK: r{{[0-9]+}} = add(r{{[0-9]+}},#8)
 
 ; Function Attrs: nounwind
 define void @f0(ptr nocapture readonly %a0, i32 %a1) #0 {
@@ -50,6 +52,6 @@ declare i32 @llvm.hexagon.M2.mpy.ll.s0(i32, i32) #1
 ; Function Attrs: nounwind readnone
 declare i32 @llvm.hexagon.M2.mpy.acc.sat.ll.s0(i32, i32, i32) #1
 
-attributes #0 = { nounwind "target-cpu"="hexagonv60" }
+attributes #0 = { nounwind "target-cpu"="hexagonv68" }
 attributes #1 = { nounwind readnone }
 attributes #2 = { nounwind }

@quic-asaravan
Copy link
Contributor Author

@tru @nunoplopes Any pointers on how to resolve this code_formatter error? I see that this issue is irrelevant to my patch.

@nunoplopes
Copy link
Member

@tru @nunoplopes Any pointers on how to resolve this code_formatter error? I see that this issue is irrelevant to my patch.

I'm on it. The last patch broke the check, sorry.

@quic-asaravan
Copy link
Contributor Author

@iajbar @sgundapa Can you please review ?

%3:intregs = PHI %21:intregs, %bb.6, %7:intregs, %bb.1 - SU0
%7:intregs = PHI %21:intregs, %bb.6, %13:intregs, %bb.1 - SU1
%27:intregs = A2_zxtb %3:intregs - SU2
%13:intregs = C2_muxri %45:predregs, 0, %46:intreg

If we have dependent phis, SU0 should be the successor of SU1 not
the other way around.

Co-authored-by: Sumanth Gundapaneni
Copy link

github-actions bot commented Apr 16, 2025

✅ With the latest revision this PR passed the undef deprecator.

@iajbar
Copy link
Contributor

iajbar commented Apr 18, 2025

LLVM.CodeGen/Hexagon/swp-dependent-phis.ll fails: error: '%dummy' defined with type 'i16' but expected 'i32'

@quic-asaravan
Copy link
Contributor Author

@iajbar I have made the changes. Can you pls review ?

@quic-asaravan
Copy link
Contributor Author

@bcahoon Can you pls review this patch ?

@bcahoon bcahoon requested a review from sgundapa April 21, 2025 14:08
Copy link
Contributor

@iajbar iajbar left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Please fix the failing .ll test. Thank you.

Copy link
Contributor

@kasuga-fj kasuga-fj left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hi. I'm recently working on MachinePipeliner. Let me make a few comments.

Comment on lines +976 to 985
// Reachability check is to ensure that we do not violate DAG.
// %1:intregs = PHI %10:intregs, %bb.0, %3:intregs, %bb.1 - SU0
// %2:intregs = PHI %10:intregs, %bb.0, %1:intregs, %bb.1 - SU1
// %3:intregs = PHI %11:intregs, %bb.0, %2:intregs, %bb.1 - SU2
// S2_storerb_io %0:intregs, 0, %2:intregs
// Make sure we do not create an edge between SU2 and SU0.

if (SU->NodeNum < I.NodeNum && !I.isPred(SU))
I.addPred(SDep(SU, SDep::Barrier));
SU->addPred(SDep(&I, SDep::Barrier));
}
Copy link
Contributor

@kasuga-fj kasuga-fj Apr 23, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think the condition of if-statement must be updated as well. I don't think this check prevents from adding an edge from SU2 to SU0.
IIUC, in the implementation so far, the first expression SU->NodeNum < I.NodeNum is to make sure that the edge is not a back-edge, and the second one !I.isPred(SU) is to check that the edge is not redundant. The expression SU->NodeNum < I.NodeNum was sufficient because we never add a back-edge (i.e., from bottom to top), but if you want to add such one (in this case, from SU1 to SU0), this condition no longer makes sense and we must check the reachability in a different way. Also, adding back-edge here may affect other processes, e.g., the similar part in the same function.

if (SU->NodeNum < I.NodeNum && !I.isPred(SU))
I.addPred(SDep(SU, SDep::Barrier));

IMO, instead of changing an edge to "correct direction", we should give special treatment of PHI-PHI edges as needed, like isLoopCarriedDep does.

@@ -5,11 +5,13 @@
; 2, so the computation for the number of Phis needs to be adjusted when
; the incoming prolog block is from prolog 0 or prolog 1.
; Note: the pipeliner no longer generates a 3 stage pipeline for this test.
; Note: the pipeliner has been generating a 4-stage pipelined loop.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think the comment one line above is wrong and can be deleted.

Comment on lines +12 to +14
; CHECK: r{{[0-9]+}} = [[REG0]]
; CHECK: endloop0
; CHECK: [[REG0]] = add(r{{[0-9]+}},#8)
; CHECK: r{{[0-9]+}} = add(r{{[0-9]+}},#8)
Copy link
Contributor

@kasuga-fj kasuga-fj Apr 23, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is this change correct? The purpose of this test seems to verify if the value is properly propagated to epilog blocks. If it is no longer possible to check it with this case, I think it's better to use XFAIL than modifying the CHECK directives to make this test pass.

Comment on lines -54 to +56
attributes #0 = { nounwind "target-cpu"="hexagonv60" }
attributes #0 = { nounwind "target-cpu"="hexagonv68" }
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit Unrelated changes should be avoided.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

5 participants