Commit a91b0d2

[PowerPC] hoist xxspltiw instruction out of the loop with FMA mutation pass. (#111696)
Summary: This patch fixes issue #111906, "[PowerPC] missing VSX FMA Mutation optimize in some case for option -schedule-ppc-vsx-fma-mutation-early". In certain cases, the Register Coalescer pass, which eliminates COPY instructions, can interfere with the PowerPC VSX FMA Mutation pass: it can prevent a COPY adjacent to an XSMADDADP from being mutated into a single XSMADDMDP instruction. As a result, the xxspltiw instruction is not hoisted out of the loop and the optimization is missed. To address this, the patch ensures that the `VSX FMA Mutation` pass runs before the `Register Coalescer` pass when the -schedule-ppc-vsx-fma-mutation-early option is enabled, by inserting it after `TwoAddressInstructionPass` instead of after `RegisterCoalescer`.
1 parent 8830e38 commit a91b0d2

2 files changed: 8 additions, 7 deletions

llvm/lib/Target/PowerPC/PPCTargetMachine.cpp

Lines changed: 2 additions & 1 deletion
@@ -559,7 +559,8 @@ void PPCPassConfig::addMachineSSAOptimization() {
 
 void PPCPassConfig::addPreRegAlloc() {
   if (getOptLevel() != CodeGenOptLevel::None) {
-    insertPass(VSXFMAMutateEarly ? &RegisterCoalescerID : &MachineSchedulerID,
+    insertPass(VSXFMAMutateEarly ? &TwoAddressInstructionPassID
+                                 : &MachineSchedulerID,
                &PPCVSXFMAMutateID);
   }
 
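The effect of changing the anchor pass can be illustrated with a toy model. The sketch below is not the LLVM `TargetPassConfig` API: `Pipeline`, `insertAfter`, `indexOf`, `basePipeline`, and `withMutationAfter` are hypothetical names, and the three-pass pipeline is a simplification that assumes the usual optimized register-allocation order (two-address lowering, then coalescing, then machine scheduling). It only mirrors the "insert after the target pass" behavior that `insertPass` provides.

```cpp
#include <algorithm>
#include <cassert>
#include <string>
#include <vector>

// Toy model of a pass pipeline; NOT the real LLVM TargetPassConfig API.
using Pipeline = std::vector<std::string>;

// Insert `inserted` immediately after the first occurrence of `target`,
// mirroring the "insert after" behavior of insertPass.
// Returns false if `target` is not in the pipeline.
bool insertAfter(Pipeline &passes, const std::string &target,
                 const std::string &inserted) {
  auto it = std::find(passes.begin(), passes.end(), target);
  if (it == passes.end())
    return false;
  passes.insert(it + 1, inserted);
  return true;
}

// Position of a pass in the pipeline, or -1 if it is absent.
int indexOf(const Pipeline &passes, const std::string &name) {
  auto it = std::find(passes.begin(), passes.end(), name);
  return it == passes.end() ? -1 : static_cast<int>(it - passes.begin());
}

// Simplified pre-RA order (assumption: only the passes relevant here).
Pipeline basePipeline() {
  return {"TwoAddressInstructionPass", "RegisterCoalescer", "MachineScheduler"};
}

// Build the toy pipeline with the mutation pass inserted after `anchor`.
Pipeline withMutationAfter(const std::string &anchor) {
  Pipeline p = basePipeline();
  bool ok = insertAfter(p, anchor, "PPCVSXFMAMutate");
  assert(ok && "anchor must exist in the toy pipeline");
  (void)ok;
  return p;
}
```

In this model, `withMutationAfter("RegisterCoalescer")` places the mutation pass after coalescing (the old behavior of the early option), while `withMutationAfter("TwoAddressInstructionPass")` places it before coalescing, which is the ordering the commit message describes.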

llvm/test/CodeGen/PowerPC/vsx-fma-m-early.ll

Lines changed: 6 additions & 6 deletions
@@ -69,14 +69,14 @@ declare <4 x i32> @llvm.ppc.vsx.xvcmpgtsp(<4 x float>, <4 x float>)
 ; CHECK64-NEXT: bltlr cr0
 ; CHECK64-NEXT: # %bb.1: # %for.body.preheader
 ; CHECK64-NEXT: xxspltiw vs0, 1069066811
+; CHECK64-NEXT: xxspltiw vs1, 1170469888
 ; CHECK64-NEXT: mtctr r5
 ; CHECK64-NEXT: li r5, 0
 ; CHECK64-NEXT: {{.*}}align 5
 ; CHECK64-NEXT: [[L2_bar:.*]]: # %for.body
 ; CHECK64-NEXT: # =>This Inner Loop Header: Depth=1
-; CHECK64-NEXT: lxvx vs1, r4, r5
-; CHECK64-NEXT: xxspltiw vs2, 1170469888
-; CHECK64-NEXT: xvmaddasp vs2, vs1, vs0
+; CHECK64-NEXT: lxvx vs2, r4, r5
+; CHECK64-NEXT: xvmaddmsp vs2, vs0, vs1
 ; CHECK64-NEXT: stxvx vs2, r3, r5
 ; CHECK64-NEXT: addi r5, r5, 16
 ; CHECK64-NEXT: bdnz [[L2_bar]]
@@ -139,17 +139,17 @@ declare <4 x i32> @llvm.ppc.vsx.xvcmpgtsp(<4 x float>, <4 x float>)
 ; CHECK32-NEXT: blelr cr0
 ; CHECK32-NEXT: # %bb.1: # %for.body.preheader
 ; CHECK32-NEXT: xxspltiw vs0, 1069066811
+; CHECK32-NEXT: xxspltiw vs1, 1170469888
 ; CHECK32-NEXT: li r6, 0
 ; CHECK32-NEXT: li r7, 0
 ; CHECK32-NEXT: .align 4
 ; CHECK32-NEXT: [[L2_foo:.*]]: # %for.body
 ; CHECK32-NEXT: # =>This Inner Loop Header: Depth=1
 ; CHECK32-NEXT: slwi r8, r7, 4
-; CHECK32-NEXT: xxspltiw vs2, 1170469888
 ; CHECK32-NEXT: addic r7, r7, 1
 ; CHECK32-NEXT: addze r6, r6
-; CHECK32-NEXT: lxvx vs1, r4, r8
-; CHECK32-NEXT: xvmaddasp vs2, vs1, vs0
+; CHECK32-NEXT: lxvx vs2, r4, r8
+; CHECK32-NEXT: xvmaddmsp vs2, vs0, vs1
 ; CHECK32-NEXT: stxvx vs2, r3, r8
 ; CHECK32-NEXT: xor r8, r7, r5
 ; CHECK32-NEXT: or. r8, r8, r6
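The test change swaps an A-form FMA for an M-form FMA, which is what lets the second splat constant live in a loop-invariant register. The sketch below models one loop iteration before and after the patch in plain C++. It assumes the usual VSX semantics (A-form: `XT = XA * XB + XT`; M-form: `XT = XA * XT + XB`); the names `Vec4`, `maddA`, `maddM`, `splat`, `iterBefore`, and `iterAfter` are hypothetical, and the float arguments stand in for the splatted constants rather than reproducing their bit patterns.

```cpp
#include <array>
#include <cassert>
#include <cmath>

using Vec4 = std::array<float, 4>;

// A-form (modeled on xvmaddasp XT, XA, XB): XT = XA * XB + XT.
// The addend register is overwritten by the result.
void maddA(Vec4 &xt, const Vec4 &xa, const Vec4 &xb) {
  for (int i = 0; i < 4; ++i)
    xt[i] = std::fma(xa[i], xb[i], xt[i]);
}

// M-form (modeled on xvmaddmsp XT, XA, XB): XT = XA * XT + XB.
// A multiplicand register is overwritten by the result.
void maddM(Vec4 &xt, const Vec4 &xa, const Vec4 &xb) {
  for (int i = 0; i < 4; ++i)
    xt[i] = std::fma(xa[i], xt[i], xb[i]);
}

// Broadcast one value to all four lanes (stand-in for xxspltiw).
Vec4 splat(float v) { return {v, v, v, v}; }

// One iteration before the patch: the addend constant is destroyed by the
// A-form FMA, so it must be re-materialized inside the loop.
Vec4 iterBefore(const Vec4 &loaded, float mul, float add) {
  Vec4 vs0 = splat(mul); // loop-invariant, lives in the preheader
  Vec4 vs1 = loaded;     // lxvx vs1
  Vec4 vs2 = splat(add); // xxspltiw vs2 inside the loop
  maddA(vs2, vs1, vs0);  // xvmaddasp vs2, vs1, vs0
  return vs2;
}

// One iteration after the patch: the M-form FMA overwrites the loaded
// value instead, so both constants stay intact across iterations.
Vec4 iterAfter(const Vec4 &loaded, float mul, float add) {
  Vec4 vs0 = splat(mul); // loop-invariant
  Vec4 vs1 = splat(add); // loop-invariant, hoisted to the preheader
  Vec4 vs2 = loaded;     // lxvx vs2
  maddM(vs2, vs0, vs1);  // xvmaddmsp vs2, vs0, vs1
  return vs2;
}
```

Both helpers compute `loaded * mul + add` per lane. The difference is only which register the result clobbers, and that is what determines whether the `add` constant can be hoisted.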
