Skip to content

Commit 58a8800

Browse files
authored
PeepholeOpt: Fix looking for def of current copy to coalesce (#125533)
This fixes the handling of subregister extract copies. This will allow AMDGPU to remove its implementation of shouldRewriteCopySrc, which exists as a 10 year old workaround to this bug. peephole-opt-fold-reg-sequence-subreg.mir will show the expected improvement once the custom implementation is removed. The copy coalescing processing here is overly abstracted from what's actually happening. Previously when visiting coalescable copy-like instructions, we would parse the sources one at a time and then pass the def of the root instruction into findNextSource. This means that the first thing the new ValueTracker constructed would do is getVRegDef to find the instruction we are currently processing. This adds an unnecessary step, placing a useless entry in the RewriteMap, and required skipping the no-op case where getNewSource would return the original source operand. This was a problem since in the case of a subregister extract, shouldRewriteCopySource would always say that it is useful to rewrite and the use-def chain walk would abort, returning the original operand. Move the process to start looking at the source operand to begin with. This does not fix the confused handling in the uncoalescable copy case which is proving to be more difficult. Some currently handled cases have multiple defs from a single source, and other handled cases have 0 input operands. It would be simpler if this was implemented with isCopyLikeInstr, rather than guessing at the operand structure as it does now. There are some improvements and some regressions. The regressions appear to be downstream issues for the most part. One of the uglier regressions is in PPC, where a sequence of insert_subrgs is used to build registers. I opened #125502 to use reg_sequence instead, which may help. The worst regression is an absurd SPARC testcase using a <251 x fp128>, which uses a very long chain of insert_subregs. We need improved subregister handling locally in PeepholeOptimizer, and other pasess like MachineCSE to fix some of the other regressions. We should handle subregister composes and folding more indexes into insert_subreg and reg_sequence.
1 parent 92e3cd7 commit 58a8800

File tree

104 files changed

+2720
-2716
lines changed

Some content is hidden

Large Commits have some content hidden by default. Use the searchbox below for content that may be hidden.

104 files changed

+2720
-2716
lines changed

llvm/lib/CodeGen/PeepholeOptimizer.cpp

Lines changed: 32 additions & 14 deletions
Original file line numberDiff line numberDiff line change
@@ -465,7 +465,8 @@ class PeepholeOptimizer : private MachineFunction::Delegate {
465465
bool optimizeUncoalescableCopy(MachineInstr &MI,
466466
SmallPtrSetImpl<MachineInstr *> &LocalMIs);
467467
bool optimizeRecurrence(MachineInstr &PHI);
468-
bool findNextSource(RegSubRegPair RegSubReg, RewriteMapTy &RewriteMap);
468+
bool findNextSource(const TargetRegisterClass *DefRC, unsigned DefSubReg,
469+
RegSubRegPair RegSubReg, RewriteMapTy &RewriteMap);
469470
bool isMoveImmediate(MachineInstr &MI, SmallSet<Register, 4> &ImmDefRegs,
470471
DenseMap<Register, MachineInstr *> &ImmDefMIs);
471472
bool foldImmediate(MachineInstr &MI, SmallSet<Register, 4> &ImmDefRegs,
@@ -991,8 +992,10 @@ bool PeepholeOptimizer::optimizeCondBranch(MachineInstr &MI) {
991992
return TII->optimizeCondBranch(MI);
992993
}
993994

994-
/// Try to find the next source that share the same register file
995-
/// for the value defined by \p Reg and \p SubReg.
995+
/// Try to find a better source value that shares the same register file to
996+
/// replace \p RegSubReg in an instruction like
997+
/// `DefRC.DefSubReg = COPY RegSubReg`
998+
///
996999
/// When true is returned, the \p RewriteMap can be used by the client to
9971000
/// retrieve all Def -> Use along the way up to the next source. Any found
9981001
/// Use that is not itself a key for another entry, is the next source to
@@ -1002,17 +1005,15 @@ bool PeepholeOptimizer::optimizeCondBranch(MachineInstr &MI) {
10021005
/// share the same register file as \p Reg and \p SubReg. The client should
10031006
/// then be capable to rewrite all intermediate PHIs to get the next source.
10041007
/// \return False if no alternative sources are available. True otherwise.
1005-
bool PeepholeOptimizer::findNextSource(RegSubRegPair RegSubReg,
1008+
bool PeepholeOptimizer::findNextSource(const TargetRegisterClass *DefRC,
1009+
unsigned DefSubReg,
1010+
RegSubRegPair RegSubReg,
10061011
RewriteMapTy &RewriteMap) {
10071012
// Do not try to find a new source for a physical register.
10081013
// So far we do not have any motivating example for doing that.
10091014
// Thus, instead of maintaining untested code, we will revisit that if
10101015
// that changes at some point.
10111016
Register Reg = RegSubReg.Reg;
1012-
if (Reg.isPhysical())
1013-
return false;
1014-
const TargetRegisterClass *DefRC = MRI->getRegClass(Reg);
1015-
10161017
SmallVector<RegSubRegPair, 4> SrcToLook;
10171018
RegSubRegPair CurSrcPair = RegSubReg;
10181019
SrcToLook.push_back(CurSrcPair);
@@ -1076,7 +1077,7 @@ bool PeepholeOptimizer::findNextSource(RegSubRegPair RegSubReg,
10761077

10771078
// Keep following the chain if the value isn't any better yet.
10781079
const TargetRegisterClass *SrcRC = MRI->getRegClass(CurSrcPair.Reg);
1079-
if (!TRI->shouldRewriteCopySrc(DefRC, RegSubReg.SubReg, SrcRC,
1080+
if (!TRI->shouldRewriteCopySrc(DefRC, DefSubReg, SrcRC,
10801081
CurSrcPair.SubReg))
10811082
continue;
10821083

@@ -1184,21 +1185,33 @@ bool PeepholeOptimizer::optimizeCoalescableCopyImpl(Rewriter &&CpyRewriter) {
11841185
bool Changed = false;
11851186
// Get the right rewriter for the current copy.
11861187
// Rewrite each rewritable source.
1187-
RegSubRegPair Src;
1188+
RegSubRegPair Dst;
11881189
RegSubRegPair TrackPair;
1189-
while (CpyRewriter.getNextRewritableSource(Src, TrackPair)) {
1190+
while (CpyRewriter.getNextRewritableSource(TrackPair, Dst)) {
1191+
if (Dst.Reg.isPhysical()) {
1192+
// Do not try to find a new source for a physical register.
1193+
// So far we do not have any motivating example for doing that.
1194+
// Thus, instead of maintaining untested code, we will revisit that if
1195+
// that changes at some point.
1196+
continue;
1197+
}
1198+
1199+
const TargetRegisterClass *DefRC = MRI->getRegClass(Dst.Reg);
1200+
11901201
// Keep track of PHI nodes and its incoming edges when looking for sources.
11911202
RewriteMapTy RewriteMap;
11921203
// Try to find a more suitable source. If we failed to do so, or get the
11931204
// actual source, move to the next source.
1194-
if (!findNextSource(TrackPair, RewriteMap))
1205+
if (!findNextSource(DefRC, Dst.SubReg, TrackPair, RewriteMap))
11951206
continue;
11961207

11971208
// Get the new source to rewrite. TODO: Only enable handling of multiple
11981209
// sources (PHIs) once we have a motivating example and testcases for it.
11991210
RegSubRegPair NewSrc = getNewSource(MRI, TII, TrackPair, RewriteMap,
12001211
/*HandleMultipleSources=*/false);
1201-
if (Src.Reg == NewSrc.Reg || NewSrc.Reg == 0)
1212+
assert(TrackPair.Reg != NewSrc.Reg &&
1213+
"should not rewrite source to original value");
1214+
if (!NewSrc.Reg)
12021215
continue;
12031216

12041217
// Rewrite source.
@@ -1325,9 +1338,14 @@ bool PeepholeOptimizer::optimizeUncoalescableCopy(
13251338
if (Def.Reg.isPhysical())
13261339
return false;
13271340

1341+
// FIXME: Uncoalescable copies are treated differently by
1342+
// UncoalescableRewriter, and this probably should not share
1343+
// API. getNextRewritableSource really finds rewritable defs.
1344+
const TargetRegisterClass *DefRC = MRI->getRegClass(Def.Reg);
1345+
13281346
// If we do not know how to rewrite this definition, there is no point
13291347
// in trying to kill this instruction.
1330-
if (!findNextSource(Def, RewriteMap))
1348+
if (!findNextSource(DefRC, Def.SubReg, Def, RewriteMap))
13311349
return false;
13321350

13331351
RewritePairs.push_back(Def);

llvm/test/CodeGen/AArch64/Atomics/aarch64-atomicrmw-lse2.ll

Lines changed: 30 additions & 30 deletions
Original file line numberDiff line numberDiff line change
@@ -12,8 +12,8 @@ define dso_local i8 @atomicrmw_xchg_i8_aligned_monotonic(ptr %ptr, i8 %value) {
1212
; -O0: subs w8, w8, w10, uxtb
1313
;
1414
; -O1-LABEL: atomicrmw_xchg_i8_aligned_monotonic:
15-
; -O1: ldxrb w8, [x0]
16-
; -O1: stxrb w9, w1, [x0]
15+
; -O1: ldxrb w0, [x8]
16+
; -O1: stxrb w9, w1, [x8]
1717
%r = atomicrmw xchg ptr %ptr, i8 %value monotonic, align 1
1818
ret i8 %r
1919
}
@@ -27,8 +27,8 @@ define dso_local i8 @atomicrmw_xchg_i8_aligned_acquire(ptr %ptr, i8 %value) {
2727
; -O0: subs w8, w8, w10, uxtb
2828
;
2929
; -O1-LABEL: atomicrmw_xchg_i8_aligned_acquire:
30-
; -O1: ldaxrb w8, [x0]
31-
; -O1: stxrb w9, w1, [x0]
30+
; -O1: ldaxrb w0, [x8]
31+
; -O1: stxrb w9, w1, [x8]
3232
%r = atomicrmw xchg ptr %ptr, i8 %value acquire, align 1
3333
ret i8 %r
3434
}
@@ -42,8 +42,8 @@ define dso_local i8 @atomicrmw_xchg_i8_aligned_release(ptr %ptr, i8 %value) {
4242
; -O0: subs w8, w8, w10, uxtb
4343
;
4444
; -O1-LABEL: atomicrmw_xchg_i8_aligned_release:
45-
; -O1: ldxrb w8, [x0]
46-
; -O1: stlxrb w9, w1, [x0]
45+
; -O1: ldxrb w0, [x8]
46+
; -O1: stlxrb w9, w1, [x8]
4747
%r = atomicrmw xchg ptr %ptr, i8 %value release, align 1
4848
ret i8 %r
4949
}
@@ -57,8 +57,8 @@ define dso_local i8 @atomicrmw_xchg_i8_aligned_acq_rel(ptr %ptr, i8 %value) {
5757
; -O0: subs w8, w8, w10, uxtb
5858
;
5959
; -O1-LABEL: atomicrmw_xchg_i8_aligned_acq_rel:
60-
; -O1: ldaxrb w8, [x0]
61-
; -O1: stlxrb w9, w1, [x0]
60+
; -O1: ldaxrb w0, [x8]
61+
; -O1: stlxrb w9, w1, [x8]
6262
%r = atomicrmw xchg ptr %ptr, i8 %value acq_rel, align 1
6363
ret i8 %r
6464
}
@@ -72,8 +72,8 @@ define dso_local i8 @atomicrmw_xchg_i8_aligned_seq_cst(ptr %ptr, i8 %value) {
7272
; -O0: subs w8, w8, w10, uxtb
7373
;
7474
; -O1-LABEL: atomicrmw_xchg_i8_aligned_seq_cst:
75-
; -O1: ldaxrb w8, [x0]
76-
; -O1: stlxrb w9, w1, [x0]
75+
; -O1: ldaxrb w0, [x8]
76+
; -O1: stlxrb w9, w1, [x8]
7777
%r = atomicrmw xchg ptr %ptr, i8 %value seq_cst, align 1
7878
ret i8 %r
7979
}
@@ -86,8 +86,8 @@ define dso_local i16 @atomicrmw_xchg_i16_aligned_monotonic(ptr %ptr, i16 %value)
8686
; -O0: subs w8, w8, w9, uxth
8787
;
8888
; -O1-LABEL: atomicrmw_xchg_i16_aligned_monotonic:
89-
; -O1: ldxrh w8, [x0]
90-
; -O1: stxrh w9, w1, [x0]
89+
; -O1: ldxrh w0, [x8]
90+
; -O1: stxrh w9, w1, [x8]
9191
%r = atomicrmw xchg ptr %ptr, i16 %value monotonic, align 2
9292
ret i16 %r
9393
}
@@ -100,8 +100,8 @@ define dso_local i16 @atomicrmw_xchg_i16_aligned_acquire(ptr %ptr, i16 %value) {
100100
; -O0: subs w8, w8, w9, uxth
101101
;
102102
; -O1-LABEL: atomicrmw_xchg_i16_aligned_acquire:
103-
; -O1: ldaxrh w8, [x0]
104-
; -O1: stxrh w9, w1, [x0]
103+
; -O1: ldaxrh w0, [x8]
104+
; -O1: stxrh w9, w1, [x8]
105105
%r = atomicrmw xchg ptr %ptr, i16 %value acquire, align 2
106106
ret i16 %r
107107
}
@@ -114,8 +114,8 @@ define dso_local i16 @atomicrmw_xchg_i16_aligned_release(ptr %ptr, i16 %value) {
114114
; -O0: subs w8, w8, w9, uxth
115115
;
116116
; -O1-LABEL: atomicrmw_xchg_i16_aligned_release:
117-
; -O1: ldxrh w8, [x0]
118-
; -O1: stlxrh w9, w1, [x0]
117+
; -O1: ldxrh w0, [x8]
118+
; -O1: stlxrh w9, w1, [x8]
119119
%r = atomicrmw xchg ptr %ptr, i16 %value release, align 2
120120
ret i16 %r
121121
}
@@ -128,8 +128,8 @@ define dso_local i16 @atomicrmw_xchg_i16_aligned_acq_rel(ptr %ptr, i16 %value) {
128128
; -O0: subs w8, w8, w9, uxth
129129
;
130130
; -O1-LABEL: atomicrmw_xchg_i16_aligned_acq_rel:
131-
; -O1: ldaxrh w8, [x0]
132-
; -O1: stlxrh w9, w1, [x0]
131+
; -O1: ldaxrh w0, [x8]
132+
; -O1: stlxrh w9, w1, [x8]
133133
%r = atomicrmw xchg ptr %ptr, i16 %value acq_rel, align 2
134134
ret i16 %r
135135
}
@@ -142,8 +142,8 @@ define dso_local i16 @atomicrmw_xchg_i16_aligned_seq_cst(ptr %ptr, i16 %value) {
142142
; -O0: subs w8, w8, w9, uxth
143143
;
144144
; -O1-LABEL: atomicrmw_xchg_i16_aligned_seq_cst:
145-
; -O1: ldaxrh w8, [x0]
146-
; -O1: stlxrh w9, w1, [x0]
145+
; -O1: ldaxrh w0, [x8]
146+
; -O1: stlxrh w9, w1, [x8]
147147
%r = atomicrmw xchg ptr %ptr, i16 %value seq_cst, align 2
148148
ret i16 %r
149149
}
@@ -392,8 +392,8 @@ define dso_local i8 @atomicrmw_xchg_i8_unaligned_monotonic(ptr %ptr, i8 %value)
392392
; -O0: subs w8, w8, w10, uxtb
393393
;
394394
; -O1-LABEL: atomicrmw_xchg_i8_unaligned_monotonic:
395-
; -O1: ldxrb w8, [x0]
396-
; -O1: stxrb w9, w1, [x0]
395+
; -O1: ldxrb w0, [x8]
396+
; -O1: stxrb w9, w1, [x8]
397397
%r = atomicrmw xchg ptr %ptr, i8 %value monotonic, align 1
398398
ret i8 %r
399399
}
@@ -407,8 +407,8 @@ define dso_local i8 @atomicrmw_xchg_i8_unaligned_acquire(ptr %ptr, i8 %value) {
407407
; -O0: subs w8, w8, w10, uxtb
408408
;
409409
; -O1-LABEL: atomicrmw_xchg_i8_unaligned_acquire:
410-
; -O1: ldaxrb w8, [x0]
411-
; -O1: stxrb w9, w1, [x0]
410+
; -O1: ldaxrb w0, [x8]
411+
; -O1: stxrb w9, w1, [x8]
412412
%r = atomicrmw xchg ptr %ptr, i8 %value acquire, align 1
413413
ret i8 %r
414414
}
@@ -422,8 +422,8 @@ define dso_local i8 @atomicrmw_xchg_i8_unaligned_release(ptr %ptr, i8 %value) {
422422
; -O0: subs w8, w8, w10, uxtb
423423
;
424424
; -O1-LABEL: atomicrmw_xchg_i8_unaligned_release:
425-
; -O1: ldxrb w8, [x0]
426-
; -O1: stlxrb w9, w1, [x0]
425+
; -O1: ldxrb w0, [x8]
426+
; -O1: stlxrb w9, w1, [x8]
427427
%r = atomicrmw xchg ptr %ptr, i8 %value release, align 1
428428
ret i8 %r
429429
}
@@ -437,8 +437,8 @@ define dso_local i8 @atomicrmw_xchg_i8_unaligned_acq_rel(ptr %ptr, i8 %value) {
437437
; -O0: subs w8, w8, w10, uxtb
438438
;
439439
; -O1-LABEL: atomicrmw_xchg_i8_unaligned_acq_rel:
440-
; -O1: ldaxrb w8, [x0]
441-
; -O1: stlxrb w9, w1, [x0]
440+
; -O1: ldaxrb w0, [x8]
441+
; -O1: stlxrb w9, w1, [x8]
442442
%r = atomicrmw xchg ptr %ptr, i8 %value acq_rel, align 1
443443
ret i8 %r
444444
}
@@ -452,8 +452,8 @@ define dso_local i8 @atomicrmw_xchg_i8_unaligned_seq_cst(ptr %ptr, i8 %value) {
452452
; -O0: subs w8, w8, w10, uxtb
453453
;
454454
; -O1-LABEL: atomicrmw_xchg_i8_unaligned_seq_cst:
455-
; -O1: ldaxrb w8, [x0]
456-
; -O1: stlxrb w9, w1, [x0]
455+
; -O1: ldaxrb w0, [x8]
456+
; -O1: stlxrb w9, w1, [x8]
457457
%r = atomicrmw xchg ptr %ptr, i8 %value seq_cst, align 1
458458
ret i8 %r
459459
}

0 commit comments

Comments
 (0)