Commit 22780f8

[DAGCombiner] Fix to avoid writing outside original store in ReduceLoadOpStoreWidth (#119203)
DAGCombiner::ReduceLoadOpStoreWidth could replace memory accesses with narrower loads/stores, but the new load/store would sometimes touch memory outside the original object. That seemed wrong, so this patch simply avoids the DAG combine in such situations.

Also simplify the expression used to align ShAmt down to a multiple of NewBW: subtracting (ShAmt % NewBW) does the same thing as the old, more complicated expression.

The intention is to follow up with a patch that makes more attempts, trying to align the memory accesses at other offsets, so the transform can trigger in more situations. The current strategy for deciding the size (NewBW) and offset (ShAmt) of the narrowed operations is a bit ad hoc, and does not really consider big-endian memory order in the same way as little endian.
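
As a rough illustration of the new rule, here is a minimal standalone sketch (not the actual DAGCombiner code; the helper name canNarrow and the driver are invented for the example, and the concrete numbers are taken from the ARM reproducer added by this commit):

#include <cstdint>
#include <cstdio>

// Sketch of the narrowing decision: ShAmt is the bit offset of the modified
// bits, NewBW the narrowed access width, StoreSizeInBits the store size of
// the original value type. Returns true if a narrowed access of NewBW bits
// at the aligned-down offset stays inside the original object.
static bool canNarrow(uint64_t &ShAmt, uint64_t NewBW,
                      uint64_t StoreSizeInBits) {
  // Align the bit offset down to a multiple of the new width (this is what
  // the simplified "ShAmt = ShAmt - (ShAmt % NewBW)" expression does).
  ShAmt -= ShAmt % NewBW;
  // Reject the combine if the narrowed access would touch memory outside
  // the bytes covered by the original load/store.
  return ShAmt + NewBW <= StoreSizeInBits;
}

int main() {
  // Numbers matching the ARM reproducer: an i80 store (80 bits of memory)
  // whose modified bits start at bit 65.
  uint64_t ShAmt = 65;
  bool Ok32 = canNarrow(ShAmt, /*NewBW=*/32, /*StoreSizeInBits=*/80);
  // 65 rounds down to 64, and 64 + 32 = 96 > 80, so a 32-bit narrowing is
  // now rejected...
  std::printf("32-bit access at bit %llu: %s\n",
              (unsigned long long)ShAmt, Ok32 ? "ok" : "rejected");
  // ...while a 16-bit access still fits (64 + 16 = 80 <= 80), matching the
  // ldrh/strh at [r0, #8] in the updated CHECK-LE-NARROW lines below.
  ShAmt = 65;
  bool Ok16 = canNarrow(ShAmt, /*NewBW=*/16, /*StoreSizeInBits=*/80);
  std::printf("16-bit access at bit %llu: %s\n",
              (unsigned long long)ShAmt, Ok16 ? "ok" : "rejected");
  return 0;
}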
1 parent bc1f3eb commit 22780f8

3 files changed: +28 -12 lines

llvm/lib/CodeGen/SelectionDAG/DAGCombiner.cpp

Lines changed: 8 additions & 4 deletions

@@ -20379,10 +20379,14 @@ SDValue DAGCombiner::ReduceLoadOpStoreWidth(SDNode *N) {
   // either. It should be possible to improve that by using
   // ReduceLoadOpStoreWidthForceNarrowingProfitable.
 
-  // If the lsb changed does not start at the type bitwidth boundary,
-  // start at the previous one.
-  if (ShAmt % NewBW)
-    ShAmt = (((ShAmt + NewBW - 1) / NewBW) * NewBW) - NewBW;
+  // If the lsb that is modified does not start at the type bitwidth boundary,
+  // align to start at the previous boundary.
+  ShAmt = ShAmt - (ShAmt % NewBW);
+
+  // Make sure we do not access memory outside the memory touched by the
+  // original load/store.
+  if (ShAmt + NewBW > VT.getStoreSizeInBits())
+    return SDValue();
 
   APInt Mask = APInt::getBitsSet(BitWidth, ShAmt,
                                  std::min(BitWidth, ShAmt + NewBW));

llvm/test/CodeGen/ARM/dagcombine-ld-op-st.ll

Lines changed: 16 additions & 7 deletions

@@ -2,7 +2,7 @@
 ; RUN: llc < %s -mtriple armv7 -O1 | FileCheck %s -check-prefix=CHECK-LE-NORMAL
 ; RUN: llc < %s -mtriple armv7 -O1 -combiner-reduce-load-op-store-width-force-narrowing-profitable=1 | FileCheck %s -check-prefix=CHECK-LE-NARROW
 ; RUN: llc < %s -mtriple armv7eb -O1 | FileCheck %s -check-prefix=CHECK-BE-NORMAL
-; XXXRUNXXX: llc < %s -mtriple armv7eb -O1 -combiner-reduce-load-op-store-width-force-narrowing-profitable=1 | FileCheck %s -check-prefix=CHECK-BE-NARROW
+; RUN: llc < %s -mtriple armv7eb -O1 -combiner-reduce-load-op-store-width-force-narrowing-profitable=1 | FileCheck %s -check-prefix=CHECK-BE-NARROW
 
 ; This is a reproducer for a bug when DAGCombiner::ReduceLoadOpStoreWidth
 ; would end up narrowing the load-op-store sequence into this SDNode sequence
@@ -12,12 +12,12 @@
 ; t20: i32 = or t18, Constant:i32<65534>
 ; t21: ch = store<(store (s32) into %ir.p1 + 8, align 8)> t18:1, t20, t17, undef:i32
 ;
-; This is wrong since it accesses memory above %ir.p1+9 which is outside the
+; This was wrong since it accesses memory above %ir.p1+9 which is outside the
 ; "store size" for the original store.
 ;
-; For big-endian we hit an assertion due to passing a negative offset to
-; getMemBasePlusOffset (at least after commit 3e1b55cafc95d4ef4, while before
-; that commit we got load/store instructions that accessed memory at a
+; For big-endian we used to hit an assertion due to passing a negative offset
+; to getMemBasePlusOffset (at least after commit 3e1b55cafc95d4ef4, while
+; before that commit we got load/store instructions that accessed memory at a
 ; negative offset from %p1).
 ;
 define i16 @test(ptr %p1) {
@@ -32,10 +32,10 @@ define i16 @test(ptr %p1) {
 ;
 ; CHECK-LE-NARROW-LABEL: test:
 ; CHECK-LE-NARROW: @ %bb.0: @ %entry
-; CHECK-LE-NARROW-NEXT: ldr r1, [r0, #8]
+; CHECK-LE-NARROW-NEXT: ldrh r1, [r0, #8]
 ; CHECK-LE-NARROW-NEXT: movw r2, #65534
 ; CHECK-LE-NARROW-NEXT: orr r1, r1, r2
-; CHECK-LE-NARROW-NEXT: str r1, [r0, #8]
+; CHECK-LE-NARROW-NEXT: strh r1, [r0, #8]
 ; CHECK-LE-NARROW-NEXT: mov r0, #0
 ; CHECK-LE-NARROW-NEXT: bx lr
 ;
@@ -47,6 +47,15 @@ define i16 @test(ptr %p1) {
 ; CHECK-BE-NORMAL-NEXT: strh r1, [r0]
 ; CHECK-BE-NORMAL-NEXT: mov r0, #0
 ; CHECK-BE-NORMAL-NEXT: bx lr
+;
+; CHECK-BE-NARROW-LABEL: test:
+; CHECK-BE-NARROW: @ %bb.0: @ %entry
+; CHECK-BE-NARROW-NEXT: ldrh r1, [r0]
+; CHECK-BE-NARROW-NEXT: movw r2, #65534
+; CHECK-BE-NARROW-NEXT: orr r1, r1, r2
+; CHECK-BE-NARROW-NEXT: strh r1, [r0]
+; CHECK-BE-NARROW-NEXT: mov r0, #0
+; CHECK-BE-NARROW-NEXT: bx lr
 entry:
   %load = load i80, ptr %p1, align 32
   %mask = shl i80 -1, 65
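
As a worked complement to the test comments above, the numbers for the i80 case look roughly like this (the big-endian offset formula is an assumption for illustration only, not a quote of the real DAGCombiner code):

#include <cstdint>
#include <cstdio>

// Worked example for the ARM reproducer: an i80 (10-byte) object whose
// modified bits start at bit 65, narrowed to a 32-bit access before this fix.
int main() {
  const int64_t StoreBits = 80; // store size of the original i80 access
  const int64_t NewBW = 32;     // narrowed width chosen before this fix
  const int64_t ShAmt = 64;     // bit 65 aligned down to a 32-bit boundary

  int64_t PtrOffLE = ShAmt / 8;                       // 8
  int64_t PtrOffBE = (StoreBits - NewBW - ShAmt) / 8; // -2 (assumed formula)
  std::printf("LE byte offset: %lld, BE byte offset: %lld\n",
              (long long)PtrOffLE, (long long)PtrOffBE);

  // Little endian: a 4-byte access at +8 covers bytes 8..11, while the i80
  // object only occupies bytes 0..9, so bytes 10 and 11 are outside it
  // (the "memory above %ir.p1+9" mentioned in the test comment).
  // Big endian: the computed offset goes negative, which used to trip an
  // assertion in getMemBasePlusOffset.
  return 0;
}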

llvm/test/CodeGen/X86/store_op_load_fold.ll

Lines changed: 4 additions & 1 deletion

@@ -23,7 +23,10 @@ define void @test2() nounwind uwtable ssp {
 ; CHECK-LABEL: test2:
 ; CHECK: ## %bb.0:
 ; CHECK-NEXT: movl L_s2$non_lazy_ptr, %eax
-; CHECK-NEXT: andl $-262144, 20(%eax) ## imm = 0xFFFC0000
+; CHECK-NEXT: movzbl 22(%eax), %ecx
+; CHECK-NEXT: andl $-4, %ecx
+; CHECK-NEXT: movb %cl, 22(%eax)
+; CHECK-NEXT: movw $0, 20(%eax)
 ; CHECK-NEXT: retl
   %bf.load35 = load i56, ptr getelementptr inbounds (%struct.S2, ptr @s2, i32 0, i32 5), align 16
   %bf.clear36 = and i56 %bf.load35, -1125895611875329
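
For the X86 test, here is a small sketch of the byte ranges involved (the byte offset 16 for the i56 field is an assumption inferred from the 16-byte alignment and the 20(%eax)/22(%eax) offsets in the generated code):

#include <cstdint>
#include <cstdio>

// Check whether an access of AccSize bytes at AccOff stays inside an object
// of ObjSize bytes starting at ObjOff.
static bool insideObject(uint64_t AccOff, uint64_t AccSize,
                         uint64_t ObjOff, uint64_t ObjSize) {
  return AccOff >= ObjOff && AccOff + AccSize <= ObjOff + ObjSize;
}

int main() {
  const uint64_t ObjOff = 16, ObjSize = 7; // i56 has a 7-byte store size
  // Old codegen: a single 32-bit read-modify-write at byte 20 (bytes 20..23),
  // one byte past the end of the object (bytes 16..22).
  std::printf("andl at 20, 4 bytes: %s\n",
              insideObject(20, 4, ObjOff, ObjSize) ? "inside" : "outside");
  // New codegen: a byte RMW at 22 and a 16-bit store at 20, both inside.
  std::printf("movb at 22, 1 byte: %s\n",
              insideObject(22, 1, ObjOff, ObjSize) ? "inside" : "outside");
  std::printf("movw at 20, 2 bytes: %s\n",
              insideObject(20, 2, ObjOff, ObjSize) ? "inside" : "outside");
  return 0;
}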
