Commit 02c2bf8

[RISCV] Change heuristic used for load clustering (#75341)
Split out from #73789, so as to leave that PR just for flipping load clustering to on by default.

Clusters if the operations are within a cache line of each other (as AMDGPU does in `shouldScheduleLoadsNear`). X86 does something similar, but does `((Offset2 - Offset1) / 8 > 64)`. I'm not sure if that's intentionally set to 512 bytes or if the division is in error.

Adopts the suggestion from @wangpc-pp to query the cache line size and use it if available.

We also cap the maximum cluster size to limit the potential register pressure impact (which may lead to additional spills).
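The arithmetic behind the 512-byte remark can be checked with a small sketch. It assumes the X86 expression is evaluated with signed integer division exactly as quoted in the message; the helper name `x86StyleTooFar` is hypothetical and is not LLVM code.

```cpp
#include <cstdint>

// Hypothetical helper that mirrors the quoted expression
// ((Offset2 - Offset1) / 8 > 64). It returns true when the two
// offsets would be considered too far apart under that expression.
bool x86StyleTooFar(int64_t Offset1, int64_t Offset2) {
  // Integer division by 8 happens before the comparison with 64,
  // so the byte distance is effectively compared against 8 * 64.
  return (Offset2 - Offset1) / 8 > 64;
}
```

Under that assumption, a distance of 512 bytes is still accepted, and the check only trips once the offsets are 520 or more bytes apart, which is roughly eight 64-byte cache lines rather than one.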
1 parent 4b91949 commit 02c2bf8

1 file changed, +8 -3 lines changed

llvm/lib/Target/RISCV/RISCVInstrInfo.cpp

Lines changed: 8 additions & 3 deletions
@@ -2282,9 +2282,14 @@ bool RISCVInstrInfo::shouldClusterMemOps(
     return false;
   }

-  // TODO: Use a more carefully chosen heuristic, e.g. only cluster if offsets
-  // indicate they likely share a cache line.
-  return ClusterSize <= 4;
+  unsigned CacheLineSize =
+      BaseOps1.front()->getParent()->getMF()->getSubtarget().getCacheLineSize();
+  // Assume a cache line size of 64 bytes if no size is set in RISCVSubtarget.
+  CacheLineSize = CacheLineSize ? CacheLineSize : 64;
+  // Cluster if the memory operations are on the same or a neighbouring cache
+  // line, but limit the maximum ClusterSize to avoid creating too much
+  // additional register pressure.
+  return ClusterSize <= 4 && std::abs(Offset1 - Offset2) < CacheLineSize;
 }

 // Set BaseReg (the base register operand), Offset (the byte offset being
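
To make the new return expression easier to reason about outside of LLVM, here is a minimal standalone sketch of the same check. The function name `shouldClusterSketch` and its flat parameter list are hypothetical; the real hook takes operand lists and queries the subtarget, and this sketch only models the final comparison and the 64-byte fallback.

```cpp
#include <cstdint>
#include <cstdlib>

// Hypothetical standalone model of the heuristic in the diff above.
// CacheLineSize == 0 stands in for "no size set in the subtarget".
bool shouldClusterSketch(int64_t Offset1, int64_t Offset2,
                         unsigned ClusterSize, unsigned CacheLineSize) {
  // Fall back to 64 bytes when no cache line size is reported.
  if (CacheLineSize == 0)
    CacheLineSize = 64;
  // Byte distance between the two accesses, independent of their order.
  int64_t Distance = std::abs(Offset1 - Offset2);
  // Cluster only small groups whose offsets are less than one line apart.
  return ClusterSize <= 4 && Distance < static_cast<int64_t>(CacheLineSize);
}
```

Because only the distance is compared, offsets such as 60 and 70 still cluster with a 64-byte line even though they fall on different lines, which matches the "same or a neighbouring cache line" wording in the comment. A distance of exactly one line size (for example 0 and 64) is rejected.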
