Commit 2cf4f1d

[AArch64] New subtarget features to control ldp and stp formation, focused on ampere1 and ampere1a.
On some AArch64 cores, including Ampere's ampere1 and ampere1a architectures, load and store pair instructions are faster than the equivalent simple loads/stores only when the alignment of the pair is at least twice that of the individual element being accessed. Based on that, this patch introduces four new subtarget features, two controlling ldp and two controlling stp, to cover the ampere1 and ampere1a alignment needs and to enable optional fine-grained control over ldp and stp generation in general. The latter can be used by another CPU if a policy different from the compiler's default is beneficial. More specifically, for ldp and stp respectively we have:

- disable-ldp/disable-stp: Do not emit ldp/stp.
- ldp-aligned-only/stp-aligned-only: Emit ldp/stp only if the source pointer is aligned to at least double the alignment of the type.

Therefore, for -mcpu=ampere1 and -mcpu=ampere1a, ldp-aligned-only and stp-aligned-only become the defaults because of the benefit from the alignment, whereas for the remaining CPUs the compiler's default behaviour is maintained.
1 parent 892f955 commit 2cf4f1d
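
As a rough illustration of the policy described in the commit message, here is a minimal, self-contained C++ sketch of how the four features gate pair formation. It is not the patch's code; the PairPolicy struct and shouldFormPair function are hypothetical names used only for illustration.

#include <cstdint>

// Hypothetical mirror of the four subtarget features (not an LLVM type).
struct PairPolicy {
  bool DisableLdp = false, DisableStp = false;
  bool LdpAlignedOnly = false, StpAlignedOnly = false;
};

// Decide whether a candidate load/store pair may be formed.
// IsLoad selects the ldp vs. stp policy; PointerAlign and ElementSize are the
// known alignment of the access and the size of one element, in bytes.
bool shouldFormPair(const PairPolicy &P, bool IsLoad, uint64_t PointerAlign,
                    uint64_t ElementSize) {
  // disable-ldp / disable-stp: never pair.
  if (IsLoad ? P.DisableLdp : P.DisableStp)
    return false;
  // ldp-aligned-only / stp-aligned-only: pair only when the access is aligned
  // to at least twice the element size.
  if ((IsLoad ? P.LdpAlignedOnly : P.StpAlignedOnly) &&
      PointerAlign < 2 * ElementSize)
    return false;
  return true;
}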

File tree

3 files changed: +440, -2 lines changed


llvm/lib/Target/AArch64/AArch64.td

Lines changed: 18 additions & 2 deletions
@@ -570,6 +570,18 @@ def FeatureD128 : SubtargetFeature<"d128", "HasD128",
                                    "and Instructions (FEAT_D128, FEAT_LVA3, FEAT_SYSREG128, FEAT_SYSINSTR128)",
                                    [FeatureLSE128]>;
 
+def FeatureDisableLdp : SubtargetFeature<"disable-ldp", "HasDisableLdp",
+                                         "true", "Do not emit ldp">;
+
+def FeatureDisableStp : SubtargetFeature<"disable-stp", "HasDisableStp",
+                                         "true", "Do not emit stp">;
+
+def FeatureLdpAlignedOnly : SubtargetFeature<"ldp-aligned-only", "HasLdpAlignedOnly",
+                                             "true", "In order to emit ldp, first check if the load will be aligned to 2 * element_size">;
+
+def FeatureStpAlignedOnly : SubtargetFeature<"stp-aligned-only", "HasStpAlignedOnly",
+                                             "true", "In order to emit stp, first check if the store will be aligned to 2 * element_size">;
+
 //===----------------------------------------------------------------------===//
 // Architectures.
 //
@@ -1239,7 +1251,9 @@ def TuneAmpere1 : SubtargetFeature<"ampere1", "ARMProcFamily", "Ampere1",
                                   FeatureArithmeticBccFusion,
                                   FeatureCmpBccFusion,
                                   FeatureFuseAddress,
-                                  FeatureFuseLiterals]>;
+                                  FeatureFuseLiterals,
+                                  FeatureLdpAlignedOnly,
+                                  FeatureStpAlignedOnly]>;
 
 def TuneAmpere1A : SubtargetFeature<"ampere1a", "ARMProcFamily", "Ampere1A",
                                     "Ampere Computing Ampere-1A processors", [
@@ -1252,7 +1266,9 @@ def TuneAmpere1A : SubtargetFeature<"ampere1a", "ARMProcFamily", "Ampere1A",
                                    FeatureCmpBccFusion,
                                    FeatureFuseAddress,
                                    FeatureFuseLiterals,
-                                   FeatureFuseLiterals]>;
+                                   FeatureFuseLiterals,
+                                   FeatureLdpAlignedOnly,
+                                   FeatureStpAlignedOnly]>;
 
 def ProcessorFeatures {
   list<SubtargetFeature> A53 = [HasV8_0aOps, FeatureCRC, FeatureCrypto,

llvm/lib/Target/AArch64/AArch64LoadStoreOptimizer.cpp

Lines changed: 33 additions & 0 deletions
@@ -2136,6 +2136,14 @@ bool AArch64LoadStoreOpt::tryToPairLdStInst(MachineBasicBlock::iterator &MBBI) {
   if (!TII->isCandidateToMergeOrPair(MI))
     return false;
 
+  // If disable-ldp feature is opted, do not emit ldp.
+  if (MI.mayLoad() && Subtarget->hasDisableLdp())
+    return false;
+
+  // If disable-stp feature is opted, do not emit stp.
+  if (MI.mayStore() && Subtarget->hasDisableStp())
+    return false;
+
   // Early exit if the offset is not possible to match. (6 bits of positive
   // range, plus allow an extra one in case we find a later insn that matches
   // with Offset-1)
@@ -2159,6 +2167,31 @@ bool AArch64LoadStoreOpt::tryToPairLdStInst(MachineBasicBlock::iterator &MBBI) {
   // Keeping the iterator straight is a pain, so we let the merge routine tell
   // us what the next instruction is after it's done mucking about.
   auto Prev = std::prev(MBBI);
+
+  // Fetch the memoperand of the load/store that is a candidate for
+  // combination.
+  MachineMemOperand *MemOp =
+      MI.memoperands_empty() ? nullptr : MI.memoperands().front();
+
+  // Get the needed alignments to check them if
+  // ldp-aligned-only/stp-aligned-only features are opted.
+  uint64_t MemAlignment = MemOp ? MemOp->getAlign().value() : -1;
+  uint64_t TypeAlignment = MemOp ? Align(MemOp->getSize()).value() : -1;
+
+  // If a load arrives and ldp-aligned-only feature is opted, check that the
+  // alignment of the source pointer is at least double the alignment of the
+  // type.
+  if (MI.mayLoad() && Subtarget->hasLdpAlignedOnly() && MemOp &&
+      MemAlignment < 2 * TypeAlignment)
+    return false;
+
+  // If a store arrives and stp-aligned-only feature is opted, check that the
+  // alignment of the source pointer is at least double the alignment of the
+  // type.
+  if (MI.mayStore() && Subtarget->hasStpAlignedOnly() && MemOp &&
+      MemAlignment < 2 * TypeAlignment)
+    return false;
+
   MBBI = mergePairedInsns(MBBI, Paired, Flags);
   // Collect liveness info for instructions between Prev and the new position
   // MBBI.
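
For concreteness, a small standalone check of the alignment arithmetic used in the hunk above (hypothetical values, not part of the patch): an 8-byte element may be paired from a 16-byte-aligned access, but not from an 8-byte-aligned one.

#include <cassert>
#include <cstdint>

int main() {
  const uint64_t TypeAlignment = 8;      // e.g. an i64 element, 8 bytes
  // 8-byte-aligned access: 8 < 2 * 8, so the aligned-only policy rejects it.
  assert(uint64_t(8) < 2 * TypeAlignment);
  // 16-byte-aligned access: 16 >= 2 * 8, so pairing is allowed.
  assert(!(uint64_t(16) < 2 * TypeAlignment));
  return 0;
}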
