-
Notifications
You must be signed in to change notification settings - Fork 13.7k
[RISCV] Generate MIPS load/store pair instructions #124717
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
@llvm/pr-subscribers-backend-risc-v Author: Djordje Todorovic (djtodoro) ChangesIntroduce RISCVLoadStoreOptimizer MIR Pass that will do the optimization. It bundles loads and stores that operate on consecutive memory locations to take the advantage of hardware load/store bonding. This is part of MIPS extensions for the p8700 CPU. Patch is 39.31 KiB, truncated to 20.00 KiB below, full version: https://github.com/llvm/llvm-project/pull/124717.diff 9 Files Affected:
diff --git a/llvm/lib/Target/RISCV/CMakeLists.txt b/llvm/lib/Target/RISCV/CMakeLists.txt
index 98d3615ebab58d..7a533841c82700 100644
--- a/llvm/lib/Target/RISCV/CMakeLists.txt
+++ b/llvm/lib/Target/RISCV/CMakeLists.txt
@@ -48,6 +48,7 @@ add_llvm_target(RISCVCodeGen
RISCVISelLowering.cpp
RISCVLandingPadSetup.cpp
RISCVMachineFunctionInfo.cpp
+ RISCVLoadStoreOptimizer.cpp
RISCVMergeBaseOffset.cpp
RISCVOptWInstrs.cpp
RISCVPostRAExpandPseudoInsts.cpp
diff --git a/llvm/lib/Target/RISCV/RISCVInstrInfo.cpp b/llvm/lib/Target/RISCV/RISCVInstrInfo.cpp
index bb9ebedeea4f7b..ce11db367f1611 100644
--- a/llvm/lib/Target/RISCV/RISCVInstrInfo.cpp
+++ b/llvm/lib/Target/RISCV/RISCVInstrInfo.cpp
@@ -2738,6 +2738,46 @@ MachineInstr *RISCVInstrInfo::emitLdStWithAddr(MachineInstr &MemI,
.setMIFlags(MemI.getFlags());
}
+bool RISCVInstrInfo::isPairableLdStInstOpc(unsigned Opc) {
+ switch (Opc) {
+ default:
+ return false;
+ case RISCV::SH:
+ case RISCV::LH:
+ case RISCV::LHU:
+ case RISCV::SW:
+ case RISCV::FSW:
+ case RISCV::LW:
+ case RISCV::FLW:
+ case RISCV::SD:
+ case RISCV::FSD:
+ case RISCV::LD:
+ case RISCV::FLD:
+ return true;
+ }
+}
+
+bool RISCVInstrInfo::isLdStSafeToPair(const MachineInstr &LdSt,
+ const TargetRegisterInfo *TRI) {
+ // If this is a volatile load/store, don't mess with it.
+ if (LdSt.hasOrderedMemoryRef() || LdSt.getNumExplicitOperands() != 3)
+ return false;
+
+ if (LdSt.getOperand(1).isFI())
+ return true;
+
+ assert(LdSt.getOperand(1).isReg() && "Expected a reg operand.");
+ // Can't cluster if the instruction modifies the base register
+ // or it is update form. e.g. ld x5,8(x5)
+ if (LdSt.modifiesRegister(LdSt.getOperand(1).getReg(), TRI))
+ return false;
+
+ if (!LdSt.getOperand(2).isImm())
+ return false;
+
+ return true;
+}
+
bool RISCVInstrInfo::getMemOperandsWithOffsetWidth(
const MachineInstr &LdSt, SmallVectorImpl<const MachineOperand *> &BaseOps,
int64_t &Offset, bool &OffsetIsScalable, LocationSize &Width,
diff --git a/llvm/lib/Target/RISCV/RISCVInstrInfo.h b/llvm/lib/Target/RISCV/RISCVInstrInfo.h
index 1c81719c767ecb..64cfb243b7bdb6 100644
--- a/llvm/lib/Target/RISCV/RISCVInstrInfo.h
+++ b/llvm/lib/Target/RISCV/RISCVInstrInfo.h
@@ -300,6 +300,12 @@ class RISCVInstrInfo : public RISCVGenInstrInfo {
std::unique_ptr<TargetInstrInfo::PipelinerLoopInfo>
analyzeLoopForPipelining(MachineBasicBlock *LoopBB) const override;
+ /// Return true if pairing the given load or store may be paired with another.
+ static bool isPairableLdStInstOpc(unsigned Opc);
+
+ static bool isLdStSafeToPair(const MachineInstr &LdSt,
+ const TargetRegisterInfo *TRI);
+
protected:
const RISCVSubtarget &STI;
diff --git a/llvm/lib/Target/RISCV/RISCVLoadStoreOptimizer.cpp b/llvm/lib/Target/RISCV/RISCVLoadStoreOptimizer.cpp
new file mode 100644
index 00000000000000..1421ede686d26a
--- /dev/null
+++ b/llvm/lib/Target/RISCV/RISCVLoadStoreOptimizer.cpp
@@ -0,0 +1,364 @@
+//===----- RISCVLoadStoreOptimizer.cpp ------------------------------------===//
+//
+// Part of the LLVM Project, under the Apache License v2.0 with LLVM Exceptions.
+// See https://llvm.org/LICENSE.txt for license information.
+// SPDX-License-Identifier: Apache-2.0 WITH LLVM-exception
+//
+//===----------------------------------------------------------------------===//
+//
+// Bundle loads and stores that operate on consecutive memory locations to take
+// the advantage of hardware load/store bonding.
+//
+//===----------------------------------------------------------------------===//
+
+#include "RISCV.h"
+#include "RISCVTargetMachine.h"
+#include "llvm/Analysis/AliasAnalysis.h"
+#include "llvm/CodeGen/Passes.h"
+#include "llvm/MC/TargetRegistry.h"
+#include "llvm/Support/Debug.h"
+#include "llvm/Target/TargetOptions.h"
+
+using namespace llvm;
+
+#define DEBUG_TYPE "riscv-load-store-opt"
+#define RISCV_LOAD_STORE_OPT_NAME "RISCV Load / Store Optimizer"
+namespace {
+
+struct RISCVLoadStoreOpt : public MachineFunctionPass {
+ static char ID;
+ bool runOnMachineFunction(MachineFunction &Fn) override;
+
+ RISCVLoadStoreOpt() : MachineFunctionPass(ID) {}
+
+ MachineFunctionProperties getRequiredProperties() const override {
+ return MachineFunctionProperties().set(
+ MachineFunctionProperties::Property::NoVRegs);
+ }
+
+ void getAnalysisUsage(AnalysisUsage &AU) const override {
+ AU.addRequired<AAResultsWrapperPass>();
+ MachineFunctionPass::getAnalysisUsage(AU);
+ }
+
+ StringRef getPassName() const override { return RISCV_LOAD_STORE_OPT_NAME; }
+
+ // Find and pair load/store instructions.
+ bool tryToPairLdStInst(MachineBasicBlock::iterator &MBBI);
+
+ // Convert load/store pairs to single instructions.
+ bool tryConvertToLdStPair(MachineBasicBlock::iterator First,
+ MachineBasicBlock::iterator Second);
+
+ // Scan the instructions looking for a load/store that can be combined
+ // with the current instruction into a load/store pair.
+ // Return the matching instruction if one is found, else MBB->end().
+ MachineBasicBlock::iterator findMatchingInsn(MachineBasicBlock::iterator I,
+ bool &MergeForward);
+
+ MachineBasicBlock::iterator
+ mergePairedInsns(MachineBasicBlock::iterator I,
+ MachineBasicBlock::iterator Paired, bool MergeForward);
+
+private:
+ AliasAnalysis *AA;
+ MachineRegisterInfo *MRI;
+ const RISCVInstrInfo *TII;
+ const RISCVRegisterInfo *TRI;
+ LiveRegUnits ModifiedRegUnits, UsedRegUnits;
+};
+} // end anonymous namespace
+
+char RISCVLoadStoreOpt::ID = 0;
+INITIALIZE_PASS(RISCVLoadStoreOpt, DEBUG_TYPE, RISCV_LOAD_STORE_OPT_NAME, false,
+ false)
+
+bool RISCVLoadStoreOpt::runOnMachineFunction(MachineFunction &Fn) {
+ if (skipFunction(Fn.getFunction()))
+ return false;
+ const RISCVSubtarget &Subtarget = Fn.getSubtarget<RISCVSubtarget>();
+ if (!Subtarget.useLoadStorePairs())
+ return false;
+
+ bool MadeChange = false;
+ TII = Subtarget.getInstrInfo();
+ TRI = Subtarget.getRegisterInfo();
+ MRI = &Fn.getRegInfo();
+ AA = &getAnalysis<AAResultsWrapperPass>().getAAResults();
+ ModifiedRegUnits.init(*TRI);
+ UsedRegUnits.init(*TRI);
+
+ for (MachineBasicBlock &MBB : Fn) {
+ LLVM_DEBUG(dbgs() << "MBB: " << MBB.getName() << "\n");
+
+ for (MachineBasicBlock::iterator MBBI = MBB.begin(), E = MBB.end();
+ MBBI != E;) {
+ if (TII->isPairableLdStInstOpc(MBBI->getOpcode()) &&
+ tryToPairLdStInst(MBBI))
+ MadeChange = true;
+ else
+ ++MBBI;
+ }
+ }
+ return MadeChange;
+}
+
+// Find loads and stores that can be merged into a single load or store pair
+// instruction.
+bool RISCVLoadStoreOpt::tryToPairLdStInst(MachineBasicBlock::iterator &MBBI) {
+ MachineInstr &MI = *MBBI;
+ MachineBasicBlock::iterator E = MI.getParent()->end();
+
+ if (!TII->isLdStSafeToPair(MI, TRI))
+ return false;
+
+ // Look ahead for a pairable instruction.
+ bool MergeForward;
+ MachineBasicBlock::iterator Paired = findMatchingInsn(MBBI, MergeForward);
+ if (Paired != E) {
+ MBBI = mergePairedInsns(MBBI, Paired, MergeForward);
+ return true;
+ }
+ return false;
+}
+
+bool RISCVLoadStoreOpt::tryConvertToLdStPair(
+ MachineBasicBlock::iterator First, MachineBasicBlock::iterator Second) {
+ unsigned PairOpc;
+ // TODO: Handle the rest from RISCVInstrInfo::isPairableLdStInstOpc.
+ switch (First->getOpcode()) {
+ default:
+ return false;
+ case RISCV::SW:
+ PairOpc = RISCV::SWP;
+ break;
+ case RISCV::LW:
+ PairOpc = RISCV::LWP;
+ break;
+ case RISCV::SD:
+ PairOpc = RISCV::SDP;
+ break;
+ case RISCV::LD:
+ PairOpc = RISCV::LDP;
+ break;
+ }
+
+ MachineFunction *MF = First->getMF();
+ const MachineMemOperand *MMO = *First->memoperands_begin();
+ Align MMOAlign = MMO->getAlign();
+ if (const PseudoSourceValue *Source = MMO->getPseudoValue())
+ if (Source->kind() == PseudoSourceValue::FixedStack)
+ MMOAlign = MF->getSubtarget().getFrameLowering()->getStackAlign();
+
+ if (MMOAlign < Align(MMO->getSize().getValue() * 2))
+ return false;
+ int64_t Offset = First->getOperand(2).getImm();
+ if (!isUInt<7>(Offset) ||
+ !isAligned(Align(MMO->getSize().getValue()), Offset))
+ return false;
+ MachineInstrBuilder MIB = BuildMI(
+ *MF,
+ First->getDebugLoc().get() ? First->getDebugLoc() : Second->getDebugLoc(),
+ TII->get(PairOpc));
+ MIB.add(First->getOperand(0))
+ .add(Second->getOperand(0))
+ .add(First->getOperand(1))
+ .add(First->getOperand(2))
+ .cloneMergedMemRefs({&*First, &*Second});
+
+ First->getParent()->insert(First, MIB);
+
+ First->removeFromParent();
+ Second->removeFromParent();
+
+ return true;
+}
+
+static bool mayAlias(MachineInstr &MIa,
+ SmallVectorImpl<MachineInstr *> &MemInsns,
+ AliasAnalysis *AA) {
+ for (MachineInstr *MIb : MemInsns)
+ if (MIa.mayAlias(AA, *MIb, /*UseTBAA*/ false))
+ return true;
+
+ return false;
+}
+
+// Scan the instructions looking for a load/store that can be combined with the
+// current instruction into a wider equivalent or a load/store pair.
+// TODO: Extend pairing logic to consider reordering both instructions
+// to a safe "middle" position rather than only merging forward/backward.
+// This requires more sophisticated checks for aliasing, register
+// liveness, and potential scheduling hazards.
+MachineBasicBlock::iterator
+RISCVLoadStoreOpt::findMatchingInsn(MachineBasicBlock::iterator I,
+ bool &MergeForward) {
+ MachineBasicBlock::iterator E = I->getParent()->end();
+ MachineBasicBlock::iterator MBBI = I;
+ MachineInstr &FirstMI = *I;
+ MBBI = next_nodbg(MBBI, E);
+
+ bool MayLoad = FirstMI.mayLoad();
+ Register Reg = FirstMI.getOperand(0).getReg();
+ Register BaseReg = FirstMI.getOperand(1).getReg();
+ int Offset = FirstMI.getOperand(2).getImm();
+ int OffsetStride = (*FirstMI.memoperands_begin())->getSize().getValue();
+
+ MergeForward = false;
+
+ // Track which register units have been modified and used between the first
+ // insn (inclusive) and the second insn.
+ ModifiedRegUnits.clear();
+ UsedRegUnits.clear();
+
+ // Remember any instructions that read/write memory between FirstMI and MI.
+ SmallVector<MachineInstr *, 4> MemInsns;
+
+ for (unsigned Count = 0; MBBI != E && Count < 128;
+ MBBI = next_nodbg(MBBI, E)) {
+ MachineInstr &MI = *MBBI;
+
+ // Don't count transient instructions towards the search limit since there
+ // may be different numbers of them if e.g. debug information is present.
+ if (!MI.isTransient())
+ ++Count;
+
+ if (MI.getOpcode() == FirstMI.getOpcode() &&
+ TII->isLdStSafeToPair(MI, TRI)) {
+ Register MIBaseReg = MI.getOperand(1).getReg();
+ int MIOffset = MI.getOperand(2).getImm();
+
+ if (BaseReg == MIBaseReg) {
+
+ if ((Offset != MIOffset + OffsetStride) &&
+ (Offset + OffsetStride != MIOffset)) {
+ LiveRegUnits::accumulateUsedDefed(MI, ModifiedRegUnits, UsedRegUnits,
+ TRI);
+ MemInsns.push_back(&MI);
+ continue;
+ }
+
+ // If the destination register of one load is the same register or a
+ // sub/super register of the other load, bail and keep looking.
+ if (MayLoad &&
+ TRI->isSuperOrSubRegisterEq(Reg, MI.getOperand(0).getReg())) {
+ LiveRegUnits::accumulateUsedDefed(MI, ModifiedRegUnits, UsedRegUnits,
+ TRI);
+ MemInsns.push_back(&MI);
+ continue;
+ }
+
+ // If the BaseReg has been modified, then we cannot do the optimization.
+ if (!ModifiedRegUnits.available(BaseReg))
+ return E;
+
+ // If the Rt of the second instruction was not modified or used between
+ // the two instructions and none of the instructions between the second
+ // and first alias with the second, we can combine the second into the
+ // first.
+ if (ModifiedRegUnits.available(MI.getOperand(0).getReg()) &&
+ !(MI.mayLoad() &&
+ !UsedRegUnits.available(MI.getOperand(0).getReg())) &&
+ !mayAlias(MI, MemInsns, AA)) {
+
+ MergeForward = false;
+ return MBBI;
+ }
+
+ // Likewise, if the Rt of the first instruction is not modified or used
+ // between the two instructions and none of the instructions between the
+ // first and the second alias with the first, we can combine the first
+ // into the second.
+ if (!(MayLoad &&
+ !UsedRegUnits.available(FirstMI.getOperand(0).getReg())) &&
+ !mayAlias(FirstMI, MemInsns, AA)) {
+
+ if (ModifiedRegUnits.available(FirstMI.getOperand(0).getReg())) {
+ MergeForward = true;
+ return MBBI;
+ }
+ }
+ // Unable to combine these instructions due to interference in between.
+ // Keep looking.
+ }
+ }
+
+ // If the instruction wasn't a matching load or store. Stop searching if we
+ // encounter a call instruction that might modify memory.
+ if (MI.isCall())
+ return E;
+
+ // Update modified / uses register units.
+ LiveRegUnits::accumulateUsedDefed(MI, ModifiedRegUnits, UsedRegUnits, TRI);
+
+ // Otherwise, if the base register is modified, we have no match, so
+ // return early.
+ if (!ModifiedRegUnits.available(BaseReg))
+ return E;
+
+ // Update list of instructions that read/write memory.
+ if (MI.mayLoadOrStore())
+ MemInsns.push_back(&MI);
+ }
+ return E;
+}
+
+MachineBasicBlock::iterator
+RISCVLoadStoreOpt::mergePairedInsns(MachineBasicBlock::iterator I,
+ MachineBasicBlock::iterator Paired,
+ bool MergeForward) {
+ MachineBasicBlock::iterator E = I->getParent()->end();
+ MachineBasicBlock::iterator NextI = next_nodbg(I, E);
+ if (NextI == Paired)
+ NextI = next_nodbg(NextI, E);
+
+ // Insert our new paired instruction after whichever of the paired
+ // instructions MergeForward indicates.
+ MachineBasicBlock::iterator InsertionPoint = MergeForward ? Paired : I;
+ MachineBasicBlock::iterator DeletionPoint = MergeForward ? I : Paired;
+ int Offset = I->getOperand(2).getImm();
+ int PairedOffset = Paired->getOperand(2).getImm();
+ bool InsertAfter = (Offset < PairedOffset) ^ MergeForward;
+
+ if (!MergeForward)
+ Paired->getOperand(1).setIsKill(false);
+
+ // Kill flags may become invalid when moving stores for pairing.
+ if (I->getOperand(0).isUse()) {
+ if (!MergeForward) {
+ // Clear kill flags on store if moving upwards.
+ I->getOperand(0).setIsKill(false);
+ Paired->getOperand(0).setIsKill(false);
+ } else {
+ // Clear kill flags of the first stores register.
+ Register Reg = I->getOperand(0).getReg();
+ for (MachineInstr &MI : make_range(std::next(I), std::next(Paired)))
+ MI.clearRegisterKills(Reg, TRI);
+ }
+ }
+
+ MachineInstr *ToInsert = DeletionPoint->removeFromParent();
+ MachineBasicBlock &MBB = *InsertionPoint->getParent();
+ MachineBasicBlock::iterator First, Second;
+
+ if (!InsertAfter) {
+ First = MBB.insert(InsertionPoint, ToInsert);
+ Second = InsertionPoint;
+ } else {
+ Second = MBB.insertAfter(InsertionPoint, ToInsert);
+ First = InsertionPoint;
+ }
+
+ if (!tryConvertToLdStPair(First, Second))
+ finalizeBundle(MBB, First.getInstrIterator(),
+ std::next(Second).getInstrIterator());
+
+ LLVM_DEBUG(dbgs() << "Bonding pair load/store:\n ");
+ LLVM_DEBUG(prev_nodbg(NextI, MBB.begin())->print(dbgs()));
+ return NextI;
+}
+
+// Returns an instance of the Load / Store Optimization pass.
+FunctionPass *llvm::createRISCVLoadStoreOptPass() {
+ return new RISCVLoadStoreOpt();
+}
diff --git a/llvm/lib/Target/RISCV/RISCVSubtarget.cpp b/llvm/lib/Target/RISCV/RISCVSubtarget.cpp
index 1b54c278820fc2..d7439c50be0764 100644
--- a/llvm/lib/Target/RISCV/RISCVSubtarget.cpp
+++ b/llvm/lib/Target/RISCV/RISCVSubtarget.cpp
@@ -248,6 +248,10 @@ void RISCVSubtarget::overridePostRASchedPolicy(MachineSchedPolicy &Policy,
}
}
+bool RISCVSubtarget::useLoadStorePairs() const {
+ return UseMIPSLoadStorePairsOpt && HasVendorXMIPSLSP;
+}
+
bool RISCVSubtarget::useCCMovInsn() const {
return UseCCMovInsn && HasVendorXMIPSCMove;
}
diff --git a/llvm/lib/Target/RISCV/RISCVTargetMachine.cpp b/llvm/lib/Target/RISCV/RISCVTargetMachine.cpp
index dde808ad90413d..6b5241fbc3215f 100644
--- a/llvm/lib/Target/RISCV/RISCVTargetMachine.cpp
+++ b/llvm/lib/Target/RISCV/RISCVTargetMachine.cpp
@@ -143,6 +143,7 @@ extern "C" LLVM_EXTERNAL_VISIBILITY void LLVMInitializeRISCVTarget() {
initializeRISCVDAGToDAGISelLegacyPass(*PR);
initializeRISCVMoveMergePass(*PR);
initializeRISCVPushPopOptPass(*PR);
+ initializeRISCVLoadStoreOptPass(*PR);
}
static StringRef computeDataLayout(const Triple &TT,
@@ -382,7 +383,12 @@ class RISCVPassConfig : public TargetPassConfig {
ScheduleDAGInstrs *
createPostMachineScheduler(MachineSchedContext *C) const override {
ScheduleDAGMI *DAG = nullptr;
- if (EnablePostMISchedLoadStoreClustering) {
+ const RISCVSubtarget &ST = C->MF->getSubtarget<RISCVSubtarget>();
+ bool EnableLoadStoreClusteringForLoadStorePairOpt =
+ !ST.getMacroFusions().empty() && ST.useLoadStorePairs();
+
+ if (EnablePostMISchedLoadStoreClustering ||
+ EnableLoadStoreClusteringForLoadStorePairOpt) {
DAG = createGenericSchedPostRA(C);
DAG->addMutation(createLoadClusterDAGMutation(
DAG->TII, DAG->TRI, /*ReorderWhileClustering=*/true));
@@ -549,6 +555,8 @@ void RISCVPassConfig::addPreSched2() {
// Emit KCFI checks for indirect calls.
addPass(createKCFIPass());
+ if (TM->getOptLevel() != CodeGenOptLevel::None)
+ addPass(createRISCVLoadStoreOptPass());
}
void RISCVPassConfig::addPreEmitPass() {
@@ -562,6 +570,11 @@ void RISCVPassConfig::addPreEmitPass() {
addPass(createMachineCopyPropagationPass(true));
addPass(&BranchRelaxationPassID);
addPass(createRISCVMakeCompressibleOptPass());
+
+ // LoadStoreOptimizer creates bundles for load-store bonding.
+ addPass(createUnpackMachineBundles([](const MachineFunction &MF) {
+ return MF.getSubtarget<RISCVSubtarget>().useLoadStorePairs();
+ }));
}
void RISCVPassConfig::addPreEmitPass2() {
diff --git a/llvm/test/CodeGen/RISCV/O0-pipeline.ll b/llvm/test/CodeGen/RISCV/O0-pipeline.ll
index f60def9d546f81..5ee6c192b80291 100644
--- a/llvm/test/CodeGen/RISCV/O0-pipeline.ll
+++ b/llvm/test/CodeGen/RISCV/O0-pipeline.ll
@@ -63,6 +63,7 @@
; CHECK-NEXT: Implement the 'patchable-function' attribute
; CHECK-NEXT: Branch relaxation pass
; CHECK-NEXT: RISC-V Make Compressible
+; CHECK-NEXT: Unpack machine instruction bundles
; CHECK-NEXT: Contiguously Lay Out Funclets
; CHECK-NEXT: Remove Loads Into Fake Uses
; CHECK-NEXT: StackMap Liveness Analysis
diff --git a/llvm/test/CodeGen/RISCV/O3-pipeline.ll b/llvm/test/CodeGen/RISCV/O3-pipeline.ll
index 668c7346124472..4f6b9ba674ef3c 100644
--- a/llvm/test/CodeGen/RISCV/O3-pipeline.ll
+++ b/llvm/test/CodeGen/RISCV/O3-pipeline.ll
@@ -180,6 +180,7 @@
; CHECK-NEXT: Post-RA pseudo instruction expansion pass
; CHECK-NEXT: RISC-V post-regalloc pseudo instruction expansion pass
; CHECK-NEXT: Insert KCFI indirect call checks
+; CHECK-NEXT: RISCV Load / Store Optimizer
; CHECK-NEXT: MachineDominator Tree Construction
; CHECK-NEXT: Machine Natural Loop Construction
; CHECK-NEXT: PostRA Machine Instruction Scheduler
@@ -193,6 +194,7 @@
; CHECK-NEXT: Machine Copy Propagation Pass
; CHECK-NEXT: Branch relaxation pass
; CHECK-NEXT: RISC-V Make Compressible
+; CHECK-NEXT: Unpack machine instruction bundles
; CHECK-NEXT: Contiguously Lay Out Funclets
; CHECK-NEXT: Remove Loads Into Fake Uses
; CHECK-NEXT: StackMap Liveness Analysis
diff --git a/llvm/test/CodeGen/RISCV/load-store-pair.ll b/llvm/test/CodeGen/RISCV/load-store-pair.ll
new file mode 100644
index 00000000000000..4f415e789c14b8
---...
[truncated]
|
I'm pretty sure some of this is copied from AArch64. That's a helpful thing to mention in the description as it makes it easier to review if we can cross reference the prior art. Can you mention any notable differences? |
} | ||
|
||
if (!tryConvertToLdStPair(First, Second)) | ||
finalizeBundle(MBB, First.getInstrIterator(), |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
why do you want to bundle them when the pairing fails?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I have introduced a new option for bonding
that should make this more obvious now.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Basically, the idea is to take advantage of hardware load/store bonding.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
How does that compare to load/store clustering which was already enabled in the scheduler? #73789
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is this the same question as what was discussed here (doing it in scheduler == DAG cluster mutation): #117865 (comment)?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Bonding part is very similar thing, so I will remove this part for now and keep ldp/sdp
optimization only in this change. I still see benefits in terms of performance when using both, load/store clustering
from the scheduler and bundling for bonding here
, but it can be a separate topic/change.
Can you talk (in a comment on this PR is fine) about how you see this fitting in with e.g. Zilsd support, which we were told is coming soon to upstream from NXP? The NXP branch against 18.0 does similar things. IIRC, it has two load-store pairing passes (an early one and a late one), it makes changes to prolog-epilog inserter, and it might also make some changes directly to isel. Maybe some of these changes are useful to Mips' pairing instructions too? The NXP changes are here: release/18.x...nxp-auto-tools:llvm-project:Zilsd/release/18.1.6 (this is a compare view of their 18.1.6 version against upstream release/18.x) I would like to see coordination between you and NXP - maybe @anmolparalkar-nxp is the right person? - so that we can get a reasonably general load/store pair codegen implementation that can support extensions where the instruction details might be different, but the overall extensions are similar. |
Also cc'ing @christian-herber-nxp who is involved on the Zilsd side from NXP. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Do we really need a new pass to merge two consecutive loads/stores?
XTheadMemPair codegen is supported by TargetDAGCombine: https://reviews.llvm.org/D144002
thanks, I will keep an eye on this. @anmolparalkar-nxp is the right person though. |
Yes. That Pass was our motivation back then, and yes, I will add that note. Thanks @topperc.
@lenary yes it makes sense, thanks! I will talk to @anmolparalkar-nxp :)
@dtcxzyw This might be an alternative, but I am not sure the |
|
e54aa9b
to
7e2c143
Compare
I have added a top-level comment that explains this. Basically, we use |
✅ With the latest revision this PR passed the C/C++ code formatter. |
One part of the 32-bit extension that NXP will upstream soon can easily be extended on top of this optimization, and I do not see any conflict with this. cc @lenary @anmolparalkar-nxp |
7f15e48
to
5626d55
Compare
Align MMOAlign = MMO->getAlign(); | ||
if (const PseudoSourceValue *Source = MMO->getPseudoValue()) | ||
if (Source->kind() == PseudoSourceValue::FixedStack) | ||
MMOAlign = MF->getSubtarget().getFrameLowering()->getStackAlign(); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This seems questionable. StackAlign only tells you the alignment the stack pointer must be at, but we can and do store objects at smaller alignments.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@topperc Hmm, got it... What can be an alternative for this? Shall we check the alignment of the specific object e.g. getObjectAlignment(Source->getFrameIndex())
?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The alignment should already be correct in the MMO
. If it's not, that's a bug in the creation of the MMO
. We shouldn't have to check anything additional.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@topperc Well, the idea here is, even though the alignment is != 16 bytes
, we rely on ABI and assume it is 16 bytes. Is it right approach?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The stack pointer has to be 16 byte aligned at all times. The objects on the stack are not guaranteed to be aligned. On RV32, 4 callee saved registers are saved into a single 16 byte region. Only 1 of the 4 load/stores for those 4 registers is 16 byte aligned.
Introduce RISCVLoadStoreOptimizer MIR Pass that will do the optimization. The load/store pairing pass identifies adjacent load/store instructions operating on consecutive memory locations and merges them into a single paired instruction. This is part of MIPS extensions for the p8700 CPU. Production of ldp/sdp instructions is OFF by default, since it is beneficial for -Os only in the case of p8700 CPU.
It was not very obvious that bonding and pairing are two different things, so it is improved now. In addition, mentioned the AArch64 optimization and similarity with it.
9c0ceb6
to
1371e44
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
A thought:
The lowering path can be:
- Merge two consecutive load/stores into pseudo load/store pair instructions no matter what the offsets are.
- Expand these pseudos in
RISCVExpandPseudo
by checking if their offsets are valid.
In this way, other vendor pair extensions like THead's can also benefit from this pass.
Why couldn't we check the offsets in this pass? |
Yes we could. I was thinking that we may make use of PseudoInstExpansion. |
I also think that this Pass can be reused for other vendors and for that use case you mentioned, but especially when we add more load/store optimizations such as the ones implemented in |
Is this good to go? |
hey @topperc thanks a lot for your comments! |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM
LLVM Buildbot has detected a new failure on builder Full details are available at: https://lab.llvm.org/buildbot/#/builders/146/builds/2432 Here is the relevant piece of the build log for the reference
|
Introduce RISCVLoadStoreOptimizer MIR Pass that will do the optimization. The load/store pairing pass identifies adjacent load/store instructions operating on consecutive memory locations and merges them into a single paired instruction. This is part of MIPS extensions for the p8700 CPU. Production of ldp/sdp instructions is OFF by default, since it is beneficial for -Os only in the case of p8700 CPU.
Introduce RISCVLoadStoreOptimizer MIR Pass that will do the optimization. The load/store pairing pass identifies adjacent load/store instructions operating on consecutive memory locations and merges them into a single paired instruction.
This is part of MIPS extensions for the p8700 CPU.
Production of ldp/sdp instructions is OFF by default, since it is beneficial for -Os only in the case of p8700 CPU.