[RISCV] Generate MIPS load/store pair instructions #124717

Merged
merged 11 commits on Mar 7, 2025

Conversation

djtodoro
Collaborator

@djtodoro djtodoro commented Jan 28, 2025

Introduce a RISCVLoadStoreOptimizer MIR pass that performs the optimization. The load/store pairing pass identifies adjacent load/store instructions operating on consecutive memory locations and merges them into a single paired instruction.

This is part of MIPS extensions for the p8700 CPU.

Production of ldp/sdp instructions is OFF by default, since it is beneficial only at -Os, and only on the p8700 CPU.

@llvmbot
Member

llvmbot commented Jan 28, 2025

@llvm/pr-subscribers-backend-risc-v

Author: Djordje Todorovic (djtodoro)

Changes

Introduce a RISCVLoadStoreOptimizer MIR pass that performs the optimization. It bundles loads and stores that operate on consecutive memory locations to take advantage of hardware load/store bonding.

This is part of MIPS extensions for the p8700 CPU.


Patch is 39.31 KiB, truncated to 20.00 KiB below, full version: https://github.com/llvm/llvm-project/pull/124717.diff

9 Files Affected:

  • (modified) llvm/lib/Target/RISCV/CMakeLists.txt (+1)
  • (modified) llvm/lib/Target/RISCV/RISCVInstrInfo.cpp (+40)
  • (modified) llvm/lib/Target/RISCV/RISCVInstrInfo.h (+6)
  • (added) llvm/lib/Target/RISCV/RISCVLoadStoreOptimizer.cpp (+364)
  • (modified) llvm/lib/Target/RISCV/RISCVSubtarget.cpp (+4)
  • (modified) llvm/lib/Target/RISCV/RISCVTargetMachine.cpp (+14-1)
  • (modified) llvm/test/CodeGen/RISCV/O0-pipeline.ll (+1)
  • (modified) llvm/test/CodeGen/RISCV/O3-pipeline.ll (+2)
  • (added) llvm/test/CodeGen/RISCV/load-store-pair.ll (+509)
diff --git a/llvm/lib/Target/RISCV/CMakeLists.txt b/llvm/lib/Target/RISCV/CMakeLists.txt
index 98d3615ebab58d..7a533841c82700 100644
--- a/llvm/lib/Target/RISCV/CMakeLists.txt
+++ b/llvm/lib/Target/RISCV/CMakeLists.txt
@@ -48,6 +48,7 @@ add_llvm_target(RISCVCodeGen
   RISCVISelLowering.cpp
   RISCVLandingPadSetup.cpp
   RISCVMachineFunctionInfo.cpp
+  RISCVLoadStoreOptimizer.cpp
   RISCVMergeBaseOffset.cpp
   RISCVOptWInstrs.cpp
   RISCVPostRAExpandPseudoInsts.cpp
diff --git a/llvm/lib/Target/RISCV/RISCVInstrInfo.cpp b/llvm/lib/Target/RISCV/RISCVInstrInfo.cpp
index bb9ebedeea4f7b..ce11db367f1611 100644
--- a/llvm/lib/Target/RISCV/RISCVInstrInfo.cpp
+++ b/llvm/lib/Target/RISCV/RISCVInstrInfo.cpp
@@ -2738,6 +2738,46 @@ MachineInstr *RISCVInstrInfo::emitLdStWithAddr(MachineInstr &MemI,
       .setMIFlags(MemI.getFlags());
 }
 
+bool RISCVInstrInfo::isPairableLdStInstOpc(unsigned Opc) {
+  switch (Opc) {
+  default:
+    return false;
+  case RISCV::SH:
+  case RISCV::LH:
+  case RISCV::LHU:
+  case RISCV::SW:
+  case RISCV::FSW:
+  case RISCV::LW:
+  case RISCV::FLW:
+  case RISCV::SD:
+  case RISCV::FSD:
+  case RISCV::LD:
+  case RISCV::FLD:
+    return true;
+  }
+}
+
+bool RISCVInstrInfo::isLdStSafeToPair(const MachineInstr &LdSt,
+                                      const TargetRegisterInfo *TRI) {
+  // If this is a volatile load/store, don't mess with it.
+  if (LdSt.hasOrderedMemoryRef() || LdSt.getNumExplicitOperands() != 3)
+    return false;
+
+  if (LdSt.getOperand(1).isFI())
+    return true;
+
+  assert(LdSt.getOperand(1).isReg() && "Expected a reg operand.");
+  // Can't cluster if the instruction modifies the base register
+  // or it is update form. e.g. ld x5,8(x5)
+  if (LdSt.modifiesRegister(LdSt.getOperand(1).getReg(), TRI))
+    return false;
+
+  if (!LdSt.getOperand(2).isImm())
+    return false;
+
+  return true;
+}
+
 bool RISCVInstrInfo::getMemOperandsWithOffsetWidth(
     const MachineInstr &LdSt, SmallVectorImpl<const MachineOperand *> &BaseOps,
     int64_t &Offset, bool &OffsetIsScalable, LocationSize &Width,
diff --git a/llvm/lib/Target/RISCV/RISCVInstrInfo.h b/llvm/lib/Target/RISCV/RISCVInstrInfo.h
index 1c81719c767ecb..64cfb243b7bdb6 100644
--- a/llvm/lib/Target/RISCV/RISCVInstrInfo.h
+++ b/llvm/lib/Target/RISCV/RISCVInstrInfo.h
@@ -300,6 +300,12 @@ class RISCVInstrInfo : public RISCVGenInstrInfo {
   std::unique_ptr<TargetInstrInfo::PipelinerLoopInfo>
   analyzeLoopForPipelining(MachineBasicBlock *LoopBB) const override;
 
+  /// Return true if pairing the given load or store may be paired with another.
+  static bool isPairableLdStInstOpc(unsigned Opc);
+
+  static bool isLdStSafeToPair(const MachineInstr &LdSt,
+                               const TargetRegisterInfo *TRI);
+
 protected:
   const RISCVSubtarget &STI;
 
diff --git a/llvm/lib/Target/RISCV/RISCVLoadStoreOptimizer.cpp b/llvm/lib/Target/RISCV/RISCVLoadStoreOptimizer.cpp
new file mode 100644
index 00000000000000..1421ede686d26a
--- /dev/null
+++ b/llvm/lib/Target/RISCV/RISCVLoadStoreOptimizer.cpp
@@ -0,0 +1,364 @@
+//===----- RISCVLoadStoreOptimizer.cpp ------------------------------------===//
+//
+// Part of the LLVM Project, under the Apache License v2.0 with LLVM Exceptions.
+// See https://llvm.org/LICENSE.txt for license information.
+// SPDX-License-Identifier: Apache-2.0 WITH LLVM-exception
+//
+//===----------------------------------------------------------------------===//
+//
+// Bundle loads and stores that operate on consecutive memory locations to take
+// the advantage of hardware load/store bonding.
+//
+//===----------------------------------------------------------------------===//
+
+#include "RISCV.h"
+#include "RISCVTargetMachine.h"
+#include "llvm/Analysis/AliasAnalysis.h"
+#include "llvm/CodeGen/Passes.h"
+#include "llvm/MC/TargetRegistry.h"
+#include "llvm/Support/Debug.h"
+#include "llvm/Target/TargetOptions.h"
+
+using namespace llvm;
+
+#define DEBUG_TYPE "riscv-load-store-opt"
+#define RISCV_LOAD_STORE_OPT_NAME "RISCV Load / Store Optimizer"
+namespace {
+
+struct RISCVLoadStoreOpt : public MachineFunctionPass {
+  static char ID;
+  bool runOnMachineFunction(MachineFunction &Fn) override;
+
+  RISCVLoadStoreOpt() : MachineFunctionPass(ID) {}
+
+  MachineFunctionProperties getRequiredProperties() const override {
+    return MachineFunctionProperties().set(
+        MachineFunctionProperties::Property::NoVRegs);
+  }
+
+  void getAnalysisUsage(AnalysisUsage &AU) const override {
+    AU.addRequired<AAResultsWrapperPass>();
+    MachineFunctionPass::getAnalysisUsage(AU);
+  }
+
+  StringRef getPassName() const override { return RISCV_LOAD_STORE_OPT_NAME; }
+
+  // Find and pair load/store instructions.
+  bool tryToPairLdStInst(MachineBasicBlock::iterator &MBBI);
+
+  // Convert load/store pairs to single instructions.
+  bool tryConvertToLdStPair(MachineBasicBlock::iterator First,
+                            MachineBasicBlock::iterator Second);
+
+  // Scan the instructions looking for a load/store that can be combined
+  // with the current instruction into a load/store pair.
+  // Return the matching instruction if one is found, else MBB->end().
+  MachineBasicBlock::iterator findMatchingInsn(MachineBasicBlock::iterator I,
+                                               bool &MergeForward);
+
+  MachineBasicBlock::iterator
+  mergePairedInsns(MachineBasicBlock::iterator I,
+                   MachineBasicBlock::iterator Paired, bool MergeForward);
+
+private:
+  AliasAnalysis *AA;
+  MachineRegisterInfo *MRI;
+  const RISCVInstrInfo *TII;
+  const RISCVRegisterInfo *TRI;
+  LiveRegUnits ModifiedRegUnits, UsedRegUnits;
+};
+} // end anonymous namespace
+
+char RISCVLoadStoreOpt::ID = 0;
+INITIALIZE_PASS(RISCVLoadStoreOpt, DEBUG_TYPE, RISCV_LOAD_STORE_OPT_NAME, false,
+                false)
+
+bool RISCVLoadStoreOpt::runOnMachineFunction(MachineFunction &Fn) {
+  if (skipFunction(Fn.getFunction()))
+    return false;
+  const RISCVSubtarget &Subtarget = Fn.getSubtarget<RISCVSubtarget>();
+  if (!Subtarget.useLoadStorePairs())
+    return false;
+
+  bool MadeChange = false;
+  TII = Subtarget.getInstrInfo();
+  TRI = Subtarget.getRegisterInfo();
+  MRI = &Fn.getRegInfo();
+  AA = &getAnalysis<AAResultsWrapperPass>().getAAResults();
+  ModifiedRegUnits.init(*TRI);
+  UsedRegUnits.init(*TRI);
+
+  for (MachineBasicBlock &MBB : Fn) {
+    LLVM_DEBUG(dbgs() << "MBB: " << MBB.getName() << "\n");
+
+    for (MachineBasicBlock::iterator MBBI = MBB.begin(), E = MBB.end();
+         MBBI != E;) {
+      if (TII->isPairableLdStInstOpc(MBBI->getOpcode()) &&
+          tryToPairLdStInst(MBBI))
+        MadeChange = true;
+      else
+        ++MBBI;
+    }
+  }
+  return MadeChange;
+}
+
+// Find loads and stores that can be merged into a single load or store pair
+// instruction.
+bool RISCVLoadStoreOpt::tryToPairLdStInst(MachineBasicBlock::iterator &MBBI) {
+  MachineInstr &MI = *MBBI;
+  MachineBasicBlock::iterator E = MI.getParent()->end();
+
+  if (!TII->isLdStSafeToPair(MI, TRI))
+    return false;
+
+  // Look ahead for a pairable instruction.
+  bool MergeForward;
+  MachineBasicBlock::iterator Paired = findMatchingInsn(MBBI, MergeForward);
+  if (Paired != E) {
+    MBBI = mergePairedInsns(MBBI, Paired, MergeForward);
+    return true;
+  }
+  return false;
+}
+
+bool RISCVLoadStoreOpt::tryConvertToLdStPair(
+    MachineBasicBlock::iterator First, MachineBasicBlock::iterator Second) {
+  unsigned PairOpc;
+  // TODO: Handle the rest from RISCVInstrInfo::isPairableLdStInstOpc.
+  switch (First->getOpcode()) {
+  default:
+    return false;
+  case RISCV::SW:
+    PairOpc = RISCV::SWP;
+    break;
+  case RISCV::LW:
+    PairOpc = RISCV::LWP;
+    break;
+  case RISCV::SD:
+    PairOpc = RISCV::SDP;
+    break;
+  case RISCV::LD:
+    PairOpc = RISCV::LDP;
+    break;
+  }
+
+  MachineFunction *MF = First->getMF();
+  const MachineMemOperand *MMO = *First->memoperands_begin();
+  Align MMOAlign = MMO->getAlign();
+  if (const PseudoSourceValue *Source = MMO->getPseudoValue())
+    if (Source->kind() == PseudoSourceValue::FixedStack)
+      MMOAlign = MF->getSubtarget().getFrameLowering()->getStackAlign();
+
+  if (MMOAlign < Align(MMO->getSize().getValue() * 2))
+    return false;
+  int64_t Offset = First->getOperand(2).getImm();
+  if (!isUInt<7>(Offset) ||
+      !isAligned(Align(MMO->getSize().getValue()), Offset))
+    return false;
+  MachineInstrBuilder MIB = BuildMI(
+      *MF,
+      First->getDebugLoc().get() ? First->getDebugLoc() : Second->getDebugLoc(),
+      TII->get(PairOpc));
+  MIB.add(First->getOperand(0))
+      .add(Second->getOperand(0))
+      .add(First->getOperand(1))
+      .add(First->getOperand(2))
+      .cloneMergedMemRefs({&*First, &*Second});
+
+  First->getParent()->insert(First, MIB);
+
+  First->removeFromParent();
+  Second->removeFromParent();
+
+  return true;
+}
+
+static bool mayAlias(MachineInstr &MIa,
+                     SmallVectorImpl<MachineInstr *> &MemInsns,
+                     AliasAnalysis *AA) {
+  for (MachineInstr *MIb : MemInsns)
+    if (MIa.mayAlias(AA, *MIb, /*UseTBAA*/ false))
+      return true;
+
+  return false;
+}
+
+// Scan the instructions looking for a load/store that can be combined with the
+// current instruction into a wider equivalent or a load/store pair.
+// TODO: Extend pairing logic to consider reordering both instructions
+// to a safe "middle" position rather than only merging forward/backward.
+// This requires more sophisticated checks for aliasing, register
+// liveness, and potential scheduling hazards.
+MachineBasicBlock::iterator
+RISCVLoadStoreOpt::findMatchingInsn(MachineBasicBlock::iterator I,
+                                    bool &MergeForward) {
+  MachineBasicBlock::iterator E = I->getParent()->end();
+  MachineBasicBlock::iterator MBBI = I;
+  MachineInstr &FirstMI = *I;
+  MBBI = next_nodbg(MBBI, E);
+
+  bool MayLoad = FirstMI.mayLoad();
+  Register Reg = FirstMI.getOperand(0).getReg();
+  Register BaseReg = FirstMI.getOperand(1).getReg();
+  int Offset = FirstMI.getOperand(2).getImm();
+  int OffsetStride = (*FirstMI.memoperands_begin())->getSize().getValue();
+
+  MergeForward = false;
+
+  // Track which register units have been modified and used between the first
+  // insn (inclusive) and the second insn.
+  ModifiedRegUnits.clear();
+  UsedRegUnits.clear();
+
+  // Remember any instructions that read/write memory between FirstMI and MI.
+  SmallVector<MachineInstr *, 4> MemInsns;
+
+  for (unsigned Count = 0; MBBI != E && Count < 128;
+       MBBI = next_nodbg(MBBI, E)) {
+    MachineInstr &MI = *MBBI;
+
+    // Don't count transient instructions towards the search limit since there
+    // may be different numbers of them if e.g. debug information is present.
+    if (!MI.isTransient())
+      ++Count;
+
+    if (MI.getOpcode() == FirstMI.getOpcode() &&
+        TII->isLdStSafeToPair(MI, TRI)) {
+      Register MIBaseReg = MI.getOperand(1).getReg();
+      int MIOffset = MI.getOperand(2).getImm();
+
+      if (BaseReg == MIBaseReg) {
+
+        if ((Offset != MIOffset + OffsetStride) &&
+            (Offset + OffsetStride != MIOffset)) {
+          LiveRegUnits::accumulateUsedDefed(MI, ModifiedRegUnits, UsedRegUnits,
+                                            TRI);
+          MemInsns.push_back(&MI);
+          continue;
+        }
+
+        // If the destination register of one load is the same register or a
+        // sub/super register of the other load, bail and keep looking.
+        if (MayLoad &&
+            TRI->isSuperOrSubRegisterEq(Reg, MI.getOperand(0).getReg())) {
+          LiveRegUnits::accumulateUsedDefed(MI, ModifiedRegUnits, UsedRegUnits,
+                                            TRI);
+          MemInsns.push_back(&MI);
+          continue;
+        }
+
+        // If the BaseReg has been modified, then we cannot do the optimization.
+        if (!ModifiedRegUnits.available(BaseReg))
+          return E;
+
+        // If the Rt of the second instruction was not modified or used between
+        // the two instructions and none of the instructions between the second
+        // and first alias with the second, we can combine the second into the
+        // first.
+        if (ModifiedRegUnits.available(MI.getOperand(0).getReg()) &&
+            !(MI.mayLoad() &&
+              !UsedRegUnits.available(MI.getOperand(0).getReg())) &&
+            !mayAlias(MI, MemInsns, AA)) {
+
+          MergeForward = false;
+          return MBBI;
+        }
+
+        // Likewise, if the Rt of the first instruction is not modified or used
+        // between the two instructions and none of the instructions between the
+        // first and the second alias with the first, we can combine the first
+        // into the second.
+        if (!(MayLoad &&
+              !UsedRegUnits.available(FirstMI.getOperand(0).getReg())) &&
+            !mayAlias(FirstMI, MemInsns, AA)) {
+
+          if (ModifiedRegUnits.available(FirstMI.getOperand(0).getReg())) {
+            MergeForward = true;
+            return MBBI;
+          }
+        }
+        // Unable to combine these instructions due to interference in between.
+        // Keep looking.
+      }
+    }
+
+    // If the instruction wasn't a matching load or store.  Stop searching if we
+    // encounter a call instruction that might modify memory.
+    if (MI.isCall())
+      return E;
+
+    // Update modified / uses register units.
+    LiveRegUnits::accumulateUsedDefed(MI, ModifiedRegUnits, UsedRegUnits, TRI);
+
+    // Otherwise, if the base register is modified, we have no match, so
+    // return early.
+    if (!ModifiedRegUnits.available(BaseReg))
+      return E;
+
+    // Update list of instructions that read/write memory.
+    if (MI.mayLoadOrStore())
+      MemInsns.push_back(&MI);
+  }
+  return E;
+}
+
+MachineBasicBlock::iterator
+RISCVLoadStoreOpt::mergePairedInsns(MachineBasicBlock::iterator I,
+                                    MachineBasicBlock::iterator Paired,
+                                    bool MergeForward) {
+  MachineBasicBlock::iterator E = I->getParent()->end();
+  MachineBasicBlock::iterator NextI = next_nodbg(I, E);
+  if (NextI == Paired)
+    NextI = next_nodbg(NextI, E);
+
+  // Insert our new paired instruction after whichever of the paired
+  // instructions MergeForward indicates.
+  MachineBasicBlock::iterator InsertionPoint = MergeForward ? Paired : I;
+  MachineBasicBlock::iterator DeletionPoint = MergeForward ? I : Paired;
+  int Offset = I->getOperand(2).getImm();
+  int PairedOffset = Paired->getOperand(2).getImm();
+  bool InsertAfter = (Offset < PairedOffset) ^ MergeForward;
+
+  if (!MergeForward)
+    Paired->getOperand(1).setIsKill(false);
+
+  // Kill flags may become invalid when moving stores for pairing.
+  if (I->getOperand(0).isUse()) {
+    if (!MergeForward) {
+      // Clear kill flags on store if moving upwards.
+      I->getOperand(0).setIsKill(false);
+      Paired->getOperand(0).setIsKill(false);
+    } else {
+      // Clear kill flags of the first stores register.
+      Register Reg = I->getOperand(0).getReg();
+      for (MachineInstr &MI : make_range(std::next(I), std::next(Paired)))
+        MI.clearRegisterKills(Reg, TRI);
+    }
+  }
+
+  MachineInstr *ToInsert = DeletionPoint->removeFromParent();
+  MachineBasicBlock &MBB = *InsertionPoint->getParent();
+  MachineBasicBlock::iterator First, Second;
+
+  if (!InsertAfter) {
+    First = MBB.insert(InsertionPoint, ToInsert);
+    Second = InsertionPoint;
+  } else {
+    Second = MBB.insertAfter(InsertionPoint, ToInsert);
+    First = InsertionPoint;
+  }
+
+  if (!tryConvertToLdStPair(First, Second))
+    finalizeBundle(MBB, First.getInstrIterator(),
+                   std::next(Second).getInstrIterator());
+
+  LLVM_DEBUG(dbgs() << "Bonding pair load/store:\n    ");
+  LLVM_DEBUG(prev_nodbg(NextI, MBB.begin())->print(dbgs()));
+  return NextI;
+}
+
+// Returns an instance of the Load / Store Optimization pass.
+FunctionPass *llvm::createRISCVLoadStoreOptPass() {
+  return new RISCVLoadStoreOpt();
+}
diff --git a/llvm/lib/Target/RISCV/RISCVSubtarget.cpp b/llvm/lib/Target/RISCV/RISCVSubtarget.cpp
index 1b54c278820fc2..d7439c50be0764 100644
--- a/llvm/lib/Target/RISCV/RISCVSubtarget.cpp
+++ b/llvm/lib/Target/RISCV/RISCVSubtarget.cpp
@@ -248,6 +248,10 @@ void RISCVSubtarget::overridePostRASchedPolicy(MachineSchedPolicy &Policy,
   }
 }
 
+bool RISCVSubtarget::useLoadStorePairs() const {
+  return UseMIPSLoadStorePairsOpt && HasVendorXMIPSLSP;
+}
+
 bool RISCVSubtarget::useCCMovInsn() const {
   return UseCCMovInsn && HasVendorXMIPSCMove;
 }
diff --git a/llvm/lib/Target/RISCV/RISCVTargetMachine.cpp b/llvm/lib/Target/RISCV/RISCVTargetMachine.cpp
index dde808ad90413d..6b5241fbc3215f 100644
--- a/llvm/lib/Target/RISCV/RISCVTargetMachine.cpp
+++ b/llvm/lib/Target/RISCV/RISCVTargetMachine.cpp
@@ -143,6 +143,7 @@ extern "C" LLVM_EXTERNAL_VISIBILITY void LLVMInitializeRISCVTarget() {
   initializeRISCVDAGToDAGISelLegacyPass(*PR);
   initializeRISCVMoveMergePass(*PR);
   initializeRISCVPushPopOptPass(*PR);
+  initializeRISCVLoadStoreOptPass(*PR);
 }
 
 static StringRef computeDataLayout(const Triple &TT,
@@ -382,7 +383,12 @@ class RISCVPassConfig : public TargetPassConfig {
   ScheduleDAGInstrs *
   createPostMachineScheduler(MachineSchedContext *C) const override {
     ScheduleDAGMI *DAG = nullptr;
-    if (EnablePostMISchedLoadStoreClustering) {
+    const RISCVSubtarget &ST = C->MF->getSubtarget<RISCVSubtarget>();
+    bool EnableLoadStoreClusteringForLoadStorePairOpt =
+        !ST.getMacroFusions().empty() && ST.useLoadStorePairs();
+
+    if (EnablePostMISchedLoadStoreClustering ||
+        EnableLoadStoreClusteringForLoadStorePairOpt) {
       DAG = createGenericSchedPostRA(C);
       DAG->addMutation(createLoadClusterDAGMutation(
           DAG->TII, DAG->TRI, /*ReorderWhileClustering=*/true));
@@ -549,6 +555,8 @@ void RISCVPassConfig::addPreSched2() {
 
   // Emit KCFI checks for indirect calls.
   addPass(createKCFIPass());
+  if (TM->getOptLevel() != CodeGenOptLevel::None)
+    addPass(createRISCVLoadStoreOptPass());
 }
 
 void RISCVPassConfig::addPreEmitPass() {
@@ -562,6 +570,11 @@ void RISCVPassConfig::addPreEmitPass() {
     addPass(createMachineCopyPropagationPass(true));
   addPass(&BranchRelaxationPassID);
   addPass(createRISCVMakeCompressibleOptPass());
+
+  // LoadStoreOptimizer creates bundles for load-store bonding.
+  addPass(createUnpackMachineBundles([](const MachineFunction &MF) {
+    return MF.getSubtarget<RISCVSubtarget>().useLoadStorePairs();
+  }));
 }
 
 void RISCVPassConfig::addPreEmitPass2() {
diff --git a/llvm/test/CodeGen/RISCV/O0-pipeline.ll b/llvm/test/CodeGen/RISCV/O0-pipeline.ll
index f60def9d546f81..5ee6c192b80291 100644
--- a/llvm/test/CodeGen/RISCV/O0-pipeline.ll
+++ b/llvm/test/CodeGen/RISCV/O0-pipeline.ll
@@ -63,6 +63,7 @@
 ; CHECK-NEXT:       Implement the 'patchable-function' attribute
 ; CHECK-NEXT:       Branch relaxation pass
 ; CHECK-NEXT:       RISC-V Make Compressible
+; CHECK-NEXT:       Unpack machine instruction bundles
 ; CHECK-NEXT:       Contiguously Lay Out Funclets
 ; CHECK-NEXT:       Remove Loads Into Fake Uses
 ; CHECK-NEXT:       StackMap Liveness Analysis
diff --git a/llvm/test/CodeGen/RISCV/O3-pipeline.ll b/llvm/test/CodeGen/RISCV/O3-pipeline.ll
index 668c7346124472..4f6b9ba674ef3c 100644
--- a/llvm/test/CodeGen/RISCV/O3-pipeline.ll
+++ b/llvm/test/CodeGen/RISCV/O3-pipeline.ll
@@ -180,6 +180,7 @@
 ; CHECK-NEXT:       Post-RA pseudo instruction expansion pass
 ; CHECK-NEXT:       RISC-V post-regalloc pseudo instruction expansion pass
 ; CHECK-NEXT:       Insert KCFI indirect call checks
+; CHECK-NEXT:       RISCV Load / Store Optimizer
 ; CHECK-NEXT:       MachineDominator Tree Construction
 ; CHECK-NEXT:       Machine Natural Loop Construction
 ; CHECK-NEXT:       PostRA Machine Instruction Scheduler
@@ -193,6 +194,7 @@
 ; CHECK-NEXT:       Machine Copy Propagation Pass
 ; CHECK-NEXT:       Branch relaxation pass
 ; CHECK-NEXT:       RISC-V Make Compressible
+; CHECK-NEXT:       Unpack machine instruction bundles
 ; CHECK-NEXT:       Contiguously Lay Out Funclets
 ; CHECK-NEXT:       Remove Loads Into Fake Uses
 ; CHECK-NEXT:       StackMap Liveness Analysis
diff --git a/llvm/test/CodeGen/RISCV/load-store-pair.ll b/llvm/test/CodeGen/RISCV/load-store-pair.ll
new file mode 100644
index 00000000000000..4f415e789c14b8
---...
[truncated]

@topperc
Collaborator

topperc commented Jan 28, 2025

I'm pretty sure some of this is copied from AArch64. That's a helpful thing to mention in the description as it makes it easier to review if we can cross reference the prior art. Can you mention any notable differences?

}

if (!tryConvertToLdStPair(First, Second))
finalizeBundle(MBB, First.getInstrIterator(),
Member

why do you want to bundle them when the pairing fails?

Collaborator Author

I have introduced a new option for bonding that should make this more obvious now.

Collaborator Author

Basically, the idea is to take advantage of hardware load/store bonding.

Collaborator

How does that compare to load/store clustering which was already enabled in the scheduler? #73789

Contributor

Is this the same question as what was discussed here (doing it in scheduler == DAG cluster mutation): #117865 (comment)?

Collaborator Author

The bonding part is a very similar thing, so I will remove it for now and keep only the ldp/sdp optimization in this change. I still see performance benefits when using both the scheduler's load/store clustering and the bundling for bonding here, but that can be a separate topic/change.

@lenary
Member

lenary commented Jan 28, 2025

Can you talk (in a comment on this PR is fine) about how you see this fitting in with e.g. Zilsd support, which we were told is coming soon to upstream from NXP?

The NXP branch against 18.0 does similar things. IIRC, it has two load-store pairing passes (an early one and a late one), it makes changes to prolog-epilog inserter, and it might also make some changes directly to isel. Maybe some of these changes are useful to Mips' pairing instructions too? The NXP changes are here: release/18.x...nxp-auto-tools:llvm-project:Zilsd/release/18.1.6 (this is a compare view of their 18.1.6 version against upstream release/18.x)

I would like to see coordination between you and NXP - maybe @anmolparalkar-nxp is the right person? - so that we can get a reasonably general load/store pair codegen implementation that can support extensions where the instruction details might be different, but the overall extensions are similar.

@lenary
Member

lenary commented Jan 28, 2025

Also cc'ing @christian-herber-nxp who is involved on the Zilsd side from NXP.

Member

@dtcxzyw dtcxzyw left a comment

Do we really need a new pass to merge two consecutive loads/stores?
XTheadMemPair codegen is supported by TargetDAGCombine: https://reviews.llvm.org/D144002

@christian-herber-nxp

Also cc'ing @christian-herber-nxp who is involved on the Zilsd side from NXP.

thanks, I will keep an eye on this. @anmolparalkar-nxp is the right person though.

@djtodoro
Collaborator Author

I'm pretty sure some of this is copied from AArch64. That's a helpful thing to mention in the description as it makes it easier to review if we can cross reference the prior art. Can you mention any notable differences?

Yes. That Pass was our motivation back then, and yes, I will add that note. Thanks @topperc.

Can you talk (in a comment on this PR is fine) about how you see this fitting in with e.g. Zilsd support, which we were told is coming soon to upstream from NXP?

The NXP branch against 18.0 does similar things. IIRC, it has two load-store pairing passes (an early one and a late one), it makes changes to prolog-epilog inserter, and it might also make some changes directly to isel. Maybe some of these changes are useful to Mips' pairing instructions too? The NXP changes are here: release/18.x...nxp-auto-tools:llvm-project:Zilsd/release/18.1.6 (this is a compare view of their 18.1.6 version against upstream release/18.x)

I would like to see coordination between you and NXP - maybe @anmolparalkar-nxp is the right person? - so that we can get a reasonably general load/store pair codegen implementation that can support extensions where the instruction details might be different, but the overall extensions are similar.

@lenary yes it makes sense, thanks! I will talk to @anmolparalkar-nxp :)

Do we really need a new pass to merge two consecutive loads/stores? XTheadMemPair codegen is supported by TargetDAGCombine: https://reviews.llvm.org/D144002

@dtcxzyw This might be an alternative, but I am not sure the DAGCombiner can catch/handle more complex cases, for example this one: #117865 (comment).

@anmolparalkar-nxp
Contributor

anmolparalkar-nxp commented Jan 30, 2025

@lenary yes it makes sense, thanks! I will talk to @anmolparalkar-nxp :)
Thanks @lenary, @christian-herber-nxp. @djtodoro, we will work together to streamline and unify the approach, thank you.

@djtodoro djtodoro force-pushed the pr/mips-riscv-load-store-pairs branch from e54aa9b to 7e2c143 on February 12, 2025 09:53
@djtodoro
Collaborator Author

I'm pretty sure some of this is copied from AArch64. That's a helpful thing to mention in the description as it makes it easier to review if we can cross reference the prior art. Can you mention any notable differences?

I have added a top-level comment that explains this. Basically, we only use tryToPairLdStInst from it. AArch64 performs way more things, and those can be future work for RISC-V as well.


github-actions bot commented Feb 12, 2025

✅ With the latest revision this PR passed the C/C++ code formatter.

@djtodoro
Collaborator Author

One part of the 32-bit extension that NXP will upstream soon can easily be extended on top of this optimization, and I do not see any conflict with this. cc @lenary @anmolparalkar-nxp

@djtodoro djtodoro force-pushed the pr/mips-riscv-load-store-pairs branch from 7f15e48 to 5626d55 on February 13, 2025 12:19
Align MMOAlign = MMO->getAlign();
if (const PseudoSourceValue *Source = MMO->getPseudoValue())
if (Source->kind() == PseudoSourceValue::FixedStack)
MMOAlign = MF->getSubtarget().getFrameLowering()->getStackAlign();
Collaborator

This seems questionable. StackAlign only tells you the alignment the stack pointer must be at, but we can and do store objects at smaller alignments.

Collaborator Author

@topperc Hmm, got it... What can be an alternative for this? Shall we check the alignment of the specific object e.g. getObjectAlignment(Source->getFrameIndex())?

Collaborator

The alignment should already be correct in the MMO. If it's not, that's a bug in the creation of the MMO. We shouldn't have to check anything additional.

Collaborator Author

@topperc Well, the idea here is that even though the alignment is != 16 bytes, we rely on the ABI and assume it is 16 bytes. Is that the right approach?

Collaborator

@topperc topperc Feb 24, 2025

The stack pointer has to be 16 byte aligned at all times. The objects on the stack are not guaranteed to be aligned. On RV32, 4 callee saved registers are saved into a single 16 byte region. Only 1 of the 4 load/stores for those 4 registers is 16 byte aligned.

Introduce RISCVLoadStoreOptimizer MIR Pass that will do
the optimization. The load/store pairing pass identifies adjacent
load/store instructions operating on consecutive memory locations
and merges them into a single paired instruction.

This is part of MIPS extensions for the p8700 CPU.

Production of ldp/sdp instructions is OFF by default, since it
is beneficial for -Os only in the case of p8700 CPU.
It was not very obvious that bonding and pairing are two different
things, so it is improved now. In addition, mentioned the AArch64
optimization and similarity with it.
@djtodoro djtodoro force-pushed the pr/mips-riscv-load-store-pairs branch from 9c0ceb6 to 1371e44 on February 19, 2025 13:33
Contributor

@wangpc-pp wangpc-pp left a comment

A thought:
The lowering path can be:

  1. Merge two consecutive load/stores into pseudo load/store pair instructions no matter what the offsets are.
  2. Expand these pseudos in RISCVExpandPseudo by checking if their offsets are valid.

In this way, other vendor pair extensions like THead's can also benefit from this pass.
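The expansion-time check from step 2 could look roughly like the sketch below. This is a hedged illustration only: the function name, the word-scaled immediate, and the 7-bit signed field are placeholders, not the actual encoding of any RISC-V pair extension; a real vendor expansion would substitute its own constraints.

```cpp
#include <cassert>
#include <cstdint>

// Sketch of the offset validity check an expansion step could apply
// before turning a pseudo load/store pair into a real paired
// instruction; the immediate range here is purely hypothetical.
static bool isLegalPairOffset(int64_t Offset, unsigned AccessSize) {
  // The offset must be naturally aligned to the access size...
  if (Offset % AccessSize != 0)
    return false;
  // ...and the scaled offset must fit the (assumed) signed 7-bit
  // immediate field of the paired form.
  int64_t Scaled = Offset / AccessSize;
  return Scaled >= -64 && Scaled <= 63;
}
```

If the check fails, the pseudo would simply be expanded back into two ordinary loads or stores, so forming pairs eagerly loses nothing.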

@topperc
Collaborator

topperc commented Feb 24, 2025

A thought: The lowering path can be:

  1. Merge two consecutive load/stores into pseudo load/store pair instructions no matter what the offsets are.
  2. Expand these pseudos in RISCVExpandPseudo by checking if their offsets are valid.

In this way, other vendor pair extensions like THead's can also benefit from this pass.

Why couldn't we check the offsets in this pass?

@wangpc-pp
Contributor

A thought: The lowering path can be:

  1. Merge two consecutive load/stores into pseudo load/store pair instructions no matter what the offsets are.
  2. Expand these pseudos in RISCVExpandPseudo by checking if their offsets are valid.

In this way, other vendor pair extensions like THead's can also benefit from this pass.

Why couldn't we check the offsets in this pass?

Yes we could. I was thinking that we may make use of PseudoInstExpansion.

@djtodoro
Collaborator Author

A thought: The lowering path can be:

  1. Merge two consecutive load/stores into pseudo load/store pair instructions no matter what the offsets are.
  2. Expand these pseudos in RISCVExpandPseudo by checking if their offsets are valid.

In this way, other vendor pair extensions like THead's can also benefit from this pass.

Why couldn't we check the offsets in this pass?

Yes we could. I was thinking that we may make use of PseudoInstExpansion.

I also think this pass can be reused by other vendors, including the use case you mentioned. It will become even more beneficial once we add more load/store optimizations, such as the ones implemented in AArch64LoadStoreOpt.
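The core condition such a pairing pass tests can be sketched as follows. The struct and field names are illustrative stand-ins, not the actual MIR API; the real pass inspects MachineInstr operands and MachineMemOperands instead.

```cpp
#include <cassert>
#include <cstdint>

// Simplified model of a memory access for illustration purposes.
struct MemOp {
  unsigned BaseReg;
  int64_t Offset;
  unsigned Size; // access size in bytes
  bool IsStore;
};

// Two accesses are pairing candidates when they use the same base
// register, are the same kind of operation with the same access size,
// and touch consecutive memory locations (in either order).
static bool canPair(const MemOp &A, const MemOp &B) {
  if (A.BaseReg != B.BaseReg || A.IsStore != B.IsStore || A.Size != B.Size)
    return false;
  return A.Offset + A.Size == B.Offset || B.Offset + B.Size == A.Offset;
}
```

A real implementation additionally has to prove there is no aliasing store, base-register redefinition, or barrier between the two instructions before merging them, which is where most of the complexity in passes like AArch64LoadStoreOpt lives.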

@djtodoro
Collaborator Author

Is this good to go?

@djtodoro
Collaborator Author

djtodoro commented Mar 5, 2025

hey @topperc thanks a lot for your comments!

Collaborator

@topperc topperc left a comment

LGTM

@djtodoro djtodoro merged commit 5048a08 into llvm:main Mar 7, 2025
11 checks passed
@llvm-ci
Collaborator

llvm-ci commented Mar 7, 2025

LLVM Buildbot has detected a new failure on builder lld-x86_64-win running on as-worker-93 while building llvm at step 7 "test-build-unified-tree-check-all".

Full details are available at: https://lab.llvm.org/buildbot/#/builders/146/builds/2432

Here is the relevant piece of the build log for the reference
Step 7 (test-build-unified-tree-check-all) failure: test (failure)
******************** TEST 'LLVM-Unit :: Support/./SupportTests.exe/38/87' FAILED ********************
Script(shard):
--
GTEST_OUTPUT=json:C:\a\lld-x86_64-win\build\unittests\Support\.\SupportTests.exe-LLVM-Unit-7820-38-87.json GTEST_SHUFFLE=0 GTEST_TOTAL_SHARDS=87 GTEST_SHARD_INDEX=38 C:\a\lld-x86_64-win\build\unittests\Support\.\SupportTests.exe
--

Script:
--
C:\a\lld-x86_64-win\build\unittests\Support\.\SupportTests.exe --gtest_filter=ProgramEnvTest.CreateProcessLongPath
--
C:\a\lld-x86_64-win\llvm-project\llvm\unittests\Support\ProgramTest.cpp(160): error: Expected equality of these values:
  0
  RC
    Which is: -2

C:\a\lld-x86_64-win\llvm-project\llvm\unittests\Support\ProgramTest.cpp(163): error: fs::remove(Twine(LongPath)): did not return errc::success.
error number: 13
error message: permission denied



C:\a\lld-x86_64-win\llvm-project\llvm\unittests\Support\ProgramTest.cpp:160
Expected equality of these values:
  0
  RC
    Which is: -2

C:\a\lld-x86_64-win\llvm-project\llvm\unittests\Support\ProgramTest.cpp:163
fs::remove(Twine(LongPath)): did not return errc::success.
error number: 13
error message: permission denied




********************


jph-13 pushed a commit to jph-13/llvm-project that referenced this pull request Mar 21, 2025
Introduce RISCVLoadStoreOptimizer MIR Pass that will do the
optimization. The load/store pairing pass identifies adjacent load/store
instructions operating on consecutive memory locations and merges them
into a single paired instruction.

This is part of MIPS extensions for the p8700 CPU.

Production of ldp/sdp instructions is OFF by default, since it is
beneficial for -Os only in the case of p8700 CPU.