Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

DAG: Move scalarizeExtractedVectorLoad to TargetLowering #122670

Merged

Conversation

arsenm
Copy link
Contributor

@arsenm arsenm commented Jan 13, 2025

SimplifyDemandedVectorElts should be able to use this on loads

Copy link
Contributor Author

arsenm commented Jan 13, 2025

@llvmbot
Copy link
Member

llvmbot commented Jan 13, 2025

@llvm/pr-subscribers-llvm-selectiondag

Author: Matt Arsenault (arsenm)

Changes

SimplifyDemandedVectorElts should be able to use this on loads


Full diff: https://github.com/llvm/llvm-project/pull/122670.diff

3 Files Affected:

  • (modified) llvm/include/llvm/CodeGen/TargetLowering.h (+12)
  • (modified) llvm/lib/CodeGen/SelectionDAG/DAGCombiner.cpp (+7-2)
  • (modified) llvm/lib/CodeGen/SelectionDAG/TargetLowering.cpp (+74)
diff --git a/llvm/include/llvm/CodeGen/TargetLowering.h b/llvm/include/llvm/CodeGen/TargetLowering.h
index ce58777655e063..f898dfaa89915f 100644
--- a/llvm/include/llvm/CodeGen/TargetLowering.h
+++ b/llvm/include/llvm/CodeGen/TargetLowering.h
@@ -5618,6 +5618,18 @@ class TargetLowering : public TargetLoweringBase {
   // joining their results. SDValue() is returned when expansion did not happen.
   SDValue expandVectorNaryOpBySplitting(SDNode *Node, SelectionDAG &DAG) const;
 
+  /// Replace an extraction of a load with a narrowed load.
+  ///
+  /// \param ResultVT type of the result extraction.
+  /// \param InVecVT type of the input vector to with bitcasts resolved.
+  /// \param EltNo index of the vector element to load.
+  /// \param OriginalLoad vector load that to be replaced.
+  /// \returns \p ResultVT Load on success SDValue() on failure.
+  SDValue scalarizeExtractedVectorLoad(EVT ResultVT, const SDLoc &DL,
+                                       EVT InVecVT, SDValue EltNo,
+                                       LoadSDNode *OriginalLoad,
+                                       SelectionDAG &DAG) const;
+
 private:
   SDValue foldSetCCWithAnd(EVT VT, SDValue N0, SDValue N1, ISD::CondCode Cond,
                            const SDLoc &DL, DAGCombinerInfo &DCI) const;
diff --git a/llvm/lib/CodeGen/SelectionDAG/DAGCombiner.cpp b/llvm/lib/CodeGen/SelectionDAG/DAGCombiner.cpp
index da3c834417d6b2..6497531047476f 100644
--- a/llvm/lib/CodeGen/SelectionDAG/DAGCombiner.cpp
+++ b/llvm/lib/CodeGen/SelectionDAG/DAGCombiner.cpp
@@ -23246,8 +23246,13 @@ SDValue DAGCombiner::visitEXTRACT_VECTOR_ELT(SDNode *N) {
       ISD::isNormalLoad(VecOp.getNode()) &&
       !Index->hasPredecessor(VecOp.getNode())) {
     auto *VecLoad = dyn_cast<LoadSDNode>(VecOp);
-    if (VecLoad && VecLoad->isSimple())
-      return scalarizeExtractedVectorLoad(N, VecVT, Index, VecLoad);
+    if (VecLoad && VecLoad->isSimple()) {
+      if (SDValue Scalarized = TLI.scalarizeExtractedVectorLoad(
+              ExtVT, SDLoc(N), VecVT, Index, VecLoad, DAG)) {
+        ++OpsNarrowed;
+        return Scalarized;
+      }
+    }
   }
 
   // Perform only after legalization to ensure build_vector / vector_shuffle
diff --git a/llvm/lib/CodeGen/SelectionDAG/TargetLowering.cpp b/llvm/lib/CodeGen/SelectionDAG/TargetLowering.cpp
index 56194e2614af2d..b1fb4947fb9451 100644
--- a/llvm/lib/CodeGen/SelectionDAG/TargetLowering.cpp
+++ b/llvm/lib/CodeGen/SelectionDAG/TargetLowering.cpp
@@ -12069,3 +12069,77 @@ SDValue TargetLowering::expandVectorNaryOpBySplitting(SDNode *Node,
   SDValue SplitOpHi = DAG.getNode(Opcode, DL, HiVT, HiOps);
   return DAG.getNode(ISD::CONCAT_VECTORS, DL, VT, SplitOpLo, SplitOpHi);
 }
+
+SDValue TargetLowering::scalarizeExtractedVectorLoad(EVT ResultVT,
+                                                     const SDLoc &DL,
+                                                     EVT InVecVT, SDValue EltNo,
+                                                     LoadSDNode *OriginalLoad,
+                                                     SelectionDAG &DAG) const {
+  assert(OriginalLoad->isSimple());
+
+  EVT VecEltVT = InVecVT.getVectorElementType();
+
+  // If the vector element type is not a multiple of a byte then we are unable
+  // to correctly compute an address to load only the extracted element as a
+  // scalar.
+  if (!VecEltVT.isByteSized())
+    return SDValue();
+
+  ISD::LoadExtType ExtTy =
+      ResultVT.bitsGT(VecEltVT) ? ISD::EXTLOAD : ISD::NON_EXTLOAD;
+  if (!isOperationLegalOrCustom(ISD::LOAD, VecEltVT) ||
+      !shouldReduceLoadWidth(OriginalLoad, ExtTy, VecEltVT))
+    return SDValue();
+
+  Align Alignment = OriginalLoad->getAlign();
+  MachinePointerInfo MPI;
+  if (auto *ConstEltNo = dyn_cast<ConstantSDNode>(EltNo)) {
+    int Elt = ConstEltNo->getZExtValue();
+    unsigned PtrOff = VecEltVT.getSizeInBits() * Elt / 8;
+    MPI = OriginalLoad->getPointerInfo().getWithOffset(PtrOff);
+    Alignment = commonAlignment(Alignment, PtrOff);
+  } else {
+    // Discard the pointer info except the address space because the memory
+    // operand can't represent this new access since the offset is variable.
+    MPI = MachinePointerInfo(OriginalLoad->getPointerInfo().getAddrSpace());
+    Alignment = commonAlignment(Alignment, VecEltVT.getSizeInBits() / 8);
+  }
+
+  unsigned IsFast = 0;
+  if (!allowsMemoryAccess(*DAG.getContext(), DAG.getDataLayout(), VecEltVT,
+                          OriginalLoad->getAddressSpace(), Alignment,
+                          OriginalLoad->getMemOperand()->getFlags(), &IsFast) ||
+      !IsFast)
+    return SDValue();
+
+  SDValue NewPtr =
+      getVectorElementPointer(DAG, OriginalLoad->getBasePtr(), InVecVT, EltNo);
+
+  // We are replacing a vector load with a scalar load. The new load must have
+  // identical memory op ordering to the original.
+  SDValue Load;
+  if (ResultVT.bitsGT(VecEltVT)) {
+    // If the result type of vextract is wider than the load, then issue an
+    // extending load instead.
+    ISD::LoadExtType ExtType = isLoadExtLegal(ISD::ZEXTLOAD, ResultVT, VecEltVT)
+                                   ? ISD::ZEXTLOAD
+                                   : ISD::EXTLOAD;
+    Load = DAG.getExtLoad(ExtType, DL, ResultVT, OriginalLoad->getChain(),
+                          NewPtr, MPI, VecEltVT, Alignment,
+                          OriginalLoad->getMemOperand()->getFlags(),
+                          OriginalLoad->getAAInfo());
+    DAG.makeEquivalentMemoryOrdering(OriginalLoad, Load);
+  } else {
+    // The result type is narrower or the same width as the vector element
+    Load = DAG.getLoad(VecEltVT, DL, OriginalLoad->getChain(), NewPtr, MPI,
+                       Alignment, OriginalLoad->getMemOperand()->getFlags(),
+                       OriginalLoad->getAAInfo());
+    DAG.makeEquivalentMemoryOrdering(OriginalLoad, Load);
+    if (ResultVT.bitsLT(VecEltVT))
+      Load = DAG.getNode(ISD::TRUNCATE, DL, ResultVT, Load);
+    else
+      Load = DAG.getBitcast(ResultVT, Load);
+  }
+
+  return Load;
+}

@arsenm arsenm marked this pull request as ready for review January 13, 2025 07:10
Copy link
Collaborator

@RKSimon RKSimon left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

DAGCombiner::scalarizeExtractedVectorLoad still exists?

@arsenm arsenm force-pushed the users/arsenm/dag-move-scalarize-extracted-vector-load-to-tli branch from 9bedb14 to 65e9c1b Compare January 29, 2025 16:33
@@ -23357,7 +23276,7 @@ SDValue DAGCombiner::visitEXTRACT_VECTOR_ELT(SDNode *N) {
if (Elt == -1)
return DAG.getUNDEF(LVT);

return scalarizeExtractedVectorLoad(N, VecVT, Index, LN0);
return TLI.scalarizeExtractedVectorLoad(LVT, DL, VecVT, Index, LN0, DAG);
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Should OpsNarrowed be incremented here?

@arsenm arsenm force-pushed the users/arsenm/dag-move-scalarize-extracted-vector-load-to-tli branch from 65e9c1b to b7d320b Compare February 4, 2025 09:32
Copy link
Collaborator

@RKSimon RKSimon left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM

@arsenm arsenm merged commit c55a765 into main Feb 4, 2025
8 checks passed
@arsenm arsenm deleted the users/arsenm/dag-move-scalarize-extracted-vector-load-to-tli branch February 4, 2025 10:37
@vvereschaka
Copy link
Contributor

@arsenm ,

looks like these changes cause a crash for llvm-libc++-static.cfg.in::simd_copy.pass.cpp libc++ test on ARMv7 cross toolchain builder with the following error message:

# | Assertion failed: N->getValueType(0) == RV.getValueType() && N->getNumValues() == 1 && "Type mismatch", file C:\buildbot\as-builder-1\x-armv7l\llvm-project\llvm\lib\CodeGen\SelectionDAG\DAGCombiner.cpp, line 1812
# | PLEASE submit a bug report to https://github.com/llvm/llvm-project/issues/ and include the crash backtrace, preprocessed source, and associated run script.
# | Stack dump:
# | 0.	Program arguments: C:\\buildbot\\as-builder-1\\x-armv7l\\build\\bin\\clang++.exe -cc1 -triple armv7-unknown-linux-gnueabihf -emit-obj -dumpdir C:\\buildbot\\as-builder-1\\x-armv7l\\build\\runtimes\\runtimes-armv7-unknown-linux-gnueabihf-bins\\libcxx\\test\\std\\experimental\\simd\\simd.class\\Output\\simd_copy.pass.cpp.dir\\t.tmp.exe- -disable-free -clear-ast-before-backend -main-file-name simd_copy.pass.cpp -mrelocation-model pic -pic-level 2 -pic-is-pie -mframe-pointer=all -fmath-errno -ffp-contract=on -fno-rounding-math -mconstructor-aliases -target-cpu generic -target-feature +vfp2 -target-feature +vfp2sp -target-feature +vfp3 -target-feature +vfp3d16 -target-feature +vfp3d16sp -target-feature +vfp3sp -target-feature -fp16 -target-feature -vfp4 -target-feature -vfp4d16 -target-feature -vfp4d16sp -target-feature -vfp4sp -target-feature -fp-armv8 -target-feature -fp-armv8d16 -target-feature -fp-armv8d16sp -target-feature -fp-armv8sp -target-feature -fullfp16 -target-feature +fp64 -target-feature +d32 -target-feature +neon -target-feature -sha2 -target-feature -aes -target-feature -fp16fml -target-abi aapcs-linux -mfloat-abi hard -debugger-tuning=gdb -fdebug-compilation-dir=C:\\buildbot\\as-builder-1\\x-armv7l\\build\\runtimes\\runtimes-armv7-unknown-linux-gnueabihf-bins\\libcxx\\test\\std\\experimental\\simd\\simd.class -fcoverage-compilation-dir=C:\\buildbot\\as-builder-1\\x-armv7l\\build\\runtimes\\runtimes-armv7-unknown-linux-gnueabihf-bins\\libcxx\\test\\std\\experimental\\simd\\simd.class -nostdinc++ -resource-dir C:\\buildbot\\as-builder-1\\x-armv7l\\build\\lib\\clang\\21 -I C:/buildbot/as-builder-1/x-armv7l/build/runtimes/runtimes-armv7-unknown-linux-gnueabihf-bins/libcxx/test-suite-install/include/c++/v1 -I C:/buildbot/as-builder-1/x-armv7l/build/runtimes/runtimes-armv7-unknown-linux-gnueabihf-bins/libcxx/test-suite-install/include/armv7-unknown-linux-gnueabihf/c++/v1 -I C:/buildbot/as-builder-1/x-armv7l/llvm-project/libcxx/test/support -D _LIBCPP_HAS_NO_PRAGMA_SYSTEM_HEADER -D _LIBCPP_ENABLE_EXPERIMENTAL -D _LIBCPP_HARDENING_MODE=_LIBCPP_HARDENING_MODE_NONE -isysroot c:/buildbot/fs/jetson-tk1-arm-ubuntu -internal-isystem C:\\buildbot\\as-builder-1\\x-armv7l\\build\\lib\\clang\\21\\include -internal-isystem c:/buildbot/fs/jetson-tk1-arm-ubuntu/usr/local/include -internal-isystem c:/buildbot/fs/jetson-tk1-arm-ubuntu/usr/lib/gcc/arm-linux-gnueabihf/7/../../../../arm-linux-gnueabihf/include -internal-externc-isystem c:/buildbot/fs/jetson-tk1-arm-ubuntu/usr/include/arm-linux-gnueabihf -internal-externc-isystem c:/buildbot/fs/jetson-tk1-arm-ubuntu/include -internal-externc-isystem c:/buildbot/fs/jetson-tk1-arm-ubuntu/usr/include -Werror -Wall -Wctad-maybe-unsupported -Wextra -Wshadow -Wundef -Wunused-template -Wno-unused-command-line-argument -Wno-attributes -Wno-pessimizing-move -Wno-noexcept-type -Wno-atomic-alignment -Wno-reserved-module-identifier -Wdeprecated-copy -Wdeprecated-copy-dtor -Wno-user-defined-literals -Wno-tautological-compare -Wsign-compare -Wunused-variable -Wunused-parameter -Wunreachable-code -Wno-unused-local-typedef -Wno-local-type-template-args -Wno-c++11-extensions -Wno-unknown-pragmas -Wno-pass-failed -Wno-mismatched-new-delete -Wno-redundant-move -Wno-self-move -Werror=thread-safety -Wuser-defined-warnings -std=c++26 -fdeprecated-macro -ferror-limit 19 -pthread -fno-signed-char -fgnuc-version=4.2.1 -fno-implicit-modules -fskip-odr-check-in-gmf -fcxx-exceptions -fexceptions -faddrsig -D__GCC_HAVE_DWARF2_CFI_ASM=1 -o C:\\buildbot\\as-builder-1\\x-armv7l\\build\\lit-tmp-0jxgfhqp\\simd_copy-01f091.o -x c++ C:\\buildbot\\as-builder-1\\x-armv7l\\llvm-project\\libcxx\\test\\std\\experimental\\simd\\simd.class\\simd_copy.pass.cpp
# | 1.	<eof> parser at end of file
# | 2.	Code generation
# | 3.	Running pass 'Function Pass Manager' on module 'C:\buildbot\as-builder-1\x-armv7l\llvm-project\libcxx\test\std\experimental\simd\simd.class\simd_copy.pass.cpp'.
# | 4.	Running pass 'ARM Instruction Selection' on function '@_ZNSt12experimental14parallelism_v217__simd_operationsIxNS0_8simd_abi9__vec_extILi2EEEE7__storeB8ne210000IcEEvNS0_14__simd_storageIxS4_EEPT_'
# | Exception Code: 0x80000003

full error log: https://lab.llvm.org/buildbot/#/builders/38/builds/2166/steps/15/logs/FAIL__llvm-libc__-static_cfg_in__simd_copy_pass_cp

the first failed build: https://lab.llvm.org/buildbot/#/builders/38/builds/2166

Reverting of this commit fixes the test.

Would you take care of it?

@vvereschaka
Copy link
Contributor

@arsenm any updates?

@arsenm
Copy link
Contributor Author

arsenm commented Feb 6, 2025

@arsenm any updates?

Is there a way to get the IR dump from this?

@vvereschaka
Copy link
Contributor

I have an access to the build host. If you will point where/how to get IR for the test I'll get it.

@arsenm
Copy link
Contributor Author

arsenm commented Feb 6, 2025

I have an access to the build host. If you will point where/how to get IR for the test I'll get it.

Add -save-temps to the failing clang invocation

@vvereschaka
Copy link
Contributor

@arsenm here
dag-armv7-simd_copy.zip

@arsenm
Copy link
Contributor Author

arsenm commented Feb 6, 2025

@arsenm here dag-armv7-simd_copy.zip

Should be fixed by c268a3f

@vvereschaka
Copy link
Contributor

Thank you @arsenm , fixed.

Icohedron pushed a commit to Icohedron/llvm-project that referenced this pull request Feb 11, 2025
SimplifyDemandedVectorElts should be able to use this on loads
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
llvm:SelectionDAG SelectionDAGISel as well
Projects
None yet
Development

Successfully merging this pull request may close these issues.

5 participants