
Conversation

@Pierre-vh
Contributor

I was benchmarking the MatchTable when I found that getConstantVRegValWithLookThrough took a non-negligible amount of time, about 7.5% of all of AArch64PreLegalizerCombinerImpl::tryCombineAll.

I decided to take a closer look to see if I could squeeze some performance out of it, and I landed on a few changes that:

  • Avoid copying APInt unnecessarily; in particular, returning std::optional&lt;APInt&gt; can be expensive when an out parameter also works.
  • Avoid an indirect call by using templated function pointers instead of function_ref/std::function.

Both of these changes together seem to speed up this function by about 50%, but my benchmarking (`perf record`) seems inconsistent, so take the measurements with a grain of salt: on the exact same input I saw this function as high as 4.5% and as low as 2% after the changes, but it never got close to 7% again in a few runs, so this looks like a stable improvement.

@llvmbot
Member

llvmbot commented May 13, 2024

@llvm/pr-subscribers-llvm-globalisel

Author: Pierre van Houtryve (Pierre-vh)

Changes



Full diff: https://github.com/llvm/llvm-project/pull/91969.diff

1 file affected:

  • (modified) llvm/lib/CodeGen/GlobalISel/Utils.cpp (+44-32)
diff --git a/llvm/lib/CodeGen/GlobalISel/Utils.cpp b/llvm/lib/CodeGen/GlobalISel/Utils.cpp
index 4e3781cb4e9d5..cd5dc0e01ed0e 100644
--- a/llvm/lib/CodeGen/GlobalISel/Utils.cpp
+++ b/llvm/lib/CodeGen/GlobalISel/Utils.cpp
@@ -313,13 +313,22 @@ llvm::getIConstantVRegSExtVal(Register VReg, const MachineRegisterInfo &MRI) {
 
 namespace {
 
-typedef std::function<bool(const MachineInstr *)> IsOpcodeFn;
-typedef std::function<std::optional<APInt>(const MachineInstr *MI)> GetAPCstFn;
-
-std::optional<ValueAndVReg> getConstantVRegValWithLookThrough(
-    Register VReg, const MachineRegisterInfo &MRI, IsOpcodeFn IsConstantOpcode,
-    GetAPCstFn getAPCstValue, bool LookThroughInstrs = true,
-    bool LookThroughAnyExt = false) {
+// This function is used in many places, and as such, it has some
+// micro-optimizations to try and make it as fast as it can be.
+//
+// - We use template arguments to avoid an indirect call caused by passing a
+// function_ref/std::function
+// - GetAPCstValue does not return std::optional<APInt> as that's expensive.
+// Instead it returns true/false and places the result in a pre-constructed
+// APInt.
+//
+// Please change this function carefully and benchmark your changes.
+template <bool (*IsConstantOpcode)(const MachineInstr *),
+          bool (*GetAPCstValue)(const MachineInstr *MI, APInt &)>
+std::optional<ValueAndVReg>
+getConstantVRegValWithLookThrough(Register VReg, const MachineRegisterInfo &MRI,
+                                  bool LookThroughInstrs = true,
+                                  bool LookThroughAnyExt = false) {
   SmallVector<std::pair<unsigned, unsigned>, 4> SeenOpcodes;
   MachineInstr *MI;
 
@@ -353,26 +362,25 @@ std::optional<ValueAndVReg> getConstantVRegValWithLookThrough(
   if (!MI || !IsConstantOpcode(MI))
     return std::nullopt;
 
-  std::optional<APInt> MaybeVal = getAPCstValue(MI);
-  if (!MaybeVal)
+  APInt Val;
+  if (!GetAPCstValue(MI, Val))
     return std::nullopt;
-  APInt &Val = *MaybeVal;
-  for (auto [Opcode, Size] : reverse(SeenOpcodes)) {
-    switch (Opcode) {
+  for (auto &Pair : reverse(SeenOpcodes)) {
+    switch (Pair.first) {
     case TargetOpcode::G_TRUNC:
-      Val = Val.trunc(Size);
+      Val = Val.trunc(Pair.second);
       break;
     case TargetOpcode::G_ANYEXT:
     case TargetOpcode::G_SEXT:
-      Val = Val.sext(Size);
+      Val = Val.sext(Pair.second);
       break;
     case TargetOpcode::G_ZEXT:
-      Val = Val.zext(Size);
+      Val = Val.zext(Pair.second);
       break;
     }
   }
 
-  return ValueAndVReg{Val, VReg};
+  return ValueAndVReg{std::move(Val), VReg};
 }
 
 bool isIConstant(const MachineInstr *MI) {
@@ -394,42 +402,46 @@ bool isAnyConstant(const MachineInstr *MI) {
   return Opc == TargetOpcode::G_CONSTANT || Opc == TargetOpcode::G_FCONSTANT;
 }
 
-std::optional<APInt> getCImmAsAPInt(const MachineInstr *MI) {
+bool getCImmAsAPInt(const MachineInstr *MI, APInt &Result) {
   const MachineOperand &CstVal = MI->getOperand(1);
-  if (CstVal.isCImm())
-    return CstVal.getCImm()->getValue();
-  return std::nullopt;
+  if (!CstVal.isCImm())
+    return false;
+  Result = CstVal.getCImm()->getValue();
+  return true;
 }
 
-std::optional<APInt> getCImmOrFPImmAsAPInt(const MachineInstr *MI) {
+bool getCImmOrFPImmAsAPInt(const MachineInstr *MI, APInt &Result) {
   const MachineOperand &CstVal = MI->getOperand(1);
   if (CstVal.isCImm())
-    return CstVal.getCImm()->getValue();
-  if (CstVal.isFPImm())
-    return CstVal.getFPImm()->getValueAPF().bitcastToAPInt();
-  return std::nullopt;
+    Result = CstVal.getCImm()->getValue();
+  else if (CstVal.isFPImm())
+    Result = CstVal.getFPImm()->getValueAPF().bitcastToAPInt();
+  else
+    return false;
+  return true;
 }
 
 } // end anonymous namespace
 
 std::optional<ValueAndVReg> llvm::getIConstantVRegValWithLookThrough(
     Register VReg, const MachineRegisterInfo &MRI, bool LookThroughInstrs) {
-  return getConstantVRegValWithLookThrough(VReg, MRI, isIConstant,
-                                           getCImmAsAPInt, LookThroughInstrs);
+  return getConstantVRegValWithLookThrough<isIConstant, getCImmAsAPInt>(
+      VReg, MRI, LookThroughInstrs);
 }
 
 std::optional<ValueAndVReg> llvm::getAnyConstantVRegValWithLookThrough(
     Register VReg, const MachineRegisterInfo &MRI, bool LookThroughInstrs,
     bool LookThroughAnyExt) {
-  return getConstantVRegValWithLookThrough(
-      VReg, MRI, isAnyConstant, getCImmOrFPImmAsAPInt, LookThroughInstrs,
-      LookThroughAnyExt);
+  return getConstantVRegValWithLookThrough<isAnyConstant,
+                                           getCImmOrFPImmAsAPInt>(
+      VReg, MRI, LookThroughInstrs, LookThroughAnyExt);
 }
 
 std::optional<FPValueAndVReg> llvm::getFConstantVRegValWithLookThrough(
     Register VReg, const MachineRegisterInfo &MRI, bool LookThroughInstrs) {
-  auto Reg = getConstantVRegValWithLookThrough(
-      VReg, MRI, isFConstant, getCImmOrFPImmAsAPInt, LookThroughInstrs);
+  auto Reg =
+      getConstantVRegValWithLookThrough<isFConstant, getCImmOrFPImmAsAPInt>(
+          VReg, MRI, LookThroughInstrs);
   if (!Reg)
     return std::nullopt;
   return FPValueAndVReg{getConstantFPVRegVal(Reg->VReg, MRI)->getValueAPF(),

@qcolombet
Collaborator

Do you know how each of these items contributes to the compile time individually:

  • Avoid copying APInt unnecessarily, especially returning std::optional&lt;APInt&gt; can be expensive when an out parameter also works.
  • Avoid indirect call by using templated function pointers instead of function_ref/std::function

The use of optional versus bool + out parameter is what I think we've been moving towards in the past few years within the LLVM code base and I'm wondering if that means this is generally bad for performance.

@Pierre-vh
Contributor Author

Pierre-vh commented May 13, 2024

Do you know how each of these items contributes to the compile time individually:

  • Avoid copying APInt unnecessarily, especially returning std::optional&lt;APInt&gt; can be expensive when an out parameter also works.
  • Avoid indirect call by using templated function pointers instead of function_ref/std::function

The use of optional versus bool + out parameter is what I think we've been moving towards in the past few years within the LLVM code base and I'm wondering if that means this is generally bad for performance.

I can come back to this and do more thorough testing if you want a precise answer, but from what I remember (take with a big grain of salt): the biggest contributor was removing the indirect call (which probably also enabled inlining), as the call occurred within a loop.

The removal of optional brought an additional nice improvement (maybe 1/3rd of the indirect call removal?) mostly because it removed APInt copies, and APInt copies aren't trivial.

Finally this patch needs to be put in context: this is a micro-optimization I did for fun and profit (because it was a low-hanging fruit). The gain is a single digit % improvement on the AArch64PreLegalizerCombine match table execution. That part of the compilation process is in itself like 5-7% of the build in my test, so in the best case this saved 0.2% of a full build.

I wouldn't worry about using optional more, but I'd just be careful with it when dealing with non-trivial types (especially large types). It's easy to lose optimizations like copy elision with it. (hot take) I'd also be wary of using optional just "out of convenience" and ending up with sub-optimal code when sometimes returning a bool & using an out parameter just makes more sense and avoids a copy or two.

Contributor

@aemerson aemerson left a comment


How are you benchmarking MatchTable for these changes? Not saying you have to do this, but I'd be curious to see if we still see gains with LTO builds of llvm.

@Pierre-vh
Contributor Author

How are you benchmarking MatchTable for these changes? Not saying you have to do this, but I'd be curious to see if we still see gains with LTO builds of llvm.

I was writing a comment but then it got too long so I just added it to the thread, let's continue the discussion there: https://discourse.llvm.org/t/rfc-measuring-globalisel-compile-time-performance/78412/19

@Pierre-vh Pierre-vh merged commit 922fafa into llvm:main May 14, 2024
@Pierre-vh Pierre-vh deleted the microopt-getConstantVRegVal branch May 14, 2024 06:36
@qcolombet
Collaborator


Thanks for the details!

I was afraid the optional pattern was a time sink in general, but like you said, here we are looking at a micro-optimization.

