Skip to content

[StackColoring] Change the StackColoring logic + enables it to handle spills #143800

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Open
wants to merge 16 commits into
base: main
Choose a base branch
from

Conversation

Ralender
Copy link
Collaborator

Here is a patch set I have been working on to reduce stack usage.
It starts with a bunch if trivial NFC-like commits, that can be easily handled separately
Then a replacement of the StackColoring algorithm, with all the original analysis and assumption kept.
Finally changes to allow the new algo in StackColoring to operate on spill slots

I know that lifetime annotations are a mess, so I tried to not change any of the assumptions made by the old StackColoring.

here is some examples of the current stack coloring's shortcoming:

void use(...);
void example(int test) {
  int e[4];
  int a, b, c, d;
  if (test)
    return use(&e);
  else
    return use(&a, &b, &c, &d);
}

one of the int will be merged with the array, and thats it.
when all 4 ints could share the space with the array

void bar(char *, int);
void foo(bool var) {
A: {
  char z[4096];
  bar(z, 0);
}

  char *p;
  char x[4096];
  char y[4096];
  if (var) {
    p = x;
  } else {
    bar(y, 1);
    p = y + 1024;
  }
B:
  bar(p, 2);
}

The current StackColoring is not capable of re-using the same storage for all 3 of these arrays
Even if its the example given in the comments of StackColoring.

Both those issues are fixed by the Algorithm change

StackColoring had TODO's about this issue and handling spill slots at the same time as regular slots

This patch set adds 2 options used to do the testing.
-mllvm -new-stack-coloring, this swaps out the algorithm for stack-coloring.
-mllvm -merged-stack-coloring, this moves stack-coloring after regalloc instead of stack-slot-coloring

I have not yet updated tests because all of the work is behind options
and didn't added golden tests YET because the code has kept changing. But I obviously will add tests. before any merging.

Testing Done on this patch:

  • Basic correctness checked on:
    • x86_64-linux-gnu
    • i386-linux-gnu
    • x86_64-windows-pc
    • aarch64-linux-gnu
    • arm-linux-gnueabihf
    • riscv64-linux-gnu
    • I need to do basic correctness checking on targets where the stack grows up.
  • Deeper testing and benchmarking for x86_64-linux-gnu:
    • Passes the correctness part of the llvm-test-suite
    • All Benchmark and example presented.

Benchmark results (mean):
Note: all benchmarks are done with release with assert build, on the llvm-test-suite
With max-candidates=unlimited, because it doesn't seem to have much impact on compile-time.

  • exec time: 0%
  • compile-time: 0%
  • binary size: +0.1%
    I suspect this is caused by some frequently accessed slot not being close to the sp,
    leading to larger variants of the instruction being used. But I need to investigate
  • prologepilog.NumBytesStackSpace: -1.5% and no regressions

Note: The cleanup of the git history was only partial for now, its intended mostly help review.

Copy link

⚠️ C/C++ code formatter, clang-format found issues in your code. ⚠️

You can test this locally with the following command:
git-clang-format --diff HEAD~1 HEAD --extensions cpp,h -- llvm/include/llvm/CodeGen/LiveStacks.h llvm/include/llvm/CodeGen/MachineFrameInfo.h llvm/include/llvm/CodeGen/MachineInstr.h llvm/lib/CodeGen/LiveStacks.cpp llvm/lib/CodeGen/MachineFrameInfo.cpp llvm/lib/CodeGen/MachineInstr.cpp llvm/lib/CodeGen/PrologEpilogInserter.cpp llvm/lib/CodeGen/StackColoring.cpp llvm/lib/CodeGen/StackSlotColoring.cpp llvm/lib/CodeGen/TargetPassConfig.cpp
View the diff from clang-format here.
diff --git a/llvm/include/llvm/CodeGen/LiveStacks.h b/llvm/include/llvm/CodeGen/LiveStacks.h
index 6b0b23bd7..e87f90c28 100644
--- a/llvm/include/llvm/CodeGen/LiveStacks.h
+++ b/llvm/include/llvm/CodeGen/LiveStacks.h
@@ -41,12 +41,12 @@ class LiveStacks {
   VNInfo::Allocator VNInfoAllocator;
 
   int StartIdx = -1;
-  SmallVector<LiveInterval*> S2LI;
+  SmallVector<LiveInterval *> S2LI;
   SmallVector<const TargetRegisterClass *> S2RC;
 
 public:
-  using iterator = SmallVector<LiveInterval*>::iterator;
-  using const_iterator = SmallVector<LiveInterval*>::const_iterator;
+  using iterator = SmallVector<LiveInterval *>::iterator;
+  using const_iterator = SmallVector<LiveInterval *>::const_iterator;
 
   const_iterator begin() const { return S2LI.begin(); }
   const_iterator end() const { return S2LI.end(); }
diff --git a/llvm/include/llvm/CodeGen/MachineFrameInfo.h b/llvm/include/llvm/CodeGen/MachineFrameInfo.h
index 8041b2829..fdb2fbd13 100644
--- a/llvm/include/llvm/CodeGen/MachineFrameInfo.h
+++ b/llvm/include/llvm/CodeGen/MachineFrameInfo.h
@@ -123,7 +123,7 @@ public:
   static constexpr int NoUnderlyingSlot = std::numeric_limits<int>::min();
   static constexpr int IsUnderlyingSlot = std::numeric_limits<int>::min() + 1;
 
-  private:
+private:
   // Represent a single object allocated on the stack.
   struct StackObject {
     // The offset of this object from the stack pointer on entry to
@@ -542,7 +542,7 @@ public:
            "Invalid Object Idx!");
     assert(!isDeadObjectIndex(ObjectIdx) &&
            "Getting frame offset for a dead object?");
-    return Objects[ObjectIdx+NumFixedObjects].Offset;
+    return Objects[ObjectIdx + NumFixedObjects].Offset;
   }
 
   bool isObjectZExt(int ObjectIdx) const {
@@ -576,7 +576,7 @@ public:
            "Invalid Object Idx!");
     assert(!isDeadObjectIndex(ObjectIdx) &&
            "Setting frame offset for a dead object?");
-    Objects[ObjectIdx+NumFixedObjects].Offset = Offset;
+    Objects[ObjectIdx + NumFixedObjects].Offset = Offset;
   }
 
   SSPLayoutKind getObjectSSPLayout(int ObjectIdx) const {
diff --git a/llvm/lib/CodeGen/LiveStacks.cpp b/llvm/lib/CodeGen/LiveStacks.cpp
index 761c5a71f..ea158b2d9 100644
--- a/llvm/lib/CodeGen/LiveStacks.cpp
+++ b/llvm/lib/CodeGen/LiveStacks.cpp
@@ -56,7 +56,7 @@ LiveStacks::getOrCreateInterval(int Slot, const TargetRegisterClass *RC) {
   assert(Slot >= 0 && "Spill slot indice must be >= 0");
   if (StartIdx == -1)
     StartIdx = Slot;
-  
+
   int Idx = Slot - StartIdx;
   assert(Idx >= 0 && "Slot not in order ?");
   if (Idx < (int)S2LI.size()) {
diff --git a/llvm/lib/CodeGen/MachineFrameInfo.cpp b/llvm/lib/CodeGen/MachineFrameInfo.cpp
index 5b9974962..4e65f8949 100644
--- a/llvm/lib/CodeGen/MachineFrameInfo.cpp
+++ b/llvm/lib/CodeGen/MachineFrameInfo.cpp
@@ -13,7 +13,6 @@
 #include "llvm/CodeGen/MachineFrameInfo.h"
 
 #include "llvm/ADT/BitVector.h"
-#include "llvm/IR/Instructions.h"
 #include "llvm/CodeGen/MachineFunction.h"
 #include "llvm/CodeGen/MachineRegisterInfo.h"
 #include "llvm/CodeGen/TargetFrameLowering.h"
@@ -21,6 +20,7 @@
 #include "llvm/CodeGen/TargetRegisterInfo.h"
 #include "llvm/CodeGen/TargetSubtargetInfo.h"
 #include "llvm/Config/llvm-config.h"
+#include "llvm/IR/Instructions.h"
 #include "llvm/Support/Debug.h"
 #include "llvm/Support/raw_ostream.h"
 #include <cassert>
@@ -100,8 +100,7 @@ int MachineFrameInfo::CreateFixedObject(uint64_t Size, int64_t Offset,
   return -++NumFixedObjects;
 }
 
-int MachineFrameInfo::CreateFixedSpillStackObject(uint64_t Size,
-                                                  int64_t Offset,
+int MachineFrameInfo::CreateFixedSpillStackObject(uint64_t Size, int64_t Offset,
                                                   bool IsImmutable) {
   Align Alignment =
       commonAlignment(ForcedRealign ? Align(1) : StackAlignment, Offset);
diff --git a/llvm/lib/CodeGen/PrologEpilogInserter.cpp b/llvm/lib/CodeGen/PrologEpilogInserter.cpp
index d006e3984..7a44b3937 100644
--- a/llvm/lib/CodeGen/PrologEpilogInserter.cpp
+++ b/llvm/lib/CodeGen/PrologEpilogInserter.cpp
@@ -696,9 +696,9 @@ void PEIImpl::spillCalleeSavedRegs(MachineFunction &MF) {
 
 static inline void UpdateOffset(MachineFrameInfo &MFI, int FrameIdx,
                                 int64_t Offset) {
-    LLVM_DEBUG(dbgs() << "alloc FI(" << FrameIdx << ") at SP[" << Offset
-                      << "]\n");
-    MFI.setObjectOffset(FrameIdx, Offset); // Set the computed offset
+  LLVM_DEBUG(dbgs() << "alloc FI(" << FrameIdx << ") at SP[" << Offset
+                    << "]\n");
+  MFI.setObjectOffset(FrameIdx, Offset); // Set the computed offset
 }
 
 /// AdjustStackOffset - Helper function used to adjust the stack frame offset.

@thanm
Copy link
Contributor

thanm commented Jun 11, 2025

I am willing to take a look, but for various reasons I won't be able to dig in for a couple of months (I am away from my work computers). Hope you can make progress without my eyeballs for now. Thanks.

@efriedma-quic
Copy link
Collaborator

The cleanup commits look reasonable; please submit them separately.

The IR representation of lifetimes has some pretty significant issues (see #45725 (comment) , #51838, #132085, #43674). Will this cause those issues to be exposed more frequently?

@Ralender
Copy link
Collaborator Author

Ralender commented Jun 12, 2025

The cleanup commits look reasonable; please submit them separately.

Yes,

The IR representation of lifetimes has some pretty significant issues (see #45725 (comment) , #51838, #132085, #43674). Will this cause those issues to be exposed more frequently?

Here are my thoughts on the problems in general, I think there is 2 distinct issues:
Pointer comparisons self-consistency:
The only viable solution I see is tagging allocas, that need to stay distinct from each other and ensuring alias analysis and StackColoring respect it. and whenever the IR folds a icmp based on distinct provenance, it tags the allocas as distinct from each other. this keeps folding icmp simple. and it would not be legal to remove a tag from an alloca under any circumstances. I/We need to experiment, but it may not have much of negative impact.

IR transformation moving accesses out of lifetime, causing UB:
I have only seen it for loads, so maybe loads outside of lifetime could be defined as poison instead of UB, as has been proposed.
maybe an other solution is to say that lifetime annotation only have meaning for captures, and accesses outside of it are legal. alias analysis could still assume that a call outside of lifetime cannot access an escaped pointer, but potential load or store outside of lifetime would cause extension of the lifetime. Captures scopes cannot be changed by optimizations and everything else should analyzable even if we may need to be pessimist when multiple objects could be accessed by a load. the consequence is more complex analysis to figure out the real lifetime, and being pessimist while doing so may cause some regressions.

About the question, original question:
The old and new StackColoring should always agree on whether 2 slots can overlap or not.
I checked it at some point by having an assert during development (it was removed since).
That said, the new algorithm reduces stack space but getting more overlaps between slots.
So some slots that would not have overlapped with the old algo because of different decisions may overlap now.
Spills are unaffected by all the lifetime issues.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants