Skip to content

[BPI] Cache LoopExitBlocks to improve compile time #93451

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged
merged 2 commits into from
Jun 3, 2024

Conversation

Enna1
Copy link
Contributor

@Enna1 Enna1 commented May 27, 2024

The LoopBlock stored in LoopWorkList consist of basic block and its loop data information. When iterate LoopWorkList, if estimated weight of a loop is not stored in EstimatedLoopWeight, getLoopExitBlocks() is called to get all exit blocks of the loop. The estimated weight of a loop is calculated by iterating over edges leading from basic block to all exit blocks of the loop. If at least one edge has unknown estimated weight, the estimated weight of loop is unknown and will not be stored in EstimatedLoopWeight. LoopWorkList can contain different blocks in a same loop, so there is wasted work that calls getLoopExitBlocks() for same loop multiple times.

Since computing the exit blocks of loop is expensive and the loop structure is not mutated in Branch Probability Analysis, we can cache the result and improve compile time.

With this change, the overall compile time for a file containing a very large loop is dropped by around 82%.

The `LoopBlock` stored in `LoopWorkList` consist of basic block and its loop
data information. When iterate `LoopWorkList`, if estimated weight of a loop
is not stored in `EstimatedLoopWeight`, `getLoopExitBlocks()` is called to get
all exit blocks of the loop. The estimated weight of a loop is calculated by
iterating over edges leading from basic block to all exit blocks of the loop.
If at least one edge has unknown estimated weight, the estimated weight of loop
is unknown and will not be stored in `EstimatedLoopWeight`.
`LoopWorkList` can contain different blocks in a same loop, so there is
wasted work that calls `getLoopExitBlocks()` for same loop multiple times.

Since computing the exit blocks of loop is expensive and the loop structure is
not mutated in Branch Probability Analysis, we can cache the result and improve
compile time.

With this change, the overall compile time for a file containing a
very large loop is dropped by around 82%.
Copy link
Contributor

@nikic nikic left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This looks reasonable to me, but I'm not familiar with this analysis.

@@ -343,6 +343,9 @@ class BranchProbabilityInfo {
/// Keeps mapping of a loop to estimated weight to enter the loop.
SmallDenseMap<LoopData, uint32_t> EstimatedLoopWeight;

/// Keeps mapping of a Loop to its "exit" blocks.
SmallDenseMap<LoopData, SmallVector<BasicBlock *, 4>> LoopExitBlocks;
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can this be a local variable in computeEestimateBlockWeight? Just to make it clear that there are no invalidation concerns...

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks! done.

continue;

SmallVector<BasicBlock *, 4> Exits;
getLoopExitBlocks(LoopBB, Exits);
if (!LoopExitBlocks.count(LD))
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It would be better to use find() here to avoid the duplicate lookup.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sorry, I don't quite get it.
What do you think of using try_emplace() ?

      auto Res = LoopExitBlocks.try_emplace(LD);
      SmallVectorImpl<BasicBlock *> &Exits = Res.first->second;
      if (Res.second)
        getLoopExitBlocks(LoopBB, Exits);
      auto LoopWeight = getMaxEstimatedEdgeWeight(
          LoopBB, make_range(Exits.begin(), Exits.end()));

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I meant using find() to avoid doing both the count() and then below another lookup using LoopExitBlocks[LD].

Your suggestion of using try_emplace also covers the insertion path, so I guess that's even better.

@Enna1 Enna1 marked this pull request as ready for review May 27, 2024 10:40
@llvmbot
Copy link
Member

llvmbot commented May 27, 2024

@llvm/pr-subscribers-llvm-analysis

Author: Enna1 (Enna1)

Changes

The LoopBlock stored in LoopWorkList consist of basic block and its loop data information. When iterate LoopWorkList, if estimated weight of a loop is not stored in EstimatedLoopWeight, getLoopExitBlocks() is called to get all exit blocks of the loop. The estimated weight of a loop is calculated by iterating over edges leading from basic block to all exit blocks of the loop. If at least one edge has unknown estimated weight, the estimated weight of loop is unknown and will not be stored in EstimatedLoopWeight. LoopWorkList can contain different blocks in a same loop, so there is wasted work that calls getLoopExitBlocks() for same loop multiple times.

Since computing the exit blocks of loop is expensive and the loop structure is not mutated in Branch Probability Analysis, we can cache the result and improve compile time.

With this change, the overall compile time for a file containing a very large loop is dropped by around 82%.


Full diff: https://github.com/llvm/llvm-project/pull/93451.diff

2 Files Affected:

  • (modified) llvm/include/llvm/Analysis/BranchProbabilityInfo.h (+3)
  • (modified) llvm/lib/Analysis/BranchProbabilityInfo.cpp (+6-5)
diff --git a/llvm/include/llvm/Analysis/BranchProbabilityInfo.h b/llvm/include/llvm/Analysis/BranchProbabilityInfo.h
index 91e1872e9bd6f..9713629e1bf02 100644
--- a/llvm/include/llvm/Analysis/BranchProbabilityInfo.h
+++ b/llvm/include/llvm/Analysis/BranchProbabilityInfo.h
@@ -343,6 +343,9 @@ class BranchProbabilityInfo {
   /// Keeps mapping of a loop to estimated weight to enter the loop.
   SmallDenseMap<LoopData, uint32_t> EstimatedLoopWeight;
 
+  /// Keeps mapping of a Loop to its "exit" blocks.
+  SmallDenseMap<LoopData, SmallVector<BasicBlock *, 4>> LoopExitBlocks;
+
   /// Helper to construct LoopBlock for \p BB.
   LoopBlock getLoopBlock(const BasicBlock *BB) const {
     return LoopBlock(BB, *LI, *SccI.get());
diff --git a/llvm/lib/Analysis/BranchProbabilityInfo.cpp b/llvm/lib/Analysis/BranchProbabilityInfo.cpp
index cd3e3a4991327..4191fb64b88ec 100644
--- a/llvm/lib/Analysis/BranchProbabilityInfo.cpp
+++ b/llvm/lib/Analysis/BranchProbabilityInfo.cpp
@@ -828,12 +828,13 @@ void BranchProbabilityInfo::computeEestimateBlockWeight(
   do {
     while (!LoopWorkList.empty()) {
       const LoopBlock LoopBB = LoopWorkList.pop_back_val();
-
-      if (EstimatedLoopWeight.count(LoopBB.getLoopData()))
+      const LoopData LD = LoopBB.getLoopData();
+      if (EstimatedLoopWeight.count(LD))
         continue;
 
-      SmallVector<BasicBlock *, 4> Exits;
-      getLoopExitBlocks(LoopBB, Exits);
+      if (!LoopExitBlocks.count(LD))
+        getLoopExitBlocks(LoopBB, LoopExitBlocks[LD]);
+      const SmallVectorImpl<BasicBlock *> &Exits = LoopExitBlocks[LD];
       auto LoopWeight = getMaxEstimatedEdgeWeight(
           LoopBB, make_range(Exits.begin(), Exits.end()));
 
@@ -842,7 +843,7 @@ void BranchProbabilityInfo::computeEestimateBlockWeight(
         if (LoopWeight <= static_cast<uint32_t>(BlockExecWeight::UNREACHABLE))
           LoopWeight = static_cast<uint32_t>(BlockExecWeight::LOWEST_NON_ZERO);
 
-        EstimatedLoopWeight.insert({LoopBB.getLoopData(), *LoopWeight});
+        EstimatedLoopWeight.insert({LD, *LoopWeight});
         // Add all blocks entering the loop into working list.
         getLoopEnterBlocks(LoopBB, BlockWorkList);
       }

@Enna1 Enna1 requested review from WenleiHe and xur-llvm May 28, 2024 02:57
Copy link
Contributor

@david-xl david-xl left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

looks good to me.

Copy link
Member

@WenleiHe WenleiHe left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM. cc @apolloww

@@ -828,12 +829,14 @@ void BranchProbabilityInfo::computeEestimateBlockWeight(
do {
while (!LoopWorkList.empty()) {
const LoopBlock LoopBB = LoopWorkList.pop_back_val();

if (EstimatedLoopWeight.count(LoopBB.getLoopData()))
const LoopData LD = LoopBB.getLoopData();
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Should this be a const-reference?

@Enna1 Enna1 merged commit f779ec7 into main Jun 3, 2024
7 checks passed
@Enna1 Enna1 deleted the users/Enna1/bpi-cache-loop-exitblocks branch June 3, 2024 02:21
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

6 participants