
[analyzer] Workaround for slowdown spikes (unintended scope increase) #136720


Merged (7 commits) on May 12, 2025

Conversation

@NagyDonat (Contributor) commented Apr 22, 2025

Recently some users reported that they observed large increases of runtime (up to +600% on some translation units) when they upgraded to a more recent (slightly patched, internal) clang version. Bisection revealed that the bulk of this increase was probably caused by my earlier commit bb27d5e ("Don't assume third iteration in loops").

As I evaluated that earlier commit on several open source projects, it turned out that on average it is runtime-neutral (or slightly helpful: it reduced the total analysis time by 1.5%), but it can cause runtime spikes on some code: in particular, it more than doubled the time to analyze tmux (one of the smaller test projects).

Further profiling and investigation proved that these spikes were caused by an increase of analysis scope because there was a heuristic that placed functions on a "don't inline this" blacklist if they reached the -analyzer-max-loop limit (anywhere, on any one execution path) -- which became significantly rarer when my commit ensured the analyzer no longer "just assumes" four iterations. (With more inlining, significantly more entry points use up their allocated budgets, which leads to the increased runtime.)

I feel that this heuristic for the "don't inline" blacklist is unjustified and arbitrary, because reaching the "retry without inlining" limit on one path does not imply that inlining the function won't be valuable on other paths -- so I hope that we can eventually replace it with more "natural" limits of the analysis scope.

However, the runtime increases are annoying for the users whose project is affected, so I created this quick workaround commit that approximates the "don't inline" blacklist effects of ambiguous loops (where the analyzer doesn't understand the loop condition) without fully reverting the "Don't assume third iteration" commit (to avoid reintroducing the false positives that were eliminated by it).
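
To make "ambiguous loop" concrete, here is a simplified sketch of the kind of code this change targets (modeled on the cases in the new legacy-inlining-prevention.c test file; getNum() just stands for any call whose result the analyzer cannot reason about):

int getNum(void); // Get an opaque number.

int inner_opaque_loop(void) {
  int i;
  // The loop bound is opaque, so the analyzer cannot evaluate the condition.
  // Before the "Don't assume third iteration" commit it kept assuming further
  // iterations until it hit the -analyzer-max-loop limit (4 by default) and
  // then put this function on the "don't inline" list; after that commit it
  // stops after two assumed iterations, the function stays inlinable, and more
  // entry points burn through their budget. This workaround restores the
  // "don't inline" mark in exactly this situation.
  for (i = 0; i < getNum(); i++)
    ;
  return i;
}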

Investigating this issue was a team effort: I'm grateful to Endre Fülöp (gamesh411) who did the bisection and shared his time measurement setup, and Gábor Tóthvári (tigbr) who helped me in profiling.

Recently some users reported that the runtime of the static analyzer
drastically increased on their codebase when they upgraded to a more
recent (slightly patched, internal) clang version: some of their
translation units got more than +600% runtime (yes, with two zeroes) and
IIUC the average increase was also high.

Bisection revealed that the bulk of the increase was caused by my
earlier commit bb27d5e ("Don't assume
third iteration in loops") and I verified that the slowdown affects the
downstream clang. (I used tmux as an arbitrary easy-to-analyze open
source project and measured that reverting this commit reduces the
total runtime by more than 50% compared to a recent main revision.)

Further profiling and investigation proved that this increase is
caused by an _increase of analysis scope_ because there was an arbitrary
heuristic that placed functions on a "don't inline this" blacklist if
they reached the `-analyzer-max-loop` limit (anywhere, on any one
execution path) -- which became significantly rarer when my commit
ensured the analyzer no longer "just assumes" four iterations. (With
more inlining significantly more entry points use up their allocated
budgets, which leads to the increased runtime.)

I feel that this heuristic for the "don't inline" blacklist is
unjustified, because reaching the "retry without inlining" limit on one
path does not imply that inlining the function won't be valuable on
other paths -- so I hope that we can eventually replace it with more
"natural" limits of the analysis scope.

However, the brutal jump of analysis runtime significantly degrades the
user experience, so I created this quick workaround commit that
approximates the "don't inline" blacklist effects of opaque loops (where
the analyzer doesn't understand the loop condition) without fully
reverting the "Don't assume third iteration" commit (to avoid
reintroducing the false positives that were eliminated by it).
@llvmbot added the clang (Clang issues not falling into any other category) and clang:static analyzer labels on Apr 22, 2025
@llvmbot (Member) commented Apr 22, 2025

@llvm/pr-subscribers-clang

Author: Donát Nagy (NagyDonat)

Changes

Recently some users reported that the runtime of the static analyzer drastically increased on their codebase when they upgraded to a more recent (slightly patched, internal) clang version: some of their translation units got more than +600% runtime (yes, with two zeroes) and IIUC the average increase was also high.

Bisection revealed that the bulk of the increase was caused by my earlier commit bb27d5e ("Don't assume third iteration in loops") and I verified that the slowdown affects the downstream clang. (I used tmux as an arbitrary easy-to-analyze open source project and measured that reverting this commit reduces the total runtime by more than 50% compared to a recent main revision.)

Further profiling and investigation proved that this increase is caused by an increase of analysis scope because there was an arbitrary heuristic that placed functions on a "don't inline this" blacklist if they reached the -analyzer-max-loop limit (anywhere, on any one execution path) -- which became significantly rarer when my commit ensured the analyzer no longer "just assumes" four iterations. (With more inlining significantly more entry points use up their allocated budgets, which leads to the increased runtime.)

I feel that this heuristic for the "don't inline" blacklist is unjustified, because reaching the "retry without inlining" limit on one path does not imply that inlining the function won't be valuable on other paths -- so I hope that we can eventually replace it with more "natural" limits of the analysis scope.

However, the brutal jump of analysis runtime significantly degrades the user experience, so I created this quick workaround commit that approximates the "don't inline" blacklist effects of opaque loops (where the analyzer doesn't understand the loop condition) without fully reverting the "Don't assume third iteration" commit (to avoid reintroducing the false positives that were eliminated by it).


Investigating this issue was a team effort: I'm grateful to Endre Fülöp (gamesh411) who did the bisection and shared his time measurement setup, and Gábor Tóthvári who helped me in profiling.

My preliminary measurements confirm that this commit does improve the runtime: I checked the analysis of window-copy.c from tmux where the analysis takes roughly:

  • 3 minutes on a recent main revision (the parent of this commit);
  • 50 seconds if the "Don't assume third iteration" commit is reverted;
  • 40 seconds if this commit is applied.
In the upcoming days I'll do more accurate time measurements and check the impact of this change on the quality and quantity of the results.

Full diff: https://github.com/llvm/llvm-project/pull/136720.diff

6 Files Affected:

  • (modified) clang/include/clang/StaticAnalyzer/Core/AnalyzerOptions.def (+14)
  • (modified) clang/include/clang/StaticAnalyzer/Core/PathSensitive/FunctionSummary.h (-4)
  • (modified) clang/lib/StaticAnalyzer/Core/ExprEngine.cpp (+49-11)
  • (modified) clang/test/Analysis/analyzer-config.c (+1)
  • (added) clang/test/Analysis/legacy-inlining-prevention.c (+198)
  • (modified) clang/test/Analysis/loop-unrolling.cpp (+23-7)
diff --git a/clang/include/clang/StaticAnalyzer/Core/AnalyzerOptions.def b/clang/include/clang/StaticAnalyzer/Core/AnalyzerOptions.def
index f9f22a9ced650..0ac85b14147e0 100644
--- a/clang/include/clang/StaticAnalyzer/Core/AnalyzerOptions.def
+++ b/clang/include/clang/StaticAnalyzer/Core/AnalyzerOptions.def
@@ -414,6 +414,20 @@ ANALYZER_OPTION(
     "any target.",
     true)
 
+ANALYZER_OPTION(
+    bool, LegacyInliningPrevention, "legacy-inlining-prevention",
+    "If enabled, the analyzer puts functions on a \"do not inline this\" list "
+    "if it finds an execution path within that function that may potentially "
+    "perform 'analyzer-max-loop' (= 4 by default) iterations in a loop. "
+    "Note that functions that _definitely_ reach the loop limit on some "
+    "execution path are currently marked as \"do not inline\" even if this "
+    "option is disabled (but this may change in future versions). This option "
+    "is a dumb and arbitrary restriction on inlining, but disabling it would "
+    "significantly increase the analysis workload (and the time taken) "
+    "compared to older clang versions because more top-level functions can "
+    "use up their 'max-nodes' limit if inlining is not restricted.",
+    true)
+
 //===----------------------------------------------------------------------===//
 // Unsigned analyzer options.
 //===----------------------------------------------------------------------===//
diff --git a/clang/include/clang/StaticAnalyzer/Core/PathSensitive/FunctionSummary.h b/clang/include/clang/StaticAnalyzer/Core/PathSensitive/FunctionSummary.h
index 3ee0d229cfc29..761395260a0cf 100644
--- a/clang/include/clang/StaticAnalyzer/Core/PathSensitive/FunctionSummary.h
+++ b/clang/include/clang/StaticAnalyzer/Core/PathSensitive/FunctionSummary.h
@@ -81,10 +81,6 @@ class FunctionSummariesTy {
     I->second.MayInline = 0;
   }
 
-  void markReachedMaxBlockCount(const Decl *D) {
-    markShouldNotInline(D);
-  }
-
   std::optional<bool> mayInline(const Decl *D) {
     MapTy::const_iterator I = Map.find(D);
     if (I != Map.end() && I->second.InlineChecked)
diff --git a/clang/lib/StaticAnalyzer/Core/ExprEngine.cpp b/clang/lib/StaticAnalyzer/Core/ExprEngine.cpp
index 86e2e8f634bfd..1ff36eb83dbe7 100644
--- a/clang/lib/StaticAnalyzer/Core/ExprEngine.cpp
+++ b/clang/lib/StaticAnalyzer/Core/ExprEngine.cpp
@@ -2523,6 +2523,20 @@ bool ExprEngine::replayWithoutInlining(ExplodedNode *N,
   return true;
 }
 
+/// Return the innermost location context which is inlined at `Node`, unless
+/// it's the top-level (entry point) location context.
+static const LocationContext *getInlinedLocationContext(ExplodedNode *Node,
+                                                        ExplodedGraph &G) {
+  const LocationContext *CalleeLC = Node->getLocation().getLocationContext();
+  const LocationContext *RootLC =
+      (*G.roots_begin())->getLocation().getLocationContext();
+
+  if (CalleeLC->getStackFrame() == RootLC->getStackFrame())
+    return nullptr;
+
+  return CalleeLC;
+}
+
 /// Block entrance.  (Update counters).
 void ExprEngine::processCFGBlockEntrance(const BlockEdge &L,
                                          NodeBuilderWithSinks &nodeBuilder,
@@ -2570,21 +2584,24 @@ void ExprEngine::processCFGBlockEntrance(const BlockEdge &L,
     const ExplodedNode *Sink =
                    nodeBuilder.generateSink(Pred->getState(), Pred, &tag);
 
-    // Check if we stopped at the top level function or not.
-    // Root node should have the location context of the top most function.
-    const LocationContext *CalleeLC = Pred->getLocation().getLocationContext();
-    const LocationContext *CalleeSF = CalleeLC->getStackFrame();
-    const LocationContext *RootLC =
-                        (*G.roots_begin())->getLocation().getLocationContext();
-    if (RootLC->getStackFrame() != CalleeSF) {
-      Engine.FunctionSummaries->markReachedMaxBlockCount(CalleeSF->getDecl());
+    if (const LocationContext *LC = getInlinedLocationContext(Pred, G)) {
+      // FIXME: This will unconditionally prevent inlining this function (even
+      // from other entrypoints), which is not a reasonable heuristic: even if
+      // we reached max block count on this particular execution path, there
+      // may be other execution paths (especially with other parametrizations)
+      // where the analyzer can reach the end of the function (so there is no
+      // natural reason to avoid inlining it). However, disabling this would
+      // significantly increase the analysis time (because more entrypoints
+      // would exhaust their allocated budget), so it must be compensated by a
+      // different (more reasonable) reduction of analysis scope.
+      Engine.FunctionSummaries->markShouldNotInline(
+          LC->getStackFrame()->getDecl());
 
       // Re-run the call evaluation without inlining it, by storing the
       // no-inlining policy in the state and enqueuing the new work item on
       // the list. Replay should almost never fail. Use the stats to catch it
       // if it does.
-      if ((!AMgr.options.NoRetryExhausted &&
-           replayWithoutInlining(Pred, CalleeLC)))
+      if ((!AMgr.options.NoRetryExhausted && replayWithoutInlining(Pred, LC)))
         return;
       NumMaxBlockCountReachedInInlined++;
     } else
@@ -2856,8 +2873,29 @@ void ExprEngine::processBranch(
       // conflicts with the widen-loop analysis option (which is off by
       // default). If we intend to support and stabilize the loop widening,
       // we must ensure that it 'plays nicely' with this logic.
-      if (!SkipTrueBranch || AMgr.options.ShouldWidenLoops)
+      if (!SkipTrueBranch || AMgr.options.ShouldWidenLoops) {
         Builder.generateNode(StTrue, true, PredN);
+      } else if (AMgr.options.LegacyInliningPrevention) {
+        // FIXME: There is an ancient and very arbitrary heuristic in
+        // `ExprEngine::processCFGBlockEntrance` which prevents all further
+        // inlining of a function if it finds an execution path within that
+        // function which reaches the `MaxBlockVisitOnPath` limit (a/k/a
+        // `analyzer-max-loop`, by default four iterations in a loop). Adding
+        // this "don't assume third iteration" logic significantly increased
+        // the analysis runtime on some inputs because less functions were
+        // arbitrarily excluded from being inlined, so more entrypoints used
+        // up their full allocated budget. As a hacky compensation for this,
+        // here we apply the "should not inline" mark in cases when the loop
+        // could potentially reach the `MaxBlockVisitOnPath` limit without the
+        // "don't assume third iteration" logic. This slightly overcompensates
+        // (activates if the third iteration can be entered, and will not
+        // recognize cases where the fourth iteration wouldn't be completed), but
+        // should be good enough for practical purposes.
+        if (const LocationContext *LC = getInlinedLocationContext(Pred, G)) {
+          Engine.FunctionSummaries->markShouldNotInline(
+              LC->getStackFrame()->getDecl());
+        }
+      }
     }
 
     if (StFalse) {
diff --git a/clang/test/Analysis/analyzer-config.c b/clang/test/Analysis/analyzer-config.c
index 80cad54b039f4..42c622f2cd51b 100644
--- a/clang/test/Analysis/analyzer-config.c
+++ b/clang/test/Analysis/analyzer-config.c
@@ -92,6 +92,7 @@
 // CHECK-NEXT: inline-lambdas = true
 // CHECK-NEXT: ipa = dynamic-bifurcate
 // CHECK-NEXT: ipa-always-inline-size = 3
+// CHECK-NEXT: legacy-inlining-prevention = true
 // CHECK-NEXT: max-inlinable-size = 100
 // CHECK-NEXT: max-nodes = 225000
 // CHECK-NEXT: max-symbol-complexity = 35
diff --git a/clang/test/Analysis/legacy-inlining-prevention.c b/clang/test/Analysis/legacy-inlining-prevention.c
new file mode 100644
index 0000000000000..976e2ba6ddb5d
--- /dev/null
+++ b/clang/test/Analysis/legacy-inlining-prevention.c
@@ -0,0 +1,198 @@
+// RUN: %clang_analyze_cc1 -analyzer-checker=core,debug.ExprInspection -verify=expected,default %s
+// RUN: %clang_analyze_cc1 -analyzer-checker=core,debug.ExprInspection -analyzer-config legacy-inlining-prevention=false -verify=expected,disabled %s
+
+int getNum(void); // Get an opaque number.
+
+void clang_analyzer_numTimesReached(void);
+void clang_analyzer_dump(int arg);
+
+//-----------------------------------------------------------------------------
+// Simple case: inlined function never reaches `analyzer-max-loop`.
+
+int inner_simple(void) {
+  clang_analyzer_numTimesReached(); // expected-warning {{2}}
+  return 42;
+}
+
+int outer_simple(void) {
+  int x = inner_simple();
+  int y = inner_simple();
+  return 53 / (x - y); // expected-warning {{Division by zero}}
+}
+
+//-----------------------------------------------------------------------------
+// Inlined function always reaches `analyzer-max-loop`.
+
+int inner_fixed_loop_1(void) {
+  int i;
+  clang_analyzer_numTimesReached(); // expected-warning {{1}}
+  for (i = 0; i < 10; i++);
+  clang_analyzer_numTimesReached(); // no-warning
+  return 42;
+}
+
+int outer_fixed_loop_1(void) {
+  int x = inner_fixed_loop_1();
+  int y = inner_fixed_loop_1();
+  return 53 / (x - y); // no-warning
+}
+
+//-----------------------------------------------------------------------------
+// Inlined function always reaches `analyzer-max-loop`; inlining is prevented
+// even for different entry points.
+// This test uses `clang_analyzer_dump` and distinct `arg` values because
+// `clang_analyzer_numTimesReached` only counts the paths reaching that node
+// during the analysis of one particular entry point, so it cannot distinguish
+// "two entry points reached this, both with one path" (where the two reports
+// are unified as duplicates by the generic report postprocessing) and "one
+// entry point reached this with one path" (where naturally nothing shows that
+// the second entry point _tried_ to reach it).
+
+int inner_fixed_loop_2(int arg) {
+  // Identical copy of inner_fixed_loop_1
+  int i;
+  clang_analyzer_dump(arg); // expected-warning {{2}}
+  for (i = 0; i < 10; i++);
+  clang_analyzer_dump(arg); // no-warning
+  return 42;
+}
+
+int outer_1_fixed_loop_2(void) {
+  return inner_fixed_loop_2(1);
+}
+
+int outer_2_fixed_loop_2(void) {
+  return inner_fixed_loop_2(2);
+}
+
+//-----------------------------------------------------------------------------
+// Inlined function reaches `analyzer-max-loop` only in its second call. The
+// function is inlined twice but the second call doesn't finish and ends up
+// being conservatively evaluated.
+
+int inner_parametrized_loop_1(int count) {
+  int i;
+  clang_analyzer_numTimesReached(); // expected-warning {{2}}
+  for (i = 0; i < count; i++);
+  clang_analyzer_numTimesReached(); // expected-warning {{1}}
+  return 42;
+}
+
+int outer_parametrized_loop_1(void) {
+  int x = inner_parametrized_loop_1(2);
+  int y = inner_parametrized_loop_1(10);
+  return 53 / (x - y); // no-warning
+}
+
+//-----------------------------------------------------------------------------
+// Inlined function reaches `analyzer-max-loop` on its first call, so the
+// second call isn't inlined (although it could be fully evaluated).
+
+int inner_parametrized_loop_2(int count) {
+  int i;
+  clang_analyzer_numTimesReached(); // expected-warning {{1}}
+  for (i = 0; i < count; i++);
+  clang_analyzer_numTimesReached(); // no-warning
+  return 42;
+}
+
+int outer_parametrized_loop_2(void) {
+  int y = inner_parametrized_loop_2(10);
+  int x = inner_parametrized_loop_2(2);
+  return 53 / (x - y); // no-warning
+}
+
+//-----------------------------------------------------------------------------
+// Inlined function may or may not reach `analyzer-max-loop` depending on an
+// opaque check before the loop. This is very similar to the "fixed loop"
+// cases: the function is placed on the "don't inline" list when any execution
+// path reaches `analyzer-max-loop` (even if other execution paths reach the
+// end of the function).
+
+int inner_conditional_loop(void) {
+  int i;
+  clang_analyzer_numTimesReached(); // expected-warning {{1}}
+  if (getNum() == 777) {
+    for (i = 0; i < 10; i++);
+  }
+  clang_analyzer_numTimesReached(); // expected-warning {{1}}
+  return 42;
+}
+
+int outer_1_conditional_loop(void) {
+  return inner_conditional_loop();
+}
+
+int outer_2_conditional_loop(void) {
+  return inner_conditional_loop();
+}
+
+//-----------------------------------------------------------------------------
+// Inlined function executes an opaque loop that may or may not reach
+// `analyzer-max-loop`. Historically, before the "don't assume third iteration"
+// commit (bb27d5e5c6b194a1440b8ac4e5ace68d0ee2a849) this worked like the
+// `conditional_loop` cases: the analyzer was able to find a path reaching
+// `analyzer-max-loop` so inlining was disabled. After that commit the analyzer
+// does not _assume_ a third (or later) iteration (i.e. does not enter those
+// iterations if the loop condition is an unknown value), so e.g. this test
+// function does not reach `analyzer-max-loop` iterations and the inlining is
+// not disabled.
+// Unfortunately this change significantly increased the workload and
+// runtime of the analyzer (more entry points used up their budget), so the
+// option `legacy-inlining-prevention` was introduced and enabled by default to
+// suppress the inlining in situations where the "don't assume third iteration"
+// logic activates.
+// This testcase demonstrates that the inlining is prevented with the default
+// `legacy-inlining-prevention=true` config, but is not prevented when this
+// option is disabled (set to false).
+
+int inner_opaque_loop_1(void) {
+  int i;
+  clang_analyzer_numTimesReached(); // default-warning {{1}} disabled-warning {{2}}
+  for (i = 0; i < getNum(); i++);
+  return i;
+}
+
+int outer_opaque_loop_1(void) {
+  int iterCount = inner_opaque_loop_1();
+
+  // The first call to `inner_opaque_loop_1()` splits into three execution paths
+  // that differ in the number of performed iterations (0, 1 or 2). The function
+  // `inner_opaque_loop_1` is added to the "do not inline this" list when the
+  // path that performed two iterations tries to enter the third iteration (and
+  // the "don't assume third iteration" logic prevents this) -- but the other
+  // two paths (which performed 0 and 1 iterations) would reach and inline the
+  // second `inner_opaque_loop_1()` before this would happen (because the
+  // default traversal is a complex heuristic that happens to prefer this). The
+  // following `if` will discard these "early exit" paths to highlight the
+  // difference between the default and disabled state:
+  if (iterCount < 2)
+    return 0;
+
+  return inner_opaque_loop_1();
+}
+
+//-----------------------------------------------------------------------------
+// Another less contrived testcase that demonstrates the difference between the
+// enabled (default) and disabled state of `legacy-inlining-prevention`.
+// Here the two calls to `inner_opaque_loop_2()` are in different entry points
+// so the first call is fully analyzed (and can put the function on the "do
+// not inline" list) before reaching the second call.
+// This test uses `clang_analyzer_dump` because (as explained in an earlier
+// comment block) `clang_analyzer_numTimesReached` is not suitable for counting
+// visits from separate entry points.
+
+int inner_opaque_loop_2(int arg) {
+  int i;
+  clang_analyzer_dump(arg); // default-warning {{2}}
+                            // disabled-warning@-1 {{1}} disabled-warning@-1 {{2}}
+  for (i = 0; i < getNum(); i++);
+  return i;
+}
+
+int outer_1_opaque_loop_2(void) {
+  return inner_opaque_loop_2(1);
+}
+int outer_2_opaque_loop(void) {
+  return inner_opaque_loop_2(2);
+}
diff --git a/clang/test/Analysis/loop-unrolling.cpp b/clang/test/Analysis/loop-unrolling.cpp
index bf05a7739ce48..ebae81e000c7a 100644
--- a/clang/test/Analysis/loop-unrolling.cpp
+++ b/clang/test/Analysis/loop-unrolling.cpp
@@ -1,5 +1,5 @@
-// RUN: %clang_analyze_cc1 -analyzer-checker=core,debug.ExprInspection -analyzer-config unroll-loops=true,cfg-loopexit=true -verify -std=c++14 -analyzer-config exploration_strategy=unexplored_first_queue %s
-// RUN: %clang_analyze_cc1 -analyzer-checker=core,debug.ExprInspection -analyzer-config unroll-loops=true,cfg-loopexit=true,exploration_strategy=dfs -verify -std=c++14 -DDFS=1 %s
+// RUN: %clang_analyze_cc1 -analyzer-checker=core,debug.ExprInspection -analyzer-config unroll-loops=true,cfg-loopexit=true -verify=expected,default -std=c++14 -analyzer-config exploration_strategy=unexplored_first_queue %s
+// RUN: %clang_analyze_cc1 -analyzer-checker=core,debug.ExprInspection -analyzer-config unroll-loops=true,cfg-loopexit=true,exploration_strategy=dfs -verify=expected,dfs -std=c++14 %s
 
 void clang_analyzer_numTimesReached();
 void clang_analyzer_warnIfReached();
@@ -337,6 +337,7 @@ int nested_both_unrolled() {
 }
 
 int simple_known_bound_loop() {
+  // Iteration count visible: can be unrolled and fully executed.
   for (int i = 2; i < 12; i++) {
     // This function is inlined in nested_inlined_unroll1()
     clang_analyzer_numTimesReached(); // expected-warning {{90}}
@@ -345,27 +346,42 @@ int simple_known_bound_loop() {
 }
 
 int simple_unknown_bound_loop() {
+  // Iteration count unknown: unrolling won't happen and the execution will be
+  // split two times:
+  // (1) split between skipped loop (immediate exit) and entering the loop
+  // (2) split between exit after 1 iteration and entering the second iteration
+  // After these there is no third state split because the "don't assume third
+  // iteration" logic in `ExprEngine::processBranch` prevents it; but the
+  // `legacy-inlining-prevention` logic will put this function onto the list of
+  // functions that may not be inlined in the future.
+  // The exploration strategy apparently influences the number of times this
+  // function can be inlined before it's placed on the "don't inline" list.
   for (int i = 2; i < getNum(); i++) {
-    clang_analyzer_numTimesReached(); // expected-warning {{8}}
+    clang_analyzer_numTimesReached(); // default-warning {{4}} dfs-warning {{8}}
   }
   return 0;
 }
 
 int nested_inlined_unroll1() {
+  // Here the analyzer can unroll and fully execute both the outer loop and the
+  // inner loop within simple_known_bound_loop().
   int k;
   for (int i = 0; i < 9; i++) {
     clang_analyzer_numTimesReached(); // expected-warning {{9}}
-    k = simple_known_bound_loop();    // no reevaluation without inlining
+    k = simple_known_bound_loop();
   }
   int a = 22 / k; // expected-warning {{Division by zero}}
   return 0;
 }
 
 int nested_inlined_no_unroll1() {
+  // Here no unrolling happens and we only run `analyzer-max-loop` (= 4)
+  // iterations of the loop within this function, but some state splits happen
+  // in `simple_unknown_bound_loop()` calls.
   int k;
-  for (int i = 0; i < 9; i++) {
-    clang_analyzer_numTimesReached(); // expected-warning {{10}}
-    k = simple_unknown_bound_loop();  // reevaluation without inlining, splits the state as well
+  for (int i = 0; i < 40; i++) {
+    clang_analyzer_numTimesReached(); // default-warning {{9}} dfs-warning {{12}}
+    k = simple_unknown_bound_loop(); 
   }
   int a = 22 / k; // no-warning
   return 0;

@llvmbot (Member) commented Apr 22, 2025

@llvm/pr-subscribers-clang-static-analyzer-1

(Same author, summary of changes, and full diff as in the previous @llvmbot comment.)

-// RUN: %clang_analyze_cc1 -analyzer-checker=core,debug.ExprInspection -analyzer-config unroll-loops=true,cfg-loopexit=true -verify -std=c++14 -analyzer-config exploration_strategy=unexplored_first_queue %s
-// RUN: %clang_analyze_cc1 -analyzer-checker=core,debug.ExprInspection -analyzer-config unroll-loops=true,cfg-loopexit=true,exploration_strategy=dfs -verify -std=c++14 -DDFS=1 %s
+// RUN: %clang_analyze_cc1 -analyzer-checker=core,debug.ExprInspection -analyzer-config unroll-loops=true,cfg-loopexit=true -verify=expected,default -std=c++14 -analyzer-config exploration_strategy=unexplored_first_queue %s
+// RUN: %clang_analyze_cc1 -analyzer-checker=core,debug.ExprInspection -analyzer-config unroll-loops=true,cfg-loopexit=true,exploration_strategy=dfs -verify=expected,dfs -std=c++14 %s

@NagyDonat (Contributor, Author) commented:
I'm leaning towards eliminating the alternative RUN: line that uses dfs because the only difference between dfs and the default exploration algorithm is a "control group" testcase where unroll-loops does not activate.

I'll probably do so in a separate follow-up commit.

Contributor:
As you wish.

@balazs-benics-sonarsource (Contributor) left a comment:

Thank you for investigating this. At Sonar, we have not yet started the upgrade to clang-20. I suppose you already did, and thus found this regression on trunk.

Maybe we should also reflect on the quality control of our submissions of core changes, but let's leave that until after the PR is discussed and we have put out the fire.

My impression is that we shouldn't have a new flag for this; we should unconditionally apply this for now. Did you think about this?

I left a few other comments about testing but overall I'm all for this change.
Thanks again!

Comment on lines 2878 to 2898
      } else if (AMgr.options.LegacyInliningPrevention) {
        // FIXME: There is an ancient and very arbitrary heuristic in
        // `ExprEngine::processCFGBlockEntrance` which prevents all further
        // inlining of a function if it finds an execution path within that
        // function which reaches the `MaxBlockVisitOnPath` limit (a/k/a
        // `analyzer-max-loop`, by default four iterations in a loop). Adding
        // this "don't assume third iteration" logic significantly increased
        // the analysis runtime on some inputs because fewer functions were
        // arbitrarily excluded from being inlined, so more entrypoints used
        // up their full allocated budget. As a hacky compensation for this,
        // here we apply the "should not inline" mark in cases when the loop
        // could potentially reach the `MaxBlockVisitOnPath` limit without the
        // "don't assume third iteration" logic. This slightly overcompensates
        // (activates if the third iteration can be entered, and will not
        // recognize cases where the fourth iteration wouldn't be completed), but
        // should be good enough for practical purposes.
        if (const LocationContext *LC = getInlinedLocationContext(Pred, G)) {
          Engine.FunctionSummaries->markShouldNotInline(
              LC->getStackFrame()->getDecl());
        }
      }
Contributor:
How about applying this chunk unconditionally? I don't think anyone would want to regress 6x voluntarily.

@NagyDonat (Contributor, Author):
I put this behind a flag (which restores the original behavior by default) because it is not just "wasted" runtime, but offers a tradeoff between runtime and the depth/accuracy of the analysis. There are several other options like max-nodes or the high-level mode setting that offer analogous tradeoffs. However, I will refine this judgement when I check the results produced with and without this patch -- and if the difference is negligible, I'm more open to applying this chunk unconditionally.

In the long term we will probably need a heuristic to "avoid inlining large functions", but I strongly suspect that this loop-based hacky condition is not the "right" (natural, straightforward, robust etc.) way of formalizing this idea. I feel that the current bounds of the analysis scope are almost "accidental" (or at least I don't know about any measurements that justify e.g. the default max-nodes budget value or the various "don't inline if ..." conditions) and I hope that at some point I can spend some time on understanding and improving this area.

Right now the outline of my plan roughly looks like:

  • (1) Publish this workaround ASAP to restore the "old normal" runtimes and minimize the disruption.
  • (2) Investigate the (runtime) costs and benefits of various inlining or exploration strategies, and publish alternative heuristics that express "large functions" in different, more natural terms (e.g. block count, total # of nodes created etc. instead of "did it reach four iterations in a loop").
  • (3) Adjust the default config: disable this legacy-inlining-prevention by default and instead enable other heuristics which on average guarantee similar total runtime, but perhaps provide better coverage / more useful results / better robustness, ...
    • Even then the users would be able to return to this legacy config if their project is an outlier where this particular heuristic was especially aggressive and they liked it that way.
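
For illustration only (the driver spelling below is just the usual way of forwarding analyzer configs, not something introduced by this patch, and the max-nodes value is an arbitrary example; the current default is 225000):

  clang --analyze -Xclang -analyzer-config -Xclang legacy-inlining-prevention=false file.c
  clang --analyze -Xclang -analyzer-config -Xclang max-nodes=150000 file.c

The first command opts out of this workaround heuristic; the second tunes one of the analogous budget knobs mentioned above.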

Contributor:
My point is, if Ericsson will ship this as true, and we are also going to ship this as true, then who would want to disable it? Do you expect users to disable this? Would anyone complain if we just landed this fix without an option for opting out?
Speaking of arbitrary cut heuristics: maybe they are arbitrary, but that should only make us more careful about changing them, because continuity itself is valuable - but I'm not opposed, just cautious.

Collaborator:
Also I am not sure we should have legacy in the name of something that is enabled by default.

@NagyDonat (Contributor, Author) commented Apr 24, 2025:

My point is, if Ericsson will ship this as true, and we are also going to ship this as true, then who would want to disable it? Do you expect users to disable this?

Good point.

Would anyone complain if we just landed this fix without an option for opting out?

I felt that the logic introduced by this commit is so ugly that it should not be an unconditional part of the analyzer behavior (especially if it gets git blamed to me...). By introducing this option I wanted to express that "this is not a stable, permanent part of the analyzer, please don't rely on its existence" -- and I felt that yet another "FIXME" comment is not enough for this hack.

However, I can accept removing this option (and making the hack unconditional) if you strongly prefer that.

Also I am not sure we should have legacy in the name of something that is enabled by default.

When I wrote this change, I hoped that this config option could be switched off and deprecated in the foreseeable future (in fact it was disabled by default in the first draft of the commit).

Also, this heuristic is so crazy that I couldn't give a descriptive option name (do-not-inline-function-if-it-would-assume-third-iteration-in-a-loop is too much even compared to the other option names), so I named it "legacy" to say that it's connected to some old awkward heuristic.

int i;
clang_analyzer_numTimesReached(); // expected-warning {{1}}
for (i = 0; i < 10; i++);
clang_analyzer_numTimesReached(); // no-warning
Contributor

Suggested change
clang_analyzer_numTimesReached(); // no-warning
clang_analyzer_numTimesReached(); // FIXME: It should be reachable

Contributor Author

@NagyDonat Apr 23, 2025

I would be careful with this FIXME -- if somebody follows it and fixes this issue (e.g. by enabling unroll-loops by default), we could get yet another large slowdown when fewer loops hit the analyzer-max-loop limit, fewer functions end up on the "don't inline" list, and more entry points spend their budget.

Obviously in an ideal world an empty loop like this shouldn't stop the analyzer – but we must replace this awkward loop-based inlining restriction with a more robust heuristic before we can do that. (I don't want to spread "older dumber analyzer would give up here, don't inline this function" hacks in the codebase as we improve the loop handling.)

Contributor

I didn't get the argument for why having a FIXME here would be a risk. It's the reviewer's and maintainer's responsibility to ensure that the changes align with the long-term directions of the project.
If someone just overrode analyzer-max-loop for this test "to fix the FIXME", that should be rejected.
However, if we found a way to enable it by default, or had a smarter engine, then it's fine - but that would also touch files beyond this single test file.

Contributor Author


IMO the situation is complex because:

  • from the POV of the users this "analysis is stopped by long loops" behavior is obviously a bug that needs to be fixed;
  • but from the POV of the inlining prevention heuristic (which is tested here) this is a known feature (or at least "property") of the analyzer engine which is important for the correct delineation of the scope.

You're right that these deserve a FIXME, but I'll probably write a longer FIXME block that explains the full situation instead of short inline FIXME notes.

Contributor

Sounds good.

Comment on lines 26 to 28
int inner_fixed_loop_1(void) {
int i;
clang_analyzer_numTimesReached(); // expected-warning {{1}}
Contributor

You could pass a const char* from a string literal to differentiate which call is inlined, observing its value with a clang_analyzer_dump() at the beginning of the inlined function. That should make it clear that the first time it's inlined, but the second time it's not.

Contributor Author

Ok, I can switch all testcases to using clang_analyzer_dump(arg) instead of numTimesReached to highlight which calls were inlined (and not just their count). There are already a few testcases where I use that solution (because I need it for multiple entry points), but you're right that it would be helpful in the other cases as well.
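
For concreteness, a minimal sketch of the test shape being discussed; the RUN line, the inner/outer names, and the exact dump text ("1 S32b") are assumptions based on the usual debug.ExprInspection conventions rather than the final test from this PR:

// RUN: %clang_analyze_cc1 -analyzer-checker=core,debug.ExprInspection -verify %s

void clang_analyzer_dump(int);

int inner(int arg) {
  // Fires only during the first, inlined call; the dumped value identifies
  // which call site was inlined.
  clang_analyzer_dump(arg); // expected-warning {{1 S32b}}
  // This loop reaches the -analyzer-max-loop limit, so the function is
  // placed on the "don't inline" list for later call sites.
  for (int i = 0; i < 10; i++)
    ;
  return arg;
}

void outer(void) {
  inner(1); // inlined at first, then retried without inlining after the
            // loop limit is hit; the dump above fires with the value 1
  inner(2); // not inlined at all (the function is already marked), so the
            // dump does not fire a second time
}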

Comment on lines 35 to 36
int x = inner_fixed_loop_1();
int y = inner_fixed_loop_1();
Contributor

We should have comments here saying that the first call is inlined and the second is not, because of the given heuristic.

int outer_fixed_loop_1(void) {
int x = inner_fixed_loop_1();
int y = inner_fixed_loop_1();
return 53 / (x - y); // no-warning
Contributor

My problem with these no-warnings in general in this PR is that they document what the test currently does, while they should document what the tests should/could expect.
In this case, in an ideal world we should actually get a diagnostic, so the desired outcome is not a no-warning.
Consequently, I think a FIXME would be more appropriate.

int i;
clang_analyzer_dump(arg); // expected-warning {{2}}
for (i = 0; i < 10; i++);
clang_analyzer_dump(arg); // no-warning
Contributor

FIXME.

@balazs-benics-sonarsource
Contributor

One other note: we should backport this fix to clang-20 once it lands on main.

Note that we need to expect e.g. {{2 S32}} because the pattern {{2}}
would match any other 32-bit integer as well.

The last two testcases in the file were testing the same thing in
different ways, so this commit only keeps the one that used
`clang_analyzer_dump` (which was previously an exception).
@NagyDonat
Contributor Author

NagyDonat commented Apr 24, 2025

I tested the impact of this commit on various open source projects and the changes in the result counts are surprisingly small:

Project New Reports Resolved Reports
memcached 0 new reports 0 resolved reports
tmux 2 new reports 6 resolved reports
curl 0 new reports 0 resolved reports
twin 3 new reports 2 resolved reports
vim 11 new reports 18 resolved reports
openssl 0 new reports 1 resolved reports
sqlite 28 new reports 8 resolved reports
ffmpeg 22 new reports 23 resolved reports
postgres 3 new reports 20 resolved reports
tinyxml2 0 new reports 0 resolved reports
libwebm 1 new reports 0 resolved reports
xerces 3 new reports 5 resolved reports
bitcoin 0 new reports 0 resolved reports
protobuf 2 new reports 3 resolved reports
qtbase 27 new reports 23 resolved reports
openrct2 0 new reports 7 resolved reports

Unfortunately there was some trouble with our publicly available "demo" CodeChecker server (where we usually present the analysis results), so I had to store these results on an internal server and they are not available externally. (If somebody is interested, I can give priority to finding a way to publish the full report set.)

@balazs-benics-sonarsource
Contributor

Do you have data about the analysis times per file and per analysis entry point?
How do they compare against the current llvm main, and would this workaround restore the original running times from before bb27d5e?

"Note that functions that _definitely_ reach the loop limit on some "
"execution path are currently marked as \"do not inline\" even if this "
"option is disabled (but this may change in future versions). This option "
"is a dumb and arbitrary restriction on inlining, but disabling it would "
Collaborator

Nit: I'd remove "dumb" from the user facing string.

Contributor Author

@NagyDonat Apr 24, 2025

Good point, I was annoyed by the situation when I wrote this description, but I was already planning to rewrite these before finalizing the commit.

EDIT: done.

Engine.FunctionSummaries->markReachedMaxBlockCount(CalleeSF->getDecl());
if (const LocationContext *LC = getInlinedLocationContext(Pred, G)) {
// FIXME: This will unconditionally prevent inlining this function (even
// from other entrypoints), which is not a reasonable heuristic: even if
Collaborator

Nit: I think entry points are two distinct words.

Contributor Author

@NagyDonat Apr 24, 2025

Thanks for spotting this! For some reason I tend to automatically write "entrypoint"; before uploading the PR I fixed it in many locations, but apparently I failed to catch this one...

EDIT: fixed this and two other occurrences.

Comment on lines 2878 to 2898
} else if (AMgr.options.LegacyInliningPrevention) {
// FIXME: There is an ancient and very arbitrary heuristic in
// `ExprEngine::processCFGBlockEntrance` which prevents all further
// inlining of a function if it finds an execution path within that
// function which reaches the `MaxBlockVisitOnPath` limit (a/k/a
// `analyzer-max-loop`, by default four iterations in a loop). Adding
// this "don't assume third iteration" logic significantly increased
// the analysis runtime on some inputs because fewer functions were
// arbitrarily excluded from being inlined, so more entry points used
// up their full allocated budget. As a hacky compensation for this,
// here we apply the "should not inline" mark in cases when the loop
// could potentially reach the `MaxBlockVisitOnPath` limit without the
// "don't assume third iteration" logic. This slightly overcompensates
// (activates if the third iteration can be entered, and will not
// recognize cases where the fourth iteration wouldn't be completed), but
// should be good enough for practical purposes.
if (const LocationContext *LC = getInlinedLocationContext(Pred, G)) {
Engine.FunctionSummaries->markShouldNotInline(
LC->getStackFrame()->getDecl());
}
}
Collaborator

Also I am not sure we should have legacy in the name of something that is enabled by default.

@NagyDonat
Contributor Author

NagyDonat commented Apr 24, 2025

@balazs-benics-sonarsource Unfortunately our current CI (which runs the static analyzer via CodeChecker) cannot provide file-level or entrypoint-level timing information, but I do have runtime measurements of the full analysis (on our Jenkins node with 16 parallel threads):

Project legacy-inlining-prevention=false legacy-inlining-prevention=true speedup
memcached 22.92 22.97 0%
tmux 170.30 80.04 53%
curl 133.18 123.04 8%
twin 48.47 45.71 6%
vim 168.95 155.50 8%
openssl 272.56 254.20 7%
sqlite 1046.26 842.43 19%
ffmpeg 1210.35 1178.02 3%
postgres 391.50 323.61 17%
tinyxml2 44.60 29.61 34%
libwebm 127.05 65.92 48%
xerces 230.21 172.86 25%
bitcoin 1303.82 935.25 28%
protobuf 2180.02 1950.36 11%
qtbase 3014.14 2655.59 12%
openrct2 835.05 763.86 9%

Both measurements were performed on the commit under review, with different configs: legacy-inlining-prevention=false (which is practically equivalent to the recent main revision a84a6f7) and legacy-inlining-prevention=true (the default in this patch). The second and third columns contain the time in seconds, while the "speedup" column measures the effect of enabling this commit (e.g. for tmux, 1 - 80.04/170.30 ≈ 53%).

If needed, I can measure timing differences compared to other commits as well. I think the most natural comparison would be between

  • the commit under review with its default legacy-inlining-prevention=true mode

and

  • reverting bb27d5e ("Don't assume third iteration") atop a84a6f7 (the parent of the commit under review)

because that way the comparison wouldn't be influenced by other changes that were merged in the last few months. EDIT: I started a CI job to compare these two revisions; I'll post the results tomorrow.

@NagyDonat
Contributor Author

NagyDonat commented Apr 24, 2025

I pushed two commits (including d57946b which is hidden among the earlier review comments based on its creation time), and these in theory address most review comments, with the exception of two areas:

  • Writing a detailed FIXME note about the "analysis is interrupted if it reaches four iterations" question.
  • Changing this hack from "enabled by default config" to "unconditionally enabled" -- or just renaming the config.

I'll try to address these tomorrow.

@NagyDonat
Contributor Author

NagyDonat commented Apr 25, 2025

Execution time comparisons between the commit under review and just reverting bb27d5e ("Don't assume third iteration") atop a84a6f7 (the parent of the commit under review):

Project "just revert" "commit under review" Ratio
memcached 27.50 25.68 93.4%
tmux 79.49 73.16 92.0%
curl 133.26 117.82 88.4%
twin 54.28 48.40 89.2%
vim 221.86 165.80 74.7%
openssl 304.98 256.52 84.1%
sqlite 1039.42 870.52 83.8%
ffmpeg 1272.77 1190.92 93.6%
postgres 402.34 321.95 80.0%
tinyxml2 49.82 28.65 57.5%
libwebm 83.14 56.02 67.4%
xerces 202.53 175.92 86.9%
bitcoin 162.05 155.26 95.8%
protobuf 2367.55 1969.73 83.2%
qtbase 3022.01 2729.55 90.3%
openrct2 889.14 782.47 88.0%

🤔 The version with the commit under review is surprisingly fast and I don't exactly know why. My most plausible theory is that bb27d5e ("Don't assume third iteration") has two effects on the analysis runtime:

  • it inherently speeds up the analysis of loops (because some iterations are skipped);
  • it slows down the analysis because it affects the inlining heuristic and prevents functions from being placed on the inlining blacklist.

This would explain why just reverting that commit (which undoes both effects) produces slower analysis than applying the commit under review (which undoes the slowdown and keeps the speedup).

@balazs-benics-sonarsource
Contributor

🤔 The version with the commit under review is surprisingly fast and I don't exactly know why. My most plausible theory is that bb27d5e ("Don't assume third iteration") has two effects on the analysis runtime:

  • it inherently speeds up the analysis of loops (because some iterations are skipped);
  • it slows down the analysis because it affects the inlining heuristic and prevents functions from being placed on the inlining blacklist.

This would explain why just reverting that commit (which undoes both effects) produces slower analysis than applying the commit under review (which undoes the slowdown and keeps the speedup).

This makes sense, and it's what I was expecting even back in the day. But I was shocked that sometimes intuition fails, and we didn't check the RT for that patch. Now that you did the work, it leaves me in a much more relaxed situation. Thanks!

@NagyDonat
Contributor Author

But I was shocked that sometimes intuition fails, and we didn't check the RT for that patch.

This is a legacy project with many skeletons hidden in various closets... The main moral of this story is probably that we should be more paranoid about measuring runtime (I'm thinking about whether we can set up some regular or semi-regular runtime measurements in our CI), but this also shows why we should work on reducing the technical debt.

@NagyDonat
Contributor Author

NagyDonat commented Apr 25, 2025

With the two recent commits I think I addressed all review comments except for the "on-by-default config option" vs "unconditionally active logic" question.

In this area (which was discussed in inline comments) I'm still leaning towards the currently implemented legacy-inlining-prevention config option (which is enabled by default), because this would provide very minor advantages in several areas:

  • The average slowdown caused by disabling this workaround seems to be around 10-15% on larger projects, which is painful, but not completely prohibitive (and I guess that it could be offset if the user e.g. reduces max-nodes by 20%).
  • If somebody happens to run into false negatives (or invalidation-based false positives) which are caused by this inlining prevention, we can suggest legacy-inlining-prevention=false as a workaround.
  • I hope that I will be able to develop other heuristics that prevent the inlining of complex functions based on more "natural" and robust conditions -- when those are available, some users may want to disable this and enable those instead. (Or, if they are proven to be better and we change the defaults, this option should remain as a disabled-by-default fallback for those who might want to return to the earlier behavior.)
  • During the development and testing of those alternative heuristics it would be nice to have this config option to enable/disable this workaround.
  • Last, but perhaps not least, having a named config option will make it easier to speak about this code fragment (instead of "that hack where we disable inlining of a function if we would assume entering the third iteration in a loop").

If we keep the config option, we can also think about renaming it. I agree that legacy is a bit unfortunate in the name of an option that is on by default (at least for now), but I don't have a better idea. (Feel free to suggest names if you have any!)

@balazs-benics-sonarsource @Xazax-hun WDYT about this area? Do you have any other question/remark?

@balazs-benics-sonarsource
Contributor

I can see your points. I think they indeed move the needle slightly towards having this (or a similar) flag, but barely.
I'd need to think about a better flag name, but right now I'm too busy for that.

@NagyDonat
Contributor Author

I did yet another measurement where I measured the runtime of bb27d5e ("Don't assume third iteration") compared to its parent commit (i.e. the measurement that we should've done before merging it) and it turns out that on average it does not increase the runtime and tmux (where Endre observed a significant slowdown) is an extreme outlier:

Project without bb27d5e with bb27d5e Ratio
memcached 23.25 22.26 95.7%
tmux 67.47 146.70 217.4%
curl 116.58 111.74 95.8%
twin 42.81 40.33 94.2%
vim 164.54 154.02 93.6%
openssl 274.13 258.72 94.4%
sqlite 902.19 943.20 104.5%
ffmpeg 1161.76 1130.22 97.3%
postgres 367.47 335.01 91.2%
tinyxml2 36.11 45.79 126.8%
libwebm 75.85 64.33 84.8%
xerces 191.27 212.58 111.1%
bitcoin 159.05 159.09 100.0%
protobuf 2249.66 2108.13 93.7%
qtbase 2796.47 2812.94 100.6%
openrct2 946.48 880.49 93.0%
sum of all projects 9575.09 9425.55 98.4%

By the way, Endre Fülöp picked tmux arbitrarily as a "random small open source project that is easy to build" when he tried to reproduce the slowdown reported by the user, and as it immediately reproduced a significant slowdown, we assumed that this is a universal phenomenon -- but it turns out that this was just a coincidence.

These results show that most users won't experience a brutal slowdown (especially if they have a large varied codebase where the law of large numbers guarantees average results), which reduces the urgency of this PR a bit. (And I will rewrite the commit message, clarifying that the runtime increases are mostly concentrated on individual TUs and aren't too large on average.)

@NagyDonat NagyDonat changed the title [analyzer] Workaround for unintended slowdown (scope increase) [analyzer] Workaround for slowdown spikes (unintended scope increase) Apr 28, 2025
@NagyDonat
Contributor Author

@balazs-benics-sonarsource Gentle ping

Contributor

@balazs-benics-sonarsource left a comment

Looks good. There were two points unaddressed:

  • Finding a name for the flag without the legacy- prefix
  • Find out if we can ever have multiple root nodes in an exploded graph.

@NagyDonat
Contributor Author

NagyDonat commented May 6, 2025

I discussed the name with Endre (@gamesh411) in person and our best idea is inline-functions-with-ambiguous-loops which would be the opposite of the currently implemented legacy-inlining-prevention (i.e. the default would be inline-functions-with-ambiguous-loops=false which is equivalent to legacy-inlining-prevention=true).

(Alternatively, inline-functions-containing-ambiguous-loops or inline-functions-containing-loop-with-ambiguous-condition would be more accurate but even longer variants of this name.)
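
For illustration, assuming the option lands under the proposed name and polarity, it would be toggled through the usual -analyzer-config mechanism; the RUN line below is a hypothetical sketch, not a line taken from this PR:

// Hypothetical lit RUN line opting back into inlining such functions:
// RUN: %clang_analyze_cc1 -analyzer-checker=core,debug.ExprInspection \
// RUN:   -analyzer-config inline-functions-with-ambiguous-loops=true \
// RUN:   -verify %s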

@balazs-benics-sonarsource @ anybody else
What do you think about this name? Would this be acceptable for you?

@balazs-benics-sonarsource
Contributor

balazs-benics-sonarsource commented May 6, 2025

@Xazax-hun WDYT of the proposed alternative flag name? (as you challenged the currently proposed name)

@Xazax-hun
Collaborator

inline-functions-with-ambiguous-loops sounds good to me. I don't think it makes sense to have legacy in the name of an on-by-default setting.

@balazs-benics-sonarsource
Contributor

LGTM, given we use the new flag name.

Contributor

@balazs-benics-sonarsource left a comment

Thank you!

@NagyDonat NagyDonat merged commit 9600a12 into llvm:main May 12, 2025
11 checks passed
@balazs-benics-sonarsource
Contributor

Thanks for the fix, I'll proceed with the backport if you still believe it's worth it. @NagyDonat

@NagyDonat
Contributor Author

This is not an automatic "every project is affected" slowdown, but on smaller outlier projects like tmux the runtime may double, so I'd guess that it's reasonable to backport this commit (although I'm not familiar with the exact conditions under which a commit should be backported). Thanks for doing the backport (if you decide to do it).

steakhal pushed a commit to steakhal/llvm-project that referenced this pull request May 12, 2025
@steakhal
Contributor

Backport requested in #139597

swift-ci pushed a commit to swiftlang/llvm-project that referenced this pull request May 13, 2025
Labels
clang:static analyzer, clang (Clang issues not falling into any other category)