[analyzer] Workaround for slowdown spikes (unintended scope increase) #136720
Conversation
Recently some users reported that the runtime of the static analyzer drastically increased on their codebase when they upgraded to a more recent (slightly patched, internal) clang version: some of their translation units got more than +600% runtime (yes, with two zeroes) and IIUC the average increase was also high. Bisection revealed that the bulk of the increase was caused by my earlier commit bb27d5e ("Don't assume third iteration in loops") and I verified that the slowdown affects the downstream clang. (I used tmux as an arbitrary easy-to-analyze open source project and measured that reverting this commit reduces the total runtime by more than 50% compared to a recent main revision.)

Further profiling and investigation proved that this increase is caused by an _increase of analysis scope_: there was an arbitrary heuristic that placed functions on a "don't inline this" blacklist if they reached the `-analyzer-max-loop` limit (anywhere, on any one execution path) -- which became significantly rarer when my commit ensured that the analyzer no longer "just assumes" four iterations. (With more inlining, significantly more entry points use up their allocated budgets, which leads to the increased runtime.)

I feel that this heuristic for the "don't inline" blacklist is unjustified, because reaching the "retry without inlining" limit on one path does not imply that inlining the function won't be valuable on other paths -- so I hope that we can eventually replace it with more "natural" limits of the analysis scope. However, the brutal jump of analysis runtime significantly degrades the user experience, so I created this quick workaround commit that approximates the "don't inline" blacklist effects of opaque loops (where the analyzer doesn't understand the loop condition) without fully reverting the "Don't assume third iteration" commit (to avoid reintroducing the false positives that were eliminated by it).
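To make the mechanism concrete, here is a minimal hypothetical example of the kind of function affected by this heuristic (the names `getNum`, `callee` and `caller` are made up for illustration; this is not taken from any affected codebase):

int getNum(void); // opaque: the analyzer cannot see the loop bound

int callee(void) {
  int sum = 0;
  // Opaque loop: before bb27d5e the analyzer "just assumed" up to
  // `-analyzer-max-loop` (4 by default) iterations, reached that limit on one
  // path, and put `callee` on the "don't inline this" blacklist. After
  // bb27d5e the analyzer no longer assumes a third iteration, so the limit is
  // not reached and `callee` stays inlinable -- which means callers keep
  // inlining it and each entry point spends much more of its `max-nodes`
  // budget.
  for (int i = 0; i < getNum(); i++)
    sum += i;
  return sum;
}

int caller(void) { return callee(); }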
@llvm/pr-subscribers-clang

Author: Donát Nagy (NagyDonat)

Changes

Recently some users reported that the runtime of the static analyzer drastically increased on their codebase when they upgraded to a more recent (slightly patched, internal) clang version: some of their translation units got more than +600% runtime (yes, with two zeroes) and IIUC the average increase was also high. Bisection revealed that the bulk of the increase was caused by my earlier commit bb27d5e ("Don't assume third iteration in loops") and I verified that the slowdown affects the downstream clang. (I used tmux as an arbitrary easy-to-analyze open source project and measured that reverting this commit reduces the total runtime by more than 50% compared to a recent main revision.) Further profiling and investigation proved that this increase is caused by an increase of analysis scope because there was an arbitrary heuristic that placed functions on a "don't inline this" blacklist if they reached the `-analyzer-max-loop` limit (anywhere, on any one execution path) -- which became significantly rarer when my commit ensured the analyzer no longer "just assumes" four iterations. (With more inlining, significantly more entry points use up their allocated budgets, which leads to the increased runtime.)

I feel that this heuristic for the "don't inline" blacklist is unjustified, because reaching the "retry without inlining" limit on one path does not imply that inlining the function won't be valuable on other paths -- so I hope that we can eventually replace it with more "natural" limits of the analysis scope. However, the brutal jump of analysis runtime significantly degrades the user experience, so I created this quick workaround commit that approximates the "don't inline" blacklist effects of opaque loops (where the analyzer doesn't understand the loop condition) without fully reverting the "Don't assume third iteration" commit (to avoid reintroducing the false positives that were eliminated by it).

Investigating this issue was a team effort: I'm grateful to Endre Fülöp (gamesh411) who did the bisection and shared his time measurement setup, and Gábor Tóthvári who helped me in profiling.

My preliminary measurements confirm that this commit does improve the runtime: I checked the analysis of
Full diff: https://github.com/llvm/llvm-project/pull/136720.diff 6 Files Affected:
diff --git a/clang/include/clang/StaticAnalyzer/Core/AnalyzerOptions.def b/clang/include/clang/StaticAnalyzer/Core/AnalyzerOptions.def
index f9f22a9ced650..0ac85b14147e0 100644
--- a/clang/include/clang/StaticAnalyzer/Core/AnalyzerOptions.def
+++ b/clang/include/clang/StaticAnalyzer/Core/AnalyzerOptions.def
@@ -414,6 +414,20 @@ ANALYZER_OPTION(
"any target.",
true)
+ANALYZER_OPTION(
+ bool, LegacyInliningPrevention, "legacy-inlining-prevention",
+ "If enabled, the analyzer puts functions on a \"do not inline this\" list "
+ "if it finds an execution path within that function that may potentially "
+ "perform 'analyzer-max-loop' (= 4 by default) iterations in a loop. "
+ "Note that functions that _definitely_ reach the loop limit on some "
+ "execution path are currently marked as \"do not inline\" even if this "
+ "option is disabled (but this may change in future versions). This option "
+ "is a dumb and arbitrary restriction on inlining, but disabling it would "
+ "significantly increase the analysis workload (and the time taken) "
+ "compared to older clang versions because more top-level functions can "
+ "use up their 'max-nodes' limit if inlining is not restricted.",
+ true)
+
//===----------------------------------------------------------------------===//
// Unsigned analyzer options.
//===----------------------------------------------------------------------===//
diff --git a/clang/include/clang/StaticAnalyzer/Core/PathSensitive/FunctionSummary.h b/clang/include/clang/StaticAnalyzer/Core/PathSensitive/FunctionSummary.h
index 3ee0d229cfc29..761395260a0cf 100644
--- a/clang/include/clang/StaticAnalyzer/Core/PathSensitive/FunctionSummary.h
+++ b/clang/include/clang/StaticAnalyzer/Core/PathSensitive/FunctionSummary.h
@@ -81,10 +81,6 @@ class FunctionSummariesTy {
I->second.MayInline = 0;
}
- void markReachedMaxBlockCount(const Decl *D) {
- markShouldNotInline(D);
- }
-
std::optional<bool> mayInline(const Decl *D) {
MapTy::const_iterator I = Map.find(D);
if (I != Map.end() && I->second.InlineChecked)
diff --git a/clang/lib/StaticAnalyzer/Core/ExprEngine.cpp b/clang/lib/StaticAnalyzer/Core/ExprEngine.cpp
index 86e2e8f634bfd..1ff36eb83dbe7 100644
--- a/clang/lib/StaticAnalyzer/Core/ExprEngine.cpp
+++ b/clang/lib/StaticAnalyzer/Core/ExprEngine.cpp
@@ -2523,6 +2523,20 @@ bool ExprEngine::replayWithoutInlining(ExplodedNode *N,
return true;
}
+/// Return the innermost location context which is inlined at `Node`, unless
+/// it's the top-level (entry point) location context.
+static const LocationContext *getInlinedLocationContext(ExplodedNode *Node,
+ ExplodedGraph &G) {
+ const LocationContext *CalleeLC = Node->getLocation().getLocationContext();
+ const LocationContext *RootLC =
+ (*G.roots_begin())->getLocation().getLocationContext();
+
+ if (CalleeLC->getStackFrame() == RootLC->getStackFrame())
+ return nullptr;
+
+ return CalleeLC;
+}
+
/// Block entrance. (Update counters).
void ExprEngine::processCFGBlockEntrance(const BlockEdge &L,
NodeBuilderWithSinks &nodeBuilder,
@@ -2570,21 +2584,24 @@ void ExprEngine::processCFGBlockEntrance(const BlockEdge &L,
const ExplodedNode *Sink =
nodeBuilder.generateSink(Pred->getState(), Pred, &tag);
- // Check if we stopped at the top level function or not.
- // Root node should have the location context of the top most function.
- const LocationContext *CalleeLC = Pred->getLocation().getLocationContext();
- const LocationContext *CalleeSF = CalleeLC->getStackFrame();
- const LocationContext *RootLC =
- (*G.roots_begin())->getLocation().getLocationContext();
- if (RootLC->getStackFrame() != CalleeSF) {
- Engine.FunctionSummaries->markReachedMaxBlockCount(CalleeSF->getDecl());
+ if (const LocationContext *LC = getInlinedLocationContext(Pred, G)) {
+ // FIXME: This will unconditionally prevent inlining this function (even
+ // from other entrypoints), which is not a reasonable heuristic: even if
+ // we reached max block count on this particular execution path, there
+ // may be other execution paths (especially with other parametrizations)
+ // where the analyzer can reach the end of the function (so there is no
+ // natural reason to avoid inlining it). However, disabling this would
+ // significantly increase the analysis time (because more entrypoints
+ // would exhaust their allocated budget), so it must be compensated by a
+ // different (more reasonable) reduction of analysis scope.
+ Engine.FunctionSummaries->markShouldNotInline(
+ LC->getStackFrame()->getDecl());
// Re-run the call evaluation without inlining it, by storing the
// no-inlining policy in the state and enqueuing the new work item on
// the list. Replay should almost never fail. Use the stats to catch it
// if it does.
- if ((!AMgr.options.NoRetryExhausted &&
- replayWithoutInlining(Pred, CalleeLC)))
+ if ((!AMgr.options.NoRetryExhausted && replayWithoutInlining(Pred, LC)))
return;
NumMaxBlockCountReachedInInlined++;
} else
@@ -2856,8 +2873,29 @@ void ExprEngine::processBranch(
// conflicts with the widen-loop analysis option (which is off by
// default). If we intend to support and stabilize the loop widening,
// we must ensure that it 'plays nicely' with this logic.
- if (!SkipTrueBranch || AMgr.options.ShouldWidenLoops)
+ if (!SkipTrueBranch || AMgr.options.ShouldWidenLoops) {
Builder.generateNode(StTrue, true, PredN);
+ } else if (AMgr.options.LegacyInliningPrevention) {
+ // FIXME: There is an ancient and very arbitrary heuristic in
+ // `ExprEngine::processCFGBlockEntrance` which prevents all further
+ // inlining of a function if it finds an execution path within that
+ // function which reaches the `MaxBlockVisitOnPath` limit (a/k/a
+ // `analyzer-max-loop`, by default four iterations in a loop). Adding
+ // this "don't assume third iteration" logic significantly increased
+ // the analysis runtime on some inputs because less functions were
+ // arbitrarily excluded from being inlined, so more entrypoints used
+ // up their full allocated budget. As a hacky compensation for this,
+ // here we apply the "should not inline" mark in cases when the loop
+ // could potentially reach the `MaxBlockVisitOnPath` limit without the
+ // "don't assume third iteration" logic. This slightly overcompensates
+ // (activates if the third iteration can be entered, and will not
+ // recognize cases where the fourth iteration wouldn't be completed), but
+ // should be good enough for practical purposes.
+ if (const LocationContext *LC = getInlinedLocationContext(Pred, G)) {
+ Engine.FunctionSummaries->markShouldNotInline(
+ LC->getStackFrame()->getDecl());
+ }
+ }
}
if (StFalse) {
diff --git a/clang/test/Analysis/analyzer-config.c b/clang/test/Analysis/analyzer-config.c
index 80cad54b039f4..42c622f2cd51b 100644
--- a/clang/test/Analysis/analyzer-config.c
+++ b/clang/test/Analysis/analyzer-config.c
@@ -92,6 +92,7 @@
// CHECK-NEXT: inline-lambdas = true
// CHECK-NEXT: ipa = dynamic-bifurcate
// CHECK-NEXT: ipa-always-inline-size = 3
+// CHECK-NEXT: legacy-inlining-prevention = true
// CHECK-NEXT: max-inlinable-size = 100
// CHECK-NEXT: max-nodes = 225000
// CHECK-NEXT: max-symbol-complexity = 35
diff --git a/clang/test/Analysis/legacy-inlining-prevention.c b/clang/test/Analysis/legacy-inlining-prevention.c
new file mode 100644
index 0000000000000..976e2ba6ddb5d
--- /dev/null
+++ b/clang/test/Analysis/legacy-inlining-prevention.c
@@ -0,0 +1,198 @@
+// RUN: %clang_analyze_cc1 -analyzer-checker=core,debug.ExprInspection -verify=expected,default %s
+// RUN: %clang_analyze_cc1 -analyzer-checker=core,debug.ExprInspection -analyzer-config legacy-inlining-prevention=false -verify=expected,disabled %s
+
+int getNum(void); // Get an opaque number.
+
+void clang_analyzer_numTimesReached(void);
+void clang_analyzer_dump(int arg);
+
+//-----------------------------------------------------------------------------
+// Simple case: inlined function never reaches `analyzer-max-loop`.
+
+int inner_simple(void) {
+ clang_analyzer_numTimesReached(); // expected-warning {{2}}
+ return 42;
+}
+
+int outer_simple(void) {
+ int x = inner_simple();
+ int y = inner_simple();
+ return 53 / (x - y); // expected-warning {{Division by zero}}
+}
+
+//-----------------------------------------------------------------------------
+// Inlined function always reaches `analyzer-max-loop`.
+
+int inner_fixed_loop_1(void) {
+ int i;
+ clang_analyzer_numTimesReached(); // expected-warning {{1}}
+ for (i = 0; i < 10; i++);
+ clang_analyzer_numTimesReached(); // no-warning
+ return 42;
+}
+
+int outer_fixed_loop_1(void) {
+ int x = inner_fixed_loop_1();
+ int y = inner_fixed_loop_1();
+ return 53 / (x - y); // no-warning
+}
+
+//-----------------------------------------------------------------------------
+// Inlined function always reaches `analyzer-max-loop`; inlining is prevented
+// even for different entry points.
+// This test uses `clang_analyzer_dump` and distinct `arg` values because
+// `clang_analyzer_numTimesReached` only counts the paths reaching that node
+// during the analysis of one particular entry point, so it cannot distinguish
+// "two entry points reached this, both with one path" (where the two reports
+// are unified as duplicates by the generic report postprocessing) and "one
+// entry point reached this with one path" (where naturally nothing shows that
+// the second entry point _tried_ to reach it).
+
+int inner_fixed_loop_2(int arg) {
+ // Identical copy of inner_fixed_loop_1
+ int i;
+ clang_analyzer_dump(arg); // expected-warning {{2}}
+ for (i = 0; i < 10; i++);
+ clang_analyzer_dump(arg); // no-warning
+ return 42;
+}
+
+int outer_1_fixed_loop_2(void) {
+ return inner_fixed_loop_2(1);
+}
+
+int outer_2_fixed_loop_2(void) {
+ return inner_fixed_loop_2(2);
+}
+
+//-----------------------------------------------------------------------------
+// Inlined function reaches `analyzer-max-loop` only in its second call. The
+// function is inlined twice but the second call doesn't finish and ends up
+// being conservatively evaluated.
+
+int inner_parametrized_loop_1(int count) {
+ int i;
+ clang_analyzer_numTimesReached(); // expected-warning {{2}}
+ for (i = 0; i < count; i++);
+ clang_analyzer_numTimesReached(); // expected-warning {{1}}
+ return 42;
+}
+
+int outer_parametrized_loop_1(void) {
+ int x = inner_parametrized_loop_1(2);
+ int y = inner_parametrized_loop_1(10);
+ return 53 / (x - y); // no-warning
+}
+
+//-----------------------------------------------------------------------------
+// Inlined function reaches `analyzer-max-loop` on its first call, so the
+// second call isn't inlined (although it could be fully evaluated).
+
+int inner_parametrized_loop_2(int count) {
+ int i;
+ clang_analyzer_numTimesReached(); // expected-warning {{1}}
+ for (i = 0; i < count; i++);
+ clang_analyzer_numTimesReached(); // no-warning
+ return 42;
+}
+
+int outer_parametrized_loop_2(void) {
+ int y = inner_parametrized_loop_2(10);
+ int x = inner_parametrized_loop_2(2);
+ return 53 / (x - y); // no-warning
+}
+
+//-----------------------------------------------------------------------------
+// Inlined function may or may not reach `analyzer-max-loop` depending on an
+// opaque check before the loop. This is very similar to the "fixed loop"
+// cases: the function is placed on the "don't inline" list when any execution
+// path reaches `analyzer-max-loop` (even if other execution paths reach the
+// end of the function).
+
+int inner_conditional_loop(void) {
+ int i;
+ clang_analyzer_numTimesReached(); // expected-warning {{1}}
+ if (getNum() == 777) {
+ for (i = 0; i < 10; i++);
+ }
+ clang_analyzer_numTimesReached(); // expected-warning {{1}}
+ return 42;
+}
+
+int outer_1_conditional_loop(void) {
+ return inner_conditional_loop();
+}
+
+int outer_2_conditional_loop(void) {
+ return inner_conditional_loop();
+}
+
+//-----------------------------------------------------------------------------
+// Inlined function executes an opaque loop that may or may not reach
+// `analyzer-max-loop`. Historically, before the "don't assume third iteration"
+// commit (bb27d5e5c6b194a1440b8ac4e5ace68d0ee2a849) this worked like the
+// `conditional_loop` cases: the analyzer was able to find a path reaching
+// `analyzer-max-loop` so inlining was disabled. After that commit the analyzer
+// does not _assume_ a third (or later) iteration (i.e. does not enter those
+// iterations if the loop condition is an unknown value), so e.g. this test
+// function does not reach `analyzer-max-loop` iterations and the inlining is
+// not disabled.
+// Unfortunately this change significantly increased the workload and
+// runtime of the analyzer (more entry points used up their budget), so the
+// option `legacy-inlining-prevention` was introduced and enabled by default to
+// suppress the inlining in situations where the "don't assume third iteration"
+// logic activates.
+// This testcase demonstrates that the inlining is prevented with the default
+// `legacy-inlining-prevention=true` config, but is not prevented when this
+// option is disabled (set to false).
+
+int inner_opaque_loop_1(void) {
+ int i;
+ clang_analyzer_numTimesReached(); // default-warning {{1}} disabled-warning {{2}}
+ for (i = 0; i < getNum(); i++);
+ return i;
+}
+
+int outer_opaque_loop_1(void) {
+ int iterCount = inner_opaque_loop_1();
+
+ // The first call to `inner_opaque_loop_1()` splits three execution paths that
+ // differ in the number of performed iterations (0, 1 or 2). The function
+ // `inner_opaque_loop_1` is added to the "do not inline this" list when the
+ // path that performed two iterations tries to enter the third iteration (and
+ // the "don't assume third iteration" logic prevents this) -- but the other
+ // two paths (which performed 0 and 1 iterations) would reach and inline the
+ // second `inner_opaque_loop_1()` before this would happen (because the
+ // default traversal is a complex heuristic that happens to prefer this). The
+ // following `if` will discard these "early exit" paths to highlight the
+ // difference between the default and disabled state:
+ if (iterCount < 2)
+ return 0;
+
+ return inner_opaque_loop_1();
+}
+
+//-----------------------------------------------------------------------------
+// Another less contrived testcase that demonstrates the difference between the
+// enabled (default) and disabled state of `legacy-inlining-prevention`.
+// Here the two calls to `inner_opaque_loop_2()` are in different entry points
+// so the first call is fully analyzed (and can put the function on the "do
+// not inline" list) before reaching the second call.
+// This test uses `clang_analyzer_dump` because (as explained in an earlier
+// comment block) `clang_analyzer_numTimesReached` is not suitable for counting
+// visits from separate entry points.
+
+int inner_opaque_loop_2(int arg) {
+ int i;
+ clang_analyzer_dump(arg); // default-warning {{2}}
+ // disabled-warning@-1 {{1}} disabled-warning@-1 {{2}}
+ for (i = 0; i < getNum(); i++);
+ return i;
+}
+
+int outer_1_opaque_loop_2(void) {
+ return inner_opaque_loop_2(1);
+}
+int outer_2_opaque_loop(void) {
+ return inner_opaque_loop_2(2);
+}
diff --git a/clang/test/Analysis/loop-unrolling.cpp b/clang/test/Analysis/loop-unrolling.cpp
index bf05a7739ce48..ebae81e000c7a 100644
--- a/clang/test/Analysis/loop-unrolling.cpp
+++ b/clang/test/Analysis/loop-unrolling.cpp
@@ -1,5 +1,5 @@
-// RUN: %clang_analyze_cc1 -analyzer-checker=core,debug.ExprInspection -analyzer-config unroll-loops=true,cfg-loopexit=true -verify -std=c++14 -analyzer-config exploration_strategy=unexplored_first_queue %s
-// RUN: %clang_analyze_cc1 -analyzer-checker=core,debug.ExprInspection -analyzer-config unroll-loops=true,cfg-loopexit=true,exploration_strategy=dfs -verify -std=c++14 -DDFS=1 %s
+// RUN: %clang_analyze_cc1 -analyzer-checker=core,debug.ExprInspection -analyzer-config unroll-loops=true,cfg-loopexit=true -verify=expected,default -std=c++14 -analyzer-config exploration_strategy=unexplored_first_queue %s
+// RUN: %clang_analyze_cc1 -analyzer-checker=core,debug.ExprInspection -analyzer-config unroll-loops=true,cfg-loopexit=true,exploration_strategy=dfs -verify=expected,dfs -std=c++14 %s
void clang_analyzer_numTimesReached();
void clang_analyzer_warnIfReached();
@@ -337,6 +337,7 @@ int nested_both_unrolled() {
}
int simple_known_bound_loop() {
+ // Iteration count visible: can be unrolled and fully executed.
for (int i = 2; i < 12; i++) {
// This function is inlined in nested_inlined_unroll1()
clang_analyzer_numTimesReached(); // expected-warning {{90}}
@@ -345,27 +346,42 @@ int simple_known_bound_loop() {
}
int simple_unknown_bound_loop() {
+ // Iteration count unknown: unrolling won't happen and the execution will be
+ // split two times:
+ // (1) split between skipped loop (immediate exit) and entering the loop
+ // (2) split between exit after 1 iteration and entering the second iteration
+ // After these there is no third state split because the "don't assume third
+ // iteration" logic in `ExprEngine::processBranch` prevents it; but the
+ // `legacy-inlining-prevention` logic will put this function onto the list of
+ // functions that may not be inlined in the future.
+ // The exploration strategy apparently influences the number of times this
+ // function can be inlined before it's placed on the "don't inline" list.
for (int i = 2; i < getNum(); i++) {
- clang_analyzer_numTimesReached(); // expected-warning {{8}}
+ clang_analyzer_numTimesReached(); // default-warning {{4}} dfs-warning {{8}}
}
return 0;
}
int nested_inlined_unroll1() {
+ // Here the analyzer can unroll and fully execute both the outer loop and the
+ // inner loop within simple_known_bound_loop().
int k;
for (int i = 0; i < 9; i++) {
clang_analyzer_numTimesReached(); // expected-warning {{9}}
- k = simple_known_bound_loop(); // no reevaluation without inlining
+ k = simple_known_bound_loop();
}
int a = 22 / k; // expected-warning {{Division by zero}}
return 0;
}
int nested_inlined_no_unroll1() {
+ // Here no unrolling happens and we only run `analyzer-max-loop` (= 4)
+ // iterations of the loop within this function, but some state splits happen
+ // in `simple_unknown_bound_loop()` calls.
int k;
- for (int i = 0; i < 9; i++) {
- clang_analyzer_numTimesReached(); // expected-warning {{10}}
- k = simple_unknown_bound_loop(); // reevaluation without inlining, splits the state as well
+ for (int i = 0; i < 40; i++) {
+ clang_analyzer_numTimesReached(); // default-warning {{9}} dfs-warning {{12}}
+ k = simple_unknown_bound_loop();
}
int a = 22 / k; // no-warning
return 0;
@llvm/pr-subscribers-clang-static-analyzer-1
- // RUN: %clang_analyze_cc1 -analyzer-checker=core,debug.ExprInspection -analyzer-config unroll-loops=true,cfg-loopexit=true -verify -std=c++14 -analyzer-config exploration_strategy=unexplored_first_queue %s
- // RUN: %clang_analyze_cc1 -analyzer-checker=core,debug.ExprInspection -analyzer-config unroll-loops=true,cfg-loopexit=true,exploration_strategy=dfs -verify -std=c++14 -DDFS=1 %s
+ // RUN: %clang_analyze_cc1 -analyzer-checker=core,debug.ExprInspection -analyzer-config unroll-loops=true,cfg-loopexit=true -verify=expected,default -std=c++14 -analyzer-config exploration_strategy=unexplored_first_queue %s
+ // RUN: %clang_analyze_cc1 -analyzer-checker=core,debug.ExprInspection -analyzer-config unroll-loops=true,cfg-loopexit=true,exploration_strategy=dfs -verify=expected,dfs -std=c++14 %s
I'm leaning towards eliminating the alternative RUN: line that uses `dfs`, because the only difference between `dfs` and the default exploration algorithm is a "control group" testcase where `unroll-loops` does not activate.
I'll probably do so in a separate follow-up commit.
As you wish.
Thank you for investigating this. At Sonar, we have not yet started the upgrade to clang-20. I suppose you already did, and that's how you found this regression on trunk.
Maybe we should also reflect on the quality control of our submissions of core changes, but let's leave that until after the PR is discussed and we put out the fire.
My impression is that we shouldn't have a new flag for this; we should apply this unconditionally for now. Did you think about this?
I left a few other comments about testing, but overall I'm all for this change.
Thanks again!
} else if (AMgr.options.LegacyInliningPrevention) {
// FIXME: There is an ancient and very arbitrary heuristic in
// `ExprEngine::processCFGBlockEntrance` which prevents all further
// inlining of a function if it finds an execution path within that
// function which reaches the `MaxBlockVisitOnPath` limit (a/k/a
// `analyzer-max-loop`, by default four iterations in a loop). Adding
// this "don't assume third iteration" logic significantly increased
// the analysis runtime on some inputs because less functions were
// arbitrarily excluded from being inlined, so more entrypoints used
// up their full allocated budget. As a hacky compensation for this,
// here we apply the "should not inline" mark in cases when the loop
// could potentially reach the `MaxBlockVisitOnPath` limit without the
// "don't assume third iteration" logic. This slightly overcompensates
// (activates if the third iteration can be entered, and will not
// recognize cases where the fourth iteration wouldn't be completed), but
// should be good enough for practical purposes.
if (const LocationContext *LC = getInlinedLocationContext(Pred, G)) {
Engine.FunctionSummaries->markShouldNotInline(
LC->getStackFrame()->getDecl());
}
} |
How about applying this chunk unconditionally? I don't think anyone would want to regress 6x voluntarily.
I put this behind a flag (which restores the original behavior by default) because it is not just "wasted" runtime, but offers a tradeoff between runtime and the depth/accuracy of the analysis. There are several other options like `max-nodes` or the high-level `mode` setting that offer analogous tradeoffs. However, I will refine this judgement when I check the results produced with and without this patch -- and if the difference is negligible, I'm more open to applying this chunk unconditionally.
In the long term we will probably need a heuristic to "avoid inlining large functions", but I strongly suspect that this hacky loop-based condition is not the "right" (natural, straightforward, robust etc.) way of formalizing this idea. I feel that the current bounds of the analysis scope are almost "accidental" (or at least I don't know about any measurements that justify e.g. the default `max-nodes` budget value or the various "don't inline if ..." conditions) and I hope that at some point I can spend some time on understanding and improving this area.
Right now the outline of my plan roughly looks like:
- (1) Publish this workaround ASAP to restore the "old normal" runtimes and minimize the disruption.
- (2) Investigate the (runtime) costs and benefits of various inlining or exploration strategies, and publish alternative heuristics that express "large functions" in different, more natural terms (e.g. block count, total number of nodes created, etc. instead of "did it reach four iterations in a loop").
- (3) Adjust the default config: disable this `legacy-inlining-prevention` by default and instead enable other heuristics which on average guarantee similar total runtime, but perhaps provide better coverage / more useful results / better robustness.
  - Even then, users would be able to return to this legacy config if their project is an outlier where this particular heuristic was especially aggressive and they liked it that way.
My point is, if Ericsson will ship this as `true`, and we are also going to ship this with `true`, then who would want to disable this? Do you expect users to disable this? Would anyone complain if we just landed this fix without an option for opting out?
Speaking of arbitrary cut heuristics: maybe they are arbitrary, but that should only make us more careful about changing them, because continuity itself is valuable - but I'm not opposed, just cautious.
Also, I am not sure we should have "legacy" in the name of something that is enabled by default.
My point is, if Ericsson will ship this as `true`, and we are also going to ship this with `true`, then who would want to disable this? Do you expect users to disable this?
Good point.
Would anyone complain if we just landed this fix without an option for opting out?
I felt that the logic introduced by this commit is so ugly that it should not be an unconditional part of the analyzer behavior (especially if it is `git blame`d to me...). By introducing this option I wanted to express that "this is not a stable permanent part of the analyzer, please don't rely on its existence" -- and I felt that yet another "FIXME" comment is not enough for this hack.
However, I can accept removing this option (and making the hack unconditional) if you strongly prefer that.
Also, I am not sure we should have "legacy" in the name of something that is enabled by default.
When I wrote this change, I hoped that this config option could be switched off and deprecated in the foreseeable future (in fact it was disabled by default in the first draft of the commit).
Also, this heuristic is so crazy that I couldn't give it a descriptive option name (`do-not-inline-function-if-it-would-assume-third-iteration-in-a-loop` is too much even compared to the other option names), so I named it "legacy" to say that it's connected to some old awkward heuristic.
int i;
clang_analyzer_numTimesReached(); // expected-warning {{1}}
for (i = 0; i < 10; i++);
clang_analyzer_numTimesReached(); // no-warning |
- clang_analyzer_numTimesReached(); // no-warning
+ clang_analyzer_numTimesReached(); // FIXME: It should be reachable
I would be careful with this FIXME -- if somebody follows it and fixes this issue (e.g. by enabling `unroll-loops` by default) we could get yet another large slowdown when fewer loops hit the `analyzer-max-loop` limit, fewer functions end up on the "don't inline" list, and more entry points spend their budget.
Obviously in an ideal world an empty loop like this shouldn't stop the analyzer – but we must replace this awkward loop-based inlining restriction with a more robust heuristic before we can do that. (I don't want to spread "older dumber analyzer would give up here, don't inline this function" hacks in the codebase as we improve the loop handling.)
I didn't get the argument for why there would be a risk in having a FIXME here. It's the reviewer's and maintainer's responsibility to ensure the changes align with the long-term directions of the project.
If someone just overrode `analyzer-max-loop` for this test "to fix the FIXME", that should be rejected.
However, if we found a way to enable it by default, or have a smarter engine, then it's fine - but that would also touch files beyond this single test file.
IMO the situation is complex because:
- from the POV of the users this "analysis is stopped by long loops" behavior is obviously a bug that needs to be fixed;
- but from the POV of the inlining prevention heuristic (which is tested here) this is a known feature (or at least "property") of the analyzer engine which is important for the correct delineation of the scope.
You're right that these deserve a FIXME, but I'll probably write a longer FIXME block that explains the full situation instead of short inline FIXME notes.
Sounds good.
int inner_fixed_loop_1(void) {
int i;
clang_analyzer_numTimesReached(); // expected-warning {{1}} |
You could pass a `const char *` from a string literal to differentiate which call is inlined, observing its value with a `clang_analyzer_dump()` at the beginning of this inlined function. That should make it clear that the first time it's inlined, but the second time it's not.
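For example, a rough sketch of what that could look like (hypothetical test code, not part of this PR; the exact dump output is deliberately not asserted here):

void clang_analyzer_dump(const char *);

int inner_fixed_loop(const char *tag) {
  int i;
  // The dump prints the value of `tag`, so the output tells which call site
  // was inlined; a call that is evaluated conservatively (not inlined) never
  // executes this statement at all.
  clang_analyzer_dump(tag);
  for (i = 0; i < 10; i++)
    ;
  return 42;
}

int outer_fixed_loop(void) {
  int x = inner_fixed_loop("first call");  // inlined
  int y = inner_fixed_loop("second call"); // not inlined: the first call
                                           // already reached analyzer-max-loop
  return 53 / (x - y);
}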
OK, I can switch all testcases to using `clang_analyzer_dump(arg)` instead of `numTimesReached` to highlight which calls were inlined (and not just their count). There are already a few testcases where I use that solution (because I need it for multiple entry points), but you're right that it would be helpful in the other cases as well.
int x = inner_fixed_loop_1();
int y = inner_fixed_loop_1(); |
We should have comments here noting that the first call is inlined, and the second is not because of the given heuristic.
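For instance, something like this (illustrative annotation of the existing `outer_fixed_loop_1` test, not a committed change):

int outer_fixed_loop_1(void) {
  int x = inner_fixed_loop_1(); // inlined
  int y = inner_fixed_loop_1(); // not inlined: the first call reached
                                // analyzer-max-loop, so by now the callee is
                                // on the "do not inline" list
  return 53 / (x - y); // no-warning
}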
int outer_fixed_loop_1(void) {
int x = inner_fixed_loop_1();
int y = inner_fixed_loop_1();
return 53 / (x - y); // no-warning |
My problem with these `no-warning`s in general in this PR is that they document what the test currently does, but they should document what the tests should/could expect.
In this case, in an ideal world we should actually get a diagnostic, thus the desired outcome is not a `no-warning`.
Consequently, I think a FIXME would be more appropriate.
int i;
clang_analyzer_dump(arg); // expected-warning {{2}}
for (i = 0; i < 10; i++);
clang_analyzer_dump(arg); // no-warning |
FIXME.
One other note: we should backport this fix to clang-20 once it lands on main.
Note that we need to expect e.g. {{2 S32}} because the looser pattern {{2}} would match the dump of any other 32-bit integer as well (the `2` in the `S32b` type suffix already matches). The last two testcases in the file were testing the same thing in different ways, so this commit only keeps the one that used `clang_analyzer_dump` (which was previously an exception).
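For illustration, a hypothetical snippet showing the stricter pattern (the "2 S32b" dump format is quoted from memory and may differ slightly):

void clang_analyzer_dump(int);

void pattern_example(void) {
  int x = 2;
  clang_analyzer_dump(x); // expected-warning {{2 S32}}
  // A bare {{2}} would also match the dump of unrelated values such as
  // "7 S32b", because the type suffix itself contains the digit 2.
}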
I tested the impact of this commit on various open source projects and the changes in the result counts are surprisingly small:
Unfortunately there was some trouble with our publicly available "demo" CodeChecker server (where we usually present the analysis results), so I had to store these results on an internal server and they are not available externally. (If somebody is interested, I can give priority to finding a way to publish the full report set.)
Do you have data about the analysis times per file, and per analysis entry point?
"Note that functions that _definitely_ reach the loop limit on some "
"execution path are currently marked as \"do not inline\" even if this "
"option is disabled (but this may change in future versions). This option "
"is a dumb and arbitrary restriction on inlining, but disabling it would "
Nit: I'd remove "dumb" from the user-facing string.
Good point, I was annoyed by the situation when I wrote this description, but I was already planning to rewrite these before finalizing the commit.
EDIT: done.
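For context, a sketch of how an analyzer-config boolean like the one described in this help text is typically toggled in a lit test; the option name legacy-inlining-prevention is an assumption based on the LegacyInliningPrevention field in the patch (and may change if the flag is renamed):

// RUN: %clang_analyze_cc1 -analyzer-checker=core \
// RUN:   -analyzer-config legacy-inlining-prevention=false -verify %s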
Engine.FunctionSummaries->markReachedMaxBlockCount(CalleeSF->getDecl());
if (const LocationContext *LC = getInlinedLocationContext(Pred, G)) {
  // FIXME: This will unconditionally prevent inlining this function (even
  // from other entrypoints), which is not a reasonable heuristic: even if
Nit: I think entry points are two distinct words.
Thanks for spotting this! For some reason I tend to automatically write "entrypoint"; before uploading the PR I fixed it in many locations, but apparently I failed to catch this one...
EDIT: fixed this and two other occurrences.
} else if (AMgr.options.LegacyInliningPrevention) {
  // FIXME: There is an ancient and very arbitrary heuristic in
  // `ExprEngine::processCFGBlockEntrance` which prevents all further
  // inlining of a function if it finds an execution path within that
  // function which reaches the `MaxBlockVisitOnPath` limit (a/k/a
  // `analyzer-max-loop`, by default four iterations in a loop). Adding
  // this "don't assume third iteration" logic significantly increased
  // the analysis runtime on some inputs because less functions were
  // arbitrarily excluded from being inlined, so more entrypoints used
  // up their full allocated budget. As a hacky compensation for this,
  // here we apply the "should not inline" mark in cases when the loop
  // could potentially reach the `MaxBlockVisitOnPath` limit without the
  // "don't assume third iteration" logic. This slightly overcompensates
  // (activates if the third iteration can be entered, and will not
  // recognize cases where the fourth iteration would't be completed), but
  // should be good enough for practical purposes.
  if (const LocationContext *LC = getInlinedLocationContext(Pred, G)) {
    Engine.FunctionSummaries->markShouldNotInline(
        LC->getStackFrame()->getDecl());
  }
}
Also I am not sure we should have legacy in the name of something that is enabled by default.
@balazs-benics-sonarsource Unfortunately our current CI (which runs the static analyzer via CodeChecker) cannot provide file-level or entry-point-level timing information, but I do have runtime measurements of the full analysis (on our Jenkins node with 16 parallel threads):
Both measurements were performed on the commit under review with different configurations. If needed, I can measure timing differences compared to other commits as well. I think the most natural comparison would be between
and
because that way the comparison wouldn't be influenced by other changes that were merged in the last few months. EDIT: I started a CI job to compare these two revisions, I'll post results tomorrow.
I pushed two commits (including d57946b which is hidden among the earlier review comments based on its creation time), and these in theory address most review comments, with the exception of two areas:
I'll try to address these tomorrow.
Execution time comparisons between the commit under review and just reverting bb27d5e ("Don't assume third iteration") atop a84a6f7 (the parent of the commit under review):
🤔 The version with the commit under review is surprisingly fast and I don't exactly know why. My most plausible theory is that bb27d5e ("Don't assume third iteration") has two effects on the analysis runtime: it directly reduces the work per path by not exploring the third and fourth iterations of opaque loops, but it also indirectly increases the total work because fewer functions end up on the "don't inline" list, so more code gets inlined.
This would explain why just reverting that commit (which undoes both effects) produces slower analysis than applying the commit under review (which undoes the slowdown and keeps the speedup).
This makes sense, and it's what I expected even back in the day. But I was shocked because intuition sometimes fails, and we didn't check the runtime for that patch. Now that you've done the work, I'm in a much more relaxed situation. Thanks!
This is a legacy project with many skeletons hidden in various closets... The main moral of this story is probably that we should be more paranoid about measuring runtime (I'm thinking about whether we can set up some regular or semi-regular runtime measurements in our CI), but this also shows why we should work on reducing the technical debt.
With the two recent commits I think I addressed all review comments except for the "on-by-default config option" vs "unconditionally active logic" question. In this area (which was discussed in inline comments) I'm still leaning towards the currently implemented
If we keep the config option, we can also think about renaming it. I agree that @balazs-benics-sonarsource @Xazax-hun WDYT about this area? Do you have any other questions/remarks?
I can see your points. I think they indeed move the needle slightly toward having this (or a similar) flag, but barely.
I did yet another measurement where I measured the runtime of bb27d5e ("Don't assume third iteration") compared to its parent commit (i.e. the measurement that we should've done before merging it) and it turns out that on average it does not increase the runtime and tmux (where Endre observed a significant slowdown) is an extreme outlier:
By the way, Endre Fülöp picked tmux arbitrarily as a "random small open source project that is easy to build" when he tried to reproduce the slowdown reported by the user, and as it immediately reproduced a significant slowdown, we assumed that this was a universal phenomenon -- but it turns out that this was just a coincidence. These results show that most users won't experience a brutal slowdown (especially if they have a large, varied codebase where the law of large numbers guarantees average results), which reduces the urgency of this PR a bit. (And I will rewrite the commit message, clarifying that the runtime increases are mostly concentrated on individual TUs and aren't too large on average.)
@balazs-benics-sonarsource Gentle ping
Looks good. There were two points unaddressed:
- Finding a name for the flag without the legacy- prefix
- Find out if we can ever have multiple root nodes in an exploded graph.
I discussed the name with Endre (@gamesh411) in person and our best idea is (Alternatively, @balazs-benics-sonarsource @ anybody else
@Xazax-hun WDYT of the proposed alternative flag name? (as you challenged the currently proposed name)
LGTM, given we use the new flag name.
Thank you!
Thanks for the fix, I'll proceed with the backport if you still believe it's worthwhile. @NagyDonat
This is not an automatic "every project is affected" slowdown, but on smaller outlier projects like tmux the runtime may double, so I'd guess that it's reasonable to backport this commit (although I'm not familiar with the exact conditions under which a commit should be backported). Thanks for doing the backport (if you decide to do it).
…llvm#136720) Recently some users reported that they observed large increases of runtime (up to +600% on some translation units) when they upgraded to a more recent (slightly patched, internal) clang version. Bisection revealed that the bulk of this increase was probably caused by my earlier commit bb27d5e ("Don't assume third iteration in loops"). As I evaluated that earlier commit on several open source projects, it turned out that on average it's runtime-neutral (or slightly helpful: it reduced the total analysis time by 1.5%) but it can cause runtime spikes on some code: in particular it more than doubled the time to analyze `tmux` (one of the smaller test projects). Further profiling and investigation proved that these spikes were caused by an _increase of analysis scope_ because there was a heuristic that placed functions on a "don't inline this" blacklist if they reached the `-analyzer-max-loop` limit (anywhere, on any one execution path) -- which became significantly rarer when my commit ensured the analyzer no longer "just assumes" four iterations. (With more inlining significantly more entry points use up their allocated budgets, which leads to the increased runtime.) I feel that this heuristic for the "don't inline" blacklist is unjustified and arbitrary, because reaching the "retry without inlining" limit on one path does not imply that inlining the function won't be valuable on other paths -- so I hope that we can eventually replace it with more "natural" limits of the analysis scope. However, the runtime increases are annoying for the users whose project is affected, so I created this quick workaround commit that approximates the "don't inline" blacklist effects of ambiguous loops (where the analyzer doesn't understand the loop condition) without fully reverting the "Don't assume third iteration" commit (to avoid reintroducing the false positives that were eliminated by it). Investigating this issue was a team effort: I'm grateful to Endre Fülöp (gamesh411) who did the bisection and shared his time measurement setup, and Gábor Tóthvári (tigbr) who helped me in profiling. (cherry picked from commit 9600a12)
Backport requested in #139597