-
Notifications
You must be signed in to change notification settings - Fork 13.5k
[asan][test] Disable asan_lsan_deadlock test #141145
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
@llvm/pr-subscribers-compiler-rt-sanitizer Author: Paul Kirth (ilovepi) ChangesWhile the current test checks that a deadlock doesn't occur, it also Full diff: https://github.com/llvm/llvm-project/pull/141145.diff 1 Files Affected:
diff --git a/compiler-rt/test/asan/TestCases/asan_lsan_deadlock.cpp b/compiler-rt/test/asan/TestCases/asan_lsan_deadlock.cpp
index b856b12325bf6..d9a6f4c07f698 100644
--- a/compiler-rt/test/asan/TestCases/asan_lsan_deadlock.cpp
+++ b/compiler-rt/test/asan/TestCases/asan_lsan_deadlock.cpp
@@ -55,7 +55,7 @@ int main(int argc, char **argv) {
* 1. ASan's thread registry lock (B) during the reporting
* 2. dl_iterate_phdr lock (A) during symbolization
*/
- // CHECK: SUMMARY: AddressSanitizer: stack-buffer-overflow
+ // CHECK: AddressSanitizer: stack-buffer-overflow
arr[argc] = 1; // Deliberate OOB access
});
|
Before landing I do want to get confirmation that the SUMMARY bit wasn't intended to be proof that the deadlock can't happen. |
I am pretty sure when it deadlocks it will just time out. |
That's also my take, but wanted to confirm. :) |
CC @thurstond (as build watcher for sanitizers) |
@ilovepi Thanks for your patch. I added this test case to verify a deadlock patch in PR #131756. Unfortunately, the "Summary:" prefix is indeed used to prove that the deadlock has been fixed.
If the deadlock is fixed, the "Summary:" will be printed after the stack trace, as follows:
Therefore, your current fix cannot tell whether the deadlock does occur. I'm also confused as to why such a simple test case would cause flakiness. However, I managed to fail to reproduce it locally :(, and due to a lack of information, I don't know what the cause of this flakiness is (does the limited resources of the build bot cause ASan bug reporting to time out? or is there another deadlock path?). |
@Camsyn I'm not sure I follow. The failures in our bots don't appear to be due to a timeout. I see some of those on the public build bot, but not on our CI https://ci.chromium.org/ui/p/fuchsia/builders/toolchain.ci/clang-linux-x64/b8714404311534072321/overview (logs https://logs.chromium.org/logs/fuchsia/buildbucket/cr-buildbucket/8714404311534072321/+/u/clang/test/stdout). I also wonder about a few things after looking at this test.
The thread that is doing the OOB access probably needs to be parked via std::latch until you're ready to start the critical sections. Otherwise I don't see what prevents the OOB thread from immediately completing in which case, you won't exercise the deadlock scenario. In fact, I wonder if the test just usually doesn't test the deadlock scenario because the launched thread completes quickly and then the summary is reported per expectation. In any case, I don't think this test is giving a useful signal, so I guess we should disable it until it can be fixed. |
If all went well, then "SUMMARY" should have been printed after "[STACK TRACE]", which is the expected behavior of ASan's reporting -- if no deadlock occurs, ASan should give a complete bug report in this test case. However, your CI report does not even have "[STACK TRACE]" in. I think this just means that the deadlock was triggered by the test case and was killed by the
As mentioned in PR #131756, we cannot finely control lock acquisition order under ASan ( Yes, even without the relevant fix, the test case does not reliably reproduce the deadlock (but when running on an idle system, it is NEARLY stable to trigger the bug). Anyway, for your instance, if the new spawn thread ends early before we enter Overall, I think this test case clearly exposes the true deadlock flaw of ASan/LSan, and am not sure if it should be disabled. |
if the deadlock still happens then this test only passes by accident. it either needs to deterministically fail or pass to be enabled. if it the deadlock always triggered, then we could mark it xfail until it was fixed. as is it gives no signal, and in fact seems to give a false sense that the issue is fixed. |
09b9d1e
to
561c4ba
Compare
561c4ba
to
e88d9f3
Compare
I'm going to go ahead and merge this, since the failure just hit our CI again https://ci.chromium.org/ui/p/fuchsia/builders/toolchain.ci/clang-linux-x64/b8714055276410139825/overview. @Camsyn this test will still be in tree, so when we get the deadlock issue fixed (or we figure out a deterministic way to trigger the deadlock on every run) this can be re-enabled. |
While the current test exercised the deadlock behavior prior to #131756,
deadlock still can occur intermittently. Since this results in test
flakes in CI, we disable this test until the locking behavior can be
fixed in the runtime. See #140646 for details.