-
Notifications
You must be signed in to change notification settings - Fork 13.6k
[lldb] Fix Linux core file tests hanging on Windows #142143
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
After llvm#141670, TestLinuxCore.py was timing out on our Windows on Arm bot. Non-Linux core files were ok, as were Linux core files unless it was ppc64le, riscv64 or loongarch. I eventually noticed that it was attempting to create PlatformLinux many times before trying PlatformAndroid, PlatformMac etc., which it should never need to do. The tests passed on a Linux host too, to add to the mystery. Turns out, all I needed to do was mark those architectures as supported in the PlatformLinux constructor. If they're not listed there we get stuck here: ``` // Wait for a stopped event since we just posted one above... printf("waiting for process to stop...\n"); lldb::EventSP event_sp; StateType state = WaitForProcessToStop(std::nullopt, &event_sp, true, listener_sp, nullptr, true, SelectMostRelevantFrame); printf("process stopped\n"); ``` Waiting for a stop event that never comes, because it appears we try to treat the core as a real process? ``` DynamicLoaderPOSIXDYLD::virtual DynamicLoaderPOSIXDYLD::DidAttach pid 28147 executable '<null executable>', load_offset 0xffffffffffffffff <...> Process::ShouldBroadcastEvent (000002ABC43FF4A0) Restarting process from state: stopped Process::PrivateResume() m_stop_id = 1, public state: unloaded private state: stopped Process::PrivateResume() got an error "error: elf-core does not support resuming processes". Process::ShouldBroadcastEvent (000002ABC43FF4A0) => new state: stopped, last broadcast state: invalid - NO ``` Some sort of actionable feedback here would be nice, but all I care about for now is that the tests run again. I have not added riscv32 as that appears to only be supported for Darwin at the moment (though I expect someone will get burned by this when it is). I think debug on these architectures worked if they were also the host arch, if someone tried to remote debug them, I think it would fail.
@llvm/pr-subscribers-lldb Author: David Spickett (DavidSpickett) ChangesAfter #141670, TestLinuxCore.py was timing out on our Windows on Arm bot. Non-Linux core files were ok, as were Linux core files unless it was ppc64le, riscv64 or loongarch. I eventually noticed that it was attempting to create PlatformLinux many times before trying PlatformAndroid, PlatformMac etc., which it should never need to do. The tests passed on a Linux host too, to add to the mystery. Turns out, all I needed to do was mark those architectures as supported in the PlatformLinux constructor. If they're not listed there we get stuck here:
Waiting for a stop event that never comes, because it appears we try to treat the core as a real process?
Some actionable feedback here would be nice, but all I care about for now is that the tests run again. I have not added riscv32 as that appears to only be supported for Darwin at the moment (I expect someone will get burned by this when it is). I think debug on these architectures worked if they were also the host arch, if someone tried to remote debug them, I think it would have failed. Full diff: https://github.com/llvm/llvm-project/pull/142143.diff 1 Files Affected:
diff --git a/lldb/source/Plugins/Platform/Linux/PlatformLinux.cpp b/lldb/source/Plugins/Platform/Linux/PlatformLinux.cpp
index dd6490c7141e5..269105208a87a 100644
--- a/lldb/source/Plugins/Platform/Linux/PlatformLinux.cpp
+++ b/lldb/source/Plugins/Platform/Linux/PlatformLinux.cpp
@@ -138,7 +138,9 @@ PlatformLinux::PlatformLinux(bool is_host)
{llvm::Triple::x86_64, llvm::Triple::x86, llvm::Triple::arm,
llvm::Triple::aarch64, llvm::Triple::mips64, llvm::Triple::mips64,
llvm::Triple::hexagon, llvm::Triple::mips, llvm::Triple::mips64el,
- llvm::Triple::mipsel, llvm::Triple::msp430, llvm::Triple::systemz},
+ llvm::Triple::mipsel, llvm::Triple::msp430, llvm::Triple::systemz,
+ llvm::Triple::loongarch64, llvm::Triple::ppc64le,
+ llvm::Triple::riscv64},
llvm::Triple::Linux);
}
}
|
Hey @DavidSpickett thanks for fixing the test. I think you fixed it the right way, but I wanted to share my understanding of how this probably happened In ProcessElfCore, we check the siginfo status of all the threads
So in our case, I've changed this behavior, where Then in ThreadElfCore I think we'd fail to calculate stop info because we don't have a valid SIGNO. I think a few fixes here: Currently, we set the stop reason even for This same infinite loop also can happen for Minidump, and I haven't had time to fix it. I think in general we can wrap all of this into a nice postmortem thread and have the postmortem thread know it should always stop, but optionally get the signal description. CC: @labath |
This PR is correct. It would be nice to avoid this resume-core-file footgun, but I don't have an opinion on the proposed fix. |
After llvm#141670, TestLinuxCore.py was timing out on our Windows on Arm bot. Non-Linux core files were ok, as were Linux core files unless it was ppc64le, riscv64 or loongarch. I eventually noticed that it was attempting to create PlatformLinux many times before trying PlatformAndroid, PlatformMac etc., which it should never need to do. The tests passed on a Linux host too, to add to the mystery. Turns out, all I needed to do was mark those architectures as supported in the PlatformLinux constructor. If they're not listed there we get stuck here: ``` // Wait for a stopped event since we just posted one above... printf("waiting for process to stop...\n"); lldb::EventSP event_sp; StateType state = WaitForProcessToStop(std::nullopt, &event_sp, true, listener_sp, nullptr, true, SelectMostRelevantFrame); printf("process stopped\n"); ``` Waiting for a stop event that never comes, because it appears we try to treat the core as a real process? ``` DynamicLoaderPOSIXDYLD::virtual DynamicLoaderPOSIXDYLD::DidAttach pid 28147 executable '<null executable>', load_offset 0xffffffffffffffff <...> Process::ShouldBroadcastEvent (000002ABC43FF4A0) Restarting process from state: stopped Process::PrivateResume() m_stop_id = 1, public state: unloaded private state: stopped Process::PrivateResume() got an error "error: elf-core does not support resuming processes". Process::ShouldBroadcastEvent (000002ABC43FF4A0) => new state: stopped, last broadcast state: invalid - NO ``` Some actionable feedback here would be nice, but all I care about for now is that the tests run again. I have not added riscv32 as that appears to only be supported for Darwin at the moment (I expect someone will get burned by this when it is). I think debug on these architectures worked if they were also the host arch, if someone tried to remote debug them, I think it would have failed.
After #141670, TestLinuxCore.py was timing out on our Windows on Arm bot.
Non-Linux core files were ok, as were Linux core files unless it was ppc64le, riscv64 or loongarch.
I eventually noticed that it was attempting to create PlatformLinux many times before trying PlatformAndroid, PlatformMac etc., which it should never need to do.
The tests passed on a Linux host too, to add to the mystery.
Turns out, all I needed to do was mark those architectures as supported in the PlatformLinux constructor. If they're not listed there we get stuck here:
Waiting for a stop event that never comes, because it appears we try to treat the core as a real process?
Some actionable feedback here would be nice, but all I care about for now is that the tests run again.
I have not added riscv32 as that appears to only be supported for Darwin at the moment (I expect someone will get burned by this when it is).
I think debug on these architectures worked if they were also the host arch, if someone tried to remote debug them, I think it would have failed.