Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Break into the debugger (if attached) on panics (Windows, Linux, macOS, FreeBSD) #129019

Merged
merged 1 commit into from
Sep 8, 2024

Conversation

kromych
Copy link
Contributor

@kromych kromych commented Aug 12, 2024

The developer experience for panics is to provide the backtrace and
exit the program. When running under debugger, that might be improved
by breaking into the debugger once the code panics thus enabling
the developer to examine the program state at the exact time when
the code panicked.

Let the developer catch the panic in the debugger if it is attached.
If the debugger is not attached, nothing changes. Providing this feature
inside the standard library facilitates better debugging experience.

Validated under Windows, Linux, macOS 14.6, and FreeBSD 13.3..14.1.

@rustbot
Copy link
Collaborator

rustbot commented Aug 12, 2024

Thanks for the pull request, and welcome! The Rust team is excited to review your changes, and you should hear from @workingjubilee (or someone else) some time within the next two weeks.

Please see the contribution instructions for more information. Namely, in order to ensure the minimum review times lag, PR authors and assigned reviewers should ensure that the review label (S-waiting-on-review and S-waiting-on-author) stays updated, invoking these commands when appropriate:

  • @rustbot author: the review is finished, PR author should check the comments and take action accordingly
  • @rustbot review: the author is ready for a review, this PR will be queued again in the reviewer's queue

@rustbot rustbot added O-windows Operating system: Windows S-waiting-on-review Status: Awaiting review from the assignee but also interested parties. T-libs Relevant to the library team, which will review and decide on the PR/issue. labels Aug 12, 2024
@kromych
Copy link
Contributor Author

kromych commented Aug 12, 2024

r? @wesleywiser

library/std/src/panicking.rs Outdated Show resolved Hide resolved
@rust-log-analyzer

This comment has been minimized.

@rust-log-analyzer

This comment has been minimized.

@rust-log-analyzer

This comment has been minimized.

@workingjubilee
Copy link
Member

@kromych Please do not both add public API to the standard library and change the internal behavior of unrelated code in the same PR.

@workingjubilee
Copy link
Member

This is a library change, not a compiler one.

r? @workingjubilee

Copy link
Member

@workingjubilee workingjubilee left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I strongly recommend you run check builds locally.

library/std/src/os/mod.rs Outdated Show resolved Hide resolved
library/std/src/os/windows/dbg.rs Outdated Show resolved Hide resolved
library/std/src/os/windows/dbg.rs Outdated Show resolved Hide resolved
@rustbot rustbot added S-waiting-on-author Status: This is awaiting some action (such as code changes or more information) from the author. and removed S-waiting-on-review Status: Awaiting review from the assignee but also interested parties. labels Aug 12, 2024
@kromych
Copy link
Contributor Author

kromych commented Aug 12, 2024

@kromych Please do not both add public API to the standard library and change the internal behavior of unrelated code in the same PR.

Thanks for your help! Should I open two PR instead (adding the dbg_breakpoint API and using it for the easier panic debugging) and close this one?

@workingjubilee
Copy link
Member

Why did you ask for wesleywiser's review?

@kromych
Copy link
Contributor Author

kromych commented Aug 12, 2024

Why did you ask for wesleywiser's review?

The change relies on SEH which is handled normally by the compiler, thought there might be some assumption about setting the stack frame up for SEH. This is my first attempt at contribution to the Rust repo, haven't known about the proper process, learning as I go.

@workingjubilee
Copy link
Member

Hm, yes, but why specifically Wesley, the compiler lead that has the least time to review things?

In any case, you're probably thinking of code generation of destructors. Those shouldn't be relevant to your PR because that code was already written a long time ago. Most of the SEH code is in the library, here:

//! mechanism is Structured Exception Handling (SEH). This is quite different

I recommend you just remove the parts that expose this to public API from this PR, as exposing new API is its own process, and code review for this will be exciting enough.

@kromych
Copy link
Contributor Author

kromych commented Aug 12, 2024

Hm, yes, but why specifically Wesley, the compiler lead that has the least time to review things?

Wesley's also on the wg-debugging group: https://blog.rust-lang.org/inside-rust/2022/02/22/compiler-team-ambitions-2022.html#debugging-initiatives-. That looked to me as an opportunity to bring attention to the value of that PR for the debugging experience besides examining the sanity from the compiler perspective.

In any case, you're probably thinking of code generation of destructors. Those shouldn't be relevant to your PR because that code was already written a long time ago. Most of the SEH code is in the library, here:

//! mechanism is Structured Exception Handling (SEH). This is quite different

I recommend you just remove the parts that expose this to public API from this PR, as exposing new API is its own process, and code review for this will be exciting enough.

Appreciated, will do!

@kromych
Copy link
Contributor Author

kromych commented Aug 12, 2024

I strongly recommend you run check builds locally.

Ran that command locally, thanks! There was one issue that I could not explain: if something is used inside the panic handler only, that something is considered a dead code.

@workingjubilee
Copy link
Member

Ah, that makes more sense.

I don't mean to be so nosy but I would like to see this functionality land and thus want to make sure it happens in the way that is smoothest. Generally T-compiler members don't approve library PRs and vice versa. Thus often a PR that tries to do both compiler changes and internal stdlib changes and expose new library API is doomed because of this split responsibility. We aren't unthinking servants of process but it's best not to fudge anything when it's unnecessary and we can simply pipeline things.

More briefly: let's build the bikeshed before we have an argument over how to paint it.

For more information on process in general, you may wish to consult the rustc dev guide and the std dev guide. You will want to consult the former anyways as it explains how to add new tests and you may need a fairly specialized test to make sure "break into debugger" works, as I don't think either the normal UI test suite or debuginfo test suite support that. Our library tests only work for code that doesn't need to do weird stuff with a process.


If we do need special frame-by-frame code generation for this, it should probably not be implemented as including global_asm! but rather be implemented using an intrinsic, which are located hereabouts: https://github.com/rust-lang/rust/blob/91376f416222a238227c84a848d168835ede2cc3/library/core/src/intrinsics.rs

And those are in core, not std. But I don't know that we do. The Rust compiler already has the necessary infrastructure to handle SEH, because that is how unwinding on Windows works. That is, the following C, calling these two functions:

void try_code(void);
void except_code(void);

__declspec(noinline) int try_break_into_debugger()
{
 __try
  {
    try_code();
    return 0;
  }
  __except (1)
  {
    except_code();
    return 1;
  }
  return 0;
}

should be functionally identical to:

use other_crate::{try_code, except_code};
    
if let Err(_) = catch_unwind(|| try_code()) {
    except_code();
};

Intrinsics themselves are actually language features and so require rubberstamps from a different group. Isn't process fun?

@rust-log-analyzer

This comment has been minimized.

@romank-msft
Copy link

Something broke down again in the CI, will try to replicate locally

@workingjubilee
Copy link
Member

...? huh, approving it twice feels weird, I'm gonna...

@bors r-

@bors bors added S-waiting-on-author Status: This is awaiting some action (such as code changes or more information) from the author. and removed S-waiting-on-bors Status: Waiting on bors to run and complete tests. Bors will change the label on completion. labels Sep 8, 2024
@workingjubilee
Copy link
Member

@bors r+

@bors
Copy link
Contributor

bors commented Sep 8, 2024

📌 Commit fc28a2a has been approved by workingjubilee

It is now in the queue for this repository.

@bors bors added S-waiting-on-bors Status: Waiting on bors to run and complete tests. Bors will change the label on completion. and removed S-waiting-on-author Status: This is awaiting some action (such as code changes or more information) from the author. labels Sep 8, 2024
@bors
Copy link
Contributor

bors commented Sep 8, 2024

⌛ Testing commit fc28a2a with merge 7b18b3e...

@bors
Copy link
Contributor

bors commented Sep 8, 2024

☀️ Test successful - checks-actions
Approved by: workingjubilee
Pushing 7b18b3e to master...

@bors bors added the merged-by-bors This PR was explicitly merged by bors. label Sep 8, 2024
@bors bors merged commit 7b18b3e into rust-lang:master Sep 8, 2024
7 checks passed
@rustbot rustbot added this to the 1.83.0 milestone Sep 8, 2024
@rust-timer
Copy link
Collaborator

Finished benchmarking commit (7b18b3e): comparison URL.

Overall result: no relevant changes - no action needed

@rustbot label: -perf-regression

Instruction count

This benchmark run did not return any relevant results for this metric.

Max RSS (memory usage)

Results (secondary 2.3%)

This is a less reliable metric that may be of interest but was not used to determine the overall result at the top of this comment.

mean range count
Regressions ❌
(primary)
- - 0
Regressions ❌
(secondary)
2.3% [1.5%, 3.6%] 8
Improvements ✅
(primary)
- - 0
Improvements ✅
(secondary)
- - 0
All ❌✅ (primary) - - 0

Cycles

This benchmark run did not return any relevant results for this metric.

Binary size

Results (primary 0.1%, secondary 0.3%)

This is a less reliable metric that may be of interest but was not used to determine the overall result at the top of this comment.

mean range count
Regressions ❌
(primary)
0.1% [0.0%, 0.3%] 12
Regressions ❌
(secondary)
0.3% [0.2%, 0.3%] 38
Improvements ✅
(primary)
- - 0
Improvements ✅
(secondary)
- - 0
All ❌✅ (primary) 0.1% [0.0%, 0.3%] 12

Bootstrap: 756.659s -> 754.738s (-0.25%)
Artifact size: 341.09 MiB -> 341.16 MiB (0.02%)

@rustbot rustbot removed the perf-regression Performance regression. label Sep 8, 2024
Amanieu pushed a commit to madsmtm/stdarch that referenced this pull request Sep 14, 2024
Tested in the simulator and on the device I had lying around, a 1st
generation iPad Mini (which isn't Aarch64, but shows that the
`sysctlbyname` calls still work even there, even if they return false).

`sysctlbyname` _should_ be safe to use without causing rejections from
the app store, as its usage is documented in:
https://developer.apple.com/documentation/kernel/1387446-sysctlbyname/determining_instruction_set_characteristics

Also, the standard library will use these soon anyhow, so this shouldn't
affect the situation:
rust-lang/rust#129019
Amanieu pushed a commit to rust-lang/stdarch that referenced this pull request Sep 14, 2024
Tested in the simulator and on the device I had lying around, a 1st
generation iPad Mini (which isn't Aarch64, but shows that the
`sysctlbyname` calls still work even there, even if they return false).

`sysctlbyname` _should_ be safe to use without causing rejections from
the app store, as its usage is documented in:
https://developer.apple.com/documentation/kernel/1387446-sysctlbyname/determining_instruction_set_characteristics

Also, the standard library will use these soon anyhow, so this shouldn't
affect the situation:
rust-lang/rust#129019
@khuey
Copy link
Contributor

khuey commented Sep 23, 2024

My apologies if this was debated in the 166 comments I didn't read, but I don't think the quality of implementation here is suitable for shipping on Linux, even in nightly. This behavior is triggered even if the panic is ultimately handled via catch_unwind, and on Linux it triggers for any ptracer, even things that aren't interactive debuggers (e.g. strace). The net result is that Rust programs that use panics at all no longer function under tools like strace or rr after this change. I think this should be reverted.

@kromych
Copy link
Contributor Author

kromych commented Sep 24, 2024

My apologies if this was debated in the 166 comments I didn't read, but I don't think the quality of implementation here is suitable for shipping on Linux, even in nightly. This behavior is triggered even if the panic is ultimately handled via catch_unwind, and on Linux it triggers for any ptracer, even things that aren't interactive debuggers (e.g. strace). The net result is that Rust programs that use panics at all no longer function under tools like strace or rr after this change. I think this should be reverted.

In my view, your arguments are enough to revert this for Linux, especially if no one finds that useful at all there. If the latter cannot be known for sure, perhaps wrap the logic in rust_panic in a conditional something like

if let Some(bp) = env::get("RUST_BREAKPOINT_ON_PANIC") {
    if bp == "1"  {
        let _ = breakpoint_if_debugging()
    }
}

i.e. one would need to set an env. variable RUST_BREAKPOINT_ON_PANIC to get this code to run the breakpoint instruction, even if the presence of a debugger (tracer) is detected.

@kromych
Copy link
Contributor Author

kromych commented Sep 24, 2024

This change makes a ptraced/straced program exit with SIG_TRAP when the program panics, and the core dump is generated. Assuming that panicking implies there is no way of continuing computation, that behaviour doesn't look as a deal breaker to me.

That assumption of panic being a catastrophic failure (hence meaning the program exits) is broken by std::panic::catch_unwind. Its usage pattern seems to be

std::panic::catch_unwind(|| /* something that panics */);

so that the program does not exit upon panicking although it - quite weirdly - does print the panic message when using panic!, etc. When the program is run under a ptrace-using tool, this PR might make the program exit instead of continuing if the tool doesn't have the gdb/lldb smarts.

I cannot comment if std::panic::catch_unwind is guaranteed to work under ptrace and not handling SIG_TRAP. It appears that its documentation enumerates various existing footguns with that function. Instead of feature-gating via the env var or reverting the logic for Linux, one might document that behaviour thus adding to the list of precautions of std::panic::catch_unwind one more item.

@workingjubilee
Copy link
Member

workingjubilee commented Sep 24, 2024

@khuey Yeah, I noticed that (after it was merged, and from people discussing this elsewhere) and wondered if that would be a practical issue since the program would often die anyways? But if it's causing problems in rr, I think we should remove the Linux impl.

kromych added a commit to kromych/rust that referenced this pull request Sep 25, 2024
@kromych
Copy link
Contributor Author

kromych commented Sep 25, 2024

Here is the PR #130810

matthiaskrgr added a commit to matthiaskrgr/rust that referenced this pull request Sep 25, 2024
Don't trap into the debugger on panics under Linux

This breaks `rr`, see rust-lang#129019 (comment) for the discussion

CC `@khuey` `@workingjubilee`
rust-timer added a commit to rust-lang-ci/rust that referenced this pull request Sep 25, 2024
Rollup merge of rust-lang#130810 - kromych:master, r=workingjubilee

Don't trap into the debugger on panics under Linux

This breaks `rr`, see rust-lang#129019 (comment) for the discussion

CC `@khuey` `@workingjubilee`
@kromych
Copy link
Contributor Author

kromych commented Sep 25, 2024

As this is being reverted in #130846, for the folks, who liked this, I've published https://crates.io/crates/dbg_breakpoint:

Breakpoints when the debugger is attached

Set breakpoints with the breakpoint!() macro on many target architectures
and popular OSes like FreeBSD, macOS, iOS, Linux distro's, Windows without
using the nightly toolchain. Break into the debugger with an easy
breakpoint_if_debugging() call, too!

Well, sure, but why?

  • It might be more convinient to add the call to breakpoint_if_debugging from inside
    the comfort of your editor than to remember the incantion in the debugger,
  • Some callsites like lambdas and async routines/coroutines can be tricky to set a
    breakpoint to in the debugger due to name mangling or because the toolchain doesn't
    give them a name that is easily-discovered/human-friendly,
  • Can add this to your #[panic_handler] to break into the debugger on a panic.

This model might be reminiscent of "semihosting" where the execution environment
includes a host or a debugger who's services might be requested by the program.

Here is the example of how one can make use of this: runme.rs.
Do exercise extreme caution when using any of this in the production environment, i.e.
out of the inner development loop. Heisenbugs and crashes might be sighted.

Platform- and target-specific notes follow.

Windows

The library provides breakpoint_if_debugging() and breakpoint_if_debugging_seh()
The latter might be useful to detect the debugger if it is trying to hide its presence
via some cheap tricks.

Linux, macOS and FreeBSD

The debugger detection logic will detect any tracer like strace as the debugger, and
if the tracer isn't able to skip over the breakpoint CPU instruction, the program will
crash. That can be fixed by handling SIGTRAP inside your program.

arm64

brk #imm16 is used for breakpoint on arm64.

Just FYI, the #imm16 value can be inside the Linux kernel 6.1
at the time of writing:

  • 0x004: for installing kprobes
  • 0x005: for installing uprobes
  • 0x006: for kprobe software single-step
  • 0x400 - 0x7ff: kgdb
  • 0x100: for triggering a fault on purpose (reserved)
  • 0x400: for dynamic BRK instruction
  • 0x401: for compile time BRK instruction
  • 0x800: kernel-mode BUG() and WARN() traps
  • 0x9xx: tag-based KASAN trap (allowed values 0x900 - 0x9ff)
  • 0x8xxx: Control-Flow Integrity traps

Here, we're talking the user mode yet the above illustrates the point
that the value supplied after brk influences what to expect.

For __builtin_trap(), gcc produces brk #0x3e8, clang generates brk #1.
This library uses 0xf000 as the debuggers on Windows and macOS skip over the debug
trap automatically in this case by advancing the instruction pointer behind the
curtain.

See also

@jieyouxu jieyouxu added the CI-spurious-fail-msvc CI spurious failure: target env msvc label Oct 15, 2024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
CI-spurious-fail-msvc CI spurious failure: target env msvc merged-by-bors This PR was explicitly merged by bors. O-freebsd Operating system: FreeBSD O-linux Operating system: Linux O-macos Operating system: macOS O-windows Operating system: Windows perf-regression-triaged The performance regression has been triaged. S-waiting-on-bors Status: Waiting on bors to run and complete tests. Bors will change the label on completion. T-libs Relevant to the library team, which will review and decide on the PR/issue.
Projects
None yet
Development

Successfully merging this pull request may close these issues.