-
-
Notifications
You must be signed in to change notification settings - Fork 192
feat: Add before crash hook to allow modifying other callbacks before crash #1366
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: master
Are you sure you want to change the base?
Conversation
|
options->before_crash_func(options->before_crash_data); | ||
} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Potential bug: The before_crash_func
is called from a signal handler before the SDK's signal-safety mechanisms are enabled via sentry__enter_signal_handler()
, bypassing existing protections.
-
Description: The new
before_crash_func
is invoked from within a signal handler context in backends likeinproc
,breakpad
, andcrashpad
. However, it is called before the SDK's own signal-safety mechanisms are enabled viasentry__enter_signal_handler()
. This function is responsible for setting up spinlocks and a special page allocator to ensure subsequent operations are async-signal-safe. By calling the user-provided function before this setup, the SDK bypasses its own protections. This is inconsistent with existing callbacks likeon_crash_func
, which are called after the safety mechanisms are active. If a user'sbefore_crash_func
performs any non-async-signal-safe operations, it will likely lead to a deadlock or memory corruption, preventing the original crash from being reported. -
Suggested fix: Move the invocation of
before_crash_func
to be after the call tosentry__enter_signal_handler()
in all relevant backends (sentry_backend_inproc.c
,sentry_backend_breakpad.cpp
, andsentry_backend_crashpad.cpp
). This ensures the SDK's signal-safety infrastructure is active before executing user-provided code, consistent with other crash callbacks.
severity: 0.85, confidence: 0.9
Did we get this right? 👍 / 👎 to inform future reviews.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks @mujacica. I am sorry to be that guy, because you've already detailed the prose quite a bit, but I am still not clear on the purpose of this hook.
- What do you mean by "allow modifying other callbacks before crash"?
- How does this new hook "prevent calling internal engine code while handling crashes" where the
on_crash
/before_send
hooks fail to do so?
If this:
which is necessary to avoid calling SENTRY_X with an internal engine logger for our in-crash-handling log statements
is the core issue (perhaps also addressed in point 2 above), I propose an alternative approach that is orthogonal to the hooks and more broadly applicable:
- expose an option
sentry_options_set_handler_logging_enabled
(or whatever), which signals whether logging should be active while the crash handler is active (i.e., while we process a crash) - introduce a new (atomic) global in the logger module
logging_enabled
(or ...) - set this global via a
sentry__logger_enable()
/sentry__logger_disable()
first and last thing from every crash handler if the option says the logger should be disabled
There are multiple use cases where you want to prevent a client-provided logger function from being invoked when a crash is in progress. A dynamic logger-enable/disable mechanism could be helpful in other parts of the code.
Of course, if it goes beyond the logging callbacks, that won't suffice.
Hey @supervacuus, thanks for a thorough review! I agree regarding having dynamic logger enable/disable (and I don't have a problem extending the PR to include it). However, we also need a way, in downstream SDKs, to adapt the behavior during crash handling. Two examples (that I am working on supporting):
Reason why console logs are mentioned here is because the console logs from the native backends always caused secondary crashes, while the other potential issues are a little bit more hidden inside of downstream SDKs (e.g. before_send handling). If you have an idea on how to support both use-cases (logging vs. rest), I am happy to implement it. A possibility would be to:
|
I also think the logging part could be handled as a configuration, other than the reasons above, also because it's cleaner to know what's happening and that this must be checked first in the handler to configure logging. BTW, in sentry-playstation, I've done something similar WRT to logging - the handler replaces the logger: The logging changes could be done in a separate PR |
I will start working on logging changes in separate PR, and we can continue discussion here on how to handle notifying downstream SDKs about crash handling. |
Maybe as a premise before my response to your examples: Having an early hook exposed for all handlers we provide can be a good idea. I would probably call it early or even unsafe- It is also not the same as having a hook into the raw OS-level handlers because a lot of the backend machinery already happened when the handler of the Native SDK runs (for instance, Therefore, I wanted to narrow down the potential uses downstream, where applying changes here makes more sense (as opposed to downstream in a hook that a native provides). Of course, this was also the case because I was not aware of any downstream drivers.
But the Native SDK itself does not allocate using any specific outside allocator (however, we do switch to a page-allocator when POSIX signal handling is involved). So any So Attachments should be entirely shielded from allocations of the Unreal allocator as well, even when you pass in buffers from engine types, since we copy all incoming buffers into memory that we internally allocated. It also depends on what the cure for a corrupted engine memory allocator would be and how you identify this. Would you check the crash context and set a flag that prevents the other hooks from relying on any allocation triggering code (or do nothing)? Then I think the
We already introduced additional handling of stack overflows into the Native SDK ( a) it itself would require stack space to execute
I am not aware of any other callbacks (besides the logger) that would be invoked before the
That would be its intended use. However, I am very aware that we have some quite intertwined machinery that can result in the issues you are seeing. It is worth it to find a solution for each issue before we introduce a very broad workaround (even if we end up with that in the end). From my POV, there are three ways to go forward (not at all mutually exclusive):
|
That is also a perfectly valid solution to disable the logger.
Yes, the logging changes are orthogonal |
on_crash_hook
by being the first thing to get called when handling a crash (which is necessary to avoid calling SENTRY_X with an internal engine logger for our in-crash-handling log statements).