Skip to content

Conversation

mujacica
Copy link
Contributor

@mujacica mujacica commented Sep 9, 2025

  • Add before crash hook to allow modifying other callbacks before crash
  • Hooks needed for game engines to prevent calling internal engine code while handling crashes
  • differs from on_crash_hook by being the first thing to get called when handling a crash (which is necessary to avoid calling SENTRY_X with an internal engine logger for our in-crash-handling log statements).

Copy link

github-actions bot commented Sep 9, 2025

Messages
📖 Do not forget to update Sentry-docs with your feature once the pull request gets approved.

Generated by 🚫 dangerJS against f53c146

@mujacica mujacica marked this pull request as ready for review September 11, 2025 07:21
Comment on lines +528 to +529
options->before_crash_func(options->before_crash_data);
}

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Potential bug: The before_crash_func is called from a signal handler before the SDK's signal-safety mechanisms are enabled via sentry__enter_signal_handler(), bypassing existing protections.
  • Description: The new before_crash_func is invoked from within a signal handler context in backends like inproc, breakpad, and crashpad. However, it is called before the SDK's own signal-safety mechanisms are enabled via sentry__enter_signal_handler(). This function is responsible for setting up spinlocks and a special page allocator to ensure subsequent operations are async-signal-safe. By calling the user-provided function before this setup, the SDK bypasses its own protections. This is inconsistent with existing callbacks like on_crash_func, which are called after the safety mechanisms are active. If a user's before_crash_func performs any non-async-signal-safe operations, it will likely lead to a deadlock or memory corruption, preventing the original crash from being reported.

  • Suggested fix: Move the invocation of before_crash_func to be after the call to sentry__enter_signal_handler() in all relevant backends (sentry_backend_inproc.c, sentry_backend_breakpad.cpp, and sentry_backend_crashpad.cpp). This ensures the SDK's signal-safety infrastructure is active before executing user-provided code, consistent with other crash callbacks.
    severity: 0.85, confidence: 0.9

Did we get this right? 👍 / 👎 to inform future reviews.

Copy link
Collaborator

@supervacuus supervacuus left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks @mujacica. I am sorry to be that guy, because you've already detailed the prose quite a bit, but I am still not clear on the purpose of this hook.

  • What do you mean by "allow modifying other callbacks before crash"?
  • How does this new hook "prevent calling internal engine code while handling crashes" where the on_crash/before_send hooks fail to do so?

If this:

which is necessary to avoid calling SENTRY_X with an internal engine logger for our in-crash-handling log statements

is the core issue (perhaps also addressed in point 2 above), I propose an alternative approach that is orthogonal to the hooks and more broadly applicable:

  • expose an option sentry_options_set_handler_logging_enabled (or whatever), which signals whether logging should be active while the crash handler is active (i.e., while we process a crash)
  • introduce a new (atomic) global in the logger module logging_enabled (or ...)
  • set this global via a sentry__logger_enable()/sentry__logger_disable() first and last thing from every crash handler if the option says the logger should be disabled

There are multiple use cases where you want to prevent a client-provided logger function from being invoked when a crash is in progress. A dynamic logger-enable/disable mechanism could be helpful in other parts of the code.

Of course, if it goes beyond the logging callbacks, that won't suffice.

@mujacica
Copy link
Contributor Author

Hey @supervacuus, thanks for a thorough review!

I agree regarding having dynamic logger enable/disable (and I don't have a problem extending the PR to include it).

However, we also need a way, in downstream SDKs, to adapt the behavior during crash handling. Two examples (that I am working on supporting):

  1. Engine memory allocator is corrupted - any memory allocations during Sentry event processing (Logging, Attachments, basically any memory allocation using engine data types FString etc.) would cause Sentry to crash too
  2. Engine stack overflow - trying to allocate anything further on the stack while processing event would cause Sentry to crash too (not as bad as 1, but still needs different behavior)

Reason why console logs are mentioned here is because the console logs from the native backends always caused secondary crashes, while the other potential issues are a little bit more hidden inside of downstream SDKs (e.g. before_send handling).

If you have an idea on how to support both use-cases (logging vs. rest), I am happy to implement it. A possibility would be to:

  1. Use the proposal you made for enable/disable logging
  2. Ensure that no callbacks are being triggered between start of crash handling and on_crash callback call
  3. Use the on_crash callback to "notify" downstream SDKs about crash being processed

@vaind
Copy link
Collaborator

vaind commented Sep 15, 2025

I also think the logging part could be handled as a configuration, other than the reasons above, also because it's cleaner to know what's happening and that this must be checked first in the handler to configure logging. BTW, in sentry-playstation, I've done something similar WRT to logging - the handler replaces the logger: sentry__logger_set_global({ nullptr, nullptr, SENTRY_LEVEL_ERROR });

The logging changes could be done in a separate PR

@mujacica
Copy link
Contributor Author

I will start working on logging changes in separate PR, and we can continue discussion here on how to handle notifying downstream SDKs about crash handling.

@supervacuus
Copy link
Collaborator

supervacuus commented Sep 15, 2025

Maybe as a premise before my response to your examples:

Having an early hook exposed for all handlers we provide can be a good idea. I would probably call it early or even unsafe-on_crash rather than before crash, to clearly signal that it still happens inside our handlers but without any synchronization with SDK internals, thus making interactions with the SDK unsafe (but for your use-cases, that is the idea since you don't need safety wrt the SDK).

It is also not the same as having a hook into the raw OS-level handlers because a lot of the backend machinery already happened when the handler of the Native SDK runs (for instance, breakpad already gathered the snapshot of the crashed application, and not all threads might be correctly running anymore).

Therefore, I wanted to narrow down the potential uses downstream, where applying changes here makes more sense (as opposed to downstream in a hook that a native provides). Of course, this was also the case because I was not aware of any downstream drivers.

However, we also need a way, in downstream SDKs, to adapt the behavior during crash handling. Two examples (that I am working on supporting):

  1. Engine memory allocator is corrupted - any memory allocations during Sentry event processing (Logging, Attachments, basically any memory allocation using engine data types FString etc.) would cause Sentry to crash too

But the Native SDK itself does not allocate using any specific outside allocator (however, we do switch to a page-allocator when POSIX signal handling is involved). So any FString usage should be confined to either of the two existing crash hooks or the already mentioned logger callback.

So on_crash should be early enough for any rectification, considering logging is disabled during the handler execution, so as not to trigger any calls into the Unreal allocator. Am I getting this right, or am I missing something?

Attachments should be entirely shielded from allocations of the Unreal allocator as well, even when you pass in buffers from engine types, since we copy all incoming buffers into memory that we internally allocated.

It also depends on what the cure for a corrupted engine memory allocator would be and how you identify this. Would you check the crash context and set a flag that prevents the other hooks from relying on any allocation triggering code (or do nothing)? Then I think the on_crash hook, under the premise of not calling into the engine allocator in the code path before, as the logger would, would still be early enough?

  1. Engine stack overflow - trying to allocate anything further on the stack while processing event would cause Sentry to crash too (not as bad as 1, but still needs different behavior)

We already introduced additional handling of stack overflows into the Native SDK (sigaltstack on POSIX systems and Windows stack guarantees). If upstreaming is technically and legally feasible, additional handling is more than welcome in the Native SDK. I am not sure how an additional early hook would prevent any issues in the stack overflow case, though, since

a) it itself would require stack space to execute
b) our handler is called from backend code, which also requires stack space to execute, so at the point your early handler gets invoked, we are relatively late in the handler stack (even if early/first in our callback)

Reason why console logs are mentioned here is because the console logs from the native backends always caused secondary crashes, while the other potential issues are a little bit more hidden inside of downstream SDKs (e.g. before_send handling).

If you have an idea on how to support both use-cases (logging vs. rest), I am happy to implement it. A possibility would be to:

  1. Use the proposal you made for enable/disable logging
  2. Ensure that no callbacks are being triggered between start of crash handling and on_crash callback call

I am not aware of any other callbacks (besides the logger) that would be invoked before the on_crash callback gets invoked. But if there are, then we should handle them similarly to the logging callback.

  1. Use the on_crash callback to "notify" downstream SDKs about crash being processed

That would be its intended use. However, I am very aware that we have some quite intertwined machinery that can result in the issues you are seeing. It is worth it to find a solution for each issue before we introduce a very broad workaround (even if we end up with that in the end).

From my POV, there are three ways to go forward (not at all mutually exclusive):

  • expose the hook (with sufficient documentation in the header docs that highlights the expected interaction profile, lack of guarantees, and limitations mentioned above). Since this is a public API, it should also clarify to users that they most likely want to use on_crash.
  • expose options to disable the logger once the crash handler is active (+ identify other active hooks that could conflict during the handler, and thus could be disabled, but I am unaware of any)
  • upstream specific infrastructure that makes sense for a broader audience (an additional mechanism for handling stack usage during crash, configuration of external allocators, opening the crash allocator).

@supervacuus
Copy link
Collaborator

I also think the logging part could be handled as a configuration, other than the reasons above, also because it's cleaner to know what's happening and that this must be checked first in the handler to configure logging. BTW, in sentry-playstation, I've done something similar WRT to logging - the handler replaces the logger: sentry__logger_set_global({ nullptr, nullptr, SENTRY_LEVEL_ERROR });

That is also a perfectly valid solution to disable the logger.

The logging changes could be done in a separate PR

Yes, the logging changes are orthogonal

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants