-
-
Notifications
You must be signed in to change notification settings - Fork 3.9k
Unify and simplify command and system error handling #18351
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Unify and simplify command and system error handling #18351
Conversation
You added a new example but didn't add metadata for it. Please update the root Cargo.toml file. |
Re. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think this is a good direction to move error handling towards. There's definitely room for iteration, but this is the most flexible and most approachable form of error handling thus far. For example, it may be nice to have fallible plugins (e.g., Plugin::build
could return a Result
), etc. in the future.
Some minor typos and a suggestion to use bevy_platform_support::sync::OnceLock
for no_std
compatibility reasons. I do also raise the idea of making the OnceLock
itself private and just having a function to initialize the handler. While I think that's a nicer API, I don't see it as a blocker.
/// # Warning | ||
/// | ||
/// As this can *never* be overwritten, library code should never set this value. | ||
pub static GLOBAL_ERROR_HANDLER: OnceLock<fn(BevyError, ErrorContext)> = OnceLock::new(); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think the fact that this is a OnceLock
should be a hidden implementation detail users don't interact with, allowing us to experiment with the implementation without breaking user code needlessly. Perhaps exposing a simple set_error_handler(...)
function to go with the default_error_handler()
function that is already added?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The other direction we can go is to remove default_error_handler completely. I don't think it adds a ton without the feature flag.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I decided to cut default_error_handler: it doesn't make this more misuse resistant, and end users generally won't be creating custom error handling code so the convenience argument doesn't hold water for me either.
pub last_run: Tick, | ||
/// Context for a [`BevyError`] to aid in debugging. | ||
#[derive(Debug, PartialEq, Eq, Clone)] | ||
pub enum ErrorContext { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Future PR: Experimenting with this context type will be very helpful in the future. For example, systems and observers could include the SystemId
, allowing a more rich error handler to have differing behaviour based on the system that failed, but entirely up to the user.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
FYI I think that sticking complex app-specific error handling in the global error handler is likely to be a bit of a mess. Instead, users should be trying to handle errors as close to where they're generated as possible, and we should provide them the tools required to do that.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I agree with Alice here: complex conditional logic in the default handler will probably be an anti-pattern. We can expect users to want to have complex error handling for their code but much less precision is possible for engine code or ecosystem code. For code you write yourself, there's probably better ways to handle errors (like piping). What the default handler gives is consistency: you don't have to worry about one ecosystem developer choosing to panic and another choosing to silently fail without logging.
That said, we should still probably let users do it. Providing the system ID can't hurt. As an aside, I'd probably like to see the source-location-tracking exposed here. But all of that is firmly in the category of follow-up.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
100% agreed on source location tracking, and no objection to exposing system id etc. Seems handy for tooling! Just slap on some "please don't omni-error handling" advice in the docs :)
Co-authored-by: Zachary Harrold <zac@harrold.com.au>
Co-authored-by: Zachary Harrold <zac@harrold.com.au>
You added a new example but didn't add metadata for it. Please update the root Cargo.toml file. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
First code review from me in a while, sorry if I'm a bit rusty. I like seeing all the red, and I think what you've ended up on is really simple and clear. I don't really have any faults with the implementation, but I do still have some questions.
Do you see a clear way to allow commands to retain &mut World access...
The issue is not that we don't have world-access when handling errors in commands or systems, right? The issue is observers. Is there a reason we can't go with something like the following:
fn error_handler(error: BevyError, ctx: ErrorContext, world: DeferredWorld)
Commands and systems appear to have &mut World
access when they call GLOBAL_ERROR_HANDLER.get_or_init(|| panic)
, which means it should be safe to transmute into UnsafeWorldCell
and DeferredWorld
, no? And observers already have a deferred world they can use. I'm guess I am probably missing something, but it seems like this would be a way to preserve access to commands.
I've removed the feature gate on the default_error_handler...
I think this is probably the right call, but we should benchmark as it could have perf implications. Unfortunately I do think we should wait to merge this until we get either positive benchmarks or hear Cart's stance, which is the only reason I'm marking this as changes requested.
OnceLock is in std...
The once_cell
crate has no_std
support via critical_section
. It would also let us swap out the handler at runtime. Perhaps we should consider using it instead, although that can be follow-up IMO.
Do you have ideas for more automated tests for this collection of features?
Honestly, I think we've landed on a pretty simple implementation. There doesn't seem to be much room for logic errors. Performance is the major concern.
/// # Warning | ||
/// | ||
/// As this can *never* be overwritten, library code should never set this value. | ||
pub static GLOBAL_ERROR_HANDLER: OnceLock<fn(BevyError, ErrorContext)> = OnceLock::new(); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thoughts on hiding this from users and having a App::set_error_handler
fn which can be called multiple times (overwriting the previous value), and then which actually sets this in App::start
? We could even make this private and expose it only through a get_error_handler()
function.
I think this might be closer to what bevy users expect from our API, and would eliminate the problems caused by calling it conflicting handlers or trying to change it while the app is running. Or IDK maybe we want to be able to swap it out at runtime for the editor, in which case a OnceLock isn't the right primitive either.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
That won't work for standalone bevy_ecs users: they don't have access to App. We also really want this to be truly global, not set per SubApp.
That said, I am down to make this a bit more opaque to give us flexibility and a nicer API. Let me see about newtyping it.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
In that case I think the current is probably fine TBH. The API works, it was just a thought.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It does work, but I don't like it! 😂
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We could always provide a wrapper as part of App
that hides the less nice ECS implementation from most users. But keep the current thing (or something like it) in place.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
IMO there should be one way to set this: no need to complicate things.
Originally yes. I'm increasingly convinced that allowing &mut World access in the global handler is a bit of a trap though. Without a really good motivating use case I'm worried that providing that will encourage omni-handlers. However, I'd really like it in That can easily wait for followup though.
Valid. I can try and spend some time benching this. |
…' into from-the-other-direction
This reverts commit b20dc8e.
@NthTensor I'm seeing in the ballpark of 10% perf regressions on some of the command benchmarks with the configurable error handling enabled. I'm not 100% convinced that's real, but that's enough to sell me on feature-flagging this again for this PR. |
You added a new feature but didn't add a description for it. Please update the root Cargo.toml file. |
/// The default error handler. This defaults to [`panic()`], | ||
/// but if set, the [`GLOBAL_ERROR_HANDLER`] will be used instead, enabling error handler customization. | ||
/// The `configurable_error_handler` feature must be enabled to change this from the panicking default behavior, | ||
/// as there may be runtime overhead. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Are we sure we want to #[inline]
this instead of marking it #[cold]
? It's the performance of the branch that dosn't make this call that we have to worry about.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
That was Cart's original design 🤷🏽♀️ I think that's reasonable, but I don't feel like running more benches here right now.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yeah this can happen during the RC
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This seems like it's in a good state for merge now, especially given that we can still remove the feature flag during the RC if we change our minds.
I wish cargo's feature overrides were easier so we could justify this being the default. I also wonder if using cold
might move the performance needle at all. Either way, I very much want this in for the first RC.
Merging: let's see if I managed to fix CI finally... |
cargo run -p build-templated-pages -- update features
# Objective - ECS error handling is a lovely flagship feature for Bevy 0.16, all in the name of reducing panics and encouraging better error handling (#14275). - Currently though, command and system error handling are completely disjoint and use different mechanisms. - Additionally, there's a number of distinct ways to set the default/fallback/global error handler that have limited value. As far as I can tell, this will be cfg flagged to toggle between dev and production builds in 99.9% of cases, with no real value in more granular settings or helpers. - Fixes #17272 ## Solution - Standardize error handling on the OnceLock global error mechanisms ironed out in #17215 - As discussed there, there are serious performance concerns there, especially for commands - I also think this is a better fit for the use cases, as it's truly global - Move from `SystemErrorContext` to a more general purpose `ErrorContext`, which can handle observers and commands more clearly - Cut the superfluous setter methods on `App` and `SubApp` - Rename the limited (and unhelpful) `fallible_systems` example to `error_handling`, and add an example of command error handling ## Testing Ran the `error_handling` example. ## Notes for reviewers - Do you see a clear way to allow commands to retain &mut World access in the per-command custom error handlers? IMO that's a key feature here (allowing the ad-hoc creation of custom commands), but I'm not sure how to get there without exploding complexity. - I've removed the feature gate on the default_error_handler: contrary to @cart's opinion in #17215 I think that virtually all apps will want to use this. Can you think of a category of app that a) is extremely performance sensitive b) is fine with shipping to production with the panic error handler? If so, I can try to gather performance numbers and/or reintroduce the feature flag. UPDATE: see benches at the end of this message. - ~~`OnceLock` is in `std`: @bushrat011899 what should we do here?~~ - Do you have ideas for more automated tests for this collection of features? ## Benchmarks I checked the impact of the feature flag introduced: benchmarks might show regressions. This bears more investigation. I'm still skeptical that there are users who are well-served by a fast always panicking approach, but I'm going to re-add the feature flag here to avoid stalling this out.  --------- Co-authored-by: Zachary Harrold <zac@harrold.com.au>
…ia the GLOBAL_ERROR_HANDLER (#18454) # Objective There are two related problems here: 1. Users should be able to change the fallback behavior of *all* ECS-based errors in their application by setting the `GLOBAL_ERROR_HANDLER`. See #18351 for earlier work in this vein. 2. The existing solution (#15500) for customizing this behavior is high on boilerplate, not global and adds a great deal of complexity. The consensus is that the default behavior when a parameter fails validation should be set based on the kind of system parameter in question: `Single` / `Populated` should silently skip the system, but `Res` should panic. Setting this behavior at the system level is a bandaid that makes getting to that ideal behavior more painful, and can mask real failures (if a resource is missing but you've ignored a system to make the Single stop panicking you're going to have a bad day). ## Solution I've removed the existing `ParamWarnPolicy`-based configuration, and wired up the `GLOBAL_ERROR_HANDLER`/`default_error_handler` to the various schedule executors to properly plumb through errors . Additionally, I've done a small cleanup pass on the corresponding example. ## Testing I've run the `fallible_params` example, with both the default and a custom global error handler. The former panics (as expected), and the latter spams the error console with warnings 🥲 ## Questions for reviewers 1. Currently, failed system param validation will result in endless console spam. Do you want me to implement a solution for warn_once-style debouncing somehow? 2. Currently, the error reporting for failed system param validation is very limited: all we get is that a system param failed validation and the name of the system. Do you want me to implement improved error reporting by bubbling up errors in this PR? 3. There is broad consensus that the default behavior for failed system param validation should be set on a per-system param basis. Would you like me to implement that in this PR? My gut instinct is that we absolutely want to solve 2 and 3, but it will be much easier to do that work (and review it) if we split the PRs apart. ## Migration Guide `ParamWarnPolicy` and the `WithParamWarnPolicy` have been removed completely. Failures during system param validation are now handled via the `GLOBAL_ERROR_HANDLER`: please see the `bevy_ecs::error` module docs for more information. --------- Co-authored-by: MiniaczQ <xnetroidpl@gmail.com>
…ia the GLOBAL_ERROR_HANDLER (#18454) There are two related problems here: 1. Users should be able to change the fallback behavior of *all* ECS-based errors in their application by setting the `GLOBAL_ERROR_HANDLER`. See #18351 for earlier work in this vein. 2. The existing solution (#15500) for customizing this behavior is high on boilerplate, not global and adds a great deal of complexity. The consensus is that the default behavior when a parameter fails validation should be set based on the kind of system parameter in question: `Single` / `Populated` should silently skip the system, but `Res` should panic. Setting this behavior at the system level is a bandaid that makes getting to that ideal behavior more painful, and can mask real failures (if a resource is missing but you've ignored a system to make the Single stop panicking you're going to have a bad day). I've removed the existing `ParamWarnPolicy`-based configuration, and wired up the `GLOBAL_ERROR_HANDLER`/`default_error_handler` to the various schedule executors to properly plumb through errors . Additionally, I've done a small cleanup pass on the corresponding example. I've run the `fallible_params` example, with both the default and a custom global error handler. The former panics (as expected), and the latter spams the error console with warnings 🥲 1. Currently, failed system param validation will result in endless console spam. Do you want me to implement a solution for warn_once-style debouncing somehow? 2. Currently, the error reporting for failed system param validation is very limited: all we get is that a system param failed validation and the name of the system. Do you want me to implement improved error reporting by bubbling up errors in this PR? 3. There is broad consensus that the default behavior for failed system param validation should be set on a per-system param basis. Would you like me to implement that in this PR? My gut instinct is that we absolutely want to solve 2 and 3, but it will be much easier to do that work (and review it) if we split the PRs apart. `ParamWarnPolicy` and the `WithParamWarnPolicy` have been removed completely. Failures during system param validation are now handled via the `GLOBAL_ERROR_HANDLER`: please see the `bevy_ecs::error` module docs for more information. --------- Co-authored-by: MiniaczQ <xnetroidpl@gmail.com>
Objective
Solution
SystemErrorContext
to a more general purposeErrorContext
, which can handle observers and commands more clearlyApp
andSubApp
fallible_systems
example toerror_handling
, and add an example of command error handlingTesting
Ran the
error_handling
example.Notes for reviewers
OnceLock
is instd
: @bushrat011899 what should we do here?Benchmarks
I checked the impact of the feature flag introduced: benchmarks might show regressions. This bears more investigation. I'm still skeptical that there are users who are well-served by a fast always panicking approach, but I'm going to re-add the feature flag here to avoid stalling this out.