-
Notifications
You must be signed in to change notification settings - Fork 1.5k
[Feature] Improved Panic Handling #2927
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Open
kaimast
wants to merge
3
commits into
staging
Choose a base branch
from
feat/track-error
base: staging
Could not load branches
Branch not found: {{ refName }}
Loading
Could not load tags
Nothing to show
Loading
Are you sure you want to change the base?
Some commits from the old base branch may be removed from the timeline,
and old review comments may become outdated.
Open
Conversation
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
34a44e4 to
cec41dd
Compare
raychu86
reviewed
Sep 17, 2025
raychu86
reviewed
Sep 17, 2025
vicsn
reviewed
Sep 18, 2025
vicsn
reviewed
Sep 18, 2025
vicsn
reviewed
Sep 18, 2025
Collaborator
Is this the main and only usecase for the thread local panic logic of this PR? I'm trying to better understand whether the refactor is worth it at the moment. I thought we printed the error in the VM panic handler. Therefore:
|
ljedrz
reviewed
Sep 22, 2025
ljedrz
reviewed
Sep 22, 2025
ljedrz
reviewed
Sep 22, 2025
1fc8711 to
5b52e96
Compare
56de046 to
ac0550c
Compare
ac0550c to
6fc9ebb
Compare
Collaborator
Author
|
@vicsn I extended the PR description |
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.
This suggestion is invalid because no changes were made to the code.
Suggestions cannot be applied while the pull request is closed.
Suggestions cannot be applied while viewing a subset of changes.
Only one suggestion per line can be applied in a batch.
Add this suggestion to a batch that can be applied as a single commit.
Applying suggestions on deleted lines is not supported.
You must change the existing code in this line in order to create a valid suggestion.
Outdated suggestions cannot be applied.
This suggestion has been applied or marked resolved.
Suggestions cannot be applied from pending reviews.
Suggestions cannot be applied on multi-line comments.
Suggestions cannot be applied while the pull request is queued to merge.
Suggestion cannot be applied right now. Please check back later.
This PR is a follow-up on #2813.
Context: Handling Fatal Errors in snarkVM
Generally, snarkVM and snarkOS aim to never panic if it is possible to avoid. However, there is always the possibility of bugs that were overlooked or ending up in a state where the code cannot recover safely.
If the code panics, there are two different scenarios:
try_vm_runtime) we can safely log it and continue operating normally. snarkVM will simply treat this as a failed execution.We had several instances where a thread or tokio task panicked, and, while information about the panic was printed to
stdout, there was no mention of it in the logs.While the former is already implemented. The latter was partially introduced by #2813, but this PR adds some important improvement and fixes. The behavior of
try_vm_runtimeremains unchanged by this, albeit its implementation changes slightly.Improved Panic Handling Mechanisms
Panic handling as implemented in #2813 (and also before but to a limited extent) has a major shortcoming: it did not work well with multi-threading. That is because the panic handler is set per-process but, while executing a snarkVM program another thread, unrelated to execution, might panic, and snarkVM may falsely log it as a VM error.
The proposed mechanism in this PR is to store backtraces and error messages in a thread-local variable.
try_vm_runtimeis changed to retrieve the error message from this thread-local variable.Additionally, there is now a
catch_unwindfunction that wraps around the function of the same name from the standard library and returns the error message and backtrace on error. The latter is useful when printing error messages about panics, for example, in snarkOS.Utilities for Task Management
tokioallows propagating panics throughJoinHandles. This PR also provides wrappers around tokio'sspawnandspawn_blockingfunctions that contain the boilerplate for propagating such panics.This replaces logic that before resided in
snarkos-node-bftbut will be useful to other crates.Usage
In snarkVM
snarkVM has one notable instance of spawning a background task, the sequential ops thread (see the changes in #2975). 80c81d8 adds
catch_unwindto that thread to ensure future panics in the storage thread are logged properly.In snarkOS
For snarkOS, the plan is to use
catch_unwindfor important background tasks such as block synchronization. Again, this is to ensure that a panic in block synchronization does not happen silently and block synchronization stops for no apparent reason.ProvableHQ/snarkOS#3874 contains the full set of proposed changes for snarkOS.
Notes
JoinHandles) does not work well (yet). Future improvements to tokio will change this, but we also do not use detached tasks much.