-
-
Notifications
You must be signed in to change notification settings - Fork 5.5k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Exception stacks #28878
Exception stacks #28878
Conversation
I won't comment on the technical aspects of the solution but I like it conceptually! And I'm happy to see those issues resolved, they made error handling in Dispatcher and stacktrace generation in Memento tricky business. |
@iamed2 Hah yes I originally came across this when working on logging. To add to the commentary above, One of my biggest difficulties with this patch was the need to GC root objects in the bt buffer, and also GC the buffer itself in the case that it's attached to a Actually I hope to delete most of the GC changes which were done here as they seem like overkill. My current best idea is to use the new efficient array-of-small- Is this a sound idea? Advice on how to improve the memory management in this patch would be particularly welcome. |
af0afe3
to
f76eb32
Compare
Ok, CI failures are because I've got something wrong in GC land and the debug build is failing on an assertion. My best guess is that I've broken whatever invariants Anyway, the release build does "work" so the branch is in a state where the curious can play with it until the real problem is fixed. |
Looking into alternative storage for backtrace buffers, the |
Thanks for tackling this; great work! I think this is basically the right thing. I'll review in detail at some point. A bit of custom code for GC marking doesn't seem so bad to me. |
aca0df5
to
9d51768
Compare
src/gc.c
Outdated
@@ -1901,7 +1901,7 @@ excstack: { | |||
size_t i = stackitr->i; | |||
while (itr > 0) { | |||
size_t bt_size = jl_exc_stack_bt_size(exc_stack, itr); | |||
uint64_t *bt_data = jl_exc_stack_bt_data(exc_stack, itr); | |||
uintptr_t *bt_data = jl_exc_stack_bt_data(exc_stack, itr); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Hah, this was a hilariously stupid bug which cost a lot of time due to randomly busted GC on 32 bit, I don't know what I was thinking using uint64_t
here :-/
It would have been caught early if we were able to use -Werror
on CI builds but I suspect that is quite disruptive in other ways.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It would have been caught early if we were able to use
-Werror
on CI builds but I suspect that is quite disruptive in other ways.
Please, we should absolutely do this. We've had a creeping increase of compiler warnings in the past month or so.
Hmm, just encountered a spurious test failure on 32 bit due to #27788. |
b853caf
to
bcc7bab
Compare
Tests have finally passed at least once on all systems and I've done a fair amount of cleanup. Should be good for a closer look. I've exported |
Ok, tests have passed after restarting the appveyor x86_64 build three times. There is an intermittent failure in a single one of the spawn tests on that platform related to deleting a directory, see https://ci.appveyor.com/project/JuliaLang/julia/build/1.0.30766/job/wnxk1pxa27xte6un for example. This seems to be the test from #26687 coming back again after #28107 was merged. The fact that it's recurring here more frequently could be related to the changes in this PR. |
ebf472a
to
66e934c
Compare
I just rebased this again to clean up some recent conflicts, and to consolidate a lot of WIP commits into a small number of consistent changes. This reminded me of an interesting side effect of this patch which I should point out — code generated by |
The logic for this was repeated in three quite different ways, in JL_CATCH, interpreter and codegen but putting it into the runtime in jl_eh_restore_state should be sufficient. Also move jl_eh_restore_state out of main header - seems unnecessary to expose such implementation detail, and inconsistent with other eh functions.
* A new lowered expression head :pop_exc is introduced and emitted in any location where a catch block exits normally (either by stepping out, or using return, break, goto). The semantics of :pop_exc are to pop the exception stack back to the state of the associated enter. * Make Expr(:enter) return a token which may be consumed by :pop_exc, thereby allowing the interpreter and codegen to know which :enter state should be used to pop the exception stack. I tried various alternatives for this association, but this was by far the nicest in terms of non-disruptive integration into the SSAIR processing code, and supporting both the interpreter and codegen.
Runtime ------- * An exception stack type, `jl_exc_stack_t` and associated manipulation functions has been added. Conceptually this stores a stack of (exception,backtrace) pairs. It's stored in a contiguous buffer so we can avoid reallocating when throwing and catching the same exception or set of exceptions repeatedly. Space for the exception and backtrace is allocated on demand inside `throw_internal`, after changing GC mode. Several variations were tried for allocating this sgorage, including allocating up front with malloc on the thread local state and copying during task switching. Keeping everything on the task seemed the simplest as it involves the least copying and keeps track of the exception stack buffers in a unified way. * The exception in transit is no longer a single pointer in the thread local storage. It's stored in `ptls->current_task->exc_stack` instead, along with associated backtrace. * `jl_current_exception()` is now the method to retreive the current exception from within a `JL_CATCH` dynamic context. * Several places where manual restoration of the exception and backtrace was done are no longer necessary because the stack system handles these automatically. This code is removed, including `jl_apply_with_saved_exception_state`. * `jl_eh_restore_state` has become non-inline. It seemed good to get this out of the public header. * `jl_sig_throw` is now used with `jl_throw_in_ctx` from signal handlers, to make the special circumstances clear and to avoid conflation with rethrow which now simply rethrows the existing exception stack. * Make rethrow outside a catch block into an error. * Use `ptls->previous_exception` to support `jl_exception_occurred` in embedding API. * finally lowering no longer includes a `foreigncall`, so expressions using finally can now be interpreted. Interpreter / Codegen --------------------- Mostly small changes here, to track the `:enter` and `:pop_exc` association with the SSAValue token. The token SSAValue slot is ideal here for storing the state of the exception stack at the `:enter`. GC -- Integrate exception and raw backtrace scanning as a special case into GC mark loop. This is necessary now that Task objects can refer to exception and backtrace data in raw form.
66e934c
to
5333b07
Compare
Should be fixed now. The currently running build looks poised to complete successfully. |
@ararslan Thanks so much! Pending CI passing once more and the (github outage related) JuliaLang/julia merge lockout being removed, I will go ahead and merge this. Hopefully later today. |
Congrats on this!! I'm excited to use these when I can! |
Fix uninitialized previous_exception bug introduced in #28878
changes between Julia 1.0 and 1.1, including: - Custom .css-style for compat admonitions. - Information about compat annotations to CONTRIBUTING.md. - NEWS.md entry for PRs #30090, #30035, #30022, #29978, #29969, #29858, #29845, #29754, #29638, #29636, #29615, #29600, #29506, #29469, #29316, #29259, #29178, #29153, #29033, #28902, #28761, #28745, #28708, #28696, #29997, #28790, #29092, #29108, #29782 - Compat annotation for PRs #30090, #30013, #29978, #29890, #29858, #29827, #29754, #29679, #29636, #29623, #29600, #29440, #29316, #29259, #29178, #29157, #29153, #29033, #28902, #28878, #28761, #28708, #28156, #29733, #29670, #29997, #28790, #29092, #29108, #29782, #25278 - Documentation for broadcasting CartesianIndices (#30230). - Documentation for Base.julia_cmd(). - Documentation for colon constructor of CartesianIndices (#29440). - Documentation for ^(::Matrix, ::Number) and ^(::Number, ::Matrix). - Run NEWS-update.jl. Co-authored-by: Morten Piibeleht <morten.piibeleht@gmail.com> Co-authored-by: Fredrik Ekre <ekrefredrik@gmail.com>
changes between Julia 1.0 and 1.1, including: - Custom .css-style for compat admonitions. - Information about compat annotations to CONTRIBUTING.md. - NEWS.md entry for PRs #30090, #30035, #30022, #29978, #29969, #29858, #29845, #29754, #29638, #29636, #29615, #29600, #29506, #29469, #29316, #29259, #29178, #29153, #29033, #28902, #28761, #28745, #28708, #28696, #29997, #28790, #29092, #29108, #29782 - Compat annotation for PRs #30090, #30013, #29978, #29890, #29858, #29827, #29754, #29679, #29636, #29623, #29600, #29440, #29316, #29259, #29178, #29157, #29153, #29033, #28902, #28878, #28761, #28708, #28156, #29733, #29670, #29997, #28790, #29092, #29108, #29782, #25278 - Documentation for broadcasting CartesianIndices (#30230). - Documentation for Base.julia_cmd(). - Documentation for colon constructor of CartesianIndices (#29440). - Documentation for ^(::Matrix, ::Number) and ^(::Number, ::Matrix). - Run NEWS-update.jl. Co-authored-by: Morten Piibeleht <morten.piibeleht@gmail.com> Co-authored-by: Fredrik Ekre <ekrefredrik@gmail.com>
Addition of NEWS and compat admonitions for important changes between Julia 1.0 and 1.1, including: - Custom .css-style for compat admonitions. - Information about compat annotations to CONTRIBUTING.md. - NEWS.md entry for PRs #30090, #30035, #30022, #29978, #29969, #29858, #29845, #29754, #29638, #29636, #29615, #29600, #29506, #29469, #29316, #29259, #29178, #29153, #29033, #28902, #28761, #28745, #28708, #28696, #29997, #28790, #29092, #29108, #29782 - Compat annotation for PRs #30090, #30013, #29978, #29890, #29858, #29827, #29754, #29679, #29636, #29623, #29600, #29440, #29316, #29259, #29178, #29157, #29153, #29033, #28902, #28878, #28761, #28708, #28156, #29733, #29670, #29997, #28790, #29092, #29108, #29782, #25278 - Documentation for broadcasting CartesianIndices (#30230). - Documentation for Base.julia_cmd(). - Documentation for colon constructor of CartesianIndices (#29440). - Documentation for ^(::Matrix, ::Number) and ^(::Number, ::Matrix). - Run NEWS-update.jl. Co-authored-by: Morten Piibeleht <morten.piibeleht@gmail.com> Co-authored-by: Fredrik Ekre <ekrefredrik@gmail.com>
An exception stack system for julia
Here's my attempt to fix #12485 / #19979 by introducing an exception stack
system. The basic problem is that you may throw a new exception while handling
an exception. What do we do then? The existing runtime simply overwrites the
first exception, loosing any information it might have had about the root
cause.
A distinct problem discussed in #12485 is that task switching looses all
exception and backtrace information. I've tackled this at the same time
though in hindsight it might have been better to do the exception stack
first, followed by fixing task switching.
Design
To solve the first problem, I've implemented an exception stack mechanism.
Conceptually:
throw()
pushes an(exception,backtrace)
pair onto a stack of currentexceptions.
catch
block normally, any exception generated within thecorresponding
try
is considered to have been handled. The stack is thereforepopped back to the state at the entry of the corresponding
try
.Base.catch_stack()
.For example:
Produces
Implementation
Lowering
A new lowered expression head
:pop_exc
is introduced and emitted in anylocation where a catch block exits normally (either by falling out the bottom,
or using return, break, goto). This pops the exception stack back to the state
of the associated
enter
. There needs to be some way to track the associatedenter
in the IR, so I chose to emit an SSAValue token fromenter
which isconsumed by
pop_exc
. I tried various alternatives, but this was by far thenicest in terms of non-disruptive integration into the SSAIR processing code,
and supporting both the interpreter and codegen.
finally
lowering no longer includes aforeigncall
, so expressions usingfinally can now be interpreted. This includes the output of
@test
so maygive a nice speed boost.
Runtime
The exception in transit is no longer a single pointer in the thread local
storage. It's stored in
ptls->current_task->exc_stack
instead, along withassociated backtrace. Space for the exception and backtrace is allocated
on demand inside
throw_internal
, after changing GC mode.The exception stack is
copied out onto thenaturally preserved during task switching.Task
I've been back and forth on several variations here, as the exact detail
of how this is done depends on how the memory for the exception
stack is allocated (GC vs plain malloc).
jl_current_exception()
is now the method to retreive the current exceptionfrom within a
JL_CATCH
dynamic context._resetstkoflw()
handling is consolidated intojl_eh_restore_state
.Needs proper testing on windows.
Several places where manual restoration of the exception and backtrace was
done are no longer necessary because the stack system handles these
automatically.
An exception stack type,
jl_exc_stack_t
and associated manipulationfunctions has been added. Conceptually stores a stack of
(exception,backtrace) pairs. It's stored in a shared buffer so we can avoid
reallocating when throwing and catching the same exception or set of
exceptions repeatedly.
jl_eh_restore_state
has become non-inline. It seemed good to get this outof the public header.
jl_sig_throw
is now used withjl_throw_in_ctx
from signal handlers, tomake the special circumstances clear and to avoid conflation with rethrow
which now simply rethrows the existing exception stack.
Interpreter / Codegen
Some minor rearrangement in the interpreter to make the eh buffer management
and
leave
handling a little easier to understand as I found this surprisinglyconfusing.
Mostly small changes in codegen. Representing the enter token as an SSAValue
really integrates well here when you need to retreive it for the associated
pop_exc
.GC
What? How did this rabbit hole lead down into the GC?
Unfortunately the raw backtrace buffers contain references to GC allocated
objects. Since these buffers can now live on a
Task
, their contents need tobe GC rooted. Unfortunately the raw buffers are also in a non-julia native
format, so I had to write specialized GC code for them.
I'm not sure this was a good idea, especially having extra code to maintain in
this rather sensitive part of the codebase. Perhaps it would be better to
modify our raw backtrace buffer format so that it can be understood by the GC
without major changes.
TODO
Base.catch_stack()
API and export it._resetstkoflw
handling on windows (checked on cross compiled version via wine on ubuntu)bt_data
Task
s still works correctly and figure out how to access the full exception stack for those.Optional
Reconsider best practices for. Can be done separatelyrethrow(exc)
vs just throwing a new exceptionjl_apply_with_saved_exception_state
may be unnecessary nowjl_rethrow_other
is probably unnecessary nowis required forrethrow(ex)