core: Restore calling the on_panic hook when a reducer call panics
#2624
PR #2550 removed performing the reducer call inside `spawn_blocking`, thereby introducing a regression: the `on_panic` callback is never invoked.

This causes a database to continue accepting writes even if it is unable to persist the transactions via its `Durability` impl. Instead, the `on_panic` callback should remove the database from the `HostController`, which renders this database unavailable but does not affect any other databases managed by the controller.
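For illustration, here is a minimal sketch of the intended shape, assuming a tokio runtime: the reducer call runs on a spawned task, a panic surfaces as a `JoinError`, and that is where the `on_panic` hook fires. The type and function names (`HostController`, `on_panic`, `run_reducer`) are hypothetical stand-ins, not the actual SpacetimeDB API.

```rust
use std::sync::Arc;

/// Hypothetical stand-in for the real controller type; the actual
/// SpacetimeDB wiring differs, this only illustrates the shape.
struct HostController;

impl HostController {
    /// In the real code, the `on_panic` hook removes the database from
    /// the controller so it stops accepting calls; here we just log.
    fn on_panic(&self, database: &str) {
        eprintln!("removing database `{database}` from the host controller");
    }
}

/// Stand-in for the actual reducer invocation, which may panic,
/// e.g. when the durability layer fails.
async fn run_reducer() {
    panic!("simulated durability failure");
}

async fn call_reducer(controller: Arc<HostController>, database: String) {
    // Run the (potentially panicking) reducer call on its own task.
    // Using `tokio::spawn` rather than `spawn_blocking` matches the
    // restored, cheaper behavior, while still yielding a `JoinError`
    // that can be inspected for a panic.
    let result = tokio::spawn(run_reducer()).await;

    // A panic inside the task shows up as a `JoinError` with
    // `is_panic() == true`; this is where the hook must be re-attached.
    if let Err(e) = result {
        if e.is_panic() {
            controller.on_panic(&database);
        }
    }
}

#[tokio::main]
async fn main() {
    let controller = Arc::new(HostController);
    call_reducer(controller, "example-db".into()).await;
}
```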
The behavior is fine in `panic = abort` environments, yet until we have a better way to propagate durability errors as values to the `ModuleHost`, we can't abort the server in multi-tenant environments. Thus, restore the original behavior (using `spawn` instead of the more expensive `spawn_blocking`).

API and ABI breaking changes
Expected complexity level and risk
1
Testing
I don't know how to write an automated test for this, because I don't know how
to produce a module wasm blob inside a test.
Instead, the behavior can be tested by modifying the code:
e.g. make `append_tx` panic once `self.durable_tx_offset()` returns, say, a value `> 10` (a rough sketch of such a modification follows below).

Note that standalone's lazy database instantiation logic will cause the database to be re-started on each subsequent reducer call after the durability impl has panicked, only to then fail the call. That's ok, because standalone actually aborts the server, so we'll never do this in practice. In a replicated setting, the crash should cause a new leader to be elected.
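For completeness, here is a rough sketch of the kind of code modification described above, using a simplified stand-in for the `Durability` trait (the names and signatures are assumptions, not the real trait):

```rust
/// Simplified stand-in for the real `Durability` trait; the actual
/// SpacetimeDB trait has a different shape.
trait Durability {
    fn durable_tx_offset(&self) -> u64;
    fn append_tx(&mut self, tx: &[u8]);
}

struct FailingDurability {
    offset: u64,
}

impl Durability for FailingDurability {
    fn durable_tx_offset(&self) -> u64 {
        self.offset
    }

    fn append_tx(&mut self, _tx: &[u8]) {
        // The manual test described above: once the durable offset
        // exceeds 10, simulate a durability failure by panicking, which
        // should trigger the `on_panic` hook and take the database offline.
        if self.durable_tx_offset() > 10 {
            panic!("simulated durability failure");
        }
        self.offset += 1;
    }
}

fn main() {
    let mut durability = FailingDurability { offset: 0 };
    for i in 0..20u64 {
        // Panics on the 12th call, once the durable offset exceeds 10.
        durability.append_tx(&i.to_le_bytes());
    }
}
```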