core: Restore calling the `on_panic` hook when a reducer call panics #2624

kim · 2025-04-16T08:02:38Z

PR #2550 removed performing the reducer call inside spawn_blocking,
thereby introducing the regression to never invoke the on_panic
callback.

This causes a database to continue accepting writes even if it is unable
to persist the transactions via its Durability impl. Instead, the
on_panic callback should remove the database from the
HostController, which renders this database unavailable, but does
not affect any other databases managed by the controller.

The behavior is fine in panic = abort environments, yet until we have
a better way to propagate durability errors as values to the
ModuleHost, we can't abort the server in multi-tenant environments.

Thus, restore the original behavior (using spawn instead of the more
expensive spawn_blocking).

API and ABI breaking changes

no

Expected complexity level and risk

1

Testing

I don't know how to write an automated test for this, because I don't know how
to produce a module wasm blob inside a test.

Instead, the behavior can be tested by modifying the code:

remove the panic hook in standalone main
modify the local durability impl to panic append_tx once
self.durable_tx_offset() returns, say > 10
deploy a module and run a number of reducer calls until the panic occurs
observe that subsequent calls return a server error

Note that standalone's lazy database instantiation logic will cause the database
to be re-started on each subsequent reducer call after the durability impl, only
to then fail the call. That's ok, because standalone actually aborts the server,
so we'll never do this in practice. In a replicated setting, the crash should
cause a new leader to be elected.

PR #2550 removed performing the reducer call inside `spawn_blocking`, thereby introducing the regression to never invoke the `on_panic` callback. This causes a database to continue accepting writes even if it is unable to persist the transactions via its `Durability` impl. Instead, the `on_panic` callback should remove the database from the `HostController`, which renders _this_ database unavailable, but does not affect any other databases managed by the controller. The behavior is fine in `panic = abort` environments, yet until we have a better way to propagate durability errors as values to the `ModuleHost`, we can't abort the server in multi-tenant environments. Thus, restore the original behavior (using `spawn` instead of the more expensive `spawn_blocking`).

kim requested review from coolreader18, gefjon and jsdt April 16, 2025 08:02

jsdt approved these changes Apr 16, 2025

View reviewed changes

kim added this pull request to the merge queue Apr 16, 2025

Merged via the queue into master with commit a3614ce Apr 16, 2025
15 checks passed

jsdt mentioned this pull request May 14, 2025

Call reducers on the blocking pool. #2737

Merged

renovate bot mentioned this pull request Jul 10, 2025

chore(deps): update all dependencies Enn3Developer/OpenPolytopia#45

Merged

1 task

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Uh oh!

core: Restore calling the `on_panic` hook when a reducer call panics #2624

core: Restore calling the `on_panic` hook when a reducer call panics #2624

Uh oh!

kim commented Apr 16, 2025

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

3 participants

core: Restore calling the on_panic hook when a reducer call panics #2624

core: Restore calling the on_panic hook when a reducer call panics #2624

Uh oh!

Conversation

kim commented Apr 16, 2025

API and ABI breaking changes

Expected complexity level and risk

Testing

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

3 participants

core: Restore calling the `on_panic` hook when a reducer call panics #2624

core: Restore calling the `on_panic` hook when a reducer call panics #2624