Add a fault-tolerant commitlog replay mode for use in debugging #3887

Merged
gefjon merged 3 commits into master from phoebe/fault-tolerant-replay-for-debugging
Dec 17, 2025
@gefjon gefjon commented Dec 16, 2025

Description of Changes

When debugging broken commitlogs, we want to inspect the whole commitlog, including the part after the first error.
This is in contrast with the way we want to replay in prod, where we'd rather get a hard error than an incorrect state.

This commit adds a new flag to commitlog replay, `ErrorBehavior`. The `core` crate passes `ErrorBehavior::FailFast`
when replaying commitlogs to reconstruct databases. Internal tooling (not in this repository) uses `ErrorBehavior::Warn` to print the entirety of a broken commitlog.
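
To illustrate the difference between the two modes, here is a minimal, self-contained sketch. Only the names `ErrorBehavior`, `FailFast`, and `Warn` come from this PR; the `ReplayError` type, the `replay` function, and the representation of a commitlog as a slice of `Result`s are assumptions made up for the example and do not reflect the actual commitlog API.

```rust
use std::fmt;

/// Stand-in for the flag described above. Only the variant names are taken
/// from the PR description; the real definition may differ.
#[derive(Clone, Copy, Debug, PartialEq, Eq)]
pub enum ErrorBehavior {
    /// Stop at the first error (what production replay wants).
    FailFast,
    /// Report the error and keep going (what debugging tools want).
    Warn,
}

/// Hypothetical error for a record that cannot be decoded or applied.
#[derive(Debug, PartialEq, Eq)]
pub struct ReplayError {
    pub offset: u64,
    pub reason: String,
}

impl fmt::Display for ReplayError {
    fn fmt(&self, f: &mut fmt::Formatter<'_>) -> fmt::Result {
        write!(f, "record {}: {}", self.offset, self.reason)
    }
}

/// Hypothetical replay loop. Each input record is either a payload (`Ok`)
/// or a description of what is wrong with it (`Err`).
pub fn replay(
    records: &[Result<&str, &str>],
    behavior: ErrorBehavior,
) -> Result<Vec<String>, ReplayError> {
    let mut applied = Vec::new();
    for (offset, record) in records.iter().enumerate() {
        match record {
            Ok(payload) => applied.push(payload.to_string()),
            Err(reason) => {
                let err = ReplayError {
                    offset: offset as u64,
                    reason: reason.to_string(),
                };
                match behavior {
                    // Hard error: never hand back a possibly incorrect state.
                    ErrorBehavior::FailFast => return Err(err),
                    // Log and continue so the rest of the log can be inspected.
                    ErrorBehavior::Warn => eprintln!("warning: {err}"),
                }
            }
        }
    }
    Ok(applied)
}

fn main() {
    let log = [Ok("tx 0"), Err("checksum mismatch"), Ok("tx 2")];

    // Production-style replay stops at the broken record.
    assert!(replay(&log, ErrorBehavior::FailFast).is_err());

    // Debugging-style replay skips it and still reaches the later record.
    let seen = replay(&log, ErrorBehavior::Warn).unwrap();
    assert_eq!(seen.len(), 2);
}
```

In this sketch the `Warn` path returns whatever could be read, which is only appropriate for inspection; the resulting state should not be treated as a valid database.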

API and ABI breaking changes

Changes internal APIs only.

Expected complexity level and risk

1 - no change to behavior of SpacetimeDB.

Testing

None.

@gefjon gefjon requested a review from kim December 16, 2025 16:15
@gefjon gefjon added this pull request to the merge queue Dec 17, 2025
Merged via the queue into master with commit 13614d7 Dec 17, 2025
40 of 42 checks passed
github-merge-queue bot pushed a commit that referenced this pull request Dec 17, 2025
# Description of Changes

Based on #3887. Review starting from commit 233b48c.

We've encountered a commitlog which includes inserts into `st_table`,
`st_column`, etc. of the rows which describe `st_view`, `st_view_param`,
etc. This caused replay to fail: those rows were already inserted during
bootstrapping, so we got set-semantic duplicate errors. With this commit,
we ignore set-semantic duplicate errors when replaying a commitlog, but
only for rows in system tables which describe system tables.
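
The rule above can be sketched roughly as follows. This is a toy model, not the code from the commit: `DuplicateRow`, `insert`, `replay_insert`, and the idea of representing an `st_table` row by just the name of the table it describes are all assumptions made for illustration.

```rust
use std::collections::HashSet;

/// Hypothetical error for inserting a row that is already present
/// (a set-semantic duplicate).
#[derive(Debug, PartialEq, Eq)]
pub struct DuplicateRow(pub String);

/// Toy insert with set semantics: a row may appear at most once.
pub fn insert(table: &mut HashSet<String>, row: &str) -> Result<(), DuplicateRow> {
    if table.insert(row.to_string()) {
        Ok(())
    } else {
        Err(DuplicateRow(row.to_string()))
    }
}

/// Toy replay of an insert into `st_table`, where each row is modelled by
/// the name of the table it describes. A duplicate is swallowed only when
/// the described table is itself a system table; any other duplicate is
/// still reported to the caller.
pub fn replay_insert(
    st_table: &mut HashSet<String>,
    system_tables: &HashSet<&str>,
    described: &str,
) -> Result<(), DuplicateRow> {
    match insert(st_table, described) {
        Err(DuplicateRow(_)) if system_tables.contains(described) => Ok(()),
        other => other,
    }
}

fn main() {
    let system_tables: HashSet<&str> = ["st_table", "st_column", "st_view"]
        .iter()
        .copied()
        .collect();

    // Bootstrapping has already inserted the row describing `st_view`.
    let mut st_table: HashSet<String> = HashSet::new();
    st_table.insert("st_view".to_string());

    // The commitlog inserts it again: ignored during replay.
    assert!(replay_insert(&mut st_table, &system_tables, "st_view").is_ok());

    // A duplicate row describing a user table is still an error.
    assert!(replay_insert(&mut st_table, &system_tables, "users").is_ok());
    assert!(replay_insert(&mut st_table, &system_tables, "users").is_err());
}
```

Keeping the guard narrow mirrors the stated intent of not swallowing any error that isn't obviously safe.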

We also have to do an additional fixup for sequences. This is described
in-depth in comments added at the relevant locations.

# API and ABI breaking changes

N/A

# Expected complexity level and risk

1 - I was careful not to swallow any errors which aren't obviously safe.

# Testing

- [x] Manually replayed commitlog which includes the above mentioned
inserts, got error prior to this commit, no error with this commit.
aasoni pushed a commit that referenced this pull request Feb 5, 2026
4 participants